URLs by using "spidering" software that can "crawl" the lists of pages produced by the
previous four methods, following their links downward to bring back the pages to which
they link (and the pages to which those pages link, and so on, but usually down only a
few levels). This spidering software uses the same type of technology that commercial
Web search engines use.
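A minimal sketch of the depth-limited, downward link-following described above. It assumes the pages and their outgoing links are already available as an in-memory dictionary; the names `crawl` and `link_graph` and the example domains are hypothetical, and a real spider would instead fetch and parse each page over the network.

```python
from collections import deque


def crawl(link_graph, seeds, max_depth=2):
    """Breadth-first spider over a hypothetical in-memory link graph.

    link_graph: dict mapping a URL to the list of URLs it links to.
    seeds: starting URLs (e.g. pages found via search engines or directories).
    max_depth: how many link levels to follow below the seeds.
    Returns the set of all URLs reached, seeds included.
    """
    seen = set(seeds)
    queue = deque((url, 0) for url in seeds)
    while queue:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not descend past the depth limit
        for target in link_graph.get(url, []):
            if target not in seen:  # skip pages already collected
                seen.add(target)
                queue.append((target, depth + 1))
    return seen


graph = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["d.example"],
    "d.example": ["e.example"],
}
print(sorted(crawl(graph, ["a.example"], max_depth=2)))
```

With `max_depth=2`, `e.example`, three links below the seed, is never reached, which mirrors spidering "down only a few levels."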
Although this approach is useful in expanding the number of relevant URLs, its ability to
retrieve additional pages is limited by an architectural feature of the Web: page-to-page
links tend to converge rather than diverge. That means that the more
pages from which one spiders downward through links, the smaller the proportion of new
sites one will uncover. If spidering the links of 1,000 sites retrieved through a search
engine or Web directory turns up 500 additional distinct adult sites, spidering an
additional 1,000 sites may turn up, for example, only 250 additional distinct sites, and the
proportion of new sites uncovered will continue to diminish as more pages are spidered.
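The illustrative figures above (500 new sites from the first 1,000 spidered, 250 from the next 1,000) can be extended with a simple hypothetical model. The sketch assumes, purely for illustration, that every further batch of 1,000 spidered sites yields half as many new distinct sites as the batch before; the function name `new_site_yields` and the fixed halving rate are assumptions, not findings.

```python
def new_site_yields(first_yield=500, decay=0.5, batches=5):
    """Hypothetical model of diminishing returns from spidering.

    Each additional batch of spidered sites is assumed to uncover
    `decay` times as many new distinct sites as the previous batch.
    Returns the number of new sites found in each batch.
    """
    yields = []
    current = first_yield
    for _ in range(batches):
        yields.append(current)
        current *= decay  # the next batch finds proportionally fewer new sites
    return yields


per_batch = new_site_yields()
print(per_batch)       # new sites found in each batch of 1,000
print(sum(per_batch))  # cumulative new sites after five batches
```

Under this assumption the cumulative total can never exceed 500 / (1 - 0.5) = 1,000 new sites, however many pages are spidered, which illustrates why spidering alone cannot close the gap that produces underblocking.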
These limitations on the technology used to harvest a set of URLs for review will
necessarily lead to substantial underblocking of material with respect to both the category
definitions employed by filtering software companies and CIPA's definitions of visual
depictions that are obscene, child pornography, or harmful to minors.  
b. The "Winnowing" or Categorization Phase
Once the URLs have been harvested, some filtering software companies use
automated key word analysis tools to evaluate the content and/or features of Web sites or
pages accessed via a particular URL and to tentatively prioritize or categorize them.  This