Web sites and pages to harvest for classification. These methods include: entering
certain keywords into search engines; following links from a variety of online directories
(e.g., generalized directories like Yahoo or various specialized directories, such as those
that provide links to sexually explicit content); reviewing lists of newly registered domain
names; buying or licensing lists of URLs from third parties; mining access logs
maintained by their customers; and reviewing other submissions from customers and the
public. The goal of each of these methods is to identify as many URLs as possible that
are likely to contain content that falls within the filtering companies' category definitions.
The first method, entering certain keywords into commercial search engines,
suffers from several limitations. First, the Web pages that may be harvested through
this method are limited to those pages that search engines have already identified.
However, as noted above, a substantial portion of the Web is not even theoretically
indexable (because it is not linked to by any previously known page), and only
approximately 50% of the pages that are theoretically indexable have actually been
indexed by search engines. We are satisfied that the remainder of the indexable Web,
and the vast Deep Web, which cannot currently be indexed, includes materials that
meet CIPA's categories of visual depictions that are obscene, child pornography, and
harmful to minors. These portions of the Web cannot presently be harvested through the
methods that filtering software companies use (except through reporting by customers or
by observing users' log files), because they are not linked to other known pages. A user
can, however, gain access to a Web site in the unindexed Web or the Deep Web if the