process may be characterized as winnowing the harvested URLs. Automated systems
currently used by filtering software vendors to prioritize, and to categorize or tentatively
categorize the content and/or features of a Web site or page accessed via a particular URL
operate by means of (1) simple key word searching, and (2) the use of statistical
algorithms that rely on the frequency and structure of various linguistic features in a Web
page's text. The automated systems used to categorize pages do not include image
recognition technology. All of the filtering companies deposed in the case also employ
human review of some or all collected Web pages at some point during the process of
categorizing Web pages. As with the harvesting process, each technique employed in the
winnowing process is subject to limitations that can result in both overblocking and
underblocking.
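As an illustration only, the simple key word searching described above can be sketched in a few lines of Python. The key word list, the function name keyword_flag, and the two sample page texts are hypothetical and are not drawn from any vendor's product; the sketch merely shows how a single list of words can flag a page that is not sexually explicit while missing a page whose explicit content is carried by images rather than text.

```python
import re

# Hypothetical key word list, chosen only for illustration.
KEYWORDS = {"sex", "xxx", "nude"}

def keyword_flag(text, keywords=KEYWORDS):
    """Return True if any key word appears as a whole word in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return not words.isdisjoint(keywords)

# Hypothetical page texts (assumptions, not real harvested pages).
pages = {
    # A health page that mentions a key word: flagged (overblocking).
    "health": "Information on sex education and breast cancer screening.",
    # A page whose explicit content is in images only: not flagged
    # (underblocking), since no key word appears in its text.
    "explicit_images": "Gallery 12. Click thumbnails to enlarge.",
}

flags = {name: keyword_flag(text) for name, text in pages.items()}
for name, flagged in flags.items():
    print(name, "flagged" if flagged else "not flagged")
```

Under these assumptions the health page is flagged and the image-only page is not, which corresponds to the two kinds of error at issue.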
First, simple key word based filters are subject to the obvious limitation that no
string of words can identify all sites that contain sexually explicit content, and most
strings of words are likely to appear in Web sites that are not properly classified as
containing sexually explicit content. As noted above, filtering software companies also
use more sophisticated automated classification systems for the statistical classification of
texts. These systems assign weights to words or other textual features and use algorithms
to determine whether a text belongs to a certain category. These algorithms sometimes
make reference to the position of a word within a text or its relative proximity to other
words. The weights are usually determined by machine learning methods (often
described as "artificial intelligence"). In this procedure, which resembles an automated