plaintiffs offered, and the Court qualified, Nunberg as an expert witness on automated
classification systems.[7]
When compiling and categorizing URLs for their category lists, filtering software
companies go through two distinct phases.  First, they must collect or "harvest" the
relevant URLs from the vast number of sites that exist on the Web.  Second, they must
sort through the URLs they have collected to determine under which of the company's
self-defined categories (if any) they should be classified.  These tasks necessarily result
in a tradeoff between overblocking (i.e., the blocking of content that does not meet the
category definitions established by CIPA or by the filtering software companies), and
underblocking (i.e., leaving off of a control list a URL that contains content that would
meet the category definitions established by CIPA or by the filtering software companies).
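The two kinds of error defined above can be illustrated with a minimal sketch. It assumes, purely for illustration, that one could know exactly which URLs meet a category definition; the function name, the variable names, and the example URLs are hypothetical and are not drawn from the record or from any filtering product.

```python
def blocking_errors(control_list, meets_definition):
    """Return (overblocked, underblocked) as two sets of URLs.

    control_list     -- URLs that the filter blocks for a category
    meets_definition -- URLs whose content actually meets the
                        category definition (assumed known here)
    """
    control_list = set(control_list)
    meets_definition = set(meets_definition)
    # Overblocking: blocked, but the content does not meet the definition.
    overblocked = control_list - meets_definition
    # Underblocking: meets the definition, but left off the control list.
    underblocked = meets_definition - control_list
    return overblocked, underblocked


# Hypothetical data, for illustration only.
control = {"a.example/1", "b.example/2", "c.example/3"}
truth = {"b.example/2", "c.example/3", "d.example/4"}
over, under = blocking_errors(control, truth)
```

In this sketch, adding URLs to the control list can only reduce underblocking, and removing URLs can only reduce overblocking, which is one way to picture the tradeoff described above.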
a.  The "Harvesting" Phase
Filtering software companies, given their limited resources, do not attempt to
index or classify all of the billions of pages that exist on the Web.  Instead, the set of
pages that they attempt to examine and classify is restricted to a small portion of the Web. 
The companies use a variety of automated and manual methods to identify a universe of
[7] Geoffrey Nunberg (Ph.D., Linguistics, C.U.N.Y. 1977) is a researcher at the Center
for the Study of Language and Information at Stanford University and a Consulting Full
Professor of Linguistics at Stanford University.  Until 2001, he was also a principal
scientist at the Xerox Palo Alto Research Center.  His research centers on automated
classification systems, with a focus on classifying documents on the Web with respect to
their linguistic properties.  He has published his research in numerous professional
journals, including peer-reviewed journals.