and directories combined.  No currently available method or combination of methods for
collecting URLs can collect the addresses of all URLs on the Web.
The portion of the Web that is not theoretically indexable through the use of
"spidering" technology, because other Web pages do not link to it, is called the "Deep
Web."  Such sites or pages can still be made publicly accessible without being made
publicly indexable by, for example, using individual or mass emailings (also known as
"spam") to distribute the URL to potential readers or customers, or by using types of Web
links that cannot be found by spiders but can be seen and used by readers.  "Spamming"
is a common method of distributing to potential customers links to sexually explicit
content that is not indexable.
Because the Web is decentralized, it is impossible to say exactly how large it is.  A
2000 study estimated a total of 7.1 million unique Web sites, which, at the Web's
historical rate of growth, would have increased to 11 million unique sites as of September
2001.  Estimates of the total number of Web pages vary, but a figure of 2 billion is a
reasonable estimate of the number of Web pages that can be reached, in theory, by
standard search engines.  We need not make a specific finding as to a figure, for by any
measure the Web is extremely vast, and it is constantly growing.  The indexable Web is
growing at a rate of approximately 1.5 million pages per day.  The size of the
unindexable Web, or the "Deep Web," while impossible to determine precisely, is estimated
to be two to ten times that of the publicly indexable Web.
In addition to growing rapidly, Web pages and sites are constantly being removed,