and directories combined. No currently available method or combination of methods for
collecting URLs can collect the addresses of all URLs on the Web.
The portion of the Web that is not theoretically indexable through the use of
spidering technology, because other Web pages do not link to it, is called the Deep
Web. Such sites or pages can still be made publicly accessible without being made
publicly indexable by, for example, using individual or mass emailings (also known as
spam) to distribute the URL to potential readers or customers, or by using types of Web
links that cannot be found by spiders but can be seen and used by readers. Spamming
is a common method of distributing to potential customers links to sexually explicit
content that is not indexable.
Because the Web is decentralized, it is impossible to say exactly how large it is. A
2000 study estimated a total of 7.1 million unique Web sites, which, at the Web's
historical rate of growth, would have increased to 11 million unique sites as of September
2001. Estimates of the total number of Web pages vary, but a figure of 2 billion is a
reasonable estimate of the number of Web pages that can be reached, in theory, by
standard search engines. We need not make a specific finding as to a figure, for by any
measure the Web is extremely vast, and it is constantly growing. The indexable Web is
growing at a rate of approximately 1.5 million pages per day. The size of the
unindexable Web, or the Deep Web, while impossible to determine precisely, is estimated
to be two to ten times that of the publicly indexable Web.
In addition to growing rapidly, Web pages and sites are constantly being removed,