returned 100 hits, of which 80 were in fact pictures of dogs, and the remaining 20 were
pictures of cats, horses, and deer, we would say that the system identified dog pictures
with a precision of 80%. This would be analogous to a filter that overblocked at a rate of
20%.
The recall measure involves determining what proportion of the actual members of
a category the classification system has been able to identify. For example, if the
hypothetical animal picture database contained a total of 200 pictures of dogs, and the
system identified 80 of them and failed to identify 120, it would have performed with a
recall of 40%. This would be analogous to a filter that underblocked 60% of the material
in a category.
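The arithmetic behind these two measures can be sketched in a few lines of Python. This is only an illustration using the figures from the animal picture example above; the function and variable names are assumptions introduced here, and treating the overblocking and underblocking rates as the complements of precision and recall follows the analogy drawn in the text rather than any formal definition.

```python
def precision(correct_hits: int, total_hits: int) -> float:
    """Share of returned items that really belong to the category."""
    return correct_hits / total_hits


def recall(correct_hits: int, total_in_category: int) -> float:
    """Share of the category's actual members that were returned."""
    return correct_hits / total_in_category


# Figures from the example: 100 hits, 80 of them dogs, 200 dogs in the database.
total_hits = 100
dog_hits = 80
dogs_in_database = 200

dog_precision = precision(dog_hits, total_hits)
dog_recall = recall(dog_hits, dogs_in_database)

# By the analogy in the text, a filter's error rates are the complements.
overblock_rate = 1 - dog_precision
underblock_rate = 1 - dog_recall

print(f"precision {dog_precision:.0%}, overblocking {overblock_rate:.0%}")
print(f"recall {dog_recall:.0%}, underblocking {underblock_rate:.0%}")
```

With the example's numbers, this reports a precision of 80% and a recall of 40%, matching the figures in the text.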
In automated classification systems, there is always a tradeoff between precision
and recall. In the animal picture example, the recall could be improved by using a looser
set of criteria to identify the dog pictures in the set, such as any animal with four legs.
All the dogs would then be identified, but cats and other animals would also be included,
with a resulting loss of precision. The same tradeoff exists between rates of overblocking
and underblocking in filtering systems that use automated classification systems. For
example, an automated system that classifies any Web page that contains the word "sex"
as sexually explicit will underblock much less, but overblock much more, than a system
that classifies any Web page containing the phrase "free pictures of people having sex" as
sexually explicit.
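The keyword example can be made concrete with a small Python sketch. The six page texts and their labels below are invented purely for illustration and are not data from any real filter; the two rules mirror the single-word and full-phrase classifiers described above, and the error rates are computed as the complements of precision and recall.

```python
# Hypothetical corpus: (page text, whether the page is actually sexually explicit).
PAGES = [
    ("free pictures of people having sex", True),
    ("explicit sex videos online", True),
    ("adult content gallery", True),
    ("sex education resources for teachers", False),
    ("determining the sex of kittens", False),
    ("weather forecast for sunday", False),
]


def loose_rule(text: str) -> bool:
    """Block any page containing the word 'sex'."""
    return "sex" in text.lower().split()


def strict_rule(text: str) -> bool:
    """Block only pages containing the full phrase."""
    return "free pictures of people having sex" in text.lower()


def error_rates(rule) -> tuple[float, float]:
    """Return (overblocking rate, underblocking rate) for a blocking rule."""
    blocked = [explicit for text, explicit in PAGES if rule(text)]
    explicit_total = sum(1 for _, explicit in PAGES if explicit)
    correctly_blocked = sum(blocked)
    # Overblocking: share of blocked pages that are not explicit.
    overblock = (len(blocked) - correctly_blocked) / len(blocked) if blocked else 0.0
    # Underblocking: share of explicit pages that were not blocked.
    underblock = (explicit_total - correctly_blocked) / explicit_total
    return overblock, underblock


loose_over, loose_under = error_rates(loose_rule)
strict_over, strict_under = error_rates(strict_rule)

print(f"loose rule:  overblocks {loose_over:.0%}, underblocks {loose_under:.0%}")
print(f"strict rule: overblocks {strict_over:.0%}, underblocks {strict_under:.0%}")
```

On this toy corpus the single-word rule blocks more of the explicit pages but also sweeps in the sex-education and kitten pages, while the full-phrase rule blocks nothing innocent and misses most of the explicit material.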
This tradeoff between overblocking and underblocking also applies not just to