Friday, November 19, 2010

Week 11 Readings

The Hawking articles bring to light the "miracle" that is Web crawling and searching. It's really hard for me to grasp such incredibly huge numbers when we're talking about data structures containing billions of URLs and hundreds of terabytes of data (and that was in 2006!). Suffice it to say that I'm amazed by the design of crawling algorithms that manage to index what we know as the Web. Further, that these crawling machines "know" which URLs they're responsible for, and which to pass on to another machine; that they "mind their manners" by spacing out requests to a server so as not to overload it; that they can recognize duplicate content; and that they can detect and reject spam is all rather like sci-fi to me.
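To make the first two of those ideas concrete, here is a minimal sketch (my own illustration, not code from the Hawking articles) of how a crawler cluster might decide which machine "owns" a URL by hashing the hostname, and how it might wait politely between requests to the same server. The machine count and delay are made-up values for the example.

```python
# A toy sketch of two crawler ideas: hash-partitioning URLs across
# machines, and a per-host politeness delay. Purely illustrative.
import hashlib
import time
from urllib.parse import urlparse

NUM_MACHINES = 16        # assumed cluster size, purely illustrative
POLITENESS_DELAY = 2.0   # assumed seconds to wait between hits to one host

def machine_for(url):
    """Hash the hostname so every URL from the same site lands on the
    same crawler machine; other machines just pass it along."""
    host = urlparse(url).netloc
    digest = hashlib.md5(host.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_MACHINES

last_hit = {}  # host -> time of our last request to that host

def polite_wait(url):
    """Sleep if we contacted this host too recently ('minding manners')."""
    host = urlparse(url).netloc
    elapsed = time.time() - last_hit.get(host, 0.0)
    if elapsed < POLITENESS_DELAY:
        time.sleep(POLITENESS_DELAY - elapsed)
    last_hit[host] = time.time()

# Every example.org URL maps to the same machine number:
print(machine_for("http://example.org/page1"))
print(machine_for("http://example.org/page2"))
```

Because the hash depends only on the hostname, no machine needs a master list of assignments; each one can compute, for any URL it discovers, whose responsibility it is.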

I expected Bergman's article on the Deep Web to be daunting, if only because of its length, but I found it surprisingly accessible and interesting. The bottom line: there's a huge amount--roughly 500 times more--of higher-quality information buried beneath the "Surface Web," and 95% of that Deep Web information is available for free, if only we could get to it! I have to admit, in the early pages of this white paper, I was thinking, "But how many of us really need such comprehensive information when we search? Isn't Google good enough?" Clearly I was one of those folks who didn't know there was better content to be found!

Quality, however, is in the eye of the beholder. As Bergman says, "If you get the results you desire, that is high quality; if you don't, there is no quality at all" (p. 23). Google may, in fact, be "good enough" for most users; the problem with conventional search engines, I think, is that people don't know the right search terms to use (just as they don't know the right questions to ask during a reference interview). The Deep Web is apparently where one should go for quality information, and "when it is imperative to find a 'needle in a haystack'" (p. 25).

I think librarians definitely need to know about the Deep Web--the threefold improved likelihood of obtaining quality results from the Deep Web versus the Surface Web is convincing. What worries me is the article's claim that "The invisible portion of the Web will continue to grow exponentially before the tools to uncover the hidden Web are ready for general use" (p. 12). If the best information is in the Deep Web, then what are we "finding" on the Surface Web for patrons and users? Yikes--people are making medical, financial, and legal decisions based on Google searches!

The Open Archives Initiative (OAI) may be the next-best way to find quality information while we wait for the Deep Web to become accessible. Different communities (libraries, museums, and archives) have adopted the OAI Protocol for Metadata Harvesting (OAI-PMH) to provide access to their resources. (I personally found the Sheet Music Consortium to be a fascinating project!) Shreeves et al. discuss the shortcomings of OAI registries (completeness and discoverability) and the improvements the group feels should be made to the registry in the future. A rough sketch of what harvesting looks like in practice follows below.
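To illustrate what an OAI-PMH harvest actually involves, here is a minimal sketch in Python. The verb (ListRecords), the metadataPrefix=oai_dc parameter, the resumptionToken mechanism, and the XML namespaces are all standard OAI-PMH 2.0; the repository endpoint URL is a hypothetical placeholder, not the address of any real consortium.

```python
# A minimal OAI-PMH harvesting sketch. Endpoint URL is hypothetical;
# verbs, parameters, and namespaces follow the OAI-PMH 2.0 spec.
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import quote

BASE_URL = "https://example.org/oai"  # hypothetical repository endpoint

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def harvest_titles(base_url):
    """Walk ListRecords responses, following resumptionTokens, and
    yield the Dublin Core title of each harvested record."""
    url = base_url + "?verb=ListRecords&metadataPrefix=oai_dc"
    while url:
        with urllib.request.urlopen(url) as resp:
            root = ET.fromstring(resp.read())
        for title in root.iterfind(".//dc:title", NS):
            yield title.text
        # A non-empty resumptionToken means there are more pages to fetch.
        token = root.find(".//oai:resumptionToken", NS)
        if token is not None and token.text:
            url = base_url + "?verb=ListRecords&resumptionToken=" + quote(token.text)
        else:
            url = None

for t in harvest_titles(BASE_URL):
    print(t)
```

A harvester along these lines is roughly how a service provider builds the kind of aggregated, searchable index the Sheet Music Consortium offers, pulling metadata from many repositories into one place.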

1 comment:

  1. I am curious about where OAI is going as the information age advances. Do you think librarians will play a greater role in the actual development of OAI as the information age progresses? Is there anything that customers can demand which would enhance this as well? I think the Sheet Music Consortium sounds interesting in terms of accessibility. Perhaps additional consortia will come about as the technology and demand for information increase.

    Adam Brody
