Using the Deep Web, by Steve Gruchawka

Site last revised: 11/5/2014…

By some estimates, the deep Web contains 99% of the information content of the Web; however, most of this information is held in databases and is not indexed by search engines.

A complete approach to conducting research on the Web combines surface search engines with deep Web databases. Most Internet users have at least elementary skill with search engines; skill in accessing the deep Web, however, is limited to a much smaller population. A video made by the Office of Scientific and Technical Information describes one particular deep Web search engine developed for accessing multiple government databases; there are many others.

There are numerous books and articles on the deep Web (also called the invisible Web or hidden Web) that do a terrific job of describing its use for general audiences, and the references cited on this website provide a good cross-section of them. However, to the author's knowledge, nothing written about the deep Web addresses the needs of the IT (information technology) professional, which is why this information was researched and presented on this website.

This website is intended as a pathfinder for locating IT information by describing some of the most useful search tools, portals and websites available. The intention is not to be all-inclusive but to limit selection to high-quality resources. This approach is admittedly subjective and limited to the author's experience, research and input from readers. Reader input is welcome and encouraged to increase the usefulness of this ad-free, vendor-free website.

The deep Web is the fastest-growing sector of the Web and it appears to be the “paradigm for the next generation Internet” (2005, Deep-Web FAQ, para. 35). It is therefore of key interest to many IT professionals. In fact, proper use of the deep Web can drastically reduce research time on a given project and yield higher-quality information.

At present, the Internet is functionally divided into two areas:

·     The surface Web contains about 1% of the information content of the Web. Search engines crawl the Web to extract and index text from HTML (HyperText Markup Language) documents on websites, then make this information searchable through keywords and directories.

·     The deep Web contains about 99% of the information content of the Web. Most of this information is held in databases and is not indexed by search engines, because of both technical and business obstacles. It can be searched by keyword only through the query engine on the specific website of each database.

As the Web evolves, the deep Web will become more easily accessible. At present, however, to access deep Web information you need to go directly to the website containing the database of interest and use that website’s query engine, which means you need to know the URL of the deep Web site. Considering that there are over 200,000 deep Web sites (Bergman, M., 2001, p. 1, para. 5), it is a challenge to know which sites to use for a given research topic. This presentation is intended to be a guide to this vast ocean from an IT perspective, with emphasis on administration of Windows networks.
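The difference between the two areas can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not how any real search engine or database site works: the example.org addresses, the `q` parameter name, and the helper names `index_page` and `build_query_url` are all hypothetical. The first part indexes the text of a static HTML page by keyword, as a surface crawler might; the second part builds the kind of URL that a deep Web site's own query form would request.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class TextExtractor(HTMLParser):
    """Collects the text found in a static HTML page, lowercased, word by word."""

    def __init__(self):
        super().__init__()
        self.words = []

    def handle_data(self, data):
        self.words.extend(data.lower().split())


def index_page(url, html, index):
    """Add each word on a static page to a keyword -> set-of-URLs index."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    for word in parser.words:
        index.setdefault(word, set()).add(url)
    return index


def build_query_url(search_page, terms):
    """Build the URL a site's query form would request (hypothetical 'q' field)."""
    return search_page + "?" + urlencode({"q": terms})


# Surface Web: the text is already in the HTML, so a crawler can index it.
index = index_page(
    "https://example.org/about.html",
    "<html><body><h1>Windows networks</h1><p>Admin guide</p></body></html>",
    {},
)

# Deep Web: the record exists only as the answer to a query, so the searcher
# must already know the site and submit the query to its own query engine.
query_url = build_query_url("https://example.org/search", "windows networks")
```

In this sketch the page at about.html becomes findable under each of its words, while nothing behind the search page is indexed at all; its content is reachable only by sending a query such as `query_url` to that site.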