Search Tools

Overview
This section discusses tools that make research more productive on both the surface Web and the deep Web.

Search Engines
There are numerous search engines for the surface Web. Which one should you use? Notess (2002) periodically compares the results returned by various search engines, and his finding is surprising: there is little overlap between them. For a thorough surface Web search, you need to use multiple search engines. According to Search Engine Watch (Sullivan, 2005), these are the major search engines:

About: useful summary articles
Ask Jeeves: high-relevancy searches; owns Teoma
Gigablast: small, but with a useful statistical display of results
Google: large crawler and directory
LookSmart: human-compiled; owns WiseNut
Teoma: Ask Jeeves crawler; high relevancy
Yahoo!: crawler, plus tabs for images, video, etc.


These derivatives of the above search engines use the engines indicated:

AllTheWeb: bought by Yahoo
AltaVista: Yahoo crawler and tabs
AOL Search: Google crawler
HotBot: Ask Jeeves crawler or Google crawler
Lycos: LookSmart directory, Yahoo crawler
MSN: Yahoo crawler (Microsoft crawler pending)
Netscape: Google crawler
WiseNut: owned by LookSmart

Directory Browsing
Directory browsing is another way of searching the surface Web. Directories are assembled by human beings who use editorial judgment to make their selections. To search directories, one clicks through a hierarchical set of hyperlinks. These are some of the major directories:

Metasearch Engines
Metasearch engines search several search engines simultaneously and combine the results. In theory this gives broader coverage. In practice, you lose precision, because some metasearch engines cannot pass Boolean operators through and much of each original engine's special syntax does not work (Schlein, 2004). These are popular metasearch engines:

Dogpile: rated best
Kartoo: visual output showing relations
Vivisimo: rated second best; organizes results
Mamma: crawlers, directories, specialty search sites

More metasearch engines can be found, with reviews, at the Search Engine Watch website. That website is also useful for finding various specialty search engines.

Copernic Agent
Copernic Agent is a tool the author has found useful. It comes in three versions: freeware, personal, and professional. It searches up to 90 search engines in 10 categories, then combines the results, eliminates duplicates and broken links, and prioritizes the output. It installs as a client on your computer and goes beyond what metasearch engines can do (Hock, 2004).
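The combine, deduplicate, and prioritize steps described above can be sketched in Python. This is a hypothetical illustration, not Copernic's actual (proprietary) method: it assumes each engine returns an ordered list of URLs, treats URLs that differ only in scheme, host capitalization, a leading "www.", or a trailing slash as duplicates, and ranks a result higher the more engines return it, breaking ties by its best position in any list. The engine names and URLs are made up, and broken-link checking is left out because it needs network access.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to a duplicate-detection key (assumed rules, see above)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def merge_results(results_by_engine: dict[str, list[str]]) -> list[str]:
    """Combine ranked result lists, drop duplicates, and prioritize the output."""
    merged: dict[str, dict] = {}
    for engine, urls in results_by_engine.items():
        for rank, url in enumerate(urls):
            # The first URL seen for a key is the one kept for display.
            entry = merged.setdefault(
                normalize(url), {"url": url, "engines": set(), "best_rank": rank}
            )
            entry["engines"].add(engine)
            entry["best_rank"] = min(entry["best_rank"], rank)
    # More engines agreeing ranks first; then the best position any engine gave it.
    ranked = sorted(merged.values(), key=lambda e: (-len(e["engines"]), e["best_rank"]))
    return [e["url"] for e in ranked]


results = {
    "engine_a": ["http://www.ntis.gov/", "http://example.org/virus"],
    "engine_b": ["https://ntis.gov", "http://example.com/db"],
}
print(merge_results(results))
# ['http://www.ntis.gov/', 'http://example.org/virus', 'http://example.com/db']
```

Counting agreement across engines is only one simple way to prioritize; a real tool might also weight engines differently or score relevance to the query.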

Specialized Search Engines

Specialized search engines search for databases by topic and help eliminate the “noise” associated with general search engines. "Case Study #2" in the "Case Studies" section of this presentation shows how to use one of these specialized search engines, and the "Data Mining" section lists many other specialized search engines that help find websites with databases. Recall that there are over 200,000 databases on the Web; these specialized search engines are a big help in finding the ones relevant to your research.

Deep Web Search Tools

If you do nothing else with the deep Web, learn how to use the three websites described below.

CompletePlanet™ uses a query-based engine to index 70,000+ deep Web databases and surface Web sites. Appendix A lists 60 of the largest deep Web databases, which contain 10% of the information in the deep Web, or 40 times the content of the entire surface Web. These 60 databases are included in CompletePlanet’s indexes. CompletePlanet is sponsored by BrightPlanet® Corporation, a leader in deep Web search. The interface is intuitive and easy to use. You can run a keyword search across all 70,000+ databases to find which ones to use for your search, or browse by category and then search the databases of interest.

ProFusion combines a query-based engine with a deep Web directory portal. The directory structure is accessed by clicking Specialized Searches. With an account, you can set up custom “My Search Groups” to search customized lists of websites and/or databases of your choice. For example, you could create a group called Technology, add all the databases and websites of interest to you, and save the group to your profile. At any future time you could then search this group for a research topic by keyword, which is a great time saver. ProFusion's query-based engine is called SmartDiscovery®.

SurfWax also uses a site's existing search capability as part of the metasearch process to tap the deep Web. It uses proprietary algorithms to interpret each site's search criteria (Boolean, etc.). With an account, you can also set up custom SearchSets to search customized lists of websites and/or databases of your choice. SurfWax also has a news accumulator feature with over 50,000 news topics in 84 categories; it is a godsend, providing high-quality results. Useful news accumulator categories include all topics, networking, technology, telecommunication, and web services. In addition, the site offers WikiWax, which takes the online encyclopedia Wikipedia to the next level: it performs advanced look-aheads on Wikipedia searches to speed your keyword choices.

Finding Deep Web Resources
In addition to other methods discussed in this presentation, Schlein (2004) shares several techniques to help the researcher find deep Web resources.

Pre-emptive search: to find deep Web databases, use a search engine or search a site containing both surface and deep Web content. For example, to find a database containing information on viruses use this search term (exact syntax may vary among search engines):

On Google or InfoMine search for:

virus (database OR repository OR archive)

Hock (2004) offers this additional method specific to the Teoma search engine:

On Teoma search for:

virus (resources OR meta site OR portal OR pathfinder)
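Both pre-emptive queries follow the same pattern: a topic word followed by a parenthesized OR-group of resource words. The Python sketch below builds such queries and turns one into a results-page link. The function names are hypothetical, and the URL form `https://www.google.com/search?q=...` is an assumption about how a Google results page is addressed; typing the query into the search bar works just as well.

```python
from urllib.parse import urlencode


def preemptive_query(topic: str, resource_words: list[str]) -> str:
    """Build a query like: virus (database OR repository OR archive)."""
    return f"{topic} ({' OR '.join(resource_words)})"


def results_url(query: str, base: str = "https://www.google.com/search") -> str:
    """Percent-encode the query into the q= parameter of a results URL (assumed form)."""
    return base + "?" + urlencode({"q": query})


general = preemptive_query("virus", ["database", "repository", "archive"])
teoma_style = preemptive_query("virus", ["resources", "meta site", "portal", "pathfinder"])
print(general)               # virus (database OR repository OR archive)
print(teoma_style)           # virus (resources OR meta site OR portal OR pathfinder)
print(results_url(general))  # https://www.google.com/search?q=virus+%28database+OR+repository+OR+archive%29
```

Swapping in a different topic word or resource list lets you reuse the same pattern for other subjects.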

Reverse-Link Searching: Find out which pages link to a database you already find useful and see if those sites have further recommendations. To do this, use the “link” operator in the search engine. For example, Google uses “link:yourURL.” If you want to find out what sites link to NTIS, type this in the Google search bar:

link:http://www.ntis.gov

Find Experts: When you do a search with Teoma, experts and enthusiasts for your keywords are listed to the right of the results column. Go to these sites and see what resources are recommended to help you “mine” for deep Web resources.

Search by document type: Search engines now index formerly “deep” files, such as PDF files. In Google, if you precede your search terms with "filetype:ext" (where “ext” is the file extension, e.g. pdf), only files of that type appear in the results. These are some example searches entered in the Google search bar:

filetype:pdf virus

returns PDF files with “virus” in the text

filetype:doc virus

returns Microsoft Word files with “virus” in the text

filetype:ppt virus

returns Microsoft PowerPoint files with “virus” in the text

filetype:jpg virus

returns jpg files with “virus” in the filename

More about Google: When you do a search, results are available not only in the window you are viewing but also in the associated windows under the topics listed at the top of the search page, namely Web, Images, Groups, News, Froogle, Local, etc. For example, if you search for the word “virus,” Web shows the websites found for virus, Images the graphics found for virus, Groups the discussion groups on virus, and so on. All of this is available without any extra effort beyond clicking each topic link in turn.

Calishain (2005) gives these tips on Boolean modifiers using Google:

AND   is the default when using several keywords and is not needed
+     preceding a keyword means it must be included in results
-     preceding a keyword means it must not be included in results
|     between words means OR
~     preceding a keyword means search that keyword and its synonyms
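The modifiers above can be combined mechanically. The following Python sketch (the function and its parameter names are hypothetical) assembles a query string from lists of keywords, and also supports the filetype: operator described earlier. It encodes the syntax as Calishain (2005) describes it; Google has since retired some of these operators, notably + and ~, so treat the output as an illustration of the 2005 conventions rather than guaranteed current syntax.

```python
from typing import Optional


def build_query(
    keywords: Optional[list[str]] = None,
    required: Optional[list[str]] = None,
    excluded: Optional[list[str]] = None,
    any_of: Optional[list[str]] = None,
    with_synonyms: Optional[list[str]] = None,
    filetype: Optional[str] = None,
) -> str:
    """Assemble a Google-style query string from the modifiers listed above."""
    parts: list[str] = list(keywords or [])           # plain words: AND is implied
    parts += [f"+{w}" for w in required or []]        # must be included
    parts += [f"-{w}" for w in excluded or []]        # must not be included
    if any_of:
        parts.append(" | ".join(any_of))              # | between words means OR
    parts += [f"~{w}" for w in with_synonyms or []]   # keyword plus its synonyms
    if filetype:
        parts.append(f"filetype:{filetype}")          # restrict to one file type
    return " ".join(parts)


print(build_query(keywords=["virus"], excluded=["computer"], any_of=["influenza", "avian"]))
# virus -computer influenza | avian
print(build_query(with_synonyms=["virus"], filetype="pdf"))
# ~virus filetype:pdf
```

Building queries this way keeps long searches consistent when you rerun them on different topics.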


In addition to Boolean modifiers, you can use a search engine's "Advanced Search" features, which each engine describes in its help section. Google's advanced search, for instance, lets you specify a date range, the file format, where keywords occur in results, language limitations, content filtering, and topic-specific searches (government sites, university sites, Microsoft sites, Linux sites, etc.). Google also has a built-in dictionary: if you type "define modifier" in the search bar, it will give you the dictionary definition of "modifier."

Go to the Google help section for many more features.

EndNote
EndNote is the standard tool used by millions of researchers for collecting, organizing, and formatting references. One great feature is that you can search deep Web databases from within EndNote and instantly import annotated references of your choice, a great time saver.

RSS Feeds
To get current news from your favorite websites delivered automatically to your desktop, set up an RSS feed (Rich Site Summary, RDF Site Summary, or Really Simple Syndication). To get started, you can use the highly rated Pluck RSS reader. It comes in a Web version and a client version. The Web version can be accessed from any computer but is much slower; the client version installs in your web browser and is fast. You can click the "Find Feed" button to select, or "pluck," which feeds to use from a large directory.
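An RSS feed is an XML file that a reader such as Pluck fetches periodically and displays. The Python sketch below shows the core of what a reader does with one: it parses a small, made-up RSS 2.0 document with the standard library's ElementTree and lists each item's title and link. It assumes the un-namespaced RSS 2.0 format; RSS 1.0 (RDF Site Summary) places items in an XML namespace and Atom uses different element names, so a real reader must handle those variants too. Fetching the feed over the network is omitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content for illustration only.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://example.com/</link>
    <description>Hypothetical feed</description>
    <item>
      <title>First headline</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        items.append((title, link))
    return items


for title, link in read_items(SAMPLE_FEED):
    print(f"{title}: {link}")
# First headline: http://example.com/1
# Second headline: http://example.com/2
```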

Numerous other RSS readers are available. Wikipedia lists the websites of 130 readers, and Earthweb gives the percentage market share of each of the most popular readers.

If you want to search for feeds rather than getting them from your favorite known websites or from the "Find Feed" button in Pluck, try these. Chordata lets you drill down a hierarchical directory structure to find quality-rated feeds. Feedster is a search engine for locating feeds by keyword.

LISTSERV
LISTSERV is software by L-Soft for managing electronic mailing lists and discussion groups. Use the hyperlink shown to search for mailing lists by keyword. There are numerous IT mailing lists among the roughly 62,000 lists in this database.

Newsgroups
There is a newsreader built into Outlook Express (not into Outlook 2003). The first step is essential: you MUST make Outlook Express the default news program through Internet Options; otherwise the News menu item will vanish from the Outlook 2003 Go menu (KB902929).

1) Open Control Panel > Internet Options > Programs tab and set Outlook Express as the default for Newsgroups.
2) Drag the News command to the Go menu of Outlook 2003 using Tools > Customize > Commands.
3) In Outlook Express, open Tools > Accounts > News tab > Add > News and enter your news (NNTP) server information obtained from your ISP.
4) It will then take a minute or so to find all the newsgroups.
5) Pick the newsgroups you want to join.
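Step 4 takes a while because the newsreader downloads the server's entire list of groups using the NNTP LIST command (RFC 3977); news servers conventionally listen on port 119. Each line of the reply gives a group name, the highest and lowest article numbers, and a posting status (y = posting allowed, n = not allowed, m = moderated). The Python sketch below parses such lines and filters them by keyword, roughly what you do by hand in step 5. The sample lines and article numbers are hypothetical, and the sketch neither contacts a server nor describes how Outlook Express works internally.

```python
from typing import NamedTuple


class Newsgroup(NamedTuple):
    name: str
    high: int    # highest article number on the server
    low: int     # lowest article number still available
    status: str  # "y" posting allowed, "n" not allowed, "m" moderated


def parse_active(lines: list[str]) -> list[Newsgroup]:
    """Parse 'name high low status' lines from an NNTP LIST (ACTIVE) reply."""
    groups = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        name, high, low, status = fields
        groups.append(Newsgroup(name, int(high), int(low), status))
    return groups


def pick_groups(groups: list[Newsgroup], keyword: str) -> list[str]:
    """Return names of groups whose name contains the keyword (case-insensitive)."""
    return [g.name for g in groups if keyword.lower() in g.name.lower()]


SAMPLE_REPLY = [
    "comp.virus 1200 1001 m",
    "comp.security.misc 5000 4000 y",
    "rec.arts.books 300 1 y",
]
groups = parse_active(SAMPLE_REPLY)
print(pick_groups(groups, "comp"))
# ['comp.virus', 'comp.security.misc']
```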

Primary Research and Reference Librarians
Answers to every question will not always be found online. Some questions can only be answered by talking with the right people (primary research). In his book, Sacks (2001) interviews 12 experts skilled in primary research and reveals their secrets. From these experts you can learn how to find the right people, how to conduct interviews, and how the relative value of each research method depends on the question.

Reference librarians at your local library or college can be of tremendous help. They are skilled in accessing information from a wide variety of resources including the deep Web.