The Internet and the Web are not synonymous: the Internet was born in 1970, while the Web began in 1990. The Web is one of many interfaces to the Internet. Some other interfaces are e-mail, FTP (File Transfer Protocol), telnet, newsgroups, file sharing, and databases. The Web is the graphical interface that has spurred the tremendous growth of the Internet. A very detailed timeline of Internet history can be viewed at Hobbes' Internet Timeline. The Internet Society has a PowerPoint presentation of Internet history showing photos and short biographies of the inspired thinkers who helped create the Internet. Below, a quick walk down memory lane for the Internet will help put Internet searching in perspective; dates in orange indicate key transition periods (Hock, 2004, pp. 3-6; LivingInternet, 2005, Internet History; Sherman & Price, 2001, pp. 2-13):
· 1844 The first telegraphic message was sent from near Baltimore to Washington - a distance of ~40 miles (About, n.d.).
· 1861 Western Union built its first transcontinental telegraph line (About, n.d.).
· 1895 Henri La Fontaine and Paul Otlet began developing the Universal Decimal Classification, which aimed to go one step beyond the Dewey Decimal System; Dewey guides readers to a book but no further. The next step was to penetrate the boundaries of the books themselves, to unearth the substance, sources and conclusions inside. In that sense it was the first search engine (Wright, 2003, para. 13).
· 1957 The Sputnik satellite was launched by the Russians.
· 1958 As a result of Sputnik, Americans felt they were losing the space race, and the United States created ARPA (Advanced Research Projects Agency) to catch up with and surpass the Russians.
· 1962 J.C.R. Licklider wrote a paper envisioning a global connection of computers.
· 1966 Inspired by Licklider, Larry Roberts submitted a proposal to link computers.
· Pre-1969 Computers were stand-alone machines or terminals on a mainframe.
· 1969 Larry Roberts's proposal led to the installation of the first node of the new computer network at UCLA, founding ARPAnet (ARPA Network of the U.S. Department of Defense).
· 1970s Universities and defense contractors began connecting to ARPAnet.
· 1971 Fifteen universities were now connected to ARPAnet.
· 1972-74 Commercial information databases like Dialog and Lexis went online with their dial-up services.
· 1973 DARPA (Defense Advanced Research Projects Agency, the renamed ARPA) began a research program on communicating across linked networks. ARPAnet was just one network, whereas the new goal was a network of networks.
· 1979 CSnet (Computer Science Network), funded by the NSF (U.S. National Science Foundation), was created to link universities that were not part of ARPAnet.
· 1983 TCP/IP (Transmission Control Protocol / Internet Protocol) replaced NCP (Network Control Program) on ARPAnet.
· 1984 NSF started construction of five regional supercomputing centers.
· 1986 LISTSERV mailing list management software was written by Eric Thomas, who went on to found the L-Soft company in 1994.
· Pre-1990 Accessing a file required a Telnet connection to a known location, then FTP to fetch the file.
· 1990 Tim Berners-Lee, a contract programmer at the European Organization for Nuclear Research (CERN) high-energy physics laboratory in Geneva, Switzerland, created the tools that became the Web: a web client he called WorldWideWeb, HTML, and URLs (Uniform Resource Locators).
· 1990 ARPAnet was retired and absorbed into NSFnet. NSFnet was soon connected to CSnet, and then to EUnet (European Network), which connected research facilities in Europe.
· 1990 Archie was created, the first true search tool for files stored on FTP servers on the Internet.
· 1991 Gopher was created, the first browsable directory of files on the Internet.
· 1991 WAIS (Wide Area Information Servers) was created, a client on your computer that allowed you to search the Internet.
· 1992 Veronica was created, a centralized Archie-like search tool for Gopher files.
· 1993 Legislation was passed allowing commercial access to NSFnet. In the period 1993-2000 the Clinton-Gore administration championed use of the Internet, creating E-rate ($6 billion) to fund Internet access for public schools and libraries, creating the 21st Century Research Fund ($45 billion) to fund civilian scientific research, persuading the WTO (World Trade Organization) to allow duty-free Internet use, and overhauling the Communications Act of 1934 to allow competition (Encyclopædia Britannica, 2005; State Science & Technology Institute, 2002; US Newswire, 2000).
· 1993 Jughead was created, adding keyword search and Boolean operator capabilities to Gopher search.
· 1993 The Mosaic web browser was released by Marc Andreessen and Eric Bina.
· 1994 The Netscape web browser was released.
· 1994 Web traffic on the Internet exceeded Telnet traffic for the first time.
· Pre-1994 People informed each other through e-mails about cool sites they found. There was no way to search directly.
· 1994 The first Web search engine, WebCrawler, was created by Brian Pinkerton. It was a software robot that collected the full text of web pages and stored it in a database that could be searched using keywords. As other robots were developed, they became known as crawlers or spiders, programs that search the Web for websites.
· 1994 In addition to WebCrawler, the EINet Galaxy, Lycos and Yahoo! search engines were created.
· 1995 The Alta Vista, Excite, and InfoSEEK search engines were created. The MetaCrawler and SavvySearch metasearch engines were also created; metasearch engines search several search engines simultaneously.
· 1995 The number of web packets exceeded the number of FTP packets over NSFnet.
· 1995 The Internet Explorer web browser was released by Microsoft.
· 1995 The U.S. National Science Foundation transferred funds and control of the Internet backbone to the private sector. This event and the advent of web browsers fueled the dot.com explosion of the late 1990s.
· 1996 HotBot and LookSmart search engines were created.
· 1997 NorthernLight search engine was created.
· 1998 Google search engine and InvisibleWeb.com were created.
It is interesting that Tim Berners-Lee's friends at CERN gave him a difficult time, saying his WorldWideWeb idea would never take off (Sherman & Price, 2001, p. 11). For Berners-Lee the easy part was programming the tools; the hard part was convincing others to use the system. His tireless efforts to persuade others paid off, however, and the Web grew at an enormous rate, as shown in Figure 1 (Internet Systems Consortium, 2005).
Search engines have been with us since 1994 and are a great improvement over their predecessors; however, until recently they could find and index only web documents. One needs to use other methods to access a large share of the deep Web. Surface Web search engines have recently evolved to index PDF files and some of the dynamically generated content of the deep Web as well; as yet, however, they search only a tiny portion of the deep Web.
One technical obstacle is the "spider trap." Through inadvertent or malicious programming, some query engines can capture spiders in endless loops, wasting the resources of the search engine. Most search engines intentionally avoid query engines for this reason. A business obstacle is the time and money required to crawl the Web. Even "just" the surface Web is not crawled completely, because larger websites are not crawled to their full depth. Each search engine makes a business decision on how deeply it is willing to crawl each surface website in order to conserve time and money.
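The two obstacles above can be illustrated with a minimal sketch in Python. It is not how any particular search engine works; the tiny in-memory "website", the function names fake_links and crawl, and the max_depth and max_pages parameters are all assumptions made up for this example. The dynamically generated "/cal?page=N" pages stand in for a query engine that always offers one more new link, which is the essence of a spider trap. A set of visited URLs stops simple loops, while the depth limit and page budget cap how much time is spent on any one site.

```python
from collections import deque


def fake_links(url):
    """Hypothetical stand-in for fetching a page and extracting its links."""
    static_site = {
        "/": ["/about", "/cal?page=1"],
        "/about": ["/"],  # links back to the home page: a simple loop
    }
    if url.startswith("/cal?page="):
        # A dynamically generated page that always links to a "next" page.
        # Every URL is new, so this chain never ends on its own.
        page = int(url.split("=")[1])
        return [f"/cal?page={page + 1}"]
    return static_site.get(url, [])


def crawl(start, get_links, max_depth, max_pages):
    """Breadth-first crawl bounded by link depth and a total page budget."""
    visited = set()   # URLs already fetched; prevents revisiting in loops
    order = []        # pages in the order they were crawled
    queue = deque([(start, 0)])
    while queue and len(order) < max_pages:
        url, depth = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        order.append(url)
        if depth == max_depth:
            continue  # depth limit reached: do not follow this page's links
        for link in get_links(url):
            if link not in visited:
                queue.append((link, depth + 1))
    return order


pages = crawl("/", fake_links, max_depth=3, max_pages=50)
print(pages)
```

With max_depth=3 the crawl stops three links away from the start page, so the endless calendar chain is cut off early. Raising max_depth makes the crawl more complete but also more expensive, which mirrors the business decision described above; max_pages acts as a second safety net if the depth limit is set generously.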