Internet History

The Internet and the Web are not synonymous: the Internet was born in 1969, while the Web began in 1990. The Web is one of many interfaces to the Internet. Some other interfaces are e-mail, FTP (File Transfer Protocol), telnet, newsgroups, file sharing, and databases. The Web is the graphical interface that has spurred the tremendous growth of the Internet. A very detailed timeline of Internet history can be viewed at Hobbes' Internet Timeline. The Internet Society has a PowerPoint presentation of Internet history showing photos and short biographies of the inspired thinkers who helped create the Internet. Below, a quick walk down memory lane for the Internet will help put Internet searching in perspective – dates in orange indicate key transition periods (Hock, 2004, pp. 3-6; LivingInternet, 2005, Internet History; Sherman & Price, 2001, pp. 2-13):

·   1844                 The first telegraphic message was sent from near Baltimore to Washington - a distance of ~40 miles (About, n.d.).

·   1861                 Western Union built its first transcontinental telegraph line (About, n.d.).

·   1895                 Henri La Fontaine and Paul Otlet began development of the Universal Decimal Classification, which aimed to go one step beyond the Dewey Decimal System, which guides readers to a book but no further. The next step was to “penetrate the boundaries of the books themselves, to unearth the ‘substance, sources and conclusions’ inside.” Hence, the first “search engine” (Wright, 2003, para. 13).

·   1957                 The Sputnik satellite was launched by the Russians.

·   1958                 As a result of Sputnik, Americans felt they were losing the space race and created ARPA (Advanced Research Projects Agency) to catch up with and surpass the Russians.

·   1962                 J.C.R. Licklider wrote a paper envisioning a global connection of computers.

·   1966                 Inspired by Licklider, Larry Roberts submitted a proposal to link computers.

·   Pre-1969         Computers were stand-alone machines or terminals on a mainframe.

·   1969                 Larry Roberts's proposal led to the installation of the first node of the new computer network at UCLA, founding ARPAnet (ARPA Network of the U.S. Department of Defense).

·   1970s               Universities and defense contractors began connecting to ARPAnet.

·   1971                 Fifteen universities were now connected to ARPAnet.

·   1972-74           Commercial information databases like Dialog and Lexis went online with their dial-up services.

·   1973                 DARPA (Defense Advanced Research Projects Agency, as ARPA was by then known) initiated its Internetting project to communicate across linked networks. ARPAnet was just one network, whereas the new project aimed at a network of networks.

·   1979                 CSnet (Computer Science Network) was created – funded by the NSF (U.S. National Science Foundation) - to link universities not a part of ARPAnet.

·   1983                 TCP/IP (Transmission Control Protocol / Internet Protocol) replaced NCP (Network Control Program) on ARPAnet.

·   1984                 NSF started construction of five regional supercomputing centers.

·   1986                 LISTSERV mailing list management software was written by Eric Thomas, who founded the L-Soft company in 1994.

·   Pre-1990         Accessing a file required a Telnet connection to a known location, then FTP to fetch the file.

·   1990                 Tim Berners-Lee, a contract programmer at the European Organization for Nuclear Research (CERN) high-energy physics laboratory in Geneva, Switzerland, created the tools that became the Web: a web client he called WorldWideWeb, HTML, and URLs (Uniform Resource Locators).

·   1990                 ARPAnet was retired and absorbed into NSFnet. NSFnet was soon connected to CSnet, and then to EUnet (European Network), which connected research facilities in Europe.

·   1990                 Archie was created – the first true search tool for files stored on FTP servers on the Internet.

·   1991                 Gopher was created – the first browsable directory of files on the Internet.

·   1991                 WAIS was created – a client on your computer that allowed you to search the Internet.

·   1992                 Veronica was created – a centralized Archie-like search tool for Gopher files.

·   1993                 Legislation was passed allowing commercial access to NSFnet. In the period 1993-2000 the Clinton-Gore administration championed use of the Internet, creating E-rate ($6 billion) to fund Internet access for public schools and libraries, creating the 21st Century Research Fund ($45 billion) to fund civilian scientific research, persuading the WTO (World Trade Organization) to allow duty-free Internet use, and overhauling the Communications Act of 1934 to allow competition (Encyclopædia Britannica, 2005; State Science & Technology Institute, 2002; US Newswire, 2000).

·   1993                 Jughead was created – adding keyword search and Boolean operator capabilities to Gopher search.

·   1993                 The Mosaic web browser was released by Marc Andreessen and Eric Bina.

·   1994                 The Netscape web browser was released.

·   1994                 Web traffic on the Internet exceeded Telnet traffic for the first time.

·   Pre-1994         People informed each other through e-mails about “cool sites” they found. There was no way to search directly.

·   1994                 The first Web search engine, WebCrawler, was created by Brian Pinkerton. It was a software robot that collected the full text of web pages and stored them in a database that could be searched using keywords. As other robots were developed, they became known as “crawlers” or “spiders” searching the Web for websites.

·   1994                 In addition to WebCrawler, the EINet Galaxy, Lycos, and Yahoo! search engines were created.

·   1995                 The Alta Vista, Excite, and InfoSEEK search engines were created. The MetaCrawler and SavvySearch metasearch engines were also created; a metasearch engine searches several search engines simultaneously.

·   1995                 The number of web packets exceeded the number of FTP packets over NSFnet.

·   1995                 The Internet Explorer web browser was released by Microsoft.

·   1995                 The U.S. National Science Foundation transferred funds and control of the Internet backbone to the private sector. This event and the advent of web browsers fueled the explosive growth of the late 1990s.

·   1996                 The HotBot and LookSmart search engines were created.

·   1997                 The NorthernLight search engine was created.

·   1998                 The Google search engine was created.

It is interesting that Tim Berners-Lee’s friends at CERN gave him a difficult time, saying his WorldWideWeb idea would never take off (Sherman & Price, 2001, p. 11). For Berners-Lee the easy part was programming the tools; the hard part was convincing others to use the system. His tireless efforts to persuade others paid off, however, and the Web grew at an enormous rate, as shown in Figure 1 (Internet Systems Consortium, 2005).

Search engines have been with us since 1994 and are a great improvement over their predecessors; however, until recently they were limited to finding and indexing static web documents. One needs to use other methods to access a large share of the deep Web. Surface Web search engines have recently evolved so that they can index PDF files and some of the dynamically generated content of the deep Web as well; however, as yet they search only a tiny portion of the deep Web.

One technical obstacle is the "spider trap." Through inadvertent or malicious programming, some query engines can capture spiders in endless loops, wasting the resources of the search engine. Most search engines intentionally avoid query engines for this reason. A business obstacle is the time and money required to crawl the Web. Even surface websites are not always crawled to their full depth when they are large. To conserve time and money, each search engine makes a business decision on how deeply it is willing to crawl each surface website.
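The two obstacles can be illustrated with a small sketch. The Python below is not any real search engine's crawler; it is a minimal example, assuming a hypothetical in-memory "website" (`SITE`) and a hypothetical `get_links` function whose calendar page always links to a "next" page, the way an endlessly generated query result might. The crawler remembers every URL it has seen, stops following links past `max_depth`, and stops altogether after `max_pages`, which is one simple way to cap the time spent on any one site.

```python
from collections import deque

def crawl(start, get_links, max_depth=3, max_pages=100):
    """Breadth-first crawl that never revisits a URL and stops at max_depth."""
    visited = {start}             # URLs already queued, so loops are not re-entered
    queue = deque([(start, 0)])   # (url, depth) pairs waiting to be fetched
    order = []                    # pages actually fetched, in crawl order
    while queue and len(order) < max_pages:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue              # depth budget spent: do not follow links further
        for link in get_links(url):
            if link not in visited:
                visited.add(link)
                queue.append((link, depth + 1))
    return order

# Hypothetical site used only for illustration.
SITE = {
    "home": ["about", "calendar?page=1"],
    "about": ["home"],
}

def get_links(url):
    # Hypothetical trap: every calendar page links to a brand-new "next" page,
    # so a crawler without a depth or page limit would never finish.
    if url.startswith("calendar?page="):
        n = int(url.split("=")[1])
        return [f"calendar?page={n + 1}"]
    return SITE.get(url, [])

pages = crawl("home", get_links, max_depth=3)
print(pages)
```

With `max_depth=3` the crawl fetches `home`, `about`, and the first three calendar pages, then stops. The visited set alone would not help here, because every trap URL is new; it is the depth and page limits that end the crawl, and they are also the reason a large legitimate site may be only partly indexed.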