
A brief history of the search engine


16 September 2021

When it came to finding hotels, holidays or old school friends, looking up new cookery recipes or finding quick solutions to computer problems, in pre-digital times we had to make a pilgrimage to the travel agency or phone around half of our circle of friends. Nowadays, quick answers are readily available on the Internet. The average person types three to four search queries daily into the input fields of Google, Bing and the like. In our short history of the search engine, we relate how the search services came into being and what evolutionary stages they have gone through.

“Knowledge is power”: This saying has been attributed to English philosopher Francis Bacon. What the founding father of the Enlightenment meant by this was that, until people have sufficient knowledge, it is impossible for them to really understand and make sense of the things of this world. However, the great treasures of human knowledge will do you precious little good if you don’t know how and where to find them. This is why the development of methods of categorisation is as old as the practice of collecting books in libraries. No-one would claim that analogue indexing systems and physical catalogues could even begin to exhaust all the texts, books and questions in the world. It is therefore no wonder that the idea of a database for all the collected knowledge in the world is almost as old as humanity itself.

In 1945, American engineer Vannevar Bush published an article in which he outlined the concept for a universal knowledge machine. The idea behind the “Memex” – or “Memory Extender” – was the storage of huge amounts of text in a small piece of office furniture and the mechanisation and massive acceleration of the search for information. In his article, Bush laid the theoretical foundations for today’s search engines. In practical terms his idea is limited because it depends on analogue microfilm which the viewer has to painstakingly comb through for terms and key words.

From A for Archie to V for Veronica

In the 1980s, knowledge became increasingly digitalised. Ever greater numbers of American universities were connecting their computers – forming the precursor to the Internet as we know it today. But the users were dependent on word of mouth to find out which interesting files might be found on which computer. This was something that Alan Emtage, Peter Deutsch and Bill Heelan from McGill University in Montreal were determined to change. Their idea was to create a central database in which all the data on the distributed computers could be found. In November 1990, they launched the world’s first search engine. This went by the name of “Archie” – derived from the word “archive”. Archie was capable of searching through files and folders in FTP directories, although reading continuous text was still beyond it. The search was also limited to the names of files and folders. Anyone who wanted to find something using Archie ideally needed to know part of the file name. Moreover, the search query was limited to eight characters – six characters shorter than the term “search command”.

One year after Archie, developers at the University of Minnesota came up with “Gopher” – a network protocol for the retrieval of documents on the Internet. In its structure, Gopher was similar to the principle of the World Wide Web which was first unveiled by physicist Tim Berners-Lee in 1989. In 1992, the first search engine for Gopher was created at the University of Nevada in Reno: “Veronica” could index the titles of files and directories on all Gopher servers. In other words, it generated a virtual directory in which all the terms were stored along with references to which file names contained these terms. What distinguished it so decisively from Archie was the fact that the titles could consist of whole sentences; in other words, they were no longer limited to file names. And it was now possible to limit the search with such Boolean operators as “and”, “or” and “not”. So, anyone who was interested in golf as a sport would be able to exclude other hits by adding “NOT car”.
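The effect of such Boolean operators can be illustrated with a toy sketch – this is an illustration of the principle only, not Veronica’s actual code, and the titles are invented:

```python
# Toy Boolean filtering over indexed titles: a title matches only if it
# contains every required term and none of the excluded terms.
titles = [
    "golf tournament results 1992",
    "vw golf car maintenance tips",
    "history of golf in scotland",
]

def matches(title, all_of=(), none_of=()):
    """Return True if the title contains every 'all_of' term and no 'none_of' term."""
    words = title.lower().split()
    return all(t in words for t in all_of) and not any(t in words for t in none_of)

# "golf NOT car" keeps the sport and drops the Volkswagen hit:
hits = [t for t in titles if matches(t, all_of=["golf"], none_of=["car"])]
print(hits)  # the two golf-as-a-sport titles remain
```

The “NOT car” exclusion removes the maintenance page while keeping both sports results – exactly the kind of narrowing Veronica’s operators made possible.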


Web crawlers, spiders, searchbots

In April 1993, the CERN institute placed the World Wide Web technology officially in the public domain, meaning that all and sundry could now use and develop this system free of charge. In June of the same year, the “World Wide Web Wanderer”, intended to measure the growth of the Internet, was created. The “Wanderer” was the first innovation to deploy a searchbot – an automated program which would soon go on to form the basis of virtually all search engines.

Searchbots are now known as web crawlers or spiders: These digital crawlers scurry from link to link and from one website to the next. They analyse their content and source code for particular terms and then create a search index, which is consulted whenever a search query comes in later. These automated information gatherers don’t just do this job once, but repeatedly. The search index is continually maintained and brought up to date: new websites are added, defunct ones deleted and changed sites updated.
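The crawler-and-index principle can be sketched in a few lines. This is a minimal illustration using a hypothetical in-memory “web” of three invented pages rather than real HTTP requests:

```python
from collections import defaultdict

# Hypothetical pages: URL -> (outgoing links, page text)
WEB = {
    "a.html": (["b.html"], "history of the search engine"),
    "b.html": (["a.html", "c.html"], "web crawlers follow links"),
    "c.html": ([], "crawlers build a search index"),
}

def crawl(start):
    """Follow links from page to page and build an inverted index:
    term -> set of URLs containing that term."""
    index = defaultdict(set)
    seen, frontier = set(), [start]
    while frontier:
        url = frontier.pop()
        if url in seen or url not in WEB:
            continue
        seen.add(url)
        links, text = WEB[url]
        for word in text.lower().split():
            index[word].add(url)   # record where each term occurs
        frontier.extend(links)     # queue newly discovered links
    return index

index = crawl("a.html")
print(sorted(index["crawlers"]))  # → ['b.html', 'c.html']
```

A later search query then only has to look up the pre-built index instead of scanning every page – which is what makes answering queries fast even when the crawl itself took days.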

These detective programs owe one of their names to the WebCrawler, the first public search engine with a full text index. This scoured complete documents for search terms and no longer restricted its activities to the titles. The WebCrawler, with a search index of over 4,000 Internet sites, went online on 20 April 1994. Seven months later it notched up its millionth search query. As search engine developer Brian Pinkerton later revealed, this was: “Nuclear Weapons Design and Research”.


Yahoo, Lycos and AltaVista

As the World Wide Web grew and grew, new search engines sprang up like mushrooms. Yahoo was developed at Stanford University, California, in 1994. David Filo and Jerry Yang relied to start with on a manually compiled catalogue and sorted the websites recorded in it into categories; Yahoo would only adopt the web crawler principle at a later date.

In April 1995, Lycos went live. The search engine with the solar eclipse in its logo used an algorithm which determined the frequency of the search terms in a document and evaluated the proximity of the words to one another. This boosted users’ chances of actually coming across relevant hits.

In December 1995, computer manufacturer DEC unleashed AltaVista onto the Internet. This search engine was initially intended to demonstrate the power of its own servers to all potential customers. And yet, AltaVista soon became one of the most popular search engines on the young WWW. This was not least due to an especially powerful crawler called “Scooter” which was able to index significantly more websites than most of its competitors.

In the following year, 1996, Larry Page and Sergey Brin launched the “BackRub” project at Stanford University. Their search engine was intended to deliver better results than all that had gone before by using a new method to determine the relevance of websites. On 4 September 1998, Page and Brin registered a new company under the name of Google – a play on “googol”, the mathematical term for the digit 1 followed by 100 zeroes. On 21 September 1999, their search engine officially exited its beta phase.

Google goes from also-ran to head of the pack

At this time, Lycos was the world’s most visited and most used website. Alongside websites, AltaVista could be used to look for and find images, videos and audio; maps and a translation tool were also on offer. Nearly every search engine operator offered free e-mail. The market appeared to be largely saturated. And yet, this Johnny-come-lately with the two “O”s in its name soon became extremely popular. This was because, whereas the top dogs would often plaster their home pages with news and colourful advertising banners, Google made a tidy and clearly organised impression on the viewer. Most importantly of all, the search engine would spit out extremely good results more quickly on average than its competitors.

The reason for this was the PageRank algorithm which Page and Brin developed together. This made sure that, alongside the content of a website, Google also factored its popularity into its search results rankings by determining the number and quality of links to a site. The thinking behind this was that the more often people placed links to a site, the more relevant it was to them – and to other users who were interested in similar content. Alongside the quantity of the links, their quality also played a role. A site would be given a better ranking if other important sites – which themselves ranked highly – linked to it. One link from a major news portal such as the New York Times would, for instance, have significantly greater weight in terms of relevance than a link from Heidishaekelhimmel.de. As the principle borrowed from science has it, anyone who is cited by well-known researchers must in turn have something important to say. Google’s new algorithm would spit out much less spam and fewer obscure hits than the algorithms of the heavyweights which dominated the field at the time.
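The core idea can be sketched as a simple iteration over a tiny, invented link graph – this is an illustration of the principle, not Google’s production algorithm, and the four site names are hypothetical:

```python
# PageRank sketched as power iteration: each page repeatedly passes a
# share of its rank along its outgoing links; a damping factor models
# users who jump to a random page instead of following a link.
def pagerank(links, damping=0.85, iterations=50):
    """links: page -> list of pages it links to. Returns page -> rank."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / len(pages) for p in pages}
        for page, outgoing in links.items():
            if not outgoing:
                continue
            share = damping * rank[page] / len(outgoing)
            for target in outgoing:   # a link passes on part of its rank
                new[target] += share
        rank = new
    return rank

# "news" is linked to by everyone, so it ends up with the highest rank:
graph = {
    "news": ["blog"],
    "blog": ["news"],
    "shop": ["news"],
    "forum": ["news", "blog"],
}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # → news
```

Note how the heavily linked “news” page comes out on top even though every page started with the same rank – the mechanism behind the New York Times example above.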

From the Internet into the Oxford dictionary

Just one year after its official launch, Google had established itself as the world’s biggest search engine, with over one billion web documents. It soon also became synonymous with the Internet search. In 2006, the verb “to google” was added to the Oxford English Dictionary. In the same year, an update steered Microsoft’s “MSN Search” into its beta phase. Where the company had previously taken its search results from rivals like AltaVista, it now sought to enhance its standing with its own search engine. MSN, the forerunner to Bing, had a market share in the US of 14 percent at the time. Google was at the same time answering some 43 percent of all search queries.

Google would go on to cement and build on its advantage in the coming years. One way it did this was by continuously improving and refining its search function. The year 2001 saw the advent of the “did you mean” function, which would offer results for the correct versions of incorrectly entered search terms. The search engine has also been able to display synonyms for search terms since 2002. The autocomplete function would follow in 2004, suggesting frequently entered queries to users as they typed. At the same time, Google would continue on its path of consistent development from simple searches for websites to a universal first port of call for any online content whatsoever.
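The autocomplete idea – matching what the user has typed so far against a log of frequent queries – can be sketched with toy data (the query log here is invented, and real systems also weight suggestions by popularity and context):

```python
import bisect

# Hypothetical query log, kept sorted so a binary search finds the
# range of entries sharing a prefix.
popular_queries = sorted([
    "weather tomorrow",
    "weather birmingham",
    "web crawler",
    "world cup 2014",
])

def autocomplete(prefix, queries=popular_queries, limit=3):
    """Return up to `limit` stored queries beginning with `prefix`."""
    start = bisect.bisect_left(queries, prefix)
    out = []
    for q in queries[start:]:
        if not q.startswith(prefix):
            break                  # sorted order: we are past the prefix range
        out.append(q)
        if len(out) == limit:
            break
    return out

print(autocomplete("wea"))  # → ['weather birmingham', 'weather tomorrow']
```

Because the log is sorted, all queries sharing a prefix sit next to each other, so suggestions can be found without scanning the whole list.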

Universal search engine

In 2001, Google launched its image search function. The occasion for this development is said to have been a photo of Jennifer Lopez in a revealing dress at the Grammy awards which led to a stratospheric spike in search queries, even though Google was still unable to deliver the photo to match the queries. By 2006, targeted searches for news, individual forums and price comparisons had been added to the product stable. In 2005, a desktop version of Google Maps was launched in the US, followed four years later by a fully-fledged navigation system. Today, Maps is the world’s most frequently used satnav app and also one of the world’s most popular apps overall. With free offers like the Gmail e-mail service, the Drive cloud storage system and collaborative tools like Google Docs, Google soon made itself indispensable in the lives of many users.

The free services were financed from advertising revenues. Google had paved the way to this development back in 2000. Just two years earlier, however, Page and Brin had offered a trenchant criticism of the commercial use of search engines in a paper written at Stanford. But they would go on to make a screeching U-turn: from that time on, Google started placing ads and began to connect advertising space to particular search terms, enabling ads to be displayed to specific target groups. Google’s business model – which became a major bone of contention – increasingly involved collecting user data to allow it to show its users personalised advertising that was as precisely tailored as possible.


Startpage, DuckDuckGo and data minimisation

As counterproposals to the big data gatherers, some alternative search engine projects emerged in the following years. “Startpage” arrived on the scene in 2006. Unlike Google, Bing and Yahoo, this search engine neither recorded IP addresses nor used cookies to identify users, as the consumer protection body Stiftung Warentest officially attested in 2019. Its good search results, meanwhile, were down to the algorithm developed by its big brother in Mountain View: the users’ search queries were forwarded anonymously to Google. Instead of storing, exploiting and marketing data, Startpage finances its activities with non-personalised advertising. “DuckDuckGo”, which was launched in 2008, also made its money with non-personalised advertising and dispensed with tracking and data storage. Unlike Startpage, the duck-themed search engine used a hybrid model. The search results were taken from different sources, including Wikipedia, big search engines like Bing and its own web crawler, which went by the name of DuckDuckBot.

On 11 January 2021, DuckDuckGo crossed the 100-million daily search query mark for the first time. By way of comparison, Google records 228 million – per hour. The giant search engine processes roughly 5.5 billion requests every day. The company claims that about 15 percent of these are first-time queries. In other words, people in different parts of the world ask questions that have never been put to Google in this form over 800 million times a day.


From search engine to answering machine

Most small search services currently source their results from Bing or Google. After all, the development and operation of a search engine requires colossal resources. Under the hood of former giants Yahoo and Lycos, it is now Bing that does the work. German search engine “Ecosia” also uses Microsoft technology. What also connects the bigger search engines with the smaller ones is the fact that nearly all of them increasingly deliver direct answers on the results pages. Anyone who wants to know what the weather is going to be like in Birmingham tomorrow, what the time is in Tokyo, when Angela Merkel’s birthday is or how to replace a bike inner tube now generally no longer needs to click on a link. Search engines have increasingly morphed into answering machines.


Google Trends as yardstick

Google Trends is seen as a yardstick of what is on people’s minds all over the world during particular periods. It will come as no surprise to learn that in 2020 “coronavirus” took the top spot, followed by the results of the US presidential election.

Google has been systematically recording search queries since 2004. Interest in the “weather” has continuously increased since then. In June 2021, questions about the weather overtook the interest in “sex” as a search term for the first time. The two terms are now running more or less neck-and-neck. The greatest interest in the weather is shown in Pakistan, South Africa, the UK, Australia and Ireland.

In July 2014, search queries related to “Germany” hit an all-time high. This may well be due to football. It was in this month at the World Cup in Brazil that the German team beat the hosts to claim the title.