What is a search engine?

How does a search engine work?

First of all, it is important to realize that search engines do not search the entire Internet, but only one part of it, the World Wide Web (WWW). Other Internet services include e-mail, file transfer (FTP), chat and Internet telephony. Even the WWW cannot be searched completely: in principle, search engines can only find websites that are publicly accessible. More information on the limitations of search engines can be found in the next chapter, "Completeness: Which pages does the search index list?".
Carrying out a search on the WWW is technically more complicated than it appears to the user at first glance. Essentially, three processes are involved in answering a search query:

1. Acquisition
Search engines constantly collect new and changed information on the World Wide Web. This is done by so-called crawlers (also called spiders or search robots). A crawler is a system of software and hardware that systematically and continuously traverses the Web by following links, collects the information available on the web pages and then saves it on servers.
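
The acquisition step can be illustrated with a minimal sketch. It assumes a tiny in-memory "web" (the dictionary TOY_WEB and all addresses in it are made up for this example) instead of real network requests, and it ignores everything a production crawler needs, such as politeness rules, re-crawling of changed pages and error handling.

```python
from collections import deque

# Hypothetical in-memory "web": address -> (page text, outgoing links).
TOY_WEB = {
    "a.example/": ("Berlin is the capital of Germany",
                   ["a.example/museums", "b.example/"]),
    "a.example/museums": ("Museums in Berlin", ["a.example/"]),
    "b.example/": ("Travel tips for Hamburg",
                   ["a.example/", "c.example/missing"]),
}


def crawl(seed_urls, fetch, max_pages=100):
    """Breadth-first crawl: fetch a page, store its text, queue unseen links."""
    store = {}                 # stands in for the servers that keep the pages
    queue = deque(seed_urls)   # addresses still to be visited
    seen = set(seed_urls)      # addresses already queued, to avoid loops
    while queue and len(store) < max_pages:
        url = queue.popleft()
        page = fetch(url)
        if page is None:       # page is not publicly accessible
            continue
        text, links = page
        store[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return store


# TOY_WEB.get returns None for unknown addresses, like a failed request.
pages = crawl(["a.example/"], TOY_WEB.get)
```

Starting from a single seed address, the sketch reaches every page that is linked and accessible, and silently skips the one that is not.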

2. Information processing and indexing
The collected data is then processed so that an index can be built efficiently. The index is the core of every web search engine.
It is comparable to the keyword index at the back of a book. The search engine index contains the index terms on the one hand and, on the other, a record of the web pages on which each term appears. The search index is a virtual directory that contains billions of terms and references to websites.
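
A structure of this kind is commonly called an inverted index. The following sketch shows the idea on three made-up pages; the function names tokenize and build_index are chosen for this example, and real indexes store far more than a set of addresses per term.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase terms (a deliberately simple rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map every term to the set of addresses on which it appears."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


# Made-up pages, as a crawler might have stored them.
pages = {
    "a.example/": "Berlin is the capital of Germany",
    "a.example/museums": "Museums in Berlin",
    "b.example/": "Travel tips for Hamburg",
}
index = build_index(pages)
```

Looking up index["berlin"] now yields both pages that mention Berlin without reading any page text again.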

3. Provision of information
When a search query is answered, it is not the Web itself that is searched but only this index, which can be searched particularly quickly and efficiently. Finally, the most relevant results for the search query are displayed in a list.
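
Why the index is so much faster can be seen in a small sketch: answering a query only requires looking up each term and intersecting the stored address sets. The index contents and the function name search are assumptions for this example, and requiring all terms to match is only one possible query rule.

```python
# Hypothetical, already built index: term -> addresses containing the term.
INDEX = {
    "berlin": {"a.example/", "a.example/museums"},
    "museums": {"a.example/museums"},
    "hamburg": {"b.example/"},
}


def search(index, query):
    """Return the addresses that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # One dictionary lookup per term; no page text is read at query time.
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


hits = search(INDEX, "Berlin museums")
```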

Five steps to the result

For the user, it does not really matter what happens behind the scenes, but knowing how a search actually works helps in understanding search engines.

Step 1: Enter the search query

This step is visible to the user.
The search begins by entering a search term, e.g. [Berlin], in the search field.

Step 2: Finding the right data center

This step is invisible to the user.
Search engine operators maintain data centers around the world, with servers on which the index is stored. As soon as a query is sent, one or more data centers are selected in the background. Proximity to the user, speed and current utilization all play a decisive role in choosing the right data center.
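
How such a choice might be made can be sketched as a simple cost function over the three factors named above. Everything here is an assumption for illustration: the data center names, the numbers and the weights are invented, and real operators do not publish their selection logic.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    name: str
    distance_km: float   # proximity to the user
    latency_ms: float    # speed of the connection
    utilization: float   # current load, from 0.0 (idle) to 1.0 (full)


def cost(dc, w_distance=0.001, w_latency=0.01, w_load=1.0):
    """Lower is better. The weights are arbitrary illustration values."""
    return (w_distance * dc.distance_km
            + w_latency * dc.latency_ms
            + w_load * dc.utilization)


def choose(data_centers, k=1):
    """Pick the k data centers with the lowest cost."""
    return sorted(data_centers, key=cost)[:k]


candidates = [
    DataCenter("near_busy", distance_km=400, latency_ms=12, utilization=0.95),
    DataCenter("mid_idle", distance_km=600, latency_ms=15, utilization=0.40),
    DataCenter("far_idle", distance_km=1300, latency_ms=25, utilization=0.30),
]
best = choose(candidates)[0]
```

With these numbers the nearest data center is not chosen, because it is almost fully utilized; the slightly more distant but less busy one has the lowest cost.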

Step 3: The search through the index

This step is invisible to the user.
In large search engines, the index is built specifically for parallel querying: it is split across many servers, so each server only has to carry out part of the search, and together they can search the entire index more quickly. Strictly speaking, a search engine does not look for a specific word, but for letter patterns.
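
The parallel search can be sketched as follows. The example assumes that each part of the index (here called a shard, a term not used in the original) covers a different subset of pages, and it uses threads in one process as a stand-in for separate servers; the shard contents are made up.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each holds the index for a different subset of pages.
SHARDS = [
    {"berlin": {"a.example/"}, "museums": {"a.example/museums"}},
    {"berlin": {"c.example/"}, "hamburg": {"b.example/"}},
    {"travel": {"b.example/"}},
]


def search_shard(shard, term):
    """The part of the search that a single server carries out."""
    return shard.get(term, set())


def parallel_search(shards, term):
    """Query all shards at the same time and merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partial_results = pool.map(lambda s: search_shard(s, term), shards)
        return set().union(*partial_results)


hits = parallel_search(SHARDS, "berlin")
```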
Because the search matches letter patterns rather than meaning, a further check is carried out once the search has run and a first result list has been compiled. Perhaps the user made a typing mistake and wanted to look for something else? Could the word also be written as two separate words? The search engine tries to recognize such cases and, if necessary, offers a better search suggestion. If you enter [berln], for example, [Berlin] is suggested.
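
A very reduced version of such a suggestion can be built with the get_close_matches function from Python's standard difflib module, which compares letter sequences. The vocabulary below is made up, and this is only a sketch of the principle: how real search engines generate suggestions is not described here, and they draw on far more data than a word list.

```python
from difflib import get_close_matches

# Hypothetical vocabulary, e.g. the terms contained in the index.
VOCABULARY = ["berlin", "hamburg", "museums", "travel"]


def suggest(query, vocabulary):
    """Return a similar known term, or None if no suggestion is needed."""
    term = query.lower()
    if term in vocabulary:
        return None            # spelled as a known term: nothing to suggest
    matches = get_close_matches(term, vocabulary, n=1, cutoff=0.6)
    return matches[0] if matches else None


suggestion = suggest("berln", VOCABULARY)
```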

Step 4: The search for the right document

This step is invisible to the user.
The index servers contain only words and addresses. The addresses refer to the so-called doc servers, which hold the title, text excerpts and other data from the documents that were stored in the information processing phase (see above). This information is used on the one hand for the output (e.g. for the short text excerpts shown under the addresses, called snippets); on the other hand, it serves as input for ranking, for example with the PageRank algorithm (see "How does a ranking come about?").
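
The division of labour between index servers and doc servers can be sketched like this: the index only supplies an address, and the stored document data is then used to build the snippet. The dictionary DOC_SERVER, the function names and the cutting rule (a fixed number of characters around the first match) are assumptions for this example.

```python
# Hypothetical doc server: address -> data stored during processing.
DOC_SERVER = {
    "a.example/": {
        "title": "Berlin guide",
        "text": "Berlin is the capital of Germany and its largest city.",
    },
}


def make_snippet(text, term, radius=20):
    """Cut a short excerpt around the first occurrence of the term."""
    pos = text.lower().find(term.lower())
    if pos == -1:
        return text[: 2 * radius]     # term not found: use the beginning
    start = max(0, pos - radius)
    end = min(len(text), pos + len(term) + radius)
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end] + suffix


def render_hit(url, term):
    """Combine title, address and snippet into one result entry."""
    doc = DOC_SERVER[url]
    return f"{doc['title']}\n{url}\n{make_snippet(doc['text'], term)}"


entry = render_hit("a.example/", "capital")
```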
One measure to speed up the search behind the scenes despite this complex procedure is to cache the results of frequent search queries such as [Berlin] temporarily: instead of searching the index anew every time, the search engine returns search results that have already been "finished".

Step 5: Output of the search results

This step is visible to the user.
In the last step, the search results must be displayed in a user-friendly manner.
After all, the most important web pages for the entered search term should be recognizable at first glance. Users generally click on one of the first ten search results; results that appear further down the list (i.e. on subsequent pages) are clicked less often. The order of the hits (the "ranking") is determined by an algorithm.
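
Ordering the hits and splitting them into pages can be sketched as follows. Note the assumption: the sketch ranks by how often the term occurs on a page, which is only a simple stand-in and not the algorithm of any real search engine. The pages and function names are invented for this example.

```python
def rank(pages, term):
    """Order matching addresses by term count (a stand-in ranking rule)."""
    term = term.lower()
    scored = []
    for url, text in pages.items():
        count = text.lower().split().count(term)
        if count > 0:
            scored.append((count, url))
    # Highest count first; ties are broken alphabetically by address.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [url for _, url in scored]


def page_of_results(ranked, page=1, per_page=10):
    """Return the slice of the ranking shown on one result page."""
    start = (page - 1) * per_page
    return ranked[start:start + per_page]


# Made-up pages for the query [Berlin].
pages = {
    "a.example/": "berlin berlin berlin guide",
    "b.example/": "berlin hotels",
    "c.example/": "hamburg harbour",
    "d.example/": "berlin berlin map",
}
ranked = rank(pages, "Berlin")
first_page = page_of_results(ranked, page=1, per_page=2)
```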