At its simplest, a search engine is a software program that finds sites based on the words you designate as search terms. Search engines look through their own databases of information to find what you are looking for.
In practice, search engines are not simple. They rely on incredibly detailed processes and methodologies, and they are updated all the time. What follows is a bare-bones look at how search engines work to retrieve your search results. All search engines follow this basic process when conducting a search, but because search engines differ, you are bound to get different results depending on which engine you use.
- The searcher types a query into a search engine.
- Search engine software quickly sorts through millions of pages in its database to find matches to this query.
- The search engine's results are ranked in order of relevance.
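To make the second step concrete, here is a minimal sketch, assuming the engine's database takes the common form of an inverted index: a map from each word to the set of pages that contain it. The index contents and page IDs below are invented for illustration.

```python
# Word -> set of page IDs containing that word. Invented data.
inverted_index = {
    "search": {1, 2, 4},
    "engine": {1, 4},
    "spider": {2, 3},
}

def find_matches(query: str) -> set[int]:
    """Return the IDs of pages containing every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start with the pages matching the first word, then intersect
    # with the pages matching each remaining word.
    results = inverted_index.get(words[0], set())
    for word in words[1:]:
        results = results & inverted_index.get(word, set())
    return results

print(find_matches("search engine"))  # {1, 4}
```

The point of the index is that the engine never scans millions of pages at query time; it jumps straight to the pages known to contain each word.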
How Does a Search Work?
The words or combinations of words that you enter in the search box are compared with the information in the search engine's database. The search function tries to match your input against the content of that information. The documents that are found are sorted by a number of algorithms, relevance chief among them, and are presented in your browser. The most relevant document is shown first, followed by other, less relevant documents.
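As a rough illustration of that sorting step, the sketch below scores each document by how often the query words appear in it and presents the highest-scoring document first. Real engines weigh many more signals; the documents and the scoring rule here are invented.

```python
# Invented documents standing in for a search engine's database.
documents = {
    "page-a": "search engines index the web so a search is fast",
    "page-b": "a spider crawls the web",
    "page-c": "search engines rank pages by relevance",
}

def rank(query: str) -> list[tuple[str, int]]:
    """Return (document, score) pairs, most relevant first."""
    words = query.lower().split()
    scores = {}
    for doc_id, text in documents.items():
        tokens = text.lower().split()
        # Toy relevance: how often the query words occur in the text.
        score = sum(tokens.count(w) for w in words)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

print(rank("search engines"))
# [('page-a', 3), ('page-c', 2)]
```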
The term "search engine" is often used generically
to describe both crawler-based search engines and human-powered directories.
These two types of search engines gather their listings in radically different
ways.
Crawler-Based Search Engines
Crawler-based search engines, such as Google, create their
listings automatically. They "crawl" or "spider" the web,
then people search through what they have found.
If you change your web pages, crawler-based search engines eventually
find these changes, and that can affect how you are listed. Page titles, body
copy and other elements all play a role.
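As a hypothetical illustration of why those elements matter, the sketch below ranks a page with a query word in its title above a page that only mentions it in the body copy. The pages and the weights are made up; real engines use far subtler signals.

```python
# Invented pages and weights, purely to show element-based scoring.
pages = [
    {"url": "a.example", "title": "Search engine basics",
     "body": "how crawlers work"},
    {"url": "b.example", "title": "Crawler diary",
     "body": "a search engine spider visits pages"},
]

TITLE_WEIGHT, BODY_WEIGHT = 3, 1  # arbitrary illustrative weights

def score(page: dict, query: str) -> int:
    """Score a page, counting title matches more than body matches."""
    words = query.lower().split()
    title = page["title"].lower().split()
    body = page["body"].lower().split()
    return sum(TITLE_WEIGHT * title.count(w) + BODY_WEIGHT * body.count(w)
               for w in words)

for page in sorted(pages, key=lambda p: score(p, "search engine"),
                   reverse=True):
    print(page["url"], score(page, "search engine"))
# a.example 6   (both words in the title)
# b.example 2   (both words only in the body)
```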
Human-Powered Directories
A human-powered directory, such as the Open Directory,
depends on humans for its listings. You submit a short description to the
directory for your entire site, or editors write one for sites they review. A
search looks for matches only in the descriptions submitted.
Changing your web pages has no effect on your listing. Things
that are useful for improving a listing with a search engine have nothing to do
with improving a listing in a directory. The only exception is that a good
site, with good content, might be more likely to get reviewed for free than a
poor site.
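A minimal sketch of that difference, assuming each site has exactly one stored description and nothing else is ever searched (the directory entries below are invented):

```python
# Site -> its one human-written description. Invented entries.
directory = {
    "example.com": "Tutorials on how search engines work",
    "othersite.org": "A photo gallery of mountain hikes",
}

def directory_search(query: str) -> list[str]:
    """Return sites whose description contains every query word."""
    words = query.lower().split()
    return [site for site, description in directory.items()
            if all(w in description.lower().split() for w in words)]

print(directory_search("search engines"))  # ['example.com']
# Editing the pages on example.com changes nothing here;
# only the stored description is ever searched.
```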
"Hybrid Search Engines" Or Mixed Results
In the web's early days, a search engine typically presented either crawler-based results or human-powered listings. Today, it is extremely common for both types of results to be presented. Usually, a hybrid search engine will favor one type of listing over the other. For example, MSN Search is more likely to present human-powered listings from LookSmart.
However, it does also present crawler-based results (as provided by Inktomi),
especially for more obscure queries.
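One way to picture that mixing, as a sketch rather than any engine's actual logic: prefer the directory listings, and fall back to crawler-based results when the directory comes up empty, as it tends to for obscure queries. Both result sources below are invented stand-ins.

```python
def directory_results(query: str) -> list[str]:
    """Stand-in for human-powered listings. Invented data."""
    listings = {"search engines": ["example.com"]}
    return listings.get(query.lower(), [])

def crawler_results(query: str) -> list[str]:
    """Stand-in for a crawler-based index lookup."""
    return [f"crawled-page-for-{w}" for w in query.lower().split()]

def hybrid_search(query: str) -> list[str]:
    # Favor the directory; use the crawler index when it is empty.
    preferred = directory_results(query)
    return preferred if preferred else crawler_results(query)

print(hybrid_search("search engines"))     # ['example.com']
print(hybrid_search("obscure topic xyz"))  # crawler results instead
```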
The Parts of a Crawler-Based Search Engine
Crawler-based search engines have three major elements. First
is the spider, also called the crawler. The spider visits a web page, reads it,
and then follows links to other pages within the site. This is what it means
when someone refers to a site being "spidered" or
"crawled." The spider returns to the site on a regular basis, such as
every month or two, to look for changes.
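A bare-bones sketch of a spider, using only the Python standard library: fetch a page, read it, collect its links, and follow them onward. The start URL is a placeholder, and real crawlers add robots.txt handling, politeness delays, and revisit scheduling, all omitted here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    """Collect the href targets of <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def spider(start_url: str, max_pages: int = 5) -> dict[str, str]:
    """Visit pages breadth-first, returning {url: html} for each."""
    queue, seen, pages = [start_url], set(), {}
    while queue and len(pages) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        html = urlopen(url).read().decode("utf-8", errors="replace")
        pages[url] = html
        collector = LinkCollector()
        collector.feed(html)
        # Follow links onward, resolving relative URLs against the page.
        queue.extend(urljoin(url, link) for link in collector.links)
    return pages

# pages = spider("https://example.com/")  # placeholder start URL
```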
Everything the spider finds goes into the second part of the
search engine, the index. The index, sometimes called the catalog, is like a
giant book containing a copy of every web page that the spider finds. If a web
page changes, then this book is updated with new information.
Sometimes it can take a while for new pages or changes that
the spider finds to be added to the index. Thus, a web page may have been
"spidered" but not yet "indexed." Until it is indexed --
added to the index -- it is not available to those searching with the search
engine.
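The spidered-versus-indexed distinction can be sketched like this: fetched pages wait in a pending batch and only become searchable once an indexing pass folds them into the index. All the data structures here are illustrative.

```python
inverted_index: dict[str, set[str]] = {}  # word -> URLs, the "giant book"
pending: dict[str, str] = {}              # spidered but not yet indexed

def record_spidered(url: str, text: str) -> None:
    pending[url] = text  # fetched by the spider, not searchable yet

def run_indexing_pass() -> None:
    """Fold pending pages into the index; only then are they findable."""
    while pending:
        url, text = pending.popitem()
        for word in text.lower().split():
            inverted_index.setdefault(word, set()).add(url)

record_spidered("example.com/new", "a freshly crawled page")
print(inverted_index.get("crawled"))  # None: spidered, not yet indexed
run_indexing_pass()
print(inverted_index.get("crawled"))  # {'example.com/new'}
```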
Search engine software is the third part of a search engine.
This is the program that sifts through the millions of pages recorded in the
index to find matches to a search and rank them in order of what it believes is
most relevant.