How Search Engines Work

A search engine is a searchable database of websites. Search engines work by taking a "snapshot" of the web. They have special programs called "spiders" or "robots" that go out on the Internet looking for new information, which they bring back to be indexed in the database.
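To make "indexed in the database" more concrete, here is a minimal sketch of one common indexing structure, an inverted index. It maps each word to the set of pages that contain it, so a query becomes a lookup rather than a scan of every page. The page URLs and text are made up for illustration, and the function names are hypothetical. Real search engines also store word positions, titles, and other data.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of page URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical pages "brought back" by a spider.
pages = {
    "http://example.com/a": "Weather forecast for New York",
    "http://example.com/b": "College weather station",
    "http://example.com/c": "New York college guide",
}
index = build_index(pages)
print(sorted(search(index, "weather")))   # ['http://example.com/a', 'http://example.com/b']
print(sorted(search(index, "new york")))  # ['http://example.com/a', 'http://example.com/c']
```

Because the index is built ahead of time, it reflects the web only as it was when the spider last visited. That is why it behaves like a "snapshot."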

Search engines differ in how much of the Internet they look at, how often they index what they find, and how they organize and present their findings. Most search engines use mathematical formulas to rank, or weight, your results. These formulas rely on factors such as how often your search words appear in the retrieved documents and whether they appear in the title or only in the body text. No search engine covers every website that is out there, so you sometimes need to try your search in several search engines, or use a meta-search engine, which sends your query to several search engines at once.
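To illustrate what such a formula might look like, here is a minimal sketch that assumes a simple weighted sum, where each occurrence of a search word in the title counts more than an occurrence in the body text. The weights (3 and 1) and the function names are hypothetical values chosen for illustration. They are not the weights of any real search engine.

```python
import re

TITLE_WEIGHT = 3  # hypothetical: a title match counts three times as much
BODY_WEIGHT = 1   # hypothetical: each body match counts once

def words(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(query, title, body):
    """Weighted count of query-word occurrences in the title and body."""
    title_words = words(title)
    body_words = words(body)
    total = 0
    for term in set(words(query)):
        total += TITLE_WEIGHT * title_words.count(term)
        total += BODY_WEIGHT * body_words.count(term)
    return total

docs = [
    ("Local News", "A movie theater opened. The movie played all week."),
    ("Movie Reviews", "Reviews of new releases."),
]
ranked = sorted(docs, key=lambda d: score("movie", d[0], d[1]), reverse=True)
for title, body in ranked:
    print(title, score("movie", title, body))
# Movie Reviews 3
# Local News 2
```

In this example, the page that mentions "movie" less often ranks first only because its single mention is in the title. With equal weights, the order would flip.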

Each search engine builds its own temporary database, a "snapshot" of the web, and uses its own method to decide which sites to list when you search. That is why results vary so greatly from one engine to another. In general, these are some of the factors taken into account:

Your search term is in the URL

When you search for a particular keyword, such as weather, most search engines will look for that keyword in a site's URL. However, even if a site is at www.weather.com, that doesn't mean it will automatically be first on the list. Variations in search algorithms can make seemingly unrelated links float to the top and obvious links disappear. For example, we searched for Farmingdale University using WebCrawler and AltaVista, and none of our pages was listed first. In WebCrawler, our pages were listed alongside unrelated pages that had the words "farmingdale" or "university" somewhere in the text.
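Here is a minimal sketch of the URL check. It assumes that a keyword counts as "in the URL" only when it appears as a whole word in the host name or path. It uses Python's standard `urllib.parse.urlparse`, and the function name `keyword_in_url` is hypothetical. As the Farmingdale example shows, this is only one signal among many, so a match does not guarantee a top position.

```python
import re
from urllib.parse import urlparse

def keyword_in_url(keyword, url):
    """Return True if the keyword appears as a whole word in the URL's host or path."""
    parts = urlparse(url)
    # Split host and path on non-alphanumeric characters (dots, slashes, hyphens).
    tokens = re.split(r"[^a-z0-9]+", (parts.netloc + parts.path).lower())
    return keyword.lower() in tokens

print(keyword_in_url("weather", "http://www.weather.com/"))        # True
print(keyword_in_url("weather", "http://example.com/news/today"))  # False
```

Matching whole words is a design choice. It avoids counting weather inside a longer name such as weathervane, but it also misses some legitimate partial matches.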

Your search term is in the title

Another factor is the title of the page. This is the phrase that appears in the title bar or tab at the top of your browser window, above the navigation buttons. In HTML, the title is defined with the <title> tag: whatever words are placed between <title> and </title> in an HTML document will be indexed by most search sites. In theory, if a site has the word movie in its title, it's more likely to be listed when you search for movie sites.
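The following sketch shows how an indexer might pull out the text between <title> and </title>. It uses Python's standard `html.parser.HTMLParser`. The class and function names are hypothetical, and real crawlers handle malformed pages and character encodings far more carefully.

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collect the text that appears between <title> and </title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":  # HTMLParser reports tag names in lowercase
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

def page_title(html):
    """Return the page's title text, or an empty string if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()

html = "<html><head><title>Classic Movie Reviews</title></head><body>...</body></html>"
title = page_title(html)
print(title)                              # Classic Movie Reviews
print("movie" in title.lower().split())   # True
```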

How often, and where, your search term appears in the text

Some search engines, such as AltaVista, Infoseek, or Excite, send out spiders that retrieve the full text of the pages they visit. So when you search for a word such as College, they look through the pages in their database and return a list of the ones that include the word College. One factor that determines how high a particular page appears in the results is how many times the word you searched for appears in it. So a document that mentions College seven times will probably be listed ahead of a page that mentions College once. If you search for a phrase such as New York, documents that include the two words close together should appear higher in the results than a document that includes the word new and then the word york five sentences later.
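Here is a minimal sketch of the two signals described above, assuming plain-text documents. The first counts how often a search word appears. The second measures how many word positions separate the two words of a phrase, where a gap of 1 means they are adjacent. The function names are hypothetical. A real engine would typically read word positions from its index rather than rescanning each page.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def term_count(term, text):
    """How many times a single search word appears in the text."""
    return tokenize(text).count(term.lower())

def min_distance(first, second, text):
    """Smallest gap, in word positions, from `first` to a later `second`.

    Assumes two different words; returns None if they never occur in that order.
    """
    words = tokenize(text)
    best = None
    last_first = None
    for i, w in enumerate(words):
        if w == first:
            last_first = i
        elif w == second and last_first is not None:
            gap = i - last_first
            if best is None or gap < best:
                best = gap
    return best

print(term_count("college", "College news: the college library and college sports."))  # 3
print(term_count("college", "A note about one college."))                              # 1

doc_a = "Visit New York this spring."
doc_b = "A new museum opened. Tourists from York visited."
print(min_distance("new", "york", doc_a))  # 1
print(min_distance("new", "york", doc_b))  # 5
```

A ranking formula could then favor pages with higher counts and smaller gaps. How much weight each signal gets varies from engine to engine.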