Tuesday, August 28, 2012

Components of crawler-based search engines

Crawler-based search engines have three major components:

Crawler: The crawler, also called a spider, is a computer program. It visits a web page, reads it, scans its text, and then follows links to other pages within the site. The crawler returns to the site on a regular basis, such as once a month or every fifteen days, to look for changes, and whatever it finds goes into the index.
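A minimal sketch of the visit, read, and follow-links loop described above might look like the Python below. It assumes a breadth-first crawl that stays on the starting site's host, and it uses a hypothetical fetch(url) function (returning HTML or None) so the example can run against an in-memory "site" instead of the network. The revisit schedule and politeness rules a real crawler needs are left out.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags and the visible text of a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl that only follows links on the start URL's host.

    fetch(url) is a hypothetical helper: it returns the page HTML as a
    string, or None if the page cannot be read.
    """
    site = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    found = {}  # url -> extracted text; this is what would be handed to the index
    while queue and len(found) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        found[url] = " ".join(parser.text_parts)
        for href in parser.links:
            link = urljoin(url, href).split("#")[0]  # resolve relative links, drop fragments
            if urlparse(link).netloc == site and link not in seen:
                seen.add(link)
                queue.append(link)
    return found


# Hypothetical two-page site held in memory; dict.get serves as fetch().
SITE = {
    "http://example.com/": '<p>Home</p><a href="/about">About</a><a href="http://other.org/">Elsewhere</a>',
    "http://example.com/about": '<p>About us</p><a href="/">Home</a>',
}
pages = crawl("http://example.com/", SITE.get)
print(sorted(pages))  # ['http://example.com/', 'http://example.com/about']
```

The link to other.org is skipped because it points off the site, which mirrors the "pages within the site" behavior described above.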

Index: The index is like a huge book containing a copy of every web page the spider has found. Everything the crawler finds goes into this second part of the search engine. It can take up to a few weeks for a spider to crawl and index a site. When a web page changes, the index is updated with the new information.
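One common way to organize such an index, sketched here as an assumption rather than how any particular engine does it, is to keep a copy of each page alongside an inverted index that maps each word to the pages containing it. The class and method names below are hypothetical. Re-adding a URL replaces the old copy, which models the "page changed, index updated" step.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case alphanumeric words; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


class Index:
    def __init__(self):
        self.pages = {}                   # url -> stored copy of the page text
        self.postings = defaultdict(set)  # word -> set of urls containing it

    def add_page(self, url, text):
        """Store (or replace) a page and update the word -> url postings."""
        if url in self.pages:
            self.remove_page(url)  # drop stale entries from the old version
        self.pages[url] = text
        for word in set(tokenize(text)):
            self.postings[word].add(url)

    def remove_page(self, url):
        """Delete a page and remove it from every posting list it appeared in."""
        old = self.pages.pop(url)
        for word in set(tokenize(old)):
            self.postings[word].discard(url)
            if not self.postings[word]:
                del self.postings[word]

    def lookup(self, word):
        """Return the set of urls whose stored text contains the word."""
        return set(self.postings.get(word.lower(), set()))


idx = Index()
idx.add_page("http://example.com/", "Crawler basics")
idx.add_page("http://example.com/about", "About the crawler")
idx.add_page("http://example.com/", "Index basics")  # the home page changed
print(idx.lookup("crawler"))  # {'http://example.com/about'}
```
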

Search engine software: This is the program that accepts the query a user enters, interprets it, sifts through the millions of web pages recorded in the index to find matches, ranks them in order of relevance, and presents them to the browser in a convenient form. All crawler-based search engines have the basic parts described above, but they differ in how these parts are tuned. That is why the same search on different search engines often produces different results.
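The matching-and-ranking step can be illustrated with a deliberately naive scorer. This sketch assumes that a page matches only if it contains every query word and that pages are ranked by how often those words occur (plain term frequency). Real engines use far more elaborate and differently tuned ranking signals, and the search() function here is hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case alphanumeric words; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(query, pages, top_k=10):
    """Return up to top_k (url, score) pairs for pages containing every query term.

    The score is the total number of occurrences of the query terms
    (naive term frequency); ties are broken alphabetically by url.
    """
    terms = tokenize(query)
    if not terms:
        return []
    results = []
    for url, text in pages.items():
        counts = Counter(tokenize(text))
        if all(counts[t] > 0 for t in terms):  # every query word must appear
            score = sum(counts[t] for t in terms)
            results.append((url, score))
    results.sort(key=lambda pair: (-pair[1], pair[0]))  # highest score first
    return results[:top_k]


pages = {
    "a": "spider crawls pages; the spider follows links",
    "b": "the index stores pages",
    "c": "spider index",
}
print(search("spider", pages))        # [('a', 2), ('c', 1)]
print(search("spider index", pages))  # [('c', 2)]
```

Changing the scoring rule, for example by weighting some words more heavily, would reorder the same matches. This is one small-scale illustration of why differently tuned engines return different results for the same query.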


