Sunday, July 6, 2008

What a 'Spider' does

A spider is a software program that search engines use to find what's out there on the ever-changing web. There are many types of spider in use, and the crawler is one of the main ones.
Crawler - this is an oversimplified picture, but basically the program starts at a website, loads its pages, and follows the hyperlinks on each page. That is how the spider crawls from one website to another.
When a crawler visits one of your web pages, it loads the page's contents into a database.
Once a page has been fetched, its text is loaded into the search engine's index, which is a massive database of words and where they occur on different web pages.
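To make the "database of words and where they occur" idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration, not how any real search engine stores its index: the function name `build_index`, the example URLs, and the simple "lowercase letters and digits" word rule are all assumptions of mine.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of page URLs it appears on.

    `pages` is a dict of URL -> page text (hypothetical sample data).
    """
    index = defaultdict(set)
    for url, text in pages.items():
        # Assumption: a "word" is any run of lowercase letters or digits.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


# Two made-up pages standing in for fetched web pages.
pages = {
    "http://example.com/a": "Spiders crawl the web",
    "http://example.com/b": "The web changes every day",
}
index = build_index(pages)
print(sorted(index["web"]))
```

Looking up a word then returns every page that contains it, which is roughly what happens when you type that word into a search box.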
So basically there are 3 steps:
1. Crawling - fetching pages
2. Indexing - breaking them down into words for the index
3. Following links - the links (web page addresses, or URLs) that were found get fed back into the crawling program to be retrieved.
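The three steps above can be sketched as one small loop. This is a toy version under stated assumptions: instead of the real web it uses a made-up dictionary called `FAKE_WEB` (URL to HTML), and the names `LinkParser` and `crawl` are hypothetical. A real crawler would also need politeness rules, error handling, and a lot more.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects href values from <a> tags, plus the page's text."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text.append(data)


# Hypothetical stand-in for the web: URL -> HTML of that page.
FAKE_WEB = {
    "http://site.test/": '<a href="/about">About</a> Welcome home',
    "http://site.test/about": '<a href="/">Home</a> <a href="/contact">Contact</a> About us',
    "http://site.test/contact": "Contact page",
}


def crawl(start_url, fetch):
    """Crawl from start_url; `fetch` returns HTML for a URL, or None."""
    index = {}                  # word -> set of URLs containing it
    queue = deque([start_url])  # URLs waiting to be retrieved
    seen = {start_url}          # URLs already queued, to avoid loops
    while queue:
        url = queue.popleft()
        html = fetch(url)       # step 1: crawling (fetch the page)
        if html is None:
            continue
        parser = LinkParser()
        parser.feed(html)
        parser.close()
        # Step 2: indexing (break the text into words).
        for word in " ".join(parser.text).lower().split():
            index.setdefault(word, set()).add(url)
        # Step 3: feed newly found links back into the queue.
        for href in parser.links:
            link = urljoin(url, href)
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index, seen


index, seen = crawl("http://site.test/", FAKE_WEB.get)
print(sorted(seen))
```

Starting from the home page alone, the loop ends up visiting all three pages, because each page's links are fed back into the queue. The `seen` set is what stops it from going around in circles when pages link to each other.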
