What is Web crawling?
Web crawling in a few words
A Web crawler is an Internet bot that helps with Web indexing. Web indexing is the process of collecting the pages of a website so that they can be searched; to find those pages, the crawler detects the links on each page it visits. The bot crawls through a website one page at a time until all pages have been indexed.
Site indexation
Normally, the bot starts from the main page and works in a loop for as long as it can find new links. Web crawlers help collect information about a website and its links, and can also help validate HTML code and hyperlinks.
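The loop described above can be sketched as a breadth-first traversal. This is a minimal sketch, not a production crawler: it assumes the site is given as an in-memory dictionary that maps each page URL to the links found on it (a hypothetical stand-in for downloading and parsing real pages), and the `crawl` function and the example.com URLs are illustrative only.

```python
from collections import deque


def crawl(start_url, links_on_page):
    """Visit pages breadth-first from start_url until no new links are found.

    links_on_page is a hypothetical stand-in for downloading a page and
    extracting its links: it maps a URL to the list of URLs linked from it.
    Returns the pages in the order they were visited.
    """
    visited = {start_url}          # every URL ever queued, so none is crawled twice
    frontier = deque([start_url])  # pages waiting to be crawled
    order = []
    while frontier:                # loop as long as undiscovered pages remain
        url = frontier.popleft()
        order.append(url)          # "index" the page; here we only record it
        for link in links_on_page.get(url, []):
            if link not in visited:
                visited.add(link)
                frontier.append(link)
    return order


# A tiny hypothetical site: the main page links to two pages,
# and one of them links back to the main page and on to a third page.
site = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/", "https://example.com/c"],
    "https://example.com/b": [],
    "https://example.com/c": [],
}
print(crawl("https://example.com/", site))
```

The `visited` set is what makes the loop terminate: each URL is queued at most once, so a link back to an already-seen page (such as the link from `/a` to the main page) does not cause endless re-crawling.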
Web crawlers are an essential part of web search engines, the most famous of which are Google, Yahoo, Bing, etc.
Example
Imagine we would like to index the ebay.com site.
The crawler sends a request for the page and analyzes its HTML code.
In the example below, we can see 4 new links (the first two point to the same URL, so they lead to 3 distinct pages):
- http://www.ebay.com/trending/?_trkparms=pageci%3A8641e8da-f189-11e8-8afd-74dbd1802a2f%7Cparentrq%3A5078e6ad1670ad790f62abbefff039ac%7Ciid%3A1
- http://www.ebay.com/trending/?_trkparms=pageci%3A8641e8da-f189-11e8-8afd-74dbd1802a2f%7Cparentrq%3A5078e6ad1670ad790f62abbefff039ac%7Ciid%3A1
- https://www.ebay.com/sch/i.html?_nkw=diamond+earrings
- https://www.ebay.com/sch/i.html?_nkw=nfl+ugly+sweaters
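The "analyze the code" step can be sketched with Python's standard `html.parser` module. This is a minimal sketch that assumes the page HTML has already been downloaded; the `LinkExtractor` class, the `extract_links` helper and the sample HTML fragment are hypothetical and do not reproduce eBay's real markup. Relative links are resolved against the page URL with `urllib.parse.urljoin`, and duplicates are dropped, so a URL that appears twice on a page (as the first two links in the list above do) is crawled only once.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Turn relative links such as "/trending/" into absolute URLs
                    self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    """Return the unique links found in html, in first-seen order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    # dict.fromkeys removes duplicates while keeping insertion order
    return list(dict.fromkeys(parser.links))


# Hypothetical page fragment: four <a> tags, two of which share the same URL
html = """
<a href="/trending/">Trending</a>
<a href="/trending/">See all trending</a>
<a href="https://www.ebay.com/sch/i.html?_nkw=diamond+earrings">Earrings</a>
<a href="/sch/i.html?_nkw=nfl+ugly+sweaters">Sweaters</a>
"""
print(extract_links("https://www.ebay.com/", html))
```

Four `<a>` tags yield three unique URLs here; each of them would then be added to the crawler's queue of pages to visit.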
Next, the bot crawls every page from this list and tries to find further links on each of them.
Thank you for your attention
If you have any questions or advice, please feel free to contact me. I’ll be glad to help you.
LinkedIn, Twitter, Google+, lytvynov.anton@gmail.com, https://lytvynov-anton.com