Here is a detailed explanation of how Google Search works.
Google receives information from many different sources, including:
- Web pages
- User-submitted content, such as Google My Business listings and Maps
- Book scans
- Public databases on the internet
- Almost anything else that is publicly available on the internet
Crawling is the process of discovering new or updated pages to add to the Google index. The terms “crawl” and “index” are often used interchangeably, even though they refer to different (but closely related) actions.
The program that performs crawling is called Googlebot (also known as a robot, bot, or spider). Googlebot uses an algorithmic process: software programs determine which sites to crawl, how often, and how many pages to fetch from each site.
Google’s crawl process starts with a list of web page URLs generated by previous crawls, augmented with sitemap data provided by webmasters. As Googlebot visits each of these sites, it detects the links on each page and adds them to its list of pages to crawl. New sites, changes to existing sites, and links that are no longer valid are recorded and used to update the Google index.
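The crawl loop described above (start from known URLs, fetch each page, follow discovered links) can be sketched roughly as follows. This is a simplified illustration, not Google's actual implementation; the sample pages and the `fetch` helper are invented for the example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags found on a page."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))

def crawl(seed_urls, fetch, limit=100):
    """Breadth-first crawl: start from a seed list, follow discovered links."""
    frontier = deque(seed_urls)   # pages waiting to be crawled
    seen = set(seed_urls)         # avoid visiting the same URL twice
    crawled = []
    while frontier and len(crawled) < limit:
        url = frontier.popleft()
        html = fetch(url)         # in reality: an HTTP request
        crawled.append(url)
        extractor = LinkExtractor(url)
        extractor.feed(html)
        for link in extractor.links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return crawled

# A tiny in-memory "web" so the sketch runs without network access.
pages = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "https://example.com/a": '<a href="/b">B</a>',
    "https://example.com/b": "no links here",
}
order = crawl(["https://example.com/"], fetch=lambda u: pages.get(u, ""))
print(order)
```

Note how pages are queued as they are discovered: each crawl pass both records the page and feeds new URLs back into the list for future passes, mirroring the process described above.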
How does Google find a page?
Google uses several techniques to find a page, including:
- Following links from other sites or pages
- Reading sitemaps
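A sitemap is a simple XML file listing the URLs you want Google to know about. A minimal example following the sitemaps.org format (the URLs and dates are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/about</loc>
  </url>
</urlset>
```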
How does Google know which pages shouldn’t be crawled?
- Pages blocked by the robots.txt file are not crawled, but they may still be indexed if other pages link to them. Google can infer the content of a page from the links that point to it and index it without analyzing its content.
- Google cannot crawl pages that are not accessible to an anonymous user, so any login or authorization requirement protecting a page will prevent it from being crawled.
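Crawlers read the rules in a site's robots.txt before fetching pages. Python's standard library includes a parser for this format, which can be used to sketch how such a check works (the robots.txt content and URLs below are illustrative):

```python
from urllib.robotparser import RobotFileParser

# A sample robots.txt: disallow everything under /private/ for all crawlers.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("*", "https://example.com/public/page.html"))   # allowed
print(rp.can_fetch("*", "https://example.com/private/page.html"))  # blocked
```

Remember, though, that as noted above a URL blocked this way can still end up in the index if other pages link to it; robots.txt controls crawling, not indexing.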
Improve crawling
Use these techniques to help Google discover the right pages on your site:
- Submit a sitemap.
- Submit crawl requests for individual pages.
- Use a simple, readable and logical URL path for your pages and provide clear and direct internal links within the site.
- If you split long articles across multiple pages, clearly indicate the pagination to Google.
- If you use URL parameters for navigation on your site, for example if you indicate the user’s country on an international shopping site, use the URL parameters tool to communicate the important parameters to Google.
- Use the robots.txt file judiciously, for example to tell Google which pages you’d prefer it to know about or crawl first, and to manage server load. Do not use it as a method of preventing material from appearing in the Google index.
- Use hreflang to link to versions of your pages in other languages.
- Clearly identify your canonical page and alternate pages.
- Check indexing and crawl coverage using the Index Coverage report.
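The canonical and hreflang declarations mentioned above both live in the page's `<head>`. A minimal example with placeholder URLs:

```html
<head>
  <!-- The preferred (canonical) version of this page -->
  <link rel="canonical" href="https://www.example.com/shoes" />
  <!-- Alternate language versions of the same content -->
  <link rel="alternate" hreflang="en" href="https://www.example.com/shoes" />
  <link rel="alternate" hreflang="de" href="https://www.example.com/de/schuhe" />
  <link rel="alternate" hreflang="x-default" href="https://www.example.com/shoes" />
</head>
```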
A page is indexed by Google if it has been visited by the Google crawler (“Googlebot”), analyzed to understand its contents and meaning, and stored in the Google index.
Indexed pages can be shown in Google Search results (if they comply with Google’s webmaster guidelines).
Most pages are crawled before being indexed, but Google may index pages without accessing their content (for example, if a page is blocked by a robots.txt directive).
Googlebot processes each crawled page to compile a massive index of all the words it finds and their positions on each page.
It also processes information contained in key content tags and attributes, such as <title> tags and alt attributes. Googlebot can process many types of content, but not all; for example, it cannot process the contents of some media files.
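The idea of an index of words and their positions can be sketched as a tiny inverted index. This is a toy illustration of the concept, nothing like Google's real data structures; the sample pages are invented.

```python
from collections import defaultdict

def build_index(pages):
    """Map each word to the (url, position) pairs where it occurs."""
    index = defaultdict(list)
    for url, text in pages.items():
        for position, word in enumerate(text.lower().split()):
            index[word].append((url, position))
    return index

pages = {
    "https://example.com/a": "how Google search works",
    "https://example.com/b": "search engines crawl and index pages",
}
index = build_index(pages)
# Looking up a word returns every page containing it, with positions.
print(index["search"])
```

Storing positions (not just the page) is what lets a search engine match phrases and weigh how prominently a term appears.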
Note that Google does not index pages that carry a noindex directive (in a meta tag or HTTP header). However, Googlebot must be able to see the directive: if the page is blocked by a robots.txt file, a login page, or another mechanism, the directive will not be seen, and the page may be indexed even though Google has not visited it.
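A noindex directive declared in the page's HTML looks like this:

```html
<!-- In the page's <head>: ask search engines not to index this page -->
<meta name="robots" content="noindex">
```

For non-HTML resources such as PDFs, the equivalent is the `X-Robots-Tag: noindex` HTTP response header.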
There are many techniques to improve Google’s ability to understand the content of your page:
- Prevent pages you don’t want shown in Search from being indexed by using noindex.
- Do not use noindex on pages blocked by the robots.txt file; if you do, the noindex directive will not be seen and the page may still be indexed.
- Use structured data.
- Follow the Google Webmaster Guidelines.
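Structured data, mentioned above, is usually added as a JSON-LD block in the page using schema.org vocabulary. A minimal example (the headline, author, and date are placeholders):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How Google Search Works",
  "author": { "@type": "Person", "name": "Jane Doe" },
  "datePublished": "2024-01-15"
}
</script>
```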
Serving results
When a user enters a query, Google looks for matching pages in the index, then returns the results it deems most relevant.
Relevance is determined by taking into account over 200 factors.
Google considers user experience when choosing and ranking results.
- If your results are aimed at users in specific regions or languages, you can tell Google your preferences.
- Make sure your page loads quickly and is mobile-friendly.
- Consider implementing Search results features for your site, such as product or item listings (Rich Snippet).
- Implement AMP for faster page loading on mobile devices. Some AMP pages are also eligible for additional search features, such as the Top Stories carousel.
- Google’s algorithms are constantly being improved; instead of trying to guess the algorithm and design your pages accordingly, focus on creating quality content that appeals to users.