How Google’s Search Engine Works: Crawling, Indexing, and Ranking Your Site
According to Google, it has stored ‘hundreds of billions of webpages’ in its index, which it uses to show results. In fact, these webpages amount to over 100,000,000 gigabytes of data that Google stores to deliver the best possible results for your search.
However, did you know that only 61% of the webpages on the Walmart website are recorded by Google? Not just that, roughly 20% of the URLs on Medium, the popular online publishing platform, are not recorded or ‘indexed’ by Google. That means that even though those pages exist on the web, they will never show up on Google’s Search Engine Results Pages (SERPs).
So, it’s quite likely that your website or several pages on it go unnoticed by Google. To ensure that doesn’t happen, you need to understand how Google works and how its bots find, record, and rank your webpages.
How Search Engines Work
To start with, any search engine out there, Google or otherwise, has three primary functions:
Finding your content: Crawling
A search engine uses robots (also called crawlers or spiders) to comb through all the content and code on the web, looking for any new content that’s been added or old content that’s been updated. It then scans this content to find information that needs to be added to its storage. This is known as crawling.
Storing your content: Indexing
All the content that is found during the crawling process is then stored and organized in an index. In the process of indexing, the search engine understands the content on your website and categorizes it accordingly. If your webpage is indexed by a search engine, it will be shown as a search result for relevant search queries.
Displaying your content: Ranking
Depending on the content of your webpage, the search engine will display your content on its SERPs for relevant keywords. This is known as search engine ranking. A search engine ranks content based on how relevant it is to the query: the more relevant your page, the higher it will rank.
How Google’s Search Engine Works: Crawling, Indexing, and Ranking by Google
There’s no denying that Google is the go-to search engine for most of the world’s population: 76% of all searches happen on Google. Clearly, it is imperative to understand how Google finds your web content.
Crawling by GoogleBot
No discussion about crawling can be complete without talking about Googlebot. Googlebot is the popular name given to Google’s web crawlers or spiders that scan your content and figure out what part of your content should be added to Google’s Index.
Considering the variety of content available on the web, Google uses a different Googlebot for each type. Some of them include Googlebot Smartphone, Googlebot Desktop, Googlebot Image, Googlebot Video, and Googlebot News.
Googlebot crawls your website at regular intervals and keeps adding new pages and content to its index as you update your website. But how do you know whether or not Google has added all your website’s pages to its index? In fact, how do you ensure that your website is present in Google’s index in the first place?
How to check which pages Google has indexed for your website
Google has made it quite easy for you to know which of your website’s pages are stored in its index. All you need to do is go to Google and type “site:yourdomain.com” into the search bar. Google will show you all the pages that it has in its index for the specified site.
If Google hasn’t crawled and indexed your website, it will tell you that your search ‘did not match any documents’.
Another scenario could be where Google lists out several pages from your website but not all of them. So, what could have possibly gone wrong?
Some of the possible reasons why your webpages did not appear in search results are:
- Google is yet to crawl your brand new site.
- Your site doesn’t have any backlinks.
- Your site’s navigation is too complicated for a robot to crawl.
- Google has penalized you for spammy tactics.
So, what do you do next? Now, you can either wait for Google to do its job or you can do your bit in expediting the process and ensuring that your website is crawled right.
Here’s how you can ensure that Google crawls your website
Robots.txt is your way of telling Googlebot what it may and may not crawl on your website. It acts as a directory to your site, making it easy for crawlers to scan it. You need to add a robots.txt file to the root directory of your website (yourdomain.com/robots.txt), and Googlebot will adhere to it while deciding which pages to crawl. Note that robots.txt controls crawling, not indexing: a page blocked in robots.txt can still end up in the index if other pages link to it. Also note that Googlebot ignores the crawl-delay directive; Google’s crawl rate is managed through Google Search Console instead.
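As a sketch of how these rules behave, here is a minimal, hypothetical robots.txt (the domain and paths are illustrative) checked with Python’s standard `urllib.robotparser`, which applies the same allow/disallow logic a well-behaved crawler would:

```python
from urllib import robotparser

# A hypothetical robots.txt: block every crawler from /admin/,
# and point crawlers to the sitemap.
rules = """\
User-agent: *
Disallow: /admin/

Sitemap: https://yourdomain.com/sitemap.xml
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Googlebot is covered by the wildcard "*" group here.
print(rp.can_fetch("Googlebot", "https://yourdomain.com/admin/settings"))  # False
print(rp.can_fetch("Googlebot", "https://yourdomain.com/blog/post"))       # True
```

Anything not explicitly disallowed is crawlable by default, which is why the blog URL passes.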
In addition to Robots.txt, using Google Search Console and creating a Sitemap can also help in crawling and indexing of your website pages. But before we discuss that, let’s quickly understand how Google indexes your pages.
Indexing by Google
Once Googlebot has found a webpage, Google tries to understand what the page is about and stores it in its index. In the index, Google will categorize your content appropriately, classify images, and make a record of the videos on your website’s pages. However, just because Google has crawled your pages does not guarantee that it will index them too.
Here are a few standard ways to assist Google in indexing your site better:
Google Search Console (GSC)
Google Search Console is Google’s aid for tracking the performance of your website on SERPs. It covers the technical side of SEO so you can focus on your SEO content. Google Search Console will tell you all about your site’s clicks, impressions, click-through rates (CTR) for different search results, and the average position of different pages on SERPs.
But more importantly, Google Search Console provides an “Index Coverage” tab wherein you get to know which pages are in Google’s index, and which aren’t. Not just that, it also lets you know what errors and warnings made it difficult for Google to index your pages properly.
Adding your property to GSC ensures that Google crawls your website. You just need to log in to the platform, add your domain as a property, and verify that you are the owner. Also, remember to add all possible domains that Google could index your site under. Google itself points out, “If you see a message that your site is not indexed, it may be because it is indexed under a different domain. For example, if you receive a message that http://example.com is not indexed, make sure that you’ve also added http://www.example.com to your account (or vice versa), and check the data for that site.”
XML Site Map
Another way of ensuring that Google crawls and indexes all the pages of your website that you want indexed is to submit an XML Sitemap to Google. An XML sitemap is basically a list of all the important pages on your website that helps Google find and crawl them. It also helps Google understand your website structure better.
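For illustration, a minimal XML sitemap looks like this (the domain and dates here are hypothetical; each `<url>` entry lists one page):

```
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yourdomain.com/</loc>
    <lastmod>2021-06-01</lastmod>
  </url>
  <url>
    <loc>https://yourdomain.com/blog/</loc>
    <lastmod>2021-06-01</lastmod>
  </url>
</urlset>
```

The `<loc>` tag is the only required field per URL; `<lastmod>` is optional but helps Google prioritize recently updated pages.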
Most content management systems, like WordPress, Wix, and Shopify, have tools to help you generate a sitemap. You can also use tools like Screaming Frog to create one.
To submit your sitemap, first add it to the root directory of your website and then submit its URL on Google Search Console through the following steps:
Go to Google Search Console > Sitemaps > Paste the Sitemap URL> Hit “Submit”
Robots Meta Directives
As discussed earlier, robots.txt helps Googlebot crawl your site easily. To go a step further, you can add robots meta directives to ensure that your pages are indexed the way you want.
These are instructions that you give to robots as they crawl your site, in the form of meta tags added to the HTML head of individual pages (not to robots.txt itself). These tags include:
- index/noindex tags: These tell Googlebot whether a crawled page should be stored in the index. While you’d usually want Google to index all your pages, it is recommended you noindex pages with thin content or content available only to specific visitors.
- follow/nofollow tags: These tell Googlebot whether or not to follow the links on a page and pass link equity to the linked pages. By default, all pages are treated as follow. However, you should add the nofollow tag to your noindex pages.
- noarchive tag: This prevents search engines from saving a cached copy of your webpage. It is very useful for websites whose content changes quickly. For instance, eCommerce websites keep updating their products and pricing, and you wouldn’t want Google to show a cached product page with an old price; adding a noarchive tag prevents that.
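As a sketch, these directives live in the head of the page itself (a hypothetical example):

```
<head>
  <!-- Keep this page out of the index and don't pass equity through its links -->
  <meta name="robots" content="noindex, nofollow">

  <!-- Or: allow indexing, but never show a cached copy -->
  <!-- <meta name="robots" content="noarchive"> -->
</head>
```

Using `name="robots"` addresses all crawlers; you can target Google specifically with `name="googlebot"` instead.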
Blog Content (Pro-tip)
While the above tactics are quick hacks to up your indexing in Google, there’s a long-term play involved as well. Creating a blog for your website helps you in getting better indexing. Blog content not only gets crawled and indexed faster than static pages, but it also brings in more website traffic.
Ranking through Google Algorithm
To rank your content based on how relevant it is to the search query, Google uses ranking algorithms that weigh keywords, backlinks, user experience, and content quality, amongst other factors. Learn more about Google’s algorithms, why they matter, and the most important Google algorithm updates that impact your website’s SEO rankings.
In the End: You Can’t Ignore the Bots
While it should be your primary focus to make your website user-friendly and informative for your customers, you can’t be completely dismissive of the bots, especially the Googlebot crawling and indexing your website.
You need to put dedicated effort into Googlebot optimization, ensuring that your website has all the required elements and tags to make it easy for robots to scan. Only then will your site be crawled, indexed, and shown in search results. And trust us, the effort will be worth it once you start seeing remarkable results in terms of organic traffic.