Have you invested a lot of work in your content, optimized your site, developed a good logical structure and earned plenty of relevant links? Then your site should rank well on Google – but unfortunately you cannot find your rankings, either through your own searches or through SEO tools such as Searchmetrics or Sistrix. In the following article, we have compiled reasons why your pages cannot be found on Google, along with tools to check them. To narrow down the problem, we have to distinguish between two scenarios:
- The URL is new and still unknown to Google
- The URL is old but no longer appears in Google's index
How do you analyze whether Google knows your URL?
You can analyze whether Google knows your site with simple built-in tools. Copy your URL and enter it into Google search in combination with the "site:" operator. Here is an example of what that might look like.
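For instance, to check whether a single URL is in the index, you combine the operator with the full URL (the domain and path below are placeholders):

```
site:yourdomain.com/your-new-article/
```

If Google returns no result for this query, the URL is not in the index.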
# 1 Google hasn’t crawled and indexed your page yet
Google needs time, too. So if you have only recently put your page live, it may simply be that Google has not yet found and indexed it. The Google crawler typically follows known URLs and discovers new URLs along the way. The last official figure (2016) put Google's index at more than 130 trillion pages.
And even though Google has one of the largest computing capacities in the world, it prioritizes its crawling and indexing. You have the following options to make Google aware of a new URL on your site:
- You link your new URL internally from important pages
- You get a link to your new URL from an external (trustworthy) site
- You submit your new URL via an XML sitemap
- You use the URL inspection tool in the Google Search Console
This is how you use the URL inspection tool: As described, there are several ways to make Google aware of a new URL. One of them is the URL inspection tool in the new Google Search Console. Go to the Search Console and enter the URL you would like Google to crawl and index into the search bar at the top. Google first analyzes the URL and gives you feedback on it. The screenshot below shows what that can look like.
If you have a completely new URL, Google will give you the following feedback.
"URL is not on Google: This page is not in the index, but not because of an error. Coverage: URL is unknown to Google." In the next step, click "Request indexing" on the right. This process takes about one to two minutes, and you will then receive a message that your URL has been added to a preferred crawling queue.
# 2 Your canonical tag points to a different URL
The canonical tag (rel="canonical") ensures that only one URL (the source) is used for indexing when several URLs carry the same or very similar content. This avoids the disadvantages of duplicate content. If, for example, the canonical points to a different URL after a relaunch, the original URL cannot be indexed correctly. Here, too, the path leads through the Google Search Console: use the URL inspection tool and check the URL you would like to see in the Google index. For this example I used a URL that we deliberately do not want in Google's index. Here, too, Google reports "URL is unknown", but the reason is:
If you pointed Google to the other URL on purpose via rel="canonical", everything is fine. But perhaps you set the rel="canonical" in your content management system by accident. Then Google may crawl your page but not index it. What do you do now? Go to your content management system and check in the settings for your URL whether you have set a rel="canonical" to another page. Usually these settings are made via plugins, for example in WordPress. If that was not your intention, remove the rel="canonical" and submit the URL to Google for crawling again.
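A quick way to check this yourself is to look at the page's canonical tag and compare it with the URL itself. A minimal sketch in Python using only the standard library (the helper name and sample HTML are my own; in practice you would fetch the live page source):

```python
from html.parser import HTMLParser

class CanonicalParser(HTMLParser):
    """Collects the href of any <link rel="canonical"> tag in the page."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "canonical":
            self.canonical = attrs.get("href")

def canonical_points_elsewhere(html, url):
    """True if the page declares a canonical URL different from its own."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical is not None and parser.canonical != url

html = '<html><head><link rel="canonical" href="https://example.com/a"/></head></html>'
print(canonical_points_elsewhere(html, "https://example.com/b"))  # True: indexing goes to /a
```

If the function returns True for a URL you want indexed, the canonical is the likely culprit.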
# 3 You locked out Google via noindex
The meta tag "robots" with the value "noindex" may be preventing Google from indexing your page. If you want to change that, remove the "noindex" value. These settings are usually made in your content management system. Note: the noindex directive is located in the head section of your website and looks something like this:
```
<meta name="robots" content="noindex" />
```
Browseo is a simple tool for spotting an unintended "noindex". In addition to a lot of data relevant to search engine optimization, such as title, description, etc., it also shows you further down whether your page is set to "index" or "noindex".
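You can also run this check yourself. A minimal sketch in Python with only the standard library (the function name and sample HTML are my own; in practice you would feed in the fetched page source):

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the directives of any <meta name="robots"> tag."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            content = attrs.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]

def has_noindex(html):
    """True if the page carries a robots meta tag with a noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives

print(has_noindex('<head><meta name="robots" content="noindex, nofollow" /></head>'))  # True
```

Note that Google also honors a noindex sent via the X-Robots-Tag HTTP header, which this HTML-only check would not see.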
# 4 You have generally locked out crawlers via robots.txt
If there is a file named robots.txt in the root directory of your domain that contains the following lines, you have blocked both Google's crawlers and all other crawlers from viewing your site:
```
User-Agent: *
Disallow: /
```
If you remove these two lines, your page can be crawled and therefore indexed again. You can inspect the robots.txt file quite easily: since it has to be freely accessible, you simply append /robots.txt to your domain or start page and check which restrictions may be stored there. Here you can see what that could look like. Typically, for example, you exclude pages like
- Pages with duplicate content
- Pagination Pages
- Dynamic product and service pages
- Admin pages
- Shopping cart pages
- Chat sites
- Thank-you pages
# 5 You just locked out Google’s crawlers via robots.txt
With the robots.txt it is also possible to specifically lock out crawlers. In this case, the entry is, for example:
```
User-Agent: Googlebot
Disallow: /
```
If you remove this entry, Google can crawl and index your pages again.
# 6 Your pagination is not set up correctly for SEO
If a subpage of your project cannot be reached, it may be because the pagination links are set to nofollow.
# 7 Individual links point to nofollow
It is also possible that individual links are not followed because of nofollow. Individual links can carry a rel="nofollow" attribute directly; a page-wide nofollow can be found by searching the head of your page for the following line:
```
<meta name="robots" content="nofollow" />
```
# 8 You have too much duplicate content
If you provide content that can be found in identical or very similar form on another page, Google will index your page but probably not display it where you would like it to be. The reason: this content is not unique. This happens particularly often with product descriptions in online shops or with faceted navigation.
Tip: You can easily find out whether you are providing duplicate content by taking a sentence from the text in question and searching for it in quotation marks on Google, or by checking your URL on Copyscape for duplicate content.
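The quoted search can also be scripted. A minimal sketch that builds the exact-phrase Google query for a sentence (the function name is my own invention):

```python
from urllib.parse import quote_plus

def duplicate_check_url(sentence):
    """Builds a Google search URL for an exact-phrase query (the sentence in quotes)."""
    return "https://www.google.com/search?q=" + quote_plus(f'"{sentence}"')

# Open this URL in a browser; more than one result suggests duplicate content
print(duplicate_check_url("this exact sentence from my article"))
```

Quoting the sentence forces an exact-phrase match, so any second result is a page carrying the same wording.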
# 9 The URL was removed using the Google Search Console
Perhaps you intentionally or unintentionally removed a URL from the Google index using the Google Search Console, or someone else requested that your page be removed from the index. To find out, log in to the Google Search Console and click Index → Removals. If your URL does not appear there, this is not the reason your page cannot be found.
# 10 You have become a victim of malware or hijacking
This is usually pointed out to you in the Search Console (formerly Webmaster Tools). So always take a look at the notifications at the top right, or check the security notifications in the navigation on the left. Alternatively, I recommend the anti-malware SaaS Sucuri – this service constantly monitors your website, informs you of problems and helps you solve them!
Is your site still not found?
Of course, it is possible that none of these errors caused poor indexing. However, the causes mentioned above cover the most common sources of error. In any case, you shouldn’t panic if your project suddenly no longer appears in the SERPs. Have a cup of coffee and go through this article step by step. I am sure you will find the solution.
Carry out regular SEO audits!
Do mistakes happen in search engine optimization? Yes, of course. That is why I have gotten into the habit of auditing my site in regular routines:
- Check the coverage report in the Google Search Console – daily
- Important pages, such as product pages – once a week in Screaming Frog
- All pages – once a month in Screaming Frog
- How impressions, clicks, CTR and position develop – once a week in the GSC
In Screaming Frog I look at the HTTP status codes, the indexing status and the indexability. If there are problems with redirects, incorrectly set noindex directives, exclusion via robots.txt, rel="canonical" and so on, I then proceed as shown in this article.
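Such routine checks can be captured in a small checklist function. A hypothetical sketch (the function name, parameters and messages are my own, mirroring the checks above; the inputs would come from your crawler's export):

```python
def audit_url(status_code, robots_directives, canonical, url, blocked_by_robots_txt):
    """Returns a list of likely indexing problems for one URL, per the checks above."""
    issues = []
    if status_code != 200:
        issues.append(f"HTTP status is {status_code}, not 200")
    if "noindex" in robots_directives:
        issues.append("page carries a noindex directive")
    if blocked_by_robots_txt:
        issues.append("URL is disallowed in robots.txt")
    if canonical and canonical != url:
        issues.append(f"canonical points to {canonical}")
    return issues

print(audit_url(301, [], None, "https://example.com/a", False))
# → ['HTTP status is 301, not 200']
```

An empty list means none of the error sources from this article apply to that URL, and you can move on to content-level questions like duplicate content.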