Indexability Hints

Indexability

Indexability relates to the technical configuration of URLs so that they are either Indexable or Not Indexable.

Search engines generally take the stance that any successful URLs (i.e. HTTP status 200) they find should be indexed by default - and they will, in the main, index everything they can find. However, there are certain signals and directives you can give to search engines that instruct them to NOT index certain URLs.

Setting URLs so that they are Not Indexable is a relatively common task, and straightforward to do in most modern CMSs. You might want to set a URL to noindex, for instance, if it is useful to website users, but is not a page that would represent a useful search result (e.g. a 'print' version of a page).

However, indexing signals often get misconfigured, which can result in important URLs not getting indexed. This matters because a page that is not indexed has no chance of generating any organic search traffic.

Sitebulb's Indexability Hints deal with the robots.txt file, meta robots tags, X-Robots-Tag and canonical tags, and how these directives may impact the way in which URLs are crawled and indexed by search engines.

What are robots directives?

Robots directives are lines of code that provide instruction on how search engines should treat content, from the perspective of crawling and indexing.

By default - or in the absence of any robots directive - search engines work on the basis that every URL they encounter is both crawlable and indexable. This does not mean that they necessarily will crawl and index the content, but that it is the default behaviour should they encounter the URL.

Thus, robots directives are essentially used to change this default behaviour - by instructing search engines to either not crawl, or not index, specific content.

How are robots directives presented to search engines?

There are 3 ways in which robots directives can be specified:

  • Robots meta directives (also called 'meta tags'), which work at a page level. Within the <head> of a page's HTML, you include a meta tag like this to control crawling and indexing on a specific URL:
    <meta name="robots" content="noindex, nofollow">
  • X-Robots-Tag headers, which are added to a site's HTTP responses. They can control robots directives at a granular, page level, just like meta tags, but can also be used to specify directives across a whole site, via the use of regular expressions.
  • Robots.txt file, which normally lives at example.com/robots.txt, and is typically used to tell search engine crawlers which paths, folders or URLs you don't want them to crawl, through 'disallow' rules.
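To illustrate how 'disallow' rules affect crawlability, here is a minimal sketch using Python's standard library robots.txt parser. The robots.txt content and the `is_crawlable` helper are hypothetical, made up for this example. The standard library parser does simple prefix matching, so individual search engines may interpret some patterns differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
Disallow: /search
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the robots.txt rules allow user_agent to crawl url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_crawlable(ROBOTS_TXT, "https://example.com/print/page-a"))  # False
print(is_crawlable(ROBOTS_TXT, "https://example.com/page-a"))        # True
```

Note that a disallowed URL is blocked from crawling, which is a separate question from whether it carries a noindex directive.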

In the methods outlined above, if the 'nofollow' directive is used, it means that you do not wish for any of the links on the page to be followed. However, it is also possible to specify that individual links should not be followed, by adding the rel="nofollow" attribute to the link itself.
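The page-level and link-level directives described above could be collected with something like the following sketch. It uses only Python's standard library. The class and function names are hypothetical, and it assumes the simple comma-separated form of the X-Robots-Tag value (it does not handle values prefixed with a specific crawler name).

```python
from html.parser import HTMLParser


def split_directives(value: str) -> set:
    """Turn 'noindex, nofollow' into {'noindex', 'nofollow'}."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


class RobotsHTMLParser(HTMLParser):
    """Collect robots meta directives and links marked rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.directives = set()
        self.nofollow_links = []

    def handle_starttag(self, tag, attrs):
        # Attribute values can be None for bare attributes, so default to "".
        attr = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr.get("name", "").lower() == "robots":
            self.directives |= split_directives(attr.get("content", ""))
        elif tag == "a" and "nofollow" in attr.get("rel", "").lower().split():
            self.nofollow_links.append(attr.get("href", ""))


def page_directives(html: str, headers: dict) -> set:
    """Merge directives from the meta robots tag and the X-Robots-Tag header."""
    parser = RobotsHTMLParser()
    parser.feed(html)
    return parser.directives | split_directives(headers.get("X-Robots-Tag", ""))


def nofollow_links(html: str) -> list:
    """Return the href of every link carrying rel="nofollow"."""
    parser = RobotsHTMLParser()
    parser.feed(html)
    return parser.nofollow_links


html = (
    '<head><meta name="robots" content="noindex, nofollow"></head>'
    '<body><a href="/a" rel="nofollow">A</a> <a href="/b">B</a></body>'
)
print(sorted(page_directives(html, {})))  # ['nofollow', 'noindex']
print(sorted(page_directives("<p>Hello</p>", {"X-Robots-Tag": "noindex"})))  # ['noindex']
print(nofollow_links(html))  # ['/a']
```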

What is a canonical?

In the field of SEO, a 'canonical' is a way of indicating to search engines the 'preferred' version of a URL. So if we have 2 URLs that have very similar content - Page A and Page B - we could put a canonical tag on Page A, which specifies Page B as the canonical URL.

To do this, we could add the rel=canonical element in the <head> section on Page A:

<link rel="canonical" href="https://example.com/page-b" />

If this were to happen, you would describe Page A as 'canonicalized' to Page B. In general, what this means is that Page A will not appear in search results, whereas Page B will. As such, it can be a very effective way of stopping duplicate content from getting indexed.

When you set up a canonical, you are effectively saying to search engines: 'This is the URL I want you to index.' People may refer to a canonical as 'a canonical tag', 'rel canonical' or even 'rel=canonical'.

In Sitebulb, if a URL is canonicalized, it is also classed as 'Not Indexable.' Conversely, if a URL has a self-referential canonical (i.e. a canonical that points back to itself) this URL would be Indexable.

Self-referential canonicals are a useful default configuration, and are typically set up to help prevent duplicate, parameterized versions of the same URL from getting indexed, for example:
https://example.com/page?utm_medium=email
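As a rough illustration of the difference between a canonicalized URL and one with a self-referential canonical, the sketch below compares a URL with its declared canonical. This is not how Sitebulb itself classifies URLs. The function names and the light normalization (lowercasing the scheme and host, dropping the fragment) are assumptions made for the example.

```python
from typing import Optional
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Lowercase the scheme and host and drop any fragment, for comparison."""
    parts = urlsplit(url)
    return parts._replace(
        scheme=parts.scheme.lower(),
        netloc=parts.netloc.lower(),
        fragment="",
    ).geturl()


def canonical_status(url: str, canonical: Optional[str]) -> str:
    """Describe how a URL relates to the canonical declared on it."""
    if not canonical:
        return "no canonical"
    if normalize(url) == normalize(canonical):
        return "self-referential"
    return "canonicalized"


clean = "https://example.com/page"
tracked = "https://example.com/page?utm_medium=email"

# Both versions declare the clean URL as canonical.
print(canonical_status(clean, clean))    # self-referential
print(canonical_status(tracked, clean))  # canonicalized
```

Here the parameterized version is canonicalized to the clean URL, so only the clean URL is put forward for indexing.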

How are canonicals implemented?

The most common way that canonicals are implemented is through a <link> tag in the <head> section of a page. So on Page A, we could specify that the canonical URL is Page B with the following:

<link rel="canonical" href="https://example.com/page-b" />

Canonicals can also be implemented through HTTP headers, where the header looks like this:

HTTP/... 200 OK

...
Link: <https://example.com/page-b>; rel="canonical"

Typically, this is used to add canonicals to non-HTML documents such as PDFs, although it can be used for any document.
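To show how a canonical might be read from an HTTP header like the one above, here is a minimal sketch that pulls the rel="canonical" target out of a Link header value. The function name is hypothetical and the regular expressions are a simplification: they handle several comma-separated links in one header, but not every edge case of the header syntax.

```python
import re
from typing import Optional

# One link entry: <target> followed by its parameters, up to the next comma.
LINK_ENTRY = re.compile(r"<([^>]*)>([^,]*)")
# The rel parameter, quoted or unquoted.
REL_PARAM = re.compile(r';\s*rel\s*=\s*"?([^";]+)"?', re.IGNORECASE)


def canonical_from_link_header(value: str) -> Optional[str]:
    """Return the target of the first rel="canonical" link, or None."""
    for target, params in LINK_ENTRY.findall(value):
        rel = REL_PARAM.search(params)
        # rel may hold several space-separated relation types.
        if rel and "canonical" in rel.group(1).lower().split():
            return target.strip()
    return None


header = '<https://example.com/page-b>; rel="canonical"'
print(canonical_from_link_header(header))  # https://example.com/page-b
```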

Because a URL can end up with two canonicals that disagree when both methods are used, it is considered best practice to only ever use one method of assigning canonicals for each URL on a given website.
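A simple check along these lines could flag URLs where both methods are in use. The function name and the returned labels are hypothetical, chosen only for this sketch, and the two canonical values are assumed to have been extracted already.

```python
from typing import Optional


def canonical_consistency(html_canonical: Optional[str],
                          header_canonical: Optional[str]) -> str:
    """Compare the canonical from the <link> tag with the one from the header."""
    if not html_canonical and not header_canonical:
        return "none"
    if not html_canonical or not header_canonical:
        return "single method"
    if html_canonical == header_canonical:
        return "both methods, same target"
    return "conflict"


print(canonical_consistency("https://example.com/page-b", None))
# single method
print(canonical_consistency("https://example.com/page-b",
                            "https://example.com/page-c"))
# conflict
```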

Indexability Issues

Most of the Indexability Hints are Issues, which represent errors or problems that need to be fixed. They are additionally classified in terms of their importance - this should be taken into account when prioritizing implementation work, along with the number and type of URLs affected.

Importance: Critical

These Hints require immediate attention, as the issue may have a serious impact upon crawling, indexing or ranking.

Importance: High

These Hints are very important, and definitely warrant attention.

Importance: Medium

These Hints are worth investigating further, and may warrant further attention depending on the type and quantity of URLs affected.

Importance: Low

These Hints are of the lowest significance, and should only be addressed once any more serious issues have been handled.

Indexability Potential Issues

Hints marked 'Potential Issue' describe a situation that might be an issue, or might cause an issue. In the case of Indexability Hints, they typically highlight configurations which are not damaging right now, but could cause issues further down the line.

Indexability Opportunities

Hints marked 'Opportunity' describe where you could optimize the site to potentially improve performance further.

Indexability Insights

Insights are neither issues nor opportunities, and often don't require any action at all - they are brought to your attention as they may provide a useful avenue of investigation.
