Robots Hints
Sitebulb's Robots Hints deal with the robots.txt file, meta robots tags and the X-Robots-Tag, and how robots directives may affect the way in which URLs are crawled and indexed by search engines.
This article will explain how robots directives impact crawling and indexing, and how the Robots Hints can help you unpick indexing issues. Throughout the article you will find links to all the relevant Hints that Sitebulb uses.
What are robots directives?
Robots directives are lines of code that instruct search engines how to treat content from the perspective of crawling and indexing.
By default, in the absence of any robots directive, search engines work on the basis that every URL they encounter is both crawlable and indexable. This does not mean that they will necessarily crawl and index the content, only that this is the default behaviour should they encounter the URL.
Robots directives are therefore used to change this default behaviour, by instructing search engines either not to crawl, or not to index, specific content.
How are robots directives presented to search engines?
There are 3 ways in which robots directives can be specified:
- Robots meta directives (also called 'meta tags'), which work at a page level. Within the <head> of a page's HTML, you include meta tags like this:
<meta name="robots" content="noindex, nofollow"> to control whether that specific URL is indexed and whether the links on it are followed.
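- X-Robots-Tag HTTP headers, which are added to a URL's HTTP response. They accept the same directives as meta tags and can be set on a granular, page level, but because they are configured on the server they can also apply directives across a whole site (including non-HTML files such as PDFs), for example by matching URLs with regular expressions in the server configuration.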
- Robots.txt file, which lives at the root of the host (e.g. example.com/robots.txt) and is typically used to tell search engine crawlers which paths, folders or URLs you don't want them to crawl, through 'Disallow' rules.
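To see how Disallow rules are matched against URLs, here is a small sketch using Python's standard-library urllib.robotparser against a hypothetical robots.txt for example.com. Treat it as an approximation: this parser uses simple prefix matching, does not implement the `*` and `$` wildcards that Google supports, and applies the first matching rule rather than the most specific one, so results can differ from how a search engine reads more complex files.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Disallow: /search
"""

parser = RobotFileParser()
parser.set_url("https://example.com/robots.txt")
# parse() takes the file as a list of lines instead of fetching it with read().
parser.parse(ROBOTS_TXT.splitlines())

for url in [
    "https://example.com/products/shoes",
    "https://example.com/checkout/basket",
    "https://example.com/search?q=shoes",
]:
    # can_fetch() checks the URL against the rules for the given user agent.
    verdict = "allowed" if parser.can_fetch("*", url) else "disallowed"
    print(url, "->", verdict)
```

Because matching is by prefix, the /search rule also blocks /search?q=shoes and any other URL whose path starts with /search.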
In the methods outlined above, if the 'nofollow' directive is used, it means that you do not wish for any of the links on the page to be followed. However, it is also possible to specify that individual links should not be followed, via the rel="nofollow" attribute on the link (<a>) element.
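The difference between a page-level nofollow and a link-level rel="nofollow" can be sketched as follows. This is a hypothetical helper built on Python's html.parser, and it assumes nofollow is honoured strictly; real search engines may treat it more loosely.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, is_nofollow) pairs for every <a href="..."> on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, bool]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        href = attr_map.get("href", "")
        if not href:
            return
        # rel may hold several space-separated tokens, e.g. "nofollow noopener".
        rel_tokens = attr_map.get("rel", "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))


def followable_links(html: str, page_nofollow: bool = False) -> list[str]:
    """Return hrefs not marked nofollow; a page-level nofollow excludes them all."""
    if page_nofollow:
        return []
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return [href for href, nofollow in collector.links if not nofollow]


html = '<a href="/shoes">Shoes</a> <a href="/login" rel="nofollow noopener">Log in</a>'
print(followable_links(html))  # ['/shoes']
print(followable_links(html, page_nofollow=True))  # []
```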
Sitebulb checks all of the above, and they are all taken into consideration for the Robots Hints.
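As a rough illustration of how these signals interact, the sketch below combines a robots.txt result, meta robots tags and an X-Robots-Tag value into one verdict for a single URL. It is a hypothetical simplification, not how Sitebulb or any search engine evaluates pages; for instance, it ignores user-agent-specific forms such as <meta name="googlebot"> or "googlebot: noindex" in the header. It does capture one important interaction: a URL blocked by robots.txt is never fetched, so a noindex on it cannot be seen.

```python
from html.parser import HTMLParser


def split_directives(value: str) -> list[str]:
    """Split a comma-separated directive string into lowercase tokens."""
    return [token.strip().lower() for token in value.split(",") if token.strip()]


class MetaRobotsParser(HTMLParser):
    """Collects tokens from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr_map.get("name", "").lower() == "robots":
            self.directives += split_directives(attr_map.get("content", ""))


def evaluate(allowed_by_robots_txt: bool, html: str, x_robots_tag: str = "") -> dict:
    """Rough crawl/index verdict for one URL; the default is crawlable and indexable."""
    if not allowed_by_robots_txt:
        # The page is never fetched, so a meta or header noindex can't be seen.
        # The URL may still be indexed from links, so indexability is unknown.
        return {"crawlable": False, "indexable": None, "follow_links": None}
    parser = MetaRobotsParser()
    parser.feed(html)
    parser.close()
    directives = set(parser.directives) | set(split_directives(x_robots_tag))
    # "none" is shorthand for "noindex, nofollow".
    noindex = bool(directives & {"noindex", "none"})
    nofollow = bool(directives & {"nofollow", "none"})
    return {"crawlable": True, "indexable": not noindex, "follow_links": not nofollow}


page = '<head><meta name="robots" content="noindex, nofollow"></head>'
print(evaluate(True, page))  # {'crawlable': True, 'indexable': False, 'follow_links': False}
```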
There are 3 Hints that relate to potential issues caused by the directives themselves, and how they are used in conjunction with internal linking practices.
- Has noindex and nofollow directives
- Internal Disallowed URLs
- URL only has nofollow incoming internal links
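The 'URL only has nofollow incoming internal links' check can be pictured as a pass over the internal link graph. The sketch below is a hypothetical illustration, not Sitebulb's implementation: it assumes each link has already been flagged as nofollow (whether from a rel="nofollow" attribute or a page-level nofollow on the source page) and reports the target URLs where every incoming link carries that flag.

```python
from collections import defaultdict

# Hypothetical internal link graph: (source URL, target URL, link_is_nofollow).
LINKS = [
    ("/", "/products", False),
    ("/", "/login", True),
    ("/products", "/login", True),
    ("/products", "/sale", True),
    ("/blog", "/sale", False),
]


def only_nofollow_incoming(links):
    """Return, sorted, the target URLs whose every incoming internal link is nofollow."""
    incoming = defaultdict(list)
    for _source, target, is_nofollow in links:
        incoming[target].append(is_nofollow)
    # Only URLs with at least one incoming link appear in `incoming`,
    # so all() is never evaluated on an empty list here.
    return sorted(url for url, flags in incoming.items() if all(flags))


print(only_nofollow_incoming(LINKS))  # ['/login']
```

Here /sale is not reported because one of its incoming links (from /blog) is followed.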
There are 3 Hints that relate to rendering issues caused by disallowed resource files:
Multiple robots directives
There are 6 Hints that relate to issues caused by robots directives being specified multiple times:
- Mismatched nofollow directives in HTML and header
- Mismatched noindex directives in HTML and header
- Multiple nofollow directives
- Multiple noindex directives
- Nofollow in HTML and HTTP header
- Noindex in HTML and HTTP header
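These checks boil down to comparing the directives found in the HTML with those found in the HTTP header. The sketch below is hypothetical and makes its own assumptions about what each label means: 'multiple' means the same directive appears more than once in one source, 'in HTML and HTTP header' means it appears in both, and 'mismatched' means one source says noindex (or nofollow) while the other explicitly says index (or follow). Sitebulb's exact definitions may differ. Inputs are assumed to be lists of lowercase directive tokens.

```python
def directive_issues(meta: list[str], header: list[str]) -> list[str]:
    """Flag repeated, duplicated or conflicting directives (hypothetical check)."""
    issues = []
    for negative, positive in (("noindex", "index"), ("nofollow", "follow")):
        # Same directive repeated within one source, e.g. two meta robots tags.
        for source_name, source in (("HTML", meta), ("HTTP header", header)):
            if source.count(negative) > 1:
                issues.append(f"multiple {negative} directives in {source_name}")
        # Same directive specified in both places.
        if negative in meta and negative in header:
            issues.append(f"{negative} in HTML and HTTP header")
        # One source negates what the other explicitly allows.
        if (negative in meta and positive in header) or (positive in meta and negative in header):
            issues.append(f"mismatched {negative} directives in HTML and header")
    return issues


print(directive_issues(["noindex", "nofollow"], ["noindex", "follow"]))
# ['noindex in HTML and HTTP header', 'mismatched nofollow directives in HTML and header']
```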
Further Resources and Reading
- Robots meta tag and X-Robots-Tag HTTP header specifications by Google
- Using the robots meta tag by Google
- What is robots.txt? by Moz