Advanced robots settings

To crawl certain websites, you may need to adjust some of Sitebulb's default robots settings. You can do this via the Robots Directives options in the left-hand menu of the audit setup.


Search Engine Bot or Browser Emulation

This setting is more commonly known as the 'user agent', and determines which user-agent string Sitebulb will send when making HTTP requests. The user agent can affect how requests are logged by the web server, and potentially what content is returned.

One important thing to note is that if you selected a Desktop audit during the initial setup, you will only be able to choose from desktop user-agents. Similarly, if you selected Mobile, there will only be mobile user-agents to choose from.

By default, Sitebulb will crawl using the Sitebulb user agent, but you can change this by selecting a different one from the dropdown, which contains a number of preset options.
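If it helps to picture what this means at the HTTP level, here is a rough Python sketch (not Sitebulb's actual code; the URL and user-agent strings are made up for illustration, and it uses the requests library). It fetches the same URL with two different User-Agent headers; the server may log each request differently and may return different content for each.

```python
import requests

# Hypothetical user-agent strings for illustration only; the exact strings
# Sitebulb sends are chosen from the dropdown in the audit setup.
DESKTOP_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleCrawler/1.0"
MOBILE_UA = "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) ExampleCrawler/1.0"

url = "https://example.com/"

for ua in (DESKTOP_UA, MOBILE_UA):
    # The only difference between the two requests is the User-Agent header.
    response = requests.get(url, headers={"User-Agent": ua}, timeout=10)
    print(ua, response.status_code, len(response.text))
```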


Politeness

By default, the Sitebulb crawler will respect robots directives, but you can override this by unticking the 'Respect Robots Directives' box. You can also control how Sitebulb handles a number of individual directives:
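To illustrate what 'respecting robots directives' means in practice, here is a minimal Python sketch (not Sitebulb's implementation) using the standard library's urllib.robotparser, with an assumed example site and URL: before fetching a page, a polite crawler checks whether the site's robots.txt allows its user agent to crawl it.

```python
from urllib import robotparser

# Assumed example site; a real audit would use the site being crawled.
parser = robotparser.RobotFileParser("https://example.com/robots.txt")
parser.read()  # Fetch and parse the live robots.txt

user_agent = "Sitebulb"
url = "https://example.com/private/page.html"

if parser.can_fetch(user_agent, url):
    print("Allowed: a polite crawler would fetch this URL.")
else:
    print("Disallowed: a polite crawler would skip this URL.")
```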


Virtual robots.txt

This setting allows you to override the website's robots.txt file and instead use a 'virtual robots.txt' file.


To use it, click the green button 'Fetch Current Robots.txt', which will populate the box above with the current robots.txt directives.

Then just delete or adjust the existing directives, or add new lines underneath. Sitebulb will follow these directives instead of the original ones.
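As a rough illustration of the effect (a Python sketch using urllib.robotparser, not how Sitebulb applies the virtual file internally, and with made-up directives and URL), the virtual file simply replaces the set of rules the crawler consults, so an edited set of directives can unblock a section that the live robots.txt disallows:

```python
from urllib import robotparser

# Hypothetical live robots.txt vs. an edited 'virtual' version.
live_robots = """User-agent: *
Disallow: /staging/
Disallow: /blog/
"""

virtual_robots = """User-agent: *
Disallow: /staging/
# The /blog/ rule has been removed so the crawler can now reach that section.
"""

url = "https://example.com/blog/post-1"

for label, rules in (("live", live_robots), ("virtual", virtual_robots)):
    parser = robotparser.RobotFileParser()
    parser.parse(rules.splitlines())  # Parse directives from a string instead of fetching
    print(label, "allows" if parser.can_fetch("Sitebulb", url) else "blocks", url)
```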