Advanced Settings: Robots

To access the Advanced Settings, scroll to the bottom of the main settings page and click the 'Advanced Settings' button on the left.

The Advanced Settings are organised into a double-tabbed system, with the top-level tabs being 'Crawler', 'Authorization' and 'Robots'.

This article will focus only on the 'Robots' settings:

Respect Robots Directives

By default, the Sitebulb crawler will respect robots directives, but you can override this by unticking the 'Respect Robots Directives' box.

This will reveal three new options, which allow you to control more specifically which robots directives the crawler ignores; example directives are shown below the list.

  • Crawl Disallowed URLs - The crawler will ignore Disallow directives in robots.txt.
  • Crawl Internal Nofollow - The crawler will ignore any nofollow directives on internal links.
  • Crawl External Nofollow - The crawler will ignore any nofollow directives on external links.
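
For context, these are the kinds of directives the options above refer to. The paths, domain and link below are placeholder examples, not taken from any particular site:

    # robots.txt - a Disallow rule that 'Crawl Disallowed URLs' tells the crawler to ignore
    User-agent: *
    Disallow: /private/

    <!-- A nofollowed link that 'Crawl Internal Nofollow' or 'Crawl External Nofollow' tells the crawler to ignore -->
    <a href="https://example.com/page" rel="nofollow">Example link</a>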

User Agent

By default, Sitebulb will crawl using the Sitebulb user agent, but you can change this by selecting a different one from the dropdown, which contains a number of preset options.
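
For reference, the user agent is simply an HTTP request header sent with every page the crawler fetches. The line below shows one commonly cited form of the Googlebot user agent as an illustration; the exact preset strings in Sitebulb's dropdown may differ:

    User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)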

Virtual robots.txt

This setting allows you to override the website's robots.txt file and instead use a 'virtual robots.txt' file.

To use it, click the green 'Fetch Current Robots.txt' button, which will populate the box above with the site's current robots.txt directives.

Then just delete or adjust the existing directives, or add new lines underneath. Sitebulb will follow these directives instead of the original ones.
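
As a sketch of how this might be used, the edited virtual file below removes one Disallow rule so a blocked section can be audited, while keeping another in place. The paths are placeholders, not recommendations:

    # Original directives fetched from the site
    User-agent: *
    Disallow: /blog/
    Disallow: /checkout/

    # Edited virtual robots.txt - /blog/ is now crawlable, /checkout/ stays blocked
    User-agent: *
    Disallow: /checkout/

This lets you preview how a crawl would behave under a changed robots.txt without editing the live file on the website itself.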