To access the Advanced Settings, scroll to the bottom of the main settings page and click the 'Advanced Settings' button on the left.
The Advanced Settings are organised using a double-tabbed system; the tabs along the top are 'Crawler', 'Authorization' and 'Robots'.
This article will focus only on the 'Robots' settings:
Respect Robots Directives
By default, the Sitebulb crawler will respect robots directives, but you can override this by unticking the 'Respect Robots Directives' box.
Unticking the box reveals three further options, which allow you to control more precisely which robots directives the crawler will ignore (a short sketch after the list illustrates what this means in practice):
- Crawl Disallowed URLs - The crawler will ignore Disallow directives in robots.txt.
- Crawl Internal Nofollow - The crawler will ignore any nofollow directives on internal links.
- Crawl External Nofollow - The crawler will ignore any nofollow directives on external links.
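To make the effect of these options concrete, here is a minimal Python sketch (not Sitebulb's own code) of what it means to respect or ignore a Disallow directive, using a made-up robots.txt and URL:

```python
from urllib import robotparser

# Made-up robots.txt content, purely for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/private/report.html"

# Respecting robots directives means skipping this URL:
print(parser.can_fetch("*", url))  # False -> disallowed, so not crawled

# 'Crawl Disallowed URLs' means fetching it anyway, ignoring the result above.
```

The nofollow options work in the same spirit, except the directive sits on the link itself (rel="nofollow") or in a meta robots tag rather than in robots.txt.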
User Agent
By default, Sitebulb will crawl using the Sitebulb user agent, but you can change this by selecting a different one from the dropdown, which contains a number of preset options.
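To illustrate what changing the user agent means at the HTTP level, here is a minimal sketch (not Sitebulb's code) that sends a request with a custom User-Agent header; the user agent string below is made up for illustration. This name is also what robots.txt 'User-agent' groups are matched against.

```python
from urllib import request

# Hypothetical user agent string, for illustration only.
USER_AGENT = "ExampleCrawler/1.0 (+https://example.com/bot)"

req = request.Request(
    "https://example.com/",
    headers={"User-Agent": USER_AGENT},
)

# The chosen user agent is sent with every request the crawler makes.
with request.urlopen(req) as response:
    print(response.status)
```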
Virtual Robots.txt
This setting allows you to override the website's robots.txt file and instead use a 'virtual robots.txt' file.
To use it, click the green button 'Fetch Current Robots.txt', which will populate the box above with the current robots.txt directives.
Then just delete or adjust the existing directives, or add new lines underneath. Sitebulb will follow these directives instead of the original ones.
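Conceptually, the virtual robots.txt is just a locally edited set of directives that the crawler consults in place of the live file. The sketch below (not Sitebulb's implementation) uses a hypothetical edited directive set to show the idea:

```python
from urllib import robotparser

# Hypothetical 'virtual' robots.txt: the live file's directives, edited locally.
VIRTUAL_ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
# A 'Disallow: /blog/' line from the live file has been deleted in this edit.
"""

parser = robotparser.RobotFileParser()
parser.parse(VIRTUAL_ROBOTS_TXT.splitlines())

# The crawl now follows the edited directives rather than the live file:
print(parser.can_fetch("*", "https://example.com/blog/post-1"))    # True
print(parser.can_fetch("*", "https://example.com/checkout/cart"))  # False
```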