Crawler Settings

The final group of settings is the crawler settings, which control how the crawler itself works.

  • Crawler Type - There are two options in this dropdown: 'Non-JavaScript Crawler' and 'JavaScript Crawler'. The default is the Non-JavaScript Crawler, which crawls using traditional HTML extraction; this is much faster and is sufficient for most websites. The JavaScript Crawler downloads all page resources, fires JavaScript and renders each page, which makes it considerably slower. However, some websites can only be crawled with the JavaScript Crawler. If you are planning to use it, we recommend reading our guide first. The first sketch after this list illustrates the difference between the two modes.
  • Maximum Audit Speed - This defines how fast you want the crawler to work. We recommend a speed of 5 URLs/second, which is more than adequate for most websites; adjust the number in the dropdown box to change it. In the second dropdown, there is also an option to set the crawl speed using Threads. This is an advanced feature, so please read the associated documentation before using it. The second sketch after this list shows how a speed cap and a URL limit might be enforced together.
  • Maximum URLs to Audit - You can limit the crawler so that it stops once it hits the number of URLs you set here. The default maximum is 500,000 URLs; Pro users can push this up to a hard maximum of 2 million URLs.
  • Respect Robots Directives - This is selected by default, meaning Sitebulb will not crawl disallowed or nofollowed URLs. Deselecting it reveals further options, so you can choose whether to crawl disallowed, internal nofollow, or external nofollow URLs. The final sketch after this list shows how a crawler might check robots.txt directives.
  • Save Disallowed URLs - This is deselected by default; tick the box if you want disallowed URLs to be saved in the 'Uncrawled URLs' export. As Sitebulb encounters URLs disallowed by robots.txt, it will store them so you can review them in your audit report. If you leave the box unticked, these URLs will simply be discarded.
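
To picture the difference between the two crawler types, here is a minimal Python sketch contrasting a plain HTML fetch with a headless-browser render. This is not Sitebulb's own code; the example URL and the requests and Playwright libraries are assumptions chosen purely for illustration.

```python
# A minimal sketch contrasting the two crawl modes (not Sitebulb's implementation).
import requests                                  # plain HTTP fetch, no JavaScript executed
from playwright.sync_api import sync_playwright  # headless browser for rendering

URL = "https://example.com/"                     # hypothetical page to crawl

# 'Non-JavaScript Crawler': fetch the raw HTML response only. Fast, because no
# page resources are downloaded and no scripts are executed.
raw_html = requests.get(URL, timeout=10).text

# 'JavaScript Crawler': load the page in a headless browser, let scripts run,
# then read the rendered DOM. Much slower, but captures content injected by JS.
with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto(URL)
    rendered_html = page.content()               # DOM after JavaScript has executed
    browser.close()

print(len(raw_html), len(rendered_html))
```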
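The audit speed and URL limit can be thought of as a simple scheduling loop. The sketch below is a hypothetical illustration of a 5 URLs/second cap combined with a maximum-URL cutoff; the queue handling, fetch call and constants are assumptions, not Sitebulb's internal scheduler.

```python
# Hypothetical sketch of a crawl speed cap and a maximum-URL limit.
import time
import requests

MAX_URLS_PER_SECOND = 5        # mirrors the recommended audit speed
MAX_URLS_TO_AUDIT = 500_000    # mirrors the default URL limit

def crawl(seed_urls):
    queue = list(seed_urls)
    audited = 0
    min_interval = 1.0 / MAX_URLS_PER_SECOND
    while queue and audited < MAX_URLS_TO_AUDIT:
        started = time.monotonic()
        url = queue.pop(0)
        try:
            requests.get(url, timeout=10)   # fetch the page; parsing happens elsewhere
        except requests.RequestException:
            pass                            # a real crawler would record the error
        audited += 1
        # Sleep just long enough to stay at or below the requested URLs/second.
        elapsed = time.monotonic() - started
        if elapsed < min_interval:
            time.sleep(min_interval - elapsed)
    return audited
```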
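Finally, a rough sketch of what respecting robots.txt and saving disallowed URLs looks like in practice, using Python's standard urllib.robotparser. The user agent string and example URLs are hypothetical; Sitebulb's own handling of nofollow and robots directives is more involved than this.

```python
# Hypothetical sketch: check robots.txt before crawling, and keep a list of
# disallowed URLs for later review rather than discarding them.
from urllib import robotparser

USER_AGENT = "MyCrawler"                       # hypothetical user agent
rp = robotparser.RobotFileParser("https://example.com/robots.txt")
rp.read()                                      # download and parse robots.txt

disallowed_urls = []                           # analogous to 'Save Disallowed URLs'

def allowed_to_crawl(url):
    if rp.can_fetch(USER_AGENT, url):
        return True
    disallowed_urls.append(url)                # store instead of silently discarding
    return False

for candidate in ["https://example.com/", "https://example.com/private/page"]:
    if allowed_to_crawl(candidate):
        print("crawl:", candidate)
    else:
        print("skipped (disallowed):", candidate)
```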