Crawler limits

You may wish to limit how deep Sitebulb will crawl, or how fast it will go. By default, Sitebulb is set up to handle most sites well, but occasionally you may find that you need to limit the crawl, and you can do this via the Advanced Settings.

To get to Advanced Settings, scroll to the bottom of the main Audit setup page and click the grey Advanced Settings button.

The Limits section is under Crawler -> Limits.


The first two options give you ways to limit the crawler so that it crawls fewer (or more) pages and resources:

  • Maximum Crawl Depth - The number of levels deep Sitebulb will crawl (where the homepage is 0 deep, and all URLs linked from the homepage are 1 deep, and so on). This is useful if you have extremely deep pagination that keeps spawning new pages.
  • Maximum URLs to Audit - The total number of URLs Sitebulb will audit. Note that this total includes external URLs and page resource URLs. Once it hits this limit, Sitebulb will stop crawling and generate the reports.
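To illustrate how these two limits interact, here is a minimal sketch of a depth- and count-limited breadth-first crawl. This is not Sitebulb's implementation; the `fetch_links` function is a hypothetical stand-in for fetching a page and extracting its links.

```python
from collections import deque

def crawl(homepage, fetch_links, max_depth, max_urls):
    """Breadth-first crawl: the homepage is depth 0, URLs linked
    from it are depth 1, and so on. Stops expanding beyond
    max_depth, and stops entirely once max_urls have been audited."""
    seen = {homepage}
    queue = deque([(homepage, 0)])  # (url, depth) pairs
    audited = []
    while queue and len(audited) < max_urls:
        url, depth = queue.popleft()
        audited.append(url)
        if depth < max_depth:  # only follow links within the depth limit
            for link in fetch_links(url):
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return audited
```

With `max_depth=1`, the homepage and the pages it links to are audited, but their links are not followed; with a small `max_urls`, the crawl simply stops early once the cap is reached.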

The next options give you ways to make the crawler go more quickly, or more slowly:

  • Number of Threads - This controls how much CPU usage is allocated to Sitebulb. In general, the more threads you use, the faster it will go; however, this is capped by the number of logical processors (cores) in your machine.
  • Limit URL Speed - A toggle you can use to switch on a "max URLs/second" speed cap. If switched on, Sitebulb will not crawl faster than the specified URLs/second rate.
  • Max HTML URLs per Second - You can only set this value if the toggle above is switched on. If so, this value will provide the limit; for instance, if it is set to 5, Sitebulb will not download more than 5 HTML URLs per second.
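A "max URLs/second" cap of this kind is typically enforced by spacing requests out so that no two start closer together than the minimum interval. The sketch below shows one common way to do this; it is illustrative only, and the `RateLimiter` name and the `download` call in the usage comment are hypothetical, not Sitebulb's code.

```python
import time

class RateLimiter:
    """Caps work at max_per_second by enforcing a minimum
    interval between successive calls to wait()."""

    def __init__(self, max_per_second):
        self.min_interval = 1.0 / max_per_second
        self.last = 0.0  # monotonic timestamp of the previous call

    def wait(self):
        """Sleep just long enough to respect the rate cap, then record the time."""
        now = time.monotonic()
        delay = self.min_interval - (now - self.last)
        if delay > 0:
            time.sleep(delay)
        self.last = time.monotonic()

# Usage (hypothetical):
#   limiter = RateLimiter(5)  # at most 5 HTML URLs per second
#   for url in html_urls:
#       limiter.wait()
#       download(url)
```

Because each call waits only as long as needed to honour the interval, the crawler runs at full speed whenever it is naturally slower than the cap, and is throttled only when it would otherwise exceed it.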

If you wish to learn more about crawling fast, we suggest you read our documentation, 'How to Crawl Really Fast.' Alternatively, to examine the benefits of a more measured approach, check out our article, 'How to Crawl Responsibly.'