How to Crawl Really Fast

Sitebulb is designed to be a responsible crawler, with rate limiting in place that slows the tool down if the server appears to be struggling. This is handled through the default speed setting 'URLs per second', which, in general, is the safest way to crawl a website.
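
To make the idea of rate limiting concrete, here is a minimal sketch of how a crawler could lower its URLs/second rate when a server looks like it is struggling. This is not Sitebulb's actual algorithm: the function name `next_rate`, the thresholds, and the choice of status codes 429 and 503 as "struggling" signals are all assumptions made for illustration.

```python
def next_rate(current_rate, response_time_s, status_code,
              max_rate=5.0, min_rate=0.5, slow_threshold_s=2.0):
    """Return the URLs/second rate to use for the next request.

    Hypothetical back-off rule (all defaults are illustrative):
    - a 429 or 503 response, or a slow response, halves the rate
    - otherwise the rate recovers by 0.5 URLs/second, up to max_rate
    """
    struggling = status_code in (429, 503) or response_time_s > slow_threshold_s
    if struggling:
        # Back off quickly, but never drop below the floor.
        return max(min_rate, current_rate / 2)
    # Recover gradually towards the configured cap.
    return min(max_rate, current_rate + 0.5)


# A slow response at 5 URLs/second halves the rate.
slowed = next_rate(5.0, response_time_s=3.0, status_code=200)
# A healthy response afterwards lets the rate creep back up.
recovered = next_rate(slowed, response_time_s=0.2, status_code=200)
```

The point of the sketch is the shape of the behaviour: fast to slow down, slow to speed up, and always capped at the configured limit.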

However, this can be too limiting for some users, who want (or need) to crawl a lot faster than the URLs/second limits allow. For these cases, Sitebulb includes a second speed option: Threads.
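
As a rough way to see why threads can outpace a fixed URLs/second cap, the sketch below estimates throughput from the number of threads and the average response time. It assumes each thread handles one request at a time and ignores parsing and rendering overhead, so treat the result as an upper bound rather than a prediction. The function name `estimated_urls_per_second` is ours, not part of Sitebulb.

```python
def estimated_urls_per_second(threads, avg_response_time_s):
    """Upper-bound throughput estimate for a thread-based crawl.

    Assumes each thread fetches one URL at a time, so N threads that
    each wait avg_response_time_s per URL complete about
    N / avg_response_time_s URLs every second.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    if avg_response_time_s <= 0:
        raise ValueError("avg_response_time_s must be positive")
    return threads / avg_response_time_s


# Example: 10 threads against a server answering in half a second.
rate = estimated_urls_per_second(10, 0.5)
```

With threads there is no fixed ceiling: the faster the server responds, the more requests it receives, which is why this mode suits sites you know can handle the load.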

To control crawler speed using threads, go to the Crawler Settings section on the audit setup page, select 'Threads (Advanced)' from the dropdown alongside Maximum Audit Speed, and then use the number dropdown to set how many threads you wish to use.

Crawling with Threads

Advanced Use Only

Please note that we only recommend crawling with threads for advanced users, and only on sites where you have permission to crawl and the website owner is comfortable with you crawling the website fast. Ideally, this would be a site you know can handle a high number of connections at once.

If you want to learn more about this subject, we suggest you read our guide on crawling responsibly.

You will also need to consider the impact of fast crawling on your computer. To crawl with anything over 20 threads, the machine Sitebulb is installed on needs to have a minimum of 4 cores and 8GB RAM. If you're running on a laptop, make sure your laptop is optimised for performance (or at the very least, not on low power mode!).
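
If you want to sanity-check a machine before choosing a thread count, the minimums above can be written as a small check. This is a sketch only: `machine_ok_for_threads` is a hypothetical helper, not a Sitebulb feature, and you supply the RAM figure yourself. Note that Python's `os.cpu_count()` reports logical cores, which may be higher than the number of physical cores.

```python
import os

# Minimums quoted above for crawling with more than 20 threads.
MIN_CORES = 4
MIN_RAM_GB = 8
THREAD_THRESHOLD = 20


def machine_ok_for_threads(threads, cores, ram_gb):
    """Return True if the machine meets the stated minimum for this thread count.

    No minimum is stated for 20 threads or fewer, so this sketch
    does not flag those settings.
    """
    if threads <= THREAD_THRESHOLD:
        return True
    return cores >= MIN_CORES and ram_gb >= MIN_RAM_GB


# os.cpu_count() returns logical cores and can return None.
detected_cores = os.cpu_count() or 1
# ram_gb is entered by hand here; 16 is just an example value.
ok = machine_ok_for_threads(threads=30, cores=detected_cores, ram_gb=16)
```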

Similarly, think about the website itself. How many links are there in navigation? Parsing mega menus that link to every category can really hammer your CPU. If you see this happening, pause the crawl and update the settings to slow it down.
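
One quick way to gauge how heavy a site's navigation is would be to count the links in its menu before you crawl. The sketch below does this with Python's standard-library HTML parser. It assumes the menu is wrapped in a `<nav>` element, which will not be true of every site, and the names `NavLinkCounter` and `count_nav_links` are illustrative.

```python
from html.parser import HTMLParser


class NavLinkCounter(HTMLParser):
    """Counts <a href="..."> tags that appear inside <nav> elements."""

    def __init__(self):
        super().__init__()
        self.nav_depth = 0   # how many <nav> elements we are currently inside
        self.nav_links = 0   # links with an href found inside any <nav>

    def handle_starttag(self, tag, attrs):
        if tag == "nav":
            self.nav_depth += 1
        elif tag == "a" and self.nav_depth > 0 and dict(attrs).get("href"):
            self.nav_links += 1

    def handle_endtag(self, tag):
        if tag == "nav" and self.nav_depth > 0:
            self.nav_depth -= 1


def count_nav_links(html):
    """Return the number of navigation links found in an HTML string."""
    parser = NavLinkCounter()
    parser.feed(html)
    parser.close()
    return parser.nav_links


# Made-up page: two menu links, one anchor without href, one body link.
sample = (
    '<nav><a href="/a">A</a><a href="/b">B</a><a>no href</a></nav>'
    '<p><a href="/c">C</a></p>'
)
menu_links = count_nav_links(sample)
```

A menu with hundreds of links repeated on every page is a good hint to start with a lower thread count.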