Beyond the basic configuration, which determines how Sitebulb crawls and what data it selects, you can change some more specific items via the Advanced Settings.
As a rule of thumb, if there is a setting from another tool that you can't find in Sitebulb, it's probably in the Advanced Settings somewhere.
Please view our full Advanced Settings docs to familiarise yourself with each setting option.
To find Advanced Settings in the tool, just scroll to the bottom of the Audit Setup page:
HTML Crawler: Speed Settings
The speed settings are on the first screen in Advanced Settings, and the bit to pay attention to is the highlighted area below:
You can select the number of threads you wish to use when crawling, but this is limited by the number of logical processors (cores) that your machine has.
In the example above, the machine only has 4 logical processors, so the maximum number of threads available is 4. If it had 16 logical processors, the maximum would be 16.
(This limitation exists because auditing with more threads than are actually available can lead to thread starvation, which causes your computer to slow down and sometimes crash.)
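To illustrate the idea of capping threads at the number of logical processors, here is a minimal Python sketch. It is not Sitebulb's own code; the function name `clamp_threads` and its parameters are hypothetical and chosen purely for illustration.

```python
import os


def clamp_threads(requested, logical_processors=None):
    """Cap a requested thread count at the machine's logical processor count.

    Hypothetical helper for illustration only, not part of Sitebulb.
    """
    if logical_processors is None:
        # os.cpu_count() reports logical processors; it can return None,
        # in which case we fall back to a single thread.
        logical_processors = os.cpu_count() or 1
    # Never go below 1 thread, never above what the machine offers.
    return max(1, min(requested, logical_processors))
```

Under these assumptions, asking for 16 threads on a machine with 4 logical processors gives 4, while asking for 2 on the same machine gives 2.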
There is also a default limitation applied via the tickbox Limit URL Speed, which you can override either by unticking the box or by changing the dropdown value for Max HTML URLs per Second.
This limitation exists to help you crawl responsibly, and if you want to learn more about that we suggest you read our guide on crawling responsibly.
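For readers curious about what a "max URLs per second" cap means in practice, the following Python sketch spaces requests out so that no more than a chosen number start each second. It is an assumption-laden illustration of the general technique, not a description of how Sitebulb implements its limit; the class name `UrlRateLimiter` and its parameters are hypothetical.

```python
import time


class UrlRateLimiter:
    """Space out requests so at most `max_per_second` start each second.

    Hypothetical illustration only, not Sitebulb's implementation.
    """

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        if max_per_second <= 0:
            raise ValueError("max_per_second must be positive")
        self.interval = 1.0 / max_per_second  # minimum gap between requests
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next request may start

    def wait(self):
        """Block until the next request is allowed; return seconds waited."""
        now = self.clock()
        delay = 0.0
        if self.next_allowed is not None and now < self.next_allowed:
            delay = self.next_allowed - now
            self.sleep(delay)
            now = self.next_allowed
        self.next_allowed = now + self.interval
        return delay
```

A crawler loop would call `limiter.wait()` before each request. With `UrlRateLimiter(5)`, consecutive requests start at least 0.2 seconds apart; removing the limiter corresponds to unticking the speed limit.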
However, if you are looking for the fastest crawl you can do with Sitebulb, do the following:
- Select the HTML Crawler
- Push 'Number of Threads' up to the maximum
- Untick 'Limit URL Speed'
Please note that crawl speed is still limited by the machine itself. A computer with 16 cores will be able to crawl faster than one with 8 cores, all else being equal.
Chrome Crawler: Speed Settings
If you selected the Chrome Crawler from the Crawler Settings, the first screen of the Advanced Settings will look slightly different.
There are no thread options or URL speed limiting options (as neither is applicable when crawling with Chrome). Instead, there is simply the option to select how many Chrome instances you wish to use for crawling. Again, this is limited by the number of logical processors you have on your machine.
Word of Warning
Please note that we only recommend pushing up the speed options if you have permission to crawl and the website owner is comfortable with you crawling the website fast. Ideally, this would be a site you know can handle a high number of connections at once.
If you want to learn more about this subject, we suggest you read our guide on crawling responsibly.