Excluded External URLs

Sitebulb has a range of 'global settings' which act as the default settings for every new audit you start. From the Dashboard, choose the 'Settings' option from the left-hand menu.

Global Settings Option

From here, the 'Excluded External URLs' tab allows you to stop Sitebulb from scheduling and crawling specific external URLs, by defining domains, paths and patterns.

You will find a number of values pre-set, most of which are there to stop Sitebulb crawling thousands of irrelevant URLs (e.g. social sharing links). If you want to stop Sitebulb from crawling other external URLs or domains, add them to these lists.

Global Settings Excluded URLs

Excluded Hosts

By adding domains to the list of Excluded Hosts you will stop any URLs that reside on these domains from being scheduled and checked by the Sitebulb crawler.

Adding in 'example.com' would exclude:

  • Any URLs on http://example.com (e.g. http://example.com/abc/)
  • Any URLs on https://example.com (e.g. https://example.com/abc/)
  • Any URLs on subdomains of example.com (e.g. https://blog.example.com)

If you want to exclude ONLY a specific subdomain, add just that subdomain (e.g. https://blog.example.com).
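
To make the host rules above concrete, here is a minimal Python sketch of the matching behaviour as described. It is an illustration only, not Sitebulb's actual implementation: the function name host_is_excluded is hypothetical, and it assumes that an entry may be written with or without a scheme and that matching is case-insensitive.

```python
from urllib.parse import urlsplit


def host_is_excluded(url: str, excluded_hosts: list[str]) -> bool:
    """Return True if the URL's host matches an excluded host or one of its subdomains.

    Hypothetical helper that mirrors the documented rules; not Sitebulb code.
    """
    host = (urlsplit(url).hostname or "").lower()
    for entry in excluded_hosts:
        entry = entry.strip().lower()
        # Assumption: entries may be given with or without a scheme.
        candidate = entry if "://" in entry else "//" + entry
        entry_host = urlsplit(candidate).hostname or ""
        if not entry_host:
            continue
        # Match the host itself (any scheme) or any subdomain of it.
        if host == entry_host or host.endswith("." + entry_host):
            return True
    return False


if __name__ == "__main__":
    for url in ("http://example.com/abc/", "https://blog.example.com", "https://other.org/"):
        print(url, host_is_excluded(url, ["example.com"]))
```

With 'example.com' in the list, both the http and https versions and any subdomain match. With only 'https://blog.example.com' in the list, URLs on example.com itself do not match.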

Excluded Paths

By adding paths to the list of Excluded Paths you will stop any external URLs that include these paths from being scheduled and checked by the Sitebulb crawler.

Adding in 'tweet' would exclude:

  • Any URLs that have tweet in a folder name (e.g. https://example.com/tweet/abc)
  • Any URLs that have tweet in the filename (e.g. https://example.com/abc/tweet.php)

You can make the exclusion more specific. For instance, adding 'tweet.php' will only match URLs that contain that exact string.
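
The path rules above behave like a simple "contains" check, which can be sketched in Python as follows. This is an illustration only, not Sitebulb's actual implementation: the function name path_is_excluded is hypothetical, and it assumes the string is compared against the path part of the URL and that the comparison is case-sensitive.

```python
from urllib.parse import urlsplit


def path_is_excluded(url: str, excluded_paths: list[str]) -> bool:
    """Return True if any excluded string appears anywhere in the URL's path.

    Hypothetical helper that mirrors the documented rules; not Sitebulb code.
    """
    # Assumption: only the path is checked, not the host or query string.
    path = urlsplit(url).path
    return any(entry in path for entry in excluded_paths if entry)


if __name__ == "__main__":
    urls = ("https://example.com/tweet/abc", "https://example.com/abc/tweet.php")
    for url in urls:
        print(url, path_is_excluded(url, ["tweet"]), path_is_excluded(url, ["tweet.php"]))
```

Here 'tweet' matches both example URLs, while the narrower 'tweet.php' matches only the second one.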