Included URLs

Included URLs restrict the crawler to only the URLs or directories you specify.

For example, to crawl only the 'Product' pages of the Sitebulb website, I would add the line:
/product/

To get to Advanced Settings, scroll to the bottom of the main Audit setup page and click the grey Advanced Settings button.

The Included URLs section is under URLs -> Included URLs.


It is worth noting a couple of things:

  • Excluded URLs override Included URLs, so make sure your rules do not clash.
  • Your Start URL must contain at least one link to an Included URL, otherwise the crawler will crawl just one URL and then stop.
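
The two points above can be illustrated with a small sketch. This is not Sitebulb's actual implementation: it assumes rules are matched as plain substrings of the URL path, and the function names (should_crawl, start_page_has_included_link) and the example.com URLs are hypothetical, chosen only to show how an exclusion wins over an inclusion and why a Start URL with no links to Included URLs leaves nothing to crawl.

```python
from urllib.parse import urlparse


def should_crawl(url, included, excluded):
    """Decide whether a URL passes the include/exclude rules.

    Assumption: a rule matches when it appears as a substring of the URL path.
    """
    path = urlparse(url).path
    # Exclusions are checked first, so they override any inclusion.
    if any(rule in path for rule in excluded):
        return False
    # With no include rules, the crawl is not restricted.
    if not included:
        return True
    return any(rule in path for rule in included)


def start_page_has_included_link(links, included, excluded):
    """Return True if at least one link on the start page would be crawled."""
    return any(should_crawl(link, included, excluded) for link in links)


# Hypothetical links found on a start page.
links = [
    "https://example.com/product/features/",
    "https://example.com/product/pricing/",
    "https://example.com/blog/",
]

included = ["/product/"]
excluded = ["/pricing/"]

for link in links:
    print(link, should_crawl(link, included, excluded))
```

In this sketch the pricing page matches the /product/ include rule but is still skipped, because the /pricing/ exclude rule is checked first. If start_page_has_included_link returns False for the links on your Start URL, the rules would leave the crawler with nowhere to go after the first page.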