URL crawl sources

By default, Sitebulb crawls the website. However, it can also be configured to crawl XML Sitemap URLs and/or a provided URL List.

Crawl Website

A pretty straightforward option: Sitebulb will perform a website crawl, following the links on each page to discover new URLs, until every page it can reach on the website has been crawled.
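
The link-following idea can be illustrated with a minimal breadth-first sketch. This is not Sitebulb's implementation; the `crawl` function, the `get_links` callback and the in-memory `SITE` dictionary are hypothetical stand-ins for fetching real pages, used so the example runs without network access.

```python
from collections import deque

def crawl(start_url, get_links):
    """Visit start_url, then keep following links until no new URLs remain."""
    seen = {start_url}           # every URL discovered so far
    queue = deque([start_url])   # URLs discovered but not yet visited
    visited = []
    while queue:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

# Hypothetical site: each URL maps to the links found on that page.
SITE = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/b", "https://example.com/c"],
    "https://example.com/b": ["https://example.com/"],
    "https://example.com/c": [],
    "https://example.com/orphan": [],  # no page links to this URL
}

crawled = crawl("https://example.com/", lambda url: SITE.get(url, []))
```

In this sketch the page at `/orphan` is never reached, because a link-following crawl can only find pages that some other page links to. This is one reason the additional sources described on this page can be useful.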

XML Sitemaps

Sitebulb will crawl any URLs found in XML Sitemaps that were not already found in the main crawl. It will also analyse the sitemap URLs, and compare the URLs found in the sitemaps with the URLs found by the crawler.
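
Conceptually, a sitemap vs crawler comparison comes down to set operations on the two lists of URLs. The sketch below illustrates the idea only and does not reflect Sitebulb's actual reports; the function name `compare_sources`, the bucket names and the example URLs are all made up for illustration.

```python
def compare_sources(crawled_urls, sitemap_urls):
    """Split URLs into three buckets based on which source found them."""
    crawled = set(crawled_urls)
    in_sitemap = set(sitemap_urls)
    return {
        "only_in_sitemap": sorted(in_sitemap - crawled),  # not reached via links
        "only_in_crawl": sorted(crawled - in_sitemap),    # missing from sitemap
        "in_both": sorted(crawled & in_sitemap),
    }

# Hypothetical example data.
crawled_urls = ["https://example.com/", "https://example.com/a"]
sitemap_urls = ["https://example.com/", "https://example.com/orphan"]

report = compare_sources(crawled_urls, sitemap_urls)
```

Here `report["only_in_sitemap"]` holds the URLs that would only be crawled because the XML Sitemaps source was enabled.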

Any XML Sitemaps referenced in robots.txt will be pre-filled when you select this option. You can also add multiple sitemap URLs or sitemap files using the various upload options.
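
To see which sitemaps a robots.txt file references, you can read its `Sitemap:` lines. The sketch below uses Python's standard `urllib.robotparser` module (the `site_maps()` method needs Python 3.8 or later); the robots.txt content and the helper name `sitemaps_from_robots` are hypothetical, and this is not a description of how Sitebulb itself reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example needs no network.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/blog/sitemap.xml
"""

def sitemaps_from_robots(robots_txt):
    """Return the sitemap URLs listed in a robots.txt string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap lines are present.
    return parser.site_maps() or []

found = sitemaps_from_robots(ROBOTS_TXT)
```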

URL List

Sitebulb can also 'crawl' based on a list. It isn't strictly crawling, as links from the pages will not be followed, but the data will be collected and analysed for all URLs contained in the list. Typically, URL Lists are used when you DON'T also crawl the website, in order to crawl a specific area or section of the site.

One thing to note is that Sitebulb will only crawl URLs that match the subdomain of the start URL provided (so you can't just upload a massive list of URLs from lots of different sites).
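
If you want to check a list before uploading it, a small script can separate the URLs that share the start URL's subdomain from those that do not. This is a rough sketch that assumes a simple case-insensitive hostname comparison; the exact matching rule Sitebulb applies (for example, how it treats `www`) is not described here, and the function name and example URLs are hypothetical.

```python
from urllib.parse import urlsplit

def filter_to_start_host(start_url, url_list):
    """Split url_list into URLs on the start URL's hostname and the rest."""
    # Assumption: two URLs match when their hostnames are identical.
    start_host = (urlsplit(start_url).hostname or "").lower()
    kept, skipped = [], []
    for url in url_list:
        host = (urlsplit(url).hostname or "").lower()
        (kept if host == start_host else skipped).append(url)
    return kept, skipped

# Hypothetical example data.
kept, skipped = filter_to_start_host(
    "https://www.example.com/",
    [
        "https://www.example.com/shop/shoes",
        "https://blog.example.com/post",
        "https://other-site.com/",
        "HTTPS://WWW.EXAMPLE.COM/about",
    ],
)
```

In this example, `kept` contains the two `www.example.com` URLs, while the `blog.example.com` and `other-site.com` URLs end up in `skipped`.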

To add a URL List, simply upload from your local computer.
