Data analysis options

When you set up a Sitebulb audit, you can customise, with a high degree of granularity, the data that Sitebulb collects when auditing websites.

There are lots of options in the audit setup, which break down into three 'zones'.

On this page we will run through the 'Audit Data' options, which are marked in red as 'Main Data Options' in the image below.

Sitebulb Data Options

Audit Data

By default, certain options are ticked, so if you just go ahead and start your audit without adjusting anything, Sitebulb will crawl your website and collect data regarding SEO, Page Resources and Security.

Search Engine Optimisation

Sitebulb will collect core on-site SEO data, such as internal links and indexability signals. If you were a Sitebulb user prior to version 5, these settings were basically 'always on' and you could not switch them off.

You can toggle some of the data options in the Advanced Settings, which you may wish to do in order to save time and CPU resources:

SEO Settings

Page Resources

In addition to crawling and reporting on data for HTML URLs, Sitebulb will also crawl and check page resources, such as JavaScript, CSS, images, videos and audio files.

You can click to select which data options you wish to include/exclude in the audit via the Advanced Settings.

Page Resources Advanced Settings

Performance and Mobile Friendly (Chrome Crawler only)

Sitebulb carries out its performance analysis directly with headless Chrome, which means the Chrome Crawler is required. If you have the HTML Crawler selected, you will see the message below. You can switch to the Chrome Crawler in the Crawler Settings.

Performance Requires Chrome

With Performance & Mobile Friendly enabled, Sitebulb will analyse every URL for performance and mobile friendliness, highlighting opportunities and diagnostic issues. Sitebulb will also collect Web Vitals metrics for a sample of URLs.

Enabling this option will automatically open up the Advanced Settings, which allow you to change the sampling for Web Vitals (default selection 10%) and toggle the Code Coverage and Technology options.

Performance Advanced
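To get a feel for what a 10% sample means in practice, here is a minimal sketch of percentage-based sampling. It is only an illustration: the function name `sample_urls`, the random selection and the fixed seed are assumptions, and this page does not describe how Sitebulb actually chooses which URLs to sample.

```python
import random


def sample_urls(urls, percent=10, seed=0):
    """Return a random sample of roughly `percent`% of the URLs (at least one)."""
    if not urls:
        return []
    # Round to the nearest whole URL, but never sample fewer than one.
    size = max(1, round(len(urls) * percent / 100))
    rng = random.Random(seed)  # fixed seed keeps the sample repeatable
    return rng.sample(urls, size)


# Hypothetical crawl of 250 URLs.
urls = [f"https://example.com/page-{i}" for i in range(250)]
sample = sample_urls(urls)
print(len(sample))  # 25 of 250 URLs at the default 10%
```

On a 250-URL website, the default setting would therefore cover about 25 URLs. Raising the percentage gives broader Web Vitals coverage at the cost of a longer audit.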

We have a complete guide on auditing performance and Web Vitals.

Structured Data

Sitebulb will collect structured data and validate it against both Schema.org guidelines and Google's guidelines for their Search result features.

We also have a comprehensive guide on auditing Structured Data.

Security

Sitebulb will perform server analysis on protocols and certificates, in addition to checking every URL for on-page security issues and vulnerabilities.

International

Sitebulb will crawl URLs specified in hreflang annotations (even if they are on different domains), and check the validity of hreflang and HTML lang attributes.

AMP

Sitebulb will crawl any AMP URLs found, and check that they are valid and reciprocal.

You can use the Advanced Settings to toggle crawling pure AMP URLs (on sites that only use AMP pages).

AMP Advanced Settings
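As an illustration of what 'reciprocal' means here: a standard page points to its AMP version with a rel="amphtml" link, and the AMP page should point back with a rel="canonical" link. The sketch below checks that pairing on a small hand-made data set. The function name, the dictionary layout and the example URLs are all hypothetical, and this is not Sitebulb's own implementation.

```python
def check_amp_reciprocity(pages):
    """Find AMP links that are not reciprocated.

    `pages` maps each URL to a dict with two keys:
      "amphtml":   the rel="amphtml" target, or None
      "canonical": the rel="canonical" target, or None
    """
    issues = []
    for url, links in pages.items():
        amp_url = links.get("amphtml")
        if not amp_url:
            continue  # this page does not declare an AMP version
        amp_page = pages.get(amp_url)
        if amp_page is None:
            issues.append((url, amp_url, "AMP URL was not crawled"))
        elif amp_page.get("canonical") != url:
            issues.append((url, amp_url, "AMP page does not link back via canonical"))
    return issues


# Hypothetical crawl data: /a is reciprocal, /b is not.
pages = {
    "https://example.com/a": {"amphtml": "https://example.com/a/amp", "canonical": "https://example.com/a"},
    "https://example.com/a/amp": {"amphtml": None, "canonical": "https://example.com/a"},
    "https://example.com/b": {"amphtml": "https://example.com/b/amp", "canonical": "https://example.com/b"},
    "https://example.com/b/amp": {"amphtml": None, "canonical": "https://example.com/other"},
}
issues = check_amp_reciprocity(pages)
print(issues)
```

Here only the /b pair is reported, because its AMP page declares a canonical URL other than the page that links to it.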

Accessibility

Sitebulb carries out its accessibility analysis directly with headless Chrome, which means the Chrome Crawler is required. If you have the HTML Crawler selected, you will see the message below. You can switch to the Chrome Crawler in the Crawler Settings.

Accessibility requires Chrome

With Accessibility enabled, Sitebulb will run over 50 automated accessibility checks across every page on the website. It will highlight accessibility violations and identify opportunities to make your web pages more inclusive and user-friendly.

Audit Settings

In the section above we covered the 'Audit Data' options, which take up the right hand side of the screen when you first open the audit setup screen.

Now we will cover the small menu on the left hand side, which was marked in red as 'Extra Data Options' on the big image at the top.

Audit Settings

By default, these typically have a greyed out tick alongside them, which means they have not been enabled. If you go through and enable a particular option, it will have a green tick alongside.

If an option requires attention, it will have a red alert marker - for example, if you have selected Google Analytics but not chosen an account:

No GA account selected

When you select any one of the menu options on the left, the right hand panel will change, to show further configuration options.

Google Analytics

You can connect to a Google Analytics account to access visit, engagement and conversion data for each URL.

In terms of setup, the most important thing to check is the Property and View selections. Sitebulb will attempt to auto-select the right View based on the Google Analytics tracking ID found on the page, but sometimes you may wish to select a different view.

In the configuration section, you can also adjust the date range used for Google Analytics data and select to crawl any other URLs found in Google Analytics (that were not found by the crawler).

Select GA Account

Google Search Console

You can connect to a Google Search Console account to access Search Analytics, Keywords and Sitemap data. Additionally, you can select to crawl any other URLs found in Search Analytics (that were not found by the crawler).

In terms of setup, the most important thing to check is the Property selection. Sitebulb will attempt to auto-select the right Property by matching up the start URL with the properties in the account, but sometimes you may wish to select a different Property.

In the configuration section, you can also adjust the date range used for Google Search Console data and select to crawl any other URLs found in Google Search Console (that were not found by the crawler).

Google Search Console

The final option for Google Search Console is 'Analyse Google Search Console Keywords', and by ticking this you activate the Keywords report. This means that Sitebulb will extract keyword data from the Search Console API, including clicks, impressions and CTR. The bottom box allows you to enter brand keywords, which then allows Sitebulb to group the data by brand or non-brand keywords.

Keyword Analysis
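The idea behind brand and non-brand grouping can be sketched as follows. This is a simplified illustration only: the function name, the row layout and the figures are made up, and the substring match used to decide whether a query is 'brand' is an assumption rather than a description of how Sitebulb matches keywords. CTR is calculated in the usual way, as clicks divided by impressions.

```python
def group_by_brand(rows, brand_terms):
    """Total clicks and impressions for brand and non-brand queries.

    `rows` is an iterable of (query, clicks, impressions) tuples.
    """
    terms = [t.lower() for t in brand_terms]
    totals = {
        "brand": {"clicks": 0, "impressions": 0},
        "non-brand": {"clicks": 0, "impressions": 0},
    }
    for query, clicks, impressions in rows:
        # Assumption: a query counts as 'brand' if it contains any brand term.
        is_brand = any(t in query.lower() for t in terms)
        group = "brand" if is_brand else "non-brand"
        totals[group]["clicks"] += clicks
        totals[group]["impressions"] += impressions
    for stats in totals.values():
        # CTR = clicks / impressions, guarding against division by zero.
        stats["ctr"] = stats["clicks"] / stats["impressions"] if stats["impressions"] else 0.0
    return totals


# Made-up keyword rows: (query, clicks, impressions).
rows = [
    ("sitebulb pricing", 40, 200),
    ("website crawler", 10, 500),
    ("sitebulb download", 20, 100),
]
result = group_by_brand(rows, ["sitebulb"])
print(result["brand"])      # 60 clicks from 300 impressions
print(result["non-brand"])  # 10 clicks from 500 impressions
```

With these figures, the brand queries have a much higher CTR than the non-brand query, which is the kind of split that grouping by brand keyword is meant to reveal.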

Extraction

The 'Extraction' options, shown below, are available using either the HTML Crawler or the Chrome Crawler. Typically, the reports you can get from these options would be considered tangential to SEO auditing.

An important thing to note regarding extraction is that it may influence the crawler you need to select. For example, if Schema.org markup is added to the page via JavaScript, then you would need to use the Chrome Crawler in order for Sitebulb to pick it up.

Extraction Options

  • Structured Data - Sitebulb will collect structured data and validate it against both Schema.org guidelines and Google's guidelines for their Search result features. We also have a complete guide on Structured Data.
  • Content Extraction - Sitebulb will collect specific content elements from the HTML, based on custom rules that you define using CSS paths. We also have a complete guide on Content Extraction.
  • Content Search - Sitebulb will check each URL for words or phrases that you specify via rules, and count the instances it finds for each rule. Optional advanced configuration allows you to combine multiple words or phrases. We also have a complete guide on Content Search.
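To illustrate the kind of result Content Search produces, the sketch below counts how often each phrase appears in a piece of page text. It is a rough approximation under stated assumptions: the function name is hypothetical, and the case-insensitive, whole-word matching is a choice made for this example, not a description of Sitebulb's matching rules.

```python
import re


def count_matches(text, phrases):
    """Count case-insensitive, whole-word occurrences of each phrase in `text`."""
    counts = {}
    for phrase in phrases:
        # re.escape treats the phrase literally; \b keeps matches on word boundaries.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        counts[phrase] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


# Made-up page text and search rules.
page_text = (
    "Free delivery on all orders. Delivery takes 3 days. "
    "Out of stock items ship later."
)
counts = count_matches(page_text, ["delivery", "out of stock", "refund"])
print(counts)  # {'delivery': 2, 'out of stock': 1, 'refund': 0}
```

Running the same rules against every crawled URL would give a per-page count for each rule, which makes it easy to find pages that mention (or fail to mention) a particular phrase.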

Sitebulb's Recommendation: Don't tick everything

Some users are tempted to tick every box they can, figuring they can just ignore any data they don't want or need. This is not a great idea, in general.

Every checkbox you tick will require Sitebulb to do more processing, which means the audit will take more time and will use more computer resources. On some computers, ticking every single box will mean that it is very difficult to continue doing other tasks.

In particular, Performance and Accessibility are CPU intensive, so only select them if you actually care about the data.