Version 7.0
Released on 31st July 2023
Note that this update contains big infrastructure changes: BEFORE you upgrade to v7 please make sure to let any currently running audits finish (DO NOT pause -> upgrade -> resume)
Sitebulb Cloud Launched
We are really excited to announce the launch of our new offering, 'Sitebulb Cloud.'
It answers the question that has annoyed many of you for years: 'what would Sitebulb be like if it was in the cloud?'
We now have the answer! Sitebulb Cloud has all the awesome stuff you already love about the desktop version, now accessible via a web browser.
Check it out:
The video covers some important points that I will reiterate here:
- Sitebulb Cloud is comfortably cheaper than all the other cloud crawlers. I mean it's not even close.
- There are no arbitrary project limits designed to extract more money from you.
- We don't charge you extra for JavaScript rendering (like all the others do).
Sitebulb Cloud is an evolution of the 'Server' product we have been shilling since the beginning of the year, but now has the added benefit of the browser login, making it much more accessible.
It's going to be super useful to a whole bunch of customers - it is especially good for teams working together. We already have a number of customers using it and seeing the benefits:
"The most impactful change that Sitebulb Cloud provided was the ability for everyone to work from the same crawl data"
Case Study from Moving Traffic Media
Prices start from £195/month, and we can go all the way up to massive custom plans for loads of users and millions of URLs.
BUT it won't be for everyone. Sitebulb Pro is still going to be the weapon of choice for most consultants and smaller agencies, and we are still committed to relentlessly improving the desktop product (see below!).
If your interest is piqued by Sitebulb Cloud though, please check it out or get in touch.
New Feature: HTML Templates
Every other week there's a new SEO industry study out about how frustrating it is when we make audit recommendations and they are then just ignored by clients or developers.
And we all know that we shouldn't write audit recommendations like this:
"The website has this issue <INSERT ISSUE NAME> on 4762 pages. Pls fix now - Dave"
Very reasonably, you might expect a developer to read this recommendation and think:
"Fuck. Right. Off."
Sitebulb's new feature is designed to help make that developer response a bit closer to "Ok I'll do it now!" (although this level of enthusiasm might be pushing it!).
It is designed to help make your recommendations clearer in the first place:
"The website has this issue <INSERT ISSUE NAME> on 4762 pages. The issue is only present on two of the page templates - Blog Posts (4102 URLs) and Subcategory Pages (760 URLs). If you can resolve the underlying issue on those two templates, it will eradicate the problem across the site. I would be forever in your debt if you could prioritize this fix for me. I have the honour to be your obedient servant, David."
Ok ok. Sitebulb won't turn you into a fawning sycophant, but it will help you deliver more meaningful recommendations by automatically identifying the HTML template that is being used for each URL.
What do you mean by HTML template?
I'm glad you asked. In this modern world of hipster coffee and sourdough bakeries, most websites are built using a content management system (CMS), which allows users to create content without necessarily having lots of technical expertise about how the underlying pages are built.
Typically, developers will build out a range of page templates for the website, which can be selected by users of the CMS. So, for example, a content writer might select the 'Blog Post' template when writing a new post. Or the ecommerce team might select the 'Product Page' template when adding a new product to the store-front.
Essentially, all pages using the same template will inherit the same underlying HTML - which will typically mean that if there is a problem on one page of a template, it will also be present on all the other pages using the same template.
As such, it is very valuable when auditing a website to be able to:
- Recognise the different HTML templates that are in use
- Group pages together based on the template they use
- Segment SEO issues based on template
And this is what Sitebulb's HTML Template feature allows you to do.
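To make the idea of 'same template, same underlying HTML' a bit more concrete, here is a minimal sketch of how pages could be grouped by their HTML skeleton. To be clear, this is NOT how Sitebulb does it; the fingerprinting approach (a set of tag and class names, ignoring all text) and every name in the code are assumptions made up for illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser


class SkeletonParser(HTMLParser):
    """Collects 'tag.class' tokens and ignores text content entirely."""

    def __init__(self):
        super().__init__()
        self.tokens = set()

    def handle_starttag(self, tag, attrs):
        # A valueless class attribute comes through as None, hence the `or ""`.
        classes = dict(attrs).get("class") or ""
        self.tokens.add(f"{tag}.{'.'.join(sorted(classes.split()))}")


def template_fingerprint(html: str) -> frozenset:
    """Hypothetical fingerprint: the set of tag/class tokens on the page."""
    parser = SkeletonParser()
    parser.feed(html)
    return frozenset(parser.tokens)


def group_by_template(pages: dict) -> dict:
    """Group URLs whose pages share the same fingerprint."""
    groups = defaultdict(list)
    for url, html in pages.items():
        groups[template_fingerprint(html)].append(url)
    return dict(groups)


# Made-up pages: two blog posts with different content, one product page.
pages = {
    "/blog/a": '<body><article class="post"><h1>A</h1><p>x</p></article></body>',
    "/blog/b": '<body><article class="post"><h1>B</h1><p>y</p><p>z</p></article></body>',
    "/shop/1": '<body><div class="product"><h1>P</h1><span class="price">1</span></div></body>',
}
groups = group_by_template(pages)
template_sizes = sorted(len(urls) for urls in groups.values())
```

The two blog posts land in the same group even though their text and paragraph counts differ, because only the structure is compared. A fingerprint this naive would fall apart as soon as a site injects a unique class name into each page.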
Here's our second video of the day:
You don't need to do anything in order to 'switch on' HTML Templates, as Sitebulb will automatically do it.
When the audit has completed, HTML Template information will be available via the 'HTML Templates' report option on the left hand menu:
This will show you all the URLs on the website, split into the different page template groupings.
You can then go ahead and name all the templates, like in the video. For more comprehensive instructions on how to use this feature, please check out our guide here.
This is the sort of thing you can expect to see, once you have finished naming all the templates:
See how it works? Instead of an unorganized list of URLs, you can work with URL data in a way that reflects how the website has actually been built.
Without any further analysis, this list of templates is already useful data, as it helps you understand the underlying structure of the website you are dealing with, and the potential scope of actually getting issues fixed.
To dig into each template further, you can navigate to the template report by selecting the template from the dropdown navigation, clicking the name or clicking the 'View' button:
Each of these will take you to the same place, the report page for this specific template:
From here you can explore the data as you would in the rest of Sitebulb, with the added context that everything you see on this page relates only to the page template you have selected. You can click through into the URL Lists or tables, or view the triggered hints for this template.
For example, clicking through the pie chart on the left to view indexable URLs will show you the indexable URLs that use this page template. You will notice that the HTML Template is shown as a column in the URL List:
The Hints data alongside each template will also help you easily spot when a particular issue is only present on certain templates. In the example below, we can see that there is a critical issue which affects only the 'Blog Post' template.
If we click through to view this template, we can see the Hints values for the template, and view the triggered Hints for this template by clicking into the Hints tab:
This shows us all the triggered Hints for this specific template, ordered by importance, and therefore showing the Critical one at the top. To dig in further, you can click through to View URLs:
Once you have named all your templates, this data will enrich your reports in other areas of Sitebulb as well.
When you view URLs, you can identify which page templates an issue was present on, which aids your understanding of the issue, and can make your communication with your client/developers a lot clearer.
You will also find the HTML Template in areas such as the URL Explorer, and configurable as a column you can add for any URL List it is not already present on.
You'll also notice an 'HTML Templates' tab that becomes available in the different reports. This will show you a top-down matrix of common issues, and on which page templates they occur.
The HTML Templates tab only appears on certain reports - those that make sense in terms of the issues covered - and in some places you will find them more valuable than others.
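If you export the data and want to build a similar issue-by-template matrix yourself, a rough sketch might look like the one below. The row format, the template names and the hint names are all invented for the example; they are not Sitebulb's export schema.

```python
from collections import Counter

# Hypothetical rows: (URL, HTML template name, triggered hint).
# Assumes one row per URL/hint pair.
triggered = [
    ("/blog/a", "Blog Post", "Missing H1"),
    ("/blog/b", "Blog Post", "Missing H1"),
    ("/blog/b", "Blog Post", "Slow LCP"),
    ("/shop/1", "Product Page", "Slow LCP"),
]


def hint_template_matrix(rows):
    """Count affected URLs for each hint on each template."""
    counts = Counter((hint, template) for _url, template, hint in rows)
    hints = sorted({hint for hint, _ in counts})
    templates = sorted({template for _, template in counts})
    # Counter returns 0 for combinations that never occurred.
    return {hint: {t: counts[(hint, t)] for t in templates} for hint in hints}


matrix = hint_template_matrix(triggered)
for hint, row in matrix.items():
    print(hint, row)
```

A row with a zero in every column but one is exactly the 'this issue only lives on one template' signal you want to hand to a developer.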
Within Performance, for example, it is hugely beneficial to see the breakdown of different Web Vitals metrics on each template, particularly since the work involved in improving performance is so heavily template-based.
You can dig in further by exporting to CSV or Google Sheets, or by clicking through to View:
Accessing and understanding your audit data through the lens of HTML templates should make it easier for you as the SEO to diagnose issues, and importantly should make your client/developer communication a lot clearer. If this topic is dear to your heart, check out the recent webinar that I was a guest on, along with Areej: O/SEO/O E16: Opinions About Leveling Up Tech Audits.
Final aside: you will occasionally come across some websites where the HTML Template data is NOT all that useful. These sites are typically doing something that injects dynamic elements into each page, which makes them look different when Sitebulb parses them - on these sorts of sites you'll have tons of templates with like 1 URL in each one. Not helpful at all!
Updated: Reduced database sizes
If you've been a Sitebulb user for a while, you may have noticed that crawl data can be pretty chunky. Like hundreds of GB worth of chunky.
Over time, this would cause your computer to swell uncontrollably, like your body's response to your lack of self-restraint every Christmas Day.
Sitebulb audits that you run on v7 onwards will now be impressively svelte in comparison - we're talking before-and-after pictures that are 20% of their former size!
Please consider this upgrade like the 'New Year New You' programme you follow every January - where for two weeks you treat your body like a temple - this does not turn back time and erase the months of sheer gluttony that preceded it. Similarly, if you have old Sitebulb database files, they will retain their unwieldy stature.
Your new Sitebulb audits on the other hand have just opened an Instagram account, and in 3-6 months you can expect to see Facebook ads for their new fitness program.
Updated: Unique links
The launch of Sitebulb Cloud has forced us to reassess some of the data decisions we have made in the past, and make provisions for crawling bigger websites.
The area where this was most apparent was with internal links, as it has the biggest potential to blow up.
Take a site like sitebulb.com – we have about 500 HTML pages, and about 40,000 internal links – so on average, each page has about 80 links pointing at it. At most, maybe a thousand.
When looking at a site like this, it is very reasonable to want to look at every single link, and the scale of the data means this is feasible to do – either within Sitebulb's user interface (the 'Link Explorer') or as an export in Excel/Sheets.
The average website is about 10,000 pages, which with a similar link ratio would mean 800,000 – 1 million links. You can still, just about, wrangle this in spreadsheet format (although you are really knocking on the ceiling of Excel's limits), and Sitebulb's Link Explorer can handle it no problem.
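The back-of-the-envelope maths above can be checked in a few lines. The page and link counts are the ones quoted in the text; the only figure added here is Excel's worksheet limit of 1,048,576 rows, and the function name is just for illustration.

```python
# Figures quoted above for sitebulb.com.
pages = 500
internal_links = 40_000
links_per_page = internal_links // pages  # average links per page

# The same ratio applied to a 10,000 page website.
average_site_links = 10_000 * links_per_page

# Maximum number of rows in a single Excel worksheet.
EXCEL_MAX_ROWS = 1_048_576


def fits_in_excel(link_rows: int) -> bool:
    """True if one row per link would fit in a single worksheet."""
    return link_rows <= EXCEL_MAX_ROWS


print(links_per_page, average_site_links, fits_in_excel(average_site_links))
```

At 800,000 rows you are already using over three quarters of a worksheet, which is why 1 million links is about where spreadsheets stop being an option.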
But with Sitebulb Cloud, we're dealing with sites that have millions of URLs, and oftentimes, hundreds of millions of internal links. How do you look at THAT in a spreadsheet?
The answer is, you don't. Once you start to reach a certain scale, pairwise analysis of links just starts to break down. Spreadsheets aren't built to deal with 220 million rows of link data, and neither is your brain.
The granularity simply becomes redundant. Say you crawl a 5 million URL site, and every page links to the homepage in the header. This then corresponds to 5 million rows of data in your link table. I mean it is literally the same thing, 5 million times in a row. That is not useful data.
(Re-)introducing: unique links
The solution is to stop looking at every link as a pairwise relation, and instead group like links together.
Those 5 million links above all link to the same page, from the same location in the HTML, using the same anchor text. It is 1 unique link, that occurs 5 million times.
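As a rough illustration of the grouping idea, the sketch below collapses a list of pairwise links into unique links, keyed on target URL, anchor text and location. The `Link` structure, the field names and the `location` label are assumptions for the example, not Sitebulb's internal data model.

```python
from collections import defaultdict
from typing import NamedTuple


class Link(NamedTuple):
    source_url: str
    target_url: str
    anchor_text: str
    location: str  # hypothetical label for where in the HTML the link sits


def unique_links(links):
    """Collapse pairwise links into one row per (target, anchor, location)."""
    grouped = defaultdict(list)
    for link in links:
        key = (link.target_url, link.anchor_text, link.location)
        grouped[key].append(link.source_url)
    return [
        {
            "target_url": target,
            "anchor_text": anchor,
            "location": location,
            "example_linking_url": sources[0],
            "linking_urls": len(set(sources)),
        }
        for (target, anchor, location), sources in grouped.items()
    ]


# Five pages all linking to the homepage from the header, plus one other link.
links = [Link(f"/page-{i}", "/", "Home", "header") for i in range(5)]
links.append(Link("/page-1", "/about", "About us", "footer"))
rows = unique_links(links)
```

Six pairwise links become two unique links here. Scale the header link up to 5 million pages and it is still a single row, just with a bigger count.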
Unique links are actually not a new notion in Sitebulb, we have had them for years, but we've never given them the focus they now have or set any rules in place for when they should become the default.
So this is what's changed. If we find over 2.5 million links on the website, Sitebulb will stop showing 'All links' as options you can click on, and instead default to showing unique links.
Clicking through to any of those Unique Links values will drop you on the list of unique links in the Link Explorer:
As you can see, scrolling across shows you the important data points you need to make decisions about these links:
- Target URL
- Anchor Text
- Example Linking URL
- Number of Linking URLs
If you want/need to explore all links pointing at a particular page, you can use the 'All Links to Target' button.
Added: Additional advanced settings
In the same vein as the above, we have added some other options to make it quicker and easier to do very large audits. By default, Sitebulb's processing stage will tackle some jobs that may not really be necessary for enormous sites, and can add hours to the processing time:
Pre-build exports
During the report building phase, Sitebulb will pre-generate lots of the CSV export files, so they are instantly available when the audit is complete, including all the hint exports. On sites with millions of URLs, this can take a few hours. Considering you might not actually use a lot of this data, this is time you could trim off your audit by clicking this button in the 'Data Exports' advanced settings:
Disable URL Rank calculation
URL Rank is our internal link popularity metric, based on the number of incoming internal links, relative to other pages on the same site. It is a useful metric for determining how powerful or important internal pages are. However, to calculate it we run an iterative formula, not dissimilar to PageRank, which can take a long time when you are dealing with millions of URLs.
It will always be on by default (as long as 'Link Analysis' is checked), but you can disable it in the Advanced Settings for Search Engine Optimization:
You may wish to do this if you are crawling a very big site but you are not interested in looking at page importance data.
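To give a feel for why an iterative link popularity calculation gets slow at scale, here is a minimal PageRank-style sketch. This is textbook PageRank with a damping factor, NOT the actual URL Rank formula (which is not published in these notes); the function name, the parameters and the tiny example site are all made up, and the code assumes every link target also appears as a key in the input.

```python
def url_rank_sketch(outlinks, damping=0.85, tolerance=1e-6, max_iterations=100):
    """PageRank-style power iteration over a dict of {url: [linked urls]}."""
    urls = list(outlinks)
    n = len(urls)
    rank = {u: 1.0 / n for u in urls}
    for iteration in range(1, max_iterations + 1):
        # Rank held by pages with no outgoing links is spread evenly.
        dangling = sum(rank[u] for u in urls if not outlinks[u])
        new_rank = {u: (1 - damping) / n + damping * dangling / n for u in urls}
        # Every link is visited once per iteration.
        for u in urls:
            targets = outlinks[u]
            for t in targets:
                new_rank[t] += damping * rank[u] / len(targets)
        change = sum(abs(new_rank[u] - rank[u]) for u in urls)
        rank = new_rank
        if change < tolerance:
            break
    return rank, iteration


# Made-up four page site where everything links back to the homepage.
site = {
    "/": ["/blog", "/pricing"],
    "/blog": ["/", "/blog/post"],
    "/blog/post": ["/"],
    "/pricing": ["/"],
}
ranks, iterations = url_rank_sketch(site)
print(max(ranks, key=ranks.get), iterations)
```

Each pass touches every link once, and it takes many passes to settle. On a four page site that is instant; on a site with hundreds of millions of internal links it is a lot of work for a metric you may not look at.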
Sitebulb Server Update
We are no longer publicly selling the 'DIY' server license; however, existing customers can grab the latest installers from the Server Release Notes page here.
Bugs
- We had some sort of Schrödinger's Cat scenario going on - Sitebulb was generating different word counts when 'Readability' was turned on and when it was turned off. Now I don't know about you, but I always thought that there would be the same number of words on a page whether or not I was reading them?
- If you did a content search, then added some columns into the URL List and tried to export it, Sitebulb would ignore all the columns you added, as if your opinion was not important. Regardless of the accuracy of this supposition, Sitebulb should probably not have been doing this.
- Insight metrics were being duplicated, for any project with two or more audits. You'd only see this if you hovered over the little sparkline graphs - it would say 'July 10th 1203 URLs' and then the next dot would be 'July 10th 1203 URLs'.
- In the structured data section, the option to use Advanced Filtering for URLs in "Property Explorer" was missing.
- Running the 'All Hints' export would hit an API error if you had Accessibility turned on. This was because some of the Hint names are very long (and in my opinion, far too long, such that they are significantly LESS accessible for people who aren't very good at reading). This was creating Google Sheets with ridiculously long filenames, which was making Google Sheets have a meltdown. We are now truncating the Sheet filenames before they get uploaded, because truncating them as they come in is beyond the scope of an organization such as Google.
- In the keywords report, it was not possible to search for specific keywords (it is now).
Archives
Access the archives of Sitebulb's Release Notes, to explore the development of this precocious young upstart: