Released on 29th July 2021
We have an unwarranted reputation as 'Mac haters'. Truly, I do not know where such propaganda stems from, as we have nothing but love for Macs, and I challenge any individual to present evidence of a contradictory position.
Anyway, despite our predilection for all things Mac, in our last update we managed to introduce a bug which meant that some websites would not crawl on Macs, when they would crawl on Windows. This update includes a fix for these crawling issues, so if you are an unfortunate Mac user, please make sure to update!
Added a veritable panoply of structured data updates;
- Added an error to priceRange when it is greater than 100 characters.
- Removed @id from LocalBusiness
- Added SeekToAction to VideoObject
- Added directApply added to JobPosting
- Added funder added to Dataset
- Updated to Schema.org version 13
- Added gtin to Product
- Added BackOrder to Product availability
When working on these updates, Gareth sent a message to our group WhatsApp, asking me to add a Jira ticket whenever I add a new structured data alert to our change history page.
My response was typically professional and supportive:
- Aforementioned Mac crawling issue was fixed, and to not mention it here would be to deny my completist tendencies, which I am not wont to do.
- The 'All Hints' report was not working in conjunction with our new 'pick and mix' audit options, and would only show if 'Search Engine Optimization' was ticked. It will now work whatever selection you choose. Personally I like a mix of the fizzy ones and white mice. Don't judge me.
- When connecting Google Search Console, Sitebulb was having issues recognising domain properties.
- Sitebulb was incorrectly flagging pages with HTML lang as 'not having HTML lang'. When challenged on why this was happening, Gareth's response was 'I had an IF statement the wrong way around.' Which, if you ask me, is kinda worrying...
- Adding the column 'Natural Height' to a URL List would cause an SQL error on some lists. I am not really sure why we don't just include 'Natural Height' on every URL List by default, since height related data is central to almost all SEO strategies (e.g. 'what is the natural height of Barack Obama').
- On sites with AMP equivalents, ticking AMP would make Sitebulb think that HTML pages are actually AMP pages. This would have the unfortunate consequence that Sitebulb would therefore not collect things like link data, structured data, performance data and whatnot. It would have been a more disastrous bug if we had not all concluded that AMP is basically now dead.
- When setting up a sample audit, if you went to the crawler settings, the 'depth limit' controls had mysteriously disappeared.
Released on 12th July 2021 (hotfix)
- Fixed the Hint: 'Images with missing alt text'... so apparently this disappeared in version 5, and we somehow managed to not notice it was gone. I asked Gareth what happened and he started waffling on about nulls, and 0s, 1s, and how you can't null a null and all this shit. TL;DR the Hint had stopped triggering when it should have, and now it is fixed so that it does fire again.
- Fixed an issue in the audit setup where the language setting would not let you add new languages.
- Fixed an issue with structured data which was inaccurately claiming some 'recommended' properties were actually 'required' for the Local Business Search Feature.
Released on 1st July 2021
Wouldn't it be wonderful if, just once, England could beat Germany in the knock-out stages of a major tournament?
Oh, wait hang on. I said the wrong one.
Wouldn't it be wonderful if, just once, Team Sitebulb could manage a major release without the inevitable and embarrassing 'bug fix update' mere days later?
Reaudit from side menu actually now works
Folks using the burger menu on the left hand side of the Projects list in order to 'Re-audit Website' were being treated to a green 'Start Audit' button that did literally nothing. They would press the button and Sitebulb would just sit there like a fucking lemon.
Keywords report no longer completely missing
Considering that one of the few things I actually do around here is 'chief tester', this one has to go down as one of my worst fails this week (top 5 at least I reckon).
The Keywords report was literally not there at all. We were collecting the data, but not even showing it as a menu item on the left.
'Treat subdomains as internal' was not treating subdomains as internal
This feature had literally one job! How do you fuck that up?
Exports from Link Explorer had displaced column headings
In the URL Explorers and Link Explorers we added a new 'Row' column in the UI, to make it easier to scan the data. However we were also pushing this through to the spreadsheet exports, and for some reason we'd forgotten to set the column header for Link Explorer exports. Which meant that the column headings did not line up as they should, and no longer made sense.
Considering that spreadsheets are literally a numbered list of rows and columns, we figured we could probably do without the row data on exports entirely...
Improved message for JS/CSS/images blocked by robots.txt
This is technically an improvement rather than a bug fix, but I'm so embarrassed by the farce that precedes this one that it does not deserve to be considered an 'Update'.
Now, if you crawl a site that has resource files blocked by robots.txt, Sitebulb will tell you what they are in a little notification box:
(Note that the bottom line links to a list of disallowed URLs, if and only if you tick 'Save disallowed URLs' during your audit setup).
Released on 28th June 2021
Every time we do a major version update, I forget how looong it takes. We've been working on this one for pretty much all of 2021, and it represents the biggest change to how Sitebulb actually crawls websites since we first added headless Chrome back in 2018.
Here's a quick bullet list of the changes, with jump-links to make you life easier, should you need them:
- Brand new Performance report
- Lighthouse audit across every page on your site
- Performance budgets
- New audit setup process
- Open audits in new windows
- Single Page Analysis given more of the limelight
- Re-audit without pre-audit
Brand new Performance report
Google's longstanding promise to factor Core Web Vitals data into their page experience signal has sent the industry into a tailspin over recent months, as deadline day approached and every scribe across the land penned an 'everything you need to know' guide. The inevitable adjournment was, in my opinion, further evidence that we will not see widespread change when it does indeed come to be.
With my soliloquy complete, I may now put forward our attempt at make Web Vitals data accessible at scale:
Our journey to this point was tempestuous, with many twists and turns along the way. As we went, we came to certain realisations, and determined some decisions, defining our philosophy:
- CrUX 'field' data is useless for auditing - the data is SO sparse on most websites that trying to collect it at scale is a joke. You can't make decisions based on a table full of 0s, so we do not include field data, and suggest you leave that to Google Search Console.
- Sampling 'lab' data is the way forward - there is no universe where you need to be collecting lab data for every page on your site, sampling is a much more efficient way to collect Web Vitals data.
- Data can be collected directly from Chrome - there is no need to connect to the PageSpeed Insights API or use Lighthouse (both of which require you to hit the page again), we can just collect this data directly from Chrome first time around.
This is a lot to unpack, and to some extent requires a mindset shift from the typical task-based approach to performance auditing, which typically starts with 'bulk grab CWV data for a massive list of URLs.'
As such, we have ended up with a system that allows you to do the following:
- Collect Web Vitals metrics with the Chrome Crawler as you crawl: LCP, CLS, TBT, TTI, FCP and TTFB.
- Crawl the entire site, yet set a sample % for collecting Web Vitals (recommended 10% sample).
- Run the Lighthouse ruleset (Opportunities and Diagnostics) across every single web page crawled (not just the sample).
- Apply Performance Budget calculations based on the resource files found.
I expect these boring words do not do much to inspire, so I'll show you some images instead.
The first thing you see in the new performance report is the big Performance Score at the top, followed by the green/orange/red bar underneath showing the individual URL performance scores.
The big score at the top relates to the overall average score of the website, for all URLs tested. The green/orange/red bar shows how the URLs split out individually.
These scores are based on the following thresholds:
- Good (green) - 90 to 100
- Needs improvement (orange) - 50 to 89
- Poor (red) - 0 to 49
These scores are based ONLY on the URLs that were sampled, and ONLY using Web Vitals metrics.
The scoring methodology is based on the Lighthouse Scoring Calculator, which takes Web Vitals data and translates this into a weighted average:
As you can see, the furthest right column shows the weighting, so the FCP contributes 10% to the score, whereas LCP contributes 25%, etc...
Sitebulb does the same thing - for every URL sampled, it calculates the Web Vitals scores, then applies the weighting like above to end up at a performance score for the URL.
This is just like when you see a single URL score in Lighthouse or PageSpeed Insights:
To see individual URL scores in Sitebulb's performance report, you can click through from the chart:
To end up at a list of URLs, where you can see the performance scores in the third column.
You can also click the orange Performance Data button to isolate a single URL and see all the Web Vitals metrics, along with the score:
And finally just to re-iterate, we have just explored how to dig in and look at the performance score for an individual URL, but the big performance score at the top is the average of all the URLs that were sampled:
Web Vitals lab data
On the audit overview, directly underneath the performance scores, you will see the Web Vitals charts:
It's super important to understand that this is lab data, not field data.
I know I said it at the beginning, but I want to say it again because you probably forgot. Don't beat yourself up about it - you've had a lot on recently.
We all forget marginally important information once in a while, it's fine. In fact, just the other day I forgot Gareth's surname (later on Geoff pointed out that I should just think about Reservoir Dogs if I need a reminder).
Anyways, you can dig into the Web Vitals data further, simply by clicking on the relevant pie chart segment:
This will bring you through to a URL List that contains all the URLs within the segment selected:
At this point, analysing the URLs to identify patterns in the URL structure can allow you to isolate specific HTML page templates that all share a common issue (e.g. 'Poor CLS on product pages').
Lighthouse audit across every page on your site
Of course, we have Hints as well, in the usual place:
These Hints are processed for every single URL crawled, not just the ones sampled for Web Vitals.
The Hints split into two different categories:
- Opportunities, which present optimization opportunities that can help you speed up your site and improve performance, such as reducing server response time.
- Diagnostics, which flag specific problems on your site that need to be addressed, such as pages with enormous payloads.
These reflect the same issues that Lighthouse use, and we have deliberately used the same wording so you can match the issues up when digging in further with Lighthouse.
What this effectively means is that when you run a performance audit with Sitebulb, you can run the Lighthouse ruleset against every single one of the URLs on your website.
Importantly, whereas Lighthouse will only ever give you this data for one URL at a time, Sitebulb will collate the issues so you can identify groups of URLs that have the same problems - and then dig into these groups further to find the worse offenders.
As an example, consider the Hint below, which picks out pages that have images which could use further optimization:
If I click 'View URLs', I can see the URLs which contain images that need further optimization:
The data columns highlighted above help you understand the scale of the issue on each URLs, so you can determine if you want to explore further. For instance, you can sort by the column 'Images not Efficiently Encoded' in order to highlight the worst offenders.
The other blue button, 'View Resources' shows the image files themselves which need further optimization:
This time, the highlighted columns tell you data regarding the specific image URLs themselves. This allows you to sort either by the images which will benefit most from optimization (the column 'Encoding Image Saving' shows you this) or by the images which are most prevalent across the site (the column 'No. Referencing HTML URLs' shows you this).
We held a Core Web Vitals webinar last week, with expert guests Arnout Hellemans and Billie Geena. Try as we might, we could not get Arnout to shut up about the value of utilising performance budgets.
If you've watched that and Arnout failed to convince you, I have also written some performance budgets documentation where I explain what they are and how to use them. So if you find the following a rather brief description, it is not about me being a lazy prick, and more about me trying not to write 5000 words of drivel once again.
The performance budget data is, somewhat surprisingly, located under the Performance Budget tab.
Scrolling down you will see a performance budgets chart:
And then underneath that, the same data in table format:
The first column shows the type of page resource, and the second column shows the maximum 'allowed' size of all the corresponding resources on each URL. So for example looking at the second row, there is an allowed budget of 400KB for scripts. Of all the URLs on the website, 541 were under this 400KB budget (i.e. 'passed') and 21 where over the 400KB budget (i.e. 'failed').
Now, you may be wondering at what point you are expected to give a shit about any of this, and I'll point you to the 5th row down: Images.
It doesn't matter how much important 'CWV' work you do, it just takes one intern one moment to upload one full size 5MB photo and the page is totally fucked.
I don't care if you ignore every word Arnout ever says, or print my article out to use as kindling, just make sure you check the images section of the performance budgets, which will identify any URLs with images that have a large total weight.
Let's click this '32' value to see the pages on the Sitebulb site with high image weight...
Look at the fucking state of those release notes pages!
I'll be honest, there's loads more shit I could tell you about the performance stuff, but you've put up with way too much serious shit already, so I'll just point you at the docs instead if you wish to read further:
- How to audit performance & Web Vitals
- Performance budgets (what they are and how to use them)
- URL sampling (how and why we do it)
New audit setup process
I know what you're thinking, "You fucking charlatan, this isn't a new feature it's just a different way to do the same bastard thing!"
And while I struggle to dispute this assertion, there are a couple of neat features the new setup allows.
The first is that you can now set the device type in the project setup - desktop or mobile - with mobile being the default (mobile first innit):
The second is that the audit options are now completely decoupled, which means you can pick and choose exactly which audit options you want, without being obliged to fill up your hard drive with needless data.
So you can do this sort of thing:
i.e. only collect structured data, without the SEO data which was always on before (on page, links, indexability signals, duplicate content etc...).
This means you can do really specific audits, a lot more quickly and efficiently (think about content extraction on a competitor site just to steal their price data - but you don't give a shit about their SEO or security data... switch on content extraction and turn everything else off).
Open audits in new windows
If you've ever spent your weekends comparing Sitebulb audits (don't pretend like you haven't), you'll have found this task pretty frustrating... before now.
Ctrl+click (Cmd+click on Mac) to open up audits in a new window overlay:
But don't stop with only one! You can keep opening and opening more audits, until you can't see the wood for the audits.
Single Page Analysis given more of the limelight
We've provided a 'single page analysis' tool in Sitebulb for literally years, and almost nobody knew about it. Frankly, we were getting pissed off how often we'd point it out and folks would think we added a new feature. Literally years!!
So we just moved it out of the 'Tools' dropdown menu at the top (emphasising arrows mine).
If you haven't yet tried the tool, maybe this it's about time you should.
We have also now made it available in URL Lists via the (right click) context menu. If you hit 'Single Page Analysis' from here it will open up a new window and run a 'live' Single Page Analysis on the URL in question.
Here's what that looks like:
Note that this is NOT data collected during the audit. This is Sitebulb rendering the page, parsing the HTML and compiling the data live. So if you need to verify an issue has been resolved, this is a great, quick way to do it.
Re-audit without pre-audit
Although some may prefer the flashy features above, this one is my favourite. If you want to re-audit a website using exactly the same configuration as your last audit, you can easily do this now using the dropdown option 'Start with current configuration.'
This skips the pre-audit, and will just start the new audit up instantly. Sooooo much more convenient.
487 Bugs Fixed
We have fixed so many shitty little bugs in this version I actually don't even want to count them, never mind write about them. Especially when you lot will not even give one single fuck about any of them. Also the Belgium - Portugal game is just hotting up...
...so I won't bother. But I will email you individually if you reported a bug we fixed (at some point later today or tomorrow).
Access the archives of Sitebulb's Release Notes, to explore the development of this precocious young upstart: