Sitebulb Release Rants

Transparent Release Notes for every Sitebulb update. Critically acclaimed by some people on Twitter. May contain NUTS.
Written by CEO Patrick Hathaway, who swears A LOT.

Version 8.9

Released on 11th December 2024

This is (fingers crossed) the last update of the year.

The team made me promise to make my release notes jolly and fun.

Updates

  • On Cloud, added the functionality to use 'Ctrl+click' to open an audit in a new tab, which is great for comparing audits.

Ctrl + click

You can also do it for 'View URLs' when in the comparison report, which is even better for comparing audits.

Comparing Audits click to new tab

  • On Windows desktop, added the functionality to use 'Ctrl+click' to open an audit in a new window, which is great for comparing audits. Ok ok OK! So this is not technically a new feature because you could do it before. But I bet loads of you forgot or didn't realise, so it might as well be a new feature.
  • On Mac desktop, added the functionality to use 'Cmd+click' to open an audit in a new window, which is great for comparing audits. What?!? Oh fuck off then. Fine, it's not new.

Bugs

  • Every so often we come across an issue that I like to label a 'Nigel Farage' type issue, in that they share the quality of being really fucking annoying and that the universe would be a significantly better place if they simply didn't exist. We had one of these Faragian issues crop up, where audits run on version 7.5 would not be accessible in version 8.8. Grimace emoji.
  • You're a bum.
  • Someone emailed in to say that Sitebulb's Spell Checker was not reporting any spelling errors. Bizarrely, they were less enthusiastic than me about this turn of events: 'CONGRATULATIONS! You finally did it!!' Turns out there was an issue where spell checker dictionaries weren't being loaded for pages where the html lang was mixed case en-GB rather than en-gb. So we're back to 136,468 spelling errors. As you were.
  • You're a punk.
  • Sitebulb's PDF Reporting feature was not quite working as expected. The spinner would spin, you'd play guess the movie, but then the PDF would not have anything on it - literally, a completely blank white page. This is not what I intended when I put in a feature request for white-labelled PDF reports (#sorrynotsorry).
  • You're an old slut on junk.
  • Malformed links were not displaying in the link explorer properly. According to the dev team, malformed links fall outside the 'normal' order of operations. So that makes it ok then does it?
  • You scumbag.
  • Sitebulb's Looker Studio Connector was having some issues displaying some data points from the performance report. For a long while I thought this was just Looker Studio being its typical capricious self, but it turns out in this case we were actually sending the data in wrong.
  • You maggot.
  • Sitebulb was not setting meta robots directives correctly if they were too shouty (e.g. "NOINDEX, NOFOLLOW").
  • You cheap lousy faggot.
  • On Sitebulb Cloud, there was an error when trying to download a generated XML Sitemap. I remain surprised that Sitebulb customers still need to use this feature in 2024 - surely any competent developer should be able to figure this shit out?
  • Happy Christmas your arse.
  • Updated a few small issues with structured data flagging false positives on issues with Breadcrumb and Product markup.
  • I pray God it's our last.
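For the 'shouty' meta robots bug above, the general shape of a fix is to normalise the directive tokens before comparing them. The snippet below is a minimal sketch of that idea and not Sitebulb's actual code; the function name `parse_robots_directives` is made up for illustration.

```python
def parse_robots_directives(content: str) -> set:
    """Split a meta robots content value into lowercase directive tokens.

    Hypothetical helper: trims whitespace and lowercases each token so that
    "NOINDEX, NOFOLLOW" and "noindex,nofollow" compare as equal.
    """
    return {token.strip().lower() for token in content.split(",") if token.strip()}


shouty = parse_robots_directives("NOINDEX, NOFOLLOW")
quiet = parse_robots_directives("noindex,nofollow")

print(shouty == quiet)       # True
print("noindex" in shouty)   # True
```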

Version 8.8.22 (hotfix)

Released on 22nd November 2024

Bugs

  • Sitebulb would open and then close automatically. This hotfix resolves the issue, which some customers were experiencing when opening Sitebulb.

Version 8.8

Released on 15th November 2024

Bugs

  • There was a pretty nasty bug that affected a number of users: whenever they tried to start a new audit, Sitebulb would error, with the message: [object Object]. Most users were not affected, and it literally meant that you could not start any audits, so I'm pretty sure you'd know if you came across it. If so, download and install this update post haste!
  • TIL what 'Stack Overflow' means. It is a pretty catastrophic error that occurs when a computer program tries to use more memory than has been allocated, and is often caused by a recursive function that continuously calls itself... so many times that it eventually eats up all the memory. This error was happening on a site where Sitebulb's structured data validator was building out the schema, and hitting a type that appeared to be telling Sitebulb: 'I'm a WebPage. I'm not a WebPage, I'm a Breadcrumb. I'm not a Breadcrumb, I'm a WebPage...' and so on, descending deeper and forever deeper into its subconscious until eventually it hit the peaceful bliss of Limbo. I found out about this on a call with James from the dev team earlier today. It was weird, come to think of it, he looked a lot older and kept saying strange things.
  • Saving 'Response vs Render' data wasn't always allowing the user to see the saved data, which is a bit embarrassing because we've been banging on all about our ridiculously awesome JavaScript Training Course, and this is a key feature to help with identifying rendering issues. Sigh. 
  • The mobile-friendly hint 'Content does not size correctly in viewport' was not firing accurately. It was causing Mark Williams-Cook to, quote, 'have some umms and arrs'. That link to his profile was for Bluesky, by the way. Because if we all just stay on it, how are we supposed to abandon that fucking prick Elon Musk's hate-fuelled dumpster fire propaganda machine shit stain of a social network? With that in mind, go follow Sitebulb on Bluesky as well :)
  • The point-and-click setup for content extraction was not working properly, leading to a vastly frustrating experience of clicking the thing and Sitebulb saying it is wrong, then clicking something else and that being wrong too, at which point you would then go and click the same thing you did in the first place (still wrong). It gave me flashbacks of a teenage experience trying to get George Stobbart to climb over the wall behind Mac Devitt's pub in Ireland. IYKYK.
  • The standalone Structured Data Testing Tool was not working properly. This was basically the same problem we were experiencing with the Stack Overflow issue I described above, it just wasn't going to a deep enough level to cause a full-on avalanche.
  • I'm an old man. Filled with regret.
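The WebPage/Breadcrumb loop described above is a classic cyclic reference problem. One common way to stop the recursion is to remember which nodes have already been visited. The sketch below assumes a made-up graph layout (a dict of nodes with a `type` and a list of `refs`); it is not how Sitebulb's validator stores schema, just an illustration of the technique.

```python
def collect_types(node_id, graph, seen=None):
    """Walk a schema graph depth-first, skipping nodes already visited.

    `graph` is a hypothetical structure: {id: {"type": str, "refs": [ids]}}.
    Without the `seen` set, a WebPage -> Breadcrumb -> WebPage loop would
    recurse until the interpreter gives up.
    """
    if seen is None:
        seen = set()
    if node_id in seen:
        return []  # already handled, so break the cycle here
    seen.add(node_id)

    node = graph[node_id]
    types = [node["type"]]
    for child_id in node.get("refs", []):
        types.extend(collect_types(child_id, graph, seen))
    return types


graph = {
    "page": {"type": "WebPage", "refs": ["crumbs"]},
    "crumbs": {"type": "Breadcrumb", "refs": ["page"]},  # points back at the page
}

print(collect_types("page", graph))  # ['WebPage', 'Breadcrumb']
```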

Version 8.7

Released on 5th November 2024 (hotfix)

Just a few small bug fixes in this release.

Bugs

  • Fix issue with mixed-case attributes not being parsed by the HTML parser (e.g. meta name="description" Content="bollocks").
  • Fix issue with the spell checker not returning the correct dictionary if the language code didn't have a region.
  • Improve error messages in the front end.
  • Fix issue with staging site setting not removing the main disallow path from robots.
  • Fix content search issue with duplicate search terms.
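To make the spell checker fix above a bit more concrete, here is a rough sketch of a dictionary lookup that copes with language codes that have no region. The dictionary table, the default-region mapping and the function name are all invented for this example; matching case-insensitively is an extra assumption, not something this release describes.

```python
# Hypothetical lookup tables, purely for illustration.
DICTIONARIES = {
    "en-gb": "British English",
    "en-us": "American English",
    "fr-fr": "French",
}
DEFAULT_REGION = {"en": "en-us", "fr": "fr-fr"}


def pick_dictionary(lang_attr):
    """Return a dictionary name for an html lang value, or None if unknown."""
    tag = lang_attr.strip().lower().replace("_", "-")
    if tag in DICTIONARIES:
        return DICTIONARIES[tag]
    # No exact match: fall back to an assumed default region for the language.
    primary = tag.split("-")[0]
    fallback = DEFAULT_REGION.get(primary)
    return DICTIONARIES.get(fallback) if fallback else None


print(pick_dictionary("en-GB"))  # British English
print(pick_dictionary("en"))     # American English
print(pick_dictionary("de"))     # None
```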

Version 8.6

Released on 31st October 2024

Upgraded HTML Parser

When we do new releases (like this one) I sometimes go back to the dev team with questions about what a new function does, and this one here is a case in point. I have known that Gareth and James have been working on this for a few months, and they think it's pretty cool, but often the way the changes are described means very little to my ignorant little brain. Sitebulb now has a new HTML parser, and this now means that instead of taking 100ms to analyse a page, it only takes 5ms.

Erm...great?

The thing I was keen to explore with them, however, is what does this actually mean for normal Sitebulb users? Will they even notice it?

And it turns out that what this actually means is that Sitebulb is much more efficient in its use of resources - you can use less to achieve more. In practical terms this means:

  • Sitebulb can crawl faster, given the same resource allocation
  • Sitebulb will now run better on shittier machines
  • Features that would traditionally hog resources will no longer do so (like Spellings)

Let's dig into the top one a little more. Most of you are probably familiar with how you can control or adjust the speed of Sitebulb's crawler via the Crawler Settings:

Speed Settings in Sitebulb

In simple terms, you can make Sitebulb crawl faster by increasing the number of threads it is able to use - this is the primary resource Sitebulb uses for crawling, and when using the Chrome Crawler this is equivalent to 'Instances of Chrome.'

Most of the time, folks keep the tickbox 'Limit URL Speed' ticked - if you untick this, Sitebulb will make full use of the threads allocated to it and just crawl as fast as it is able.

But yeah, mostly, people impose a limit to slow the crawler down, lest they accidentally make their client's webserver fall over.

However, what this can mean a lot of the time is that although 5 URLs/second is the maximum limit, the crawler will actually sit there at 2 or 3 URLs/second - taking significantly longer than anyone would like. The only way you can impact this further is to increase the number of threads.

With the new update, you need fewer threads to achieve the same crawl limit.

This means it is a lot more likely that Sitebulb can crawl closer to the limit you have set, with no change in the amount of resources allocated to crawling.

Note that there are lots of aspects that impact crawl speed which are outside of Sitebulb's control, like if there's a crazy amount of code on the page, or the speed of your internet connection, or how fast the webserver responds with content following a request (TTFB). What has happened in this update is that the bit Sitebulb is responsible for is now significantly slicker.

I will contextualise this a little bit with an example:

  • Customer had raised a support ticket with us that his crawl was very slow.
  • He was using 8 instances of Chrome, but could only crawl at a maximum of 0.4 URLs/second.
  • Pages on this website have circa 22,000 lines of code, with over a million characters (so parsing this much HTML was taking a long time).
  • In the new version, this site will now crawl at 1.4 URLs/second.

This is a speed increase of 250%.
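For anyone who wants to check the sums, the 250% figure comes from comparing the gain (1.0 URLs/second) against the starting speed (0.4 URLs/second). A quick sketch, using only the two numbers from the example above:

```python
def percent_increase(before, after):
    """Percentage increase from `before` to `after`."""
    return (after - before) / before * 100


old_speed = 0.4  # URLs/second before the update
new_speed = 1.4  # URLs/second after the update

print(round(percent_increase(old_speed, new_speed)))  # 250
print(round(new_speed / old_speed, 1))                # 3.5
```

Put another way, the crawl runs at 3.5 times the old speed, which is the same thing as a 250% increase.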

So if I go back to my original question: 'what does this actually mean for normal Sitebulb users? Will they even notice it?'

And the answer is, of course, in line with all the best answers in SEO, 'it depends.'

If you're paying attention to this sort of thing, you'll probably notice audits finishing quicker. You may see that Sitebulb can crawl faster on sites it would traditionally struggle with, and you may find your laptop is much slower at frying your eggs.

But if Sitebulb was already plenty quick enough for you before, my guess is you probably won't notice ¯\_(ツ)_/¯

New option to parse tables

Gareth has long been of the opinion that 'the internet is broken', and spends most of his life bemoaning the wretched husks of websites that Sitebulb is required to crawl, with broken, gnarled HTML that is further away from W3 compliance than Twitter is from being a place that anyone wants to spend even a second of their lives. 

James has been working with Gareth on the dev team for less than a year, and so seems much less brutalized by the HTML wildlands that Sitebulb swims in. But I think we're now seeing the seeds of contempt starting to grow.

Let me present Exhibit A, a comment James left on a recent update ticket:

"This website has an old-fashioned design, where the layout is table based. Tables are for data, not layout, so Sitebulb does not parse the HTML in tables.

I’ve added an option (that defaults to unchecked) to enable table parsing. When this is enabled, the content and links are parsed and crawled."

While we can applaud his restraint for 'old-fashioned', it is impossible to ignore 'Tables are for data, not layout', positively dripping with passive aggression.

As a result of his work, we now have an additional option to 'Parse tables' in the crawler settings, if you ever find yourself dealing with an *ahem* old-fashioned website...

Parse Tables 
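As a rough illustration of what an opt-in 'Parse tables' setting might look like under the hood, the sketch below uses Python's standard `html.parser` to collect links and skips anything inside a table unless told otherwise. The class and the `parse_tables` flag are hypothetical names for this example, not Sitebulb's implementation.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect href values, ignoring links inside <table> unless enabled."""

    def __init__(self, parse_tables=False):
        super().__init__()
        self.parse_tables = parse_tables  # assumed to default to off
        self.table_depth = 0
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1
        elif tag == "a" and (self.parse_tables or self.table_depth == 0):
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1


def extract_links(html, parse_tables=False):
    parser = LinkExtractor(parse_tables)
    parser.feed(html)
    parser.close()
    return parser.links


PAGE = (
    '<a href="/home">Home</a>'
    '<table><tr><td><a href="/about">About</a></td></tr></table>'
)

print(extract_links(PAGE))                     # ['/home']
print(extract_links(PAGE, parse_tables=True))  # ['/home', '/about']
```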

New On Page Hint: Broken or invalid HTML

Despite the introduction of our sexy new HTML parser, and despite all of our other efforts to tackle putrid HTML (see above), every so often you will find a page that is so fundamentally fucked that there is just nothing Sitebulb can do with it.

Think of these pages like those bizarre people you encounter every so often who are impossible to reason with, despite enormous gaping holes in their logic. Flat earthers, for example. Or those Donald Trump supporters who think the government control the weather.

In the world of HTML, we're talking about pages that exhibit this sort of thing:

  • no <head>
  • no <body>
  • <body> before <head>
  • <body> outside of <html>

Please note that this is not to do with W3 HTML validation. Browsers and search engine parsers have become pretty good at fixing most shitty HTML (see: 'The internet is broken'), and Sitebulb is the same. It is about really extreme cases, where the validation issues are so severe that Sitebulb's HTML parser was unable to build a meaningful HTML document from it.
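To give a feel for the four symptoms listed above, here is a deliberately naive sketch that scans the literal tags in a document with Python's standard `html.parser`. It is an assumption-laden toy: real HTML allows some of these tags to be omitted, and Sitebulb's check works on whether its own parser can build a document at all, so treat this as illustration only.

```python
from html.parser import HTMLParser


class StructureScanner(HTMLParser):
    """Record where <head> and <body> open relative to <html>."""

    def __init__(self):
        super().__init__()
        self.seen = []  # "head"/"body" in the order they open
        self.html_open = False
        self.body_outside_html = False

    def handle_starttag(self, tag, attrs):
        if tag == "html":
            self.html_open = True
        elif tag in ("head", "body"):
            self.seen.append(tag)
            if tag == "body" and not self.html_open:
                self.body_outside_html = True

    def handle_endtag(self, tag):
        if tag == "html":
            self.html_open = False


def structural_problems(html):
    scanner = StructureScanner()
    scanner.feed(html)
    scanner.close()
    seen = scanner.seen
    problems = []
    if "head" not in seen:
        problems.append("no <head>")
    if "body" not in seen:
        problems.append("no <body>")
    if "head" in seen and "body" in seen and seen.index("body") < seen.index("head"):
        problems.append("<body> before <head>")
    if scanner.body_outside_html:
        problems.append("<body> outside of <html>")
    return problems


print(structural_problems("<html><head></head><body></body></html>"))  # []
print(structural_problems("<html><body></body><head></head></html>"))  # ['<body> before <head>']
```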

The point of course being that if Sitebulb thinks it looks totally fucked, then visitors and search engines probably will as well.

It's a Critical Hint, so ignore it at your peril.

Structured data: report broken JSON

Sitebulb has far and away the best structured data reporting of any crawler tool on the market (even the big fancy ones that you can only afford by selling your own bodily organs on the black market). It's bloody brilliant at telling you what markup errors you have, and aggregating them for easy analysis. However one thing it did not tell you previously was when the JSON itself was broken.

Now it is very clearly flagged in the aggregated issues list:

JSON Script error

And in the URL Details view for a single page:

JSON Script Broken
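If you want to sanity check a page for broken JSON-LD yourself, the idea is simple enough: pull out each `application/ld+json` script and see whether it parses. The sketch below does that with the Python standard library; the class and function names are made up for this example and it says nothing about how Sitebulb does it internally.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Gather the raw text of every JSON-LD script block."""

    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self.in_jsonld = True
            self.blocks.append("")

    def handle_data(self, data):
        if self.in_jsonld:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_jsonld = False


def broken_jsonld(html):
    """Return (block index, parser message) for each block that fails to parse."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    errors = []
    for index, raw in enumerate(collector.blocks):
        try:
            json.loads(raw)
        except json.JSONDecodeError as exc:
            errors.append((index, exc.msg))
    return errors


GOOD = '<script type="application/ld+json">{"@type": "Product"}</script>'
BAD = '<script type="application/ld+json">{"@type": "Product",}</script>'  # trailing comma

print(broken_jsonld(GOOD))       # []
print(len(broken_jsonld(BAD)))   # 1
```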

No longer checking social domains

One of the big changes we have seen over the last few years is the prevalence of 'anti-scrape' technology on websites that intentionally block automated crawl software like Sitebulb. This can be solved on websites you own (or if you are working with the owners) as you can request whitelisting of an IP address/user-agent/custom header.

This is not so straightforward when it comes to external links, of which most websites contain a sizable bunch. Sitebulb does not actually technically 'crawl' these pages - it doesn't download and parse the HTML - but it does fire off a HTTP request to check the status. Since you have no control over these websites, they can't be whitelisted, which will often lead to plenty of external links being marked as 'forbidden' or some sort of crawl error.

We'd noticed this happen to an extreme extent on social networks, where this sort of thing would be the norm:

Social Links Forbidden

Endlessly pinging these websites is a pointless waste of time, and gives you no useful data you can do anything with. So now, Sitebulb will no longer check the HTTP status on any social network link.
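In code terms, skipping the status check boils down to matching a link's hostname against a list of known social domains. The sketch below is an assumption from top to bottom: the domain list is invented for the example (this release does not say which networks are on Sitebulb's list), as are the function names.

```python
from urllib.parse import urlsplit

# Hypothetical list, for illustration only.
SOCIAL_DOMAINS = {"twitter.com", "x.com", "facebook.com", "instagram.com", "linkedin.com"}


def is_social_link(url):
    """True if the URL's host is a listed social domain or a subdomain of one."""
    host = urlsplit(url).hostname or ""
    return any(host == domain or host.endswith("." + domain) for domain in SOCIAL_DOMAINS)


def should_check_status(url):
    """Only fire off an HTTP status request for non-social external links."""
    return not is_social_link(url)


print(should_check_status("https://www.linkedin.com/company/sitebulb"))  # False
print(should_check_status("https://example.org/some-page"))              # True
```

Matching on the full hostname (rather than a plain substring) avoids false hits on unrelated domains that merely end with the same letters.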

The important thing here is to realize that when you re-audit a project it will look like Sitebulb has found way fewer URLs and that something catastrophic has happened:

External URLs Dropped

Fear not, gentlefriends, we have simply stopped expecting anything other than an obnoxious rebuttal from Mr Musk, and all those other power-crazy arseholes who want to control how we think.

Bug Fixes

  • We have very unfortunately needed to remove a small feature from Sitebulb - the ability to press a button and export a chart in different formats (PNG, JPEG, etc...). If you have tried this recently you will have seen a message like 'Sorry, you have been blocked' which would take over the app and force you to quit it. The problem unfortunately is that since we have upgraded Electron (which powers Sitebulb's UI) to the latest version, it no longer supports exports - so our only solution is to remove that export option altogether. This sucks, and we're sorry to have to take something away that folks were used to using. As a replacement, I would suggest a snipping tool such as Techsmith Capture, Snagit or ScreenRec.
  • On one particular site that was reported to us, Sitebulb's old HTML parser was struggling to parse the <head>, which meant that it was breaking when being parsed. This would trigger several Hints that were inaccurate: 'Canonical outside of head', 'missing hreflang' and 'meta description outside of head' - essentially, all the things that Sitebulb would have found in the <head>, had it not been broken. Happily, all these issues disappeared when we moved to the new and improved parser (see above).
  • The Hint 'Canonical points to homepage' was also including the homepage itself in the list of URLs affected. I mean, sure, a self-referencing canonical on the homepage will indeed point to the homepage. But it's not exactly a fucking problem is it?
  • Sitebulb was giving a count of URLs for the Hint: 'URL is orphaned', but then when you clicked through to try and look at the URLs, they would not be there. This was not some sort of Schrödinger's Cat type thought experiment designed to tease the mind - we just fucked it up.
  • There was a very weird problem where certain bookmark URLs were being reported as 'found in XML sitemaps', even when they weren't actually present in the sitemap.
  • On the Audit overview we have a 'HTML URL Sources' chart, with green/orange/red columns to indicate the presence or absence of URLs found for a particular source. If you clicked the orange bars, they would lead to empty URL lists, whereas the green and red bars worked as advertised. It turns out that Sitebulb is actually orange-blue colourblind, and was trying to show the column that corresponds to 'blue', which of course does not exist. As a quick fix we told Sitebulb to just look for 'blue' instead, and now it happily returns the values for orange.
  • We fixed the capitalization and spacing of the filter dropdown options for 'Crawl Source' (used to look like this, e.g. 'crawlerViaGoogleAnalytics'). The content team actually pushed this change through, arguing vehemently that Capitalization and correctspacing is important.
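For the curious, turning a camelCase identifier like the one in the last bullet into something readable can be done with a single regular expression. This is a generic sketch with a made-up function name; the exact wording Sitebulb now shows in the dropdown is not stated here, so the output below is just what this snippet produces.

```python
import re


def humanise(label):
    """Insert a space before each capital that follows a lowercase letter or digit."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", label)
    return spaced[:1].upper() + spaced[1:]


print(humanise("crawlerViaGoogleAnalytics"))  # Crawler Via Google Analytics
```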

Version 8.5

Released on 1st October 2024

Over the weekend we celebrated Sitebulb's 7th birthday - happy birthday Sitebulb! This is actually 15 and a half in software years.

Bug Fixes

  • Sitebulb was experiencing a level of cognitive dissonance when trying to comprehend settings that included both 'limit the crawl to X' and 'crawl all the URLs on my sitemaps please'. After reading up on psychology, Sitebulb has resolved the initial problem, but unfortunately has now started behaving with extreme hostility towards Gareth.
  • A customer reported that some pages on his site were showing as having minus 3 words. After looking into it, we realised it is not technically possible to have minus words - though we all acknowledged that some words are less good than other words. For example, we can all agree that 'dickbag' is indisputably a better word than 'mousemat.' But they both still count as a single word each, if counting the number of words is what we are supposed to be doing (which it is).
  • Sitebulb was not correctly handling the .page file extension (e.g. website.com/blog/blogpost.page). One might suggest that adding .page to the end of every single URL on the website is perhaps a tad unnecessary, but who are mere SEOs to question almighty developers?
  • Viewing unique links on a site with 30 million links was taking a long time to open up in the Link Explorer, apparently because Sitebulb was trying to count all the links one by one. Now it counts all the links at once. "246 total toothpicks".
  • If you added a list of terms in Content Search and accidentally left a line in that just contained blank space, Sitebulb would (accurately) tell you that shitloads of pages were identified with said blank space on them - when it would be much more helpful to just ignore the blank space, which is what it does now. When we questioned why Sitebulb did it in the first place, it flicked its hair back and pouted: 'Cause darling, I'm a nightmare dressed like a daydream'. Teenagers...
  • We have deprecated the tabnapping Hint from the Security audit, as this issue is now fixed in modern browsers.
  • In v8 we'd been experiencing issues on Sitebulb Cloud when trying to add or re-authorize Google accounts when connected to the cloud server through the desktop install, but we've resolved that issue now and it works as expected.
  • We realised that the URL Inspection API had become a lot slower than when we originally implemented it, which meant Sitebulb was not always returning the full set of results. We have slowed down the requests to ensure that Sitebulb stays under the rate limit, and also added an 'expected time' to the audit progress screen, just so that you know that it's the API that is slow and don't incorrectly assume that Sitebulb is just shit.
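The Content Search fix above (ignoring blank lines) amounts to a bit of input tidying before the search runs. Here is a minimal sketch of that tidy-up; the function name is hypothetical, and dropping case-insensitive duplicates is an extra assumption thrown in for good measure rather than something described in this release.

```python
def clean_terms(raw):
    """Turn a pasted block of search terms into a tidy list.

    Strips whitespace, drops blank lines, and (as an added assumption)
    removes case-insensitive duplicates while keeping the original order.
    """
    seen = set()
    terms = []
    for line in raw.splitlines():
        term = line.strip()
        key = term.casefold()
        if not term or key in seen:
            continue
        seen.add(key)
        terms.append(term)
    return terms


pasted = "pricing\n   \nPricing\ncontact\n"
print(clean_terms(pasted))  # ['pricing', 'contact']
```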

Version 8.4

Released on 5th September 2024

This update revolves around two big issues:

  1. Some Mac users were unable to open audits properly
  2. Stability issues running the Chrome Crawler on some websites

Bug Fixes

  • Some Mac users have been seeing the message 'Audit Failed and no URLs were crawled' when trying to view completed audits. Although we often jest that Mac users should upgrade to Windows, this was not an issue we took lightly, and we believe we finally have it cracked. If you have experienced anything like this issue, please upgrade ASAP.
  • Certain hints were showing a mismatch in values between the 'URL Count' and what actually displayed in the URL List. These issues were actually caused by the problems below, but I am flagging this separately as this is one of the things most obvious/visible in the audit results.
  • We've been experiencing a lot of random issues with the Chrome Crawler, that have caused certain audits to fail halfway through, freeze up, or not start at all. It hasn't been affecting all users or all websites, but for those that have it has been very frustrating. We have upgraded the version of Chrome we are using to 128.0.6613.119, and this seems to be completely stable now.
  • We also found that Sitebulb was tripping up on some sites that make heavy use of the ShadowDOM. Upon investigation we found that our implementation of hydrating the ShadowDOM was actually breaking the HTML DOM on some sites. This meant that the HTML was sometimes recorded as being empty, broken or incomplete.
  • We also had a number of issues occurring due to iframe rendering, and our Chrome Crawler was not always downloading the iframe src URI, which was causing the Chrome Crawler to become very slow and often hang or timeout completely. We found certain cases when the response HTML was being returned as the rendered HTML, which meant our hints were not correct. The underlying reasons that were causing the iframe issues (which we have now fixed) include the following:
    • User firewalls blocking URLs
    • Blocked by x-frame-options
    • Sitebulb being blocked by WAFs
    • The Chrome Crawler refusing to download the URL content, due to the URL being slow or erroring
  • Improved some of the metrics for image alt text and related hints.
  • Improved the 'Add dimensions to images' hint check to align with PageSpeed Insights.
  • Fixed issues with Accessibility hints to ensure they always show the correct 'Hint Details' data (we had an issue where the live rendering was not inheriting the same settings as the audit itself).
  • On Sitebulb Cloud, we fixed a few more issues we had seen with the Looker Studio Connector, and added the option to enable Looker Studio for all projects on cloud (in the Server Settings -> Looker Studio tab).
  • On Sitebulb Cloud, exporting Google Sheets data while connected to the cloud server via the desktop application was throwing an error.

Version 8.3

Released on 21st August 2024

Due to Patrick having a well earned week off with his young family, Sitebulb v8.2 was released last week (which introduced some new things and resolved a lot of niggling issues) but unfortunately led to a couple of follow-on issues that of course would never have been missed under the usual circumstances.

By the way, feel free to share on socialz if you do spot the film references and Shakespeare quotes in the v8.0 notes. We'll give you a retweet or a Mars bar or something.

Bug Fixes

  • In v8.2 we had an issue with Google Sheets, which was resolved, but inadvertently led to another Sheets issue - which meant that previous audit data was replaced with 0s, apparently from audits done on January 1st 2001. My spidey sense was tingling when first seeing this, as I was temporarily convinced it must be the Millennium Bug we've all been waiting multiple decades for... before realising it was a year too late for that. Some sort of ghost in the machine, instead.
  • The issue above also caused similar problems for the Looker Studio Connector (which is only available on Sitebulb Cloud), and some authorisation quirks which were equally annoying. This is all resolved now.
  • Fixed an issue where exporting content extraction results would take literally forever. Well not literally forever, but just a really really really long time.

Version 8.2 

Released on 16th August 2024

Due to Patrick taking his annual six-week summer holiday, the apprentice has been given the opportunity to knock up some release notes. 

There will be no references to films that no one has seen, football, or Shakespeare. None of Patrick’s short stories or poems. No bashing Gareth. And, you won’t be bored by the end (well, no promises).

Yes, we released version 8 just a few weeks ago… and here’s another bloody release! After version 8, we were all hoping the dev team could take a breather. Of course that didn’t happen. 

Fuelled by coffee and ADHD meds, the dev team has been chasing bugs non-stop but also somehow managed to bring you some new Hints and improved features. They JUST.CAN’T.STOP.

The big bug fix - attempting the Scandi flick (again)

After releasing version 8, we found a Mac operating system encoding issue that was affecting all Scandinavian users and some other non-English language users. This led to incorrect or missing audit data, and in some cases, the audit would run to the end but report that no URLs were crawled 😬.

The fix we implemented in 8.1 revealed new errors once it was out there. 

This is our second and hopefully final attempt at fixing this issue. Have we mastered the Scandinavian flick? (you can thank the car-obsessed dev team for the driving trick references - vroom vroom).

New and Better Things

We’re SO keen to get this update out there. The support team has yet to produce documentation for the new Hints and features below. But we promise it’s coming soon!

Two New Hints for Canonical Links

Our friends at Merj recently released a study that analyzes how extra HTML attributes in canonical tags impact search engines. The takeaway? Google ignores Canonical Link elements containing hreflang, lang, media, or type attributes. Gasp!

Well, you know it’s a good one because the dev team turned around, and in less than 24 hours, they added two new Hints: 

  • Canonical Link element contains invalid HTML attributes (Critical Issue) - flags up URLs with Canonical Link elements containing HTML attributes that will cause the canonical annotation to be ignored.
  • Canonical Link element has Superfluous HTML attributes (Potential Issue) - flags up URLs with Canonical Link elements containing anything other than rel=’canonical’ and href=’URL’ attributes.
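The logic behind the two Hints above can be sketched in a few lines: attributes that get the canonical ignored trump everything, and anything else beyond `rel` and `href` counts as superfluous. The function name and return values below are hypothetical, and this is a reading of the Hint descriptions rather than Sitebulb's real code.

```python
# Attributes that, per the study mentioned above, cause Google to ignore the canonical.
INVALID_ATTRIBUTES = {"hreflang", "lang", "media", "type"}
EXPECTED_ATTRIBUTES = {"rel", "href"}


def classify_canonical(attrs):
    """Classify a canonical link element by its attribute names.

    Returns "invalid" (Critical Issue), "superfluous" (Potential Issue),
    or None when only rel and href are present.
    """
    names = {name.lower() for name in attrs}
    if names & INVALID_ATTRIBUTES:
        return "invalid"
    if names - EXPECTED_ATTRIBUTES:
        return "superfluous"
    return None


clean = {"rel": "canonical", "href": "https://example.com/"}
print(classify_canonical(clean))                            # None
print(classify_canonical({**clean, "media": "all"}))        # invalid
print(classify_canonical({**clean, "data-source": "cms"}))  # superfluous
```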

Independent external link analysis and subdomain analysis settings

In our endeavour to make things as simple and straightforward as possible for you all, we sometimes make assumptions (and mistakes) about what SEOs want.

For the longest time, our External Link Analysis and Subdomains settings went hand-in-hand, assuming that you either wanted to analyze all external URLs (including subdomains) or none.

Then some of you pointed out that you may want to, say, crawl external links while excluding all subdomains, or crawl your shop subdomain while excluding every other external link.

So you now have the option of configuring External Link Analysis and Subdomains completely separately. 🎉

What you need to know: the default setting for subdomains is ‘Check HTTP Status’.

That means that when you disable ‘External Link Analysis’, subdomain status will still be checked unless you go into Subdomain Options and choose to ‘Exclude All’ subdomains as well.

Exclude all subdomains
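To spell out how the two settings interact, here is a toy decision function. It only models the two subdomain options named above ('Check HTTP Status' and 'Exclude All'), and the function name, option strings and return values are all assumptions made for this sketch.

```python
from urllib.parse import urlsplit


def link_action(url, site_host, external_analysis, subdomain_option="check_http_status"):
    """Decide what to do with a link, treating subdomains and externals separately.

    Hypothetical model: `site_host` is the audited host, `external_analysis`
    is the External Link Analysis toggle, and `subdomain_option` is either
    "check_http_status" (assumed default) or "exclude_all".
    """
    host = urlsplit(url).hostname or ""
    if host == site_host:
        return "crawl"  # internal URL
    if host.endswith("." + site_host):
        # Subdomains follow their own setting, regardless of the external toggle.
        return "skip" if subdomain_option == "exclude_all" else "check_status"
    return "check_status" if external_analysis else "skip"


# External analysis off, subdomain setting left at its default:
print(link_action("https://shop.example.com/cart", "example.com", False))  # check_status
print(link_action("https://other.org/", "example.com", False))             # skip

# Both switched off:
print(link_action("https://shop.example.com/cart", "example.com", False, "exclude_all"))  # skip
```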

Cleaned up the Include and Exclude URLs setup tab

The Exclude URLs tab was getting too long, and confusing. So we have renamed it and split the page up into tabs to make it easier and more intuitive to set up your include and exclude rules.

External Domain Exclusions at the audit level 

As you couldn’t do this at an audit level, users were doing it within Sitebulb’s global settings. These exclusions were then applied to all Projects, which wasn’t something everyone wanted and made it easier to make mistakes.

You can now also configure external link exclusions at the audit settings level in the new and improved Include and Exclude URLs setup tab.

What you need to know: There are two boxes in this tab, one for Links and one for Resources. Add your domains list to the corresponding box depending on what you want to exclude. Add the domains to both boxes to exclude both links and resources from the specified external domains in your audit. 

Include and Exclude URLs tab

Bug Fixes

  • Geoff Kennedy reported the ‘Multiple, mismatched canonical tags' Hint triggering across a majority of the URLs on one site. Sitebulb was getting confused when the website used both HTTP header and canonical link tags, even when they were the same link.
  • When an XML Sitemap Index was added as a URL crawl source, Sitebulb was reporting 0 URLs in sitemaps and in some cases, returning a 403. Individual XML Sitemap files were working as expected.
  • For a brief moment in time, the automatic Sheets Exports feature was not working as expected. You could still send Historical Audit data to Google Sheets manually. Thanks to Dave Turney for reporting this bug.
  • For some VPN and Proxy users, Chrome would fail to download, leaving Sitebulb stuck on the ‘Downloading and installing the latest Chrome Crawler’ tab.

For the bug fixes that may have impacted audit data, you will see the benefits once you run a new audit within existing Projects in version 8.2.

That’s it, folks. Fingers crossed that I don’t get fired once Patrick finds out I have touched his precious release notes. For those of you missing him right now, don’t panic! He’ll be back for the next round of regularly scheduled bug fixes and feature announcements.

I'll see you all in the support inbox! 

Miruna

Version 8.1

Released on 6th August 2024

You knew it was coming. The #bugfixupdate following the big update. Sigh.

Bugs

  • It turns out our software testing protocols need updating. Next time around, we need to fly to Norway and borrow a Mac from a Norwegian fella who has his Mac set up (presumably) like most Norwegian fellas do, with 'Norway' set as the region and Norwegian set as the language. We then need to install Sitebulb on said Norwegianly-configured Mac and make sure the fucking thing actually works.
  • As above, but also for Sweden, Denmark, Finland and Poland...
  • The Chrome Crawler was occasionally failing to start up properly. We only saw 2 reports of this in our error logs, but that's 2 too many! We've now made Chrome more resilient, to avoid this happening to 3 people next time.
  • Sitebulb was incorrectly classifying certain URLs as redirecting back to themselves. The dev notes for the fix on this ticket say: 'the location header was not lowercase when it should have been.' Even one such as I would not make so bAsic an error.
  • When trying to change settings through 'Start and configure audit,' only the desktop user-agents were showing in the user-agent dropdown, even for mobile projects. Not gonna lie this one is a bit embarrassing.
  • When certain URLs had been disallowed by robots.txt, Sitebulb was reporting an incorrect robots directive as the directive that was responsible for the disallowment (disallowance? Who knows).
  • Sitebulb was not showing accessibility violations for the Hint: 'Form <input> elements must have labels.'
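
For the robots.txt bullet above, here is a rough sketch of how the 'responsible' directive can be worked out. It assumes the longest-match rule described in RFC 9309 (the most specific matching pattern wins, and 'allow' wins a tie). It is written in Python purely for illustration, the function names are made up, and it is not a claim about how Sitebulb does it internally.

```python
import re

def pattern_to_regex(pattern):
    # Hypothetical helper: '*' matches any run of characters,
    # and a trailing '$' anchors the pattern to the end of the path.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile("^" + regex + ("$" if anchored else ""))

def responsible_rule(rules, path):
    """Return the (directive, pattern) pair that decides `path`, or None.

    `rules` is a list of ("allow" | "disallow", pattern) tuples for the
    user-agent group that applies; picking that group is out of scope here.
    """
    best = None
    for directive, pattern in rules:
        # Empty patterns match nothing; skip anything that does not match.
        if not pattern or not pattern_to_regex(pattern).match(path):
            continue
        longer = best is None or len(pattern) > len(best[1])
        tie_won_by_allow = (
            best is not None
            and len(pattern) == len(best[1])
            and directive == "allow"
        )
        if longer or tie_won_by_allow:
            best = (directive, pattern)
    return best

rules = [
    ("disallow", "/"),
    ("allow", "/blog/"),
    ("disallow", "/blog/drafts/"),
    ("disallow", "/*.pdf$"),
]
print(responsible_rule(rules, "/blog/drafts/post"))
print(responsible_rule(rules, "/blog/post"))
```

The point is that the first matching line in the file is not necessarily the one doing the blocking: for '/blog/drafts/post' all of '/', '/blog/' and '/blog/drafts/' match, and the longest one is the directive to report.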

Version 8.0

Released on 31st July 2024

Note that this update contains big infrastructure changes: BEFORE you upgrade to v8 please make sure to let any currently running audits finish (DO NOT pause -> upgrade -> resume)

I have an apology to make.

My original thesis, when writing our release notes, was to avoid the interminable updates you see from other tool providers, which typically go like this: 'Bug fixes and performance improvements.'

Today, alas, I have failed you all.

Version 8 Launched

Codename: Bug Fixes & Performance Improvements

Gareth and I made a commitment towards the end of 2023 that we wanted to grow the Sitebulb team.

We hired, in chronological order:

  • Kevin Letchford, to build all the cool things on our website
  • James Crawshaw, to work in the dev team alongside Gareth
  • Jojo Furnival, to build our content marketing machine
  • Miruna Hadu, to revolutionize our customer support and success process

Since they've come on board, we've managed to launch or improve tons of cool things... but none of these have been to do with the product.

Version 8 has been hanging over us all for the entire year. Product planning meetings have started and ended with 'Finish v8'. It has been '99% done' for at least two months, as we continually find more annoying little things that we can't ignore. The other day Gareth compared it to painting the Forth bridge, and if we set aside the fact that they have actually finished painting this now, I'm sure you get the idea.

And the payoff for all this hard work? Remarkably little.

You, dear customer, will not really notice any difference. We haven't got any fancy whozits or whatzits to show you... it pretty much looks the same.

That has been entirely the point though. Sitebulb software is built in .NET, and we have been running version 4.8 of the framework pretty much since launch. The job that Gareth and James have just completed was an upgrade to version 8. This is a move from '.NET Framework' to '.NET Core', which is like moving from a standard definition TV to one with UHD - it can just do so much more.

We'll be able to do a shitload more new cool stuff in the future now we are on .NET Core, but we still needed Sitebulb to be able to do all the stuff it's been doing incredibly well up to this point.

So that's what we've been doing: making Sitebulb exactly the same as it was before.

It may look the same, but the architecture under the hood has completely changed. One benefit of this change is that we can now produce an 'ARM' installer for Mac, that runs on M1/M2/M3 architecture natively. If you're unlucky enough to be a Mac user, this will probably mean something to you (to be fair, on my M1 MacBook Air that I am forced to have for testing, it is hella fast).

Check the new installers on the left-hand side, and please for the love of God choose the right one.

By the way, one of the side effects of this complete and utter rebuild is that once you install version 8, the old shortcut in your dock will stop working, and will just show a rather confused looking question mark:

Sitebulb Docked on Mac

Just delete this and forge forwards with the new shortcut. Alternatively, splash out on a new Windows machine and you can avoid this sort of hassle.

In general, Sitebulb is more rapid, the UI is snappier and it is smoother and quicker to use. The crawler is more efficient, so it can crawl more quickly if you let it (i.e. increase or lift the crawl speed limits). 

Version numbers added to filenames

I mean I warned you at the beginning, I've not got a lot to work with here in terms of features.

Downloading Sitebulb files no longer results in this bullshit:

Never ending Sitebulb installers

I mean this is a fucking win, right?

Upgraded Accessibility checks to support WCAG 2.2 

We've updated our Accessibility checks to include support for the latest standard, WCAG 2.2.

WCAG 2.2

Although it doesn't help with search engine optimization, accessibility is becoming increasingly important from both a usability and compliance perspective. From October 2024, all UK Government public sector websites and mobile apps must meet the WCAG 2.2 AA guidelines.

Sitebulb has you covered up to WCAG 2.2 Level AAA.

Custom headers added to domain check

It feels a bit like custom headers are the new hotness. We've supported them for years and I never really encountered folks using them much, but they seem to come up all the time now.

We have added custom header fields now on the 'New Project' page and throughout the pre-audit, and also as optional config fields on the Single Page Analysis.

Custom Headers

Previously you were only able to add them on the audit setup page, which in some cases was too late.

Show full user-agent string in audit details

Sometimes we make improvements to the software because we think 'this will make the software better.'

And sometimes we do them just because customers whinge at us for long enough about it.

I'll leave it up to you to decide which inspired this change.

Full User Agent String 

Removed Universal Google Analytics

I try to say goodbye and I choke.

Much as it pains me to say it, Universal GA is gone, we all need to move on. Sitebulb is no different.

It's over.

We completely removed it from the codebase, lit it with a match and dropped it in a bucket, alongside all the letters, polaroids and old movie stubs.

If you've got any projects that still use Universal Analytics (e.g. you had a year's grace because you pay the big bucks for Analytics 360), you'll see this message, and you'll need to update the project settings to use GA4.

Universal Analytics

I try to walk away and I stumble.

Bug Fixes

  • When you would crawl a URL List and no other sources, Sitebulb would say 'OMG those poor pages are orphans!' I mean, WTAF, Sitebulb? They are literally just lining up differently, it doesn't mean that they don't have parents. It just means that Sitebulb didn't crawl the parents, because it didn't actually crawl anything - it just worked through the list. We are no longer calling them orphan pages now. We call them bastard pages instead.
  • Kristine Schachinger spotted an issue with how Sitebulb was reporting certain subdomains that was so flat out EMBARRASSING that I was adamant it could not be true, until I tested it for myself... Sitebulb was claiming that 'xyzexample.com' was a subdomain of 'example.com'. I told you it was embarrassing.
  • Bookmark links were being counted and shown as internal links, which is I suppose sort of accurate in one sense, but entirely not what people mean when they talk about internal links in terms of SEO, so we needed to fix it. Berian Reed spotted this one, as these links were being reported on orphan pages (I mean proper orphan pages this time... not bastard pages).
  • Berian ALSO spotted an issue where doing advanced filters on the URL Explorer and exporting to an XML Sitemap would not apply the filter (i.e. URLs that should have been filtered out were still there). We decided to look past the fact that manually building XML Sitemaps in 2024 is absolutely batshit, and instead just fixed the issue. Let it be noted on the record that we can, on occasion, be the bigger person.
  • Code Coverage was not working properly on Webflow websites. Of course this bug was reported by Arnout, who is worryingly obsessed with this feature (read him banging on about it again here).
  • Mark Williams-Cook managed to find a site where Sitebulb would literally download files as it crawled the website. We were baffled initially, until Gareth spent a week digging himself, and then promptly falling down, a very large rabbit hole. Apparently Sitebulb was encountering URLs that looked like HTML, but were actually download files (e.g. PDF), where the server was sending back a 'Content-Disposition' HTTP header. We'd not seen this before, and perhaps won't again, but it will no longer fill up your 'Downloads' folder Mark - good spot!
  • There was a weird bug where Sitebulb would not let you add/remove columns properly in the keywords report - when you added or removed a column, all the other columns would disappear! I checked with Gareth, but he told me it wasn't supposed to do this.
  • We added the 'logging' tab back into global settings. We had it previously, then thought we didn't need it in v7.5 so took it out. Turns out we were completely wrong about this decision, so we have added it back in now. Move along, nothing to see here.
  • It is regularly pointed out to me that since my job these days is mostly just being obnoxious to customers and laughing at my own jokes, I may have wasted 4 years of my life studying for a maths degree. On the contrary, it is thanks to this training that I was able to recognize an issue where a hint was firing for 161% of URLs (it is not possible for this to be more than 100%). The inconvenient truth that a user actually spotted the issue does not fit my narrative, so we should not dwell on that aspect.
  • On certain webpages, JavaScript would not render correctly when page resources were disabled. This disturbed our users when looking at the Response vs Render report, which would light up like a Christmas tree (if said Christmas tree was covered in red decorations and not really any others). The h1 in the rendered HTML would read 'Something went wrong', reflecting the fact that something had gone wrong. Now that we've resolved the issue, something has not gone wrong.
  • The user-agent was not always saving correctly, for a few user-agents, which was really bloody annoying.
  • Sitebulb would not flag ANY hreflang on a website if hreflang annotations were missing self-referencing annotations. Since self-referencing annotations are a recommendation rather than a requirement, I personally found it rather surprising that we had not discovered this issue sooner.
  • Sitebulb was not showing the correct number of internal links on the URL Details page. In one example, the total links were listed as 2501, yet the page was only showing 10 links. Once again I used my ninja maths skills to figure out that this was wrong.
  • Our word counts were the wrong way around; total words, template words and content words did not add up correctly. Following a thorough investigation, I realized it should have been total words = template words + content words. I know which phrase you are thinking of, and that it rhymes with 'ginger baths' (Note: if you don't get this, it's probably not because you are thick, and probably just because you are not northern enough).
  • Sitebulb Cloud users only: Intercom now recognizes the logged in user. Previously it would assign all conversations to the account owner, who for the sake of argument, is called Caroline. In said argument, the chat would go as follows:
    Caroline: I have a problem with Sitebulb, can I have some help?
    Sitebulb Support: Of course Caroline, what seems to be up?
    Caroline: My name is not actually Caroline
    Sitebulb Support: Erm... right. I don't know if this is a problem we can help you with. Deed poll?
    Caroline: Caroline has left the chat
  • The Performance Hint: 'Add dimensions to images' was misbehaving; it was refusing to show the blasted image URLs when you clicked on the URL Details. Instead, it would show a table with headings and precisely zero rows of data (which is not very fucking useful).
  • The Performance Hint: 'Defer Offscreen Images' was apparently misbehaving, at least when compared to PageSpeed Insights. Turns out the crafty buggers at PSI had updated the code used to run the rule and not bloody told anyone - they were ignoring wastage for lazy and 'eager' loading, which we weren't doing.
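
For anyone wondering how 'xyzexample.com' can end up looking like a subdomain of 'example.com' (the subdomain bullet above): the usual culprit is a plain 'ends with' comparison on the hostname. That is an assumption about the cause, not something confirmed in the ticket. Below is a minimal sketch of a check that respects the dot boundary, in Python for illustration only; is_subdomain is a hypothetical helper and has nothing to do with Sitebulb's actual code.

```python
def naive_is_subdomain(host, domain):
    # The tempting version: any hostname that merely ends with the
    # domain string passes, including unrelated domains.
    return host.lower().endswith(domain.lower())

def is_subdomain(host, domain):
    # Hostnames are case-insensitive and may carry a trailing dot.
    host = host.lower().rstrip(".")
    domain = domain.lower().rstrip(".")
    # Require a '.' immediately before the parent domain, and do not
    # count the domain as a subdomain of itself.
    return host != domain and host.endswith("." + domain)

for candidate in ("blog.example.com", "xyzexample.com", "example.com"):
    print(candidate, naive_is_subdomain(candidate, "example.com"),
          is_subdomain(candidate, "example.com"))
```

The naive version says yes to all three hostnames; the boundary-aware one only accepts 'blog.example.com'.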

So there you have it folks, bug fixes and performance improvements, with nary a feature in sight.

And if perchance I have offended
Think but this and all is mended:
You'd as well be five minutes back in time,
For all the chance you'd miss this line

