
Sitebulb Release Rants

Transparent and sweary Release Notes for every Sitebulb update. Critically acclaimed by some people on Twitter. Written by CEO Patrick Hathaway.

Reader discretion is advised. 


Version 9.2

Released on 3rd December 2025

New Feature: Lost and Found

Excited to bring you a new feature today that really lifts the lid on granular changes between crawls, which we call 'Lost and Found' - it's available on all plans for both desktop and cloud.

In our house, 'Lost and Found' was a book we'd read to the kids when they were (ahem: quite a bit...) younger, about a boy who found a penguin and then took him all the way back to the south pole, only to then realise he was sad without his friend.

Our incarnation of Lost and Found is almost identical to this heartwarming tale, except it replaces the lost penguin with Lost URLs, and the found penguin with Found URLs. It's really profound if you think about it.

In line with the feature launch, we have a special feature spotlight webinar running tomorrow (Thursday 4th December) with my awesome colleague Miruna Hadu - sign up here

Lost & Found Spotlight

Lost and Found is an evolution of the crawl-on-crawl change history that has been present in Sitebulb for years. Where change history only gives you the aggregate numbers, Lost and Found allows you to actually explore what has changed.

I'll explain with an example, and you can see the new functionality in action. Since it is a comparative metric, Lost and Found only appears once you have at least two audits within a project, just like the current change history. You'll first see it in the Audit Overview.

Audit Overview - Lost and Found

As you can see, Sitebulb found a total of 974 more pages on this website than when the crawl was last run. You may naturally assume that this means 974 more pages have been added to the website since then, but that isn't close to the full picture.

If you look at the bottom right of the image above, you can see that the total number of URLs found in this crawl is 4,837. The table breaks the changes down as follows:

  • 3,855 'Remained' - these are URLs that were present in the last crawl, and were found again this time around

  • 982 'Found' - these are new URLs that were not found in the last crawl

  • 8 'Lost' - these are URLs that were found in the last crawl but are no longer present on the website (so they are not part of the 4,837)

So the change of 974 is the net result of the 982 newly discovered URLs minus the 8 that no longer exist (982 - 8 = 974).
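If it helps to see that arithmetic as set logic, here is a minimal sketch. It is not Sitebulb's actual implementation; the `diff_crawls` function and the example.com URLs are made up for illustration, sized to match the numbers above.

```python
def diff_crawls(previous: set[str], current: set[str]) -> dict[str, set[str]]:
    """Split URLs into Remained / Found / Lost by comparing two crawls."""
    return {
        "remained": previous & current,  # present in both crawls
        "found": current - previous,     # only in the current crawl
        "lost": previous - current,      # only in the previous crawl
    }


# Hypothetical URL sets, sized to match the example in the text.
remained = {f"https://example.com/page-{i}" for i in range(3855)}
found = {f"https://example.com/new-{i}" for i in range(982)}
lost = {f"https://example.com/old-{i}" for i in range(8)}

previous_crawl = remained | lost
current_crawl = remained | found

buckets = diff_crawls(previous_crawl, current_crawl)
net_change = len(buckets["found"]) - len(buckets["lost"])
total_now = len(current_crawl)

print(net_change, total_now)  # 974 4837
```

The headline number only tells you the difference in size between the two crawls; the three buckets tell you which URLs actually moved.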

As always with Sitebulb data, if you click the linked numbers in the table then this will open up a corresponding list of URLs:

Found URLs

What is awesome about this functionality is that it exists natively in the app alongside existing audit data - you don't need to go off to a compare feature or separate reports to find it, and it isn't something you need to 'switch on' or enable.

But there's more.

Lost and Found at an audit level is useful to determine flux on a website, but it doesn't help you understand changes in underlying technical issues that affect the website. That's why we also embedded Lost and Found into the Hint system.

Navigate to the Hints section of any report, and you'll see a new tab for Lost and Found:

Indexability Hints - Lost and Found

This shows you the granular changes from one audit to another at the Hint level, which allows you to more accurately understand what is happening on the website. Did you fix one issue but introduce another, for instance?

When it comes to Lost and Found at the Hint level, we also have two new definitions: 'Regressed' and 'Fixed'. 'Regressed' basically means that this is not a new URL, it just got worse! It now exhibits the problem when it did not on the previous crawl (AKA you done fucked something up).

'Fixed' is the opposite - it also is not a new URL, but this time it got better, and no longer exhibits the problem (Yes, you do deserve that raise).

Remained/Lost/Found essentially work in the same way they do at the audit level, but specifically in relation to triggering the Hint.

So the definitions for these are as follows:

  • Remained: URLs trigger the Hint in both the previous and current audit.

  • Found: URLs trigger the Hint in the current audit, but were not present in the previous audit.

  • Lost: URLs that are not present in the current audit, but were found and triggered the Hint in the previous audit.

  • Regressed: URLs that trigger the Hint in the current audit, but were found and did not trigger the Hint in the previous audit.

  • Fixed: URLs that previously triggered the Hint. They were found in this audit, but did not trigger the Hint.
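The five definitions above boil down to a small decision table. The sketch below is one way to express it, under the assumption that each URL has one of three states per audit (triggered the Hint, crawled but did not trigger it, or not present at all). The `classify` function and the example paths are hypothetical, not Sitebulb internals.

```python
from typing import Optional


def classify(prev: Optional[bool], curr: Optional[bool]) -> Optional[str]:
    """Classify one URL for one Hint.

    prev / curr: True  = URL was crawled and triggered the Hint
                 False = URL was crawled and did not trigger the Hint
                 None  = URL was not present in that audit
    """
    if curr is True:
        if prev is True:
            return "Remained"
        if prev is False:
            return "Regressed"
        return "Found"  # URL was not present in the previous audit
    if prev is True:
        # Triggered before; either the URL is gone, or it got better.
        return "Lost" if curr is None else "Fixed"
    return None  # the Hint did not apply to this URL in either audit


# Hypothetical Hint results for two audits of the same project.
previous = {"/a": True, "/b": False, "/c": True, "/d": True}
current = {"/a": True, "/b": True, "/c": False, "/e": True}

results = {
    url: classify(previous.get(url), current.get(url))
    for url in sorted(previous.keys() | current.keys())
}
print(results)
# {'/a': 'Remained', '/b': 'Regressed', '/c': 'Fixed', '/d': 'Lost', '/e': 'Found'}
```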

It's a little nuanced, but does make sense when you think about it. 

What this gives you is a laser-focus when digging into issues you already know about from the last time you crawled the site. You can explore things like 'which pages now trigger the Hint that didn't before?' (AKA 'Regressed'):

Regressed Hints

This removes a layer of opaque-ness (is that a word?) that otherwise exists within an aggregated metric like 'Change history', and gives you the tools to really understand what's going on with your website.

To learn more about the functionality, check out our documentation - and don't forget to sign up for the training webinar!

Bug Fixes

  • The 'Re-audit failed URLs' button had disappeared on the desktop version of Sitebulb, which meant that users were... unable to re-audit failed URLs. If you didn't need or want to re-audit failed URLs during this time period, you probably would not have noticed - but I assure you that it did indeed disappear.

  • In the last update, the Add/Remove Columns pop-up modal had decided to shift quite far to the left, in protest to the far right populism resurgence. While this is not the forum to comment on political leanings, we must draw the line at the functionality failing at a base level - in this case the modal was falling off the left hand side of the screen, when viewing Sitebulb on smaller screens/laptops.

  • You know how sometimes when developers create sitemaps, they serve the sitemap index file with the incorrect MIME type? Yeah well that was stopping Sitebulb from parsing the sitemaps properly. Since we don't want to punish you, cherished SEO, for the foolish incompetence of such devs, Sitebulb now gracefully ignores the error and parses the sitemaps anyway. It also flags a warning about the invalid MIME type, so you can shout at the dev team to get their fucking shit together. Honestly.

  • Sitebulb was flagging certain URLs with confusing Hints in the International report, such as 'Missing hreflang annotations'. When we investigated, it turned out that the pages in question were non-indexable URLs, and so they should not really have been considered eligible for hreflang in the first place (because hreflang is an indexing signal). In order to make the audit data cleaner and clearer, Sitebulb will now only consider indexable URLs for the following Hints:

    • Invalid HTML lang attribute

    • Mismatched hreflang and HTML lang declarations

    • Missing hreflang annotations

    • Missing HTML lang attribute

    • Has hreflang annotations without HTML lang
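To illustrate the hreflang change in the last fix, here is a rough sketch of what 'only consider indexable URLs' looks like for one of those Hints. The `Page` class, its fields and the example URLs are assumptions made for this example, and the real Hint logic will be more involved.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Page:
    url: str
    indexable: bool
    html_lang: Optional[str] = None
    hreflang: list[str] = field(default_factory=list)


def missing_hreflang(pages: list[Page]) -> list[str]:
    """URLs that should trigger 'Missing hreflang annotations'.

    Non-indexable pages are skipped entirely, since hreflang is an
    indexing signal and does not apply to them.
    """
    return [p.url for p in pages if p.indexable and not p.hreflang]


pages = [
    Page("/en/", indexable=True, html_lang="en", hreflang=["en", "de"]),
    Page("/de/", indexable=True, html_lang="de"),         # flagged
    Page("/de/print/", indexable=False, html_lang="de"),  # skipped: non-indexable
]

flagged = missing_hreflang(pages)
print(flagged)  # ['/de/']
```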

Version 9.1

Released on 10th November 2025

Google Search Console Integration Update

We had a few reports of Sitebulb's GSC data being off (mostly on clicks and impressions). These had come from some of our bigger customers with massive traffic numbers, and we thought initially it must be a sampling issue.

But when we looked into it more deeply, we discovered that Google was sending back a LOT less data than we expected in some cases. There were two key things in our requests that influenced what the GSC API returned - traffic split by device (i.e. desktop/mobile/tablet) and keywords data - both of which Sitebulb requested by default. 

When we request keywords, the data returned by the GSC API is sampled, and some queries are removed to anonymise the data - this was having a particularly noticeable impact in the case of large websites and specific industries.

Frankly, it was making Sitebulb look like it didn't know wtf it was talking about :(

As a result, we have:

  • Simplified our Google Search Console integration by removing device-specific data breakdowns (Desktop, Mobile, Tablet). Instead, you'll see aggregate metrics across all devices - all clicks, impressions, CTR, and position data now represent combined data from all devices.

  • Keywords data is only requested when the keywords analysis report is enabled. So your search traffic metrics will be most accurate when Keyword Analysis is disabled.
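For the curious, the difference comes down to which dimensions are sent in the Search Analytics query. The sketch below shows the general shape of such a request body. The field names (`startDate`, `endDate`, `dimensions`, `rowLimit`) come from the Search Console API, but the `build_gsc_query` function and the exact choices in it are assumptions for illustration, not a copy of what Sitebulb sends.

```python
def build_gsc_query(start_date: str, end_date: str, keyword_analysis: bool) -> dict:
    """Build a Search Analytics query body (illustrative only).

    The 'device' dimension is never requested, so metrics come back
    aggregated across all devices. The 'query' dimension is only added
    when keyword analysis is enabled.
    """
    dimensions = ["page"]
    if keyword_analysis:
        dimensions.append("query")  # requesting queries means some rows are withheld
    return {
        "startDate": start_date,
        "endDate": end_date,
        "dimensions": dimensions,
        "rowLimit": 25000,  # the API's maximum rows per request
    }


traffic_only = build_gsc_query("2025-10-01", "2025-10-31", keyword_analysis=False)
with_keywords = build_gsc_query("2025-10-01", "2025-10-31", keyword_analysis=True)

print(traffic_only["dimensions"])   # ['page']
print(with_keywords["dimensions"])  # ['page', 'query']
```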

Additional benefits of this change:

  • ~30% reduction in GSC data processing time

  • Faster audit completion for GSC-enabled audits

  • Reduced database storage requirements (saving your hard drive space 1 MB at a time)

But the main one, of course, is that the data is no longer completely fucking wrong.

Bug Fixes

  • When running Content Extraction, in the overview Sitebulb was reporting both 'Total Found' and 'Found on URLs' as identical, which would be awesome if they were measuring the same thing and/or happened to be the same value. Unfortunately they are not measuring the same thing and rarely happen to be the same value, which means this was sub-awesome, so we fixed it.

  • On certain exports that include the XML Sitemap URL, Sitebulb was including a bunch of random numbers in the export, instead of the actual 'XML Sitemap URL.' While the numbers look random to me, I expect it was actually an Easter Egg that Gareth left in the tool intentionally for people like him that only like motor sports rather than real sports (like the 'biggest torque value' ever recorded for an F1 car or some other such nerdy bollocks).

  • So... a user tried to use Bingbot as their user-agent, and Sitebulb refused to crawl. This is because we had a non-ASCII character in the UA string, which means we had fucked up. Does this mean that we encountered the very first usage of Bingbot as a user-agent, over 8 years since Sitebulb first launched?!
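For anyone maintaining their own list of user-agent strings, this kind of thing is cheap to check for. A minimal sketch follows. Which character was actually the culprit in Sitebulb is not stated above, so the non-breaking space here is purely a hypothetical example.

```python
def find_non_ascii(value: str) -> list[tuple[int, str]]:
    """Return (position, character) for every non-ASCII character."""
    return [(i, ch) for i, ch in enumerate(value) if ord(ch) > 127]


clean = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
# Hypothetical breakage: a non-breaking space sneaks in instead of a normal space.
broken = clean.replace(" (", "\u00a0(")

print(find_non_ascii(clean))   # []
print(find_non_ascii(broken))  # [(11, '\xa0')]
```

The two strings look identical on screen, which is exactly why this sort of bug can sit unnoticed for years.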

Version 9.0.275

Released on 29th October 2025 (hotfix)

Fixed a ridiculous issue when opening Sitebulb on desktop, where it would open up then immediately do a 180 and shut itself down. Computer says no.

This bug is only present on the original 9.0 version. If you encounter it, please update immediately!

Version 9.0

Released on 20th October 2025

URL Crawl Limits Now Based on HTML URLs

Up to now, Sitebulb's 'crawl limit' has been calculated on all pages crawled - which would include external links and page resource URLs, in addition to internal HTML URLs.

The limit I'm referring to is found in the 'Crawler Settings':

Maximum Pages To Audit

But most folks base their understanding of 'how big is my website?' only on internal HTML URLs (i.e. unique pages) and so our implementation of these limits was confusing. As an example, you might have an ecommerce store with about 10,000 pages - based on around 9,000 product pages and 1,000 other pages (categories, subcategories, blog, etc...). But if you run a crawl, Sitebulb could easily find another 1,000 external URLs, and maybe 50,000 page resource URLs (images, CSS, JavaScript etc...).

So if you set a crawl limit of 10,000, you might actually find that the crawl would stop when only a couple of thousand internal HTML pages had been crawled (as the rest had been external links or page resources).

While no one actually complains at us about these sorts of things, we know how annoying they must be for day-to-day usage. So we completely changed our philosophy on how we calculate it, so it's now based entirely on the number of HTML pages crawled.
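To put some rough numbers on the difference, here is a toy simulation using the ecommerce example above. It assumes the crawler encounters HTML pages, external URLs and page resources evenly interleaved, which is a simplification (real crawl order will vary), and the `html_pages_crawled` function is made up for this illustration.

```python
def html_pages_crawled(queue: list[str], limit: int, html_only: bool) -> int:
    """Count internal HTML pages crawled before the limit stops the crawl.

    html_only=False: every URL counts towards the limit (old behaviour).
    html_only=True:  only internal HTML pages count (new behaviour).
    """
    counted = 0
    html_pages = 0
    for kind in queue:
        if counted >= limit:
            break
        is_html = kind == "html"
        if is_html:
            html_pages += 1
        if is_html or not html_only:
            counted += 1
    return html_pages


# 10,000 HTML pages, 1,000 external URLs and 50,000 page resources,
# interleaved in a repeating pattern (an assumption for this sketch).
pattern = ["html"] * 10 + ["external"] + ["resource"] * 50
queue = pattern * 1000

old = html_pages_crawled(queue, limit=10_000, html_only=False)
new = html_pages_crawled(queue, limit=10_000, html_only=True)

print(old, new)  # 1640 10000
```

Same limit, same website, but under the old counting the crawl stops after well under 2,000 actual pages.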

We also updated the Crawl Progress UI to add more clarity around how many HTML pages have been crawled or are due to be crawled:

New Crawl Details Panel

In practice, what this means is that all of Sitebulb's plans have become more generous in terms of 'how many pages you can crawl', and it should be easier to set appropriate crawl limits as there is less guesswork involved.

Project Tags

One of the extremely cool benefits of a tool like Sitebulb is that there are no project limits in place at all - you could literally add an infinite number of projects if you could count that high (you can't) - so it's not uncommon to have hundreds of projects in play at any one time.

This is particularly true if you've been using Sitebulb for years, or if you have a team on Sitebulb Cloud with lots of users all adding their own projects. 

Over time, this can become unwieldy, so to help tidy things up a bit, we've introduced 'Tags' to project settings. Tags introduces a flexible, intuitive way to organize and navigate your projects with your own organizational logic.

Once you have some tags set up, you can easily filter by them using this new dropdown:

Filter by tag

The above example might work well for consultants who do different types of work for different clients. Here's some other examples:

  • Agencies could tag based on client name, account manager, or region

  • In-house teams could tag based on locale (e.g. US/UK/DE) or brand family

  • Larger teams could combine tags with standardized taxonomies (e.g. client name + project type)

To add a tag to a project, navigate to the project page and hit the Add Tags button:

Add Tags

Then in the modal window, hit the dropdown and choose tags you wish to add to that project.

Add Tags Dropdown

To create new tags, simply start typing in the box, then hit enter:

Multiple Tags

Projects can have multiple tags, so you can group or filter projects across different dimensions.
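Conceptually, each project just carries a set of tags, and filtering means picking the projects whose set contains the tags you want. A minimal sketch follows; the project names and tags are made up, and requiring all selected tags to match is an assumption for this example rather than a description of how the dropdown behaves.

```python
def filter_by_tags(projects: dict[str, set[str]], wanted: set[str]) -> list[str]:
    """Return the names of projects that carry every tag in 'wanted'."""
    return sorted(name for name, tags in projects.items() if wanted <= tags)


# Hypothetical projects: client name + project type + locale.
projects = {
    "acme.com": {"Acme", "Migration", "UK"},
    "acme.de": {"Acme", "Monthly audit", "DE"},
    "widgets.co.uk": {"Widgets Ltd", "Monthly audit", "UK"},
}

print(filter_by_tags(projects, {"Acme"}))                 # ['acme.com', 'acme.de']
print(filter_by_tags(projects, {"Monthly audit", "UK"}))  # ['widgets.co.uk']
```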

Please reach out to us if you have ideas or feedback on how you are using this feature and what we can do to make it more useful.

Batch Delete Projects

Several years ago I made the mistake of mentioning 'you'll never add infinity projects to Sitebulb', and a number of users took this as a personal challenge.

Sooner or later they would realise the folly of their ways, as their computer would routinely whinge about 'low disk space' and other such complaints.

The support tickets we'd get revealed the depths of their frustration, 'why can I only delete projects one by fucking one?'.

We heard you, check this out:

Delete Multiple Projects

Simply select the projects you wish to delete, then hit the button to get rid of them forever.

No more angry emails (at least about this specific thing...).

Batch Delete Audits

Not content with one batch delete option, we've actually added two: this one allows you to delete multiple audits at once from within a single project.

And why might you need to delete lots of audits at once? Erm...

Lots of Audits

Even for smaller sites, if you crawl them shitloads of times, the overall size really can add up.

Deleting audits works almost identically to the deleting projects method - simply navigate to the project in question, select the audits that you wish to delete on the left, then hit the 'Delete Audits' button:

Batch Delete Audits

New Authentication Menu & Better Shopify Support

I've got to admit it has become very pleasing when I go to buy yet more niche cookware I absolutely don't need but absolutely must have, and the ecom store happens to run on Shopify - that checkout process is just slick.

In the last few months we've seen an uptick in support tickets from folks trying to crawl Shopify sites and being hit with a 429 response - which causes Sitebulb to stop the crawl early.

429 response in Sitebulb

This most likely reflects the widespread concern that LLM crawlers are aggressively scraping website page content with little regard for the servers that power them, with friendly crawlers like Sitebulb ending up as unfortunate collateral damage.

To their credit, Shopify have addressed this issue directly and published guidance on how to crawl your Shopify store, which allows you to generate a HTTP message signature and authenticate your crawl requests using Web Bot Auth.

We too recognise the growing focus on authentication for crawling - which for the overwhelming majority of the entire history of SEO has been necessary only for staging sites - and so we have a new 'Authentication' menu item which groups all the authentication methods we support.

authentication settings

This comprises the 3 authentication methods you can add to Sitebulb:

  • HTTP Authentication

  • Forms Authentication

  • Custom Headers

  • PLUS a new section 'Shopify Settings', which is a specific application of Custom Headers

So this is where you need to head if you wish to add authentication to Sitebulb. For Shopify specifically, you can follow along with our doc: Crawling Shopify websites with Sitebulb, which you can use in conjunction with Shopify's own instructions to generate the HTTP message signature and add this into Sitebulb:

Completed Shopify Settings
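Since 'Shopify Settings' is a specific application of Custom Headers, the underlying idea is simply that a few extra headers get attached to every crawl request. Here is a rough sketch of that idea. The header names are the ones I understand HTTP Message Signatures and Web Bot Auth to use (treat them as an assumption and check Shopify's instructions), the values are placeholders you would copy from Shopify, and `merge_headers` is a made-up helper, not a Sitebulb function.

```python
def merge_headers(base: dict[str, str], custom: dict[str, str]) -> dict[str, str]:
    """Merge headers case-insensitively; custom headers override base ones."""
    merged = {name.lower(): (name, value) for name, value in base.items()}
    merged.update({name.lower(): (name, value) for name, value in custom.items()})
    return dict(merged.values())


base_headers = {
    "User-Agent": "ExampleCrawler/1.0",  # hypothetical user-agent
    "Accept": "text/html",
}

# Placeholder values: paste the real ones generated in your Shopify admin.
shopify_headers = {
    "Signature-Input": "<from Shopify>",
    "Signature": "<from Shopify>",
    "Signature-Agent": "<from Shopify>",
}

request_headers = merge_headers(base_headers, shopify_headers)
print(sorted(request_headers))
# ['Accept', 'Signature', 'Signature-Agent', 'Signature-Input', 'User-Agent']
```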

Improved HTML Template Recognition

This is one of those updates that we kinda had to do because the feature was getting worse over time, but might cause a few of you out there to go 'oh for fuck sake, why did you change that?' I'm sorry.

What have we done?

We have improved the way HTML templates are classified, which should give you a more accurate page template classification.

Why was this necessary?

Over time, the way that HTML pages are created within CMS platforms has made them become increasingly dynamic, with sections and elements loading in with more unique identifiers - which was making it harder for Sitebulb to identify and group them accurately.

Why might you be pissed off?

If you have previously customized the template names within your projects as part of your workflow... you will need to do this again. Once you run a new audit post-update, your pages will be re-classified and re-grouped, and you will notice the template names in the list revert back to default.

Who can you blame?

As always, please direct your curses towards Gareth - he was the one insisting on this change, and in general can be blamed if Sitebulb ever causes any negative sentiment.

Duplicate Projects

To make it quicker and easier to create new projects based on the settings used in another project, we have added a 'Duplicate Project' button:

Duplicate Project

Once you press this button it will bring you to the 'New Project' screen, so you will then need to update both the project name and the start URL:

Duplicated Project

After this, when you get through to the pre-audit stage you will notice that the duplicated project will have inherited all the audit settings from the original project - so you won't need to go through and set these up again.

'Learn More' Links Added to PDFs

We've added an option when generating PDFs to include links to Sitebulb's Learn More content when printing Hints:

Learn More Links option

If you select this option, the PDF will then include clickable buttons to the corresponding 'Learn More' page on our website:

Learn more link on PDF

WARNING: If you have not told clients that you use a magical piece of software called 'Sitebulb' to make you look like a literal fucking superhero, using this option may give the game away... 

Archives

Access the archives of Sitebulb's Release Notes, to explore the development of this precocious young upstart: