
Version 2 - Release Notes

Sitebulb Version 2

Sitebulb spent around 18 months in a 'version 2.x' incarnation, and these notes chart this period of slower, much more measured development. During this time we added a LOT of new customers, so we did a significant amount of work understanding their problems and developing solutions for them (up to this point, development had been mainly led by our own ideas).

We worked on Version 3.0 through most of 2019, whereas Version 2 was developed mostly during 2018, following on from Version 1 and the beta years.

Version 2.6.2

Released on 16th May 2019

Updates

Sitebulb now supports new GSC domain properties
Google's continued development of Search Console recently brought about 'domain properties', which replace the property sets feature and allow users to see all the data for an entire domain (http, https, www, non-www).

As you might expect, people had lots of questions about implementation, but the biggest concern was of course 'BUT WILL IT WORK WITH SITEBULB?!?!'

Alas, it did not. We had ignored it, shunned it like a lonely direwolf so we could focus on bigger and better things.

But this could not stand any longer, so we have righted this wrong, and it will now work as if it had worked properly all along.

Version 2.6.1

Released on 20th March 2019 (hotfix version)

Fixes

Two site-specific fixes
Every so often we get a user reporting an issue that only seems to affect their specific site. Perhaps it is the amount of HTML that needs to be parsed, or the way JavaScript loads certain scripts, that means Sitebulb has a lot more difficulty than normal when crawling these sites. Depending on where we are in our development cycles, sometimes we can just do little hotfixes that solve the specific issue, and that is exactly what we did here.

Version 2.6.0

Released on 12th March 2019

Fixes

XML Sitemaps report was completely broken
I'll be honest, our plan has always been that x.x.0 releases would include a nice little feature or enhancement, not simply bug fixes. But then we realised that in doing another recent fix, we managed to completely break the XML Sitemaps report. The data was just wrong. Flat out wrong, like people who believe the moon is made of cheese, or people who think that the Earth is flat, or people who think that Donald Trump isn't utterly abhorrent, or people who claim to hate bigotry but can't even recognise when they are demonstrating it themselves... or people who think that any of Alanis Morissette's examples were actually ironic.

So no, this update does not contain any cool new stuff, because we had to fix this.

On the flip side, I just took delivery of ten thousand spoons, so my day's looking up.

Version 2.5.9

Released on 4th March 2019

Updates

Self-referencing 'incoming' links removed from URL Details
This one caused much consternation in Sitebulb HQ. I'll give you the gist:

Gareth: Why don't you want to report these links any more?
Patrick: Because it's stupid.
Gareth: They're staying in.
Patrick: What is the purpose behind reporting them? These are links on page A that are pointing at page A. No one cares about them!
Gareth: Yes, but what if they're nofollow?
Patrick: Even worse! Then we'll trigger the Hint about receiving nofollow links, but the only nofollow links will come from the page itself!
Gareth: No, I'm not having it. We can't just hide data. Because... transparency.
Patrick: Fuck transparency. It's confusing, and it's not helpful. They are not really 'incoming links' because they don't come in from anywhere else.
Gareth: Ah, but what about outgoing?
Patrick: Yes, good point, remove them from there as well.
Gareth: WHAT?!
Patrick: It's the same deal. They aren't outgoing links because they don't point 'out' at another page.
Gareth: This is bullshit.
Patrick: Ok, I'll make you a deal. We can keep them if you can name me one single use case for their existence.
Gareth: ...
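For the record, Patrick's rule is simple enough to write down. Here is a minimal Python sketch, assuming links are stored as (source, target) URL pairs; the `Link` type and `drop_self_links` helper are hypothetical and not Sitebulb's actual code. It also assumes a link to `#fragment` on the same page counts as a self-link.

```python
from typing import NamedTuple
from urllib.parse import urldefrag


class Link(NamedTuple):
    source: str  # URL of the page that contains the link
    target: str  # URL the link points at


def is_self_link(link: Link) -> bool:
    # Ignore #fragments: page A linking to A#top still points back at page A
    return urldefrag(link.source).url == urldefrag(link.target).url


def drop_self_links(links: list[Link]) -> list[Link]:
    """Keep only links that point at a different page."""
    return [link for link in links if not is_self_link(link)]


links = [
    Link("https://example.com/a", "https://example.com/a#top"),  # self-link
    Link("https://example.com/b", "https://example.com/a"),      # genuine incoming link to /a
]
print(drop_self_links(links))
```

The same filter applies to both the 'incoming' and 'outgoing' views, since a self-link is neither.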

More granular detail on Indexability graph
Up to this point, the pie chart in Indexability would show Indexable (green) and Not Indexable (red). But clicking through to the URL List for 'Not Indexable', you were often greeted with URLs that were a mix of noindex, 404, redirects, canonicalized etc... and it was hard to see the wood for the trees.

Now, we've added all of those options so you can see them straightaway on the pie chart, and you just need to click through to the 'problem' sections you wish to investigate further.

Before and after

'Export audit' changed to 'Share audit'
Sooooooooooo many people ask us how they can share audits with other team members. And we've just been saying 'duh, just hit the export audit button.'

And then, only 378 support tickets later, it hit me... 'people keep asking us how to share audits, why don't we just change the button name to say share instead.'

Lo and behold... 0 support tickets*.

Share Audit

*hopefully

Fixes

Pagespeed Hint: 'Loads hidden images' was not working
It would be nice to dress this one up as something that wasn't really our fault, but I can't. The Hint just didn't work, and it does now (sorry).

Code Coverage not reporting correctly for CSS
On some sites, the CSS Code Coverage was not being collected properly; you may even have seen no files in there at all. The bug is found and fixed now, however you'll need to re-run any audits to get this data.

Version 2.5.8

Released on 26th February 2019

Fixes

Fixed a duplicate URL error
In certain circumstances, Sitebulb would report the same URL multiple times. In all instances, the duplicate URLs originated from XML Sitemaps, typically when URLs were referenced in multiple places, such as hreflang in sitemaps. If you have experienced this problem, you can set up a re-audit in the same project, and the problem will be resolved in your new audit.

Version 2.5.7

Released on 20th February 2019

Fixes

#1 Fixed a UI error that affected a couple of users
Most users won't have seen this, but for those that did, it rendered the product entirely useless, so we figured we'd better fix it!

If you've experienced the error below, make sure to install this new update!

UI Error

#2 Fixed Accessibility 'View Hint Details' issue
On websites which require authentication, viewing any of the 'Hint Details' for Accessibility would hang forever.

Version 2.5.6

Released on 5th February 2019

Updates

It's been a wee while since our last update. While we've been away, we gave Sitebulb a good going over. We've spent quite a lot of time optimizing the computer resources (RAM and CPU) that Sitebulb uses up - both during the crawl and while building the reports. The result is that Sitebulb is overall a bit faster and less resource hungry - which turns out to be ironically more noticeable on more powerful machines.

#1 Specify the content area in Advanced Settings
Every so often we get a support ticket about an audit which claims all the word counts are 0. Typically Sitebulb is pretty clever about identifying the content area in order to make these calculations. But every so often we encounter a site with HTML that is so fundamentally wank, that Sitebulb can't find the content area properly. Like a poker noob sucking out on a gutshot straight draw, these shitty sites make Sitebulb look like a punk.

WE'RE NOT HAVING IT ANY MORE, I TELL YOU, WE'RE NOT HAVING IT.

Our, might I say, ingenious solution is to put the onus upon you, dear user, to tell us what the content area should be.

Content area

This new option allows you to specify a parent DOM element, using a CSS selector, which Sitebulb will then recognise as the element containing the content you wish to analyse. Even better, if you have control of the site itself, you can edit page templates to add a class of "sb-content" to the corresponding DOM element, and Sitebulb will recognise this as the content area - without you having to enter any CSS selector here.
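To show what 'recognise this as the content area' boils down to, here is a minimal Python sketch (standard library only) of the "sb-content" convention: find the first element carrying that class and count the words inside it. The `content_word_count` helper is hypothetical, not Sitebulb's code, and it only handles the class convention; matching an arbitrary CSS selector would need a proper selector engine.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not affect nesting depth
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ContentAreaParser(HTMLParser):
    """Collect the text inside the first element that has a given class."""

    def __init__(self, marker_class: str = "sb-content"):
        super().__init__()
        self.marker_class = marker_class
        self.depth = 0          # > 0 while we are inside the content area
        self.found = False      # only the first matching element is used
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif not self.found:
            classes = (dict(attrs).get("class") or "").split()
            if self.marker_class in classes:
                self.found = True
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def content_word_count(html: str, marker_class: str = "sb-content") -> int:
    parser = ContentAreaParser(marker_class)
    parser.feed(html)
    parser.close()
    # Join text chunks with spaces so words in adjacent block elements stay separate
    return len(" ".join(parser.chunks).split())


page = """
<html><body>
  <nav>Home About Contact</nav>
  <div class="post sb-content"><h1>Hello world</h1><p>Three more words<br>here</p></div>
  <footer>Copyright</footer>
</body></html>
"""
print(content_word_count(page))  # 6: the nav and footer words are not counted
```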

#2 New Robots Hint: Meta robots found outside of <head>
Look, we are big enough and man enough to accept when we are wrong. And we were wrong about this: search engines will respect robots directives even if they are not in the <head>. We did not know this.

It turns out various self-important Googlers have been telling us for years! Somehow we had not been listening.

Gary

Anyways, Sitebulb will now recognise body-dwelling robots tags, and identify the indexable status of these pages accordingly.

In addition, this new Hint will fire, so you know to fix that shit.
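As a rough illustration of the check behind this Hint, here is a minimal Python sketch using the standard library's `html.parser`: it records every `<meta name="robots">` tag and whether it appeared inside `<head>`. The `robots_meta_outside_head` function is hypothetical, not Sitebulb's implementation. In line with the note above, it only flags the tag; it doesn't discard the directive.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Record each <meta name="robots"> and whether it sat inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.found: list[tuple[str, bool]] = []  # (content, was_inside_head)

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "body":
            self.in_head = False  # a <body> start tag implicitly closes <head>
        elif tag == "meta":
            attr = dict(attrs)
            if (attr.get("name") or "").lower() == "robots":
                self.found.append((attr.get("content") or "", self.in_head))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def robots_meta_outside_head(html: str) -> list[str]:
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return [content for content, in_head in finder.found if not in_head]


page = ('<html><head><title>x</title></head>'
        '<body><meta name="robots" content="noindex"><p>Hi</p></body></html>')
print(robots_meta_outside_head(page))  # ['noindex']
```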

#3 Added new indexability warning on URL Details page
Look, we get it. You're busy people. It's easy to miss a noindex, or a canonical tag here and there. So we added a new warning to the URL Details page, like the image below, which only triggers when the URL is not indexable.
Indexability warning

#4 Updated tracking scripts that get blocked
We came across a number of sites that would blow up the audit with tons and tons of tracking scripts, even if you clicked the option 'Block Ad and Tracking Scripts'. So we've updated our lists of tracking scripts that get blocked, so you're more likely to end up with a cleaner audit.

Block tracking scripts

#5 When saving Excel files, Sitebulb now remembers the previous save location
What on earth do you mean, 'it should have just done this in the first place'?

Some people are never bloody happy.

#6 New Hint 'Learn More' pages for Mobile Friendly
More of the same. By the time we do our next release, everything will be done. Whoa.

Fixes

#1 Apparently crawling at lightspeed after pause/resume
This one may have escaped your attention, but if you paused an audit, then resumed it at a later point in time, Sitebulb would tend to make outrageous claims about how fast it was going. I'm sure most of you recognised this as a small UI bug, but I guess not everyone...

Crawl fast
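We haven't said exactly what went wrong, but one plausible way a crawl rate goes haywire after a resume is dividing the total URL count by only the time since resuming. Here is a minimal Python sketch, with a hypothetical `CrawlTimer` class and explicit timestamps, that counts active crawling time only:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrawlTimer:
    """Track only active crawling time, so a pause doesn't distort the rate."""
    urls_crawled: int = 0
    active_seconds: float = 0.0          # crawling time banked before the last pause
    started_at: Optional[float] = None   # set while the crawl is running

    def start(self, now: float) -> None:
        self.started_at = now

    def pause(self, now: float) -> None:
        if self.started_at is not None:
            self.active_seconds += now - self.started_at
            self.started_at = None

    def rate(self, now: float) -> float:
        """URLs per second of active crawling (paused time excluded)."""
        elapsed = self.active_seconds
        if self.started_at is not None:
            elapsed += now - self.started_at
        return self.urls_crawled / elapsed if elapsed > 0 else 0.0


timer = CrawlTimer()
timer.start(now=0)
timer.urls_crawled = 100
timer.pause(now=50)          # 50 s of crawling, then a long break
timer.start(now=3600)        # resume an hour later
timer.urls_crawled = 150
print(timer.rate(now=3650))  # 1.5 (150 URLs over 100 active seconds)
```

Dividing 150 URLs by only the 50 seconds since resuming would report 3.0 URLs/sec, twice the real rate.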

#2 Saying the meta description is missing when it's blatantly not
On the subject of Sitebulb making outrageous claims, this isn't just limited to post-pause speed readings.

Puzzle me this Sitebulb: how can the meta description both 'have a length of x' and 'be missing' at the same time? Huh?! HUH?! 

Didn't think so.

Meta description bug

#3 If a URL was disallowed in robots.txt, Sitebulb was recording links as nofollow
I don't know how we managed this one, it's a pretty dumb mistake. If you checked the URL Details 'outgoing links' for any URL that was disallowed, you would have seen that the outgoing links were classed as meta nofollow. What can I say? Even superheroes fuck things up sometimes (see: every single Avenger).

#4 Sitebulb was forgetting the Google Analytics view on a re-audit
A minor complaint, indeed, but when you are seeking perfection nothing is too small to improve.

Now, the Bulb remembers.

#5 Sitebulb was also forgetting HTTP authentication credentials when doing 'Live View'
This only happened on sites that use HTTP authentication (obvs): when trying to view 'Hint Details' or URL Details -> Live View, it would just hang forever, because Sitebulb would accidentally forget the authentication credentials.

NOW, THE BULB REMEMBERS.

#6 Crawl parameters always marked as 'yes'
Even if you went into Advanced Settings to switch off crawling parameterised URLs, the audit settings would always claim you didn't, which was either an outright lie or it just forgot again.

NOW, THE BULB REMEMBERS!!!

Version 2.5.5

Released on 26th November 2018

Updates

I know what you're thinking... 'what happened to v2.5.4?' We actually did a mini fix (for Windows only) and set that live a couple of weeks ago. Which means we've been running the Mac version at 2.5.3 and Windows at 2.5.4. My OCD has been spiralling out of control, so we had to do another release to save my sanity.

#1 Change the start URL of an existing Project
So...we actually added this a couple of versions back but forgot to announce it. My bad. But I'm claiming it as an 'update' all the same, because I bet most of you out there don't know you can do this:

Within any Project, go to Edit Project on the right, and you can change the start URL associated with this Project.

Edit Start URL

So if you have been working on and auditing a site in a dev environment, once you go live you can simply update the start URL for the project and carry out audits on the new live site, whilst keeping all the history and trendlines from the previous audits, along with any custom crawler or advanced settings.

#2 Sitebulb will set cookies if a site needs them
For most websites you/we/he/she/it come across, you don't need cookies enabled in order to crawl them. But when you do need them and don't use them, you'll be pretty fucked.

As soon as you enter a site that needs cookies to be enabled, Sitebulb will now detect this and automatically enable them for you. It will also give you a little notification at the top of the screen, so you know what the deuce is going on.

Cookies enabled

#3 Sitebulb will now record the rendered canonical
A few months back John Mueller famously (well, famous-ish-ly) stated that Google do not process the canonical found in the rendered DOM, and rely only on the canonical in the source HTML.

JohnMu Tweet

Like a pair of pissy little fanboys, we immediately went out and changed our process to mirror this - not even recording the rendered canonical and only showing the non-rendered one.

This turned out to be quite limiting for a number of specific use-cases:

  1. User has a site that uses prerender to serve to Google, but wants to check the rendered version using Sitebulb.
  2. User has a site that sets the canonical using JavaScript, and wants to check it.

We were essentially hiding data from the user, and not allowing them to see what they actually wanted to check for. And since the canonical affects the indexable status, this had a knock on effect across the reports.

After review, the ruling on the field has changed (AKA we've reversed this now).

But we didn't want to totally dismiss the whole 'Google only looks at the source canonical thing', so we have a new Hint for you:

Fixes

#1 Fixed the filtering format for clicks/impressions
If you tried to filter either clicks or impressions data (from the Keywords report) you would have been met with this rather confusing conundrum.

Impressions contains...10??

We just needed to set the field type to numeric - schoolboy shit right here.

Wrong format for clicks and impressions
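If you're wondering why 'contains 10' was such a nonsense filter for numbers, this minimal Python sketch contrasts treating the values as text with comparing them numerically. The helper names are hypothetical, not Sitebulb's filter code.

```python
import operator

OPS = {">": operator.gt, ">=": operator.ge, "=": operator.eq,
       "<": operator.lt, "<=": operator.le}


def text_contains(values: list[int], needle: str) -> list[int]:
    # Numbers compared as text: "contains 10" also matches 100, 310, 1024...
    return [v for v in values if needle in str(v)]


def numeric_compare(values: list[int], op: str, threshold: int) -> list[int]:
    # Numbers compared as numbers, which is what a clicks/impressions filter needs
    return [v for v in values if OPS[op](v, threshold)]


impressions = [7, 10, 100, 310, 1024]
print(text_contains(impressions, "10"))        # [10, 100, 310, 1024]
print(numeric_compare(impressions, "=", 10))   # [10]
print(numeric_compare(impressions, "<", 100))  # [7, 10]
```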

#2 You can no longer 'start' a re-audit from an imported audit
Previously, you could import an audit from another user, and then click through to carry out a re-audit. Or at least it looked like you could, but Sitebulb would stop you at the door. 'You're not coming in dressed like that, sonny.'

Yeah...this was not designed behaviour, in fact we didn't even realise it could be done until Steve Morgan told us so.

Steve Morgan tweet

Aside, if anyone is wondering about bug reporting:

  1. Yes. We do want to know about it.
  2. Please email details to [email protected]
  3. No, someone else has probably not already reported it.
  4. Just fucking tell us ok?
  5. We are very thankful for your kind support.

Back to the fix: you can no longer accidentally almost re-audit from an imported audit.

#3 Meta descriptions showing as missing when they are blatantly not missing
In contrast with my condescension above, we actually DID have a few users report this one to us. Well done you! Hooray! Would you like a medal?

Sitebulb would fire the Hint 'Meta description is missing' when it was basically not true in the slightest. It doesn't anymore.

#4 Sitebulb can now crawl shitty Squarespace sites
Ok, 'shitty' is a bit harsh. Although it is fair to say that no well respected business owner would ever consider using Squarespace for a site that is expected to compete in any competitive vertical. However, at least a few of our users have come across problems with Squarespace sites, so we thought we'd better fix it.

Namely, the issue was that Sitebulb would not collect data correctly from the <head> when using the Chrome Crawler. We found there was a common issue - JavaScript injecting a div in between the head and HTML tags, which screwed with our parser. We managed to make a Squarespace-site-specific-solution (try saying that 5 times fast) which circumvents this. 

Version 2.5.3

Released on 8th October 2018

Updates

#1 New 'Audit Summary' Export
A number of users have spoken to us recently along the lines of:

  • "Guys, the Hints in Sitebulb are the dog's bollocks. Any way to export all the data into Excel?"
  • "Dude! These Hints are sick af. But copy/pasting every title and description into a client report makes me want to rip my eyeballs out."
  • "Kind Sirs, the Hints you have in Sitebulb are really rather good. I wish to import them into Teamwork to automatically create tasks for my developer."

I mean I'm paraphrasing but you get the gist.

We couldn't ignore such kind and useful feedback, so we didn't! Instead, we built a new export that shows you every single Hint in Sitebulb, along with the description and 'Learn More' link, so you can do what you want with it.

Excel Audit Summary

This feature is available for new and existing audits, just go to Bulk Exports and export the 'Audit Summary.'

Audit Summary

#2 Allow users to select their own Google Analytics account
Lots of tools out there allow you to integrate with Google Analytics. But most tools offer a really shitty experience when selecting accounts - particularly if you have a lot of accounts or websites, you're forever scrolling through a massive dropdown. Incredibly annoying.

Sitebulb does it in a super smart way, figuring out the right GA account and selecting it for you. Which is awesome, 95% of the time.

The other 5% is when the GA account has been set up incorrectly (e.g. www vs non-www), or if a different UA code is being injected via a plugin, and the user genuinely can't select the right account.

Well, from now on there's a fallback option for when Sitebulb can't figure out the right account: it will still let you have a shitty never-ending dropdown.

Select different GA Account

You'll notice that it will still, by default, try to select the best account for you, but you also now have the option to override it with a different account. Best of both worlds.

We also have some new ultra-sexy options for you to play with: Device type and sampling level. Whoa.

GA sampling and device options

#3 Remove default page from Google Analytics
Speaking of Google Analytics accounts with a wonky setup, we've also solved an annoying issue that has bugged SEOs for ages. It happens when the view settings in GA include a 'default page'.

Default Page in GA

It is not an over-reaction to say that this little bastard fucks everything up when you try to match GA data up through a tool like Sitebulb.

The Analytics reports essentially add /index.html to every path:

Index.html

Since these pages actually do not exist, when Sitebulb comes to match the URL with the data pulled back from the Google Analytics API, the URLs do not match, so your report ends up full of 0s.

Or at least it did.

Now, Sitebulb will detect the default page and give you the option to remove it when matching data or extracting data from GA.

BOOM.

Remove default page
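Under the hood, 'removing the default page' is essentially string surgery on the GA paths before they are matched to crawled URLs. Here is a minimal Python sketch, assuming the view setting appends the default page (here `index.html`) to paths that end in a slash; `strip_default_page` is a hypothetical helper, not Sitebulb's code.

```python
def strip_default_page(path: str, default_page: str = "index.html") -> str:
    """Turn a GA path like '/blog/index.html' back into the crawled '/blog/'."""
    # Keep any query string to one side while we inspect the path itself
    base, sep, query = path.partition("?")
    if base.endswith("/" + default_page):
        base = base[: -len(default_page)]  # drop 'index.html' but keep the trailing '/'
    return base + sep + query


for ga_path in ["/index.html", "/blog/index.html", "/blog/index.html?page=2", "/about.html"]:
    print(ga_path, "->", strip_default_page(ga_path))
# /index.html -> /
# /blog/index.html -> /blog/
# /blog/index.html?page=2 -> /blog/?page=2
# /about.html -> /about.html
```

Making it an option rather than always stripping matters because some sites genuinely serve a real /index.html page.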

#4 More new Hint descriptions and Learn More pages
The never ending task of writing Learn More pages for all our Hints is finally drawing to a close.

This time we've added:

  • AMP
  • International
  • Duplicate Content

If you don't know how to find the Learn More links then you've not been paying attention. Find them on URL Lists like this:

Learn More Links

#5 Redesigned 'Hint Details' buttons
Some people were missing the 'Hint Details' buttons we have for some Hints. So we redesigned the button to a nice subtle hue.

BLINDING ORANGE LIKE THE FUCKING SUN

Alternative.

#6 'URLs not found in sitemap' now only includes indexable URLs
This was at the request of several users, who wanted to know about indexable URLs that were not in the sitemap, but gave zero fucks if 404 or noindex pages were not in the sitemap.
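Put another way, the report is now a set difference with an indexability filter on top. A minimal Python sketch, assuming you already have each crawled URL's indexable status and the set of URLs found in the sitemaps; the function name is hypothetical.

```python
def indexable_missing_from_sitemaps(crawled: dict[str, bool], sitemap_urls: set[str]) -> list[str]:
    """crawled maps each crawled URL to whether it is indexable."""
    return sorted(url for url, indexable in crawled.items()
                  if indexable and url not in sitemap_urls)


crawled = {
    "https://example.com/": True,
    "https://example.com/new-page": True,
    "https://example.com/old-page": False,   # 404, so not indexable
    "https://example.com/thank-you": False,  # noindex
}
sitemap = {"https://example.com/"}
print(indexable_missing_from_sitemaps(crawled, sitemap))
# ['https://example.com/new-page']
```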

Fixes

#1 Sitebulb was ignoring the sample audit maximum crawl depth
With the sample audit, if you entered say 250 levels deep into the 'Maximum Crawl Depth' box, it would still stop at the default depth of 50.

#2 FMP table was showing TTFB data
This was an acronymous slip-up of the highest order. In Page Speed, when you switched to the data view for the First Meaningful Paint graph (FMP), it showed you instead the Time to First Byte (TTFB) data.

#3 Graph legends not showing in James Bond mode
One of the reasons 007 mode exists is to make it easier on your eyes when you're working by candlelight. One might argue we took this to extremes by making the graph legends really really really really really dark grey.

Night mode graph legends

#4 Duplicate URLs had somehow crept back in
We had some issues with duplicate URLs coming in from Google Search Console a couple of versions back, and we thought we'd got rid of the problem for good. But like an oversized waistline, it crept back up on us.

Specifically, it was happening for particularly large websites that had a LOT of data in Search Console - so most users would not have been affected.

#5 You can now disable technology collection
I know, I KNOW, this is really an update not a fix. In the Advanced Settings, you can stop Sitebulb collecting all the different technologies found on the site when you use the Chrome Crawler.

But we created an option to turn it off because we found a site that did not play nicely with it at all - and you needed to turn it off in order to complete the crawl. I mean it was a really shitty site, but then lots of people have really shitty sites.

You turn it off by unticking this box:

Parse technologies

#6 Links graph showing the wrong data
Kinda embarrassing. On this graph the final bars were not showing the correct values.

All the others were right, so I prefer to think that we were 87.5% right, rather than that we actually did anything wrong.

Links graph data error

#7 Fixed issue with Hint: 'Has an internal link with no anchor text'
On this Hint, Sitebulb was flagging link references where the anchor text was inside a <span>. Clearly, this is still anchor text, so it was totally wrong.
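The fix boils down to collecting all the text inside an `<a>`, including text in nested tags like `<span>`. Here is a minimal Python sketch with the standard library's `html.parser`; `links_without_anchor_text` is hypothetical, not Sitebulb's code, and as a simplification it treats an image-only link as having no anchor text (it ignores `alt`).

```python
from html.parser import HTMLParser
from typing import Optional


class AnchorTextParser(HTMLParser):
    """Collect the full text of each <a>, including text inside nested tags."""

    def __init__(self):
        super().__init__()
        self.links: list[tuple[str, str]] = []  # (href, anchor text)
        self._href: Optional[str] = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # collapse whitespace
            self.links.append((self._href, text))
            self._href = None

    def handle_data(self, data):
        # Text inside <span>, <strong> etc. still belongs to the open <a>
        if self._href is not None:
            self._text.append(data)


def links_without_anchor_text(html: str) -> list[str]:
    parser = AnchorTextParser()
    parser.feed(html)
    parser.close()
    return [href for href, text in parser.links if not text]


html = '<a href="/a"><span>Read more</span></a> <a href="/b"><img src="x.png"></a>'
print(links_without_anchor_text(html))  # ['/b']
```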

#8 Fixed Code Coverage bug when requesting incoming references
No one likes a SQL logic error.

SQL Logic Error

Version 2.5.2

Released on 4th September 2018

Unfortunately version 2.5.1 had a major database issue, which caused a range of problems and data errors. If you installed version 2.5.1, please consider this a critical update (if you didn't install 2.5.1, it's not so much of a panic, but you should probably get it anyway!).

Version 2.5.1

Released on 3rd September 2018

Updates

#1 Even more Hint Details added
I've re-written the Hint descriptions for AMP, International, Mobile-friendly and Page Speed. The 'Learn More' pages for these will also be completed very soon (I promise!).

#2 Some new Hints added
We realised that one of our Mobile-friendly Hints was super vague ('The viewport <meta> tag has scaling issues') and did not help in any way to identify what the specific issue was. So to fix this we added 6 new Hints, which break down all the various things that could have gone wrong:

  • The viewport <meta> tag does not have a width set
  • The viewport <meta> tag has a specific width set
  • The viewport <meta> tag has a maximum-scale set
  • The viewport <meta> tag has a minimum-scale set
  • The viewport <meta> tag initial-scale is incorrect
  • The viewport <meta> tag is missing an initial-scale
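To make the six checks concrete, here is a minimal Python sketch that parses a viewport `content` string and maps it to those Hints. The thresholds are assumptions, not Sitebulb's exact rules: only `width=device-width` counts as a non-specific width, and any `initial-scale` other than 1 counts as 'incorrect'. The function names are hypothetical.

```python
def parse_viewport(content: str) -> dict[str, str]:
    # "width=device-width, initial-scale=1" -> {"width": "device-width", "initial-scale": "1"}
    props = {}
    for part in content.replace(";", ",").split(","):
        key, _, value = part.partition("=")
        if key.strip():
            props[key.strip().lower()] = value.strip().lower()
    return props


def viewport_issues(content: str) -> list[str]:
    props = parse_viewport(content)
    issues = []
    if "width" not in props:
        issues.append("does not have a width set")
    elif props["width"] != "device-width":
        issues.append("has a specific width set")
    if "maximum-scale" in props:
        issues.append("has a maximum-scale set")
    if "minimum-scale" in props:
        issues.append("has a minimum-scale set")
    if "initial-scale" not in props:
        issues.append("is missing an initial-scale")
    else:
        try:
            scale_ok = float(props["initial-scale"]) == 1.0
        except ValueError:
            scale_ok = False  # e.g. initial-scale=abc
        if not scale_ok:
            issues.append("initial-scale is incorrect")
    return issues


print(viewport_issues("width=device-width, initial-scale=1"))  # []
print(viewport_issues("width=1024, maximum-scale=1"))
# ['has a specific width set', 'has a maximum-scale set', 'is missing an initial-scale']
```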

Fixes

#1 Capital NOINDEX was being ignored again
We fixed this once already. But then we changed something else that made it unfix (technical term). Basically, if your meta robots were a bit SHOUTY, Sitebulb would completely ignore them. But my advice is to not have SHOUTY meta tags in the first place. They make you sound like one of those obnoxious English tourists who can't understand why the 'foreign' waiter doesn't understand, so they just say the same thing again - still in English - just a lot louder, getting increasingly enraged in the process.
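The cure for SHOUTY tags is to normalise case before matching any directive. A minimal Python sketch (hypothetical helper names); it also treats `none` as a noindex, since Google documents `none` as equivalent to `noindex, nofollow`.

```python
def robots_directives(content: str) -> set[str]:
    # Directive names are case-insensitive, so lower-case them before matching
    return {d.strip().lower() for d in content.split(",") if d.strip()}


def is_noindex(content: str) -> bool:
    directives = robots_directives(content)
    return "noindex" in directives or "none" in directives


print(is_noindex("NOINDEX, FOLLOW"))  # True
print(is_noindex("index, follow"))    # False
```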

#2 All the On Page 'length' charts had disappeared from PDF exports!
I mean there was just a massive blank space where they should have been (title, meta description & header 1 length). It looked particularly shite.

Version 2.5.0

Released on 17th August 2018

Updates

#1 More Hint Details added
I've been busy writing new improved Hint Details for a number of sections: Internal URLs, XML Sitemaps and Search Traffic. Each of these also has a specific 'Learn More' page on the site that explains what each issue is and how to resolve it.

We also added a Learn More button in the Hint description on the URL List, to make it easier to get to these web pages.

Learn More link

#2 Added new 'XML Sitemaps' tab to URL Details
If you have checked XML Sitemaps, you will now see a tab when you look on the URL Details for each URL, which shows you exactly which sitemaps a URL was found on.

XML Sitemaps

In hindsight, I should have chosen a screenshot example where the URL was actually on more than one sitemap.

#3 Improved the search function on URL List filtering
It used to match only if you got the words in the right order. So if you searched 'Status' when looking for 'HTTP Status' it would show nothing. Which, let's be honest, is pretty wank.

It now searches like it probably should have all along.

Search Filter 
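One reasonable way to search 'like it should have all along' is to match each query word against the words of the option name, in any order. A minimal Python sketch, assuming prefix matching on each word; `matches` is hypothetical and not necessarily how Sitebulb's filter works.

```python
def matches(option: str, query: str) -> bool:
    """True if every query word is the start of some word in the option, in any order."""
    option_words = option.lower().split()
    return all(
        any(word.startswith(term) for word in option_words)
        for term in query.lower().split()
    )


options = ["HTTP Status", "Indexable Status", "Title Length"]
print([o for o in options if matches(o, "status")])  # ['HTTP Status', 'Indexable Status']
print(matches("HTTP Status", "status http"))         # True
```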

Fixes

#1 Duplicate URLs coming in from GA/GSC if you paused crawling
We noticed something like this a few weeks ago, but couldn't pin down exactly what was happening.

It took that magnificent Brummie Paddy Moogan to figure it out for us, realising that duplicate URLs had turned up in his report from the GA/GSC crawl sources and remembering that he'd paused crawling of GA/GSC URLs and noticed the queue numbers disappearing.

Much more useful than our normal bug reports, verbatim: 'It's not working. What's wrong with it?'

#2 Robots rules not running correctly
Talking of brilliant bug reports, I also have to take my hat off to Mark Soon, who consistently sends us very useful feedback, and manages to find the most random sites that are excellent edge cases for our software.

In this instance it is more run-of-the-mill: we were just not taking into account the specific robots 'allow' rule, and were claiming the URL was disallowed (see example below for this to actually make sense).

Robots.txt file
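Google documents that the most specific (longest) matching rule wins, and that Allow wins a tie. Assuming the file looked roughly like a Disallow on a folder plus an Allow for a path inside it, here is a minimal Python sketch of that precedence; `is_allowed` is hypothetical, and wildcards (`*`, `$`) are not handled. Python's own `urllib.robotparser` applies the first matching rule in file order instead, so it can give a different answer for the same file.

```python
def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """rules: ("allow" | "disallow", path-prefix) pairs from one user-agent group."""
    best_len = -1
    allowed = True  # no matching rule means the URL is allowed
    for kind, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty Disallow matches nothing
        is_allow = kind == "allow"
        # The longer (more specific) prefix wins; on a tie, Allow beats Disallow
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


rules = [("disallow", "/blog/"), ("allow", "/blog/public/")]
print(is_allowed("/blog/public/post", rules))  # True: the longer Allow rule wins
print(is_allowed("/blog/private", rules))      # False
print(is_allowed("/about", rules))             # True: no rule matches
```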

#3 Some text in James Bond mode hard to see
Ok, if we're being honest, they weren't just hard to see, they were totes unreadable. Like, totally.

Unreadable text

#4 Accessibility slide-out disappears when you go full screen
Yes, someone complained about this. And yes, I know we should have just said 'well don't go fullscreen then'. But we didn't cos we're nice chaps, we fixed it instead. So we'll just bitch and moan about it here, because we're not that nice really.

#5 URL List for 'Broken external URL (4XX or 5XX)' included HTTP 308
If you weren't already aware, 308 is neither 4XX nor 5XX, so quite what it was doing in this list I don't know. But it's not anymore, at least.

Version 2.4.1

Released on 7th August 2018

Fixes

#1 Resolved issue of 'No URLs found' in various URL Lists
We managed to introduce an annoying bug in the last version of Sitebulb that caused some URL Lists to return 'No URLs found' - the most noticeable one that did this was Broken Internal URLs. According to Gareth, this was because of an issue with a database join. So now you know.

Version 2.4

Released on 25th July 2018

Please note that if you have any bigger existing audits (roughly > 100,000 URLs) you will probably notice that Sitebulb takes a bit longer to open these audits. This is because it is re-building some of the indexes as part of the updates below. This will only happen the first time you open the audit on v2.4.

Please just let Sitebulb do its thing and finish up - this may take up to 5 minutes.

Updates

#1 Updated Google Search Console with the latest API changes.
Faster, stronger, better, etc...

#2 New robots.txt warning
This one goes hand-in-hand with our robots-grammar-nazi fix (#4 in the Fixes below) - a new warning for instances where the robots.txt file lets some search engines through, but not others. For example, if DuckDuckGo was told in no uncertain terms to fuck right off:

Duck duck go no

Then you'd get this message on the Audit Overview and in the Indexability report:

Some search engines blocked

#3 Added a load more Hint descriptions
Within the tool you will now find extended Hint descriptions for On Page, Internal URLs and Front-end.

#4 Added indexing for building Crawl Maps
This basically just makes it faster when building Crawl Maps on really big sites. Not much to see here TBH.

Fixes

#1 Fixed 'connection timeout' issues on URL Lists
If you clicked into URL Lists for the new 'Technologies' options on the audit overview, and then jumped into another URL List (e.g. Data Explorer), you could break Sitebulb so it would show a 'Connection Timed Out' message whenever you clicked into subsequent URL Lists. If you followed these steps it would basically break the tool completely.

This is what I meant (in 2.3 notes, below) when I said this was an MVP. And we've taken your first suggestion on board: 'Make sure it actually fucking works.'

#2 Some pages not being parsed correctly when you run Front-end
This one is hard to explain. It is to do with how Sitebulb assigns 'tasks' to the various threads it uses for different processes, and how these tasks are cleared off when they are done. When running Front-end, some of the threads would occasionally clear off the job they were doing before they had finished it. This would result in a handful of URLs not being parsed correctly, and some very small data inaccuracies on these audits.

Thanks to our diligent customers reporting the issue, we were able to isolate the problem and have fixed it now. To clarify, this issue was only present when you selected the audit option: Page Speed, Mobile Friendly and Front-end.

#3 Fixed Hint: 'URL contains no Google Analytics code' for old versions of GA
In v2.3 we added support for the new Google Analytics gtag code, because we are fucking rockstars.

In v2.3 we also managed to 'unsupport' the old Google Analytics codes, because we are fucking idiots.

#4 Now correctly following robots.txt rules for grammatically 'incorrect' robots.txt setups
Most websites are setup to basically accept all bots by default, with a few rules to disallow certain sections or pages. However some excessively paranoid folk are like 'hells no, you ain't comin' in if you ain't no search engine'.

These people are so fond of the double negative they set their robots.txt file up like so:

double negative robots.txt

Sitebulb was handling this like a 19th century English teacher, and getting it wrong. You'll be pleased to hear it's now bang up to date, and can often be heard shouting popular culture references like 'I ain't gettin' on no plane', to demonstrate that it is very much down with the kids.
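The screenshot isn't reproduced here, but a typical 'search engines only' setup blocks everyone under `User-agent: *` and then lets specific bots back in. The key point is that a crawler obeys the most specific group that names it, and only falls back to `*` if none does. Here is a minimal sketch using Python's standard `urllib.robotparser`, assuming that file layout (the bots and URL are just examples). This is also the situation the 2.4 'some search engines blocked' warning is about.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for bot in ["Googlebot", "Bingbot", "DuckDuckBot"]:
    print(bot, parser.can_fetch(bot, "https://example.com/page"))
# Googlebot True
# Bingbot False
# DuckDuckBot False
```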

Version 2.3

Released on 18th July 2018

Updates

#1 Sixteen months of data from Google Search Console
Since Google released their new shiny interface (+ features) to Google Search Console, everyone has been waiting for them to give us 16 months' worth of Search Analytics data through the API. They recently announced this addition:

Google announcement

So we added support for this in Sitebulb. You can now select up to a maximum of 480 days in the audit setup.

480 Days

Why exactly 480 days? Because Maths (or 'Math' if you are American and can't pronounce 's').

#2 Sitebulb detects technologies used
Sitebulb will collect technology data and tell you which URLs have what stuff on them - a bit like Builtwith except for every page on your site.

Technologies

It only works if you use the Chrome Crawler.

This is kind of a beta feature. Or an 'MVP', if you want to call it that - we're hoping for feedback from the community as to what they might want to do with that data, so we can build you guys some useful 'Technologies' reports.

#3 The domain check now reports on multiple 200 statuses
This is quite a subtle change. So subtle that literally no one will notice it's there. But I'm going to tell you about it anyway because otherwise I have nothing to do with my life. And the World Cup is over. Sigh.

Consider this example:

Multiple 200 status

What we used to do was just check the first option you entered. If this responded with a 200 status, we'd think 'happy days' and let you proceed. Now we also check whether any of the other domain configurations respond with a 200, and show you those too.

This should help you crawl the RIGHT website, rather than the wrong one (in this case, https://www).
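A minimal sketch of the idea, assuming the four usual start URLs (http/https, with and without www). The status lookup is passed in as a function so the logic can be shown without a network; in a real check it would request each URL without following redirects. The function names are hypothetical.

```python
from typing import Callable, List

def domain_variants(host: str) -> List[str]:
    """Build the four common http/https x non-www/www start URLs for a host."""
    bare = host[4:] if host.startswith("www.") else host
    return [
        f"{scheme}://{prefix}{bare}/"
        for scheme in ("http", "https")
        for prefix in ("", "www.")
    ]

def variants_returning_200(host: str, fetch_status: Callable[[str], int]) -> List[str]:
    """Return every variant whose own (non-redirected) response status is 200."""
    return [url for url in domain_variants(host) if fetch_status(url) == 200]

# Fake responses: both HTTPS variants answer 200, which is worth a warning.
fake_statuses = {
    "http://example.com/": 301,
    "http://www.example.com/": 301,
    "https://example.com/": 200,
    "https://www.example.com/": 200,
}
print(variants_returning_200("example.com", fake_statuses.get))
# ['https://example.com/', 'https://www.example.com/']
```

If more than one variant comes back with a 200, there is no single obvious 'right' site to crawl, which is exactly the case the new check surfaces.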

#4 Added Distil Networks to the CDN list
Per the 2.2 update, Sitebulb will show CDN warnings when setting up an audit (if, for instance, your site is hosted on Cloudflare). We added Distil Networks to this list of 'bad CDNs', so you'll see the warning message for sites on Distil as well now. We also softened the warning message somewhat, as too many people were getting scared and running away. Pussies.

Distil Networks Warning

This addition was at the request of Matt Brown. We normally completely ignore the requests of random foodie hipsters, but when we met him at PubCon last year he said that Sitebulb is awesome, so he can do no wrong in our eyes.

Yes, we are incredibly fickle.

#5 Sitebulb detects new Gtag code
No one noticed, but Google Analytics rolled out a new tracking script, gtag.js. If you look in the source code of this very site you'll see a LIVE example of it:

Gtag

Sitebulb now looks for this as well as the old versions, when checking if a GA code is present on the page.
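A rough sketch of checking for all three generations of the GA snippet at once, by matching the script URLs each public snippet loads (gtag.js, analytics.js and the classic ga.js). The pattern list is our assumption of a reasonable heuristic, not Sitebulb's actual detection rules; dropping an older pattern is exactly the kind of slip described in the 2.4 fix above.

```python
import re
from typing import List

# Script URL fragments loaded by the public GA snippets (assumed heuristic).
GA_PATTERNS = {
    "gtag.js": re.compile(r"googletagmanager\.com/gtag/js"),
    "analytics.js": re.compile(r"google-analytics\.com/analytics\.js"),
    "ga.js": re.compile(r"google-analytics\.com/ga\.js"),
}

def detect_ga(html: str) -> List[str]:
    """Return the names of any Google Analytics snippets found in the HTML."""
    return [name for name, pattern in GA_PATTERNS.items() if pattern.search(html)]

page = '<script async src="https://www.googletagmanager.com/gtag/js?id=UA-XXXX-1"></script>'
print(detect_ga(page))  # ['gtag.js']
```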

Fixes

#1 User agent not being pulled through correctly
This one is embarrassing. In 2.2 we added a new place where the user agent could be selected - on the domain pre-selection screen - which is useful if the site will not crawl with the default user agent. However... changing the user agent on this screen was not actually pulling through, and so it was completely redundant. Not our finest moment.

#2 Sitebulb would delete URLs in the queue when it hit a random error
Bit of a nightmare this one. We had a site that would crawl 6,500 URLs sometimes, and then 11,000 URLs other times. Turns out one of the pages on the site would occasionally throw a random server error, and Sitebulb was handling this by simply deleting everything in the queue. Technically, this is known as Sitebulb 'shitting the bed.'

It's fair to say that isn't exactly the behaviour we were after.

#3 Accessibility Hint "Form elements must have labels" incorrectly appearing
This Hint was flagged for a site, and our user emailed into support (verbatim): 'You wot m8?'

He was right: there weren't even any forms on the page, so how could they be missing labels? It turns out this was happening because we inserted a form into the DOM for the CSS Linter (don't worry, no one else knows what a 'linter' is either, it's not just you).

So Sitebulb was claiming something was wrong with the page, based on something it had inserted itself in order to check for the problem. I know right? It's like a really shit, and not entirely accurate version of Inception.

#4 Some Crawl Maps were coming out blank
In general, people love Crawl Maps. We've found they are less enamoured when the Crawl Maps come out completely blank, as they had been doing very very very occasionally. They looked like this:

I know, I'm with you. What's the problem, right? But some customers just love a fucking good moan (not a euphemism), so we figured we'd better fix it for them. Besides, we do like to deliver complete satisfaction (again, not a euphemism. I think there's something wrong with you).

#5 Sitebulb not parsing really shitty pages
We've said it before, and we'll say it again: The Internet is Broken. We found several instances where Sitebulb was not always collecting page content correctly, and whenever we dug into it further we found stray tags, unclosed elements and generally really crappy HTML. The type of thing that makes W3C validators just roll over and die.

We fixed all the issues we came across, and more, making Sitebulb a lot more resilient and robust for all audits.

#6 Images missing alt text export was broken
The Excel export for the Hint 'Has images with missing alt text' was not populating at all. Which rendered it pretty useless.

#7 Links graph missing 500+
Someone pointed out that the first graph on the 'Links' report was missing the 500+ data. We launched an internal investigation as to where it had gone, and found it posturing as a French Ski Instructor in Val d'Isere, basking in the glory of a World Cup victory, sipping over-priced champagne in the hot tub of a high-end luxury chalet.

Don't worry people, the data has duly been brought back down to Earth, and restored to station:

French ski instructor

#8 Sitebulb was not always parsing external hreflang pages
In some circumstances Sitebulb would decide that it wasn't going to parse external hreflang pages, which made it look like there were issues with reciprocity when there were not.
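A minimal sketch of a reciprocity check, assuming you already have a map of each parsed page to the hreflang alternates it declares. A target that was never parsed is simply absent from the map, so it gets reported as missing its return link; that is how skipping external hreflang pages can produce false reciprocity errors. The function name and data shape are hypothetical.

```python
from typing import Dict, List, Set, Tuple

def missing_return_links(hreflang: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return (source, target) pairs where the target does not link back to the source."""
    problems = []
    for source, targets in hreflang.items():
        for target in targets:
            if target == source:
                continue  # self-referencing annotations need no return link
            if source not in hreflang.get(target, set()):
                problems.append((source, target))
    return sorted(problems)

pages = {
    "https://example.com/en/": {"https://example.com/en/", "https://example.de/de/"},
    "https://example.de/de/": {"https://example.de/de/", "https://example.com/en/"},
    "https://example.com/fr/": {"https://example.com/fr/", "https://example.com/en/"},
}
print(missing_return_links(pages))
# [('https://example.com/fr/', 'https://example.com/en/')]
```

Remove the external example.de page from the map (as if it had not been parsed) and the en/de pair would be flagged too, even though the annotations are fine.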

Version 2.2

Released on 22nd June 2018

Updates

We spent a long time during this update working on performance and stability, and adding in fixes for crawling specific sites - whilst these cases are rare, we think they are worth fixing in case the same issues appear on other sites. In general these changes make Sitebulb a lot more robust and reliable, without us being able to point at something noticeable that will immediately impress and delight you.

So we added some other stuff as well...

#1 James Bond mode baby
Do you like to work late at night and would prefer a less-bright interface?

Are you Batman, and only work in black (and sometimes very very dark grey)?

Do you want your clients, colleagues, or enemies to think that you're actually James fucking Bond?

Enter, night mode:

James Bond mode

It's the same, only darker, and much, much cooler.

You can toggle day/night modes using this little button in the top right:

Toggle night mode

#2 New diagnosis option in Advanced Settings
When setting up a new audit, you can turn on some new diagnosis options via Advanced Settings.

These are designed to help understand exactly what the crawler is seeing, allowing you to save the HTML found, and take rendered screenshots as Sitebulb crawls (screenshots are only available using the Chrome Crawler).

The data will then be available on the URL Details screen for any given URL.

It will look very similar to 'Live View', but there is one important difference here. Whereas 'Live View' goes to fetch the data 'as it is right now', these diagnosis options actually store the data as it was when Sitebulb performed the crawl.

View saved HTML

You can use the screenshot function in particular to understand page changes or differences between different crawls, as the data acts as a historical record.

View saved screenshots

This feature is designed to be used with discretion - it is not the sort of thing you want to turn on for every crawl. One of the main reasons for this is that the data takes up a LOT of space: one screenshot image will come in at ~500KB, so on a big site this adds up very quickly.

We recommend using it alongside the 'URL List' crawl source, so you can control exactly which URLs (and more importantly - how many URLs) are being crawled.

#3 Increased redirect support
Sitebulb now supports 9 (nine) different types of redirect, because who doesn't love a fancy redirect eh?

  • HTTP Header
  • JavaScript
  • Meta Tag Refresh
  • HTTP Header Refresh
  • Interstitial
  • Reload
  • Navigation
  • Form Get Submission
  • Form Post Submission

#4 CDN warnings when setting up Projects
Content Delivery Networks (CDNs) have become the scourge of technical SEOs. Whilst they mostly do a good job protecting their client websites, their anti-DDoS security can play havoc with crawlers. For instance, we've had lots of users try to crawl websites on Cloudflare with the Googlebot User-Agent. Cloudflare will kick back a 403 (Forbidden) response (which is their equivalent of telling you to fuck off), and our user can't crawl the site.

In some cases, you can solve this by simply changing the User-Agent (to e.g. Sitebulb default). In other cases, you will need to get your client to log into the CDN and whitelist your IP address (or the IP of the computer you are crawling from). Pain in't arse.

To help deal with this issue, we now present a warning message in the pre-audit:

Cloudflare warning

This warning will trigger for any site using any of the following popular CDNs:

  • Cloudflare
  • Incapsula
  • Amazon CloudFront
  • ArvanCloud
  • CacheFly
  • CDN77
  • EdgeCast
  • Netlify
  • Fastly
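How might a tool spot a CDN in the first place? One common approach is to look for tell-tale HTTP response headers. The sketch below covers just three providers, using headers they are widely known to send (Cloudflare's CF-RAY and Server: cloudflare, CloudFront's X-Amz-Cf-Id and Via, Netlify's Server and X-NF-Request-ID). It is a heuristic we are assuming for illustration, not a description of how Sitebulb actually detects CDNs.

```python
from typing import Mapping, Optional

def guess_cdn(headers: Mapping[str, str]) -> Optional[str]:
    """Heuristically guess the CDN from response headers (names matched case-insensitively)."""
    h = {k.lower(): v.lower() for k, v in headers.items()}
    server = h.get("server", "")
    if "cf-ray" in h or "cloudflare" in server:
        return "Cloudflare"
    if "x-amz-cf-id" in h or "cloudfront" in h.get("via", ""):
        return "Amazon CloudFront"
    if "x-nf-request-id" in h or "netlify" in server:
        return "Netlify"
    return None  # unknown, or not behind one of these CDNs

print(guess_cdn({"Server": "cloudflare", "CF-RAY": "abc123-LHR"}))  # Cloudflare
```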

#5 More comprehensive canonical data on URL Details view
Canonicals can be messy fuckers to unpick, and whilst Sitebulb has extremely comprehensive canonical Hints, we thought it could do a better job helping users understand what's going on for each specific URL.

Now, when you go to view the URL Details for a specific page, and click on the 'Indexability' tab, you'll see data like the image below. The first table is all the canonicals associated with the page itself. The second table shows other URLs that declare this page to be the canonical (i.e. 'incoming canonicals').

The 'Type' column will show whether the canonical was in the HTML ('Link') or in the HTTP Header ('HTTP').

Canonicals on page and incoming canonicals
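To make the 'Link' vs 'HTTP' distinction concrete, here is a simplified sketch that collects canonicals from both places: link rel="canonical" elements in the HTML, and a Link response header such as <https://example.com/page>; rel="canonical". The header regex is deliberately simple (it expects rel to be the first parameter), so treat it as an illustration rather than a full Link-header parser.

```python
import re
from html.parser import HTMLParser
from typing import List, Tuple

class _CanonicalFinder(HTMLParser):
    """Collects href values of <link rel="canonical"> elements."""
    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and a.get("href"):
            self.hrefs.append(a["href"])

# One Link header entry, e.g. <https://example.com/page>; rel="canonical"
_LINK_ENTRY = re.compile(r'<([^>]*)>\s*;\s*rel="?([^";,]+)"?', re.IGNORECASE)

def canonicals(html: str, link_header: str = "") -> List[Tuple[str, str]]:
    """Return (type, url) pairs: 'Link' for HTML elements, 'HTTP' for the Link header."""
    finder = _CanonicalFinder()
    finder.feed(html)
    found = [("Link", href) for href in finder.hrefs]
    for url, rel in _LINK_ENTRY.findall(link_header):
        if "canonical" in rel.lower().split():
            found.append(("HTTP", url))
    return found
```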

#6 Warning message on audit overview when you only crawl 1 URL
We ran a comprehensive worldwide survey and our data shows that 99.99999%* of all websites have more than 1 URL.

So if Sitebulb comes back with a finished audit of only 1 URL, something probably done fucked up.

1 URL Warning

*Ok, I made this figure up. There is no spoon.

#7 Warning message when no links found in source HTML
To help with users getting tripped up by websites that rely on JavaScript rendered content, we have added a new warning message to the pre-audit.

The message appears at the top of the page, and clarifies that there are no links present in the source HTML, but that there are links in the rendered DOM - meaning you would need to crawl with the Chrome Crawler in order to crawl the site at all.

No links found message
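The logic behind this kind of warning can be sketched by counting links in two versions of the same page: the raw HTML response and the rendered DOM. Getting the rendered HTML needs a real browser (for example headless Chrome), so here both are just passed in as strings; the function names are hypothetical.

```python
from html.parser import HTMLParser

class _LinkCounter(HTMLParser):
    """Counts <a> elements that carry a non-empty href attribute."""
    def __init__(self) -> None:
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a" and any(name == "href" and value for name, value in attrs):
            self.count += 1

def count_links(html: str) -> int:
    counter = _LinkCounter()
    counter.feed(html)
    counter.close()
    return counter.count

def needs_chrome_crawler(source_html: str, rendered_html: str) -> bool:
    """True when links only appear after JavaScript has rendered the page."""
    return count_links(source_html) == 0 and count_links(rendered_html) > 0

print(needs_chrome_crawler('<div id="app"></div>',
                           '<div id="app"><a href="/about">About</a></div>'))  # True
```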

#8 Hubspot mode
A few users contacted us about websites that appeared to crawl waaaaay more URLs than they should have. Like thousands and thousands more.

Digging into each case in turn, we realised that they were always crawling with the Chrome Crawler, and it was always on websites powered by HubSpot CMS. We dug further, and found some other sites that weren't powered by HubSpot CMS, but were using elements of the HubSpot platform (typically lead-gen forms and tracking/analytics).

On these sites, Chrome would find unique tracking scripts and images, several times on every single page. So this typically meant tons of extra JavaScript files and images, and in some cases, loads of redirects too - which would inflate the total crawled URLs by as much as 100X - and fill your audit up with a load of junk.

So we have added another new warning on the pre-audit, which appears if HubSpot scripts are found and encourages you to block crawling of these scripts and avoid all the junk.

Block Hubspot Platform

#9 Block Ad and Tracking Scripts
The eagle-eyed among you will have already noticed this in the screenshot above, but we have also added the optional tickbox: 'Block Ad and Tracking Scripts.' This will stop Sitebulb from reporting on ad or tracking scripts (based on this massive fuckoff list we have of domains that are only used for ad or tracking scripts).

This option only applies for Chrome.
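A minimal sketch of blocking requests against a list of ad/tracking domains, matching the host itself or any of its subdomains. The three domains below are illustrative examples only; the list Sitebulb actually uses is much bigger and isn't reproduced here.

```python
from urllib.parse import urlsplit

# Hypothetical, tiny block list for illustration only.
BLOCKED_DOMAINS = {"hs-scripts.com", "hs-analytics.net", "doubleclick.net"}

def is_blocked(url: str, blocked=BLOCKED_DOMAINS) -> bool:
    """True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked)

print(is_blocked("https://js.hs-scripts.com/123456.js"))  # True
print(is_blocked("https://example.com/app.js"))           # False
```

Matching on a leading dot stops lookalike hosts (say, noths-scripts.com) from being caught by accident.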

Fixes

#1 Better handling of large data tables
A user found a website that would literally crash his machine when he crawled it with Sitebulb. Obviously, we initially dismissed this issue as a clear case of user error, but when we could actually be bothered to look into the issue we found that the specific website he was crawling had these MASSIVE data tables, that fucked up Sitebulb's HTML parser and caused it to have a meltdown. RAM went through the roof, and chaos ensued.

#2 Sitemap data no longer turns to shit if you pause the audit
If you were auditing with XML Sitemaps selected as a crawl source, and happened to pause the audit part way through, you may have noticed that some of your data was missing, or incorrectly tagged as 'Not in Sitemaps', when it blatantly was. This didn't always happen, so you also may not have noticed this, but if you did, it's now fixed.

#3 Some of the 'More' links on graphs were not wired up correctly
The 'More' links are designed to give you more - they allow you to switch between the graph view and the data table view. Unfortunately, for a couple of graphs, these links stopped working, and if anything we were showing you less. I spoke to Gareth about what went wrong, and he said 'I was using a ng-if and it should have been a ng-show!'. Just as I suspected!

#4 CSS and JS resource URLs were not being stored against every URL
Page Resources, such as CSS or JavaScript files, were only being stored against the first URL that Sitebulb crawled (when crawling with Chrome). This is more of a screw-up than it initially sounds: it meant that page resource data was wrong, and reports such as the Insecure Content report were missing data.

#5 Chrome Crawler now takes into account HTTP/2 headers
It wasn't doing this before, and does now. I don't know if this really should be described as a bug, but it would report HTTP/2 pages as being HTTP/1.1, so it kinda feels like a bug even if it technically wasn't.

Version 2.1.2

Released on 30th May 2018

Updates

#1 Added to Dashboard: 'Recent Incomplete Audit'
Every so often, you've just had enough. It's been a busy day. Over ten emails, a couple of client calls, AND you needed to keep up appearances on #coolseochat. The audit can wait until tomorrow.

So you hit 'Pause', shut down your computer, and go home to watch six hours of Netflix. You treat yourself to a bottle of rosé and finish off that half-eaten tub of Cookies and Cream that was on offer at Sainsburys the other day.

The morning rolls around and you head into work, you check Twitter, Facebook, LinkedIn, Instagram and Snapchat, Google yourself a couple times (because #personalbranding) then fire up Sitebulb before your lunch break. 'Huh, where'd my paused audit go? Oh well, better start again...'

NO LONGER!

Introducing the 'Recent Incomplete Website Audit' notice that will greet your eventual return. You're welcome.

Recent Incomplete Audit Notification

#2 Export audit directly from an audit
Not a lot of people know this, but you can export entire Sitebulb audits, which you can then send on to colleagues to use on their copy of Sitebulb. You could also send them to clients, if they are also a Sitebulb lover, or even to friends and family - a Mother's Day present perhaps?

The reason no one knows about this super-cool feature is that it's hidden away in a cupboard like Edd the Duck.

So, we also added a button to the Audit Overview, right here:

Export Audit

#3 Two new Hints, related to rendering
Following some bombs dropped at the recent Google I/O conference, Google confirmed that it ignores both rel=canonical and rel=amphtml when they are found in the rendered DOM - it relies solely on the HTML response for these elements.

Reply from JohnMu

So we added a couple of new Hints to check for issues related to this:

Note that these Hints will only be checked when you use the Chrome Crawler.
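The underlying comparison is roughly this: collect the rel=canonical and rel=amphtml links from the raw HTML response and from the rendered DOM, and flag anything that only exists after rendering. A sketch, assuming you already have both versions of the page as strings (the rendered one would come from a browser such as headless Chrome); the helper names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Set

WATCHED_RELS = {"canonical", "amphtml"}

class _RelCollector(HTMLParser):
    """Collects watched rel values found on <link> elements."""
    def __init__(self) -> None:
        super().__init__()
        self.rels: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        for name, value in attrs:
            if name == "rel" and value:
                self.rels.update(r for r in value.lower().split() if r in WATCHED_RELS)

def rels_in(html: str) -> Set[str]:
    collector = _RelCollector()
    collector.feed(html)
    return collector.rels

def rendered_only(source_html: str, rendered_html: str) -> Set[str]:
    """Watched rel values present in the rendered DOM but missing from the HTML response."""
    return rels_in(rendered_html) - rels_in(source_html)

print(rendered_only("<head></head>", '<head><link rel="canonical" href="/a"></head>'))
# {'canonical'}
```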

#4 Two more new Hints, these ones related to Page Speed
These two Hints are about Sitebulb detecting duplicate page resource URLs, namely:

  • Duplicate Javascript Files
  • Duplicate Style Sheets

These Hints identify JavaScript and CSS files that are technically duplicated - the URLs are the same other than a query string, and the file size and body content are identical. This typically comes from developers adding version numbers or timestamps to page resources, and can be problematic as you lose the ability to cache the resources across multiple pages. In reality, if this only affects a handful of URLs this is not a big issue, but if it affects thousands of URLs then it is a much bigger problem.

We've seen some sites recently with major issues, so thought it made sense to highlight this problem via the medium of Hint.
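The check described above can be sketched in a few lines: strip the query string from each resource URL, hash the body, and group on both. Identical bodies necessarily have identical sizes, so the hash covers both the 'file size' and 'body content' conditions. This assumes you already have the downloaded resources in a dict; it is an illustration, not Sitebulb's implementation.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List
from urllib.parse import urlsplit, urlunsplit

def strip_query(url: str) -> str:
    """Drop the query string (and fragment) from a URL."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

def duplicate_resources(bodies: Dict[str, bytes]) -> List[List[str]]:
    """Group resource URLs that differ only by query string and have identical bodies."""
    groups: Dict[tuple, List[str]] = defaultdict(list)
    for url, body in bodies.items():
        key = (strip_query(url), hashlib.sha256(body).hexdigest())
        groups[key].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]

bodies = {
    "https://example.com/app.js?v=1": b"init()",
    "https://example.com/app.js?v=2": b"init()",  # same file, new cache-busting param
    "https://example.com/app.js?v=3": b"init2()", # genuinely different content
}
print(duplicate_resources(bodies))
# [['https://example.com/app.js?v=1', 'https://example.com/app.js?v=2']]
```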

#5 You can now edit Project names and add Project descriptions
For those instances when you go to visit a client and want to show them their site audit, but regret how you let well-placed anger dictate your Project-naming methodology in the past.

Rename projects

#6 More Indexability detail on URL Details page
You can now see more granular details regarding robots directives on the URL Details page. A showstopper indeed.

Indexability Details

Fixes

#1 Audits being queued when they didn't need to be
Queuing. Literally the ONE THING in the world that Brits are good at, and we managed to fuck that up!

In particular, this was a case of 'over-queuing', as in, audits queuing up when they didn't need to. If you ask me, this is taking politeness too far, and needs to end.

#2 Occasional issue with pause/resume
Technically related to the over-polite queue situation above, Sitebulb would also occasionally not correctly set the status to 'paused'.

Version 2.1.1

Released on 16th May 2018 (hotfix version)

Fixes

#1 Correctly processing URLs with extensions
Sitebulb was mistreating URLs that contained an extension on the end that it did not recognise (e.g. couponsite.com/stores/asos.com), which meant it actually was not downloading the HTML at all. Cue unexpected rise in Hints like 'Title tag not found'.

#2 Fixed bug in Link Equity calculation
Sitebulb was somehow calculating Link Equity scores for URLs that were orphaned, and not part of the crawlable site architecture. Duh, Sitebulb.
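One generic way to keep orphaned URLs out of a link-based score is to first work out which URLs are actually reachable by following links from the start URL, and only score those. The breadth-first search below is a sketch of that reachability step under an assumed adjacency-map input; it is not Sitebulb's Link Equity formula.

```python
from collections import deque
from typing import Dict, List, Set

def reachable(start: str, links: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first search over internal links from the start URL."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

def orphans(all_urls: Set[str], start: str, links: Dict[str, List[str]]) -> Set[str]:
    """URLs known from other sources (e.g. XML Sitemaps) but not reachable via links."""
    return all_urls - reachable(start, links)

links = {"/": ["/a", "/b"], "/a": ["/c"], "/orphan": ["/"]}
print(orphans({"/", "/a", "/b", "/c", "/orphan"}, "/", links))  # {'/orphan'}
```

Note that /orphan links out to the homepage, but nothing links in to it, so it never enters the crawlable architecture.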

Version 2.1.0

Released on 9th May 2018

I once again need to give you some context before diving into the updates. This time, it was all about unbreaking Sitebulb.

We released Sitebulb v2.0 on the same day that Google publicly released Chrome 66. We'd been building against Chrome 65, and were happy that Sitebulb was stable and reliable... on Chrome 65. As our users went about their daily lives, Chrome updated itself in the background (it auto-updates by default), which in turn affected how Sitebulb interacted with Chrome.

Over the past couple of weeks we've discovered a bunch of things that changed or broke between 65 and 66, and decided we needed to handle this differently.

Without further ado...

Updates

#1 Sitebulb is now packaged with Chromium
Once we'd got everything patched up on Chrome 66, we decided to package Chromium (66) up with Sitebulb. In future, this gives us control of the update process, so that we're not caught short by a new public Chrome release. It also fixes a couple of other issues, which you can read in the 'Fixes' section.

The downside is that it makes the download/install a lot larger, but we think this is a small downside, all things considered.

#2 Paste multiple XML Sitemaps in one go
We tried to make it better/easier to add sitemaps in V2.0, but in doing so we made it worse. And we got told off:

XML Sitemaps Tweet

Egos wounded, we went back to the drawing board...

Paste multiple XML Sitemaps

#3 Awesome new 'learn more' styled links
The 'Learn More' links we recently added to Hint descriptions were boring, plain text links. We've added a bit of styling to give them a touch of glam.

And the crowd go wild...

Learn more

#4 Added a 'help' page when Sitebulb crawls 0 URLs
To a Mouse: The best laid schemes o' mice an' men / Gang aft a-gley.

AKA, humans regularly fuck things up ('I'll crawl my React site with the HTML Crawler'), and machines also regularly fuck things up ('I'm sorry Dave. I'm afraid I can't do that').

Now, when things get fucked up, Sitebulb will try and help you out with some helpful info, instead of leaving you floundering around like a mouse in a fishbowl.

#5 Added wildcard to excluded external hosts
This just makes it a bit easier to set up the global exclusion rules for external hosts. You can now use either of:

  • *.domain.com
  • *domain.com

In context example:

Excluded Hosts
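The difference between the two patterns is easiest to see with shell-style wildcard matching, sketched here with Python's fnmatch. We're assuming Sitebulb's wildcards behave like standard shell globs; the exact matching rules aren't documented here. The key point: *.domain.com matches subdomains only, while *domain.com also matches the bare domain (and, careful, any host that merely ends in 'domain.com').

```python
from fnmatch import fnmatchcase
from typing import Iterable

def host_excluded(host: str, patterns: Iterable[str]) -> bool:
    """True if the host matches any shell-style wildcard pattern (case-insensitive)."""
    host = host.lower()
    return any(fnmatchcase(host, p.lower()) for p in patterns)

print(host_excluded("cdn.domain.com", ["*.domain.com"]))  # True
print(host_excluded("domain.com", ["*.domain.com"]))      # False
print(host_excluded("domain.com", ["*domain.com"]))       # True
print(host_excluded("otherdomain.com", ["*domain.com"]))  # True
```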

Fixes

#1 Opener links now work again on Mac
One of the main problems with Chrome (see Updates #1) was that on Mac, Sitebulb-controlled Chrome instances were interfering with regular browser Chrome instances, and vice versa. As an upshot, if you had Chrome as your default browser, 'URL opener' links would not work at all. Which is kinda a problem when the main CTA for trial users 'Upgrade to Pro' doesn't work at all. Anti-CRO, anyone?

#2 Some URL Lists would not export filters
On some of the URL Lists, if you customised the data with filters or add/remove columns, Sitebulb would ignore your instructions, like an ignorant fuck, and just export the whole lot.

#3 Duplicate titles being reported on pages with SVGs on
Lots of people noticed this one ('Check duplicate titles' must be high up on 'SEO Audit Checklists'?). Sitebulb was incorrectly claiming that there were multiple page titles, when really they were titles associated with SVGs. Schoolboy.
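The fix boils down to not counting a title element as a page title when it lives inside an SVG, where it's just an accessible name for the graphic. A sketch using Python's html.parser that tracks SVG nesting depth; it's an illustration of the idea, not Sitebulb's parser.

```python
from html.parser import HTMLParser

class _TitleCounter(HTMLParser):
    """Counts <title> elements, ignoring any that sit inside an <svg>."""
    def __init__(self) -> None:
        super().__init__()
        self.svg_depth = 0
        self.page_titles = 0

    def handle_starttag(self, tag, attrs):
        if tag == "svg":
            self.svg_depth += 1
        elif tag == "title" and self.svg_depth == 0:
            self.page_titles += 1

    def handle_endtag(self, tag):
        if tag == "svg" and self.svg_depth > 0:
            self.svg_depth -= 1

def count_page_titles(html: str) -> int:
    counter = _TitleCounter()
    counter.feed(html)
    counter.close()
    return counter.page_titles

page = ("<html><head><title>Home</title></head><body>"
        "<svg><title>Menu icon</title></svg></body></html>")
print(count_page_titles(page))  # 1
```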

#4 'Redirects' that were not redirecting
When in doubt, blame Chrome. If anyone out there has played around with headless Chrome (y'know, for kicks), you may have come across issues with how redirects get resolved. Chrome likes to handle everything all at once, rather than scheduling the redirected URL as per normal crawling 'rules'. This caused us some problems, which looked (to the user) like URLs were being reported as redirects when they were not actually redirecting - for certain URLs and certain redirects. Most users shouldn't have experienced this, but for the few websites it did affect, it should now be working properly.

#5 Sitebulb was not showing proper respect to meta charset="UTF-8"
In fact, it was full on disrespecting it, triggering the Hint: 'Character Set Not Specified In Head Or Headers'.
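For reference, a charset can be declared in three common places: the Content-Type response header, meta charset, or the older meta http-equiv="Content-Type" form. The sketch below checks the header first and then both meta forms; for brevity it doesn't verify that the meta element actually sits inside the head. It's an assumption-laden illustration, not the Hint's real logic.

```python
import re
from html.parser import HTMLParser
from typing import Optional

_CHARSET = re.compile(r"charset=([\w-]+)", re.IGNORECASE)

class _CharsetFinder(HTMLParser):
    """Finds a charset from <meta charset> or <meta http-equiv="Content-Type">."""
    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset:
            return  # first declaration wins
        a = {name: (value or "") for name, value in attrs}
        if a.get("charset"):
            self.charset = a["charset"].strip().lower()
        elif a.get("http-equiv", "").lower() == "content-type":
            match = _CHARSET.search(a.get("content", ""))
            if match:
                self.charset = match.group(1).lower()

def declared_charset(html: str, content_type_header: str = "") -> Optional[str]:
    """Return the charset from the Content-Type header, else from the HTML, else None."""
    match = _CHARSET.search(content_type_header)
    if match:
        return match.group(1).lower()
    finder = _CharsetFinder()
    finder.feed(html)
    return finder.charset

print(declared_charset('<head><meta charset="UTF-8"></head>'))  # utf-8
```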

#6 Sitebulb was not respecting all robots 'disallow' rules
Talk about disrespect.

#7 Fixed some issues with duplicate content detection
We changed the way that Sitebulb classifies canonicalized URLs in the background, which stops them being included in duplicate content reports (which they never should have been).

#8 'The internet is broken: reprise'
We found a whole bunch of new ways that developers can build shitty websites, which were breaking headless Chrome when Sitebulb tried to crawl them. We fixed Sitebulb so it now handles these websites, instead of unhelpfully falling over.

#9 Cell colours were the wrong way round on Crawl comparison export
The international rules of colours mean that red is bad and green is good. But in our crawl comparison export we were showing more forbidden URLs as GOOD and more redirects as BAD. That's the wrong way round, dummy!

Crawl comparison export

Version 2.0.2

Released on 19th April 2018

Before starting with the regular fixes and updates below, we need to draw your attention to an update we've made to the Mac version of the software. It turns out that while running Sitebulb, users were no longer able to open Chrome. This is because Sitebulb now uses headless Chrome and macOS does not allow you to open two instances of the same application.

We are astonished, and a little disappointed, we did not spot this during beta testing, but there's nothing we can do about it other than to come up with a solution. And our solution is to package a version of Chrome in with Sitebulb, which can run independently of your normal Chrome browser app.

The downside of this is that it does make the download and install size of Sitebulb a lot larger, which we know will annoy some users. We are hoping that this is a temporary solution, while we find something more fitting.

The upside is that you can use Chrome normally again! It will also prevent the Chrome update bug which some users experienced, and I emailed everyone about the other day.

Onto the regular stuff...

Updates

#1 'Export all the things'
Mr Russ Jones called me out the other day with the following tweet:

Russ Jones tweet

Firstly, who does this guy think I am? Gareth is the one that builds everything, I literally sit on my arse all day writing snarky and borderline offensive release notes copy.

Regardless, we made it happen for you, Mr Jones. Go ahead and add whatever data you want, then export. And yes, including all those sexy Lighthouse metrics like TTFB and First Meaningful Paint.

Export all the things

#2 Configured Sitebulb to ignore certificate errors
If you had a site that did this over HTTPS...

Connection not private

...Chrome would just shit itself and crash. It doesn't anymore!

Fixes

#1 Sample audit working again
The sample audit had stopped working correctly. It is supposed to follow your depth settings: at 3 levels deep with 50 URLs per level, it should crawl a maximum of 150 URLs (or actually, 101 URLs, since there's only ever 1 URL at the first depth, but no one likes a pedant). It wasn't following these rules, and was just hitting the 'total maximum URLs.'
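For the pedants among us, the arithmetic goes like this: the first level only ever holds the start URL, so only the remaining levels can each hold the per-level maximum. A tiny sketch (the function name is ours, purely for illustration):

```python
def max_sample_urls(depth_levels: int, urls_per_level: int) -> int:
    """Upper bound on URLs for a depth-limited sample crawl.

    Level 1 only ever contains the start URL, so just the remaining
    (depth_levels - 1) levels can hold urls_per_level URLs each.
    """
    if depth_levels < 1:
        return 0
    return 1 + (depth_levels - 1) * urls_per_level

print(max_sample_urls(3, 50))  # 101, i.e. 1 + 50 + 50
```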

#2 Fixed over-zealous meta refresh detection
Sitebulb was looking for a meta refresh throughout the code, rather than just the head, which meant meta refreshes were triggering for <noscript> fallbacks in the body. It now only looks in the head.
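A sketch of 'only look in the head', using Python's html.parser: track whether the parser is inside the head, and ignore meta refresh tags anywhere else (such as a noscript fallback in the body). Treating the opening body tag as the end of the head is our own tolerance for pages that never close it; this isn't Sitebulb's code.

```python
from html.parser import HTMLParser
from typing import List

class _HeadMetaRefresh(HTMLParser):
    """Records meta refresh content values, but only inside <head>."""
    def __init__(self) -> None:
        super().__init__()
        self.in_head = False
        self.refreshes: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "body":
            self.in_head = False  # tolerate pages that never close </head>
        elif tag == "meta" and self.in_head:
            a = {name: (value or "") for name, value in attrs}
            if a.get("http-equiv", "").lower() == "refresh":
                self.refreshes.append(a.get("content", ""))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False

def head_meta_refreshes(html: str) -> List[str]:
    parser = _HeadMetaRefresh()
    parser.feed(html)
    return parser.refreshes

page = ('<html><head></head><body><noscript>'
        '<meta http-equiv="refresh" content="0; url=/no-js"></noscript></body></html>')
print(head_meta_refreshes(page))  # []
```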

#3 Tidied up formatting on crawl comparison page
Because it looked whack, yo.

#4 XML Sitemaps were not reported as being in robots.txt...
...when they actually were there! Thousands of users were up in arms about this one, understandably so if you ask me.
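For anyone who wants to double-check their own robots.txt, Python's standard library can pull out Sitemap lines directly (RobotFileParser.site_maps() needs Python 3.8 or later). The robots.txt content below is a made-up example.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap_index.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
print(parser.site_maps())  # ['https://example.com/sitemap_index.xml']
```

Sitemap lines are independent of any User-agent group, which is why the parser collects them wherever they appear in the file.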

#5 URL Details -> Duplicate Content was not showing correctly on the Mac
And when I say 'not showing correctly', I mean 'not fucking showing at all.' Fixed up that bad boy good and proper.

#6 Two Page Speed Hints were not wired up correctly to URL Lists
Which meant the data was all out of sync. For those interested in the gory details, it was these 2 Hints:

  • Style Sheet content is greater than 14.5kB
  • Total combined Image content size is too big (over 1MB)

Version 2.0.1

Released on 17th April 2018 (hotfix version)

Fixes

We had just pushed the button to launch v2.0, after 3+ months of solid development...

Gareth: 'Give it a quick test, just to be sure'
Patrick: 'Ok, no problem, I'll test bbc.co.uk' (starts up Sitebulb)
Patrick: (enjoys watching colourful spinny thing)
Patrick: 'Oh. Fuck.'
Gareth: 'WHAT? What's wrong??'
Patrick: (silence)
Gareth: 'WHAT THE FUCK IS WRONG WITH IT?'
Patrick: 'Well...it's not crawling'

If you've seen an error message like this: Error - The 'Domain'='bbc.co.uk' part of the cookie is invalid

Then you've seen what caused me to lose the power of speech. It's now working again.

And...breathe.

Version 2.0.0

Released on 17th April 2018

A major update, with a host of new features and a bunch of small updates and bug fixes to boot.

#1 New JavaScript Crawler: Headless Chrome

We've ripped out our old JavaScript crawler and replaced it with a brand spanking new version of headless Chrome.

Our old JavaScript crawler was built using PhantomJS, a very popular headless browser (which is essentially a web browser without a graphical user interface), which has been a stable solution for us as we've developed the software over the last 2 years.

However, in the middle of last year, two important things happened:

  1. Google announced they were shipping headless Chrome in the public release of Chrome 59
  2. The PhantomJS developers, upon learning about headless Chrome, ceased development of Phantom.

The upshot was that our JavaScript crawler had been stagnating, and was completely unable to crawl certain sites using particularly recent frameworks. But the new Chrome release also gave us an exciting opportunity to significantly improve our product.

Integrating headless Chrome has given us the ability to add some unique new reporting features, and report on data that was otherwise unavailable to us. These include a full front-end and performance audit, a code coverage audit, and an accessibility audit (you can read more about all of these below). In short, it's like being able to run Lighthouse on every single page on your site.

You can select the Chrome Crawler on the main audit setup page:

New Chrome Crawler

You may notice from the illustrative image above that we've also re-named our crawlers, the new one being 'Chrome Crawler' (because what does 'JavaScript Crawler' even mean?) and the other one being 'HTML Crawler' (which is exactly the same as the old 'Non-JavaScript Crawler').

#2 New Report: Front-end

Front-end is a new report you can get through Sitebulb, which is bundled in with Page Speed and Mobile Rendering in the audit selection options. In combination, you end up with a full performance audit on every page on the site, pretty sweet huh?

The Front-end Hints include HTML validation, CSS validation and reporting of JavaScript errors, in addition to a ton of other useful stuff.

Front end

Additionally, it will also give you a Cookies report, which shows you all the cookies set by the website, along with their expiration. This gives you the ability to do a complete cookie audit, to help you set the right consents in order to be GDPR compliant (25th May is coming, people!).

Cookies GDPR
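If you want to sanity-check cookie expiries by hand, Python's http.cookies can parse raw Set-Cookie header values. The sketch assumes you've already captured those headers (Sitebulb gets its cookie data via Chrome, which this doesn't replicate); cookies with neither Expires nor Max-Age are session cookies.

```python
from http.cookies import SimpleCookie
from typing import Dict, List

def cookie_expiries(set_cookie_headers: List[str]) -> Dict[str, str]:
    """Map cookie name -> Expires, Max-Age, or 'session' if neither is set."""
    report: Dict[str, str] = {}
    for header in set_cookie_headers:
        jar = SimpleCookie()
        jar.load(header)
        for name, morsel in jar.items():
            if morsel["expires"]:
                report[name] = morsel["expires"]
            elif morsel["max-age"]:
                report[name] = f"max-age={morsel['max-age']}"
            else:
                report[name] = "session"
    return report

headers = [
    "_ga=GA1.2.123; Expires=Wed, 21 Oct 2020 07:28:00 GMT; Path=/",
    "sessionid=abc; Path=/; HttpOnly",
    "pref=dark; Max-Age=31536000",
]
print(cookie_expiries(headers))
```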

#3 New Datapoint: First Meaningful Paint

Using the Chrome Crawler, Sitebulb can also now collect Time to First Meaningful Paint, which lives in the Page Speed report. First Meaningful Paint is possibly the most user-centric page timing datapoint anyone has ever come up with, as it is based on the user's perception of load time, identifying the time at which the user feels that the primary content of the page is visible.

First Meaningful Paint

By the way, if you've never heard about First Meaningful Paint, watch the first 10 minutes of this video and you'll understand it well enough to throw it around in a sentence and make everyone else look dumb when they don't know what you're talking about.

#4 New Report: Code Coverage

For those not satisfied by the performance optimizations suggested by the Front-end audit, the Code Coverage report will allow you to squeeze even more juice out of your site.

The idea here is that Sitebulb will show you where CSS and JavaScript code is being loaded in but not being used. It will do this for every page on the website, so you end up with a really clear view of 'dead code.' Cleaning this up so you only include the code you need will allow you to reduce the size of your pages and improve load time.

Code Coverage by file

You can also click through for each resource file and Sitebulb will pick out the unused lines of code from each CSS and JavaScript file. This combination allows you to find files that are not being used, and to optimize the code on files that are being used.

Code wastage highlighter

A few months ago Ian Lurie published a typically enjoyable post on using the Code Coverage report in DevTools to improve page speed, which should act as an excellent primer if you are new to the topic. Sitebulb takes the same DevTools methodology and aggregates it across the whole website. 
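The core number behind a coverage report is simple once you have the used byte ranges for a file: merge any overlapping ranges, add them up, and subtract from the file length. The sketch assumes coverage data shaped as (start, end) byte offsets, similar to what DevTools' Coverage tab records; the exact data format Sitebulb uses isn't described here.

```python
from typing import List, Optional, Tuple

def unused_bytes(total_length: int, used_ranges: List[Tuple[int, int]]) -> int:
    """Bytes of a file not covered by any used [start, end) range.

    Overlapping or adjacent ranges are merged first so no byte is counted twice.
    """
    covered = 0
    cur_start: Optional[int] = None
    cur_end: Optional[int] = None
    for start, end in sorted(used_ranges):
        start, end = max(0, start), min(total_length, end)
        if start >= end:
            continue  # empty or out-of-bounds range
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return total_length - covered

print(unused_bytes(100, [(0, 10), (5, 20), (50, 60)]))  # 70
```

Divide by the file length and you have the 'percentage of dead code' for that file; repeat across every page and you have the site-wide view.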

#5 New Report: Accessibility

Web accessibility sits beyond the realms of what is generally considered digital marketing, but is becoming more important in the development world as the web matures.

Accessibility is about making your website more inclusive, ensuring that all of your potential users, including people with disabilities, have a decent user experience and are able to easily access your information.

If you select 'Accessibility' from the audit options, Sitebulb will run over 35 automated accessibility checks across your entire website as it crawls, identifying the worst performing URLs and the most common violations.

You can pick out any URL and Sitebulb will build the DOM for you and highlight the violations, so you can inspect them directly without leaving the software.

Accessibility Violations

#6 New Report: Keywords

This new report is not made possible by Chrome, but by the Search Analytics API, which you can connect to Sitebulb through your Google Search Console account.

I'll show you what it looks like first, then how to set it up:

Keywords Report

There are a few other charts, and of course you can click through to a list of all the queries for every page. We haven't added any Hints yet, but we'll be soliciting feedback on this and developing this report further over the next few months (so if you have any ideas, please share!).

You add the Keywords report as an optional extra in the Project setup, once you've added a Google Search Console account/property. To get the breakdowns on branded/non-branded queries, you'll need to enter some brand variations in the box so that Sitebulb knows how to classify the keywords it pulls back from the API.

For example:

Keywords Setup
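As a rough illustration of how brand variations can be used to split queries, here is a Python sketch. The function name `classify_queries` and the case-insensitive substring rule are assumptions for this example; Sitebulb's actual matching rules aren't described here.

```python
def classify_queries(queries, brand_variations):
    """Split queries into (branded, non_branded) lists.

    Assumption for this sketch: a query is 'branded' if it contains any of the
    brand variations as a case-insensitive substring."""
    variations = [v.strip().lower() for v in brand_variations if v.strip()]
    branded, non_branded = [], []
    for query in queries:
        text = query.lower()
        if any(v in text for v in variations):
            branded.append(query)
        else:
            non_branded.append(query)
    return branded, non_branded


branded, non_branded = classify_queries(
    ["Sitebulb pricing", "site bulb review", "seo crawler", "website audit tool"],
    ["sitebulb", "site bulb"],
)
print(branded)      # ['Sitebulb pricing', 'site bulb review']
print(non_branded)  # ['seo crawler', 'website audit tool']
```

Entering misspellings and spaced variants (like 'site bulb') matters with a rule this literal, because anything not in the list falls into the non-branded bucket.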

#7 New Report: Security

No matter what audit options you select when setting up, Sitebulb is going to give you the new Security report whether you want it or not.

But you should pay attention to it, because security is kinda a big deal yo.

Not Secure Message in Chrome

HTTPS and mixed content issues grab all the headlines, but Sitebulb will also alert you to a ton of vulnerabilities you've probably never even thought about, which will now keep you up at night. You're welcome.

Advanced Security Checks

There is also a quick 'Insecure Content' export, which includes triggered Hint data from all instances of insecure content: mixed content, HTTPS URLs linking to HTTP, page resources being loaded with protocol-relative URIs and links vulnerable to tabnapping.

Insecure Content Export
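For anyone curious what these checks involve, here is a simplified Python sketch using the standard library's `html.parser`. It assumes the page itself was served over HTTPS, only inspects a handful of tags, and the finding labels (`mixed_content` and so on) are hypothetical names for this example rather than Sitebulb's Hint names.

```python
from html.parser import HTMLParser

# Tags whose src attribute loads a page resource (a deliberately short list).
SRC_TAGS = {"img", "script", "iframe", "audio", "video", "source"}


class InsecureContentScanner(HTMLParser):
    """Collects (finding_type, url) pairs from HTML assumed to be served over HTTPS."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "a":
            href = a.get("href") or ""
            if href.lower().startswith("http://"):
                self.findings.append(("https_links_to_http", href))
            rel = (a.get("rel") or "").lower().split()
            # target=_blank without noopener/noreferrer lets the new tab reach window.opener
            if (a.get("target") or "").lower() == "_blank" and not {"noopener", "noreferrer"} & set(rel):
                self.findings.append(("tabnapping", href))
            return
        url = None
        if tag in SRC_TAGS:
            url = a.get("src")
        elif tag == "link" and "stylesheet" in (a.get("rel") or "").lower().split():
            url = a.get("href")
        if not url:
            return
        if url.lower().startswith("http://"):
            self.findings.append(("mixed_content", url))
        elif url.startswith("//"):
            self.findings.append(("protocol_relative", url))


def scan_https_page(html):
    scanner = InsecureContentScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.findings


sample = (
    '<img src="http://example.com/a.png">'
    '<script src="//cdn.example.com/x.js"></script>'
    '<a href="http://example.com/page" target="_blank">x</a>'
    '<a href="https://example.com/ok" target="_blank" rel="noopener">y</a>'
    '<link rel="stylesheet" href="https://example.com/s.css">'
)
for finding in scan_https_page(sample):
    print(finding)
```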

#8 List mode!

For years we've sold a product, in URL Profiler, that works exclusively in 'list mode'. People would often ask us, 'when are you guys gonna build a crawler onto this thing?'

As soon as we released our crawler product, we got the opposite! 'Yeah but I want to just crawl a list, not the whole website.'

We can't f%$&ing win.

So here it is. You can now select a URL List as a URL Source to audit. You can audit the list on its own, or combine it with a crawl, or combine it with sitemaps as well.

List Mode Baby

Despite my sarcastic tone above, this is actually a feature we're pretty excited about. It allows you complete control over which URLs are 'crawled' (although they are not really crawled in the strict sense of the word), and allows you to check smaller subsections of a site without having to crawl the whole thing.

Either way, list enthusiasts should be very pleased.

#9 New URL Details panel

This one is underwhelmingly awesome. We've totally rebuilt the URL Details view, and it's very cool.

The new panel slides out from the right of the screen, and shows more data about each URL, which you can navigate via the left-hand menu.

URL Details panel

You may also notice the blue 'Live View' button under the menu. Clicking this causes Sitebulb to go and fetch the page 'live' and render the page content (using the aforementioned headless Chrome).

This gives rise to some really useful data:

  • The response HTML, the rendered HTML, and any differences between the two (super useful for understanding how JavaScript can change HTML content, which may affect crawling and indexing).
  • Live HTTP Headers (request and response), so you don't need yet another Chrome extension.
  • Rendered screenshots for desktop, tablet and mobile, so you can quickly compare and contrast (or save for later, should you wish).

Since words are rarely enough, this can be better communicated via the medium of gif:

Live view
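Spotting what JavaScript changed boils down to diffing two documents. Here is a minimal sketch with Python's `difflib`, assuming you already have both the response HTML and the rendered HTML as strings (fetching and rendering the page with headless Chrome is outside the scope of this example). `html_diff` is a hypothetical helper name, and a line-based unified diff is just one simple choice, not necessarily how Sitebulb computes its comparison.

```python
import difflib


def html_diff(response_html, rendered_html):
    """Return a line-based unified diff from the response HTML to the rendered HTML."""
    return list(
        difflib.unified_diff(
            response_html.splitlines(),
            rendered_html.splitlines(),
            fromfile="response.html",
            tofile="rendered.html",
            lineterm="",
        )
    )


response = "<title>Old</title>\n<div id=app></div>"
rendered = "<title>New</title>\n<div id=app><p>Hi</p></div>"
print("\n".join(html_diff(response, rendered)))
```

Lines prefixed with `-` exist only in the server response and lines prefixed with `+` only appear after rendering; in practice the browser re-serializes markup, so expect some noise that isn't a meaningful content change.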

You can access the new URL Details panel by clicking the blue 'URL Details' buttons you'll find on the left in URL Lists.

URL Details

#10 More helpful Hints

Probably my favourite of all the new additions (because I do the support and it should hopefully make my life easier!), we have added contextual help for a number of Hints, and given ourselves the ability to scale this out across the coming months, so that every Hint contains more support.

Firstly, we have added new 'Hint Details' panels to a number of Hints, that work very much like the 'URL Details' panels discussed above. In particular, we think these can help with Hints that highlight on-page issues that are difficult to dig into without searching through source code.

Take, for example, the Hint 'Images with Missing or Empty Alt Text'. Previously, this would just show you a list of URLs that have one or more images with no alt text, and leave you to fend for yourself when figuring out exactly which images were the offending ones.

Now, for any URL, you can click through to view the Hint Details from the URL List, which will open up a new frame showing you the HTML, with the images without alt text highlighted in the code.

Hint Details images missing alt text

A whole lot easier, right?
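The core of this Hint can be approximated in a few lines of Python. This is a sketch, not Sitebulb's code: `find_missing_alt` is a hypothetical helper that reports the line number and `src` of each `<img>` whose alt attribute is missing or blank, and it skips images declared as 1x1 (or 0x0), a crude heuristic we're assuming here as a stand-in for tracking-pixel detection.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Records (line_number, src) for <img> tags with a missing or empty alt attribute."""

    def __init__(self):
        super().__init__()
        self.offenders = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        # Assumed heuristic: tiny declared dimensions mean a tracking pixel, so skip it.
        if a.get("width") in ("0", "1") and a.get("height") in ("0", "1"):
            return
        alt = a.get("alt")
        if alt is None or not alt.strip():
            line, _offset = self.getpos()
            self.offenders.append((line, a.get("src")))


def find_missing_alt(html):
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.offenders


page = "\n".join([
    "<p>Intro</p>",
    '<img src="/hero.jpg" alt="Hero image">',
    '<img src="/chart.png">',
    '<img src="/logo.png" alt="">',
    '<img src="https://tracker.example/p.gif" width="1" height="1">',
])
print(find_missing_alt(page))  # [(3, '/chart.png'), (4, '/logo.png')]
```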

To support you further in your quest for knowledge, we're also creating Knowledge Base articles for each and every Hint, which will be linked up via the Hint descriptions in the tool, and published on the website for general consumption. You can check out an example here, and as you can see they explain what the Hint is actually checking, why it's important, and how to fix the issue.

Roughly 50% of the Hints currently have these Knowledge Base pages, but the rest will be finished in the next couple of months. We think this should make Sitebulb's Hints more transparent, help experienced users better understand what they are looking at, and provide an invaluable resource for less experienced SEOs who won't always appreciate the significance of every issue they come up against.

#11 URL List overhaul

Ok, maybe 'overhaul' is a bit of a stretch, but we've made a bunch of changes.

The much requested 'sticky header' when you scroll:

URL List Sticky Header

The ability to freeze the URL column:

Freeze URL Column

Multi-level filtering (also much requested):

Filter URL List

#12 New Audit setup page

Not exactly a feature, but something that will look a bit different to what you're used to. 

We've moved around the audit setup page, and added the new options - all the new audit types mentioned above and the new URL source options.

The orange notices indicate that some of the report options will only work with the Chrome Crawler (and that Front-end works better with Chrome). We hope that users will still be selective about their audit choices and only turn on the data that they actually need, but in our hearts we know that SEOs will just press all the buttons anyway. 

New audit setup page

#13 New AMP Hint: AMP Page has validation errors

Sitebulb will now perform AMP validation on all your AMP pages, so it's checking to see whether a given AMP page:

  • is missing mandatory elements
  • includes disallowed, deprecated or duplicated elements
  • contains style, layout or templating errors

From the URL List associated with this Hint you can click the 'Hint Details' (see above) to check out the AMP validation errors:

AMP Validation
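For a feel of what 'validation' means here, below is a heavily simplified Python sketch covering only a few AMP rules: the `<html ⚡>` (or `amp`) attribute, the UTF-8 charset and viewport meta tags, and the AMP runtime script as mandatory elements, plus a short list of prohibited tags and custom scripts. The rule subset, the allowed inline script types and the `check_amp` name are assumptions for illustration; this is no substitute for the official AMP validator, which checks far more (including the boilerplate CSS).

```python
from html.parser import HTMLParser

AMP_CDN = "https://cdn.ampproject.org/"
AMP_RUNTIME = AMP_CDN + "v0.js"
PROHIBITED_TAGS = {"frame", "frameset", "object", "param", "applet", "embed"}
ALLOWED_INLINE_SCRIPT_TYPES = {"application/ld+json", "application/json"}  # simplified


class AmpChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.has_amp_html = False
        self.has_charset = False
        self.has_viewport = False
        self.has_runtime = False
        self.errors = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "html":
            self.has_amp_html = "amp" in a or "\u26a1" in a
        elif tag == "meta":
            if (a.get("charset") or "").lower() == "utf-8":
                self.has_charset = True
            if a.get("name") == "viewport":
                self.has_viewport = True
        elif tag == "script":
            src = a.get("src") or ""
            if src == AMP_RUNTIME:
                self.has_runtime = True
            elif not (src.startswith(AMP_CDN) or (not src and a.get("type") in ALLOWED_INLINE_SCRIPT_TYPES)):
                self.errors.append(f"disallowed script: {src or 'inline'}")
        elif tag in PROHIBITED_TAGS:
            self.errors.append(f"disallowed tag: <{tag}>")


def check_amp(html):
    checker = AmpChecker()
    checker.feed(html)
    checker.close()
    required = [
        ("<html \u26a1> or <html amp>", checker.has_amp_html),
        ('<meta charset="utf-8">', checker.has_charset),
        ('<meta name="viewport">', checker.has_viewport),
        ("AMP runtime script", checker.has_runtime),
    ]
    missing = [f"missing mandatory element: {name}" for name, ok in required if not ok]
    return missing + checker.errors


for error in check_amp('<html><head><script src="/app.js"></script></head><body><embed src="x.swf"></body></html>'):
    print(error)
```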

Smaller Updates & Bug Fixes

Here's a quick unordered list for the other smaller bits and pieces:

Updates

  • We made the hard decision to remove the free version of the software. This was an experimental idea we included initially, hoping that users would use it as a 'stepping stone' between not having a paid subscription and having one. Turns out that didn't really happen as we had hoped, and supporting the free version just became a burden that was hindering our development of the full app, so we have removed that option.
  • As requested by a number of users, you can now pick exactly which resource files you wish Sitebulb to crawl, in the audit settings.
  • We've also made it so that Sitebulb will not crawl images linked via anchors, if you choose not to crawl images (or all Page Resources).
  • URLs will no longer be triggered for duplicate content if they have rel="next"/"prev" pagination elements, which makes the duplicate content report a lot easier to use.
  • The Sitebulb window will remember the size you were using previously and open up again at the same size next time.
  • If you hover over a URL in a URL list, it will now show you the full URL in a little rollover box.
  • We moved the Hint 'Images with Missing or Empty Alt Text' from the Page Resources section to the On Page -> SEO section, because it makes more sense there.
  • We also improved that Hint, so that it no longer highlights tracking pixel images.

Fixes

  • Fixed a rare issue where image names were being incorrectly encoded, due to the way .NET encodes URLs when creating a URI object.
  • The Hints 'Mismatched nofollow in HTML and header' and 'Mismatched noindex in HTML and header' were being triggered when they should not have been.
  • On servers running nginx, if Sitebulb encountered a timeout situation, it was reporting this as 'Error' instead of 'Timeout'.
  • Fixed the Hint 'Title Tag Missing' (from On Page -> SEO Hints) as this was occasionally misreporting data.
  • Fixed an export issue when you tried to export resource URLs from the URL Details panel.
  • Sitebulb was not picking up the resource URL for .mp4 files when a subtitle track was also specified.
