Released on 26th November 2018
I know what you're thinking... 'what happened to v2.5.4?' We actually did a mini fix (for Windows only) and set that live a couple weeks ago. Which means we've been running the Mac version at 2.5.3 and Windows at 2.5.4. My OCD has been spiralling out of control, so we had to do another release to save my sanity.
#1 Change the start URL of an existing Project
So...we actually added this a couple of versions back but forgot to announce it. My bad. But I'm claiming it as an 'update' all the same, because I bet most of you out there don't know you can do this:
Within any Project, go to Edit Project on the right, and you can change the start URL associated with this Project.
So if you have been working on and auditing a site in a dev environment, once you go live you can simply update the start URL for the project and carry out audits on the new live site, whilst keeping all the history and trendlines from the previous audits, along with any custom crawler or advanced settings.
#2 Sitebulb will set cookies if a site needs them
For most websites you/we/he/she/it come across, you don't need cookies enabled in order to crawl them. But when you do need them and don't use them, you'll be pretty fucked.
As soon as you enter a site that needs cookies to be enabled, Sitebulb will now detect this and automatically enable them for you. It will also give you a little notification at the top of the screen, so you know what the deuce is going on.
#3 Sitebulb will now record the rendered canonical
A few months back John Mueller famously (well famous-ish-ly) stated that Google do not process the canonical found in the rendered DOM, and rely only on the canonical in the source HTML.
Like a pair of pissy little fanboys, we immediately went out and changed our process to mirror this - not even recording the rendered canonical and only showing the non-rendered one.
This turned out to be quite limiting for a number of specific use-cases:
- User has a site that uses prerender to serve to Google, but wants to check the rendered version using Sitebulb.
We were essentially hiding data from the user, and not allowing them to see what they actually wanted to check for. And since the canonical affects the indexable status, this had a knock-on effect across the reports.
After review, the ruling on the field has changed (AKA we've reversed this now).
But we didn't want to totally dismiss the whole 'Google only looks at the source canonical' thing, so we have a new Hint for you:
- Rendered canonical is different to HTML source (click to view the 'Learn More' page for this Hint)
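If you want to see the idea behind this Hint in code, here is a minimal sketch. It is not Sitebulb's implementation: the function names are made up for illustration, and it assumes you already have both the source HTML and the rendered HTML as strings.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel and attr.get("href"):
            self.canonicals.append(attr["href"])


def find_canonical(html):
    """Return the first canonical URL found in the markup, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals[0] if finder.canonicals else None


def rendered_canonical_differs(source_html, rendered_html):
    """True when both versions declare a canonical and they disagree."""
    source = find_canonical(source_html)
    rendered = find_canonical(rendered_html)
    return source is not None and rendered is not None and source != rendered
```

Feed it the raw response body and the DOM after JavaScript has run, and it tells you whether the two canonicals disagree.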
#1 Fixed the filtering format for clicks/impressions
If you tried to filter either clicks or impressions data (from the Keywords report) you would have been met with this rather confusing conundrum.
We just needed to set the field type to numeric - schoolboy shit right here.
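For anyone wondering why the field type matters, here is a small sketch of the general problem. The rows and function names are hypothetical and this is not the actual Sitebulb code, just the classic text-versus-number comparison trap.

```python
# Hypothetical keyword rows where clicks arrive as text.
ROWS = [
    {"keyword": "a", "clicks": "9"},
    {"keyword": "b", "clicks": "10"},
    {"keyword": "c", "clicks": "120"},
]


def filter_as_text(rows, minimum):
    """Compares clicks as strings, so '9' sorts after '10'."""
    return [r["keyword"] for r in rows if r["clicks"] > str(minimum)]


def filter_as_number(rows, minimum):
    """Compares clicks as integers, which is what the user expects."""
    return [r["keyword"] for r in rows if int(r["clicks"]) > minimum]
```

Asking for 'more than 10 clicks' with the text version happily returns the keyword with 9 clicks. The numeric version does not.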
#2 You can no longer 'start' a re-audit from an imported audit
Previously, you could import an audit from another user, and then click through to carry out a re-audit. Or at least it looked like you could, but Sitebulb would stop you at the door. 'You're not coming in dressed like that, sonny.'
Yeah...this was not designed behaviour, in fact we didn't even realise it could be done until Steve Morgan told us so.
Aside, if anyone is wondering about bug reporting:
- Yes. We do want to know about it.
- Please email details to email@example.com
- No, someone else has probably not already reported it.
- Just fucking tell us ok?
- We are very thankful for your kind support.
Back to the fix: you can no longer accidentally almost re-audit from an imported audit.
#3 Meta descriptions showing as missing when they are blatantly not missing
In contrast with my condescension above, we actually DID have a few users report this one to us. Well done you! Hooray! Would you like a medal?
Sitebulb would fire the Hint 'Meta description is missing' when it was basically not true in the slightest. It doesn't anymore.
#4 Sitebulb can now crawl shitty Squarespace sites
Ok, 'shitty' is a bit harsh. Although it is fair to say that no well respected business owner would ever consider using Squarespace for a site that is expected to compete in any competitive vertical. However, at least a few of our users have come across problems with Squarespace sites, so we thought we'd better fix it.
Released on 8th October 2018
#1 New 'Audit Summary' Export
A number of users have spoken to us recently along the lines of:
- "Guys, the Hints in Sitebulb are the dogs bollocks. Any way to export all the data into Excel?"
- "Dude! These Hints are sick af. But copy/pasting every title and description into a client report makes me want to rip my eyeballs out."
- "Kind Sirs, the Hints you have in Sitebulb are really rather good. I wish to import them into Teamwork to automatically create tasks for my developer."
I mean I'm paraphrasing but you get the gist.
We couldn't ignore such kind and useful feedback, so we didn't! Instead, we built a new export that shows you every single Hint in Sitebulb, along with the description and 'Learn More' link, so you can do what you want with it.
This feature is available for new and existing audits, just go to Bulk Exports and export the 'Audit Summary.'
#2 Allow users to select their own Google Analytics account
Lots of tools out there allow you to integrate with Google Analytics. But most tools offer a really shitty experience when selecting accounts - particularly if you have a lot of accounts or websites, you're forever scrolling through a massive dropdown. Incredibly annoying.
Sitebulb does it in a super smart way, figuring out the right GA account and selecting it for you. Which is awesome, 95% of the time.
The other 5% is when the GA account has been set up incorrectly (e.g. www vs non-www), or if a different UA code is being injected via a plugin, and the user genuinely can't select the right account.
Well, from now on we have a fallback option for when Sitebulb can't figure out the right account: it will still let you have a shitty never-ending dropdown.
You'll notice that it will still, by default, try to select the best account for you, but you also now have the option to over-ride with a different account. Best of both worlds.
We also have some new ultra-sexy options for you to play with: Device type and sampling level. Whoa.
#3 Remove default page from Google Analytics
Speaking of Google Analytics accounts with a wonky setup, we've also solved an annoying issue that has bugged SEOs for ages. It happens when the view settings in GA include a 'default page'.
It is not an over-reaction to say that this little bastard fucks everything up when you try to match GA data up through a tool like Sitebulb.
The Analytics reports essentially add /index.html to every path:
Since these pages actually do not exist, when Sitebulb comes to match the URL with the data pulled back from the Google Analytics API, the URLs do not match, so your report ends up full of 0s.
Or at least it did.
Now, Sitebulb will detect the default page and give you the option to remove it when matching data or extracting data from GA.
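The matching problem is easy to picture in code. This is a rough sketch only, assuming the default page is 'index.html' and using a made-up function name, not a description of how Sitebulb does it internally.

```python
def strip_default_page(path, default_page="index.html"):
    """Remove a trailing default page so the GA path matches the crawled URL path."""
    suffix = "/" + default_page
    if path.endswith(suffix):
        # Keep the trailing slash, drop only the file name.
        return path[: -len(default_page)]
    return path
```

With that in place, a GA row for /blog/index.html lines up with the crawled URL /blog/ instead of matching nothing.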
#4 More new Hint descriptions and Learn More pages
The never ending task of writing Learn More pages for all our Hints is finally drawing to a close.
This time we've added:
- Duplicate Content
If you don't know how to find the Learn More links then you've not been paying attention. Find them on URL Lists like this:
#5 Redesigned 'Hint Details' buttons
Some people were missing the 'Hint Details' buttons we have for some Hints. So we redesigned the button to a nice subtle hue.
#6 'URLs not found in sitemap' now only includes indexable URLs
Per the request of several users, who wanted to know about indexable URLs that were not in the sitemap, but gave zero fucks if 404 or noindex pages were not in the sitemap.
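In spirit, the change is just an extra filter. A minimal sketch, assuming each crawled page is a dict with a 'url' and a hypothetical 'indexable' flag:

```python
def indexable_urls_missing_from_sitemap(crawled_pages, sitemap_urls):
    """Return indexable crawled URLs that no sitemap mentions."""
    in_sitemap = set(sitemap_urls)
    return [
        page["url"]
        for page in crawled_pages
        if page["indexable"] and page["url"] not in in_sitemap
    ]
```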
#1 Sitebulb was ignoring the sample audit maximum crawl depth
With the sample audit, if you entered say 250 levels deep into the 'Maximum Crawl Depth' box, it would still stop at the default depth of 50.
#2 FMP table was showing TTFB data
This was an acronymous slip-up of the highest order. In Page Speed, when you switched to the data view for the First Meaningful Paint graph (FMP), it showed you instead the Time to First Byte (TTFB) data.
#3 Graph legends not showing in James Bond mode
One of the reasons 007 mode exists is to make it easier on your eyes when you're working by candlelight. One might argue we took this to extremes by making the graph legends really really really really really dark grey.
#4 Duplicate URLs had somehow crept back in
We had some issues with duplicate URLs coming in from Google Search Console a couple of versions back, and we thought we'd got rid of the problem for good. But like an oversized waistline, it crept back up on us.
Specifically, it was happening for particularly large websites, that had a LOT of data in Search Console - so most users would not have been affected.
#5 You can now disable technology collection
I know, I KNOW, this is really an update not a fix. In the Advanced Settings, you can stop Sitebulb collecting all the different technologies found on the site when you use the Chrome Crawler.
But we created an option to turn it off because we found a site that did not play nicely with it at all - and you needed to turn it off in order to complete the crawl. I mean it was a really shitty site, but then lots of people have really shitty sites.
You turn it off by unticking this box:
#6 Links graph showing the wrong data
Kinda embarrassing. On this graph the final bars were not showing the correct values.
All the others were right, so I prefer to think that we were 87.5% right, rather than that we actually did anything wrong.
#7 Fixed issue with Hint: 'Has an internal link with no anchor text'
On this Hint, Sitebulb was flagging link references where the anchor text was inside a <span>. Clearly, this is still anchor text, so it was totally wrong.
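To illustrate the point, here is a sketch of collecting anchor text that counts text nested inside child elements. It uses Python's standard HTML parser and invented names, and it is an illustration rather than the fix we shipped.

```python
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Records (href, anchor_text) for each link, including nested text."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Text inside <span>, <strong>, etc. still arrives here while in a link.
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.links.append((self._href, "".join(self._text).strip()))
            self._in_link = False


def links_without_anchor_text(html):
    """Return the hrefs of links whose text content is empty."""
    collector = AnchorTextCollector()
    collector.feed(html)
    collector.close()
    return [href for href, text in collector.links if not text]
```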
#8 Fixed Code Coverage bug when requesting incoming references
No one likes a SQL logic error.
Released on 4th September 2018
Unfortunately version 2.5.1 had a major database issue, which caused a range of problems and data errors. If you installed version 2.5.1, please consider this a critical update (if you didn't install 2.5.1, it's not so much of a panic, but you should probably get it anyway!).
Released on 3rd September 2018
#1 Even more Hint Details added
I've re-written the Hint descriptions for AMP, International, Mobile-friendly and Page Speed. The 'Learn More' pages for these will also be completed very soon (I promise!).
#2 Some new Hints added
We realised that one of our Mobile-friendly Hints was super vague: 'The viewport <meta> tag has scaling issues' and did not help in any way to identify what the specific issue is. So to fix this we added in 6 new Hints, which break down all the various things that could have gone wrong:
- The viewport <meta> tag does not have a width set
- The viewport <meta> tag has a specific width set
- The viewport <meta> tag has a maximum-scale set
- The viewport <meta> tag has a minimum-scale set
- The viewport <meta> tag initial-scale is incorrect
- The viewport <meta> tag is missing an initial-scale
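The six checks above can be sketched roughly like this. Big caveats: the thresholds are assumptions made for the example (a 'specific width' is taken to mean anything other than device-width, and an 'incorrect' initial-scale anything other than 1), and the function names are invented.

```python
def parse_viewport(content):
    """Split a viewport content string into a {setting: value} dict."""
    settings = {}
    for part in content.replace(";", ",").split(","):
        if "=" in part:
            key, value = part.split("=", 1)
            settings[key.strip().lower()] = value.strip().lower()
    return settings


def viewport_hints(content):
    """Return the list of issues found in a viewport content string."""
    settings = parse_viewport(content)
    hints = []

    width = settings.get("width")
    if width is None:
        hints.append("does not have a width set")
    elif width != "device-width":  # assumption: any fixed value counts
        hints.append("has a specific width set")

    if "maximum-scale" in settings:
        hints.append("has a maximum-scale set")
    if "minimum-scale" in settings:
        hints.append("has a minimum-scale set")

    scale = settings.get("initial-scale")
    if scale is None:
        hints.append("is missing an initial-scale")
    else:
        try:
            correct = float(scale) == 1.0  # assumption: 1 is the expected value
        except ValueError:
            correct = False
        if not correct:
            hints.append("initial-scale is incorrect")
    return hints
```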
#1 Capital NOINDEX was being ignored again
We fixed this once already. But then we changed something else that made it unfix (technical term). Basically, if your meta robots were a bit SHOUTY, Sitebulb would completely ignore them. But my advice is to not have SHOUTY meta tags in the first place. They make you sound like one of those obnoxious English tourists who can't understand why the 'foreign' waiter doesn't understand, so they just say the same thing again - still in English - just a lot louder, getting increasingly enraged in the process.
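For the curious, treating the directive case-insensitively looks something like this. A minimal sketch with a made-up function name, taking the content attribute of the meta robots tag as a string:

```python
def is_noindex(meta_robots_content):
    """True if the meta robots content includes noindex, in any letter case."""
    directives = {d.strip().lower() for d in meta_robots_content.split(",")}
    return "noindex" in directives
```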
#2 All the On Page 'length' charts had disappeared from PDF exports!
I mean there was just a massive blank space where they should have been (title, meta description & header 1 length). It looked particularly shite.
Released on 17th August 2018
#1 More Hint Details added
I've been busy writing new improved Hint Details for a number of sections: Internal URLs, XML Sitemaps and Search Traffic. Each of these also has a specific 'Learn More' page on the site that explains what each issue is and how to resolve it.
We also added a Learn More button in the Hint description on the URL List, to make it easier to get to these web pages.
#2 Added new 'XML Sitemaps' tab to URL Details
If you have checked XML Sitemaps, you will now see a tab when you look on the URL Details for each URL, which shows you exactly which sitemaps a URL was found on.
In hindsight, I should have chosen a screenshot example where the URL was actually on more than one sitemap.
#3 Improved the search function on URL List filtering
It used to match only if you got the words in the right order. So if you searched 'Status' when looking for 'HTTP Status' it would show nothing. Which, let's be honest, is pretty wank.
It now searches like it probably should have all along.
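One way to get that kind of forgiving search is to require every search word to appear somewhere in the label, in any order and any case. This is a sketch of that approach only, not a claim about how the filter is actually built:

```python
def matches(query, label):
    """True if every word in the query appears somewhere in the label."""
    haystack = label.lower()
    return all(word in haystack for word in query.lower().split())
```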
#1 Duplicate URLs coming in from GA/GSC if you paused crawling
We noticed something like this a few weeks ago, but couldn't pin down exactly what was happening.
It took that magnificent Brummie Paddy Moogan to figure it out for us, realising that duplicate URLs had turned up in his report from the GA/GSC crawl sources and remembering that he'd paused crawling of GA/GSC URLs and noticed the queue numbers disappearing.
Much more useful than our normal bug reports, verbatim: 'It's not working. What's wrong with it?'
#2 Robots rules not running correctly
Talking of brilliant bug reports, I also have to take my hat off to Mark Soon, who consistently sends us very useful feedback, and manages to find the most random sites that are excellent edge cases for our software.
In this instance it is more run-of-the-mill: we were just not taking into account the specific robots 'allow' rule and claiming the URL was disallowed (see example below for this to actually make sense).
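To show the kind of rule involved, here is a minimal sketch of allow/disallow precedence as described in the Robots Exclusion Protocol: the most specific (longest) matching rule wins, and allow wins a tie. It only handles plain path prefixes (no wildcards), the rules in the usage note are hypothetical, and it is not Sitebulb's parser.

```python
def is_allowed(path, rules):
    """Decide whether a path may be crawled.

    rules is a list of (directive, prefix) pairs, where directive is
    'allow' or 'disallow'. Wildcards are not supported in this sketch.
    """
    best_length = -1
    allowed = True  # no matching rule means the path is allowed
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        # The longer prefix wins; on equal length, allow wins.
        if len(prefix) > best_length or (len(prefix) == best_length and is_allow):
            best_length = len(prefix)
            allowed = is_allow
    return allowed
```

With a disallow on /private/ and an allow on /private/press/, a URL under /private/press/ is crawlable even though the broader disallow also matches it.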
#3 Some text in James Bond mode hard to see
Ok, if we're being honest, they weren't just hard to see, they were totes unreadable. Like, totally.
#4 Accessibility slide-out disappears when you go full screen
Yes, someone complained about this. And yes, I know we should have just said 'well don't go fullscreen then'. But we didn't cos we're nice chaps, we fixed it instead. So we'll just bitch and moan about it here, because we're not that nice really.
#5 URL List for 'Broken external URL (4XX or 5XX)' included HTTP 308
If you weren't already aware, 308 is neither 4XX nor 5XX, so quite what it was doing in this list I don't know. But it's not anymore, at least.
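The check itself is about as simple as it sounds. A sketch with an invented function name:

```python
def is_broken_status(status_code):
    """True only for client (4XX) and server (5XX) error responses."""
    return 400 <= status_code <= 599
```

A 308 is a permanent redirect, so it falls outside that range.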
Released on 7th August 2018
#1 Resolved issue of 'No URLs found' in various URL Lists
We managed to introduce an annoying bug in the last version of Sitebulb that caused some URL Lists to return 'No URLs found' - the most noticeable one that did this was Broken Internal URLs. According to Gareth, this was because of an issue with a database join. So now you know.
Released on 25th July 2018
Please note that if you have any bigger existing audits (roughly > 100,000 URLs) you will probably notice that Sitebulb takes a bit longer to open these audits. This is because it is re-building some of the indexes as part of the updates below. This will only happen the first time you open the audit on v2.4.
Please just let Sitebulb do its thing and finish up - this may take up to 5 minutes.
#1 Updated Google Search Console with the latest API changes.
Faster, stronger, better, etc...
#2 New robots.txt warning
This one goes hand-in-hand with our robots-grammar-nazi fix (#4 in the Fixes below) - a new warning for instances where the robots.txt file lets some search engines through, but not others. For example, if DuckDuckGo was told in no uncertain terms to fuck right off:
Then you'd get this message on the Audit Overview and in the Indexability report:
#3 Added a load more Hint descriptions
Within the tool you will now find extended Hint descriptions for On Page, Internal URLs and Front-end.
#4 Added indexing for building Crawl Maps
This basically just makes it faster when building Crawl Maps on really big sites. Not much to see here TBH.
#1 Fixed 'connection timeout' issues on URL Lists
If you clicked into URL Lists for the new 'Technologies' options on the audit overview, and then jumped into another URL List (e.g. Data Explorer), you could break Sitebulb so it would show a 'Connection Timed Out' message whenever you clicked into subsequent URL Lists. If you followed these steps it would basically break the tool completely.
This is what I meant (in 2.3 notes, below) when I said this was an MVP. And we've taken your first suggestion on board: 'Make sure it actually fucking works.'
#2 Some pages not being parsed correctly when you run Front-end
This one is hard to explain. It is to do with how Sitebulb assigns 'tasks' to the various threads it uses for different processes, and how these tasks are cleared off when they are done. When running Front-end, some of the threads would occasionally clear off the job they were doing before they had finished the task. This would result in a handful of URLs not being parsed correctly, and some very small data inaccuracies on these audits.
Thanks to our diligent customers for reporting the issue, we isolated the problem and have fixed this now. To clarify, this issue was only present when you selected the audit option: Page Speed, Mobile Friendly and Front-end.
#3 Fixed Hint: 'URL contains no Google Analytics code' for old versions of GA
In v2.3 we added support for the new Google Analytics gtag code, because we are fucking rockstars.
In v2.3 we also managed to 'unsupport' the old Google Analytics codes, because we are fucking idiots.
#4 Now correctly following robots.txt rules for grammatically 'incorrect' robots.txt setups
Most websites are set up to basically accept all bots by default, with a few rules to disallow certain sections or pages. However some excessively paranoid folk are like 'hells no, you ain't comin' in if you ain't no search engine'.
These people are so fond of the double negative they set their robots.txt file up like so:
Sitebulb was handling this like a 19th century English teacher, and getting it wrong. You'll be pleased to hear it's now bang up to date, and can often be heard shouting popular culture references like 'I ain't gettin' on no plane', to demonstrate that it is very much down with the kids.
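If you want to poke at this kind of setup yourself, Python's standard library robots.txt parser will do for a quick check. The file below is a hypothetical example of the 'search engines only' pattern, not a real site's robots.txt, and the helper name is ours.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: named bots may crawl everything, everyone else nothing.
ROBOTS_TXT = """\
User-agent: Googlebot
User-agent: Bingbot
Disallow:

User-agent: *
Disallow: /
"""


def can_crawl(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given user agent may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Googlebot and Bingbot get in, while any bot that is not named (DuckDuckBot, say) is turned away, which is exactly the situation the new robots.txt warning is there to flag.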
Released on 18th July 2018
#1 Sixteen months of data from Google Search Console
Since Google released their new shiny interface (+ features) to Google Search Console, everyone has been waiting for them to give us 16 months' worth of Search Analytics data through the API. They recently announced this addition:
So we added support for this in Sitebulb. You can now select up to a maximum of 480 days in the audit setup.
Why exactly 480 days? Because Maths (or 'Math' if you are American and can't pronounce 's').
#2 Sitebulb detects technologies used
Sitebulb will collect technology data and tell you which URLs have what stuff on them - a bit like BuiltWith except for every page on your site.
It only works if you use the Chrome Crawler.
This is a kind of a beta feature. Or an 'MVP', if you want to call it that - we're hoping for feedback from the community as to what they might want to do with that data, so we can build you guys some useful 'Technologies' reports.
#3 The domain check now reports on multiple 200 statuses
This is quite a subtle change. So subtle that literally no one will notice it's there. But I'm going to tell you about it anyway because otherwise I have nothing to do with my life. And the World Cup is over. Sigh.
Consider this example:
What we used to do was just check the first option you entered. And if this responded with a 200 status then we'd just think 'happy days' and let you proceed. Now, we also check if any of the other domain configurations also respond with a 200, and show you those too.
This should help you crawl the RIGHT website, rather than the wrong one (in this case, https://www).
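Here is a rough sketch of the idea, assuming the four usual configurations (http/https, with and without www). The status lookup is passed in as a function so the example stays offline; all the names are invented for illustration.

```python
def domain_variants(domain):
    """Return the http/https and www/non-www versions of a domain."""
    host = domain.lower().split("://")[-1].split("/")[0]
    if host.startswith("www."):
        host = host[4:]
    return [
        f"{scheme}://{prefix}{host}/"
        for scheme in ("http", "https")
        for prefix in ("", "www.")
    ]


def variants_returning_200(domain, fetch_status):
    """List every variant that answers 200 directly.

    fetch_status is a caller-supplied function (url -> status code) that
    should NOT follow redirects.
    """
    return [url for url in domain_variants(domain) if fetch_status(url) == 200]
```

If more than one variant comes back, the site is answering on multiple configurations and you need to pick the right one to crawl.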
#4 Added Distil Networks to the CDN list
Per the 2.2 update, Sitebulb will show CDN warnings when setting up an audit (if, for instance, your site is hosted on Cloudflare). We added Distil Networks to this list of 'bad CDNs', so you'll see the warning message for sites on Distil as well now. We also softened the warning message somewhat, as too many people were getting scared and running away. Pussies.
This addition was at the request of Matt Brown. We normally completely ignore the requests of random foodie hipsters, but when we met him at PubCon last year he said that Sitebulb is awesome, so he can do no wrong in our eyes.
Yes, we are incredibly fickle.
#5 Sitebulb detects new Gtag code
No one noticed, but Google Analytics rolled out a new tracking script, gtag.js. If you look in the source code of this very site you'll see a LIVE example of it:
Sitebulb now looks for this as well as the old versions, when checking if a GA code is present on the page.
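A simple way to spot which flavour of GA is on a page is to look for the script URL each version loads. This sketch does only that (the patterns and names are ours, and real detection would want to be more thorough):

```python
import re

# Script URLs loaded by each generation of the GA snippet.
GA_PATTERNS = {
    "gtag.js": re.compile(r"googletagmanager\.com/gtag/js", re.I),
    "analytics.js": re.compile(r"google-analytics\.com/analytics\.js", re.I),
    "ga.js": re.compile(r"google-analytics\.com/ga\.js", re.I),
}


def detect_ga_snippets(html):
    """Return the names of the GA script versions referenced in the HTML."""
    return [name for name, pattern in GA_PATTERNS.items() if pattern.search(html)]
```

An empty list means no GA code was found, which is what the 'URL contains no Google Analytics code' Hint is about.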
#1 User agent not being pulled through correctly
This one is embarrassing. In 2.2 we added a new place where the user agent could be selected - on the domain pre-selection screen - which is useful if the site will not crawl with the default user agent. However... changing the user agent on this screen was not actually pulling through, and so it was completely redundant. Not our finest moment.
#2 Sitebulb would delete URLs in the queue when it hit a random error
Bit of a nightmare this one. We had a site that would crawl 6,500 URLs sometimes, and then 11,000 URLs other times. Turns out one of the pages on the site would occasionally throw a random server error, and Sitebulb was handling this by simply deleting everything in the queue. Technically, this is known as Sitebulb 'shitting the bed.'
It's fair to say that isn't exactly the behaviour we were after.
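The behaviour you actually want is for one bad URL to be noted and skipped, with the rest of the queue left alone. Here is a toy crawl loop that does that. It is a sketch under obvious assumptions (the fetch function is supplied by the caller and raises on errors), not our crawler.

```python
from collections import deque


def crawl(start_urls, fetch):
    """Crawl breadth-first; fetch(url) returns linked URLs or raises on error."""
    queue = deque(start_urls)
    seen = set(start_urls)
    crawled, failed = [], []
    while queue:
        url = queue.popleft()
        try:
            links = fetch(url)
        except Exception:
            # Record the failure and carry on; the queue is left untouched.
            failed.append(url)
            continue
        crawled.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return crawled, failed
```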
#3 Accessibility Hint "Form elements must have labels" incorrectly appearing
This Hint was flagged for a site, and our user emailed into support (verbatim): 'You wot m8?'
He was right, there weren't even any forms on the page, so how could they be missing labels? It turns out this was happening because we input a form into the DOM for the CSS Linter (don't worry, no one else knows what a 'linter' is either, it's not just you).
So Sitebulb was claiming something was wrong with the page, based on something it had inserted itself in order to check for the problem. I know right? It's like a really shit, and not entirely accurate version of Inception.
#4 Some Crawl Maps were coming out blank
In general, people love Crawl Maps. We've found they are less enamoured when the Crawl Maps come out completely blank, as they had been doing very very very occasionally. They looked like this:
I know, I'm with you. What's the problem right? But some customers just love a fucking good moan (not a euphemism), so we figured we'd better fix it for them. Besides, we do like to deliver complete satisfaction (again, not a euphemism. I think there's something wrong with you).
#5 Sitebulb not parsing really shitty pages
We've said it before, and we'll say it again: The Internet is Broken. We found several instances where Sitebulb was not always collecting page content correctly, and whenever we dug into it further we found stray tags, unclosed elements and generally really crappy HTML. The type of thing that makes W3 Validators just roll over and die.
We fixed all the issues we came across, and more, making Sitebulb a lot more resilient and robust for all audits.
#6 Images missing alt text export was broken
The Excel export for the Hint 'Has images with missing alt text' was not populating at all. Which rendered it pretty useless.
#7 Links graph missing 500+
Someone pointed out that the first graph on the 'Links' report was missing the 500+ data. We launched an internal investigation as to where it had gone, and found it posturing as a French Ski Instructor in Val d'Isere, basking in the glory of a World Cup victory, sipping over-priced champagne in the hot tub of a high-end luxury chalet.
Don't worry people, the data has duly been brought back down to Earth, and restored to station:
#8 Sitebulb was not always parsing external hreflang pages
In some circumstances Sitebulb would decide that it wasn't going to parse external hreflang pages, which made it look like there were issues with reciprocity when there were not.
Released on 22nd June 2018
We spent a long time during this update working on performance and stability, and adding in fixes for crawling specific sites - whilst these are rare we think they are worth fixing in case the same issues appear on other sites. In general these changes make Sitebulb a lot more robust and reliable, without us being able to point at something noticeable that will immediately impress and delight you.
So we added some other stuff as well...
#1 James Bond mode baby
Do you like to work late at night and would prefer a less-bright interface?
Are you Batman, and only work in black (and sometimes very very dark grey)?
Do you want your clients, colleagues, or enemies to think that you're actually James fucking Bond?
Enter, night mode:
It's the same, only darker, and much, much cooler.
You can toggle day/night modes using this little button in the top right:
#2 New diagnosis option in Advanced Settings
When setting up a new audit, you can turn on some new diagnosis options via Advanced Settings.
These are designed to help understand exactly what the crawler is seeing, allowing you to save the HTML found, and take rendered screenshots as Sitebulb crawls (screenshots are only available using the Chrome Crawler).
The data will then be available on the URL Details screen for any given URL.
It will look very similar to 'Live View', but there is one important difference here. Whereas 'Live View' goes to fetch the data 'as it is right now', these diagnosis options actually store the data as it was when Sitebulb performed the crawl. You can use the screenshot function in particular to understand page changes or differences between different crawls, as the data acts as a historical record.
This feature is designed to be used with discretion - it is not the sort of thing you want to turn on for every crawl. One of the main reasons for this is that the data takes up a LOT of space. One screenshot image will come in at ~500KB, so if you're running a big site this will take up a lot of space.
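To put a rough number on it, here is a back-of-envelope calculation using the ~500KB figure above. It assumes one screenshot per URL, uses decimal units (1 GB = 1,000,000 KB), and ignores the saved HTML entirely.

```python
def screenshot_storage_gb(url_count, kb_per_screenshot=500):
    """Estimate disk space for screenshots, in decimal gigabytes."""
    return url_count * kb_per_screenshot / 1_000_000
```

So a 10,000 URL crawl comes out at about 5 GB of screenshots, and a 100,000 URL crawl at about 50 GB.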
We recommend using it alongside the 'URL List' crawl source, so you can control exactly which URLs (and more importantly - how many URLs) are being crawled.
#3 Increased redirect support
Sitebulb now supports 9 (nine) different types of redirect, because who doesn't love a fancy redirect eh?
- HTTP Header
- Meta Tag Refresh
- HTTP Header Refresh
- Form Get Submission
- Form Post Submission
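As an illustration of the two 'Refresh' flavours in that list: the meta tag and the HTTP header carry the same kind of value, a delay followed by a target URL. This sketch parses that value. The regex and function name are ours, and it makes no claim about how Sitebulb handles them.

```python
import re

# Matches values such as "0; url=https://example.com/new" or "5;URL='/moved'".
REFRESH_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*[;,]\s*url\s*=\s*(.+?)\s*$", re.I)


def parse_refresh(value):
    """Return (delay_seconds, target_url), or None if there is no target URL."""
    match = REFRESH_RE.match(value)
    if not match:
        return None
    delay, target = match.groups()
    return float(delay), target.strip("'\"")
```

A bare value like "30" just reloads the same page, so it is not treated as a redirect here.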
#4 CDN warnings when setting up Projects
Content Delivery Networks (CDNs) have become the scourge of technical SEOs. Whilst they mostly do a good job protecting their client websites, their anti-DDoS security can play havoc with crawlers. For instance, we've had lots of users try to crawl websites on Cloudflare with the Googlebot User-Agent. Cloudflare will kick back a 403 (Forbidden) response (which is their equivalent of telling you to fuck off), and our user can't crawl the site.
In some cases, you can solve this by simply changing the User-Agent (to e.g. Sitebulb default). In other cases, you will need to get your client to log into the CDN and whitelist your IP address (or the IP of the computer you are crawling from). Pain in't arse.
To help deal with this issue, we now present a warning message in the pre-audit:
This warning will trigger for any site using any of the following popular CDNs:
- Amazon CloudFront
#5 More comprehensive canonical data on URL Details view
Canonicals can be messy fuckers to unpick, and whilst Sitebulb has extremely comprehensive canonical Hints, we thought it could do a better job helping users understand what's going on for each specific URL.
Now, when you go to view the URL Details for a specific page, and click on the 'Indexability' tab, you'll see data like the image below. The first table is all the canonicals associated with the page itself. The second table shows other URLs that declare this page to be the canonical (i.e. 'incoming canonicals').
The 'Type' column will show whether the canonical was in the HTML ('Link') or in the HTTP Header ('HTTP').
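For the 'HTTP' type, the canonical lives in the Link response header rather than the page markup. Here is a simplified sketch of pulling it out. The function name is invented, and it does not cope with commas or semicolons inside quoted parameter values.

```python
import re

# One link-value: <url> followed by zero or more ;param=value pairs.
LINK_RE = re.compile(r"<([^>]*)>\s*((?:;\s*[^,;]+)*)")


def canonical_from_link_header(header_value):
    """Return the URL marked rel="canonical" in a Link header, or None."""
    for url, params in LINK_RE.findall(header_value):
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            rel_values = value.strip('"').lower().split()
            if name.lower() == "rel" and "canonical" in rel_values:
                return url
    return None
```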
#6 Warning message on audit overview when you only crawl 1 URL
We ran a comprehensive worldwide survey and our data shows that 99.99999%* of all websites have more than 1 URL.
So if Sitebulb comes back with a finished audit of only 1 URL, something probably done fucked up.
*Ok, I made this figure up. There is no spoon.
#7 Warning message when no links found in source HTML
The message appears at the top of the page, and clarifies that there are no links present in the source HTML, but that there are links in the rendered DOM - meaning you would need to crawl with the Chrome Crawler in order to crawl the site at all.
#8 Hubspot mode
A few users contacted us about websites that appeared to crawl waaaaay more URLs than they should have. Like thousands and thousands more.
Digging into each case in turn, we realised that they were always crawling with the Chrome Crawler, and it was always on websites powered by HubSpot CMS. We dug further, and found some other sites that weren't powered by HubSpot CMS, but were using elements of the HubSpot platform (typically lead-gen forms and tracking/analytics).
So we have added another new warning on the pre-audit, specifically if HubSpot scripts were found, which encourages you to block crawling of these scripts and avoid all the junk.
#9 Block Ad and Tracking Scripts
The eagle-eyed among you will have already noticed this in the screenshot above, but we have also added the optional tickbox: 'Block Ad and Tracking Scripts.' This will stop Sitebulb from reporting on ad or tracking scripts (based on this massive fuckoff list we have of domains that are only used for ad or tracking scripts).
This option only applies for Chrome.
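Conceptually this is a hostname lookup against a blocklist. A minimal sketch, with two placeholder domains standing in for the real (much longer) list and a made-up function name:

```python
from urllib.parse import urlparse

# Placeholder entries for illustration only; the real list is far bigger.
BLOCKED_DOMAINS = {"doubleclick.net", "google-analytics.com"}


def is_blocked(script_url, blocked=BLOCKED_DOMAINS):
    """True if the script is served from a blocked domain or any subdomain of one."""
    host = (urlparse(script_url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in blocked)
```

Matching on the dot boundary means stats.g.doubleclick.net is caught, but an unrelated domain that merely ends in the same letters is not.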
#1 Better handling of large data tables
A user found a website that would literally crash his machine when he crawled it with Sitebulb. Obviously, we initially dismissed this issue as a clear case of user error, but when we could actually be bothered to look into the issue we found that the specific website he was crawling had these MASSIVE data tables, that fucked up Sitebulb's HTML parser and caused it to have a meltdown. RAM went through the roof, and chaos ensued.
#2 Sitemap data no longer turns to shit if you pause the audit
If you were auditing with XML Sitemaps selected as a crawl source, and happened to pause the audit part way through, you may have noticed that some of your data was missing, or incorrectly tagged as 'Not in Sitemaps', when it blatantly was. This didn't always happen, so you also may not have noticed this, but if you did, it's now fixed.
#3 Some of the 'More' links on graphs were not wired up correctly
The 'More' links are designed to give you more - they allow you to switch between the graph view and the data table view. Unfortunately, for a couple of graphs, these links stopped working, and if anything we were showing you less. I spoke to Gareth about what went wrong, and he said 'I was using a ng-if and it should have been a ng-show!'. Just as I suspected!
#4 CSS and JS resource URLs are not being stored against every URL
#5 Chrome Crawler now takes into account HTTP/2 headers
It wasn't doing this before, and does now. I don't know if this really should be described as a bug, but it would report H2 pages as being H1.1, so it kinda feels like a bug even if it technically wasn't.
Released on 30th May 2018
#1 Added to Dashboard: 'Recently Incomplete Audit'
Every so often, you've just had enough. It's been a busy day. Over ten emails, a couple of clients calls, AND you needed to keep up appearances on #coolseochat. The audit can wait until tomorrow.
So you hit 'Pause', shut down your computer, and go home to watch six hours of Netflix. You treat yourself to a bottle of rosé and finish off that half-eaten tub of Cookies and Cream that was on offer at Sainsburys the other day.
The morning rolls around and you head into work. You check Twitter, Facebook, LinkedIn, Instagram and Snapchat, Google yourself a couple of times (because #personalbranding), then fire up Sitebulb before your lunch break. 'Huh, where'd my paused audit go? Oh well, better start again...'
Introducing the 'Recent Incomplete Website Audit' notice that will greet your eventual return. You're welcome.
#2 Export audit directly from an audit
Not a lot of people know this, but you can export entire Sitebulb audits, which you can then send on to colleagues to use on their copy of Sitebulb. You could also send them to clients, if they are also a Sitebulb lover, or even to friends and family - a Mother's Day present perhaps?
The reason no one knows about this super-cool feature is that it's hidden away in a cupboard like Edd the Duck.
So, we also added a button to the Audit Overview, right here:
#3 Two new Hints, related to rendering
Following some bombs dropped at the recent Google I/O conference, Google confirmed that both rel=canonical and rel=amphtml are ignored by Google when found in the rendered DOM - they rely solely on the HTML response for these elements.
So we added a couple of new Hints to check for issues related to this:
Note that these Hints will only be checked when you use the Chrome Crawler.
#4 Two more new Hints, these ones related to Page Speed
These two Hints are about Sitebulb detecting duplicate page resource URLs, namely:
- Duplicate Style Sheets
We've seen some sites recently with major issues, so thought it made sense to highlight this problem via the medium of Hint.
#5 You can now edit Project names and add Project descriptions
For those instances when you go to visit a client and want to show them their site audit, but regret how you let well placed anger dictate your Project-naming methodology in the past.
#6 More Indexability detail on URL Details page
You can now see more granular details regarding robots directives on the URL Details page. A showstopper indeed.
#1 Audits being queued when they didn't need to be
Queuing. Literally the ONE THING in the world that Brits are good at, and we managed to fuck that up!
In particular, this was a case of 'over-queuing', as in, audits queuing up when they didn't need to. If you ask me, this is taking politeness too far, and needs to end.
#2 Occasional issue with pause/resume
Technically related to the over-polite queue situation above, Sitebulb would also occasionally not correctly set the status to 'paused'.
Released on 16th May 2018 (hotfix version)
#1 Correctly processing URLs with extensions
Sitebulb was mistreating URLs that contained an extension on the end that it did not recognise (e.g. couponsite.com/stores/asos.com), which meant it actually was not downloading the HTML at all. Cue unexpected rise in Hints like 'Title tag not found'.
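One way to avoid this class of bug is to decide whether a response is HTML from its Content-Type header rather than from whatever the URL path happens to end in. This is a minimal sketch of that idea, not a description of how Sitebulb fixed it; the function name is hypothetical.

```python
def is_html_response(content_type_header: str) -> bool:
    """Return True if a Content-Type header value describes an HTML document.

    The URL path is deliberately ignored, so a URL ending in something that
    looks like an unknown extension (e.g. /stores/asos.com) is still treated
    as a page when the server says it is one.
    """
    # Drop parameters such as "; charset=UTF-8" before comparing.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type in {"text/html", "application/xhtml+xml"}


print(is_html_response("text/html; charset=UTF-8"))
print(is_html_response("image/png"))
```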
#2 Fixed bug in Link Equity calculation
Sitebulb was somehow calculating Link Equity scores for URLs that were orphaned, and not part of the crawlable site architecture. Duh, Sitebulb.
Released on 9th May 2018
I once again need to give you some context before diving into the updates. This time, it was all about unbreaking Sitebulb.
We released Sitebulb v2.0 on the same day that Google publicly released Chrome 66. We'd been building against Chrome 65, and were happy that Sitebulb was stable and reliable... on Chrome 65. As our users went about their daily lives, Chrome updated itself in the background (it is auto-update by default), which in turn affected how Sitebulb interacted with Chrome.
The past couple of weeks we've discovered a bunch of things that have been changed or broken from 65 to 66, and decided we need to handle this differently.
Without further ado...
#1 Sitebulb is now packaged with Chromium
Once we'd got everything patched up on Chrome 66, we decided to package Chromium (66) up with Sitebulb. In future, this gives us control of the update process, so that we're not caught short by a new public Chrome release. It also fixes a couple of other issues, which you can read in the 'Fixes' section.
The downside is that it makes the download/install a lot larger, but we think this is a small downside, all things considered.
#2 Paste multiple XML Sitemaps in one go
We tried to make it better/easier to add sitemaps in V2.0, but in doing so we made it worse. And we got told off:
Egos wounded, we went back to the drawing board...
#3 Awesome new 'learn more' styled links
The 'Learn More' links we recently added to Hint descriptions were boring, plain text links. We've added a bit of styling to give them a touch of glam.
And the crowd go wild...
#4 Added a 'help' page when Sitebulb crawls 0 URLs
To a Mouse: /
AKA, humans regularly fuck things up ('I'll crawl my React site with the HTML Crawler'), and machines also regularly fuck things up ('I'm sorry Dave. I'm afraid I can't do that').
Now, when things get fucked up, Sitebulb will try and help you out with some helpful info, instead of leaving you floundering around like a mouse in a fishbowl.
#5 Added wildcard to excluded external hosts
This just makes it a bit easier when setting up the global exclusion rules for external hosts, you can now do either of:
In context example:
#1 Opener links now work again on Mac
One of the main problems with Chrome (see Updates #1) was that on Mac, Sitebulb-controlled Chrome instances were interfering with regular browser Chrome instances, and vice versa. As an upshot, if you had Chrome as your default browser, 'URL opener' links would not work at all. Which is kinda a problem when the main CTA for trial users 'Upgrade to Pro' doesn't work at all. Anti-CRO, anyone?
#2 Some URL Lists would not export filters
On some of the URL Lists, if you customised the data with filters and using add/remove columns, Sitebulb would ignore your instruction, like an ignorant fuck, and just export the whole lot.
#3 Duplicate titles being reported on pages with SVGs on
Lots of people noticed this one ('Check duplicate titles' must be high up on 'SEO Audit Checklists'?). Sitebulb was incorrectly claiming that there were multiple page titles, when really they were titles associated with SVGs. Schoolboy.
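For the curious, here is a rough sketch of how you might count page titles while ignoring the ones that belong to SVGs. It uses Python's standard html.parser and is an illustration only; the class and function names are made up, and it says nothing about how Sitebulb's own parser works.

```python
from html.parser import HTMLParser


class PageTitleCounter(HTMLParser):
    """Counts <title> elements that are not nested inside an <svg>."""

    def __init__(self):
        super().__init__()
        self.svg_depth = 0      # how many <svg> elements we are currently inside
        self.page_titles = 0    # titles that belong to the page itself

    def handle_starttag(self, tag, attrs):
        if tag == "svg":
            self.svg_depth += 1
        elif tag == "title" and self.svg_depth == 0:
            self.page_titles += 1

    def handle_endtag(self, tag):
        if tag == "svg" and self.svg_depth > 0:
            self.svg_depth -= 1


def count_page_titles(html: str) -> int:
    counter = PageTitleCounter()
    counter.feed(html)
    return counter.page_titles
```

A page with one real title and two SVG icons that each carry their own title should come back as 1, not 3.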
#4 'Redirects' that were not redirecting
When in doubt, blame Chrome. If anyone out there has played around with headless Chrome (y'know, for kicks), you may have come across issues with how redirects get resolved. Chrome likes to handle everything all at once, rather than scheduling the redirected URL as per normal crawling 'rules'. This caused us some problems, which looked (to the user) like URLs were being reported as redirects when they were not actually redirecting - for certain URLs and certain redirects. Most users shouldn't have experienced this, but for the few websites it did affect, it should now be working properly.
#5 Sitebulb was not showing proper respect to meta charset="UTF-8"
In fact, it was full on disrespecting it, triggering the Hint: 'Character Set Not Specified In Head Or Headers'.
#6 Sitebulb was not respecting all robots 'disallow' rules
Talk about disrespect.
#7 Fixed some issues with duplicate content detection
We changed the way that Sitebulb classifies canonicalized URLs in the background, which stops them being included in duplicate content reports (which they never should have been).
#8 'The internet is broken: reprise'
We found a whole bunch of new ways that developers can build shitty websites, which were breaking headless Chrome when Sitebulb tried to crawl them. We fixed Sitebulb so it now handles the websites, instead of unhelpfully falling over.
#9 Cell colours were the wrong way round on Crawl comparison export
The international rules of colours mean that red is bad and green is good. But in our crawl comparison export we were showing more forbidden URLs as GOOD and more redirects as BAD. That's the wrong way round dummy!
Released on 19th April 2018
Before starting with the regular fixes and updates below, we need to draw your attention to an update we've made to the Mac version of the software. It turns out that while running Sitebulb, users were no longer able to open Chrome. This is because Sitebulb now uses headless Chrome (see v2.0 update if this is news to you), and Mac OS does not allow you to open two instances of the same application.
We are astonished, and a little disappointed, we did not spot this during beta testing, but there's nothing we can do about it other than to come up with a solution. And our solution is to package a version of Chrome in with Sitebulb, which can run independently of your normal Chrome browser app.
The downside of this is that it does make the download and install size of Sitebulb a lot larger, which we know will annoy some users. We are hoping that this is a temporary solution, while we find something more fitting.
The upside is that you can use Chrome normally again! It will also prevent the Chrome update bug which some users experienced, and I emailed everyone about the other day.
Onto the regular stuff...
#1 'Export all the things'
Mr Russ Jones called me out the other day with the following tweet:
Firstly, who does this guy think I am? Gareth is the one that builds everything, I literally sit on my arse all day writing snarky and borderline offensive release notes copy.
Regardless, we made it happen for you, Mr Jones. Go ahead and add whatever data you want, then export. And yes, including all those sexy Lighthouse metrics like TTFB and First Meaningful Paint.
#2 Configured Sitebulb to ignore certificate errors
If you had a site that did this over HTTPS...
...Chrome would just shit itself and crash. It doesn't anymore!
#1 Sample audit working again
The sample audit had stopped working correctly. It is supposed to follow your depth settings: say 3 levels deep at 50 URLs per level, which should crawl a maximum of 150 URLs (or actually, 101 URLs, since there's only ever 1 URL at the first depth, but no one likes a pedant). It wasn't following these rules, and was just hitting the 'total maximum URLs.'
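For the pedants, the arithmetic can be written down as a tiny function. This is just a sketch of the sum described above (the function name is invented for illustration), assuming the first depth only ever contains the start URL.

```python
def max_sample_urls(depth_levels: int, urls_per_level: int) -> int:
    """Upper bound on URLs crawled by a depth-limited sample audit.

    The first depth contributes exactly 1 URL (the start URL); every
    deeper level contributes at most urls_per_level.
    """
    if depth_levels < 1:
        return 0
    return 1 + (depth_levels - 1) * urls_per_level


print(max_sample_urls(3, 50))  # 1 + 50 + 50 = 101
```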
#2 Fixed over-zealous meta refresh detection
Sitebulb was looking for a meta refresh throughout the code, rather than just the head, which meant meta refreshes were triggering for <noscript> fallbacks in the body. It now only looks in the head.
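To illustrate the difference, here is a minimal sketch of a head-only meta refresh check using Python's standard html.parser. It is not Sitebulb's code, and the names are hypothetical; it simply shows that a refresh inside a <noscript> in the body is ignored once you only look inside the head.

```python
from html.parser import HTMLParser


class HeadMetaRefreshFinder(HTMLParser):
    """Flags <meta http-equiv="refresh"> only when it appears inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "body":
            # Be defensive about markup that never closes its <head>.
            self.in_head = False
        elif tag == "meta" and self.in_head:
            attr_map = {name: (value or "") for name, value in attrs}
            if attr_map.get("http-equiv", "").lower() == "refresh":
                self.found = True

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def has_head_meta_refresh(html: str) -> bool:
    finder = HeadMetaRefreshFinder()
    finder.feed(html)
    return finder.found
```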
#3 Tidied up formatting on crawl comparison page
Because it looked whack, yo.
#4 XML Sitemaps were not reported as being in robots.txt...
...when they actually were there! Thousands of users were up in arms about this one, understandably so if you ask me.
#5 URL Details -> Duplicate Content was not showing correctly on the Mac
And when I say 'not showing correctly', I mean 'not fucking showing at all.' Fixed up that bad boy good and proper.
#6 Two Page Speed Hints were not wired up correctly to URL Lists
Which meant the data was all out of sync. For those interested in the gory details, it was these 2 Hints:
- Style Sheet content is greater than 14.5kB
- Total combined Image content size is too big (over 1MB)
Released on 17th April 2018 (hotfix version)
We just pushed the button to launch v2.0, after 3+ months of solid development...
Gareth: 'Give it a quick test, just to be sure'
Patrick: 'Ok, no problem, I'll test bbc.co.uk' (starts up Sitebulb)
Patrick: (enjoys watching colourful spinny thing)
Patrick: 'Oh. Fuck.'
Gareth: 'WHAT? What's wrong??'
Gareth: 'WHAT THE FUCK IS WRONG WITH IT?'
Patrick: 'Well...it's not crawling'
If you've seen an error message like this: Error - The 'Domain'='bbc.co.uk' part of the cookie is invalid
Then you've seen what caused me to lose the power of speech. It's now working again.
Released on 17th April 2018
This one is a big update, the biggest we've done since launch. We actually have a dedicated version 2 launch page, with all the details of the new update, so if you want to learn about all the new stuff we've added, head over there.
You can download using the links above, as usual, however please note the following:
DO NOT PAUSE, CLOSE SITEBULB, UPDATE, THEN TRY TO RESUME. FINISH YOUR CURRENT AUDIT BEFORE UPDATING.
Since the architecture has changed substantially, you won't be able to complete a paused audit on the new version.
Smaller Updates & Bug Fixes
As per the message above, all the main changes are on our version 2.0 page, so we're just rolling with a quick unordered list for the other smaller bits and pieces:
- We made the hard decision to remove the free version of the software. This was an experimental idea we included initially, hoping that users would use it as a 'stepping stone' between having a paid subscription and not. Turns out that didn't really happen as we had hoped, and supporting the free version just became a burden that was hindering our development of the full app, so we have removed that option.
- As requested by a number of users, you can now pick exactly which resource files you wish Sitebulb to crawl, in the audit settings.
- We've also made it so that Sitebulb will not crawl images linked via anchors, if you choose not to crawl images (or all Page Resources).
- URLs will no longer be triggered for duplicate content if they have rel="next"/"prev" pagination elements, which makes the duplicate content report a lot easier to use.
- The Sitebulb window will remember the size you were using previously and open up again at the same size next time.
- If you hover over a URL in a URL list, it will now show you the full URL in a little rollover box.
- We moved the Hint 'Images with Missing or Empty Alt Text' from the Page Resources section to the On Page -> SEO section, because it makes more sense there.
- We also improved that Hint, so that it no longer highlights tracking pixel images.
- Fixed a rare issue where image names were being incorrectly encoded, due to the way .NET encodes URLs when creating a URI object.
- These Hints were being triggered when they should not have been: Mismatched nofollow/noindex in HTML and header.
- On servers running nginx, if Sitebulb encountered a timeout situation, it was reporting this as 'Error' instead of 'Timeout'.
- Fixed the Hint 'Title Tag Missing' (from On Page -> SEO Hints) as this was occasionally misreporting data.
- Fixed an export issue when you tried to export resource URLs from the URL Details panel.
- Sitebulb was not picking up the resource URL for .mp4 files when a subtitle track was also specified.
Released on 19th March 2018 (hotfix version)
#1 Fixed multiple issues with custom headers
The custom header setting (in Advanced Settings) was not working properly, so we fixed that. Also, even in its broken state it was not persisting when you went to 'Pause & Update Settings', or went to do a re-audit, so you had to enter the data again. And on top of all that, we'd left a typo in there ('customer' instead of 'custom'). FML.
#2 'Stop XML Sitemaps' now actually stops
We discovered a very frustrating bug that meant that if you decided to stop the crawl early and build the reports, Sitebulb could get stuck crawling sitemaps. If you hit 'Stop XML Sitemaps', it would just skip onto the next sitemap in the scheduler, instead of actually finishing the audit and building the reports. This is fine if you only have one sitemap, but if you have 5000 it's...less good.
Released on 7th February 2018
#1 Improved crawling speed
Everyone likes FAST. Fast is awesome. My little boy is constantly debating who is faster: Sonic, The Flash, Usain Bolt or Catboy (from PJ Masks). My money's on Catboy.
We've made Sitebulb's task scheduling more efficient, which has made it crawl faster. You can safely file this one under 'performance enhancements.'
#2 View folder link
Sitebulb writes data to disk, which means there's a directory on your local hard drive which contains all the data and export files. Occasionally, folk come to us asking how to find these directories. And to be honest, trying to explain it is a massive pain in't arse. So we just made a little button on the Audit Overview that magically takes you there.
#1 XML Sitemap Report pulling in Sitemaps from other Audits
A number of users noticed this peculiar issue, where Sitebulb would display the XML Sitemap details for a completely different website, when you looked at the XML Sitemap report.
This was an issue with the user interface, rather than the crawler itself, and has now been rectified.
#2 Truncated URLs and anchor text on URL Details page
We noticed that on some websites, the URLs and/or the anchor text were too long to fit in the space we had allocated for them, causing them to tumble horizontally off the side of the page, rendering the page nigh on useless.
Enter, that versatile and truly magnificent typographical cliffhanger itself... the ellipsis.*
*Used in this instance merely to truncate. A waste if you ask me.
#3 Disable cookies now works properly
Sitebulb has cookies enabled by default, but you can turn them off in the Advanced Settings. Well you can now anyway - if you tried to do it before today you may have noticed that it actually did not turn them off. Whoops.
#4 Tiny typo fixed in audit progress text
Annoyingly, no users reported this glaring error, where the report building copy for 'Exporting Multiple <h1> tags' was showing the HTML:
Fortunately, I spotted it. Lucky someone is paying attention.
#5 HTML also in Compare Audits export
Just like the issue above that no one could be bothered to notice, we also discovered that HTML had ended up in the export from the Compare Audits function.
More shit and bollocks!
#6 Time of audit changed to only display hours:minutes
As sticklers for precision, we've always liked to display the 'time of audit' to the nearest second. Upon hearing industry rumours about one member of the Sitebulb team and his 'obnoxious pedantry', we've made the controversial decision that this degree of accuracy is wasted on, and quite frankly, unappreciated by, the entire Sitebulb community.
#7 Typo on Internal URLs Hint
There was a typo on one of the Internal URLs Hints, where the word 'usual' should have read 'usually'. Disappointingly slack on my part.
#8 Pagination was broken on the Dashboard
Curiously, the pagination links at the bottom of the Dashboard stopped working, so you could not browse through old projects. This was particularly annoying for users such as myself, having 320 Projects in my list.
#9 The Hint: '<head> contains invalid HTML elements' was firing for commented out elements
One of our most popular Hints* was giving false positives when encountering commented out elements. Some schoolboy shit right there.
*This Hint scored in the top 3 Hints of all time, based on a survey of 3 people.
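As a rough illustration of the kind of check involved, the sketch below lists elements inside the head that are not metadata elements, using Python's standard html.parser. Commented-out markup arrives as a comment rather than as tags, so it is never counted. The allowed set follows the HTML spec's metadata content; everything else (names, structure) is an assumption for illustration and not Sitebulb's implementation.

```python
from html.parser import HTMLParser

# Elements the HTML spec classes as metadata content.
ALLOWED_IN_HEAD = {
    "title", "base", "link", "meta", "style", "script", "noscript", "template",
}


class HeadElementChecker(HTMLParser):
    """Collects tag names found inside <head> that do not belong there."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.invalid = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif self.in_head and tag not in ALLOWED_IN_HEAD:
            self.invalid.append(tag)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def invalid_head_elements(html: str) -> list:
    checker = HeadElementChecker()
    checker.feed(html)
    return checker.invalid
```

Because the parser treats the inside of a script as raw text, a string such as "<p>hi</p>" inside a script is not reported either.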
#10 Pause and update settings was forgetting the crawl limit
If you set a crawl limit (e.g. 100,000 URLs) before crawling a site, then mid-crawl decided to pause and update the crawl setting, Sitebulb would very annoyingly forget the crawl limit and just keep crawling beyond the limit.
#11 URL Details view showing URL listed in the same sitemap multiple times
If you clicked through to the URL Details page, and had XML Sitemaps turned on, the same sitemap would be listed multiple times. This led to a number of confused users, asking irritating support questions such as 'what am I doing wrong?' and 'why is this happening to me??' and 'what did I do to deserve this?'
#12 Exporting large audits was not always working
Very few people will have seen this, but if you tried to export very large audits the software would sometimes throw an error.
Released on 9th January 2018 (hotfix version)
#1 Meta description length default changed
Since the beginning of time, every SEO on the planet has been conditioned to write all meta descriptions in the range: 140 characters < x < 160 characters. It has got to the stage where it is impossible to even write a regular sentence that falls outside these strict bounds (check these ones and see for yourself).
But then Google came along and decided to change everything – 'meta descriptions can now go up to 320 characters', they said – which I’m sure you’ll agree makes for preposterously long, awkward, unwieldy sentences that just go on and on and feel completely unnatural to both reader and writer, present company included.
So, Sitebulb has followed suit, changing the default 'too long' setting for meta descriptions to now be > 320 characters, following Danny Sullivan's tweet (Please note: this affects only the default setting, which was previously set at 170 - if you have already overwritten the default, yours will not change to 320).
#1 Sitebulb now correctly opening on startup
Occasionally, users would end up in a situation where Sitebulb would not always open up properly first time around, meaning they had to go and start the app up for a second time. If this never happened to you, you're in luck, but let's all just agree that it sounds pretty fucking frustrating. We've resolved it by completely rebuilding the startup procedure, for both Windows and Mac.
Released on 1st December 2017
#1 We sold our souls to the Apple
When we first started building Sitebulb, it was our single guiding vision to have one universal user interface that looked exactly the same on both Windows and Mac. This was the one fundamental principle we knew we must stick by. We spent hundreds of hours perfecting the design - everything from the stunning report graphs to the delightful little X button in the top right hand corner.
Windows users positively fawned over Sitebulb's exquisite design.
Mac users, however... well, that was a different story. "We really love it guys", they'd say, "but..." (there's always a but). "I can't get enough of those graphs, but...", "Those crawl maps - magnificent! But..."
"...BUT WHERE ARE MY TRAFFIC LIGHTS?????"
Day in, day out. Relentless. 400 emails a day about it. They'd tweet me, Facebook messenger me, Slack me (is that a verb yet?), they'd send me SMS messages like they were actually my friends. One guy hand delivered a letter.
I'm ashamed to say, we gave in. Here. Here are your bloody traffic lights:
Yes, this will make half our users happy. But we have paid a toll, a heavy toll indeed. We have abandoned the very principles we lived by, the values that stood at our very core. And for what? Commercialism. Shame! Shame! Shame!
Peer pressure can do this to a man.
You take care out there, kids, it's a tough world.
#2 Added user agent and language data to preferred domain check
You may come across scenarios during your day-to-day technical SEOing, where the preferred domain results do not match your expectations. 'Redirecting to m.example.com, but why?!' you may be known to cry.
Well, Sitebulb now gives you more details of the HTTP request used - the User Agent and the Accept Language - both of which may impact how the site handles the request. You can configure these by clicking change settings, which will take you to the Global Settings page, where you can adjust both of these things.
#3 Added Accept Language to configurable Global Settings
Following #2 above, we couldn't very well tell you the language but not allow you to change it. So now you can, from the Settings -> Crawler options. The first time you use the software, it will auto-detect your language settings and set the default language accordingly (but it won't do it again after that).
#4 On the All Hints page, changed the 'All Hint Data' export to 'Export All Data'
The old export was near useless, with about a million worksheets you needed to tab through. The new export is actually a collection of all the individual hint exports, along with all the reports for each section of the audit. Way more usable.
#5 'Crawl alternates' is no longer on by default
In the 1.6.0 update, we added the ability to crawl alternate URLs, and set this as 'on' by default. Turns out, this is pretty fucking annoying. WordPress sites, for example, will spew out oembed alternate links for every page on the site. So from one crawl to the next, you would see page totals doubling. This feature is best reserved for when you actually need it, so going forwards you'll need to turn it on in the Advanced Settings.
#1 Reduced timeouts to preferred domain checks
Some users reported an issue with the little check Sitebulb does when setting up a new Project, to determine the preferred domain, where the check would take a long time (> 30 seconds). This would happen when one (or more) of the 4 options was completely inaccessible, and was timing out. We've reduced the timeouts on this check, so these edge cases will still take a little longer than normal, but only a few seconds in total.
#2 Finally resolved occasional issues with exporting/importing
As we've been improving Sitebulb, we've been making it faster wherever we can. Turns out we made it too fast in some places, so the export/import overlay could not catch up. We added a slight delay to the export building process, which has resolved this issue.
#3 Split out Hint exports to be indexable/non-indexable
We recently added exports for every single Hint, but in the individual exports they would include both indexable and non-indexable URLs. Doh! We've split them out now to only include the 'right' stuff.
#4 Fixed domain resolution for South African TLDs
If you entered a website with a South African TLD, after the preferred domain checks, Sitebulb was suggesting you crawl https://co.za. The fuck, Sitebulb??
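The notes don't say what caused this, but a common way to end up with 'co.za' as a domain is to assume the last two labels of a hostname are always the registrable domain. The sketch below shows the idea of checking for multi-part suffixes first. The suffix set is a tiny illustrative subset (a real tool would use the full Public Suffix List), and the function name is hypothetical.

```python
from urllib.parse import urlsplit

# Tiny illustrative subset only; not a complete list of multi-part suffixes.
MULTI_PART_SUFFIXES = {"co.za", "org.za", "co.uk"}


def registrable_domain(url: str) -> str:
    """Return the registrable domain for a URL, e.g. example.co.za."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # If the last two labels form a known suffix, keep one more label.
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_PART_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


print(registrable_domain("https://www.example.co.za/page"))  # example.co.za
```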
#5 Fixed mis-firing of "<head> contains invalid HTML elements"
Our new ohgm-inspired Hint was accidentally firing when it found references to <p> in a script in the head. Our bad.
#6 Fixed crappy word counts
Occasionally Sitebulb's algorithm for figuring out the content area (and thus, the 'Content Words' and the 'Template Words') would get one mixed up with the other, the other one mixed up in both, and both mixed up in all. Normally such errors are due to really shit HTML, but in this case we found an example that was entirely Sitebulb's fault.
#7 Duplicate Content export button now wired up correctly
From the 'All Hints' page, the export button associated with Duplicate Content did nothing, because it was not wired up to do anything. It is now.
Released on 24th November 2017
#1 Preferred domain check
When you start a new Project, Sitebulb will go off and check that the start URL is the right URL, right after you enter it. It checks the http/https/www/non-www versions to see how each responds, and advises you which it thinks is the best option.
If you tend to copy/paste URLs from your browser, you may not notice a lot of difference here. But if you're a URL-typer, this could save you a ton of wasted time.
Of course, this may also reveal some issues that you need to get fixed!
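If you want to picture the four versions being checked, this small sketch builds them from whatever the user typed. It only generates the candidate URLs and does not request them; the function name is invented for illustration and this is not how Sitebulb itself is written.

```python
from urllib.parse import urlsplit


def preferred_domain_candidates(start_url: str) -> list:
    """Build the http/https and www/non-www variants of a start URL."""
    # Accept bare hostnames such as "example.com" as well as full URLs.
    if "://" not in start_url:
        start_url = "http://" + start_url
    host = (urlsplit(start_url).hostname or "").lower()
    bare = host[4:] if host.startswith("www.") else host
    return [
        f"{scheme}://{prefix}{bare}/"
        for scheme in ("http", "https")
        for prefix in ("", "www.")
    ]


for candidate in preferred_domain_candidates("www.example.com"):
    print(candidate)
```

Each candidate would then be requested to see how it responds (200, redirect, timeout and so on) before recommending one.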
#2 Crawl faster (with threads)
We've been rather reluctant to add the ability to crawl with threads, for very good reason, as this method of crawling is notoriously bad for crashing servers and pulling down websites. But it's such a regular request that we figured we'd better do something about it.
So we've added the ability to crawl with threads (up to a maximum of 25, beyond which point the tool does not really work any faster).
We've written a guide on How to Crawl Really Fast as well as its counterpoint: How to Crawl Responsibly, which emphasises all the reasons we were reluctant to do this in the first place. We'd encourage all users to read the second piece, as it should also give you a better understanding of how the tool works.
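As a generic illustration of crawling with a capped number of threads, here is a short sketch using Python's concurrent.futures. The cap of 25 mirrors the limit mentioned above; the fetch function is supplied by the caller, and all names here are hypothetical rather than anything from Sitebulb.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_THREADS = 25  # mirrors the maximum described in these notes


def crawl_with_threads(urls, fetch, threads=5):
    """Fetch every URL using a thread pool, never exceeding MAX_THREADS.

    `fetch` is any callable that takes a URL and returns a result.
    """
    urls = list(urls)
    workers = max(1, min(threads, MAX_THREADS))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with urls.
        return dict(zip(urls, pool.map(fetch, urls)))
```

Asking for 100 threads here quietly gets you 25, which is kinder to whoever owns the server.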
#3 Better 404 Testing
Up until now, we've had 404 tests visible on the URL Resolution section on the Audit Overview. The results they've thrown up have caused us a number of questions, so it's clear that what we had before was not clear enough. Clearly.
In order to be clearer, we have moved the 404 Tests to a tab on the Internal URLs report. Not only that, we've added a lot more tests, and more details about what we are testing and what the tests show.
This constitutes much more thorough 404 testing - checking for pages, folders, images, CSS, text files and XML. Each should respond with a 404 response.
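A 404 test of this sort boils down to requesting URLs that cannot possibly exist and checking the status code. The sketch below only builds such URLs, one per resource type listed above, using a random token; the suffixes chosen and the function name are assumptions for illustration, not Sitebulb's actual test URLs.

```python
import uuid
from urllib.parse import urljoin

# One made-up suffix per kind of resource being tested.
TEST_SUFFIXES = {
    "page": "",
    "folder": "/",
    "image": ".png",
    "css": ".css",
    "text": ".txt",
    "xml": ".xml",
}


def build_404_test_urls(site_root: str) -> dict:
    """Return a URL per resource type that should not exist on the site."""
    token = uuid.uuid4().hex  # random, so it is vanishingly unlikely to exist
    return {
        kind: urljoin(site_root, f"/{token}{suffix}")
        for kind, suffix in TEST_SUFFIXES.items()
    }
```

A correctly configured site should answer every one of these with a 404, rather than a 200 or a redirect to the homepage.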
We have *ahem* deliberately left our site misconfigured, for the purpose of this demonstration. The sacrifices I make for you guys eh?
#4 More crawl control: crawl canonical, pagination and alternate URLs
Some users have asked for more control of what Sitebulb will crawl, specifically relating to canonicals and pagination links, so we've added some new crawl controls in Advanced Settings (under the 'Robots' tab).
By default, Sitebulb WILL schedule and crawl any canonical URLs, alternate URLs or pagination URLs that it finds - either in the <head> or in HTTP headers. In order to STOP Sitebulb crawling any of these URLs, you'll need to tick the appropriate box.
Note that if a URL you wish to stop Sitebulb crawling is also linked to via anchors, it will still get scheduled and crawled. You'd need to use 'Excluded URLs' in Advanced Settings.
#5 Pre-audit notifies you about site features
#6 Loads of new Hints
Gareth's favourite saying is 'the internet is broken.' While most blokes while away their weekends watching sport or drinking beer, Gareth prefers to sit in a darkened room, trawling the internet to find new examples of shitty web pages. He's been known to exclaim, without warning, 'JUST LOOK AT THE SIZE OF THAT HEAD.'
New Hints as follows:
- HTML is missing or empty - literally, pages with absolutely nothing on (consider exhibit 1 and exhibit 2)
- Has link with a URL referencing a local or UNC file path - people do some stupid shit when putting together web pages, like linking to files that no one in the world can actually see.
- Has link with a URL referencing LocalHost or 127.0.0.1 - say it with me, 'the internet is broken.'
- Has a link with whitespace in href attribute - like, if someone accidentally put a space at the end of the URL, à la href="https://example.com/page/%20". How embarrassing.
- Next/Prev Paginated URL is canonicalized to different URL - I mean, what are you even playing at with this shit?! If you canonicalize a paginated page, Google is not going to crawl the rest of the paginated series. Dumbass.
- Noindex found on rel Next/Prev Paginated URL - Oh. This one is not that bad actually. But nice to know, I guess.
- Internal/Resource URL is part of a chained redirect loop - This is more like it. Redirect chains that go round in a big loop, like 1 -> 2 -> 3 -> 4 -> 1. Internet = broke.
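To make the last Hint in that list concrete, here is a minimal sketch of spotting a redirect loop, given a mapping of each URL to the URL it redirects to. The data structure and function name are assumptions for illustration only.

```python
def find_redirect_loop(redirects: dict, start: str) -> list:
    """Follow redirects from `start`; return the loop if one is found.

    `redirects` maps a URL to its redirect target. Returns the URLs in
    the loop (first URL repeated at the end), or an empty list.
    """
    seen = {}   # URL -> position in path when first visited
    path = []
    url = start
    while url in redirects:
        if url in seen:
            return path[seen[url]:] + [url]
        seen[url] = len(path)
        path.append(url)
        url = redirects[url]
    return []


loop = find_redirect_loop({"/1": "/2", "/2": "/3", "/3": "/4", "/4": "/1"}, "/1")
print(" -> ".join(loop))  # /1 -> /2 -> /3 -> /4 -> /1
```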
I'm not finished yet, we have three more. These were all inspired by serial-internet-breaker @ohgm in his latest escapade 'Breaking the Head (Quietly)'. Stop reading my drivel, go and read his instead. Then come back and appreciate these 3 new Hints:
- <head> contains a <noscript> tag
- <head> contains a <noscript> tag, which includes an image
- <head> contains invalid HTML elements
We'll be getting some Sitebulb t-shirts printed with #internetisbroken if enough of you start tweeting out the hashtag. I'll be the judge of when enough is enough, thank you very much.
#7 Search for a Hint
You can view all Hint data via the All Hints screen (top right nav) or via the Hints tab on the Audit Overview. Both of these now have a 'search for a Hint' box. Note that this only searches triggered Hints, so if you search for a Hint and it doesn't come up, then that means it was a green tick pass.
#8 Autoscroll to previous view scroll location
You're using Sitebulb in full blown investigation mode, you're out to crack some heads together today. You're checking all the reports, inspecting each and every graph, looking for something, something. You know it's there somewhere. A string for you to pull on. Then all of a sudden, you feel your spidey sense tingle. A graph. A pattern. That might be it. It might just be. You have to know more. So you click, and you're there, IN the data. You're Neo now, and you can see everything for what it truly is. You can feel it, you're right on top of it. You pause, not wanting to get ahead of yourself, trying to slow the heart beating out of your chest. You want to double check the graph, make sure you've got it right. You hit 'back' and WAIT. WHAT?! WHICH GRAPH WAS I LOOKING AT AGAIN?? NOOOOOOOOOOOOOOOOOOOOOO!
No one can be told what the Autoscroll to previous view scroll location is. You have to see it for yourself:
#9 Select Google Data for up to 90 days
Thus far, Sitebulb has offered a paltry 30 days worth of data to check. With this 3X update, you can now select up to 90 (ninety) days worth.
#10 Added Google Analytics Page Timings Data
Sitebulb has a Google Analytics integration. It also does Site Speed testing. We figured, why not mash these things together and pull out the GA Page Timings data?
This is what you'll get in the Site Speed report (on a separate tab):
You'll get this by ticking 'Site Speed' and selecting a Google Analytics profile, when setting up the Project.
We plan to iterate on this feature to make it more useful, so please hit us up with any ideas you have about it.
#11 Added Happiness feedback to toolbar
Here in the world of desktop SEO software, we live in a vacuum for most of our lives. And no, I don't mean we live in a fucking vacuum cleaner, I mean that we don't really get to talk to our customers very often.
So we are always looking for new ways to elicit feedback. Our latest idea is a happiness button in the toolbar:
#12 Low disk space warning
Not everyone is aware that Sitebulb does not hold data in RAM (like most desktop crawlers) but writes to disk instead. This means that if you're crawling a big site and you don't have much disk space, bad things are going to happen.
To mitigate this risk, Sitebulb will now warn you if you've got less than 5 GB space remaining - which is where you might want to start thinking about it.
If you're the type who likes to live life on the edge, you can simply dismiss this message with the handy 'X'.
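If you fancy checking your own disk before a big crawl, the same sort of test is a few lines of Python. This is just a sketch of the idea using the standard library's `shutil.disk_usage`; the function names are ours, and treating 5 GB as 5 * 1024^3 bytes is an assumption.

```python
import shutil

# Assumed threshold: 5 GB, counted as 5 * 1024**3 bytes.
WARN_BELOW_BYTES = 5 * 1024**3


def low_disk_space(free_bytes, threshold=WARN_BELOW_BYTES):
    """True when free space has dropped below the warning threshold."""
    return free_bytes < threshold


def should_warn(path="."):
    """Check the drive that holds `path` (hypothetical helper)."""
    return low_disk_space(shutil.disk_usage(path).free)
```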
#13 New column added: "No. Outgoing Navigation Links"
This is part of a wider plan, to offer more visibility to internal linking. It starts with this acorn, splitting out navigation links on a page. You'll find it as a new column in URL Lists, so feel free to have a play with it and tell us what you think.
Otherwise... watch this space.
#1 Fixed Bosnian language code on hreflang check
In the International report, the language code 'bs' was being incorrectly labelled as invalid. Thanks to the helpful user who pointed out that Sitebulb was talking complete BS! (sorry)
#2 Corrected colour coding on compare audits
Our vaunted compare audits feature includes helpful colour coding, so you can quickly see what has improved or got worse (using the universal colours for good and bad: green and red). Previously it was doing this in a 'dumb' way, where any increase was green. Now, using a ground breaking combination of AI and machine learning, it correctly figures out things like 'less 404 errors is actually a good thing.' Ground = broken.
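Strip away the AI and the machine learning, and the logic is roughly 'know which metrics are better when they go down'. A minimal sketch, with hypothetical metric names and an assumed 'grey' for no change:

```python
# Hypothetical set of metrics where a smaller number is the good outcome.
LOWER_IS_BETTER = {"404 errors", "broken links"}


def change_colour(metric, before, after):
    """Return 'green' for an improvement, 'red' for a regression."""
    if after == before:
        return "grey"  # assumption: unchanged values get a neutral colour
    went_up = after > before
    improved = (not went_up) if metric in LOWER_IS_BETTER else went_up
    return "green" if improved else "red"
```

So `change_colour("404 errors", 10, 4)` comes back green, where the old 'any increase is green' approach would have painted it red.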
#3 Fixed the All Hints page showing SEO and HTML Hints grouped together
The Sitebulb community has been up in arms about this one! Hundreds of you pointed out that the On Page section is split into two Hint groups: SEO and HTML, and combining them as one on the All Hints page was an abomination. Put your pitchforks back in the pitchfork cupboard, people, for they are now separate.
#4 Fixed Hint for 'Base URL Malformed'
Sitebulb was incorrectly claiming that <base href="/"/> was an illegitimate base URL.
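In case you need convincing that a base of "/" is perfectly legitimate, here's a quick sketch with Python's `urllib.parse.urljoin`. The page URL and link are made-up examples, and `resolve` is just a helper name for illustration.

```python
from urllib.parse import urljoin


def resolve(page_url, base_href, link):
    """Resolve a relative link the way a <base href> would change it."""
    base = urljoin(page_url, base_href)  # the base itself resolves against the page URL
    return urljoin(base, link)


with_base = resolve("https://example.com/blog/post/", "/", "css/site.css")
without_base = urljoin("https://example.com/blog/post/", "css/site.css")
```

With the base in place the stylesheet resolves to `https://example.com/css/site.css`; without it, you get `https://example.com/blog/post/css/site.css`.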
#5 Changed sheet name for Hint: Has only one followed internal linking URL
On the export for this Hint, the sheet name used to be called 'Has 1 incoming link'. This has been changed to 'Pages with only 1 linking URL', which is more accurate, and because Sitebulb users are magnificently anal and we love them for it.
#6 Syntax support in exclusion list
One of our users noticed that $ symbols were not being correctly recognised when used in the 'Exclude URLs' setting, so URLs were not being excluded as they should. This has been resolved, so feel free to throw your $$$ around like a mother fucking gangster bro.
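To show why the $ matters, here's a sketch that assumes the exclusion patterns behave like robots.txt rules, where * is a wildcard and a trailing $ means 'the URL must end here'. That syntax is our assumption for the example, not a spec of the 'Exclude URLs' setting, and the function names are invented.

```python
import re


def exclusion_to_regex(pattern):
    """Turn a robots.txt-style pattern (assumed syntax) into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then let each * match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_excluded(path, patterns):
    """True if any pattern matches the path from its start."""
    return any(exclusion_to_regex(p).match(path) for p in patterns)
```

With `/*.pdf$`, the path `/shop/item.pdf` is excluded but `/shop/item.pdf?x=1` is not. Drop the $ and both are excluded.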
#7 Fixed the back button on Audits
If you followed this path: Dashboard -> Project -> Audit, and then hit the Back button, you'd be returned to the Dashboard, rather than the Project.
#8 Fixed: Multiple GA codes Hint
This Hint was occasionally firing false positives, for instance if a GA code was referenced in a script. We've now fixed it so that it only reports if 2 or more different GA codes are found.
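The 'different' bit is the important part. A rough sketch of the idea: collect the codes into a set, so the same code mentioned twice only counts once. The regex for the `UA-XXXXX-Y` format and the function names are assumptions for illustration.

```python
import re

# Assumed pattern for classic Google Analytics property IDs, e.g. UA-12345-1.
GA_ID = re.compile(r"\bUA-\d{4,10}-\d{1,4}\b")


def distinct_ga_codes(html):
    """Return the sorted set of different GA codes found in the markup."""
    return sorted(set(GA_ID.findall(html)))


def multiple_ga_codes_hint(html):
    """Fire only when 2 or more different codes are present."""
    return len(distinct_ga_codes(html)) >= 2


same_code_twice = "ga('create', 'UA-12345-1'); var id = 'UA-12345-1';"
two_codes = "ga('create', 'UA-12345-1'); ga('create', 'UA-67890-2');"
```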
#9 Fixed Hint: URL receives follow and nofollow links
In some cases, this was not correctly reporting the nofollow links, making the Hint pretty useless.
#10 Fixed: Cache headers reporting invalid
Cache headers were not being reported correctly, which was incorrectly firing this Site Speed Hint: Set long expires cache headers.
Released on 3rd November 2017 (hotfix version)
#1 Fixed export on compare audits
Per update 1.5.0, we added a ton of new exports, changing the way exports are built in the process. This managed to break the export associated with 'Compare Audits', which was pointed out to us 13 seconds after we launched 1.5.1 (thus inspiring our new hotfix release).
#2 URL resolution checks (404, Non-WWW, etc) on some sites were being rejected by the server
You probably won't notice much difference here, but the little URL Resolution checks on the Audit Overview were failing on some sites and showing inaccurate results. Rest assured that everything is ok now.
Released on 2nd November 2017.
#1 Historical trend data via sparklines
How awesome is this?
Carry out more than one audit within a project, and you'll be presented with little sparklines everywhere which show changes over time. They appear on all the main 'Insight' numbers as below, and alongside all the Hints.
Up and to the right people!
3 important things to note:
- This will work for new AND existing audits, so you can check it out right now if you have a Project with a few audits on.
- Even if you delete your old audits, this history will persist so the sparklines will still show your trend data.
- If you're just dialing in your technical SEO, your sparklines will remain flat forever, making you look like the lazy, no good, piece of shit that you are.
#2 New report: Duplicate Content
Duplicate Content is one of those things you have to check on pretty much every single audit, and when it gets out of hand can cause a whole bunch of problems. Previously, we had a tiny little section for this, squashed into the On Page report. We've gone the other way with it now, and it's got a full blown section all for itself.
This is the type of thing you can enjoy seeing in the duplicate content section (plus a whole load more):
A word of warning, however. Despite all the pretty graphs, you can't click through and view duplicate content data in URL Lists. You will need to grab the export file instead (the green button at the top), which will give you everything you need to fix your dup content problems.
The reason for this is because URL Lists are entirely inadequate for communicating duplicate content data. It's not intuitive why not, so I'll explain a little further. URL Lists are built to display 1-to-1 data. There is 1 URL per row, and all the data on that row relates only to the unique URL in question.
However, if we take duplicate Page Titles as an example, in that instance we need to say 'here is a URL with the page title of '10 Cat Pictures that are so cute you will just cry and then you won't believe what happened next' (for example), but here are also 73 other URLs with the exact same title.' So it is more like a 1-to-many relationship that we need to communicate, where the 73 are somehow grouped and associated with the 'original URL.'
Anyway, we are working on a better way of displaying this, so for now just use the export instead.
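If the 1-to-many thing is still a bit abstract, here's a small sketch of the grouping in Python. The data shape (a list of URL/title pairs) and the function name are invented for the example; the export's real layout may differ.

```python
from collections import defaultdict


def duplicate_title_groups(pages):
    """Group URLs by page title and keep only titles shared by 2+ URLs.

    `pages` is a hypothetical list of (url, title) pairs.
    """
    by_title = defaultdict(list)
    for url, title in pages:
        by_title[title].append(url)
    return {title: urls for title, urls in by_title.items() if len(urls) > 1}


groups = duplicate_title_groups([
    ("/cats-1", "10 Cat Pictures"),
    ("/cats-2", "10 Cat Pictures"),
    ("/dogs", "5 Dog Pictures"),
])
```

One title, many URLs hanging off it. That's the shape a flat one-row-per-URL list struggles to show.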
#3 New charts added to the On Page report
To fill the void left by duplicate content, we've added some new charts to the On Page report, highlighting critical on-page SEO elements.
There are 6 new pie charts in total, displaying data about titles, meta descriptions and H1s - to do with both length and presence thereof - which should hopefully make it easier to pick out optimization opportunities.
Additionally, from the global settings area, users can define the values for 'too short' and 'too long', so you can be captain of your ship, vis-à-vis the length of on-page SEO elements.
#4 Sitebulb users are no longer forced to download the latest version!
Jonny Rockstar, a self-proclaimed SEO guru, takes a call from a client in his swanky attic office:
Jonny: You've reached Rockstar SEO, I'm Jonny Rockstar, how can I help you today?
Client: Hi, I was wondering if you could help me with my SEO?
Jonny: Today's your lucky day my friend, you've reached the right place! I am the world's top expert on SEO don't you know?
Client: Oh wow, you sound amazing. I wish I was you. So you can help me?
Jonny: I can do anything. What's your website, I'll take a quick look now for you.
Client: Great! My site is secondhandsocks.com
Jonny: Sure, I'll just fire up my...Oh.
Client: What's wrong?
Jonny: FOR FUCK SAKE. I NEED TO UPDATE AGAIN?! I JUST DID IT LAST WEEK. YOU STUPID PIECE OF SHIT.
Client: (hangs up)
Sound familiar? Well, from this version onwards*, you won't be forced** to update if you don't want to***.
* The change only exists in 1.5.0, so you kinda will be forced to update this one. And by kinda I mean completely.
** We will however notify you of new versions in future, and strongly suggest you install them.
*** Also if we release a critical update, we actually will force you to update it. Because critical.
#5 Export internal links directly from URL Lists
We're going to town with exports in this update. This one is a cheeky little update to URL Lists, that allows you to instantly export incoming internal linking URLs (to the URL in your list).
Of course you could just click the blue 'View URLs' link and export from there, but this saves you 1 click. Think of the dozens of seconds this will save you across a whole year! This thing is like a freaking time machine!
#6 Added export button to individual Hints
For even more exporting joy, you can now export individual Hint data directly from URL Lists.
...and on the Hint page itself (#7)
The important thing to note about both these Hint exports is that this is not always equivalent to exporting the rows from the URL List. This is because not all Hints are created equal.
Some Hints need special treatment (a bit like duplicate content, see above), so those Hints have customised exports. Some examples: broken links, images without alt text, and redirects. These are more cases where there is a many-to-1 relationship, so the exports are built to handle this.
#8 New copy experiment when building PDF reports
PDF reports take about 30 seconds to build, and we used to have a message up there encouraging feedback while you wait. We've just changed it for something a little more fun. Let us know what you think!
#9 Scroll to adjusted column
This is one of those tiny UX changes that new users will not notice at all, but Sitebulb veterans will love (Aside: can you be a veteran for a product that's less than 6 weeks old?).
Anyway, whenever you adjust a column in URL Lists (i.e. sort or filter), Sitebulb will now scroll you along back to that column. Previously it would dump you back on the first column, which was très annoying.
I'll demonstrate through the medium of gif:
#10 Free users can now download exports
We have a free plan, for crawling sites up to 500 URLs. This is perfect for users with small sites, or for people who hate paying for things. We'd left a restriction on there meaning free users could not process any exports. We decided this was a bit tight, so we've now enabled exporting for all free users.
Who's tight now eh?
#1 Fixed pause and resume!
Sorry! I know, I know. Pause and resume was broken, so if you paused an audit, you couldn't resume :(
We feel partially responsible for this as well.
Maybe in time we can forgive each other...?
#2 AMP Hints had a (small) makeover
A number of people pointed out that these two AMP Hints were bogus:
- AMP URL is not indexable
- AMP URL is not in a sitemap
#1 was happening because of the way general URLs are classified as indexable or not indexable, which is of course impacted by canonicals. Since AMP URLs are MEANT to have canonicals, this Hint did not make a lot of sense.
#2 was only advisory anyway, but we removed it because I got fed up of people pointing out to me that 'John Mueller said you don't need AMP URLs in sitemaps therefore you're wrong.' I don't need the hassle in my life.
We also removed AMP URLs from appearing in a lot of the other reports, as they really need to be treated in their own AMP-esque context.
#3 HTTP headers now correctly parsing canonical link
We came across a site that was setting an image source link element AND a canonical link element in the HTTP headers. Sitebulb was not correctly identifying the canonical, happily firing off warnings left, right and centre.
Magnificent as this Hint is, Sitebulb was actually wrong to fire it in this instance, as the canonical setup used was perfectly valid.
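For anyone who hasn't met this setup before, a single `Link` HTTP header can carry several links, each with its own `rel`. Here's a simplified sketch of picking the canonical out of one. It is not Sitebulb's parser and not a full implementation of the Link header grammar; the regexes and function name are ours, and the example header is made up.

```python
import re

# Each link is "<target>" followed by its parameters, up to the next "<".
LINK = re.compile(r"<([^>]*)>([^<]*)")
REL = re.compile(r'rel\s*=\s*"?([^";,]+)"?', re.IGNORECASE)


def canonical_from_link_header(value):
    """Return the target of the first link whose rel includes 'canonical'."""
    for target, params in LINK.findall(value):
        rel = REL.search(params)
        # A rel value may hold several space-separated relation types.
        if rel and "canonical" in rel.group(1).lower().split():
            return target
    return None


header = (
    '<https://example.com/cat.png>; rel="image_src", '
    '<https://example.com/cats>; rel="canonical"'
)
canonical = canonical_from_link_header(header)
```

The trap is stopping at the first link in the header and concluding there's no canonical, when it's sitting right there in the second one.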
#4 You can once again click on nodes in Crawl Maps
I have a bone to pick with you. All of you. No one bloody told me that the 'click node to view URL' feature in Crawl Maps had stopped working.
Or is this your collective way of telling me you didn't know it could do it??
It's awesome, let me try and sell it to you... -> Click on any node and BOOM straight to the URL details!
How did I do?
#5 Special characters removed from all export sheetnames
We should have seen this one coming. Mainly because we'd already spotted it once before relating to slashes (/s). Some Hint exports were failing, because we'd left some special characters in the sheetnames. F$&k my life.
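If you're building spreadsheets yourself, the defensive move looks something like the sketch below. It assumes Excel's worksheet naming rules (no `\ / ? * [ ] :` and a 31 character limit); which characters actually tripped up our exports isn't listed here, so treat the character set and the function name as illustrative.

```python
import re

# Characters Excel does not allow in worksheet names.
FORBIDDEN = re.compile(r"[\\/?*\[\]:]")


def safe_sheet_name(name, max_len=31):
    """Swap forbidden characters for spaces, tidy whitespace, cap the length."""
    cleaned = FORBIDDEN.sub(" ", name)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return cleaned[:max_len]
```

So `safe_sheet_name("Mixed content / HTTP")` gives you `Mixed content HTTP`, and nobody's export falls over.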
#6 Maximize no longer cuts off scrollbars
Some people are overly fixated on size. Almost like they're covering up for something...
Anyway, some users like to maximize Sitebulb so it takes up the whole screen. Which is FINE. Unfortunately Sitebulb was penalising these size-obsessed freaks by cutting off the scrollbars on the URL Details page, making it look like you couldn't scroll down.
#7 Fixed issue where some exports were getting stuck
In the last release, we optimized the exporting process to make it a lot faster. Turns out we made it too fast, and the UI was actually struggling to keep up with it. This will have resulted in some exports getting stuck (either at 2% or 95%, for some reason) on some audits, some of the time. If I had to put a number on it, I'd say it affected precisely 4.1279% of all exports carried out.
#8 Typo fixed in BOM Hint
A couple of days ago I shared something cool with the world - the BOM hint that Sitebulb triggers (hat tip to Glenn Gabe for his work on this). Whilst most were suitably awestruck, one individual took it upon himself to publicly shame us for our child-like spelling mistake.
What gives? Nerd.
10 new features and 8 fixes in one update?! A big one indeed (that's what she said).
Released on 13th October 2017.
- We've made crawling and processing data more efficient, so Sitebulb will now crawl faster, and on most websites, is better able to approach the max URLs/second limit you set. We've also changed the way crawl speed is reported, switching to a cumulative average, and added a new metric 'Current TTFB.' If you see Current TTFB increase, you will see a decrease in crawl speed, for the two are as entwined as a dragon queen and a Northern bastard.
- Since speed is on the agenda, we've also improved the way the user interface queries data, making it imperceptibly faster. You're welcome.
- It appears that no one is really using the report exports we spent a million hours trying to get right (thanks, guys). So we have tried to make them more visible by building a 'Bulk Exports' page, which explains each one and allows you to download them all (including a 'download all' button as well).
- Since we needed to add another button, we also figured you'd appreciate a massive UI change. If you look to the top right, you'll now find Filtered URL Lists, Bulk Exports, Printable Reports, Crawl Maps and All Hints.
- Advanced settings now pops up underneath the main settings when you click the 'Advanced Settings' button, instead of shifting you to a new screen, which some users were finding unclear (and we're all about clarity here at Sitebulb).
- Occasionally, if a server hates you crawling it, it will return a 429 HTTP status (too many requests). This is the server's way of saying 'would you kindly fuck off now?' Sitebulb will now take this advice, and stop trying to crawl the site.
- Sitebulb was claiming the hreflang nn-no was invalid, when it is in fact perfectly valid (nn is Norwegian Nynorsk, as you were no doubt already aware). Tilgje meg.
- The Indexability export was not pulling through the Indexability Hint data, which it should have been. This is because it was still looking for the 'indexation' data, which was renamed recently to avoid Barry Adams losing his shit every time the word was uttered.
- Sitebulb was forgetting crawl and analysis settings when re-auditing, which was super annoying. Like when you go upstairs to get your phone charger and get distracted by a fly relentlessly smashing itself against a closed window and in a blind rage you chase it across the house with the sport section of Saturday's Guardian, swinging aimlessly (and frankly, embarrassing yourself) and you have no idea what you were writing about when you first started this sentence.
- Tidied up a few of the exports, which had got a bit unruly.
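Two of the crawl changes above (the cumulative average speed and backing off on a 429) can be sketched together in a toy crawl loop. This is a simulation under our own assumptions, not the real crawler: `responses` is a made-up list of (status code, seconds taken) pairs, and the average is simply URLs crawled divided by total time elapsed.

```python
def crawl(responses):
    """Simulate a crawl over (status_code, seconds_taken) pairs.

    Reports a cumulative average speed and stops as soon as a 429 arrives.
    """
    crawled = 0
    elapsed = 0.0
    stopped = False
    for status, seconds in responses:
        elapsed += seconds
        if status == 429:
            stopped = True  # the server asked us to go away, so we do
            break
        crawled += 1
    average = crawled / elapsed if elapsed else 0.0
    return {"crawled": crawled, "avg_urls_per_sec": average, "stopped_on_429": stopped}


result = crawl([(200, 0.5), (200, 0.5), (429, 1.0), (200, 0.5)])
```

Because the average is cumulative, one slow response drags the number down gradually rather than making it lurch about. That's also why a rising TTFB shows up as a falling crawl speed.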
Released on 3rd October 2017.
- New feature alert! Printable PDF Exports are now in town. You asked (again and again and again), so we delivered. Since this is a real big boy update, it deserves more than a snarky comment from me, so it's got its own fully fledged user guide. Because my little comment isn't good enough.
- I mean, let's be frank, the user guide literally says 'click this button and save it', you can't work that out on your own?
- I Am Jack's Inflamed Sense of Rejection.
- Mac users! You can now hide the app with command+H, and quit the app with command+Q. Now, please, enough with the death threats ok?
- Regex enthusiasts, we've got something for you too! You can now filter columns using regex. It's a bit slower than the normal filtering method, but at least it's there now, ok?
- Adjusted the Dashboard to allow you to switch between Projects/Imported Audits/Paused Audits/Queued Audits/Interrupted Audits, which makes it a lot clearer which state each audit is in currently.
- Added a link to our new Feature Requests board - in the left navigation on the dashboard.
- Adjusted the robots.txt checking, so that it now treats this sort of rule: Disallow */example-path/ in the same manner as Google. Note that this is not listed as a fix, because Sitebulb has been doing it correctly this whole time. Robots commands should always start with a / but it seems Google knows what you are going for so just lets it fly. Sitebulb will do the same, but only because it wants to. A dragon is not a slave.
- Fixed an annoying issue that has been lingering around for a while, but we couldn't find out what was causing it. From the Audit Progress screen, if you viewed Realtime URLs and then clicked the back button, Sitebulb would start another audit. Yeah, it's not really meant to do that.
- This one stumped a few of you as well (sorry). The hint HTTPS URLs links to HTTP was wired up totally wrong. Y'all were emailing me asking what you'd done wrong, and it was not you, but us. Stop blaming yourself. It isn't your fault!
- URL Details were not showing nofollowed incoming links. This is fixed now. Page Rank Sculptors, breathe easy.
- Adjusted the duplicate content checking algorithm to stop it flagging some false positives. It was claiming some URLs were duplicate, that had completely different content on them. Hardly Panda arousing.
- Improved checking for empty or missing H1s, as it was missing some in certain circumstances.
- Fixed a typo on the setting page, we'd gone all Slumdog 'Who Wants to Be a Millonnaire.'
- Over the last week we've had a few instances that have experienced database corruption, which is a massive ballache. We've not been able to recreate this issue, but we've made a couple of critical updates to the way we write and read data. Fingers crossed that's it sorted. Symptoms of this are: Sitebulb doesn't fucking work at all. Please let us know if this happens to you.
Released on 27th September 2017.
- We've added a new all singing all dancing Hints section to the Overview so you can see exactly how many Hints were triggered across all the reports (like this). Mind = blown.
- Also added a little flag to show how many Hints were triggered on each report (like this) so you know when you done fucked up. If you happen to actually be good at your job, you'll earn Sitebulb's respect (like this). By the way did you see whose site that was? Not so cocky now are you?
- We have adjusted the Hint 'Contains links with no anchor text', so that Sitebulb now only looks at internal links, where previously it was looking at any old link. This was adjusted because all Sitebulb users that we surveyed said they give zero fucks about optimising anchor text on external links.
- The Link Equity Score has been changed from having 4 decimal places to 2 decimal places, because brevity > precision 99.7645% of the time.
- When you pause an Audit and go to 'Update Settings', we've added a cancel button down at the bottom, because one user reported being frightened by the Save button (note: those may not have been his exact words).
- When you export any of the charts as images, they will now contain the name of the chart in the filename, rather than the accurate-though-pretty-fucking-useless alternative: 'chart.png.'
- Fixed a crazy error which happened if you enabled Site Speed and looked at the export, where there would be tons of additional 'B Score Worksheets'. I'll be honest, it wasn't really meant to do that, so now it doesn't.
- In a CRO experiment, we tried to encourage users to sign up for multiple subscriptions by opening 31 browser tabs when you clicked 'Update to Pro' (when on a Sitebulb Trial). Astonishingly, the experiment failed.
- On some URL lists, if you added/removed columns, Sitebulb would totally ignore you and just displayed whatever it wanted to show, like a petulant teenager. We told Sitebulb to stop being an obnoxious little shit, and this behaviour has miraculously improved (for now...)
- The protocol column was displaying everything as http, even when it was actually https. This is because Gareth, in his infinite wisdom, had trimmed the database column down to 4 characters, because he's stuck in 2007 and literally 'forgot' about https. In other news, 2007 was TEN YEARS AGO. We know it's shocking but don't worry, we're here for you.
- Fixed a small typo that one of our users noticed - in the settings tooltip for XML Sitemaps - it said 'and' instead of 'any'. I was tempted to change it to 'anal.'
- In the 'mixed content' Hint, we'd built the export to include a slash (/) in the filename. Anyone who's ever used Windows ever knows that WINDOWS HATES SLASHES. We should have known better.
- Some servers were throwing a hissy fit when reading our Accept request headers and responding with an Internal Server Error 500 (which meant you couldn't even do the pre-audit, never mind crawl the site). 99.9999% of you would never have noticed, but we've fixed it for the 0.0001%, because precision is very important at all times.
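In Gareth's defence, the protocol column bug above is an easy one to reproduce. Here's the whole thing in miniature, as a sketch: `protocol_column` is a made-up function standing in for a database column of fixed width.

```python
from urllib.parse import urlsplit


def protocol_column(url, width):
    """Store the URL scheme in a column that only holds `width` characters."""
    return urlsplit(url).scheme[:width]


stuck_in_2007 = protocol_column("https://example.com/", 4)
fixed = protocol_column("https://example.com/", 5)
```

Four characters is exactly enough room for 'http' and exactly one short for 'https', so every secure URL quietly lost its 's'.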
Released on 18th September 2017.
- Added some more data to the hover state on Crawl Maps. Now, when you hover over a node, you'll also see the page title, 'First found on URL' (i.e. the parent URL) and the Link Equity Score. Because more is always better.
- Changed the name of the 'Indexation' report to 'Indexability' instead. Internally referred to as 'the Barry Adams update.'
- Added the option to specify a custom user agent. You can do this either from the Advanced Settings for a single Audit (comme ci) or from the global settings for every Audit (comme ça). Et voilà!
- When you queue Audits, you'll now see a red icon alongside the 'Queued Audits' left navigation button, which shows the number of queued Audits you have. So it will show 1 for when you've got one queued Audit, and then 2 when you've got two. Basically it will increase by one every time you queue another Audit. You see? One more every time. It's just math(s).
- When you terminate an audit, the old warning message we had was 'Terminate the audit and delete data collected'. Some people thought we were bullshitting about the 'delete data collected' bit and called our bluff, only to be bitterly disappointed when their data wasn't there. Since 'I told you so' does not make for a favourable customer experience, we've re-jigged the warning message to make it clearer what will happen. We included some words in bold and everything.
- After our previous update a number of beta users were unceremoniously dumped onto our 'free' tier. They should have ended up on a 'Free Trial', which is a different thing entirely, and is pretty much indistinguishable from beta. Remember that 'soft launch' we were going on about? Yeah, this is why.
Released on 14th September 2017.
- Period fans, rejoice! (no, not that kind of period). I mean these little badgers ->. We call them 'full stops' in good old Blighty. Anyway, as per beta feedback, you can now have a period in your Project name, should such urgent need arise. Read that last sentence as you see fit.
- Several months ago Gareth and I had a very lengthy 'meeting' about whether we should use the magnificent 'log in/out' or the banal 'sign in/out'. The support for both options was fierce, and the argument raged on into the wee hours. Bloodied and limping, Gareth finally announced 'sign in' as the victor, before collapsing in a heap. Despite losing an arm in the battle, my belief in 'log in' was unwavering, and I managed to sneak her onto the 'Account' screen in the tool. It would be an understatement to say I suffered Gareth's wrath when he recently realised. I remain, persisting in a swimming pool of catharsis, overcome with grief for my fallen friend, as 'log in' is no more :(
- Added a 'Support' button to the top right of the user interface, so you can always easily get in touch with support.
- Curiously, Sitebulb had stopped asking for authentication details when trying to audit a site which uses HTTP authentication. The sharp witted among you will realise this means it couldn't actually crawl these sites, which is a bit of a problem for a crawler product like Sitebulb. It will once again ask you for authentication details (which are then saved against future re-audits, and available via Advanced Settings - FYI).
- In very specific conditions, the HTML parser was breaking, and it isn't anymore. That's all we know.
- When you Pause an Audit and go to 'Update Settings', we had a devious Back button on that screen. This duplicitous fellow would sometimes take you back to the Audit, and sometimes back to the Dashboard. The wily trickster was proving difficult to control, so we banished him completely (don't worry, you can still return to the Audit by pressing 'Save Settings' at the bottom of the screen).
The Beta Notes
Version 1.0.19 (beta)
Released on 4th September 2017.
- Added a new global settings area (Dashboard -> Settings), which currently looks a little sparse, having only one thing in it, but in the future we will use this new real estate to house some important global settings some of you have been asking for (e.g. a global proxy setting). Not to get ahead of ourselves, the current global setting we have just added is 'Default User Agent'. This allows you to over-ride the default User Agent that Sitebulb uses (handily named 'sitebulb') and instead choose whichever one you want! This will then become the User Agent used for the little pre-audit check at the start, and the default for all audits, unless you change it in Advanced Settings. Muy emocionante!
- Very occasionally Audits would get interrupted, because your computer would restart, or the internet would drop out, like when it's 1999 and you're playing jittery-as-fuck Grand Theft Auto online against your mate on your 56k dial-up modem, and the line would drop because your mum picked up the phone so she could ring her sister, who she just spoke to yesterday anyway. These sort of Interrupted Audits should always allow you to resume, but sometimes they wouldn't, as Sitebulb would mis-classify them as Errored Audits, and the 'Resume' button would be missing. This shouldn't happen anymore. GOURANGA!
- Fixed a very annoying issue where pause/resume would cause Sitebulb to skip the rest of the crawl and move onto building reports once you pressed resume. I say annoying because this would only happen if the number of queued URLs happened to be larger than the crawl limit, and I'd tried many many many MANY MANY MANY many tests before I realised it was this specific scenario that was causing it. In fact it wasn't even me who realised it, it was one of our magnificent beta testers. I might as well give up and go home now.
- Related to the above, we also fixed an issue where the crawl limit would get accidentally triggered early and stop the crawl (it wasn't strictly pause/resume related, yet did result in a similarly embarrassing premature climax).
- In an Audit, if you were on the Overview and you searched for a specific URL and then clicked through to view the URL Details page, you could then not navigate properly within the modal window. Specifically, if you went to look at Incoming Links, for example, you then couldn't go back to the 'Overview' of the URL Details. According to Gareth this is because the URL Details view opens in a modal, and so Sitebulb would have two references to 'Overview', incorrectly believing it was already on the modal Overview. I suggested it would be straightforward to just rename one of the Overviews, to 'Jimmy' perhaps, or 'Roger Federer'. But he decided to fix the underlying problem instead. Spoilsport.
Version 1.0.18 (beta)
Released on 30th August 2017.
- To make crawl limits more transparent/understandable, and to prevent unnecessarily saving tons of unwanted data on bigger sites, Sitebulb will no longer save disallowed URLs by default. If you want this data, you can switch on a new setting in the Crawl & Audit Setup called 'Save Disallowed URLs.' Good name huh? I came up with that. Ok I didn't really, but I could have if I wanted to.
- Improved the speed that Link Equity Score is calculated. Yes, you could argue that it was simply TOO SLOW before and we have simply fixed the thing that was making it slow, but I'm a glass half full kinda guy.
- When aforementioned Link Equity Score was taking fucking ages on larger sites, users were trying to go back to the Dashboard and look at other Audits, as is their wont. But they were getting the Spinny Spinner of Death, followed by the Grey Screen of Death (just an empty Dashboard). When this happened to me, I asked Gareth WTF was going on here, and he said it was because locks. So now you know.
- Follow-up: when I asked him to explain it in non-geek, he said that basically, when Sitebulb was calculating the link graph (for the Link Equity Score), it was reading and writing to the database, and locking the database from being accessed by other areas of the tool, such as the bit that houses the reports. He re-jigged the locks so that it only locks a certain bit of the database, and doesn't lock all of it. Because locks, QED.
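For those who do speak geek, the difference between 'lock all of it' and 'lock a certain bit' can be shown with a toy example. This is purely a sketch with Python's `threading.Lock`; the class names and the 'link_graph' and 'reports' section names are invented, and the real database locking will look nothing like this.

```python
import threading


class CoarseDb:
    """One lock for the whole database: everything waits for everything."""

    def __init__(self):
        self._lock = threading.Lock()

    def lock_for(self, section):
        return self._lock


class FineDb:
    """One lock per section: only the bit being written is blocked."""

    def __init__(self, sections):
        self._locks = {name: threading.Lock() for name in sections}

    def lock_for(self, section):
        return self._locks[section]


def can_read_reports_while_scoring(db):
    """Hold the link graph lock, then see if the reports lock is still free."""
    with db.lock_for("link_graph"):
        reports = db.lock_for("reports")
        got_it = reports.acquire(blocking=False)
        if got_it:
            reports.release()
        return got_it


before = can_read_reports_while_scoring(CoarseDb())
after = can_read_reports_while_scoring(FineDb(["link_graph", "reports"]))
```

`before` is `False` (hello, Spinny Spinner of Death) and `after` is `True`. Because locks.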
Version 1.0.17 (beta)
Released silently (like a ninja) on 29th August 2017.
- Totally groovy new Hint for the Links section: 'Only has links from URLs that do not pass Link Equity'. This is for obscure situations where you have URLs which ONLY have links from canonicalized URLs, for example. I know it doesn't sound like much now, but it'll be there for you when you need it most. Like that old bottle of Baileys in the back of the cupboard, leftover from a work Christmas party several years ago, which has almost definitely gone off and you can barely open because the lid is crusted over.
- On website, updated 'Changelog' to 'Release Notes'. Because meta.
- There was an issue where Sitebulb didn't want to crawl sites protected by Incapsula. It wouldn't even perform a HTTP header check on external links to sites protected by Incapsula, FFS. This was completely unacceptable, so we didn't accept it.
- In the Links report, the anchor text would crash into the side of the page if it was particularly long (e.g. if you had a link with anchor text: "I'm here with my dad. And we never met. And he wants me to sing him a song. And, um, I was adopted. But you didn't know I was born. So I'm here now. I found you, Daddy. And guess what? I love you. I love you. I love you." - that would cause the anchor text to crash). Now we just use ellipsis...
- Changed the way included/excluded URLs are handled on the crawl progress screen. Depending on the website, and the inclusion/exclusion criteria, the numbers recorded on the crawl progress would jump around like a mother fucker. This was to do with the way they were being counted and recorded by the software. To make life easier for everyone, we no longer count them at all.
- If you had a double slash in the URL (e.g. movies.com/is-die-hard-really-a-christmas-movie//discuss) this would trigger BOTH the Hint 'URL contains a double slash' AND 'URL contains repetitive elements', since the double slash was being classed as a repetitive element. Whilst not technically wrong, this wasn't exactly useful, so we've changed them now to be properly unique. The answer, by the way, is clearly YES. I don't even know why there's a discussion about it.
- I know most of you haven't noticed, but there is a 'Feedback' tab in the left hand navigation. On there is a feedback box (I'm explaining because you probably haven't seen it) which you're supposed to write feedback into. If, on the Mac, you tried to drag and drop an image into this box (the one that is meant for feedback), the image would take over the screen and lock Sitebulb completely. It no longer does this, it just rejects the image outright. Not that you'll notice anything different anyways.
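To illustrate the double slash change above, here is a minimal Python sketch of two checks that no longer overlap. The function names are hypothetical, and the definition of 'repetitive elements' used here (the same non-empty path segment appearing more than once) is an assumption for the example, not necessarily the rule Sitebulb applies.

```python
from urllib.parse import urlsplit


def has_double_slash(url):
    """True if the path part of the URL contains '//'."""
    return "//" in urlsplit(url).path


def has_repetitive_elements(url):
    """True if any non-empty path segment appears more than once.

    Empty segments (the gap inside '//') are skipped, so a double slash
    on its own does not count as repetition.
    """
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return len(segments) != len(set(segments))


url = "https://movies.com/is-die-hard-really-a-christmas-movie//discuss"
double_slash = has_double_slash(url)
repetitive = has_repetitive_elements(url)
```

Only the path is inspected, so the `//` after the scheme never triggers the double slash check.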
Version 1.0.16 (beta)
Released on 25th August 2017.
- You know how, once or twice in your lifetime, you have to admit that you were wrong about something. Well, this first update is one of those moments for me. We've introduced a new metric, 'Link Equity Score', which is a score from 0 to 10 for every internal URL, and is based on an internal linking algorithm. Gareth put this in the tool about a year ago, and then at some point I decided it needed to go (I can't even remember why). But it's back, by overwhelming public demand, and I was forced to utter the unholy words 'I was wrong'. I hope you appreciate my sacrifice.
- If you know anything about SEO, you'll know that Alt text on images is Kind of a Big Deal. Sitebulb has always had the Hint, 'Images missing Alt text', but up until now, has refused to tell you what these images actually are (you tease, Sitebulb). Now, however, we've moved the Hint to the 'Page Resources' section, and given it the respect it deserves. From the URL List associated with the Hint, you can click through to see the images associated with each page (check it out). Even better, in the Page Resources export you can see the same data, but this time grouped by image, with counts and example images (check it), so you can easily pick out template images that are on lots of pages.
- On the Overview, Sitebulb will sometimes display messages like 'Audit stopped early because crawl limit was hit.' When you only get one of these warnings it looks ok, but sometimes multiple warnings will trigger, so you'll get 2 or 3 of these boxes, and it looks jangly as fuck ('af', if you're down with the kids). So we've added a nifty X button in the top right hand corner of each box, so you can dismiss any that displease your eyes.
- Moved the crawl setting 'Maximum URLs to Audit' onto the first screen of the settings, underneath the Crawler Type selection. This is to give you, dear user, more transparency and control when it comes to bigger Audits. We have also fixed an issue where the maximum URL limit was incorrectly including Disallowed URLs in its calculation (this is really a Fix and should by rights go in the list below, but I do what I fucking well like and don't let anyone tell you otherwise).
- Sitebulb will now parse XML sitemaps that are compressed, so feel free to compress sitemaps to your heart's content. Middle-out for the win, Bro.
- Sitebulb will now pass HTTP headers when content fails to load, so it can at least tell you something about the URLs.
- Added fancy.com to the list of global excluded domains, because if you Audit a site that links to a unique fancy.com share page on every single URL, it's fucking annoying, trust me.
- In the previous update we added a clever little change for the mobile rendering checks, which essentially used just a sample of pages rather than the whole lot. In doing this, we managed to fuck everything up. The data was being collected but never saved, and instead duplicate URLs were just inserted into the database! Cue many reports of odd looking issues, such as duplicate content Hints, missing GA codes and the like. If you've seen anything that sounds similar... then this is probably why.
- On re-audits, Sitebulb was going rogue, resetting the settings for crawl speed and maximum URLs, even if you'd changed them previously. We reined Sitebulb in, so it's no longer doing this. Valar Dohaeris.
- A few users reported an issue where the tool was occasionally crashing when switching between the Crawl Map and the Overview. We believe we have fixed this now, but we are not 100% sure. If it happens to you, it would be awfully good of you to let us know about it.
- AMP URLs were being counted as duplicates, because canonicals on AMP pages were not being classified correctly. I'm so sorry.
- Kinda specific, but we fixed an issue where Sitebulb would refuse to crawl sites where the server returned the Status Code as '200' AND the Status Description as '200' - Sitebulb was looking to find 'OK', and would throw a shit-fit if it didn't.
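The '200' versus 'OK' fix above boils down to trusting the numeric code and ignoring the description. A minimal Python sketch of that idea follows; `is_success` is a hypothetical helper written for this example, not a Sitebulb function.

```python
def is_success(status_line):
    """Decide success from the numeric code in an HTTP status line.

    The text after the code (the description) is ignored, so
    'HTTP/1.1 200 OK' and 'HTTP/1.1 200 200' are treated the same.
    """
    parts = status_line.split(None, 2)  # version, code, optional description
    if len(parts) < 2 or not parts[1].isdigit():
        return False
    return 200 <= int(parts[1]) < 300


usual = is_success("HTTP/1.1 200 OK")
odd = is_success("HTTP/1.1 200 200")
missing = is_success("HTTP/1.1 404 Not Found")
```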
Version 1.0.15 (beta)
Released on 7th August 2017.
- Quite a few people asked us to make the Crawl Maps more integrated with the rest of the product. So now you'll find the Crawl Maps don't open in a new window, but within the main Sitebulb window. Not only that, but you can now click any node and be taken directly to the URL Details page for that URL. Wait, there's more... you can also now scroll out so far that your Crawl Map becomes a dot (I don't know why people want to do this, but it was a feature request). AND we've also moved the 'Save Crawl Map' button so that it doesn't print onto the captured image, which I think we can all agree looked pretty shit. The eagle eyed among you will also notice that we've tweaked the spacing a bit for how Crawl Maps display, which seems to make most Crawl Maps a bit more legible, but if you think we've made them worse please tell us!
- We've added the option to update crawl and audit settings while an Audit is paused. When you click 'Pause' you should see an 'Update Settings' button alongside 'Resume' on the top right. Bear in mind that whatever settings you change will only come into effect for new URLs. So if you click 'Site Speed' halfway through, then the speed analysis will only come into play for half the URLs. Similarly, if you untick 'Crawl Parameters', then no new parameterized URLs will be added, but there will still be some in the queue that will get crawled. Please make sure you read and remember this forever so I don't have to keep answering the same fucking questions about why it's still crawling parameterized URLs.
- If you've tried crawling a Shopify site with over 500 internal URLs, you may have noticed that those rat bastards eventually stonewall you with 430 HTTP errors (TOO MANY REQUESTS). It only seems to kick in around the 500 URL mark, so if you have a 1000 page site, just delete half your content and you're sorted... Alternatively, you can crawl at 1 URL/second, which is slow enough to stop Shopify blocking you. In the pre-audit, Sitebulb will detect if it's a Shopify site, and give you a cheeky little message to tell you to slow down, lest you blow your load early.
- The big one - searching for Projects/Audits in the search bar top right will now also search imported Audits. Whoa.
- While looking at reports in the Audit, the report you are 'in' will now be a darker shade of grey, to help you understand 'where you are.'
- Sitebulb also now matches on the URL (www and non-www version) when trying to match the Google Analytics code to pre-select the right View, although the UA- code will take precedence, if found (see above).
- Sitebulb is now checking a few more places to find sitemaps to include in the pre-audit. I could tell you where we check, but I'd have to kill you.
- If you deleted a Project and then wanted to start a new Project with the same name, Sitebulb wouldn't let you do it ('you already have a Project with that name', which WASN'T TRUE!). We coerced Sitebulb to relent in this tyrannical nomenclature.
- The Hint 'URL received search traffic but 0 goal conversions' was showing a negative number, instead of 0. We conducted some research on this topic, and actually found that it is impossible to have negative URLs in real life, so we've changed this to show 0 instead.
- Some URLs were being incorrectly flagged for the Hint 'Set mobile viewport.' This was a coding error by Gareth, as it happens. Rest assured that he has been soundly thrashed.
- There was a tiny little typo on one of the compare audits fields - an extra, unwelcome lower case 'k' - which has now been forcibly rejected.
- The exclude URLs function, and the robots.txt parser, were not working correctly when handling wildcards (*). They are now.
- Astonishingly, no one noticed this bug, despite it being there from day dot: when doing pause and resume, you'd end up with some duplicate URLs. Whilst we've fixed the duplication issue, we can only conclude that no one's even looking at the fucking data, you're all just distracted by the pretty graphs. FML.
- For Site Speed Hints, if the affected URL figure was more than 1000, it would change to 1k, which then did not look like a number and so you wouldn't get the buttons on the right to click through and see the URLs. Now we do the lookup differently to avoid this issue.
- Not all images are created equal. Some images are embedded page resources, while some are anchored images that have their own image URL. Even if you don't select 'Page Resources', Sitebulb will still crawl anchored images and report on them. However, the reports weren't all hooked up right, so you'd click to view resources, and it would tell you that you hadn't turned them on! Touche, Sitebulb, but not very helpful.
- These self same anchored images, if hosted on a subdomain, would still get crawled even if you had unchecked 'Check subdomains', which was bafflingly annoying. This don't happen no more.
- Hint filters were wrong for the Hints 'Canonicalized URL received organic search traffic' and 'Noindex URL received organic search traffic', so the data looked like bollocks. Fixed the filters and now the data looks sweet as a nut (not that kind of nut!).
- Subdomain URLs weren't being identified when there was a protocol mismatch, which caused significant consternation for almost no one. Rest easy people, we've fixed it.
- If the start URL redirects, it is now updated to the final destination URL found by the pre-audit. I know what you're thinking, 'didn't it do that before?'. Apparently not.
- Fixed an issue where audits that got Interrupted (e.g. computer says no) were not always starting up again - they would say they were running, when they actually weren't (which was frightfully annoying).
- Sitebulb would forget what render timeout you set when re-auditing a site with new settings. It would remember everything else, but just forget this. We added 'Arya Stark mode', and now, The North Remembers.
- Google Analytics code was getting reported as not there, when it was there really.
- We experimented a bit with the selection options for 'results to display' on URL Lists. We tried 1000 and 500, but both of these would make Sitebulb hang occasionally on bigger audits. So we've abandoned this and gone back to the stable max option of 250.
- Fixed an issue where Audits were skipping the current process when resuming. We actually already fixed this bastard problem once before, but then something else started causing the same issue.
- If the tool crashes, as is its wont occasionally, the scheduled URLs (max 300) no longer get dumped/lost, so Sitebulb can pick up and recover more smoothly.
- Fixed issue where some exports (e.g. URL Details) were not generating properly. They were getting to 2% and then stopping indefinitely, which is no use to man nor beast.
- Fixed issue where some exports (e.g. URL Details for external URLs) were not saving when you chose 'Save and Open'. They'd just sit there, like a fucking lemon.
- Can now crawl Medium sites without adding the twatting 'GI' query parameter to every URL.
- When filtering URL Lists for 'Content Type', it would only show a few options. Now we show all the options. There's too many, if you ask me.
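For anyone wondering what 'handling wildcards' means in the exclude URLs and robots.txt fix above, here is a minimal Python sketch of robots.txt-style pattern matching: `*` matches any run of characters, a trailing `$` pins the match to the end of the path, and everything else is a plain prefix match. The function names and the example rules are hypothetical, and this is not Sitebulb's parser.

```python
import re


def pattern_to_regex(pattern):
    """Compile a robots.txt-style path pattern into a regex.

    '*' matches any run of characters; a trailing '$' anchors the match to
    the end of the path. Everything else is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal chunk, then join the chunks with '.*'.
    body = ".*".join(re.escape(chunk) for chunk in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_excluded(path, patterns):
    """True if any pattern matches, starting from the beginning of the path."""
    return any(pattern_to_regex(p).match(path) for p in patterns)


# Made-up rules, purely for illustration.
rules = ["/private/*", "/*?replytocom=", "/*.pdf$"]
```

As a usage example, `is_excluded("/files/guide.pdf", rules)` is true, while `is_excluded("/files/guide.pdf?download=1", rules)` is false because of the `$` anchor.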
Version 1.0.14 (beta)
Released on 21st July 2017.
- When using the software for the first time, users were not prompted to create an account. This meant they could use the software up until the point of creating a Project, at which point a license check would fire, and the software would enter a never ending state of blind panic. It was the shittest back door ever invented, and now it's been removed.
Version 1.0.13 (beta)
Released on 20th July 2017.
- The one you've all been waiting for... Added the 'Content Type' column to a bunch of Hints: URL contains whitespace, URL contains upper case characters, URL contains repetitive elements, URL contains non-ASCII characters and URL contains a double slash. I know right? Holy fuckballs.
- Fixed the timeout issue which was causing Sitebulb to shit itself. If the tool has hung or crashed for you since you installed 1.0.12, this is probably why.
Version 1.0.12 (beta)
Released on 19th July 2017.
- As promised, Mac users can now happily dock the app again, without fear of Armageddon.
- Train mode - you can now view completed Audits while on a train (note, you can only start new Audits if you have access to the internet on said train).
- Plane mode - you can now view completed Audits while on a plane (note, you can only start new Audits if you have access to the internet on said plane).
- Offline mode - you can now view completed Audits while not connected to the internet (i.e. you no longer need to be on a train or plane to take advantage of this feature).
- Sped up the pre-audit process for some websites. Websites, in particular, with the defining characteristic of being utterly shite (e.g. bloated or broken HTML).
- Optimized HTML parsing. Translating from nerd language, this just means that when Sitebulb comes across crappy HTML, it can pick out the bits it needs and ignore all the junk. Sitebulb was previously handling this stuff badly, so CPU would sometimes spike when it came across a few of these shitty pages.
- Added a few default reply parameters to the 'Excluded URLs' tab in Advanced Settings. This stops the tool crawling 'replytocom' URLs, or other such worthless chuff.
- We dropped a bollock on this one - a bug that meant if you paused the Audit while it was crawling XML Sitemap URLs, and then hit 'Resume', it would skip the crawling and move on to building reports. It would also do the same for GA and GSC... which might explain some issues where users were getting big lists of 'uncrawled URLs' with no reason whatsoever.
- Fixed all the annoying issues with closing, opening, double opening and crashing the tool. In short, Sitebulb has an application element, which does all the work, and a separate user interface which handles the presentation. We like to be original, so we refer to these as 'the App' and 'the UI'. What we have done is basically made the App and the UI talk to each other better, so each one knows when the other is open, and when it is not.
- Fixed a pretty big issue on the Mac, where if you right clicked the app icon and clicked 'Quit', the application would no longer open again from the Applications folder. This is related to our change above to 'make the App and UI talk to each other better'.
- Fixed a few UI bugs that were throwing errors in our bug tracker, which meant we couldn't see the wood for the trees. You won't notice anything different, but rest assured we've diligently fixed them.
Version 1.0.11 (beta)
Released on 14th July 2017.
Attention Mac Users: Don't run from the dock
We have had to temporarily disable the option to run the program from the Dock. So if you have it docked already and install the new version, it WILL NOT RUN FROM THE DOCK.
Even if you dock the new version, it won't run from there. It will just show you the spinny loader thing. We're fixing this, but it's quite complicated, so bear with us.
- Lots of users were asking for a button to export and save the crawl map (apparently some people haven't heard of Jing). Anyway, button = added (which saves the crawl map as a PNG).
- This one is very cool. You can now highlight any text in the tool (from almost anywhere) and copy it. Anything: Hint copy, URLs from URL Lists, Titles, hreflang, etc... Highlight the text, right click, then press 'Copy'.
- If you click on a Hint, then export the data from the URL List, it now passes the Hint name into the name of the exported Excel spreadsheet (e.g. sitebulb_com_has_upper_case_characters_20170714102236.xlsx). Word up to that.
- Added a natty check to see if the Project name you've entered has already been used. If so, the user is warned to buck up their ideas.
- One confused beta tester thought that Sitebulb was crawling external links and shooting off to crawl the whole entire internet. Fret not, friends, it only does a HTTP status check on external links. It does this by default, but you can switch off external links in the Advanced Settings. We've updated the copy for this to make it clearer for said confused tester.
- Within Projects, if you hover over each Audit in the Website Audit History you will now see a little box showing details of the crawl (success, error, etc...). This will help you tell one Audit apart from another, it really is quite a handy feature. I wish I'd thought of it myself to be honest.
- Increased the maximum view to 500 rows, in the URL lists. Apparently 250 wasn't enough for some people.
- In the Links report, the 'Top Pages' tab in the tool shows the top 50 pages based on internal links. In the export you also got 50 pages, which seemed a bit paltry. So we upped it by 24,500 (in the export only).
- Added a new global setting (Dashboard -> Settings) called 'Excluded External URLs.' This is pre-loaded with a bunch of hosts and paths that, most of the time, there is no benefit to crawling (e.g. instagram or pinterest on every single page of your site). You can delete, change or add new options, should you wish.
- Sitebulb is now saving the version number of the software against the Audit, so it will warn you if you open an Audit from an older version of the software. In case you see some mistakes which we've already fixed. You might refer to this document in fact to find out what's changed (and that way someone will actually read all this shit).
- Added some comments to Hints related to Google Tag Manager to state that they might be redundant (e.g. To stop this: "there is no GA code - yeah but there is a GA code in GTM so fuck off").
- Similarly, added some comments to some of the Performance Hints which are redundant with HTTP/2. Because people kept having a go at us for being behind the times. We are nothing if not down with the kids.
- Added the 'HTTP Error' column, and added some additional clarification, for all lists of 'Errors' in the tool. One user found some URLs that returned a 200 status code, but were being listed as errors. This is because the content had failed to download, and it wasn't clear why, or if the URLs were being incorrectly classified. They weren't, but Sitebulb was just doing a shitty job of explaining what was going on. Naughty Sitebulb.
- Changed the 'multiple H1s' Hint from 'Issue' to 'Advisory', as one user pointed out that in HTML5 you could have perfectly valid multiple H1s (for different sections, for example). Since we still think it could help indicate an issue/misconfiguration, we've kept the Hint in.
- Removed the ability to use any special characters in Project names, as they were getting cut off or causing the software to crash. So no more Projects named 'This client is a c#£$!', sorry.
- Some Crawl Maps were coming back with a single node. While there are some legitimate reasons for this happening (e.g. homepage is canonicalised), in the examples we looked at it should NOT have been happening. Since the Crawl Map is my favourite feature, and a single node Crawl Map is as underwhelming as an England performance in a World Cup, we prioritised this fix above all others. Well, I wrote it here above the others anyway.
- There is an 'anti-sleep' function built into Sitebulb that means your computer won't sleep or hibernate while it is crawling. This was not kicking in properly on the report building element, so your computer would go to sleep. And crash Sitebulb. So you could spend 2 days crawling a 1 million URL site, then it would crash out at the last bit. Sorry if this happened to you! Anyway, 'insomnia mode' now stretches over the entire duration of the audit (note that your computer will still sleep when not running an Audit, and it still allows your display to turn off).
- On the Audit Overview, we had been pulling back the number of indexed pages in Google, based on a site: search. Since this uses your local machine, it does the check using your own IP address. If you happen to be a massive spammer and blocked by Google, this check will show 0 pages indexed. Since some of our beta testers are massive spammers, they were seeing 0 and thinking the tool was buggy. We can't have that, so we have removed this feature for now. This is why we can't have nice things.
- Changed the Hint text slightly for 'URL is orphaned', to clarify that it is checking for internal followed links. Apparently it wasn't obvious to some people...
- Pausing Audits has been a bit of a pain in the arse. Despite previous changes, some people were still having issues, so we've done some more work to help the figures update correctly.
- Fixed a peculiar issue which occurred when you had a Sitemap index, which included XML Sitemaps that redirected (the example we had they were going from having no trailing slash, to having a trailing slash). Sitebulb would follow the redirect and tell you about it, but it would ignore all the URLs it found! Whoops.
- Imported Audits no longer have an associated Project. Well they never originally had an Associated Project, but you could try and click on 'View Project', which would crash the software.
- The XML Sitemaps export button had an identity crisis, and was claiming to be an AMP export. We gave it a dry slap, and it's come to its senses.
- If you stopped an audit early, but let XML Sitemaps complete, you wouldn't get a Crawl Map. Don't ask me why, because I do not know, but it happened, and that is why I'm bothering telling you so. It is fixed, so don't fret.
- A few versions ago I mentioned that we'd made the UI fit better on really shitty old Windows 7 work laptops with tiny screens. Well this was true, except for one part of the onboarding, where you needed to click somewhere off the fucking screen, and would get stuck in the onboarding forever and ever. Well, we've fixed it now so the onboarding actually works on really shitty old Windows 7 work laptops with tiny screens.
Version 1.0.10 (beta)
Released on 10th July 2017.
- A biggie - we've rebuilt all the report building to be much faster, which in particular affects LARGE AUDITS. So if you think size matters, we invite you to smash Sitebulb's back doors in.
- In line with the update above, we also added an Advanced Settings option to 'not crawl parameters', which can make a regular sized website seem a lot larger than it really is (you can also specify some parameters to still be included). It's all a question of perspective, really.
- Added a cancel button to the Import Audit overlay, just in case you change your mind in those crucial few seconds while the Audit is actually being imported, as is your wont.
- A few of you came across this issue - if you had an incomplete Audit (e.g. paused or interrupted), you'd still have the opportunity to click 'View Latest Audit' from the dashboard. Since the Audit hadn't actually been created yet, this caused Sitebulb to throw a hissy fit. Now it will say 'Incomplete Audit', and you can go to the 'Paused Audits' screen to resume or cancel them off.
- Occasionally, when running larger Audits (a few thousand URLs), the main progress screen would just freeze. In the background it was still crawling, but the numbers wouldn't move. Similarly, if you went back and forth from the Dashboard to the Crawl Progress a few times, the UI would often freeze. We felt it would be better if the UI didn't occasionally freeze, so Gareth spent all weekend rebuilding the script for this, and it seems to have done the job.
- At the same time, we also wanted to fix the issue with pause/resume (that you would know about if you actually read the emails we send), where the UI wasn't updating new crawl numbers, and would revert back to the total when you originally paused it.
- On the Mac, re-running a previous audit using the same settings, for any Audit you ran before version 1.0.9 would progress to 102% completion, and only show you half the reports. This may well have been a developer error, and has now been resolved. Any Audits that were affected should have gone back to normal as well.
- One of our testers found a bug whereby, instead of queuing two Audits up, the tool would just crash instead. That isn't exactly the way it's supposed to work. We think this may have been caused by some slowdown to the computer or the network, and the jobs would not get queued up properly. So we've fixed THAT issue, and hope it solves the crashing thing too.
- An eagle-eyed tester spotted this: 'The "page has more than one GTM code" seems to fail on pages which only have one but use the current implementation method of <head> component and <body> component.' Fixed.
- The very same eagle-man also spotted this: 'The <body> GTM script has a frame which flags as fail for "page has no frames". If GTM is in use, it's likely this will always fail.' We deliberately left this misconfiguration in for him to find - an Easter Egg if you will. Our warg friend did not disappoint. Now we're not flagging anything with 'noscript.'
- The sample audit that we pre-load in (which most people delete without looking at) is not exportable. But we still had the button there screaming 'export me!', which, if you succumbed to its request, would duly make the software shit itself. So we took the fucking attention seeking button away now.
- In the Advanced Settings, if you were changing max depth or max URLs to crawl, you could click the down button a few too many times and set these as negative values. Now you can't go to negatives using the down button. You can still enter negative values using your keyboard, but if you do this then you are definitely a prick, and deserve everything you get.
- Fixed an annoying, rare onboarding bug that occurred when you viewed your very first Audit, if you tried to view a report that you didn't actually switch on.
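On the 'page has more than one GTM code' fix above: the standard Google Tag Manager install puts a script in the head and a noscript iframe in the body, and both reference the same container ID. One way to avoid a false positive is to count distinct container IDs rather than snippets. The Python sketch below shows that idea only; the function name, the regex and the `GTM-ABC123` ID are assumptions for the example, and it is not how Sitebulb necessarily does it.

```python
import re

# Container IDs look like 'GTM-' followed by letters and digits.
GTM_ID = re.compile(r"GTM-[A-Z0-9]+")


def count_gtm_containers(html):
    """Count distinct GTM container IDs found anywhere in the HTML."""
    return len(set(GTM_ID.findall(html)))


page = """
<head><script>(function(w,d,s,l,i){/* loader */})(window,document,'script','dataLayer','GTM-ABC123');</script></head>
<body><noscript><iframe src="https://www.googletagmanager.com/ns.html?id=GTM-ABC123"></iframe></noscript></body>
"""
containers = count_gtm_containers(page)
```

A page that really does carry two different containers would still be flagged, since two distinct IDs would be found.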
Version 1.0.9 (beta)
Released on 5th July 2017.
- On a report, if you go into the Hint data and click through to look at some of the URLs, the back button will now take you back to the Hint tab you were on before (rather than just back to the report overview, which we can all agree was fucking annoying).
- Added Gravatar to our ever-growing list of automatically suppressed external links, so Sitebulb will no longer check the status of 62,000 near identical gravatar.com links. Soon we'll add this as a configurable global setting, so you can switch it back on (if you really want to check 62,000 near identical gravatar.com links).
- When the crawl has finished and all the reports are built, Sitebulb also builds a lot of the Excel exports that you find within the reports. Previously it would do a faux 'loading' screen, which was largely, if not completely, pointless. Now it just instantly gives you the file to download or open (FYI, in our dev queue, Gareth added this ticket as 'Exports are file aware', as if they are some sort of sentient being).
- If you switched on 'International' in the Audit settings, but Sitebulb couldn't find any hreflang, when you then went to view the International report, the tool would hang forever on the spinny loading thing. Obviously we have fixed this bug, but in reality you never should have been selecting 'International' in the first place, if you knew that there was no hreflang on there (and if you didn't know this, well, you should have known this).
- On Crawl Maps, when you hovered over the homepage it would claim that it had 0 children URLs. Other than single page websites, this was utter bullshit. Suffice it to say, we had words, and it is now a more responsible parent.
- Fixed small typo on the pre-audits checks, 'Web Frameworks' had gone full Jonathan Ross (Web Fraweworks).
- In the International export, one of the columns was misaligned on the hreflang cluster matrix. No one actually noticed this issue, but it doesn't mean it wasn't wrong (if a bear shits in the woods, can the Pope hear it?).
- On the Redirect report, the headers went missing on the top graph when you switched to the 'Data View'. Fortunately, we managed to find them and return them to their graph.
- Fixed (we think) a curious issue where Google Analytics would select the wrong property, meaning the view would be wrong. Only in rather bizarre circumstances would this happen, so it won't have affected many of you.
Version 1.0.8 (beta)
Released on 3rd July 2017.
- If you clicked to start a new Audit from an existing Project, but then had a change of heart and decided you in fact did not want to do a new Audit after all, you'd end up trapped on the overlay, with the only option to start a new Audit (which you just realised you no longer want to do). We added a 'close' button, so you will no longer be trapped thus.
- Changed a few bits of the new account creation workflow, which is nothing to do with the tool anyway so does not concern you. Stop reading this bit please.
Version 1.0.7 (beta)
Released on 30th June 2017.
No updates this time, sorry to be such an enormous disappointment.
- This one is a biggie... on the Mac, if you docked the app, when you next tried to run it from the dock, it stayed on a loading screen forever. To clarify: IT DIDN'T FUCKING WORK AT ALL. Thankfully that shit show was spotted by an eagle-eyed beta tester, and it now works as intended.
- My wife has a really shitty, old Windows 7 laptop from work, that has the tiniest screen you've ever seen. When I tried Sitebulb on there you couldn't actually click on some of the menu options from the navigation. It was an... oversight. So we optimized the user interface to still work on really shitty old Windows 7 work laptops with tiny screens.
- Some of the Site Speed statistics data was incorrect, and is now not incorrect. It was so incorrect, by the way, that nobody actually noticed. What is it we are not paying you beta testers for anyway?
- The filtered list for Broken Page Resources did not include the column 'No. URLs Referencing Resources.' This shocking omission has been duly corrected.
- One beta user complained that the Uncrawled URLs button says 'View' when it should say 'Export', so we updated this button copy to satiate his pedantry.
Version 1.0.6 (beta)
Released on 23rd June 2017.
- In URL Lists, the Add/Remove Columns selector panel was too large for smaller screens, and the bottom was getting cut off. So we made it smaller (size isn't everything y'know).
- For larger audits, when displaying Realtime URL data, there can be a delay loading and indexing the data. Previously we just had a blank screen, which looked really shit, so we added our ubiquitous loading spinner thing.
- Some unfortunate beta users found themselves in a disastrous install-restart-uninstall-restart-install experience with 1.0.5. To avoid this, we added a version number to the installer, and instructions to uninstall old versions.
- Previously, if you did an Audit on a website using a really long Start URL, when displaying Projects on the Dashboard the URL column would be so wide that you could neither see nor click any of the action buttons on the right. We didn't think this looked very good, so when we encounter long URLs now we use ellipsis instead (...).
- Sitebulb stores crawl data on your hard drive. If you don't have enough space left on your hard drive, Sitebulb has nowhere to write the data to. Up until now, Sitebulb has handled this poorly, repeatedly smashing its head against a brick wall and spamming our bug tracking software 57,000 times. Now, however, Sitebulb gracefully crashes if it runs out of space (*ahem* user messaging to follow).
- We added a new 'Realtime URL' view so you could see the crawl progress in a URL List. But we left in the Column Selector, which wasn't that smart, and would have allowed you to query the database as it was being written to. So we removed it.
- We also removed the onboarding from this Realtime URL view, because you probably don't need it.
- MAJOR change to the UI - we moved a couple of the left hand menu items around on the Dashboard, so that all the 'Audit' items were adjacent (this was our most requested update).
- In the Redirect report, we'd forgotten to put a legend on the stacked area chart (the one at the top), so that is now in.
- Also on the Redirect report, we found occasions where the two pie charts were null (had no data), and displaying a pie chart with no pie is just not cricket. So now we just don't show them when there is no data.
- One of the Advanced Settings options is called 'Included URLs'; we've just added some examples to the copy to make it clearer how you actually use this function.
- In some of the 'Hints' sections, we split up the Hint into Indexable/Not Indexable, and the buttons on the right allow you to view the associated URLs. Apparently this was not at all intuitive, so we made it marginally clearer by adding a barely perceptible hover state tooltip.
- If you did not switch on one of the Analysis options when setting up your Project (e.g. you did not tick 'XML Sitemaps'), you would not get to see the corresponding report in your Audit. However we were still producing an empty export for you to not look at, which is a massive waste of resources and entirely unnecessary. So now we don't produce the export, and we've even removed the redundant 'export' button to boot.
- For Sample Audits, link data is not processed, but Crawl Maps are. Normally Crawl Maps include some link data about each URL when you hover over a node, but since link data is not switched on, these overlays were full of 0s, which was confusing and stupid. So we've just stripped out the link data from the overlay on Sample Audits (and also on Standard Audits where 'Link Analysis' is turned off in the Advanced Settings).
- Added the text 'Redirect Loop' to URL Details page for URLs that are in a redirect loop, to make it clearer that they are in a redirect loop.
- Some of the graphs on the International report were not hooked up to URL Lists correctly. When performing hreflang checks, Sitebulb is only looking at Indexable URLs (which filters out a lot of the noise). This 'Indexable' filter was not getting correctly applied, but now it is.
- The Pagination graph on the Internal URLs report was not hooked up to the URL List correctly. In fact it was completely broken, and is now completely unbroken.
- Ditto a bunch of graphs on the Search Traffic report (they are now unbroken too).
- Fixed issue where incorrect data was being displayed for 'Not Indexable XML Sitemap URLs' (it was showing Indexable as well, which is just plain wrong).
- In URL Lists, the Meta Description (and other fields with many characters) often displays a 'read more' eye icon, which shows the full copy when you hover over it. This hover function had stopped working, and just looked like a weird emoji character. We fixed the hover state.
- One of the Advanced Settings options is called 'Excluded URLs', which contained a disgraceful typo in its copy ('different' instead of 'difference'). This was confusing users left right and centre, so we had to fix it pronto.
- When you close Sitebulb while a crawl is running, the modal window that displays had a button that was slightly too far left. You'll be pleased to hear this has been shifted to its rightful place (on the right).
- In the Indexation report is a warning message for when your robots.txt redirects elsewhere. This message was not firing correctly, so users were seeing it when they shouldn't have, and they were understandably up in arms about this. The problem has been resolved and order has been restored to the realm.
- Resolved obscure issue when re-auditing a site with deleted Google accounts - Search Traffic report is no longer generated (which would probably never actually come up in real life).
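To illustrate the disk space item above: the general idea is to check how much room is left before writing crawl data, rather than failing over and over. Here is a minimal Python sketch of that idea. It is not Sitebulb's actual code; the function name, the data directory and the 500 MB threshold are all made up for illustration.

```python
import shutil

# Hypothetical threshold: how much free space we want before writing crawl data.
MIN_FREE_BYTES = 500 * 1024 * 1024


def has_room_for_crawl(data_dir, min_free_bytes=MIN_FREE_BYTES,
                       disk_usage=shutil.disk_usage):
    """Return True if the drive holding data_dir has enough free space.

    disk_usage is injectable so the check can be exercised without
    actually filling up a disk.
    """
    return disk_usage(data_dir).free >= min_free_bytes


if __name__ == "__main__":
    # "." stands in for wherever the crawl data would live (an assumption).
    if has_room_for_crawl("."):
        print("Enough space, carry on crawling.")
    else:
        print("Not enough disk space, stopping the crawl cleanly.")
```

Running the check once up front (and again periodically during a long crawl) means the application can stop with a clear message instead of retrying writes that can never succeed.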
Version 1.0.5 (beta)
Released on 21st June 2017.
- Split Links report up by adding new 'External' section, so Links report now only contains internal link data.
- Added new Redirects report, and canonicalized data about redirects to only live in this report.
- Added option to 'View Realtime URL Data' while crawl is in progress.
- Added Filtered URL Lists, which is now the top menu item in the Report. These are pre-filtered URL lists so you can quickly jump to e.g. Internal Indexable HTML URLs.
- Added link from URL list to view all internal linking URLs to each URL (screenshot)
- Changed 'Performance' report to break out into two - 'Site Speed' and 'Mobile Rendering', now both available in the left hand menu in a Report.
- For Site Speed Resource Hints, added 'View URLs' link to column 'URLs Referencing Resource', so you can see exactly which Resources are problematic (e.g. JS not minified) and which URLs they affect.
- Added Indexable/Not Indexable counts for all Hints where it makes sense (screenshot).
- Added HTTP Status for outgoing links on single URL view.
- Moved some Hints from Links Report into Indexation Report. Links Report now only contains Hints that pertain to the distribution of link equity/PageRank (or lack thereof).
- Changed Hint icons to be actual words instead of icons (Issue/Advisory).
- Moved Project List to the dashboard front and centre - you can now view the project or jump straight into the latest audit. 'Recent Audits' is still accessible via the left menu.
- Changed default columns for all Hints and Filters.
- If you have some disallowed page resources, you will see a warning message on the Indexation report. This is now hooked up to the actual offending resource files (screenshot).
- Added messaging to Crawl Comparison page to tell you when settings have changed between Audits.
- On Site Speed report, added links to URL Lists from Insights for 'Slowest TTFB' and 'Slowest Download Time'.
- Added 'Internal Linking URLs' to all filtered URL Lists (including Hints and Insights).
- Added message to tell you when you have some paused Audits.
- Added new link to Dashboard for 'Queued Audits', which displays all Audits currently queued.
- Fixed issue where Sitebulb can't match up domain with GA account (now looking for UA code on page).
- Fixed issue where some resource sizes (in Kb) were not being collected properly.
- Fixed issue where the Hint 'Does not contain Google Analytics code' was being fired incorrectly.
- Fixed issue where some On Page Hints were sometimes showing false positives (H1 Not Set and Missing Meta Description).
- Fixed: Reading time now displays in mm:ss (instead of integers).
- Fixed issue on URL Lists where columns would default if you changed number of results to display.
- Fixed issue where Total Page Size was not being collected properly.
- Fixed issue on URL List where your filter resulted in only a few rows, where editing that filter was then impossible.
- Fixed issue where crawl comparison export stopped working.
- Fixed issue where Sitemaps Report filter 'Not in XML Sitemaps' was picking up non-HTML content types.
- Fixed issue where Crawl Maps data 'links from X% of URLs' was using percentile instead of percentage.
- Fixed column alignment issue on URL Details export.
- Fixed duplicate content issue where redirects were not being checked - now URLs must be internal, HTML, status 200 and indexable in order to be checked for duplicate content.
- Fixed issue where top pages would show external URLs.
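In case the percentile vs percentage mix-up in the Crawl Maps fix above sounds like hair-splitting, here is a small Python sketch of the difference. The function names and the numbers are invented for illustration and say nothing about how Sitebulb actually calculates either figure.

```python
def percentage_of_urls(linking_urls, total_urls):
    """Share of all crawled URLs that link to a page, as a percentage."""
    if total_urls == 0:
        return 0.0
    return 100.0 * linking_urls / total_urls


def percentile_rank(value, all_values):
    """Percent of pages whose link count is less than or equal to value."""
    if not all_values:
        return 0.0
    at_or_below = sum(1 for v in all_values if v <= value)
    return 100.0 * at_or_below / len(all_values)


if __name__ == "__main__":
    # Hypothetical crawl: 200 URLs in total, four pages with these
    # counts of internal linking URLs.
    link_counts = [1, 2, 3, 50]
    total_urls = 200

    # The page with 50 linking URLs is linked from 25% of all URLs...
    print(percentage_of_urls(50, total_urls))   # 25.0
    # ...but it sits at the 100th percentile among these four pages.
    print(percentile_rank(50, link_counts))     # 100.0
```

The percentage answers "what share of URLs link here?", which is what the label 'links from X% of URLs' promises. The percentile only says how a page ranks against the others, so it can read 100 even when most of the site does not link to the page at all.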
Version 1.0.1 - 1.0.4 (beta)
Never publicly released, the very early beta releases are widely considered as Sitebulb's formative years.