Introducing Crawl Maps
Published 10 March 2017
One of the features in Sitebulb that people seem most excited about is Crawl Maps, which, if you've not seen them yet, are interactive visualizations of your website architecture.
And they look awesome.
Example Crawl Maps
Basically, Sitebulb will take your crawl data and map it out using a force-directed graph, displaying URL 'nodes' as dots, with links represented by the connecting lines ('edges').
The result is an interactive graph that can be incredibly useful for technical SEO audits, often revealing patterns in the site architecture that you'd struggle to spot otherwise.
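If you're curious what 'force-directed' means in practice, here is a minimal sketch in Python. It is not Sitebulb's actual implementation (we don't describe that here), just a simple spring-and-repulsion simulation in the general style of the Fruchterman-Reingold algorithm. The function name `force_layout`, the example URLs and the tuning values are all made up for illustration.

```python
import math
import random


def force_layout(nodes, edges, iterations=200, seed=0):
    """Return {node: (x, y)} from a simple spring/repulsion simulation."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    k = 1.0 / math.sqrt(max(len(nodes), 1))  # preferred distance between nodes
    step_limit = 0.1  # maximum movement per iteration, reduced over time

    for _ in range(iterations):
        push = {n: [0.0, 0.0] for n in nodes}

        # Every pair of nodes repels, which spreads the graph out.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                push[a][0] += dx / dist * force
                push[a][1] += dy / dist * force
                push[b][0] -= dx / dist * force
                push[b][1] -= dy / dist * force

        # Linked nodes attract, which pulls related URLs into clusters.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            push[a][0] -= dx / dist * force
            push[a][1] -= dy / dist * force
            push[b][0] += dx / dist * force
            push[b][1] += dy / dist * force

        # Move each node along its net force, capped by the step limit.
        for n in nodes:
            dx, dy = push[n]
            length = math.hypot(dx, dy) or 1e-9
            move = min(length, step_limit)
            pos[n][0] += dx / length * move
            pos[n][1] += dy / length * move
        step_limit *= 0.98

    return {n: (x, y) for n, (x, y) in pos.items()}


# Hypothetical crawl data: URL 'nodes' and the links ('edges') between them.
nodes = ["/", "/shoes", "/hats", "/shoes/red", "/shoes/blue"]
edges = [("/", "/shoes"), ("/", "/hats"),
         ("/shoes", "/shoes/red"), ("/shoes", "/shoes/blue")]
layout = force_layout(nodes, edges)
```

Each node ends up with an x/y position that you could plot as a dot, drawing a line for every edge. Comparing every pair of nodes is fine for a toy example, but a real crawl with thousands of URLs would need something much more efficient.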
I'll stop talking about it, and show you some examples instead.
Flat site architecture
This is a classic 'SEO friendly' flat website architecture, with almost every page no more than 2 clicks from the homepage. The big green dot in the middle is the homepage, and the smaller green dots are at crawl depth/level 1 (i.e. they are linked to from the homepage).
The orange dots are at a crawl depth of 2. In this instance, they represent product URLs, and the depth 1 nodes are sub-category URLs.
If you haven't guessed already, this is an ecommerce site with a pretty extensive mega menu.
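Crawl depth, as used above, is just the smallest number of clicks needed to reach a URL from the homepage. A rough sketch of how you might work that out from your own link data is a breadth-first search. The `crawl_depths` function and the example URLs below are hypothetical, and this is an illustration of the idea rather than a description of how Sitebulb calculates it.

```python
from collections import deque


def crawl_depths(links, homepage):
    """Return {url: depth}, where depth is the fewest clicks from the homepage."""
    depths = {homepage: 0}
    queue = deque([homepage])
    while queue:
        url = queue.popleft()
        for target in links.get(url, []):
            # The first time we reach a URL is via the shortest path to it.
            if target not in depths:
                depths[target] = depths[url] + 1
                queue.append(target)
    return depths


# Hypothetical ecommerce site: homepage -> sub-categories -> products.
links = {
    "/": ["/shoes", "/hats"],
    "/shoes": ["/shoes/red", "/shoes/blue", "/"],
    "/hats": ["/hats/straw"],
}
depths = crawl_depths(links, "/")
```

In this made-up example the sub-category URLs come out at depth 1 and the product URLs at depth 2, which is the same shape as the flat architecture described above.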
Not an easy one to digest at first glance, but this is a significant case of duplicate content. In the upper left is the homepage (big green circle) and all the proper site content. But this also links off to two similar structures down at the bottom...duplicate homepages and in fact a duplicate website (twice over!).
The long whip thing coming out the side is a string of paginated pages.
I've got another example of pagination actually, this one even weirder:
That one was actually caused by some legacy pagination markup that wasn't even being used anymore! (hence the 'bare branches' with nothing coming off them).
I like this one a lot. It was shared by one of our beta testers, Gareth Edwards from Wolfgang Digital, and shows a relatively small 'product' site with a large and complex blog.
The homepage is the big green circle at the bottom, and everything coming down off that is the marketing site which lists their products and services. The little green dot in the centre is the blog homepage, with posts, sections, categories and pagination coming off that.
Content marketing FTW.
At this point you might be asking yourself how I'm so sure what I'm looking at. That's because of the bit that doesn't come across so well with static images - the Crawl Maps are also interactive.
If any particular 'node' piques your interest, you can hover over it to find out which URL it represents, along with data about its crawl depth and internal inlinks.
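If you wanted to reproduce the inlinks figure from your own crawl export, a minimal sketch might look like this. It assumes you have a list of (source, target) link pairs, counts each linking page once, and ignores pages that link to themselves. Those choices, and the `internal_inlinks` name, are my assumptions for the example, not a statement of how Sitebulb counts inlinks.

```python
from collections import defaultdict


def internal_inlinks(edges):
    """Return {url: number of distinct pages that link to it}."""
    sources = defaultdict(set)
    for source, target in edges:
        if source != target:  # assumption: self-links are not counted
            sources[target].add(source)
    return {url: len(pages) for url, pages in sources.items()}


# Hypothetical link pairs; the repeated ("/", "/shoes") link is counted once.
edges = [
    ("/", "/shoes"),
    ("/", "/shoes"),
    ("/hats", "/shoes"),
    ("/shoes", "/"),
    ("/shoes", "/shoes"),
]
inlinks = internal_inlinks(edges)
```

URLs with no inlinks at all simply won't appear in the result, so use `inlinks.get(url, 0)` if you need a count for every page.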
Attribution & Learning More
It would be remiss of me to not give credit where it's due, to the innovative marketers who inspired us to build this feature in the first place.
Finding a way to visualize website architecture has been pretty much at the top of our wish list when building the tool, and the method we were most keen to replicate was the one we first saw demonstrated by Ian Lurie of Portent, in his article 'SEO Using Force-Directed Diagrams'.
As you can see, the crawl maps produced by Sitebulb are very similar to Ian's, and I don't believe for a second that we'd have been able to come up with something this awesome on our own, so thank you Ian (and sorry for nicking your idea!).
To produce his visualizations, Ian used Gephi, and we were actually first introduced to Gephi many moons ago by Justin Briggs, when he set about visualizing external links in his blog post 'How to Visualize Open Site Explorer Data in Gephi.'
This was where we first learned about the concept of using graph theory to represent link data, and going back over his old posts helped us solidify our ideas for implementing Crawl Maps in Sitebulb.
As ever, we are extremely grateful to the wonderful members of the SEO community for consistently sharing such inspirational ideas.
The posts I linked to above can also serve as education pieces, if you are interested in learning more about this kind of data visualization. We have also published our own 'Crawl Maps FAQ' which gives some more specific insight into how Sitebulb's Crawl Maps are built.
Below you will find a collection of Crawl Map examples from the community, which have been grouped together loosely by yours truly, with their own colourful commentary included as extra sauce.
We've only got one of these so far, but it's a doozy:
@sitebulb #sitemaps
This sitemap is from a black hat site where they were framing and hiding high ranking sites and injecting their own SEO services into. Bad boys.
Site now gone. pic.twitter.com/L8wyB6KZfl— Simon Cox (@simoncox) February 15, 2018
Content Mess (Where to begin...?)
Crawl Maps are perhaps best for quickly showing you that you've got a real mess to untangle.
I created this crazy Crawl Map with @sitebulb - a badly done CMS migration that included available content both on htpp and https, duplicate products everywhere, a blog with only a link to the main site and other horrors... #whatamess!!!! pic.twitter.com/dkXE5qleKt— Juan González Villa (@seostratega) February 21, 2018
The rest of the site is a little messy though. pic.twitter.com/43dxoIXjPO— Marie Haynes (@Marie_Haynes) February 17, 2018
Before and Afters
At Sitebulb we pride ourselves on stocking an eclectic range of before and after Crawl Map tweets.
Here's my "before and after" featuring the easiest SEO win probably ever. I found a missing slash before a relative URL in the site's navigation links. BOOM. #TechnicalSEO #SoDamnTechnical @sitebulb pic.twitter.com/XLWpQXuUTR— Tomislav Lukinić (@tlukinic) February 19, 2018
Update to a site this morning has brought a lot more visibility to it. A small and often overlooked issue of urls having trailing slashes and the canonicals not.
Images are before and after adding the trailing slash in the canonicals. https://t.co/SXqNCjNvuy— Simon Cox (@simoncox) February 22, 2018
@sitebulb sitemaps compo:
Same site for these two - 1st is at start of opening crawl map - looks a right old spaghetti bowl! 2nd is when it has settled. This is https://t.co/WOSkbUEQgV - @alistapart - a fine publication. pic.twitter.com/gXRL3Ix1MR— Simon Cox (@simoncox) February 15, 2018
It's probably what The Big Bang looked like.
A lot of problems but crawl map showed issues with URL canonicalization, architecture and similar/duplicate content. @sitebulb pic.twitter.com/lnLh0xCYPQ
Pagination, but like you've never seen it before. Like on top of a mountain.
Here's a "close-up" of the clusterf**k. pic.twitter.com/yhbnVio9Kb— Yanni PapaSomething (@YanniTweets) February 22, 2018
While the blues and purples are pretty, the meaning behind them is not. Looking forward to wrapping up our site IA restructuring soon and getting these little flowers pruned. @sitebulb pic.twitter.com/Y1mU5l6XNo— Kyle Faber (@regal_kyle) February 19, 2018
I just love how visually appealing these are. Gives client a clear view of how there website actually look. Great tool! pic.twitter.com/Stt2s20MyU— Hampus Nyström (@hnysse) February 16, 2018
Some Crawl Map enthusiasts are not satisfied with a static image, and instead created epic multimedia extravaganzas for your viewing pleasure.
Was making a video export of a sitemap using @sitebulb and I just had to share how beautiful this explosion is. Small things like that can entertain me for hours #EasilyAmused pic.twitter.com/UVsJDxVrt8— Shawn Harding | YouTube, SEO & Graps (@shawnbandv) February 22, 2018
A @sitebulb crawl map too beautiful for this world.
This represents a 14k sample of infinite pagination on an otherwise uncrawlable JS site, condensed into 10 wiggly nodes. https://t.co/ZJ1Nn8MSgp
Motion and text added because you people can't *just appreciate things*. pic.twitter.com/UPeUZxbwSw— Oliver H.G. Mason 📉 (@ohgm) February 16, 2018
Difficult to show this all in one screenshot, so have a terrible video that show how the WWE is wrestling with their massive catalog of content. I know... that was terrible, even by my standards 😓 @sitebulb #RecordedOnAPotato pic.twitter.com/LhSONKO1MB— Shawn Harding | YouTube, SEO & Graps (@shawnbandv) February 22, 2018
At Sitebulb we believe in freedom of speech, so even trolls are welcome here.
Nice site structure (huh? what?!)
I know, I know. You didn't come here to see sites with a nice structure. That's why they're at the bottom.
Crawl maps generated via @sitebulb provides useful technical insights about your site's architecture.
If you want to identify problematic patterns of your site structure take advantage of well visualized crawl maps feature. This image represents a site with flat architecture. pic.twitter.com/8B8zVkWBQW— Murat Yatagan 📈 (@muratyatagan) February 15, 2018
Calling all Crawl Map fanatics!
If you want to see your name and tweets up in lights on our illustrious tweet wall, get involved!
Simply tweet at us @sitebulb with a picture of your favourite Crawl Map(s), and you too can be featured on this page.
Please be warned, however, as some folks struggle to deal with the worldwide fame and celebrity that awaits.