The Ultimate Guide to XML Sitemaps for SEO

Wayne Barker
Published 08 July 2021

XML Sitemaps are like day-trip itineraries for Google - and as an SEO, you're the chief tour guide! Got some places you really want Google to visit? Get them on the list! Some parts of town you'd rather they didn't see? Probably best leave them out.

When conducting an SEO audit on any site, you’re going to need to spend some time looking at its XML Sitemaps. While an XML Sitemap isn’t a requirement, you could be leaving traffic on the table if you don't have one.

In this guide, we’ll look at why they are important, how and why you need to structure them correctly, and how to test and audit them to identify any issues.

XML Sitemap Basics

What are XML Sitemaps?

An XML Sitemap is a file that lists your site’s URLs and allows you to provide information about each page of your site to help search engine crawlers better navigate and index it.

How do I find my XML Sitemap?

More often than not, your XML Sitemap is located in the root directory of your domain, so it is usually pretty easy to find.

There are a bunch of ways you can find the XML Sitemap on your site - or indeed discover whether you even have one.

Use common formats to locate

There are some standard naming conventions, so try those first. Start with these:

https://www.johnnycache.com/sitemap.xml

https://www.oneredirection.com/sitemap_index.xml

https://linkbizkit.com/sitemap1.xml

Thanks to a bunch of people for the SEO Band references - you can find more here.
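
If you'd rather script the guessing, here's a minimal Python sketch that builds those candidate URLs for a domain. The function name and the short list of paths are assumptions based on the common names above (sites can call their sitemap anything), and it doesn't fetch anything: you'd still need to request each URL and look for a 200 response.

```python
from urllib.parse import urlsplit, urlunsplit

# Common sitemap file names (an assumption: not an exhaustive list).
COMMON_SITEMAP_PATHS = ["/sitemap.xml", "/sitemap_index.xml", "/sitemap1.xml"]


def candidate_sitemap_urls(site: str) -> list[str]:
    """Build likely sitemap URLs in the root directory of the given site."""
    if "://" not in site:
        # Assume HTTPS when a bare domain is passed in.
        site = "https://" + site
    parts = urlsplit(site)
    # Keep only the scheme and host, then append each common file name.
    return [urlunsplit((parts.scheme, parts.netloc, path, "", ""))
            for path in COMMON_SITEMAP_PATHS]


for url in candidate_sitemap_urls("www.johnnycache.com"):
    print(url)
```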

Look in your robots.txt

It’s pretty standard practice for XML Sitemaps to be referenced in the robots.txt (if it isn’t there, you should probably add it).

XML sitemap in robots.txt file
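
Python's standard library can pull the Sitemap lines out of a robots.txt for you. A minimal sketch, assuming you've already downloaded the file's contents as text; RobotFileParser.site_maps() needs Python 3.8 or later, and the sample robots.txt below is made up.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return every Sitemap: URL declared in the contents of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file has no Sitemap lines.
    return parser.site_maps() or []


# Made-up robots.txt contents for illustration.
robots = """User-agent: *
Disallow: /search

Sitemap: https://www.oneredirection.com/sitemap_index.xml
"""
print(sitemaps_from_robots(robots))  # ['https://www.oneredirection.com/sitemap_index.xml']
```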

Check in Google Search Console

If some lovely person has submitted it to Google in the past, you’ll find it in there - or at least whichever sitemap has been submitted.

xml sitemap in GSC

Get your Google on

Google has always been our friend. And search operators are your bestest friend ever. So if all else fails, try some of these:

  • site:example.com ext:xml inurl:sitemap
  • site:example.com filetype:xml inurl:sitemap
  • site:example.com filetype:xml
  • site:example.com ext:xml

What does an XML Sitemap look like?

We might as well head over to sitemaps.org, where the protocol is defined, to have a look at what they are and how the protocol is set out.

This is the example that they use:

Sitemaps.org XML Sitemap code example

The above example shows a sitemap containing a single URL. Only the <urlset>, <url> and <loc> tags are required; <lastmod>, <changefreq> and <priority> are optional.
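
To see how those pieces fit together, here's a short Python sketch that reads a sitemap shaped like the sitemaps.org example and pulls out each URL along with whichever optional tags it has. The sample XML follows the protocol's structure, but the URL and values are placeholders.

```python
import xml.etree.ElementTree as ET

# The sitemap protocol's namespace, mapped to a short prefix for searching.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Placeholder sitemap with one URL and all three optional tags.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2021-07-08</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>"""


def read_sitemap(xml_text: str) -> list[dict]:
    """Return one dict per <url>: 'loc' is required, the other keys only if present."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", NS):
        entry = {"loc": url.findtext("sm:loc", namespaces=NS)}
        for tag in ("lastmod", "changefreq", "priority"):
            value = url.findtext(f"sm:{tag}", namespaces=NS)
            if value is not None:
                entry[tag] = value
        entries.append(entry)
    return entries


print(read_sitemap(SAMPLE))
```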

It’s also worth noting that Google has stated on several occasions that they pay absolutely no attention to the priority tag and the change frequency tag.

John Mueller xml sitemap tweet

What kind of websites need an XML Sitemap?

Google has said in the past that small sites don't really need an XML Sitemap (see quote below from John Mueller) and that they are more helpful for bigger sites. I’d argue that it doesn’t matter how big or small your site is - you should still have an XML Sitemap. They’re so easy to create (using a plugin like Yoast or a tool like Sitebulb or Screaming Frog) that you would be daft not to. A job that takes a couple of minutes for a simple site helps Google find the URLs that you want it to.

“With a site of that size, you don't really need a Sitemap file, we'll generally be able to crawl and index everything regardless.

Also, with such a small Sitemap file, you can just check the individual URLs to see if they're indexed like that.”

John Mueller - Google Webmaster Forum

When it comes to a bigger site, you’re going to need an XML Sitemap.

John Mueller fix sitemap tweet

Why are XML Sitemaps important for SEO?

In a world where Google is king, it's essential to keep your site fresh and updated with new content. When you do, XML Sitemaps help search engines like Google find pages on your website that they might not have found otherwise. That can mean more organic traffic coming in and more money for your business!

XML Sitemaps help Google crawl your site and get to new content quickly

Think of the XML Sitemap like a roadmap. Along comes Google, and it hits a page on your website. For argument’s sake, let’s say this is the homepage. From there, it crawls your navigation and starts to discover those pages, and then goes on a hunt to find other pages.

Gary xml sitemap tweet

By giving Google an XML Sitemap, you hand it a list of your pages - more importantly, the pages you want it to discover and rank - in one fell swoop.

XML Sitemaps help crawl budget

Google says that crawl budget isn’t important unless you have a massive site, and that’s true. Well, I guess it is if Mueller says so, anyway.

That said, I'm all for making Google's life as easy as possible and ensuring you get the right pages crawled, without spending time on redundant parts of the site.

You only need to look at the log files of a 10,000-page site to see that Google wastes a whole bunch of time in stupid parts of the site.
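
If you want to check this on your own site, here's a rough Python sketch that counts Googlebot requests per top-level folder. It assumes your server writes the common Apache/Nginx "combined" log format and simply trusts the user-agent string; a proper analysis should verify Googlebot via reverse DNS, because anyone can fake the user agent. The sample log lines are invented.

```python
import re
from collections import Counter

# Request path and user agent from a "combined" format log line
# (an assumption about how your server writes its logs).
LOG_LINE = re.compile(r'"[A-Z]+ (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"')


def googlebot_hits_by_section(lines):
    """Count requests claiming to be Googlebot, grouped by top-level folder."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue  # unparseable line, or not (claiming to be) Googlebot
        path = match.group("path").split("?", 1)[0]  # ignore query strings
        section = "/" + path.lstrip("/").split("/", 1)[0]
        counts[section] += 1
    return counts


BOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample = [  # invented log lines
    f'66.249.66.1 - - [08/Jul/2021:10:00:00 +0000] "GET /search?q=shoes HTTP/1.1" 200 5120 "-" "{BOT}"',
    f'66.249.66.1 - - [08/Jul/2021:10:00:01 +0000] "GET /search?q=hats HTTP/1.1" 200 5120 "-" "{BOT}"',
    f'66.249.66.1 - - [08/Jul/2021:10:00:02 +0000] "GET /products/red-shoe HTTP/1.1" 200 8000 "-" "{BOT}"',
]
print(googlebot_hits_by_section(sample).most_common())  # [('/search', 2), ('/products', 1)]
```

If a section like /search tops that list and isn't somewhere you want ranking, that's crawl time you could be spending elsewhere.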

What pages should you include in your XML Sitemaps?

How do you decide which pages should be in your XML Sitemap?

The easiest way of thinking about it is, “Would you want a user to find this page on Google?” That’s a good indicator of whether the page should be in the XML Sitemap. If you want users to find it, then you want Google to find it and index it.

That said, there are some outliers. On an e-commerce site, for example, you want users to be able to see pages that are created by filters. They’re helpful for the user, right? But your CMS or SEO budget may mean that those pages aren’t indexable for duplicate content reasons. So those bad boys should not be in the XML Sitemap unless they are indexable and have unique content on them.

Which pages should you leave out of your XML Sitemaps?

There’s a whole bunch of pages that you aren’t going to want in your XML Sitemap. This is usually because you have specifically decided that you don't want Google to index them (and realistically, crawling them is a bit of a waste of time as well).

Here are some types of pages that you might want to leave out.

  • Noindex pages
  • Reply to comments URLs
  • Site search pages
  • Redirected pages
  • Not found pages
  • Server error pages
  • Session ID pages
  • Duplicate pages
  • Non-canonical pages
  • Pages blocked by robots.txt
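
If you're building the list yourself from crawl data, that checklist translates into a simple filter. Here's a minimal Python sketch, assuming a hypothetical Page record with the fields shown (your crawler's export will name things differently). It treats "duplicate" as "canonicalises somewhere else", and the URL patterns for site search, comment replies and session IDs are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical URL fragments for site search, comment replies and session IDs.
JUNK_PATTERNS = ("?s=", "replytocom=", "sessionid=")


@dataclass
class Page:
    """Hypothetical crawl record; field names are illustrative, not from any tool."""
    url: str
    status_code: int
    noindex: bool = False
    canonical: Optional[str] = None  # None means no canonical tag was found
    blocked_by_robots: bool = False


def belongs_in_sitemap(page: Page) -> bool:
    """Keep only indexable, final-destination, self-canonical URLs."""
    if page.status_code != 200:  # drops redirects, 404s and 5xx errors
        return False
    if page.noindex or page.blocked_by_robots:
        return False
    if page.canonical and page.canonical != page.url:  # non-canonical or duplicate
        return False
    return not any(p in page.url.lower() for p in JUNK_PATTERNS)
```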

XML Sitemap guidelines and rules

As with most things in life, when it comes to XML Sitemaps, there are guidelines and rules that you need to follow. Yeah, the rules need breaking sometimes, but you can’t always be the punk that’s breaking the rules.

Google's XML Sitemap guidelines

There are always guidelines, right? Google, in particular, has set out many guidelines for us to follow when we’re putting our XML Sitemaps together. You can read in more depth here, but here are the key ones.

  • There is a size limit - according to Google, a single XML Sitemap is limited to 50MB (uncompressed) and 50,000 URLs
  • Use index files when required - I’ll cover index XML Sitemaps further down in this guide.
  • Only use final destination URLs - you don't want to include URLs that redirect, have session IDs or canonicalise to other URLs
  • Format correctly
  • If you have an international site, include hreflang annotations
  • They must be UTF-8 encoded
  • Use consistent URLs
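
Those size limits are easy to respect in code. Here's a minimal Python sketch that splits a long URL list into chunks of at most 50,000 (one chunk per sitemap file) and checks a serialised file against the 50MB uncompressed limit, which the sitemaps.org protocol spells out as 52,428,800 bytes. The example URLs are placeholders.

```python
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 52,428,800 bytes, uncompressed


def chunk_urls(urls, size=MAX_URLS):
    """Yield consecutive slices of at most `size` URLs, one per sitemap file."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def within_size_limit(sitemap_bytes: bytes) -> bool:
    """True if a serialised (uncompressed) sitemap is no larger than the limit."""
    return len(sitemap_bytes) <= MAX_BYTES


urls = [f"https://www.example.com/product/{n}/" for n in range(120_000)]
chunks = list(chunk_urls(urls))
print([len(c) for c in chunks])  # [50000, 50000, 20000]
```

Each chunk then becomes its own child sitemap, tied together by an index sitemap.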

Protocol / tags

There are six tags that you’re going to need to wrap your head around when it comes to XML Sitemaps. Three are required and three are optional.

Required tags:

  • urlset
  • url
  • loc

Optional tags:

  • lastmod
  • changefreq
  • priority

URL Set Tag: <urlset>

This is the first of the required tags. It encapsulates all of the URLs contained in the sitemap and references the sitemap protocol standard being used (via its xmlns attribute). Oh yeah, don’t forget you need to close the tag at the bottom as well.

URL Tag: <url>

The URL Tag is also required, and this is where your location tag will live (along with priority, lastmod and changefreq if you decide to use them). Essentially, you are telling the search engines: this is the URL, and this is the information you need to know about it.

Make sure that you use the full URL, including the protocol (e.g. https://). Oh, yeah, make sure you close that bad boy off as well.

It will look pretty much like this:

<url>
  <loc>https://www.rankingsofleon.com/</loc>
  <lastmod>2021-01-14T18:23:25+02:00</lastmod>
</url>

Loc Tag: <loc>

This is probably the most important one. The location tag tells the search engine where the page is located - in other words, it holds the URL itself. Therefore, you must make sure that this is the final destination URL.

It's worth noting that you should make sure you use the correct protocol (https or http) and the correct version of the hostname (www or non-www).
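
A quick way to catch relative URLs, or the wrong protocol or hostname, is to check every <loc> value against your preferred version of the site. Here's a Python sketch, assuming your preferred version is https://www.example.com (swap in your own scheme and host).

```python
from urllib.parse import urlsplit

PREFERRED_SCHEME = "https"           # assumption: the site is served over HTTPS
PREFERRED_HOST = "www.example.com"   # assumption: the www version is the preferred one


def loc_problems(loc: str) -> list[str]:
    """Return problems with a <loc> value; an empty list means it looks fine."""
    parts = urlsplit(loc)
    if not parts.scheme or not parts.netloc:
        return ["not an absolute URL"]
    problems = []
    if parts.scheme != PREFERRED_SCHEME:
        problems.append(f"uses {parts.scheme}:// instead of {PREFERRED_SCHEME}://")
    if parts.netloc.lower() != PREFERRED_HOST:
        problems.append(f"host {parts.netloc} instead of {PREFERRED_HOST}")
    return problems


print(loc_problems("http://example.com/shoes/"))
```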

Lastmod Tag: <lastmod>

The Last Modified Tag is optional but recommended. All this tag does is tell the search engines when that page was last modified or changed.

John Mueller has said that Google does use this tag when going through your XML Sitemaps.

John Mueller lastmod tweet

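
The <lastmod> value uses the W3C Datetime format: either a plain date (YYYY-MM-DD) or a full timestamp with a timezone offset, like 2021-01-14T18:23:25+02:00. In Python, isoformat() on a timezone-aware datetime produces exactly that. A minimal sketch, assuming you store each page's last-modified time as a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone


def lastmod_value(modified: datetime) -> str:
    """Format a timezone-aware datetime as a W3C Datetime string for <lastmod>."""
    if modified.tzinfo is None:
        raise ValueError("use a timezone-aware datetime so the offset is included")
    # Drop microseconds: seconds precision is plenty for a sitemap.
    return modified.replace(microsecond=0).isoformat()


edited = datetime(2021, 1, 14, 18, 23, 25, tzinfo=timezone(timedelta(hours=2)))
print(lastmod_value(edited))      # 2021-01-14T18:23:25+02:00
print(edited.date().isoformat())  # 2021-01-14 (the date-only form)
```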

Changefreq Tag: <changefreq>

The Change Frequency Tag is another optional one. Originally it was meant to serve as an indicator to search engines about how frequently the content on your URL might change.

Google has gone on record to state that you don’t need to bother with it, and hints that there are better things to be spending your time on.

“Priority and change frequency doesn’t really play that much of a role with Sitemaps anymore.

...so what I’d really recommend is using the timestamp.”

John Mueller - Google Webmaster Hangout, May 2015

Priority Tag: <priority>

This optional tag allows you to tell the search engines how important you consider the page to be on the site. This tag has always been optional, and the keen-eyed among you will have noticed that in the quote from John Mueller above, he says Priority doesn't really play much of a role anymore.

“Priority and change frequency doesn’t really play that much of a role with Sitemaps anymore.”

Where should your sitemap live?

This one is pretty simple. Your .xml file should live in the root directory of your domain.

Like this:
https://www.deeplinkpurple.com/sitemap.xml

Technically you can name it anything you want - which is fine if you’re going to try and hide it from your competitors and their SEO team (it can also be placed in a subfolder) - but I’d recommend sticking with something sensible.
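
If you do tuck it away in a subfolder, bear in mind that the sitemaps.org protocol says a sitemap can only include URLs on the same protocol and host, at or below the folder it lives in (so a sitemap at /catalog/sitemap.xml can't list URLs in /images/). This small Python check sketches that rule; it's a simplification that ignores the protocol's option of cross-submitting sitemaps via robots.txt.

```python
from urllib.parse import urlsplit


def in_sitemap_scope(sitemap_url: str, page_url: str) -> bool:
    """True if page_url shares the sitemap's protocol and host and sits at or below its folder."""
    sitemap, page = urlsplit(sitemap_url), urlsplit(page_url)
    if (sitemap.scheme, sitemap.netloc.lower()) != (page.scheme, page.netloc.lower()):
        return False  # different protocol or host: out of scope
    folder = sitemap.path.rsplit("/", 1)[0] + "/"  # "/catalog/sitemap.xml" -> "/catalog/"
    return (page.path or "/").startswith(folder)


print(in_sitemap_scope("https://www.deeplinkpurple.com/sitemap.xml",
                       "https://www.deeplinkpurple.com/blog/post/"))
```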

How to structure XML Sitemaps

Many developers and SEOs will simply take a list of all the URLs on the site and chuck them into one hefty XML Sitemap. While there is nothing inherently wrong with this, I prefer to see a sitemap that has had some thought put into how it's structured.

While we can extract the data that we need from a big old list, it can get a little unwieldy.

Let's take a look at a couple of ways you can break the information down and make your life easier in the long run.

Break it down by media type

The obvious one is to break parts of the XML Sitemap down by media type. Separate off your images and videos. Put them in their own sitemaps. It will make auditing the media types a hell of a lot easier in future - whether you’re doing that with Google Search Console or some wonderful auditing software like Sitebulb.

Break it down by page type or template

Breaking it down by page type or page template is something else I like to see in an XML Sitemap.

With page types and templates, you might look at creating separate XML Sitemaps for:

  • Blog posts
  • Category or services pages
  • Subcategory pages
  • Products

If you've created specific types of pages as part of your SEO work on a site, you may want to separate these off as well. This could be something like a knowledge base for your products.
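
Here's a rough Python sketch of splitting one big URL list into child sitemaps by page type. The path prefixes and sitemap file names are hypothetical; map them to however your own site's URLs are organised.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical URL patterns for each page type: adjust to your own site.
SECTIONS = {
    "/blog/": "sitemap-blog.xml",
    "/category/": "sitemap-categories.xml",
    "/product/": "sitemap-products.xml",
}
DEFAULT_SITEMAP = "sitemap-pages.xml"  # everything that matches no prefix


def group_by_page_type(urls):
    """Map each child sitemap file name to the URLs that belong in it."""
    groups = defaultdict(list)
    for url in urls:
        path = urlsplit(url).path
        name = next((file for prefix, file in SECTIONS.items()
                     if path.startswith(prefix)), DEFAULT_SITEMAP)
        groups[name].append(url)
    return dict(groups)


print(group_by_page_type([
    "https://www.example.com/",
    "https://www.example.com/blog/xml-sitemaps/",
    "https://www.example.com/product/red-shoe/",
]))
```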

Different types of XML Sitemap

Most simple sites won’t need anything more than one XML Sitemap with a list of all the pages that you want to be indexed. However, if your site is bigger than a few thousand URLs, you might want to add additional XML Sitemaps.

Essentially, these are Index Sitemaps and Child Sitemaps.

Not only does this make things easier for Google, but it gives you more visibility on different sections of your site.

Google allows you to submit a single Index Sitemap rather than loads of individual XML Sitemaps.

Win-win.

Index Sitemap

The Index Sitemap is a master XML file that simply links to your Child Sitemaps. It looks like this.

YoastSEO xml sitemap index

In this example, you can see that they have broken down specific parts of a big site into something more manageable.

Yoast SEO xml sitemap example
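
Under the hood, a sitemap index uses <sitemapindex> in place of <urlset>, with a <sitemap> entry (holding its own <loc> and optional <lastmod>) for each child file. Here's a minimal Python sketch that builds one; the child sitemap URLs are placeholders.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", NS)  # serialise as xmlns="..." with no prefix


def build_sitemap_index(children):
    """children: list of (child sitemap URL, lastmod string or None) tuples."""
    index = ET.Element(f"{{{NS}}}sitemapindex")
    for loc, lastmod in children:
        entry = ET.SubElement(index, f"{{{NS}}}sitemap")
        ET.SubElement(entry, f"{{{NS}}}loc").text = loc
        if lastmod:
            ET.SubElement(entry, f"{{{NS}}}lastmod").text = lastmod
    # UTF-8 bytes with an XML declaration, ready to save as sitemap_index.xml
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


print(build_sitemap_index([
    ("https://www.example.com/post-sitemap.xml", "2021-07-08"),
    ("https://www.example.com/page-sitemap.xml", None),
]).decode("utf-8"))
```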

Video XML Sitemaps

Video XML Sitemaps help Google discover the videos on your site. They also let you give Google extra details about each video, such as its title, description, thumbnail and duration, as well as any restrictions on where or how it can be viewed.

Read more about guidelines and the required tags:
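
Google's video sitemap extension adds a <video:video> block inside each <url>. According to Google's documentation, every video needs at least a thumbnail (video:thumbnail_loc), a title, a description, and either a content_loc or a player_loc. Here's a small Python sketch that checks a video record against that minimum before you add it; the dict keys are an assumption that simply reuses the tag names.

```python
REQUIRED = ("thumbnail_loc", "title", "description")


def missing_video_fields(video: dict) -> list[str]:
    """List the required video sitemap fields that are absent or empty."""
    missing = [field for field in REQUIRED if not video.get(field)]
    # Google needs at least one way to reach the video itself.
    if not (video.get("content_loc") or video.get("player_loc")):
        missing.append("content_loc or player_loc")
    return missing


print(missing_video_fields({"title": "XML Sitemaps explained"}))
```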

Images XML Sitemaps

An Image XML Sitemap lists the images that appear on your pages. That makes it easier for Google's crawlers to find your site's visuals, including ones they might not otherwise discover.

Read more about guidelines and the required tags:
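
Image entries sit inside the normal <url> element, using Google's image namespace. Here's a minimal Python sketch that builds them; the page and image URLs are placeholders, and it only uses the <image:image> and <image:loc> tags.

```python
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMG = "http://www.google.com/schemas/sitemap-image/1.1"
ET.register_namespace("", SM)        # default namespace for the sitemap tags
ET.register_namespace("image", IMG)  # image:... prefix for Google's extension


def build_image_sitemap(pages):
    """pages: dict mapping each page URL to a list of image URLs on that page."""
    urlset = ET.Element(f"{{{SM}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SM}}}url")
        ET.SubElement(url, f"{{{SM}}}loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMG}}}image")
            ET.SubElement(image, f"{{{IMG}}}loc").text = image_url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


print(build_image_sitemap({
    "https://www.example.com/shoes/": ["https://www.example.com/images/red-shoe.jpg"],
}).decode("utf-8"))
```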

Google News XML Sitemaps

A Google News XML Sitemap is an XML file that lists the articles you've recently published. Google asks that it only includes articles published in the last two days, with no more than 1,000 URLs per sitemap. It helps Google News crawlers find your new articles quickly, though it doesn't guarantee they'll show up in the SERPs.

Read more about guidelines and the required tags:
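
Because a News sitemap should only carry fresh articles, it's worth filtering before you generate it. A minimal Python sketch, assuming Google's two-day window and 1,000-URL cap, and assuming each article is a (URL, timezone-aware publish time) pair; the URLs in the usage example are placeholders.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=2)  # only articles from the last two days
MAX_ARTICLES = 1000          # per News sitemap


def news_sitemap_articles(articles, now=None):
    """Return URLs of articles recent enough for a News sitemap, newest first."""
    now = now or datetime.now(timezone.utc)
    recent = [(published, url) for url, published in articles
              if now - published <= MAX_AGE]
    recent.sort(reverse=True)  # newest first, so the cap drops the oldest
    return [url for _, url in recent[:MAX_ARTICLES]]


now = datetime(2021, 7, 8, 12, 0, tzinfo=timezone.utc)
print(news_sitemap_articles([
    ("https://www.example.com/news/last-week/", now - timedelta(days=7)),
    ("https://www.example.com/news/today/", now - timedelta(hours=1)),
], now))
```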

Should I use static Sitemaps or dynamic Sitemaps?

So the difference here is pretty obvious. A static XML Sitemap is one that you create based on the pages that are currently on your site.

Every time you add a new page to the site, you’re going to need to add that page to the sitemap. So that’s not a great way of going about things.

Most websites are adding new pages to their site on a fairly regular basis. Unless it’s your only option, you don't want to be using static XML Sitemaps.

If it is the only solution, you need to be asking questions of your web developers. Seriously.

Dynamic XML Sitemaps, on the other hand, do what it says on the tin. Every time you add a new page to the site, that page is added to the sitemap. No need for extra work.

How do I create an XML Sitemap?

Right, you've got the knowledge. But all that knowledge is useless unless you can put it into action.

Let's get cracking on showing you how to create your XML Sitemaps - whether they are static or dynamic.

Creating static XML Sitemaps

Write manually

Yeah, this is still an option. But why? Who has the time for this? Manually creating your XML Sitemap is a fairly simple but very time-consuming process of using a text editor to list all the pages you want Google to discover and index. This might be okay if your site has just 4 or 5 pages. Otherwise, do it another way.

Sitemap generators

There are a bunch of handy dedicated XML Sitemap generators out there. Most of the time, all you need to do is pop your homepage into the tool, and off it goes and creates an XML Sitemap for you.

SEO tools & crawlers

While sitemap generators are handy, they just don't cut the mustard in the real world. If you can't add a dynamic XML Sitemap to your site, then you’re better off using a crawler tool. If you're an SEO, it's likely that one of your existing tools will be able to do this for you - Sitebulb, DeepCrawl, Botify and Screaming Frog all have XML Sitemap generator features. They give you more control over the sitemap you’re generating, and you have more data at your fingertips.

Creating dynamic XML Sitemaps

Via your CMS

If you’re using any of the following CMS, you’ll find that they automatically create your XML Sitemap for you, so you don't need to do anything.

  • Squarespace
  • Wix
  • Shopify

Plugins + tools

Some CMS - the obvious ones here are Joomla and WordPress - either don't create an XML Sitemap automatically or only give you a basic one (WordPress has generated a simple sitemap at /wp-sitemap.xml since version 5.5). But there’s no need to panic, Mr Mainwaring. Both of these have a wide range of plugins that make creating XML Sitemaps a breeze.

Here are a few for you to check out.

WordPress

Joomla

Magento

Server scripts

Another option is to create your XML Sitemap using a server script. Now I’m no developer, so I’m not going to dig into this. I’d only get it wrong. Know your strengths, Wayne. Your developer, on the other hand, will be able to sort this out for you. Just make sure that you brief them precisely on what you want.
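
To give you (or your developer) a flavour of what that involves, here's a minimal Python sketch that turns a list of pages into sitemap XML. Where the page list comes from (your CMS database, an API) is left out, and the (URL, lastmod) input format is an assumption. In a dynamic setup, you'd run something like this on each request or whenever content is published, then serve or save the bytes as sitemap.xml.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", NS)  # serialise as xmlns="..." with no prefix


def build_sitemap(pages):
    """pages: iterable of (url, lastmod) pairs; lastmod may be None."""
    urlset = ET.Element(f"{{{NS}}}urlset")
    for url, lastmod in pages:
        entry = ET.SubElement(urlset, f"{{{NS}}}url")
        # ElementTree escapes &, < and > in text, as the protocol requires.
        ET.SubElement(entry, f"{{{NS}}}loc").text = url
        if lastmod:
            ET.SubElement(entry, f"{{{NS}}}lastmod").text = lastmod
    # UTF-8 encoded bytes with an XML declaration
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


print(build_sitemap([
    ("https://www.example.com/", "2021-07-08"),
    ("https://www.example.com/about/", None),
]).decode("utf-8"))
```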

Submitting XML Sitemaps

So you’ve got your XML Sitemap sorted. You’ve got your important pages in there.

You’re on fire.

How do you go about submitting your XML Sitemaps to different search engines? I've got your back. Here are some handy resources.

How to submit an XML Sitemap to Google:

How to submit an XML Sitemap to Bing:

How to submit an XML Sitemap to Yahoo:

How to submit an XML sitemap to Yandex:

Auditing XML Sitemaps

Now you know what an XML Sitemap is, how to find it, how to structure it, and all that jazz, you need to make sure that you’ve got your house in order. You need to audit your XML Sitemaps to ensure that you’re getting the most SEO value from them.

When auditing, you need to be looking for:

  • Syntax / validation
  • Broken URLs / 404s / 5XX
  • Redirecting URLs
  • Non indexable URLs
  • Too many URLs
  • Pages that crawlers have found that aren't in your sitemap

But how do you find this information? There's no way that you can do this with just your brain and lists of pages. You need tools. You need help.

You can get this information from a bunch of different places. Here's a starter for ten for you.

Google Search Console

Your first stop. Why? Because it's free, and you probably already have access to it.

Search Console has a few features that will help you maximise the value of your XML Sitemaps. So let’s have a quick goosey look.

XML Sitemap errors in Google Search Console

Head over to the Sitemaps section of Google Search Console and - if you've submitted your XML Sitemaps - you should see something like this.

GSC submitted XML sitemaps

Clicking on the little bar graph icon to the right-hand side will give you more information about the errors that Google has found in any given sitemap.

GSC XML sitemap errors

And then you can scroll down to get more details on specific URLs and the errors.

GSC XML sitemap details

Dig away, find errors and get them fixed. Made that sound simple, didn’t I?

There are a bunch of errors that Google will report to you. Some of the most common you’re likely to see are:

  • URLs not accessible
  • URLs not followed
  • URL not allowed
  • Empty sitemap
  • Sitemap file size error: Your sitemap exceeds the maximum file size limit
  • Invalid URL in sitemap index file: incomplete URL
  • Invalid XML: too many tags
  • HTTP error [specific code]
  • Sitemap contains URLs that are blocked by robots.txt

If you need some more information then Google goes into depth about these and more on their sitemaps management support page.

Sitebulb

Of course, Sitebulb can help you with auditing your XML Sitemaps. They aren't going to leave you in the lurch. Ever.

Cos they're lovely guys who want to make your auditing life easier, they've done something other tools haven't: alongside the potential errors it finds, Sitebulb gives you some context on how important each Hint is.

XML sitemaps report in Sitebulb

Sitebulb covers:

Importance: Critical

These Hints require immediate attention, as the issue may have a serious impact upon crawling, indexing or ranking.

Importance: High

These Hints are very important, and definitely warrant attention.

Importance: Medium

These Hints are worth investigating further and may warrant further attention depending on the type and quantity of URLs affected.

XML Sitemaps Insights

Insights are neither issues nor opportunities, and often don't require any action at all - they are brought to your attention as they may provide a useful avenue of investigation.

Handy huh? I think so. I know so.

If you want a full-on guide to setting up, running and using Sitebulb's wonderful XML Sitemap auditing features, you can find one over here.

Validator tools

There are loads of XML validator tools available - but what are they? Essentially, they let you check that your XML document code is “Well Formed” and free of errors. Errors are not good. The W3C XML specification says that once a processor finds a fatal error (such as a well-formedness error), it must not continue normal processing.

Not good.

Here are some nifty validator tools to help you along.
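
For a quick first pass, Python's standard library can at least tell you whether a sitemap is well-formed and has the expected root element. This is only a well-formedness check, not validation against the sitemaps.org XML Schema, so treat it as a sketch rather than a replacement for a proper validator.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def check_sitemap_xml(xml_bytes: bytes) -> str:
    """Return 'ok' or a short description of the first problem found."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as err:
        line, column = err.position  # where the parser gave up
        return f"not well-formed at line {line}, column {column}: {err}"
    if root.tag not in (NS + "urlset", NS + "sitemapindex"):
        return f"unexpected root element {root.tag}"
    return "ok"


# An unescaped & in a URL is a classic way to break a sitemap.
print(check_sitemap_xml(
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<url><loc>https://www.example.com/?a=1&b=2</loc></url></urlset>"))
```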

Some other handy tools

There's a bunch of other tools out there that can help you with your XML Sitemap auditing.

HTML Sitemaps

Before we wrap it all up, I think we need to touch on HTML Sitemaps. The oft-forgotten sitemap nowadays. But one that is still important.

What is an HTML Sitemap?

An HTML Sitemap is an HTML page that you have on your site. It lists all the pages on your site that are important to users, and links directly to each of them.

Usually linked from the footer of a site, an HTML Sitemap also gives search engines another way to discover and index pages on your site.
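
At its simplest, an HTML Sitemap is just a page of plain links. Here's a minimal Python sketch that renders one from (title, URL) pairs; the page list is hypothetical, and in practice you'd drop the output into your site's normal page template.

```python
from html import escape


def html_sitemap(pages):
    """Render (title, url) pairs as an HTML list of links, escaping both safely."""
    items = "\n".join(
        f'  <li><a href="{escape(url)}">{escape(title)}</a></li>'
        for title, url in pages
    )
    return f"<ul>\n{items}\n</ul>"


print(html_sitemap([
    ("Home", "https://www.example.com/"),
    ("Shoes & Boots", "https://www.example.com/shoes/"),
]))
```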

How are HTML Sitemaps different from XML Sitemaps?

So there are two main differences. The XML sitemap is a specific type of file and is solely for search engines. The HTML sitemap is often formatted similarly to other HTML pages on your site and is for users rather than the bots.

There's a strong argument that HTML Sitemaps are good for accessibility, helping users with disabilities navigate around sites. If you want to read more about accessibility and SEO, here are a couple of great places to start.

Do HTML Sitemaps help SEO?

Yes. Yes, they do.

I've mentioned already that they help Google find and index pages. In addition to this, they allow you to create additional internal links to your important pages.

On a site we worked on, we used the HTML Sitemap to help pass PageRank down to unique pages that we had created by turning some filtered pages into static pages that could - and do - rank.

Wrapping up

Indeed that's a wrap. Hopefully, you’re now armed with all the information you need on what's required to plan and deploy an XML Sitemap. They're more important for SEO than people give them credit for.

Wayne Barker

Unrepentant long-time SEO, consultant at Boom Online Marketing, and guest writer for Sitebulb.

Similarly sweary as Patrick, but does a much better job of hiding it (usually).
