The Ultimate Guide to XML Sitemaps for SEO
Published 08 July 2021
XML Sitemaps are like day-trip itineraries for Google - and as an SEO, you're chief tour guide! Got some places you really want Google to visit? Get them on the list! Some parts of town you'd rather they didn't see? Probably best leave them out.
When conducting an SEO audit on any site, you’re going to need to spend some time looking at its XML Sitemaps. While an XML Sitemap isn’t a requirement, you’re potentially leaving traffic on the table if you don't include one.
In this guide, we’ll look at why they are important, how and why you need to structure them correctly, and how to test and audit them to identify any issues.
Table of contents:
- XML Sitemap Basics
- Why are XML Sitemaps important for SEO?
- What pages should you include in your XML Sitemaps?
- Which pages should you leave out of your XML Sitemaps?
- XML Sitemap guidelines and rules
- How to structure XML Sitemaps
- Different types of XML Sitemap
- Should I use static Sitemaps or dynamic Sitemaps?
- How do I create an XML Sitemap?
- Submitting XML Sitemaps
- Auditing XML Sitemaps
- HTML Sitemaps
XML Sitemap Basics
What are XML Sitemaps?
An XML Sitemap is a file that lists your site’s URLs and allows you to provide information about each page of your site to help search engine crawlers better navigate and index it.
How do I find my XML Sitemap?
More often than not, your XML Sitemap is located in the root directory of your domain, so it is usually pretty easy to find.
There are a bunch of ways you can find the XML Sitemap on your site - or indeed to discover if you even have one.
Use common formats to locate
There are some standard naming conventions, so try those first. Start with these:
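As a rough sketch, the snippet below builds a list of candidate URLs to try by hand or with a script. The filenames are an assumption based on commonly seen conventions, not an official list, and `candidate_sitemap_urls` is a hypothetical helper.

```python
from urllib.parse import urljoin

# Commonly seen sitemap filenames (an assumption, not an official list).
COMMON_SITEMAP_PATHS = [
    "sitemap.xml",
    "sitemap_index.xml",
    "sitemap-index.xml",
    "sitemap1.xml",
]


def candidate_sitemap_urls(site_root: str) -> list[str]:
    """Return the URLs worth trying first for a given site root."""
    # Ensure a trailing slash so urljoin appends to the root path.
    root = site_root if site_root.endswith("/") else site_root + "/"
    return [urljoin(root, path) for path in COMMON_SITEMAP_PATHS]


if __name__ == "__main__":
    for url in candidate_sitemap_urls("https://example.com"):
        print(url)
```

Paste each URL into a browser; if none of them load, move on to the other methods.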
Look in your robots.txt
It’s pretty standard practice for XML Sitemaps to be referenced in the robots.txt (if it isn’t there, you should probably add it).
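A sitemap is referenced in robots.txt with a `Sitemap:` line containing the full URL. The sketch below pulls those lines out of a robots.txt file you have already downloaded; `sitemaps_from_robots` and the example file contents are hypothetical.

```python
def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Extract sitemap URLs from the text of a robots.txt file."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop comments, then split the directive name from its value.
        line = line.split("#", 1)[0].strip()
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /search/

Sitemap: https://example.com/sitemap.xml
"""

print(sitemaps_from_robots(EXAMPLE_ROBOTS))
```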
Check in Google Search Console
If some lovely person has submitted it to Google in the past, you’ll find it in there - or at least whichever sitemap has been submitted.
Get your Google on
Google has always been our friend. And search operators are your bestest friend ever. So if all else is failing, try some of these:
- site:example.com ext:xml inurl:sitemap
- site:example.com filetype:xml inurl:sitemap
- site:example.com filetype:xml
- site:example.com ext:xml
What does an XML Sitemap look like?
We might as well head over to sitemaps.org to have a look at what they are and how they set out the protocol.
This is the example that they use:
The above example shows a sitemap containing a single URL. The <lastmod>, <changefreq> and <priority> tags are optional.
It’s also worth noting that Google has stated on several occasions that they pay absolutely no attention to the priority tag and the change frequency tag.
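As a rough illustration of the format, here is a single-URL sitemap in the shape set out by the sitemaps.org protocol, held in a Python string and parsed to pull out its URL. The URL, date and values are placeholders, not taken from a real site.

```python
import xml.etree.ElementTree as ET

# A single-URL sitemap in the sitemaps.org 0.9 format.
# The URL, date and values are placeholders.
SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>        <!-- required -->
    <lastmod>2021-07-08</lastmod>              <!-- optional -->
    <changefreq>monthly</changefreq>           <!-- optional -->
    <priority>0.8</priority>                   <!-- optional -->
  </url>
</urlset>
"""

# Sitemap elements live in this namespace, so lookups need a prefix mapping.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

root = ET.fromstring(SITEMAP_XML.encode("utf-8"))
locs = [loc.text for loc in root.findall("sm:url/sm:loc", NS)]
print(locs)
```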
What kind of websites need an XML Sitemap?
Google has said in the past that small sites don't really need an XML Sitemap (see quote below from John Mueller) and that they are more helpful for bigger sites. I’d argue that it doesn’t matter how big or small your site is - you should still have an XML Sitemap. They’re so easy to create (using a plugin like Yoast or a tool like Sitebulb or Screaming Frog) that you would be daft not to. A job that takes a couple of minutes for a simple site helps Google find the URLs that you want it to.
“With a site of that size, you don't really need a Sitemap file, we'll generally be able to crawl and index everything regardless.
Also, with such a small Sitemap file, you can just check the individual URLs to see if they're indexed like that.”
When it comes to a bigger site, you’re going to need an XML Sitemap.
Why are XML Sitemaps important for SEO?
In a world where Google is king, it's essential to keep your site fresh and updated with new content. When doing this, XML Sitemaps are essential for SEO because they allow search engines like Google to find all the pages on your website that they might not have found otherwise. This can mean more organic traffic coming in and more money for your business!
XML Sitemaps help Google crawl your site and get to new content quickly
Think of the XML Sitemap like a roadmap. Along comes Google, and it hits a page on your website. For argument’s sake, let’s say this is the homepage. From there, it crawls your navigation and starts to discover those pages, and then goes on a hunt to find other pages.
By giving Google an XML Sitemap, you help it find all of your pages - more importantly, the pages you tell it you want it to discover and rank - in one fell swoop.
XML Sitemaps help crawl budget
Google says that crawl budget isn’t important unless you have a massive site, and that’s true. Well, if Mueller says so, I guess it is anyway.
That said, I'm all for making Google's life as easy as possible and ensuring you get the right pages crawled, without spending time on redundant parts of the site.
You only need to see the log files of a 10,000 page site to see that Google wastes a whole bunch of time in stupid parts of the site.
What pages should you include in your XML Sitemaps?
How do you decide which pages should be in your XML Sitemap?
The easiest way of thinking about it is, “Would you want a user to find this page on Google?” That’s a good indicator of whether the page should be in the XML Sitemap. If you want users to find it, then you want Google to find it and index it.
That said, there are some outliers. On an e-commerce site, for example, you want users to be able to see pages that are created by filters. They’re helpful for the user, right? But your CMS or SEO budget may mean that those pages aren’t indexable for duplicate content reasons. So those bad boys should not be in the XML Sitemap unless they are indexable and have unique content on them.
Which pages should you leave out of your XML Sitemaps?
There’s a whole bunch of pages that you aren’t going to want in your XML Sitemap. This is usually because you have specifically decided that you don't want Google to index them (and realistically, crawling them is a bit of a waste of time as well).
Here are some types of pages that you might want to leave out.
- Noindex pages
- Reply to comments URLs
- Site search pages
- Redirected pages
- Not found pages
- Server error pages
- Session ID pages
- Duplicate pages
- Non-canonical pages
- Pages blocked by robots.txt
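The list above boils down to a simple filter: only indexable, canonical pages that return a 200 status make the cut. Here is a minimal sketch of that filter, assuming you already have crawl data for each page; the `Page` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    """Hypothetical record for a crawled page; field names are assumptions."""
    url: str
    status_code: int = 200
    noindex: bool = False
    canonical_url: Optional[str] = None  # None means "no canonical tag found"
    blocked_by_robots: bool = False


def belongs_in_sitemap(page: Page) -> bool:
    """Return True only for indexable, canonical, 200-status pages."""
    if page.status_code != 200:          # redirects, not found, server errors
        return False
    if page.noindex or page.blocked_by_robots:
        return False
    if page.canonical_url and page.canonical_url != page.url:
        return False                     # non-canonical or duplicate page
    return True


pages = [
    Page("https://example.com/"),
    Page("https://example.com/old", status_code=301),
    Page("https://example.com/search?q=shoes", noindex=True),
    Page("https://example.com/shoes?sort=price",
         canonical_url="https://example.com/shoes"),
]
print([p.url for p in pages if belongs_in_sitemap(p)])
```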
XML Sitemap guidelines and rules
As with most things in life, when it comes to XML Sitemaps, there are guidelines and rules that you need to follow. Yeah, the rules need breaking sometimes, but you can’t always be the punk that’s breaking the rules.
Google's XML Sitemap guidelines
There are always guidelines, right? Google, in particular, has set out many guidelines for us to follow when we’re putting our XML Sitemaps together. You can read in more depth here, but here are the key ones.
- There is a size limit - according to Google, a single XML Sitemap is limited to 50MB (uncompressed) and 50,000 URLs
- Use index files when required - I’ll cover index XML Sitemaps further down in this guide.
- Only use final destination URLs - you don't want to include URLs that redirect, have session IDs or canonicalise to other URLs
- Format correctly
- If you have an international site, include hreflang annotations
- They must be UTF-8 encoded
- Use consistent URLs
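The size limits are easy to check in code. A minimal sketch, assuming you have the URL list and the serialised sitemap to hand; `within_limits` and `chunk_urls` are hypothetical helpers.

```python
MAX_URLS = 50_000                 # per-sitemap URL limit
MAX_BYTES = 50 * 1024 * 1024      # 50MB, measured uncompressed


def within_limits(sitemap_bytes: bytes, url_count: int) -> bool:
    """Check a serialised sitemap against the size and URL-count limits."""
    return len(sitemap_bytes) <= MAX_BYTES and url_count <= MAX_URLS


def chunk_urls(urls: list[str], size: int = MAX_URLS) -> list[list[str]]:
    """Split a long URL list into sitemap-sized chunks."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


urls = [f"https://example.com/p/{i}" for i in range(120_000)]
print([len(chunk) for chunk in chunk_urls(urls)])
```

Each chunk then becomes its own sitemap file, tied together with an index file.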
Protocol / tags
There are six tags that you’re going to need to wrap your head around when it comes to XML Sitemaps. Some are compulsory and some aren't.
URL Set Tag: <urlset>
This is a required tag for your XML Sitemaps. It encapsulates all of the URLs contained in the sitemap and should also reference the protocol standard being used (the sitemaps.org namespace). Oh yeah, don’t forget you need to close the tag at the bottom as well.
URL Tag: <url>
The URL Tag is also required and this is where your location tag will live (and priority, lastmod and changefreq if you decide to use them). Essentially you are telling the search engines that this is the URL and this is the information you need to know about it.
Make sure that you use the full URL in the tag, including the protocol (e.g. https://). Oh, yeah, make sure you close that bad boy off as well.
It will look pretty much like this:
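A minimal sketch of a single <url> entry, built with Python's standard library so the opening and closing tags are handled for you. `build_url_element` is a hypothetical helper and the URL and date are placeholders.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_url_element(loc: str, lastmod: Optional[str] = None) -> ET.Element:
    """Build one <url> entry; `loc` must be a full URL including the protocol."""
    if not loc.startswith(("http://", "https://")):
        raise ValueError(f"Expected an absolute URL, got: {loc!r}")
    url = ET.Element("url")
    ET.SubElement(url, "loc").text = loc
    if lastmod is not None:
        ET.SubElement(url, "lastmod").text = lastmod  # e.g. "2021-07-08"
    return url


entry = build_url_element("https://www.example.com/blog/", lastmod="2021-07-08")
print(ET.tostring(entry, encoding="unicode"))
# <url><loc>https://www.example.com/blog/</loc><lastmod>2021-07-08</lastmod></url>
```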
Loc Tag: <loc>
This is probably the most important one. The location tag tells the search engine where the URL is located - essentially the URL. Therefore, you must make sure that this is the final destination URL.
It's worth noting that you should make sure you use the correct protocol (secure or non-secure, and whether it’s www or non-www).
Lastmod Tag: <lastmod>
The Last Modified Tag is optional but recommended. All this tag does is tell the search engines when that page was last modified or changed.
John Mueller has said that Google does use this tag when going through your XML Sitemaps.
Changefreq Tag: <changefreq>
The Change Frequency Tag is another optional one. Originally it was meant to serve as an indicator to search engines about how frequently the content on your URL might change.
Google has gone on record to state that you don’t need to bother with it, and hints that there are better things to be spending your time on.
“Priority and change frequency doesn’t really play that much of a role with Sitemaps anymore.
...so what I’d really recommend is using the timestamp.”
John Mueller - Google Webmaster Hangout, May 2015
Priority Tag: <priority>
This optional tag allows you to tell the search engines how important you consider the page to be on the site. This tag has always been optional, and the keen-eyed among you will have noticed in the quote from John Mueller above he mentions that Google doesn't use the Priority Tag.
“Priority and change frequency doesn’t really play that much of a role with Sitemaps anymore.”
Where should your sitemap live?
This one is pretty simple. Your .xml file should live in the root directory of your domain.
Technically you can name it anything you want - which is fine if you’re going to try and hide it from your competitors and their SEO team (it can also be placed in a subfolder) - but I’d recommend sticking with something sensible.
How to structure XML Sitemaps
Many developers and SEOs will simply take a list of all the URLs on the site and chuck them into one hefty XML Sitemap. While there is nothing inherently wrong with this, I prefer to see a sitemap that has had some thought put into how it's structured.
While we can extract the data that we need from a big old list, it can get a little unwieldy.
Let's take a look at a couple of ways you can break the information down and make your life easier in the long run.
Break it down by media type
The obvious one is to break parts of the XML Sitemap down by media type. Separate off your images and videos. Put them in their own sitemaps. It will make auditing the media types a hell of a lot easier in future - whether you’re doing that with Google Search Console or some wonderful auditing software like Sitebulb.
Break it down by page type or template
Breaking it down by page type or page template is something else I like to see in an XML Sitemap.
With page types and templates, you might look at creating separate XML Sitemaps for:
- Blog posts
- Category or services pages
- Subcategory pages
If you have specific types of pages for SEO work that you've carried out on a site, you may want to separate these off as well. This could be something like a knowledge database for your products.
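One simple way to split by page type is to group URLs by their first folder. The sketch below assumes your URL structure reflects your page types (for example, all blog posts live under /blog/), which won't be true of every site; `group_by_section` is a hypothetical helper.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_section(urls: list[str]) -> dict[str, list[str]]:
    """Group URLs by their first path segment, e.g. /blog/... -> "blog"."""
    groups = defaultdict(list)
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        # Top-level pages (the homepage, /about, etc.) go into a "pages" bucket.
        section = segments[0] if len(segments) > 1 else "pages"
        groups[section].append(url)
    return dict(groups)


urls = [
    "https://example.com/",
    "https://example.com/about",
    "https://example.com/blog/xml-sitemaps",
    "https://example.com/blog/crawl-budget",
    "https://example.com/category/shoes",
]
for section, members in group_by_section(urls).items():
    print(section, len(members))
```

Each group can then be written out as its own sitemap.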
Different types of XML Sitemap
Most simple sites won’t need anything more than one XML Sitemap with a list of all the pages that you want to be indexed. However, if your site is bigger than a few thousand URLs, you might want to add additional XML Sitemaps.
Essentially these are Index Sitemaps and Child sitemaps.
Not only does this make things easier for Google, but it gives you more visibility on different sections of your site.
Google allows you to submit a single Index Sitemap rather than loads of individual XML Sitemaps.
The Index Sitemap is a master XML file. This is an XML file that simply links to your Child Sitemaps and looks like this.
In this example, you can see that they have broken down specific parts of a big site into something more manageable.
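As a rough sketch of what an index file contains, the snippet below generates one in the sitemaps.org format: a <sitemapindex> wrapper with one <sitemap> and <loc> per child file. The child sitemap filenames are hypothetical, and `ET.indent` needs Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(child_urls: list[str]) -> str:
    """Return an index sitemap that points at each child sitemap."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for child in child_urls:
        sitemap = ET.SubElement(root, "sitemap")
        ET.SubElement(sitemap, "loc").text = child
    ET.indent(root)  # pretty-print; available from Python 3.9
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


index_xml = build_sitemap_index([
    "https://example.com/sitemap-posts.xml",      # hypothetical child files
    "https://example.com/sitemap-categories.xml",
    "https://example.com/sitemap-images.xml",
])
print(index_xml)
```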
Video XML Sitemaps
Video XML Sitemaps are a great way to increase the number of videos your site has indexed by Google. In addition, they allow you to describe each video on your website, with details such as its title, description and thumbnail, and where the video file or player lives.
Read more about guidelines and the required tags:
- Video Sitemaps and Examples | Google Search Central
- How to create an XML video sitemap — Serpstat Blog
Images XML Sitemaps
The Image XML Sitemap helps marketers create an organized, searchable index of images on their website. This allows for better organization of content and makes it easier for Google crawlers to find your site's visuals.
Google News XML Sitemaps
A Google News XML Sitemap is simply an XML file that lists your recently published articles (Google asks for only those published in the last two days), along with details such as the publication name, language, publication date and title. Creating this type of file helps Google News crawlers find and index your new articles quickly.
Should I use static Sitemaps or dynamic Sitemaps?
So the difference here is pretty obvious. A static XML Sitemap is one that you create based on the pages that are currently on your site.
Every time you add a new page to the site, you’re going to need to add that page to the sitemap. So that’s not a great way of going about things.
Most websites are adding new pages to their site on a fairly regular basis. Unless it’s your only option, you don't want to be using static XML Sitemaps.
If it is the only solution, you need to be asking questions of your web developers. Seriously.
Dynamic XML Sitemaps, on the other hand, do what they say on the tin. Every time you add a new page to the site, that page is added to the sitemap. No need for extra work.
How do I create an XML Sitemap?
Right, you've got the knowledge. But all that knowledge is useless unless you can put it into action.
Let's get cracking on showing you how to create your XML Sitemaps - whether they are static or dynamic.
Creating static XML Sitemaps
Yeah, this is still an option. But why? Who has the time for this? Manually creating your XML Sitemap is a fairly simple, but very time consuming, process of using a text editor to list all the pages you want Google to discover and index. This might be okay if your site has just 4 or 5 pages. Otherwise, do it another way.
There are a bunch of handy dedicated XML Sitemap generators out there. Most of the time, all you need to do is pop your homepage into the tool, and off it goes and creates an XML Sitemap for you.
SEO tools & crawlers
While sitemap generators are handy, they just don't cut the mustard in the real world. If you can't add a dynamic XML Sitemap to your site, then you’re better off using a crawler tool. If you're an SEO, it's likely that one of your existing tools will be able to do this for you - Sitebulb, DeepCrawl, Botify and Screaming Frog all have XML Sitemap generator features. They give you more control over the sitemap you’re generating, and you have more data at your fingertips.
Creating dynamic XML Sitemaps
Via your CMS
If you’re using any of the following CMS, you’ll find that they automatically create your XML Sitemap for you, so you don't need to do anything.
Plugins + tools
Some CMS don't automatically create an XML Sitemap - the obvious ones here are Joomla and older versions of WordPress (WordPress has shipped a basic built-in sitemap since version 5.5). But there’s no need to panic, Mr Mainwaring. Both of these have an extensive number of plugins that make creating XML Sitemaps a breeze.
Here are a few for you to check out.
Another option is to create your XML Sitemap using a server script. Now I’m no developer, so I’m not going to dig into this. I’d only get it wrong. Know your strengths, Wayne. Your developer, on the other hand, will be able to sort this out for you. Just make sure that you brief them precisely on what you want.
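To give you something to show your developer, here is a minimal sketch of the idea in Python: pull the current list of indexable pages from wherever they are stored and render the sitemap from it each time. `fetch_indexable_pages` is a hypothetical stand-in for a real CMS or database query, and the URLs and dates are placeholders.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def fetch_indexable_pages():
    """Hypothetical stand-in for a CMS or database query."""
    return [
        ("https://example.com/", date(2021, 7, 8)),
        ("https://example.com/blog/xml-sitemaps", date(2021, 7, 1)),
    ]


def render_sitemap(pages) -> bytes:
    """Serialise (url, last_modified) pairs as a UTF-8 encoded sitemap."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, last_modified in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = last_modified.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return ('<?xml version="1.0" encoding="UTF-8"?>\n' + body).encode("utf-8")


if __name__ == "__main__":
    # A real setup would serve this from a route such as /sitemap.xml.
    print(render_sitemap(fetch_indexable_pages()).decode("utf-8"))
```

Because the list is fetched fresh on each run, new pages appear in the sitemap without anyone editing a file by hand.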
Submitting XML Sitemaps
So you’ve got your XML Sitemap sorted. You’ve got your important pages in there.
You’re on fire.
How do you go about submitting your XML Sitemaps to different search engines? I got your back. Here are some handy resources.
How to submit an XML Sitemap to Google:
- Build & Submit a Sitemap | Google Search Central
- How to Submit an XML Sitemap to Google Search Console - Seer
How to submit an XML Sitemap to Bing:
How to submit an XML Sitemap to Yahoo:
How to submit an XML sitemap to Yandex:
Auditing XML Sitemaps
Now you know what an XML Sitemap is, how to find it, how to structure it, and all that jazz, you need to make sure that you’ve got your house in order. You need to audit your XML Sitemaps to ensure that you’re getting the most SEO value from them.
When auditing, you need to be looking for:
- Syntax / validation
- Broken URLs / 404s / 5XX
- Redirecting URLs
- Non-indexable URLs
- Too many URLs
- Pages that crawlers have found that aren't in your sitemap
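Several of these checks amount to comparing two lists: the URLs in your sitemap and the URLs (with status codes) from a crawl. Here is a rough sketch of that comparison, assuming you can export both; `audit_sitemap` and the sample data are hypothetical.

```python
def audit_sitemap(sitemap_urls: set[str], crawl_status: dict[str, int]) -> dict[str, list[str]]:
    """Compare sitemap URLs against crawl results (URL -> HTTP status code)."""
    report = {"broken": [], "redirecting": [], "not_crawled": [], "missing_from_sitemap": []}
    for url in sorted(sitemap_urls):
        status = crawl_status.get(url)
        if status is None:
            report["not_crawled"].append(url)
        elif 300 <= status < 400:
            report["redirecting"].append(url)
        elif status >= 400:
            report["broken"].append(url)
    # Pages the crawler found with a 200 status that the sitemap does not list.
    report["missing_from_sitemap"] = sorted(
        url for url, status in crawl_status.items()
        if status == 200 and url not in sitemap_urls
    )
    return report


sitemap = {"https://example.com/", "https://example.com/old", "https://example.com/gone"}
crawl = {
    "https://example.com/": 200,
    "https://example.com/old": 301,
    "https://example.com/gone": 404,
    "https://example.com/new": 200,
}
print(audit_sitemap(sitemap, crawl))
```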
But how do you find this information? There's no way that you can do this with just your brain and lists of pages. You need tools. You need help.
You can get this information from a bunch of different places. Here's a starter for ten for you.
Google Search Console
Your first stop. Why? Because it's free, and you probably already have access to it.
Search Console has a few features that will help you maximise the value of your XML Sitemaps. So let’s have a quick goosey look.
XML Sitemap errors in Google Search Console
Head over to the Sitemaps section of Google Search Console and - if you've submitted your XML Sitemaps - you should see something like this.
Clicking on the little bar graph icon to the right-hand side will give you more information about the errors that Google has found in any given sitemap.
And then you can scroll down to get more details on specific URLs and the errors.
Dig away, find errors and get them fixed. Made that sound simple, didn’t I?
There are a bunch of errors that Google will report to you. Some of the most common you’re likely to see are:
- URLs not accessible
- URLs not followed
- URL not allowed
- Empty sitemap
- Sitemap file size error: Your sitemap exceeds the maximum file size limit
- Invalid URL in sitemap index file: incomplete URL
- Invalid XML: too many tags
- HTTP error [specific code]
- Sitemap contains URLs that are blocked by robots.txt
If you need some more information then Google goes into depth about these and more on their sitemaps management support page.
Of course, Sitebulb can help you with auditing your XML Sitemaps. They aren't going to leave you in the lurch. Ever.
Cos they're lovely guys, and they want to make your auditing life easier, what they've done that other tools haven't is provide some context on the importance of the hints, alongside the potential errors it finds.
These Hints require immediate attention, as the issue may have a serious impact upon crawling, indexing or ranking.
These Hints are very important, and definitely warrant attention.
- Canonicalized URL in XML Sitemaps
- Disallowed URL in XML Sitemaps
- Forbidden (403) URL in XML Sitemaps
These Hints are worth investigating further and may warrant further attention depending on the type and quantity of URLs affected.
XML Sitemaps Insights
Insights are neither issues nor opportunities, and often don't require any action at all - they are brought to your attention as they may provide a useful avenue of investigation.
Handy huh? I think so. I know so.
If you want a full-on guide on setting up, running, and using Sitebulb's wonderful XML Sitemap auditing features, you can find one over here.
There are loads of XML validator tools available - but what are they? Essentially they exist to allow you to check that your XML document code is “Well Formed” and free of errors. Errors are not good. The W3C XML specification says that a “program should stop processing an XML document if it finds an error”.
Here are some nifty validator tools to help you along.
- W3Schools XML Validator
- XML-sitemaps.com XML Sitemap Validator
- MySitemapGenerator.com XML Sitemap Validator & Visual Viewer
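If you'd rather run a quick check yourself, a well-formedness test takes a few lines of Python. This is a minimal sketch using the standard library parser; it only confirms the XML parses and does not validate against the sitemap schema. `check_well_formed` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text: str):
    """Return (True, None) if the XML parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return False, str(exc)
    return True, None


GOOD = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/</loc></url>"
    "</urlset>"
)
# The <url> tag is never closed, so the parser stops with an error.
BAD = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/</loc>"
    "</urlset>"
)

print(check_well_formed(GOOD))
print(check_well_formed(BAD)[0])
```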
Some other handy tools
There's a bunch of other tools out there that can help you with your XML Sitemap auditing.
- Visual Sitemap Generator from XML Sitemap file or URL
- Seomator Sitemap Test
- SEOptimer XML Sitemap Checker
- Greenlane Chrome Extension for Sitemap Testing
HTML Sitemaps
Before we wrap it all up, I think we need to touch on HTML Sitemaps. The oft-forgotten sitemap nowadays. But one that is still important.
What is an HTML Sitemap?
An HTML Sitemap is an HTML page that you have on your site. It lists all the pages on your site that are important to users, and links directly to each of them.
Usually linked from the footer of a site, an HTML Sitemap also gives search engines another way to discover and index pages on your site.
How are HTML Sitemaps different from XML Sitemaps?
So there are two main differences. The XML sitemap is a specific type of file and is solely for search engines. The HTML sitemap is often formatted similarly to other HTML pages on your site and is for users rather than the bots.
There's a strong argument that HTML Sitemaps are good for accessibility, essentially helping users with disabilities navigate around sites. If you want to read more about accessibility and SEO, here's a couple of great places to start.
Do HTML Sitemaps help SEO?
Yes. Yes, they do.
I've mentioned already that they help Google find and index pages. In addition to this, they allow you to create additional internal links to your important pages.
On a site we worked on, we used the HTML Sitemap to help pass PageRank down to unique pages that we had created by using a system to turn some filtered pages into static pages that could - and do - rank.
Indeed that's a wrap. Hopefully, you’re now armed with all the information you need on what's required to plan and deploy an XML Sitemap. They're more important for SEO than people give them credit for.