
How to Find & Fix Orphan Pages

Published 17 March 2021

It's a sad little phrase - 'orphan web pages'. It conjures up the image of Oliver saying “Please sir, I want some more”...

Orphaned pages, by and large, are problematic for SEO - so you probably should feel a little sad when you find them.

But what are orphaned pages? How do they occur? And how can you fix them?

We’re not the first to create a guide on this, so we’re going to go one better. We’ll go beyond what articles on this topic typically cover, and dig into the points they do cover in more detail.

Specifically, we're going to look at why you need to apply critical thinking to handling orphaned pages.

We’ll also look at how you can pitch these fixes to clients, get their buy-in and most importantly, make sure they get fixed.

What are orphan web pages?

Orphaned pages are essentially any page on your site that doesn't have a link from ANY other page on your site.

Sometimes orphaned pages are created intentionally - more on that later. But more often than not, from what we’ve seen in hundreds of audits, orphaned pages are created accidentally.

That’s a problem because it means you're not tapping into the page’s full potential.

Why are orphaned pages bad for SEO?

So now you know what orphaned pages are. Next, you need to know why they’re bad for your SEO.

They might not get indexed

Google’s pretty good at finding pages even if they've been accidentally hidden and are, theoretically, inaccessible to search engines.

Search engines can and do pick up orphaned pages if they are listed in your XML sitemap.

They may even find them if another site has linked to them in the past.

Just because a page is orphaned now doesn't mean that’s always been the case.

But you can’t rely on search engines getting it right. We’re all for taking the odd risk at Sitebulb, but why take a chance with your site’s pages? Don't be leaving traffic on the table.

Even if they are found, link equity will be low, and they will struggle to rank

Even if search engines manage to find your orphaned pages, they can still be detrimental to your SEO. This is because no internal links point to them, so no link equity flows to them from the rest of your site.

Without that link equity, those pages won’t have a chance of ranking for anything even a tiny bit competitive - not even the keywords that Ahrefs mark as being really easy to rank for (they do think a lot of keywords are easy to rank for - trust us).

Those poor orphaned pages are lingering in the ether.

Your site hasn't passed any love on to them - you're essentially telling search engines that you don't give a hot damn about these pages.

But that's not the case, is it? Some of these pages hold a special place in your heart. You want them to rank.

There aren't many times that an SEO doesn't want a page to rank.

So why do pages become orphaned?

What causes orphan web pages?

One of the main causes of orphan pages is that they've simply been forgotten about. Changes are made to the website's navigation and/or structure, and these pages aren't included in the plans. In this way, the number of orphan pages often grows gradually over time.

However, there are some specific instances which can be particularly problematic in generating orphan pages. Here are a couple that we often come across:

Pages never added to the site structure

Usually, when a page is published it gets added to the structure of a site, and is linked to from various pages. A product page might belong to several categories, so it gets linked from those category pages and corresponding menus. A blog post might have different categories and tags, so it'll get links from each of those.

But sometimes, if a page isn't given a category, or added to the structure in some other way, it never gets any links, and isn't actually accessible from anywhere.

It sounds crazy, but it happens. A lot.

And when those pages aren't linked to internally, and don’t form part of your site architecture, search engines have no way of finding them by crawling your site.

Pages missed in a site migration

Many of us have assisted in website migrations. Typically they are pretty involved processes with lots of moving parts.

And one of the most common issues with site migrations gone wrong (which incidentally we have pitched to Channel 5 as a new TV show) is being left with orphaned pages.

Most migrations need someone who knows their technical SEO, so they don't bugger it up completely.

But...

Many sites migrate without any help from an SEO. Maybe the dev company has an in-house SEO and they say they can handle it. Don't get us wrong - some can. But some can’t.

Maybe the stakeholder decides they don't need to pay another agency thousands of pounds for work they don't understand, and struggle to see tangible benefits from. That's the perennial problem with “hidden” work. How do you show progress? How do you show its worth?

And that's when you see problems like this:

sistrix visibility index screenshot

And this:

sistrix visibility index screenshot

Images via Sistrix

That's a lot of information right there, so feel free to grab yourself a sandwich, and then we’ll look at finding orphaned pages.

How to find orphaned web pages

Now you’re armed with all the information you need to find those forgotten pages. Next, you need all the data.

All the data - and then you need a way to pull this data together to help you find those pages with no internal links.

There's no two ways about it. You need a lot of data. And one source is never enough.

You need all known URLs. URLs known to Google. URLs known to Bing. URLs that your own site knows you have.

When you have all that data you’re also going to need some kind of crawler to help you manage that data, to help you find those pages.

Let's take a look at the data you need, where you get it from and how to combine that with crawler data to get you to where you need to be.

Collating orphan page source data

We've already established that by definition, orphan pages will have no links from within your own website. So to find them, we need to look to external data sources.

You’re going to need to dig deep - and believe us when we say only using one source of data is never a good plan. You need to collate from multiple sources.

Google Search Console

Let's go straight to the source. Search Console contains a couple of metrics to help you identify potential orphaned pages.

  • Pages that have had no clicks
  • Pages that have no or low impression numbers

You can now get 16 months’ worth of data from GSC, so you may as well go for that.
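If you export the Pages report from Search Console's Performance section as a CSV, a few lines of Python can flag both of those signals. This is a minimal sketch, assuming the export uses `Top pages`, `Clicks` and `Impressions` as column headers (check your own file). The threshold of 10 impressions is an arbitrary example value, not a Google figure, and `low_visibility_pages` is a hypothetical helper name.

```python
import csv
import io

# Assumed column headers for a Search Console "Pages" CSV export - check your own file.
URL_COL, CLICKS_COL, IMPRESSIONS_COL = "Top pages", "Clicks", "Impressions"


def low_visibility_pages(csv_text, max_impressions=10):
    """Return URLs that had no clicks, or no/low impressions, over the exported period."""
    flagged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Strip thousands separators in case the export contains them.
        clicks = int(row[CLICKS_COL].replace(",", ""))
        impressions = int(row[IMPRESSIONS_COL].replace(",", ""))
        if clicks == 0 or impressions <= max_impressions:
            flagged.append(row[URL_COL])
    return flagged


sample = """Top pages,Clicks,Impressions,CTR,Position
https://example.com/,120,4000,3%,4.2
https://example.com/old-offer,0,3,0%,58
https://example.com/guide,0,250,0%,22
"""
print(low_visibility_pages(sample))
# ['https://example.com/old-offer', 'https://example.com/guide']
```

These are candidates, not confirmed orphans: a page can get no clicks and still be perfectly well linked internally.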

Feel free to run them through your favourite crawler to eliminate pages that are 404s or 410s, and no longer exist.
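If you'd rather script that check, here's a minimal sketch using only Python's standard library. `fetch_status` and `still_live` are hypothetical helper names. The sketch sends HEAD requests, which some servers handle differently from GET, so treat odd results as a prompt to check by hand. `urlopen` follows redirects, so a redirected URL reports the status of wherever it ends up.

```python
import urllib.error
import urllib.request

GONE = {404, 410}


def fetch_status(url, timeout=10):
    """Return the final HTTP status for `url` (urlopen follows redirects).
    Connection failures (URLError) are not caught here."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses are raised as HTTPError; the status code is on .code
        return error.code


def still_live(urls, get_status=fetch_status):
    """Keep URLs that don't return 404 or 410; the status lookup is swappable for testing."""
    return [url for url in urls if get_status(url) not in GONE]
```

For example, `still_live(candidate_urls)` would return the URLs worth carrying forward. If you already have status codes from a crawl export, pass your own `get_status` function instead of hitting the site again.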

Google Analytics

Google again. You can also grab data from Google Analytics.

Some things that you need to be looking out for when digging through your Analytics data are:

  • Pages that have received no organic traffic at all
  • Pages that are getting visits from other sources but not much organic traffic

Again, feel free to check that these pages are still returning a 200 status.
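Both of those Analytics checks are simple comparisons once you have a per-page export. The sketch below assumes you've already pulled each landing page's organic sessions and total sessions into `(page, organic, total)` tuples. `organic_gaps` is a hypothetical helper, the rows are made-up examples, and the 10% cut-off for "not much organic traffic" is an arbitrary value you should tune to the site.

```python
def organic_gaps(rows, max_organic_share=0.10):
    """Split pages into those with no organic traffic and those where organic is a small share."""
    no_organic, low_organic = [], []
    for page, organic, total in rows:
        if organic == 0:
            no_organic.append(page)
        elif total > 0 and organic / total <= max_organic_share:
            low_organic.append(page)
    return no_organic, low_organic


# Made-up example rows: (page, organic sessions, total sessions)
rows = [
    ("/pricing", 500, 900),      # healthy organic share
    ("/spring-sale", 0, 1200),   # visits, but none from organic search
    ("/webinar", 15, 600),       # 2.5% organic
]
print(organic_gaps(rows))
# (['/spring-sale'], ['/webinar'])
```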

Your Own XML Sitemaps

It's not very often you come across a site that doesn't have an XML sitemap, but it does happen.

Head over to your XML sitemap and download the data. You won’t find orphaned pages from this data alone, but you may need it later down the line.
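When you do need it, pulling the URLs out of a sitemap is straightforward. This sketch uses Python's built-in `xml.etree.ElementTree` and the sitemaps.org namespace; `sitemap_urls` is a hypothetical helper. If the file is a sitemap index, the `<loc>` values it returns are child sitemaps, which you'd fetch and parse in turn.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> value in a sitemap (or the child sitemap URLs in a sitemap index)."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iterfind(".//sm:loc", NS) if loc.text]


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/spring-sale</loc></url>
</urlset>"""
print(sitemap_urls(sample))
# ['https://example.com/', 'https://example.com/spring-sale']
```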

Bing Webmaster Tools

Don't forget about Bing - it still exists, you know, and their Webmaster Tools is pretty good.

Bring On The Crawler

Once you've got your data sources, you'll need to use a tool to combine all the different sources, and crawl your website to identify orphan pages.

The theory behind it is this: you need to compare the URLs found from your external data sources, and the URLs found from crawling your own website. Any URLs which can be found via the external sources, but not your website crawl, are orphan pages.
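In code, that comparison is a set difference. This is a minimal sketch rather than how any particular crawler does it: `find_orphans` is a hypothetical helper that takes the URL lists you collated (keyed by source name) plus the URLs your crawl reached, and notes which sources reported each orphan. In practice you'd normalise URLs first, otherwise http/https or trailing-slash variants show up as false orphans.

```python
def find_orphans(external_sources, crawled_urls):
    """Return URLs known to any external source but never reached by the crawl,
    mapped to the names of the sources that reported them."""
    known = set().union(*external_sources.values())
    crawled = set(crawled_urls)
    return {
        url: sorted(name for name, urls in external_sources.items() if url in urls)
        for url in sorted(known - crawled)
    }


sources = {
    "gsc": {"https://example.com/", "https://example.com/old-offer"},
    "sitemap": {"https://example.com/", "https://example.com/old-offer",
                "https://example.com/spring-sale"},
}
crawl = ["https://example.com/", "https://example.com/blog/"]
print(find_orphans(sources, crawl))
# {'https://example.com/old-offer': ['gsc', 'sitemap'], 'https://example.com/spring-sale': ['sitemap']}
```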

Luckily, there are tools that can do the heavy lifting for you.

Sitebulb

If you’re going to be using Sitebulb, then we have some jolly good news for you. Most of the data you need to gather can be automatically grabbed by Sitebulb to help you find orphaned pages.

This includes:

  • Google Search Console
  • Google Analytics
  • XML Sitemaps
  • Custom lists

If you're looking for a detailed, step-by-step guide - have a look at this piece from Patrick on finding orphan pages with Sitebulb.

Screaming Frog

Some of you may already be Screaming Frog users rather than Sitebulb users, so it's worth mentioning that you can streamline the process of finding orphaned pages in Screaming Frog by adding XML sitemaps and connecting it to Google Analytics and Search Console.

It's not as pretty though ;)

If you want a web-based crawler to help you identify orphan pages, then you can check out Botify, DeepCrawl or Oncrawl.

Just mention that Wayne and the Sitebulb guys sent you and that they are a lovely bunch of people.

How to Fix Orphaned Pages

You’ve got the data. You know there are issues. How do you go about fixing them?

Well. This is SEO. So...

It depends.

All on what those pages contain. Let's take a look at a few scenarios.

Duplicate or thin content pages - there's a good chance many of your orphaned pages are thin and/or duplicate. If they’re not needed, you might just want to get rid of them: give them a 404 or 410 status code. If they are thin but useful for users, you might want to add a canonical tag pointing to the main version of the page. Just make sure they are somehow accessible to the user.

Important pages - in this instance, we might be looking at product pages on an ecommerce site. If you need these to be indexed and want them to rank, you need to include them in the navigation. Of course, this is not always feasible (for budgetary reasons, or because of a massive dev lead time), so you need to get inventive.

An interim solution might be adding them to the HTML sitemap en masse so they can be indexed and get the benefit of passed down link equity.
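If you go down that route, the HTML itself can be generated from your list of orphaned URLs. A minimal sketch, assuming you have a title for each page; `html_sitemap` is a hypothetical helper, and how you publish the output depends entirely on your CMS. With a very long list, you may want to split it across several sitemap pages.

```python
from html import escape


def html_sitemap(pages):
    """Render (url, title) pairs as an HTML list of plain, crawlable <a href> links."""
    items = "\n".join(
        f'  <li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
        for url, title in pages
    )
    return f"<ul>\n{items}\n</ul>"


print(html_sitemap([
    ("https://example.com/p/blue-kettle", "Blue Kettle"),
    ("https://example.com/p/tea-cake-set", "Tea & Cake Set"),
]))
```

Titles and URLs are HTML-escaped, so "Tea & Cake Set" comes out as `Tea &amp; Cake Set` in the markup.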

Thin content with backlinks - so there's an issue right there, but what do you do now? You may want to redirect those pages to another relevant page. This fixes the fact that they are orphaned, and also recovers (or should recover) the inbound link equity that would otherwise be lost.

There are many scenarios here for which there isn't a one-size-fits-all solution.

You need to think. You need to engage your brain (more on that a little further down).

When is an orphan page not an orphan page?

Oh yeah. Hold your horses - we’re not entirely done yet. Sometimes pages are orphaned intentionally. Sometimes it happens unexpectedly. What do we do about those?

This isn't an exhaustive list but here are a few to look out for.

Intentional orphan pages

Intentional orphan pages are pages that have been created as orphans, well, intentionally... By and large they should be noindexed - but that doesn't always happen!

PPC Landing Pages - this is probably the most common reason for intentionally creating an orphaned URL. Not every existing page is a suitable destination for PPC traffic, so sites often create specific landing pages. As these are often similar to other pages, they shouldn’t be indexable. There’s also no reason for them to be linked to internally.
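Because intentional orphans like these should be noindexed, it's worth checking that they actually are before you rule them out. Here's a simplified sketch using Python's built-in `html.parser`: `is_noindexed` is a hypothetical helper that looks for `noindex` in a `robots` or `googlebot` meta tag, or in an `X-Robots-Tag` header value you pass in. It won't catch every edge case, such as directives injected by JavaScript.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if tag == "meta" and name in {"robots", "googlebot"}:
            content = attributes.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]


def is_noindexed(html, x_robots_tag=""):
    """True if noindex appears in a robots meta tag or in an X-Robots-Tag header value."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    # Header values may be prefixed with a user agent, e.g. "googlebot: noindex".
    header = [d.split(":")[-1].strip().lower() for d in x_robots_tag.split(",")]
    return "noindex" in parser.directives or "noindex" in header


ppc_page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(is_noindexed(ppc_page))                                          # True
print(is_noindexed("<title>Spring offer</title>"))                      # False
print(is_noindexed("<p>Offer</p>", x_robots_tag="googlebot: noindex"))  # True
```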

Test Pages - all sites, of all types, are likely to have test pages. They could be left over from dev tests. Or more likely, clients playing around with content and not realising that they are pushing pages live. This is usually a quick fix. Set them to 404 or 410.

Duplicate URLs

This is a little more contentious as these tend to fall under the remit of duplicate pages. It's worth mentioning them though as they often show up when you’re looking for orphaned pages.

However...

You shouldn’t treat them as orphaned pages in your recommendations as this could cause further problems. Never just pump out a spreadsheet without analysing its contents. Take:

  • Non-Canonical https/http or www/non-www
  • Trailing Slash Issues
  • URL syntax errors

Are these duplicate page issues? Or orphan page issues?

Well… both.

When you pull your data sources together, there could be a load of pages like this.
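One way to separate these from genuine orphans is to group URLs that only differ by protocol, `www` or a trailing slash, so each group can be treated as one duplicate issue rather than several orphans. This is a simplified sketch: `normalise` and `group_variants` are hypothetical helpers, and the https/non-www key is only a comparison key (it doesn't tell you which version your site should use). It doesn't attempt to catch syntax errors, and on some sites a trailing slash genuinely points to a different page.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def normalise(url):
    """Comparison key that ignores protocol, a leading www., trailing slashes and #fragments."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(("https", host, path, parts.query, ""))


def group_variants(urls):
    """Return only the keys that more than one raw URL collapses into."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalise(url)].append(url)
    return {key: variants for key, variants in groups.items() if len(variants) > 1}


urls = [
    "http://example.com/shoes",
    "https://www.example.com/shoes/",
    "https://example.com/shoes",
    "https://example.com/bags",
]
print(group_variants(urls))
# {'https://example.com/shoes': ['http://example.com/shoes', 'https://www.example.com/shoes/', 'https://example.com/shoes']}
```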

Unfortunately, many technical audits are delivered in a spreadsheet. This usually happens when the SEO responsible has used their auditing tool of choice, and it spits out all the pages that it can see have no incoming links - i.e. orphaned pages.

Sometimes these issues are buried in that data. Instead of analysing it, SEOs often just export it, bang it into a spreadsheet, and hand it over to the dev team.

But hold your horses. Those kinds of audits are very rarely of any value to the client. They can often impact the client financially, as well.

When it comes to this kind of thing, you need to apply critical thinking.

Critical thinking is vital.

Applying critical thinking

We’re not calling anybody out here, but most of us have seen other agencies’ audits, right? You take on an account from another agency, and the client shares the other agency’s work. It happens. The client doesn't want to be paying for the same work twice.

And you've seen them, right? You know those audits. They’re presented in Excel or Google Sheets. They have the issue, and why it's important.

So far, so good. Nothing wrong with that.

Then they have the list of URLs, and you just think WTF?

They've just copied and pasted a list of URLs.

“How much did you pay for this audit again, lovely new client?”

Odds are there’s been no critical thinking - and critical thinking is essential. You don't want to make suggestions that are just flat out wrong. And you don't want your client to be paying for work that doesn't need doing, or won't move the needle enough.

Ask questions. Use your eyes.

Here are some of the things you need to consider.

Is this page important?

Herein lies one of the problems of exporting directly from crawlers. Even if a fix could be applied across all pages, it doesn't mean it should be. It's always a good idea to check in with the devs and find out how they might go about fixing them.

You then need to see how important these pages are. Think:

  • Is there search volume?
  • How competitive are the SERPs for some of these keywords?
  • Do they have the potential to drive revenue? There's no point fixing pages that won’t deliver.

Does it currently rank for anything? Can that be improved? Does it have any backlinks?

Not all orphaned pages are equal, especially on bigger sites. Let's say Google finds a page in your XML sitemap, even though the page doesn't have any internal links. Google has it in the index and can rank it.

Pull ranking data for all of the important URLs.

Pull backlink data for all of the important URLs.

If a URL has backlinks, you want it in your site architecture.
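If you've pulled that data into one place, a simple ordering helps you decide where to look first. This is an illustrative sketch, not a scoring model: `OrphanPage` and `shortlist` are hypothetical, and the rule (pages with backlinks first, then pages with meaningful search volume) is our reading of the questions above. It doesn't replace checking how competitive the SERPs are or whether a page can drive revenue, and the `min_volume` threshold of 50 is an arbitrary example.

```python
from dataclasses import dataclass


@dataclass
class OrphanPage:
    url: str
    referring_domains: int      # from your backlink tool
    monthly_search_volume: int  # for the page's main target keyword


def shortlist(pages, min_volume=50):
    """Order pages for review: backlinked pages first, then pages with search demand.
    Pages matching neither rule are left out for manual review."""
    linked = sorted((p for p in pages if p.referring_domains > 0),
                    key=lambda p: p.referring_domains, reverse=True)
    in_demand = sorted((p for p in pages
                        if p.referring_domains == 0 and p.monthly_search_volume >= min_volume),
                       key=lambda p: p.monthly_search_volume, reverse=True)
    return [p.url for p in linked + in_demand]


pages = [
    OrphanPage("/old-guide", 12, 30),
    OrphanPage("/blue-kettle", 0, 900),
    OrphanPage("/test-page", 0, 0),
    OrphanPage("/spring-sale", 3, 400),
]
print(shortlist(pages))
# ['/old-guide', '/spring-sale', '/blue-kettle']
```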

Are there duplicates or near duplicates?

If there are, ask yourself whether this needs fixing.

Do they need to be, erm, de-orphaned at all? Adopted? Did we just coin a phrase?

It won't catch on.

How to pitch orphan page fixes to clients

So you’ve identified potentially troublesome orphan pages, you’ve applied critical thinking, and you want to take your findings to the client.

You want to fix them - but you need sign-off. So how do you go about that? How can you increase your chances of getting the green light?

With the right pitch.

Back up your data with advice from Google

You’re just an SEO, right? What do you know?

Despite clients employing us to work on their sites and improve their rankings and conversions, you're always going to run into clients who can't or won't heed your advice.

It can be demoralising - but don't let it get you down.

Over the years we’ve learnt to have a thick skin (have you seen our release notes - we’ve had some stick for them over the years). But you can preempt issues if you can get clients listening.

Give them words from the horse’s mouth. Go to Google. Quote Google. Show the client that Google says this is important.

As we like to help our fellow SEOs, here are a couple of quotes you can pop into your next pitch for fixing orphaned pages. You’re welcome.

Orphaned pages may be noindexed.
John Mueller, Google Webmaster Hangout, 2016

john mueller google webmaster hangout screenshot

“If you have pages without internal links, then Google Search will assume they aren't critical or important for the site. Google won't give them as much weight in Search.”
John Mueller, SEO Office Hours, 2020

Explain the importance of fixing orphaned pages

You're the SEO - you know the importance of fixing them. You may have data to back this up from previous clients. If you have it, use it.

But you're not talking to other SEOs. You're talking to CEOs, or marketing department heads that don't have knowledge of technical SEO.

So set out the importance of the issue - but don't linger on it too long. You’ll start to lose your audience.

  • Keep it short and sweet
  • Show why it's important in as few words as possible
  • Compare data to their competitors - the competitors they care about
  • Use language that won’t alienate them - don't use jargon

This is where the next part comes in: how you can really sell it to those who don't care about SEO or technical jargon.

You can use analogies.

Use orphan page analogies

There's a good chance your client doesn't care that much about SEO. Even if they do, they might not be able to grasp the concept that easily.

That's where you need to bring in analogies. Luckily at Sitebulb we’re generous folk and will share one with you that you can use when explaining orphaned pages to your clients.

You set the scene.

You run a grocery store. It's a beautiful little store. You've organised everything well. You even have all your fruit and veg colour coded as you walk through the store.

Over there you have your juicy oranges and the bendiest of bananas. Everything is great. The customers arrive, they go to the fruit section, they can see the oranges, and check that they are ripe. They look along and there's those bendy bananas. They look good.

But you're a decent grocer.

You have lots of different fruits.

You've got passion fruits, some juicy mangos, a lovely selection of papaya - and maybe even the odd durian.

Or maybe not...

The thing is you've forgotten to put them on display. They’re still sitting in their boxes, behind the fruit counter.

The customer can’t see them. They can't buy them. They're hidden from view.

And that's what orphaned pages are like. They’re hidden from your customer. Potentially an oversight similar to our friendly grocer who got distracted and forgot all about them.

Ok so maybe we took this analogy a little far. But we have used it. And it does work.

Don't overwhelm clients with too much data (keep it simple)

They don't want it. They don't need to understand the nuance.

Show them how much of their site is orphaned, but don't play the blame game (it was the devs guv).

If it was from a botched migration, show how much traffic and revenue those pages used to receive.
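Those headline figures are simple arithmetic, and it helps to phrase them the way a client will read them. A minimal sketch with made-up example inputs; `orphan_summary` is a hypothetical helper, and the traffic and revenue figures should come from the client's own analytics for the period before the migration.

```python
def orphan_summary(total_pages, orphan_pages, past_sessions, past_revenue):
    """One plain-English sentence on how much of the site is orphaned and what it used to earn."""
    share = orphan_pages / total_pages if total_pages else 0.0
    return (
        f"{share:.0%} of the site ({orphan_pages:,} of {total_pages:,} pages) has no internal links. "
        f"Before the migration, those pages brought in {past_sessions:,} organic sessions "
        f"and £{past_revenue:,.0f} in revenue."
    )


# Made-up figures for illustration only.
print(orphan_summary(total_pages=12000, orphan_pages=1800, past_sessions=45000, past_revenue=96500))
```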

And never ever be scared to highlight competitors that don't have the same problems.

That is always a great way to get sign-off.

Conclusion

And that’s orphaned pages in a nutshell (yes we came back to the grocers).

To push the Oliver reference that we so subtly dropped in at the start of this guide a little further, you probably don't “want any more” when it comes to orphaned pages. Most of these pages you need to nurture back to life. Link to them... Ensure that they are indexed by the search engines, and use them to help you grow your organic traffic.

oliver twist - no more orphan pages

Please, sir, I want some more.

― Charles Dickens, Oliver Twist

Wayne Barker

Unrepentant long-time SEO, consultant at Boom Online Marketing, and guest writer for Sitebulb.

Similarly sweary as Patrick, but does a much better job of hiding it (usually).