Technically duplicate URLs
This means that the URL in question is technically identical to at least one other indexable URL. For example, the URLs might differ only in letter case, or contain the same query string parameters and values in a different order.
Why is this important?
If this sort of duplication occurs, you have a relatively serious issue, whereby identical URLs are being generated and are accessible to search engine crawlers.
If this results in large scale duplicate content issues on the site, you could trip quality algorithms like Google's Panda, which can depress organic search traffic to the site as a whole.
What does the Hint check?
This Hint will trigger for any internal, indexable URL which has a technically identical URL to at least one other internal, indexable URL.
Note: since the duplicate content check is only for indexable URLs, URLs which are canonicalized are not included in the analysis - as the canonical tag 'handles' the duplicate issue.
Examples that trigger this Hint
The Hint will trigger for any internal, indexable URL that has a technically identical URL, which is also internal and indexable.
For example, it would trigger for two URLs that have the same query string parameters in a different order:
- https://example.com/page/?a=1&b=2
- https://example.com/page/?b=2&a=1
Similarly, it would trigger for two URLs that differ only in case:
- https://example.com/page/
- https://example.com/Page/
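To make the two cases above concrete, here is a minimal Python sketch of how technically duplicate URLs could be grouped in your own scripts. It is not how Sitebulb implements the check; the function names `duplicate_key` and `group_technical_duplicates` are hypothetical, and the rule it applies (lowercase the host and path, sort the query parameters) is an assumption based on the examples above.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def duplicate_key(url: str) -> str:
    """Return a comparison key for a URL (illustrative assumption, not Sitebulb's logic).

    The key lowercases the host and path and sorts the query parameters,
    so URLs that differ only in case or parameter order share one key.
    """
    parts = urlsplit(url)
    # Sort (name, value) pairs so parameter order no longer matters.
    pairs = sorted(parse_qsl(parts.query, keep_blank_values=True))
    query = urlencode(pairs)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path.lower(), query, "")
    )


def group_technical_duplicates(urls):
    """Group URLs by comparison key and keep only groups with 2+ members."""
    groups = defaultdict(list)
    for url in urls:
        groups[duplicate_key(url)].append(url)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    crawled = [
        "https://example.com/page/?a=1&b=2",
        "https://example.com/page/?b=2&a=1",
        "https://example.com/page/",
        "https://example.com/Page/",
        "https://example.com/other/",
    ]
    for group in group_technical_duplicates(crawled):
        print(group)
```

Run against a list of crawled URLs, this reports each set of URLs that collapse to the same key, which is a quick way to sanity-check the scale of the problem outside the tool.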
How do you resolve this issue?
As with all duplicate content issues, the seriousness of the issue largely depends on the scale - in general, if only a few pages are affected, it is probably not affecting the site to any meaningful degree. If there are thousands of duplicates, however, the scale might be large enough to trigger a quality algorithm like Panda.
The problem with this issue in particular is that it could quickly get out of hand if duplicate URLs are accessible to search engines at scale, so it is certainly worth addressing.
Depending on the type of problem, you'd need to deal with it differently:
- If duplicate query strings are being generated, work with a developer to understand why this is happening, and ideally fix the script that generates them. Fixing the root problem is a better solution than addressing this with redirects or canonicals, which can have a knock-on effect on crawl budget.
- If URLs exist that differ only in case, the best way to handle this is to remove any links to URLs with upper case characters (so they will not get crawled), then set up a 301 redirect rule as a fallback. If you can't set up a 301 rule for some reason, set a canonical tag instead.
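The two fixes above could be sketched roughly as follows in Python. This is an illustration under stated assumptions rather than a recommended implementation: `build_url` and `lowercase_redirect_target` are hypothetical helper names, and the sketch assumes that the lowercase form of a path is the preferred one and that query values should be left untouched.

```python
from typing import Mapping, Optional
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_url(base: str, params: Mapping[str, str]) -> str:
    """Build a link whose query parameters are always in sorted order.

    Generating links through one helper like this addresses the root cause:
    the same parameters can never be emitted in two different orders.
    """
    if not params:
        return base
    return f"{base}?{urlencode(sorted(params.items()))}"


def lowercase_redirect_target(url: str) -> Optional[str]:
    """Return the URL to 301 to, or None if the path is already lowercase.

    Only the path is lowercased here (an assumption); the query string and
    fragment are passed through unchanged.
    """
    parts = urlsplit(url)
    lowered = parts.path.lower()
    if lowered == parts.path:
        return None  # Nothing to redirect.
    return urlunsplit(
        (parts.scheme, parts.netloc, lowered, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(build_url("https://example.com/page/", {"b": "2", "a": "1"}))
    print(lowercase_redirect_target("https://example.com/Page/"))
```

In practice the redirect rule would normally live in the web server or application framework configuration; the function only shows the decision the rule needs to make.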
How do you get more data from Sitebulb?
Within Sitebulb you can either dig into a specific URL and look at the duplicate content that way, or you can export all the duplicate content and sort it in Excel.
To find details of specific URLs with technical duplicates, click on the blue URL Details button from the URL List.
The URL Details tab will slide across, and you then need to navigate to Duplicate Content -> URLs, and you'll see all the duplicate URLs underneath.
To export ALL the duplicate URL data, click on the green Export Hint Data button in the top right-hand corner.
This will give you a nicely formatted Excel sheet showing you all the technically duplicate URLs, allowing you to easily sort and pick through the data.