How to audit duplicate content

There are many reasons to keep an eye on duplicate content on your website: it can lead to a poor user experience, keyword cannibalization, and a negative impact on SEO performance. Sitebulb allows you to find duplicate content at scale, and the dedicated ‘Duplicate Content’ report provides useful Hints to help you identify potential issues.

This guide explains how to set up and navigate the Duplicate Content report, to identify duplicate and near-duplicate content that may require attention.

Setting up the Duplicate Content audit

The ‘Check Duplicate Content’ setting is enabled by default when you set up a new audit. You can find this under the advanced Search Engine Optimization settings in the Audit Data setup.

Here you will also find the ‘Check Similar Content’ setting. When it is enabled, Sitebulb also reports on near-duplicate content, giving you data about URLs whose content is highly similar but not identical.

Where to find duplicate content

You’ll find your Duplicate Content report in the left-hand menu in Sitebulb.

Finding the duplicate content report

You will also find Duplicate Content data under the Content tab within the URL Explorer.

Navigating to the duplicate content data tables from the URL Explorer

What does the Duplicate Content audit check for?

The audit checks for duplication in the following page elements:

  • HTML Content: URLs whose HTML content is substantially similar to that of at least one other indexable URL.
  • Titles: URLs that have the exact same page title as at least one other indexable URL. 
  • Meta Descriptions: URLs that have the exact same meta description as at least one other indexable URL.
  • H1 Headers: URLs that have the exact same h1 heading as at least one other indexable URL.
  • Technically Duplicate URLs (duplicate path and query): URLs that are technically identical to at least one other indexable URL, for example URLs that differ only in letter case or in the order of their query string parameters.
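The exact-match checks above boil down to grouping indexable URLs by a field value and flagging any group with more than one member. The Python sketch below illustrates that idea on a small, hypothetical list of page records (the `pages` data and the field names `url`, `title`, `h1`, and `indexable` are assumptions); it is not Sitebulb's implementation. The URL normalization used for technically duplicate URLs (lowercasing scheme, host, and path, and sorting query parameters) is also an assumption, chosen to mirror the examples in the list.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

# Hypothetical page records, e.g. rows from a crawl export.
pages = [
    {"url": "https://example.com/shoes?color=red&size=9", "title": "Shoes", "h1": "Shoes", "indexable": True},
    {"url": "https://example.com/Shoes?size=9&color=red", "title": "Shoes", "h1": "Red shoes", "indexable": True},
    {"url": "https://example.com/boots", "title": "Boots", "h1": "Boots", "indexable": True},
    {"url": "https://example.com/shoes-old", "title": "Shoes", "h1": "Old shoes", "indexable": False},
]

def normalize_url(url: str) -> str:
    """Assumed normalization: lowercase scheme, host and path; sort query parameters; drop fragment."""
    parts = urlsplit(url)
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path.lower(), query, ""))

def find_exact_duplicates(records, key):
    """Group indexable URLs by key(record); keep only groups with two or more URLs."""
    groups = defaultdict(list)
    for record in records:
        if not record.get("indexable", True):
            continue  # only indexable URLs are compared
        groups[key(record)].append(record["url"])
    return {value: urls for value, urls in groups.items() if len(urls) > 1}

dup_titles = find_exact_duplicates(pages, lambda p: p["title"])
dup_h1s = find_exact_duplicates(pages, lambda p: p["h1"])
tech_dups = find_exact_duplicates(pages, lambda p: normalize_url(p["url"]))

print(dup_titles)
print(dup_h1s)
print(tech_dups)
```

Meta descriptions would be handled the same way, with `key=lambda p: p["meta_description"]` on records that carry that (hypothetical) field.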

The report breaks this information down into the distribution of duplicate vs. unique content in each of the categories above, and also by crawl depth, page path, and HTML template. This helps you identify patterns, such as subfolders that contain duplicate pages.
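To illustrate the kind of breakdown a "by path" view gives you, here is a minimal Python sketch that counts duplicate vs. total URLs per top-level folder. The input rows and the choice of the first path segment as the "folder" are assumptions for illustration; Sitebulb's own grouping rules may differ.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical audit rows: (URL, whether it was flagged as duplicate).
rows = [
    ("https://example.com/blog/post-a", True),
    ("https://example.com/blog/post-b", True),
    ("https://example.com/blog/post-c", False),
    ("https://example.com/products/widget", False),
    ("https://example.com/", False),
]

def first_folder(url: str) -> str:
    """Return the first path segment as '/segment/', or '/' for the site root."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return f"/{segments[0]}/" if segments else "/"

def duplicate_share_by_folder(rows):
    """Map each top-level folder to (duplicate count, total count)."""
    totals, duplicates = Counter(), Counter()
    for url, is_duplicate in rows:
        folder = first_folder(url)
        totals[folder] += 1
        if is_duplicate:
            duplicates[folder] += 1
    return {folder: (duplicates[folder], totals[folder]) for folder in totals}

for folder, (dups, total) in sorted(duplicate_share_by_folder(rows).items()):
    print(f"{folder}: {dups}/{total} duplicate ({dups / total:.0%})")
```

A folder with a high duplicate share (here, `/blog/`) is a good candidate for checking whether a shared template or CMS setting is producing the duplication.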

Duplicate by Path table in the Sitebulb duplicate content report

Similarity Report

As well as exact duplicates, Sitebulb can analyze HTML content for similarity. To get this data, you need to enable the ‘Check Similar Content’ setting in the advanced Search Engine Optimization settings.

Accessing the similarity report in Sitebulb

Within the report, you’ll find a percentage similarity score for any URL that shares a significant amount of HTML content with at least one other URL.
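Sitebulb's exact similarity algorithm is not described here, so the sketch below uses a common, swapped-in technique to show how a percentage similarity score can be produced: split each page's text into overlapping word n-grams ("shingles") and compute the Jaccard similarity of the two sets. It assumes the page text has already been extracted from the HTML; the shingle size of 3 is an arbitrary, illustrative choice.

```python
import re

def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word n-grams (shingles) in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def similarity_percent(a: str, b: str, size: int = 3) -> float:
    """Jaccard similarity of the two shingle sets, expressed as a percentage."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 100.0  # two empty pages are treated as identical
    return 100.0 * len(sa & sb) / len(sa | sb)

page_a = "Red running shoes with a cushioned sole. Free delivery on all orders."
page_b = "Blue running shoes with a cushioned sole. Free delivery on all orders."
print(f"{similarity_percent(page_a, page_b):.1f}%")
```

Changing a single word only affects the shingles that contain it, so two pages that differ in one product name still score high, which is exactly the near-duplicate case a similarity report is meant to surface.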

Similarity report - URLs with similar content data table

How to find Duplicate Content issues

The Duplicate Content Hints categorize the duplication issues found on your site and tag each one by priority. From here, you can open the list of URLs for each Hint and export the data to start optimizing where needed.

Duplicate Content Hints

Finding Duplicates for a Specific URL

You can delve deeper into your audit by investigating duplicate data for specific URLs.

When viewing URL details, open the Duplicate Content tab in the left-hand menu to find the URLs whose content duplicates that of the analyzed page.
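Looking up the duplicates of one URL is cheap once every page has a content fingerprint. The minimal Python sketch below hashes whitespace-normalized content with SHA-256 and indexes URLs by that hash; the `content_by_url` data and the helper names are hypothetical, and this approach only finds exact duplicates, not near-duplicates.

```python
import hashlib
from collections import defaultdict

# Hypothetical mapping of URL -> extracted page content.
content_by_url = {
    "https://example.com/a": "Same body text",
    "https://example.com/b": "Same   body text",
    "https://example.com/c": "Different body text",
}

def fingerprint(content: str) -> str:
    """SHA-256 hex digest of the content with runs of whitespace collapsed."""
    normalized = " ".join(content.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def build_index(pages: dict[str, str]) -> dict[str, list[str]]:
    """Map each content fingerprint to the URLs that share it."""
    index = defaultdict(list)
    for url, content in pages.items():
        index[fingerprint(content)].append(url)
    return index

def duplicates_of(url: str, pages: dict[str, str], index: dict[str, list[str]]) -> list[str]:
    """Return the other URLs whose content fingerprint matches the given URL's."""
    return [other for other in index.get(fingerprint(pages[url]), []) if other != url]

index = build_index(content_by_url)
print(duplicates_of("https://example.com/a", content_by_url, index))
```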

URL Details - Duplicate Content analysis tab

Addressing Duplicate Content

As with all other areas of technical SEO, duplicate content can be more or less of an issue for your particular website, depending on the scale, types of pages, and cause of the duplication.

In some cases, duplicate and near-duplicate content may be unavoidable. In others, knowing about your duplicate content may prompt best-practice fixes such as optimizing metadata, canonicalizing legitimate variants of a page to a single preferred URL, or consolidating similar resources into one content-rich page. Read more about duplicate content and SEO in this guide.
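As an illustration of the canonicalization fix, the Python sketch below picks one preferred URL for each group of duplicates and builds the `<link rel="canonical">` element each URL in the group would carry. The selection rule (prefer URLs without a query string, then the shortest, then alphabetical order) is a hypothetical choice for demonstration only; in practice the preferred URL is an editorial decision.

```python
from html import escape
from urllib.parse import urlsplit

def choose_canonical(urls: list[str]) -> str:
    """Hypothetical rule: prefer no query string, then shortest URL, then alphabetical order."""
    return min(urls, key=lambda u: (bool(urlsplit(u).query), len(u), u))

def canonical_tags(duplicate_groups: list[list[str]]) -> dict[str, str]:
    """Map every URL in each group to the canonical <link> element it should carry."""
    tags = {}
    for group in duplicate_groups:
        target = choose_canonical(group)
        for url in group:
            # escape() turns characters such as & into HTML entities for the attribute value
            tags[url] = f'<link rel="canonical" href="{escape(target, quote=True)}">'
    return tags

groups = [[
    "https://example.com/shoes?sort=price",
    "https://example.com/shoes",
    "https://example.com/shoes?utm_source=newsletter",
]]
for url, tag in canonical_tags(groups).items():
    print(url, "->", tag)
```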