Comparing response and rendered HTML

As search engine crawlers (and in particular Google) continue to integrate rendering into their crawling and indexing process, as SEOs we need to pay increasing attention to the effects of rendering on our web pages.

Traditionally, crawlers - both search engine crawlers and third-party crawlers like Sitebulb - would utilise the response HTML to extract links and content. These days, however, it is more complicated, as search engines also render web pages (and fire JavaScript) before indexing of a page is completed.

This means that if you only ever crawl your site using a traditional 'source' HTML method, you may not be seeing the complete picture. Sitebulb has offered a JavaScript rendering option - our Chrome Crawler - since launch, and we have recently added a method for detecting the differences between response and rendered HTML, at scale.

Why is this important?

If the rendered HTML contains major differences to the response HTML, this might cause SEO problems. It also might mean that you are presenting web pages to Google in a way that differs from your expectation.

For example, you may think you are serving a particular page title, which is visible when you 'View Source', but actually JavaScript is rendering a different page title, which is the one Google end up using.

Sitebulb's response vs render report allows you to understand how JavaScript might be affecting important SEO elements, enabling you to explore questions such as:

  • Are pages suddenly no longer indexable?
  • Is page content changing?
  • Are links being created and modified?

If these things are changing during rendering, why are they changing? 

And perhaps more pertinent still: should they be changing?

Why might it not be a problem?

It might not be a problem because it might be completely intentional. Many sites use JavaScript frameworks that load in pretty much all of the page content during rendering. On these sites, the differences between response and render are by design.

All this is to say that differences in the rendered HTML are not inherently bad, and the intention of the comparison feature is twofold:

  1. Highlight differences to aid understanding in how content is being loaded.
  2. Provide a starting point for further exploration and examination.

One other thing that might not be obvious: if no content on your site changes during rendering, you don't need to concern yourself with this sort of thing, and crawling with the HTML Crawler is perfectly adequate for carrying out audits.

How to use the response vs render comparison

The first thing to note is that this report is only available using the Chrome Crawler, which you need to select during the initial audit setup:

[Image: Select Chrome Crawler]

Make your other data analysis selections, and start the audit running. When using the Chrome Crawler, it will automatically create the Response vs Render report, which is accessible in the left hand menu:

[Image: Response vs Render Report]

You will be presented with 6 pie charts, which show the effects of rendering on each of 6 key SEO elements: Meta robots, Canonical, Title, Meta Description, Internal Links, External links.

[Image: Response vs render]

The pie chart segments correspond to:

  • No Change - the element is identical in the response and rendered HTML
  • Created - the element was not present in the response HTML, and is only present in the rendered HTML (therefore has been 'created' by JavaScript)
  • Modified - the element was present in the response HTML, but the content is different in the rendered HTML (therefore has been 'modified' by JavaScript)
  • Duplicated - the element was present in the response HTML, but is present twice in the rendered HTML (therefore has been 'duplicated' by JavaScript)
  • Deleted - the element was present in the response HTML, but is not present in the rendered HTML (therefore has been 'deleted' by JavaScript)
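
To make these categories concrete, here is a minimal Python sketch of how one element's values could be sorted into the segments above. It is not Sitebulb's implementation: the function name classify_change, the extra 'Not Present' label, and the rule that "more values in the render than in the response" counts as 'Duplicated' are all assumptions made for illustration.

```python
def classify_change(response_values, rendered_values):
    """Classify how one element differs between response and rendered HTML.

    Both arguments are lists of the element's values in document order,
    for example every <title> text found in each version of the page.
    """
    if not response_values and not rendered_values:
        # Hypothetical extra label: the element is absent from both versions.
        return "Not Present"
    if not response_values:
        return "Created"
    if not rendered_values:
        return "Deleted"
    if response_values == rendered_values:
        return "No Change"
    if len(rendered_values) > len(response_values):
        # Assumption: extra occurrences in the render count as duplication.
        return "Duplicated"
    return "Modified"
```

For example, classify_change(["Old title"], ["New title"]) returns "Modified", while classify_change([], ["New title"]) returns "Created".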

Clicking on the corresponding chart segment (or number in the data table below) will bring you to a URL List showing you all the affected URLs, and the relevant data:

[Image: Response title changed in rendered HTML]

The intention of this report is as a diagnostic device - use it to explore the effects of JavaScript, and then dig in further if you see something that warrants further attention.

The most straightforward outcome is of course that everything is listed as 'No Change.' This means you don't need to dig any further, and in fact means that the HTML Crawler is sufficient for future analyses, as the page content is not dependent on JavaScript, which effectively means that Response HTML = Rendered HTML (at least for the sake of SEO).

Response vs render SEO elements

The 6 key elements are shown as different pie charts in the report:

Meta robots

This chart shows the effect of JavaScript rendering on meta robots directives found on the page (i.e. this does not take HTTP headers into account). If there are differences in meta robots between the response and rendered HTML, this may cause indexing issues.

You want to pay particular attention to:

  • URLs that are 'noindex' in the response, yet 'index' in the render
  • URLs that are 'index' in the response, yet 'noindex' in the render

Bear in mind that 'index' is the default status, and 'noindex' is an explicit instruction to not index the page content.

This is particularly important when you consider that if Google find 'noindex' in the response, they will not render the page at all. Any kind of mismatch in the meta robots should be investigated as a matter of priority, as it can impact indexing and therefore rankings.
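
If you want to spot-check a single page by hand, the two mismatch cases above could be detected along these lines. This is a rough sketch using only Python's standard library, and it assumes you have already saved the response HTML and the rendered HTML as strings. The names MetaRobotsParser, is_indexable and robots_mismatch are made up for this example, and HTTP headers are ignored, as they are in the chart.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.extend(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def is_indexable(html):
    """Return False if a meta robots tag carries 'noindex' (or 'none')."""
    parser = MetaRobotsParser()
    parser.feed(html)
    # 'index' is the default, so a page with no meta robots tag is indexable.
    return not {"noindex", "none"} & set(parser.directives)


def robots_mismatch(response_html, rendered_html):
    """Describe an indexability mismatch, or return None if there is none."""
    before = is_indexable(response_html)
    after = is_indexable(rendered_html)
    if before == after:
        return None
    if before:
        return "index in response, noindex in render"
    return "noindex in response, index in render"
```

A return value of None means the two versions agree on indexability; anything else is one of the two cases worth investigating first.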

Canonical

This chart shows the effect of JavaScript rendering on canonical URLs found on the page (i.e. this does not take HTTP headers into account). If there are differences in the canonical between the response and rendered HTML, this may cause indexing issues.

With this one, if the canonical URL is different in the rendered HTML, the important question to ask is, 'is this the correct canonical URL?'.
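
As an illustration of that check, the sketch below pulls the canonical out of each version of a page and flags whether it changed, so you can then judge which one is correct. It is only a sketch under stated assumptions: CanonicalParser, canonical_urls and compare_canonicals are hypothetical names, relative hrefs are resolved against the page URL, and canonicals sent via HTTP headers are not considered.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalParser(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def canonical_urls(html, page_url):
    """Return the absolute canonical URLs declared in the HTML."""
    parser = CanonicalParser()
    parser.feed(html)
    return [urljoin(page_url, href) for href in parser.hrefs]


def compare_canonicals(response_html, rendered_html, page_url):
    """Return the canonicals from each version and whether they differ."""
    before = canonical_urls(response_html, page_url)
    after = canonical_urls(rendered_html, page_url)
    return {"response": before, "rendered": after, "changed": before != after}
```

The code only tells you that the canonical changed; deciding whether the rendered value is the right one is still a manual judgement.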

Title

This chart shows the effect of JavaScript rendering on page titles. Differences between the page title found in the response and rendered HTML may mean that JavaScript is modifying the page content in unexpected ways, which may warrant further investigation.

In some respects this should be considered less important than the two above, as it does not impact whether a page will be indexed or not. However it does impact what content is indexed, and what title may display in the SERPs. As a larger consideration, it might be an indicator that page content is being more widely modified by JavaScript.

Meta Description

This chart shows the effect of JavaScript rendering on meta descriptions. Differences between the meta description found in the response and rendered HTML may mean that JavaScript is modifying metadata in unexpected ways, which may warrant further investigation.

Although this does not affect indexing, it can affect how pages appear in the SERPs, which in turn can have an impact on CTR. The biggest concern with meta descriptions is: 'if JavaScript is changing the meta description, are we happy with the version present in the rendered HTML?'.

Internal Links

This chart shows the effect of JavaScript rendering on internal links. Differences between the internal links found in the response and rendered HTML mean that JavaScript is adding or modifying links, which may affect crawling/link discovery, anchor text optimisation and internal PageRank distribution.

After meta robots, this is possibly the most important of the elements analysed in this report, as link signals feed into Google's evaluation of page strength and relevancy.

For both internal and external links (below), the pie chart segments are actually slightly different:

  • Created - the link was not found in the response HTML, so it appears that JavaScript created it.
  • Modified - the link was found in the response HTML, however JavaScript has modified either the anchor text or the href URL.
  • No Change - the link was not added or altered by JavaScript at all.
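
The link segments above could be approximated as follows. This is a simplified sketch, not how Sitebulb matches links: LinkParser, extract_links and classify_links are hypothetical names, and the matching rule (a rendered link that shares either its href or its anchor text with a response link is treated as 'Modified') is an assumption chosen to keep the example short.

```python
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collect (href, anchor text) pairs for every <a href> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self._href = href
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside an open <a href> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def extract_links(html):
    parser = LinkParser()
    parser.feed(html)
    return parser.links


def classify_links(response_html, rendered_html):
    """Label each rendered link as 'No Change', 'Modified' or 'Created'."""
    response_links = extract_links(response_html)
    response_hrefs = {href for href, _ in response_links}
    response_texts = {text for _, text in response_links if text}
    labels = {}
    for link in extract_links(rendered_html):
        href, text = link
        if link in response_links:
            labels[link] = "No Change"
        elif href in response_hrefs or text in response_texts:
            # Assumed heuristic: same href or same anchor text means modified.
            labels[link] = "Modified"
        else:
            labels[link] = "Created"
    return labels
```

The result is a dictionary keyed by (href, anchor text), which makes it easy to list only the 'Created' or 'Modified' links for a page.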

The analysis process is also slightly different - if you click through any of the segments you will actually be brought into the Link Explorer (rather than a URL List). As such, we have separate and more comprehensive documentation for exploring which links have been created or altered by JavaScript.

External Links

This chart shows the effect of JavaScript rendering on external links. Differences between the external links found in the response and rendered HTML mean that JavaScript is adding or modifying links, which may indicate that external links are being injected without the site owner’s awareness.

Comfortably the least important of all these options, this is mostly to do with ensuring that external links are not being added to your content without your awareness, which can happen if a JavaScript library decides to inject a link into your content.
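
A quick way to check a single page for injected external links might look like the sketch below. It assumes that "external" simply means "a different hostname from the page" (so subdomains count as external), which may not match how your audit defines it, and the names HrefParser, external_links and injected_external_links are invented for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class HrefParser(HTMLParser):
    """Collect the href of every <a> element."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.hrefs.append(href)


def external_links(html, page_url):
    """Return the set of absolute link URLs that point at another hostname."""
    parser = HrefParser()
    parser.feed(html)
    site_host = urlparse(page_url).hostname
    found = set()
    for href in parser.hrefs:
        absolute = urljoin(page_url, href)
        parsed = urlparse(absolute)
        # Assumption: any http(s) link to a different hostname is external.
        if parsed.scheme in ("http", "https") and parsed.hostname != site_host:
            found.add(absolute)
    return found


def injected_external_links(response_html, rendered_html, page_url):
    """Return external links present in the render but not in the response."""
    before = external_links(response_html, page_url)
    after = external_links(rendered_html, page_url)
    return sorted(after - before)
```

An empty list means rendering added no external links to the page; anything else is a candidate for a closer look at which script injected it.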

Further Resources

If you want to read more about the basics of auditing JavaScript websites, have a look at our guide on How to Crawl JavaScript Websites.