As search engines (Google in particular) continue to integrate rendering into their crawling and indexing processes, SEOs need to pay increasing attention to the effects of rendering on their web pages.
Traditionally, crawlers - both search engine crawlers and 3rd party crawlers like Sitebulb - would use the response HTML to extract links and content. These days, however, it is more complicated: search engines also render webpages (firing JavaScript) before indexing of a page is completed.
This means that if you only ever crawl your site using a traditional 'source' HTML method, you may not be seeing the complete picture. Sitebulb has offered a JavaScript rendering option - our Chrome Crawler - since launch, and we have recently added a method for detecting the differences between response and rendered HTML, at scale.
Why is this important?
If the rendered HTML contains major differences to the response HTML, this might cause SEO problems. It also might mean that you are presenting web pages to Google in a way that differs from your expectation.
For example, you may think you are serving a particular page title, which is visible when you 'View Source', but JavaScript is actually rendering a different page title, which is the one Google ends up using.
Sitebulb's response vs render report allows you to understand how JavaScript might be affecting important SEO elements, enabling you to explore questions such as:
- Are pages suddenly no longer indexable?
- Is page content changing?
- Are links being created and modified?
If these things are changing during rendering, why are they changing?
And perhaps more pertinent still: should they be changing?
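To make the title example concrete, here is a minimal sketch of the kind of comparison involved. The two HTML strings are hypothetical snapshots of the same URL - one as the raw HTTP response, one as the DOM after JavaScript has run in a headless browser - and the parser simply extracts the `<title>` from each so they can be diffed (this is an illustration of the idea, not Sitebulb's implementation):

```python
from html.parser import HTMLParser

class TitleExtractor(HTMLParser):
    """Collects the text inside the first <title> element."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

def extract_title(html):
    parser = TitleExtractor()
    parser.feed(html)
    return parser.title.strip()

# Hypothetical example: the raw response vs the rendered DOM of one URL.
response_html = "<html><head><title>Widgets | Example Shop</title></head><body></body></html>"
rendered_html = "<html><head><title>Buy Widgets Online - Example Shop</title></head><body></body></html>"

resp_title = extract_title(response_html)
rend_title = extract_title(rendered_html)
if resp_title != rend_title:
    print(f"Title changed during rendering: {resp_title!r} -> {rend_title!r}")
```

In practice the rendered HTML would come from a headless browser rather than a hardcoded string, and the same diffing approach extends to canonicals, meta robots, links and body content.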
Google's two-stage indexing process
Rendering webpages is a resource-intensive task, and it takes significantly longer than simply grabbing the source (response) HTML content.
This is why Google essentially crawls URLs in a two-stage process: the 'first look' uses the HTML response; Google then renders the page for a 'second look' at the rendered HTML, and updates the index based on what it finds there.
As a result, URLs can and do enter the index initially based on this 'first look' at the HTML response, and an indeterminate amount of time can pass before the 'second look' at the rendered HTML.
So it is important that the HTML response contains all of the core elements exactly as you want them to appear in the index:
- Meta robots
- Canonicals
- Page Title
- Meta Description
Otherwise, you may find that pages are getting indexed with the wrong data.
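A quick way to sanity-check the list above is to scan the raw response HTML for each element before any JavaScript runs. The sketch below does this with Python's standard-library parser on a hypothetical response where the meta robots tag happens to be missing (again, an illustration, not Sitebulb's own check):

```python
from html.parser import HTMLParser

class HeadAudit(HTMLParser):
    """Records which core SEO elements appear in raw response HTML."""
    def __init__(self):
        super().__init__()
        self.found = {
            "meta robots": False,
            "canonical": False,
            "page title": False,
            "meta description": False,
        }

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self.found["page title"] = True
        elif tag == "meta" and a.get("name", "").lower() == "robots":
            self.found["meta robots"] = True
        elif tag == "meta" and a.get("name", "").lower() == "description":
            self.found["meta description"] = True
        elif tag == "link" and a.get("rel", "").lower() == "canonical":
            self.found["canonical"] = True

# Hypothetical response HTML with no meta robots tag.
response_html = """<html><head>
<title>Widgets | Example Shop</title>
<link rel="canonical" href="https://example.com/widgets">
<meta name="description" content="All our widgets.">
</head><body></body></html>"""

audit = HeadAudit()
audit.feed(response_html)
missing = [name for name, ok in audit.found.items() if not ok]
print("Missing from response HTML:", missing)
```

If an element only appears after rendering, it may not be applied during the 'first look', which is exactly the gap the response vs render comparison is designed to expose.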