
Webinar: Mythbusting! In The Age of AI, is Structured Data Dead?
Published 06 June 2024
Sitebulb's Patrick Hathaway was joined by Jarno van Driel, Crystal Carter, and Dave Ojeda to discuss LLMs and structured data - do they actually use it or not?
From recent conversations with two of the biggest advocates of structured data out there (Dave Ojeda and Jarno van Driel), it would seem that LLMs ignore structured data when crawling, and focus instead on the text/semantics itself. But this is the exact opposite to what most SEOs seem to believe - that LLMs need structured data and we should mark up everything under the sun.
So - we wanted to get to the bottom of this...
Video transcript
Patrick Hathaway:
Hi, everyone. Thank you all for joining us today for our first panel webinar of the year. I think it's going to be a really good one. Today we're going to be digging into some myths and misconceptions about the value of structured data for AI and SEO. We have around 400 people registered today, so I'm super pumped there are so many other folks interested in learning more about this stuff. Please go ahead and say hi in the chat. My name is Patrick and I'm the co-founder and CEO of Sitebulb. We also have Sitebulb's marketing manager, Jojo, with us in the chat. We have a fantastic panel for you today, Crystal Carter, Jarno van Driel, and Dave Ojeda, who will get to introduce themselves very shortly. But first, if you aren't already a Sitebulb fan, let me tell you a little bit more about us.
So Sitebulb is a website auditing and crawling tool which combines deep technical SEO insights with a user-friendly interface. We're the only SEO crawler on the market that can cater to everyone, from freelance solo SEOs, all the way up to global enterprise brands, as we offer a desktop crawler and a cloud crawler. Our desktop product is perfect for consultants, smaller in-house or agency teams. So if you want to check that out, please go ahead and download our free 14-day trial from the website. And we also have Sitebulb Cloud, which is our solution for larger in-house teams and agencies. It starts quite small but scales all the way up to enterprise. This is a great option if you've been using one of the big cloud crawlers and have seen your price ratchet up year-on-year. So if you have any interest in discussing Sitebulb Cloud, please go over to the Book a Demo page and book in a call with me.
So one of the things you can do with Sitebulb is audit structured data. And in fact, Sitebulb's structured data report is pretty much the best one out there because it validates against both schema.org and Google Search feature guidelines, and aggregates all the issues it finds rather than just dumping them in a massive list. So I just want to briefly mention our upcoming Sitebulb Masterclass, with my colleague, Miruna, who will run you through how to audit structured data using Sitebulb, including the validation, finding common issues and best practises. So go to that link on the screen to register or click the link in chat. Okay, so final bits of housekeeping, then we can crack on. We are recording the webinar today and we'll be sending the recording out tomorrow, including all the links we just mentioned. So don't worry if you're not able to make it for the whole thing.
And then we will have some time at the end of the webinar for your questions. So please put them in the little Q&A tab next to the chat. Please don't put them in the chat box itself. And you can go and upvote other people's questions in there if you can't think of any yourself. So on to the webinar. The idea for this webinar came about because of a LinkedIn post that I shared, ruminating on the value of structured data markup for AI. I had essentially assumed that if you use schema to map out and markup everything on all your pages, this would be really helpful and valuable to large language models. And like I thought, it would make it easier for them to digest and understand the content. And what we're going to find out today, among other things, is how wrong I was about this. And I know already from lots of messages and comments on LinkedIn, that there's lots of other SEOs out there who believe the same as me. So strap in.
So I've talked way too long already, so I'm going to ask my guests to introduce themselves today. Dave, we'll start with you. Could you please let everybody know about your background and experiences, and how they relate to our topic today?
Dave Ojeda:
Good. I'm Dave Ojeda. I've been doing SEO organic and local for 17 years now. Within the last eight to nine years, my specialty has been structured data. Working with clients to understand the entities and relationships that are there on their site, and how to get those relevant rich results that everyone was looking for. And I'm happy to be here.
Patrick Hathaway:
Okay, and welcome, Dave. And Jarno, you as well, please.
Jarno van Driel:
Hi, I'm Jarno van Driel. I'm an international structured data and semantic SEO consultant. I originally was a front-end and accessibility developer in the early 2000s. And pretty early on I ran into structured data, got involved in that, thinking, "Hey, that's interesting. We can take accessibility and explain meaning to a whole other level." And because of that, got involved in schema.org as of day one, really. Been working on it ever since.
Patrick Hathaway:
Amazing. Welcome as well for you, Jarno. And then finally, Crystal?
Crystal Carter:
Yes, my name is Crystal Carter. I am the head of SEO communications at Wix and Wix Studio. And I have been a schema fan probably since 2018. I am a big fan of what it can do, where it's working really well, and how it can really transform visibility for businesses of all sizes. I think it's a great, great tool. I'm also very interested in LLMs and how they affect the search process, and how they affect our work as SEOs. So I'm really interested to be talking about this topic.
Patrick Hathaway:
Awesome. Welcome everybody. So let's get on with it. So as I mentioned earlier, the idea of this webinar basically came after I suggested on LinkedIn that investing in on-page structured data would help large language models understand your content better. At which point both Dave and Jarno chimed in to say, "No, not really." So Dave, why don't you kind of give us a more thorough explanation as to why you disagree with this statement. And actually just before you do, I want to focus this question specifically on the specific things you think are no longer relevant or valuable. And we'll come onto things that work later on.
Dave Ojeda:
Okay, well, I kind of pictured it as, it works to a certain extent in certain places for certain ways. And what that means is, when you look at e-commerce, merchant centre, job postings, those are something that are definitely working, definitely have an ROI that we can perceive. And for Google, they make money, so therefore there's something that they look at. If we take that all away, we start looking at, what does it do in terms of entity relationships, to expose those entity relationships to Google to understand? One of the biggest things that has happened over the course of time has been Google's ability for natural language processing, in terms of understanding what is on the page, has improved tenfold. And with the start of LLMs, it has improved even more so.
So what happened is, we now have natural language processing, which has the ability to understand what is on the page. And in a way, the structured data for these areas, other than the ones I mentioned, are like crutches. They helped Google get along until they didn't need to. And at this point I think that we're at a place, and moving forward and getting better every day, that we don't need that. We have the context on the page with which the LLMs, and Google itself, are understanding how to use it and how it relates to each other in an entity level basis.
Patrick Hathaway:
Awesome. Jarno, what more do you want to add on that, in your perspective?
Jarno van Driel:
I think one of the issues surrounding structured data is that at a certain moment a lot of SEOs started to give it a lot more credit than it actually deserves. There's a lot of value in the processes of getting to a very well-descriptive piece of markup, but those processes often involved actually updating the page itself, improving the content, restructuring information, abiding by certain definitions. So in the end, if you optimise a webpage based on the structurization you need to do for that markup, you'll often see good results. But a lot of SEOs mistakenly then say, "Yeah, it's because of the markup that search engines are able to understand that information." And that's simply not true. From day one, when all the big search engines were actually using markup for their search results, the true proposition always has been a computationally cheap method of enriching the search results, at a web scale. Because structured data applies to the entire corpus of the web, and rich results are generated based on the entire corpus of the web.
LLMs are too expensive to do that in return. It's not a matter of whether LLMs can do that, they're just way too expensive, and structured data markup is simply a very cheap method of enriching search results. The problem right now is, we don't have any such enrichments in generative results yet. I don't expect it's because they don't want to. It's still early days, we're still looking at MVP-style products. And nobody's busy optimising those products quite yet. They're still trying to figure out what actually makes the money. And once they've got that figured out, I wouldn't be surprised if we see markup return to enrich generative results as well. But it could be a while.
Patrick Hathaway:
So when we were discussing the topic on LinkedIn, I think one of the shocking things, for me, was... Because both of you, Dave and Jarno, are probably two of the biggest cheerleaders that I had known, of structured data in the community. And I certainly know Dave, because we had a few calls over the years. And essentially Dave helped us figure out how we should try and aggregate and expose structured data issues within Sitebulb. And you showed me some of the things that you were doing for clients, back in the day. Are you no longer doing or recommending the things that you were recommending previously?
Dave Ojeda:
No, I think that, going back to what I do with clients now is, we look at, rich results are the most important thing that are giving people potential ROI. So we're always looking for that, and we always have been. We're looking to expose what you could tell your client or you could tell your CEO, "Okay, this is making a difference." Now going into the level of description, in terms of what's in structured data. Before we could go like, "Let's go external vocabulary for every single entity, let's go into the nth degree of describing this product." I'm not doing that as much. It was very hard to quantify an ROI on that. And these days it's definitely even harder.
And so what happens is, the reality is there's only so much time and money that anyone has, whether it's a client or agency. So you have to look respectively, and say rich results are still a possibility when they are relevant to the person's industry or service offerings. But beyond that, I'm taking more of a conservative approach of marking things up. Making sure the main entities, of course, are related, but not going to the depth I normally would, because that money, I feel, could be used elsewhere. And content itself, I would rather put it into that.
Crystal Carter:
Yeah, if I can jump in? I think one of the things I'm hearing is something that I've certainly been feeling, which is, you get involved with the structured data for a couple of reasons, rich results are definitely one of the reasons why you want to get involved with them. For some industries, like jobs, for instance, that's how I got interested in structured data. I had a client who was in the recruitment field and we basically had no choice. We had no choice but to get involved with Google for Jobs, et cetera. Or that's what they were calling it at the time. Otherwise, we just weren't in the game. And that's the same with e-commerce as well. It's very important to be part of that and have those elements as well. And I think that what we're seeing is that Google is shifting how they're doing rich results as well. So I'm not seeing rich results for events very often. They deprecated how-to and FAQ, except for very few instances, and things like that.
And also, in some cases where you would normally see rich results for events, I'm starting to see AI overviews that are summarising where are the best events in New York this weekend, for instance. And they're summarising. To Jarno's point, I think he was saying they're going to pick and choose which things are best served with LLM tools, as these tools are getting developed. Because I certainly find this as a user. Sometimes I'm like, "Oh, GPT for Work. Summarise this thing." And sometimes I'm like, "I can just use a function in this Google Sheet to extract that little piece of text from the page. I don't need to run a full LLM request for this. I can just scrape it the same way I normally did." So there are some things that will be better served through mechanical means, through traditional scraping methods, like we've seen with structured data in the past. And there's going to be some things like summarising, aggregating, et cetera, that might be better served with an LLM tool. And I think that the idea of structured data has always been an evolution. It's always evolved.
You would start doing structured data on a page and then Google would add in that Rich Result later on. But you'd already put it on the page so you were ready to go. And I think that what Dave's talking about, of maybe not doing every entity, it's very similar to the progression we've seen with meta descriptions, for instance. I remember back in the day when I started doing SEO, we hand wrote every meta description for every page, because it was just that important. Nobody does that anymore. There'll be a couple of pages that are your money pages and maybe it does make a difference if you change that particular keyword. And so maybe there's five or six pages that you actually hand write, or whatever. But most of them you either have a script that does it, or something like that. So I think that it'll come down to, like Dave's saying, the money pages will have actual full detail, and maybe the other pages won't, as we do get into the LLM space. Because of the way that the technology's evolving.
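Editor's note: Crystal's point about pulling a piece of information out of a page by mechanical means, rather than running a full LLM request, can be illustrated with a short Python sketch. It is not taken from the webinar; the class name JsonLdExtractor, the helper extract_jsonld, and the sample page are all hypothetical, and it uses only the Python standard library.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        # Start buffering only when we enter a JSON-LD script block.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # ignore malformed blocks instead of failing the whole page


def extract_jsonld(html):
    """Return every JSON-LD object found in the given HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical sample page, for illustration only.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Recipe", "name": "Vegan cake"}
</script>
</head><body><h1>Vegan cake</h1></body></html>
"""

recipes = extract_jsonld(SAMPLE_HTML)
print(recipes[0]["name"])
```

The parsing here is deterministic and cheap, which is the trade-off the panel describes: no model is needed to read a value that the page already states in a machine-readable form.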
Patrick Hathaway:
So one of the things actually that someone, I can't remember who it was, but they DMed me about this, what we were talking about today, and essentially said, "If we're saying that structured data is not valuable for LLMs, where's the evidence for that? Where's the data to back this up?" I don't know if anyone saw this, but Mark Williams-Cook recently shared a video of a four-hour in-depth dive on how LLMs work. And basically explaining how all the HTML, including the schema markup, is just stripped in pre-training, and they just want the text. Right?
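Editor's note: the stripping Patrick describes can be pictured with a small Python sketch. This is an illustration of the general idea only, under the assumption that a text-extraction step keeps visible text and discards script and style blocks; it is not the pipeline of any particular LLM vendor, and the names VisibleTextExtractor, visible_text, and PAGE are hypothetical.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Keep text nodes, but drop everything inside <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # JSON-LD lives inside a <script> tag, so it is skipped here too.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the visible text of an HTML string as one space-joined line."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical sample page, for illustration only.
PAGE = """
<html><head>
<script type="application/ld+json">{"@type": "Recipe", "name": "Vegan cake"}</script>
</head><body><h1>Easy vegan cake</h1><p>Mix, bake, and cool.</p></body></html>
"""

text = visible_text(PAGE)
print(text)
```

In this sketch the heading and paragraph survive while the JSON-LD block disappears, which is why the panel argues that what a page says in plain language matters more to an LLM than what its markup says.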
Crystal Carter:
I think that's absolutely important to understand. And I watched that video. So thanks, Mark, for sharing that. And I think it's super useful and really important. But I think what's also important to remember is that there are lots of different types of models. I've got a talk and an article on SEO for LLM, brand visibility. And I talk about, there's two different types, really, when we think about it. There's one that's like Claude for instance, which is not connected to the web directly, and that has its own training data. And it's like an encyclopaedia, and you basically ask it questions and it gives you information. And then there are search enabled LLMs, like SearchGPT, like Gemini, like Copilot. And those are using search results to ground them in reality. And if you look at Gemini, for instance, it uses Versal and that scrapes the web first. So it takes whatever your query is, it distils it into a search query, goes into Google, and then pulls that back into the LLM output before they give you the full output. So they're using search results.
And Bing is doing the same thing. So ChatGPT as part of their thing, they've extended to lots of different things now, but as part of their information Copilot, they're also grounding themselves in Bing search. So they will be using some of the information from there. And I've seen evidence for some, I don't know if now is the time to share, or maybe later, that sometimes there seems to be some correlation. It's not always, but there seems to be some correlation with results that have rich results and with the sources that are shown in LLM. So recipes, for instance, was one of the first places where Google started investing time in standardising content with structured data. And if you look up something like, "How do I make a vegan cake?" If you look at when I'm Incognito, the rich results that you get there are virtually the same. I think I checked it and it was something like seven out of-
Patrick Hathaway:
Is now a good time to... Sorry to jump.
Crystal Carter:
Yeah. Shall I share that?
Patrick Hathaway:
Jojo might need to help with the share.
Crystal Carter:
Yeah, if I can share. Sorry, then I'll let other people talk as well.
Patrick Hathaway:
Jarno, we've not heard from you for a little bit. Why don't you pitch in while Crystal finds her Share button. Hang on, she's found it.
Crystal Carter:
Okay, sorry, sorry. So this is the query, "How to make a vegan cake?" And these are all rich results for different recipes, et cetera, within ChatGPT. This, Loving It Vegan, that's showing there. Right there, number 2. We also have Thinly Spread, which is also showing within the output as well. There it is. And then this one is the Perplexity one. And I think I have that one pulled up as well, which is here. So on Perplexity, the sources that they have there, those include loads of them as well. So Banana Diaries, School Night Vegan, Thinly Spread, these are all included on the rich results on Google. And they're also included as sources on some of these LLMs. And while I'm not saying that there's a direct correlation, I'm not saying it's exactly the same thing, what I will say is that if the search results are influencing what's showing in a search-enabled LLM, then if rich results are influencing your search positioning, then it will have a knock-on effect. So not directly, but if they're connected, then it could have some influence.
Patrick Hathaway:
Yeah. So that's really interesting. So with rich results then specifically, are there some that are more effective or less effective than they were? And has the effectiveness shifted to specific niches?
Jarno van Driel:
The latter, most of all. If I look at things like product markup, Google's got a lot riding on the whole product schema, product variants. It's their way of expanding into products and websites that normally weren't all that interested in providing product feeds. So yeah, they've got a lot riding on that. But if you go beyond products and job postings, then over the last two years, the ROI of structured data markup, especially regarding rich results, has been vaporising, and real fast. And that's not even so much because markup isn't working anymore. Markup is still performing the way it has always done, but Google's been heavily expanding on the amount of non-markup driven search features. And then, some time ago we got generative results on top of that. So again, it's not that markup doesn't work any longer, it just got a lot more real estate to compete with.
And I think that's the biggest issue right now, when you're looking at the return on investment in markup, even with rich results. If you look at things like video markup, it's great, but there are so few actual videos showing up in the search results. And even then, they're often just YouTube videos as opposed to videos coming from websites directly. So yeah, the big workhorses of the past, reviews, same thing. Reviews mostly work nowadays if you've got the media business profile, but actually on-page markup beyond products doesn't achieve all that much any longer. So yeah, it's a difficult time when you look at the ROI of rich results in general. So let alone looking at markup beyond rich results.
Patrick Hathaway:
So Crystal, I saw you nodding along there. Dave, have you got any similar or contrasting views to Jarno on that?
Dave Ojeda:
I agree on everything Jarno was saying. I think that in the end it goes back to time and budget, what you want to put towards. And back in the days when we had FAQs and how-to rich results, I mean that drove people's budgets. "I want to get those on the search results, I want to do things." But what people don't understand, even back in that day, is, they didn't actually need structured data for it all the time. I'll give you a quick example. There were times when FAQs were in their heyday, that I would occasionally see, let's say, FAQ answers that actually would have the HTML links in there, to other pages that they were diving into. But there was no structured data markup for it. Google just understood what was on the page and applied that to a rich result on its own.
And now we're talking years ago, even at that point they didn't need this markup to do what they needed to do. Was it helpful for them? Yes. Could they have potentially done it? Yes, because they were actually doing it. Maybe there were site tests that they were doing, that I was seeing, but it was one thing. For those things, they don't necessarily need markup. Which goes back to the whole thing of natural language processing, of, they are just getting better and better at understanding what is on the page and how it relates to other things. I totally agree with everything Jarno said.
Jarno van Driel:
If I might dive into the last part Dave said, if you look at the output of something like Google's NLP API, you run that against content, I did that not so long ago, and I was pretty much amazed how it got 98% accuracy, based on the content. I fed it through the NLP API. And then you get back to what you've been seeing a lot over the last couple of years, where people are entityfying all their keywords. Everything is becoming a thing or a defined term, and they provide Wikipedia and Wikidata links, and you name it. Well, we actually have come more or less to the point that if the knowledge is readily available out there, you don't need to do that any longer, because NLP simply is too good. It actually matches against the correct Wikidata or Wikipedia entries. So there's often little to add through that type of markup unless something is truly ambiguous. Although then even nowadays, you're better off trying to explain that through natural language.
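Editor's note: for readers unfamiliar with the practice Jarno calls "entityfying", the Python sketch below builds the kind of markup he means, where a page's subject is tied to Wikipedia and Wikidata entries through the schema.org sameAs property. The helper entity_annotation and the article headline are hypothetical; the Douglas Adams URLs are used only because Q42 is a well-known Wikidata identifier.

```python
import json


def entity_annotation(name, same_as):
    """Describe an entity and point at external identifiers for it."""
    return {"@type": "Person", "name": name, "sameAs": list(same_as)}


article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "A reader's guide to Douglas Adams",  # hypothetical headline
    "about": entity_annotation(
        "Douglas Adams",
        [
            "https://en.wikipedia.org/wiki/Douglas_Adams",
            "https://www.wikidata.org/wiki/Q42",
        ],
    ),
}

markup = json.dumps(article, indent=2)
print(markup)
```

Jarno's argument is that for a well-known entity like this one, entity recognition is likely to reach the same Wikipedia or Wikidata entry from the text alone, so the sameAs links add little unless the name is genuinely ambiguous.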
Patrick Hathaway:
Yeah. Go on, Crystal. You got something to add there?
Crystal Carter:
One of the other things that I think that people sometimes overlook, and I always point this out as well, is that structured data markup, like schema.org markup... So a couple of points. One is, that's not the only type of structured data. People think that if you say structured data, that's all of the things, however, that's only schema.org structured data. But anytime you put a header, a list, a numbered list, a table, that's all structured data, that's all structuring of data. So on-page optimization in that way, which, while they're not able to read necessarily that it's a table, they're able to understand content priority, if you're relating your keywords to the headline of your article and all of that stuff. Those sort of NLP app optimizations that are related to your headers and your lists, et cetera, et cetera. That is useful and that is a structured data thing.
The other thing that's useful, in terms of thinking about structured data and the entities that Jarno's talking about, is that the mapping that has taken place from structured data, in terms of informing the schema information, or informing the information that you need to have about your business, about your activities, about whatever you're doing, that has value as a guiding point in terms of what some of these tools want to hear about the business. If you had a drive-through coffee shop for instance, or a fast food restaurant, there's certain things within fast food restaurant, in schema.org, that it'll have, "Does it have delivery? Does it have curbside pickup? Does it have the customer service number?" That thing. Which will guide your on-page, which does have value, even when you're thinking about strictly NLP.
I think that that is valuable. And if it comes back around that these tools figure out different ways to incorporate the crawling of structured data within the page, then all the better. But I think it also adds value in terms of using it as a vocabulary, which is what it is, for how we discuss the things that we have online.
Dave Ojeda:
To that point, I totally agree with Crystal. What I do when building out new sites with clients, I go to schema.org. I understand schema.org because I'm in it all the time, but to understand building pages, building representations, building descriptions of entities, all that information/education is within schema.org. Just looking at it, you can understand the baseline, what should be talked about for certain types of pages, certain types of entities. And so by building upon that, you're allowing yourself to basically get the best content that relates information clearly and succinctly, which is all that LLMs are asking for. They will take garbage, they will take artificial language, but if you can provide them clear, consistent information, and how it relates to other things, then you are going to be winning the game long-term with LLMs. So for me, the biggest takeaway here is, if you haven't looked at schema.org, play around with it, understand how things that you want to talk about are related to other things, and just that education alone, in terms of content, you will have a great ROI long-term.
Patrick Hathaway:
Yeah. So I just want to reference actually another webinar that was put out recently by schema, which I'll share the link in a sec, if you haven't seen it already. But I think it was a really interesting webinar. It covered a different subsection of the topic, I suppose, to what we're covering today. But I think the big takeaway, and the complete alignment between everything said today and everything they said, was this idea that schema helps you focus on the data accuracy and integrity, which in turn helps everything, like NLP and AI, and all of those sorts of things. So yeah, I would definitely recommend going and checking that out.
And one of the other things that they mentioned as well, was this idea that you can build your own knowledge graphs to power your own AI agents. Which, some of the examples were really cool. I was like, "This is interesting stuff and I really like the idea of it." But it also feels very inaccessible to the average website owner. And so I kind of want to dig into that a little bit more. We've kind of said that structured data is not necessarily being digested by LLMs, so I want to explore what on-page structured data markup actually is good for.
Crystal Carter:
So if I can just jump on the point where you said it feels inaccessible. Team Wix has been building in this way for years. So we make it so that when people add in their business information about their business on Wix or Wix Studio, that automatically goes into your organization's structured data. When you add in your product details, we automatically create a product schema markup for you. When you add in the information for your blog, we automatically create a blog schema markup. We have various different things that are full through. We talked about video markup. Anytime you upload a video, anytime you do an event, we structure the build of the page around that so that it's more accessible for people. And you can get it and you can modify it, and you can add in all the details as well. But we've got the baseline, and it's validated and it's ready for you, all there.
And I think that that goes to the point that if you're using schema as a guideline, it's also important to find ways to reduce the legwork that you have to do in order to generate it. So we have an automation that's built in, and even if you wanted to do schema, we have an AI assistant that helps you to create bespoke schema within our JavaScript framework, for instance. So where you can reduce the legwork, that will make the value that much more valuable.
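Editor's note: the general idea of generating markup automatically from details a site owner has already entered can be sketched in a few lines of Python. This is not Wix's implementation, which the webinar does not describe; the function names product_jsonld and script_tag and the fields of the product record are assumptions made for illustration.

```python
import json


def product_jsonld(product):
    """Map a simple product record onto schema.org Product/Offer markup."""
    availability = "InStock" if product["in_stock"] else "OutOfStock"
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": product["name"],
        "description": product["description"],
        "offers": {
            "@type": "Offer",
            "price": f'{product["price"]:.2f}',
            "priceCurrency": product["currency"],
            "availability": f"https://schema.org/{availability}",
        },
    }


def script_tag(data):
    """Serialise markup into a JSON-LD script tag for the page head."""
    # Escape "<" so a value containing "</script>" cannot close the tag early.
    payload = json.dumps(data).replace("<", "\\u003c")
    return f'<script type="application/ld+json">{payload}</script>'


# Hypothetical product record, as a store owner might have entered it.
record = {
    "name": "Vegan chocolate cake",
    "description": "A single-layer chocolate cake made without eggs or dairy.",
    "price": 19.5,
    "currency": "GBP",
    "in_stock": True,
}

markup = product_jsonld(record)
print(script_tag(markup))
```

Because the markup is derived from the same record that drives the visible page, the two cannot drift apart, which is one way to reduce the legwork Crystal mentions.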
Patrick Hathaway:
So TLDR, if everyone just moves across to Wix, it's going to do all the work for you. So Jarno, come on. Let's say a website's not on Wix and you want to try and figure out what things you want to focus on, what's it good for?
Jarno van Driel:
If we're talking about a run-of-the-mill website, SME sized, first of all, focus on the rich results. That's probably, for that type of business, the easiest way to get a return on investment on markup: just keep it at rich results. Beyond that, like we mentioned earlier, it's design of content. I have an example of a website of a US-based lawyer I had a look at not so long ago. Its service pages weren't actually describing their services, they were describing the type of lawyers their business had. So a Child Custody Lawyer page is supposed to be describing what that service offers and how they support you. They were describing what a child custody lawyer is, and what it does. That's a type of Wikipedia definition that should not be on service pages. That's where, especially in SME-sized businesses, you see people make the mistake with content: they think it needs to be as elaborate as possible and contain as much Wikipedia-style information as possible. No, it needs to be to the point, it needs to be about what it is you offer.
For those types of businesses, it can really help if you have a look at schema.org to determine what actually is a service, what type of values go with the service. Again, going back to that content design. And that's what I've been using structured data for, at least the processes behind structured data markup, for years already. And that's where the true value lies. If you do that well, the rich results are just a complement, but the improvements on-page and in the actual content, that's what really drives the results. And that's not just schema.org. Especially for niche websites, structured data has the Linked Open Data cloud in its background. There are tonnes of ontologies out there built by industry specialists that mention all kinds of stuff about ontologies. Well, if you're a writer and you need to write something for an industry you're not that familiar with, have a look at some ontologies. You often find those a great starting point to learn more about a certain industry.
Patrick Hathaway:
Nice. Dave, anything you want to add onto that perspective?
Dave Ojeda:
No, I think that once again, I agree with Jarno. I think that we've comprehensively looked at this in terms of what we can do with schema markup. And also, once again, we're trying to think of what we could do. What it's doing now for us that it may not have done before? But then again, what takeaways can we get from this? Even if structured data is not doing as much as it used to do, or as much as we perceived it to do, what can we pull from this experience, like Jarno was saying, to help us produce quality, consistent, concise content that LLMs are going to want, and are going to hopefully prioritise that information when it comes to defining our businesses, our entities, and everything else?
So I'm agreeing with everyone. But once again, it goes back to this takeaway: I still think that many people just think of applying structured data to what is on a page, as opposed to, as Crystal reiterated before, looking at what should be on the page. Have schema.org assist us in that definition and then apply that to the page. Once we get that applied, and that understanding is there from a language perspective, then we go back to the point of, let's actually apply some level of structured data to it, depending on what ROI it will bring us, with rich results or not.
Patrick Hathaway:
Yeah, awesome. I think that there is a perfect quote in there which kind of summarises exactly where we've got to today. I don't know if Crystal's had some sort of power cut.
Crystal Carter:
No, sorry, the room's on a timer.
Patrick Hathaway:
Yeah. Okay, excellent. All right, let's just move the discussion on a little bit. I want to reiterate, there's a Q&A, so please go ahead and put questions in. There are plenty in there already. So if you see any you like the look of and you don't have any yourself, you can upvote them. So we'll have 10 more minutes of our questions, then we'll go and jump into the Q&A. So I wanted to talk a little bit about brand visibility and the entity stuff. We mentioned this earlier, and Jarno, you mentioned it, and Crystal also, I know you've spoken quite a lot about this, helping with disambiguation. Obviously I have seen some of your examples before, you've used like Barbie, right? And big brands can do this sort of thing. Is this worth doing? Or could or should it be done by normal-sized businesses?
Crystal Carter:
What do you mean? Establishing your entity?
Patrick Hathaway:
Yeah.
Crystal Carter:
Yeah, absolutely. So I've done it for myself. I had a look when ChatGPT first broke, and I was like, "Hi, ChatGPT, who's Crystal Carter?" And they were like, "Who?" They didn't know who I was. That was fine. And I was like, "Okay." And they took my little ego. And then a few months later I asked them again, and they said, "Oh, Crystal Carter works at Wix. That's Crystal Carter. She does SEO and she likes Beyonce, or something." And so that tells you that you can move forward in those things. And somebody in the questions asked about the Knowledge Graph. And you can claim your knowledge panel. And knowledge panels used to be fairly reserved for big folks, but I recently claimed my own knowledge panel. And I know a few other folks who aren't rock stars, necessarily, and they have knowledge panels as well.
So I think that that's worth thinking about. And when we think about schema and how that impacts that, having your person schema, having your website, that is your official source of truth, that points to all of those other elements that are part of your Knowledge Graph, and make up part of your knowledge panel, is really, really critical. And when LLMs go to get information about you, they will go to the search results. And if you have curated those via your structured data, via your website, via all of those things that help the Knowledge Graph to know what's going on about you, then it will be more accurate and more clear. And I think that one of the other things that's really important is, and this again goes back to schema being a useful guide, a useful copilot even, if you will, is that you need to be distinct.
One of the things that I see very often is, you have a lot of businesses that just say the same thing. So Jarno was talking about a solicitor, a lawyer client, and they will say the same, "We're an award-winning law firm helping people with divorce." Or something like that. And if you go to another one, it'll say the exact same thing. But you need to be very distinct about who you are, and make sure that your entity is very distinct, and the way that you described yourself is distinct, and that when people come looking for something, when they need something that you can do, that the LLMs can find you.
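To make the "official source of truth that points to all of those other elements" idea concrete, here is a minimal Python sketch that builds Person markup whose sameAs list links the homepage to other profiles. The name, employer and URLs are invented placeholders, and the helper function is hypothetical; the property names (jobTitle, worksFor, sameAs) are standard schema.org Person properties.

```python
import json


def person_jsonld(name, job_title, employer, homepage, profiles):
    """Build a schema.org Person object; `profiles` are other pages about the same person."""
    return {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": name,
        "jobTitle": job_title,
        "worksFor": {"@type": "Organization", "name": employer},
        "url": homepage,
        # De-duplicate and sort so the output is stable between builds.
        "sameAs": sorted(set(profiles)),
    }


# Placeholder values for illustration only.
markup = person_jsonld(
    name="Jane Example",
    job_title="Head of SEO",
    employer="Example Co",
    homepage="https://www.example.com/about/jane",
    profiles=[
        "https://social.example/jane",
        "https://conference.example/speakers/jane",
        "https://social.example/jane",  # duplicate on purpose
    ],
)

print(json.dumps(markup, indent=2))
```

The distinct part still has to come from the copy: jobTitle and worksFor only disambiguate if the page text says the same specific things in words.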
Patrick Hathaway:
Which also just aligns with just general good business positioning advice as well. Right?
Crystal Carter:
Yeah.
Patrick Hathaway:
Cool. So this is a bit of a weird tangential question. I don't even really know, maybe it's just the same answers. But when I was researching the questions for this webinar, I came across this story of an AI overview, where Google served up an entirely fake description of a non-existent movie, Encanto 2, right? Which was just entirely based on Fan Fiction Wiki, and essentially, this idea that LLMs are gullible and hallucinate. And is there anything that we can do with structured data, or with linking through to entities in our content, that can help avoid this problem?
Jarno van Driel:
Not from a domain. Not limited within a single domain.
Dave Ojeda:
No, no. Yeah, I think that once again, your LLMs are taking a comprehensive approach of scraping and getting all that data, and as you know, they either correctly cite the data or incorrectly cite the data, or as we know, just plain make up the answer. Which is probably part of what you received, Patrick, when you saw that. So from a grounding perspective, with what structured data can do, you're only applying it to one source, and who knows how many sources constitute what that answer is produced from with an LLM.
Crystal Carter:
And in terms of making things up, we got got. So someone was listening to the SERP's Up podcast episode that we did with Dana DiTomaso. Basically, there was an AI overview that generated a transcript that was not the transcript of that podcast at all. And we're not Joe Rogan's podcast, we're not some gigantic, enormous podcast, but they basically made it up entirely. And I think that with regards to the AI overviews, we're kind of in beta mode. We can do our best to make sure that we've got a good source of truth, but they're testing a lot of stuff right now.
So I think we're going to see lots of bugs that come up out of that, in that space. I think the best thing that you can do is to make sure that the information that you have on your website is accurate, so that if somebody says, "Oh, I saw this on an AI overview." You can go, "It was on the website like this. This is what I published, this is what I said. This is my actual transcript, which you can see we published two weeks ago." Or something like that. "And whatever you saw in the AI overview is not to do with me." And I think that's really, really important, to make sure that you've got your receipts if someone sees something strange on an AI overview.
Patrick Hathaway:
Just on that topic, I had a really interesting call with a prospective customer, who essentially offers legal advice, right? And as a result of previous advice being referenced, they have to keep this enormous archive of all the old things, even if that's not the current advice. And you can just imagine LLMs going and looking at some of that old stuff and constructing answers based on advice from 10 years ago, which is completely defunct. And yeah, really interesting problem.
Crystal Carter:
Yeah. Yeah, it's very strange, especially if they're going on a plurality of data. So if there's lots and lots of people who've said this is the way you do it, that could be an issue.
Patrick Hathaway:
Okay. Right. So we will move on to Q&A in a moment. I've got one more question then, which I'll pose to each of you. So go back to the topic of the webinar today, in the age of AI, is structured data dead? Crystal, you go.
Crystal Carter:
I think, no. I think it's different. So yeah, I think it's not dead, it's different. I think that we are seeing incredible uptake in search-enabled LLMs, like SearchGPT, like Perplexity. Perplexity is growing by tens of millions every single month, in terms of monthly active users. And they're using web results to ground their information. Gemini is doing the same. And if web results are still pulling through rich results, which will impact what things people click and which pages rank, then it will also impact LLM responses. So I don't think that structured data is dead. I think people should be strategic, and I look forward to seeing how it will evolve.
Patrick Hathaway:
Awesome.
Dave Ojeda:
I mean, for me it's like SEO: it all depends. The reality is, for real ROI-based situations, such as Merchant Center, e-commerce, job posting, those are what matter today. I'd say less so, generically, for everything else. I will add a caveat, just like Jarno said: it depends on where we go with LLMs and what they may want. And maybe it's not specifically going to be external LLMs. It could just be internal Knowledge Graphs that you are building down the road, that you require, let's say on an enterprise level, where there'll truly be huge benefits in the insights you can draw from that information, even more so than today. I think we're just in the beginning part of that. So once again, it depends on the situation, it depends on the ROI. And once again, it all goes back to how much budget and how much time you have to implement, what your ROI is, and understanding what it is today. Because as we know, down the road, it may be different. And down the road for us, in the world of LLMs, is three weeks from now.
Patrick Hathaway:
Jarno, you go. Your answer now.
Jarno van Driel:
I think the caveat they've just exposed is that everything we just said today should come with a disclaimer, "Could be outdated in an hour from now." So probably that's the biggest thing. Hard statements are hard to make because we don't know how those generative products will evolve from here on. I don't think markup will play a role in truly helping them understand the content. But I think that point of view has always been exaggerated for traditional search as well. The understanding part of things has never been as big as people made it out to be. It's mostly been about driving search enrichments at a low computational cost. And since generative AI is computationally heavy, I wouldn't be surprised if we see markup make a comeback. So anybody that throws it in the trash right now, that could come back to haunt or bite you in the end.
So I'd be pragmatic. And most of all, keep looking at the processes involved in creating markup, because especially in e-commerce, the list of properties you need to be able to fill out, well, good luck. The majority of CMS systems don't offer you that out of the box. So especially in e-commerce, there is a lot of work to be done. I've got plenty of clients who are held up six months or a year just to modify their backend systems to get that data available.
Patrick Hathaway:
I think pragmatic is a really good word for what the advice has been today then. All right, so I'm going to move to questions. Reminder: you can upvote people's... We'll do as many as we can. We've got just less than 15 minutes left to go. So I'm going to pick the first one. So very specifically then, "What about structured data in media, like news article, live blog, person, news media organisation, web page, website, breadcrumbs? Combining all this information into a graph is expensive for the product. Is it worth it?"
Crystal Carter:
I think that this will depend on your business and your business model. So if, for instance, your primary business objective is to sell, I don't know, backpacks, or something like that, and you have a blog on the side that is just a general, like, "Oh, do you need a backpack to go hiking? Or do you need a business one?" Or that sort of thing, a general related content blog, then it might not be worth investing loads and loads of time and effort in custom schema generation for those blogs. However, you mentioned person there, media organisation, if you are a news media organisation and that is your business, then it may very well be worth doing that for some of the other features. And also, we're starting to see things like, Perplexity recently released a sports thing, where they're doing sports tracking and sports information, and things like that. So they're starting to add some features to some of these tools as well. So it might be useful for that.
You also mentioned person markup. If the person writing it, for instance, has a knowledge panel, then it may very well be useful to have that, because it will show up in the knowledge panel, and it will contribute to the Knowledge Graph about that person, which will feed into lots of different relationships. So it depends.
Patrick Hathaway:
I knew we would get there. I knew we'd have that. All right, let's go to the next one then. So feel free to jump in if you want to take one of these questions. So Chris says, "Do you think there is any value with implementing the list item carousel schema on PLP pages? I've seen PLPs list everything in the list item carousel schema, that's adding to the page weight." Another interesting issue.
Jarno van Driel:
That greatly depends on the geographical location of a business. Google has a beta programme right now for certain types of verticals, but that's only being run in the European Economic Area, due to the Digital Markets Act. Europe is forcing Google to show certain search results complementary to the things Google shows out of their own regular systems, just to keep the competition going. At least, that's the intent of the Digital Markets Act. So if you're in the European Economic Area, there now actually are carousels driven by PLP markup. Outside the European Economic Area, it's up for debate. I've done tests in the US where I didn't see it make any difference, and I've seen tests where it did make a difference. It's inconclusive.
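On the page-weight concern in the question, one lighter-weight option is a summary-style ItemList that carries only a position and a URL per product, leaving the full Product markup on the product pages themselves. The Python sketch below shows that shape with placeholder URLs; the helper name is hypothetical, and whether such markup is eligible for any carousel depends on Google's current requirements and on the region, as discussed above.

```python
import json


def item_list(urls):
    """Build a schema.org ItemList with one ListItem (position + url) per product URL."""
    return {
        "@context": "https://schema.org",
        "@type": "ItemList",
        "itemListElement": [
            {"@type": "ListItem", "position": position, "url": url}
            for position, url in enumerate(urls, start=1)
        ],
    }


# Placeholder product URLs for a hypothetical category page.
plp = item_list([
    "https://shop.example/backpacks/trail-20",
    "https://shop.example/backpacks/city-15",
    "https://shop.example/backpacks/summit-40",
])

payload = json.dumps(plp, separators=(",", ":"))
print(payload)
print("JSON-LD size in bytes:", len(payload.encode("utf-8")))
```

Printing the byte count gives a simple way to compare this against a version that embeds every product's full details in the list.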
Patrick Hathaway:
Okay, going to move on then. I'm going to put this one on, but I think we've kind of covered this, but it's the top up-voted one. And I've been telling you all to upvote, so we should do it also. It's Simon, who is always first into all of the webinars. So he deserves to have his question on stage. "So are we saying that we should encourage a business to set up schema because it acts as an enhanced content template for the business to write their content, even though the schema itself will not make a difference?"
Dave Ojeda:
I would say, no matter what, as I said before, use schema.org as a way into writing clear and concise content, relating the entities, the businesses, what they do, how they do it. I think that that is the number one takeaway there. And once again, it all depends on the size of the business. If this is a mom-and-pop store, you'd be better off utilising that knowledge to write better content, as it relates to each other within those pages, than you would putting basic schema markup on there. Now if it's a smaller place, you might be able to have Rank Math or something put that basic markup on there. But going forward, I think the trend is all going back to just properly applying an ontology like schema.org to get better written content.
Patrick Hathaway:
Yeah, I mean to be fair, Simon suggested this question 40 minutes ago, and I think we then proceeded to answer it through this. But also, now Kyle Faber just put in the chat that as a result of answering your question, you need to provide Dave with a fresh loaf of bread for answering that question.
Dave Ojeda:
I would love it, Simon.
Patrick Hathaway:
Yeah, so get your starter going. Right, what have we got? Let's see, the next question. Here we go. So, "How do you measure success? Is it just, are rich results present or not, or are you able to measure the impact SD has on traffic?"
Crystal Carter:
So the way I've always done it: if you can see rich results, if it shows up in the enhancements report in Google Search Console, that's always a good sign, in terms of, it's valid, it's working, we have an opportunity. You should see an increase in impressions when you've got new schema markup working, and you should see information there. And ideally, you should see increases in clicks, and all of that stuff. So normally when I implement schema markup, and y'all might have more sophisticated ways of checking this as well, but normally I just mark the date and then see the changes in impressions from that time. And that tends to be valuable for most folks. There are also knock-on effects that enhance the rest of the site. And you normally end up doing some on-page updates as well, because your schema markup should reflect what's on the page.
So yeah, that's where we normally see that. And then also again, we talked about ROI, of how much time it takes you to generate it. So previously, like in the peak schema days, you would see a big improvement in rich results from lots of different elements, and you'd be able to say, "Well, we spent half the day doing this, but we've seen an increase of 50% in traffic." Or something. And so it's very clear to demonstrate the ROI, and then move on to something. But these days you might need to make that window much longer, set the expectations a bit differently as well.
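The "mark the date and compare impressions" approach can be sketched in a few lines of Python. The figures below are invented for illustration, the function name is hypothetical, and the input is assumed to be daily impression counts exported from a tool such as Google Search Console. It is a naive before/after comparison: it does not separate the effect of markup from seasonality or from any other change shipped in the same period.

```python
from datetime import date
from statistics import mean


def before_after(daily, go_live):
    """Compare average daily impressions before and from the go-live date.

    `daily` is a list of (date, impressions) pairs. Returns
    (mean_before, mean_after, percent_change).
    """
    before = [count for day, count in daily if day < go_live]
    after = [count for day, count in daily if day >= go_live]
    mean_before, mean_after = mean(before), mean(after)
    return mean_before, mean_after, (mean_after - mean_before) / mean_before * 100


# Made-up daily impressions around a hypothetical markup release on 4 March.
impressions = [
    (date(2024, 3, 1), 100),
    (date(2024, 3, 2), 110),
    (date(2024, 3, 3), 90),
    (date(2024, 3, 4), 120),
    (date(2024, 3, 5), 130),
    (date(2024, 3, 6), 125),
]

avg_before, avg_after, change = before_after(impressions, go_live=date(2024, 3, 4))
print(f"before: {avg_before:.1f}, after: {avg_after:.1f}, change: {change:+.1f}%")
```

In practice the windows would be much longer than three days each, in line with the point about needing a longer measurement window these days.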
Patrick Hathaway:
Before we move on, does anyone have any more sophisticated ways of measuring this or you all do the same? Yeah, that's fine.
Jarno van Driel:
The problem with measuring the effects of structured data is this. If I go back to 2015, I actually did a year-long case study on a website where we exclusively used markup to optimise the website. Because the problem with measuring the effects of markup is that they're often mixed up with other improvements. So it's very hard to make any definitive statements about that. So you either have to do it in isolation, and honestly, that often doesn't make sense to do, because are you really going to stop the improvements on a site for three or six months, just to test the markup? That seems like a waste of time and energy. You're probably better off making the improvements you know of. And I have not met anybody yet who has a backlog of zero. So unless you have a backlog of zero, I would suggest no, don't test that in isolation.
Dave Ojeda:
Call it a day.
Jarno van Driel:
Exactly.
Patrick Hathaway:
Yeah. Crystal's going-
Dave Ojeda:
Crystal's running around the room [inaudible 00:55:03].
Patrick Hathaway:
This is like she's doing some Beyonce dance like that. Right?
Crystal Carter:
Honestly, lesson learned. I've never seen this before, but yeah, note to self...
Patrick Hathaway:
Don't book that room. All right, so I'm just going to take the last questions we've got, in order. "Do you guys see difference between targeting the SERP results and targeting the Google Knowledge Graph?"
Crystal Carter:
Yes.
Patrick Hathaway:
Yes.
Crystal Carter:
With the Knowledge Graph, it has to do a lot with what other people say about you, not just what you say about yourself. It also has to do with your spread of information across various different spaces. You can't just get it all tracked if you don't also have profiles on relevant places, if you aren't essentially notable in some way. So you have to make sure that you have lots of high quality profiles across the kinds of places where people would expect to see you. So for instance, if you work in SaaS, you might need a Crunchbase profile. If you're working in music, then they would expect to see you on Spotify. If you're working in journalism, they'd expect to see various write-ups, and things like that. You would expect an actual author, I don't know, like Dickens, or whatever, to have an author page, like an author-author page on Amazon, or something.
Patrick Hathaway:
Goodreads.
Crystal Carter:
Yeah, exactly. Yes, Goodreads is a good one. So it's not just about what's on your page, but also about the spread of it. It does go back to you need to have those things graphed on your person schema, on your organisation schema, on your homepage, or on your main page. And that informs all of that. And also when you go to claim your knowledge panel, that will help your case, that you are the person who looks after that website, for instance. But yeah, it's not just about what happens on your domain.
Patrick Hathaway:
Cool. Actually, the next question is about authorship. So let's put that one on. "Does authorship markup still have a place in LLMs' understanding of the experience and calibre of human writers?"
Crystal Carter:
I can jump in, but I feel like I've said so much.
Jarno van Driel:
To be honest, I think overall, the examples I've seen, generative AI is such a mess, that sometimes it shows high authoritative brands, sometimes it makes up information, sometimes it pulls in quotes from disputable sources. So does it matter? Could be, but I'm not convinced. If the results were consistent, maybe. But the results just aren't consistent enough to have any opinions about that, in my opinion.
Dave Ojeda:
I understand what Jarno... That is such a small piece of what LLMs are taking in, in terms of ingesting the data, that I don't think it plays a real factor.
Crystal Carter:
I think rather than the authorship markup, I think that the existence of an author page is probably more valuable, and the author page should have author markup. And that's quite an easy one, really, because it's not one that needs to be changed every five minutes. It doesn't necessarily need to be super dynamic. But on the existence of an author page, Rand Fishkin shared an example of his speaker bio, and how that's always the same whenever it goes out. And he was saying that he sees that proliferate across LLM responses about him, for instance. Additionally, an author bio is a really, really good way to talk about distinct entities. An author bio is a really, really good way to disambiguate you as an entity, and to attach yourself to other known entities. So there's only one Crystal Carter that works at Wix, right? There's only one. There are other Crystal Carters in the world, but there's only one Crystal Carter that works at Wix.
Another good example is, if I was to say Steve McQueen is a man known for his films. There are two Steve McQueens that have existed in the film industry. There's one who drove fast cars, way back in the day, and then there's one who's working now, and who's a director and a Turner Prize winner. Now, if I said Steve McQueen is a Turner Prize winning director known for his films, that's only one. So when you think about your authorship pages, don't just think about it from the schema markup perspective. Absolutely mark it up. Why not? Because it can contribute to your knowledge panel. But make sure that it actually says something about you that is distinct, and will help LLMs to understand who you are and what you do, and attach you to the business that you're involved in, and to the other named entities that are related to you and your activities.
Jarno van Driel:
If I might hook into that, I always love to quote Jason Barnard from Kalicube on this. His point of view is that Google uses the information about the entity found on its entity homepage, and corroborates that with information about the entity found on the web, in a process called reconciliation. And when you're looking at entity homepages, first of all you should have an entity homepage with content that aligns with the information out there on the web. But secondly, markup can play a role in that, in the sense that if markup confirms the information found on the web, that helps consolidate that information. And markup does allow Google to directly import that information into its Knowledge Graph. But for that to happen, it needs to be corroborated by enough sources out there, for Google to be able to determine whether it's true or false, or true or false enough.
Patrick Hathaway:
Awesome. Right. Well, that's about all we've got time for today. Thank you so much everybody for watching, for all the great questions. And huge thanks, of course, go to Dave, Jarno and Crystal for so generously giving up their time and expertise. We will be emailing out the recording tomorrow to everyone who registered. So if you missed the beginning, don't worry, you can catch up. So since we concluded that, in fact, structured data is not dead, we still need to be auditing structured data. So just a reminder that two weeks today, we have the Sitebulb Masterclass on auditing structured data. I've just put the link in the chat, so hopefully we'll see you there. Thanks again for watching. We'll see you on the next one.
