Structured Data for News Publishers
I explain the types of structured data a news publishing site should have, and how these can be implemented for optimal results.
For this newsletter I’m digging into one of my favourite topics: Structured data. I’m not going to explain in detail what structured data is - Google does a pretty good job of that in their official documentation:
Structured data is a standardized format for providing information about a page and classifying the page content; for example, on a recipe page, what are the ingredients, the cooking time and temperature, the calories, and so on.
Basically structured data is extra markup you put in your HTML code that tells machine systems - like Google - exactly what type of content is on the page. It makes Google’s life easier. And much of SEO is making Google’s life easier.
Structured Data Formats
The structured data that Google supports is based on the schema.org vocabulary. This is not a perfect marriage: In their documentation, Google states that you shouldn’t rely on the schema.org website but on Google’s own documentation instead.
In the context of SEO, we care about structured data primarily for its impact on Google. So, while sometimes Google’s requirements will be different from what’s stated on the schema.org website, in those cases you’ll want to follow Google’s rules.
You can implement structured data in different ways. The two most common approaches are inline with the page content’s HTML (microdata and RDFa), or in a separate snippet using JavaScript Object Notation for Linked Data (JSON-LD). Google explicitly recommends the latter.
I also prefer JSON-LD, because it significantly eases any troubleshooting you may need to do. By keeping the structured data in a separate snippet in JSON-LD, you make it much simpler to test, implement, and fix.
A JSON-LD structured data snippet can theoretically sit anywhere in a webpage’s HTML code. In my experience, it’s best to have it as part of the <head>, and fairly high up in the <head> as well. This seems to reduce the chance of the snippet not being picked up by Google.
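As a minimal sketch of that placement, assuming a server-side template written in Python: the helper names `jsonld_script_tag` and `render_head` are hypothetical, and the point is only that the snippet is written into the `<head>` on the server, ahead of most other tags.

```python
import html
import json


def jsonld_script_tag(data: dict) -> str:
    """Serialise a structured data dict into a JSON-LD <script> tag."""
    payload = json.dumps(data, ensure_ascii=False)
    # Escape "</" so a string value can never close the <script> element early.
    # "\/" is a valid JSON escape, so the parsed data is unchanged.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'


def render_head(title: str, data: dict) -> str:
    """Build a <head> with the JSON-LD snippet placed near the top."""
    return (
        "<head>\n"
        '<meta charset="utf-8">\n'
        f"{jsonld_script_tag(data)}\n"
        f"<title>{html.escape(title)}</title>\n"
        "</head>"
    )


# Illustrative data only.
example = {
    "@context": "https://schema.org",
    "@type": "NewsArticle",
    "headline": "Example headline",
}
print(render_head("Example headline", example))
```

Because the tag is produced as part of the HTML response, it does not depend on any client-side script running.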
Most importantly, the structured data should be present in the raw (unrendered) HTML code. It should not rely on client-side JavaScript to be injected into the webpage.
The reason for this is speed. While Google does render webpages as part of its indexing process, that rendering is relatively slow. It can take minutes, but it can also take hours or even days.
News has to start ranking in Google’s results straight away. Google cannot wait for its own datacentres to complete a full render of a news article, as this could mean the article isn’t shown until it’s already out of date.
So the initial indexing of a news article is based on its HTML code only. And that means all the SEO-critical components of an article (headline, full content, <title>, canonical tag, Open Graph, etc. - and structured data) need to be part of the HTML code before (and after) any client-side JavaScript is loaded.
I still see some implementations where structured data is loaded with JavaScript, for example with Google Tag Manager. This is fine for non-news webpages, but for news it simply doesn’t work.
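One way to check this yourself is to parse the raw HTML response without executing any JavaScript and see whether a JSON-LD snippet is already there. The sketch below uses only Python's standard library. The class and function names are hypothetical, and the two HTML strings are made-up stand-ins for a server-rendered page and a tag-manager-injected one.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.items.append(json.loads("".join(self._buffer)))


def extract_jsonld(raw_html: str) -> list:
    """Return every JSON-LD object found in unrendered HTML."""
    parser = JsonLdExtractor()
    parser.feed(raw_html)
    return parser.items


# Structured data present in the HTML the server sends.
server_rendered = (
    '<html><head><script type="application/ld+json">'
    '{"@type": "NewsArticle"}</script></head><body></body></html>'
)
# Structured data that only appears after a script runs in the browser.
client_injected = (
    "<html><head><script>injectStructuredData()</script></head>"
    "<body></body></html>"
)

print(extract_jsonld(server_rendered))
print(extract_jsonld(client_injected))
```

In practice you would feed it the response body of a plain HTTP request to the article URL (for example via `urllib.request.urlopen`). An empty result suggests the markup only exists after rendering.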
Required Structured Data
So which structured data does a news publishing site actually need? Well, none of it is mandatory. A site can rank just fine in Google’s results, both news-specific and general, without any structured data at all.
But I believe certain structured data snippets are highly advisable, as they help Google understand the context and purpose of your content better. This can help with your content’s appearance in Top Stories, Google Discover, and other news-specific areas.
These are the structured data snippets I would strongly recommend a publisher implement:
Article/NewsArticle on article pages
LiveBlogPosting on live coverage article pages
That’s it. These two are the structured data snippets I think every publisher should definitely have. Everything else is optional - more on that below.
Before we move on to other structured data snippets, let’s answer a few common questions about these two.
What subtype of Article structured data should you use?
In its Article structured data documentation, Google says it supports the BlogPosting, Article, and NewsArticle structured data types.
In my view, it doesn’t make any difference which one you use. They’re all valid. I generally recommend NewsArticle for stories aimed at the news cycle, and Article for evergreen content. I think BlogPosting is still supported by Google because many news sites started out as blogs, but personally I wouldn’t use BlogPosting anymore.
Can you use NewsArticle subtypes?
There are more granular subtypes within NewsArticle, such as ReportageNewsArticle, OpinionNewsArticle, ReviewNewsArticle, etc.
I don’t think there’s an SEO benefit to using these. But they seem to work fine for Google, and validate as articles in Google’s Rich Results Test.
Which attributes should your Article snippet have?
In their documentation, Google is fairly clear about which attributes are required and recommended. None are required, but the more recommended attributes you provide, the more easily Google can understand the article.
These are the attributes Google recommends for every Article structured data snippet: headline, image (ideally three, one for each of the preferred aspect ratios - 1x1, 4x3, and 16x9), author (with name and URL), datePublished, and dateModified.
That’s it.
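To make that list concrete, here is a rough sketch that builds a NewsArticle snippet containing only the recommended attributes. The function name, URLs, and author details are placeholders, not taken from any real site.

```python
import json
from datetime import datetime, timezone


def news_article(headline, image_urls, author_name, author_url, published, modified):
    """Build a NewsArticle dict with only the recommended attributes."""
    return {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        # Ideally three images: 1x1, 4x3, and 16x9.
        "image": list(image_urls),
        "author": [{"@type": "Person", "name": author_name, "url": author_url}],
        # ISO 8601 timestamps including the timezone offset.
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
    }


article = news_article(
    headline="Example headline",
    image_urls=[
        "https://example.com/img/story-1x1.jpg",
        "https://example.com/img/story-4x3.jpg",
        "https://example.com/img/story-16x9.jpg",
    ],
    author_name="Jane Doe",
    author_url="https://example.com/authors/jane-doe",
    published=datetime(2024, 9, 1, 8, 0, tzinfo=timezone.utc),
    modified=datetime(2024, 9, 1, 9, 30, tzinfo=timezone.utc),
)
print(json.dumps(article, indent=2))
```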
Most implementations will have more attributes defined, such as description, publisher, URL (for the article itself), mainEntityOfPage, and complementary attributes such as keywords, articleSection, and sometimes even the full articleBody.
But none of those are actually needed.
When you look at how Google displays news stories in its search results, it makes sense why so few attributes are recommended.
All that Google shows is the publisher, the headline, the image, and the timestamp.
I find it interesting that there is no recommendation to include publisher attributes in your article structured data, despite it being such an obvious part of the article’s visual presence. I think this is because Google establishes the publisher details on the hostname level, and doesn’t extract it from an article’s structured data.
A notable example is how content from The Athletic is shown on Google with The New York Times branding. The article structured data on The Athletic clearly defines it as part of The Athletic, but because the content is now published on www.nytimes.com it gets that hostname’s branding instead.
The additional recommended attributes for authors are, I believe, a way for Google to establish authorship and expertise. Beyond that, everything else is just noise.
Because every publisher has their own implementation, with some defining a shedload of attributes and others defining very few, there’s no way for Google to attach strong value judgments to the presence or absence of additional attributes. Rewarding more structured data would create a two-tier web space where sites with more development resources or better CMSs win over those less fortunate, regardless of the quality of the journalism.
Which is also, of course, the main reason why there is no hard requirement for any structured data. It’s all optional.
Should I declare my paywall in the Article structured data?
Yes - more on that here.
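As a rough sketch of what such a declaration can look like: the article is marked with isAccessibleForFree set to false, and a hasPart element points at the gated section via a CSS selector. The `.paywall` class name and the function name are assumptions for illustration. Use whatever class actually wraps your gated content.

```python
import json


def paywalled_article(headline: str, paywall_selector: str) -> dict:
    """Build a NewsArticle dict that declares a paywalled section."""
    return {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        "isAccessibleForFree": False,
        "hasPart": {
            "@type": "WebPageElement",
            "isAccessibleForFree": False,
            # Assumed class name of the element wrapping the gated content.
            "cssSelector": paywall_selector,
        },
    }


snippet = paywalled_article("Example subscriber-only story", ".paywall")
print(json.dumps(snippet, indent=2))
```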
Do you need LiveBlogPosting to get the red Live badge?
Yes. But there’s no guarantee that you get the Live badge, even with a full implementation of LiveBlogPosting markup. It seems Google needs to somehow ‘approve’ the site for red Live badges.
I’m honestly not sure what it takes to get approved for a Live badge, but LiveBlogPosting structured data appears to be a hard requirement. Regularly publishing live articles with LiveBlogPosting seems to be part of the approval process.
Can LiveBlogPosting exist with NewsArticle?
Theoretically, yes, you can have both LiveBlogPosting and NewsArticle structured data on one article page. You won’t get penalised for it. However, LiveBlogPosting covers everything Google needs to interpret a live article as such, so the presence of NewsArticle structured data is unnecessary.
My LiveBlogPosting doesn’t validate in the Rich Results Test
Yes, welcome to the club. I have yet to see a LiveBlogPosting implementation that doesn’t show at least one error in Google’s Rich Results Test. And yet, it still seems to work fine, and the article gets the coveted live badge in Top Stories.
Despite the fact Google obviously supports live articles, there is no official Google documentation on LiveBlogPosting markup available to the public. There is some documentation available to those who were given access as part of the original Google pilot program for live coverage, but this seems to be dormant.
Generally I recommend copying the LiveBlogPosting snippet from a competing website that gets the red Live badge, and referring to the schema.org page on LiveBlogPosting to enhance your snippet. Use the Rich Results Test and the Schema Validator to try and get your snippet as close to perfection as possible.
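For orientation, here is a bare-bones sketch based purely on the schema.org definition of LiveBlogPosting (coverageStartTime, coverageEndTime, and liveBlogUpdate). It is not a Google-endorsed template, and it makes no claim about what triggers the Live badge. The function name and all values are placeholders.

```python
import json
from datetime import datetime, timezone


def live_blog(headline, url, start, updates, end=None):
    """Build a LiveBlogPosting dict from (headline, timestamp, body) updates."""
    data = {
        "@context": "https://schema.org",
        "@type": "LiveBlogPosting",
        "headline": headline,
        "url": url,
        "coverageStartTime": start.isoformat(),
        "liveBlogUpdate": [
            {
                "@type": "BlogPosting",
                "headline": update_headline,
                "datePublished": timestamp.isoformat(),
                "articleBody": body,
            }
            for update_headline, timestamp, body in updates
        ],
    }
    # Only declare an end time once the live coverage has finished.
    if end is not None:
        data["coverageEndTime"] = end.isoformat()
    return data


blog = live_blog(
    headline="Election night: live updates",
    url="https://example.com/live/election-night",
    start=datetime(2024, 9, 1, 18, 0, tzinfo=timezone.utc),
    updates=[
        (
            "Polls close",
            datetime(2024, 9, 1, 22, 0, tzinfo=timezone.utc),
            "Polling stations have now closed.",
        ),
    ],
)
print(json.dumps(blog, indent=2))
```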
Recommended Structured Data
Of course there’s more to structured data than just article markup. There are several more snippets that I would recommend every publisher implement. These are:
Organization
Organization structured data should be implemented on the site’s homepage (and only there, ideally). In this snippet you define your business’s name, website, logo, associated social media presence (in the sameAs attribute), and any contact points (such as the customer service desk) that you want the public to be aware of.
If you have a physical address for your headquarters, I’d recommend providing those attributes too. An actual office address is an extra level of credibility.
For news publishers, you can use NewsMediaOrganization instead of just Organization. This is a more granular subtype that makes it explicitly clear what your business’s purpose is.
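A minimal sketch of such a homepage snippet, assuming a fictional publisher: every name, URL, and address below is made up, and the function name is hypothetical.

```python
import json


def news_media_organization(name, url, logo_url, same_as, contact_points=(), address=None):
    """Build a NewsMediaOrganization dict for the site's homepage."""
    data = {
        "@context": "https://schema.org",
        "@type": "NewsMediaOrganization",
        "name": name,
        "url": url,
        "logo": logo_url,
        # Social media profiles and other official presences.
        "sameAs": list(same_as),
    }
    if contact_points:
        data["contactPoint"] = [
            {"@type": "ContactPoint", "contactType": contact_type, "email": email}
            for contact_type, email in contact_points
        ]
    if address:
        data["address"] = {"@type": "PostalAddress", **address}
    return data


org = news_media_organization(
    name="Example News",
    url="https://example.com/",
    logo_url="https://example.com/static/logo.png",
    same_as=["https://www.facebook.com/examplenews", "https://x.com/examplenews"],
    contact_points=[("customer service", "help@example.com")],
    address={
        "streetAddress": "1 Example Street",
        "addressLocality": "London",
        "postalCode": "EC1A 1AA",
        "addressCountry": "GB",
    },
)
print(json.dumps(org, indent=2))
```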
VideoObject
For pages that have a video embedded, you’ll want to implement appropriate VideoObject structured data.
Note that if the video is not the main content of the page - so if it’s just the ‘banner’ above the article, or embedded in the article content - Google will not accept the page as a video page. It’ll be shown in Google Search Console as ‘No video indexed’ even when your VideoObject markup is fully valid.
This is because Google only wants to show a video result if the URL is a dedicated video page.
I think it’s still worthwhile to include VideoObject structured data on articles where the video is not the primary content, even though you may not be rewarded by Google in any way.
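As an illustration, a VideoObject snippet might be assembled as follows. The property names come from schema.org, while the helper names and all values are placeholders. The duration is expressed in ISO 8601 format, so a small conversion helper is included.

```python
import json


def iso8601_duration(seconds: int) -> str:
    """Convert a length in seconds to an ISO 8601 duration such as PT1M35S."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    result = "PT"
    if hours:
        result += f"{hours}H"
    if minutes:
        result += f"{minutes}M"
    if secs or result == "PT":
        result += f"{secs}S"
    return result


def video_object(name, description, thumbnail_url, upload_date, content_url, length_seconds):
    """Build a VideoObject dict for a page with an embedded video."""
    return {
        "@context": "https://schema.org",
        "@type": "VideoObject",
        "name": name,
        "description": description,
        "thumbnailUrl": [thumbnail_url],
        "uploadDate": upload_date,
        "contentUrl": content_url,
        "duration": iso8601_duration(length_seconds),
    }


video = video_object(
    name="Example interview",
    description="A short interview clip embedded in the article.",
    thumbnail_url="https://example.com/video/thumb.jpg",
    upload_date="2024-09-01T08:00:00+00:00",
    content_url="https://example.com/video/interview.mp4",
    length_seconds=95,
)
print(json.dumps(video, indent=2))
```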
BreadcrumbList
I’m a fan of implementing Breadcrumb structured data. The direct SEO benefits are marginal; you basically get a somewhat nicer-looking breadcrumb when the article is shown in regular search results.
However, I think there’s more to breadcrumbs. I’m in favour of visual breadcrumbs on articles, as these show the content hierarchy and send strong signals about the position the article occupies in the site’s overall structure.
Breadcrumbs also help with sending link value to the article’s parent categories, and serve as an additional navigation method.
When you implement visual breadcrumbs, you should also implement Breadcrumb structured data.
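Since the visual breadcrumb and the markup should describe the same trail, one approach is to generate both from a single list. A small sketch, with a hypothetical helper name and placeholder section names and URLs:

```python
import json


def breadcrumb_list(trail):
    """Build a BreadcrumbList dict from an ordered list of (name, url) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            # Positions start at 1 and follow the order of the visual breadcrumb.
            {"@type": "ListItem", "position": position, "name": name, "item": url}
            for position, (name, url) in enumerate(trail, start=1)
        ],
    }


crumbs = breadcrumb_list(
    [
        ("Home", "https://example.com/"),
        ("Politics", "https://example.com/politics/"),
        ("Elections", "https://example.com/politics/elections/"),
    ]
)
print(json.dumps(crumbs, indent=2))
```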
Person / ProfilePage
Having author pages for your regular writers has been a best practice for a while. On these author pages, you should also consider implementing Person or ProfilePage structured data.
This structured data makes the content on the author page more easily digestible for Google, allowing the search engine to establish the author’s entity in the Knowledge Graph and identify expertise and authority.
Make sure you implement the ‘sameAs’ attribute on your author pages, with URLs listing the author’s social media profiles and other publications they write for. This all helps with the author’s entity in Google’s knowledge graph.
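A rough sketch of an author page snippet, using ProfilePage with the author as its main entity. The function name, author name, and profile URLs are invented for illustration.

```python
import json


def author_profile(name, page_url, same_as, job_title=None):
    """Build a ProfilePage dict whose main entity is the author."""
    person = {
        "@type": "Person",
        "name": name,
        "url": page_url,
        # Social media profiles and author pages on other publications.
        "sameAs": list(same_as),
    }
    if job_title:
        person["jobTitle"] = job_title
    return {
        "@context": "https://schema.org",
        "@type": "ProfilePage",
        "mainEntity": person,
    }


profile = author_profile(
    name="Jane Doe",
    page_url="https://example.com/authors/jane-doe",
    same_as=[
        "https://www.linkedin.com/in/jane-doe-example",
        "https://other-publication.example/authors/jane-doe",
    ],
    job_title="Political Correspondent",
)
print(json.dumps(profile, indent=2))
```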
SearchAction
Lastly, I recommend implementing SearchAction structured data on the site’s homepage. With SearchAction you let Google know that your site has an internal search function, and how that search function can be accessed.
For large-ish news brands, Google will often show a search box as part of the brand’s search result - a so-called Sitelinks Search box.
If you do not provide SearchAction structured data, anyone using this search feature on Google will get another Google search result - just limited to the site in question.
With SearchAction, Google should theoretically trigger your own internal site search when someone uses the search box on your branded Google result.
I say ‘theoretically’, because in this example The Telegraph does actually have SearchAction structured data but Google is still generating its own SERP instead of using the Telegraph’s search function.
I suspect it could be related to the fact The Telegraph’s search function is JavaScript-powered. (Yes, it’s always JavaScript’s fault.)
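For reference, SearchAction markup is typically attached to a WebSite snippet as a potentialAction. The sketch below assumes a site search reachable at `/search?q=...`, which is a made-up URL pattern, and the function name is hypothetical. Substitute your own search URL.

```python
import json


def website_with_search(site_url: str, search_url_template: str) -> dict:
    """Build a WebSite dict declaring the internal search via SearchAction."""
    placeholder = "{search_term_string}"
    if placeholder not in search_url_template:
        raise ValueError(f"search URL template must contain {placeholder}")
    return {
        "@context": "https://schema.org",
        "@type": "WebSite",
        "url": site_url,
        "potentialAction": {
            "@type": "SearchAction",
            "target": {
                "@type": "EntryPoint",
                "urlTemplate": search_url_template,
            },
            # The name here must match the placeholder in the URL template.
            "query-input": "required name=search_term_string",
        },
    }


site = website_with_search(
    "https://example.com/",
    "https://example.com/search?q={search_term_string}",
)
print(json.dumps(site, indent=2))
```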
Wrapping Up
There’s a lot more to say about structured data, as there are plenty more opportunities for implementation. Job listings should have JobPosting markup, a frequently asked questions page should have FAQ markup, etc.
Google’s own extensive documentation on structured data is a great resource, so please dig into that to see all the types that are officially supported and how to best implement them.
I may do a follow-up newsletter at some stage on additional structured data snippets. Let me know in the comments which structured data types you’d like me to dig into.
Miscellanea
It’s been a fun time in the world of SEO and publishing! Here’s a roundup of recent articles and stories I found worthwhile:
Interesting Articles:
The Subprime AI Crisis - Ed Zitron
Want to fight misinformation? Teach people how algorithms work - Nieman Lab
Here’s how 7 news audience directors are thinking about Google’s AI Overviews - Nieman Lab
No god in the machine: the pitfalls of AI worship - The Guardian
New Research: So Far, AI is Not Disrupting Search or Making a Dent in Google - Sparktoro
Latest in SEO:
The Sites Dominating Google’s Top Stories SERP Feature in 2024 - Detailed
Google’s Top stories looks broken: Are news publishers to blame? - SEL
Aggregation and SEO: Best Practices - WTF is SEO?
Linking the News: Off-page SEO for News Publishers - Sitebulb
Interview With Google's Search Liaison On The August 2024 Core Update - SER
Google: Core Web Vitals Aren't As Important As Some People Might Think - SER
Google Issues Manual Actions Over Google Discover Policy Violations - SER
Lastly, some usual self-promotion. I was a panellist at the recent Future of Media Technology conference, where my panel spoke about the ‘interesting’ relationship between publishers and tech platforms in the age of LLMs. Here are two recaps from that session:
Google killing publisher voucher codes overnight part of wider trend, says Mail exec - Press Gazette
AI: should publishers sign up or sue? - Adam Tinworth
Recently I was interviewed on the topic of search and publishers, looking at underlying reasons for news sites’ decline in Google traffic. Here is the resulting article:
Searching for search traffic - Editor and Publisher
If you like to watch hour-long videos of SEOs blabbering about SEO for news and publishers, I got just the thing for you. Sean Bianco from The SEO Club invited me onto his podcast and we had a blast:
How to Rank in Top Stories and Discover (video) - The SEO Club
News and Editorial SEO Summit 2024
We’re just over a month away from our fourth annual News and Editorial SEO Summit. This year has a truly packed lineup across the two days of our online livestreamed event.
If you haven’t yet secured your ticket, giving you access to the live event as well as recordings of all the sessions, better get a move on!
That’s it for this edition of SEO for Google News. Thanks as always for reading and subscribing. See you at the next one!
Thank you very much for this content, Barry! Just today I read a lot about this topic in your old articles. It's funny that you're publishing a new article today! :)
Would you advise using WordPress plugins that simply adapt the structured data or would it be better to implement it in the code yourself? If you are leaning towards a WordPress plugin, can you recommend a specific one?