Best Practices for Paywalls and SEO

More publishers are exploring subscription models to generate revenue. Paywalls are powerful mechanisms for monetisation, but there are SEO risks involved.

Feb 14, 2023

I remember when paywalls first arrived as a monetisation channel for publishers. I have to admit that I was skeptical at first. Online news had been something people were used to enjoying for free, and I had doubts about the viability of paywall models.

I was happy to be proven wrong, and now paywalls are not a controversial topic in online publishing. Most paywalls are successful, with websites reporting positive financial results.

For publishers that haven’t embarked on a paid subscription journey, you could say it’s not a matter of ‘if’ but of ‘when’.

It’s not as simple as slapping a subscription form onto your website, however. Paywalls need to be carefully planned before implementation, with the impact on all aspects of a website’s traffic channels and revenue streams considered before making the leap into a subscription model.

Paywall Types

Generally, we can identify four types of paywalls:

Hard paywall

With a hard paywall, a publisher puts all their content behind a subscription and there is no method of accessing the content without signing up.

Hard paywalls usually mean the site’s homepage and section pages don’t require a login, and will list articles as any publisher would. The paywall only kicks in when someone attempts to read an article.

I know some publishers have gone one step further and put even their homepage behind a paywall (a super-hard paywall). Personally I think that’s a step too far, creating a significant barrier to entry for your potential subscribers.

Freemium

A freemium model means the publisher offers some articles for public access, requiring no login or subscription, with the rest of the site’s content behind a paywall.

Freemium models also come in different gradients, with some publishers offering only a small number of free articles and others having the bulk of their content openly available.

Metered Paywall

With a metered paywall, a reader will get a set number of articles to read for free before they’re asked to sign up to a subscription. This model ensures the publisher’s audience gets a chance to sample the content before parting with their hard-earned money.

Sometimes you see a mix of metered and freemium, where users can read a set number of free articles before getting a signup prompt, but some articles are always behind a paywall and not part of the free sample.

Dynamic Paywall

This is a new-ish form of paywall, which can be summarised as a ‘personalised metered paywall’. Software installed on the publisher’s website delivers a personalised experience catered to each user, only showing a paywall form when the software determines the user is highly likely to sign up to a subscription.

Dynamic paywalls come in a variety of flavours, depending on how the software is designed and implemented. What they have in common is that every user is profiled and has an opportunity to consume some free content before the paywall blocks further access.

Paywalls and Google

As this is a newsletter about SEO first and foremost, I won’t dig into the business cases for each paywall type. Instead I’ll try to answer the question I get asked most often when paywalls come up as a topic: Which paywall model is best for SEO in Google?

Google does not have an inherent bias against paywalled content, provided the website lets Google know that its content is behind a paywall.

Publishers with paywalls can still see their subscription-only content rank in Google search results, in all areas of search: Top Stories boxes and other news carousels, the news tab, the Google News vertical, in the Discover feed, and as classic ‘ten blue links’.

But - and this is a big caveat - publishers do need to make sure their paywalled content can be seen by Google, so it can index some or all article content and apply relevant ranking factors.

Google collaborated with publishers to better understand how paywalls impact on ranking signals. Their findings conclude that there are two preferred approaches: metered paywalls and ‘lead-in’ paywalls (where a portion of the article is offered for free, such as the headline and first paragraph, before the paywall kicks in):

Metering

Since Google behaves like a user without cookies and without history when it crawls your website, metered and dynamic paywalls don’t offer any obstacle to full crawling and indexing of your paywalled content.

Every Googlebot crawl request will be seen as a first-time visit, so your metered or dynamic paywall won’t kick in yet and Google has free access to all your articles.

This means that for SEO, metered and dynamic paywalls are more or less identical to completely free websites.
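To see why a cookieless crawler never reaches the meter, here is a minimal Python sketch of a cookie-based counter. The cookie name `articles_read` and the limit of three free articles are assumptions I made up for illustration; real metering software will be more elaborate.

```python
FREE_ARTICLE_LIMIT = 3  # assumed allowance, purely illustrative
METER_COOKIE = "articles_read"  # hypothetical cookie name


def articles_already_read(cookies: dict) -> int:
    """Read the meter count from the request cookies; missing or invalid means 0."""
    try:
        return max(0, int(cookies.get(METER_COOKIE, "0")))
    except ValueError:
        return 0


def should_show_paywall(cookies: dict) -> bool:
    """Show the paywall only once the visitor has used up the free allowance."""
    return articles_already_read(cookies) >= FREE_ARTICLE_LIMIT


# A returning reader who has used the allowance hits the wall...
print(should_show_paywall({"articles_read": "3"}))  # True
# ...but a request without cookies always looks like a first visit.
print(should_show_paywall({}))  # False
```

Because the count lives in the visitor's cookies, a request that carries none is always treated as article number one.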

‘Lead-in’ Content

What Google means by a ‘lead-in’ is that the article headline and opening text should be accessible to Google when it crawls and indexes the paywalled content.

At a bare minimum, Google needs to be able to index a headline and an introduction paragraph (80 words minimum) for an article to be considered a rankable document in Google.

I will explain the ‘how’ of ensuring Google can see that below under Paywall Implementations.

I should note that, in my view, paywalls that only offer ‘lead-in’ content to Google generally perform worse than paywalls that allow Google full access to all article content. I believe this is because lead-in content is shorter and gives Google fewer signals on which to base its evaluations of quality and expertise.

isAccessibleForFree

Next, you need to ensure Google understands when an article offers paywalled content, so it can differentiate your paywall from an attempt at cloaking.

The way to do this is with your NewsArticle structured data. In the structured data snippet on your paywalled articles, you need to define the isAccessibleForFree attribute, with the value of false if the article content is (wholly or partially) behind a paywall.

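Additionally, there needs to be a cssSelector attribute, nested inside a hasPart element of type WebPageElement, whose value is the CSS selector for the part of your article page where the paywall kicks in (for example, the class wrapped around the paywalled section).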

Basically, you need to show Google exactly where in your HTML the paywall begins, so that Google understands which parts of your page are freely readable and which parts require a login.
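As a rough illustration, the Python sketch below builds such a structured data snippet. The headline and the `.paywall` class name are placeholders, and the function name is my own invention; swap in whatever your templates actually use.

```python
import json


def paywalled_article_jsonld(headline: str, css_selector: str) -> str:
    """Build a NewsArticle JSON-LD snippet that flags the paywalled section."""
    data = {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        "isAccessibleForFree": False,
        "hasPart": {
            "@type": "WebPageElement",
            "isAccessibleForFree": False,
            # CSS selector of the element that wraps the paywalled content
            "cssSelector": css_selector,
        },
    }
    return json.dumps(data, indent=2)


snippet = paywalled_article_jsonld("Example headline", ".paywall")
print('<script type="application/ld+json">')
print(snippet)
print("</script>")
```

The selector in the structured data should match a class that really exists in the article HTML, otherwise Google has nothing to map the paywalled section to.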

Paywall Implementations

When it comes to technical paywall implementations and their impact on SEO, I generally see four different types. I’ll explore these in order of best to worst for SEO.

Note that these four different paywall implementations are independent of the four paywall business models I described above. These paywall implementations can apply to any paywall model.

Also, I made up the names of these four paywall implementations, so you may use entirely different terminology. I couldn’t find a standard way to label these, so I made up my own.

1. User-Agent Paywalls

With a user-agent paywall, the website will serve different HTML to regular users and to Google. Regular users get paywalled HTML, which can be fully content-locked without any free element. Verified Googlebot user-agents, however, receive different HTML which contains the full article content as well as a complete NewsArticle structured data snippet.

This way, you can ensure your content is fully crawlable and indexable for Google, while ensuring your paywall is not easily circumvented by savvy users.

If your user-agent detection uses a reverse DNS lookup on the requesting IP address (followed by a forward lookup to confirm the match) to verify Googlebot visits, this paywall approach is almost unbeatable for all but the most determined crackers.
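A minimal sketch of that verification in Python might look like the following. It assumes IPv4 addresses and checks only the googlebot.com and google.com hostnames; the function names are mine, and the resolvers are passed in as parameters so the logic can be tried without touching the network.

```python
import socket

GOOGLE_CRAWLER_DOMAINS = (".googlebot.com", ".google.com")


def reverse_dns(ip: str) -> str:
    """Return the hostname that the IP address resolves back to."""
    return socket.gethostbyaddr(ip)[0]


def forward_dns(hostname: str) -> list:
    """Return the IPv4 addresses that the hostname resolves to."""
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_googlebot(ip, reverse=reverse_dns, forward=forward_dns) -> bool:
    """Reverse lookup, domain check, then forward lookup to confirm the IP."""
    try:
        hostname = reverse(ip)
        if not hostname.endswith(GOOGLE_CRAWLER_DOMAINS):
            return False
        return ip in forward(hostname)
    except OSError:  # lookup failed, so treat the visitor as unverified
        return False


# Offline demo with stub resolvers; the hostname and IPs are made up.
stub_reverse = lambda ip: "crawl-192-0-2-1.googlebot.com"
stub_forward = lambda host: ["192.0.2.1"]
print(is_verified_googlebot("192.0.2.1", stub_reverse, stub_forward))  # True
print(is_verified_googlebot("192.0.2.9", stub_reverse, stub_forward))  # False
```

In production you would want to cache the results, as doing two DNS lookups on every request gets slow quickly.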

With user-agent paywalls, you absolutely have to use the isAccessibleForFree structured data attribute (with the false value). Failing to do so can lead Google to conclude that you’re cloaking your content, which can result in painful ranking penalties.

Pros: As Google can see all your content and links, there is no inherent SEO downside to a user-agent paywall. Users are generally unable to bypass your paywall.

Cons: Can be more difficult to implement than other approaches. You’ll need the noarchive meta robots tag or HTTP header to ensure a ‘cache:’ command in Google doesn’t surface the content.
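Put together, the serving logic can be sketched roughly as below. This is a simplified illustration, not a production setup: the function name and the two HTML variants are hypothetical, and the crawler check is assumed to have happened already.

```python
def build_article_response(is_verified_crawler: bool, full_html: str, locked_html: str):
    """Pick the HTML variant and response headers for one article request."""
    headers = {"Content-Type": "text/html; charset=utf-8"}
    if is_verified_crawler:
        # Ask search engines not to offer a cached copy of the full text.
        headers["X-Robots-Tag"] = "noarchive"
        return full_html, headers
    return locked_html, headers


body, headers = build_article_response(True, "<p>Full article</p>", "<p>Please subscribe</p>")
print(body)
print(headers["X-Robots-Tag"])
```

Both variants should carry the same NewsArticle structured data with isAccessibleForFree set to false, so that the difference between them is declared rather than hidden.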

2. JavaScript Paywall

JavaScript paywalls rely on client-side JavaScript to show a paywall overlay to users. The article HTML will have the complete article content, and often the NewsArticle structured data will also have a complete articleBody attribute.

A JavaScript paywall also needs to have the isAccessibleForFree structured data attribute implemented with the false value.

In the context of news, Google will initially index an article based purely on the HTML source and without executing client-side code, so a JavaScript paywall essentially offers the entire article for Google to crawl and index.

However, JavaScript paywalls are relatively easy to circumvent by users; simply disabling JavaScript in their browser generally suffices.
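To illustrate what any client that doesn't execute JavaScript sees, here is a small Python sketch that reads the paragraphs straight from the HTML source. The page source and the `showPaywallOverlay()` call are made-up stand-ins for a real article template.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the text inside <p> tags; scripts are never executed."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        if self.in_paragraph:
            self.paragraphs[-1] += data


# Hypothetical page source: full article in the HTML, overlay added by a script.
PAGE_SOURCE = """
<article>
  <p>Opening paragraph of the story.</p>
  <p>Subscriber-only paragraph of the story.</p>
</article>
<script>showPaywallOverlay();</script>
"""

parser = ParagraphText()
parser.feed(PAGE_SOURCE)
print(parser.paragraphs)
```

Both paragraphs come out, including the one meant for subscribers, because the overlay only exists once the script runs in a browser.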

Pros: The full article content and links are indexable for Google, offering no inherent SEO downsides compared to free articles.

Cons: Users with a modicum of technical ability will be able to read your paywalled content without much bother.

3. Structured Data Paywall

With a structured data paywall, you do not have the article’s content in the HTML, but you do have the articleBody attribute in the NewsArticle structured data in the HTML source.

Essentially, the full article content is present only as the value of the articleBody attribute in the page’s structured data, which means it can be read in the source code and extracted using simple free structured data validation tools. Article content beyond the ‘lead-in’ content is not included in the article’s regular HTML markup.

Pros: Structured data paywalls offer more content for Google to index and rank, allowing it to evaluate quality and E-E-A-T, which often results in better visibility in Google search.

Cons: Tech-savvy users can extract the articleBody and read the content without signing up to the paywall. Google can’t see your article’s internal links (as these are only defined in regular HTML), so there is diminished SEO value compared to the previous two paywall implementations.
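If you want to check how much your own pages expose this way, a short Python sketch like the one below will pull the articleBody out of the page source. The sample markup is invented, and the sketch only handles a single JSON-LD object per script block, not arrays or @graph structures.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_jsonld = False

    def handle_data(self, data):
        if self.in_jsonld:
            self.blocks[-1] += data


def extract_article_body(html: str):
    """Return the first articleBody found in the page's JSON-LD, or None."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    for block in extractor.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict) and "articleBody" in data:
            return data["articleBody"]
    return None


# Hypothetical page: lead-in in the HTML, full text only in the structured data.
PAGE_SOURCE = """
<p class="lead-in">Opening paragraph only.</p>
<script type="application/ld+json">
{"@type": "NewsArticle", "headline": "Example", "articleBody": "Full text of the story."}
</script>
"""
print(extract_article_body(PAGE_SOURCE))  # Full text of the story.
```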

4. Content-Locked Paywall

With a completely locked paywall, there is no way for Google (or tech-savvy users) to find the content of the article without signing up to a subscription. An article’s content is entirely hidden from all users - including Googlebot - that are not logged in to the site’s paywall subscription.

The HTML source code of an article behind a locked paywall is generally quite short. It may contain the lead-in content, but no more. The article’s full content is not present in the HTML source.

The NewsArticle structured data of a locked paywall is also quite sparse. It generally has the headline attribute defined, and often also the description attribute, but it will not have the articleBody attribute or, at most, will have a brief summary in the articleBody.

This means that there is no way for Google to extract the full content from the article HTML.

Pros: Content-locked paywalls are nearly impossible to circumvent. Even tech-savvy users will not be able to get to the content and bypass the login.

Cons: Google can’t see the full content either. This means Google can’t properly evaluate the article’s quality, E-E-A-T signals, topical focus, internal links, etc. This generally results in lower rankings, as there is less information for Google to base its rankings on.

First Click Free

Many of you will remember the First Click Free programme Google launched back in 2008, where paywalled websites would open up their paywalls for a visit coming directly from Google. The paywall would only kick in when the user clicked through to a second article.

This programme evolved over time and was eventually retired in 2017. Many publishers still have systems in place where a user from Google gets to read an article for free and the paywall form only shows up when the user clicks through to further articles.

Essentially, First Click Free serves as a form of a metered paywall, so all SEO considerations that apply to metering also apply to First Click Free implementations.
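A simple version of such a system can be sketched in Python as a check on the Referer header. The list of Google hostnames below is illustrative and far from exhaustive, the function names are my own, and bear in mind that a referrer is easy to spoof.

```python
from urllib.parse import urlparse

# Illustrative allowlist only; Google serves results from many more hostnames.
GOOGLE_REFERRER_HOSTS = {"www.google.com", "www.google.co.uk", "news.google.com"}


def came_from_google(referer) -> bool:
    """True when the Referer header points at one of the listed Google hosts."""
    host = urlparse(referer or "").hostname
    return host in GOOGLE_REFERRER_HOSTS


def paywall_applies(referer, is_subscriber: bool) -> bool:
    """Subscribers and visitors arriving from Google read for free."""
    if is_subscriber:
        return False
    return not came_from_google(referer)


print(paywall_applies("https://www.google.com/", False))  # False
print(paywall_applies("https://example.com/another-article", False))  # True
```

The second call represents a reader clicking through from another page, which is where the paywall form would appear.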

Registration Paywalls

Some publishers ask users to create an account before allowing them to read the full contents of an article. There is no request for payment, but without an account the user is unable to continue to read the publisher’s output.

This is also a form of a paywall, even though there’s no financial payment. The requirement to create an account (and allow the publisher to monetise the user in different ways) is still a paywall, so all the paywall considerations above apply to ‘registration only’ content as well.

Paywalls and Engagement Signals

Now let’s talk about the elephant in the room. Even with the most porous and circumventable paywall implementations, there is still the matter of how users behave when they land on a paywalled website.

Google uses the ‘return to SERP’ engagement signal in their long term ranking evaluations. A ‘return to SERP’ is when a user clicks on a webpage on Google’s search results, and rather quickly comes back to the search result and clicks on a different webpage.

Such a ‘return to SERP’ - closely related to a ‘bounce’ in web analytics terminology - is a negative ranking signal. It tells Google that the first webpage the user clicked on did not fulfil the user’s purpose.

If many such ‘return to SERP’ events happen when a website’s pages are shown in Google’s results, over time Google may choose to show fewer webpages from that website in its results.

Google has a laser focus on offering the best possible search results to its users, and a website that users keep bouncing away from does not meet Google’s criteria.

This is the real long-term SEO impact of paywalls: You will accumulate more ‘return to SERP’ signals, which in time cause diminished visibility in Google’s results.

You can mitigate ‘return to SERP’ signals with First Click Free implementations (a key reason why many publishers still use them) and smart paywall metering with regard to visits coming from Google.

Allowing users coming from Google to always read the full content of an article greatly reduces ‘return to SERP’ signals and can prevent long term SEO damage your paywall may otherwise cause.

Paywalls and Links

Another side-effect is that paywalled articles are much less likely to be cited and linked to as sources. When someone wants to quote a source and link to that source, a free article provides a much better user experience than a paywalled piece.

A 2023 study from BuzzSumo showed that on average paywalled content earns 60% fewer links than free content.

As Google still relies on links for various authority-related ranking signals, the result is that your site’s authority signals from external links diminish over time and aren’t replaced at the same rate by newer links.

That diminishing authority in Google will have a negative impact on your site’s ability to achieve top rankings in Google’s search results.

Paywalls and AMP

Lastly, some brief thoughts on paywalls and AMP. Historically it’s been a huge challenge to ensure paywalls work alongside AMP articles.

AMP, as a different tech stack from your regular website, can break the user experience; a user can be logged in to your paywall, but when visiting an AMP article from a Google result the user may still be confronted with a paywall login form.

While there are technical implementations that allow your regular paywall to mirror your AMP paywall, these are complicated and hard to implement. This is yet another reason why AMP is eagerly being ditched by publishers.

As AMP is a dying standard, hopefully we can stop worrying about what works and doesn’t work in AMP very soon, and just return our full focus to ensuring our publishing website is as good as it can be.

Paywalls & SEO Summarised

Paywalls are increasingly common and offer an attractive monetisation opportunity. When implemented correctly, with the full content and links of an article being available to Googlebot, a paywall doesn’t have any inherent negative repercussions for your website’s SEO.

However, paywalls can cause long-term SEO damage through reduced engagement signals and fewer earned backlinks and citations. This can be mitigated to a degree by smart metering and/or allowing visitors coming from Google results full access to your articles.

There are many different levels of paywall, both in terms of strictness and technical implementation. I’ve tried to capture the most common types in this newsletter, but there will be many paywall implementations that don’t neatly fit into the categories I’ve described here.

If you’re in doubt about your paywall’s setup with regard to SEO, feel free to get in touch with me and we can arrange a paywall SEO sanity check.

Lastly, WTF is SEO have also published an excellent newsletter on paywall strategy, which is definitely worth a read.

Miscellanea

Here are some interesting stories and insights I’ve come across recently.

There’s been lots of hullabaloo about Bing’s ChatGPT and Google’s Bard generative AI systems. Some highlights:

After Bing announced their ChatGPT implementation, it was Code Red at Google.
The hasty launch of Google Bard didn’t go entirely to plan.
Google also published official guidance on AI-generated content.
There are countless examples of generative AI providing factually incorrect information.
Not to mention that you can easily create an entire fake news website with generative AI.
And there’s a chance generative AI breaches copyright when it provides paywalled content as answers.

Since we’re on the topic of paywalls, I was interviewed for a Times article on ChatGPT & Bard which is - you guessed it - behind a paywall.

Suffice to say, the debate about generative AI will continue for a while. I believe generative AI can become a real threat to the free and open web. For now, however, I don’t think human content creators have to worry about their livelihoods just yet.

Yandex Leak

For me, the more interesting recent novelty in the SEO industry was the leak of some of Yandex’s source code. Loads of fascinating nuggets to be found, and this investigation from Mike King is probably the best article to read on the topic.

That’s a wrap on another edition! We’ve now surpassed 5500 subscribers (SEO for Google News is neck and neck with WTF is SEO - the first newsletter to reach 10K subscribers gets a cake from the other) and I really appreciate all of you who subscribe, share, and comment on my articles.

The next newsletter may contain a surprise, so stay tuned!

Shout-out to Svetoslav Petkov, Mirko Obkircher, and Tech Business News for pointing out a few omissions, which I've since added into the article.

Mikhail Skoptsov

Thank you for this information. I was wondering - would you know what sort of paywall model Substack employs and what effect its usage has on traffic and SEO?

Eg. if I turn on the pay option on my newsletter and lock part of an article's content behind a paywall, it would be a content-locked paywall with lead-in content, is that correct? In such a case, would the paywalled part of the article be completely inaccessible to Google?
