Pages Preview - Protect from Search Index Crawlers

What’s the easiest way to stop search engine crawler bots from indexing my preview deployment, or does Cloudflare automatically take care of it?

I want to ensure that www.mypage.com appears in search engine results, but that preview.mypage.pages.dev does not.

There is a native solution for this scenario. No need to block anything.

Just use the canonical tag:

<link rel="canonical" href="https://www.mypage.com/">
Set the href to the absolute URL on your own domain, not the preview.mypage.pages.dev one. That tells Google and other search engines that the original content lives there, so they should not show the “preview.mypage.pages.dev” links in search results.

Note that on every page carrying this tag, the canonical must match that page's real URL. So if your subpage is /about-me, the canonical for that page should be:

<link rel="canonical" href="https://www.mypage.com/about-me">
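If your pages are generated by a script or template, the per-page rule above could be automated with something like the following. This is only a minimal sketch: `PRODUCTION_ORIGIN` and `canonical_tag` are names made up for illustration, and it assumes each page knows its own path.

```python
from html import escape

# Assumption: the production origin from this thread; adjust to your own domain.
PRODUCTION_ORIGIN = "https://www.mypage.com"


def canonical_tag(path: str) -> str:
    """Build a canonical <link> tag that always points at the production domain.

    `path` is the page's path, with or without a leading slash.
    """
    # Strip leading slashes so "about-me" and "/about-me" give the same URL.
    url = PRODUCTION_ORIGIN + "/" + path.lstrip("/")
    # Escape the URL so characters such as & or " cannot break the attribute.
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'


if __name__ == "__main__":
    print(canonical_tag("/"))
    print(canonical_tag("/about-me"))
```

Because the origin is hard-coded rather than taken from the request, the same tag is emitted whether the page is served from www.mypage.com or from the pages.dev preview.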

Give it a day, or better, ask Google to recrawl your page, and then wait a few days until Google adjusts its results to reflect your change.

Hope that helps.


That’s perfect - I knew there would be a simple solution, just couldn’t work it out myself!

Thanks for your help.

Canonical URLs are a good idea, though Google sometimes treats them as just a ‘suggestion’ and tries to figure things out itself, which may result in the wrong page being indexed.

My recommendation would be one of the following, to prevent indexing of the pages.dev site entirely:

  1. Add something like this to your _headers file, to instruct search engines not to index anything served from the pages.dev domain:
https://:project.pages.dev/*
	X-Robots-Tag: noindex

For more info: https://developers.cloudflare.com/pages/platform/headers/

  2. Or set up Cloudflare Access on your pages.dev subdomain, which puts a login wall in front of it: https://developers.cloudflare.com/pages/platform/known-issues/#enabling-access-on-your-pagesdev-domain
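If you go with the _headers option, it is worth confirming after a deploy that the header really is served on the pages.dev domain and not on your production domain. Here is a rough sketch of such a check using only the Python standard library. The function names are made up for illustration, and it only handles the plain `X-Robots-Tag: noindex` form shown above, not bot-specific values such as `googlebot: noindex`.

```python
from urllib.request import Request, urlopen


def has_noindex(headers) -> bool:
    """Return True if any X-Robots-Tag header carries a plain noindex directive."""
    for name, value in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "x-robots-tag":
            # A single header may list several comma-separated directives.
            directives = [d.strip().lower() for d in value.split(",")]
            if "noindex" in directives:
                return True
    return False


def url_is_noindexed(url: str) -> bool:
    """Send a HEAD request to `url` and inspect the response headers."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=10) as response:
        return has_noindex(response.headers)


if __name__ == "__main__":
    # Offline demonstration with hand-written header sets.
    print(has_noindex({"X-Robots-Tag": "noindex"}))
    print(has_noindex({"Content-Type": "text/html"}))
```

Calling `url_is_noindexed` on your preview URL should give True, and calling it on the production URL should give False, assuming the _headers rule is scoped to the pages.dev host as in option 1.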

Also useful!

I implemented the header solution you proposed as well as the canonical solution.

I did try Cloudflare Access first, but it seems it’s incompatible with my restrictive CSP, and I don’t have the mental capacity to open that can of worms today!

Thanks for your help.
