CDN Being Crawled and Indexed by Google

Scott-Thomas

I'm doing an SEO site audit, and I've discovered that the site uses a Content Delivery Network (CDN) that's being crawled and indexed by Google. Two subdomains from the CDN are being crawled and indexed, and a small number of organic search visitors have come through them. So in a small number of cases, the CDN-based content is outranking the root domain.

It's a huge duplicate content issue (tens of thousands of URLs being crawled). What's the best way to prevent the crawling and indexing of a CDN like this? Exclude it via robots.txt?

Additionally, the use of relative canonical tags (instead of absolute) appears to be contributing to this problem. As I understand it, these canonical tags are telling the search engines that each subdomain is the "home" of the content/URL.
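To illustrate the canonical point, here is a minimal Python sketch of how a relative canonical href resolves against whichever host served the page. The hostnames (www.example.com, cdn1.example.com) and the path are made up for illustration and are not the real site's; `resolve_canonical` is a hypothetical helper, not part of any SEO tool.

```python
from urllib.parse import urljoin


def resolve_canonical(page_url: str, canonical_href: str) -> str:
    """Resolve a canonical href the way a crawler would: relative to the page URL."""
    return urljoin(page_url, canonical_href)


# Hypothetical URLs: the same page served from the root domain and from a CDN host.
root_page = "https://www.example.com/products/widget?ref=1"
cdn_page = "https://cdn1.example.com/products/widget?ref=1"

relative_href = "/products/widget"
absolute_href = "https://www.example.com/products/widget"

# A relative canonical points back at whichever host served the page.
print(resolve_canonical(root_page, relative_href))
print(resolve_canonical(cdn_page, relative_href))

# An absolute canonical points at the root domain no matter which host served it.
print(resolve_canonical(cdn_page, absolute_href))
```

With the relative href, the copy on the CDN host ends up declaring itself canonical, which matches what I think is happening here.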

Thanks!

Scott

irvingw

It sounds like you've got a handle on the problem.

1. Verify the subdomains in Google Webmaster Tools (WMT).

2. Block the CDN subdomains with robots.txt.

3. Request site removal in WMT for the subdomains.

4. Make the canonicals absolute.

Keep the blocked subdomains in WMT. When you log in, you will see a message next to the subdomains saying "Critical issue with your site", which is just telling you that the site is blocked. I like to keep them in there so I can see they're still blocked.
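If you want a quick check that the block is doing what you expect, something like the Python sketch below might help. It uses the standard library's `urllib.robotparser` and assumes the CDN subdomains can serve their own robots.txt, separate from the root domain's. The robots.txt contents, the hostnames, and the `is_blocked` helper are all hypothetical examples.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served only on the CDN subdomains: block every crawler.
CDN_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

# Hypothetical robots.txt for the root domain: nothing is disallowed.
ROOT_ROBOTS_TXT = """\
User-agent: *
Disallow:
"""


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text disallows user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# The CDN copy should be blocked, the root domain copy should not.
print(is_blocked(CDN_ROBOTS_TXT, "https://cdn1.example.com/products/widget"))
print(is_blocked(ROOT_ROBOTS_TXT, "https://www.example.com/products/widget"))
```

To test the live files instead of inline text, `RobotFileParser` also has `set_url()` and `read()`, which fetch the robots.txt from the host you point it at.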
