Will an XML sitemap override a robots.txt?

KCBackofen

I have a client whose robots.txt file is blocking an entire subdomain, entirely by accident. Their original solution, not realizing the robots.txt error, was to submit an XML sitemap to get their pages indexed.

I did not think this tactic would work, as the robots.txt would take precedence over the XML sitemap. But it worked... I have no explanation as to how or why.

Does anyone have an answer to this? Or any experience with a website that has had a clear Disallow: / for months and somehow still has pages in the index?

mememax

The robots.txt file stops Google from showing further information about the disallowed pages, but it doesn't prevent indexation.

They're still indexed (that's why you're seeing them), but with no meta description or text taken from the page, because Google wasn't allowed to retrieve more information.

If you want them to start showing info, you'll just need to remove that rule from the robots.txt, and soon you'll start seeing those pages' information showing. If you want them out of the index, you can use GWT to remove them after you've included the noindex meta tag in each page, which is the only command that will prevent indexation.
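To make the noindex suggestion concrete, here is a minimal Python sketch that checks whether a page's HTML carries a robots meta tag with a noindex directive. The `RobotsMetaParser` class, the `has_noindex` helper, and the sample markup are all hypothetical names made up for illustration. Keep in mind that a crawler can only see this tag if it is allowed to fetch the page, so the robots.txt block would have to be lifted for it to take effect.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page markup, for illustration only.
sample_page = (
    '<html><head><meta name="robots" content="noindex, follow"></head>'
    "<body></body></html>"
)
print(has_noindex(sample_page))  # True
```

This only looks at the generic `robots` meta name; crawler-specific names and the equivalent HTTP header are left out to keep the sketch short.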

KCBackofen

I assumed the same thing, but I ran a site: search while they were still prospects, and they had 1 result present with the explanation "A description for this result is not available because of this site's robots.txt – learn more".

They uploaded an XML sitemap before I could tell them to remove the robots.txt, and 1 week later the entire site is now in the index.

I have used robots.txt to properly block websites before. It usually takes 2-3 for all results to drop out of the index, so I don't know how that could explain it either.

Zachary_Russell

I agree. The only way I could think this would work would be if the robots.txt file was on the root domain. Check Webmaster Tools as well; under the sitemaps section they will tell you about "Error: URL was blocked by robots.txt".

One thing to remember is that robots.txt is technically a suggestion asking search engines not to crawl your site. They can choose to ignore it, though personally I don't know of any cases in which this has happened.
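The root-domain point matters because robots.txt rules apply per host: a file on the root domain does not cover a subdomain, and the reverse is also true. As a rough illustration, this Python sketch works out which robots.txt file governs a given page URL. The `robots_txt_url` helper is a made-up name and the example.com hosts are placeholders, not the client's real domains.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL on the same scheme and host as page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); drop path, query, fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Placeholder hosts, for illustration only.
print(robots_txt_url("https://blog.example.com/posts/hello?x=1"))
# https://blog.example.com/robots.txt
print(robots_txt_url("https://www.example.com/about"))
# https://www.example.com/robots.txt
```

So it is worth confirming which host the Disallow: / file is actually served from before assuming it blocks the subdomain.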

TakeshiYoung

An XML sitemap shouldn't override robots.txt. If you have Google Webmaster Tools set up, you will see warnings on the sitemaps page that the submitted sitemap contains pages blocked by robots.txt.

Now, robots.txt does not prevent indexation, just crawling. So if the pages were indexed before they implemented robots.txt, they may continue to be indexed. Google will also display just the URL for pages that it has discovered but can't crawl because of robots.txt.
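If you'd rather check for this conflict yourself than wait for the Webmaster Tools warning, something along these lines should work. It is only a sketch using Python's standard library: the `blocked_sitemap_urls` helper, the sample robots.txt, and the sample sitemap are all hypothetical, and Python's parser may not match every detail of how Googlebot interprets rules.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def blocked_sitemap_urls(robots_txt, sitemap_xml, user_agent="Googlebot"):
    """Return sitemap URLs that the robots.txt rules disallow for user_agent."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())

    root = ET.fromstring(sitemap_xml)
    urls = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    return [url for url in urls if not rules.can_fetch(user_agent, url)]


# Hypothetical files mirroring the situation in this thread.
sample_robots = "User-agent: *\nDisallow: /\n"
sample_sitemap = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://sub.example.com/</loc></url>"
    "<url><loc>https://sub.example.com/products/widget</loc></url>"
    "</urlset>"
)

for url in blocked_sitemap_urls(sample_robots, sample_sitemap):
    print("Submitted in sitemap but blocked by robots.txt:", url)
```

With a blanket Disallow: / every URL in the sitemap gets flagged, which is the situation the original poster describes. In practice you would load the real robots.txt and sitemap from the subdomain instead of the inline samples.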
