Does Bing ignore robots txt files?

Nightwing

Bonjour from "It's a miracle it's not raining" Wetherby, UK

OK, here goes... Why, despite a robots.txt file excluding the site from indexing, is the site URL

http://lewispr.netconstruct-preview.co.uk/ being indexed in Bing but not Google?

Does Bing ignore robots.txt files, or is there something missing from http://lewispr.netconstruct-preview.co.uk/robots.txt that I need to add to stop Bing indexing a preview site, as illustrated below?

http://i216.photobucket.com/albums/cc53/zymurgy_bucket/preview-bing-indexed.jpg

Any insights welcome

Nightwing

Thanks CleverPhD - we are now adding your recommendations to our preview sites

CleverPhD

I know this does not sound related, but Matt Cutts explains this same situation on Google. It is probably the same reasoning for Bing.

http://www.mattcutts.com/blog/robots-txt-remove-url/

Looking at your screenshot, it looks as if all that is being shown in Bing is just the URL: no title tag, no description, no other information.

What Matt says is that they did not technically crawl the URL, but they are aware that it exists. For example, another page with related content links to it, or the anchor text on that link relates to the keyword search you are performing.

You are searching for the URL specifically, so it makes sense that they would show the URL as it relates to that search. They are not showing any information from the page because they do not have it; they did not spider it. Again, they are just aware of the URL. Kind of like talking to a lawyer, eh?

If you search for any other keywords, does this excluded site show up? Probably not. If it does, then they are probably only showing the URL, like in the example above.

The video has more details. Here are the solutions he gives; I will outline them as well:

Use the Bing URL removal tool - bing bang boom. Done.
(my new favorite) Let the page / site be crawled, but then show a noindex nofollow meta tag on the page / site. There is a subtle but important difference between the meta tag and the robots.txt file: the spiders have to be able to crawl the page to be able to see what they are supposed to do with it.

http://support.google.com/webmasters/bin/answer.py?hl=en&answer=93710

"When we see the noindex meta tag on a page, Google will completely drop the page from our search results, even if other pages link to it."

The thing is, if you have a robots.txt file that says don't crawl the site, then the spider never gets to the noindex meta tag to know to delete the page from the index. It sounds a little backwards, but when the page is already in the search index, you have to let the spider crawl it to then see the noindex tag so that the search engine will know to remove it from the index.
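To make the "backwards" part concrete, here is a minimal Python sketch of that order of operations. It is only an illustration, not how Bing or Google actually implement crawling: `crawler_sees_noindex`, `RobotsMetaParser` and the `preview.example.com` URL are made-up names, and the standard library's `urllib.robotparser` stands in for a real crawler's robots.txt check.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def crawler_sees_noindex(robots_txt, html, url, agent="bingbot"):
    """Return True only if the crawler may fetch the page AND finds noindex on it."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(agent, url):
        # Blocked by robots.txt: the HTML is never read, so the tag is never seen.
        return False
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page that carries the meta tag.
page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'

# Crawling blocked: the noindex tag is invisible to the spider.
print(crawler_sees_noindex("User-agent: *\nDisallow: /", page, "http://preview.example.com/"))

# Crawling allowed (empty robots.txt): the spider can read the noindex tag.
print(crawler_sees_noindex("", page, "http://preview.example.com/"))
```

With the blanket Disallow, the function returns False even though the page has the tag, which is exactly why the URL can linger in the index.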

Here is what you can do, as this seems to be an issue only with Bing and only with the home page. Open up the robots.txt to allow Bing to crawl the site, but restrict the crawling to the home page only and exclude all the other pages from the crawl.

On the home page that you allow Bing to crawl, add the noindex nofollow meta tag and you should be set.
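As a rough sketch of what that setup could look like, the Python below holds one possible robots.txt and the meta tag as strings, plus a tiny checker. Several things here are assumptions rather than anything from the thread: that `Allow: /$` (with `$` as an end-of-URL anchor) is honored by Bing, that the longest matching rule wins, and the helper names `pattern_to_regex` and `is_allowed`. The checker ignores User-agent grouping, so it only suits a single-group file like this one. Test the real file in Bing Webmaster Tools before relying on it.

```python
import re

# Assumed robots.txt: let bingbot fetch only the home page.
ROBOTS_TXT = """\
User-agent: bingbot
Allow: /$
Disallow: /
"""

# Meta tag to place in the <head> of the home page.
HOME_PAGE_META = '<meta name="robots" content="noindex, nofollow">'


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern: '*' is a wildcard, trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(regex + (r"\Z" if anchored else ""))


def is_allowed(robots_txt, path):
    """Longest matching rule wins; Allow wins a tie; no matching rule means allowed."""
    best_len, allowed = -1, True
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field not in ("allow", "disallow") or not value:
            continue
        if pattern_to_regex(value).match(path):
            is_allow = field == "allow"
            if len(value) > best_len or (len(value) == best_len and is_allow):
                best_len, allowed = len(value), is_allow
    return allowed


print(is_allowed(ROBOTS_TXT, "/"))       # home page
print(is_allowed(ROBOTS_TXT, "/about"))  # any other page
```

Under these assumptions the home page is crawlable (so the spider can read the noindex tag) while every other path stays blocked.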

All of that said, if you have a single URL listed in Bing with no metadata, it may not be worth all the above effort, as you are not ranking for any valuable keywords. But that is your call.

It is always interesting to see how the spiders and engines think so I wanted to pass this along.

Cheers!

PS - If you have a ton of pages like this, then you would just allow Bing to crawl them all and add the noindex nofollow tag to all of them.
