Robots.txt, does it need preceding directory structure?

Milian

Do you need the entire preceding path in robots.txt for it to match?

For example, I know that if I add Disallow: /fish to robots.txt, it will block:

/fish
/fish.html
/fish/salmon.html
/fishheads
/fishheads/yummy.html
/fish.php?id=anything

But would it also block these?

/en/fish
/en/fish.html
/en/fish/salmon.html
/en/fishheads
/en/fishheads/yummy.html
/en/fish.php?id=anything

(Examples taken from the Robots.txt Specifications.) I'm hoping it actually won't match, as that would make writing this particular robots.txt much easier!

Basically, I want to block many URLs that contain BTS-, such as:

http://www.example.com/BTS-something
http://www.example.com/BTS-somethingelse
http://www.example.com/BTS-thingybob

But I have other pages in subfolders that also contain BTS-, which I do not want blocked, such as:

http://www.example.com/somesubfolder/BTS-thingy
http://www.example.com/anothersubfolder/BTS-otherthingy

Thanks for listening

Milian

Yes, this is what I thought, but I wanted a second opinion.

Although I wouldn't actually need a wildcard after BTS-, as leaving the rule open is the same as using a wildcard:

/fish* is equivalent to "/fish"; the trailing wildcard is ignored.
https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt

Thanks for the link, I'll take a look.
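To make the trailing-wildcard point concrete, here is a minimal Python sketch of the matching behaviour described in that specification: a rule is matched from the start of the URL path, `*` stands for any run of characters, and a trailing `$` anchors the end. The function name `rule_matches` is made up for illustration, and this is only an approximation of what a crawler does, not Google's actual implementation.

```python
import re


def rule_matches(rule_path: str, url_path: str) -> bool:
    """Return True if a robots.txt rule path matches a URL path.

    Hypothetical helper: approximates the documented behaviour
    (prefix match from the start of the path, '*' wildcard, '$' end anchor).
    """
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape literal pieces and join them with '.*' where the rule had '*'.
    regex = ".*".join(re.escape(part) for part in rule_path.split("*"))
    if anchored:
        regex += "$"
    # re.match only matches at the beginning of the string, so the rule
    # cannot start matching in the middle of the path.
    return re.match(regex, url_path) is not None


if __name__ == "__main__":
    paths = ["/BTS-something", "/somesubfolder/BTS-thingy", "/fish.html", "/en/fish.html"]
    for rule in ["/BTS-", "/BTS-*", "/fish", "/fish*"]:
        for path in paths:
            print(rule, path, rule_matches(rule, path))
```

Under these assumptions, /BTS- and /BTS-* give the same answers for every path, and neither matches /somesubfolder/BTS-thingy because the match has to begin at the first character of the path.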

PinpointDesigns

You're right that Disallow: /fish in the robots.txt file blocks all of those initial links. It won't match the /en/ versions, though: if you wanted to block those as well, you would need to add Disallow: /en/fish

You could use a wildcard in the robots.txt file, along the lines of Disallow: /BTS-*

This should work, but it's always worth checking with a tool to make sure it's all implemented correctly. Distilled did a post a while back about a JS bookmarklet that lets you test whether a page is blocked by robots.txt, which can be found here: http://www.distilled.net/blog/seo/js-bookmarklet-for-checking-if-a-page-is-blocked-by-robots-txt/

In addition to this, you could use the 'blocked URLs' tool in Google Webmaster Tools (GWT) to check that the pages are successfully blocked once you've implemented the rule.
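If you'd like a quick local check before uploading the file, a rough sketch with Python's standard-library `urllib.robotparser` is below. The helper name `is_blocked` and the sample rules are just for illustration. As far as I know this parser only does plain prefix matching and does not interpret `*` inside a path, so the rule is written as Disallow: /BTS- with no wildcard. It is not a substitute for testing against Google's own tool.

```python
from urllib.robotparser import RobotFileParser

# Sample rules for illustration only; replace with your real robots.txt content.
ROBOTS_TXT = """\
User-agent: *
Disallow: /BTS-
"""


def is_blocked(url: str, robots_txt: str = ROBOTS_TXT, agent: str = "*") -> bool:
    """Hypothetical helper: True if the given rules disallow this URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    urls = [
        "http://www.example.com/BTS-something",
        "http://www.example.com/BTS-thingybob",
        "http://www.example.com/somesubfolder/BTS-thingy",
        "http://www.example.com/anothersubfolder/BTS-otherthingy",
    ]
    for url in urls:
        print("blocked" if is_blocked(url) else "allowed", url)
```

The root-level BTS- URLs should come back as blocked and the subfolder ones as allowed, since the rule only matches from the start of the path.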

Hope this helps!
