Blocking Dynamic URLs with Robots.txt

AndrewY

Background:

My e-commerce site uses a lot of layered navigation and sorting links. While this is great for users, it results in many URL variations of the same page being crawled by Google. For example, a standard category page:

www.mysite.com/widgets.html

...which uses a "Price" layered navigation sidebar to filter products based on price also produces the following URLs which link to the same page:

http://www.mysite.com/widgets.html?price=1%2C250

http://www.mysite.com/widgets.html?price=2%2C250

http://www.mysite.com/widgets.html?price=3%2C250

As there are literally thousands of these URL variations being indexed, I'd like to use Robots.txt to disallow them.

Question:

Is this a wise thing to do? Or does Google take layered navigation links into account by default, so that I don't need to worry?

To implement this, I was going to add the following to Robots.txt:

User-agent: *
Disallow: /*?
Disallow: /*=

...which would prevent any dynamic URL with a "?" or "=" from being indexed. Is there a better way to do this, or is this a good solution?
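To check which URLs the two rules above would catch, here is a minimal Python sketch of wildcard matching as Google documents it for robots.txt: "*" matches any run of characters, a trailing "$" anchors the end, and otherwise a rule is a prefix match. The helper names (rule_to_regex, is_blocked) are made up for illustration, and this is only an approximation of a real crawler's behavior, not Google's actual matcher.

```python
import re
from urllib.parse import urlsplit


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path rule into a regex (hypothetical helper)."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(url: str, disallow_rules: list[str]) -> bool:
    """Return True if any Disallow rule matches the URL's path plus query."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return any(rule_to_regex(rule).match(target) for rule in disallow_rules)


# The two rules proposed in the question.
RULES = ["/*?", "/*="]

for url in [
    "http://www.mysite.com/widgets.html",
    "http://www.mysite.com/widgets.html?price=1%2C250",
    "http://www.mysite.com/widgets.html?price=2%2C250",
]:
    print(url, "blocked" if is_blocked(url, RULES) else "allowed")
```

Under these assumptions the plain category page stays crawlable and every price-filtered variation is blocked. Any URL with a query string already matches the first rule, so the second rule only adds coverage for paths that contain "=" without a "?".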

Thank you!

TaitLarson

If you are happy with no URLs with query strings being indexed, your robots.txt will work fine.

Do any of your URLs with question marks in them have links pointing to them? If so, you might want to be careful about blocking Google from indexing them. I would think you'd lose the benefits those links would pass to your site.

AndrewY

Tait,

Thanks for the answer. I think the canonical tag would be ideal, but in terms of implementation, it would require some substantial code modification to the site / PHP code as I have a lot of categories, and adding this manually to each one would be very time consuming.

Would preventing the spiders from indexing any URLs with a "?" or "&" (which would only be dynamic URL variations) cause any problems? Or is this just not an ideal best practice?

Thanks!

TaitLarson

I don't know if there's a good solution with robots.txt given your URL structure. However, you could use the rel=canonical link tag in the head of each page to tell Google to treat many of your URLs as the same page. This would help you avoid duplicate content penalties.
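The canonical tag does not have to be added by hand to each category, since it can be derived from the requested URL in a shared page template. The site in question runs on PHP, so the Python below is only a sketch of the logic. It assumes every query-string variation should point back to the bare category URL, which may not hold for parameters such as pagination. The function name canonical_link_tag is hypothetical.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link_tag(request_url: str) -> str:
    """Build a rel=canonical tag that drops the query string and fragment."""
    parts = urlsplit(request_url)
    # Keep scheme, host and path; discard the query string and fragment.
    clean_url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return f'<link rel="canonical" href="{escape(clean_url, quote=True)}" />'


print(canonical_link_tag("http://www.mysite.com/widgets.html?price=1%2C250"))
# prints: <link rel="canonical" href="http://www.mysite.com/widgets.html" />
```

Called once from the template that renders the head of every category page, something along these lines would cover all categories without per-page edits.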
