Should I set up a disallow in the robots.txt for catalog search results?

JordanJudson

When the crawl diagnostics came back for my site, they showed around 3,000 pages of duplicate content. Almost all of them are catalog search results pages. I also did a site search on Google, and most of the results pages are in their index too. I think I should just disallow bots from the /catalogsearch/ subfolder, but I'm not sure whether this will have any negative effects.

AlanBleiweiss

One step at a time = long-term success. I wish you the best with it, Jordan.

JordanJudson

Thanks Alan, you are right: this site has quite a long way to go. The first crawl just finished, and I noticed that most of the errors were due to duplicate content, so I decided to tackle that first. Thank you for all the pointers; I will look into all of them as soon as I can.

SteveOllington

Totally agree with Alan; it can cause circular navigation problems for crawlers too.

AlanBleiweiss

Jordan,

Others might have a different view; however, that's exactly what I recommend to clients, but only if you've got other HTML link-based ways for bots to reach all the content directly, and a good sitemap.xml file to reinforce that.
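
For reference, here is a minimal sketch of that setup. It assumes the search results live under /catalogsearch/ (the result URL shape /catalogsearch/result/?q=... below is hypothetical) and that the sitemap sits at /sitemap.xml, which is also an assumption. It uses Python's standard urllib.robotparser to confirm that search URLs are blocked while product pages stay crawlable. Note that robotparser does simple prefix matching only (no Google-style * or $ wildcards), and site_maps() needs Python 3.8+.

```python
from urllib.robotparser import RobotFileParser

# Proposed robots.txt for the site root. The /catalogsearch/ path comes from
# the question; the sitemap location is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /catalogsearch/

Sitemap: http://www.durafaucet.com/sitemap.xml
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_crawlable(parser: RobotFileParser, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given user agent may fetch the URL under these rules."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for url in [
        "http://www.durafaucet.com/catalogsearch/result/?q=faucet",  # hypothetical search URL
        "http://www.durafaucet.com/kitchen-faucets/mk850.html",
        "http://www.durafaucet.com/mk850-orb.html",
    ]:
        print(url, "->", "allowed" if is_crawlable(rp, url) else "blocked")
    print("Sitemaps:", rp.site_maps())
```

One caveat: a Disallow stops compliant crawlers from fetching those URLs, but it does not by itself remove pages already in Google's index, and blocked URLs can still appear in results if other pages link to them.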

I am happy to see that you have a sound overall site architecture; however, I see no robots.txt file at your root, so I'm not sure what's up with that. Also, your sitemap.xml file has only 43 URLs in it. That's a problem not because Google can't find content by other means, but because I've found Google likes that reinforcement, and Bing in particular does a better job discovering content when a proper sitemap.xml is submitted through its webmaster system (it's less efficient at discovering content by other means).
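
As a rough illustration of a fuller sitemap.xml, the sketch below builds one from a list of page URLs with Python's xml.etree.ElementTree, using the sitemaps.org namespace. The URL list is a hypothetical stand-in; in practice it would be exported from the store's catalog so every product and category page is included. The protocol caps a single sitemap file at 50,000 URLs, beyond which a sitemap index is needed.

```python
import xml.etree.ElementTree as ET

# Namespace and per-file URL limit defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000


def build_sitemap(urls: list[str]) -> bytes:
    """Return sitemap.xml bytes listing each unique URL once, in input order."""
    unique_urls = list(dict.fromkeys(urls))  # drop duplicates, keep order
    if len(unique_urls) > MAX_URLS_PER_FILE:
        raise ValueError("too many URLs: split into several sitemaps plus a sitemap index")
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in unique_urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url  # special characters are XML-escaped
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Hypothetical input: in practice, export every product and category URL
    # from the store rather than hand-listing them.
    catalog_urls = [
        "http://www.durafaucet.com/",
        "http://www.durafaucet.com/kitchen-faucets/mk850.html",
        "http://www.durafaucet.com/mk850-orb.html",
    ]
    print(build_sitemap(catalog_urls).decode("utf-8"))
```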

I'd also say you have a big push ahead in dealing with near-duplicate content.

For example:

http://www.durafaucet.com/mk850-orb.html

http://www.durafaucet.com/kitchen-faucets/mk850.html

Sure, these are unique products. But there's so little unique content on either page that the shared content, compounded by the site-wide replication of header, sidebar and footer content, leaves the total weight of uniqueness at the very low end of the spectrum.
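
To put a rough number on that "weight of uniqueness", one common technique (not necessarily what any search engine actually uses) is to split each page's visible text into overlapping word shingles and compare the two sets with Jaccard similarity. This is a simplified sketch under assumptions: the page texts are hypothetical stand-ins, and extracting the visible text from the HTML is left out.

```python
import re


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping `size`-word shingles in the lowercased text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Hypothetical page texts: identical site-wide template text plus a short,
    # product-specific description on each page.
    template = "free shipping on all orders shop kitchen faucets bath faucets contact us about us"
    page_a = template + " mk850 kitchen faucet oil rubbed bronze finish"
    page_b = template + " mk850 kitchen faucet chrome finish"
    print(f"similarity: {jaccard(shingles(page_a), shingles(page_b)):.2f}")
```

Here the two pages come out highly similar mostly because of the shared template text; adding more product-specific description to each page lowers the score.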

And then there's the issue of a near-complete lack of inbound link authority. OpenSiteExplorer.org might be wrong, but it currently shows almost no inbound links. You'll need inbound links not only to the home page but also to as many inner pages as your implementation capabilities realistically allow, especially category-level pages, using a variety of anchor text (brand, domain, keyword phrase and generic text).

So if you don't address those types of issues, removing all the dupes that show up in search now won't deliver as much long-term value as you'll need.

Moz Q&A is closed.

Related Questions

Page disappears from Google search results

Google has deindexed a page it thinks is set to 'noindex', but is in fact still set to 'index'

Robots.txt Syntax for Dynamic URLs

Is Google suppressing a page from results - if so why?

Are robots.txt wildcards still valid? If so, what is the proper syntax for setting this up?

Notice of DMCA removal from Google Search

Removing robots.txt on WordPress site problem

Image search and CDNs
