Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Role of Robots.txt and Search Console parameters settings
-
Hi, wondering if anyone can point me to resources or explain the difference between these two. If a site has url parameters disallowed in Robots.txt is it redundant to edit settings in Search Console parameters to anything other than "Let Googlebot Decide"?
-
Thank you! That helps a lot.
-
So, regarding NOINDEX vs. DISALLOW, there is a significant difference there.
If you disallow in robots, you are asking the search engine to not even crawl that page. Whereas if you NOINDEX in the page head, then the search engine may still crawl the page but should not index it.
There are a few impacts of this difference. For one, if you use NOINDEX but still allow the search engine to FOLLOW, then it may discover pages which otherwise might not have been discovered (if that page has unique links, for example). So in this case, you might prefer to use (NOINDEX, FOLLOW) if you want that discovery to happen. On the other hand, if you have many pages and you are trying to wisely use the search engine's crawl "budget", then you might in some cases prefer to disallow some paths in the robots.txt file.
It's also common to use robots.txt to disallow some files where you do not have control over the response. Non-html files, where you might not be able to easily administer noindex directives. Or dynamic pages your web application may serve but not allow you to administer head tags for.
All of that said, robots.txt files have been shrinking ever since the search engines began to render javascript, since now they need access to a lot of resource files which they previously did not. Much of the old advice of disallowing scripts and admin folder paths may be obsolete now, if those files are needed to properly render pages.
-
Thanks so much for the reply. I am still struggling to understand when it's best to use robots.txt
I think I understand that url parameters are best handled in the search console parameters tool, and if you want to keep a page out of the index, it's best to use meta noindex rather than blocking it in robots.txt
What would be an example of when you would want to disallow something in robots.txt?
-
For one, the GSC functionality is much easier to use for dealing with URLs having multiple query string parameters. robots.txt processes the statements in order, so you often have to set up a broad disallow, followed by more specific allows, to achieve the same result which can be more easily managed in GSC.
Also, GSC is useful for the "representative URL" setting, if your pages don't necessarily get crawled without the parameter present at all, but you only want one version of the page indexed if the crawler encounters multiple versions. So, this is a little like a dynamic canonical, except you are not specifying which version.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
I have two robots.txt pages for www and non-www version. Will that be a problem?
There are two robots.txt pages. One for www version and another for non-www version though I have moved to the non-www version.
Technical SEO | | ramb0 -
Removing site subdomains from Google search
Hi everyone, I hope you are having a good week? My website has several subdomains that I had shut down some time back and pages on these subdomains are still appearing in the Google search result pages. I want all the URLs from these subdomains to stop appearing in the Google search result pages and I was hoping to see if anyone can help me with this. The subdomains are no longer under my control as I don't have web hosting for these sites (so these subdomain sites just show a default hosting server page). Because of this, I cannot verify these in search console and submit a url/site removal request to Google. In total, there are about 70 pages from these subdomains showing up in Google at the moment and I'm concerned in case these pages have any negative impacts on my SEO. Thanks for taking the time to read my post.
Technical SEO | | QuantumWeb620 -
Abnormally high internal link reported in Google Search Console not matching Moz reports
If I'm looking at our internal link count and structure on Google Search Console, some pages are listed as having over a thousand internal links within our site. I've read that having too many internal links on a page devalues that page's PageRank, because the value is divided amongst the pages it links out to. Likewise, I've heard having too many internal links is just bad in general for SEO. Is that true? The problem I'm facing is determining how Google is "discovering" these internal links. If I'm just looking at one single page reported with, say, 1,350 links and I'm just looking at the code, it may only have 80 or 90 actual links. Moz will confirm this, as well. So why would Google Search Console report different? Should I be concerned about this?
Technical SEO | | Closetstogo0 -
Robots.txt on subdomains
Hi guys! I keep reading conflicting information on this and it's left me a little unsure. Am I right in thinking that a website with a subdomain of shop.sitetitle.com will share the same robots.txt file as the root domain?
Technical SEO | | Whittie0 -
Parked domain is first in search results
We have several brand related domains which are parked and pointing to our main website. Some of these websites are redirecting using a 302 (don't ask, that's a whole other story), but these are being changed. But it shouldn't matter what type of redirect they are no? Since there has never been any traffic and they are not indexed? But it seems that one of them was indexed: exotravel.vn. A search for our brand name or the previous brand name (exotravel and exotissimo) brings up this parked domain first! How can that be? The domain has never been used and has no backlinks. exotravel.vn is redirecting and I submitted a change of address weeks ago to Google, but its still coming up first in all brand name searches for exotissimo or exotravel.
Technical SEO | | Exotissimo0 -
Robots.txt and Multiple Sitemaps
Hello, I have a hopefully simple question but I wanted to ask to get a "second opinion" on what to do in this situation. I am working on a clients robots.txt and we have multiple sitemaps. Using yoast I have my sitemap_index.xml and I also have a sitemap-image.xml I do put them in google and bing by hand but wanted to have it added into the robots.txt for insurance. So my question is, when having multiple sitemaps called out on a robots.txt file does it matter if one is before the other? From my reading it looks like you can have multiple sitemaps called out, but I wasn't sure the best practice when writing it up in the file. Example: User-agent: * Disallow: Disallow: /cgi-bin/ Disallow: /wp-admin/ Disallow: /wp-content/plugins/ Sitemap: http://sitename.com/sitemap_index.xml Sitemap: http://sitename.com/sitemap-image.xml Thanks a ton for the feedback, I really appreciate it! :) J
Technical SEO | | allstatetransmission0 -
NoIndex/NoFollow pages showing up when doing a Google search using "Site:" parameter
We recently launched a beta version of our new website in a subdomain of our existing site. The existing site is www.fonts.com with the beta living at new.fonts.com. We do not want Google to crawl the new site until it's out of beta so we have added the following on all pages: However, one of our team members noticed that google is displaying results from new.fonts.com when doing an "site:new.fonts.com" search (see attached screenshot). Is it possible that Google is indexing the content despite the noindex, nofollow tags? We have double checked the syntax and it seems correct except the trailing "/". I know Google still crawls noindexed pages, however, the fact that they're showing up in search results using the site search syntax is unsettling. Any thoughts would be appreciated! DyWRP.png
Technical SEO | | ChrisRoberts-MTI0 -
Removing robots.txt on WordPress site problem
Hi..am a little confused since I ticked the box in WordPress to allow search engines to now crawl my site (previously asked for them not to) but Google webmaster tools is telling me I still have robots.txt blocking them so am unable to submit the sitemap. Checked source code and the robots instruction has gone so a little lost. Any ideas please?
Technical SEO | | Wallander0