Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Subdomain Removal in Robots.txt with Conditional Logic??
-
I would like to see if there is a way to add conditional logic to the robots.txt file so that when we push from DEV to PRODUCTION and the robots.txt file is pushed, we don't have to remember to NOT push the robots.txt file OR edit it when it goes live.
My specific situation is this:
I have www.website.com, dev.website.com and new.website.com and somehow google has indexed the DEV.website.com and NEW.website.com and I'd like these to be removed from google's index as they are causing duplicate content.
Should I:
a) add 2 new GWT entries for DEV.website.com and NEW.website.com and VERIFY ownership - if I do this, then when the files are pushed to LIVE won't the files contain the VERIFY META CODE for the DEV version even though it's now LIVE? (hope that makes sense)
b) write a robots.txt file that specifies "DISALLOW: DEV.website.com/" is that possible? I have only seen examples of DISALLOW with a "/" in the beginning...
Hope this makes sense, can really use the help! I'm on a Windows Server 2008 box running ColdFusion websites.
-
Here's how I dealt with a similar situation in the past.
Robots.txt on each of the dev subdomains and on the live domain. Dev subdomains robots.txt excluded the entire subdomain, and subdomains were verified in GWT and removed as needed.
Made live subdomain robots.txt read-only so it didn't get overwritten. Should have made dev subdomains robots.txt read-only as well, since they sometimes got refreshed with the live content (there was a UGC database that would occasionally get copied to a dev subdomain, and we'd have robots.txt get copied over too and dev subdomain indexed).
Set up a code monitor that checks the contents of all of the robots.txt daily and sends me an email if anything is changed.
Not perfect, but I was at least able to catch changes soon after they happened, and prevented a few changes.
-
you can't put logic in robots.txt and subdomains are seen as different sites, so you need to create separate robots.txt files for each subdomain and block them in their respective robots.txt files.
You'll need to also add the Google verification code and verify them, then in GWMT you can request to have the subdomain removed from Googles index, that's the fastest way.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Homepage was removed from google and got deranked
Hello experts I have a problem. The main page of my homepage got deranked severely and now I am not sure how to get the rank back. It started when I accidentally canonicalized the main page "https://kv16.dk" to a page that did not exist. 4 months later the page got deranked, and you were not able to see the "main page" in the search results at all, not even when searching for "kv16.dk". Then we discovered the canonicalization mistake and fixed it, and were able to get the main page back in the search results when searching for "kv16.dk". At first after we made the correction, some weeks passed by, and the ranking didn't get better. Google search console recommended uploading a sitemap, do we did that. However in this sitemap there was a lot of "thin content sites", for all the wordpress attachments. E.g. for every image in an article. more exactly there were 91 of these attachment sites, and the rest of the page consists of only two pages "main page" and an extra landing page. After that google begun recommending the attachment urls in some searches. We tried fixing it by redirecting all the attachments to their simple form. E.g. if it was an attachment page for an image we redirected strait to the image. Google has not yet removed these attachment pages, so the question is if you think it will help to remove the attachments via google search console, or will that not help at all? For example when we search "kv16" an attachment URL named "birksø" is one of the first results
Technical SEO | | Christian_T0 -
Robots.txt Tester - syntax not understood
I've looked in the robots.txt Tester and I can see 3 warnings: There is a 'syntax not understood' warning for each of these. XML Sitemaps:
Technical SEO | | JamesHancocks1
https://www.pkeducation.co.uk/post-sitemap.xml
https://www.pkeducation.co.uk/sitemap_index.xml How do I fix or reformat these to remove the warnings? Many thanks in advance.
Jim0 -
One robots.txt file for multiple sites?
I have 2 sites hosted with Blue Host and was told to put the robots.txt in the root folder and just use the one robots.txt for both sites. Is this right? It seems wrong. I want to block certain things on one site. Thanks for the help, Rena
Technical SEO | | renalynd270 -
Images, CSS and Javascript on subdomain or external website
Hi guy's, I came across webshops that put images, CSS and Javascript on different websites or subdomains. Does this boost SEO results? On our Wordpress webshop all the sourcescodes are placed after our own domainname: www.ourdomainname.com/wp-includes/js/jquery/jquery.js?ver=1.11.3'
Technical SEO | | Happy-SEO
www.ourdomainname.com/wp-content/uploads/2015/09/example.jpg Examples of other website: Website 1:
https://www.zalando.nl/heren-home/ Sourcecode:
https://secure-i3.ztat.net//camp/03/d5/1a0168ac81f2ffb010803d108221.jpg
https://secure-media.ztat.net/media/cms/adproduct/ad-product.min.css?_=1447764579000 Website 2:
https://www.bol.com/nl/index.html Sourcecode:
https://s.s-bol.com/nl/static/css/main/webselfservice.1358897755.css
//s.s-bol.com/nl/upload/images/logos/bol-logo-500500.jpg Website 3:
http://www.wehkamp.nl/ Sourcecode:
https://static.wehkamp.nl/assets/styles/themes/wehkamp.color.min.css?v=f47bf1
http://assets.wehkamp.com/i/wehkamp/350-450-layer-SDD-wk51-v3.jpg0 -
Is sitemap required on my robots.txt?
Hi, I know that linking your sitemap from your robots.txt file is a good practice. Ok, but... may I just send my sitemap to search console and forget about adding ti to my robots.txt? That's my situation: 1 multilang platform which means... ... 2 set of pages. One for each lang, of course But my CMS (magento) only allows me to have 1 robots.txt file So, again: may I have a robots.txt file woth no sitemap AND not suffering any potential SEO loss? Thanks in advance, Juan Vicente Mañanas Abad
Technical SEO | | Webicultors0 -
Should I block Map pages with robots.txt?
Hello, I have a website that was started in 1999. On the website I have map pages for each of the offices listed on my site, for which there are about 120. Each of the 120 maps is in a whole separate html page. There is no content in the page other than the map. I know all of the offices love having the map pages so I don't want to remove the pages. So, my question is would these pages with no real content be hurting the rankings of the other pages on our site? Therefore, should I block the pages with my robots.txt? Would I also have to remove these pages (in webmaster tools?) from Google for blocking by robots.txt to really work? I appreciate your feedback, thanks!
Technical SEO | | imaginex0 -
How to remove a sub domain from Google Index!
Hello, I have a website having many subdomains having same copy of content i think its harming my SEO for that site since abc and xyz sub domains do have same contents. Thus i require to know i have already deleted required subdomain DNS RECORDS now how to have those pages removed from Google index as well ? The DNS Records no more exists for those subdomains already.
Technical SEO | | anand20100 -
Robots.txt File Redirects to Home Page
I've been doing some site analysis for a new SEO client and it has been brought to my attention that their robots.txt file redirects to their homepage. I was wondering: Is there a benfit to setup your robots.txt file to do this? Will this effect how their site will get indexed? Thanks for your response! Kyle Site URL: http://www.radisphere.net/
Technical SEO | | kchandler0