Robots.txt to disallow /index.php/ path
-
Hi SEOmoz,
I have a problem with my Joomla site (yeah - me too!). I get a large number of /index.php/ URLs despite using a program to handle these issues. The URLs cause indexation errors with Google (404). Now, I fixed this issue once before, but the problem persists. So I thought, instead of wasting more time, couldn't I just disallow all paths containing /index.php/?
I don't use that extension, but would it cause me any problems from an SEO perspective?
How do I disallow all index.php's? Is it a simple: Disallow: /index.php/
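One way to sanity-check a rule like this before relying on it is to run it through a robots.txt parser. The sketch below uses Python's standard-library urllib.robotparser, which does simple prefix matching and is not Google's own parser, so treat it as an approximation. The robots.txt content and the helper name is_allowed are assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content for illustration, not the site's real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /index.php/
"""


def is_allowed(path, robots_txt=ROBOTS_TXT, agent="*"):
    """Return True if `agent` may crawl `path` under the given rules.

    Hypothetical helper: it parses the rules from a string instead of
    fetching them from a live site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/index.php/some-page", "/index.php", "/some-page"):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

Under these assumptions, only paths that begin with /index.php/ are blocked; /index.php itself and the clean URLs stay crawlable. One thing worth keeping in mind: Disallow only stops crawling, so any 301 redirects set up on the blocked /index.php/ URLs would not be seen by a compliant crawler.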
-
Hi Cyrus,
Thanks for your reply!
Unfortunately the problem is yet to be fixed. I hope that my disallow will work shortly.
It seems that most of the index.php pages link to each other internally (and from old /index.php/ pages that no longer exist), which is super weird. How Google found them does not make any sense to me.
I don't believe that external sources are linking to these pages either - I mean, how would they find these links anyway?
-
Hi Mikkel,
Like Chris, I spidered your site and couldn't find any links to /index.php files, which probably indicates one of three things:
- You've fixed the problem - Yay!
- Google is finding those links from external sources
- Google found those links at one time in the past, and is still trying to crawl them.
In the Crawl Errors report in Google Webmaster Tools, if you click on the link of each 404, there's often a "linked from" source where you can see where Google discovered the broken link. This is really helpful in rooting out the cause.
Regardless, I'm going to go with #1 and optimistically believe that you were able to fix the problem.
-
If I spider your site I'm not seeing any /index.php urls. Does that mean you did get Joomla to cooperate with your rewriting?
Or was your problem that you'd previously had urls indexed with /index.php/ paths and you needed to remove them?
-
Hi Mikkel, I have checked your robots.txt and it looks perfect. If you also redirect /index.php to the home page, either with the .htaccess file or with a Joomla plugin, that would be great for you. It's also a permanent solution.
-
Well, I tried the sensible solution of redirecting to the correct URL instead. However, the SEF program is quite limited and keeps creating new URLs regardless of my modifications. I'm looking for a more permanent solution, and the disallow seems a bit simpler, as I'm not a super programmer.
By the way - thanks for the quick replies, kudos to both of you!
-
Sure, the website in question is www.vauni.dk
I don't think that there are any inbound links to the index.php pages. They are not easily found.
-
Couldn't you rewrite those /index.php/ URLs to remove the /index.php/?
Like this in .htaccess (the condition keeps the rule from re-applying to its own output):
RewriteCond %{REQUEST_URI} !^/index\.php
RewriteRule ^(.*)$ /index.php/$1 [L]
I've only used Joomla once, but there must be a way to configure Joomla to just use "/" instead of "/index.php/"?
Update:
Here's a solution to your /index.php/ issue:
http://www.eprcreations.com/remove-index-php-from-joomla-urls/
Once you've updated that, and have your URLs working properly without the /index.php/, you could add a variation of the rewrite rule above that matches only requests for the old /index.php/ paths, so that all your old /index.php/ URLs would be 301'd to your new ones:
RewriteCond %{THE_REQUEST} \s/index\.php/ [NC]
RewriteRule ^index\.php/(.*)$ /$1 [R=301,L]
Put it underneath the RewriteBase / line they describe in that post.
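The goal of that redirect is to send each old /index.php/ URL to the same path without the prefix. A rough Python sketch of the intended mapping follows. It only mimics the pattern matching and is not how Apache evaluates rules; the function name redirect_target and the example paths are made up for illustration.

```python
import re

# Matches only paths that start with /index.php/ and captures the rest.
_INDEX_PHP_PREFIX = re.compile(r"^/index\.php/(.*)$")


def redirect_target(path):
    """Return the clean path an old /index.php/ URL should 301 to.

    Returns None when the path does not start with /index.php/,
    meaning no redirect applies. Hypothetical helper for illustration.
    """
    match = _INDEX_PHP_PREFIX.match(path)
    if match is None:
        return None
    return "/" + match.group(1)


if __name__ == "__main__":
    for path in ("/index.php/products/shoes", "/products/shoes", "/index.php"):
        print(path, "->", redirect_target(path))
```

Running a list of the 404 URLs from the crawl errors report through a check like this is a quick way to confirm that each old URL has a sensible destination before the rule goes live.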
-
Hi Mikkel,
Do you have inbound links pointing to your index.php pages? If yes, then it might affect your SEO. Disallow: /index.php/ is perfect, but after implementing it, don't interlink those index.php pages. Can you share your website URL so that I can show you how to do it with an example?