Robots.txt blocked internal resources WordPress

Mat_C

Hi all,

We've recently migrated a WordPress website from staging to live, but the robots.txt was deleted in the process. I've created the following new one:

User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.php

However, the Semrush site audit now reports that a lot of pages have issues with internal resources blocked by the robots.txt file. These blocked internal resources are all cached and minified CSS elements: links, images and scripts.
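
To see which resource paths the rules above actually block, here is a minimal Python sketch of the "most specific rule wins" matching that Google documents for robots.txt. It is a simplification under stated assumptions: it only reads the `User-agent: *` group, treats every rule as a plain path prefix (no `*` or `$` wildcards), and the sample paths other than the cached .js file and admin-ajax.php are hypothetical. The standard library's `urllib.robotparser` is not used because, as far as I know, it applies the first matching rule in file order, so the leading `Allow: /` would mask every Disallow.

```python
ROBOTS_TXT = """\
User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.php
"""


def parse_rules(robots_txt):
    """Collect (is_allow, path_prefix) pairs from the `User-agent: *` group."""
    rules = []
    in_star_group = False
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            in_star_group = value == "*"
        elif in_star_group and field in ("allow", "disallow") and value:
            rules.append((field == "allow", value))
    return rules


def is_allowed(robots_txt, path):
    """Longest matching prefix wins; Allow wins a tie; no match means allowed."""
    matches = [
        (len(prefix), allow)
        for allow, prefix in parse_rules(robots_txt)
        if path.startswith(prefix)
    ]
    return max(matches)[1] if matches else True


if __name__ == "__main__":
    sample_paths = (
        "/wp-content/cache/minify/df983.js",     # from the question
        "/wp-admin/admin-ajax.php",              # from the question
        "/wp-content/themes/example/style.css",  # hypothetical
        "/wp-content/uploads/photo.jpg",         # hypothetical
    )
    for path in sample_paths:
        verdict = "allowed" if is_allowed(ROBOTS_TXT, path) else "blocked"
        print(f"{path} -> {verdict}")
```

Under these assumptions the cached .js file and the theme stylesheet come out as blocked, while admin-ajax.php and the uploads path are allowed, which is consistent with the audit flagging cached and minified resources.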

Does this mean that Google won't crawl some parts of these pages with blocked resources correctly and thus won't be able to follow these links and index the images? In other words, is this any cause for concern regarding SEO?

Of course I can change the robots.txt again, but will URLs like https://example.com/wp-content/cache/minify/df983.js end up in the index?

Thanks for your thoughts!

Mat_C

Thanks for the answer!

Last question: is /wp-admin/admin-ajax.php an important part that has to be crawled? I found this explanation: https://wordpress.stackexchange.com/questions/190993/why-use-admin-ajax-php-and-how-does-it-work/191073#191073

However, on this specific website there is no HTML at all when I check the source code of that URL, only one line with 0 on it.

JordanLowry

I would leave all the disallows out except for the /wp-admin/ section. For example, I'd rewrite the robots.txt file to read:

User-agent: *
Disallow: /wp-admin/

Also, you kind of want Google to index your cached content. In the event your servers go down it will still be able to make your content available.

I hope that helps. Let me know how that works out for you!

Mat_C

Thanks for the clear answer.

I've changed the robots.txt to:

User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.php

This should avoid problems with not indexing (parts of) cached content.

Or should I leave all the Disallows out?

JordanLowry

Hey there --

Blocking resources with the robots.txt file prevents search engines from crawling content. The noindex tag is better suited for preventing content from being indexed.
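
For non-HTML files such as a minified .js file there is no place for a meta tag, so noindex is usually sent as an `X-Robots-Tag` HTTP response header instead. A crawler has to be allowed to fetch the URL to see that header, which is why a robots.txt block and noindex do not combine well. The Python sketch below only illustrates reading the header; `has_noindex` is a hypothetical helper, the header dicts are made-up examples, and per-bot prefixes such as `googlebot: noindex` are deliberately ignored.

```python
def has_noindex(headers):
    """Return True if an X-Robots-Tag header in `headers` asks for noindex.

    `headers` is a plain dict of response header names to values.
    Header names are compared case-insensitively. The `none` directive
    is treated as noindex because it stands for "noindex, nofollow".
    """
    for name, value in headers.items():
        if name.lower() != "x-robots-tag":
            continue
        directives = [d.strip().lower() for d in value.split(",")]
        if "noindex" in directives or "none" in directives:
            return True
    return False


if __name__ == "__main__":
    # Made-up response headers for illustration only.
    print(has_noindex({"Content-Type": "text/javascript",
                       "X-Robots-Tag": "noindex, nofollow"}))
    print(has_noindex({"Content-Type": "text/javascript"}))
```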

Previously, best practice was to block access to the /wp-includes/ and /wp-content/ directories, etc., but that's no longer necessary.

Today, Google will fetch all your styling and JavaScript files so they can render your pages completely. Search engines now try to understand your page's layout and presentation as a key part of how they evaluate quality.

So, yeah, blocking these resources might have some impact on your SEO.

Also, if you're using a plugin to cache content, you should want Google to crawl your cached content. And in my experience, Googlebot does a good job of not indexing /wp-content/ sections.

So your example URL, https://example.com/wp-content/cache/minify/df983.js, shouldn't end up in their index.

Hope this helps some.
