Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Crawling image folders / crawl allowance
-
We recently removed /img and /imgp from our robots.txt file thus allowing googlebot to crawl our image folders. Not sure why we had these blocked in the first place, but we opened them up in response to an email from Google Product Search about not being able to crawl images - which can/has hurt our traffic from Google Shopping.
My question is: will allowing Google to crawl our image files eat up our 'crawl allowance'? We wouldn't want Google to not crawl/index certain pages, and ding our organic traffic, because more of our allotted crawl bandwidth is getting chewed up crawling image files.
Outside of the non-detailed crawl stat graphs from Webmaster Tools, what's the best way to check how frequently/ deeply our site is getting crawled?
Thanks all!
-
I did this accidentally as well recently and had 100% of my products disallowed from google shopping within 48 hours. Sounds like it's not an option. They need the crawl your images folder to make sure you have valid images in you product listings.
-
if your rankings are improving, then good move!
-
Hey Richard,
We were previously blocking googlebot from crawling our images at all (through disallowing /img/ and /imgp/ in robots.txt file. We removed this block after recieving this email from Google:
Thank you for participating in Google Product Search. It has come to our attention that a robots.txt file is preventing us from crawling some or all of the images on your site. In order for us to access and display the images you provide in your product listings, we'd like you to modify your robots.txt file to allow user-agent 'googlebot' to crawl your site.
_Failure for Google to access your images may affect the visibility of your items on Google Product Search and Product Ad results. _
While I totally agree that image traffic will not convert like standard traffic, it is free and who knows, we may just pick up a few sales from it. Of course if this comes at the cost of eating up a disproportionate amount of our crawl allowance relative to the value (or avoiding any penalties from Google Product Search) we'd be better off leaving the block on.
By way of an update, it looks like our rankings have started to improve in Google product search. We first experienced a drop in rankings and traffic from Product Search on 4/16 and removed the block from robots.txt on 4/22.
-
Why do you need Google to reach inside your img folder? Images display on the page and are indexed then. Sure, if you are selling images, then I can see the need for this, but to just crawl the img folder??
If it is not huge, I do not see it penalizing you. I would make sure all images are named using keywords as crawling pic001.jpg, pic002.jpg, product01.jpg, logo.gif will not do you any good anyway.
Also I find bad linking coming from Google image searches. No one searches to purchase a coffee cup and looks in Google images to do so. Conversely, if someone is searching images of coffee cups to use in whatever, having them click over to your site is a waste of time. They are just going to grab the image and go leaving your metrics a mess.
I hope that helps.
-
It may effect crawl allowance but depends on the size of your site, page rank and trust etc.
One of the best ways to determine crawl depth and whether you have any issues is to create separate sitemaps for your most important content or areas of your site. You could also create an image sitemap.
Then you can monitor these over time and and will give you a good picture of which content is being crawled and indexed well and which content/images are not. This may also help you to find out if the site structure is too deep or whether you need to link more to deeper content in order to improve crawling and indexation.
Hope this helps.
-
Personally, I wouldn't try to figure out the impact by looking at crawl stats. I'd be more focused on end results. Have we had an increase in organic traffic, or conversions from Google shopping since we opened it up, or has either of these gone down?
That's what matters, and is the only real indicator as to whether it was a wise move or not.
-
You could check your server stats on who is accessing your site, this should tell you what bots are going to your pages when. I don't know what control panel you are using for your site, but if you are using Cpanel, I am sure there are tutorials online to help you find this information.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
URL Structure On Site - Currently it's domain/product-name NOT domain/category/product name is this bad?
I have a eCommerce site and the site structure is domain/product-name rather than domain/product-category/product-name Do you think this will have a negative impact SEO Wise? I have seen that some of my individual product pages do get better rankings than my categories.
Technical SEO | | the-gate-films0 -
Google stopped crawling my site. Everybody is stumped.
This has stumped the Wordpress staff and people in the Google Webmasters forum. We are in Google News (have been for years), and so new posts are crawled immediately. On Feb 17-18 Crawl Stats dropped 85%, and new posts were no longer indexed (not appearing on News or search). Data highlighter attempts return "This URL could not be found in Google's index." No manual actions by Google. No changes to the website; no custom CSS. No Site Errors or new URL errors. No sitemap problems (resubmitting didn't help). We're on wordpress.com, so no odd code. We can see the robot.txt file. Other search engines can see us, as can social media websites. Older posts still index, but loss of News is a big hit. Also, I think overall Google referrals are dropping. We can Fetch the URL for a new post, and many hours later it appears on Google and News, and we can then use Data Highlighter. It's now 6 days and no recovery. Everybody is stumped. Any ideas? I just joined, so this might be the wrong venue. If so, apologies.
Technical SEO | | Editor-FabiusMaximus_Website0 -
Correct linking to the /index of a site and subfolders: what's the best practice? link to: domain.com/ or domain.com/index.html ?
Dear all, starting with my .htaccess file: RewriteEngine On
Technical SEO | | inlinear
RewriteCond %{HTTP_HOST} ^www.inlinear.com$ [NC]
RewriteRule ^(.*)$ http://inlinear.com/$1 [R=301,L] RewriteCond %{THE_REQUEST} ^./index.html
RewriteRule ^(.)index.html$ http://inlinear.com/ [R=301,L] 1. I redirect all URL-requests with www. to the non www-version...
2. all requests with "index.html" will be redirected to "domain.com/" My questions are: A) When linking from a page to my frontpage (home) the best practice is?: "http://domain.com/" the best and NOT: "http://domain.com/index.php" B) When linking to the index of a subfolder "http://domain.com/products/index.php" I should link also to: "http://domain.com/products/" and not put also the index.php..., right? C) When I define the canonical ULR, should I also define it just: "http://domain.com/products/" or in this case I should link to the definite file: "http://domain.com/products**/index.php**" Is A) B) the best practice? and C) ? Thanks for all replies! 🙂
Holger0 -
Can you have a /sitemap.xml and /sitemap.html on the same site?
Thanks in advance for any responses; we really appreciate the expertise of the SEOmoz community! My question: Since the file extensions are different, can a site have both a /sitemap.xml and /sitemap.html both siting at the root domain? For example, we've already put the html sitemap in place here: https://www.pioneermilitaryloans.com/sitemap Now, we're considering adding an XML sitemap. I know standard practice is to load it at the root (www.example.com/sitemap.xml), but am wondering if this will cause conflicts. I've been unable to find this topic addressed anywhere, or any real-life examples of sites currently doing this. What do you think?
Technical SEO | | PioneerServices0 -
How does Google find /feed/ at the end of all pages on my site?
Hi! In Google Webmaster Tools I find *.../feed/ as a 404 page in crawl errors. The problem is that none of these pages exist and they have no inbound links (except the start page). FYI, it´s a wordpress site. Example: www.mysite.com/subpage1/feed/ www.mysite.com/subpage2/feed/ www.mysite.com/subpage3/feed/ etc Does Google search for /feed/ by default or why do I keep getting these 404´s every day?
Technical SEO | | Vivamedia0 -
Googlebot Crawl Rate causing site slowdown
I am hearing from my IT department that Googlebot is causing as massive slowdown/crash our site. We get 3.5 to 4 million pageviews a month and add 70-100 new articles on the website each day. We provide daily stock research and marke analysis, so its all high quality relevant content. Here are the crawl stats from WMT: http://imgur.com/dyIbf I have not worked with a lot of high volume high traffic sites before, but these crawl stats do not seem to be out of line. My team is getting pressure from the sysadmins to slow down the crawl rate, or block some or all of the site from GoogleBot. Do these crawl stats seem in line with sites? Would slowing down crawl rates have a big effect on rankings? Thanks
Technical SEO | | SuperMikeLewis0 -
Do we need to manually submit a sitemap every time, or can we host it on our site as /sitemap and Google will see & crawl it?
I realized we don't have a sitemap in place, so we're going to get one built. Once we do, I'll submit it manually to Google via Webmaster tools. However, we have a very dynamic site with content constantly being added. Will I need to keep manually re-submitting the sitemap to Google? Or could we have the continually updating sitemap live on our site at /sitemap and the crawlers will just pick it up from there? I noticed this is what SEOmoz does at http://www.seomoz.org/sitemap.
Technical SEO | | askotzko0 -
Replace Header Text With Image
I have a static website that I would like to retheme. I have the mockup, and its spliced. The website holds nice rankings right now, and I want to keep them in place. The one thing that will change with this new design is the header will no longer be text, but instead an image. Is there a way to ensure googlebot still sees the H1 tag header exactly how it is now but use an image for the header instead? I dont want any blackhat tricks that will get me banned. Just wondering if there is a simple way to have googlebot see the header as text (not ALT img txt) so the site does not appear to have changed at all. (It hasnt, I only am changing the graphics and colors of background, and header image for better branding.
Technical SEO | | getbigyadig0