Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How to create site map for large site (ecommerce type) that has 1000's if not 100,000 of pages.
-
I know this is kind of a newbie question but I am having an amazing amount of trouble creating a sitemap for our site Bestride.com. We just did a complete redesign (look and feel, functionality, the works) and now I am trying to create a site map. Most of the generators I have used "break" after reaching some number of pages. I am at a loss as to how to create the sitemap. Any help would be greatly appreciated!
Thanks
-
I agree with Chris. With such large websites it would be advisable having a sitemap index and then splitting the index into various individual indexes such as Pages, Products, Categories, images, media, tags etc.
-
The easiest thing i can think of is to write a script that works with your dispatcher to create a site map. The format I would use is add the page and all of the "product images" on the page to the map and move to the next. At the same time I would use an auto increment variable to keep track of how many lines you have written. When you get around 50k, write out the name of the next site map file that the program will create and have them chained together this way.
-
That's a great help Chris, thank you! And thanks to all for your help!
-
Typically, a sitemap is going to include every page on the site. As Francesca said, each sitemap can be up to 50K urls and if you need multiple sitemaps then you create a sitemap index that points to the rest of the sitemaps.
-
Thanks for the feedback!
I will look into screamingfrog for sure.
@Lesley - we are using a custom platform (in house) so we don't have that functionality. The issue is that we have a lot of inventory (millions) of cars. We have built (and are releasing new functionality today) to provide internal links so that Google can crawl all the inventory easily (users can too :). My question about sitemaps has boiled down to this: Do we need to build the sitemap to include every single page (all the inventory) or do we provide a "map" so that google can find the top pages and then crawl the inventory from there. Again the site is bestride.com. If anyone wants to take a look at the site, that would be fantastic!
Thanks
-
Are you using a custom platform or an off the shelf e-commerce package? Most off the shelf packages actually have a module that can create a site map and a lot have it where you can cron it too.
-
Of course, you can also use the moz's crawl test report at http://pro.outdoorsrank.com/tools/crawl-test
-
Hi Kristin,
Each sitemap.xml can support maximum 50.000 URLs. So, If you have a site with more than 100K, It'd be better to create 2 or 3 o 4 etc sitemaps.xml in order to contain all URLs. Hope it is useful.
Kind regards!
Francesca
-
You can use screamingfrog to create your sitemap. You just need to license it for crawl more than 500 URI.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Unsolved Why My site pages getting video index viewport issue?
Hello, I have been publishing a good number of blogs on my site Flooring Flow. Though, there's been an error of the video viewport on some of my articles. I have tried fixing it but the error is still showing in Google Search Console. Can anyone help me fix it out?
Technical SEO | | mitty270 -
Create Page Titles from H1 using Yoast?
I'm working on a site that has 280 blog posts that have either been migrated from an old CMS site or created on the Dev version of the new WordPress site. We've written 280 unique meta descriptions so they don't truncate but it there a quick way I can export the current H1s and then import them into Yoast so they are set as the Page Titles? I've written unique Page Titles and meta descriptions for all the Service and Products page and just want a way to speed up the blog posts as their H1s are really good and what I would use as Page Titles anyway. Any help, greatly appreciated!
Technical SEO | | Marketing_Today0 -
Blog Page Titles - Page 1, Page 2 etc.
Hi All, I have a couple of crawl errors coming up in MOZ that I am trying to fix. They are duplicate page title issues with my blog area. For example we have a URL of www.ourwebsite.com/blog/page/1 and as we have quite a few blog posts they get put onto another page, example www.ourwebsite.com/blog/page/2 both of these urls have the same heading, title, meta description etc. I was just wondering if this was an actual SEO problem or not and if there is a way to fix it. I am using Wordpress for reference but I can't see anywhere to access the settings of these pages. Thanks
Technical SEO | | O2C0 -
Correct linking to the /index of a site and subfolders: what's the best practice? link to: domain.com/ or domain.com/index.html ?
Dear all, starting with my .htaccess file: RewriteEngine On
Technical SEO | | inlinear
RewriteCond %{HTTP_HOST} ^www.inlinear.com$ [NC]
RewriteRule ^(.*)$ http://inlinear.com/$1 [R=301,L] RewriteCond %{THE_REQUEST} ^./index.html
RewriteRule ^(.)index.html$ http://inlinear.com/ [R=301,L] 1. I redirect all URL-requests with www. to the non www-version...
2. all requests with "index.html" will be redirected to "domain.com/" My questions are: A) When linking from a page to my frontpage (home) the best practice is?: "http://domain.com/" the best and NOT: "http://domain.com/index.php" B) When linking to the index of a subfolder "http://domain.com/products/index.php" I should link also to: "http://domain.com/products/" and not put also the index.php..., right? C) When I define the canonical ULR, should I also define it just: "http://domain.com/products/" or in this case I should link to the definite file: "http://domain.com/products**/index.php**" Is A) B) the best practice? and C) ? Thanks for all replies! 🙂
Holger0 -
Can dynamically translated pages hurt a site?
Hi all...looking for some insight pls...i have a site we have worked very hard on to get ranked well and it is doing well in search. The site has about 1000 pages and climbing and has about 50 of those pages in translated pages and are static pages with unique urls. I have had no problems here with duplicate content and that sort of thing and all pages were manually translated so no translation issues. We have been looking at software that can dynamically translate the complete site into a handfull of languages...lets say about 5. My problem here is these pages get produced dynamically and i have concerns that google will take issue with this aswell as the huge sudden influx of new urls....as now we could be looking at and increase of 5000 new urls. (which usually triggers an alarm) My feeling is that it could be risking the stability of the site that we have worked so hard for and maybe just stick with the already translated static pages. I am sure the process could be fine but fear a manual inspection and a slap on the wrist for having dynamically created content?? and also just risk a review trigger period. These days it is hard to know what could get you in "trouble" and my gut says keep it simple and as is and dont shake it up?? Am i being overly concerned? Would love to here from others who have tried similar changes and also those who have not due to similar "fear" thanks
Technical SEO | | nomad-2023230 -
ECommerce: Best Practice for expired product pages
I'm optimizing a pet supplies site (http://www.qualipet.ch/) and have a question about the best practice for expired product pages. We have thousands of products and hundreds of our offers just exist for a few months. Currently, when a product is no longer available, the site just returns a 404. Now I'm wondering what a better solution could be: 1. When a product disappears, a 301 redirect is established to the category page it in (i.e. leash would redirect to dog accessories). 2. After a product disappers, a customized 404 page appears, listing similar products (but the server returns a 404) I prefer solution 1, but am afraid that having hundreds of new redirects each month might look strange. But then again, returning lots of 404s to search engines is also not the best option. Do you know the best practice for large ecommerce sites where they have hundreds or even thousands of products that appear/disappear on a frequent basis? What should be done with those obsolete URLs?
Technical SEO | | zeepartner1 -
Javascript to manipulate Google's bounce rate and time on site?
I was referred to this "awesome" solution to high bounce rates. It is suppose to "fix" bounce rates and lower them through this simple script. When the bounce rate goes way down then rankings dramatically increase (interesting study but not my question). I don't know javascript but simply adding a script to the footer and watch everything fall into place seems a bit iffy to me. Can someone with experience in JS help me by explaining what this script does? I think it manipulates the reporting it does to GA but I'm not sure. It was supposed to be placed in the footer of the page and then sit back and watch the dollars fly in. 🙂
Technical SEO | | BenRWoodard1 -
Adding 'NoIndex Meta' to Prestashop Module & Search pages.
Hi Looking for a fix for the PrestaShop platform Look for the definitive answer on how to best stop the indexing of PrestaShop modules such as "send to a friend", "Best Sellers" and site search pages. We want to be able to add a meta noindex ()to pages ending in: /search?tag=ball&p=15 or /modules/sendtoafriend/sendtoafriend-form.php We already have in the robot text: Disallow: /search.php
Technical SEO | | reallyitsme
Disallow: /modules/ (Google seems to ignore these) But as a further tool we would like to incude the noindex to all these pages too to stop duplicated pages. I assume this needs to be in either the head.tpl or the .php file of each PrestaShop module.? Or is there a general site wide code fix to put in the metadata to apply' Noindex Meta' to certain files. Current meta code here: Please reply with where to add code and what the code should be. Thanks in advance.0