Sitemap for 2 million+ URLs

2 weeks ago
I have a website running on nopCommerce 4.5 with approx. 2 million product/category/blog URLs, and more are coming soon.

The sitemap is very slow to open because it is generated at runtime. I also add approx. 200k+ custom URLs from a text file into the sitemap.

Keeping the above in mind, should I still be using the built-in sitemap functionality?

Is it normal for the sitemap to take such a long time? For example, the sitemap goes up to sitemap-135.xml (I put 10k URLs in each sitemap file). Will this be an issue for Google indexing the URLs? Please let me know if I should take a different route.

Thank you
1 week ago
Taking a quick look at the code, I see that it does support creating multiple sitemap-N.xml files, but it's not clear to me how robots (like Google) know how to reference those N files. I don't see any indication of generating a sitemap_index.xml file.

And yes, "it creates it in runtime": it does look like the response to a sitemap-N.xml URL is generated on the fly. It would be nice if there was a "cache" where those files could be stored on the disk and then generated less frequently. You can submit a feature request here:
https://github.com/nopSolutions/nopCommerce/issues
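
To make the "cache" idea concrete, here is a minimal Python sketch of the logic, assuming the sitemap XML can be produced by some slow `build()` step. This is not nopCommerce code; `get_cached_sitemap`, `is_stale` and the 24-hour default are hypothetical names and values chosen for illustration. The file is served from disk and only regenerated when it is missing or older than the chosen age.

```python
import os
import time


def is_stale(path, max_age_seconds, now=None):
    """True if the cached file is missing or older than max_age_seconds."""
    if not os.path.exists(path):
        return True
    now = time.time() if now is None else now
    return now - os.path.getmtime(path) > max_age_seconds


def get_cached_sitemap(path, build, max_age_seconds=24 * 3600):
    """Hypothetical helper: serve sitemap XML from `path`, calling build()
    to regenerate it only when the file is missing or stale."""
    if is_stale(path, max_age_seconds):
        xml = build()  # the slow, run-time generation step
        tmp_path = path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            f.write(xml)
        # Swap the new file into place in one step so a reader never
        # sees a half-written sitemap.
        os.replace(tmp_path, path)
        return xml
    with open(path, encoding="utf-8") as f:
        return f.read()
```

Writing to a temporary file and swapping it in with os.replace means a crawler requesting the file during a rebuild gets either the old copy or the new one, not a partial file.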

In the meantime, you could generate and cache those files manually:

1. Create a subfolder in wwwroot, for example \sitemaps.
2. Place all your sitemap files into the sitemaps subfolder.
3. Verify public access by navigating to a file's URL in your browser,
   for example: http://example.com/sitemaps/sitemap-1.xml
4. Manually generate sitemap_index.xml (as per the spec), for example:
   http://example.com/sitemaps/sitemap_index.xml
5. Modify robots.txt (or submit the index in Google Search Console) by adding:
   Sitemap: http://example.com/sitemaps/sitemap_index.xml
   You can add the line above in Administration > General Settings > robots.txt section > "Additions rules" field.
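
The file generation in the steps above can be scripted. Below is a minimal Python sketch, assuming all URLs (product/category/blog plus the custom text-file URLs) are available as one iterable of absolute URLs; `write_sitemaps`, the output folder and the example.com base URL are hypothetical. It writes sitemap-1.xml, sitemap-2.xml, ... with at most 10,000 URLs each (the split used in the original question) and a sitemap_index.xml in the sitemaps.org format pointing at them. The sitemaps.org protocol caps a single sitemap at 50,000 URLs, so larger chunk sizes are rejected.

```python
import itertools
import os
from datetime import date
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-sitemap URL limit in the sitemaps.org protocol
XML_DECL = '<?xml version="1.0" encoding="UTF-8"?>\n'


def _esc(text):
    # Entity-escape &, <, > plus both quote characters, as the protocol asks.
    return escape(text, {"'": "&apos;", '"': "&quot;"})


def write_sitemaps(urls, out_dir, base_url, per_file=10_000):
    """Hypothetical helper: write sitemap-1.xml, sitemap-2.xml, ... (at most
    `per_file` URLs each) plus sitemap_index.xml into `out_dir`.
    Returns the number of sitemap-N.xml files written."""
    if not 1 <= per_file <= MAX_URLS_PER_FILE:
        raise ValueError(f"per_file must be between 1 and {MAX_URLS_PER_FILE}")
    os.makedirs(out_dir, exist_ok=True)
    it = iter(urls)
    count = 0
    while True:
        chunk = list(itertools.islice(it, per_file))  # next batch of URLs
        if not chunk:
            break
        count += 1
        path = os.path.join(out_dir, f"sitemap-{count}.xml")
        with open(path, "w", encoding="utf-8") as f:
            f.write(XML_DECL)
            f.write(f'<urlset xmlns="{SITEMAP_NS}">\n')
            for url in chunk:
                f.write(f"  <url><loc>{_esc(url)}</loc></url>\n")
            f.write("</urlset>\n")

    # The index lists every sitemap-N.xml by its public URL;
    # lastmod is simply the date this run regenerated the files.
    today = date.today().isoformat()
    with open(os.path.join(out_dir, "sitemap_index.xml"), "w", encoding="utf-8") as f:
        f.write(XML_DECL)
        f.write(f'<sitemapindex xmlns="{SITEMAP_NS}">\n')
        for i in range(1, count + 1):
            loc = _esc(f"{base_url.rstrip('/')}/sitemap-{i}.xml")
            f.write(f"  <sitemap><loc>{loc}</loc><lastmod>{today}</lastmod></sitemap>\n")
        f.write("</sitemapindex>\n")
    return count
```

Because the URLs are consumed lazily in chunks, the 200k-line text file can be fed in line by line (skipping blank lines) without loading everything into memory, and the script only needs to be rerun when the URL set changes rather than on every request.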
1 week ago
New York wrote:

it's not clear to me how robots (like Google) know how to reference those N files. I don't see any indication of generating a sitemap_index.xml file.


The sitemap index file is generated in this line; its content is described in the documentation. It lets robots know how to reference the other sitemap files.
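
To illustrate what "reference the other sitemap files" means in practice: per the sitemaps.org protocol, a sitemap index is an XML document whose <sitemapindex> root holds one <sitemap><loc>URL</loc></sitemap> entry per file, and a crawler follows each <loc>. The Python sketch below shows that lookup. It is not Google's crawler; `child_sitemaps` and the example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def child_sitemaps(index_xml):
    """Hypothetical helper: return the <loc> URL of every <sitemap> entry
    in a sitemap index document, i.e. the files a crawler goes on to fetch."""
    root = ET.fromstring(index_xml)
    if root.tag != f"{{{NS}}}sitemapindex":
        raise ValueError("not a sitemap index (root must be <sitemapindex>)")
    return [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", {"sm": NS})]


EXAMPLE_INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://example.com/sitemap-1.xml</loc>
  </sitemap>
  <sitemap>
    <loc>http://example.com/sitemap-2.xml</loc>
  </sitemap>
</sitemapindex>"""
```

Only the index URL has to be listed in robots.txt or submitted in Search Console; the index itself carries the locations of the individual sitemap-N.xml files. The protocol allows up to 50,000 sitemaps in one index, so 135 entries is well within that limit.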

New York wrote:

It would be nice if there was a "cache" where those files could be stored on the disk and then generated less frequently.


This has already been implemented in 4.60; see details here.
1 week ago
Hey,

I have made a lot of custom changes to the code, so upgrading to 4.6 is not an option for me at the moment.