Populating large product database

13 years ago
Hi Everyone,

My name is Alex. I'm new to nopCommerce, so please pardon me if I sound like a noob.

I've searched this forum and many others, and I've tried the suggestions that were given to others, but none of them work for me.

At this point I need to populate over a million (1,000,000) products. The only information I really need is price, SKU, and weight;
I don't need images or long descriptions. Uploading Excel files through the administration panel would take months, and by trial and error I concluded that it handles a maximum of just under 5,000 items at a time, even with the HTTP request limits already adjusted in web.config, etc.

I also tried importing with SQL tools: I created a product in the admin panel, filled out as much info as I could, and exported Nop_Product and Nop_ProductVariant to flat files using the SQL import/export tool. I then deleted the item and tried importing the same exported flat files. Nop_Product imported successfully, but Nop_ProductVariant does not import; it fails with error VS_ISBROKEN.

Importing only Nop_Product does make the item show up in the admin panel, but I would still have to manually add the variant price, weight, etc.

I also tried creating an SSIS package with combinations of data flows, derived columns, and conversions, in all sorts of variations, with no luck.

I might be missing something obvious, but I just can't figure this out. If someone can please help me, I'm even willing to pay a reasonable amount :)

Another question, which might be simple: how can I show the weight variant on the product details page for customers to see?

Thank you guys in advance.
13 years ago
I definitely wouldn't upload 1,000,000 products via HTTP.

I would recommend creating a small console or Windows Forms app that processes your data file. That way, you can ensure a consistent connection.
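Something along these lines, roughly - SqlBulkCopy streams rows over one open connection instead of HTTP. The CSV layout (SKU, name, weight, price), file path, connection string, and staging table name are placeholders here, not anything nopCommerce ships with:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Globalization;
using System.IO;

class BulkImport
{
    static void Main()
    {
        // In-memory table matching the staging table's layout.
        var table = new DataTable();
        table.Columns.Add("Sku", typeof(string));
        table.Columns.Add("Name", typeof(string));
        table.Columns.Add("Weight", typeof(decimal));
        table.Columns.Add("Price", typeof(decimal));

        // Assumes no header row and no quoted commas in the data.
        foreach (var line in File.ReadLines(@"C:\data\products.csv"))
        {
            var f = line.Split(',');
            table.Rows.Add(f[0], f[1],
                decimal.Parse(f[2], CultureInfo.InvariantCulture),
                decimal.Parse(f[3], CultureInfo.InvariantCulture));
        }

        // One connection, batched inserts - far faster than posting files at the admin panel.
        using (var bulk = new SqlBulkCopy("Server=.;Database=NopStaging;Integrated Security=true"))
        {
            bulk.DestinationTableName = "dbo.ProductStaging";
            bulk.BatchSize = 10000;
            bulk.WriteToServer(table);
        }
        Console.WriteLine("Loaded {0} rows", table.Rows.Count);
    }
}

Loading into a staging table first, rather than straight into the Nop_ tables, also means one bad row can't half-populate the live schema.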

Also, nopCommerce doesn't really support that many products out of the box. It can be made to, but there are a lot of small customizations that would have to be done. My company (Adept Web Studios) and another (Thought.co.uk) have been working with nopCommerce, using Lucene.NET, to add this support. Here are a few sites we have running with Lucene:

www.datalineonline.com (adept web studios)
www.shortersclub.co.uk (thought)

My point is that you should be prepared to do a good amount of work to get this support.
13 years ago
Alex
Not sure why you had problems using SQL to pump data into your database. What is your data source? I did the same as you and pumped a large amount of data into my database, and I still use the same SSIS package to perform this task whenever we receive an updated product list. I have coded my package to verify that the product is not already in the database, and if the product does already exist, to select the info about that product, since there may be only a small modification that I need to apply.

There is more to your error than what you listed. The full error message will state exactly what the issue is.

I created a category and added one product manually through the admin tool. I then used SQL Server to tell me which tables had just been modified, or which contained more than one row. I simply mapped table for table and column for column to where I needed the data to end up. I did end up with five separate data pumps that I combined into one single package, but the second will not execute until the first part has finished successfully.
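Roughly like this - dump the row count of every table before and after adding the one product, and diff the output. The sys.partitions query is standard SQL Server; the connection string is a placeholder:

using System;
using System.Data.SqlClient;

class TableRowCounts
{
    static void Main()
    {
        // Row counts per table; index_id 0 is a heap, 1 is the clustered index.
        const string sql = @"
            SELECT t.name, SUM(p.rows) AS [rows]
            FROM sys.tables t
            JOIN sys.partitions p
              ON p.object_id = t.object_id AND p.index_id IN (0, 1)
            GROUP BY t.name
            ORDER BY t.name;";

        using (var conn = new SqlConnection("Server=.;Database=NopCommerce;Integrated Security=true"))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
                while (reader.Read())
                    Console.WriteLine("{0}\t{1}", reader.GetString(0), reader.GetInt64(1));
        }
    }
}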

I created another database to test with before I actually fired this off against the real database. In this test database, I removed all indexes, worked out the constraints, and loaded the tables in the order the constraints required.

What are you using for your data source? I suggest that you pump it into a staging database, check the rows, and then load from the staging database. I also changed the connection string to point at this database, and everything was fine: the site showed all products with all categories the way it was supposed to. I then loaded the database that I was going to use on the prod site, took a backup, and restored that database onto the prod server.

So I know that it can be done. The number of rows you are pumping in or out does not really matter.

Andrew SQLDBA
13 years ago
Thank you guys for your replies.

@theonlylawislove, Lucene seems like a nice project for optimization; I will definitely be trying it out.
I know it will take a bit of tweaking and customization, and I'm ready to dive in; I just need someone to point me in the right direction.

@AndrewSQLDBA, I tried different sources. The original data was a CSV file with 4 columns: SKU, name, weight, price.

First I formatted it to fit the admin-exported Excel file, then split it into parts to upload through the admin panel, but while this method works, it is not the most productive.

Then I exported the Excel data to fit the table schemas of Nop_Product and Nop_ProductVariant, so now I have two comma-separated flat files, and created an SSIS package to upload in steps:

1. FlatFile source Nop_Product -> OLEDB Dest
2. FlatFile source Nop_ProductVariant  -> OLEDB Dest

So, like you said, one task waits for the other to finish.

The first problem I ran into was a collation issue between Unicode and ASCII, so I tried to match the collation of the source to the destination by adding a conversion step, with no luck:

1. FlatFile source Nop_Product -> Conversion -> OLEDB Dest
2. FlatFile source Nop_ProductVariant  -> Conversion -> OLEDB Dest

Still the same problem.

Lastly, I did what you mentioned: emptied out the database, created a product through the admin panel, exported directly from the database tables to files, deleted the item in the database, and tried to re-upload it. Nop_Product was a success, but Nop_ProductVariant had some kind of issue (all I can remember is VS_ISBROKEN); I'll post the exact error I received once I get home.

Again, I want to thank you guys for taking time out to help me :)
13 years ago
Not sure why you are attempting to export the data from the current database and use that as a map. It will not work that way. Simply open the database and look at the table(s) in question.

You should be pumping data from your comma-delimited source directly into the database. I am not home, or I would tell you the table.column names that you need.

To begin my testing, I simply right-clicked the database that I used as staging in SSMS and selected [Tasks] --> [Import Data]. From there, simply follow the prompts. I selected the source and destination and mapped the source columns to the destination columns. This is where I could pump in only a few rows. I ran this, it pumped the data into the table, and I repeated this process until I was able to get all the data into all the tables that I needed, in the correct order. I always saved each task as an SSIS package.

After executing each in order and verifying that the data was good, I created one SSIS package from all the small packages used in the import tasks. I put them together in the correct order and added some code to check whether the data was already in the table. I used the MERGE statement instead of hand-coding all the separate steps to SELECT, compare, and then UPDATE or INSERT. If you read SQL Books Online, that is what MERGE can do for you, except it is much simpler.
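A rough sketch of that MERGE step, keyed on SKU (MERGE needs SQL Server 2008 or later). The column names are illustrative - check them against your actual Nop_ProductVariant schema. Note that a real INSERT branch would also need a valid ProductID pointing at an existing Nop_Product row, so this sketch only updates rows that already exist:

using System.Data.SqlClient;

class MergeProducts
{
    static void Main()
    {
        // Upsert-style update from the staging table into the variant table.
        const string merge = @"
            MERGE dbo.Nop_ProductVariant AS target
            USING dbo.ProductStaging AS source
               ON target.SKU = source.Sku
            WHEN MATCHED THEN
               UPDATE SET target.Price  = source.Price,
                          target.Weight = source.Weight;";

        using (var conn = new SqlConnection("Server=.;Database=NopCommerce;Integrated Security=true"))
        using (var cmd = new SqlCommand(merge, conn))
        {
            conn.Open();
            cmd.CommandTimeout = 0; // a million-row merge can run long
            cmd.ExecuteNonQuery();
        }
    }
}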

I am not sure why you are trying to use an OLE DB destination. Just use the import wizard and pump only one column into a staging database that has only one table, then pump another column. Once you have that down pat, truncate the table and try pumping all four columns into it. If you can do that, then the rest can be accomplished with a few queries from your staging database to your live database. Once you have the live database filled with data, perform a full backup and restore it onto your production server.
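For example, the one-table staging database could look like this (names and sizes are illustrative; NVARCHAR for the text columns avoids the Unicode/ANSI conversion errors mentioned earlier in the thread):

using System.Data.SqlClient;

class CreateStaging
{
    static void Main()
    {
        // Create the staging table once, if it does not already exist.
        const string ddl = @"
            IF OBJECT_ID('dbo.ProductStaging') IS NULL
            CREATE TABLE dbo.ProductStaging (
                Sku    NVARCHAR(100) NOT NULL,
                Name   NVARCHAR(400) NOT NULL,
                Weight DECIMAL(18, 4) NULL,
                Price  DECIMAL(18, 4) NULL
            );";

        using (var conn = new SqlConnection("Server=.;Database=NopStaging;Integrated Security=true"))
        using (var cmd = new SqlCommand(ddl, conn))
        {
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}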

I did not do this all in one day; it took me a couple of tries to get everything just perfect. But that was due to my data source: I found that some of the money values had a dollar sign and some did not, so I had to cleanse the data on the way in.
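The cleansing itself can be as simple as stripping the symbol before parsing - a minimal sketch:

using System;
using System.Globalization;

class PriceCleanser
{
    // Handles both "$12.50" and "12.50".
    static decimal ParsePrice(string raw)
    {
        return decimal.Parse(raw.Replace("$", string.Empty).Trim(),
                             CultureInfo.InvariantCulture);
    }

    static void Main()
    {
        Console.WriteLine(ParsePrice("$12.50"));
        Console.WriteLine(ParsePrice("12.50"));
    }
}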

Andrew SQLDBA
13 years ago
Andrew, thanks for the pointers. At this point I think the only problem I'm having is that my data source is not formatted correctly, because everything you did I have tried in different variations. I will have to dig through the data and make sure it's flawless. I will post my findings and try to keep a log of what I'm doing, maybe make a tutorial of some kind.

By the way, is there anything better than Excel for CSV editing?

Again, thank you for your input.
13 years ago
djakay wrote:
Andrew, thanks for the pointers. At this point I think the only problem I'm having is that my data source is not formatted correctly, because everything you did I have tried in different variations. I will have to dig through the data and make sure it's flawless. I will post my findings and try to keep a log of what I'm doing, maybe make a tutorial of some kind.

By the way, is there anything better than Excel for CSV editing?

Again, thank you for your input.


Any good text editor (UltraEdit, Notepad++, ...) is better than Excel. Pay extra attention to decimal/thousands separators on the price, and also to commas in the description.
I would write a simple app for validation; going through 1 million records manually ... no thanks.
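A minimal sketch of such a validator, assuming a 4-column CSV of SKU, name, weight, price (the file paths are placeholders): stream the file, flag rows with the wrong field count or unparseable numbers, and write them to a reject file for manual review.

using System;
using System.Globalization;
using System.IO;

class CsvValidator
{
    static void Main()
    {
        int lineNo = 0, bad = 0;
        using (var rejects = new StreamWriter(@"C:\data\rejects.csv"))
        {
            foreach (var line in File.ReadLines(@"C:\data\products.csv"))
            {
                lineNo++;
                var f = line.Split(',');
                decimal weight, price;
                // A row is good only if it has 4 fields and numeric weight/price.
                bool ok = f.Length == 4
                    && decimal.TryParse(f[2], NumberStyles.Any, CultureInfo.InvariantCulture, out weight)
                    && decimal.TryParse(f[3].Replace("$", string.Empty), NumberStyles.Any, CultureInfo.InvariantCulture, out price);
                if (!ok)
                {
                    bad++;
                    rejects.WriteLine("{0}: {1}", lineNo, line);
                }
            }
        }
        Console.WriteLine("{0} of {1} rows need attention", bad, lineNo);
    }
}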
13 years ago
You can actually use SSIS for this. There is a feature that will redirect the bad row(s) into another table so that you can see the problem data. At that point you can more easily correct the data and pump it into the table that you need.

Let the computer do the work for you. Do not attempt to do this by hand.

Andrew SQLDBA
12 years ago
AndrewSQLDBA wrote:
You can actually use SSIS for this. There is a feature that will redirect the bad row(s) into another table so that you can see the problem data. At that point you can more easily correct the data and pump it into the table that you need.

Let the computer do the work for you. Do not attempt to do this by hand.

Andrew SQLDBA
Hi Andrew, I'd be interested to see the mapping you have for products/variants and attributes. I have a few data-loading projects that I need to do and was not looking forward to mapping the tables; this always gives me a headache... Regards, Rob