Working with CSV files on a web server.

A place to discuss the implementation and style of computer programs.

Moderators: phlip, Moderators General, Prelates

ajroach42
Posts: 7
Joined: Wed Sep 22, 2010 7:25 pm UTC

Working with CSV files on a web server.

Postby ajroach42 » Wed Sep 22, 2010 7:44 pm UTC

I'm administering a website running Joomla and Virtuemart.

I have a CSV file ( approximately 800K rows X 22 Columns) that I'd like to import into my product database. The CSV file is a 130MB catalog file, full of all the products my distributor has for sale. This needs to happen about once every six months, and can be initiated by hand. Virtuemart includes support for importing CSV files (through a module called CSVI), but it is buggy at best. Sometimes I can get CSVI to work for very small files (less than 50 rows), most of the time it gives me errors that are both unhelpful and nonsensical.

I'm a student, studying programing and web development. As of now, the majority of my experience is with Bash and PHP. I don't have much experience with mySQL (which is what the database for Virtuemart is done in.) I'm trying to write a script that will load a line of the CSV file into memory, and pull the relevant information into the database. It would be easier if all the information was relevant the CSV file had it's fields in the same order as the database. I don't have the knowledge to rearrange the CSV file from within PHP or from Bash.

I would imagine that doing this from within PHP would be bad. PHP isn't designed to work with such large files, is it?

My natural assumption was to do the whole operation from a shell script, but I wouldn't even know where to begin. I've never accessed a database from the command line before, it's always through phpmyadmin or directly from a PHP script.

I've been told that Perl is good for dealing with large amounts of text, but I've never used Perl before. I'm willing to learn, if that's the best way to handle the situation.


I'm looking for suggestions, pointers. I don't expect anyone to write the code for me, just a nudge in the right direction or some advice from someone who has done this before.
Last edited by ajroach42 on Thu Sep 23, 2010 1:08 am UTC, edited 1 time in total.

Indefinity
Posts: 87
Joined: Thu Dec 24, 2009 7:49 pm UTC

Re: Working with CSV files on a web server.

Postby Indefinity » Thu Sep 23, 2010 1:08 am UTC

There's a function PHP called "explode" http://php.net/manual/en/function.explode.php

This will split up a string into an array of smaller strings. You can use that to split up a line at the commas, and then each index in the array corresponds to a different value in the row. There are examples near the bottom of the manual page that I pasted above that show how to do this in more detail.

I've never done any file I/O with PHP, so you'll have to look elsewhere for how to read a file line-by-line.

ajroach42
Posts: 7
Joined: Wed Sep 22, 2010 7:25 pm UTC

Re: Working with CSV files on a web server.

Postby ajroach42 » Thu Sep 23, 2010 1:10 am UTC

I read about "explode" but I didn't understand what it did.

Your explanation was far more satisfactory, and I can already see the script forming. Slowly (and painfully) I am starting to understand how this would work.

User avatar
Area Man
Posts: 256
Joined: Thu Dec 25, 2008 8:08 pm UTC
Location: Local

Re: Working with CSV files on a web server.

Postby Area Man » Thu Sep 23, 2010 1:26 am UTC

You can import csv directly using phpMyAdmin. You may have to set the php timeout limit for for such a large file.
Bisquick boxes are a dead medium.

ajroach42
Posts: 7
Joined: Wed Sep 22, 2010 7:25 pm UTC

Re: Working with CSV files on a web server.

Postby ajroach42 » Thu Sep 23, 2010 2:49 am UTC

Area Man wrote:You can import csv directly using phpMyAdmin. You may have to set the php timeout limit for for such a large file.


If I'm going to import the file through phpmyadmin, I'm going to have to find a way to rearrange the fields in the CSV or learn a more about mySQL. The CSV file, as it comes from our distributor, has 22 columns. They are in entirely the wrong order, and some of them are completely irrelevant.

What would be the proper way to go about rearranging everything? 180MB of CSV isn't something that you can just pop into Excel.

User avatar
Area Man
Posts: 256
Joined: Thu Dec 25, 2008 8:08 pm UTC
Location: Local

Re: Working with CSV files on a web server.

Postby Area Man » Thu Sep 23, 2010 3:15 am UTC

(ed. forgot to mention, you'll have to increase the upload limit, too, or split the csv into chunks on newlines)

ed2. just had a quick look, csv import has a "Column names" field, if it does what I think it does, ignore the rest.

Create a dummy table with column order matching the csv, and names matching the real import table.
Once that's done, export the dummy/temp table as sql which you can import into the real db.
(you should, of course, take 5 minutes to test this on a small scale, but I believe it will work.)
Bisquick boxes are a dead medium.

User avatar
Thesh
Made to Fuck Dinosaurs
Posts: 6569
Joined: Tue Jan 12, 2010 1:55 am UTC
Location: Colorado

Re: Working with CSV files on a web server.

Postby Thesh » Thu Sep 23, 2010 3:31 am UTC

You have to be careful about what you use for parsing csv files. Just using a split function doesn't work if the text in the csv file contains commas. For this reason, text may be enclosed in quotes. If the text is enclosed in quotes, then two quotes will be used to escape a quote. You would probably have to write your own code to take this into account or find example code (which I am sure you can find through googling). I would agree that the best method is to just use existing tools to import into a database.

I wouldn't do this on a webserver either; run the script to rearrange/fix on your local machine. You can then upload the complete CSV through PHPMyAdmin.
Summum ius, summa iniuria.

ajroach42
Posts: 7
Joined: Wed Sep 22, 2010 7:25 pm UTC

Re: Working with CSV files on a web server.

Postby ajroach42 » Thu Sep 23, 2010 3:55 am UTC

It's a pipe | delimited file, so I don't have to worry about the coma problem, do I?

I'll try everything using the column header method in the morning, starting with a small test file and report back here. I've already upped my max execution and memory constraints because I tend to use PHP to do extraneous heavy lifting.

Since it's going to take so long to execute, could I bump up it's nice number? I am on a shared server, but they are pretty liberal with their terms so long as you don't hog the CPU all day.

According to the email I just got, the file I have is one of 6 that I'll be needing to import. I guess I'm setting aside a nice big chunk of time for it all to execute.


I read that PHP is bad at creating CSV files. There is a project, hosted on google code I think, that aims to fix this. I am, eventually, going to be creating and editing CSV files from the database within PHP scripts. Does anyone have any experience or helpful pointers for that? I've been doing a lot of reading, trying to come up with the best way to do it, but sometimes a fresh perspective (or a bit of experience) really helps

User avatar
Thesh
Made to Fuck Dinosaurs
Posts: 6569
Joined: Tue Jan 12, 2010 1:55 am UTC
Location: Colorado

Re: Working with CSV files on a web server.

Postby Thesh » Thu Sep 23, 2010 5:21 am UTC

ajroach42 wrote:It's a pipe | delimited file, so I don't have to worry about the coma problem, do I?

I'll try everything using the column header method in the morning, starting with a small test file and report back here. I've already upped my max execution and memory constraints because I tend to use PHP to do extraneous heavy lifting.

Since it's going to take so long to execute, could I bump up it's nice number? I am on a shared server, but they are pretty liberal with their terms so long as you don't hog the CPU all day.

According to the email I just got, the file I have is one of 6 that I'll be needing to import. I guess I'm setting aside a nice big chunk of time for it all to execute.


I read that PHP is bad at creating CSV files. There is a project, hosted on google code I think, that aims to fix this. I am, eventually, going to be creating and editing CSV files from the database within PHP scripts. Does anyone have any experience or helpful pointers for that? I've been doing a lot of reading, trying to come up with the best way to do it, but sometimes a fresh perspective (or a bit of experience) really helps


You don't have to worry about the commas, but one other thing you may have to take into account is newlines (depending on the file, it may or may not be a problem). I would personally write a generic function to parse any delimited file; it hardly requires any extra code, and you don't have to worry about the delimiter appearing in the text. Whether or not the quotes are a concern depends on whether or not the vendor puts them in there. When I have written code to output CSV files in the past, I always put anything that is text in quotes just to be safe. CPU isn't the only concern, another problem can be memory; if the file is large, you should do something like read and insert 100 rows at a time instead of trying to load the entire thing into memory.

Outputting CSV files is the easiest thing you can do, it's parsing CSV files that is the problem. PHP's built in CSV stuff does suck, and won't handle most CSV files with text in them.
Summum ius, summa iniuria.

ajroach42
Posts: 7
Joined: Wed Sep 22, 2010 7:25 pm UTC

Re: Working with CSV files on a web server.

Postby ajroach42 » Thu Sep 23, 2010 6:10 am UTC

Thesh wrote:... if the file is large, you should do something like read and insert 100 rows at a time instead of trying to load the entire thing into memory.

Outputting CSV files is the easiest thing you can do, it's parsing CSV files that is the problem. PHP's built in CSV stuff does suck, and won't handle most CSV files with text in them.


My original plan was to just split up the CSV file into a few chunks by hand, but the file is so large that I don't think thats going to be an option. How would I go about reading x rows at a time into memory? If you can point me toward the proper function, I'll work out the details.

I tried doing things line by line, deleting the first line, and then reading in the next line of the file. I waisted a lot of time coding up a script to do that, and I can't even remember which function I used then. It didn't work well, and was slow, and took a lot more memory than I expected it to. (In retrospect, I was probably reading each line into a new array. I don't have the code in front of me, but I'll check in the morning.)

I'll look at everything with fresh eyes, and take a gander at the few pages of code I've got, and the options that have been discussed here, and see what I can do.


Thanks for all the help everyone! You've been fantastic so far.

User avatar
Steax
SecondTalon's Goon Squad
Posts: 3038
Joined: Sat Jan 12, 2008 12:18 pm UTC

Re: Working with CSV files on a web server.

Postby Steax » Thu Sep 23, 2010 6:24 am UTC

Keep in mind that you can always split the CSV file at any newline, if they're getting too big to manage on your server.
In Minecraft, I use the username Rirez.

ajroach42
Posts: 7
Joined: Wed Sep 22, 2010 7:25 pm UTC

Re: Working with CSV files on a web server.

Postby ajroach42 » Thu Sep 23, 2010 2:44 pm UTC

Steax wrote:Keep in mind that you can always split the CSV file at any newline, if they're getting too big to manage on your server.

Good point.

I can split the file into pieces automatically, rather than doing it by hand. Then I can do up a simple script to import everything based on the column headers.

Again, everyone, you've been fantastic. Thanks! I know where to come if I run into any issues.

ajroach42
Posts: 7
Joined: Wed Sep 22, 2010 7:25 pm UTC

Re: Working with CSV files on a web server.

Postby ajroach42 » Thu Sep 23, 2010 5:21 pm UTC

Code: Select all

awk -F "|" '{print $10 "-" $9 "|" $11 "|" "<p> artist:",$10,"</p> <p> album :",$9,"</p>" "|" $12 "|" $1 "|" $14 "|" $16 "|" $17 "|" $21 "|" $22 "|" $6}' Documents/cat.csv >>upload.csv



That did the trick for grabbing out just the columns I needed. I'm importing the file now, wish me luck!


Return to “Coding”

Who is online

Users browsing this forum: No registered users and 6 guests