const fs = require('fs');
const csv = require('csv');

fs.createReadStream('data.csv')
  .pipe(csv.parse({columns: true}))
  .pipe(csv.transform((input) => {
    return Object.assign({}, input, {
      foo: input['foo'].toUpperCase()
    });
  }))
  .pipe(csv.stringify({header: true}))
  .pipe(fs.createWriteStream('./data-processed.csv'))
  .on('finish', () => {
    console.log('Done 🍻 ');
  });
Hi, I have tried this with a source CSV having around a million rows, but the resulting transformed CSV has less data. Can you please let me know what the issue could be?
Sorry, I don't have the answer to that. You may want to ask on Stack Overflow, or file an issue on the 'csv' package, with enough information to reproduce the issue.
Actually, one idea — do you see the part where it does this?
Object.assign({}, input, {
foo: input['foo'].toUpperCase()
});
That is making a copy of the object, and modifying a property of it. A simpler way to just make a copy would be:
const copy = {...original};
I think the CSV transform is going to reuse objects in order to get better performance, so if you are not making a copy in one of these ways, or otherwise expecting the objects to last outside of this pipe() chain, you may find fewer results because some objects have been overwritten and reused. If your data is not too big to fit in memory, you may want to make a copy of each object and just push it into an array.
For example:
const fs = require('fs');
const csv = require('csv');

const rows = [];

fs.createReadStream('upload_7_17.csv')
  .pipe(csv.parse({columns: true}))
  .pipe(csv.transform((input) => {
    rows.push({...input});
  }))
  .on('finish', () => {
    console.log(rows);
  });
Note that this requires the data to fit in memory, so it may not scale as well as streaming row by row.
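To see why a missing copy can lose data, here is a minimal sketch (plain Node, no csv package) of a hypothetical parser that reuses a single record object for every row, which is the behavior described above:

```javascript
// A hypothetical parser that reuses one record object for every row.
const reused = {};
const seen = [];

for (const name of ['alice', 'bob', 'carol']) {
  reused.foo = name;  // the parser overwrites the same object…
  seen.push(reused);  // …so storing the reference keeps no history
}
console.log(seen.map((r) => r.foo)); // every entry shows the last row's value

// Making a shallow copy per row snapshots each record instead:
const copies = [];
for (const name of ['alice', 'bob', 'carol']) {
  reused.foo = name;
  copies.push({...reused}); // spread copies the current property values
}
console.log(copies.map((r) => r.foo)); // one distinct value per row
```

Whether the `csv` package actually reuses objects this way is an assumption here, but the copy is cheap insurance either way.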
How can I do an async operation in the transform?
Hi, sorry, I'm not involved with the csv package and can't provide support for it. This is just an example I've posted as a reference. Stack Overflow or the csv GitHub repository may be better places to ask for support.
Thanks a lot for this, this is very helpful!