
@donmccurdy
Created October 31, 2018 20:45
example CSV transformation in Node.js
const fs = require('fs');
const csv = require('csv');

fs.createReadStream('data.csv')
  .pipe(csv.parse({columns: true}))      // parse each row into an object keyed by header
  .pipe(csv.transform((input) => {
    // Copy the row and uppercase the 'foo' column on the copy.
    return Object.assign({}, input, {
      foo: input['foo'].toUpperCase()
    });
  }))
  .pipe(csv.stringify({header: true}))   // serialize back to CSV, including the header row
  .pipe(fs.createWriteStream('./data-processed.csv'))
  .on('finish', () => {
    console.log('Done 🍻');
  });
@aksharj commented Jul 1, 2020

Thanks a lot for this, it's very helpful.

@aksharj commented Jul 2, 2020

Hi, I have tried this with a source CSV of around a million rows, but the resulting transformed CSV contains less data. Can you please let me know what the issue could be?

@donmccurdy (Author)

Sorry, I don't have the answer to that. You may want to ask on Stack Overflow, or file an issue on the 'csv' package with enough information to reproduce the problem.

@donmccurdy (Author)

Actually, one idea — do you see the part where it does this?

Object.assign({}, input, {
  foo: input['foo'].toUpperCase()
});

That makes a copy of the object and modifies a property on the copy. A simpler way to make a copy is spread syntax:

const copy = {...original};

I believe the CSV transform reuses objects to get better performance, so if you aren't making a copy in one of these ways, or are otherwise expecting the objects to outlive the pipe() chain, you may see fewer results because some objects have been overwritten and reused. If your data is not too big to fit in memory, you may want to make a copy of each object and push it into an array.
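
Applied to the snippet above, the same copy-and-override can be written with spread syntax (a minimal sketch, equivalent to the Object.assign version):

csv.transform((input) => ({
  ...input,                       // copy every field of the parsed row
  foo: input['foo'].toUpperCase() // then override the one field being changed
}))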

@donmccurdy (Author)

For example:

const fs = require('fs');
const csv = require('csv');

const rows = [];

fs.createReadStream('upload_7_17.csv')
  .pipe(csv.parse({columns: true}))
  .pipe(csv.transform((input) => {
    // Push a copy of each row; returning nothing drops the row from the stream.
    rows.push({...input});
  }))
  .on('finish', () => {
    console.log(rows);
  });

Note that this requires the data to fit in memory, so it may not scale as well as streaming row by row.

@atharva-bhange

How can I do an async operation in the transform?

@donmccurdy (Author)

Hi, sorry, I'm not involved with the csv package and can't provide support for it; this is just an example I've posted as a reference. Stack Overflow or the csv GitHub repository may be better places to ask for support.
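
For reference: per the stream-transform documentation, a handler that accepts a second callback argument runs in asynchronous mode. A minimal sketch, where someAsyncLookup is a hypothetical placeholder for your own async work:

const fs = require('fs');
const csv = require('csv');

fs.createReadStream('data.csv')
  .pipe(csv.parse({columns: true}))
  .pipe(csv.transform((input, callback) => {
    // In async mode, call callback(err, output) once the work finishes.
    someAsyncLookup(input.foo)  // hypothetical async helper returning a Promise
      .then((value) => callback(null, {...input, foo: value}))
      .catch(callback);
  }))
  .pipe(csv.stringify({header: true}))
  .pipe(fs.createWriteStream('./data-processed.csv'));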
