@tehnrd
Last active February 2, 2023 17:15
Node.js script to count how often each unique value occurs in one column of a very large CSV file
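// Streams the CSV row by row, so the whole file never has to fit in memory.
const Papa = require("papaparse");
const fs = require("fs");

const FILE_NAME = "leads_modified_last7_days.csv";
const COLUMN_NAME = "LastModifiedById";

const file = fs.createReadStream(FILE_NAME);

// Object.create(null) has no prototype, so a column value such as
// "constructor" cannot collide with an inherited property.
const valueCounts = Object.create(null);
let count = 0;

Papa.parse(file, {
  header: true,
  // Called once per parsed row; row.data is keyed by the header names
  step: function (row) {
    count++;
    const value = row.data[COLUMN_NAME];
    valueCounts[value] = (valueCounts[value] || 0) + 1;

    // Print progress every 250,000 rows
    if (count % 250000 === 0) {
      console.log(count + " rows processed");
    }
  },
  complete: function () {
    console.log(Object.keys(valueCounts).length + " unique values in " + COLUMN_NAME);
    console.log(valueCounts);
  },
});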
// package.json
// {
//   "dependencies": {
//     "papaparse": "5.3.2"
//   }
// }
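
The counting step in the script above can be separated from the CSV parsing. What follows is only a minimal sketch, not part of the original gist: the helper `createTally` and its methods `add`, `uniqueCount`, and `top` are hypothetical names, and the sample IDs are made up. It assumes values arrive one at a time, as they would from a per-row callback, and uses a `Map` so that any string is safe as a key.

```javascript
// Hypothetical helper (not from the original gist): a tally that is fed
// one value at a time, e.g. from a parser's per-row callback.
function createTally() {
  // A Map has no inherited keys, so any string is safe to count.
  const counts = new Map();

  return {
    // Record one occurrence of `value`.
    add(value) {
      counts.set(value, (counts.get(value) || 0) + 1);
    },
    // Number of distinct values seen so far.
    uniqueCount() {
      return counts.size;
    },
    // The `n` most frequent values as [value, count] pairs, highest count first.
    top(n) {
      return [...counts.entries()].sort((a, b) => b[1] - a[1]).slice(0, n);
    },
  };
}

// Made-up sample IDs, for illustration only.
const tally = createTally();
for (const id of ["005A", "005B", "005A", "005C", "005A", "005B"]) {
  tally.add(id);
}

console.log(tally.uniqueCount()); // 3
console.log(tally.top(2)); // [ [ '005A', 3 ], [ '005B', 2 ] ]
```

In the script above, `tally.add(row.data[COLUMN_NAME])` would presumably go inside `step`, and `tally.top(10)` inside `complete`, which keeps the final output short when the column has many distinct values.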