
@mbbx6spp
Last active April 20, 2017 03:45
Node.js streams: reproduction of the problem in its simplest form

.gitignore:
node_modules

Node Streams

Requirements

Here is the process we want to implement using Node streams:

  1. Read a source file into a stream.
  2. Parse CSV records from the source file stream.
  3. Transform each record by hashing one of its fields.
  4. Write the hashes to a table in a SQLite database.

Problem

The problem is that the hasher transform stream emits/pushes the same hex hash value for every record.

I don't know the Node APIs well enough to figure out how to convert the given chunk into a usable string that I can pass to the SHA-256 hash function.

Thoughts?
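A quick way to narrow this down (a debugging sketch, not a fix) is to log exactly what the parser hands the transform before any hashing happens. If every chunk stringifies to the same value, for example "[object Object]" when the chunk is a plain row object rather than a Buffer or string, that would explain why every digest comes out identical. The _transform body below is only a temporary stand-in for the one in streams.js:

// Debugging sketch: inspect the raw chunk before hashing it.
// If chunk is a row object like { email: '...' }, then
// chunk.toString('utf8') is the same "[object Object]" string every time.
hasher._transform = function (chunk, enc, next) {
  console.log('typeof chunk: %s, enc: %s', typeof chunk, enc);
  console.log('chunk: %j', chunk);
  console.log('as string: %s', chunk.toString('utf8'));
  next();
};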

Run Example

To run: npm install && node streams.js

Output

hash: b28c94b2195c8ed259f0b415aaee3f39b0b2920a4537611499fa044956917a21
hash: b28c94b2195c8ed259f0b415aaee3f39b0b2920a4537611499fa044956917a21
hash: b28c94b2195c8ed259f0b415aaee3f39b0b2920a4537611499fa044956917a21
hash: b28c94b2195c8ed259f0b415aaee3f39b0b2920a4537611499fa044956917a21
hash: b28c94b2195c8ed259f0b415aaee3f39b0b2920a4537611499fa044956917a21
hash: b28c94b2195c8ed259f0b415aaee3f39b0b2920a4537611499fa044956917a21
hash: b28c94b2195c8ed259f0b415aaee3f39b0b2920a4537611499fa044956917a21
hash: b28c94b2195c8ed259f0b415aaee3f39b0b2920a4537611499fa044956917a21
hash: b28c94b2195c8ed259f0b415aaee3f39b0b2920a4537611499fa044956917a21
hash: b28c94b2195c8ed259f0b415aaee3f39b0b2920a4537611499fa044956917a21
hash: b28c94b2195c8ed259f0b415aaee3f39b0b2920a4537611499fa044956917a21
hash: b28c94b2195c8ed259f0b415aaee3f39b0b2920a4537611499fa044956917a21
{ hash: <Buffer 62 32 38 63 39 34 62 32 31 39 35 63 38 65 64 32 35 39 66 30 62 34 31 35 61 61 65 65 33 66 33 39 62 30 62 32 39 32 30 61 34 35 33 37 36 31 31 34 39 39 ... > }
{ hash: <Buffer 62 32 38 63 39 34 62 32 31 39 35 63 38 65 64 32 35 39 66 30 62 34 31 35 61 61 65 65 33 66 33 39 62 30 62 32 39 32 30 61 34 35 33 37 36 31 31 34 39 39 ... > }
{ hash: <Buffer 62 32 38 63 39 34 62 32 31 39 35 63 38 65 64 32 35 39 66 30 62 34 31 35 61 61 65 65 33 66 33 39 62 30 62 32 39 32 30 61 34 35 33 37 36 31 31 34 39 39 ... > }
{ hash: <Buffer 62 32 38 63 39 34 62 32 31 39 35 63 38 65 64 32 35 39 66 30 62 34 31 35 61 61 65 65 33 66 33 39 62 30 62 32 39 32 30 61 34 35 33 37 36 31 31 34 39 39 ... > }
{ hash: <Buffer 62 32 38 63 39 34 62 32 31 39 35 63 38 65 64 32 35 39 66 30 62 34 31 35 61 61 65 65 33 66 33 39 62 30 62 32 39 32 30 61 34 35 33 37 36 31 31 34 39 39 ... > }
{ hash: <Buffer 62 32 38 63 39 34 62 32 31 39 35 63 38 65 64 32 35 39 66 30 62 34 31 35 61 61 65 65 33 66 33 39 62 30 62 32 39 32 30 61 34 35 33 37 36 31 31 34 39 39 ... > }
{ hash: <Buffer 62 32 38 63 39 34 62 32 31 39 35 63 38 65 64 32 35 39 66 30 62 34 31 35 61 61 65 65 33 66 33 39 62 30 62 32 39 32 30 61 34 35 33 37 36 31 31 34 39 39 ... > }
{ hash: <Buffer 62 32 38 63 39 34 62 32 31 39 35 63 38 65 64 32 35 39 66 30 62 34 31 35 61 61 65 65 33 66 33 39 62 30 62 32 39 32 30 61 34 35 33 37 36 31 31 34 39 39 ... > }
{ hash: <Buffer 62 32 38 63 39 34 62 32 31 39 35 63 38 65 64 32 35 39 66 30 62 34 31 35 61 61 65 65 33 66 33 39 62 30 62 32 39 32 30 61 34 35 33 37 36 31 31 34 39 39 ... > }
{ hash: <Buffer 62 32 38 63 39 34 62 32 31 39 35 63 38 65 64 32 35 39 66 30 62 34 31 35 61 61 65 65 33 66 33 39 62 30 62 32 39 32 30 61 34 35 33 37 36 31 31 34 39 39 ... > }
{ hash: <Buffer 62 32 38 63 39 34 62 32 31 39 35 63 38 65 64 32 35 39 66 30 62 34 31 35 61 61 65 65 33 66 33 39 62 30 62 32 39 32 30 61 34 35 33 37 36 31 31 34 39 39 ... > }
{ hash: <Buffer 62 32 38 63 39 34 62 32 31 39 35 63 38 65 64 32 35 39 66 30 62 34 31 35 61 61 65 65 33 66 33 39 62 30 62 32 39 32 30 61 34 35 33 37 36 31 31 34 39 39 ... > }

Environment

Yes, I know this is an ancient version:

$ node --version
v4.4.6

$ npm version
{ 'dk-streams': '0.1.0',
  npm: '2.15.5',
  ares: '1.10.1-DEV',
  http_parser: '2.7.0',
  modules: '46',
  node: '4.4.6',
  openssl: '1.0.2j',
  uv: '1.9.1',
  v8: '4.5.103.36',
  zlib: '1.2.8' }

emails.csv:
myname0@mydomain.tld
myname1@mydomain.tld
myname2@mydomain.tld
myname3@mydomain.tld
myname4@mydomain.tld
myname5@mydomain.tld
myname6@mydomain.tld
myname7@mydomain.tld
myname8@mydomain.tld
myname9@mydomain.tld
myname10@mydomain.tld
myname11@mydomain.tld
{ "name": "dk-streams"
, "version": "0.1.0"
, "description": "Testing streams for DK uses."
, "bin": { "dk-stream-test": "./streams.js" }
, "dependencies":
{ "csv": "~1.1.1"
, "sqlite3": "~3.1.8"
}
}

streams.js:
/*
 * Here is the process we want to describe using Node streams:
 * 0. Read in a file to a stream.
 * 1. Parse a CSV file from source file stream.
 * 2. Transform the data to be hashed on one field.
 * 3. Write to a table in sqlite database.
 */

// "package" imports
const fs = require('fs');
const process = require('process');
const crypto = require('crypto');
const csv = require('csv');
const stream = require('stream');
const sqlite3 = require('sqlite3').verbose();

// constants
const source = fs.createReadStream(__dirname + '/emails.csv');

// setup global state (OMG, I know, kill me now)
const db = new sqlite3.Database(':memory:');
const parser = csv.parse({
  encoding: 'utf8',
  delimiter: ',',
  columns: ['email'],
  autoparse: true
});

db.serialize(function () {
  db.run("CREATE TABLE comp_email (hash TEXT)");
});

const insert = db.prepare("INSERT INTO comp_email VALUES (?)");

const hasher = new stream.Transform();
hasher._transform = function (chunk, enc, next) {
  var data = chunk.toString('utf8');
  var hashedEmail = crypto
    .createHash('sha256')
    .update(data, enc)
    .digest('hex');
  console.log('hash: %s', hashedEmail);
  this.push(hashedEmail);
  next();
};

// Idea: decorate Writable with parameters to decide which table to write to?
const sqlwriter = new stream.Writable();
sqlwriter._write = function (chunk, enc, next) {
  insert.run(chunk);
  next();
};

source.pipe(parser).pipe(hasher).pipe(sqlwriter);

db.serialize(function () {
  db.each("SELECT hash FROM comp_email", function (err, row) {
    if (err) { console.error(err); }
    if (row) { console.log(row); }
  });
});
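
For comparison, here is a minimal sketch of what the two custom streams above might look like with objectMode enabled, reusing the require calls and the db, insert, source and parser declarations from streams.js. It assumes that csv.parse with the columns option emits one plain { email: ... } object per record; I have not verified that against this version of the csv module, so treat the row.email access as an assumption. The names hasher2 and sqlwriter2 are only used to keep the sketch separate from the originals.

// Sketch only: objectMode variants of hasher and sqlwriter from streams.js.
// Assumes the parser pushes row objects such as { email: 'myname0@mydomain.tld' }.
const hasher2 = new stream.Transform({ objectMode: true });
hasher2._transform = function (row, enc, next) {
  // Hash the field value itself rather than the object's default toString().
  var hashedEmail = crypto
    .createHash('sha256')
    .update(String(row.email), 'utf8')
    .digest('hex');
  this.push(hashedEmail);
  next();
};

const sqlwriter2 = new stream.Writable({ objectMode: true });
sqlwriter2._write = function (chunk, enc, next) {
  // chunk is the hex digest string pushed by hasher2 above.
  insert.run(String(chunk), function (err) { next(err); });
};

source.pipe(parser).pipe(hasher2).pipe(sqlwriter2);

// Query only once the writable side reports it has flushed everything,
// so the SELECT cannot race the inserts.
sqlwriter2.on('finish', function () {
  db.each("SELECT hash FROM comp_email", function (err, row) {
    if (err) { console.error(err); }
    if (row) { console.log(row); }
  });
});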