
@jfoclpf
Last active September 5, 2022 13:54
Node.js multi-threading for data processing and storing results in files

Node.js is a powerful tool for backend developers, but to get the most out of your CPU you need to take advantage of multiple cores. Multi-core processing in Node.js is mostly used for web servers, and the built-in cluster module covers that case out of the box. Node.js also ships the worker_threads module for CPU-intensive jobs, but it is not as easy to work with.

Let's create a project that compares a single-threaded and a multi-threaded version of the same task: running N iterations, where each iteration does heavy data processing and writes some random data to its own file.

Create the project:

mkdir test-threads && cd test-threads
npm init -y

Install the dependencies and create the dist/ directory:

npm install async progress piscina command-line-args
mkdir dist

Create the file index.js at the root of the project directory:

const path = require('path')
const async = require('async')
const ProgressBar = require('progress')
const Piscina = require('piscina')
const commandLineArgs = require('command-line-args')

console.time('main')

const worker = require(path.resolve(__dirname, 'worker.js'))
const piscina = new Piscina({
  filename: path.resolve(__dirname, 'worker.js')
})

const argvOptions = commandLineArgs([
  { name: 'multi-thread', type: Boolean },
  { name: 'iterations', alias: 'i', type: Number }
])

const files = []
for (let i = 0; i < (argvOptions.iterations || 1000); i++) {
  files.push(path.join(__dirname, 'dist', i + '.txt'))
}

const bar = new ProgressBar(':bar', { total: files.length, width: 80 })

async.each(files, function (file, cb) {
  (async function() {
    try {
      const err = argvOptions['multi-thread'] ? (await piscina.run(file)) : worker(file)
      bar.tick()
      if (err) cb(err); else cb()
    } catch (err) {
      cb(err)
    }
  })();
}, (err) => {
  if (err) {
    console.error('There was an error: ', err)
    process.exitCode = 1
  } else {
    bar.terminate()
    console.log('Success')
    console.timeEnd('main')
    process.exitCode = 0
  }
})

Now create worker.js, also at the root of the project directory:

const fs = require('fs')

// some CPU-intensive function;
// the higher baseNumber is, the longer it takes
function mySlowFunction (baseNumber) {
  let result = 0
  for (let i = Math.pow(baseNumber, 7); i >= 0; i--) {
    result += Math.atan(i) * Math.tan(i)
  }
  return result
}

module.exports = (file) => {
  try {
    mySlowFunction(Math.floor(Math.random() * 10 + 1))
    fs.writeFileSync(file, Math.random().toString())
    return null
  } catch (e) {
    return e
  }
}
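To see why mySlowFunction belongs in a worker thread at all, note that it blocks the event loop for as long as it runs. This standalone sketch (same shape as the function above, with a small base so it finishes quickly) shows a zero-delay timer being held up by the synchronous loop:

```javascript
// same shape as mySlowFunction above, returning the accumulated result
function mySlowFunction (baseNumber) {
  let result = 0
  for (let i = Math.pow(baseNumber, 7); i >= 0; i--) {
    result += Math.atan(i) * Math.tan(i)
  }
  return result
}

const start = Date.now()
setTimeout(() => {
  // this only fires after the synchronous loop releases the event loop,
  // so the reported delay is well above 0 ms
  console.log('0 ms timer actually fired after', Date.now() - start, 'ms')
}, 0)
mySlowFunction(8) // ~2 million iterations, blocking the event loop meanwhile
```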

Now run it on a single thread and check the time elapsed, for 1000 and 10000 iterations (one iteration equals one round of data processing plus one file creation):

node index.js -i 1000
node index.js -i 10000

Now compare with the multi-threaded version:

node index.js --multi-thread -i 1000
node index.js --multi-thread -i 10000

In the test I did (16-core CPU), the difference is huge: with 1000 iterations, the run went from 1:27.061 (m:ss.mmm) single-threaded to 8.884s multi-threaded. Also check the files inside dist/ to be sure they were created correctly.

@suguanYang
How about changing the default worker pool size of Node.js, since file I/O will use it?

@jfoclpf
Author

jfoclpf commented Sep 5, 2022

Interesting, @suguanYang. How do we do that in the code?
What exactly do you suggest?
