Skip to content

Instantly share code, notes, and snippets.

@timoxley
Last active August 6, 2023 17:13
Show Gist options
  • Save timoxley/0cb5053dec107499c8aabad8dfd651ea to your computer and use it in GitHub Desktop.
Save timoxley/0cb5053dec107499c8aabad8dfd651ea to your computer and use it in GitHub Desktop.
async/await recursive fs readdir
import { join } from 'path'
import { readdir, stat } from 'fs-promise'
async function rreaddir (dir, allFiles = []) {
const files = (await readdir(dir)).map(f => join(dir, f))
allFiles.push(...files)
await Promise.all(files.map(async f => (
(await stat(f)).isDirectory() && rreaddir(f, allFiles)
)))
return allFiles
}
// straight synchronous version for perf comparison
import { join } from 'path'
import fs from 'fs'
function rreaddirSync (dir, allFiles = []) {
const files = fs.readdirSync(dir).map(f => join(dir, f))
allFiles.push(...files)
files.forEach(f => {
fs.statSync(f).isDirectory() && rreaddirSync(f, allFiles)
})
return allFiles
}
import assert from 'assert'
console.time('parallel')
rreaddir(__dirname).then(pFiles => {
console.timeEnd('parallel')
console.time('sync')
const sFiles = rreaddirSync(__dirname)
console.timeEnd('sync')
assert.deepEqual(pFiles.sort(alpha), sFiles.sort(alpha))
})
function alpha (a, b) {
return a.localeCompare(b)
}
parallel: 313.899ms
sync: 80.259ms
@puckey
Copy link

puckey commented Feb 22, 2020

import { basename, join } from 'path';
import { readdir, stat, Stats } from 'fs-extra';

const recursiveReadDir = async (
  dir: string,
  checkShouldIgnore?: (uri: string, stats: Stats) => boolean,
  concurrency = 100
): Promise<string[]> => {
  const collected: string[] = [];
  const queue = [dir];
  const visit = async (file: string) => {
    const stats = await stat(file);
    if (checkShouldIgnore?.(file, stats)) return;
    if (stats.isDirectory()) {
      queue.push(...(await readdir(file)).map(f => join(file, f)));
      return;
    }
    collected.push(file);
  };
  while (queue.length) {
    await Promise.all(
      queue.splice(0, concurrency).map(visit)
    );
  }
  return collected;
};

☝️ uses typescript and introduces optional ignore and concurrency parameters and leaves out directories in the resulting file list

Because I ran into memory problems recursively reading a .git directory using @glebec's version, I came up with the queue based version above. Will run some benchmarks on this soon.

@glebec
Copy link

glebec commented Feb 22, 2020

Blast from the past! For context, this gist was part of a question posted on a Slack channel about why some async code was slower than the corresponding sync code. I didn’t intend my reply to be used in anyone’s prod. 😅 That being said, good idea to introduce a concurrency limit!

@puckey
Copy link

puckey commented Feb 22, 2020

❤️

@puckey
Copy link

puckey commented Feb 22, 2020

Seemingly not much difference in performance, which is good:

queue: 935.704ms
sync: 1754.959ms
parallel: 945.062ms

Interesting to see that the parallel version is now much faster than the original sync version. Tested using Node v12.13.0 with slightly adjusted versions of parallel and sync (because my version also leaves out directories in the resulting file list).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment