@loilo
Last active December 2, 2022 03:43
Compare File Lists in Node.js

I encounter a certain situation pretty frequently: I have a known set of files on my file system that I need to scan and then compare against the same (or a similar) set of files at a later point in time (i.e. "Did any of the files in this list change?" or "Did any of the files matching these criteria change?").

A quick example:

import glob from 'glob' // from the `glob` package

const images_now = glob.sync('/home/images/**/*.jpg')

// Later (maybe even days or weeks later)

const images_later = glob.sync('/home/images/**/*.jpg')

// Do the files in images_now and images_later differ in any way?

The script below tackles this problem in a way that keeps later comparisons as cheap as possible (i.e. scan once, compare potentially many times). It compares an earlier file list against the current file system and returns the first difference it finds.

Note: The script uses ES modules (transpiling most likely needed) and needs the rev-hash package to be installed.

It works as follows:

Scan the Files

We need to create a profile of the files we're interested in:

import { profile } from './compare-files.js'

profile(glob.sync('/home/images/**/*.jpg'))

Calling profile() returns a plain JavaScript object. You may serialize and store it somewhere for later access.
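
As a rough sketch of the "serialize and store" step: since the profile maps each file path to a `[size, hash]` pair, plain JSON should be enough. The helper names `serializeProfile` and `parseProfile` are made up for this example and are not part of the script.

```javascript
// Hypothetical helpers, not part of the gist: convert a profile to and from JSON.
// Assumes the shape produced by profile(): { [file]: [size, hash] }

function serializeProfile (profile) {
  return JSON.stringify(profile)
}

function parseProfile (json) {
  const data = JSON.parse(json)

  if (data === null || typeof data !== 'object' || Array.isArray(data)) {
    throw new TypeError('Stored profile must be a plain object')
  }

  // Reject entries that do not look like [size, hash]
  for (const [ file, entry ] of Object.entries(data)) {
    const valid =
      Array.isArray(entry) &&
      entry.length === 2 &&
      typeof entry[0] === 'number' &&
      typeof entry[1] === 'string'

    if (!valid) {
      throw new TypeError(`Malformed profile entry for "${file}"`)
    }
  }

  return data
}
```

The resulting string can be written wherever is convenient (for example with `writeFileSync` from `fs`) and passed through `parseProfile` again before handing it to `diff()`.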

Check for Changes

Comparing is the other way around: Read the stored profile and use diff():

import { diff } from './compare-files.js'

diff(
  glob.sync('/home/images/**/*.jpg'),
  referenceData // The object we obtained through profile() earlier
)

diff() returns null if no changes have occurred. Otherwise it returns an object with a type property (either added, removed or changed) and a file property naming the first file the difference was found in.

For example:

{ type: 'added', file: 'new.jpg' }
// or
{ type: 'removed', file: 'no-longer-exists.jpg' }
// or
{ type: 'changed', file: 'photoshopped.jpg' }
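
One possible way to act on these results is a small switch over the type property. `describeChange` is a hypothetical helper used only for illustration; it assumes exactly the three types listed above plus the null case.

```javascript
// Hypothetical helper, not part of the gist: turn a diff() result into a message
function describeChange (result) {
  if (result === null) {
    return 'No changes detected'
  }

  switch (result.type) {
    case 'added':
      return `New file: ${result.file}`
    case 'removed':
      return `Missing file: ${result.file}`
    case 'changed':
      return `Modified file: ${result.file}`
    default:
      // Guard against result shapes this sketch does not know about
      throw new Error(`Unknown change type: ${result.type}`)
  }
}
```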

Note: If a difference is found, don't forget to invalidate the old profile and create a new one if needed.
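
A minimal sketch of that "check, then re-profile" step could look like the following. `checkAndRefresh` is a hypothetical name, and `profile` and `diff` are passed in as arguments only to keep the example self-contained; in practice they would be the functions imported from './compare-files.js'. The optional base argument is left out for brevity.

```javascript
// Hypothetical helper, not part of the gist
function checkAndRefresh (files, storedProfile, { profile, diff }) {
  const change = diff(files, storedProfile)

  return {
    change,
    // Only re-scan when something differs, otherwise keep the old profile
    profile: change === null ? storedProfile : profile(files)
  }
}
```

The caller would then persist the returned profile whenever change is not null.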

import revHash from 'rev-hash'
import { readFileSync, statSync } from 'fs'
import { resolve } from 'path'

/**
 * Calculate the rev hash for a file
 *
 * @param {string} file The path to the file to hash
 * @returns {string}
 */
function hashFile (file) {
  return revHash(readFileSync(file))
}

/**
 * Calculate the rev hashes for a list of files
 *
 * @generator
 * @param {string[]} files The files to hash
 * @param {string} base A base path prepended to each file (ignored if a file path is absolute)
 * @yields {[string, string]} A [file, hash] pair
 */
function* hashFileList (files, base = '/') {
  for (const file of files) {
    yield [ file, hashFile(resolve(base, file)) ]
  }
}

/**
 * Profile a list of files to compare against a later state
 *
 * @param {string[]} files The files to profile
 * @param {string} base A base path prepended to each file (ignored if a file path is absolute)
 * @return {object} A map from each file to its [size, hash] pair
 */
export function profile (files, base = '/') {
  const map = {}

  for (const [ file, hash ] of hashFileList(files, base)) {
    map[file] = [
      statSync(resolve(base, file)).size,
      hash
    ]
  }

  return map
}

/**
 * Compare a list of files against a profile created earlier
 *
 * @param {string[]} files The current list of files
 * @param {object} hashes The file list characteristics generated earlier by profile()
 * @param {string} base A base path prepended to each file (ignored if a file path is absolute)
 * @return {object|null} The first difference found, or null if there is none
 */
export function diff (files, hashes, base = '/') {
  // Detect new files (own keys only, so inherited names like "constructor" don't count as known)
  for (const file of files) {
    if (!Object.prototype.hasOwnProperty.call(hashes, file)) {
      return { type: 'added', file }
    }
  }

  // Detect missing files
  const current = new Set(files)
  for (const file of Object.keys(hashes)) {
    if (!current.has(file)) {
      return { type: 'removed', file }
    }
  }

  // Check sizes first, since stat calls are cheaper than reading file contents
  for (const file of files) {
    if (statSync(resolve(base, file)).size !== hashes[file][0]) {
      return { type: 'changed', file }
    }
  }

  // Check contents
  for (const [ file, hash ] of hashFileList(files, base)) {
    if (hash !== hashes[file][1]) {
      return { type: 'changed', file }
    }
  }

  return null
}