@amishshah
Created February 12, 2017 16:14
Rough script to extract images from HTTP Archive (HAR) files
const fs = require('fs');
const file = JSON.parse(fs.readFileSync('./dump.har')).log;
const targetMimeType = 'image/jpeg';
let count = 1;
for (const entry of file.entries) {
  if (entry.response.content.mimeType === targetMimeType) {
    // ensure output directory exists before running!
    // decode the base64-encoded response body and write it out
    fs.writeFileSync(`output/${count}.png`, new Buffer(entry.response.content.text, 'base64'), 'binary');
    count++;
  }
}
console.log(`Grabbed ${count - 1} files`);
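
For context, here is the shape of the parsed ./dump.har that the script relies on, per the HAR 1.2 format; browsers typically export binary response bodies as base64 in response.content.text. The URL and values below are placeholders, and only the fields the script actually reads are shown:

{
  "log": {
    "entries": [
      {
        "request": { "url": "https://example.com/images/photo.jpg" },
        "response": {
          "content": {
            "mimeType": "image/jpeg",
            "encoding": "base64",
            "text": "...base64-encoded image body..."
          }
        }
      }
    ]
  }
}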
@ntcho

ntcho commented Jun 23, 2020

Thanks for the script.

But it ran into an error from Buffer: TypeError: First argument must be a string, Buffer, ArrayBuffer, Array, or array-like object. So I edited the script to make it work. Since 'base64' is a valid encoding for writeFileSync (from this SO answer), we can just pass 'base64' directly without creating a Buffer object.

const fs = require('fs');
const file = JSON.parse(fs.readFileSync('./dump.har')).log;
const targetMimeType = 'image/jpeg';

let count = 0;
for (const entry of file.entries) {
  if (entry.response.content.mimeType === targetMimeType) {
    // ensure output directory exists before running!
    // writeFileSync is synchronous and takes no callback; errors will throw here
    fs.writeFileSync(`output/${count}.png`, entry.response.content.text, 'base64');
    count++;
  }
}

console.log(`Grabbed ${count} files`);

And to execute, I ran the following in the PowerShell console.

node .\har-extract.js
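
A likely cause of that TypeError, for anyone hitting it: some HAR entries carry no response body, or the body is not base64-encoded, so entry.response.content.text is undefined. A defensive guard along these lines, sketched as an assumption rather than taken from either script above, would skip those entries at the top of the loop body:

// sketch: skip entries whose body is missing or not marked as base64 in the HAR
const content = entry.response.content;
if (!content.text || content.encoding !== 'base64') continue;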

@simioni

simioni commented Aug 25, 2021

Thanks for the work and the time to share this.

I've made some further improvements to make the script more usable when handling large archives with thousands of files.

It saves the files concurrently and displays a text progress bar in the console while doing so. It also keeps the original file names from the request URL, and creates the output directory first if it does not exist yet.

const fs = require('fs');
const fsAsync = require('fs').promises;
const targetMimeType = 'image/jpeg';
const file = JSON.parse(fs.readFileSync('./dump.har')).log;
const dir = './output';

if (!fs.existsSync(dir)){
  fs.mkdirSync(dir);
}

// renders a text based progress bar to the console
const width = 30;
const displayProgress = (cur, total) => {
  const pct = Math.round(cur / total * 100) / 100;
  const done = Math.round(pct * width);
  const remaining = width - done;
  const filled = '\u2588'.repeat(done);
  const empty = '\u2591'.repeat(remaining);
  // '\r' returns the cursor to the start of the line so the progress bar overwrites itself in stdout
  process.stdout.write(`\r${filled}${empty} | ${Math.ceil(pct * 100)}% | ${cur} of ${total} images saved.`);
}

const promises = [];
let started = 0;
let finished = 0;
for (const entry of file.entries) {
  if (entry.response.content.mimeType === targetMimeType) {
    const pathParts = new URL(entry.request.url).pathname.split('/');
    const filename = pathParts.pop() || pathParts.pop(); // Pop twice to avoid potential trailing slash
    promises.push(fsAsync.writeFile(`${dir}/${filename}`, entry.response.content.text, 'base64')
      .then(() => {
        finished++;
        displayProgress(finished, started);
      })
      .catch(err => {
        console.log(err)
      })
    );
    started++;
  }
}

Promise.all(promises).then(() => {
  process.stdout.write(`\n\u2713 Done.`);
});

No external dependencies. Just run it normally:

node .\har-extract.js
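
A small aside on the directory check: on Node 10+ the existsSync/mkdirSync pair can be collapsed into one idempotent call, which does nothing if the directory already exists. A minor alternative, not a change to the script above:

fs.mkdirSync(dir, { recursive: true });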

@cgatete

cgatete commented Mar 14, 2022

Thanks. But the generated image can't be read. It's corrupt.
