@dmr121
Last active February 13, 2023 18:11
Download a .tar.gz file from a remote URL and convert its contents to JSON data
import axios from "axios";
import tar from "tar-stream";
import { ungzip } from "node-gzip";

// The data being fetched is assumed to be a .tar.gz archive with one .json file inside
export const convertTarGZFileToJSON = async (url: string): Promise<any[]> => {
  // Download the archive as raw bytes (in Node, axios returns a Buffer for "arraybuffer")
  const fileResponse = await axios.get(url, { responseType: "arraybuffer" });

  // Strip the gzip layer, leaving the raw .tar bytes
  const tarBuffer = await ungzip(fileResponse.data);

  return new Promise<any[]>((resolve, reject) => {
    const extract = tar.extract();
    let jsonText: string | undefined;

    extract.on("entry", (header, stream, next) => {
      // Only buffer the first regular file whose name ends in ".json"
      const wanted =
        jsonText === undefined && header.type === "file" && header.name.endsWith(".json");
      const chunks: Buffer[] = [];
      if (wanted) stream.on("data", (chunk: Buffer) => chunks.push(chunk));
      stream.on("end", () => {
        // A file's data may arrive in several chunks, so join them before decoding
        if (wanted) jsonText = Buffer.concat(chunks).toString("utf8");
        next(); // ready for next entry
      });
      stream.resume(); // drain entries we skip as well
    });

    // "finish" fires once every entry in the archive has been read
    extract.on("finish", () => {
      if (jsonText === undefined) {
        reject(new Error("No .json file found in the archive"));
        return;
      }
      try {
        resolve(JSON.parse(jsonText));
      } catch (error) {
        reject(error);
      }
    });
    extract.on("error", reject);

    // Feed the whole decompressed buffer to the tar parser
    extract.end(tarBuffer);
  });
};
// call method like so:
// const object = await convertTarGZFileToJSON("://file_path_that_leads_to_a_tar_gz_file.tar.gz");
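If you would rather not depend on tar-stream, the tar format is simple enough to read by hand: an archive is a sequence of 512-byte headers (file name at byte 0, size as octal text at byte 124, type flag at byte 156), each followed by the file's data padded with NUL bytes up to the next 512-byte boundary, and it ends with zero-filled blocks. The sketch below is a swapped-in, dependency-free approach, not the method used in this gist. `findJsonInTar` is a hypothetical name, and it assumes plain ustar entries: no GNU/PAX long-name extensions and no checksum verification.

```typescript
// Hypothetical helper: return the text of the first regular file in a raw
// (already decompressed) tar buffer whose name ends in ".json".
// Sketch only: ignores GNU/PAX long-name extensions and skips checksum checks.
export function findJsonInTar(tarBuffer: Buffer): string | undefined {
  const BLOCK = 512;
  let offset = 0;

  while (offset + BLOCK <= tarBuffer.length) {
    const header = tarBuffer.subarray(offset, offset + BLOCK);

    // An all-zero header block marks the end of the archive
    if (header.every((byte) => byte === 0)) break;

    // Name: bytes 0-99, NUL-terminated
    const name = header.toString("utf8", 0, 100).split("\0")[0];
    // Size: bytes 124-135, octal ASCII terminated by NUL or space
    const sizeField = header.toString("ascii", 124, 136).split("\0")[0].trim();
    const size = parseInt(sizeField, 8) || 0;
    // Type flag: byte 156; "0" (or NUL in old archives) means a regular file
    const type = String.fromCharCode(header[156]);

    const dataStart = offset + BLOCK;
    if ((type === "0" || type === "\0") && name.endsWith(".json")) {
      return tarBuffer.toString("utf8", dataStart, dataStart + size);
    }

    // Skip the file data, which is padded to a whole number of 512-byte blocks
    offset = dataStart + Math.ceil(size / BLOCK) * BLOCK;
  }
  return undefined;
}
```

It expects the decompressed bytes, i.e. the Buffer that `ungzip` resolves with; passing its result to `JSON.parse` gives back the parsed data.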

dmr121 commented Feb 13, 2023

I used this method to handle Mailchimp batch webhooks. A Mailchimp batch webhook delivers a payload containing a URL to a .tar.gz archive with data on all of your batched operations (whether each one succeeded or failed, what the error was, the data being modified, etc.). I wrote this method to fetch the .tar.gz archive from that URL, decompress it, extract the single JSON file from the .tar archive, clean the data (trim the string, remove null characters, etc.), and return the parsed JSON object.
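Once the results are parsed, a natural next step in a webhook handler is separating the operations that succeeded from those that failed. The sketch below assumes each array element has an `operation_id`, a numeric `status_code`, and a `response` holding that operation's response body as a JSON string; those field names, and the `title`/`detail` error fields, are assumptions to check against Mailchimp's batch operations documentation. `partitionBatchResults` and `describeFailure` are hypothetical helpers.

```typescript
// Assumed shape of one entry in the parsed batch results array
// (field names are an assumption; verify against Mailchimp's docs)
interface BatchResult {
  operation_id: string | null;
  status_code: number;
  response: string; // assumed: the operation's response body as a JSON string
}

// Split results into succeeded (HTTP 2xx) and failed operations
export function partitionBatchResults(results: BatchResult[]): {
  succeeded: BatchResult[];
  failed: BatchResult[];
} {
  const succeeded: BatchResult[] = [];
  const failed: BatchResult[] = [];
  for (const result of results) {
    if (result.status_code >= 200 && result.status_code < 300) succeeded.push(result);
    else failed.push(result);
  }
  return { succeeded, failed };
}

// Build a one-line summary of a failed operation, assuming the error body
// carries "title" and/or "detail" fields
export function describeFailure(result: BatchResult): string {
  const id = result.operation_id ?? "(no id)";
  try {
    const body = JSON.parse(result.response);
    return `${id}: ${result.status_code} ${body.title ?? body.detail ?? "unknown error"}`;
  } catch {
    return `${id}: ${result.status_code} (unparseable response)`;
  }
}
```

Keeping the partition separate from the summary makes it easy to, say, retry only the failed `operation_id`s while logging their summaries.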
