@idan
Last active June 11, 2020 22:55
import Fiona from "https://deno.land/x/fiona/deno/index.js";
import { csvFormat, csvParse } from "https://cdn.pika.dev/d3-dsv";
const { writeTextFile, run, mkdir, stdout } = Deno;
// Wrap instead of destructuring so `now` keeps its `performance` receiver.
const now = () => performance.now();
type Row = {
  name: string;
  date: Date;
  bio: string;
  p: number;
  q: number;
  r: number;
  s: number;
};
const template = {
  name: Fiona.Fullname,
  date: Fiona.Date({ long: true }),
  bio: Fiona.Sentence,
  p: Fiona.Number,
  q: Fiona.Number,
  r: Fiona.Number,
  s: Fiona.Number,
};
const dataDir = "./gitcsvtestdata/";
const fileName = "data.csv";
const totalRows = 50000;
const mutationPercentage = 0.01; // fraction of rows replaced per iteration (0.01 = 1%)
// Generate `count` random rows from the template and report how long it took.
const generateRows = (count: number): Row[] => {
  stdout.writeSync(new TextEncoder().encode(`Generating ${count} rows...`));
  const t = now();
  const rows = [...Array(count)].map(() => Fiona().object(template));
  console.log(`${now() - t}ms`);
  return rows;
};
const writeCSV = async (data: Row[]) => {
  const path = `${dataDir}${fileName}`;
  stdout.writeSync(new TextEncoder().encode(`Writing ${path}...`));
  const t = now();
  await writeTextFile(path, csvFormat(data));
  console.log(`${now() - t}ms`);
};
// Return the size in KB of `filename` (relative to dataDir), as reported by `du -sk`.
const getFileSize = async (filename: string = fileName) => {
  const p = run({
    cmd: ["du", "-sk", filename],
    stdout: "piped",
    cwd: dataDir,
  });
  const output = new TextDecoder().decode(await p.output());
  // `du -sk` prints "<size>\t<path>"; the leading digits are the size.
  const sizeStr = output.match(/^\d+/);
  if (sizeStr && sizeStr[0]) {
    return Number.parseInt(sizeStr[0], 10);
  } else {
    throw new Error("No size reported!");
  }
};
const gitInit = async () => {
  console.log("Initializing");
  await mkdir(dataDir);
  const p = run({
    cmd: ["git", "init"],
    stdout: "null",
    cwd: dataDir,
  });
  return await p.status();
};
const gitDestroy = async () => {
  console.log("Destroying");
  const p = run({
    cmd: ["rm", "-rf", dataDir],
    stdout: "null",
  });
  return await p.status();
};
// Stage and commit `filename` inside the test repository.
const gitCommit = async (filename: string) => {
  await run({
    cmd: ["git", "add", filename], // use the parameter, not the global fileName
    cwd: dataDir,
    stdout: "null",
  }).status();
  await run({
    cmd: ["git", "commit", "-m", "Data commit"],
    cwd: dataDir,
    stdout: "null",
  }).status();
};
// Replace a random `percentage` (a fraction, e.g. 0.01 = 1%) of rows in place with fresh rows.
const mutateData = (data: Row[], percentage: number) => {
  stdout.writeSync(
    new TextEncoder().encode(`Mutating ${percentage * 100}% of the data... `),
  );
  const length = data.length;
  const mutateCount = Math.floor(length * percentage);
  // A Set gives constant-time "already mutated?" checks.
  const mutatedIndexes = new Set<number>();
  for (let i = 0; i < mutateCount; i++) {
    let idx;
    do {
      idx = Fiona().number({ min: 0, max: length - 1 });
    } while (mutatedIndexes.has(idx));
    mutatedIndexes.add(idx);
    data[idx] = Fiona().object(template);
  }
  console.log(`${mutatedIndexes.size} altered`);
};
await gitDestroy();
await gitInit();
const data = generateRows(totalRows);
await writeCSV(data);
await gitCommit(fileName);
const log = [
  {
    dataSizeKb: await getFileSize(),
    gitSizeKb: await getFileSize(".git"),
    duration: 0,
  },
];
for (let i = 0; i < 100; i++) {
  console.log(`==== Mutation ${i}: `);
  const t = now();
  mutateData(data, mutationPercentage);
  await writeCSV(data);
  await gitCommit(fileName);
  const duration = now() - t;
  const dataSizeKb = await getFileSize();
  const gitSizeKb = await getFileSize(".git");
  log.push({ dataSizeKb, gitSizeKb, duration });
  console.log(`data:${dataSizeKb} git:${gitSizeKb} duration:${duration}`);
}
await writeTextFile("run.csv", csvFormat(log));

idan commented Jun 11, 2020

To install: deno install -f --allow-run --allow-hrtime --allow-write https://gist.github.com/idan/86f85df55422aa2798e5301404f3d529/raw
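
The mutation step in the script draws random row indexes until it has enough that it has not used before. A minimal sketch of that step in isolation, assuming `Math.random` in place of Fiona's number generator and a hypothetical helper name `pickDistinctIndexes` (neither comes from the gist):

```typescript
// Hypothetical helper, not part of the gist: returns `count` distinct
// integer indexes in the range [0, length).
function pickDistinctIndexes(
  length: number,
  count: number,
  rand: () => number = Math.random, // assumption: stands in for Fiona().number
): Set<number> {
  if (count > length) {
    // Without this guard the loop below could never finish.
    throw new RangeError("count must not exceed length");
  }
  const picked = new Set<number>();
  while (picked.size < count) {
    // Adding a duplicate to a Set is a no-op, so we simply retry until full.
    picked.add(Math.floor(rand() * length));
  }
  return picked;
}

// Same proportions as the script: 1% of 50000 rows.
const indexes = pickDistinctIndexes(50000, Math.floor(50000 * 0.01));
console.log(indexes.size); // 500
```

Rejection sampling like this is cheap when only a small fraction of rows is picked, as here; for fractions close to 1 a shuffle would likely be a better fit.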
