@bryophyta
Last active July 8, 2022 16:31
GHA workflow scraper

README

WIP scripts for scraping and processing GitHub Actions workflow run data.

To run from the command line:

deno run --no-check --allow-net=api.github.com --allow-env="GITHUB_TOKEN" --allow-write index.ts [workflow_name]

There's also an optional parameter that takes a GitHub API date string to filter runs by, e.g.:

-d=">2022-07-06"
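The value is handed to the workflow runs request as its `created` parameter, so it follows GitHub's date search syntax. As a rough sketch of a pre-flight check, assuming only plain `YYYY-MM-DD` dates with an optional `>`, `>=`, `<` or `<=` prefix, or a `..` range (`isPlausibleDateFilter` is a hypothetical helper, not part of these scripts):

```typescript
// Hypothetical helper: checks that a string looks like a GitHub date filter.
// Only plain YYYY-MM-DD dates are covered, either with an optional comparison
// operator or as a range such as 2022-07-01..2022-07-06. Timestamps and
// open-ended ranges are deliberately not handled in this sketch.
const DATE = String.raw`\d{4}-\d{2}-\d{2}`;
const DATE_FILTER = new RegExp(
  `^(?:(?:>=|<=|>|<)?${DATE}|${DATE}\\.\\.${DATE})$`,
);

function isPlausibleDateFilter(value: string): boolean {
  return DATE_FILTER.test(value);
}

console.log(isPlausibleDateFilter(">2022-07-06")); // true
console.log(isPlausibleDateFilter("2022-07-01..2022-07-06")); // true
console.log(isPlausibleDateFilter("yesterday")); // false
```

This only checks the shape of the string; a date such as `2022-13-45` would still pass.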

Todos

  • Compare Octokit setup with Max's, and try to gain a better understanding of how typing for this lib works
  • Make filepath more configurable (and validate it before trying to use?)
  • Rationalise typing
  • Make it possible to customise the headings to scrape/save, and also the file saving options
  • Add 'update' functionality which will pick up from where the previous run left off
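
One possible shape for the 'update' todo, sketched under the assumption that the `created_at` values already saved are ISO 8601 UTC strings (which compare chronologically as plain strings). `filterFromPreviousRun` is a hypothetical name and nothing like it exists in the scripts yet:

```typescript
// Hypothetical helper: derive a `created` filter from timestamps saved by an
// earlier run. Assumes ISO 8601 UTC strings such as "2022-07-06T10:15:00Z",
// which sort chronologically when compared as plain strings.
function filterFromPreviousRun(createdAtValues: string[]): string | undefined {
  if (createdAtValues.length === 0) return undefined; // nothing saved yet

  const latest = createdAtValues.reduce((a, b) => (a > b ? a : b));

  // ">=" fetches the newest saved run again, so a caller would need to skip
  // rows whose id is already present in the file.
  return `>=${latest}`;
}

console.log(
  filterFromPreviousRun(["2022-07-05T09:00:00Z", "2022-07-06T10:15:00Z"]),
);
// prints ">=2022-07-06T10:15:00Z"
```

Reading the saved values back out of the CSV file is left out here.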

Feedback

Comments on the above, or any other issues, very welcome!

github-api.ts

/**
 * This Deno/Octokit setup draws heavily on a comment by @gr2m on an Octokit issue:
 * https://github.com/octokit/octokit.js/issues/2075#issuecomment-817347123
 *
 * It's a bit of a workaround, but it seems as though Deno/TypeScript support still
 * isn't straightforward for Octokit. If you have any tips though, please let me know!
 */
import { Octokit as OctokitCore } from "https://cdn.skypack.dev/@octokit/core?dts";
import { paginateRest } from "https://cdn.skypack.dev/@octokit/plugin-paginate-rest?dts";
import { restEndpointMethods } from "https://cdn.skypack.dev/@octokit/plugin-rest-endpoint-methods?dts";
import { retry } from "https://cdn.skypack.dev/@octokit/plugin-retry?dts";
import { throttling } from "https://cdn.skypack.dev/@octokit/plugin-throttling?dts";

/** GitHub token for authentication */
const token = Deno.env.get("GITHUB_TOKEN");
if (!token) throw new Error("Missing GITHUB_TOKEN");

export const Octokit = OctokitCore.plugin(
  restEndpointMethods,
  paginateRest,
  retry,
  throttling
).defaults({
  throttle: {
    onRateLimit,
    onAbuseLimit,
  },
});

function onRateLimit(retryAfter: number, options: any, octokit: any) {
  octokit.log.warn(
    `Request quota exhausted for request ${options.method} ${options.url}`
  );
  if (options.request.retryCount === 0) {
    // only retries once
    octokit.log.info(`Retrying after ${retryAfter} seconds!`);
    return true;
  }
}

function onAbuseLimit(retryAfter: number, options: any, octokit: any) {
  octokit.log.warn(
    `Abuse detected for request ${options.method} ${options.url}`
  );
  if (options.request.retryCount === 0) {
    // only retries once
    octokit.log.info(`Retrying after ${retryAfter} seconds!`);
    return true;
  }
}

export const octokit = new Octokit({ auth: token });
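
The two throttle handlers above differ only in their warning message. A possible way to share the retry-once logic is sketched below; `retryOnce`, `RequestOptions` and `Logger` are hypothetical names, and the two types are simplified stand-ins rather than Octokit's own:

```typescript
// Simplified stand-ins for the shapes the handlers rely on (assumptions made
// for this sketch, not the types exported by Octokit).
type RequestOptions = {
  method: string;
  url: string;
  request: { retryCount: number };
};
type Logger = {
  log: { warn(msg: string): void; info(msg: string): void };
};

// Hypothetical factory: builds a handler that logs `reason` and allows
// exactly one retry per request.
function retryOnce(reason: string) {
  return (
    retryAfter: number,
    options: RequestOptions,
    octokit: Logger,
  ): boolean => {
    octokit.log.warn(`${reason} for request ${options.method} ${options.url}`);
    if (options.request.retryCount === 0) {
      octokit.log.info(`Retrying after ${retryAfter} seconds!`);
      return true;
    }
    return false; // already retried once, so give up
  };
}

const onRateLimit = retryOnce("Request quota exhausted");
const onAbuseLimit = retryOnce("Abuse detected");
```

The original handlers fall through and return `undefined` when the retry has been used up; the sketch returns `false` instead, which is equally falsy.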

index.ts

import { parse } from "https://deno.land/std@0.146.0/flags/mod.ts";
import { scrapeWorkflowRuns } from "./scrape-ci-logs.ts";

const parsedArguments = parse(Deno.args);

// Each positional argument is the name of a workflow file without the '.yml'
// extension (or the GitHub id for the workflow)
const workflows = parsedArguments._;
if (workflows.length === 0) {
  throw new Error("at least one workflow must be specified");
}

// Optional GitHub API date string, e.g. -d=">2022-07-06"
const dateFilter = parsedArguments.d;

for (const workflow of workflows) {
  // todo: allow this to be overridden by CLI args
  const filepath = `data/${workflow}_${getDateStringOffsetFromToday(0)}.csv`;
  await scrapeWorkflowRuns(workflow, filepath, dateFilter);
}

Deno.exit();

// ---

/** Returns the local date `daysOffset` days from today as YYYY-MM-DD */
function getDateStringOffsetFromToday(daysOffset = 0) {
  const today = new Date();
  today.setDate(today.getDate() + daysOffset);
  return [
    today.getFullYear(),
    `${today.getMonth() + 1}`.padStart(2, "0"),
    today.getDate().toString().padStart(2, "0"),
  ].join("-");
}
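
To see what the date helper produces, here is a variant that takes the base date as a parameter so the output is reproducible. The extra parameter, the name `dateStringWithOffset` and the `ci` workflow name are assumptions made for illustration; the script itself always starts from the current date:

```typescript
// Variant of the helper with an injectable base date (hypothetical, for
// illustration only).
function dateStringWithOffset(base: Date, daysOffset = 0): string {
  const d = new Date(base.getTime()); // copy so the caller's date is untouched
  d.setDate(d.getDate() + daysOffset); // setDate rolls over month boundaries
  return [
    d.getFullYear(),
    `${d.getMonth() + 1}`.padStart(2, "0"),
    `${d.getDate()}`.padStart(2, "0"),
  ].join("-");
}

const base = new Date(2022, 6, 8); // 8 July 2022 in local time (months are 0-based)
console.log(dateStringWithOffset(base, 0)); // 2022-07-08
console.log(dateStringWithOffset(base, -8)); // 2022-06-30
console.log(`data/ci_${dateStringWithOffset(base)}.csv`); // data/ci_2022-07-08.csv
```

Because the helper reads local date parts, the file name follows the machine's time zone rather than UTC.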

scrape-ci-logs.ts

import { writeCSVObjects } from "https://deno.land/x/csv/mod.ts";
import {
  GetResponseDataTypeFromEndpointMethod,
} from "https://cdn.skypack.dev/@octokit/types?dts";
import { octokit } from "./github-api.ts";

type RunsKey = "workflow_runs";
type WorkflowRun = GetResponseDataTypeFromEndpointMethod<
  typeof octokit.rest.actions.listWorkflowRuns
>[RunsKey][0];

export type SelectedWorkflowData = Pick<
  WorkflowRun,
  | "id"
  | "name"
  | "head_branch"
  | "run_number"
  | "head_sha"
  | "status"
  | "conclusion"
  | "created_at"
  | "updated_at"
  | "run_attempt"
>;

type StringsObject<T> = {
  [Property in keyof T]: string;
};

// Column order for the CSV output
const header = [
  "id",
  "name",
  "head_branch",
  "run_number",
  "head_sha",
  "status",
  "conclusion",
  "created_at",
  "updated_at",
  "run_attempt",
];

export async function scrapeWorkflowRuns(
  workflow: string | number,
  outputPath: string,
  dateFilter?: string,
  fileOptions: Deno.OpenOptions = {
    write: true,
    create: true,
    append: true,
  },
) {
  console.log(`Scraping: ${workflow}`);
  const runsIterator = octokit.paginate.iterator(
    octokit.rest.actions.listWorkflowRuns,
    {
      owner: "guardian",
      repo: "dotcom-rendering",
      // numeric workflow ids are passed through as-is; names get '.yml' appended
      workflow_id: typeof workflow === "number" ? workflow : `${workflow}.yml`,
      per_page: 100,
      created: dateFilter,
    }
  );

  // Flattens the paginated responses into a stream of CSV-ready rows
  const processor = async function* (iterator: typeof runsIterator) {
    for await (const response of iterator) {
      yield* response.data?.map((run) => {
        return processWorkflowRun(run);
      });
    }
  };

  const f = await Deno.open(outputPath, fileOptions);
  try {
    await writeCSVObjects(f, processor(runsIterator), { header });
  } finally {
    // close the file even if a request or write fails part-way through
    f.close();
  }
  console.log("done!");
}

// ---

function processWorkflowRun(
  run: WorkflowRun
): StringsObject<SelectedWorkflowData> {
  const fillerString = "Not set";
  return {
    // Deno's csv module seems to expect strings
    id: run.id.toString(),
    name: run.name ?? fillerString,
    head_branch: run.head_branch ?? fillerString,
    run_number: run.run_number.toString(),
    head_sha: run.head_sha,
    status: run.status ?? fillerString,
    conclusion: run.conclusion ?? fillerString,
    created_at: run.created_at,
    updated_at: run.updated_at,
    run_attempt: run.run_attempt?.toString() ?? fillerString,
  };
}
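
The `processor` generator above turns a stream of pages into a stream of rows. The same pattern can be tried without any network access; in this sketch `fakePages`, `Page` and `RunLike` are hypothetical stand-ins for the Octokit iterator and its response types, and the run data is made up:

```typescript
// Stand-ins for a paginated API response (hypothetical, heavily simplified).
type Page<T> = { data: T[] };
type RunLike = { id: number; conclusion: string | null };

// Yields two "pages" of made-up runs, as a paginating iterator would.
async function* fakePages(): AsyncGenerator<Page<RunLike>> {
  yield { data: [{ id: 1, conclusion: "success" }, { id: 2, conclusion: null }] };
  yield { data: [{ id: 3, conclusion: "failure" }] };
}

// Flattens pages into a single stream of string-valued rows.
async function* rows(pages: AsyncIterable<Page<RunLike>>) {
  for await (const page of pages) {
    // yield* delegates to the mapped array, emitting one row at a time
    yield* page.data.map((run) => ({
      id: run.id.toString(),
      conclusion: run.conclusion ?? "Not set",
    }));
  }
}

// Drains the row stream into CSV-like lines.
async function collect(): Promise<string[]> {
  const out: string[] = [];
  for await (const row of rows(fakePages())) {
    out.push(`${row.id},${row.conclusion}`);
  }
  return out;
}

collect().then((lines) => console.log(lines.join("\n")));
// prints:
// 1,success
// 2,Not set
// 3,failure
```

Rows are produced lazily, so a consumer such as a CSV writer can start writing before later pages have been requested.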