John Speed Meyers jspeed-meyers

@jspeed-meyers
jspeed-meyers / anaconda_scrape.py
Created July 11, 2021 14:32
Scrape anaconda package lists.
"""Scrape Anaconda package names and corresponding GitHub links.
The user provides the URL of the Anaconda page that lists the packages
for one particular Python version, e.g. 3.9, and one particular platform,
e.g. 64-bit Linux. This program then extracts every package name and the
associated link (provided by Anaconda) for each package. The data is then
exported to a CSV file.
"""
import csv
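The extraction step described in this docstring could look roughly like the sketch below. It assumes the package list is an HTML table in which the first link of each row carries the package name and its URL; that layout, the class name `PackageLinkParser`, and the function `packages_to_csv` are illustrative assumptions, not taken from the gist, and the standard-library `html.parser` stands in for whatever scraping library the script actually uses.

```python
import csv
import io
from html.parser import HTMLParser


class PackageLinkParser(HTMLParser):
    """Collect (name, link) pairs from the first <a> in each table row.

    Assumption: one package per <tr>, first anchor holds name and URL.
    """

    def __init__(self):
        super().__init__()
        self.rows = []          # finished (name, link) pairs
        self._in_row = False    # currently inside a <tr>
        self._row_done = False  # first anchor of this row already captured
        self._href = None       # href of the anchor being read, if any
        self._text = []         # text fragments of that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._in_row, self._row_done = True, False
        elif tag == "a" and self._in_row and not self._row_done:
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.rows.append(("".join(self._text).strip(), self._href))
            self._href, self._row_done = None, True
        elif tag == "tr":
            self._in_row = False


def packages_to_csv(html):
    """Return CSV text with one (package, link) row per table row."""
    parser = PackageLinkParser()
    parser.feed(html)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["package", "link"])
    writer.writerows(parser.rows)
    return out.getvalue()
```

In a real run the HTML would come from the user-supplied URL and the CSV text would be written to a file; both steps are left out so the sketch stays self-contained.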
jspeed-meyers / filter_packages_with_github_repos.py
Last active August 7, 2021 16:59
A script to keep only those Anaconda packages that have a GitHub link
"""Keep only packages with a GitHub link.
Take as input a .csv file with a field called clean_link, then
output only those values that include https://github.com.
The output should be a .txt file, with each GitHub link on its own line.
"""
import time
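A minimal sketch of the filter this docstring describes, using only the standard library. The field name `clean_link` and the `https://github.com` substring test come from the docstring; the function names `github_links` and `to_txt` are hypothetical, and the sketch works on in-memory text rather than files.

```python
import csv
import io


def github_links(csv_text, field="clean_link"):
    """Return the values of `field` that contain https://github.com."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row[field]
        for row in reader
        # `or ""` guards against rows where the field is missing
        if "https://github.com" in (row.get(field) or "")
    ]


def to_txt(links):
    """Format the links so that each one sits on its own line."""
    return "".join(link + "\n" for link in links)
```

Writing the result of `to_txt(...)` to a `.txt` file would then give one GitHub link per line.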
jspeed-meyers / summarize_by_contributor.py
Created August 10, 2021 14:47
Summarize package counts by contributor
"""Summarize GitGeo .csv results by contributor.
Input is a .csv file from GitGeo, which includes a column named 'country'.
Output is printed to the terminal and lists the count by contributor.
"""
import time
import pandas as pd
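The counting step can be sketched as follows. The gist imports pandas; this sketch deliberately swaps in `collections.Counter` so that it runs without third-party packages. The default column name `country` comes from the docstring, while the function names and the idea of grouping on a single column are assumptions about how the summary is built.

```python
import csv
import io
from collections import Counter


def count_by_column(csv_text, column="country"):
    """Count how many rows share each value of `column`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


def format_counts(counts):
    """Render counts as 'value: n' lines, most frequent first."""
    return "\n".join(f"{value}: {n}" for value, n in counts.most_common())
```

Calling `print(format_counts(count_by_column(text)))` would produce the kind of terminal summary the docstring mentions.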
jspeed-meyers / qosf_scrape.py
Last active January 10, 2022 20:13
Scrape GitHub links from the Quantum Open Source Foundation (QOSF) projects page
"""Scrape Quantum Open Source Foundation package links.
Identify and store in a CSV file all GitHub links associated with Quantum Open
Source Foundation projects. Projects without a GitHub link are not included.
NOTE: User has to do a little manual cleaning after running this script.
"""
import csv
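One way to pull the GitHub links out of a projects page is sketched below with the standard-library `html.parser`. The class and function names are hypothetical, the gist may well use a different scraping library, and the rule "keep any `href` that starts with `https://github.com/`" is an assumption about what counts as a project link.

```python
from html.parser import HTMLParser


class GitHubLinkParser(HTMLParser):
    """Collect unique GitHub hrefs from <a> tags, in page order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Skip non-GitHub links and links already seen.
        if href.startswith("https://github.com/") and href not in self.links:
            self.links.append(href)


def extract_github_links(html):
    """Return the GitHub links found in `html`, without duplicates."""
    parser = GitHubLinkParser()
    parser.feed(html)
    return parser.links
```

A filter this simple also keeps GitHub links that are not project repositories, which is consistent with the note that some manual cleaning is still needed afterwards.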
jspeed-meyers / scoredeck.sh
Last active July 6, 2022 00:58
Run ossf/scorecard on multiple repos and output results to different json files
#!/bin/bash
# scoredeck.sh
# Collect scorecard data from a set of repos listed in repos.txt
# and store in files inside a data folder
#
# Usage:
# $ ./scoredeck.sh
#
# Requires GITHUB_AUTH_TOKEN to be set to a valid GitHub personal access
jspeed-meyers / get_github_org_repos.py
Created July 6, 2022 11:55
Collect all non-archived repo names associated with a GitHub organization
# collect all non-archived repo names associated with one GitHub organization and
# save in text file.
#
# USAGE:
#
# export GITHUB_AUTH_TOKEN=lkdjflkdjglkdjlkjg
#
# python get_github_org_repos.py
#
# NOTE:
jspeed-meyers / parse_scorecard_json.py
Last active July 9, 2022 00:37
Parse scorecard-derived JSON files and store in a csv
# Parse json files created by scorecard tool and store results in
# a csv
#
# Usage:
#
# python parse_scorecard_json.py
#
#
# Note: Results are stored in a timestamped csv file inside the csv folder
#
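The parsing step could be sketched like this. It assumes each scorecard JSON document has a top-level `date`, a top-level `score`, and a `repo` object with a `name` key; that shape, the chosen CSV columns, and the function names are assumptions for illustration and may not match what the gist extracts.

```python
import csv
import io
import json


def scorecard_row(json_text):
    """Flatten one scorecard JSON document into a CSV-ready dict.

    Assumed shape: {"date": ..., "repo": {"name": ...}, "score": ...}
    """
    data = json.loads(json_text)
    return {
        "repo": data.get("repo", {}).get("name", ""),
        "date": data.get("date", ""),
        "score": data.get("score", ""),
    }


def rows_to_csv(rows):
    """Return CSV text with a header and one line per scorecard result."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["repo", "date", "score"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

In practice the JSON text would be read from each file produced by the scorecard run and the CSV text written to the timestamped output file.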
jspeed-meyers / create-scorecards-histogram.py
Created July 9, 2022 00:39
Analyze scorecards data and create a histogram
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv("csv/FILENAME.csv")
# create plot
fig, ax = plt.subplots(figsize=(6,4)) # size of sub-figures
n, _, _ = plt.hist(df.score, bins=[i/4 for i in range(0, 41)])  # quarter-point edges from 0 to 10 inclusive
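The `bins` argument above is a list of quarter-point bin edges. The pure-Python sketch below mimics the binning convention that `plt.hist` inherits from NumPy (half-open bins, a closed final bin, values outside the edges ignored) to show how the edge list determines which scores are counted. The helper names are hypothetical, and the 0 to 10 score range is an assumption about the scorecard data.

```python
from bisect import bisect_right


def quarter_point_edges(max_score=10):
    """Bin edges 0, 0.25, ..., max_score (inclusive)."""
    return [i / 4 for i in range(0, 4 * max_score + 1)]


def bin_counts(scores, edges):
    """Count scores per bin.

    Bins are [left, right) except the last, which also includes
    its right edge. Scores outside the edges are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for s in scores:
        if s < edges[0] or s > edges[-1]:
            continue  # outside the covered range
        idx = min(bisect_right(edges, s) - 1, len(counts) - 1)
        counts[idx] += 1
    return counts
```

With edges that stop at 9.75, a score of exactly 10 falls outside the last bin and is not counted, so the edge list needs to reach 10 if perfect scores should appear in the histogram.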
jspeed-meyers / deps_dev_retrieve_most_depended_upon_packages.sql
Created July 9, 2022 11:13
Measure number of dependencies for each version of most depended upon packages using deps.dev data - SQL Query
DECLARE LatestSnapshot TIMESTAMP;
SET LatestSnapshot = (SELECT MAX(Time) FROM `bigquery-public-data.deps_dev_v1.Snapshots`);
WITH
-- Releases includes every release of every package.
Releases AS (
SELECT
System,
Name,
jspeed-meyers / calculate_attack_surface_reduction.py
Created September 20, 2022 19:15
Calculate attack surface reduction percentage for pairs of container images.
"""Calculate attack surface reduction percentage for pairs of container images.
This script calculates the number of packages present in each image and then
calculates the reduction in "attack surface."
Note: Must install syft (https://github.com/anchore/syft) to use.
Author: John Speed Meyers (jsmeyers@chainguard.dev)
"""
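The calculation described in this docstring can be sketched as below. It assumes the package count for each image is taken from the `artifacts` array of syft's JSON output and that "reduction" means the percentage drop in package count from the first image to the second; the function names and that exact definition are assumptions, not taken from the gist.

```python
import json


def count_packages(syft_json_text):
    """Count packages in syft JSON output (assumes an 'artifacts' list)."""
    return len(json.loads(syft_json_text).get("artifacts", []))


def attack_surface_reduction(baseline_count, reduced_count):
    """Percentage drop in package count from baseline to reduced image."""
    if baseline_count <= 0:
        raise ValueError("baseline image must contain at least one package")
    return (baseline_count - reduced_count) / baseline_count * 100
```

For example, a baseline image with 200 packages and a slimmer image with 50 gives `attack_surface_reduction(200, 50)`, which is a 75 percent reduction.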