Matt Eding MattEding

  • HinaLea
  • California: Bay Area
@MattEding
MattEding / ada-bench-loops.txt
Created November 19, 2019 19:33
ADASYN Vectorize vs Loop Benchmark
Namespace(file=None, n_jobs=4, n_neighbors=5, sampling_strategy='auto', trials=3)
1 ecoli
0.6515465679999999
2 optical_digits
1.081901593
3 satimage
0.757048266
4 pen_digits
0.7772778269999998
5 abalone
MattEding / smote_sampling_strategy_test.py
Created September 8, 2019 06:39
SMOTE sampling strategy comparison - random vs evenly distributed minority class indices to oversample
import argparse
import functools
import numpy as np
import pandas as pd
from imblearn.datasets import fetch_datasets
import imblearn.datasets._zenodo as zenodo
from imblearn.metrics import specificity_score
from imblearn.over_sampling import SMOTE
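The description contrasts two ways of picking which minority-class rows to oversample. The sketch below illustrates one possible reading of that contrast using only the standard library. The helper names (`random_indices`, `even_indices`, `usage_spread`) are hypothetical, and treating "evenly distributed" as cycling through the rows in order is an assumption, not something the gist preview confirms.

```python
import random
from collections import Counter


def random_indices(n_minority, n_samples, seed=0):
    """Draw row indices uniformly at random, with replacement."""
    rng = random.Random(seed)
    return [rng.randrange(n_minority) for _ in range(n_samples)]


def even_indices(n_minority, n_samples):
    """Cycle through the rows so each is used about equally often.

    Assumption: this is one interpretation of "evenly distributed".
    """
    return [i % n_minority for i in range(n_samples)]


def usage_spread(indices, n_minority):
    """Difference between the most-used and least-used row counts."""
    counts = Counter(indices)
    per_row = [counts.get(i, 0) for i in range(n_minority)]
    return max(per_row) - min(per_row)


if __name__ == "__main__":
    n_minority, n_samples = 4, 10
    print("random spread:", usage_spread(random_indices(n_minority, n_samples), n_minority))
    print("even spread:  ", usage_spread(even_indices(n_minority, n_samples), n_minority))
```

With the cycling scheme the spread can never exceed 1, whereas random draws may leave some rows unused and reuse others several times.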
MattEding / smote_benchmark.py
Created September 5, 2019 00:15
Benchmark to compare original implementation vs my vectorized implementation for SMOTE algorithm.
from time import time
import numpy as np
from scipy import sparse
from imblearn.over_sampling import SMOTE
def benchmark(sampler, X, y):
    imb = np.unique(y, return_counts=True)
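The preview cuts off before the timing logic, so the following is only a minimal sketch of how two implementations might be timed against each other. `best_time` and the two toy functions are hypothetical stand-ins and are not taken from the gist; in a real run the timed callable would presumably be the sampler's `fit_resample` method.

```python
from time import perf_counter


def best_time(func, *args, trials=3, **kwargs):
    """Run func several times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(trials):
        start = perf_counter()
        func(*args, **kwargs)
        timings.append(perf_counter() - start)
    return min(timings)


def loop_sum_squares(values):
    """Toy 'original' implementation: explicit Python loop."""
    total = 0
    for v in values:
        total += v * v
    return total


def builtin_sum_squares(values):
    """Toy 'rewritten' implementation: built-in sum over a generator."""
    return sum(v * v for v in values)


if __name__ == "__main__":
    data = range(100_000)
    # Check both versions agree before comparing their speed.
    assert loop_sum_squares(data) == builtin_sum_squares(data)
    print("loop:   ", best_time(loop_sum_squares, data))
    print("builtin:", best_time(builtin_sum_squares, data))
```

Taking the minimum over several trials reduces noise from other processes; checking that both versions return the same result first keeps the comparison meaningful.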
MattEding / youtube_duration_playlist.py
Created September 12, 2018 03:30
Given a playlist URL of YouTube videos, calculate the total running time at a specified watching speed.
from collections import namedtuple
from bs4 import BeautifulSoup
import requests
Time = namedtuple('Time', 'hr min sec')
def playlist_duration(url, speed=1):
    source = requests.get(url)
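The preview stops before the arithmetic, so here is a minimal sketch of the speed adjustment alone, assuming the total playlist length is already known in seconds. `scale_duration` is a hypothetical helper, not part of the gist; only the `Time` namedtuple mirrors the original.

```python
from collections import namedtuple

Time = namedtuple('Time', 'hr min sec')


def scale_duration(total_seconds, speed=1):
    """Convert a total length in seconds to a Time at the given playback speed."""
    if speed <= 0:
        raise ValueError('speed must be positive')
    # Watching at 2x halves the time, at 0.5x doubles it.
    adjusted = round(total_seconds / speed)
    hours, remainder = divmod(adjusted, 3600)
    minutes, seconds = divmod(remainder, 60)
    return Time(hours, minutes, seconds)


if __name__ == "__main__":
    print(scale_duration(7200, speed=2))
```

For example, a playlist of 7200 seconds watched at double speed takes one hour.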
import collections
import itertools
import json
import os
import random
SEPARATOR = '=' * 25 + '\n'
DEFAULT = collections.defaultdict(str)
MattEding / eurika_math_pdf_splitter.py
Created September 12, 2018 03:12
Splits a Eureka Math PDF into separate lessons
import os
import re
import PyPDF2
def split_eurika_book(pdf_path, page_list, *, name_format='G{grade} - M{module} L{lesson}.pdf'):
    """Splits a PDF of a Eureka Math book into separate sections based on
    a list of page numbers provided.
    """
    pattern = re.compile(r'[gG](\d+).[mM](\d+)')
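The regular expression and `name_format` above suggest that the grade and module are read from the PDF's file name and then combined with a lesson number. The sketch below shows how that could work. `lesson_names` is a hypothetical helper, and treating each entry of `page_list` as the start of one lesson, numbered from 1, is an assumption; the actual splitting with PyPDF2 is not shown in the preview and is omitted here.

```python
import re

PATTERN = re.compile(r'[gG](\d+).[mM](\d+)')
NAME_FORMAT = 'G{grade} - M{module} L{lesson}.pdf'


def lesson_names(pdf_name, page_list):
    """Build one output file name per entry in page_list."""
    match = PATTERN.search(pdf_name)
    if match is None:
        raise ValueError(f'no grade/module found in {pdf_name!r}')
    grade, module = match.groups()
    # Assumption: lessons are numbered consecutively starting at 1.
    return [
        NAME_FORMAT.format(grade=grade, module=module, lesson=number)
        for number, _ in enumerate(page_list, start=1)
    ]


if __name__ == "__main__":
    print(lesson_names('g3-m4.pdf', [1, 10]))
```

Note that the unescaped `.` in the pattern matches any single separator character between the grade and the module, such as `-` or `_`.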