Rebecca Bilbro rebeccabilbro

@rebeccabilbro
rebeccabilbro / top-data-science-questions.md
Created February 25, 2016 22:32
initial outline for Brittne's blog post

Title

The top n questions data scientists ask

Introduction

Data science doesn’t start with data; it starts with a problem…

The pipeline model is useful, but data scientists actually progress through a series of questions. What are those questions?

Scoping

@rebeccabilbro
rebeccabilbro / elastic_indexer.py
Last active February 25, 2021 08:30
Create an Elasticsearch instance and, given a list of documents, index the documents into Elasticsearch.
from elasticsearch.helpers import bulk
from elasticsearch import Elasticsearch
class ElasticIndexer(object):
    """
    Create an Elasticsearch instance and, given a list of documents,
    index the documents into Elasticsearch.
    """
    def __init__(self):
        self.elastic_search = Elasticsearch()
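The preview stops before the indexing method. As a minimal sketch (not the gist's own code), the documents can be turned into the action dictionaries that `elasticsearch.helpers.bulk` accepts. The function name `bulk_actions`, the `index_name` argument, and the optional `id_field` are hypothetical, chosen only for illustration.

```python
from typing import Any, Dict, Iterable, Iterator, Optional


def bulk_actions(
    documents: Iterable[Dict[str, Any]],
    index_name: str,
    id_field: Optional[str] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield one bulk "index" action per document (hypothetical helper)."""
    for doc in documents:
        # "_index" names the target index; "_source" holds the document body.
        action = {"_index": index_name, "_source": dict(doc)}
        # Reuse a field from the document as its Elasticsearch id, if requested;
        # otherwise Elasticsearch assigns an id automatically.
        if id_field is not None and id_field in doc:
            action["_id"] = doc[id_field]
        yield action
```

With a running cluster, the generator would be passed straight to the helper, e.g. `bulk(self.elastic_search, bulk_actions(docs, "my-index"))`, where `"my-index"` is a placeholder. Using a generator keeps memory flat even for large document lists.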
@rebeccabilbro
rebeccabilbro / get_hobbies.py
Created June 27, 2018 15:43
Load the Yellowbrick hobbies corpus
import os
from sklearn.utils import Bunch
from yellowbrick.download import download_all
## The path to the test data sets
FIXTURES = os.path.join(os.getcwd(), "data")
## Dataset loading mechanisms
datasets = {
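The loader itself is cut off in the preview. Assuming the downloaded corpus sits under `FIXTURES` as a `hobbies` folder with one subdirectory per category, each holding `.txt` documents, a minimal loader might look like this. `load_corpus` is a hypothetical name, and `types.SimpleNamespace` stands in for scikit-learn's `Bunch` so the sketch has no dependencies.

```python
import os
from types import SimpleNamespace


def load_corpus(root):
    """Read <root>/<category>/*.txt into parallel lists of texts and labels."""
    data, target = [], []
    # Assumption: every subdirectory of root is one category label.
    categories = sorted(
        name for name in os.listdir(root)
        if os.path.isdir(os.path.join(root, name))
    )
    for category in categories:
        cat_dir = os.path.join(root, category)
        for fname in sorted(os.listdir(cat_dir)):
            if not fname.endswith(".txt"):
                continue  # skip READMEs and other non-document files
            with open(os.path.join(cat_dir, fname), encoding="utf-8") as f:
                data.append(f.read())
            target.append(category)
    return SimpleNamespace(categories=categories, data=data, target=target)
```

Usage would be something like `corpus = load_corpus(os.path.join(FIXTURES, "hobbies"))`, after which `corpus.data` and `corpus.target` can feed a vectorizer and classifier.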
@rebeccabilbro
rebeccabilbro / doctor.go
Created August 19, 2018 21:48
System doctor
package main

import (
	"fmt"
	"log"

	"github.com/shirou/gopsutil/cpu"
	"github.com/shirou/gopsutil/disk"
	"github.com/shirou/gopsutil/host"
	"github.com/shirou/gopsutil/mem"
@rebeccabilbro
rebeccabilbro / get_walking_data.py
Created August 23, 2018 21:40
Download & wrangle walking dataset
import os
import zipfile
import requests
import pandas as pd
WALKING_DATASET = (
"https://archive.ics.uci.edu/ml/machine-learning-databases/00286/User%20Identification%20From%20Walking%20Activity.zip",
)
def download_data(path='data', urls=WALKING_DATASET):
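The body of `download_data` is not shown. A minimal sketch of the "download" half: fetch each zip archive and unpack it into `path`. This version swaps the gist's `requests` for the standard library's `urllib.request` so it has no third-party dependencies; `download_and_extract` is a hypothetical name, and the pandas wrangling step is left out.

```python
import os
import urllib.request
import zipfile
from urllib.parse import unquote, urlparse


def download_and_extract(url, path="data"):
    """Download a zip archive from url into path, extract it, return member names."""
    os.makedirs(path, exist_ok=True)
    # Turn ".../User%20Identification%20From%20Walking%20Activity.zip"
    # into a readable local filename.
    fname = unquote(os.path.basename(urlparse(url).path))
    archive = os.path.join(path, fname)
    with urllib.request.urlopen(url) as response, open(archive, "wb") as out:
        out.write(response.read())
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(path)
        return zf.namelist()
```

Looping over `WALKING_DATASET` and calling `download_and_extract(url)` for each entry would mirror the `urls=` parameter in the original signature.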
@rebeccabilbro
rebeccabilbro / kimchi.py
Created January 25, 2019 16:07
For converting Python 2 pickles to Python 3
# kimchi.py
# For converting Python 2 pickles to Python 3
import os
import dill
import pickle
import argparse
def convert(old_pkl):
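The conversion body is truncated. One common way to convert Python 2 pickles, sketched here with only the standard `pickle` module (the gist also imports `dill`, which this sketch does not use), is to load with `encoding="latin1"` and dump again with a Python 3 protocol. The two-argument `convert_pickle` is a hypothetical variant of the gist's `convert(old_pkl)`.

```python
import pickle


def convert_pickle(old_pkl, new_pkl):
    """Load a Python 2 pickle and re-save it with the newest Python 3 protocol."""
    with open(old_pkl, "rb") as f:
        # latin1 decodes every byte to exactly one code point, so Python 2
        # byte strings never fail to load; encoding="bytes" would keep them
        # as bytes instead.
        obj = pickle.load(f, encoding="latin1")
    with open(new_pkl, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
    return obj
```

The trade-off: if the Python 2 strings were really UTF-8 text, `latin1` yields mojibake that needs a further `.encode("latin1").decode("utf-8")` step.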
@rebeccabilbro
rebeccabilbro / classifier_comparison.py
Created June 5, 2019 13:00
Produce customizable classifier comparison plots
#!/usr/bin/python
# -*- coding: utf-8 -*-
# plot_classifier_comparison.py
"""
A comparison of several classifiers in scikit-learn on synthetic datasets.
The point of this example is to illustrate the nature of decision boundaries
of different classifiers.
Particularly in high-dimensional spaces, data can more easily be separated
linearly and the simplicity of classifiers such as naive Bayes and linear SVMs