Kyle Streepy kstreepy

  • Data Scientist
  • Arlington, VA
@kstreepy
kstreepy / dict_map function
Created June 1, 2020 15:33
Map a dict onto a dataframe as a new column.
def gs_group(df):
    gs_dict = {'GS-1': 'GS 1-6',
               'GS-2': 'GS 1-6',
               'GS-3': 'GS 1-6',
               'GS-4': 'GS 1-6',
               'GS-5': 'GS 1-6',
               'GS-6': 'GS 1-6',
               'GS-7': 'GS 7-9',
               'GS-8': 'GS 7-9',
               'GS-9': 'GS 7-9',
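The preview stops partway through gs_dict, so the remaining grades and the rest of the function are not shown. A minimal sketch of the same idea follows. It assumes the grades live in a column named grade and the result goes in a new column named gs_group (both hypothetical names), and it uses Series.map with only the GS-1 to GS-9 entries visible above; the original's later entries are not guessed.

```python
import pandas as pd

# Only the GS-1 to GS-9 entries visible in the preview; the rest of the
# original dict is not shown, so no other grades are mapped here.
GS_DICT = {f'GS-{n}': 'GS 1-6' for n in range(1, 7)}
GS_DICT.update({f'GS-{n}': 'GS 7-9' for n in range(7, 10)})


def gs_group(df, grade_col='grade', new_col='gs_group'):
    """Return a copy of df with a new column holding each grade's band.

    grade_col and new_col are hypothetical column names. Grades that are
    not keys in GS_DICT come back as NaN.
    """
    out = df.copy()                                 # leave the caller's df untouched
    out[new_col] = out[grade_col].map(GS_DICT)      # dict lookup per row
    return out


df = pd.DataFrame({'grade': ['GS-2', 'GS-8', 'GS-13']})
result = gs_group(df)
```

Series.map leaves any grade missing from the dictionary as NaN, so unmapped values are easy to find afterwards with isna().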
kstreepy / gz_extract.py
Created June 11, 2019 16:09
For a given directory, unzip all .gz files in the folder, save the unzipped files in the same folder, and delete the zipped files. A Python solution for cases where you do not have access to PowerShell.
import os, gzip, shutil
dir_name = 'x'
def gz_extract(directory):
    extension = ".gz"
    os.chdir(directory)
    for item in os.listdir(directory):  # loop through items in dir
        if item.endswith(extension):  # check for ".gz" extension
            gz_name = os.path.abspath(item)  # get full path of files
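The preview ends right after the full path is computed. A minimal sketch of the remaining steps, using only the os, gzip, and shutil modules the gist imports, is below. It assumes each output file is named after its archive with the trailing .gz removed; that naming choice is an assumption, not taken from the original.

```python
import gzip
import os
import shutil


def gz_extract(directory):
    """Decompress every .gz file in directory, then delete the archive.

    Assumption: the output file is the archive path minus its '.gz' suffix.
    """
    extension = ".gz"
    for item in os.listdir(directory):              # snapshot list of entries
        gz_name = os.path.join(directory, item)     # full path of the entry
        if not (item.endswith(extension) and os.path.isfile(gz_name)):
            continue                                # skip non-.gz entries and folders
        out_name = gz_name[: -len(extension)]       # strip the ".gz" suffix
        with gzip.open(gz_name, "rb") as f_in, open(out_name, "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)         # stream-decompress to disk
        os.remove(gz_name)                          # delete the zipped file
```

Building paths with os.path.join instead of calling os.chdir keeps the caller's working directory unchanged, which matters when the function runs inside a larger script.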
kstreepy / quick_melt.py
Created May 30, 2019 12:02
Given a wide dataframe, use melt to transform it from wide to long. Declare the ID columns and treat all remaining columns in the dataframe as value columns.
import pandas as pd
def quick_melt(wide_df):
    '''
    Take a wide dataframe and melt it to long format. Declare the ID columns (id_cols),
    then treat all remaining columns as value columns.
    '''
    id_cols = {'A', 'B', 'C'}
    value_cols = set(wide_df.columns) - id_cols
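The preview stops before the melt call. A minimal sketch of how the function could finish with pd.melt is below. It keeps the placeholder ID columns A, B, and C from the preview as a default argument; the variable/value column names are pandas' defaults, written out explicitly for clarity.

```python
import pandas as pd


def quick_melt(wide_df, id_cols=('A', 'B', 'C')):
    """Melt a wide dataframe to long format.

    id_cols defaults to the placeholder names from the preview; every
    other column is treated as a value column.
    """
    id_cols = list(id_cols)
    # A list comprehension keeps the original column order; a set
    # difference (as in the preview) has no guaranteed order.
    value_cols = [c for c in wide_df.columns if c not in id_cols]
    return pd.melt(wide_df, id_vars=id_cols, value_vars=value_cols,
                   var_name='variable', value_name='value')
```

Passing value_vars explicitly is optional (melt uses every non-ID column by default), but it mirrors the "declare the rest as value columns" step in the description.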
kstreepy / read_multi_csv_source.py
Last active May 29, 2019 17:48
Read multiple CSV files in a folder into a single dataframe, with a new column holding the name of each row's source file.
import pandas as pd
import os
import glob
def read_multi_csv(path):
    '''
    Given a file path with wildcard and extension, parse all files with that extension in the directory
    into a single dataframe.
    '''
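Only the docstring of this gist is shown. A minimal sketch of the described behavior, using the same pandas, os, and glob imports, is below. The column name source_file is a hypothetical choice, and raising FileNotFoundError when nothing matches is an added safeguard rather than something the preview shows.

```python
import glob
import os

import pandas as pd


def read_multi_csv(path):
    """Read every file matching the glob pattern `path` into one dataframe.

    Adds a 'source_file' column (hypothetical name) holding each row's
    file name without its directory.
    """
    files = sorted(glob.glob(path))        # sorted: glob order is arbitrary
    if not files:
        raise FileNotFoundError(f"No files match {path!r}")
    frames = []
    for file_name in files:
        df = pd.read_csv(file_name)
        df['source_file'] = os.path.basename(file_name)  # tag rows with origin
        frames.append(df)
    return pd.concat(frames, ignore_index=True)          # renumber rows 0..n-1
```

Without the explicit check, pd.concat on an empty list raises a less descriptive "No objects to concatenate" ValueError.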
kstreepy / read_multi_excel_source.py
Created May 29, 2019 17:47
Read multiple Excel files into a single dataframe, with the filename as a column in the new dataframe.
import pandas as pd
import os
import glob
def read_multi_excel(path):
    '''
    Given a file path with wildcard and extension, parse all files with that extension in the directory
    into a single dataframe.
    '''
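Again only the docstring is visible. A minimal sketch follows, assuming only the first sheet of each workbook is wanted (pandas' default sheet_name=0) and using a hypothetical source_col parameter for the filename column. DataFrame.assign adds the column without mutating each frame in place. Reading .xlsx files requires an Excel engine such as openpyxl to be installed.

```python
import glob
import os

import pandas as pd


def read_multi_excel(path, source_col='source_file'):
    """Read every workbook matching the glob pattern `path` into one dataframe.

    Reads only the first sheet of each file. source_col is a hypothetical
    name for the column recording which file each row came from.
    """
    files = sorted(glob.glob(path))
    if not files:
        raise FileNotFoundError(f"No files match {path!r}")
    frames = [
        # assign returns a new frame with the filename column added
        pd.read_excel(f).assign(**{source_col: os.path.basename(f)})
        for f in files
    ]
    return pd.concat(frames, ignore_index=True)
```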
kstreepy / read_multi_csv.py
Created May 29, 2019 17:37
Read multiple CSV files in a folder into a single pandas dataframe.
import pandas as pd
import glob
def read_multi_csv(path):
    '''
    Given a file path with wildcard and extension, parse all files with that extension in the directory
    into a single dataframe.
    '''
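This variant drops the source-file column. A minimal sketch is below; the **read_kwargs passthrough is an illustrative addition, not something the preview shows. It forwards options such as sep or dtype to pd.read_csv unchanged, so the same helper handles semicolon-delimited or all-string files.

```python
import glob

import pandas as pd


def read_multi_csv(path, **read_kwargs):
    """Concatenate every CSV matching the glob pattern `path`.

    read_kwargs (an illustrative addition) is forwarded unchanged to
    pd.read_csv, e.g. sep=';' or dtype=str.
    """
    all_files = sorted(glob.glob(path))     # deterministic file order
    if not all_files:
        raise FileNotFoundError(f"No files match {path!r}")
    return pd.concat([pd.read_csv(f, **read_kwargs) for f in all_files],
                     ignore_index=True)
```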
kstreepy / read_multi_excel.py
Last active May 29, 2019 17:49
Read multiple Excel files within a folder into a single pandas dataframe.
import pandas as pd
import glob
def read_multi_excel(path):
    '''
    Given a file path with wildcard and extension, parse all files with that extension in the directory
    into a single dataframe.
    '''
    all_files = glob.glob(path)
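The preview ends after the glob call. A minimal sketch of the rest is below. The all_sheets flag is a hypothetical addition: with sheet_name=None, pd.read_excel returns a dict of {sheet name: DataFrame}, so every sheet of every workbook can be stacked. The default reads only the first sheet. An Excel engine such as openpyxl is needed for .xlsx files.

```python
import glob

import pandas as pd


def read_multi_excel(path, all_sheets=False):
    """Concatenate workbooks matching the glob pattern `path`.

    all_sheets is a hypothetical flag: False reads the first sheet of each
    file, True stacks every sheet of every file.
    """
    all_files = sorted(glob.glob(path))
    if not all_files:
        raise FileNotFoundError(f"No files match {path!r}")
    frames = []
    for f in all_files:
        if all_sheets:
            # sheet_name=None returns a dict of {sheet name: DataFrame}
            frames.extend(pd.read_excel(f, sheet_name=None).values())
        else:
            frames.append(pd.read_excel(f))  # first sheet only
    return pd.concat(frames, ignore_index=True)
```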