
Brandon Locke brandontlocke

brandontlocke / working with gale directories
Last active July 13, 2017 18:05
process to take the FLH corrected txt files out of the messy structure and create plain txt files
#delete all of the stray images
find . -type f -name '*.jpg' -delete
#delete all of the *images* folders (-iname matches both 'images' and 'Images' in one pass)
find . -type d -iname images -exec rm -rf {} \;
#rename each subdirectory's txt file after the subdirectory (quoted so names with spaces survive)
for subdir in *; do mv "$subdir"/*.txt "$subdir.txt"; done
#remove empty directories
find . -type d -empty -delete
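The same cleanup can be sketched in Python with pathlib; the function name `flatten_gale` and the exact matching details are my assumptions, not part of the original gist.

```python
# A rough Python equivalent of the shell steps above (a sketch; names
# and details here are assumptions, not from the gist).
import shutil
from pathlib import Path

def flatten_gale(root):
    root = Path(root)
    # delete all of the stray images
    for jpg in list(root.rglob('*.jpg')):
        jpg.unlink()
    # delete the images/Images folders
    for d in list(root.rglob('*')):
        if d.is_dir() and d.name.lower() == 'images':
            shutil.rmtree(d)
    # move each subdirectory's txt file up, named after the subdirectory
    for subdir in [p for p in root.iterdir() if p.is_dir()]:
        for txt in subdir.glob('*.txt'):
            txt.rename(root / (subdir.name + '.txt'))
    # remove empty directories, deepest first
    for d in sorted((p for p in root.rglob('*') if p.is_dir()), reverse=True):
        if not any(d.iterdir()):
            d.rmdir()
```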

Keybase proof

I hereby claim:

  • I am brandontlocke on github.
  • I am brandontlocke (https://keybase.io/brandontlocke) on keybase.
  • I have a public key ASC7AocCJ5yFidT21Vyme-OzMnJBez8WfNsTdWH_EVpnOwo

To claim this, I am signing this object:

#!/bin/bash
printf "file '%s'\n" *.mov > mylist.txt
ffmpeg -f concat -i mylist.txt -c copy video.mov
rm mylist.txt

#!/bin/bash
#convert each AVCHD .MTS stream to mp4; IFS is reset so filenames with spaces survive the glob
IFS=$(echo -en "\n\b"); for i in AVCHD/BDMV/STREAM/*.MTS; do ffmpeg -i "$i" -vcodec mpeg4 -b:v 3000k -b:a 192k "$i.mp4"; done
#IFS=$(echo -en "\n\b"); for i in AVCHD/BDMV/STREAM/*.MTS; do ffmpeg -i "$i" -b:v 400k -preset veryfast -crf 29 -vcodec copy "$i.mp4"; done
#ffmpeg -i 00005.mts -s 480x320 -vcodec mpeg4 -b:v 3000k -b:a 192k test.mp4
printf "file '%s'\n" AVCHD/BDMV/STREAM/*.mp4 > mylist.txt
ffmpeg -f concat -i mylist.txt -c copy concat.mp4
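One caveat with the `printf`-generated list: a filename containing a single quote breaks ffmpeg's concat demuxer. A small Python sketch (the helper name is invented here) that writes the list with the required escaping:

```python
# Write an ffmpeg concat-demuxer list file; a sketch, the function
# name and parameters are assumptions, not from the gists above.
from pathlib import Path

def write_concat_list(folder, pattern, listfile='mylist.txt'):
    paths = sorted(Path(folder).glob(pattern))
    with open(listfile, 'w') as f:
        for p in paths:
            # the concat demuxer requires single quotes escaped as '\''
            f.write("file '%s'\n" % str(p).replace("'", "'\\''"))
    return paths
```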
brandontlocke / jsonsplit.py
Last active July 12, 2018 18:19
split chronicling america api results into individual text files
import json
with open('path/to/file.json') as json_file:
    data = json.load(json_file)
for p in data['items']:
    file = open(p['date']+p['title']+"pg"+p['page']+".txt", "w")
    file.write(p['ocr_eng'])
    file.close()
#!/usr/bin/env python
import json
with open('fordlaborunion.json') as json_file:
    data = json.load(json_file)
for p in data['items']:
    file = open(p['date']+p['title']+"pg"+p['page']+".txt", "w")
    file.write(p['ocr_eng'])
    file.close()
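One hazard with the filenames built above: newspaper titles from the Chronicling America API can contain characters such as '/' that are illegal in file paths. A hedged variant (the function name and sanitizing rule are my own, not from the gist) that scrubs the name before writing:

```python
# Defensive version of the splitter: replace path-unsafe characters in
# the generated filename. A sketch; names here are assumptions.
import re

def split_items(items):
    written = []
    for p in items:
        raw = p['date'] + p['title'] + 'pg' + p['page']
        # collapse anything outside word chars, dots, hyphens into '_'
        name = re.sub(r'[^\w.-]+', '_', raw) + '.txt'
        with open(name, 'w') as f:
            f.write(p['ocr_eng'])
        written.append(name)
    return written
```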
brandontlocke / batchner-to-chunked-network.py
Last active October 25, 2018 17:49
transforms the batchner output into a nodes and edges file, then chunks into smaller files for Gephi projection. this is pretty specific to the FLH dataset
import pandas as pd
#import file & rename column headers
edges = pd.read_csv('https://raw.githubusercontent.com/FannieLouHamerPapers/NamedEntities/master/flh_ner_all.csv')
edges.columns = ['source', 'target', 'entityType', 'weight']
#add column to make network undirected
edges['type'] = 'undirected'
#chunk out into multiple edges files by selecting one of the numbers in the filename
#one file includes most of the rows, so these are divided weirdly
import networkx as nx
from networkx.algorithms import bipartite
import pandas as pd
#create empty multigraph - multigraph is an undirected graph with parallel edges
G = nx.MultiGraph()
#import file & create nodes
flhfull=pd.read_csv('https://raw.githubusercontent.com/FannieLouHamerPapers/NamedEntities/master/flh_ner_all.csv')
nodes=flhfull['name'].drop_duplicates()
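The description mentions chunking the edge list into smaller files for Gephi, but the chunking step itself isn't shown in the snippet. A minimal sketch, assuming a fixed number of rows per file (the chunk size and filename pattern are invented here):

```python
# Split an edges dataframe into fixed-size CSV chunks; a sketch, the
# function name, chunk size, and filename pattern are assumptions.
import pandas as pd

def chunk_edges(edges, rows_per_chunk=50000, prefix='edges_'):
    files = []
    for i, start in enumerate(range(0, len(edges), rows_per_chunk)):
        name = '%s%d.csv' % (prefix, i)
        edges.iloc[start:start + rows_per_chunk].to_csv(name, index=False)
        files.append(name)
    return files
```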
brandontlocke / batchner-to-network.py
Last active November 5, 2018 01:26
takes batchner output, creates projected network and various entity edge lists from it
import networkx as nx
from networkx.algorithms import bipartite
import pandas as pd
##########################################
##### BE SURE TO SET THESE VARIABLES #####
##########################################
#import batchner results into a dataframe—learn more about batchner: https://github.com/brandontlocke/batchner
batchner=pd.read_csv('PATH/TO/FILE', low_memory=False)
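As a sketch of the projection step the description refers to: each batchner row pairs a document with an entity, so the graph is bipartite and networkx can project it onto the entity side, weighting edges by shared documents. The column names ('doc', 'name') are assumptions based on the other snippets here, not confirmed by this gist:

```python
# Bipartite doc-entity graph projected onto entities; a sketch under
# the assumption that the dataframe has 'doc' and 'name' columns.
import networkx as nx
import pandas as pd
from networkx.algorithms import bipartite

def project_entities(df, doc_col='doc', ent_col='name'):
    G = nx.Graph()
    G.add_nodes_from(df[doc_col].unique(), bipartite=0)
    G.add_nodes_from(df[ent_col].unique(), bipartite=1)
    G.add_edges_from(df[[doc_col, ent_col]].itertuples(index=False))
    # entity-to-entity edges weighted by the number of shared documents
    return bipartite.weighted_projected_graph(G, df[ent_col].unique())
```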
brandontlocke / flh-metadatamerge.py
Created November 1, 2018 02:23
not totally happy with this, but it does the job
import pandas as pd
#read in data
entities = pd.read_csv('https://raw.githubusercontent.com/FannieLouHamerPapers/NamedEntities/master/flh_ner_all.csv')
metadata = pd.read_csv('flhmetadata.csv')
#cut '.txt' from the doc names
entities.doc = entities.doc.str[:16]
#join dataframes; select only some
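The script breaks off at the join step. As a hedged sketch of how that merge might look, assuming both frames share a 'doc' column to join on (the function name is invented here):

```python
# Left-join metadata onto the entity rows; a sketch, not the script's
# actual join, whose column selection was cut off above.
import pandas as pd

def merge_metadata(entities, metadata, key='doc'):
    # 'left' keeps every entity row even when no metadata row matches
    return entities.merge(metadata, on=key, how='left')
```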