@soodoku
soodoku / unique_words_hindi_monolingual.ipynb
Created Dec 18, 2021
IIT monolingual hindi database to list of unique words
View unique_words_hindi_monolingual.ipynb
@soodoku
soodoku / read_parsed_dmoz.py
Created Feb 18, 2021
Reading in the parsed DMOZ file
View read_parsed_dmoz.py
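import csv
import pandas as pd
import numpy as np

# The parsed DMOZ file is tab-separated with no header row; QUOTE_NONE keeps
# literal quote characters inside fields instead of treating them as quoting.
df = pd.read_csv('parsed-new.csv', header=None, delimiter="\t",
                 quoting=csv.QUOTE_NONE, encoding='utf-8')
df.head()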
View mturk.ipynb
@soodoku
soodoku / get_unique_domain_names_from_comscore.py
Created Feb 12, 2018
Get a list of unique domain names from comScore browsing data
View get_unique_domain_names_from_comscore.py
#
# Get All Unique Domain Names from comScore
#
# INPUT: comScore browsing data file
#
# OUTPUT: a text file containing a list of unique domains
#
# PARAMETERS:
# + INTERNET_USAGE_FILE: path to the comScore browsing data
# + FINAL_OUTPUT_FILE: path to intended output file
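The header comment above describes the script's input and output, but the preview cuts off before any code. A minimal sketch of the idea, assuming the browsing data is a delimited text file with a header row and one column holding the domain. The column name `domain_name`, the comma delimiter, and the helper `write_unique_domains` are assumptions for illustration and are not taken from the original script:

```python
import csv


# Hypothetical defaults: the real comScore column name and delimiter may differ.
def write_unique_domains(internet_usage_file, final_output_file,
                         domain_column="domain_name", delimiter=","):
    """Write the sorted set of unique, lower-cased domains, one per line."""
    domains = set()
    with open(internet_usage_file, newline="", encoding="utf-8") as src:
        for row in csv.DictReader(src, delimiter=delimiter):
            # Skip rows where the domain field is missing or empty.
            domain = (row.get(domain_column) or "").strip().lower()
            if domain:
                domains.add(domain)
    with open(final_output_file, "w", encoding="utf-8") as out:
        for domain in sorted(domains):
            out.write(domain + "\n")
    return len(domains)
```

Collecting into a set while streaming the file row by row keeps memory proportional to the number of distinct domains rather than the size of the browsing file.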
@soodoku
soodoku / county_dma_2016.R
Created Nov 21, 2017
DMA to County for 2016
View county_dma_2016.R
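library(readr)
library(plyr)  # ldply() comes from plyr, not dplyr

# Read the raw Nielsen file as one string and split it into lines
a_string <- read_file("nielsen_2016")
split_lines <- strsplit(a_string, "\r\n")[[1]]
# Split each line on "--" into the DMA and its counties
split_cols <- strsplit(split_lines, "--")
dat_frame <- ldply(split_cols)
names(dat_frame) <- c("dma", "counties")
write.csv(dat_frame, file = "dma_counties_2016.csv", row.names = FALSE)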
View state_leg_twitter_handles_shor_mccarty_scores.csv
State,Position,Name,@Twitter Handle,Party Affiliation,Boris Shor's Score
AZ,Senator,David Bradley,@Bradley4AZ,Democrat,-1.253
AZ,Senator,Katie Hobbs,@katiehobbs,Democrat,-1.684
AZ,Senator,Ed Ableser,@SenatorAbleser,Democrat,-1.606
AZ,Senator,Barbara McGuire,@SenBarbMcGuire,Democrat,-0.672
AZ,Senator,Steve Farley,@SteveFarleyAZ,Democrat,-1.413
AZ,Senator,Adam Driggs,@AdamDriggs,Republican,0.738
AZ,Senator,Bob Worsley,@bob_worsley,Republican,0.46
AZ,Senator,Kelli Ward,@kelliwardaz,Republican,1.144
AZ,Senator,Nancy Barto,@NancyBarto,Republican,0.996
View cces_2006_fairest.R
# Output = http://gbytes.gsood.com/2013/11/02/the-fairest-of-them-all/
# Uses cces_recode.R here: https://github.com/soodoku/in-n-out/scripts/cces_recode.R
# Plotting the fairest of all media
library(lattice)
png("fairmedia.png")
dotplot(t(t(table(droplevels(cces06$fairmedia[cces06$fairmedia != "Don't know"]), cces06$pid3[cces06$fairmedia != "Don't know"]))/colSums(table(droplevels(cces06$v2112[cces06$fairmedia != "Don't know"]), cces06$pid3[cces06$fairmedia != "Don't know"]))),
main = "Which network do you think provides the \n fairest coverage of national news?",
@soodoku
soodoku / fun.md
Last active Oct 30, 2017
Fun Math
View fun.md

Fun Math

Fun Fact 1

  • 1 - 1 + 1 - 1 + 1 - 1 + ... = 1/2
  • Proof (1) by Luigi Grandi:

S = 1 - 1 + 1 - 1 + 1 - 1 + ...
1 - S = 1 - (1 - 1 + 1 - 1 + 1 - 1 + ...)
= 1 - 1 + 1 - 1 + 1 - 1 + ...
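
The manipulation above treats S as an ordinary number, but the partial sums of this series alternate between 1 and 0, so it has no limit in the usual sense. The value 1/2 is what Cesàro summation (the limit of the running average of the partial sums) assigns to it. A small sketch that checks this numerically; the function name `cesaro_means` is made up for illustration:

```python
from itertools import accumulate


def cesaro_means(terms):
    """Return the running averages of the partial sums of `terms`."""
    partial_sums = list(accumulate(terms))
    running_totals = accumulate(partial_sums)
    return [total / n for n, total in enumerate(running_totals, start=1)]


# Grandi's series: 1 - 1 + 1 - 1 + ...
terms = [(-1) ** k for k in range(1000)]
means = cesaro_means(terms)
print(list(accumulate(terms))[:6])  # partial sums: [1, 0, 1, 0, 1, 0]
print(means[-1])                    # 0.5
```

With an even number of terms, exactly half of the partial sums equal 1, so the average is exactly 0.5; with an odd number it sits slightly above 0.5 and approaches it as more terms are added.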

@soodoku
soodoku / approval.csv
Created Oct 28, 2017
Approval Data on Few Politicians
View approval.csv
Last,First,N,Mode,Date Start,Date End,TotApp,TotDis,TotDK,DemApp,DemDis,DemDK,RepApp,RepDis,RepDK,IndApp,IndDis,IndDK,LibApp,LibDis,LibDK,ModApp,ModDis,ModDK,ConApp,ConDis,ConDK,Source,Link,,,,,,,,,
Bachmann,Michelle,928,Phone,07/15/11,07/17/11,29,45,26,16,58,26,45,27,28,28,47,24,15,64,21,17,53,30,50,24,26,PPP,http://www.publicpolicypolling.com/pdf/2011/PPP_Release_National_720925.pdf,,,,,,,,,
Bachmann,Michelle,700,Phone,10/07/11,10/10/11,27,56,17,15,68,17,43,36,21,23,64,13,11,77,12,14,62,23,49,34,17,PPP,http://www.publicpolicypolling.com/pdf/2011/PPP_Release_US_1011513.pdf,,,,,,,,,
Bachmann,Michelle,700,Phone,12/16/11,12/18/11,30,54,16,22,65,13,41,39,19,26,59,16,21,71,8,20,67,12,42,34,23,PPP,http://www.publicpolicypolling.com/pdf/2011/PPP_Release_National_1220925.pdf,,,,,,,,,
Bloomberg,Michael,707,Phone,11/19/10,11/21/10,19,38,44,24,30,46,12,48,40,19,37,44,28,29,43,24,28,48,9,53,39,PPP,http://publicpolicypolling.blogspot.com/2010/11/americans-not-impressed-with-bloomberg.html,,,,,,,,,
Bloomberg,Michael,700,P
View tp_trelli.r
devtools::install_github("abresler/gdeltr2")
devtools::install_github("hafen/trelliscopejs")
library(gdeltr2)
library(dplyr)
asb_ocr <- "Brooklyn Nets"
gkg_codes <-
get_codes_gkg_themes()
imageweb_codes <- get_gdelt_codebook_ft_api(code_book = "imageweb")