Thiago Marzagão thiagomarzagao
@thiagomarzagao
thiagomarzagao / wordcount.py
Last active November 29, 2022 06:32
This Python code creates a word-frequency matrix for every txt file in the specified input folder ('ipath'). It removes all special characters ($, %, #, etc.) and all numbers, but keeps all accented characters (Ñ, á, ç, etc.). It also removes proper nouns, in a probabilistic way (if all occurrences of the word in the text are capitalized, the word…
### GENERATE WORD-FREQUENCY MATRICES
### author: Thiago Marzagao
### contact: marzagao ddott 1 at osu ddott edu
### supported encoding: UTF8
### supported character sets:
### Basic Latin (Unicode 0-128)
### Latin-1 Supplement (Unicode 129-255)
### Latin Extended-A (Unicode 256-382)
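The cleaning steps described for wordcount.py can be sketched as follows. This is a minimal illustration, not the gist's actual code: the function name `word_frequencies` is hypothetical, and because the description is cut off, the rule applied here (drop a word when every one of its occurrences is capitalized) is an assumption about what the truncated sentence goes on to say.

```python
import re
from collections import Counter

# [^\W\d_] matches any Unicode letter, so accented characters (Ñ, á, ç)
# are kept while digits, underscores and symbols are discarded.
TOKEN = re.compile(r"[^\W\d_]+")


def word_frequencies(text):
    """Return {word: count} for one text, lowercased, without likely proper nouns."""
    tokens = TOKEN.findall(text)
    total = Counter(t.lower() for t in tokens)
    capitalized = Counter(t.lower() for t in tokens if t[0].isupper())
    # Assumed heuristic: a word that is capitalized in every occurrence
    # is treated as a proper noun and removed.
    return {w: n for w, n in total.items() if capitalized[w] < n}


sample = "Maria viu o mar. O mar estava calmo em 2014, disse Maria à irmã."
print(word_frequencies(sample))
```

In the sample, "Maria" is capitalized both times and is dropped, while "O" survives because it also appears in lowercase. A heuristic like this will also discard an ordinary word that happens to occur only at the start of sentences, which is presumably why the description calls the approach probabilistic.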
@thiagomarzagao
thiagomarzagao / wordscores.py
Last active December 18, 2015 07:19
The Python script below implements the ‘wordscores’ algorithm (see Laver, M., Benoit, K., Garry, J. Extracting policy positions from political texts using words as data. American Political Science Review, 97(2), 2003, pp. 311-331). It takes as inputs word-frequency matrices. These matrices must be in CSV format. The first column must contain the…
### WORDSCORES (LBG-2003)
### author: Thiago Marzagao
### contact: marzagao ddott 1 at osu ddott edu
import os
import numpy as np
import pandas as pd
ipath = '/Users/username/inputdata/' # folder containing the CSV files
opath = '/Users/username/outputdata/' # folder where output will be saved
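For readers unfamiliar with the method, the core of wordscores can be sketched in a few lines. This is a simplified illustration using plain dictionaries instead of the CSV matrices the gist expects; the function names and the toy data are hypothetical, and the sketch omits the rescaling and uncertainty estimates discussed in the paper. Each word gets a score equal to the average of the reference-text positions, weighted by the probability of reading that reference text given the word; a new ("virgin") text is then scored as the frequency-weighted mean of its words' scores.

```python
def word_scores(ref_counts, ref_positions):
    """ref_counts: {text: {word: count}}; ref_positions: {text: known score}."""
    # Relative frequency of each word within each reference text.
    rel = {}
    for name, counts in ref_counts.items():
        total = sum(counts.values())
        rel[name] = {w: c / total for w, c in counts.items()}

    vocab = set()
    for counts in ref_counts.values():
        vocab.update(counts)

    scores = {}
    for w in vocab:
        denom = sum(rel[name].get(w, 0.0) for name in rel)
        if denom == 0:
            continue  # word listed with zero counts everywhere: cannot be scored
        # Sum over reference texts of P(text | word) * position of that text.
        scores[w] = sum(
            rel[name].get(w, 0.0) / denom * ref_positions[name] for name in rel
        )
    return scores


def score_text(counts, scores):
    """Score a virgin text using only the words that received a word score."""
    scored = {w: c for w, c in counts.items() if w in scores}
    total = sum(scored.values())
    if total == 0:
        raise ValueError("text shares no words with the reference texts")
    return sum(c / total * scores[w] for w, c in scored.items())


# Hypothetical toy data: two reference texts placed at -1 and +1.
refs = {"left": {"tax": 3, "welfare": 1}, "right": {"tax": 1, "market": 3}}
positions = {"left": -1.0, "right": 1.0}
scores = word_scores(refs, positions)
print(score_text({"tax": 2, "market": 2, "unknown": 5}, scores))
```

Words that never appear in a reference text ("unknown" above) carry no information and are ignored when the virgin text is scored.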
@thiagomarzagao
thiagomarzagao / mcq.py
Last active May 4, 2020 01:50
The Python script below implements the “Fightin’ Words” algorithm (see Monroe, B., Colaresi, M., Quinn, K. Fightin’ words: lexical feature selection and evaluation for identifying the content of political conflict. Political Analysis, 16(4), 2008, pp. 372-403). It takes as inputs word-frequency matrices. These matrices must be in CSV format. The first…
### FIGHTIN' WORDS (MCQ-2008)
### author: Thiago Marzagao
### contact: marzagao ddott 1 at osu ddott edu
import os
import sys
import pandas as pd
import numpy as np
from numpy import matrix as m
import re
import math
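The paper behind mcq.py discusses several estimators, and the preview does not show which one the script implements. As a rough illustration only, the sketch below computes one of them: the log-odds ratio with a Dirichlet prior, standardized into a z-score. The function name `fightin_words`, the uniform prior `alpha`, and the toy counts are all assumptions made for this example.

```python
import math


def fightin_words(counts_a, counts_b, alpha=0.01):
    """Return {word: z-score}; positive values lean toward group A."""
    vocab = set(counts_a) | set(counts_b)
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    a0 = alpha * len(vocab)  # total prior mass under a uniform prior

    z = {}
    for w in vocab:
        y_a = counts_a.get(w, 0) + alpha
        y_b = counts_b.get(w, 0) + alpha
        # Difference of the two smoothed log-odds of using word w.
        delta = math.log(y_a / (n_a + a0 - y_a)) - math.log(y_b / (n_b + a0 - y_b))
        # Approximate variance of that difference.
        variance = 1.0 / y_a + 1.0 / y_b
        z[w] = delta / math.sqrt(variance)
    return z


# Hypothetical counts for two groups of speakers.
group_a = {"tax": 10, "freedom": 30, "the": 100}
group_b = {"tax": 30, "freedom": 10, "the": 100}
for word, score in sorted(fightin_words(group_a, group_b).items(), key=lambda kv: kv[1]):
    print(word, round(score, 2))
```

Dividing by the standard deviation is what keeps very rare words from dominating the ranking, since their log-odds differences are large but poorly estimated. The sketch needs a vocabulary of at least two words, otherwise the odds denominator is zero.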
import pickle
import logging
import gensim
import numpy as np
import pandas as pd
from casenames import casenames
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO, filename='output.log')
import pickle
import gensim
import logging
import pandas as pd
from casenames import casenames
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO, filename='output.log')
# set number of topics
num_topics = 50
import os
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import AdaBoostRegressor
# set input path (path to LSA or LDA results)
ipath = '/home/ubuntu/results/lsa/results.csv'
casenames = [
'Afghanistan1992',
'Afghanistan1993',
'Afghanistan1994',
'Afghanistan1995',
'Afghanistan1996',
'Afghanistan1997',
'Afghanistan1998',
'Afghanistan1999',
'Afghanistan2000',
@thiagomarzagao
thiagomarzagao / ads.py
Created May 30, 2014 04:51
Code used for my "Automated Democracy Scores" paper.
#!/usr/bin/env python
import os
import time
import pickle
import numpy as np
import pandas as pd
# set paths
basepath = '/fs/lustre/osu6994/hdf5/'
@thiagomarzagao
thiagomarzagao / dimensao.do
Created May 30, 2014 05:42
Code used for my paper "A dimensao geografica das eleicoes brasileiras".
* extracting variance estimates by state (to be used in R)
reg pt2 party lgdpcap bolsagdp rural illiteracy nonadequate AL AM AP BA CE DF ES GO MA MG MS MT PA PB PE PI PR RJ RN RO RR RS SC SE SP TO
predict double eps, residual
robvar eps, by(state)
by state, sort: egen sd_eps = sd(eps)
generate double gw_wt = 1/sd_eps^2
tabstat sd_eps gw_wt, by(state)
* running initial diagnostics (obs.: failed; too many observations for spatwmat)
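The weighting step in the Stata preview above (`sd_eps` per state, then `gw_wt = 1/sd_eps^2`) amounts to inverse-variance weights computed group by group. A minimal Python sketch of the same arithmetic, assuming `sd()` denotes the sample standard deviation and using a hypothetical function name and made-up residuals:

```python
from collections import defaultdict
from statistics import stdev


def groupwise_weights(residuals, groups):
    """Return one weight per observation: 1 / (sample sd of its group's residuals)^2."""
    by_group = defaultdict(list)
    for e, g in zip(residuals, groups):
        by_group[g].append(e)
    # stdev() is the sample standard deviation and needs at least two values per group.
    sd = {g: stdev(values) for g, values in by_group.items()}
    return [1.0 / sd[g] ** 2 for g in groups]


# Made-up residuals for two states.
eps = [1.0, -1.0, 2.0, -2.0]
state = ["SP", "SP", "RJ", "RJ"]
print(groupwise_weights(eps, state))
```

Observations from states with noisier residuals receive smaller weights, which is the usual motivation for this kind of correction when error variance differs across groups.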
@thiagomarzagao
thiagomarzagao / dimensao.R
Created May 30, 2014 05:44
Code used for my paper "A dimensao geografica das eleicoes brasileiras".
### preliminary stuff
setwd("/Users/thiagomarzagao/desktop/PROJECT")
library(foreign)
library(MASS)
library(car)
library(lmtest)
library(spdep)
library(sphet)
library(Matrix)
library(spgwr)