@diraol
Last active December 28, 2015 08:49
Shell script to join all "Filiados Partidários" (party membership) files obtained from the TSE repository into a single file with the members of all political parties.
#!/bin/bash
#Run this script from a "root folder" that contains the "filiados_<nome_partido>_<estado>" folders
# (<nome_partido> = party name, <estado> = state), so the folder structure would be:
# - current_folder
# |
# | - partido_xx_aa
# | - partido_xy_bb
# ...
# The script will generate a single "br_filiados.csv" file.
#
# IMPORTANT:
# Running this script on all parties and states takes a long time.
#
# Origin of data: http://www.tse.jus.br/partidos/filiacao-partidaria/relacao-de-filiados
#
###################################################################################
#
#
#Remove all PDF files from current folder and subfolders
# to remove uncomment the line below
#find . -name '*.pdf' -delete
#Remove all "sub judice" files from current folder and subfolders
# (the pattern matches both the "sub_jud" and "sob_jud" spellings)
# to remove uncomment the line below
#find . -name '*s[ou]b_jud*' -delete
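#Copy all regular CSV files (skipping the sub judice ones) to the current folder
# The pattern matches both the "sub_jud" and "sob_jud" spellings; -mindepth 2 skips
# files already in the current folder and -r avoids running cp with no input (GNU tools)
find . -mindepth 2 -name "filiados_*.csv" -not -name "*s[ou]b_jud*" -print0 | xargs -0 -r cp -t .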
#Get the list of CSV files in the current folder that will be read
FILES=$(ls filiados*.csv)
#Convert all CSV files from ISO-8859-1 to UTF-8, generating new files
# whose names begin with 'utf8_'
for FILE in $FILES;
do
    iconv -f 'ISO-8859-1' -t 'UTF-8' "$FILE" > "utf8_$FILE";
done
#Get the CSV header from one converted file and write it to the final file
# (">" starts the final file from scratch, so a rerun does not append to an old one)
head -n1 "$(ls -1 utf8_filiados_*.csv | tail -n 1)" > br_filiados.csv
#For each CSV, append the content, without the header, to the final file
for FILE in $FILES;
do
    tail -n +2 "utf8_$FILE" >> br_filiados.csv;
done
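
One way to sanity-check the merge afterwards is to compare line counts: the final file should hold one header line plus every data line of the converted files. The sketch below is not part of the original script. `check_merged_rows` is a hypothetical helper name, and it assumes every converted file starts with exactly one header line.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not part of the original gist):
# compares the line count of the merged file with one header line plus
# the data lines (everything after the first line) of each input file.
check_merged_rows() {
    local merged="$1"; shift
    local expected=1   # the single header line of the merged file
    local file lines actual
    for file in "$@"; do
        # Count data lines only, skipping each file's own header;
        # the arithmetic expansion strips any padding printed by wc
        lines=$(( $(tail -n +2 "$file" | wc -l) ))
        expected=$((expected + lines))
    done
    actual=$(( $(wc -l < "$merged") ))
    if [ "$actual" -eq "$expected" ]; then
        echo "OK: $actual lines"
    else
        echo "MISMATCH: expected $expected lines, found $actual" >&2
        return 1
    fi
}

# Example call, to be run in the folder where the script above was executed:
#   check_merged_rows br_filiados.csv utf8_filiados_*.csv
```

A mismatch would suggest that the final file was built from a different set of files than the ones passed to the check, or that it already contained lines from an earlier run.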