@vitorbaptista
vitorbaptista / Makefile
Last active December 13, 2021 23:36
List of .gov.br links
gov.br.txt:
	python google_search.py | grep http | tee -a gov.br.txt
@vitorbaptista
vitorbaptista / README.md
Last active October 21, 2020 02:15
Extra exercise for the Data Cleaning class of the Escola de Dados course Dominando o Fluxo de Trabalho com Dados

Dominando o Fluxo de Trabalho com Dados - Data Cleaning - OpenRefine

To practice what you learned in class, try transforming the data in the file caged.csv into the Tidy Data format. The main challenges of this file are:

  1. How do you add a "UF" (state) column?
  2. How do you add an "Ano" (year) column?
  3. How do you delete the irrelevant rows, such as repeated column headers, blank lines, and "Total" rows?
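
The three challenges above can be sketched in code. This is only an illustration: it uses plain Python rather than OpenRefine, and the layout of the sample below (one block per state introduced by a "UF: ..." line, a repeated header row starting with "Mes", one column per year, and a "Total" row) is an assumption, not the actual structure of caged.csv.

```python
import csv
import io

# Hypothetical sample; the real caged.csv layout may differ.
RAW = """\
UF: SP,,
Mes,2016,2017
Jan,100,110
Fev,120,130
Total,220,240
,,
UF: RJ,,
Mes,2016,2017
Jan,10,11
Total,10,11
"""


def tidy(raw_text):
    """Return one dict per (UF, Ano, Mes) observation."""
    rows = []
    uf = None    # state of the block currently being read
    years = []   # year labels taken from the latest header row
    for record in csv.reader(io.StringIO(raw_text)):
        cells = [cell.strip() for cell in record]
        if not any(cells):
            continue  # challenge 3: skip blank lines
        first = cells[0]
        if first.startswith("UF:"):
            # challenge 1: remember the state and fill it down
            uf = first.split(":", 1)[1].strip()
            continue
        if first == "Mes":
            # repeated header row: keep the years, drop the row itself
            years = cells[1:]
            continue
        if first == "Total":
            continue  # challenge 3: skip aggregate rows
        # challenge 2: unpivot the year columns into an "Ano" column
        for year, value in zip(years, cells[1:]):
            rows.append(
                {"UF": uf, "Ano": int(year), "Mes": first, "Valor": int(value)}
            )
    return rows


tidy_rows = tidy(RAW)
for row in tidy_rows:
    print(row)
```

The same ideas map onto OpenRefine operations: fill the state value down, transpose the year columns into rows, and remove rows matched by a facet or filter.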

You may find the following useful:

country,mobility_area,date,value
LY,residential,2020-03-29,21
LY,residential,2020-03-28,17
LY,residential,2020-03-27,14
LY,residential,2020-03-26,20
LY,residential,2020-03-25,19
LY,residential,2020-03-24,12
LY,residential,2020-03-23,13
LY,residential,2020-03-22,15
LY,residential,2020-03-21,11
@vitorbaptista
vitorbaptista / Makefile
Created November 8, 2019 14:04
Members list with geolocalized IPs of Iron March neo-fascist website data dump
core_members_ip_locations.tsv:
	csvcut -c6 core_members.csv | \
	grep -v ip_address | \
	xargs -L 1 ip2geotools -d dbipcity -f csv-tab
@vitorbaptista
vitorbaptista / domains-4-letter-words.csv
Created August 12, 2018 20:03
List of Handshake domain auction dates for every <= 4 letter word in English wordlist
status     reserved  error  block   start       name
Available  False     None   138816  04/25/2019  a
Reserved   True      None   13824   09/20/2018  aa
Reserved   True      None   183168  07/11/2019  aaa
Available  False     None   17856   09/27/2018  aah
Available  False     None   114624  03/14/2019  aahs
Available  False     None   130752  04/11/2019  aal
Available  False     None   21888   10/04/2018  aals
Available  False     None   203328  08/15/2019  aam
Available  False     None   38016   11/01/2018  aani
@vitorbaptista
vitorbaptista / .gitignore
Last active March 11, 2018 17:43
OpenBelgium 2018 presentation on using Frictionless Data tools to package and validate data
# Created by https://www.gitignore.io/api/python
### Python ###
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
@vitorbaptista
vitorbaptista / README.md
Created October 24, 2017 20:55
Pay of civil servants and commissioned staff of the Câmara de SP (for 09/2017)
@vitorbaptista
vitorbaptista / Makefile
Last active June 15, 2017 16:51
Extract reimbursements from Serenata de Amor's Jarbas API
.PHONY: clean
all: reimbursements.csv
clean:
	rm -f reimbursements.csv reimbursements.json
reimbursements.csv: reimbursements.json
	@echo 'issue_date,term_id,term,applicant_id,congressperson_id,congressperson_name,congressperson_document,party,state,subquota_id,subquota_description,subquota_group_id,subquota_group_description,supplier,cnpj_cpf,passenger,leg_of_the_trip,document_type,document_number,installment,batch_number,document_id,document_value,remark_value,total_net_value,total_reimbursement_value,probability,receipt.url,receipt.fetched,last_update,available_in_latest_dataset,suspicions.irregular_companies_classifier,suspicions.meal_price_outlier,suspicions.election_expenses,suspicions.invalid_cnpj_cpf,suspicions.over_monthly_subquota_limit,suspicions.suspicious_traveled_speed_day,suspicions.meal_price_outlier,suspicions.irregular_companies_classifier' > $@
	jq -r '.results[] | [.issue_date, .term_id, .term, .applicant_id, .congressperson_id, .congressperson_name, .congressperson_document, .party, .s
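
The recipe above writes a header row and then turns each entry of `.results` into one CSV row. A minimal Python sketch of the same idea follows. The sample payload is invented and heavily shortened, and the assumption that dotted header names such as `receipt.url` correspond to nested JSON objects is a guess based on the header alone; the helper names `lookup` and `to_csv` are hypothetical.

```python
import csv
import io
import json

# Invented, shortened stand-in for reimbursements.json.
SAMPLE = json.loads("""
{"results": [
  {"issue_date": "2017-01-02", "document_id": 1,
   "receipt": {"url": null, "fetched": false},
   "suspicions": {"meal_price_outlier": true}},
  {"issue_date": "2017-01-03", "document_id": 2,
   "receipt": {"url": "http://example.com/r.pdf", "fetched": true},
   "suspicions": null}
]}
""")

# A few of the column names from the header row, in output order.
COLUMNS = [
    "issue_date",
    "document_id",
    "receipt.url",
    "receipt.fetched",
    "suspicions.meal_price_outlier",
]


def lookup(record, dotted_key):
    """Follow a dotted key through nested dicts; None if any level is missing."""
    value = record
    for part in dotted_key.split("."):
        if not isinstance(value, dict):
            return None
        value = value.get(part)
    return value


def to_csv(payload, columns):
    """Render payload["results"] as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for record in payload["results"]:
        # csv.writer renders None as an empty field
        writer.writerow([lookup(record, column) for column in columns])
    return buffer.getvalue()


csv_text = to_csv(SAMPLE, COLUMNS)
print(csv_text)
```

Unlike jq, Python writes booleans as `True` and `False`, so the output is not byte-identical to what the Makefile recipe would produce.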
@vitorbaptista
vitorbaptista / guide.py
Created April 25, 2017 19:29
Simplest DAG for OpenTrials's Airflow
from datetime import datetime
import airflow.models
from airflow.operators.latest_only_operator import LatestOnlyOperator
import utils.helpers as helpers
args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2017, 4, 1),
    'retries': 1,
@vitorbaptista
vitorbaptista / keybase.md
Created October 8, 2016 22:02
Verifying myself on Keybase.io

Keybase proof

I hereby claim:

  • I am vitorbaptista on github.
  • I am vitorbaptista (https://keybase.io/vitorbaptista) on keybase.
  • I have a public key ASCzHuiG5mXAuFZmosmpdpi4kLA8YpficoW3o3cmzrpYiAo

To claim this, I am signing this object: