Thiago Marzagão thiagomarzagao

# scrape e-Compras GDF (https://www.compras.df.gov.br/)
import os
import requests
from bs4 import BeautifulSoup
baseurl = 'https://www.compras.df.gov.br/publico/'
basepath = '/Users/thiagomarzagao/Desktop/HTML/'
primeiro_id = 0 # ID of the first auction
ultimo_id = 48355 # ID of the last auction (as of 2014-12-18)
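The preview above stops after the configuration variables. As a minimal sketch of how a loop over the auction IDs might look: the helper names `fetch_with_requests` and `scrape_range`, and especially the `?id=` query string, are assumptions for illustration and do not come from the original script. The `fetch` argument is injectable so the loop can be tried without contacting the site.

```python
import os

def fetch_with_requests(url):
    """Download one page; return its HTML text, or None on any failure."""
    import requests  # imported lazily so the sketch loads without the package
    try:
        response = requests.get(url, timeout=30)
    except requests.RequestException:
        return None
    return response.text if response.status_code == 200 else None

def scrape_range(baseurl, basepath, first_id, last_id, fetch=fetch_with_requests):
    """Save one HTML file per auction ID and return the IDs that were saved."""
    os.makedirs(basepath, exist_ok=True)
    saved = []
    for auction_id in range(first_id, last_id + 1):
        # ASSUMPTION: the real site's URL pattern is not shown in the source.
        url = '{}?id={}'.format(baseurl, auction_id)
        html = fetch(url)
        if html is None:
            continue  # skip missing or failed pages
        path = os.path.join(basepath, '{}.html'.format(auction_id))
        with open(path, 'w', encoding='utf-8') as f:
            f.write(html)
        saved.append(auction_id)
    return saved
```

With the variables defined above, a call would look like `scrape_range(baseurl, basepath, primeiro_id, ultimo_id)`.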
@thiagomarzagao
thiagomarzagao / wordcount.py
Last active November 29, 2022 06:32
This Python code creates a word-frequency matrix for every txt file in the specified input folder ('ipath'). It removes all special characters ($, %, #, etc.) and all numbers, but keeps all accented characters (Ñ, á, ç, etc.). It also removes proper nouns probabilistically (if all occurrences of the word in the text are capitalized, the word…
### GENERATE WORD-FREQUENCY MATRICES
### author: Thiago Marzagao
### contact: marzagao ddott 1 at osu ddott edu
### supported encoding: UTF8
### supported character sets:
### Basic Latin (Unicode 0-128)
### Latin 1 Supplement (Unicode 129-255)
### Latin Extended-A (Unicode 256-382)
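The cleaning rules in the description above can be sketched for a single text as follows. This is not the gist's code: the function name `word_frequencies` is hypothetical, and because the description is truncated, the sketch assumes that a word whose every occurrence is capitalized gets dropped.

```python
import re
from collections import Counter

# [^\W\d_] matches any Unicode letter, so accented characters survive
# while digits, underscores and symbols such as $, % and # do not.
WORD_RE = re.compile(r"[^\W\d_]+")

def word_frequencies(text):
    """Return {word: count} for one text, lowercased, without likely proper nouns."""
    tokens = WORD_RE.findall(text)
    # Words seen at least once with a non-capitalized first letter
    seen_lower = {t.lower() for t in tokens if not t[0].isupper()}
    counts = Counter(t.lower() for t in tokens)
    # ASSUMPTION: a word that is capitalized in all of its occurrences is dropped
    return {w: n for w, n in counts.items() if w in seen_lower}
```

Under this rule a common word that appears only once, at the start of a sentence, is dropped as well, which is the sense in which the filter is probabilistic rather than exact.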
@thiagomarzagao
thiagomarzagao / wordscores.py
Created February 11, 2016 17:41
Wordscores in Python
### WORDSCORES (LBG-2003)
### author: Thiago Marzagao
### contact: marzagao ddott 1 at osu ddott edu
import os
import numpy as np
import pandas as pd
ipath = '/Users/username/inputdata/' # folder containing the CSV files
opath = '/Users/username/outputdata/' # folder where output will be saved
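For orientation, the core of the Wordscores method (Laver, Benoit and Garry, 2003) can be sketched without dependencies. The function names and the dictionary-based inputs below are assumptions for illustration; the gist itself works on CSV files with numpy and pandas, and this sketch omits the rescaling of virgin-text scores.

```python
def word_scores(ref_counts, ref_scores):
    """ref_counts: {text_id: {word: count}}; ref_scores: {text_id: a priori score}."""
    # F_wr: relative frequency of word w within reference text r
    rel = {}
    for r, counts in ref_counts.items():
        total = sum(counts.values())
        rel[r] = {w: c / total for w, c in counts.items() if c > 0}
    vocab = set().union(*(freqs.keys() for freqs in rel.values()))
    scores = {}
    for w in vocab:
        denom = sum(rel[r].get(w, 0.0) for r in rel)
        # P_wr = F_wr / sum_r F_wr, and S_w = sum_r P_wr * A_r
        scores[w] = sum(rel[r].get(w, 0.0) / denom * ref_scores[r] for r in rel)
    return scores

def text_score(counts, scores):
    """Score a virgin text as the frequency-weighted mean of its word scores."""
    scored = {w: c for w, c in counts.items() if w in scores}
    total = sum(scored.values())
    if total == 0:
        raise ValueError('text shares no words with the reference texts')
    return sum(c / total * scores[w] for w, c in scored.items())
```

Words in the virgin text that never occur in a reference text carry no score and are ignored when computing the relative frequencies.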
date closing_price
2009-12-11 6873.0
2009-12-16 6820.0
2009-12-17 6660.0
2009-12-22 6700.0
2009-12-28 6790.0
2010-01-04 6937.0
2010-01-06 7030.0
2010-01-08 6948.0
2010-01-12 6970.0
DATAPR PREULT REAIS QUATOT PREMIN PREMED PREMAX
2008-12-02 3539.0 2681997000 759900 3491.0 3529.0 3581.0
2008-12-03 3531.0 1150036700 330600 3390.0 3478.0 3540.0
2008-12-04 3570.0 455416800 128400 3515.0 3546.0 3600.0
2008-12-05 3535.0 667119600 193700 3411.0 3444.0 3535.0
2008-12-08 3846.0 385729500 103100 3695.0 3741.0 3846.0
2008-12-09 3754.0 1044938600 274500 3754.0 3806.0 3882.0
2008-12-10 3903.0 936798900 241100 3861.0 3885.0 3968.0
2008-12-11 3889.0 645127100 164200 3866.0 3928.0 3983.0
2008-12-12 3911.0 244912200 64400 3700.0 3802.0 3928.0
date threshold
2009-12-10 1889539003
2010-04-29 2497655431
2011-02-09 3429210097
2011-08-08 6281037175
2011-10-27 6528943764
2012-02-07 11413915245
2012-04-10 16474209620
2012-05-24 10814820083
2012-08-29 9909941384
@thiagomarzagao
thiagomarzagao / portfolio.py
Last active January 28, 2021 00:22
Portfolio analyzer (computes Sharpe ratio, alpha, beta, etc.; also suggests an allocation based on the efficient frontier)
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
from datetime import datetime
from pandas_datareader import data as wb
from pypfopt.risk_models import CovarianceShrinkage
from pypfopt.expected_returns import mean_historical_return
from pypfopt import plotting
from pypfopt import objective_functions
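The preview shows only the imports. As a rough sketch of the statistics named in the description, the Sharpe ratio and the CAPM alpha and beta can be computed from two return series with the standard library alone. The function names, the per-period `risk_free` argument and the default of 252 trading periods per year are assumptions; the gist itself relies on statsmodels and PyPortfolioOpt, and the efficient-frontier step is not covered here.

```python
import math
from statistics import mean, stdev

def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period returns (sample standard deviation)."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)

def alpha_beta(returns, market, risk_free=0.0):
    """OLS fit of excess portfolio returns on excess market returns."""
    y = [r - risk_free for r in returns]
    x = [m - risk_free for m in market]
    x_bar, y_bar = mean(x), mean(y)
    cov_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    var_x = sum((a - x_bar) ** 2 for a in x)
    beta = cov_xy / var_x            # slope: sensitivity to the market
    alpha = y_bar - beta * x_bar     # intercept: per-period excess return
    return alpha, beta
```

The alpha returned here is per period; multiplying by the number of periods per year gives a simple annualized figure.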
@thiagomarzagao
thiagomarzagao / ibov.csv
Created January 14, 2021 21:00
IBOV, 1968-01-02 to 2020-12-23
DATAPR PREULT
1968-01-02 0.000000000100000
1968-01-03 0.000000000100000
1968-01-04 0.000000000099000
1968-01-05 0.000000000097000
1968-01-08 0.000000000097000
1968-01-09 0.000000000098000
1968-01-10 0.000000000097000
1968-01-11 0.000000000100000
1968-01-12 0.000000000101000
@thiagomarzagao
thiagomarzagao / cnaes_diario.csv
Created November 13, 2020 11:16
Frequency of the CNPJs of each CNAE in section 1 of the Diário Oficial da União, Jan 2002 to Jun 2020
cnae count descricao
47717 156954 Comércio varejista de produtos farmacêuticos para uso humano e veterinário
94308 92818 Atividades de associações de defesa de direitos sociais
90019 56544 Artes cênicas, espetáculos e atividades complementares
80111 50875 Atividades de vigilância e segurança privada
59111 36380 Atividades de produção cinematográfica, de vídeos e de programas de televisão
84116 34365 Administração pública em geral
08100 23624 Extração de pedra, areia e argila
21211 22966 Fabricação de medicamentos para uso humano
86101 22860 Atividades de atendimento hospitalar
@thiagomarzagao
thiagomarzagao / cnpj_to_b3.csv
Created August 19, 2020 12:42
CNPJ / B3 trading code / B3 registration cancelled?
cnpj codigo_B3 registro_cancelado
04.895.728/0001-80 EQPA3 False
91.983.056/0001-69 KEPL3 False
00.924.429/0001-75 VSPT4 False
17.167.396/0001-69 RPAD5 False
09.041.168/0001-10 LOGG3 False
10.215.988/0001-60 LCAM3 False
02.328.280/0001-97 EKTR4 False
00.000.000/0001-91 BBAS3 False
02.762.124/0001-30 BETP3B False