@gabrielacaesar
Created February 25, 2018 11:07
limpeza-temer
import re
import pandas as pd

# agenda2016_limpa_final is assumed to be a DataFrame built earlier, with the
# columns oque, onde, ano, mes, dia and hora.
# Test sample: the first row repeated twice, keeping every column the loop reads.
df = agenda2016_limpa_final.iloc[[0, 0]]
titles = ['Deputado Federal ', 'General ', 'Ex-presidente ', 'Senadora ', 'Senador ', 'do Exército', 'Tenente-Brigadeiro']
novas_linhas = []
for _, row in df.iterrows():  # the underscore discards the index that iterrows returns
    nova_linha = [row["oque"], row["onde"], row["ano"], row["mes"], row["dia"], row["hora"]]
    # Entries are separated by '; '. Keep the text before the first ', ' and strip the titles.
    pessoas = [re.sub('|'.join(titles), '', i.split(', ')[0]) for i in row["oque"].split('; ')]
    for pessoa in pessoas:
        # list.append returns None, so build a new list with the person added instead
        linha_com_pessoa = nova_linha + [pessoa]
        novas_linhas.append(linha_com_pessoa)
df_novo = pd.DataFrame(novas_linhas, columns=["oque", "onde", "ano", "mes", "dia", "hora", "pessoa"])
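
The name-extraction step can be tried on its own, without the DataFrame. The sketch below is a minimal illustration, not the gist author's code: the helper `extrair_pessoas` and the sample string are hypothetical, and it assumes that each `oque` value lists entries separated by `'; '`, with the person's name before the first `', '`.

```python
import re

TITLES = ['Deputado Federal ', 'General ', 'Ex-presidente ', 'Senadora ',
          'Senador ', 'do Exército', 'Tenente-Brigadeiro']

# Compile the alternation once; re.escape guards against regex metacharacters
# if more titles are added to the list later.
PADRAO_TITULOS = re.compile('|'.join(re.escape(t) for t in TITLES))


def extrair_pessoas(oque):
    """Return the names found in one 'oque' string, with titles removed.

    Hypothetical helper: assumes entries are separated by '; ' and that the
    name is the text before the first ', ' of each entry.
    """
    pessoas = []
    for entrada in oque.split('; '):
        nome = entrada.split(', ')[0]
        pessoas.append(PADRAO_TITULOS.sub('', nome).strip())
    return pessoas


# Made-up sample string, for illustration only.
exemplo = ("Senador Fulano de Tal, PMDB/SP; "
           "Deputado Federal Beltrano Silva, PSDB/MG; "
           "General Sicrano Souza, Comandante")
print(extrair_pessoas(exemplo))
```

If the real `oque` strings follow a different layout (for example, names without a trailing role, or other separators), the two `split` calls would need to be adjusted accordingly.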