@sergiolucero
Created January 20, 2022 04:08
Extraction of constitutional convention initiatives
import pickle, requests, time
from bs4 import BeautifulSoup
import pandas as pd
def tit(html):
    """Return the page title taken from the og:title meta tag."""
    tt0 = BeautifulSoup(html, 'lxml').find_all('meta', attrs={'property': 'og:title'})[0]
    return tt0['content']
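def apo(html):
    """Return the number of supports ("apoyos") shown after 'Cuenta con'."""
    idx = html.index('Cuenta con')
    cola = html[(idx+29):]  # the count starts 29 characters after the start of the phrase
    jdx = cola.index('<')   # the count ends at the next tag
    return int(cola[:jdx].replace('.', ''))  # drop '.' thousands separators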
####################################################################################
url = 'https://iniciativas.chileconvencion.cl/m/iniciativa_popular/'
ubs = BeautifulSoup(requests.get(url).text, 'lxml')
lbs = ubs.find_all('a', href=True)  # only anchors that carry an href
lbs = sorted(set(link['href'] for link in lbs))  # unique hrefs, sorted
print('LINKS:', len(lbs))
props = {}
t0 = time.time()
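for ix, link in enumerate(lbs):
    if 'o/' in link:
        if ix % 100 == 50:
            # progress report: current link, index, elapsed seconds
            print(link, ix, round(time.time()-t0, 2))
        lt = requests.get(url+link).text
        props[link[2:]] = lt  # key: the link without its first two characters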
# one row per fetched page: id and raw HTML
pdf = pd.DataFrame(dict(id=list(props.keys()), texto=list(props.values())))
pdf['apoyos'] = pdf.texto.apply(apo)
pdf['titulo'] = pdf.texto.apply(tit)
# the 20 initiatives with the most supports
top20 = pdf.sort_values('apoyos', ascending=False).head(20)[['titulo', 'apoyos', 'id']]
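
The `apo` helper relies on a fixed 29-character offset after 'Cuenta con', so it would break if the page markup shifted. Below is a minimal sketch of a more tolerant variant. It assumes the count is the first run of digits (with `.` as thousands separator) after that phrase. The name `apo_regex` and the sample markup are hypothetical, not taken from the site, and the real HTML may differ.

```python
import re

def apo_regex(html):
    """Hypothetical alternative to apo(): first number after 'Cuenta con'."""
    # Assumption: the count is the first run of digits (dots allowed as
    # thousands separators) following the phrase, possibly after some tags.
    m = re.search(r'Cuenta con\D*?(\d[\d.]*)', html)
    if m is None:
        return None  # phrase or number not found
    return int(m.group(1).replace('.', ''))

# Made-up markup for illustration only; the real page may differ.
sample = '<p>Cuenta con <strong>12.345</strong> apoyos</p>'
print(apo_regex(sample))
```

This sketch would pick up the wrong number if a tag between the phrase and the count carried digits in its attributes, so it may need adjusting against the actual page source.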