@sergiolucero
Created September 26, 2018 02:17
CNTV complaints (denuncias)
import requests
import pandas as pd
from bs4 import BeautifulSoup
# IDEAS: fines (multas), casts (elencos): http://es.teleserieschile.wikia.com/wiki/Categor%C3%ADa:Teleseries_de_Canal_13
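# Download the CNTV listing page and parse it with the html5lib parser (needs the html5lib package)
bs=BeautifulSoup(requests.get('https://www.cntv.cl/cntv/site/tax/port/all/taxport_16___1.html').text,'html5lib')
links = bs.find_all('a')
# Keep only links whose text contains 'Lo más' ("the most"), presumably the complaint-ranking pages
denuncias=[l for l in links if 'Lo más' in l.text]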
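droot='https://www.cntv.cl'
frames=[]
for d in denuncias:
    url = droot+d['href']  # assumes the hrefs are site-relative
    print(url)
    ddf = pd.read_html(url)[0]  # first table on each ranking page
    ddf['fecha']=d['href'].split('/')[-2]  # date ('fecha') taken from the URL path
    frames.append(ddf)
# Concatenate once at the end; DataFrame.append was removed in pandas 2.0
df = pd.concat(frames, ignore_index=True)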
print(len(df))
print(df.head(10))  # a bare df.head(10) only displays in a notebook
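# Totals per program; numeric_only=True keeps text columns such as 'fecha' out of the sum
sdf=df.groupby('PROGRAMA').sum(numeric_only=True)
nd=sdf.columns[0]  # first numeric column, presumably the complaint count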
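The script builds each page URL by string concatenation (droot+d['href']) and takes the date from the second-to-last piece of the href. A minimal stdlib sketch of that step follows, using urllib.parse.urljoin so that both site-relative and absolute hrefs resolve correctly, and reading the date from the URL path so a query string cannot shift the segments. The helper name page_url_and_fecha and the example hrefs are hypothetical; real CNTV paths may be laid out differently.

```python
from urllib.parse import urljoin, urlparse

DROOT = 'https://www.cntv.cl'

def page_url_and_fecha(href):
    """Hypothetical helper: resolve an index-page href against DROOT and
    return (absolute_url, fecha), where fecha is the second-to-last path
    segment, mirroring href.split('/')[-2] in the script."""
    url = urljoin(DROOT, href)             # leaves absolute hrefs untouched
    segments = urlparse(url).path.split('/')  # path excludes any ?query part
    return url, segments[-2]

# Hypothetical hrefs, for illustration only
url, fecha = page_url_and_fecha('/ranking/2018-08/denuncias.html')
print(url, fecha)
u2, f2 = page_url_and_fecha('https://www.cntv.cl/ranking/2018-07/denuncias.html?x=1')
print(u2, f2)
```

With plain concatenation, an href that is already absolute would produce a doubled host, and a trailing query string would end up inside the last split piece; urljoin plus urlparse sidesteps both.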
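The accumulate-then-group step can be checked offline, without fetching anything. A minimal sketch, assuming each ranking table has a 'PROGRAMA' column plus one numeric count column; the column name 'DENUNCIAS', the program names and the counts are made-up stand-ins for what pd.read_html would return.

```python
import pandas as pd

# Hypothetical stand-ins for two monthly tables returned by pd.read_html
tables = {
    '2018-07': pd.DataFrame({'PROGRAMA': ['Prog A', 'Prog B'], 'DENUNCIAS': [10, 3]}),
    '2018-08': pd.DataFrame({'PROGRAMA': ['Prog A', 'Prog C'], 'DENUNCIAS': [5, 7]}),
}

frames = []
for fecha, ddf in tables.items():
    ddf = ddf.copy()
    ddf['fecha'] = fecha  # tag every row with the month it came from
    frames.append(ddf)

# Collect the frames in a list and concatenate once
df = pd.concat(frames, ignore_index=True)

# Sum per program; numeric_only=True leaves the text column 'fecha' out
sdf = df.groupby('PROGRAMA').sum(numeric_only=True)
nd = sdf.columns[0]  # here 'DENUNCIAS', the only numeric column
ranking = sdf.sort_values(nd, ascending=False)
print(ranking)
```

Sorting by nd in descending order turns the per-program totals into a ranking, which is one plausible use for the otherwise unused nd in the script.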