@mdamien
Last active October 29, 2020 12:58
Parse the annotations of the Assemblée Nationale videos
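// parse.js: click each entry of the video index and record where the player lands
var links = document.querySelectorAll('.mediaIndex a')
var player = document.querySelector('#html5_player')
var results = []
var i = 0

function next() {
  if (links[i]) {
    links[i].click()
    // Give the player one second to seek before reading its position
    setTimeout(() => {
      var result = [player.currentTime, links[i].innerText, links[i].className]
      results.push(result)
      console.log(i, result)
      i += 1
      next()
    }, 1000)
  } else {
    // All links visited: replace the page body with the JSON results
    document.body.innerText = JSON.stringify(results)
  }
}

next()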
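# Usage: pass the video page URL as the first argument; expects parse.js in the working directory
import datetime
import json
import sys
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get(sys.argv[1])
time.sleep(20)  # wait for the page and the video player to load

# Inject the scraping script
with open('parse.js') as f:
    driver.execute_script(f.read())

# Poll until parse.js has replaced the page body with JSON
while True:
    content = driver.find_element(By.TAG_NAME, 'body').get_attribute('innerText')
    try:
        content = json.loads(content)
        break
    except json.JSONDecodeError:
        time.sleep(2)

transformed = []
prev_time = None
for el in content:
    new_el = {}
    # Format seconds as H:MM:SS, dropping the fractional part
    new_el['time'] = str(datetime.timedelta(seconds=el[0])).split('.')[0]
    # Skip entries that land on the same second as the previous one
    if new_el['time'] == prev_time:
        continue
    prev_time = new_el['time']
    new_el['titre'] = el[1]
    new_el['level'] = int(el[2].split('level')[1])
    transformed.append(new_el)

print(json.dumps(transformed, indent=2, ensure_ascii=False))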
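The post-processing step can be tried without a browser. The sketch below pulls the same logic into a standalone function; the name `transform` and the sample rows are made up for illustration, and it assumes, as the script does, that each class name ends in `level` followed by a number.

```python
import datetime
import json


def transform(entries):
    """Turn [seconds, title, class_name] triples into dicts, skipping any
    entry whose H:MM:SS timestamp equals that of the previously kept entry."""
    transformed = []
    prev_time = None
    for seconds, title, class_name in entries:
        # Drop the fractional part of the timestamp
        stamp = str(datetime.timedelta(seconds=seconds)).split('.')[0]
        if stamp == prev_time:
            continue
        prev_time = stamp
        transformed.append({
            'time': stamp,
            'titre': title,
            # Assumes class_name looks like "level1", "level2", ...
            'level': int(class_name.split('level')[1]),
        })
    return transformed


# Hypothetical sample data, not taken from a real video page
sample = [
    [0.0, 'Ouverture de la séance', 'level1'],
    [75.4, 'Questions au gouvernement', 'level1'],
    [75.9, 'Première question', 'level2'],
    [3725.0, 'Discussion générale', 'level1'],
]
print(json.dumps(transform(sample), indent=2, ensure_ascii=False))
```

With this sample, the third row is dropped because it falls within the same second as the second row, so only three entries are printed.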