@briatte
Last active November 5, 2018 18:48
Python 3 code to archive franceculture.com radio shows as "YYYY-MM-DD-FRCULTURE-[Show] - [Title].mp3" files; usage: frculture <URL>. Only one URL at a time for now, and no progress bar, but downloads are usually very quick.
#!/usr/local/bin/python3
# coding: utf8
from bs4 import BeautifulSoup # pip install BeautifulSoup4
import os
import re
import sys
import urllib.request
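# page URL, given as the only command-line argument
if len(sys.argv) != 2:
    sys.exit('usage: frculture <URL>')
u = sys.argv[1]
p = BeautifulSoup(urllib.request.urlopen(u), 'html.parser')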
# compose filename
# ----------------
# part of page to scrape
b = p.find('button', attrs = {'class': 'replay-button'})
# show title
t = b.get('title').strip().replace('Réécouter ', '')
t = re.sub(':|/', '-', t)
# show name
n = b.get('data-asset-surtitle')
# show date: a DD.MM.YYYY pattern in the asset URL, rewritten as YYYY-MM-DD
d = b.get('data-asset-source')
d = re.sub(r'(.*?)(\d{2}).(\d{2}).(\d{4})(.*)', r'\4-\3-\2', d)
# filename
f = d + "-FRCULTURE-" + n + " - " + t + '.mp3'
# get URL
# -------
# show file (.mp3)
u = b.get('data-asset-source')
print('[<<] ' + u)
print('[>>] ' + f)
print('[in] ' + os.getcwd())
# download
# --------
h = urllib.request.urlopen(u)
with open(f, 'wb') as output:
    output.write(h.read())
# kthxbye
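
The filename step above can be isolated into a small function, which makes it easy to check without touching the network. The sketch below is only an illustration: `compose_filename` is a hypothetical helper name, the example title, show name and URL are made up, and it uses `re.search` with escaped dots instead of the script's `re.sub` call, so it raises an error when no date is found rather than falling back to the whole URL.

```python
import re


def compose_filename(title, show, source):
    """Build 'YYYY-MM-DD-FRCULTURE-[Show] - [Title].mp3' from scraped values.

    Hypothetical helper; the arguments stand for the 'title',
    'data-asset-surtitle' and 'data-asset-source' attributes of the button.
    """
    # drop the 'Réécouter ' prefix, then replace characters unsafe in filenames
    title = title.strip().replace('Réécouter ', '')
    title = re.sub(r'[:/]', '-', title)
    # assumption: the date appears in the asset URL as DD.MM.YYYY
    m = re.search(r'(\d{2})\.(\d{2})\.(\d{4})', source)
    if m is None:
        raise ValueError('no DD.MM.YYYY date found in: ' + source)
    day, month, year = m.groups()
    return f'{year}-{month}-{day}-FRCULTURE-{show} - {title}.mp3'


# made-up values, for illustration only
print(compose_filename(
    'Réécouter Histoire : un exemple',
    'La Fabrique',
    'https://example.org/audio/12345-05.11.2018-EPISODE.mp3',
))
# prints: 2018-11-05-FRCULTURE-La Fabrique - Histoire - un exemple.mp3
```

Raising on a missing date is a design choice of this sketch; the script itself would keep the unmodified URL in the filename in that case.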