@xflr6
Last active May 8, 2024 07:40
Download all available audio books from ICE portal
"""Download all available audio books from DB ICE Portal."""
import json
import os
import urllib.parse
import urllib.request

BASE = 'http://iceportal.de/api1/rs/'


def load_json(url: str, *, verbose: bool = True):
    if verbose:
        print(url)
    with urllib.request.urlopen(url) as f:
        doc = json.load(f)
    return doc


def get_page(href: str, *,
             base: str = urllib.parse.urljoin(BASE, 'page/')):
    url = urllib.parse.urljoin(base, href.lstrip('/'))
    return load_json(url)


def retrieve(source, target, *,
             base: str = urllib.parse.urljoin(BASE, 'audiobooks/path/')) -> None:
    sheet = urllib.parse.urljoin(base, source.lstrip('/'))
    path = load_json(sheet)['path']
    url = urllib.parse.urljoin(base, path)
    urllib.request.urlretrieve(url, filename=target)


audiobooks = get_page('hoerbuecher')
for group in audiobooks['teaserGroups']:
    for item in group['items']:
        print('', item['title'], sep='\n')
        page = get_page(item['navigation']['href'])
        dirname = page['title']
        # fix invalid
        dirname = dirname.replace('.', '_')
        for remove_char in ('"', '?', '&', '/', '|'):
            dirname = dirname.replace(remove_char, '')
        dirname, _, _ = dirname.partition(':')
        if not os.path.exists(dirname):
            os.makedirs(dirname)
        for file in page['files']:
            url = file['path']
            target = os.path.join(dirname,
                                  '{:d} - {}'.format(file['serialNumber'],
                                                     url.rpartition('/')[2]))
            if not os.path.exists(target):
                retrieve(url, target)

@xflr6
Author

xflr6 commented Jul 31, 2021

Thanks. Updated, fingers crossed :)

@FrankCarius

Working fine. Hint: check that Python 3.x is used; otherwise you get a syntax error in line 10.

@BoKa33

BoKa33 commented Nov 26, 2021

Looks like downloading movies is not so easy? Do you know of any way to do it?

@xflr6
Author

xflr6 commented Nov 30, 2021

@BoKa33: nope, no experience

@FrankCarius

FrankCarius commented May 18, 2022

Just tried it and it works, but not if the filename contains a "pipe", an "ampersand", or a "forward slash".
So simply add some more replaces in line 40:

dirname = dirname.replace('.', '').replace('"', '').replace('?', '').replace('&', '').replace('|', '').replace('/', '')
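
The same clean-up can be written without a growing chain of `replace` calls. A minimal sketch using `str.translate`; the names `REMOVE_CHARS` and `sanitize_dirname` are made up for illustration and are not part of the gist, and the character set simply mirrors the chain above:

```python
# Characters to delete from a title before using it as a directory name.
# Assumption: this set mirrors the replace chain above; extend as needed.
REMOVE_CHARS = '."?&|/'


def sanitize_dirname(title: str, remove_chars: str = REMOVE_CHARS) -> str:
    """Return title with every character in remove_chars deleted."""
    # str.maketrans with three arguments maps each character of the
    # third argument to None, so translate() drops it.
    table = str.maketrans('', '', remove_chars)
    return title.translate(table)


print(sanitize_dirname('Krimi & Thriller / Folge 1'))
```

Adding another problematic character then only means extending the `REMOVE_CHARS` string.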

@xflr6
Author

xflr6 commented May 22, 2022

> if the filename contains a "pipe", an "ampersand", or a "forward slash".
> So simply add some more replaces in line 40

Thanks, adapted.

@contrequarte

Nice work! It seems that when downloading podcasts, only one episode is downloaded, because the naming convention for podcast episodes differs from the one for audiobooks. I have therefore added the serial number contained in the JSON to the local file name. I put these changes in my fork, as I didn't know whether this behaviour was intended by your code. (If not, please feel free to adopt them.)
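
To illustrate why the serial number helps, here is a small sketch. It assumes (this is a guess, not something confirmed in the thread) that podcast episode paths can end in the same file name, so that the script's "skip if the target exists" check would drop every episode after the first. The helper name `target_name` and the sample entries are hypothetical; the keys `serialNumber` and `path` are the ones the script reads.

```python
def target_name(serial_number: int, path: str) -> str:
    """Build a local file name that starts with the serial number."""
    basename = path.rpartition('/')[2]  # last path component
    return '{:d} - {}'.format(serial_number, basename)


# Hypothetical entries: two episodes whose paths share one basename.
files = [
    {'serialNumber': 1, 'path': '/podcast/episode-a/audio.mp3'},
    {'serialNumber': 2, 'path': '/podcast/episode-b/audio.mp3'},
]

names = [target_name(f['serialNumber'], f['path']) for f in files]
print(names)  # ['1 - audio.mp3', '2 - audio.mp3']
```

Without the prefix both entries would map to `audio.mp3`; with it, each episode gets its own target.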

@xflr6
Author

xflr6 commented Nov 19, 2023

Thanks @contrequarte, adapted so that the file names now always start with the serial number.

@ActionLuzifer

Hi! I used this script yesterday, and it worked for quite a while.
But then I saw a file being downloaded, its size shrinking to zero, the file being downloaded again, shrinking to zero again, and so on. It was more or less an endless loop until the Wi-Fi connection itself was lost.
I then debugged and saw that this happens in the line urllib.request.urlretrieve(url, filename=target) in the retrieve function.

Has anyone else seen this behaviour, or does anyone have an idea how to stop it?
Could it be that urlretrieve gets a redirect while it is downloading, restarts the download, gets another redirect, restarts again, and so on?
Is there a parameter for this function that makes it ignore such redirects/re-downloads, or another built-in function that does more or less the same?
I would be happy if urlretrieve threw an exception or returned an error code when this happens, so that the script could catch it and download the remaining files.
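
The cause of the loop is not clear from the description, so the following is only a sketch of one way to make a failed download visible instead of silent. It replaces `urlretrieve` (which takes no timeout argument) with `urlopen(..., timeout=...)`, writes to a temporary `.part` file, and compares the size with the `Content-Length` header when the server sends one. The names `download`, `download_all`, and `IncompleteDownload` are made up for illustration and are not part of the gist.

```python
import http.client
import os
import shutil
import urllib.request


class IncompleteDownload(Exception):
    """The byte count differs from the announced Content-Length."""


def download(url: str, target: str, *, timeout: float = 30.0) -> int:
    """Download url to target and return the number of bytes written.

    Data goes to target + '.part' first, so an aborted transfer never
    leaves a file at target. The timeout applies to each blocking
    socket operation (connect, read), not to the whole transfer.
    """
    part = target + '.part'
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response, \
                open(part, 'wb') as f:
            expected = response.headers.get('Content-Length')
            shutil.copyfileobj(response, f)
        size = os.path.getsize(part)
        if expected is not None and size != int(expected):
            raise IncompleteDownload(
                '{}: got {:d} of {} bytes'.format(url, size, expected))
        os.replace(part, target)  # move into place only when complete
        return size
    except BaseException:
        # Clean up the partial file, then let the caller see the error.
        if os.path.exists(part):
            os.remove(part)
        raise


def download_all(jobs, *, timeout: float = 30.0):
    """Try every (url, target) pair and return the pairs that failed."""
    failed = []
    for url, target in jobs:
        try:
            download(url, target, timeout=timeout)
        except (OSError, http.client.HTTPException,
                IncompleteDownload) as e:
            print('failed:', url, e)
            failed.append((url, target))
    return failed
```

In the script, the `urlretrieve` call inside `retrieve` could be swapped for `download(url, target)` wrapped in a `try`/`except` like the one in `download_all`, so that one bad file is reported and the remaining files are still fetched. Because the target only appears once the transfer is complete, the existing `os.path.exists(target)` check would retry a failed file on the next run.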
