Bostwickenator/chirp-audiobooks-downloader-readme.md

## chirp-audiobooks-downloader-readme.md

      
Chirp AudioBook Download Script

This script eases the process of downloading the audio files from Chirp Audiobooks.
It uses the browser's console to collect the URL of each track, and then generates a list of wget commands to download them.
Tested with Firefox + Terminal on macOS, and Firefox + PowerShell on Windows 10.
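
To illustrate the kind of command list the browser script produces, here is a minimal Python sketch of the same idea. It is not part of the tool: the function name `build_wget_commands` and the example URLs are hypothetical, and the real commands are generated by script.js in the browser.

```python
def build_wget_commands(title, tracks):
    """Return one wget command per (chapter, url) pair, numbered with zero padding."""
    # Pad track numbers to the width of the highest number so files sort correctly
    pad = len(str(len(tracks)))
    commands = []
    for number, (chapter, url) in enumerate(tracks, start=1):
        track_num = str(number).zfill(pad)
        commands.append(f'wget -O "{title} - {track_num} - {chapter}.m4a" "{url}"')
    return commands


if __name__ == "__main__":
    # Placeholder data; real URLs come from the Chirp web player
    demo_tracks = [
        ("Opening Credits", "https://example.invalid/track-1"),
        ("Chapter 1", "https://example.invalid/track-2"),
    ]
    for command in build_wget_commands("Example Book", demo_tracks):
        print(command)
```

The zero padding matters later: the merge script sorts the file names alphabetically, so "02" has to sort before "10".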

As an aside, I want to give a shout-out to Libro.fm for providing a simple download button for each purchase. Then you don't need a script like this!

Instructions


1. Find the book in your Chirp Library.
   - If you've already listened to it, you may need to move it back from your Archive.
2. Click the book to open Chirp's web player.
3. Open the browser's Web Developer Tools.
4. Copy-paste the script.js contents into the console and press [enter].
5. Initiate the script:
   - If the book is already at the start, click Play (▶).
   - If the book is on any other track, open the Chapters menu (top left) and select the first Track.
6. Wait while the script advances through each track; it's saving the URLs in the background.
   - It may say "There was an error loading your audiobook, please reload the page." under the Play button. Ignore this.
   - It may also show a number of URLs in red in the console, along with a warning after each one. Ignore these as well.
7. When it reaches the final track, the script will show a list of commands on the screen in a white box.
   - Click once to highlight the complete list.
   - Copy-paste it to a command line (Terminal, PowerShell, etc.) and press [enter] to execute it.
     - Some command lines will begin executing immediately, but you still need to press [enter] to execute the final command.
   - (The commands are also printed to the browser console, but I've found that it can sometimes collapse longer lists, making it difficult to copy exactly what you want.)
8. Once the commands finish, you should have a new folder with a cover image and each of the tracks as .m4a files.
   - On macOS, type open . and press [enter] to view the files.
   - On Windows, type explorer . and press [enter] to view the files.
9. Check the file size of each track:
   - If any are 0 bytes, the download URL may have expired.
     - In that case, go through the process again, but in step 7, first paste the commands into a text editor and delete everything except for the ones to download the 0-byte files.
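
The file-size check in the last step can also be done with a few lines of Python rather than by eye. This is a small sketch and not part of the tool; the function name `find_empty_tracks` is made up for illustration.

```python
from pathlib import Path


def find_empty_tracks(folder):
    """Return the sorted names of .m4a files in `folder` that are 0 bytes long."""
    return sorted(
        path.name
        for path in Path(folder).glob("*.m4a")
        if path.is_file() and path.stat().st_size == 0
    )
```

Calling `find_empty_tracks(".")` from inside the download folder lists the tracks whose wget commands need to be run again.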


Merging m4a files

After completing the steps above, you will have one audio file per track. If you prefer to have a single m4a or m4b file for each title, you can use main.py to create it. The script embeds the cover image and chapter markers into the resulting file.
ONLY TESTED ON WINDOWS
Prerequisites:

ffmpeg and ffprobe - https://ffmpeg.org/
Python 3 - https://www.python.org/
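
Before running main.py, it can help to confirm that both command-line tools are reachable. The sketch below uses the standard library's `shutil.which`; the helper name `missing_tools` is hypothetical and main.py itself performs no such check.

```python
import shutil


def missing_tools(names=("ffmpeg", "ffprobe")):
    """Return the tools from `names` that cannot be found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("ffmpeg and ffprobe are available")
```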


Drag the folder containing your collection of m4a files onto the Python script. On the command line, use python main.py foldername instead.
Wait. The script will build some intermediate files and log its progress. ffmpeg needs to repack the files into a single stream, which can take 10 minutes.
The script will create a file called {foldername}.m4a.
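
The chapter markers come from the duration of each track: every chapter starts where the previous one ends. The following sketch shows that arithmetic in isolation, producing [CHAPTER] blocks in ffmpeg's metadata file format. The function name `chapter_blocks` is hypothetical. It is a simplification of main.py, which additionally leaves a one-microsecond gap between consecutive chapters.

```python
def chapter_blocks(durations_us, titles):
    """Build ffmetadata [CHAPTER] blocks from per-file durations in microseconds."""
    blocks = []
    start = 0
    for duration, title in zip(durations_us, titles):
        end = start + duration
        blocks.append(
            "[CHAPTER]\n"
            "TIMEBASE=1/1000000\n"
            f"START={start}\n"
            f"END={end}\n"
            f"title={title}\n"
        )
        start = end  # the next chapter begins where this one ends
    return blocks


if __name__ == "__main__":
    # Two made-up tracks of 1.0 s and 2.5 s
    print("\n".join(chapter_blocks([1_000_000, 2_500_000], ["One", "Two"])))
```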

Enjoy!

  
## main.py
import subprocess
import os
import sys
import re

def parse_book_info(title):
  """
  Parses a book title string and returns a dictionary with title, author, and narrator.

  Args:
    title: The book title string.

  Returns:
    A dictionary with keys "title", "author", and "narrator", containing the extracted information.
  """

  # Regular expression matching "<title> - Written by <author> - Narrated by <narrator>"
  pattern = r"^(.*)- Written by (.*) - Narrated by (.*)$"

  # Extract information using regex
  match = re.match(pattern, title)

  # Check if any information was found
  if not match:
    raise ValueError(f"Could not parse book information from title: '{title}'")

  # Extract and return information
  title = match.group(1).strip()
  author = match.group(2) if match.group(2) else None
  narrator = match.group(3) if match.group(3) else None

  return {"title": title, "author": author, "narrator": narrator}


title = ""
def update_title_with_album(filepath):
  """
  Updates an ffmetadata file so that its "title=" line carries the value of its "album=" line.
  If either line is missing, appends title, author, artist, and album_artist lines
  parsed from the folder name stored in the global `title` instead.

  Args:
    filepath: The path to the file to modify.

  Raises:
    FileNotFoundError: If the file cannot be found.
    ValueError: If a line is missing and the folder name cannot be parsed.
  """

  with open(filepath, "r") as file:
    lines = file.readlines()

  # Find the line indexes
  title_line_index = None
  album_line_index = None
  for i, line in enumerate(lines):
    if line.startswith("title="):
      title_line_index = i
    elif line.startswith("album="):
      album_line_index = i

  # Check if both lines were found
  if title_line_index is None or album_line_index is None:
    print(f"File '{filepath}' does not contain both required lines ('title=' and 'album=')")
    folderMeta = parse_book_info(title)
    lines.append(f"title={folderMeta['title']}\n")
    lines.append(f"author={folderMeta['author']}\n")
    lines.append(f"artist={folderMeta['author']}; {folderMeta['narrator']}\n")
    lines.append(f"album_artist=Narrated by {folderMeta['narrator']}\n")
  else:
    # Extract the title and album values
    album_value = lines[album_line_index][len("album="):]
    # Update the title line
    new_title_line = f"title={album_value}"
    # Replace the old title line with the updated one
    lines[title_line_index] = new_title_line

  # Save the changes to the file
  with open(filepath, "w") as file:
    file.writelines(lines)

def get_chapter_title(filepath):
  """Returns the title tag of an audio file, falling back to the chapter name in the file name."""
  # Pass the arguments as a list so paths containing spaces need no manual quoting
  command = ["ffprobe", "-show_entries", "format_tags=title", "-v", "quiet", filepath]
  out = subprocess.run(command, capture_output=True)
  title_regex = r"^TAG:title=(.*)$"
  for line in out.stdout.decode().splitlines():
    match = re.match(title_regex, line)
    if match:
      return match.group(1)
  # No title tag found: use the text between the last "- " and the ".m4a" extension
  start = "- "
  return filepath[filepath.rfind(start) + len(start):filepath.rfind(".m4a")]

def make_chapters_metadata(list_audio_files: list):
    print("Making metadata source file")

    chapters = {}
    count = 1
    for single_audio_files in list_audio_files:
        file_path = os.path.join(folder, single_audio_files)
        command = ["ffprobe", "-v", "quiet", "-of", "csv=p=0", "-show_entries", "format=duration", file_path]
        out = subprocess.run(command, capture_output=True)
        # ffprobe reports the duration in seconds; convert it to whole microseconds
        duration_in_microseconds = round(float(out.stdout.decode().strip()) * 1_000_000)
        title = get_chapter_title(file_path)
        chapters[f"{count:04d}"] = {"duration": duration_in_microseconds, "title": title}
        count = count+1

    chapters["0001"]["start"] = 0
    for n in range(1, len(chapters)):
        chapter = f"{n:04d}"
        next_chapter = f"{n + 1:04d}"
        chapters[chapter]["end"] = chapters[chapter]["start"] + chapters[chapter]["duration"]
        chapters[next_chapter]["start"] = chapters[chapter]["end"] + 1
    last_chapter = f"{len(chapters):04d}"
    chapters[last_chapter]["end"] = chapters[last_chapter]["start"] + chapters[last_chapter]["duration"]

    # Export the tags of the first track as the base of the combined metadata file
    metadatafile = os.path.join(folder, "combined.metadata.txt")
    command = ["ffmpeg", "-y", "-loglevel", "error", "-i", os.path.join(folder, list_audio_files[0]), "-f", "ffmetadata", metadatafile]
    subprocess.run(command, capture_output=True)
    update_title_with_album(metadatafile)

    with open(metadatafile, "a+") as m:
        for chapter in chapters:
            ch_meta = """
[CHAPTER]
TIMEBASE=1/1000000
START={}
END={}
title={}
""".format(chapters[chapter]["start"], chapters[chapter]["end"], chapters[chapter]["title"])
            m.writelines(ch_meta)
            print(ch_meta)


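def concatenate_all_to_one_with_chapters():
    filename = f'{title}.m4a'
    print(f"Concatenating chapters to {filename}")
    metadatafile = os.path.join(folder, "combined.metadata.txt")
    cover = os.path.join(folder, "cover.jpg")
    intermediate = os.path.join(folder, "i.m4a")
    # First pass: join all tracks into one stream and attach the combined metadata
    os.system(f'ffmpeg -hide_banner -y -f concat -safe 0 -i list_audio_files.txt -i "{metadatafile}" -map_metadata 1 "{intermediate}"')
    # Second pass: attach the cover image without re-encoding
    os.system(f'ffmpeg -i "{intermediate}" -i "{cover}" -c copy -disposition:v attached_pic "{filename}"')
    # os.remove takes the path itself, without the shell quoting used above
    os.remove(intermediate)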

if __name__ == '__main__':

    print(sys.argv)

    folder = sys.argv[1].replace('"','')
    title = os.path.split(folder)[-1].replace('"','')
    print(title)

    list_audio_files = [f for f in os.listdir(folder) if f.endswith(".m4a")]
    list_audio_files.sort()

    # Rebuild the ffmpeg concat list from scratch on every run
    with open("list_audio_files.txt", "w") as f:
        for filename_audio_files in list_audio_files:
            f.write(f"file '{os.path.join(folder, filename_audio_files)}'\n")

    make_chapters_metadata(list_audio_files)
    concatenate_all_to_one_with_chapters()

## script.js
const $ = document.querySelector.bind(document);
function filename(name) {
  return name.replaceAll('&', 'and').replaceAll(':', ' -').replaceAll(/[^a-z0-9 ._-]+/ig, '');
}

const title = filename($('h1.book-title').textContent);
const credits = [].slice.call(document.querySelectorAll('.credit'))
	.map(n => filename(n.textContent))
	.join(' - ');
const dirname = `${title} - ${credits}`;
const commands = [
	`mkdir "${dirname}"`,
	`cd "${dirname}"`,
	`wget -O "cover.jpg" "${$('.cover-image').src }"`
];

const tracks = [];

let count = 0;
function addUrl(url) {
	count += 1;
	const chapter = filename($('div.chapter').textContent);
	tracks.push({
		count,
		chapter,
		url
	})
}

function showCommands() {
	const padSize = tracks.length.toString().length;

	tracks.forEach(({count, chapter, url}) => {
		let trackNum = count.toString().padStart(padSize, "0");
		commands.push(`wget -O "${title} - ${trackNum} - ${chapter}.m4a" "${url}"`);
	})
	commands.push(`cd ..`);
	console.log(commands.join('\n'))

	const div = document.createElement('div');
	div.innerHTML = '<div style="position: absolute; top: 100px; left: 100px; z-index: 100000; background: white; padding: 10px;"><p>Copy these commands to PowerShell/Terminal/etc:</p><textarea id="dl-commands" style="min-height:20em; min-width:30em"></textarea></div>';
	document.body.appendChild(div);
	const textarea = document.querySelector('#dl-commands');
	textarea.value = commands.join('\n');
	textarea.onfocus = function(){this.select()};
}

function next() {
	const btn = $('button.next-chapter')
	if (btn.disabled) {
		showCommands()
	} else {
		btn.click();
	}
}

const audio = $('audio');
Object.defineProperty(audio, "src", {
  get() {
    return '';
  },
  set(url) {
    setTimeout(() => {
      addUrl(url);
      next();
    }, 500);
  },
});