#!/usr/bin/env bash
# Download a podcast episode from anchor.fm
#
# Usage:
# grab-anchor-episode "https://anchor.fm/emerge/episodes/Robert-MacNaughton---Learnings-from-the-Life-and-Death-of-the-Integral-Center-e31val" # (m4a example)
# grab-anchor-episode "https://anchor.fm/free-chapel/episodes/Are-You-Still-In-Love-With-Praise--Pastor-Jentezen-Franklin-e19u4i8" # (mp3 example)
#
# anchor.fm serves a list of m4a or mp3 files that need to be concatenated with ffmpeg.
#
# For debugging, uncomment:
# set -o verbose
set -eu -o pipefail
url=$1
json=$(curl -sL "$url" | grep -P -o 'window.__STATE__ = .*' | cut -d ' ' -f 3- | sed -r 's/;$//g')
ymd=$(echo -E $json | jq -r '.episodePreview.publishOn' | cut -d 'T' -f 1)
extension=$((echo -E $json | jq -r '.[].episodeEnclosureUrl' | grep -F --max-count=1 :// | grep -oP '\.[0-9a-z]+$' | cut -d . -f 2) || echo m4a)
output_basename=$ymd-$(basename -- "$url").$extension
if [[ -f "$output_basename" ]]; then
    echo "$output_basename already exists; skipping download"
    exit
fi
temp_dir="$(mktemp -d)"
cd "$temp_dir"
audio_urls=$(echo -E $json | jq -r '.station.audios|map(.audioUrl)|.[]')
for i in $audio_urls; do
    output_file=$(basename -- "$i")
    wget "$i" -O "$output_file"
    echo "file '$output_file'" >> .copy_list
done
ffmpeg -f concat -safe 0 -i .copy_list -c copy "$output_basename"
cd -
mv "$temp_dir/$output_basename" ./
rm -rf "$temp_dir"
On a Mac, brew install grep and use ggrep in the script.
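Rather than editing the script by hand, the choice between grep and ggrep could be made at run time. The following is only a sketch: the function name pick_grep is hypothetical, and it assumes that a grep which accepts -P on a trivial input is good enough for the script's pattern.

```bash
# Print the name of a grep that understands -P (Perl regexes), or fail.
# pick_grep is a hypothetical helper, not part of the original script.
pick_grep() {
  local candidate
  for candidate in grep ggrep; do
    # The candidate must exist and must accept -P on a trivial input.
    if command -v "$candidate" >/dev/null 2>&1 &&
       printf 'x\n' | "$candidate" -qP 'x' 2>/dev/null; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}
```

In the script, one could then set GREP=$(pick_grep) near the top and call "$GREP" -P wherever grep -P appears.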
Recommend you add in:
extension="$(jq '.[].episodeEnclosureUrl' <<< "$json" | grep "://" | sed 's|.*\.\(.*\)\"|\1|g')"
if [[ ! "$extension" ]]; then
extension="mp4"
fi
after ymd and replace the subsequent line with:
output_basename=$ymd-$(basename -- "$url")."$extension"
This will allow for file extension detection in the event that there is a different filetype (like mp3). The script wasn't working for me in that case. It defaults to .mp4 here, but you could change the default.
This works for the podcast I was downloading: https://anchor.fm/free-chapel
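The same idea (take the extension from the enclosure URL and fall back to a default) can also be written with shell parameter expansion only. This is a sketch under assumptions: ext_from_url is a hypothetical helper, the m4a default is only a placeholder, and it treats whatever follows the last dot of the last path component as the extension.

```bash
# Print the file extension of a URL, or a default if none can be found.
# Usage: ext_from_url URL [DEFAULT]   (hypothetical helper; default is m4a)
ext_from_url() {
  local url name ext
  url=${1%%\?*}      # drop any query string
  name=${url##*/}    # keep the last path component
  ext=${name##*.}    # keep what follows the last dot
  if [ "$ext" != "$name" ] && [ -n "$ext" ]; then
    printf '%s\n' "$ext"
  else
    printf '%s\n' "${2:-m4a}"
  fi
}
```

For example, extension=$(ext_from_url "$enclosure_url" mp4) would reproduce the .mp4 default suggested above, where enclosure_url stands for the value extracted with jq.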
@shillshocked thanks! I just fixed the script to support mp3 based on your use of jq. Please let me know if there are issues.
Hello, thank you for sharing this.
It interests me, but I don't know how to use it. I downloaded the script, and I'm on Windows.
If I type: grab-anchor-episode "https.." it says it is not a command.
Can you help me?
@Nikubik Hi, I tested this only on Linux, though someone above reports success on macOS when the system grep is replaced.
On Windows, you might be able to make this work in Cygwin, though I have not tested it. Install jq and wget through Cygwin; install ffmpeg by other means, e.g. chocolatey.
You might need someone's assistance if you are not familiar with the command line. wget the raw gist URL and then remember to chmod +x the downloaded script. Rename with mv to grab-anchor-episode and use ./grab-anchor-episode "URL".
If you find an easier way to download things on anchor.fm, let us know here. Also note that anchor.fm content will typically be syndicated to podcasting platforms that serve proper, contiguous audio files, from where it may be easier to download them.
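Before running the script on a new system, it may help to confirm that the tools it calls are actually on the PATH. A minimal sketch, assuming a POSIX-like shell such as the one Cygwin provides; missing_tools is a hypothetical helper name.

```bash
# Print the name of every listed command that cannot be found on the PATH.
# missing_tools is a hypothetical helper, not part of the original script.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

Running missing_tools curl jq wget ffmpeg prints nothing when everything the script needs is installed, and otherwise lists what still has to be installed.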
Oh, thank you for your complete answer. I'll look into that tomorrow.
I can't think why it's complaining, but I get: .copy_list: No such file or directory
If there is no .copy_list, the issue is that it did not find any audio_urls.
You can add some debug prints, e.g. echo -E $json before audio_urls=$(echo -E $json | jq -r '.station.audios|map(.audioUrl)|.[]'), if you would like to investigate the JSON.
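A guard placed right after the audio_urls assignment would turn the confusing .copy_list error into a clearer message. This is only a sketch: require_audio_urls is a hypothetical helper, and the hint it prints (listing the top-level JSON keys with jq's keys filter) is just one possible starting point for debugging.

```bash
# Fail with a readable message when no audio URLs were extracted.
# require_audio_urls is a hypothetical helper, not part of the original script.
require_audio_urls() {
  if [ -z "$1" ]; then
    echo "No audio URLs found in the page JSON." >&2
    echo "To list its top-level keys, try: echo -E \"\$json\" | jq keys" >&2
    return 1
  fi
}
```

In the script it would be called as require_audio_urls "$audio_urls" before the download loop; because the script runs under set -e, a failure here stops the run early.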
Which URL causes that?
I have a 2 step hybrid solution on Mac that I think is a bit easier; it only requires ggrep (via brew install grep).
From terminal, insert your anchor URL into the following code and run: curl -sL "<insert-url-here>" | ggrep -P -o 'window.__STATE__ = .*' | cut -d ' ' -f 3- | sed -r 's/;$//g'
Cmd-F the printed output for "episodeEnclosureUrl": and copy the string that follows it (e.g. "https: ...").
Replace any \u002F in that string with /, and paste the resultant URL into your web browser. Then click the 3 dots and click download!
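The manual replacement of \u002F can also be done in the terminal. A small sketch, assuming the escape always appears exactly as \u002F with an uppercase F, as in the steps above; unescape_slashes is a hypothetical helper name.

```bash
# Replace every literal \u002F in the argument with a forward slash.
# unescape_slashes is a hypothetical helper for the copied URL string.
unescape_slashes() {
  printf '%s\n' "$1" | sed 's|\\u002F|/|g'
}
```

Pass the copied string in single quotes so the shell leaves the backslashes alone, e.g. unescape_slashes 'https:\u002F\u002F...'.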
The whole point of the script is to deal with anchor.fm's multi-file serving: for many podcasts, anchor publishes audio as multiple files that need to be concatenated. I believe the segments are split up as they were originally edited using their software.
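For readers unfamiliar with the concatenation step: the script feeds ffmpeg a list file with one file '...' line per segment, in playback order. The sketch below only builds such a list; write_concat_list is a hypothetical helper, and it assumes the file names contain no single quotes.

```bash
# Write a list file for ffmpeg's concat demuxer: one "file '<name>'" line
# per segment, in the order given.
# write_concat_list is a hypothetical helper; names must not contain
# single quotes.
write_concat_list() {
  local list f
  list=$1
  shift
  : > "$list"    # start from an empty list file
  for f in "$@"; do
    printf "file '%s'\n" "$f" >> "$list"
  done
}
```

The resulting list is what the script passes to ffmpeg -f concat -safe 0 -i .copy_list -c copy "$output_basename".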
Great point. I guess my solution is only useful for single-file podcasts from anchor.fm
Thank you so much for this!
In my case, I'm trying to download all episodes of a certain podcast. After poking around this one for a bit, I found that the JSON returned by the curl request contains URLs for other episodes. (Maybe all of them? It seems like it was in my case.)
For those looking to do the same, here is a helper script that works with this one
./grab-all-anchor-episodes.sh
#!/bin/bash
# Downloads all(?) episodes from a podcaster
# Usage:
# ./grab-all-anchor-episodes.sh "https://anchor.fm/emerge/episodes/Robert-MacNaughton---Learnings-from-the-Life-and-Death-of-the-Integral-Center-e31val"
#
# Must be run from same directory as ./grab-anchor-episode.sh
# The page for one episode seems to contain information about other episodes too.
# Writes the JSON to a file in /tmp, iterates through each 'shareLinkPath',
# and writes the resulting URLs to /tmp/urlList
#
# Runs ./grab-anchor-episode.sh for each URL in list
#
url=$1
echo "$url"
json=$(curl -sL "$url" | grep -P -o 'window.__STATE__ = .*' | cut -d ' ' -f 3- | sed -r 's/;$//g')
echo -E "$json" > /tmp/json
python3 - <<'END'
import json

# Load the page state that the shell code above saved to /tmp/json
with open("/tmp/json", "r") as data:
    state = json.load(data)

# Append one episode URL per line to /tmp/urlList
with open("/tmp/urlList", "a") as url_list:
    for episode in state['episodePreview']['episodes']:
        url_list.write("https://anchor.fm%s\n" % episode['shareLinkPath'])
END
urlList=$(cat /tmp/urlList)
for url in $urlList
do
    ./grab-anchor-episode.sh "$url"
done
# cleanup
rm /tmp/json
rm /tmp/urlList
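Since the main script already depends on jq, the Python step could probably be replaced by a jq filter plus a small shell function. This is an untested sketch: the filter .episodePreview.episodes[].shareLinkPath is inferred from the Python code above, and paths_to_urls is a hypothetical helper.

```bash
# Read share-link paths on stdin (one per line) and print full episode URLs.
# paths_to_urls is a hypothetical helper; the base URL defaults to anchor.fm.
# Intended input (assumption, inferred from the Python code above):
#   echo -E "$json" | jq -r '.episodePreview.episodes[].shareLinkPath'
paths_to_urls() {
  local base path
  base=${1:-https://anchor.fm}
  while IFS= read -r path; do
    [ -n "$path" ] || continue    # skip empty lines
    printf '%s%s\n' "$base" "$path"
  done
}
```

Piping the jq output through paths_to_urls would yield the same list that the Python code writes to /tmp/urlList, without the temporary files.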
@Potatrix Did the script actually produce incorrect audio files? If it needs to be fixed, it would really help to have the URL for testing.
This seems to have stopped working at some point. I have the latest version, and anything I try to download, even the provided examples, just results in .copy_list: No such file or directory.
The command I used: bash grab-anchor-episode.sh "https://anchor.fm/emerge/episodes/Robert-MacNaughton---Learnings-fr om-the-Life-and-Death-of-the-Integral-Center-e31val"
With the help of a friend, I've managed to modify the script so that it works again (at least for my purposes). I've forked it here: https://gist.github.com/viocar/a6b6a0f485b3f400b8bcb0f8334b454d
@ivan the script downloaded the audio files fine. Sometimes it didn't convert to mp3, but that wasn't really an issue for me. I had a task to download all of the recordings for an Anchor podcast I manage and needed a quick way to download all of them, which is why I made the modification.
@viocar I notice a space in your URL but I assume it wasn't like this when you tried to run the script?
No, I tried several URLs that I copied directly from my browser. I'm not sure why there's a space in my post.
Could anyone please suggest script code for tracking anchor.fm podcast audio in Tag Manager tools?
Where can I change the output location? Sorry, I am new to Linux.
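On the output location: the script's last step is mv "$temp_dir/$output_basename" ./, which puts the finished file in the directory the script was started from, so the simplest option is to cd to the desired directory before running it. Alternatively, that mv line could be replaced with something like the sketch below; move_to_output_dir is a hypothetical helper, not part of the script.

```bash
# Move a finished file into an output directory, creating it if needed.
# Usage: move_to_output_dir FILE [DIR]   (hypothetical helper; DIR defaults to .)
move_to_output_dir() {
  local file dir
  file=$1
  dir=${2:-.}
  mkdir -p -- "$dir"
  mv -- "$file" "$dir/"
}
```

For example, move_to_output_dir "$temp_dir/$output_basename" "$HOME/podcasts" would collect the downloads in ~/podcasts. Note that the script's "already exists" check looks in the current directory, so it would need the same path to keep working.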
Thanks, man!