@voltagex
Last active October 2, 2021 03:44
Podcast tracking

The seedy world of podcast tracking

Warning: this post contains bad shell scripts.

I recently wanted to listen to a single episode of a podcast without adding it to any extra software, so I grabbed the URL and went to play it in my browser. Then I looked at it again and noticed something strange: it started with https://pdst.fm/e/dts.podtrac.com/redirect.mp3/traffic.omny.fm/

So, while the podcast played in the background, I loaded it up in curl and fiddled with it a bit until I saw the following fly by as Location: headers.

Each hop sets its own cookies or query strings, presumably building up a profile of me. I'm sure I've got a few podcasts in my reader that share tracking networks or CDNs.
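
Since these prefix services nest the next hostname inside the URL path, you can read most of the chain straight off the URL before making a single request. A rough sketch, assuming (my heuristic, not anything these services document) that any path segment shaped like a dotted hostname and not ending in an audio extension is a hop. hosts_in_url is a made-up name:

```shell
# List the hostnames embedded in a tracking-prefixed enclosure URL, one per line.
# Heuristic: keep path segments that look like dotted hostnames,
# then drop anything ending in a common audio file extension.
hosts_in_url() {
    printf '%s\n' "$1" |
        sed -e 's|^[a-z]*://||' -e 's|[?#].*$||' |   # strip scheme, query string, fragment
        tr '/' '\n' |                                # one path segment per line
        grep -E '^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$' |
        grep -viE '\.(mp3|m4a|mp4|ogg|aac|wav)$'
}

# Example:
#   hosts_in_url "https://pdst.fm/e/dts.podtrac.com/redirect.mp3/traffic.omny.fm/"
```

This only shows the hosts that are visible in the URL. Anything added by a later redirect (a CDN, say) still needs curl to find.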

I feel like we got browser adblocking pretty well sorted, then ignored the boom in podcasting and the associated growth in tracking and profiling.

Let's have a look at the most popular (?) podcasts as of writing (2021-07-07). Data is from https://www.podcastinsights.com/top-us-podcasts/ which seems reasonable, and fuck trying to get this directly from Apple.

Crime Junkie - https://feeds.megaphone.fm/ADL9840290619

Grabbing a random episode: https://pdst.fm/e/chtbl.com/track/95538/traffic.megaphone.fm/ADL2730811672.mp3

Oh look, there are some familiar hostnames! For this first one, I'll show every header, then for the rest I'll show only a few specific headers.

Note: yes, HEAD seems to work with all of these servers.

$ curl -sLI "https://pdst.fm/e/chtbl.com/track/95538/traffic.megaphone.fm/ADL2730811672.mp3"
HTTP/2 302
date: Wed, 07 Jul 2021 12:59:21 GMT
content-type: text/html; charset=utf-8
access-control-allow-origin: *
location: https://chtbl.com/track/95538/traffic.megaphone.fm/ADL2730811672.mp3
via: 1.1 google

HTTP/2 302
content-type: text/html; charset=utf-8
content-length: 0
location: https://traffic.megaphone.fm/ADL2730811672.mp3
date: Wed, 07 Jul 2021 12:59:21 GMT
server: nginx/1.17.10
set-cookie: _chtbl=9a4ef399afbc4e379197ce0b8d30301a; Domain=.chtbl.com; Path=/
access-control-allow-origin: *
x-cache: Miss from cloudfront
via: 1.1 4b476a371465f0fceccbeec1ccfb91df.cloudfront.net (CloudFront)
x-amz-cf-pop: MEL50-C1
x-amz-cf-id: 20BCu-rH0fFsQwpwyxR0eEFbe2f1dtwyrU7eBey0k3pFgjRW7uzvTg==

HTTP/2 302
server: openresty/1.15.8.1
date: Wed, 07 Jul 2021 12:59:22 GMT
content-type: text/html; charset=utf-8
location: https://dcs.megaphone.fm/ADL2730811672.mp3?key=82c01e1275f5a434fb935b8af8159815
access-control-allow-headers: Origin, Content-Type, Accept, Authorization, Token
access-control-allow-methods: GET, OPTIONS, POST
access-control-allow-origin: *
access-control-max-age: 604800
x-request-id: 6024ea29608fe770af403e63493e4600
strict-transport-security: max-age=15724800; includeSubDomains

HTTP/1.1 200 OK
Date: Wed, 07 Jul 2021 12:59:23 GMT
Connection: Keep-Alive
Cache-Control: public,no-cache,no-store
Content-Length: 44444249
Content-Type: audio/mpeg
Accept-Ranges: bytes
X-Megaphone-Payload: e56aa860-ba68-11eb-85cd-332e78587b48#18144167@92842dfa-4e1f-11e8-a369-d375b89d2420
X-Megaphone-Payload-2: #0#pre#1##,#0#pre#2##,e56aa860-ba68-11eb-85cd-332e78587b48#18144166#mid#1#17554009#eadd5aa8-db48-11eb-9ef5-0f161621db05,#0#mid#2##,#0#post#1##,#0#post#2##@92842dfa-4e1f-11e8-a369-d375b89d2420
Vary: Origin
X-HW: 1625662763.dop023.sy2.t,1625662763.cds010.sy2.shn,1625662763.dop023.sy2.t,1625662763.cds013.sy2.c

This is probably not a bad one in the scheme of things, only 3 redirects between you and the content you actually wanted (well, probably a minute of ads for things you don't want, then the content you wanted).
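
Counting hops by eye gets old quickly, so here is a small sketch that reads a saved `curl -sLI` dump on stdin. Both function names are made up, and redirect_hosts assumes every Location header is an absolute http(s) URL (a relative one would be printed unchanged):

```shell
# Count the 3xx responses in a `curl -sLI` header dump read from stdin.
count_redirects() {
    grep -cE '^HTTP/[0-9.]+ 3[0-9][0-9]'
}

# Print the hostname from each Location header, in the order the hops happened.
redirect_hosts() {
    tr -d '\r' |                      # curl prints headers with CRLF line endings
        grep -i '^location:' |
        sed -E 's|^[^:]*:[[:space:]]*https?://([^/?]+).*|\1|'
}

# Example:
#   curl -sLI "$url" > headers.txt
#   count_redirects < headers.txt
#   redirect_hosts < headers.txt
```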


Bonus rabbit hole: to get the RSS feed out of an Apple Podcast URL

echo "https://podcasts.apple.com/us/podcast/drama-queens/id1571792783?at=1000ltXT" | awk 'match($0,/\/(id)(.+)\?/,a){print a[2]}' | { read -r id; curl -s "https://itunes.apple.com/lookup?id=$id"; } | grep -oE 'feedUrl.+' | cut -d \" -f3

Simplifications welcome, email shellmisuse at voltagex.org
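
One possible simplification, offered as a sketch rather than the right answer: the three-argument match() in the one-liner above is a gawk extension, and the regex needs a "?" after the ID. Plain sed can do both steps. The function names are made up, and the JSON handling assumes the lookup response contains a single "feedUrl":"..." pair, which is what the grep above relies on too:

```shell
# Print the numeric ID from an Apple Podcasts URL (the digits after "/id").
apple_podcast_id() {
    printf '%s\n' "$1" | sed -nE 's|.*/id([0-9]+).*|\1|p'
}

# Print the feedUrl value from an iTunes lookup response read from stdin.
# This is pattern matching, not real JSON parsing.
feed_url_from_lookup() {
    tr -d '\n' | sed -nE 's|.*"feedUrl"[[:space:]]*:[[:space:]]*"([^"]+)".*|\1|p'
}

# Example (makes a network request):
#   id=$(apple_podcast_id "https://podcasts.apple.com/us/podcast/drama-queens/id1571792783?at=1000ltXT")
#   curl -s "https://itunes.apple.com/lookup?id=$id" | feed_url_from_lookup
```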


Drama Queens also uses Megaphone to host the RSS. I'm starting to see a pattern here.

curl -s https://feeds.megaphone.fm/HSW3082189583 | grep enclosure | cut -d \" -f2 | head -n1
curl -sLI "https://www.podtrac.com/pts/redirect.mp3/chtbl.com/track/5899E/traffic.megaphone.fm/HSW4664461431.mp3?updated=1625459697" | grep -iE -e "^location:" -e "HTTP" -e content-type: -e "Cookie:" -e "Server:"
HTTP/2 302
content-type: text/html; charset=utf-8
location: https://chtbl.com/track/5899E/traffic.megaphone.fm/HSW4664461431.mp3?updated=1625459697
server: Microsoft-IIS/10.0
HTTP/2 302
content-type: text/html; charset=utf-8
location: https://traffic.megaphone.fm/HSW4664461431.mp3?updated=1625459697
server: nginx/1.17.10
set-cookie: _chtbl=0775d36a68294e29bf82b6c4966ca69c; Domain=.chtbl.com; Path=/
HTTP/2 302
server: openresty/1.15.8.1
content-type: text/html; charset=utf-8
location: https://dcs.megaphone.fm/HSW4664461431.mp3?key=f5482eb516866e95bcc5d9120947f466
HTTP/1.1 200 OK
Content-Type: audio/mpeg
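
The grep enclosure | cut pipeline used above to pull the episode URL out of the feed only works when url is the first quoted attribute on the line. A slightly less fragile sketch follows. It is still not a real XML parser: it assumes each enclosure tag sits on one line and uses double quotes, and enclosure_urls is a made-up name:

```shell
# Print the url attribute of every <enclosure> element in an RSS feed read from stdin.
# XML entities such as &amp; are left as they appear in the feed.
enclosure_urls() {
    grep -oE '<enclosure[^>]*>' |
        sed -nE 's|.*[[:space:]]url="([^"]*)".*|\1|p'
}

# Example (makes a network request):
#   curl -s https://feeds.megaphone.fm/HSW3082189583 | enclosure_urls | head -n1
```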

SmartLess - at least this uses a different host for the RSS, feeds.simplecast.com, and their own CDN, cdn.simplecast.com, which happens to just be a domain in front of an S3 bucket! Nice and simple (for 2021).

HTTP/2 200
content-type: audio/mpeg
server: AmazonS3

(In this case, the feed URL itself doesn't even set cookies. I guess Amazon knows enough about you, anyway.)

The next podcast uses megaphone.fm which I am now bored of.

Morbid: A True Crime Podcast uses audioboom.com for the feed, but then the MP3 URL looks like

https://pdst.fm/e/chtbl.com/track/4E942/audioboom.com/posts/7898515.mp3?modified=1625434704&source=rss&stitched=1

I guess AudioBoom doesn't mind giving data to the competition (?)

HTTP/2 302
content-type: text/html; charset=utf-8
location: https://chtbl.com/track/4E942/audioboom.com/posts/7898515.mp3?modified=1625434704&source=rss&stitched=1
HTTP/2 302
content-type: text/html; charset=utf-8
location: https://audioboom.com/posts/7898515.mp3?modified=1625434704&source=rss&stitched=1
server: nginx/1.17.10
set-cookie: _chtbl=17735fdae968495f8fad1070791f65cf; Domain=.chtbl.com; Path=/
HTTP/2 302
content-type: text/html; charset=utf-8
x-content-type-options: nosniff
location: https://api.spreaker.com/v2/relay/audioboom/US_Morbid_A_True_Crime_Podcast/4997220/7898515.mp3?a=&b=IAB1-8%2CIAB1-9%2CIAB17-14%2CIAB17-18%2CIAB11%2CIAB12%2CIAB19%2CIAB23%2CIAB7-27&c=comedy&content_explicit=true&d=2021-07-04&e=one_min%3D962063%26pl%3Drss%26sub%3Dex&f=2063&h=%2Fattachments%2F38198041%2Flistener-tales-30.mp3&i=7898515-38198041&n=Morbid%3A+A+True+Crime+Podcast&o=66211523&p=1&pl=rss&r=128&s=sponsorship&t=2
set-cookie: test_cookie=1; path=/; expires=Sun, 07 Jul 2041 13:56:47 GMT; SameSite=Lax; secure
expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
report-to: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v2?s=niBMjFnWCUpNwa5oZNGYGyWfLInATr1bBOl58%2BGpx%2BB1MNx%2BeWUI%2BfHHb699LfIHsK%2F7IdNAFwaxi8mPSkN0JJMoqLwROwUkoDSskZykRj3P06da6msX%2FSLFkeDOq4KsbRjTKJVx"}],"group":"cf-nel","max_age":604800}
server: cloudflare
HTTP/2 302
content-type: text/html; charset=utf-8
location: https://api.spreaker.com/v2/relay/audioboom/US_Morbid_A_True_Crime_Podcast/4997220/7898515.mp3?a=&b=IAB1-8%2CIAB1-9%2CIAB17-14%2CIAB17-18%2CIAB11%2CIAB12%2CIAB19%2CIAB23%2CIAB7-27&c=comedy&content_explicit=true&d=2021-07-04&e=one_min%3D962063%26pl%3Drss%26sub%3Dex&f=2063&h=%2Fattachments%2F38198041%2Flistener-tales-30.mp3&i=7898515-38198041&n=Morbid%3A+A+True+Crime+Podcast&o=66211523&p=1&pl=rss&r=128&s=sponsorship&t=2&sp_uuid=checked
server: Spreaker Proxy Cache
access-control-allow-headers: Authorization, Content-Type
set-cookie: spreaker_cid=b795df0a-589c-401d-ab5f-9f3bcdad1cf8; expires=Fri, 06-Aug-2021 13:56:48 GMT; Max-Age=2592000; path=/; domain=api.spreaker.com; HttpOnly
HTTP/2 302
content-type: text/html; charset=utf-8
location: https://dm4p36fbs3hl0.cloudfront.net/attachments/38198041/listener-tales-30.mp3?tenant=AUDIOBOOM&user_id=US_Morbid_A_True_Crime_Podcast&show_id=4997220&episode_id=7898515&response-content-disposition=attachment%3Bfilename%3D%22listener-tales-30.mp3%22&timestamp=1625666208&media_type=static&metadata=one_min%3D962063%26pl%3Drss%26sub%3Dex&Expires=1625752608&Signature=EA6zT2ZMpDp03YC4wdBgC4xaQ3jb4xIc5gTSKw5WcezW5fvHpNFGBwK73O18rUcDadcr4zgzpwyGil0x0jKOFsR2l0tqvZhCbLf%7E5gxsJeHbIke9Jp5p8DuIHuhV3hAWQJGkyV3vchXDPfDKh46%7EbuNdgAiLbtV2kom4OIWX%7Eg3456CYZA4Wamad84ipyG4MW7AMvQysFseUuoUn1YkgSQ89%7EHqwwRLyjtLExnmyp3pREO1iyjZ7ZUdrWO-nsr3roGuFT915N3ZXBM7i%7Enh4xf%7EZr6qx9-D%7ESNxCYfJY9blGcPbqSgOHEyBmDDymtWTbppSXa6Wy3Zp6Cq7gzYfVbQ__&Key-Pair-Id=APKAINDIVJ7TLFUAJI3A
server: Spreaker Proxy Cache
access-control-allow-headers: Authorization, Content-Type
HTTP/2 200
content-type: audio/mpeg
server: AmazonS3

All that just to get to a CloudFront CDN. What's weirder is that Spreaker can generate an RSS feed directly (https://help.spreaker.com/en/articles/4258360-where-can-i-find-my-rss-feed), but perhaps AudioBoom is only using some features of Spreaker.
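
With a chain this long it helps to see which hops actually hand out cookies. A minimal sketch that reads a `curl -sLI` dump on stdin and prints the cookie name next to the number of the response that set it. cookies_per_hop is a made-up name, and it assumes the usual "Set-Cookie: name=value; ..." shape:

```shell
# For each Set-Cookie header in a `curl -sLI` dump, print which response
# (counting status lines from 1) set it, and the name of the cookie.
cookies_per_hop() {
    tr -d '\r' | awk '
        /^HTTP\// { hop++ }                 # every status line starts a new response
        tolower($1) == "set-cookie:" {
            split($2, kv, "=")              # $2 looks like name=value;
            print "hop " hop ": " kv[1]
        }
    '
}

# Example:
#   curl -sLI "$url" | cookies_per_hop
```

Run against the Morbid chain above, this would list _chtbl, test_cookie and spreaker_cid, each against a different hop.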

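# Print the RSS feed URL for an Apple Podcasts page URL.
# The three-argument match() is a gawk extension; the regex needs a "?" after the ID.
get_itunes ()
{
    echo "$1" | awk 'match($0,/(\/id)(.+)\?/,a){print a[2]}' | {
        read -r id;
        curl -s "https://itunes.apple.com/lookup?id=$id"
    } | grep -oE 'feedUrl.+' | cut -d \" -f3
}

# Follow redirects with HEAD requests and show only the interesting headers.
check_podcast ()
{
    curl -sLI "$1" | grep -iE -e "^location:" -e "HTTP" -e content-type -e "Cookie:" -e "Server:"
}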