@kowalcj0
Created October 26, 2018 23:28
Extract links to music websites from Mastodon's outbox.json and download them with youtube-dl in parallel with covers and metadata
#! /usr/bin/python
"""Extract links to music websites from Mastodon's outbox.json.

outbox.json contains all of your toots.
"""
import json

from bs4 import BeautifulSoup as Soup


def extract_music_urls():
    with open("outbox.json") as f:
        j = json.load(f)
    prefixes = ("https://youtu.be", "https://youtube.com",
                "https://www.youtube.com", "https://soundcloud.com",
                "https://m.soundcloud.com", "https://vimeo.com")
    links = []
    for m in j["orderedItems"]:
        obj = m["object"]
        # "object" may be a plain string rather than a dict; skip those entries.
        if isinstance(obj, dict) and "content" in obj:
            html = Soup(obj["content"], 'html.parser')
            # href=True skips anchors that have no href attribute.
            hrefs = [a['href'] for a in html.find_all('a', href=True)
                     if a['href'].startswith(prefixes)]
            if hrefs:
                # Keep only the first matching link of each toot.
                links.append(hrefs[0])
    return sorted(links)


if __name__ == "__main__":
    urls = extract_music_urls()
    print("\n".join(urls))
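The gist description mentions downloading the extracted links with youtube-dl in parallel, with covers and metadata, but the page does not show that step. The following is a minimal sketch of one way to do it, not the author's actual command. It assumes youtube-dl is installed and on PATH; the helper names build_command and download_all, the worker count, and the choice of flags (--extract-audio, --audio-format mp3, --embed-thumbnail, --add-metadata) are assumptions made for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(url):
    """Return a youtube-dl argument list for one URL (flag choice is an assumption)."""
    return [
        "youtube-dl",
        "--extract-audio",        # keep only the audio track
        "--audio-format", "mp3",
        "--embed-thumbnail",      # embed the cover image
        "--add-metadata",         # write title, uploader, etc. as tags
        url,
    ]


def download_all(urls, workers=4, run=subprocess.call):
    """Run one youtube-dl process per URL, at most `workers` at a time.

    `run` is injectable so the function can be exercised without
    actually invoking youtube-dl. Returns (url, exit_code) pairs
    in input order.
    """
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        codes = list(pool.map(lambda u: run(build_command(u)), urls))
    return list(zip(urls, codes))
```

With the script above in the same module, download_all(extract_music_urls()) would fetch every extracted link; any pair with a non-zero exit code marks a download that youtube-dl reported as failed. Threads are enough here because each worker only waits on an external process.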
@kowalcj0
Author

To get rid of special characters in filenames, you can transliterate them to ASCII with iconv:
find . -type f -exec bash -c 'mv "$1" "${1%/*}/$(iconv -f UTF-8 -t ASCII//TRANSLIT <<< "${1##*/}")"' -- {} \;
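For anyone who would rather stay in Python, here is a rough sketch of a similar cleanup. It is not equivalent to iconv's //TRANSLIT: it uses Unicode NFKD decomposition and then drops every non-ASCII character, so letters without a decomposition (for example "ł" or "ß") are removed rather than approximated. The function names to_ascii and rename_to_ascii are hypothetical.

```python
import os
import unicodedata


def to_ascii(name):
    """Decompose accented characters and drop whatever is not ASCII."""
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def rename_to_ascii(root="."):
    """Rename files under `root` whose names contain non-ASCII characters."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            cleaned = to_ascii(filename)
            target = os.path.join(dirpath, cleaned)
            # Skip names that are unchanged, would become empty,
            # or would overwrite an existing file.
            if cleaned and cleaned != filename and not os.path.exists(target):
                os.rename(os.path.join(dirpath, filename), target)
```

Calling rename_to_ascii(".") in the download directory covers the same files as the find command above, and it leaves a file alone when the cleaned name is already taken.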
