Skip to content

Instantly share code, notes, and snippets.

@cyber-murmel
Last active October 20, 2018 08:30
Show Gist options
  • Save cyber-murmel/22509287501a0d05310204a28f78dfe7 to your computer and use it in GitHub Desktop.
Save cyber-murmel/22509287501a0d05310204a28f78dfe7 to your computer and use it in GitHub Desktop.
Using bash and youtube-dl to scrape bandcamp

Usecase

Let's say there is a band on bandcamp you like and you would like to listne to their music even if zou are online. You may already know youtube-dl, but it's tedeous to download all the albums individually.

Code

First let's define a function that returns all the links to every album of one band.

bandcamp_album_links () {
        BAND=$(echo $1 | grep -Po 'https?://\w+\.bandcamp\.com')
        for ALBUM in $(curl -s $BAND/music | grep -Po '/album/[\w-]+')
        do
                echo $BAND$ALBUM
        done
}

The first line of the function uses a regex to strip the link down to the band link. So something like https://coolband.bandcamp.com/track/cooltrackname or https://coolband.bandcamp.com/album/coolepname becomes https://coolband.bandcamp.com/.

The second line curls the website containing all album links of the band and filters for them. The body of the for loop then concats the band link and the album part.

You can put the function into your .bashrc (or in my case the .zshrc).

Downloading

We can now use the function to download all the band's albums.

mkdir -p Music/coolband
for link in $(bandcamp_album_links 'https://gusgusiceland.bandcamp.com/')
do
  echo $link
  youtube-dl --download-archive .archive.txt --format bestaudio --output '%(playlist_title)s/%(playlist_index)s - %(title)s.%(ext)s' "$link"
done

youtube-dl flags explained

--download-archive .archive.txt

The .archive.txt will be a hidden file listing all the files that already have been downloaded. This is useful in case the band releases a new album. You can simply run the whole command again and youtube-dl will know which files it has to download and which files are already there.

--format bestaudio

This tells youtube-dl to use the best audio format available.

--output '%(playlist_title)s/%(playlist_index)s - %(title)s.%(ext)s'

This is the output path format string. It makes youtube-dl create a directory for evey album and prefix each file with a playlist index, so that the files can be sorted in the way the creator wanted them to be sorted (e.g. some albums have smooth transistions between tracks)

@trilader
Copy link

It should be BAND=$(echo $1 | grep -Po 'https?://[\w-]+\.bandcamp\.com') instead as - can be in the artist name part of the domain and without it fails silently if that's the case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment