0xdevalias/music-apis-and-dbs.md

## music-apis-and-dbs.md

      
    Raw
  

              music-apis-and-dbs.md
            
          
    Music APIs and DBs

A collection of music APIs, databases, and related tools.
Table of Contents


Spotify for Developers
Audio Identification
Lyrics
Conferences, Journals, Research Papers, etc
Unsorted
Questions
See Also

My Other Related Deepdive Gist's and Projects


Spotify for Developers


https://developer.spotify.com/


Build with Spotify’s 100 million songs,
5 million podcasts and much more


https://developer.spotify.com/community


Community
Read the latest posts from the developer community, talk with fellow developers on our forum and check out our latest videos and events. Share your ideas and feedback or show off your projects. Get involved!


https://developer.spotify.com/documentation/web-api


Web API
Retrieve metadata from Spotify content, control playback or get recommendations
Spotify Web API enables the creation of applications that can interact with Spotify's streaming service, such as retrieving content metadata, getting recommendations, creating and managing playlists, or controlling playback.


https://developer.spotify.com/blog/2023-07-03-typescript-sdk


Introducing the TypeScript SDK for the Spotify Web API


We are excited to announce the launch of the Spotify TypeScript SDK for the Spotify Web API. This SDK makes it easy to build web applications that integrate with Spotify, and it is now available on GitHub and npm and can be used wherever you can run JavaScript.
The TypeScript SDK is a comprehensive library that provides access to all of the features of the Spotify Web API. It is easy to use, and it can be used to build a wide variety of web applications with data from Spotify; such as tracks, playlists, artists and more.
The TypeScript SDK is open source, and it is licensed under the Apache License. This means that you can use it in your own projects, and you can contribute to its development.


https://github.com/spotify/spotify-web-api-ts-sdk


Spotify Web API SDK - TypeScript


A Typescript SDK for the Spotify Web API with types for returned data.


https://developer.spotify.com/documentation/web-api/concepts/spotify-uris-ids


Spotify URIs and IDs
In requests to the Web API and responses from it, you will frequently encounter the following parameters


https://developer.spotify.com/documentation/web-api/reference/get-recommendations


Get Recommendations
Recommendations are generated based on the available information for a given seed entity and matched against similar artists and tracks. If there is sufficient information about the provided seeds, a list of tracks will be returned together with pool size details.
For artists and tracks that are very new or obscure there might not be enough data to generate a list of tracks.


https://developer.spotify.com/documentation/web-api/reference/get-recommendation-genres


Get Available Genre Seeds
Retrieve a list of available genres seed parameter values for recommendations.


https://developer.spotify.com/documentation/web-api/reference/search


Search for Item
Get Spotify catalog information about albums, artists, playlists, tracks, shows, episodes or audiobooks that match a keyword string. Audiobooks are only available within the US, UK, Canada, Ireland, New Zealand and Australia markets.


https://developer.spotify.com/documentation/web-api/reference/add-tracks-to-playlist


Add Items to Playlist
Add one or more items to a user's playlist.


https://developer.spotify.com/documentation/web-api/reference/get-audio-features


Get Track's Audio Features
Get audio feature information for a single track identified by its unique Spotify ID.


https://developer.spotify.com/documentation/web-api/reference/get-audio-analysis


Get Track's Audio Analysis
Get a low-level audio analysis for a track in the Spotify catalog. The audio analysis describes the track’s structure and musical content, including rhythm, pitch, and timbre.


https://developer.spotify.com/documentation/web-api/reference/get-an-artist


Get Artist
Get Spotify catalog information for a single artist identified by their unique Spotify ID.


https://developer.spotify.com/documentation/web-api/reference/get-an-artists-top-tracks


Get Artist's Top Tracks
Get Spotify catalog information about an artist's top tracks by country.


https://developer.spotify.com/documentation/web-api/reference/get-an-artists-related-artists


Get Artist's Related Artists
Get Spotify catalog information about artists similar to a given artist. Similarity is based on analysis of the Spotify community's listening history.


https://developer.spotify.com/documentation/web-api/reference/get-available-markets


Get Available Markets
Get the list of markets where Spotify is available.


https://developer.spotify.com/documentation/web-api/reference/get-a-category


Get Single Browse Category
Get a single category used to tag items in Spotify (on, for example, the Spotify player’s “Browse” tab).


https://developer.spotify.com/documentation/web-playback-sdk


Web Playback SDK
Create a new player and stream Spotify content inside your website application.
The Web Playback SDK is a client-side only JavaScript library designed to create a local Spotify Connect device in your browser, stream and control audio tracks from Spotify inside your website and get metadata about the current playback.


https://developer.spotify.com/documentation/embeds


Embeds
Add audio to your own website.
Embeds help you share interactive content from Spotify on a website that you control. Using an Embed, you can add a podcast, an album, or other audio content to your website that your users can listen to. It's a great way to promote your music, share your new podcast episodes with fans, or highlight your favorite album or playlist.


https://medium.com/swlh/creating-waveforms-out-of-spotify-tracks-b22030dd442b


Creating Waveforms Out of Spotify Tracks
Using the Spotify Platform’s Audio Analysis Endpoint


https://gist.github.com/leemartin/ba839f48a353da1c9eadfa1309b85bc5#file-spotify-waveform-js


Spotify Waveform Data Generation from Audio Analysis API


https://codepen.io/leemartin/pen/GRqgPEe


Spotify Waveform


Audio Identification


https://www.whosampled.com/


Discover Music through Samples, Cover Songs and Remixes
Dig deeper into music by discovering direct connections among over 1,025,000 songs and 316,000 artists, from Hip-Hop, Rap and R&B via Electronic / Dance through to Rock, Pop, Soul, Funk, Reggae, Jazz, Classical and beyond.
WhoSampled's verified content is built by a community of over 33,000 contributors. Make contributions to earn Cred  - our very own points system.


https://alternativeto.net/software/whosampled/


WhoSampled Alternatives
Music Recognition Apps like WhoSampled


https://secondhandsongs.com/


SecondHandSongs is building the most comprehensive source of cover song information, by means of a database of originals, cover songs, sampled songs and sampling songs.
This advanced database stores the data in a reusable and maintainable way, and which is interconnected to many other online databases.


https://secondhandsongs.com/page/About


The Second Hand Songs project was founded in early 2003 by Bastien De Zutter, Mathieu De Zutter and Denis Monsieur. The additions to the database are the works of a group of fanatic cover song lovers, who devote their time on a voluntary basis.


https://secondhandsongs.com/participate

https://secondhandsongs.com/participate/submit
https://secondhandsongs.com/participate/own-cover-submissions


https://secondhandsongs.com/page/JoinUs


SecondHandSongs is always looking for new editors. Are you interested in working directly on the database? The fastest road is to editorship is to be active as a regular member


https://secondhandsongs.com/text-content/247


Menu / Editor Manual


https://secondhandsongs.com/page/API


The SecondHandSongs API (BETA)


The SecondHandSongs (SHS) API offers two functionalities:

Search (artists, performances and works)
Retrieve information of SHS objects (artists, performances, releases, ...)


The SHS data is licensed under the Creative Commons license CC BY-NC 3.0. It is free for any non-commercial use and must be attributed to "SecondHandSongs.com".


https://www.samples.fr/a-propos/


Samples.fr is a blog that lets you discover the music behind the hits of today and yesterday.
In the form of samples (a sample of one music, which is used to create another) or covers, various musical styles are discussed on these pages. Notice to fans of electronic music, hiphop, RnB, pop, rock, variety, disco, funk, rhythm n' blues, new wave and so on!


https://www.the-breaks.com/


www.the-breaks.com, AKA The (Rap) Sample FAQ


https://acoustid.org/


Welcome to AcoustID!
AcoustID is a project providing complete audio identification service, based entirely on open source software.
It consists of a client library for generating compact fingerprints from audio files, a large crowd-sourced database of audio fingerprints, many of which are linked to the MusicBrainz metadata database using their unique identifiers, and an web service that enables applications to quickly search in the fingerprint database.


https://acoustid.biz/


Acoustid
Audio identification services
Automatic music file tag correction. Music catalog reconciliation and cross-referencing. 100% open source.


At the core of AcoustID is an efficient algorithm for extracting audio fingerprints, called Chromaprint. The algorithm is optimized specifically for matching near-identical audio streams, which allows the audio fingerprints to be very compact and the extraction process to be fast. For example, it takes less than 100ms to process a two minute long audio file and the extracted audio fingerprint is just 2.5 KB of binary data.
AcoustID contains a large crowd-sourced database of such audio fingerprints together with additional information about them, such as the song title, artist or links to the MusicBrainz database. You can send an audio fingerprint to the AcoustID service and it will search the database and return you information about the song. We use a custom database for indexing the audio fingerprints to make the search very fast.
All of this is 100% open source and the database is available for download.


Pricing
The AcoustID service is free to use in non-commercial applications. If you want to use the service in a commercial product, please subscribe to one of the plans below. All plans come with a free trial. You are not charged for the first 10k searches. If you don't need more than that, you can use the service for free!


Also, if you are a single developer and the plans are too expensive for you, feel free to get in touch, explain your situation and I'm sure we can figure something out.


https://acoustid.org/webservice


Web Service
The AcoustID web service currently supports only two operations, searching in the fingerprint database and submitting new fingerprints into the database.


https://acoustid.org/database


Database
The AcoustID database includes user-submitted audio fingerprints, their mapping to MusicBrainz IDs and some supporting tables. It follows the structure of the PostgreSQL database used by the AcoustID server. Each table is exported in a separate file with the tab-separated text format used by the COPY command. At the moment, there are no tools for importing the database dump, it has to be done manually.


Monthly database dumps can be downloaded here


https://data.acoustid.org/


https://acoustid.org/chromaprint


Chromaprint
Chromaprint is the core component of the AcoustID project. It's a client-side library that implements a custom algorithm for extracting fingerprints from any audio source. Overview of the fingerprint extraction process can be found in the blog post "How does Chromaprint work?".


https://oxygene.sk/2011/01/how-does-chromaprint-work/


How does Chromaprint work?


Being primarily based on the Computer Vision for Music Identification paper, images play an important role in the algorithm.


A more useful representation is the spectrogram, which shows how does the intensity on specific frequencies changes over time


You can get this kind of image by splitting the original audio into many overlapping frames and applying the Fourier transform on them ("Short-time Fourier transform"). In the case of Chromaprint, the input audio is converted to the sampling rate 11025 Hz and the frame size is 4096 (0.371 s) with 2/3 overlap.


Many fingerprinting algorithms work with this kind of audio representation. Some are comparing differences across time and frequency, some are looking for peaks in the image, etc.


Chromaprint processes the information further by transforming frequencies into musical notes. We are only interested in notes, not octaves, so the result has 12 bins, one for each note. This information is called "chroma features". (I believe they were mentioned in the paper Audio Thumbnailing of Popular Music Using Chroma-Based Representations for the first time.)


http://dub.ucsd.edu/CATbox/Reader/ThumbnailingMM05.pdf


Audio Thumbnailing of Popular Music Using Chroma-Based Representations


Now we have a representation of the audio that is pretty robust to changes caused by lossy codecs or similar things and also it isn't very hard to compare such images to check how "similar" they are, but if we want to search for them in a database, we need a more compact form. The idea how to do it again comes from the Computer Vision for Music Identification paper with some modifications based on the Pairwise Boosted Audio Fingerprint paper. You can imagine having a 16x12 pixel large window and moving it over the image from the left to the right, one pixel at a time. This will generate a lot of small subimages. On each of them we apply a pre-defined set of 16 filters that capture intensity differences across musical notes and time. What the filters do is they calculate the sum of specific areas of the grayscale subimage and then compare the two sums. There are six possible ways to arrange the areas


https://ieeexplore.ieee.org/document/5312768


Pairwise Boosted Audio Fingerprint


You can basically take any of the six filter images, place it anywhere on the subimage and also make it as large as you want (as long as it fits the 16x12 pixel subimage). Then you calculate the sum of the black and white areas and subtract them. The result is a single real number. Every filter has three coefficients associated with it, that say how to quantize the real number, so that the final result is an integer between 0 and 3. These filters and coefficients were selected by a machine learning algorithm on a training data set of audio files during the development of the library.


There is 16 filters and each can produce an integer that can be encoded into 2 bits (using the Gray code), so if you combine all the results, you get a 32-bit integer. If you do this for every subimage generated by the sliding window, you get the full audio fingerprint.


You can use pyacoustid to interact with the library from Python. It provides a direct wrapper around the library, but also higher-level functions for generating fingerprints from audio files.


https://github.com/beetbox/pyacoustid


Python bindings for Chromaprint acoustic fingerprinting and the Acoustid Web service


Chromaprint and its associated Acoustid Web service make up a high-quality, open-source acoustic fingerprinting system. This package provides Python bindings for both the fingerprinting algorithm library, which is written in C but portable, and the Web service, which provides fingerprint lookups.


You can also use the fpcalc utility programatically. It can produce JSON output, which should be easy to parse in any language. This is the recommended way to use Chromaprint if all you need is generate fingerprints for AcoustID.


https://github.com/acoustid

https://github.com/acoustid/acoustid-index


AcoustID Index
Acoustid Index is a "number search engine". It's similar to text search engines, but instead of searching in documents that consist of words, it searches in documents that consist of 32-bit integers.
It's a simple inverted index data structure that doesn't do any kind of processing on the indexed documents. This is useful for searching in Chromaprint audio fingerprints, which are nothing more than 32-bit integer arrays.


Minimalistic search engine searching in audio fingerprints from Chromaprint


https://bitbucket.org/acoustid/workspace/repositories/
https://twitter.com/acoustid


https://acousticbrainz.org/


Between 2015 and 2022, AcousticBrainz helped to crowd source acoustic information from music recordings. This acoustic information describes the acoustic characteristics of music and includes low-level spectral information and information for genres, moods, keys, scales and much more.
AcousticBrainz was a joint effort between Music Technology Group at Universitat Pompeu Fabra in Barcelona and the MusicBrainz project. At the heart of this project lies the Essentia toolkit from the MTG -- this open source toolkit enables the automatic analysis of music. The output from Essentia is collected by the AcousticBrainz project and made available to the public.
In 2022, the decision was made to stop collecting data. For now, the website and its API will continue to be available.
AcousticBrainz organizes the data on a recording basis, indexed by the MusicBrainz ID for recordings. If you know the MBID for a recording, you can easily fetch from AcousticBrainz. For details on how to do this, visit our API documentation.
All of the data contained in AcousticBrainz is licensed under the CC0 license (public domain).


https://community.metabrainz.org/t/acousticbrainz-making-a-hard-decision-to-end-the-project/572828


AcousticBrainz: Making a hard decision to end the project


We’ve written a blog post outlining some of our reasons for shutting down the project, the final steps that we’re taking, and a few ideas about our future plans for recommendations and other things in the MetaBrainz world.


https://blog.metabrainz.org/2022/02/16/acousticbrainz-making-a-hard-decision-to-end-the-project/


AcousticBrainz: Making a hard decision to end the project
We created AcousticBrainz 7 years ago and started to collect data with the goal of using that data down the road once we had collected enough. We finally got around to doing this recenty, and realised that the data simply isn’t of high enough quality to be useful for much at all.
We spent quite a bit of time trying to brainstorm on how to remedy this, but all of the solutions we found require a significant amount of money for both new developers and new hardware. We lack the resources to commit to properly rebooting AcousticBrainz, so we’ve taken the hard decision to end the project.
Read on for an explanation of why we decided to do this, how we will do it, and what we’re planning to do in the future.


https://acousticbrainz.org/download


If you are interested in computing acoustic features on your own music, you can still download the command-line essentia extractor and run it yourself


2022-07-06: We provide downloadable archives of all submissions made to AcousticBrainz (29,460,584 submissions)


https://acousticbrainz.org/data


AcousticBrainz data

API Reference
Highlevel data and datasets
Sample data


https://acousticbrainz.org/datasets/list


Public datasets


https://beets.readthedocs.io/en/stable/plugins/acousticbrainz.html


The acousticbrainz plugin gets acoustic-analysis information from the AcousticBrainz project.


For all tracks with a MusicBrainz recording ID, the plugin currently sets these fields: average_loudness, bpm, chords_changes_rate, chords_key, chords_number_rate, chords_scale, danceable, gender, genre_rosamerica, initial_key, key_strength, mood_acoustic, mood_aggressive, mood_electronic, mood_happy, mood_party, mood_relaxed, mood_sad, moods_mirex, rhythm, timbre, tonal, voice_instrumental


https://metabrainz.org/datasets/derived-dumps


MetaBrainz Derived Dumps
On this page we describe several datasets with the term “canonical”. Since MusicBrainz aims to catalog all released music, the database contains a lot of different versions of releases or different versions of recordings. We find it important to collect all of these different versions, but in the end it is too much data for most of our users. Fortunately, it is easy to combine multiple pieces of well structured data into something that fits a user’s desired end-use.
However, sometimes it can be challenging to work out which of the many releases/recordings is the one that “most people will think of the most representative version”. Even defining what this means is incredibly difficult, but we’ve attempted to do just that and we’re using the results of this work in our production systems on ListenBrainz to map incoming listens to MusicBrainz entries.
When looking at the descriptions our datasets, please consider that “canonical” implies the most representative version. Each of our canonical datasets has a more detailed description of what “canonical” means in that given dataset.


https://blog.metabrainz.org/

https://blog.metabrainz.org/2023/08/28/gsoc-23-artist-similarity-graph/


GSoC ’23: Artist similarity graph


Discovering new pieces to add to your personal collection and play on repeat. This very idea is at the heart of the artist similarity graph project. The project helps the users to uncover the connections between artists with similar genres and styles. It does so by providing a search interface to the users, where they can find their favourite artist and then generate a graph of similar artists. An artist panel featuring information about the artist is also presented, it showcases artist’s name, type, birth, area, wiki, top track and album. Users can also play the tracks right on the page itself using BrainzPlayer.


A network graph with a central node of the selected artist and the links to the related artists is displayed. The artists are arranged based on their similarity score. Artists with higher scores being closer and lower being further. To convey the strength of relationships between the artists a divergent colour scheme is used. The user also has the ability to travel across the graph by clicking thorough the artists (nodes).


Technologies used:

nivo: For artist graph generation
React with Typescript: For web pages
Figma: Building mock ups and prototypes
Docker: To Containerize applications


https://github.com/plouc/nivo


nivo provides a rich set of dataviz components, built on top of the awesome d3 and React libraries


nivo provides supercharged React components to easily build dataviz apps, it's built on top of d3.
Several libraries already exist for React d3 integration, but just a few provide server side rendering ability and fully declarative charts.


https://nivo.rocks/


The first challenge was to normalize the data before using it could be used to generate a graph. Given the non linear nature of the data, a square root transformation was used to transform the data. The result is a linear set of data which can be appropriately used in a graph.


http://fmwww.bc.edu/repec/bocode/t/transint.html


The most useful transformations in introductory data analysis are the reciprocal, logarithm, cube root, square root, and square.


The square root, x to x^(1/2) = sqrt(x), is a transformation with a moderate effect on distribution shape: it is weaker than the logarithm and the cube root. It is also used for reducing right skewness, and also has the advantage that it can be applied to zero values. Note that the square root of an area has the units of a length. It is commonly applied to counted data, especially if the values are mostly rather small.


https://blog.metabrainz.org/2023/06/14/how-to-build-your-own-music-tagger-with-musicbrainz-canonical-metadata/


How to build your own music tagger, with MusicBrainz Canonical Metadata


In the blog post where we introduced the new Canonical Metadata dataset, we suggested that a user could now build their own custom music tagging application, without a lot of effort! In this blog post we will walk you through the process of doing just that, using Python.


Here at MetaBrainz, we’re die-hard Postgres fans. But the best tool that we’ve found for metadata matching is the Typesense search engine, which supports typo-resistant search. This example will use the Typesense datastore, but you may use whatever datastore you prefer.


https://typesense.org/


Lightning-fast Open Source Search


The Open Source Alternative to Algolia + Pinecone. The Easier To Use Alternative to Elasticsearch


https://github.com/typesense/typesense


Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences


https://github.com/metabrainz/canonical-data-example


MusicBrainz Canonical Data Examples
This simple example shows how to lookup music metadata using the MusicBrainz canonical dataset.


https://blog.metabrainz.org/2023/06/12/new-dataset-musicbrainz-canonical-metadata/


New dataset: MusicBrainz Canonical Metadata


The MusicBrainz project is proud to announce the release of our latest dataset: MusicBrainz Canonical Metadata. This geeky sounding dataset packs an  intense punch! It solves a number of problems involving how to match a piece of music metadata to the correct entry in the massive MusicBrainz database.
The MusicBrainz database aims to collect metadata for all releases (albums) that have ever been published. For popular albums, there can be many different releases, which begs the question “which one is the main (canonical) release?”. If you want to identify a piece of metadata, and you only have an artist and recording (track) name, how do you choose the correct database release?
This same problem exists on the recording level – many recordings (songs) exist on many releases – which one should be used?
The  MusicBrainz Canonical Metadata dataset now solves this problem by allowing users to lookup canonical releases and canonical recordings. Given any release MBID, Canonical Release Mapping (canonical_release_redirect.csv) allows you to find the release that we consider “canonical”. The same is now true for recording MBIDs, which allows you to look up canonical recordings using the Canonical Recording Mapping (canonical_recording_redirect.csv). Given any recording MBID, you can now find the correct canonical recording MBID.


https://metabrainz.org/datasets/derived-dumps#canonical


MusicBrainz Canonical Data Dumps


https://blog.metabrainz.org/2023/04/01/aibrainz-playlist-generator-beta/


AIBrainz Playlist Generator (beta)


MetaBrainz as an organisation has never much dabbled in (artificial) intelligence, but a number of recent factors have led to the team doing some exciting behind-the-scenes work over the last few months.
Lately more and more potential contributors have come to MeB interested in working on AI projects, and with ListenBrainz we have an excellent dataset. With a current focus on playtesting and finetuning our playlist features we also have the perfect use-case.
So, without further ado, we invite you to test the beta version of our new AI-powered playlist generator


https://beta.listenbrainz.org/explore/ai-brainz/


AIBrainz playlist generator (beta)


https://blog.metabrainz.org/2022/11/16/fresh-releases-my-gsoc-journey-with-metabrainz/


Fresh Releases – My (G)SoC journey with MetaBrainz


MusicBrainz is the largest structured online database of music metadata. Today, a myriad of developers leverage this data to build their client applications and projects. According to MusicBrainz Database statistics, 2022 alone saw a whopping 366,680*, releases, from 275,749 release groups, and 91.5% of these releases have cover art.  Given that it has a plethora of useful data about music releases available, but has no useful means to visually present it to general users, the idea of building the Fresh Releases page was born.


https://musicbrainz.org/statistics


https://github.com/metabrainz


MetaBrainz Foundation


https://github.com/metabrainz/musicbrainz-docker


MusicBrainz mirror server with search and replication


Docker Compose project for the MusicBrainz Server with replication, search, and development setup


https://github.com/metabrainz/musicbrainz-server


Server for the MusicBrainz project (website, API, database tools)


MusicBrainz Server is the web frontend to the MusicBrainz Database and is accessible at http://musicbrainz.org


https://github.com/metabrainz/listenbrainz-server


Server for the ListenBrainz project, including the front-end (javascript/react) code that it serves and all of the data processing components that LB uses.


ListenBrainz keeps track of music you listen to and provides you with insights into your listening habits. We're completely open-source and publish our data as open data.


https://github.com/metabrainz/picard


MusicBrainz Picard audio file tagger


https://picard.musicbrainz.org/


MusicBrainz Picard
Picard is a cross-platform music tagger powered by the MusicBrainz database.


https://github.com/metabrainz/bookbrainz-site


BookBrainz website, written in node.js.


https://bookbrainz.org/


The Open Book Database
BookBrainz is a project to create an online database of information about every single book, magazine, journal and other publication ever written. We make all the data that we collect available to the whole world to consume and use as they see fit. Anyone can contribute to BookBrainz, whether through editing our information, helping out with development, or just spreading the word about our project.


https://github.com/metabrainz/troi-recommendation-playground


A recommendation engine playground that should hopefully make playing with music recommendations easy.


The Troi Playlisting Engine combines all of ListenBrainz' playlist efforts:

Playlist generation: Music recommendations and algorithmic playlist generation using a pipeline architecture that allows easy construction of custom pipelines that output playlists. You can see this part in action on ListenBrainz's Created for You pages, where we show of Weekly jams and Weekly Discovery playlists. The playlist generation tools use an API-first approach were users don't need to download massive amounts of data, but instead fetch the data via APIs as needed.
Local content database: Using these tools a user can scan their music collection on disk or via a Subsonic API (e.g. Navidrome, Funkwhale, Gonic), download metadata for it and then resolve global playlists (playlist with only MBIDs) to files available in a local collection. We also have support for duplicate file detection, top tags in your collection and other insights.
Playlist exchange: We're in the process of building this toolkit out to support saving/loading playlists in a number of format to hopefully break playlists free from the music silos (Spotify, Apple, etc)


https://github.com/metabrainz/picard-plugins


MusicBrainz Picard Plugins
This repository hosts plugins for MusicBrainz Picard.


https://listenbrainz.org/


Listen together with ListenBrainz
Track, explore, visualise and share the music you listen to.
Follow your favourites and discover great new music.


https://listenbrainz.org/explore/fresh-releases/


Fresh Releases
Listen to recent releases, and browse what's dropping soon.


https://labs.api.listenbrainz.org/


MetaBrainz Dataset Hoster Home
You can use this data set hoster to explore the various data sets that are being exposed through this interface. The goal of this interface is to make the discovery of hosted data quick and intuitive - ideally the interface should give you all of the information necessary in order start using one of these APIs in your project quickly.
The following data sets are available from here:

artist-country-code-from-artist-mbid: MusicBrainz Artist Country From Artist MBID
artist-credit-from-artist-mbid: MusicBrainz Artist Credit From Artist MBID
recording-mbid-lookup: MusicBrainz Recording by MBID Lookup
mbid-mapping: MusicBrainz ID Mapping lookup
mbid-mapping-release: MusicBrainz ID Mapping Release lookup
explain-mbid-mapping: Explain MusicBrainz ID Mapping lookup
recording-search: MusicBrainz Recording search
acr-lookup: MusicBrainz Artist Credit Recording lookup
acrr-lookup: MusicBrainz Artist Credit Recording Release lookup
spotify-id-from-metadata: Spotify Track ID Lookup using metadata
spotify-id-from-mbid: Spotify Track ID Lookup using recording mbid
sessions-viewer: ListenBrainz Session Viewer
similar-recordings: Similar Recordings Viewer
similar-artists: Similar Artists Viewer
tag-similarity: ListenBrainz Tag Similarity
bulk-tag-lookup: Bulk MusicBrainz Tag/Popularity by recording MBID Lookup

Use the web interface for each of these endpoints to discover what parameters to send and what results to expect. Then take the JSON GET or POST example data to integrate these calls into your projects.


https://critiquebrainz.org/


CritiqueBrainz is a repository for Creative Commons licensed music and book reviews. Here you can read what other people have written about an album or event and write your own review!
CritiqueBrainz is based on data from MusicBrainz - open music encyclopedia and BookBrainz - open book encyclopedia.


https://essentia.upf.edu/


Essentia
Open-source library and tools for audio and music analysis, description and synthesis


Essentia is an open-source C++ library for audio analysis and audio-based music information retrieval. It contains an extensive collection of algorithms, including audio input/output functionality, standard digital signal processing blocks, statistical characterization of data, a large variety of spectral, temporal, tonal, and high-level music descriptors, and tools for inference with deep learning models. Essentia is cross-platform and designed with a focus on optimization in terms of robustness, computational speed, and low memory usage, which makes it efficient for many industrial applications. The library includes Python and JavaScript bindings as well as various command-line tools and third-party extensions, which facilitate its use for fast prototyping and allow setting up research experiments very rapidly.


https://github.com/meyda/meyda


Audio feature extraction for JavaScript


https://meyda.js.org/


Meyda is a JavaScript audio feature extraction library. It works with the Web Audio API (or plain old JavaScript arrays) to expose information about the timbre and perceived qualities of sound. Meyda supports both offline feature extraction as well as real-time feature extraction using the Web Audio API. We wrote a paper about it, which is available here.


https://meyda.js.org/audio-features


Often, observing and analysing an audio signal as a waveform doesn’t provide us a lot of information about its contents. An audio feature is a measurement of a particular characteristic of an audio signal, and it gives us insight into what the signal contains. Audio features can be measured by running an algorithm on an audio signal that will return a number, or a set of numbers that quantify the characteristic that the specific algorithm is intended to measure. Meyda implements a selection of standardized audio features that are used widely across a variety of music computing scenarios.


Bear in mind that by default, Meyda.extract applies a windowing function to the incoming signal using the hanning windowing function by default. If you compare the results of Meyda’s feature extraction to that of another library for the same signal, make sure that the same windowing is being applied, or the features will likely differ. To disable windowing in Meyda.extract, set Meyda.windowingFunction to ‘rect’.


https://github.com/qurihara/picognizer


JavaScript library for detecting synthesized sounds


https://audd.io/


AudD offers Music Recognition API. We recognize music with our own audio fingerprinting technology based on neural networks. According to ProgrammableWeb, AudD is #1 among 13 Top Recognition APIs.


Pricing:

0+ requests per month - $5 per 1000 requests;
100 000 requests per month - $450;
200 000 requests per month - $800;
500 000 requests per month - $1800.
Contact us if you're interested in larger amounts of requests.

Live audio streams recognition - $45 per stream per month with our music DB, $25 with the music you upload.


https://docs.audd.io/


AudD Music Recognition API Docs


https://github.com/auddmusic


AudD provides music recognition services with API


https://github.com/AudDMusic/chrome-extension


AudD Chrome extension


Music Recognition Chrome extension


https://djtechtools.com/2015/10/08/dj-trainspotting-how-to-find-out-what-a-dj-is-playing/


DJ Trainspotting: How To Find Out What A DJ Is Playing


https://beebom.com/shazam-alternatives/


Top 6 Shazam Alternatives for Android and iOS


SoundHound – Music Discovery & Hands-Free Player
Genius – Song Lyrics & More
Musicxmatch – Lyrics for your music
MusicID
Soly – Song and Lyrics Finder
Google Assistant & Siri


https://www.soundhound.com/soundhound/


Soundhound music
Discover, Search, and Play Any Song by Using Just Your Voice


https://www.1001tracklists.com/


1001Tracklists - The World's Leading DJ Tracklist/Playlist Database


https://www.setlist.fm/


The setlist wiki


Find setlists for your favorite artists


Lyrics


https://www.azlyrics.com/
https://genius.com/
https://www.lololyrics.com/
https://beets.readthedocs.io/en/stable/plugins/lyrics.html


The lyrics plugin fetches and stores song lyrics from databases on the Web. Namely, the current version of the plugin uses Musixmatch, Genius.com, Tekstowo.pl, and, optionally, the Google custom search API.


Conferences, Journals, Research Papers, etc


https://www.ismir.net/


The International Society for Music Information Retrieval (ISMIR) is a non-profit organisation seeking to advance research in the field of music information retrieval (MIR)—a field that aims at developing computational tools for processing, searching, organizing, and accessing music-related data. Among other things, the ISMIR society fosters the exchange of ideas and activities among its members, stimulates research and education in MIR, supports and encourages diversity in membership and disciplines, and oversees the organisation of the annual ISMIR conference.


https://www.ismir.net/conferences/


Each year, the ISMIR conference is held in a different corner of the world to motivate the presentation and exchange of ideas and innovations related to the intentionally broad topic of music information. Historically, the call for papers (CFP) is announced in the beginning of the year (February-May) via the community mailing list, and conferences are held several months later (August-November).


https://transactions.ismir.net/


The Transactions of the International Society for Music Information Retrieval publishes novel scientific research in the field of music information retrieval (MIR), an interdisciplinary research area concerned with processing, analysing, organising and accessing music information. We welcome submissions from a wide range of disciplines, including computer science, musicology, cognitive science, library & information science and electrical engineering.


TISMIR was established to complement the widely cited ISMIR conference proceedings and provide a vehicle for the dissemination of the highest quality and most substantial scientific research in MIR. TISMIR retains the Open Access model of the ISMIR Conference proceedings, providing rapid access, free of charge, to all journal content. In order to encourage reproducibility of the published research papers, we provide facilities for archiving the software and data used in the research.


https://transactions.ismir.net/articles/10.5334/tismir.171


The Sound Demixing Challenge 2023 – Music Demixing Track


This paper summarizes the music demixing (MDX) track of the Sound Demixing Challenge (SDX’23). We provide a summary of the challenge setup and introduce the task of robust music source separation (MSS), i.e., training MSS models in the presence of errors in the training data. We propose a formalization of the errors that can occur in the design of a training dataset for MSS systems and introduce two new datasets that simulate such errors: SDXDB23_LabelNoise and SDXDB23_Bleeding.1 We describe the methods that achieved the highest scores in the competition. Moreover, we present a direct comparison with the previous edition of the challenge (the Music Demixing Challenge 2021): the best performing system achieved an improvement of over 1.6dB in signal-to-distortion ratio over the winner of the previous competition, when evaluated on MDXDB21. Besides relying on the signal-to-distortion ratio as objective metric, we also performed a listening test with renowned producers and musicians to study the perceptual quality of the systems and report here the results. Finally, we provide our insights into the organization of the competition and our prospects for future editions.


https://dblp.uni-trier.de/db/conf/ismir/index.html


International Society for Music Information Retrieval Conference (ISMIR)


The dblp computer science bibliography provides open bibliographic information on major computer science journals and proceedings.


https://en.wikipedia.org/wiki/International_Society_for_Music_Information_Retrieval


The International Society for Music Information Retrieval (ISMIR) is an international forum for research on the organization of music-related data.


https://en.wikipedia.org/wiki/International_Society_for_Music_Information_Retrieval#Annual_conferences


Since its inception in 2000, ISMIR has been the world’s leading forum for research on the modelling, creation, searching, processing and use of musical data. Researchers across the globe meet at the annual conference conducted by the society. It is known by the same acronym as the society, ISMIR. Following is the list of conferences held by the society.


Note: This section links to the annual conference webpages, as well as archives of the papers/etc presented at each (from 2000 through to 2024+)


https://en.wikipedia.org/wiki/International_Society_for_Music_Information_Retrieval#Research_areas_and_topics


Research areas and topics
The following list gives an overview of the main research areas and topics that are within the scope of Music Information Retrieval


https://en.wikipedia.org/wiki/Music_information_retrieval


Music information retrieval


Music information retrieval (MIR) is the interdisciplinary science of retrieving information from music.


Music classification


Recommender systems


Music source separation and instrument recognition


Automatic music transcription


Music generation


MIR data and fundamentals


Methodology


Domain knowledge


Musical features and properties


Music processing


Application


https://en.wikipedia.org/wiki/International_Society_for_Music_Information_Retrieval#MIREX


MIREX
The Music Information Retrieval Evaluation eXchange (MIREX) is an annual evaluation campaign for MIR algorithms, coupled to the ISMIR conference. Since it started in 2005, MIREX has fostered advancements both in specific areas of MIR and in the general understanding of how MIR systems and algorithms are to be evaluated.


https://www.music-ir.org/mirex/wiki/MIREX_HOME

This wiki contains data from MIREX 2005 through to 2021 (at time of writing)


https://en.wikipedia.org/wiki/International_Conference_on_Acoustics,_Speech,_and_Signal_Processing


International Conference on Acoustics, Speech, and Signal Processing


ICASSP, the International Conference on Acoustics, Speech, and Signal Processing, is an annual flagship conference organized by IEEE Signal Processing Society. Ei Compendex has indexed all papers included in its proceedings.


As ranked by Google Scholar's h-index metric in 2016, ICASSP has the highest h-index of any conference in the Signal Processing field. The Brazilian ministry of education gave the conference an 'A1' rating based on its h-index.


https://2024.ieeeicassp.org/


https://en.wikipedia.org/wiki/International_Conference_on_Digital_Audio_Effects


International Conference on Digital Audio Effects


The annual International Conference on Digital Audio Effects or DAFx Conference is a meeting of enthusiasts working in research areas on audio signal processing, acoustics, and music related disciplines, who come together to present and discuss their findings.


https://en.wikipedia.org/wiki/New_Interfaces_for_Musical_Expression


New Interfaces for Musical Expression


New Interfaces for Musical Expression, also known as NIME, is an international conference dedicated to scientific research on the development of new technologies and their role in musical expression and artistic performance.


https://en.wikipedia.org/wiki/Sound_and_Music_Computing_Conference


Sound and Music Computing Conference


The Sound and Music Computing (SMC) Conference is the forum for international exchanges around the core interdisciplinary topics of Sound and Music Computing. The conference is held annually to facilitate the exchange of ideas in this field.


Sound and Music Computing (SMC) is a research field that studies the whole sound and music communication chain from a multidisciplinary point of view. The current SMC research field can be grouped into a number of subfields that focus on specific aspects of the sound and music communication chain.

Processing of sound and music signals: This subfield focuses on audio signal processing techniques for the analysis, transformation and resynthesis of sound and music signals.
Understanding and modeling sound and music: This subfield focuses on understanding and modeling sound and music using computational approaches. Here we can include Computational musicology, Music information retrieval, and the more computational approaches of Music cognition.
Interfaces for sound and music: This subfield focuses on the design and implementation of computer interfaces for sound and music. This is basically related to Human Computer Interaction.
Assisted sound and music creation: This subfield focuses on the development of computer tools for assisting Sound design and Music composition. Here we can include traditional fields like Algorithmic composition.


https://en.wikipedia.org/wiki/Computer_Music_Journal


Computer Music Journal


Computer Music Journal is a peer-reviewed academic journal that covers a wide range of topics related to digital audio signal processing and electroacoustic music. It is published on-line and in hard copy by MIT Press. The journal is accompanied by an annual CD/DVD that collects audio and video work by various electronic artists.


Unsorted


https://llms-heart-mir.github.io/tutorial/intro.html


LLMs <3 MIR
A tutorial on Large Language Models for Music Information Retrieval


This is a web book I wrote because it felt fun when I thought about it -- a tutorial on Large Language Models for Music Information Retrieval.


This book is written in the perspective of music AI.

Chapter I, “Large Language Models”, would be general and succinct. I’ll outsource a lot by simply sharing links so that you decide the depth and breadth of your study.
Chapter II, “LLM as a Tool with Common Sense” is where I introduce some existing works and my suggestions on how to use LLMs for MIR research.
Chapter III, “Multimodal LLMs”, provides a summary about how we can incorporate multimodal data into LLMs.
Chapter IV, “Weakness of LLMs for MIR”, presents some limitations the current LLMs have in the context of MIR research.
Chapter V, “Finale”, is just a single page of my parting words.


https://llms-heart-mir.github.io/tutorial/part04_multimodal/sec02-music-audio-llms.html


Music Audio LLMs
So, how can we feed audio signals to a LLM? It’s really the same as we did with images. We need to somewhat find a way to represent the audio signal in a vector sequence Ha, and perhaps feed it with some text representation Hq.


Salmonn


https://github.com/bytedance/SALMONN


SALMONN: Speech Audio Language Music Open Neural Network
SALMONN is a large language model (LLM) enabling speech, audio events, and music inputs, which is developed by the Department of Electronic Engineering at Tsinghua University and ByteDance. Instead of speech-only input or audio-event-only input, SALMONN can perceive and understand all kinds of audio inputs and therefore obtain emerging capabilities such as multilingual speech recognition and translation and audio-speech co-reasoning. This can be regarded as giving the LLM "ears" and cognitive hearing abilities, which makes SALMONN a step towards hearing-enabled artificial general intelligence.


https://arxiv.org/abs/2310.13289


SALMONN: Towards Generic Hearing Abilities for Large Language Models


LLark


https://research.atspotify.com/2023/10/llark-a-multimodal-foundation-model-for-music/


LLark: A Multimodal Foundation Model for Music


LLark is a research exploration into the question: how can we build a flexible multimodal language model for music understanding?


LLark is designed to produce a text response, given a 25-second music clip and a text query (a question or short instruction).


We built our training dataset from a set of open-source academic music datasets (MusicCaps, YouTube8M-MusicTextClips, MusicNet, FMA, MTG-Jamendo, MagnaTagATune). We did this by using variants of ChatGPT to build query-response pairs from the following inputs: (1) the metadata available from a dataset, as pure JSON; (2) the outputs of existing single-task music understanding models; (3) a short prompt describing the fields in the metadata and the type of query-response pairs to generate.


We built our training dataset from a set of open-source academic music datasets (MusicCaps, YouTube8M-MusicTextClips, MusicNet, FMA, MTG-Jamendo, MagnaTagATune). We did this by using variants of ChatGPT to build query-response pairs from the following inputs: (1) the metadata available from a dataset, as pure JSON; (2) the outputs of existing single-task music understanding models; (3) a short prompt describing the fields in the metadata and the type of query-response pairs to generate. Training a model using this type of data is known as “instruction tuning.” An instruction-tuning approach has the additional benefit of allowing us to use a diverse collection of open-source music datasets that contain different underlying metadata, since all datasets are eventually transformed into a common (Music + Query + Response) format. From our initial set of 164,000 unique tracks, this process resulted in approximately 1.2M query-response pairs.


LLark is trained to use raw audio and a text prompt (the query) as input, and produces a text response as output. LLark is initialized from a set of pretrained open-source modules that are either frozen or fine-tuned, plus only a small number of parameters (less than 1%!) that are trained from scratch.


The raw audio is passed through a frozen audio encoder, specifically the open-source Jukebox-5B model. The Jukebox outputs are downsampled to 25 frames per second (which reduces the size of the Jukebox embeddings by nearly 40x while preserving high-level timing information), and then passed through a projection layer that is trained from scratch to produce audio embeddings. The query text is passed through the tokenizer and embedding layer of the language model (LLama2-7B-chat) to produce text embeddings. The audio and text embeddings are then concatenated and passed through through the rest of the language model stack. We fine-tune the weights of the language model and projection layer using a standard training procedure for multimodal large language models (LLMs).


In one set of experiments, we asked people to listen to a music recording and rate which of two (anonymized) captions was better. We did this across three different datasets with different styles of music, and for 4 different music captioning systems in addition to LLark. We found that people on average preferred LLark’s captions to all four of the other music captioning systems.


We conducted an additional set of experiments to measure LLark’s musical understanding capabilities. In these evaluations, LLark outperformed all baselines tested on evaluations of key, tempo, and instrument identification in zero-shot datasets (datasets not used for training). In zero-shot genre classification, LLark ranked second, but genre estimation is a difficult and subjective task; we show in the paper that LLark’s predictions on this task tend to fall within genres that most musicians would still consider correct (e.g., labeling “metal” songs as “rock”).


https://arxiv.org/abs/2310.07160


LLark: A Multimodal Instruction-Following Language Model for Music


Music has a unique and complex structure which is challenging for both expert humans and existing AI systems to understand, and presents unique challenges relative to other forms of audio. We present LLark, an instruction-tuned multimodal model for \emph{music} understanding. We detail our process for dataset creation, which involves augmenting the annotations of diverse open-source music datasets and converting them to a unified instruction-tuning format. We propose a multimodal architecture for LLark, integrating a pretrained generative model for music with a pretrained language model. In evaluations on three types of tasks (music understanding, captioning, reasoning), we show that LLark matches or outperforms existing baselines in music understanding, and that humans show a high degree of agreement with its responses in captioning and reasoning tasks. LLark is trained entirely from open-source music data and models, and we make our training code available along with the release of this paper.


https://github.com/spotify-research/llark


LLark: A Multimodal Foundation Model for Music
This repository contains the code used to build the training dataset, preprocess existing open-source music datasets, train the model, and run inference. Note that this paper is not accompanied with any trained models.


https://github.com/llms-heart-mir/tutorial


https://www.discogs.com/


Discogs - Music Database and Marketplace


https://www.last.fm/


Last.fm | Play music, find songs, and discover artists
The world's largest online music service. Listen online, find out more about your favourite artists, and get music recommendations, only at Last.fm.


https://rateyourmusic.com/


RYM is one of the largest music databases and communities online, which you can use in endless ways to discover new music.


https://rateyourmusic.com/discover
https://rateyourmusic.com/search
https://rateyourmusic.com/new-music/
https://rateyourmusic.com/genres/
https://rateyourmusic.com/lists/
https://rateyourmusic.com/charts/


https://github.com/marin-m/SongRec


SongRec
SongRec is an open-source Shazam client for Linux, written in Rust.


How it works
For useful information about how audio fingerprinting works, you may want to read this article. To be put simply, Shazam generates a spectrogram (a time/frequency 2D graph of the sound, with amplitude at intersections) of the sound, and maps out the frequency peaks from it (which should match key points of the harmonics of voice or of certains instruments).
Shazam also downsamples the sound at 16 KHz before processing, and cuts the sound in four bands of 250-520 Hz, 520-1450 Hz, 1450-3500 Hz, 3500-5500 Hz (so that if a band is too much scrambled by noise, recognition from other bands may apply). The frequency peaks are then sent to the servers, which subsequently look up the strongest peaks in a database, in order look for the simultaneous presence of neighboring peaks both in the associated reference fingerprints and in the fingerprint we sent.
Hence, the Shazam fingerprinting algorithm, as implemented by the client, is fairly simple, as much of the processing is done server-side. The general functionment of Shazam has been documented in public research papers and patents.


https://beets.io/

https://github.com/beetbox/beets


music library manager and MusicBrainz tagger


Beets is the media library management system for obsessive music geeks.
The purpose of beets is to get your music collection right once and for all. It catalogs your collection, automatically improving its metadata as it goes. It then provides a bouquet of tools for manipulating and accessing your music.


Because beets is designed as a library, it can do almost anything you can imagine for your music collection. Via plugins, beets becomes a panacea:

Fetch or calculate all the metadata you could possibly need: album art, lyrics, genres, tempos, ReplayGain levels, or acoustic fingerprints.
Get metadata from MusicBrainz, Discogs, and Beatport. Or guess metadata using songs' filenames or their acoustic fingerprints.
Transcode audio to any format you like.
Check your library for duplicate tracks and albums or for albums that are missing tracks.
Clean up crufty tags left behind by other, less-awesome tools.
Embed and extract album art from files' metadata.
Browse your music library graphically through a Web browser and play it in any browser that supports HTML5 Audio.
Analyze music files' metadata from the command line.
Listen to your library with a music player that speaks the MPD protocol and works with a staggering variety of interfaces.

If beets doesn't do what you want yet, writing your own plugin is shockingly simple if you know a little Python.


https://github.com/beetbox/mediafile


MediaFile: read and write audio files' tags in Python
MediaFile is a simple interface to the metadata tags for many audio file formats. It wraps Mutagen, a high-quality library for low-level tag manipulation, with a high-level, format-independent interface for a common set of tags.


https://github.com/quodlibet/mutagen


Python module for handling audio metadata


Mutagen is a Python module to handle audio metadata. It supports ASF, FLAC, MP4, Monkey's Audio, MP3, Musepack, Ogg Opus, Ogg FLAC, Ogg Speex, Ogg Theora, Ogg Vorbis, True Audio, WavPack, OptimFROG, and AIFF audio files. All versions of ID3v2 are supported, and all standard ID3v2.4 frames are parsed. It can read Xing headers to accurately calculate the bitrate and length of MP3s. ID3 and APEv2 tags can be edited regardless of audio format. It can also manipulate Ogg streams on an individual packet/page level.


https://mutagen.readthedocs.io/en/latest/


Questions


Could we use data from the Spotify Audio Analysis endpoints (Ref: 1, 2, 3, 4) to generate a Shazam fingerprint/similar (Ref)?

See Also

My Other Related Deepdive Gist's and Projects


AI Voice Cloning / Transfer (eg. RVCv2) (0xdevalias' gist)
Singing Voice Synthesizers (eg. Vocaloid, etc) (0xdevalias' gist)
Audio Pitch Correction (eg. autotune, melodyne, etc) (0xdevalias' gist)
Automated Audio Transcription (AAT) / Automated Music Transcription (AMT) (aka: converting audio to midi) (0xdevalias' gist)
Generating Synth Patches with AI (0xdevalias' gist)
Compare/Diff Audio Files (0xdevalias' gist)
Working Around FLStudio Trial Limitations (0xdevalias' gist)