@alexanderlerch
Last active May 3, 2024 14:11
list of MIR datasets
dataset meta data contents with audio
200DrumMachines 7371 one-shots yes
AAM onsets, pitches, instruments, melody instrument, keys, chords, tempo, beats 3000 (artificial) tracks yes
ACM_MIRUM tempo 1410 excerpts (60s) yes
ACPAS aligned audio and scores 2189 performances of 497 scores downloadable
AcousticBrainz-Genre 15-31 genres with 265-745 subgenres audio features for over 2000000 songs no
ADC2004 predominant pitch 20 excerpts yes
ADTOF drum onsets, beats 20 excerpts no
AdoVoc Pro Monophonic and polyphonic audio files of a set of common Flamenco singing 2 Female Singers 1 Male Singer yes
AED 28 event classes 5223 audio snippets yes
AIST Dance DB street dance videos 13,940 videos for 60 pieces yes
Amg1608 valence & arousal 1608 excerpts (30s) no
AMT-pilot structure by multiple annotators 8 songs yes
AMS key, harmony, phrases 54 movements no
APL piano practice 620 segments yes
artist20 20 artists 1413 songs no
ASAP 222 compositions 1068 MIDI performances partially
AudioSet 632 event classes 2084320 clips (10s) no
AVASPEECH-SMAD speech, music 45 hours yes
bach10 multitrack & aligned MIDI 10 chorales yes
BAF fingerprinting 2000 track and 3425 TV audio snippets (60s) on request
ballroom 8 genres & tempo & (down-)beats 698 excerpts (30s) yes
beatboxset1 percussion annotation 14 clips yes
BPS-FH functional annotation 32 sonatas no
C224a 14 genres 224 artists no
C3ka 18 genres 3000 artists no
C49ka-C111ka genres 48800/110588 artists no
CAL10k tags 10870 songs no
CAL500 tags 502 songs yes
CarnaticRhythm sama & beats 176 pieces on request
CASD chords by 4 annotators 50 songs no
CBFdataset 4 playing techniques (Chinese Bamboo Flute) 10 performers yes
ChoirSet MIDI, F0, beats 2 songs, 81 takes yes
CCMixter vocal & background track 50 mixes yes
CCOM-HuQin playing techniques and instruments 845 single clips, 10 annotated excerpts yes
ChoCo chords, key 20k+ songs no
Chopin22 aligned MIDI 44 recordings yes
Clotho 5 descriptive captions 4981 snippets yes
CMMSD note/rest/transition & onsets & vibrato 36 excerpts no
Coidach 55 genres 26420 songs no
corpusCOFLA editorial & predominant melody 1800 flamenco recordings no
covers80 cover songs 80 song pairs yes
Cross-Composer 11 composers & piece & key & era & instrumentation 1100 chromagrams and chord labels no
Cross-Era composer & piece & key & era & instrumentation 2000 chromagrams and chord labels no
CSD MIDI, lyrics, performance 50 + 50 songs (Korean, English) yes
CSD pitch 48 recordings yes
Compmusic datasets Carnatic, Hindustani, Turkish-Makam, Beijing-Opera, Arab-Andalusian Visit website for details Visit website for details
dadaGP guitarPro tablatures 26,181 songs no
DALI aligned notes and lyrics 5358 songs no
DAMP karaoke performances & aligned lyrics & pronunciation assessment 34000 monophonic recordings yes
Da-TACOS cover songs 25000 songs no
DEAM valence & arousal 1802 excerpts yes
DEAPDataset valence & arousal & dominance & physiological data 120 music video excerpts no
DESED 10 audio event classes approx 20k 10s clips (unlabeled, weakly/strongly labeled) yes
DIM-SIM triplet similarity 4000 snippets by 5-12 listeners no
DREANSS onset times & perc. instruments 18 excerpts yes
DrumPt 4 playing techniques app. 2000 annotations yes (see ENST)
E-GMD drum timing, drummer, kit 45537 MIDI files yes
EEP Multitrack, bowing descriptions 23 string quartets yes
ElectronicMusic year, composer gender 1878 works no
EMO-Soundscapes arousal & valence 1213 soundscape recordings yes
EmoMusic arousal & valence 744 excerpts (45s) yes
EMOPIA emotion 1,087 music clips from 387 songs yes
Emotify induced emotion 400 excerpts yes
EMusic arousal & valence 100 excerpts (experimental music) yes
ENEPP performance assessments score-aligned audio yes
EnsembleSet different mix formats 80 synthesized chamber ensemble pieces yes
ENST-Drums onset times & perc. instruments & playing technique 318 segments yes
EPIC-Sounds 44 audio event classes 78,400 segments yes
Erkomaishvili F0, note onsets, segments 118 songs yes
Extendedballroom 9 genres & tempo 4000 excerpts (30s) downloadable
ExtraSensory 51 context labels 300000 sensor recordings from 60 users yes
ffuhrmann 11 predom. instr. 6951 excerpts/220 songs yes/no
fifteen-songs-dataset 15 grateful dead songs 2617 cover performances yes
Filosax beat, chord, sections, sax pitch 48 multitrack jazz recordings yes
FlaBase editorial & biographical & musicological information on flamenco, 1102 artists & 74 palos & 2860 albums 13311 tracks no
FSLD tempo, key, instrumentation, genre 3000 annotated loops yes
FMA-full 161 genres 106574 songs yes
FMA-large 161 genres 106574 excerpts (30s) yes
FMA-medium 16 genres 25000 excerpts (30s) yes
FMA-small 8 genres 8000 excerpts (30s) yes
FSD-Kaggle2019 80 tags 29000 clips yes
FSD50K 200 audio event classes 51,197 audio clips yes
Fugue structure & cadences 36 fugues (Bach & Shostakovich) no
GiantMIDI-Piano dataset composers, transcribed score 10854 MIDI files no
GiantStepsKey key 604 files no
GiantStepsTempo tempo (alternate) 664 files no
GMD genre & valence & arousal 1400 songs downloadable
GNMID14 timestamp & country 110M music ID matches no
Good-sounds.org 12 instruments, pitch, sound quality 8750 notes yes
GrooveMD drum timing, drummer 1150 MIDI files yes (rendered)
GPT 7 guitar playing techniques 6580 clips yes
GSD start/stop of guitar solos 60 songs no
GTZAN 10 genres & tempo & key1 & key2 & beat/downbeat & metrical levels 1000 excerpts (30s) yes
GuitarSet midi & pitch & beat & chords 360 guitar excerpts (30s) yes
Hainsworth tempo 245 excerpts (60s) yes
HF1 onset, offset, pitch, 5 emotions 5x8 songs yes
HarmonixSet beats, downbeats, structure 912 pop songs no
HAYDN QUARTETS harmonic analysis in **harm syntax 6 scores no
HHDS multitrack & style & tempo 18 songs yes
HJDB downbeat 236 excerpts yes
holzapfel:onset onset times 78 excerpts yes
homburg 9 genres 1889 excerpts (10s) yes
IADS valence & arousal & dominance 111 sound snippets yes
Multitrack multitrack & style 12 songs yes
IDMT-SMT-Audio-Effects effects on bass and guitar notes 55044 recordings yes
IDMT-SMT-Bass bass performance styles 4300 excerpts yes
IDMT-SMT-Bass-SINGLE-TRACK style annotated bass lines 17 bass lines (?) yes
IDMT-SMT-Drums onset times & perc. instruments 518 files yes
IDMT-SMT-Guitar 9 guitar playing techniques 4700+400 note events yes
iKala singing voice & background 252 excerpts (30s) yes
ImprovisingDuos video and audio for improv 24 snippets yes
INRIA:DSD100 multitrack 100 songs yes
INRIA:EuroVision structure 124 songs no
INRIA:Quaero structure 159 songs no
IRMAS 11 instruments 2874 excerpts yes
ISMIR2004Genre 6 genres 729 excerpts (30s) yes
ISMIR2004Tempo tempo 465 excerpts (20s) yes
IsoVAT valence, arousal, tension 90 MIDI snippets yes
Jazz Audio-Aligned Harmony Dataset structure & key & chords & beats 113 songs no
jaCapella genre 35 songs (multi-track) yes
Jamendo-VAD voice activity 61+16+16 songs yes
JGDB multitrack & MIDI random generated excerpts yes
JKU-ScoFo audio & MIDI 16 recordings yes
Jordan:Classical structure 15 pieces yes
Jordan:Jazz structure 15 pieces yes
JLSDD symbolic scores 77 duos (Josquin & La Rue) no
KBSF Data extracted from songfacts.com details on webpage no
LabROSA:APT MIDI 29 piano excerpts yes
LabROSA:MIDI audio & MIDI 4 songs yes
last.fm listening habits 992 last.fm users no
LFM-1b listening habits 120000 users no
LIND lyrics-based artist and genre graphs 42802 artists/214 genres no
LMD MIDI & tempo & key 176581 MIDI files no
M-DJCUE cue points 134 tracks no
MASS Multitracks 10s-40s yes
MAESTRO audio aligned MIDI & velocity & sustain & aligned scores 172 hours of piano yes
magnatagatune similarity 25863 excerpts (30s) yes
MAPS piano notes/chords/pieces & tempo/key 238 pieces yes
MARD album reviews 66566 songs no
MARG-AMT MIDI pitch & onset/offset times 30 melodies yes
MAST vocal performance assessment 1018 performances no
MAST-Rhythm rhythm performance assessment 3721 performances yes
McGill Billboard chords 740 songs no
MDBDrums onset times & perc. instrument & playing technique 23 excerpts yes
Medley-solos-DB 8 instruments 21572 clips (3s) yes
Medley2K Medley transitions 2000 medleys, 7712 transitions no
MedleyDB multitrack & genre & melody f0 & instrument activation 122 songs yes
MELON playlist dataset Mel spectrograms and 148,826 playlists 649,091 songs no
MeloSol symbolic 783 melodies no
MER500 emotion 500 clips yes
MMD artist, title metadata 436631 MIDI files no
MIR-1K vocal and background 1000 excerpts yes
mirex05Train predominant pitch 13 excerpts yes
mirex06Train tempo & beats 20 excerpts (30s) yes
MLHD listening history 594415 users with 21079612671 listening events from 6685542 songs no
MLPMF 7 perceptual features 5000 audio files yes
MMTD listening behavior 1086808 tweets no
Modal onset times 71 snippets yes
MOODetector:Bi-Modal lyrics & mood 133 excerpts yes
MOODetector:Multi-Modal lyrics & MIDI & mood 903 excerpts (30s) yes
moodswings arousal & valence 240 excerpts (30s) no
MozartStringQuartets structure, cadences 32 movements no
MSMD piano notes/chords/pieces, synthetic audio, aligned MIDI, aligned sheet music images, OMR 497 pieces no
MSD genre & mood & proprietary features 1000000 songs no
MusAV arousal & valence 2092 excerpts (30s) yes
Music4All tags, lyrics 109,269 excerpts (30s) on request
MTC phrases & key & meter 18000 melodies partially
MTD 2067 theme scores, aligned audio 18000 melodies partially
MTG-Jamendo tags (genre, instruments, mood) 55000 tracks yes
MTG-QBH title & artist 118 queries/481 songs yes/no
musiclef2012 tags 1355 songs no
MusicMicro music listening patterns 136866 users no
MUSDB-18 multitrack 150 songs yes
MusicBench chords, beats, tempo, key app. 53000 excerpts yes
MusicNet pitch and onsets 330 recordings implicitly
MuVi-Sync chords and loudness 748 music videos no
MVD vocal/scream activity 57 metal songs no
NES-MDB multi-track MIDI and aligned audio 5000 songs on request
Nine Inch Nails Multitracks multitrack 66 songs yes
NMED-H EEG 24 trials x 16 excerpts (4.5min) no
NMED-RP EEG 20 trials x 10 excerpts (4.5min) no
NMED-T EEG 30 trials x 16 excerpts (30sec) no
NSynth instrument and pitch 305979 single notes yes
NUS-48E aligned phonemes 48 pairs of sung and spoken yes
ODB onset times 19 excerpts yes
Onset_Leveau onset times 21 excerpts yes
OpenBMAT 6 classes for music presence 1647 excerpts (60s) yes
OpenMIC-2018 20 instruments 20000 excerpts (10s) yes
Orchset predominant pitch 64 excerpts yes
Phenicx-Anechoic multi-track audio & aligned MIDI 4 pieces yes
PHENICX emotion: Excerpts of the Eroica Symphony by Beethoven plus audio descriptors from Essentia 15 excerpts yes
PHENICX conduct dataset Motion capture, recordings 24 experts yes
PHENICX Symphonies Recordings Multitracks, Video 5 Symphonies yes
PGD gestures, intention, video 210 clips yes
Phonation pitch & vowel & phonation mode 900 monophonic snippets yes
PlaylistDataset playlists 75262 songs/2840553 transitions no
POP909 MIDI songs 909 piano arrangements yes
QBT-Extended taps 3365 queries/51 songs MIDI
QMUL:Beatles structure & key & chords & beats 181 songs no
QMUL:King structure & key & chords 14 songs no
QMUL:MichaelJackson structure 38 songs no
QMUL:MixEvaluation multitrack & mixes 18 songs/180 mixes yes
QMUL:Queen structure/key & chords 51/31 songs no
QMUL:RSS structure 60 songs no
QMUL:Zweieck structure & key & chords & beats 18 songs no
Quartet Multitrack, Video, motion track 96 recordings yes
RealBook chords 2486 songs no
QUASI multitrack 11 songs yes
Robbie Williams Annotations (Zanoni-Giorgi) chords & keys & beats 65 songs no
RockCorpus chords & melody & bars 200 songs no
RWC lyrics & 10 genre & 50 instruments & chords & structure & aligned MIDI 115 songs/50 classical/100 songs yes
SALAMI structure 1447 songs no
SAMBASET recording date, escolas, beats 392 sambas no
Sargon structure 4 songs yes
Semantic Artist Similarity artist biographies & similarity 268+2336 artists no
Schenker MusicXML & Schenker analysis 41 pieces no
SCP EEG 108/648 trials x 12 stimuli (5s) yes
SDD start of samples 80 songs & 80 samples no
SDDS 10 snares, 4 dampenings, 53 mics 2522 shots yes
SEILS scores in different symbolic formats 30 madrigals no
Seyerlehner:1517-Artists 19 genres 3180 songs yes
Seyerlehner:Annotated 19 genres 190 songs yes
Seyerlehner:Pop tempo 1105 songs yes
Seyerlehner:Unique 14 genres 3115 excerpts (30s) yes
SHS100K cover songs ca. 10,000 songs with 100,000 tracks no
SISEC multitrack & mix 5 excerpts yes
Slakh synthesized audio and mixes 2100 mixes yes
SMC:MIREX tempo & beat positions 217 excerpts yes
SMD audio & aligned MIDI 50 recordings yes
SongDescriber captions 706 recordings yes
SoundTracks valence & energy & tension & mood 360+110 excerpts yes
SPAM structure 50 songs no
Shazam Research Dataset: Offsets in-song query times 188M queries over 20 songs no
Su-AMT onset times & pitch 10 excerpts yes
SUPRA-RW piano roll performances 478 performances yes
SWD key, chords, lyrics, structure 2+5 cycle performances partly
TextureStringQuartets texture 11 movements no
TAFFC mood quadrants 900 excerpts yes
Traditional Flute Dataset audio & aligned MIDI 30 excerpts yes
ThisIsMyJam favorite songs & artists 131k users no
TinySOL instrument, pitch, dynamics, string number 2913 isolated notes yes
TONAS pitch 72 single-voiced excerpts yes
TPD popularity rating 23385 songs no
Tunebot title & artist 10000 queries/? songs yes/no
UIOWA:MIS single instrument notes many yes
UMA-Piano piano chords 275040 recordings yes
USM-SED 27 audio classes 20000 stereo snippets yes
UnmixDB DJ mix parameters 37 playlists yes
URBAN-SED 9 event classes 10000 recordings yes
UrbanSound8k 10 event classes 8732 slices yes
URMP score-aligned video and audio 44 recordings yes
uspop2002 tags & genre & chords 8752 songs no
VGD EMG, playing techniques 960 recordings yes
Vocadito monophonic pitch, lyrics 40 excerpts in 7 languages yes
VocalNotes monophonic pitch, note segmentation with different annotators around 10 excerpts on request
VocalSet 17 vocal techniques, f0 and lyrics 3560 recordings yes
YousicianUkulele evaluated notes and chords 500000 exercises by 1000 users no
WRD aligned scores, keys, singing 4 operas yes
WJazzD onset, pitches 456 Jazz solos no
@eardrummer

I wanted to suggest adding some large, recently introduced datasets, as mentioned in the linked post on the book's website (comments aren't currently enabled on that post). Please excuse me if this isn't the right place to post this.

dataset meta data contents with audio
GuitarSet pitch & midi & beat & chords 360 guitar excerpts (30s) with hexaphonic audio yes
MAESTRO audio aligned midi & velocity & sustain 172 hours of piano yes

@finlay-liu

Great dataset! Thank you!

@FabianStammen

The entry for the SALAMI dataset needs to be updated as they released the second half of their dataset.
It now contains structure annotations for 1447 songs.

@alexanderlerch
Author

Oops, never saw the comments here. Will update soon. Thanks!

@qrqrqrqr

qrqrqrqr commented Mar 3, 2020

Good job, thanks!!!

@gusauriemo

Hello, is there any updated link for the UMA database?

@gusauriemo

And the MAPS one too, please!

@alexanderlerch
Author

I'll look into it, but it might take some time. Meanwhile, you can try to reach out to the authors and ask them directly. Please let me know in case you hear back.

@gusauriemo

Thank you for your prompt response! I managed to download them by looking at the revisions and copying the link directly from the code. I am unsure as to why this worked as opposed to just clicking the hyperlink, but either way, thank you very much Alexander!

@VG-account1

VG-account1 commented Nov 26, 2021

And the MAPS one too, please!

As discussed here, the MAPS Database can be found here.

@VG-account1

VG-account1 commented Nov 26, 2021

Hello, is there any updated link for the UMA database?

I managed to download them by looking at the revisions and copying the link directly from the code. I am unsure as to why this worked as opposed to just clicking the hyperlink, but either way, thank you very much Alexander!

Clicking on the link doesn't work, but what works is right click -> save as -> override the browser's complaint that the download is not secure. It would be nice to put this info next to the link.

@alexanderlerch
Author

@gusauriemo @VG-account1 updated the two links discussed, thanks for your input!

@sinamusique

Does anyone know of any other database for single instrument notes: transverse flute, oboe, bassoon, clarinet, saxophone (.wav format)?

I'm already using Tinysol and GoodSounds

@alexanderlerch
Author

Does anyone know of any other database for single instrument notes: transverse flute, oboe, bassoon, clarinet, saxophone (.wav format)?

I'm already using Tinysol and GoodSounds

@sinamusique The IOWA dataset comes to mind, see "UIOWA:MIS" above

@sinamusique

Does anyone know of any other database for single instrument notes: transverse flute, oboe, bassoon, clarinet, saxophone (.wav format)?
I'm already using Tinysol and GoodSounds

@sinamusique The IOWA dataset comes to mind, see "UIOWA:MIS" above

@alexanderlerch The UIOWA dataset files are .aiff; I need .wav files with dynamic variations, mainly pp, mf, and ff.

@alexanderlerch
Author

As far as I understand, the format conversion between aiff and wav is trivial. IIRC, the main difference is whether the data is stored in little-endian or big-endian.
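
To illustrate the byte-order point, here is a minimal sketch of a batch converter. It assumes uncompressed PCM AIFF files (not AIFF-C) with 16-, 24-, or 32-bit samples; 8-bit files are rejected because WAV stores 8-bit audio as unsigned values, so a plain byte swap would not be enough. The chunks are parsed by hand because Python's `aifc` module was removed from the standard library in 3.13. The function names `read_extended`, `aiff_to_wav`, and `convert_folder` are hypothetical, and this is not a tested tool for the UIOWA files specifically.

```python
import struct
import wave
from pathlib import Path


def read_extended(raw: bytes) -> float:
    """Decode the 80-bit extended float that AIFF uses for the sample rate."""
    exponent, mantissa = struct.unpack(">HQ", raw)
    sign = -1.0 if exponent & 0x8000 else 1.0
    exponent &= 0x7FFF
    if mantissa == 0:
        return 0.0
    return sign * mantissa * 2.0 ** (exponent - 16383 - 63)


def aiff_to_wav(src, dst):
    """Convert one uncompressed PCM AIFF file to WAV (hypothetical helper)."""
    data = Path(src).read_bytes()
    if data[:4] != b"FORM" or data[8:12] != b"AIFF":
        raise ValueError(f"{src}: not an uncompressed AIFF file")

    comm = pcm = None
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack(">4sI", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + size]
        if chunk_id == b"COMM":
            # Channel count, frame count, bits per sample, then the sample rate.
            channels, frames, bits = struct.unpack(">hIh", body[:8])
            rate = round(read_extended(body[8:18]))
            comm = (channels, frames, bits, rate)
        elif chunk_id == b"SSND":
            # Sample data starts after the offset and block-size fields.
            offset = struct.unpack(">I", body[:4])[0]
            pcm = body[8 + offset:]
        pos += 8 + size + (size & 1)  # chunks are padded to an even length

    if comm is None or pcm is None:
        raise ValueError(f"{src}: missing COMM or SSND chunk")
    channels, frames, bits, rate = comm
    width = (bits + 7) // 8
    if width not in (2, 3, 4):
        raise ValueError(f"{src}: unsupported sample size of {bits} bits")

    pcm = pcm[:frames * channels * width]
    pcm = pcm[:len(pcm) - len(pcm) % width]  # guard against truncated files
    swapped = bytearray(len(pcm))
    for i in range(width):
        # Reverse the byte order within every sample (big- to little-endian).
        swapped[i::width] = pcm[width - 1 - i::width]

    with wave.open(str(dst), "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(width)
        out.setframerate(rate)
        out.writeframes(bytes(swapped))


def convert_folder(folder):
    """Convert every .aiff/.aif file in `folder`, writing .wav files alongside."""
    for src in sorted(Path(folder).iterdir()):
        if src.suffix.lower() in (".aiff", ".aif"):
            aiff_to_wav(src, src.with_suffix(".wav"))
```

Calling `convert_folder("path/to/notes")` (a placeholder path) would leave the originals untouched and write one `.wav` next to each `.aiff`. The sample values themselves are unchanged; only the container and the byte order within each sample differ.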

@sinamusique

As far as I understand, the format conversion between aiff and wav is trivial. IIRC, the main difference is whether the data is stored in little-endian or big-endian.

Ok, I'm going to try it!
Thank you @alexanderlerch

@VG-account1

@sinamusique Yes, automatic batch conversion between audio formats is possible. Similarly, you can export such sounds from SoundFonts. (How to do that depends on the software / programming language you use.) There are many SoundFonts (for example https://archive.org/details/musyng-kite or http://virtualplaying.com/virtual-playing-orchestra/ and its sources linked therein), some of them might have good woodwind samples with different dynamic markings.

@sinamusique

@sinamusique Yes, automatic batch conversion between audio formats is possible. Similarly, you can export such sounds from SoundFonts. (How to do that depends on the software / programming language you use.) There are many SoundFonts (for example https://archive.org/details/musyng-kite or http://virtualplaying.com/virtual-playing-orchestra/ and its sources linked therein), some of them might have good woodwind samples with different dynamic markings.

Thanks @VG-account1 It's a good option!

@MaxMalmer

For anyone searching for the MAPS dataset, here's a new link where it can be accessed:
https://amubox.univ-amu.fr/index.php/s/iNG0xc5Td1Nv4rR

@alexanderlerch
Author

For anyone searching for the MAPS dataset, here's a new link where it can be accessed: https://amubox.univ-amu.fr/index.php/s/iNG0xc5Td1Nv4rR

Thanks @MaxMalmer, it's updated now (and the previous link was there erroneously anyway, not sure how that happened).

@SylviaZiyaZhou

Hi Prof Lerch, CCOM-HuQin has released the full dataset with more than 12000 single clips and 57 excerpts with annotations. Here is the link:
https://zenodo.org/record/8140034

Hope this will help! :)
