@lostanlen
Created April 10, 2020 10:56
Companion code to mirdata issue #236
from mirdata import orchset
import os


def print_duration(mirdata_track):
    """This function measures the duration of a mirdata Track by
    converting it to JAMS and reading its file_metadata.
    It runs only on mirdata master; in mirdata v0.2-beta it raises
    an AssertionError."""
    jam = mirdata_track.to_jams()
    assert jam.validate()
    print(jam.file_metadata.duration)
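
# A minimal sketch, not part of mirdata: read the duration straight from the
# WAV header with the standard-library `wave` module, bypassing to_jams() and
# librosa entirely. `wav_duration` is a hypothetical helper; it assumes an
# uncompressed PCM WAV file, which is what `wave` can read.
# For a mono file, wav_duration(track.audio_path_mono) should match what
# print_duration(track) prints.
import wave


def wav_duration(path):
    """Return the duration in seconds of the PCM WAV file at `path`."""
    with wave.open(path, "rb") as wav:
        # Number of sample frames divided by frames per second.
        return wav.getnframes() / wav.getframerate()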


def data_augmentation(orchset_home, track_id, flag=True):
    """Here's a borrowed piece of code that I don't fully understand,
    but I need it for my research. What I can tell is that it takes
    two strings as arguments. This has nothing to do with mirdata
    and only runs shell commands, so what could go wrong?"""
    # Narrator: do NOT run this code unless you know what you're doing.
    # This function corrupts the ORCHSET dataset by trimming the first
    # 100 milliseconds of a WAV file in place.
    # Because this function doesn't adjust the annotations accordingly,
    # evaluating a melody estimator on this modified ORCHSET would be
    # a catastrophe.
    audio_path = os.path.join(
        orchset_home, "audio", "mono", track_id + ".wav")
    if flag:
        temp_path = os.path.join(orchset_home, "temp.wav")
        os.system(" ".join(["sox", audio_path, temp_path, "trim", "0.1"]))
    os.remove(audio_path)
    if flag:
        os.rename(temp_path, audio_path)
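
# For reference, a pure-Python sketch of roughly what
# `sox in.wav out.wav trim 0.1` does to an uncompressed PCM WAV file: drop the
# first 0.1 s of sample frames and keep the rest. Unlike data_augmentation,
# it writes to a separate output path and leaves the original file (and hence
# the dataset) untouched. `trim_wav_head` is a hypothetical helper, not a
# mirdata or sox API. At 44.1 kHz, seconds=0.1 drops 4410 frames, which
# shortens the duration by exactly 0.1 s.
import wave


def trim_wav_head(src_path, dst_path, seconds=0.1):
    """Copy src_path to dst_path without its first `seconds` of audio."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        n_total = src.getnframes()
        # Number of frames to drop, capped at the file length.
        n_skip = min(int(round(seconds * src.getframerate())), n_total)
        src.setpos(n_skip)  # jump past the head
        frames = src.readframes(n_total - n_skip)
    with wave.open(dst_path, "wb") as dst:
        # Same channels, sample width, and rate; the frame count in the
        # header is patched to match the data actually written.
        dst.setparams(params)
        dst.writeframes(frames)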


### Now, suppose I just installed mirdata and ran the README code.
# I'm going to run validate, just to be sure, but I'll run it only
# once because it takes a long time.
# orchset.validate()
# mirdata v0.2-beta -> ({}, {})
# mirdata master -> ({}, {})
# Nice, all checksums are correct!
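
# Note that validate() compares each file against a checksum stored in the
# dataset index, so its verdict only holds for the files as they were when it
# ran. A cheaper check is to re-hash a single file right before using it.
# This sketch assumes MD5 checksums; `md5_checksum` is a hypothetical helper,
# not a mirdata API. For example, compare md5_checksum(track.audio_path_mono)
# before and after calling data_augmentation.
import hashlib


def md5_checksum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks until f.read returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()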
# Let's make an ORCHSET Track!
track_id = "Beethoven-S3-I-ex2"
track = orchset.Track(track_id)
# What is its duration?
print_duration(track)
# -> 10.282879818594104
# OK, noted.
# Now, let me experiment with data augmentation ...
# I won't be using mirdata for this.
vincent_orchset_home = "/Users/vl238/mir_datasets/Orchset/"
data_augmentation(vincent_orchset_home, track_id)
# ... a few moments later ...
# What is the duration of my Track again?
print_duration(track)
# -> 10.182879818594104
# Wait, what?
print_duration(track)
# -> 10.182879818594104
# What happened? What did I do? I haven't been using mirdata at all
# since I last printed duration! Is it a bug in print_duration??
print_duration(track)
# -> 10.182879818594104
# Doesn't seem like it. Weird.
# ... a few moments later ...
# Perhaps I should try running data_augmentation with flag=False?
data_augmentation(vincent_orchset_home, track_id, flag=False)
# Let me try to read the duration again
print_duration(track)
# -> ERROR. Excerpt of the traceback below:
#
# ----> 9     jam = mirdata_track.to_jams()
#      10     assert jam.validate()
#      11     print(jam.file_metadata.duration)
#
# ~/mirdata/mirdata/orchset.py in to_jams(self)
#     196         metadata = {k: v for k, v in self._track_metadata.items() if v is not None}
#     197         metadata['duration'] = librosa.get_duration(
# --> 198             self.audio_mono[0], self.audio_mono[1]
#     199         )
#     200         return jams_utils.jams_converter(
#
# ~/mirdata/mirdata/orchset.py in audio_mono(self)
#     185     def audio_mono(self):
#     186         """(np.ndarray, float): mono audio signal, sample rate"""
# --> 187         return load_audio_mono(self.audio_path_mono)
#     188
#     189     @property
#
# ~/mirdata/mirdata/orchset.py in load_audio_mono(audio_path)
#     214
#     215     """
# --> 216     return librosa.load(audio_path, sr=None, mono=True)
#     217
#     218
# Argh! Where is the bug??
# I don't think the bug is in print_duration. I'm just calling the to_jams()
# method and printing the result.
# Also, if I run the same script on guitarset, print_duration is constant.
# Is it something with the ORCHSET loader?
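
# One mechanism consistent with the traceback above: in this version of the
# ORCHSET loader, audio_mono calls load_audio_mono(self.audio_path_mono) each
# time it is accessed, so nothing is cached and every to_jams() call re-reads
# whatever is on disk at that moment. The toy class below is a hypothetical
# illustration of the same pattern, not mirdata code: edits to the file show
# up immediately, and deleting the file raises FileNotFoundError.
class LazyTextTrack:
    """Toy stand-in for a Track whose data lives in a text file."""

    def __init__(self, path):
        self.path = path

    @property
    def content(self):
        # Re-read the file on every access, without caching.
        with open(self.path) as f:
            return f.read()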