MIDI piano transcription from any recording

Using a couple of Python repos, you can transcribe any piano recording to MIDI. Each uses its own environment, which is straightforward to set up with conda/mamba.

Both steps incur some loss, which affects the quality of the end result, but any reasonably hi-fi recording from the late '60s onward should give acceptable results. Super useful for dumping into Syn/Neothesia.

  1. Start with a folder containing WAV files of the desired songs.

  2. Use Music-Source-Separation-Training's Demucs4HT 6-stem model inference to isolate the piano stems. For me this was:

python inference.py --model_type htdemucs --config_path configs/config_htdemucs_6stems.yaml --start_check_point htdemucs4-6.th --input_folder YT-Rips/ --store_dir results/
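The separation step typically writes one subfolder of stems per track under the store directory. A small sketch for gathering just the piano stems into a flat folder for the next step — the `results/<song>/piano.wav` layout and the `collect_piano_stems` helper are assumptions here; adjust the glob to match what your run actually produces:

```python
import shutil
from pathlib import Path

def collect_piano_stems(results_dir: str, dest_dir: str) -> list:
    """Copy each song's piano stem into one flat folder,
    renaming it after the song so filenames stay meaningful.
    Assumes a results/<song>/piano.wav layout."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for stem in sorted(Path(results_dir).glob("*/piano.wav")):
        target = dest / f"{stem.parent.name}.wav"
        shutil.copy2(stem, target)
        copied.append(target)
    return copied
```

Point the transcription script's `--input_folder` at the destination folder afterwards.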
  3. Then feed the piano stems into piano_transcription_inference. Here's a simple inference script that runs it on folders instead of single files:
import argparse
from pathlib import Path
from tqdm import tqdm
from time import time
from torch import cuda
from piano_transcription_inference import PianoTranscription, sample_rate, load_audio


def inference(args):
    output_midi_path = Path(args.output_folder)
    output_midi_path.mkdir(parents=True, exist_ok=True)  # ensure the output folder exists
    device = 'cuda' if args.cuda else 'cpu'

    transcribe_time = time()

    transcriptor = PianoTranscription(device=device, checkpoint_path=None)
    # checkpoint_path: None for default path, str for downloaded checkpoint path

    for audio_path in tqdm(sorted(Path(args.input_folder).glob("*.wav"))):
        # Load, transcribe, and write out to MIDI file
        audio, _ = load_audio(str(audio_path), sr=sample_rate, mono=True)
        midi_path = output_midi_path / f"{audio_path.stem}.mid"
        transcribed_dict = transcriptor.transcribe(audio, str(midi_path))
        # transcribed_dict holds the detected note/pedal events if you want to inspect them

    print('Transcribe time: {:.3f} s'.format(time() - transcribe_time))

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Batch piano-to-MIDI transcription')
    parser.add_argument('--input_folder', '-i', type=str, required=True)
    parser.add_argument('--output_folder', '-o', type=str, required=True)
    parser.add_argument('--cuda', '-c', action='store_true', default=False)
    args = parser.parse_args()
    if args.cuda:
        assert cuda.is_available()
    inference(args)
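To sanity-check a batch run, you can confirm that each output file carries the standard MIDI header chunk (`MThd`) using only the standard library — `check_midi_outputs` below is a hypothetical helper, not part of either repo:

```python
from pathlib import Path

def check_midi_outputs(folder: str) -> int:
    """Return the number of .mid files in `folder` that begin
    with the standard MIDI header chunk ('MThd')."""
    ok = 0
    for path in sorted(Path(folder).glob("*.mid")):
        with open(path, "rb") as f:
            if f.read(4) == b"MThd":
                ok += 1
    return ok
```

If the count doesn't match the number of input WAVs, check the console output for tracks that failed to transcribe.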