Using a couple of Python repos, you can transcribe any piano recording to MIDI. Each repo needs its own environment, which is pretty straightforward to set up with conda/mamba.
Both steps are lossy, and the losses compound in the end result. Still, on any reasonably hi-fi recording made since the late '60s you should get acceptable results. Super useful for dumping into Synthesia/Neothesia.
- Start with a folder containing WAV files of the desired songs.
- Use Music-Source-Separation-Training's Demucs4HT 6-stem model inference to derive the piano stems. For me this was:
python inference.py --model_type htdemucs --config_path configs/config_htdemucs_6stems.yaml --start_check_point htdemucs4-6.th --input_folder YT-Rips/ --store_dir results/
- Then feed the piano stems into piano_transcription_inference. Here's a simple inference script that runs it on a whole folder instead of a single file:
import argparse
from pathlib import Path
from time import time

from torch import cuda
from tqdm import tqdm
from piano_transcription_inference import PianoTranscription, sample_rate, load_audio


def inference(args):
    output_midi_path = Path(args.output_folder)
    # Create the output folder up front so writing MIDI files cannot fail on a missing directory
    output_midi_path.mkdir(parents=True, exist_ok=True)
    device = 'cuda' if args.cuda else 'cpu'

    transcribe_time = time()
    # checkpoint_path: None for default path, str for downloaded checkpoint path
    transcriptor = PianoTranscription(device=device, checkpoint_path=None)

    for audio_path in tqdm(sorted(Path(args.input_folder).glob('*.wav'))):
        # Load, transcribe, and write out to MIDI file
        audio, _ = load_audio(str(audio_path), sr=sample_rate, mono=True)
        # Append '.mid' explicitly; the bare stem would produce a file with no extension
        midi_path = output_midi_path / (audio_path.stem + '.mid')
        transcribed_dict = transcriptor.transcribe(audio, str(midi_path))
        # from pprint import pprint; pprint(transcribed_dict)

    # Total wall time, including model loading
    print('Transcribe time: {:.3f} s'.format(time() - transcribe_time))


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Transcribe every WAV file in a folder to MIDI')
    parser.add_argument('--input_folder', '-i', type=str, required=True)
    parser.add_argument('--output_folder', '-o', type=str, required=True)
    parser.add_argument('--cuda', '-c', action='store_true', default=False)
    args = parser.parse_args()
    if args.cuda:
        assert cuda.is_available(), 'CUDA was requested but is not available'
    inference(args)
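
One gap between the two steps: the separation output holds every stem (vocals, drums, bass and so on), not just piano, so the transcription script should only be pointed at the piano files. Here is a minimal sketch for gathering them into one folder. It assumes each piano stem's filename ends in "piano", either as Song_piano.wav or as piano.wav inside a per-song subfolder. The exact layout depends on your version of Music-Source-Separation-Training, so check your results/ folder first. The function name collect_piano_stems and the folder names are made up for illustration.

```python
import shutil
from pathlib import Path


def collect_piano_stems(results_dir, out_dir, keyword="piano"):
    """Copy every WAV whose filename ends in `keyword` from results_dir into out_dir.

    Assumption: stems are named like 'Song_piano.wav' or 'Song/piano.wav'.
    Keep out_dir outside results_dir so reruns do not pick up earlier copies.
    """
    results_dir, out_dir = Path(results_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for wav in sorted(results_dir.rglob("*.wav")):
        if not wav.stem.lower().endswith(keyword):
            continue  # not a piano stem (vocals, drums, ...)
        # Prefix any per-song subfolder names so that 'SongA/piano.wav' and
        # 'SongB/piano.wav' do not overwrite each other in the flat output folder.
        rel = wav.relative_to(results_dir)
        name = "_".join(rel.parts[:-1] + (wav.stem,)) + ".wav"
        target = out_dir / name
        shutil.copy2(wav, target)
        copied.append(target)
    return copied


if __name__ == "__main__":
    for path in collect_piano_stems("results", "piano_stems"):
        print(path)
```

After this, pass piano_stems/ as --input_folder to the transcription script above.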