@bancek
Last active September 3, 2023 15:50
CUE splitter using ffmpeg (to mp3)
cue_file = 'file.cue'

d = open(cue_file).read().splitlines()

general = {}

tracks = []

current_file = None

for line in d:
    if line.startswith('REM GENRE '):
        general['genre'] = ' '.join(line.split(' ')[2:])
    if line.startswith('REM DATE '):
        general['date'] = ' '.join(line.split(' ')[2:])
    if line.startswith('PERFORMER '):
        general['artist'] = ' '.join(line.split(' ')[1:]).replace('"', '')
    if line.startswith('TITLE '):
        general['album'] = ' '.join(line.split(' ')[1:]).replace('"', '')
    if line.startswith('FILE '):
        current_file = ' '.join(line.split(' ')[1:-1]).replace('"', '')

    if line.startswith('  TRACK '):
        track = general.copy()
        track['track'] = int(line.strip().split(' ')[1], 10)

        tracks.append(track)

    if line.startswith('    TITLE '):
        tracks[-1]['title'] = ' '.join(line.strip().split(' ')[1:]).replace('"', '')
    if line.startswith('    PERFORMER '):
        tracks[-1]['artist'] = ' '.join(line.strip().split(' ')[1:]).replace('"', '')
    if line.startswith('    INDEX 01 '):
        t = map(int, ' '.join(line.strip().split(' ')[2:]).replace('"', '').split(':'))
        tracks[-1]['start'] = 60 * t[0] + t[1] + t[2] / 100.0

for i in range(len(tracks)):
    if i != len(tracks) - 1:
        tracks[i]['duration'] = tracks[i + 1]['start'] - tracks[i]['start']

for track in tracks:
    metadata = {
        'artist': track['artist'],
        'title': track['title'],
        'album': track['album'],
        'track': str(track['track']) + '/' + str(len(tracks))
    }

    if 'genre' in track:
        metadata['genre'] = track['genre']
    if 'date' in track:
        metadata['date'] = track['date']

    cmd = 'ffmpeg'
    cmd += ' -b:a 320k'
    cmd += ' -i "%s"' % current_file
    cmd += ' -ss %.2d:%.2d:%.2d' % (track['start'] / 60 / 60, track['start'] / 60 % 60, int(track['start'] % 60))

    if 'duration' in track:
        cmd += ' -t %.2d:%.2d:%.2d' % (track['duration'] / 60 / 60, track['duration'] / 60 % 60, int(track['duration'] % 60))

    cmd += ' ' + ' '.join('-metadata %s="%s"' % (k, v) for (k, v) in metadata.items())
    cmd += ' "%.2d - %s - %s.mp3"' % (track['track'], track['artist'], track['title'])

    print cmd

@lachlan-00

For new ffmpeg versions, use -ab 320k and put it after the input, with the output options:

cmd += ' ' + ' '.join('-metadata %s="%s"' % (k, v) for (k, v) in metadata.items())
cmd += ' -ab 320k'
cmd += ' "%.2d - %s - %s.mp3"' % (track['track'], track['artist'], track['title'])

@diqidoq

diqidoq commented Oct 5, 2018

Something is off with the track length. When I slightly change the script to split into FLAC files, every output file reports the length of the original combined file, which makes players show an error at the end of each file. Not sure if it is related to something missing in the script here, or caused by the fact that I split the FLAC without re-encoding (-c:a copy), which is known to cause issues with FLAC frames ... EDIT: yep, it is.

@naplutatium

Great! Exactly what I was looking for. Thanks!

@buendias-dev

buendias-dev commented Mar 28, 2020

For Python 3, change line 34 to:

t = list(map(int, ' '.join(line.strip().split(' ')[2:]).replace('"', '').split(':')))

And change line 65 to:

print(cmd)

So, for Python 3 and a newer ffmpeg version:

cue_file = 'file.cue'

d = open(cue_file).read().splitlines()

general = {}

tracks = []

current_file = None

for line in d:
    if line.startswith('REM GENRE '):
        general['genre'] = ' '.join(line.split(' ')[2:])
    if line.startswith('REM DATE '):
        general['date'] = ' '.join(line.split(' ')[2:])
    if line.startswith('PERFORMER '):
        general['artist'] = ' '.join(line.split(' ')[1:]).replace('"', '')
    if line.startswith('TITLE '):
        general['album'] = ' '.join(line.split(' ')[1:]).replace('"', '')
    if line.startswith('FILE '):
        current_file = ' '.join(line.split(' ')[1:-1]).replace('"', '')
    
    if line.startswith('  TRACK '):
        track = general.copy()
        track['track'] = int(line.strip().split(' ')[1], 10)

        tracks.append(track)

    if line.startswith('    TITLE '):
        tracks[-1]['title'] = ' '.join(line.strip().split(' ')[1:]).replace('"', '')
    if line.startswith('    PERFORMER '):
        tracks[-1]['artist'] = ' '.join(line.strip().split(' ')[1:]).replace('"', '')
    if line.startswith('    INDEX 01 '):
        t = list(map(int, ' '.join(line.strip().split(' ')[2:]).replace('"', '').split(':')))
        tracks[-1]['start'] = 60 * t[0] + t[1] + t[2] / 100.0

for i in range(len(tracks)):
    if i != len(tracks) - 1:
        tracks[i]['duration'] = tracks[i + 1]['start'] - tracks[i]['start']

for track in tracks:
    metadata = {
        'artist': track['artist'],
        'title': track['title'],
        'album': track['album'],
        'track': str(track['track']) + '/' + str(len(tracks))
    }

    if 'genre' in track:
        metadata['genre'] = track['genre']
    if 'date' in track:
        metadata['date'] = track['date']

    cmd = 'ffmpeg'
    cmd += ' -i "%s"' % current_file
    cmd += ' -ss %.2d:%.2d:%.2d' % (track['start'] / 60 / 60, track['start'] / 60 % 60, int(track['start'] % 60))

    if 'duration' in track:
        cmd += ' -t %.2d:%.2d:%.2d' % (track['duration'] / 60 / 60, track['duration'] / 60 % 60, int(track['duration'] % 60))

    cmd += ' ' + ' '.join('-metadata %s="%s"' % (k, v) for (k, v) in metadata.items())
    cmd += ' -b:a 320k'
    cmd += ' "%.2d - %s - %s.mp3"' % (track['track'], track['artist'], track['title'])

    print(cmd)
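
One thing to watch with the printed command: the script wraps metadata values in plain double quotes, so a title that itself contains a quote would likely produce a broken shell command. Below is a minimal sketch of one way around that, assuming Python 3.3+ for shlex.quote. The helper name metadata_args is hypothetical and not part of the gist.

```python
import shlex


def metadata_args(metadata):
    """Build the '-metadata key=value' part of the command with shell-safe quoting."""
    parts = []
    # Sort the keys so the generated command is the same on every run.
    for key, value in sorted(metadata.items()):
        parts.append('-metadata')
        # Quote the whole key=value pair so quotes or spaces in the value survive the shell.
        parts.append(shlex.quote('%s=%s' % (key, value)))
    return ' '.join(parts)
```

In the script this could replace the ' '.join('-metadata %s="%s"' ...) expression, for example cmd += ' ' + metadata_args(metadata).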

@n30a5tr0

n30a5tr0 commented Jul 4, 2020

...and I suggest changing line 55:
cmd += ' -b:a 320k'
to:
cmd += ' -c:a copy'
to skip the re-encoding step.

@OscarL

OscarL commented Jan 30, 2021

Kinda weird that nobody seems to have noticed that the files split with this script have wrong start time and duration. Granted, the errors are all smaller than a second, but still pretty noticeable on the second track onward.

The issue is with lines 57 and 60. They use an integer for the seconds instead of a float with two decimals of precision.

Line 57 should read:

    cmd += ' -ss %.2d:%.2d:%05.2f' % (track['start'] / 60 / 60, track['start'] / 60 % 60, track['start'] % 60)

And line 60 should read:

        cmd += ' -t %.2d:%.2d:%05.2f' % (track['duration'] / 60 / 60, track['duration'] / 60 % 60, track['duration'] % 60)

With those changes, no more short tracks, nor ones that start too early!
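
The same formatting can be pulled into a small helper so that -ss and -t share one code path. This is only a sketch: format_timestamp is a hypothetical name, not part of the gist, and it assumes the offset is a non-negative number of seconds.

```python
def format_timestamp(seconds):
    """Format a non-negative offset in seconds as HH:MM:SS.ss for ffmpeg's -ss / -t."""
    # Work in whole centiseconds so rounding can never yield "60.00" in the seconds field.
    total_cs = int(round(seconds * 100))
    hours, rem = divmod(total_cs, 360000)   # 360000 centiseconds per hour
    minutes, rem = divmod(rem, 6000)        # 6000 centiseconds per minute
    secs, cs = divmod(rem, 100)
    return '%02d:%02d:%02d.%02d' % (hours, minutes, secs, cs)
```

With this, the two lines could read cmd += ' -ss ' + format_timestamp(track['start']) and cmd += ' -t ' + format_timestamp(track['duration']).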

@holesocks

holesocks commented Feb 23, 2021

This code does a pretty good job. Thanks.
However getting the cuts to the millisecond needs some more work!
The problem is that standard cue file index points are specified in MM:SS:FF format, where FF are frames,
and ffmpeg wants fractions of a second to make the cuts.
Also, if we want to avoid re-encoding, which is sensible, ffmpeg has to cut at frame boundaries. It is cautious about this, so it adds a couple of frames to ensure nothing is excluded (typically 0.026 s each for MP3).

If the cue file was designed for CD rather than an MP3 file, which is usual, then each FF is 1/75 sec, so the calculation to get ms from FF is easy, but the problem with ffmpeg remains.
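
As a sketch of that calculation (the script above divides the FF field by 100.0, which treats frames as hundredths of a second): the function name cue_index_to_seconds and the constant are my own, and this assumes a CD-style cue sheet with 75 frames per second.

```python
CD_FRAMES_PER_SECOND = 75  # assumption: CD-style cue sheet, each FF is 1/75 s


def cue_index_to_seconds(index):
    """Convert a cue sheet 'MM:SS:FF' index into seconds as a float."""
    minutes, seconds, frames = (int(part) for part in index.split(':'))
    # float() keeps the division fractional on Python 2 as well as Python 3.
    return 60 * minutes + seconds + frames / float(CD_FRAMES_PER_SECOND)
```

For example, an index of 03:05:37 comes out at roughly 185.49 seconds, where dividing by 100 would give 185.37.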

If you want to get this spot on, the frame size in ms will need to be calculated (the typical MP3, Layer III version 1, has 1152 samples per frame, and the sample rate is commonly 44100 Hz), and all valid audio frames will have to be read and written one by one up to the desired duration.
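
The frame-size arithmetic can be sketched as follows, using the 1152 samples per frame and 44100 Hz figures mentioned above as defaults. The function names are hypothetical, and this only does the arithmetic; it does not read or write any MP3 frames.

```python
SAMPLES_PER_FRAME = 1152  # typical MPEG-1 Layer III frame, as stated above
SAMPLE_RATE_HZ = 44100    # common sample rate; other files may differ


def frame_duration_ms(samples_per_frame=SAMPLES_PER_FRAME, sample_rate_hz=SAMPLE_RATE_HZ):
    """Duration of one audio frame in milliseconds."""
    return 1000.0 * samples_per_frame / sample_rate_hz


def frame_index_at(seconds, samples_per_frame=SAMPLES_PER_FRAME, sample_rate_hz=SAMPLE_RATE_HZ):
    """Index of the frame that contains the given offset (rounded down to a frame start)."""
    return int(seconds * sample_rate_hz / samples_per_frame)
```

One frame works out to about 26.12 ms, which lines up with the 0.026 s figure, so a cut point can be off by up to one frame when it is snapped to a frame boundary.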

Alternatively, mp3directcut (free, Windows) will read a cue file and split the audio without re-encoding. It works at the frame level, but I have never checked exactly how accurate it is. There may be better tools. I'd love to know.

@OscarL

OscarL commented Mar 3, 2021

Alright, following @holesocks' advice (thanks!), I've forked this gist, see here, and made the following changes:

  • Fixed the location of the bitrate parameter.
  • Support for both Python 2 & 3.
  • Fixed the track duration so it does not cut tracks short, nor start them early (for the usual case of CD images as .flac files, at least).

I've kept the changes to a minimum, so it's easy to compare to the original (and anyone can use it as a base).

I'll probably write an over-engineered version (calling ffmpeg, flac-to-flac splits, selectable output format, error checking, etc.) just to exercise my rusty fingers a bit.

@holesocks

Thanks!
ffmpeg will work out what output to produce going by the filename extension. Your program could split AAC and WAV files too (don't know about FLAC) with very few changes. Just an idea!
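
A rough sketch of that idea, with the extension as a parameter: build_split_command and its arguments are hypothetical names, and the command is built as an argument list (suitable for subprocess) rather than the shell string the gist prints. It only uses the -i, -ss and -t options already seen above.

```python
def build_split_command(source, start, duration, title, extension='mp3'):
    """Return an ffmpeg argument list; the output format follows the file extension."""
    cmd = ['ffmpeg', '-i', source, '-ss', '%.2f' % start]
    if duration is not None:
        # The last track has no duration, so ffmpeg simply runs to the end of the input.
        cmd += ['-t', '%.2f' % duration]
    cmd.append('%s.%s' % (title, extension))
    return cmd
```

Passing the list to subprocess.call(cmd) would run it without any shell quoting concerns; switching extension to 'wav' is the only change needed to get WAV output.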
Generally, MP3s are just not designed to be cut at the frame level: data can overflow from one frame to the next, for one. ffmpeg probably tidies up the ends as best it can to avoid audible imperfections, but at the expense of a little precision.
According to the hydrogenaud.io specialists, pcutmp3 is the best tool for cutting MP3s, and it deals with overflow and gapless playback. It is a Java program and it is unclear whether it is still supported, so I didn't test it.
That's me done - cheerio.

@OscarL

OscarL commented Mar 10, 2021

@holesocks: that's the idea! The script I have in progress is called "cue_splitter.py", and it lets you select format/codec/bitrate/etc., although personally I will only use it for .flac to .flac splitting (particularly due to your comments regarding frame-level splitting).

Using ffmpeg you can split without re-encoding, but there's a bug in ffmpeg: the split files all end up with the right size but the wrong duration stored in them (which tends to confuse some media players). I just resort to "flac 2 flac" with the default compression level (fast enough even on my old CPU), and the files work OK.

I've intentionally kept this gist as close to the original as possible (while fixing the most glaring errors), because maybe other fellows can do like me... and use it for practicing their programming with a simple, but concrete project.

Thanks for your feedback, and greetings from Argentina! :-)

ghost commented May 7, 2021

Thank you 🙏

@jeanslack

I made this FFmpeg-based command-line utility: https://github.com/jeanslack/FFcuesplitter. It has some interesting options and is flexible enough for most needs, and the results seem accurate.
