Converting audio to AAC with Fraunhofer FDK AAC (libfdk_aac) in FFmpeg

Check if you have an FFmpeg build supporting libfdk_aac

Run:

ffmpeg -hide_banner -h encoder=libfdk_aac

If you have an FFmpeg version that does not include libfdk_aac, you will see this:

Codec 'libfdk_aac' is not recognized by FFmpeg.

If you have a build that includes libfdk_aac you will see this:

Encoder libfdk_aac [Fraunhofer FDK AAC]:
    General capabilities: delay small 
    Threading capabilities: none
    Supported sample rates: 96000 88200 64000 48000 44100 32000 24000 22050 16000 12000 11025 8000
    Supported sample formats: s16
    Supported channel layouts: mono stereo 3.0 4.0 5.0 5.1 7.1(wide) 7.1
libfdk_aac AVOptions:
  -afterburner       <int>        E...A...... Afterburner (improved quality) (from 0 to 1) (default 1)
  -eld_sbr           <int>        E...A...... Enable SBR for ELD (for SBR in other configurations, use the -profile parameter) (from 0 to 1) (default 0)
  -eld_v2            <int>        E...A...... Enable ELDv2 (LD-MPS extension for ELD stereo signals) (from 0 to 1) (default 0)
  -signaling         <int>        E...A...... SBR/PS signaling style (from -1 to 2) (default default)
     default         -1           E...A...... Choose signaling implicitly (explicit hierarchical by default, implicit if global header is disabled)
     implicit        0            E...A...... Implicit backwards compatible signaling
     explicit_sbr    1            E...A...... Explicit SBR, implicit PS signaling
     explicit_hierarchical 2            E...A...... Explicit hierarchical signaling
  -latm              <int>        E...A...... Output LATM/LOAS encapsulated data (from 0 to 1) (default 0)
  -header_period     <int>        E...A...... StreamMuxConfig and PCE repetition period (in frames) (from 0 to 65535) (default 0)
  -vbr               <int>        E...A...... VBR mode (1-5) (from 0 to 5) (default 0)
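
For scripts, the same check can be automated. The following is a minimal sketch, assuming the usual `ffmpeg -encoders` listing in which the encoder name is the second whitespace-separated field. `has_encoder` is a hypothetical helper name, not part of FFmpeg.

```shell
# Hypothetical helper: succeeds if the encoder named in "$1" appears in an
# `ffmpeg -encoders` listing supplied on stdin.
# Assumption: the encoder name is the second whitespace-separated field.
has_encoder() {
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Usage (commented out so that defining the helper has no side effects):
# ffmpeg -hide_banner -encoders 2>/dev/null | has_encoder libfdk_aac \
#     && echo "libfdk_aac is available"
```

Reading the listing from stdin keeps the helper independent of where the ffmpeg binary lives.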

How to get an FFmpeg build with libfdk_aac

FFmpeg supports two AAC-LC encoders (aac and libfdk_aac) and one HE-AAC (v1/2) encoder (libfdk_aac). The license of libfdk_aac is not compatible with the GPL, so the GPL does not permit distribution of binaries containing incompatible code when GPL-licensed code is also included. Therefore this encoder has been designated as "non-free", and you cannot download a pre-built ffmpeg that supports it. This can be resolved by compiling ffmpeg yourself.
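
A self-compiled build can be sanity-checked by looking for the relevant configure flags. This is only a sketch: it assumes the build was configured with `--enable-libfdk-aac` and `--enable-nonfree` (both appear in the build configuration posted further down the thread), and `has_fdk_build_flags` is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if the build configuration text on stdin
# mentions both flags that a libfdk_aac-enabled build is expected to carry.
has_fdk_build_flags() {
    conf=$(cat)
    case "$conf" in
        *--enable-libfdk-aac*) ;;
        *) return 1 ;;
    esac
    case "$conf" in
        *--enable-nonfree*) ;;
        *) return 1 ;;
    esac
}

# Usage (commented out; needs an ffmpeg binary on PATH):
# ffmpeg -hide_banner -buildconf | has_fdk_build_flags && echo "flags present"
```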

My way of building a custom FFmpeg

I set up a clean install of Windows 10 in a VM and run media-autobuild_suite: https://github.com/m-ab-s/media-autobuild_suite

My go-to preset for highest quality regardless of file size

ffmpeg -i input.wav -ac 2 -c:a libfdk_aac -cutoff 20000 -afterburner 1 -vbr 0 output.m4a

-ac 2 Downmix to a stereo track

-c:a libfdk_aac Use Fraunhofer FDK AAC (libfdk_aac).

-cutoff 20000 libfdk_aac defaults to a low-pass filter of around 14kHz. 20000 is the maximum available.

-afterburner 1 Afterburner is "a type of analysis by synthesis algorithm which increases the audio quality but also the required processing power." Fraunhofer recommends always activating this feature. 1 = on, 0 = off.

-vbr 0 - Setting VBR (variable bitrate) to 0 means libfdk_aac will try to set the maximum available CBR (constant bitrate) for the stream. This results in the best theoretical quality no matter if you choose VBR or CBR. This will increase the filesize though.
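
To apply this preset to many files, the command can be wrapped in a small function. This is a sketch under stated assumptions: `encode_fdk_preset` is a hypothetical name, and the output name is derived by replacing the input's extension with .m4a, which assumes the input file name has an extension.

```shell
# Hypothetical helper: encode one file with the preset described above.
# Assumption: "$1" has a file extension, which is replaced by .m4a.
encode_fdk_preset() {
    in=$1
    out="${in%.*}.m4a"
    ffmpeg -hide_banner -i "$in" -ac 2 -c:a libfdk_aac \
        -cutoff 20000 -afterburner 1 -vbr 0 "$out"
}

# Usage (commented out so that defining the helper has no side effects):
# for f in *.wav; do encode_fdk_preset "$f"; done
```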

@joshbarrass

Setting VBR (variable bitrate) to 0 means libfdk_aac will try to set the maximum available CBR (constant bitrate) for the stream. This results in the best theoretical quality no matter if you choose VBR or CBR. This will increase the filesize though.

Do you have a source on this? I'm more than willing to be proved wrong, but based on the Hydrogen Audio Wiki (emphasis theirs):

In general, however, for most types of input, assuming identical input, identical encoding methods, and sensible targets for VBR quality and bitrate bounds, VBR will almost always produce equal or better perceived-quality results than CBR for files of the same size or average bitrate, and this has been demonstrated in numerous double-blind listening tests.

In addition, using -vbr 5 disables the cutoff (source), potentially preserving more of the sound. Based on my own tests, this seems to check out; using -vbr 0 (without manually specifying a bitrate with -b:a) tends to yield a bitrate around half that of -vbr 5, and seems to give the same bitrate regardless of the chosen cutoff (and in fact the ffmpeg docs warn against increasing the cutoff for this reason, as it can audibly reduce audio quality (source), presumably because you're cramming a larger frequency range into the same amount of data). Using VBR gives a much higher bitrate (closer to the CBR bitrate I would associate with "perceptually lossless") and preserves some of the high-frequency information above 20kHz which was lost in the CBR encoding following your settings.

Unless you have a specific need for CBR (compatibility, for example), I think using -vbr 5 is the best option to preserve quality. If you do want to use CBR, I would suggest specifying a bitrate yourself (as a general rule of thumb, -b:a 256k can often be considered "perceptually lossless", but can give you some pretty large files). That said, I may be wrong, and if you disagree or can point to some documentation somewhere that contradicts what I've found, I'd love to hear your thoughts on the matter.
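
The two options discussed here (VBR with -vbr, or CBR with an explicit -b:a) can be sketched as a small argument builder. `fdk_rate_args` is a hypothetical helper, and the 1-5 range check follows the encoder help text quoted at the top of the gist.

```shell
# Hypothetical helper: print the rate-control arguments for libfdk_aac.
#   fdk_rate_args vbr 5      prints: -vbr 5
#   fdk_rate_args cbr 256k   prints: -b:a 256k
fdk_rate_args() {
    case "$1" in
        vbr)
            case "$2" in
                [1-5]) printf '%s\n' "-vbr $2" ;;
                *) echo "vbr mode must be 1-5" >&2; return 1 ;;
            esac ;;
        cbr) printf '%s\n' "-b:a $2" ;;
        *) echo "mode must be vbr or cbr" >&2; return 1 ;;
    esac
}

# Usage (commented out; the substitution is left unquoted on purpose so the
# two words are passed as separate arguments):
# ffmpeg -i input.wav -c:a libfdk_aac $(fdk_rate_args vbr 5) output.m4a
```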

@ddelange

Hi 👋

Some testing I did using spek for my tool yt:

tl;dr:

  • aac_at outperforms libfdk_aac
    • same size but much more detail at CBR@256
    • best libfdk_aac setting is CBR@256 (VBR performs consistently worse)
    • libfdk_aac CBR@256 (12.6 MB) contains even less detail than aac_at @ aac_he_v2 (3.1 MB) 🔥
  • aac_at @ aac_he_v2 is really impressive size/quality-wise
  • best setting for me is aac_at CBR@256, or VBR@q2 for 9% size savings with virtually no degradation

input.wav

84.9 MB
image

libfdk_aac (--enable-nonfree)

ffmpeg -i input.wav -vn -ac 2 -c:a libfdk_aac -cutoff 20000 -b:a 256k -afterburner 1 -vbr 0 output.m4a
12.6 MB
image

-afterburner 1 and -vbr 0 are the default settings, so they can be removed:

ffmpeg -i input.wav -vn -ac 2 -c:a libfdk_aac -b:a 256k -afterburner 1 -vbr 0 output.m4a
12.6 MB
image

ffmpeg -i input.wav -vn -ac 2 -c:a libfdk_aac -b:a 256k -afterburner 1 output.m4a
12.6 MB
image

ffmpeg -i input.wav -vn -ac 2 -c:a libfdk_aac -b:a 256k output.m4a
12.6 MB
image

VBR performs consistently worse

ffmpeg -i input.wav -vn -ac 2 -c:a libfdk_aac -vbr 5 -cutoff 20000 output.m4a
10.4 MB
image

ffmpeg -i input.wav -vn -ac 2 -c:a libfdk_aac -vbr 1 -cutoff 20000 output.m4a
7.3 MB
image

ffmpeg -i input.wav -vn -ac 2 -c:a libfdk_aac -vbr 5 output.m4a
10.5 MB
image

ffmpeg -i input.wav -vn -ac 2 -c:a libfdk_aac -vbr 1 output.m4a
5.2 MB
image

aac_at (--enable-audiotoolbox)

CBR

ffmpeg -i input.wav -vn -ac 2 -c:a aac_at -b:a 256k output.m4a
12.6 MB
image

VBR

ffmpeg -i input.wav -vn -ac 2 -c:a aac_at -q:a 0 -aac_at_mode vbr output.m4a
15.6 MB
image

ffmpeg -i input.wav -vn -ac 2 -c:a aac_at -q:a 1 -aac_at_mode vbr output.m4a
13.6 MB
image

ffmpeg -i input.wav -vn -ac 2 -c:a aac_at -q:a 2 -aac_at_mode vbr output.m4a
11.5 MB
image

ffmpeg -i input.wav -vn -ac 2 -c:a aac_at -q:a 3 -aac_at_mode vbr output.m4a
10.5 MB
image

aac_he

ffmpeg -i input.wav -vn -ac 2 -c:a aac_at -profile:a 4 output.m4a
4.2 MB
image

aac_he_v2

ffmpeg -i input.wav -vn -ac 2 -c:a aac_at -profile:a 28 output.m4a
3.1 MB
image

ABR

ffmpeg -i input.wav -vn -ac 2 -c:a aac_at -b:a 256k -aac_at_mode abr output.m4a
12.6 MB
image

CVBR

ffmpeg -i input.wav -vn -ac 2 -c:a aac_at -b:a 256k -aac_at_mode cvbr output.m4a
13.6 MB
image
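
The size comparisons in the summary are simple ratios of the file sizes listed above. A minimal sketch of the arithmetic, where `size_saving_pct` is a hypothetical helper and both sizes must be given in the same unit:

```shell
# Hypothetical helper: percentage saved by size "$2" relative to size "$1".
# Both sizes must be in the same unit (MB in the examples below).
size_saving_pct() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / a * 100 }'
}

# Examples using the sizes reported in this comment (commented out):
# size_saving_pct 12.6 11.5   # aac_at CBR@256 vs. aac_at VBR -q:a 2
# size_saving_pct 84.9 12.6   # input.wav vs. CBR@256
```

With the figures from the thread, the first call gives the roughly 9% saving quoted in the summary. Note that this says nothing about audio quality, only about file size.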

@joshbarrass

I agree that aac_at is better than Fraunhofer, if you have access to it, which sadly restricts it to Mac (or Windows with a bit of trickery). That said, I'm not sure I understand your conclusions about libfdk_aac, namely

VBR performs consistently worse

By what measure does VBR perform consistently worse? From your spek outputs, VBR is able to retain a greater range of frequencies than CBR with a smaller file size. The only measure I can see where CBR beats it is average bitrate — which is kind of the point of using VBR, to get better quality at a lower average bitrate — and the bitrate is not a measure of quality except when comparing CBR files. I'll cite HydrogenAudio again:

In general, however, for most types of input, assuming identical input, identical encoding methods, and sensible targets for VBR quality and bitrate bounds, VBR will almost always produce equal or better perceived-quality results than CBR for files of the same size or average bitrate, and this has been demonstrated in numerous double-blind listening tests.

Comparing them with spek or directly comparing bitrates is like comparing apples and oranges when it comes to CBR vs VBR. The bitrate doesn't really tell you anything in this comparison, and spek is only useful for checking the preserved frequency range. The only thing you can do (and, realistically, the only thing that matters when stepping down from lossless to lossy) is compare the perceived quality with double-blind tests, which have demonstrated that VBR generally outperforms CBR for Fraunhofer.
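
For reference, the average bitrate of a VBR file follows directly from its size and duration. The sketch below only illustrates that arithmetic and is not a quality measure; `avg_kbps` is a hypothetical helper, and the ffprobe calls in the usage comment assume a file named output.m4a.

```shell
# Hypothetical helper: average bitrate in kbit/s from a size in bytes
# and a duration in seconds.
avg_kbps() {
    awk -v bytes="$1" -v secs="$2" \
        'BEGIN { printf "%.0f\n", bytes * 8 / secs / 1000 }'
}

# Usage sketch (commented out; needs ffprobe and a real file):
# size=$(ffprobe -v error -show_entries format=size \
#     -of default=noprint_wrappers=1:nokey=1 output.m4a)
# dur=$(ffprobe -v error -show_entries format=duration \
#     -of default=noprint_wrappers=1:nokey=1 output.m4a)
# avg_kbps "$size" "$dur"
```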

@ddelange

the only thing that matters when stepping down from lossless to lossy) is compare the perceived quality with double-blind tests

very true! right now I'm only visually comparing the spectrum analysis.

and you're right, fdkaac CBR@256 and VBR@5 have visually almost the same spectrum (VBR loses some decibels at the higher frequency ranges), with VBR having considerably lower file size :)

@joshbarrass

joshbarrass commented Sep 14, 2023

Fraunhofer CBR@256 and VBR@5 have very similar spectra when the cut-off frequency is specified manually. However, the ffmpeg documentation notes that this can audibly reduce audio quality (emphasis mine):

Note: libfdk_aac defaults to a low-pass filter of around 14kHz (details). If you want to preserve higher frequencies, use -cutoff 18000. Adjust the number to the upper frequency limit only if you need to; keeping in mind that a higher limit may audibly reduce the overall quality.

If you follow this advice and retain the default cut-off frequency (17kHz for CBR, disabled for VBR 5), then VBR 5 has a clear advantage in this regard (in fact, I don't think there's any way the cutoff can be disabled for CBR, so VBR 5 should have an advantage over CBR@256k as well). Of course, it all boils down to the perceived quality in the end, and in that respect my advice is to err on the side of caution and follow the settings that have performed better in blind tests.

Strangely, your spek results seem inconsistent with mine. Your result for VBR 5 when not specifying a cutoff seems to have a clear cutoff at 19kHz, which I don't see in my own results. There's still some slight loss of volume in VBR, but I expected that to be the case based on my (rudimentary) understanding of how audio compression works.
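
When comparing spectrograms across machines, it can help to render them with the same tool and settings. The commenters used spek; as an alternative, FFmpeg's own showspectrumpic filter can write a PNG. This is a sketch, and the helper name `spectrogram_png` and the 1024x512 size are arbitrary choices.

```shell
# Hypothetical helper: render a spectrogram PNG of audio file "$1" into "$2"
# using FFmpeg's showspectrumpic filter (an alternative to spek, not the
# tool used in this thread).
spectrogram_png() {
    ffmpeg -hide_banner -y -i "$1" -lavfi showspectrumpic=s=1024x512 "$2"
}

# Usage (commented out so that defining the helper has no side effects):
# spectrogram_png output.m4a output.png
```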

@joshbarrass

You could also try the HE/HE_v2 settings in Fraunhofer. From my own tests, they tend to yield files around half/a quarter the size with the same compression settings. Add either -profile:a aac_he or -profile:a aac_he_v2 to your ffmpeg command. From the docs:

As always, experiment to see what works for your ears.

I hope this helps.
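
The profile switch mentioned above can be sketched as a small argument builder. `fdk_profile_args` is a hypothetical helper; it only covers the two profile names given in this comment.

```shell
# Hypothetical helper: print the -profile:a argument for the HE-AAC variants
# mentioned above.
fdk_profile_args() {
    case "$1" in
        he)    printf '%s\n' "-profile:a aac_he" ;;
        he_v2) printf '%s\n' "-profile:a aac_he_v2" ;;
        *) echo "profile must be he or he_v2" >&2; return 1 ;;
    esac
}

# Usage (commented out; the substitution is left unquoted on purpose so the
# two words are passed as separate arguments):
# ffmpeg -i input.wav -c:a libfdk_aac $(fdk_profile_args he_v2) output.m4a
```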

@ddelange

ddelange commented Sep 14, 2023

When I leave out the cutoff in the VBR@5 command above, the output is definitely worse 🤔

ffmpeg -i input.wav -vn -ac 2 -c:a libfdk_aac -vbr 5 -cutoff 20000 output.m4a
10.4 MB
image

ffmpeg -i input.wav -vn -ac 2 -c:a libfdk_aac -vbr 5 output.m4a
10.5 MB
image

Strangely, your spek results seem inconsistent with mine.

That's weird indeed. I compiled in March or so, so all should be relatively recent:

$ ffmpeg -h encoder=libfdk_aac
ffmpeg version 6.0 Copyright (c) 2000-2023 the FFmpeg developers
  built with Apple clang version 12.0.0 (clang-1200.0.32.29)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/6.0-with-options --enable-shared --cc=clang --host-cflags= --host-ldflags= --enable-gpl --enable-libaom --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-libsnappy --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-demuxer=dash --enable-opencl --enable-audiotoolbox --enable-videotoolbox --disable-htmlpages --enable-libfdk-aac --enable-libsrt --enable-libxvid --enable-nonfree
  libavutil      58.  2.100 / 58.  2.100
  libavcodec     60.  3.100 / 60.  3.100
  libavformat    60.  3.100 / 60.  3.100
  libavdevice    60.  1.100 / 60.  1.100
  libavfilter     9.  3.100 /  9.  3.100
  libswscale      7.  1.100 /  7.  1.100
  libswresample   4. 10.100 /  4. 10.100
  libpostproc    57.  1.100 / 57.  1.100
Encoder libfdk_aac [Fraunhofer FDK AAC]:
    General capabilities: dr1 delay small
    Threading capabilities: none
    Supported sample rates: 96000 88200 64000 48000 44100 32000 24000 22050 16000 12000 11025 8000
    Supported sample formats: s16
    Supported channel layouts: mono stereo 3.0 4.0 5.0 5.1 6.1(back) 7.1(wide) 7.1 7.1(top)
libfdk_aac AVOptions:
  -afterburner       <int>        E...A...... Afterburner (improved quality) (from 0 to 1) (default 1)
  -eld_sbr           <int>        E...A...... Enable SBR for ELD (for SBR in other configurations, use the -profile parameter) (from 0 to 1) (default 0)
  -eld_v2            <int>        E...A...... Enable ELDv2 (LD-MPS extension for ELD stereo signals) (from 0 to 1) (default 0)
  -signaling         <int>        E...A...... SBR/PS signaling style (from -1 to 2) (default default)
     default         -1           E...A...... Choose signaling implicitly (explicit hierarchical by default, implicit if global header is disabled)
     implicit        0            E...A...... Implicit backwards compatible signaling
     explicit_sbr    1            E...A...... Explicit SBR, implicit PS signaling
     explicit_hierarchical 2            E...A...... Explicit hierarchical signaling
  -latm              <int>        E...A...... Output LATM/LOAS encapsulated data (from 0 to 1) (default 0)
  -header_period     <int>        E...A...... StreamMuxConfig and PCE repetition period (in frames) (from 0 to 65535) (default 0)
  -vbr               <int>        E...A...... VBR mode (1-5) (from 0 to 5) (default 0)

@joshbarrass

Mine was compiled sometime last year, so yours should be more up-to-date than mine. Let me build a fresh copy and see if that makes any difference.

@ddelange

Thanks for sparring with me :) Wonder if we can find the source of the discrepancy!

https://http.cat/417

@joshbarrass

When using the latest version of ffmpeg/libfdk, I see the same ~19kHz cutoff.

Digging into libfdk's source code and assuming I'm understanding correctly, this diff seems to show that the cutoff was changed in 2020 (my version of ffmpeg must be older than I thought!) to 19293Hz in order to improve audio quality. Page 17 of this document explains why, and the answer agrees with my assumption earlier in the thread: the higher the cutoff, the more frequencies you have to represent with the same amount of data, giving you lower quality overall for the sake of saving near-imperceptible frequencies. The cutoffs are chosen based on listening tests to maximise perceived quality. As with the ffmpeg documentation, they also strongly recommend keeping the default cutoffs for this reason. If they've changed these cutoffs, they are presumably working from more recent listening tests that show a better result.

The more you know :)

@ddelange

What a rabbit hole :) awesome find!

@ScribbleGhost
Author

I haven't had the time to comment on any of this, but I am glad to see this is relevant to you guys. For me, I am not that interested in which AAC encoder produces better quality than the other. All lossy encoders produce low quality. If I want quality I go for FLAC. But that's just me 🐵

@AdventurerRussia

image
can you tell me where it's better?
