  • Save ScribbleGhost/54ad17da006e8bba4a1612bd6a64571c to your computer and use it in GitHub Desktop.

Converting audio to AAC with Fraunhofer FDK AAC (libfdk_aac) in FFmpeg

Check if you have an FFmpeg build supporting libfdk_aac

Run:

ffmpeg -hide_banner -h encoder=libfdk_aac

If you have an FFmpeg version that does not include libfdk_aac, you will see this:

Codec 'libfdk_aac' is not recognized by FFmpeg.

If you have a build that includes libfdk_aac, you will see something like this:

Encoder libfdk_aac [Fraunhofer FDK AAC]:
    General capabilities: delay small 
    Threading capabilities: none
    Supported sample rates: 96000 88200 64000 48000 44100 32000 24000 22050 16000 12000 11025 8000
    Supported sample formats: s16
    Supported channel layouts: mono stereo 3.0 4.0 5.0 5.1 7.1(wide) 7.1
libfdk_aac AVOptions:
  -afterburner       <int>        E...A...... Afterburner (improved quality) (from 0 to 1) (default 1)
  -eld_sbr           <int>        E...A...... Enable SBR for ELD (for SBR in other configurations, use the -profile parameter) (from 0 to 1) (default 0)
  -eld_v2            <int>        E...A...... Enable ELDv2 (LD-MPS extension for ELD stereo signals) (from 0 to 1) (default 0)
  -signaling         <int>        E...A...... SBR/PS signaling style (from -1 to 2) (default default)
     default         -1           E...A...... Choose signaling implicitly (explicit hierarchical by default, implicit if global header is disabled)
     implicit        0            E...A...... Implicit backwards compatible signaling
     explicit_sbr    1            E...A...... Explicit SBR, implicit PS signaling
     explicit_hierarchical 2            E...A...... Explicit hierarchical signaling
  -latm              <int>        E...A...... Output LATM/LOAS encapsulated data (from 0 to 1) (default 0)
  -header_period     <int>        E...A...... StreamMuxConfig and PCE repetition period (in frames) (from 0 to 65535) (default 0)
  -vbr               <int>        E...A...... VBR mode (1-5) (from 0 to 5) (default 0)
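To run the same check from a script rather than by eye, a minimal sketch (assuming bash) can grep the output of `ffmpeg -encoders`, which lists every encoder compiled into the build. The function name `has_libfdk_aac` and its exit codes are hypothetical choices for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) if the given ffmpeg binary was built
# with libfdk_aac, returns 1 if it was not, and 2 if ffmpeg cannot be found.
has_libfdk_aac() {
  local ffmpeg_bin="${1:-ffmpeg}"
  if ! command -v "$ffmpeg_bin" >/dev/null 2>&1; then
    echo "ffmpeg not found: $ffmpeg_bin" >&2
    return 2
  fi
  # "-encoders" prints one line per compiled-in encoder;
  # -w makes grep match libfdk_aac only as a whole word.
  "$ffmpeg_bin" -hide_banner -encoders 2>/dev/null | grep -qw 'libfdk_aac'
}
```

For example, `has_libfdk_aac && echo "libfdk_aac is available"`, or pass the path of a specific binary as the first argument.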

How to get an FFmpeg build with libfdk_aac

FFmpeg supports two AAC-LC encoders (aac and libfdk_aac) and one HE-AAC (v1/2) encoder (libfdk_aac). The license of libfdk_aac is not compatible with the GPL, and the GPL does not permit distributing binaries that combine GPL-licensed code with incompatible code. Therefore this encoder has been designated as "non-free", and you cannot download a pre-built ffmpeg that supports it. This can be resolved by compiling ffmpeg yourself.
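On Linux or macOS, compiling it yourself essentially means passing `--enable-libfdk-aac` to FFmpeg's configure script, together with `--enable-nonfree` whenever `--enable-gpl` is also used (the same flags appear in the build configuration quoted further down this page). The sketch below assumes bash, an FFmpeg source checkout, and the fdk-aac library and headers already installed where configure can find them. The function name `build_ffmpeg_with_fdk` is hypothetical, and a real build will usually need more `--enable-...` flags for the other libraries you want.

```bash
#!/usr/bin/env bash
# Hypothetical helper: configure and build FFmpeg with libfdk_aac enabled.
# Usage: build_ffmpeg_with_fdk SRC_DIR [extra configure flags...]
# Assumes SRC_DIR is an FFmpeg source tree and fdk-aac is already installed.
build_ffmpeg_with_fdk() {
  if [ $# -lt 1 ]; then
    echo "usage: build_ffmpeg_with_fdk SRC_DIR [configure flags...]" >&2
    return 2
  fi
  local src="$1"
  shift
  if [ ! -x "$src/configure" ]; then
    echo "no executable configure script in $src" >&2
    return 1
  fi
  (
    cd "$src" || exit 1
    # libfdk_aac is GPL-incompatible, so a build that also enables GPL code
    # must be marked nonfree; such binaries may not be redistributed.
    ./configure --enable-gpl --enable-nonfree --enable-libfdk-aac "$@" &&
      make -j"$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 2)"
  )
}
```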

My way of building a custom FFmpeg

I set up a clean install of Windows 10 in a VM and run https://github.com/m-ab-s/media-autobuild_suite

My go-to preset for highest quality regardless of file size

ffmpeg -i input.wav -ac 2 -c:a libfdk_aac -cutoff 20000 -afterburner 1 -vbr 0 output.m4a

-ac 2 Downmix to a stereo track.

-c:a libfdk_aac Use Fraunhofer FDK AAC (libfdk_aac).

-cutoff 20000 Sets the low-pass filter cutoff to 20000 Hz. libfdk_aac defaults to a low-pass filter of around 14kHz; 20000 is the maximum available.

-afterburner 1 Afterburner is "a type of analysis by synthesis algorithm which increases the audio quality but also the required processing power." Fraunhofer recommends always activating this feature. 1 = On and 0 = Off; 1 is already the default.

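-vbr 0 Selects CBR (constant bitrate) instead of VBR (variable bitrate); 0 is also the default. Note that without an explicit -b:a, FFmpeg's libfdk_aac wrapper picks the CBR bitrate from the channel count and sample rate (about 128 kb/s for stereo at 44.1 kHz) rather than the maximum, so add for example -b:a 320k if you want a higher constant bitrate. A higher bitrate also increases the file size.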
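To apply the preset above to a whole folder of WAV files, a loop like the following sketch should do (bash assumed). The function name `encode_dir` and the `DRY_RUN` switch are hypothetical. Any extra arguments, such as `-b:a 256k`, are passed through as output options, and files whose .m4a already exists are skipped rather than overwritten.

```bash
#!/usr/bin/env bash
# Hypothetical helper: encode every .wav in DIR with the preset above.
# Usage: encode_dir DIR [extra ffmpeg output options...]
# Set DRY_RUN=1 to print the ffmpeg commands instead of running them.
encode_dir() {
  local dir="${1:-.}" f out
  local -a cmd
  if [ $# -gt 0 ]; then shift; fi
  for f in "$dir"/*.wav; do
    [ -e "$f" ] || continue              # the glob matched nothing
    out="${f%.wav}.m4a"
    if [ -e "$out" ]; then               # never overwrite earlier encodes
      echo "skipping $out (already exists)" >&2
      continue
    fi
    cmd=(ffmpeg -hide_banner -i "$f" -ac 2 -c:a libfdk_aac
         -cutoff 20000 -afterburner 1 -vbr 0 "$@" "$out")
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf '%s\n' "${cmd[*]}"
    else
      "${cmd[@]}" || return
    fi
  done
}
```

For example, `DRY_RUN=1 encode_dir ~/Music/rips -b:a 256k` shows what would run without encoding anything.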

@ddelange

"...the only that matters when stepping down from lossless to lossy is compare the perceived quality with double-blind tests"

Very true! Right now I'm only comparing the spectrum analyses visually.

And you're right: fdkaac CBR@256 and VBR@5 have almost the same spectrum visually (VBR loses a few decibels in the higher frequency ranges), with VBR producing a considerably smaller file :)
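The CBR@256 vs VBR@5 size comparison can be reproduced with a sketch like this one (bash assumed). The function name `compare_cbr_vbr` and the `.cbr256.m4a` / `.vbr5.m4a` output names are hypothetical; both encodes keep the encoder's default cutoff, and each output's size is printed in bytes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: encode INPUT as CBR 256 kb/s and as VBR mode 5
# (default cutoff in both cases), then print "<bytes> <file>" for each.
compare_cbr_vbr() {
  local input="$1" base="${1%.*}" f
  ffmpeg -hide_banner -y -i "$input" -vn -ac 2 -c:a libfdk_aac \
    -b:a 256k "$base.cbr256.m4a" || return
  ffmpeg -hide_banner -y -i "$input" -vn -ac 2 -c:a libfdk_aac \
    -vbr 5 "$base.vbr5.m4a" || return
  for f in "$base.cbr256.m4a" "$base.vbr5.m4a"; do
    printf '%s %s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
  done
}
```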

@joshbarrass

joshbarrass commented Sep 14, 2023

Fraunhofer CBR@256 and VBR@5 have very similar spectrums when specifying the cut-off frequency manually. However, the ffmpeg documentation notes that this can audibly reduce audio quality (emphasis mine):

Note: libfdk_aac defaults to a low-pass filter of around 14kHz (details). If you want to preserve higher frequencies, use -cutoff 18000. Adjust the number to the upper frequency limit only if you need to; keeping in mind that a higher limit may audibly reduce the overall quality.

If you follow this advice and retain the default cut-off frequency (17kHz for CBR, disabled for VBR 5), then VBR 5 has a clear advantage in this regard. In fact, I don't think the cutoff can be disabled for CBR at all, so VBR 5 should have an advantage over CBR@256k too. Of course, it all boils down to the perceived quality in the end, and in that respect my advice is to err on the side of caution and follow the settings that have performed better in blind tests.

Strangely, your spek results seem inconsistent with mine. Your result for VBR 5 when not specifying a cutoff seems to have a clear cutoff at 19kHz, which I don't see in my own results. There's still some slight loss of volume in VBR, but I expected that to be the case based on my (rudimentary) understanding of how audio compression works.

@joshbarrass

You could also try the HE/HE_v2 settings in Fraunhofer. From my own tests, they tend to yield files around half/a quarter the size with the same compression settings. Add either -profile:a aac_he or -profile:a aac_he_v2 to your ffmpeg command. From the docs:

As always, experiment to see what works for your ears.

I hope this helps.
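A sketch for trying the HE-AAC profiles side by side with plain AAC-LC (bash assumed). `aac_low`, `aac_he` and `aac_he_v2` are FFmpeg's profile names; the function name `compare_profiles` and the output naming are hypothetical. HE-AAC v2 adds parametric stereo, so the input is downmixed to stereo, and any extra arguments (for example `-b:a 64k` or `-vbr 3`) are applied to every encode so the profiles are compared with the same settings.

```bash
#!/usr/bin/env bash
# Hypothetical helper: encode INPUT once per AAC profile and print
# "<bytes> <file>" for each, so the sizes can be compared.
# Usage: compare_profiles INPUT [extra ffmpeg output options...]
compare_profiles() {
  local input="$1" base="${1%.*}" profile out
  shift
  for profile in aac_low aac_he aac_he_v2; do
    out="$base.$profile.m4a"
    # -ac 2 because HE-AAC v2 (parametric stereo) expects a stereo signal.
    ffmpeg -hide_banner -y -i "$input" -vn -ac 2 -c:a libfdk_aac \
      -profile:a "$profile" "$@" "$out" || return
    printf '%s %s\n' "$(wc -c < "$out" | tr -d ' ')" "$out"
  done
}
```

File size is only half the story, so listen to the HE and HE v2 results before settling on one.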

@ddelange

ddelange commented Sep 14, 2023

When I leave out the cutoff in the VBR@5 command above, the output is definitely worse 🤔

ffmpeg -i input.wav -vn -ac 2 -c:a libfdk_aac -vbr 5 -cutoff 20000 output.m4a
10.4 MB
[spectrogram image]

ffmpeg -i input.wav -vn -ac 2 -c:a libfdk_aac -vbr 5 output.m4a
10.5 MB
[spectrogram image]
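If spek isn't at hand, FFmpeg's showspectrumpic filter can render a comparable spectrogram to a PNG, which makes it easy to put the two encodes above side by side. A minimal sketch (bash assumed); the function name `render_spectrograms` and the 1024x512 image size are arbitrary choices.

```bash
#!/usr/bin/env bash
# Hypothetical helper: for each audio FILE given, write its spectrogram
# to a PNG with the same base name, using FFmpeg's showspectrumpic filter.
render_spectrograms() {
  local f
  for f in "$@"; do
    # s= sets the image size; the filter's default legend draws the
    # frequency axis, which makes a low-pass cutoff easy to read off.
    ffmpeg -hide_banner -y -i "$f" -lavfi showspectrumpic=s=1024x512 \
      "${f%.*}.png" || return
  done
}
```

For example, save the two encodes under different names (say cutoff.m4a and default.m4a) and run `render_spectrograms cutoff.m4a default.m4a`.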

"Strangely, your spek results seem inconsistent with mine."

That's weird indeed. I compiled in March or so, so all should be relatively recent:

$ ffmpeg -h encoder=libfdk_aac
ffmpeg version 6.0 Copyright (c) 2000-2023 the FFmpeg developers
  built with Apple clang version 12.0.0 (clang-1200.0.32.29)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/6.0-with-options --enable-shared --cc=clang --host-cflags= --host-ldflags= --enable-gpl --enable-libaom --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-libsnappy --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-demuxer=dash --enable-opencl --enable-audiotoolbox --enable-videotoolbox --disable-htmlpages --enable-libfdk-aac --enable-libsrt --enable-libxvid --enable-nonfree
  libavutil      58.  2.100 / 58.  2.100
  libavcodec     60.  3.100 / 60.  3.100
  libavformat    60.  3.100 / 60.  3.100
  libavdevice    60.  1.100 / 60.  1.100
  libavfilter     9.  3.100 /  9.  3.100
  libswscale      7.  1.100 /  7.  1.100
  libswresample   4. 10.100 /  4. 10.100
  libpostproc    57.  1.100 / 57.  1.100
Encoder libfdk_aac [Fraunhofer FDK AAC]:
    General capabilities: dr1 delay small
    Threading capabilities: none
    Supported sample rates: 96000 88200 64000 48000 44100 32000 24000 22050 16000 12000 11025 8000
    Supported sample formats: s16
    Supported channel layouts: mono stereo 3.0 4.0 5.0 5.1 6.1(back) 7.1(wide) 7.1 7.1(top)
libfdk_aac AVOptions:
  -afterburner       <int>        E...A...... Afterburner (improved quality) (from 0 to 1) (default 1)
  -eld_sbr           <int>        E...A...... Enable SBR for ELD (for SBR in other configurations, use the -profile parameter) (from 0 to 1) (default 0)
  -eld_v2            <int>        E...A...... Enable ELDv2 (LD-MPS extension for ELD stereo signals) (from 0 to 1) (default 0)
  -signaling         <int>        E...A...... SBR/PS signaling style (from -1 to 2) (default default)
     default         -1           E...A...... Choose signaling implicitly (explicit hierarchical by default, implicit if global header is disabled)
     implicit        0            E...A...... Implicit backwards compatible signaling
     explicit_sbr    1            E...A...... Explicit SBR, implicit PS signaling
     explicit_hierarchical 2            E...A...... Explicit hierarchical signaling
  -latm              <int>        E...A...... Output LATM/LOAS encapsulated data (from 0 to 1) (default 0)
  -header_period     <int>        E...A...... StreamMuxConfig and PCE repetition period (in frames) (from 0 to 65535) (default 0)
  -vbr               <int>        E...A...... VBR mode (1-5) (from 0 to 5) (default 0)

@joshbarrass

Mine was compiled sometime last year, so yours should be more up-to-date than mine. Let me build a fresh copy and see if that makes any difference.

@ddelange

Thanks for sparring with me :) Wonder if we can find the source of the discrepancy!

https://http.cat/417

@joshbarrass

When using the latest version of ffmpeg/libfdk, I see the same ~19kHz cutoff.

Digging into libfdk's source code and assuming I'm understanding correctly, this diff seems to show that the cutoff was changed in 2020 (my version of ffmpeg must be older than I thought!) to 19293Hz in order to improve audio quality. Page 17 of this document explains why, and the answer agrees with my assumption earlier in the thread: the higher the cutoff, the more frequencies you have to represent with the same amount of data, giving you lower quality overall for the sake of saving near-imperceptible frequencies. The cutoffs are chosen based on listening tests to maximise perceived quality. As with the ffmpeg documentation, they also strongly recommend keeping the default cutoffs for this reason. If they've changed these cutoffs, they are presumably working from more recent listening tests that show a better result.

The more you know :)
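The trade-off described above can be made concrete with some back-of-the-envelope arithmetic. An AAC-LC frame carries 1024 spectral lines spanning 0 to fs/2, so each line is fs/2048 Hz wide, and at bitrate B each frame gets B * 1024 / fs bits. Dividing the bits by the number of lines below the cutoff fc gives roughly B / (2 * fc) bits per line. At an example bitrate of 128 kb/s, raising the cutoff from 14 kHz to 19293 Hz drops the average from about 4.6 to about 3.3 bits per line. This is a deliberate simplification that ignores side information, SBR and the encoder's psychoacoustic bit allocation, and the helper name `spectral_budget` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rough average bits per spectral line below the cutoff.
# Usage: spectral_budget BITRATE_BPS SAMPLE_RATE_HZ CUTOFF_HZ
spectral_budget() {
  awk -v br="$1" -v fs="$2" -v fc="$3" 'BEGIN {
    bits  = br * 1024 / fs   # bits per 1024-sample AAC frame
    lines = fc * 2048 / fs   # spectral lines below the cutoff (each fs/2048 Hz wide)
    printf "bits/frame=%.0f lines=%.0f bits/line=%.2f\n", bits, lines, bits / lines
  }'
}

spectral_budget 128000 44100 14000   # bits/frame=2972 lines=650 bits/line=4.57
spectral_budget 128000 44100 19293   # bits/frame=2972 lines=896 bits/line=3.32
```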

@ddelange

What a rabbit hole :) awesome find!

@ScribbleGhost
Author

I haven't had the time to comment on any of this, but I am glad to see this is relevant to you guys. Personally, I am not that interested in which AAC encoder produces better quality. All lossy encoders produce low quality. If I want quality I go for FLAC. But that's just me 🐵

@AdventurerRussia

[image]
Can you tell me where it's better?
