Audio Compression for Voiceover

About compression

Audio compression is used to reduce the dynamic range of a recording. Dynamic range is the difference between the loudest and softest parts of an audio signal. Compression was originally used to guard against defects when cutting wax and vinyl phonograph records, but it later became widely used as a way of increasing the loudness of an audio recording without introducing distortion.

The goal of most compression applications is to increase the amplitude of the softest parts of a recording, without increasing the amplitude of the loudest parts.

Compressor anatomy

Compressors generally all have the same conceptual parts. However, not every compressor gives the user a control for every part. If your compressor lacks some of the controls listed here, it probably either uses a fixed value for that part (with no control) or calls it something else:

  • Threshold: the level above which the compressor starts reducing gain
  • Ratio/Amount: how much the signal above the Threshold is reduced, expressed as input change to output change (at 3:1, a 3dB rise in input above the Threshold produces a 1dB rise in output)
  • Output Gain/Makeup Gain: the gain applied to the signal after compression, which sets the output level
  • Input Level: the gain applied to the signal before it enters the compressor
  • Attack Time: how long the compressor takes to apply gain reduction after the signal crosses above the Threshold
  • Release Time: how long the compressor takes to stop reducing gain after the signal falls back below the Threshold
  • Knee: the amount of smoothing in the input/output curve where the uncompressed range meets the compressed range

If nothing else, your compressor will have the first three controls, or something like them.

In a typical application, an audio signal enters the compressor, optionally boosted or cut by the Input Level. When the signal crosses above the Threshold, the compressor applies gain reduction over the Attack Time, so that the portion of the signal above the Threshold is reduced according to the Ratio. Once the signal falls back below the Threshold, gain reduction eases off over the Release Time. Finally, the signal is generally boosted by the Output Gain before leaving the compressor.
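The signal flow described above can be sketched in Python as a rough model, not any particular product's algorithm. It assumes a hard knee (no Knee smoothing), works on a sequence of levels in dB spaced 1ms apart rather than on raw audio samples, and treats Attack and Release as one-pole smoothing time constants, which is one common approach among several. The function names are hypothetical.

```python
import math


def static_gain_db(level_db: float, threshold_db: float, ratio: float) -> float:
    """Gain in dB (0 or negative) a hard-knee compressor aims for at a steady level."""
    over = level_db - threshold_db
    if over <= 0:
        return 0.0  # below the Threshold: no gain reduction
    # Above the Threshold the output rises 1 dB for every `ratio` dB of input.
    return over / ratio - over


def compress_levels(levels_db, threshold_db, ratio, attack_ms, release_ms,
                    step_ms=1.0, makeup_db=0.0):
    """Apply smoothed gain reduction to input levels spaced `step_ms` apart."""
    # One-pole smoothing: a longer time constant moves the gain more slowly.
    attack_coeff = math.exp(-step_ms / attack_ms)
    release_coeff = math.exp(-step_ms / release_ms)
    gain_db = 0.0
    out = []
    for level in levels_db:
        target = static_gain_db(level, threshold_db, ratio)
        # Moving toward more reduction uses Attack; recovering uses Release.
        coeff = attack_coeff if target < gain_db else release_coeff
        gain_db = coeff * gain_db + (1.0 - coeff) * target
        out.append(level + gain_db + makeup_db)
    return out


# 50 ms of loud speech at -8 dB followed by 5 ms of quiet speech at -40 dB.
levels = [-8.0] * 50 + [-40.0] * 5
out = compress_levels(levels, threshold_db=-26.0, ratio=3.0,
                      attack_ms=1.0, release_ms=30.0)
print(round(out[49], 2))  # loud part has settled at 12 dB of gain reduction
print(round(out[50], 2))  # quiet part is still being reduced while Release runs
```

With a 1ms Attack the loud passage reaches its full 12dB of gain reduction almost immediately, while the 30ms Release keeps pulling the quiet passage down for a while after it starts, which is the mechanism behind "pumping".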

Audio terminology

Here are a few terms we'll use:

  • Level/Gain/Loudness/Amplitude - The loudness of an audio signal, expressed in decibels
  • Decibels/dB - A standard logarithmic representation of a change in level, measured against a fixed reference. In electrical terms, an increase of 6dB doubles the signal's amplitude (voltage); listeners typically perceive an increase of about 10dB as "twice as loud". For our purposes, we'll use the digital Full Scale reference, and any use of "dB" will be short for "dBFS".
  • Full Scale - in digital applications, Full Scale is the largest sample value the format can represent, and 0dBFS is that level. Each bit of resolution adds about 6 decibels of dynamic range, so a 16 bit recording has about 96dB of dynamic range, described between -96dB and 0dB, where 0dB is the loudest possible level.
  • Noise floor - the quietest non-program part of a signal, typically background room tone, tape hiss in analog recordings, or low-amplitude quantization errors in digital.
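The decibel arithmetic behind these terms can be checked with a few lines of Python. This sketch assumes linear PCM with signed integer samples, where full scale is the largest sample magnitude (32768 for 16 bit); the helper names are hypothetical. Decibels for an amplitude ratio are 20 times its base-10 logarithm, so doubling the amplitude adds about 6.02dB, and each extra bit doubles the number of representable levels.

```python
import math


def ratio_to_db(amplitude_ratio: float) -> float:
    """Decibels for an amplitude (voltage or sample value) ratio."""
    return 20.0 * math.log10(amplitude_ratio)


def db_to_ratio(db: float) -> float:
    """Amplitude ratio for a change in decibels."""
    return 10.0 ** (db / 20.0)


def dynamic_range_db(bits: int) -> float:
    """Approximate dynamic range of linear PCM: about 6.02 dB per bit."""
    return ratio_to_db(2 ** bits)


def sample_to_dbfs(sample: int, bits: int = 16) -> float:
    """Level of one signed integer sample relative to full scale (0 dBFS)."""
    full_scale = 2 ** (bits - 1)  # 32768 for 16-bit audio
    if sample == 0:
        return -math.inf  # digital silence has no finite dB value
    return ratio_to_db(abs(sample) / full_scale)


print(round(ratio_to_db(2.0), 2))       # doubling the amplitude: about +6.02 dB
print(round(dynamic_range_db(16), 1))   # 16-bit: about 96.3 dB
print(round(sample_to_dbfs(16384), 2))  # half of full scale: about -6.02 dBFS
```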

Compressing for voiceover

The typical goal of compression for voiceover is to increase the intelligibility of the quietest parts of the signal. We'll do this by reducing the level of the loudest parts of the signal, then increasing the level of the whole signal by the same amount we reduced in the loudest parts.

Here's a typical workflow:

  1. Zero the compressor: set the Threshold control to 0dB (no threshold) and the Ratio to 1:1 (no compression). Set the Attack and Release times to reasonable medium values - I use 10ms and 30ms, respectively. There should be no compression at this point.
  2. Observe your input: using your system's metering, observe the average level of your signal. In a proper recording, the average signal level should be around -18dB, and the loudest parts might reach -6dB.
  3. Set the threshold: reduce the threshold level from 0dB (no threshold) to between 6dB and 12dB below the average level you've observed. For most voiceovers, this should be between -20dB and -30dB.
  4. Increase the Ratio: increase compression, and thus gain reduction, by increasing the Ratio control. Most voiceovers will sound natural with a Ratio between 2:1 and 4:1, but 3:1 is a good reference point: for every three decibels the signal rises above the Threshold, the output rises by only one.
  5. Set the Attack time: adjust the Attack Time until the voiceover sounds natural. Too fast and the signal will sound squashed and flat; too slow and there will be no audible gain reduction. Because speech moves quickly from one phoneme to the next, an Attack Time that lets the initial phoneme of a word through before the compressor acts usually sounds more intelligible. Most speech sounds natural with an Attack Time of 1-5ms.
  6. Set the Release time: reduce the Release Time until you no longer hear "pumping". Pumping occurs when the later, quieter phonemes in a word are still being compressed, producing a lumpy, wavy dynamic. You want gain reduction to stop soon after a word transitions to its quieter portion. A Release Time of 10-15ms usually sounds natural, but it can be as high as 30ms.
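The numbers in this workflow can be collected into a small settings sketch. The `CompressorSettings` class and the `threshold_range` and `starting_point` helpers are hypothetical, not any plugin's API, and taking the middle of each suggested range is only an assumption for a first guess; your ears should make the final call.

```python
from dataclasses import dataclass


@dataclass
class CompressorSettings:
    threshold_db: float
    ratio: float          # 3.0 means 3:1
    attack_ms: float
    release_ms: float
    makeup_db: float = 0.0


# Step 1: a "zeroed" compressor. A 0 dB Threshold and a 1:1 Ratio mean no compression.
ZEROED = CompressorSettings(threshold_db=0.0, ratio=1.0, attack_ms=10.0, release_ms=30.0)


def threshold_range(average_db: float) -> tuple[float, float]:
    """Step 3: a Threshold between 12 dB and 6 dB below the observed average level."""
    return (average_db - 12.0, average_db - 6.0)


def starting_point(average_db: float) -> CompressorSettings:
    """Steps 3-6, taking the middle of each suggested range as a first guess."""
    low, high = threshold_range(average_db)
    return CompressorSettings(
        threshold_db=(low + high) / 2.0,  # middle of the 6-12 dB window
        ratio=3.0,                        # reference point within 2:1 to 4:1
        attack_ms=3.0,                    # middle of 1-5 ms
        release_ms=12.5,                  # middle of the usual 10-15 ms
    )


print(threshold_range(-18.0))
print(starting_point(-18.0))
```

For a recording averaging -18dB, the threshold window works out to -30dB through -24dB.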

At this point, you've dialed in the initial settings you're likely to use. Any change will be small from here.

Hopefully your compressor has a Gain Reduction meter. This should show how much compression is happening once the signal crosses the threshold, measured in decibels. A normal, natural-sounding amount of gain reduction is between 3dB and 10dB; I shoot for 6dB and work from there.

If you don't have a gain reduction meter, you'll need to judge it by looking at the compressor's Output Meter (if it has one), or at your system's Mix Bus Meter, the final mix level meter. Turn the compressor on and off during playback, and estimate how much gain reduction you're getting from the difference in level.

Finally, turn the compressor's Output Gain up by about the same amount as the gain reduction you've achieved, to make up for it (this control is often called "Makeup Gain" for that reason).

If you've done this right, you will have reduced the loudest parts of the signal by some amount, and increased the entire signal by the same amount, making the quietest parts louder. If I've reduced the loudest parts of the signal by around 6dB, I would apply 6dB of makeup gain, for example.

You can test this by listening to your signal with the compressor turned on and then off. Once makeup gain is applied, the loudest parts of the signal should be about the same level with the compressor turned on, but the quietest parts should be louder.
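The arithmetic of makeup gain can be sketched with an idealized, steady-state model that ignores Attack, Release, and Knee (an assumption for illustration; `compressed_level_db` is a hypothetical helper). With a -26dB Threshold and a 3:1 Ratio, a loud passage at -17dB is 9dB over the Threshold and comes out only 3dB over it, which is 6dB of gain reduction. Adding 6dB of makeup gain restores the loud passage, while a quiet passage that never crossed the Threshold simply gets 6dB louder.

```python
def compressed_level_db(level_db: float, threshold_db: float = -26.0,
                        ratio: float = 3.0, makeup_db: float = 0.0) -> float:
    """Steady-state output level of a hard-knee compressor, in dB."""
    over = level_db - threshold_db
    if over > 0:
        # Only the part above the Threshold is scaled down by the Ratio.
        level_db = threshold_db + over / ratio
    return level_db + makeup_db


loud, quiet = -17.0, -40.0
gain_reduction = loud - compressed_level_db(loud)  # 6 dB
loud_out = compressed_level_db(loud, makeup_db=gain_reduction)
quiet_out = compressed_level_db(quiet, makeup_db=gain_reduction)

print(gain_reduction)
print(loud_out, quiet_out)                  # loud unchanged, quiet 6 dB louder
print(loud - quiet, loud_out - quiet_out)   # the gap between them shrinks
```

The gap between loud and quiet shrinks from 23dB to 17dB, which is the reduced dynamic range you hear when switching the compressor on and off.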

Here are those go-to reference settings again:

  • Threshold: -26dB
  • Ratio: 3:1
  • Attack: 1ms
  • Release: 10ms
  • Output: +6dB

Fixing problems

Is there too much difference between loud and soft? Decrease the Threshold, so that more of the signal goes above it. You might also reduce the Ratio when decreasing the Threshold - this results in gentler compression, but applied over a wider range of signal levels. Remember to readjust your Makeup Gain.

Are the words hard to understand when they're played over music? You could increase compression by increasing the Ratio (and then the Makeup Gain). As the Ratio approaches 100:1 (practically Infinity:1), the output barely rises above the Threshold no matter how loud the input gets, and the Compressor becomes a Limiter. Radio DJs use this effect to ensure total intelligibility, at the cost of sounding unnatural.
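To see why a very high Ratio behaves like a limiter, here is an idealized, steady-state hard-knee model evaluated at increasing Ratios. It is a sketch: `output_level_db` is a hypothetical helper, and Python's `math.inf` stands in for Infinity:1. An input 20dB above a -26dB Threshold comes out closer and closer to the Threshold itself.

```python
import math


def output_level_db(level_db: float, threshold_db: float, ratio: float) -> float:
    """Steady-state output of a hard-knee compressor; ratio=math.inf acts as a limiter."""
    over = level_db - threshold_db
    if over <= 0:
        return level_db  # below the Threshold, the signal passes unchanged
    return threshold_db + over / ratio  # over / math.inf == 0.0


for ratio in (3.0, 10.0, 100.0, math.inf):
    out = output_level_db(-6.0, threshold_db=-26.0, ratio=ratio)
    print(f"{ratio}:1 -> {out:.2f} dB")
```

At 100:1 the 20dB excess becomes 0.2dB, which is why 100:1 is practically Infinity:1.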

Does everything sound "flat"? If every part of every word sounds squished, your compressor is active on every part of the signal. Turn the Threshold up, so that only the louder parts are activating the compressor.

Do you hear no change at all? Your Attack Time might be too slow. Every word has gone above, and then below, the Threshold before your compressor could do anything. Make the Attack Time faster to catch words just after they've gone above the threshold.

Do you hear "pumping"? Your Release Time is probably too long. Make it faster so that compression stops before the next word occurs.

Does it sound "spikey" or uncontrolled during loud portions? Do you hear rapid changes in air pressure? The Release Time might be too short. Allow the compressor to stay active long enough to smooth out whole words or phrases. Increasing the Release Time can make the signal sound "smoother". A "spikey" sound may also indicate that you need to increase compression by lowering the Threshold, raising the Ratio, or both.

Does your system not have an Output Gain control? Some compression plugins only have a "Preserve Volume" boolean control, which automatically sets the output gain based on the gain reduction you've achieved. I've seen this in Final Cut Pro and in Logic Audio. This is handy, but a little blunt. Your best bet is to use the track's fader control to simulate Output Gain, since they are essentially identical in the digital world.
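In the digital world both an Output Gain control and a track fader come down to multiplying every sample by the same factor, which is why one can stand in for the other. A minimal sketch, assuming floating-point samples, a fader that sits after the compressor in the signal chain, and a hypothetical `apply_gain_db` helper: because gains in dB add, it doesn't matter whether the 6dB of makeup comes from the compressor, the fader, or a mix of both.

```python
import math


def apply_gain_db(samples: list[float], gain_db: float) -> list[float]:
    """Multiply every sample by the amplitude factor for `gain_db` decibels."""
    factor = 10.0 ** (gain_db / 20.0)
    return [s * factor for s in samples]


compressed = [0.10, -0.25, 0.05]  # hypothetical compressor output samples

makeup_only = apply_gain_db(compressed, 6.0)                     # Output Gain +6 dB
fader_only = apply_gain_db(apply_gain_db(compressed, 0.0), 6.0)  # 0 dB makeup, fader +6 dB
split = apply_gain_db(apply_gain_db(compressed, 3.0), 3.0)       # 3 dB each

print(all(math.isclose(a, b) for a, b in zip(makeup_only, fader_only)))
print(all(math.isclose(a, b) for a, b in zip(makeup_only, split)))
```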

Does your system not have a Threshold? Some plugins that mimic older gear (the Bomb Factory BF76, which clones the Urei 1176, comes to mind) have a fixed threshold, and you use the Input Level control to raise the signal level above the threshold. This is a little less intuitive, but comes naturally with practice. In these systems, you normally wouldn't change the Output Level control, since raising the Input Level has the effect of making up for gain reduction.
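Why does raising the Input Level on a fixed-threshold compressor also make up for gain reduction? In an idealized hard-knee model, boosting the input by some amount gives exactly the same output as lowering the Threshold by that amount and then adding the same amount of makeup gain. This is a sketch under assumptions: the fixed -20dB Threshold and 4:1 Ratio are arbitrary, the helper names are hypothetical, and real 1176-style units add their own coloration and timing behavior.

```python
def hard_knee_db(level_db: float, threshold_db: float, ratio: float) -> float:
    """Steady-state output level of an idealized hard-knee compressor, in dB."""
    over = level_db - threshold_db
    return threshold_db + over / ratio if over > 0 else level_db


def fixed_threshold_out(level_db, input_gain_db, threshold_db=-20.0, ratio=4.0):
    """Fixed Threshold: the only way in is to push the Input Level harder."""
    return hard_knee_db(level_db + input_gain_db, threshold_db, ratio)


def adjustable_out(level_db, threshold_db, makeup_db, ratio=4.0):
    """Adjustable Threshold plus Output (makeup) Gain."""
    return hard_knee_db(level_db, threshold_db, ratio) + makeup_db


# +8 dB of Input Level vs. a Threshold 8 dB lower plus 8 dB of makeup gain.
for level in (-40.0, -24.0, -18.0, -10.0):
    a = fixed_threshold_out(level, input_gain_db=8.0)
    b = adjustable_out(level, threshold_db=-28.0, makeup_db=8.0)
    print(level, round(a, 3), round(b, 3))  # the last two columns match
```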

The "noise floor"

Since we're using a compressor to reduce dynamic range and increase the total signal level of a recording, compression inevitably results in increasing the quietest non-program parts of the signal, which we refer to as the "noise floor". This is undesirable, but unavoidable. Combatting this with other electronic effects (either an expander or noise reduction system) is hit-or-miss, and can produce unwanted artifacts. To achieve the best possible result, record in a quiet environment at an appropriate input level.

How loud should I record?

To achieve the best possible effect out of your compressor, you should record at the appropriate signal level.

In the analog world, gear was designed around optimum voltages between distorting the circuits and the noise floor. We had VU (Volume Unit) meters to guide us to the optimum levels to record at: 0 VU was the optimum target level, where the circuits operated at maximum efficiency, and +6-8 VU was pushing distortion. This extra 6-8 VU was considered your "headroom" before distortion.

Since digital meters display the full scale between 0dBFS and the lowest reproducible signal (usually -96dBFS for 16 bit audio), choosing an input level is less obvious. However, digital audio gear was designed to replace analog equivalents, and the converters are still calibrated around optimum voltages for analog capture and reproduction. Many systems use -18dBFS as the equivalent reference point for 0 VU. This means you have 18dB of "headroom" before you clip the input signal, which is plenty.

With that in mind, the average level of your recordings should be around -18dBFS, with peaks of between -6dBFS and -12dBFS.
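One way to check a recording against these targets is to measure its average and peak levels in dBFS. This is a sketch under assumptions: floating-point samples where full scale is 1.0, plain RMS as the "average" measure (real meters differ in ballistics and weighting), an arbitrary 3dB tolerance around -18dBFS, and hypothetical function names.

```python
import math


def peak_dbfs(samples: list[float]) -> float:
    """Largest absolute sample value, in dB relative to full scale (1.0)."""
    peak = max(abs(s) for s in samples)
    return 20.0 * math.log10(peak) if peak > 0 else -math.inf


def rms_dbfs(samples: list[float]) -> float:
    """Root-mean-square level, in dB relative to full scale (1.0)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0 else -math.inf


def check_levels(samples: list[float]) -> list[str]:
    """Compare with the targets: average near -18 dBFS, peaks from -12 to -6 dBFS."""
    notes = []
    rms, peak = rms_dbfs(samples), peak_dbfs(samples)
    if abs(rms + 18.0) > 3.0:  # the 3 dB tolerance is an arbitrary choice
        notes.append(f"average {rms:.1f} dBFS is far from -18 dBFS")
    if peak > -6.0:
        notes.append(f"peak {peak:.1f} dBFS is above -6 dBFS")
    elif peak < -12.0:
        notes.append(f"peak {peak:.1f} dBFS is below -12 dBFS")
    return notes


# A test tone: a sine peaking at -9 dBFS, 48 samples per cycle, 100 cycles.
amplitude = 10.0 ** (-9.0 / 20.0)
tone = [amplitude * math.sin(2.0 * math.pi * n / 48) for n in range(4800)]
print(round(peak_dbfs(tone), 2), round(rms_dbfs(tone), 2))
print(check_levels(tone))
```

A sine's RMS sits only about 3dB below its peak, so this tone passes the peak check but fails the average check; the targets above imply a 6-12dB gap between the average level and the peaks, wider than a steady tone's.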

Final thoughts

Gain control is the single most important concept in audio mixing. It's also hard to master.

Pay attention to the changes in sound pressure around your eardrums - this is unnatural at first, but becomes second nature. As dynamic range is reduced (i.e. compression is applied), the changes in sound pressure become more gradual, and are perceptibly smoother-sounding. Signals with the lowest dynamic range sound flat. The opposite is also true: as dynamic range increases, sounds gain a spikey, less smooth dynamic quality.

howardellison commented Jul 29, 2016

Best explanation you'll ever read, and if compression obsession has taken hold you could also explore the thoughts of Robert Orban, when he was creating Optimod for broadcasters in the 70s. Subtle stuff!
Now, in 2016, software and digital comping is the norm, with lookahead. So it would be great to hear scottburton's views on the creative advantage/disadvantage of the zero attack time option when working with speech.

Layarion commented Nov 10, 2018

I've heard of something called "wet" and "dry" compression, or something like having the original uncompressed signal run underneath the compressed signal.

does this help voice overs, or is this a niche thing not really suited for voice overs?

if it benefits voice overs, what are those benefits? what purpose or role does it fill?

voodoorobbo commented Dec 11, 2018

Layarion, Parallel compression (what you are describing) is used a lot in music. I have been toying with it on a few projects I work on. Some voices respond well, others just don't work. Give it a try: set up your VO bus as you normally would, then set up another bus with the same voice over signal, but squash the hell out of it, then mix it in with your main VO. When it works it makes your VO sound much fatter and fuller as well as increasing the punch! You can also play with the EQ on that bus as well for some interesting results.

notw commented Jan 5, 2019

Agree with others: this is the best explanation on compression specific to voiceover I've ever read. The "start out at zero" approach makes a lot of sense.

Unfortunately, the voiceover/narration presets in plugins come, well, pre set, and it can be overwhelming to know which one or many of the parameters needs to be changed. On top of that, most of the compression advice that I've read deals with music, where the dynamics are much broader and there are mix & master considerations. And finally, it seems some advice givers use the subject as a platform for humble-bragging about all the expensive hardware components in their signal chain.

Great article here. Just what I've been looking for. Thanks.

PiotBar commented Sep 24, 2019

This article is interesting and helpful.

ellisgl commented Mar 14, 2021

"...increasing the punch! You can also play with the EQ on that bus as well for some interesting results."

What settings have you seen work well with a lower-register voice, say 80-100Hz?

howardellison commented Jan 12, 2022

Would be so glad to hear an equally clear exposition of lookahead as this is now familiar in good software compressors - such as the Fabfilter Pro C2. It's true that a touch of overshoot spike at the start of a word can add impact, maybe compensate for reduced dynamics, but when the aim is smooth/natural then surely a few millisec anticipation helps?
