Feature Article - "Dialogue Normalization: Friend or Foe" - June, 2000 (Updated August, 2001) Brian Florian
There is more to Dolby Digital than just raw audio information. That digital stream flowing into your decoder has other bits of information - called "Metadata" - along for the ride. Metadata is information your decoder can use to do certain things better, like downmixing the soundtrack to two channels or changing the dynamic range. One piece of Metadata which has gotten a pretty bad rap is Dialogue Normalization. In their 30 year history, Dolby has never been known to add something that would deteriorate audio quality, so I started to investigate. Like so many things in this world, Dialnorm, as we’ll call it for short, is somewhat misunderstood.
From the name, you might think Dialnorm affects the level of dialogue with respect to the other channels or content. This is not at all true. As we will learn, the balance of the mix (loudness) from channel to channel, sound element to sound element, is entirely the result of the sound engineer's efforts, and Dialnorm does not affect this relationship.
To begin, let's lay some ground rules. First and foremost, metadata, like Dialnorm, is ancillary to the audio data. In other words, it's in the bitstream, not in the sound. For any Metadata to have any effect at all, your processor needs to use it. One example of Metadata is down-mix information. Any DVD player can make a 2-channel Pro Logic mix from a 5.1 soundtrack, and they can do it on the fly. Metadata for downmixing tells the player what the relative level of the channels should be in the fabricated Pro Logic mix. Again, this is information for your decoder's benefit, and the sound information itself is not affected by its presence.
The downmix metadata in the example above is optional, used at the sound engineer's discretion. The only piece of metadata that is actually mandatory for consumer delivery is Dialnorm.
Let’s think of a soundtrack as a vertical bar. The bottom represents the quietest sound, and the top represents the loudest. The difference between the two - how tall our bar is - is called the "dynamic range", and how wide that range is depends on the sound format and/or media. Conventional cassette tape can have a dynamic range of 60 dB, CD audio 80 dB, and Dolby Digital 105 dB. That is quite a difference between soft and loud. Sound engineers can be very creative with this sort of latitude, giving you whisper quiet ambiance, intelligible dialogue, involving music, and then eye opening effects like explosions or the last bang of an orchestral movement (Fig.1). And, with wide dynamic range, this is all possible without you touching your volume control.
While the above example works great for cinema sound, not all sources make such responsible use of dynamic range. For years, we have all been witness to the classic abuse of dynamic range: The television commercial (notwithstanding the fact that, without the commercials, we would not have any TV programs). A typical commercial makes all of its sounds as loud as possible to get your attention. The worst instances prompt us to reach for the remote and turn the volume down until the show is back on. The Advanced Television Systems Committee (or ATSC) selected Dolby Digital for all digital TV transmissions, so while it can be annoying now, the problem could get worse. Consider the following: You’re watching a movie at a level you are comfortable with. You can understand people talking, and the gunfights are nice and loud (isn't it great to be male?) The commercial interrupts, and a guy trying to sell you a car is as loud as the gun fight! (Fig. 2).
Dialnorm can help! It lets our Dolby Digital decoders know how loud dialogue is for each show or program. Dialnorm does this by expressing where on our dynamic range scale most of the talking occurs. It’s just a value and is not "in" the soundtrack itself. What your consumer decoder has been instructed to do with this information is adjust your volume for you such that all sound content at the Dialnorm level plays at a consistent level through your system. Consider again our evening of watching a movie. We’ve set our volume control at the desired level for the show. When the commercial comes on, the decoder sees that there is a different Dialnorm value for the commercial and overrides the volume so that sounds at each program's Dialnorm level play out at the same loudness. When the movie is back on, the decoder sees the change again and readjusts the volume to where it was (Fig. 3).
The graph shows a pretty dramatic scenario, but you get the point. If implemented right, your days of riding the volume during commercials could be over. Dialnorm permits your decoder to make volume adjustments for you when the program changes. One very important note: You will never find it changing in the middle of a program so there is no need to worry about the artist's vision for a soundtrack being compromised.
But why is dialogue the reference from program to program? Of all the sounds you are likely to hear through your home theater system, dialogue is the one you have the most experience in hearing. Gun shots, car crashes, ray-guns, and even live music, though exciting, are not nearly as familiar to us as a simple conversation. Dialogue is something we hear every day, and we know what it should sound like and how loud it should be. It's no wonder then that the level of dialogue is most important. When you adjust your volume, most often you are subconsciously setting it so that dialogue sounds natural and intelligible. Once you've struck that balance, Dialnorm keeps dialogue at a constant level, program to program.
Dispelling the Myths
Before we address some of the concerns people have, let's look at a few technical details. When we speak of how loud sounds are in a Dolby Digital soundtrack, we express the loudest level as "0 dB" and the quietest as "-105 dB". The Dialnorm value expresses the level of dialogue as how much lower it is than the peak (0 dB). So a value of "-31" indicates a point 31 dB below the peak and, incidentally, is the value at which no volume adjustment is performed by your consumer decoder. A Dialnorm value of -27 would indicate to your decoder that the dialogue is at a point 27 dB below the peak, or 4 dB higher than a program with a Dialnorm value of -31. Your decoder would then turn things down by 4 dB. A Dialnorm value of -25 would call for a 6 dB reduction and so on. The -27 setting "fits" movie soundtracks perfectly in that it yields a very natural level for talking and is likely the most common for movies. For decades this has been the standard level for dialogue in motion picture soundtracks.
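For readers who like to see the arithmetic spelled out, here is a minimal sketch of the rule described above: the decoder attenuates by the difference between the program's Dialnorm value and -31. The function names are made up for illustration, and the allowed range of -31 to -1 is my assumption about the field's limits; this is not code from any real decoder.

```python
def dialnorm_attenuation_db(dialnorm: int) -> int:
    """Return how many dB a consumer decoder turns the program down.

    Follows the rule in the text: a value of -31 means no adjustment,
    and every dB above -31 is one dB of attenuation.
    The -31..-1 range check is an assumption made for this sketch.
    """
    if not -31 <= dialnorm <= -1:
        raise ValueError("Dialnorm is assumed to lie between -31 and -1")
    return dialnorm + 31


def attenuation_to_gain(attenuation_db: float) -> float:
    """Convert an attenuation in dB to a linear amplitude multiplier."""
    return 10 ** (-attenuation_db / 20)


if __name__ == "__main__":
    for value in (-31, -27, -25):
        att = dialnorm_attenuation_db(value)
        gain = attenuation_to_gain(att)
        print(f"Dialnorm {value}: turn down {att} dB (amplitude x {gain:.3f})")
```

Note that the adjustment is one fixed gain for the whole program, which is why the balance between dialogue, music, and effects is left untouched.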
Myth #1: Dialnorm reduces dynamic range.
The main criticism that you are likely to encounter is that Dialnorm affects dynamic range, specifically that it reduces it. Indeed, from Fig. 3, it looks like we’ve chopped off the bottom third of the commercial’s soundtrack. In truth we haven’t: We’ve only turned the volume down. Lowering the volume knob on your system would have the exact same effect. If you really want that car commercial to blast you in the face, turn up your volume (though I doubt you'd want to), and you will still have all 105 dB dynamic range. Consider again that most films will use the same -27 setting, and that other values are appropriate for other material. In other words, if you've set your playback level to one that you are comfortable watching a movie at, Dialnorm is going to maintain that comfort level for you as the program changes, but there is nothing stopping you from further adjusting your volume one way or the other. "Controlled" values of Dialnorm may someday be imposed in such areas as broadcast TV (Dialnorm was a major point of attraction when Dolby Digital was chosen as the audio format for HDTV), but you always have the final say with your volume knob.
Myth #2: Dialnorm reduces everything by 4 dB, altering reference level playback of a movie.
A common criticism is that Dialogue Normalization "normally" reduces the level of the soundtrack by about 4 dB. Reduces it as compared to what? You have to compare it to something else first, and then the question becomes: is the Dolby Digital soundtrack 4 dB too low, or is the other material 4 dB too high? Follow me on this one.
A lot of home theater enthusiasts are concerned with what is called "reference level playback". In a nutshell, you use test-tones (as may be found on such DVDs as AVIA) to set the volume to the same standard levels used in cinemas. The reason to do this is to hear the soundtrack at the level the movie makers intended. A concern naturally arises that if volume is being altered by Dialnorm, the sound engineer's vision is compromised. Reference level playback is in practice very very loud in the relatively small acoustic spaces of home, and we must caution you against it at this point. Not only do most find it uncomfortably loud, but as we noted in our article explaining the LFE channel, it can quickly bring a subwoofer to its knees. But for the record, let's press on.
The default power-on setting for Dialnorm on Dolby's professional AC-3 encoder, the DP569, is -27 because as we noted, that value is a perfect fit for movie soundtracks. True, this value calls for your decoder to attenuate its output by 4 dB. Fact is, the two most common reference DVDs, Video Essentials and AVIA, were encoded with the same -27 Dialnorm value, so their test noises are also being attenuated by 4 dB, making them a perfect reference for Dolby Digital movies. If you've set-up a system with either of these tools, then any movie you play will not be "reduced" by 4 dB as compared to the reference.
DTS soundtracks, unlike Dolby Digital, are not attenuated by 4 dB by your decoder. This means that if you've set up your system using AVIA or Video Essentials, the DTS soundtrack is actually going to play 4 dB too high. Yes, that's right. You read it right: On a system calibrated for reference level playback with Video Essentials or AVIA, DTS soundtracks play 4 dB too loud. Conversely (and to be fair), if you set up a system using DTS test noise, the Dolby Digital soundtrack will be 4 dB too low. Yet what is important here, and what I really want you to take away from this, is that regardless of what actual level you watch a movie at, relative to one another, there exists this 4 dB difference between DTS and Dolby Digital movie soundtracks played over consumer equipment. If at any time you are comparing soundtracks, you must turn your volume down when listening to the DTS track and/or raise it when listening to the Dolby Digital track (as the case may be) in order to hear the same level from both.
We should note that most THX-certified receivers and processors address this by attenuating DTS material by 4dB after the decode stage, effectively putting everything on level ground.
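The 4 dB relationship above is easy to check with a few lines. This is only a sketch of the bookkeeping: it assumes the decoder attenuates by (Dialnorm + 31) dB as described earlier, treats a DTS track as receiving 0 dB of attenuation, and uses a hypothetical function name.

```python
def playback_offset_db(calibration_attenuation_db: float,
                       program_attenuation_db: float) -> float:
    """How far a program plays above (+) or below (-) the calibrated level.

    Both arguments are the attenuation the decoder applies, in dB:
    to the calibration test noise and to the program being played.
    """
    return calibration_attenuation_db - program_attenuation_db


DOLBY_DIGITAL_MINUS_27 = 4.0  # Dialnorm -27 means 4 dB of attenuation
DTS = 0.0                     # assumed: no Dialnorm-style attenuation

if __name__ == "__main__":
    # Calibrated with a Dolby Digital test disc encoded at Dialnorm -27
    print(playback_offset_db(DOLBY_DIGITAL_MINUS_27, DOLBY_DIGITAL_MINUS_27))
    print(playback_offset_db(DOLBY_DIGITAL_MINUS_27, DTS))
    # Calibrated with DTS test noise instead
    print(playback_offset_db(DTS, DOLBY_DIGITAL_MINUS_27))
```

Whichever source you calibrate with, the difference between the two formats stays at 4 dB; only the sign changes.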
Myth #3: Dialnorm adversely affects S/N (signal-to-noise) ratio.
Another concern that comes up is the notion that if the volume is being adjusted by the decoder (for any reason) in the digital domain, there is a reduction in quality (S/N ratio) from bit-width reduction. Audiophiles should actually appreciate that, when performed in the digital domain, the Dialnorm adjustment is extremely accurate. Dolby Digital is capable of 24 bit resolution, and each bit is worth about 6 dB. Thus, a volume reduction of 4 dB would mean a reduction of less than 1 bit. IF (big if) the D/A converters were silent to -144 dB, you might be able to measure this. In the real world, where the D/A's performance is less than that, these sorts of level changes at the decoder stage will not have an effect on S/N ratio for a given volume level. The same holds true for dynamic range.
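The "less than 1 bit" and "-144 dB" figures both come from the usual rule of thumb that one bit of resolution is worth 20·log10(2), or roughly 6.02 dB. A short sketch of that calculation, with illustrative function names of my own:

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB per bit of resolution


def bits_for_attenuation(attenuation_db: float) -> float:
    """Equivalent number of bits given up by a digital level reduction."""
    return attenuation_db / DB_PER_BIT


def theoretical_range_db(bit_depth: int) -> float:
    """Theoretical span in dB between full scale and the smallest step."""
    return bit_depth * DB_PER_BIT


if __name__ == "__main__":
    print(f"4 dB reduction is about {bits_for_attenuation(4):.2f} bits")
    print(f"24 bits span about {theoretical_range_db(24):.1f} dB")
```

A 4 dB cut works out to roughly two thirds of a bit, taken from the very bottom of a 24 bit word.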
The vast majority of DVD movie releases use the -27 Dialnorm value, but we cannot categorically say that other values are 'wrong'. There have been reports of DVDs, mostly music titles, which were encoded in error with oddball Dialnorm values. Dolby continues to watch for these mistakes and alert the studios, but ultimately, it's not Dolby's fault or that of their system.
Some other functions of Dialnorm:
One has to consider that not all playback systems have state of the art audio hardware (portable DVD players come to mind). By using Dialnorm to bring down the decoder's output a bit, encoders create "virtual" digital headroom and safeguard against digital clipping in consumer digital electronics.
The Dialnorm value is also used as the reference for Dynamic Range Control.*
For many purists, 'dynamic range control' is a dirty little phrase. Let's talk about it for just a moment. Most, if not all, Dolby Digital decoders have the option of reducing dynamic range. Reducing dynamic range in simple terms means raising the level of quiet sounds and lowering the level of loud ones such that there is less of a delta. The classic example is watching a film while others are trying to sleep. If you just turn the volume down so that explosions won't bother the sleepers, it will likely be too low for you to hear the dialogue. By invoking dynamic range control, you will hear all of the soundtrack but not disturb others with loud peaks. Quiet playback is not the only use for DRC. When you are watching a movie on an airplane, you are bombarded with the noise of the engines. A wide dynamic range would leave quiet sounds to be drowned out by the engines. By compressing dynamic range you bring the soundtrack together to a point where it can be raised above the engine noise but not ruin your hearing (Fig. 4). Of course, the airline food might ruin your stomach, but that is not our domain.
Dialnorm's role here is that its value represents the "center" of compression. That is, sounds under it are raised, sounds above it are lowered, but sounds at its level are unchanged. An associated value that goes along with Dialnorm, and is used at this point, is the Dynamic Range Control Preset. Although Dialnorm is the center of the action where sound levels are not adjusted, a DRC preset is selected that tells the decoder how wide this "null-zone" is and can range from 5 to 20 dB (Fig. 5).
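To make the "center of compression" idea concrete, here is a toy model of the behavior just described. It is a sketch under loud assumptions: the null-zone is taken to be symmetric around the Dialnorm level, and the 2:1 ratio is picked purely for illustration. Neither is taken from Dolby's actual presets, and the function name is hypothetical.

```python
def drc_output_level(level_db: float,
                     dialnorm_db: float = -27.0,
                     null_width_db: float = 10.0,
                     ratio: float = 2.0) -> float:
    """Toy static DRC curve centered on the Dialnorm level.

    Sounds inside the null-zone pass unchanged, louder sounds are
    lowered, and quieter sounds are raised. The symmetric zone and
    the ratio are illustrative assumptions, not real preset data.
    """
    upper = dialnorm_db + null_width_db / 2
    lower = dialnorm_db - null_width_db / 2
    if level_db > upper:
        # Loud sounds: only 1/ratio of the excess above the zone survives
        return upper + (level_db - upper) / ratio
    if level_db < lower:
        # Quiet sounds: pulled up toward the bottom of the zone
        return lower - (lower - level_db) / ratio
    return level_db


if __name__ == "__main__":
    for level in (0.0, -27.0, -62.0):
        print(f"{level:6.1f} dB in -> {drc_output_level(level):6.1f} dB out")
```

With these made-up settings, a peak at 0 dB comes out at -11 dB, a quiet sound at -62 dB comes up to -47 dB, and dialogue sitting at -27 dB is left alone.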
Again, it's no coincidence that dialogue is the center of DRC, its level going unaffected. Next time you are watching something which is mostly talking, try turning on Dynamic Range Control. If the soundtrack was assembled properly, you should not hear any change in the level of the dialogue. Now watch something which has talking AND loud explosions. When you engage Dynamic Range Control, the dialogue stays the same, but the explosion is not nearly as loud. This is why it's important for the Dialnorm value to be set not arbitrarily, but exactly where most of the talking occurs in the dynamic range scale.
Here's a wild, radical thought (dangerous I know). Consider again "reference" level playback that we talked about earlier. It is the volume at which each channel would play the loudest sound in a soundtrack at 105 dB. That is a very loud level (but, it keeps you from thinking about the sticky floors in the theater). It's time to put the macho image aside and admit we don't watch an entire movie at that level at home. I don't anyway. But when we play a soundtrack at anything less than reference, look at what happens (Fig. 6). The quietest sounds in the passage drop below the threshold of hearing. If you set your volume too low, you might miss some important stuff. If you invoke a mild dynamic range control, from a certain point of view, you would be better off because you would once again be hearing all the soundtrack.
In conclusion, the purpose of Dialnorm is to empower your decoder to keep the volume of dialogue content consistent from program to program. It does not limit the volume you listen at or impose a certain volume level, nor does it rob you of dynamic range. With it and other Metadata that we will talk about in future articles, Dolby Digital decoders can take one soundtrack and play it back with equal aplomb be it over headphones, the 3" speaker of a TV, or a full blown home theater sound system.
Cheers and happy listening gang!
- Brian Florian -
(Last Updated - 8/2001)
I would like to thank Roger Dressler & Mike Babbitt of Dolby Labs for the generous contribution of their knowledge for this article.
* Dynamic range control is not to be confused with the "Compressor" or "Compression" used in the music recording industry. This practice cuts down the transients of a string pluck, for example, to make the instrument sound "fat and funky".
Mixlev, another piece of AC-3 metadata, is not directly related to Dialnorm, despite some similarities.