DVD Benchmark – Part 6 – DVD-Audio


In mid-2000, after a long delay caused by a variety of issues, DVD-Audio (DVD-A) was given an inauspicious launch with the arrival of the first DVD-Audio players. It’s been over a year since the first players hit the market, and this marks our first foray into a background article explaining the technology behind DVD-Audio. This article will be split into several sections.

There is a lot of excitement surrounding DVD-Audio. At this time, musical artists are experimenting with multi-channel mixes and speaker layouts. No longer is the artist limited to 2 channels. For those who might not be aware, stereo was actually intended to take up at least 3 channels of audio. Due to the limitations of the vinyl format, stereo was ultimately restricted to 2 channels. Not only are the artists experimenting with new channels, but they are trying different speaker layouts, as you will read about below.

It’s Digital Versatile Disc, and It Can Hold a Lot of Music!

There’s plenty of argument over whether the DVD acronym is for Digital Video Disc or Digital Versatile Disc, but either way, the CD is minuscule in comparison. Each layer of a DVD can store approximately 4.37 GB, more than 7 times the capacity of the Compact Disc. Since DVD-A is about music, how much music data can you store? Well, "a lot" isn’t descriptive, so here is a chart.

Sampling Depth | Sampling Rate | No. of Channels | Time w/o MLP | Time w/ MLP
24-Bit         | 96 kHz        | 5.1             | N/A *        | 100 Minutes
24-Bit         | 96 kHz        |                 | N/A *        | 86 Minutes
24-Bit         | 96 kHz        |                 | 150 Minutes  | 240 Minutes
24-Bit         | 192 kHz       |                 | 75 Minutes   | 120 Minutes
16-Bit         | 44.1 kHz      |                 | 420 Minutes  | 720 Minutes
16-Bit         | 44.1 kHz      |                 | 840 Minutes  | 1500 Minutes

* Two of the formats exceed the allowable bandwidth off the drive as per the DVD specification (9.6 Mbit/second), so these formats carry an N/A. While it is possible to store 24-bit / 192 kHz without lossless compression, the discs we have access to at this time with 24-bit / 192 kHz data use MLP (Meridian Lossless Packing) compression, and we suspect that this will be the norm for all 24-bit / 192 kHz discs due to the extremely limited playback time without MLP.

We will delve into the details of what makes MLP tick later in this article.

Ingredients for a "Sort of Universal" DVD-Audio disc

In Part 3, we took a quick look at the directory structures for DVD-Video (movies) discs, and touched on what the contents of a DVD-Audio disc would look like. DVD-Audio content is contained in files with the extension AOB (Audio Objects). This distinguishes it from its DVD-Video counterpart, which is contained in files with the extension VOB (Video Objects). For example . . .

[Figure 1: Root folder]  [Figure 2: VIDEO_TS folder]  [Figure 3: AUDIO_TS folder]

Figure 1 above is the root directory of Joni Mitchell’s "Both Sides Now" DVD-A. As with the earlier examples, there are two folders, AUDIO_TS and VIDEO_TS. The AUDIO_TS is used by DVD-Audio players, while the VIDEO_TS folder is used for DVD-Video players. Some DVD-A players can emulate DVD-Video players for playing back DVD-A discs’ DVD-Video tracks in the VIDEO_TS folder. Examples include the Onkyo DV-S939 and the Integra Research RDV-1 players. Both of these players have a setup option called Priority Content, which governs the behavior of the player when a DVD-Audio disc is inserted. Setting Priority Content to Audio/DVD-Audio (player dependent) means that the player will use the AUDIO_TS folder for content during DVD-Audio disc playback. Setting Priority Content to Video/DVD-Video (still player dependent) means the player will use the VIDEO_TS folder for content during DVD-Audio disc playback.

Figure 2 above is what the VIDEO_TS directory looks like, and Figure 3 above shows the AUDIO_TS directory on "Both Sides Now". The directory structure for the two is similar, and the same constraints for file sizes (less than 1 GB per file) are in effect. The Video Manager runs the VIDEO_TS files to display the menu system for the disc. In this case, it’s a simple menu, with one type of soundtrack (Dolby Digital). As with standard DVD-Video discs, multiple soundtracks are possible, with DD on every disc and DTS on some discs.

The entire contents of the DD-encoded music are contained within VIDEO_TS.VOB, with 44 minutes or so of material and the images required being stored within that single file.

Contrast the small storage requirements with the DVD-Audio portion of the disc. As before, the AUDIO_TS files are used by the Audio Manager, and the ATS series of files with the AOB extension contains the DVD-Audio content. Note that the 44 minutes of DVD-Audio material takes up approximately 4 GB of data for the same program. The latest Warner Brothers releases I’ve purchased are shipping with dedicated 5.1 and stereo mixes, rather than relying on the MLP decoder core to do a fold-down to stereo. The DVDAUDIO files are the credits for the DVD Audio disc, which is a series of still images with text overlays and menu options.

Like DVD-Video, DVD-Audio makes use of the Top Menu (a.k.a. Title Menu) button. Here are the results (Figure 4a below) on a PC, which behaves as a DVD-Video player:

When you insert the disc, the audio soundtrack selection is made. Notice the segregation between DVD-Video and DVD-Audio players? You get a different Top Menu when you insert this disc into a DVD-Audio player. In this Top Menu, you select your soundtrack before you’re moved off to the Video Menu (Our term).

Access to the MLP 5.1 Surround track is blocked on my PC’s software DVD player and on my DVD-Audio player when Priority Content is set to Video.

[Figure 4a: Top Menu]  [Figure 4b: Video Menu]

Figure 4b (above) illustrates the video menu, where you can select the various options. The Tracks/Lyrics selection encompasses four separate titles: Title 1 is the DTS 5.1 soundtrack without lyrics (just a background image), Title 2 is the DTS 5.1 soundtrack with lyrics for each track, Title 3 is DD 2.0 with a background image only, and Title 4 is DD 2.0 with lyrics for each track. Title 5 is the music video.

It’s necessary to split the song versions with lyrics from the song versions without into separate groups, as DVD-Video playback is not designed to allow browsing of content during playback – so the Lyrics option is offered as a separate group which displays a static image.

Shown below (Figure 5) is a directory listing of the VIDEO_TS folder; you can see there are six titles in this folder. See above for the descriptions of all but Title 6, which is another copy of the Dolby Digital 2.0 encoded track.

[Figure 5: VIDEO_TS folder]  [Figure 6: AUDIO_TS folder]

The DVD-Audio behavior (Figure 6, shown above) is interesting as well. There are four groups (the DVD-A equivalent of titles) accessible: Group 1 is DVD-A playback, Group 3 is DTS 5.1 playback, Group 4 gives DD 2.0 playback, and Group 5 is the music video. You might notice that the ATS files (the DVD-Audio playback files) are the only contents in this folder. To save space, and not duplicate the data, links to titles within the VIDEO_TS folder are used to access the DTS, DD, and music video titles.

DTS so far is the only content provider allowing you to switch between DVD-Audio / DTS / Dolby Digital tracks without changing the default behavior of the player. I give kudos to them for what I consider to be correct implementation, which allows for comparison between the various tracks.

Meridian Lossless Packing (MLP) in a Nutshell

Audiophiles have long desired a higher-resolution format to take the place of CDs. "16-bit just doesn’t cut it," they said, "We want more." Many also want music that will take advantage of the high-quality multi-channel speaker systems they bought to listen to movie soundtracks. Unfortunately, 6 channels or more of high-resolution music take up way too much bandwidth to fit comfortably even on a DVD. Using lossy compression algorithms (ones that discard information, like MPEG or DTS) is one solution, but to the purist, throwing away any of the music is a sin. Luckily, the creators of DVD-A were able to put together a format that allows for multi-channel, high-resolution audio, with reasonable playing times and peak data rates, using standard DVD technology, and no data loss at all (lossless).

How is it done? With MLP, which stands for Meridian Lossless Packing. We’re going to take a look at how MLP is able to pack audio so tightly without losing a single bit of the original high-resolution recording, and examine its many nifty features and benefits. Some of this will get rather technical, but Secrets’ readers love this kind of thing, so stick around – there’s a lot of cool technology behind MLP.

Compression: Lossless vs. Lossy

Intuitively, many audiophiles and videophiles think of data compression as a bad thing. Compressing video data excessively via MPEG leads to visible artifacts when the data are decompressed into the video we see, particularly when dealing with rapid motion. (Decompression is half the job, which is why the set of formulas is called a codec, for code/decode or compress/decompress.) Compressing audio via perceptual codecs, such as MP3, especially at higher compression ratios, robs music of subtle detail, leaving behind a hollow shadow of what might have been. Even DTS and AC-3, some of the most advanced multi-channel compression formats, sacrifice fidelity in order to fit 6 (or more) channels of hour-long audio onto the same disc that must store relatively high-resolution video for the same amount of time.

A lossy codec (like MPEG, DTS, or AC-3) compresses content such that the result, when decompressed, is not exactly the same as the original. If everything goes well, it is almost (but not quite) indistinguishable from the original, but it isn’t bit-for-bit the same. A lossless codec compresses the data without losing any of it when it is decompressed. The result, when decompressed, is exactly the same as the original, with no compromises. The ZIP format, used by PKZip, WinZip, and Stuffit on PCs, is an example of a lossless compression scheme, though one that is generic, and not optimized for any specific kind of file. MLP is able to get higher compression than ZIP in almost all cases, because it is optimized for one kind of data: audio.

Lossy compression encoders may accept high-bit data, such as 24-bits, and they output 24-bits as well. The catch is that the 24-bit output isn’t identical to the 24-bit input. The nature of the codec means they never will be, regardless of how sophisticated the reconstruction algorithms are to decode the process. Even though the output buffers used by the decompression algorithms are 24-bit, that doesn’t mean that the effective resolution of the output is 24-bit. Most lossy compression algorithms throw away the least significant bits ("LSB") of the input, because (in theory) they represent detail that is impossible to hear, or at least difficult to hear. In the end, a "24-bit" lossy compression scheme may really only give results that are effectively only 18-bit, or 16-bit, or even lower. No matter how carefully the bits to throw away are chosen, there is always the possibility that audible information that is important to the music is being lost.

For example, here is a 24 bit binary number:


The least significant bits could be the 0011 on the far right. The resulting number after the 24 bit number has been compressed and then decompressed is only 20 bits long, as shown below.


Although the 0011 are the least significant bits, they are not insignificant, as musical info was in there.
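To make the truncation concrete, here is a small Python sketch. The 24-bit value is an arbitrary number made up for illustration; only its trailing 0011 comes from the text above.

```python
# Hypothetical 24-bit sample; only the trailing 0011 is taken from the text.
sample_24 = 0b1011_0110_1100_0101_1010_0011

# A lossy step that discards the 4 least significant bits leaves 20 bits.
sample_20 = sample_24 >> 4

# Padding back out to 24 bits cannot restore the discarded 0011.
restored = sample_20 << 4

print(f"{sample_24:024b}")   # original 24-bit value
print(f"{sample_20:020b}")   # 20 bits after truncation
print(f"{restored:024b}")    # padded back out with zeros
print(sample_24 - restored)  # the detail that was lost: 3 (binary 0011)
```

Real lossy codecs decide what to discard in far more sophisticated ways, but the end result is the same kind of one-way trip.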

With a lossless encoding/decoding method like MLP, the output of the decoder is equal to the input of the encoder, bit for bit. If they weren’t equal, it wouldn’t be lossless! You can be sure that whatever the original recording engineers recorded is reproduced.

To illustrate the difference for yourselves, do the following:

    * Start with a high-resolution bitmap file (BMP or TIFF formats are examples).
    * Encode the file with a lossless compression codec by archiving it to WinZip.
    * Encode the file with a lossy compression codec by saving it as a JPEG file with moderate compression.
    * Decode using the lossless codec by extracting the bitmap from the WinZip archive using a different name.
    * Decode using the lossy codec by opening the JPEG file and saving it to a new name as a bitmap file.

You’ll note, in general, that the JPEG file is smaller than the WinZip file. The decoded files, however, are the same size. Now, compare the two decoded bitmaps to the original bitmap, side by side. The bitmap that went through JPEG will look slightly different from the original, while the one that went through WinZip will be identical. So, you see the compromises inherent in aggressive lossy compression.
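If you would rather try the same experiment in code, the sketch below does the equivalent in Python. It makes two assumptions for the sake of a short example: the "file" is a made-up byte pattern rather than a real bitmap, and the "lossy codec" is a crude stand-in that zeroes the low 4 bits of every byte (JPEG is far more sophisticated). The lossless half uses zlib, which implements the DEFLATE method commonly used inside ZIP archives.

```python
import zlib

# Stand-in for image data: a repeating byte pattern (made up for this example).
original = bytes((i * 7) % 256 for i in range(4096))

# Lossless: compress, then decompress, and every byte comes back.
packed = zlib.compress(original)
lossless_copy = zlib.decompress(packed)

# Crude "lossy" stand-in: throw away the low 4 bits of each byte first.
quantized = bytes(b & 0xF0 for b in original)
lossy_packed = zlib.compress(quantized)
lossy_copy = zlib.decompress(lossy_packed)

print(lossless_copy == original)         # True: bit-for-bit identical
print(lossy_copy == original)            # False: detail is gone for good
print(len(lossy_copy) == len(original))  # True: decoded sizes match
print(len(packed), len(lossy_packed))    # compressed sizes, for comparison
```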

Why Compress?

Given that DVD-A doesn’t have any video on the disc to suck up bandwidth, why would you need to compress the audio at all? The short answer is that if the data were uncompressed, a DVD wouldn’t be able to hold much audio, and the data rate would go over the limits of the hardware.

The data rate for six channels of uncompressed 24-bit / 96 kHz sampling is 13.18 megabits (Mb) per second. A single-layer DVD holds 4.37 GB (not 4.7 GB, as is commonly reported – see section 7.2 of the DVD FAQ by Jim Taylor for more on this), and a dual-layer holds 7.95 GB, so at this rate you could only store 45 minutes of audio on a single-layer DVD and 82 minutes on a dual-layer. Also, the DVD format has a maximum transfer rate of 9.6 Mb per second, so aside from the short playing time, transfer rate limitation makes 6 channels of 24-bit / 96 kHz sampling impossible.
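The arithmetic behind those numbers can be checked with a few lines of Python. One assumption on our part: the figures quoted above work out if a megabit is counted as 2^20 bits and a gigabyte as 2^30 bytes. In decimal units the same data rate is 13.824 million bits per second.

```python
channels, bits_per_sample, sample_rate_hz = 6, 24, 96_000
bits_per_second = channels * bits_per_sample * sample_rate_hz  # 13,824,000

# Assumption: "megabit" here means 2**20 bits, "GB" means 2**30 bytes.
rate_mb = bits_per_second / 2 ** 20

def playing_minutes(capacity_gb):
    """Minutes of uncompressed audio that fit in the given capacity."""
    capacity_bits = capacity_gb * 2 ** 30 * 8
    return capacity_bits / bits_per_second / 60

print(round(rate_mb, 2))           # 13.18
print(int(playing_minutes(4.37)))  # 45 (single layer)
print(int(playing_minutes(7.95)))  # 82 (dual layer)
```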

In contrast, MLP can keep the peak data rate at or below 9.6 Mb per second, and generally keeps the average data rate well below that. This allows longer playing times and/or higher resolution recording. See our table above for a demonstration of playing times with and without MLP.

How do they do it?

Here’s where it gets complicated, but don’t let that hold you back. It’s not that bad.

The basic techniques used by MLP are:

    * Bit Shifting – avoids wasting bits for unused dynamic range.
    * Matrixing – puts the audio common to multiple channels into one channel.
    * Prediction Filters – predicts the next bit of audio based on the previous audio.
    * FIFO Buffer – smoothes the instantaneous data rate.
    * Entropy Coding – compresses the final data as tightly as possible.

Bit Shifting

MLP continuously varies the number of bits per sample so it uses only the number of bits that are necessary. In contrast, uncompressed PCM stores all bits, even if most of them are unused (bunches of zeros) most of the time.

In PCM (Pulse Code Modulation), there are a fixed number of bits stored for every sample. It might be 16, 18, 24, or some other number, depending on the recording format, but the number remains unchanged for the whole recording time. During silent sections, all or most of those bits are zero. Maybe they’re only zero for a second or so, but that’s thousands and thousands of samples that are all zeros or extremely low numbers. The MLP encoder recognizes that it could switch to 4-bits (for example) for that section. It stores a special flag that says "switching to 4-bits", then stores a long run of 4-bit values. When the music picks up again, perhaps it decides that the new section will require 22-bits. The encoder stores a new flag, saying, "switching to 22-bits", then starts storing 22-bit samples. Only when the music has a large dynamic range does it need to switch to full 24-bit storage.

On the decoding end, the 4-bit values in our example are converted to full 24-bit values by adding an appropriate number of zeros. Again, because it is a lossless codec, MLP only uses this technique when the data have a lot of zeros in them to begin with.
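Here is a rough Python sketch of the idea. It is not the real MLP bitstream layout: the block size, the 5-bit "switching to N bits" flag, and the function names are all assumptions chosen to keep the example short. Each block of samples is stored at the narrowest width that still holds its largest value, and the decoder widens everything back to the original numbers.

```python
def bits_needed(block):
    """Narrowest two's-complement width that holds every sample in the block."""
    return max((s if s >= 0 else ~s).bit_length() + 1 for s in block)

def encode(samples, block_size=4):
    """Return (width, fields) per block; each field is the sample cut to width bits."""
    blocks = []
    for i in range(0, len(samples), block_size):
        block = samples[i:i + block_size]
        width = bits_needed(block)
        mask = (1 << width) - 1
        blocks.append((width, [s & mask for s in block]))
    return blocks

def decode(blocks):
    """Sign-extend every field back to a full-size sample."""
    out = []
    for width, fields in blocks:
        for v in fields:
            out.append(v - (1 << width) if v >> (width - 1) else v)
    return out

def stored_bits(blocks, flag_bits=5):
    """Total size, counting a hypothetical 5-bit width flag per block."""
    return sum(flag_bits + width * len(fields) for width, fields in blocks)

# Quiet passage, silence, then a loud passage (made-up 24-bit samples).
samples = [0, 1, -2, 3, 0, 0, 0, 0, 120000, -95000, 4000000, -8388608]
blocks = encode(samples)
print([width for width, _ in blocks])          # width chosen per block
print(stored_bits(blocks), 24 * len(samples))  # packed size vs. plain 24-bit PCM
print(decode(blocks) == samples)               # True: nothing was lost
```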

Matrixing Channel Information into Substreams to Reduce Redundant Data

In a multi-channel audio mix, often there is a significant amount of similar audio on multiple channels. Audio is rarely panned hard to a single channel, and when it is, it’s generally just for a short time. When the same sound is coming out of all the speakers, it doesn’t make sense to compress it separately for each individual channel and use 6 times the bandwidth. MLP compresses all the common elements from all the channels just once. A simple way of thinking about it is to imagine that MLP creates a "combo" channel containing the sum of everything common to all channels. Then for each additional channel, it only needs to store the differences from the common channel.

The advantage of this strategy is that while the "combo" channel is complicated and requires lots of bits to compress, the "difference" channels use only a few bits most of the time, because they only have to store data for how they vary from the main channel. If they don’t vary at all from the combo channel, they use almost no bits. If they vary just slightly, they’ll still use few bits. Only if they vary drastically from the combo channel will a difference channel require the full bandwidth, and typically this doesn’t happen continuously, but rather for short intervals here and there.

For example, let’s just look at a two-channel mix. In the compression stage, the compressor analyzes the two channels and puts together a "common" channel, that contains essentially a sum of the two channels. Then it puts together a "difference" channel that contains the difference between the original Right and original Left stereo channel. At decompression time, it can reconstruct the original Right and Left channels by inverting the mathematics used to create the matrixed channels in the first place. It’s just simple math.

In the above scenario, since compressed channel #2 is a difference between the two original channels, whenever the same sound is playing at the same volume on both channels, the difference channel gets zeros. And as we’ve seen above, zeros compress well. Even if the sound is playing at a slightly different volume on the two channels, the difference channel will still use fewer bits than either of the original channels. It’s only when the sounds coming from the two original left and right channels have no relationship to each other at all that the difference channel will use the same number of bits as the original channels. And that rarely happens in normal music.
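A minimal Python sketch of the two-channel case follows. It uses a plain sum and difference, which is a simplification on our part; the matrices MLP actually uses are more elaborate. The sample values are made up.

```python
def matrix(left, right):
    """Turn Left/Right into a 'common' (sum) and a 'difference' channel."""
    common = [l + r for l, r in zip(left, right)]
    difference = [l - r for l, r in zip(left, right)]
    return common, difference

def unmatrix(common, difference):
    """Invert the math: (sum + diff) / 2 = Left, (sum - diff) / 2 = Right."""
    left = [(c + d) // 2 for c, d in zip(common, difference)]
    right = [(c - d) // 2 for c, d in zip(common, difference)]
    return left, right

# Made-up samples: mostly the same sound on both channels.
left = [100, 250, -300, 4000]
right = [100, 240, -300, 4010]

common, difference = matrix(left, right)
print(difference)                                 # [0, 10, 0, -10]
print(unmatrix(common, difference) == (left, right))  # True
```

The difference channel is mostly zeros and small numbers, which is exactly what the later compression stages like to see.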

In 6-channel music, the same basic idea is used, but the sums and differences are more complicated. In fact, it’s possible for each finished compressed channel to be a weighted sum of proportions of every other channel, though in practice one channel will get the bulk of the common data, and other channels will be largely differences from the main channel.

One useful feature of MLP is that the audio data can be divided into substreams, each of which can contain multiple channels, and can build off of other substreams. A substream is a portion of the data that is easy for a decoder to extract separately. For example, it is possible (and encouraged) for a producer of DVD-A discs to put all the data for the 2-channel mix into substream 0. Then substream 1 can just contain four "difference" channels that enable the decoder to reconstruct the full 6-channel mix. So, if the player only has 2-channel output, it can just decode substream 0, which contains everything necessary for the 2-channel mix. But if the player has full 6-channel output, it decodes both substream 0 and substream 1, and gets the full 6 channels.

In the above scenario, the 2-channel mix in substream 0 is not the left and right channels from the 6 channel mix. It’s a mixdown of all 6 channels into a special optimized 2-channel mix. It’s essentially like feeding the 6 final channels into a 6-input, 2-output mixer, where the mixing engineer can adjust level and phase on the fly to produce just the mix desired. It does need to be just combinations of what’s already on the original 6 channels – the mixer can’t add extra effects or new sounds – but the specific mix is all under the engineer’s control. Compare that to DTS and Dolby Digital, which create the 2-channel mixes on the fly during decoding, with absolutely no input from the recording engineer. With MLP, the control is in the hands of the people making the music.

Amazingly enough, just given the 2-channel mix, the mixdown coefficients (the "levels" the engineer used to make the 2-channel mixdown), and 4 more difference channels, the original 6-channel mix can be extracted, including the original untouched left and right channels. So, only 6 total "channels" of information are stored on the disc, yet from that information a full 6-channel mix and a 2-channel mixdown of those channels can be extracted. Very clever indeed.
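The trick is easier to believe once you see it in miniature. The Python sketch below is a simplified illustration, not the MLP algorithm itself: the mixdown coefficients are hypothetical, and the four extra channels are stored as-is as a stand-in for the "difference" channels. The point is that as long as the decoder repeats exactly the same integer arithmetic the encoder used, subtracting the mixed-in portion recovers the original Left and Right exactly.

```python
SHIFT = 8  # coefficients are fixed-point fractions of 256 (hypothetical)
LEFT_COEFFS = {"C": 181, "Ls": 181}   # about 0.707 each
RIGHT_COEFFS = {"C": 181, "Rs": 181}
OTHERS = ("C", "LFE", "Ls", "Rs")

def mixed_in(frame, coeffs):
    """Integer amount of the other channels folded into one downmix channel."""
    return sum(k * frame[name] for name, k in coeffs.items()) >> SHIFT

def encode_frame(frame):
    """Store a 2-channel mixdown (Lt, Rt) plus the four remaining channels."""
    stored = {name: frame[name] for name in OTHERS}
    stored["Lt"] = frame["L"] + mixed_in(frame, LEFT_COEFFS)
    stored["Rt"] = frame["R"] + mixed_in(frame, RIGHT_COEFFS)
    return stored

def decode_stereo(stored):
    """A 2-channel player only needs the mixdown."""
    return stored["Lt"], stored["Rt"]

def decode_six(stored):
    """A 6-channel player subtracts the mixed-in part to get L and R back."""
    frame = {name: stored[name] for name in OTHERS}
    frame["L"] = stored["Lt"] - mixed_in(stored, LEFT_COEFFS)
    frame["R"] = stored["Rt"] - mixed_in(stored, RIGHT_COEFFS)
    return frame

# One made-up sample from each of the six channels.
frame = {"L": 1000, "R": -500, "C": 300, "LFE": 20, "Ls": -75, "Rs": 60}
stored = encode_frame(frame)
print(len(stored))                  # 6 stored channels
print(decode_stereo(stored))        # the 2-channel mixdown
print(decode_six(stored) == frame)  # True: original 6 channels recovered
```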

In addition, this substream approach allows for simpler, cheaper 2-channel decoders. For example, you could have a 2-channel DVD-A walkman that would use a simpler chipset, because it only needs to extract substream 0 and decode the 2 channels there. It can completely ignore the other 4 channels, and doesn’t need any buffer space for them or processing power to decode them. Again, with Dolby Digital and DTS, the decoder has to extract and decode all 6 channels before it can even begin to downmix the 2-channel version, so a 2-channel player needs just as complicated a decoder, and just as much buffer memory, as a 6-channel player.

Prediction Filters

This is the heart of the MLP codec, and what makes it so much spiffier for compressing music. The gist of it is this: music is not random. Given a certain chunk of audio, it is possible to make useful predictions about what kind of audio will come next. It’s not necessary to be perfectly accurate. (And it’s not possible – if you could always predict what sounds were coming next, you wouldn’t need to listen to music. You could just listen to the first note, and the rest of the piece would be obvious.) The idea is that some prediction is closer than no prediction at all, which allows the MLP algorithm to store just the difference between the real music and the prediction.

Here’s an example, grossly simplified: musical notes are generally held for some amount of time. They don’t just instantly appear in the music and instantly disappear. They have attack, sustain, and decay. In the attack phase, the volume is going up. In the sustain phase, the volume is remaining constant. In the decay phase, the volume is going down. So if the prediction algorithm just predicts that the volume is going to change in the next millisecond the same amount that it did the last millisecond, it’s going to be right, or close, a lot of the time. It’s going to be really, really wrong when the note changes from attack to sustain, or sustain to decay, but those are short instants of time. For the rest of the note’s duration, the prediction is quite close.

Since the prediction algorithm is completely known in advance, and shared by the encoder and the decoder, the encoder just knows that given the preceding music, the decoder is going to predict X (where X is the next sequence of bits). Since it knows the decoder will predict X, it doesn’t need to store X, just the difference between the real music and X. As long as the prediction is fairly close much of the time, the differences will be small, and fit into a smaller number of bits than the raw data by itself. And as we saw before, fewer significant bits allow the encoder to store fewer bits in the data stream. Presto – compression!
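As a sketch of the principle, here is the grossly simplified predictor from the example above in Python: it assumes the next sample will change by the same amount as the last one did. The real MLP prediction filters are considerably more capable, and the "volume envelope" samples below are made up.

```python
def predict(history):
    """Guess the next sample by assuming the last change repeats."""
    if len(history) < 2:
        return history[-1] if history else 0
    return 2 * history[-1] - history[-2]

def to_residuals(samples):
    """Encoder: store only the difference between the music and the guess."""
    return [s - predict(samples[:i]) for i, s in enumerate(samples)]

def from_residuals(residuals):
    """Decoder: make the same guess, then add the stored difference."""
    samples = []
    for r in residuals:
        samples.append(predict(samples) + r)
    return samples

# Attack (rising), sustain (flat), decay (falling).
samples = [0, 100, 200, 300, 400, 400, 400, 350, 300, 250]
residuals = to_residuals(samples)
print(residuals)  # [0, 100, 0, 0, 0, -100, 0, -50, 0, 0]
print(from_residuals(residuals) == samples)  # True
```

Notice that the residuals are zero except at the moments where the note changes phase, which is just what the text describes.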

In addition, the encoder stores special coefficients for the prediction algorithm in the bitstream, so the coming predictions will be closer to the real music. In effect, the encoder is storing things like "this next section has sharp attacks and decays, so adjust the predictions accordingly." This makes the predictions even better, which means the differences take up fewer bits, and the whole package takes less space, even taking into account that the special coefficients have to be stored as well.

In practice, the MLP decoder is obviously not pulling out individual notes. What it does is break down the sound into individual frequencies, and do predictions on each major frequency. When you see a real-time spectrum plot of a particular piece of music, you see certain strong frequencies that rise and fall and move around, and a bunch of relatively random noise at a lower level. MLP pulls out the major frequencies and does predictions on each of them individually, and then separates whatever is left (essentially the noise) and compresses it separately.

FIFO Buffer to Smooth the Instantaneous Data Rate

There’s a fundamental problem with lossless compression: it’s impossible to force the compression ratio to a specific amount. The encoder applies all the algorithms, and depending on how complex the music is, it gets some level of compression, and that’s it. If the music is sufficiently complex, the compression ratio may be low, and it’s just impossible to get more compression out of it. Luckily, in practice, no real audio signal is that complex all the time. But it can be complex enough for short periods of time that there is too much output data in too short a time for a DVD player to handle. Remember that the maximum data rate DVD players are designed for is 9.6 Mb/sec. Any more than that, and the DVD player just fails miserably, and the music cuts out. Since nobody wants that, some method needs to be used to make sure the data rate never goes above the maximum level, even when the music is complex enough to peak here and there above 9.6 Mb/sec.

The answer is a FIFO, or First In First Out buffer. It works just like it sounds – the data that come in first are the data that go out first. The useful thing about the FIFO buffer is that while the data can’t be read off the DVD at more than 9.6 Mb/sec, data can be read out of the FIFO buffer at much higher speeds, because the buffer is all in RAM (Random Access Memory) on the player. The DVD player is constantly reading ahead, filling up the FIFO buffer at 9.6 Mb/sec or less. When a peak in the data rate happens, all the data the MLP decoder needs are already in the buffer, and can be read out quickly so the music doesn’t cut out. The player then refills the buffer slowly, getting ready for the next peak. It’s not unlike the shock buffers on modern CD walkmans, that buffer up music so they can keep playing if the player is bumped and the CD skips.
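A toy simulation shows why the read-ahead helps. Everything here is an assumption made for illustration: time moves in one-second ticks, amounts are in units of 0.1 Mb (so 96 means 9.6 Mb), and the buffer size and demand figures are invented. Real players work on a much finer time scale.

```python
def play(demand, max_read=96, capacity=200):
    """Simulate a read-ahead buffer, one tick per second.

    Each tick the drive adds at most max_read to the buffer (never past
    capacity), then the decoder removes what that second of audio needs.
    Returns the buffer level after each tick; raises if the buffer runs dry.
    """
    level = 0
    levels = []
    for needed in demand:
        level += min(max_read, capacity - level)
        if needed > level:
            raise RuntimeError("buffer underrun: the music would cut out")
        level -= needed
        levels.append(level)
    return levels

# Two seconds at 12.0 Mb/sec in the middle of 6.0 Mb/sec material.
demand = [60, 60, 120, 120, 60, 60]
print(play(demand))  # [36, 72, 48, 24, 60, 96]
```

With a buffer no bigger than one second of drive output (`capacity=96`), the same demand raises an underrun at the first peak, because there is nowhere to keep the data read ahead during the quiet seconds.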

Obviously it’s theoretically possible for a complex and sustained sequence of audio to peak above 9.6 Mb/sec so often and for so long that the DVD player can’t fill the buffer fast enough. But in practice, that just doesn’t happen with real material. It’s possible to create special test signals that are impossible to compress with MLP and stay below the limits, but such tests would be highly unrealistic, wouldn’t resemble real music or even real test and demonstration signals, and would serve no earthly purpose. In addition, if a real piece of music overflowed the buffer, it would be noticed in the encoding stage, and the mastering engineers would have many different options for dealing with it, and could take the steps necessary to reduce the data rate for that one section while keeping the sound as pristine as possible. Note that here the recording engineer makes the choice of what to do with the music, not the compression algorithm, and again this is a low-probability case.

Entropy Coding

Entropy coding is a fancy way of saying standard, generic lossless compression, the kind WinZip and other compression algorithms use. This is the final compression, used to try to get a few extra percent when all the other algorithms have done their best.

MLP uses several different types of entropy coding. Let’s take a look at one kind: Huffman coding. This compression technique takes the most common patterns found in a type of file, in this case a music file (or rather a music file that has already gone through all the previous steps above) and replaces them with smaller, simpler codes.

Here’s a simple example: a significant percentage of English text consists of common words like "a," "an," "the," etc. Let’s say we decide that we’ll take the most common words and replace them with the code sequences /0, /1, /2, etc. We can also take out the space before and after the words, because they’re implied. This means we can compress, "The rain in Spain falls mainly on the plain" as "/0rain/1spain falls mainly/2/0plain" where

/0 = "the"
/1 = "in"
/2 = "on"

This compresses a 43-character sentence to 35 characters, for about a 19% savings. As long as the person reading the sentence knows the code, they can convert it back to the original sentence and read it. With more complicated systems, we could get it even smaller.

Huffman coding is just a more sophisticated version of the above, where a code book is created consisting of the most common sequences of bits found in the music stream. The compressor substitutes the codes for the original sequences, and the decompressor does the reverse, substituting the sequences from the code list for the codes it finds in the bitstream.
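For the curious, here is a compact textbook Huffman coder in Python. It is a generic sketch, not the code tables MLP itself uses, and the input values are made-up stand-ins for the small numbers left over after the earlier stages. The most frequent value ends up with the shortest code.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code in which frequent symbols get shorter bit strings."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far})
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in a.items()}
        merged.update({s: "1" + c for s, c in b.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

def encode(symbols, code):
    return "".join(code[s] for s in symbols)

def decode(bits, code):
    """Walk the bit string, emitting a symbol each time a full code is seen."""
    lookup = {c: s for s, c in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    return out

data = [0, 0, 0, 0, 0, 0, 1, 1, -1, 5]
code = huffman_code(data)
bits = encode(data, code)
print(code)
print(len(bits))                   # 16 bits for 10 values
print(decode(bits, code) == data)  # True
```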

MLP uses several different forms of entropy coding besides Huffman. For example, most music has a fairly standard distribution of values – the bell curve we’re familiar with from statistics and probability. In such cases, the encoder can use Rice coding, which maps the most common values (the center of the bell curve) to short code sequences, and the least common values (the "tails" of the bell curve) to long code sequences. There is no need for a "code book" as in Huffman coding, because the map of code sequences to original values is entirely mathematical, and the code book is implied (in other words, you don’t have to list a definition that 2 + 2 = 4 when showing a formula, because it is already defined by the system of mathematics).
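A small Python sketch of Rice coding shows how the "implied code book" works. The parameter k and the zigzag mapping of negative numbers are standard textbook choices that we are assuming here; the exact variant used inside MLP may differ.

```python
def zigzag(n):
    """Map signed values to unsigned so small magnitudes stay small: 0,-1,1,-2 -> 0,1,2,3."""
    return 2 * n if n >= 0 else -2 * n - 1

def unzigzag(u):
    return u // 2 if u % 2 == 0 else -(u + 1) // 2

def rice_encode(n, k):
    """Quotient in unary (ones ended by a zero), then k bits of remainder."""
    u = zigzag(n)
    quotient, remainder = u >> k, u & ((1 << k) - 1)
    tail = format(remainder, f"0{k}b") if k else ""
    return "1" * quotient + "0" + tail

def rice_decode(bits, k):
    """Decode a single Rice codeword."""
    quotient = bits.index("0")
    remainder = int(bits[quotient + 1:quotient + 1 + k], 2) if k else 0
    return unzigzag((quotient << k) | remainder)

for value in (0, 1, -1, 25):
    print(value, rice_encode(value, 2))
# 0 000
# 1 010
# -1 001
# 25 111111111111010
```

Values near zero (the center of the bell curve) get 3-bit codes, while the rare outlier costs 15 bits, and no table ever has to be stored.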

In some cases, where the original data are essentially random (which would happen if there was a lot of white noise, like an explosion or cymbal crash), the data are left uncompressed, because there are not enough common or repeated sample values to compress.

Other Benefits

Flexible Metadata

Metadata are extra data that aren’t audio. They could be track names, pictures, or anything else that is not the actual audio data. MLP has a flexible and extensible architecture for metadata, so new kinds of information can be embedded in the bitstream without causing problems for older decoders. The new decoders will read the new metadata, but the old decoders will just skip over it.
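One common way to get this kind of forward compatibility is to prefix every metadata record with a tag and a length, so a decoder can step over anything it doesn't recognize. The Python sketch below illustrates that general idea only; the record layout and tag numbers are hypothetical and are not the actual MLP metadata format.

```python
import struct

KNOWN_TAGS = {1: "track_name"}  # hypothetical: the only tag this "old" decoder knows

def record(tag, payload):
    """Build one record: 1-byte tag, 2-byte length, then the payload."""
    return struct.pack(">BH", tag, len(payload)) + payload

def read_metadata(blob):
    """Parse records in order, skipping any tag this decoder does not know."""
    found, pos = {}, 0
    while pos < len(blob):
        tag, length = struct.unpack_from(">BH", blob, pos)
        pos += 3
        payload = blob[pos:pos + length]
        pos += length
        if tag in KNOWN_TAGS:
            found[KNOWN_TAGS[tag]] = payload
    return found

# Tag 99 stands for some newer kind of metadata the old decoder has never seen.
blob = record(99, b"new-speaker-layout") + record(1, b"Both Sides Now")
print(read_metadata(blob))  # {'track_name': b'Both Sides Now'}
```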

For example, Chesky demonstrated at CES a recording that used the subwoofer and center channel as left and right side "height" channels, using speakers placed up high to give a sense of the spaciousness of the recording venue. Potentially, the metadata could tell the decoder to treat a certain channel as a height channel instead of its normal designation (such as the center channel). If your stereo system had a decoder that recognized that metadata, and you had the appropriate speakers, it would automatically route that channel to the right place.

However, any new metadata would need to be standardized in some way to be useful. It’s no good to put the data in the stream if there aren’t any decoders that can use it. Similarly, it’s no good to put the data in the stream if every different piece of mastering software uses a slightly different metadata ID and layout.

Still, it does make it easier to encode new types of data without having to change the format. This means if some new and exciting idea in audio becomes a reality (like Chesky’s height channels), it will be possible to make new DVD-A decoders that can handle the new stuff without having to create a new format, or make the new discs incompatible with the old discs.

More Control of the Compression Process for Producers and Engineers

A content producer, if so inclined, can selectively adjust the data rates for any channel (or all of them) as required by the content, while still maintaining lossless compression.

Here’s an example: A producer decides that he doesn’t want to record content above 24 kHz. Most content at this frequency (if not all) is noise, and 24 kHz is beyond most humans’ hearing capability. Given this, he can selectively low-pass filter the rear channels (or all channels), resulting in greater compression through the MLP process while still maintaining the higher sampling rate.

Or, if the producer decides that he doesn’t really need 24-bits for rear channels, the MLP encoder can dither that information down to 23-bits, or 22-bits, still maintaining the benefits of higher resolution (and much of the benefit of the 24-bit source resolution), trading off a slightly higher noise floor for space.
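To put rough numbers on that trade-off, here is a small sketch that computes uncompressed PCM bit rates only. It assumes six full channels at 96 kHz and uses the 9.6 Mbit/second limit quoted earlier in this article; the function name is ours, and the actual saving after MLP packing depends on the material.

```python
DVD_AUDIO_MAX_BITS_PER_SECOND = 9_600_000  # limit quoted earlier in this article


def raw_pcm_rate(sample_rate_hz, bits_per_channel):
    """Uncompressed PCM bit rate for a list of per-channel word lengths."""
    return sample_rate_hz * sum(bits_per_channel)


all_24_bit = raw_pcm_rate(96_000, [24, 24, 24, 24, 24, 24])
rears_22_bit = raw_pcm_rate(96_000, [24, 24, 24, 24, 22, 22])

print(all_24_bit)                                  # 13824000
print(all_24_bit - rears_22_bit)                   # 384000
print(all_24_bit > DVD_AUDIO_MAX_BITS_PER_SECOND)  # True
```

Dropping two bits from each of two channels trims 384 kbit/second from the raw stream before the lossless packer has done any work at all.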

In contrast, most lossy compression algorithms just let you dial up the bit rate you need. If the audio doesn’t sound so great at that bit rate, the producer has little choice other than to crank up the bit rate. With MLP, the producer can choose exactly what they want to do with the music to get the bit rate to a level they can accept.

Also, reducing the effective bit rate of the original music will always lower the bit rate of the compressed content with MLP. This isn’t so with lossy schemes. For example, the producer might trim off the high-frequency content with a low-pass filter, and recompress with a lossy algorithm, only to find that nothing is saved because the lossy algorithm was already throwing away the high frequencies.


The MLP data stream contains restart information 200 – 1000 times per second, depending on the data, so that any error in transmission can be quickly recovered from. In other words, you won’t listen to a long burst of noise (or silence) simply because the decoder lost its lock on the incoming signal.

Relatively Simple Decoding

MLP is deliberately asymmetric, which means that it’s easier to decode than encode. Most of the work is done in the compression stage: analyzing the music, choosing good prediction filters and coefficients, figuring out which entropy coding scheme will produce the most benefit, etc. On the decoding end, a much simpler chipset can be used, which makes players cheaper and easier to implement.


Because the decoded output is bit-for-bit identical to the encoder input, a recording engineer can master the content, encode it and package it up. On the far side, his customer can decode the data, apply a final round of mastering tweaks to that content, and encode it again for final production while not losing any information due to the extra encoding and decoding process. This is the hallmark of lossless compression and something that is very much NOT the case with lossy codecs such as MPEG, DTS, or AC-3.
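The round trip can be demonstrated with any lossless coder. The sketch below uses Python's zlib purely as a stand-in (it is not MLP, and it is not tuned for audio), and the "lossy" step is a deliberately crude illustration rather than a model of any real codec.

```python
import zlib

# Stand-in "audio": 16-bit little-endian samples of a repeating ramp.
original = b"".join((n % 1000).to_bytes(2, "little") for n in range(48_000))

# Two generations of encode / decode, like the mastering hand-off above.
first_pass = zlib.decompress(zlib.compress(original))
second_pass = zlib.decompress(zlib.compress(first_pass))
lossless_ok = second_pass == original

# Crude lossy step for contrast: zero the low byte of every sample.
lossy = bytes(b if i % 2 else 0 for i, b in enumerate(original))
lossy_ok = lossy == original

print(lossless_ok, lossy_ok)  # True False
```

However many generations the lossless path goes through, the comparison against the original still succeeds. Once the low bytes have been thrown away, nothing downstream can bring them back.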


MLP is not only an efficient compression scheme for multi-channel audio, but also a flexible and dynamic format with room to grow and change as the industry changes. It combines multiple layers of compression with a wide array of clever innovations. While the future is not at all clear for high-resolution audio formats like DVD-A, we at Secrets certainly are impressed with the technical innovation present in MLP, and we predict that it will be around for a good long time.

Bass (mis)Management and (no) Time Alignment:

We outlined this topic in the audio review of the Onkyo DV-S939 DVD-Audio player. Right now, bass management for DVD-Audio players is non-existent, and most consumers don’t have the luxury of five full range (or nearly so) loudspeakers. None of the DVD-A players we’ve worked with has provided bass management for DVD-Audio material. We’ve had several discussions on this topic, and the word is that as of this writing, DSPs have insufficient processing power to handle MLP unpacking and also handle the crossover arithmetic in the digital domain. In the future, DSPs will have sufficient power. Non-proprietary digital standards are in the works, and once they hit the marketplace, the Preamp/Processor (or Receiver) will handle bass management as it does for DD/DTS and CDs.

All of the DVD-A players we have worked with included Dolby Digital decoding, and some of them also included DTS decoding.

There are products coming to market to handle bass management for all five channels, such as the Outlaw ICBM-1. Be warned that some discs carry full range information in all channels. Take Steely Dan’s Two Against Nature: it is great material, and most tracks on this disc have electric bass across all five speakers.

So we’ve got to address bass management outside of the DVD-A player, because we don’t have it available to us on the analog outputs of the player. There are ways to work around this, but it takes some effort.

The solution makes a few assumptions about the capabilities of your subwoofer. First, it can accept both high and low-level inputs. Second, it has an LFE input. This input is direct to the amplifier without engaging the internal crossover. You really need two inputs, but one is acceptable; just install a Y adapter to work around this limitation.

With this scenario, the front left and right channels are sent via speaker level inputs, and the subwoofer’s internal crossover is used. From there, the subwoofer’s speaker level outputs are routed to the front left and right speakers, with all low frequencies removed.

Use of the LFE input is vital, as you don’t want to have multiple crossovers (three channels + subwoofers) working simultaneously on the same audio streams.

Figure 7 below is a line level diagram representing more than one product that will be entering the market shortly to assist with bass management for DVD-A (and multi-channel SACD) players that have no bass management for their high-resolution outputs.

Figure 7    Line Level Diagram

A well designed 5-channel crossover unit will have 3 discrete crossover settings, for Front L/R Channels, Center Channel, and Surrounds. A single global crossover point would be functional, but just barely, since most consumers don’t have 5 matched speakers.

Additionally, this will require two sets of interconnects (yes that’s 12 cables) for going into and out of the crossover.

That tangled snarl of cables behind your system is going to get even worse than it already is!

This method gives a summed output for the subwoofer, which also includes the discrete 0.1 track should it be on the recording. We have seen some DVD-A discs that are 5.0 (no subwoofer / LFE track).
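For readers who like to see the arithmetic, here is a minimal sketch of what such a crossover does in principle: split each main channel into lows and highs, send the highs on to the speaker, and sum the lows with the 0.1 track for the subwoofer. The first-order filter, the 80 Hz default, the 48 kHz sample rate, and all names are our assumptions for illustration. Real products use steeper filters and, in the units described here, work in the analog domain.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, sample_rate_hz):
    """First-order low-pass filter (a deliberately simple stand-in)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    out, state = [], 0.0
    for x in samples:
        state += alpha * (x - state)
        out.append(state)
    return out


def bass_manage(main_channels, lfe, cutoff_hz=80.0, sample_rate_hz=48_000):
    """Return (high-passed mains, summed subwoofer feed)."""
    sub = list(lfe)  # start from the discrete 0.1 track
    highs = []
    for channel in main_channels:
        low = one_pole_lowpass(channel, cutoff_hz, sample_rate_hz)
        highs.append([x - l for x, l in zip(channel, low)])  # what the speaker gets
        sub = [s + l for s, l in zip(sub, low)]              # what the subwoofer gets
    return highs, sub


# A steady (0 Hz) signal is the extreme case of "bass": it should end up
# entirely in the subwoofer feed and vanish from the speaker feeds.
mains = [[1.0] * 4800 for _ in range(5)]
lfe = [0.5] * 4800
highs, sub = bass_manage(mains, lfe)
```

For a 5.0 disc, the same sketch applies with an all-zero `lfe` list; the subwoofer then carries only the redirected bass from the five main channels.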

On a topic related to bass management, John Kotches was involved in a series of discussions about bass management with the Panasonic RP-91 for DVD-Audio discs at the AVS Forum. One thing that was learned in the process by more than one person is that some receivers (or processors for that matter) will do some unexpected bass management. This unexpected bass management is in the form of sampling the incoming analog signal into the digital domain, applying the defined crossover, and sending out the subwoofer signal. The original signal is not altered, and is passed along to the speakers.

Another function provided by our HT processors is Time Alignment. This compensates for differences in speaker distances so that the sound from each speaker arrives at the listening position at the right moment in time (the same time). For digital formats, your Preamp/Processor (or receiver) will provide time alignment of the drivers based on your input. So why do we even need time alignment of our speakers?

Consider John Kotches’ listening room, diagrammed in Figure 8 below. We have left out the popcorn machine, candy machine, and soda pop dispenser for the sake of simplicity. The speakers are anywhere from 5.5 feet (subwoofers) to 8.5 feet (front left + right) from his listening position. If sound is emitted from all speakers simultaneously, the sounds won’t arrive at his ears simultaneously. Each foot of difference in distance corresponds to roughly one millisecond in time. If you really want to delve into it, sound covers a little more than one foot per millisecond (1130 ft/second at sea level at one atmosphere of pressure, with a specified temperature and humidity). If we want all these sounds to be coincident at our ears, we have to have some mechanism for delaying the signals from the closer speakers as required. This is yet another function handled by DSPs in our receivers / processors. In his case, the sounds from the subwoofers would be delayed by about 3 milliseconds, the surrounds by about 2 milliseconds, and the center by about 1.5 milliseconds to make everything coincident.
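The delay arithmetic is easy to sketch. The 5.5 and 8.5 foot distances are the ones given above; the 7.0 foot (center) and 6.5 foot (surround) distances are our back-calculation from the quoted delays, so treat them as assumptions, as are the function and variable names.

```python
SPEED_OF_SOUND_FT_PER_S = 1130.0  # figure quoted above


def alignment_delays_ms(distances_ft, speed_ft_per_s=SPEED_OF_SOUND_FT_PER_S):
    """Delay each speaker so its sound arrives together with the farthest one's."""
    farthest = max(distances_ft.values())
    return {
        name: (farthest - distance) / speed_ft_per_s * 1000.0
        for name, distance in distances_ft.items()
    }


# Center and surround distances are assumed (inferred from the quoted delays).
room = {"mains": 8.5, "center": 7.0, "surrounds": 6.5, "subwoofers": 5.5}

exact = alignment_delays_ms(room)
rule_of_thumb = alignment_delays_ms(room, speed_ft_per_s=1000.0)  # 1 ms per foot

for name in room:
    print(f"{name:10s} rule of thumb {rule_of_thumb[name]:.1f} ms, "
          f"at 1130 ft/s {exact[name]:.2f} ms")
```

With the one-millisecond-per-foot rule, this reproduces the 1.5, 2, and 3 ms figures. Using 1130 ft/second, the subwoofer delay comes out a little under 2.7 ms, which is why the rule of thumb is usually close enough.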

Figure 8    John Kotches’ listening room

Time alignment isn’t just about making the sounds coincident! It’s also necessary to keep the time relationship between speakers accurate so that things like hall reverberation (on music recordings) and other sonic characteristics remain properly aligned in time.

Once again, due to insufficient horsepower in the current generation of DSPs, time alignment isn’t being performed on DVD-A material. So for multi-channel recordings, we aren’t hearing everything quite like it was intended.

DVD-Audio Usability

There’s a lot of commotion about the “requirement” of having a display device to set DVD-Audio playback in motion. We are writing to dispel that notion. Quite simply put, it isn’t true. While it might be necessary to use a display for speaker level calibration (this is dependent on the player – it is possible for the player to give visual cues on its own LED panel), that’s the extent of the monitor’s requirement. Starting playback of a disc can be accomplished by more than one method; none of them requires the use of a monitor.

If your player’s operating firmware is implemented correctly, Warner Brothers titles (and their affiliated labels) can be started by inserting the disc into the tray, and hitting the Play button. This will initiate immediate playback once the player has read the disc’s Table of Contents (TOC). There is one caveat to this method: you will get the default soundtrack datastream, which happens to be the multi-channel one. Warner’s first round of titles (including kd lang’s "invincible summer" and Emerson, Lake and Palmer’s "Brain Salad Surgery") had only the multi-channel track, with stereo reproduction via the fold-down instructions within the MLP data. Subsequent releases (and possibly newer pressings of the above two titles) include both the multi-channel and dedicated stereo mixes. This was confirmed on multiple discs by the mastering engineer Bob Ludwig, of Gateway Mastering. Thankfully, WB has implemented identical menu structures on all of these discs, so once you know the menu layout for one title, you know it for all of them. It so happens that the multi-channel mix is listed first in the menu, then the stereo mix. When the player loads the disc and pulls up the Top Menu, you will be greeted with a long pause. During this pause, all you need to do is hit the <down> arrow on your cursor keypad, followed by <select> and then <Play> twice. The 1st time you hit play, you are transferred to the stereo track listing. The 2nd time, you begin playback of the disc.

What we would really like to see is the automatic initiation of playback for all discs, so that even these gyrations are not needed to begin playback; simply insert the disc and hit the <close> or <play> buttons, and playback would initiate. Some record labels are already delivering their discs with automatic playback enabled. Hats off to Silverline and Audionet for authoring their titles this way! This is much more what the user is expecting. If you’re patient, almost all titles will eventually auto-play, but it can take up to 45 seconds for playback to begin without any input from you. There is one notable exception: DTS’ DVD-Audio titles are not starting automatically, so you must select between the DVD-A, DTS, or DD soundtrack before playback will begin.

Improving playback usability isn’t limited to this problem. It would be nice to be able to define a default mix (a.k.a. datastream) or even a hierarchy of them. When this default mix is available, the player would begin playback without intervention. Otherwise, the disc’s default mix could be engaged. This would be a welcome addition, especially for those who will (or can) only listen to stereo mixes for music.

In a similar vein, there are some discs where the downmix flag is set to “disallow”, which means your playback must match the channel configuration of the recording. If it does not, you’ll be missing information from channels that are not configured in your system. This works against the DVD-A format, since not all audiophiles want to listen to music in surround. Like it or not, it’s a shortcoming that should be addressed by content providers.

There is another interesting feature on some DVD-Audio players: the ability to set the Priority Content. What this does is change the personality of the player between DVD-Video and DVD-Audio. When set to DVD-Video, the player will behave exactly as a DVD-Video player, and when presented with a DVD-Audio disc, it will read the VIDEO_TS folder (just as a DVD-Video player would). When set to DVD-Audio, it will read the AUDIO_TS folder when presented with a DVD-Audio disc. I’m not sure why someone would want to force a DVD-Audio player into DVD-Video player behavior, but some players do provide this capability. (This would be useful for a test disc where you stored test patterns and test tones for video in VIDEO_TS and test tones for audio in AUDIO_TS.  – Stacey –)

Lest you think we dislike DVD-Audio, we say . . . not so! In fact, we think it is a tremendous improvement over Redbook CD, and the best recordings on DVD-A certainly could make you wonder just what all the fuss is about the competing high-resolution format. For recordings which are truly mastered in 24-bit / 96 kHz (which means the entire recording chain all the way to the disc stays at 24-bit / 96 kHz), the sound is nothing short of spectacular. Take a listen to titles from AIX, Silverline, and Chesky and you’ll understand what we mean.

DVD-A and the Speaker Babylon (or How a Flexible Standard Can Cost You So Much Money!)

This section is an expansion on MLP. MLP’s metadata is very rich, and it can (and does) carry information regarding the channel configuration.

DVD-Audio is a highly flexible format, with a variety of channel configurations available, from monophonic to 6 channels running full range at 24 bit sampling depth, and 96 kHz sampling.

The "standard" format for DVD-Audio is the 5.1 mix, which all of you are already familiar with. But anyway, here is a diagram of the recommended speaker configuration for multi-channel installations (Figure 9, below). This is an important frame of reference to keep in mind for the subsequent configurations discussed below. A line going straight out from the listener’s eyes to the front of the room defines the 0° reference for all angles.

Figure 9a    ITU 5.1 2D Layout 
Figure 9b ITU 5.1 3D Layout

In this layout, all channels are defined as full range, along with a dedicated Low Frequency Effects (a.k.a. the 0.1) channel. All speakers are equidistant from the primary listening position, with the main left and right speakers at the points of an equilateral triangle. The center speaker is on an arc, so that it is dead center on the listener (duh!). Surrounds are located at 110°. This is slightly behind the listening position, as opposed to directly beside the listening position (90°).

What ends up in the 0.1 channel is solely at the discretion of the mixing engineer, and this can be anything from kick drums to bass guitar to pipe organs.

In an ideal world, all the speakers are equidistant, and in the real world, we need delay settings. The rule of thumb is one millisecond (ms) for every 1 foot of difference in displacement. Without this time alignment, the soundstage can be compromised, which means DVD-Audio can’t reach its maximum sonic potential.

The room diagrammed in Figure 8 is an example of a real room, with distances and speaker layouts displayed. In this room, the speakers are not equidistant. If the DVD-Audio player in use required millisecond inputs, we’d have to “normalize” to the longest distance. In this case, that would be the 8’6” distance to the mains. To time align the center channel, a delay of 1.5 ms is added to the Center Channel setting. Continuing on, the surrounds require a 2 ms delay, and the LFE channel requires a 3 ms delay. Once these values are configured in your DVD-A player, voilà, you’re finished. Note: Many DVD-A players don’t support time alignment for DVD-A outputs, only Dolby Digital and DTS decoded outputs. We expect this to be supported in a future generation of DVD-A players.

If the processor is limited to 1 ms increments (the most common case), a value such as the 1.5 ms center delay cannot be entered exactly; try both 1 and 2 ms and see which sounds best in your room. Our experience is that the higher value will work better.
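As a small illustration of that choice, the sketch below lists the settings a processor with a fixed step size can offer on either side of an ideal delay. The function name and the 1 ms default step are our assumptions; check your own processor's increments.

```python
import math


def candidate_settings(delay_ms, step_ms=1.0):
    """Nearest available settings at or below and at or above the ideal delay."""
    lower = math.floor(delay_ms / step_ms) * step_ms
    upper = math.ceil(delay_ms / step_ms) * step_ms
    return (lower,) if lower == upper else (lower, upper)


print(candidate_settings(1.5))  # (1.0, 2.0)
print(candidate_settings(2.0))  # (2.0,)
```

When the ideal value already falls on a step, there is only one candidate. Otherwise you get the two neighbors to audition, as suggested above.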

A Common Variation

Some mixes (classical and an isolated track or two of popular music) are in 5.0, which skips the 0.1 channel. Most orchestral music doesn’t have tremendous amounts of low frequency information, with the exception of bass drum hits. For example, the String Bass has a lowest frequency of approximately 41 Hz, while the lowest note on the standard piano keyboard (referred to as A0) is 27.5 Hz.
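Those two frequencies fall straight out of the equal temperament formula. The sketch below assumes A4 = 440 Hz and the common MIDI note numbering (A0 = 21, E1 = 28), and it takes the 41 Hz figure to be the open low E string of a four-string bass; the function name is ours.

```python
def equal_tempered_hz(midi_note, a4_hz=440.0):
    """Frequency of a MIDI note number in 12-tone equal temperament."""
    return a4_hz * 2.0 ** ((midi_note - 69) / 12.0)


a0 = equal_tempered_hz(21)  # lowest key on a standard piano
e1 = equal_tempered_hz(28)  # open low E string on a four-string bass

print(f"A0 = {a0:.1f} Hz, E1 = {e1:.1f} Hz")  # A0 = 27.5 Hz, E1 = 41.2 Hz
```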

Outside the realm of classical DVD-Audio discs, most mainstream label material is presented in 5.1 format. Rare exceptions are two tracks from Emerson, Lake and Palmer’s "Brain Salad Surgery": "Benny the Bouncer" and "Still You Turn Me On". Oddly enough, the rest of the disc is 5.1.

Raising the Roof of Your Listening Room

At CEDIA 2001, we were treated to a demonstration of an alternative 6-channel configuration for DVD-Audio: the inclusion of height channels. Rather than go into a long speech about the demonstration, I’ll (JK) just say that for the first time I experienced the full acoustic space of a venue. It was so good, I showed up early two days in a row to hear the demonstration.

An underplayed feature of MLP is the ability to carry the channel configurations within the metadata (everything that isn’t the audio). Should a non-standard speaker configuration be presented, the metadata can indicate so. DVD-A players or processors (when the signal can be transmitted digitally) with sufficient outputs could theoretically route outputs based on the channel configuration. This ties in to the inclusion of height channels nicely, and there are a few configurations out there in the market.

We’ll start off with the easiest of them, the Telarc height channel. In the Telarc configuration, you begin with the standard 5.0 speaker layout of Front L/R/C and Rear L/R. The subwoofer channel is reallocated to the height channel (Figure 10a). The height channel has two potential solutions. A single channel can be located directly above the center channel for a 6.0 solution.

Figure 10a   Telarc Height 2D Layout
Figure 10b  Telarc Height 3D Layout

Alternately, two channels can be utilized to present height information (Figure 10b). In this configuration, the two channels are placed at the sides of the listening position, well above the plane of the other speakers. In both cases, the information presented in the height channel is identical; the two speaker solution affords a more diffuse presentation of the height information. In Telarc’s multi-channel mixing room, dual height channels are employed using dipolar radiating planar magnetic loudspeakers (Figure 11, below). The speakers are wired in series, so a single channel of amplification can drive them.

Figure 11a Alternate Telarc 2D Layout
Figure 11b Alternate Telarc 3D Layout

Of course, just to keep things interesting, there has to be more! And in fact there are two more configurations to cover. First is the 2+2+2 format, being brought out by MDG records in Germany. In this setup, the center and subwoofer channels are redirected for use as left and right height channels respectively. The height channels are placed literally above the main left and right speakers. Below (Figure 12), you will find an image of the MDG 2+2+2 format.

Figure 12a    Height 2+2+2 2D Layout
Figure 12b Height 2+2+2 3D Layout

Our journey through speaker Babylon is nearly complete. We come now to Chesky’s 6.0 format, which, like the formats mentioned earlier, uses front / side height channels in an attempt to build a more spacious sensation. In a conversation with David Chesky, he stated that his height channel recordings will be exclusive to the DVD-A format.

In the Chesky configuration (Figure 13, below), height channels are placed above and outside the main speakers at approximately 55 degrees. Having heard this example, we found it most impressive in truly conveying the sensation of space within a recording venue.

Figure 13a Height 6.0 2D Layout
Figure 13b Height 6.0 3D Layout

All of the alternative surround layouts are intended for one thing: to improve the illusion that you are in a real space (the recording venue).

Making it All Work in Your Home Theater

So . . . how do you hook this all up? There are several options, and the logistical issues of relocating speakers as required are something we will get into at another time and date. It’s either that or purchase several more speakers! We have drawn three diagrams (Figures 14, 15, and 16, below) which will show you how to cover most of the system possibilities out there, even ours.

In all cases, the speaker level outputs for the CC (5th) and the 6th channel will go to the switchbox common inputs, left and right respectively.

The simplest configuration assumes you are using the same speakers for the Chesky and MDG surround setups. It requires a two-way switchbox, with output A handling standard 5.1 setups and the Telarc height channel. During 5.1 listening, no signal will be present on the 6th channel. For Telarc height, the subwoofer line level cable would need to be connected to the 6th channel amplifier input. A switchbox could be employed for this as well.

Figure 14    Two-way Switchbox

If you want to start getting really serious about this, you could dedicate separate speakers (or at least speaker wires) to the different configurations. In this case you need a three-way switchbox. Output A would be for 5.1 and Telarc listening, with the Telarc height channel inactive for 5.1 listening. Output B goes to the Chesky height channel speakers, and Output C goes to the MDG height speakers.

Figure 15    Three-way Switchbox

Of course as a reviewer I can’t do things that straightforwardly, so I’m (JK) legally required to make things vastly more complicated in my system (only kidding). In my system, I have an unpowered subwoofer, and my 6th amplification channel drives the subwoofer for 5.1 listening. To fully do all the height options, I’d have to go to a four-way switchbox. The outputs in my system would be as follows: Output A for 5.1 listening with the left channel connected to the center, and the right channel connected to the subwoofer. For Telarc listening, I’d use output B, with the left channel going to the center and the right channel going to the height channel(s). For Chesky listening, I’d use output C, with the left channel going to left height, and the right channel going to right height. For MDG listening, I’d use output D, with left channel going to left height and right channel going to right height.

Figure 16    Four-way Switchbox
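The four-way arrangement described above amounts to a lookup table, sketched here. The dictionary layout, the speaker labels, and the function name are ours; the switchbox "left" input carries the center (5th) channel and the "right" input carries the 6th channel, as stated earlier in this section.

```python
# Switchbox inputs: "left" = center (5th) channel output,
# "right" = 6th channel output. Labels are illustrative only.
FOUR_WAY_SWITCHBOX = {
    "A": {"mode": "5.1", "left": "center", "right": "subwoofer"},
    "B": {"mode": "Telarc", "left": "center", "right": "Telarc height"},
    "C": {"mode": "Chesky", "left": "Chesky left height", "right": "Chesky right height"},
    "D": {"mode": "MDG", "left": "MDG left height", "right": "MDG right height"},
}


def speakers_for(mode):
    """Return (switch position, left destination, right destination) for a mode."""
    for position, entry in FOUR_WAY_SWITCHBOX.items():
        if entry["mode"] == mode:
            return position, entry["left"], entry["right"]
    raise KeyError(f"no switch position for mode {mode!r}")


print(speakers_for("Telarc"))  # ('B', 'center', 'Telarc height')
```

Outputs C and D look alike on paper, but they feed different speaker pairs: the Chesky heights sit above and outside the mains, while the MDG heights sit directly above them.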

And now our journey to Speaker Babylon has come to an end. Please exit from the front doors and watch your step.

Information Gathered and Tests Executed During Benchmark-2 (B2) for Analysis of Audio Performance

There were several tests that every player passed at our B2. None of the players included in this event were SACD capable, and all were able to correctly read the CD layer of a hybrid SACD. All DVD-Video players correctly accessed the DVD-Video content on a DVD-Audio disc. All players passed both tracks with simulated fingerprints (tracks 18 and 19). This test was done via listening, and the key point is not that there are no errors in the process, but rather that there are no audible errors. None of the players buffered (cached the key presses from a user) chapter to chapter jumps on DVD-Audio discs. With some DVD-Video players, you can rapidly hit the Next Chapter button and rapidly switch through.

For players that included Dolby Digital decoders, we recorded information on bass management for them: Size adjustments, delay, and subwoofer settings were recorded, and the AVIA bass sweep was used to determine the crossover frequency for engaged bass management. Of the DVD-A players tested, none had bass management capability for DVD-A material. No player tested engaged crossovers for Redbook CD playback. In addition, we noted whether the player included a night compression mode for Dolby Digital.

We noted which players indicated ability to play HDCD discs, and where visual indicators were available, confirmed with an HDCD disc that the content was being detected, and the appropriate filter engaged.

Observational Listening

The second Benchmark Event (B2) was held at Stacey’s house, and we had the whole house rocking with various and sundry tests. Because Stacey’s main HT room was tied up with video tests, we had to assemble a system for use in critical listening. John Kotches took over one of the bedrooms to do the listening, and some of the audio tests.

Hot on the heels of JJ’s review of the Diva 4.1s, AV123.com shipped us a matched quintet of Model 5.1s for use in listening during the review. The finish on these speakers looked marvelous, and after some break-in, they sounded pretty good too. They won’t convert John Kotches over from being a planar nut, but they are excellent, musical speakers. On the other hand, they did point out the importance of having a matched quintet of speakers and how costly this may be for most. With a matched set of five, pans are never a question of differences between speakers; it’s all about room placement (the biggest factor anyway) and components. A hearty thanks to AV123.com (and Mark Schifter) for the loan of the speakers.

The subwoofer was a Von Schweikert Tower of Power, which was both John Kotches’ and Stacey Spears’ reference subwoofer until the SV Subwoofers displaced it, so we are familiar with its performance from the years it served as our reference.

The brains and brawn of the system was the Integra DTR-9.1 A/V Receiver. This product has its own review, so we are not going to repeat information here other than to say it did yeoman’s work on a variety of different configurations, working flawlessly for us.

Cabling for all components was by Nordost: Red Dawn Analog interconnects and speaker cabling and their Moonglo digital cable. These models of cables have been used by many of the Secrets staff so we are familiar with their characteristics.

Power conditioning was provided by the PS Audio Power Plant 600 – We didn’t do any work with multi-wave options due to time constraints, and all components were fed with 60 Hz Sine Wave.

We tried to damp the first reflection points for the speakers using ASC Planar traps. These normally reside in Stacey’s main system, which was handling video-only chores, so they weren’t needed there.

Speaker calibration was done with an Audio Control SA-3051, Spectrum Analyzer (1/3 octave Real Time Analyzer) and SPL meter using a calibrated microphone.

Selections John Kotches Used for Listening Observations

One of the ways we tried to expand the scope of B2 was to include listening observations on players used for the Benchmark as appropriate and as time permitted. Since I did the listening observations, I wanted to write a few words on the various recordings I used, and most of these are from my personal Reference Discs. This is a somewhat liquid list of discs, and includes DVD-A, Redbook CD, HDCD, DTS, and DD encoded formats. The goal was to provide a cross section of materials within the confines of my taste and provide some brief subjective impressions of some of the players under test for the benchmark. Looking back over my notes, I don’t remember writing as much as I did, but this is beneficial to our readership!

Redbook CD Recordings:

Bob Mintzer Big Band, Camouflage, Track 3 Long Ago and Far Away

This is a tenor feature ballad and I listen for a few things along the way. First and foremost is the tone of the leader’s horn, both with the rhythm section and as he rides over the top of the band at the song’s closing. Additionally, as the tenor solo builds, there’s a section where the trombone and trumpet sections build backing chords with bell tones. This starts with the trombone section hitting one note, with one player holding, 3 trombonists hitting the next note, then 2 trombonists and finally 1. As this section progresses the trumpets do a similar treatment.

The drumkit on this is from left of center to right of center, without exaggeration of the width of the soundstage.

Joni Mitchell, Mingus, Track 11 Goodbye Porkpie Hat

This song is Mingus’ tribute to great Tenor Saxophonist Lester Young, and Joni does her rendition with an all star backing group including Jaco Pastorius on Bass, Wayne Shorter on Saxophones and Peter Erskine on Drums. Mostly I listen for the brushes on the drums and Joni’s vocal qualities. I can’t say as I’m a fan of Jaco’s playing on this disc, as it’s too overbearing, but that’s a personal preference.

I also use this track as a test of HDCD decoding, as it is an HDCD encoded disc.

Keith Jarrett, The Melody at Night With You, Track 8 Shenandoah

This is a traditional American folk song, and this is a lovely rendition. I’m listening for the naturalness of the piano sound – whether I can hear Keith vocalizing or not, and finally, the decay of the last chord. I want to hear every nanosecond of that gorgeous dissonance that he has voiced. It brings a smile to my face when I hear this fade away acoustically into nothingness. It beats the heck out of an artificial decay!

DVD with Linear PCM Soundtrack

The Eagles, Hell Freezes Over, Track 16 The Last Resort

A hidden, often unused reference is on the concert DVD – a full PCM stereo soundtrack, which is of exceptional quality. We often skip over this well done gem to get at the DTS encoded surround track. I use this when I’m testing DACs out pretty regularly. I’m listening for a few things with this track – first and foremost it’s the quality of the Yamaha MidiGrand and how wrong it sounds. The better the playback gear the worse this thing sounds. I listen to the quality of the strings, and Don Henley’s vocals to hear how much sibilance is present, and how the vocal quality changes at the top of his register before breaking into falsetto at the close of the track.

DTS Decoding

Some of my audiophile friends consider me to be a bit of a heathen, because I like immersive style surround mixes where I’m placed in the middle of the band. This comes from my youth when I spent many an hour in the middle of the band!

DTS has done a fabulous job in taking some familiar material and encoding it onto Compact Discs. Also, some of the best sounding Concert DVDs to my ears have DTS encoded tracks. Here’s what I listen to:

Lyle Lovett, Joshua Judges Ruth, Track 8, Baltimore

This may be the best DTS encoded CD out there, and I hope DTS can get the rights to release this as a DVD-Audio disc. This one is a fairly simply orchestrated track, with voice, guitar, piano, and cello if memory serves me correctly. Sometimes the simplest is the toughest to reproduce. I’m listening for the nuances here – the noises of Lyle’s mouth as he completes some phrases. The edge (yes edge) that the recording takes on (almost certainly intentional) in certain places is an excellent contrast to the gentleness exhibited by other passages.

Alan Parsons, On Air, Track 9, So Far Away

This disc was intended for 5.1 treatment from the start, and as such makes aggressive use of the surrounds. Christopher Cross is the featured vocalist, and his particularly smooth voice, as well as the interplay in the harmonies, is a treat. On a standout recording it’s hard to pick one track, but the combination of lead and harmony vocals and the variety of instruments is both enjoyable and, to me, revealing of the components in the system.

The Police, The Singles, Track 7 Every Little Thing She Does is Magic

So many little details in layer upon layer on this track. Vocals, pianos (is that a vocoder on the bridge?). I’ve always known this was a fairly complex, textured selection, but the 5.1 mix emphasizes this aspect. This track also has Stewart Copeland’s drum kit all over the place, so it does a fine job of showing a DTS decoder’s ability to deliver transients.

Fun stuff, and the out chorus is a "high volume required" item in my system at home!

Dolby Digital Decoding

I only use one test track for DD decoding, as the majority of my reference material is DTS or DVD-A encoded.

James Taylor, Live at the Beacon Theater, Track 6 Only a Dream in Rio

This is the only track I use for testing the Dolby Digital prowess of a player. On this track I listen for the sound of JT’s voice and guitar, which should be solidly placed in the center channel. Can you hear the details of his plucking clearly? On the best players you can tell when his thumb and his fingers pluck, as they have mildly different timbres. Background vocalists should spread from left to nearly dead center. How is the timbre of the guitar in the right channel? How clear (and powerful) is the bass line?


As alluded to earlier, I’m a bit unusual in the audiophile community with respect to my enjoyment of surround music. I wonder how many audiophiles realize that for the vast majority of recordings the soundstage and imaging that they so love to talk about is all in the hands of the mixing engineer? There are certainly exceptions, but for the most part this is controlled by a pot on the mixing console.

Between this and my background as a musician, I’m not averse to sounds coming at me from all directions. Last time I checked, this happens in the real world too!

I’ve been acquiring DVD-A discs over the past several months and am currently using some of these in my collection as reference discs:

Joni Mitchell, Both Sides Now, Track 5 Answer Me, My Love

Forget about anything else – the brass choir that opens this selection is absolute ear candy! Joni is expressive in this cut, and she manages to both "belt out" as needed, and treat other passages with a wonderful delicacy. Wayne Shorter sticks the bell of his soprano saxophone into this one, and at one point has some spittle in his mouthpiece – you can tell what’s going on there! Strings sound like they’re real, as does the brass section, which is nicely featured in this track.

kd lang, Invincible Summer, Track 2 Summer Fling

This one is a lilting, medium tempo number, and it’s kd multi-tracked for the background vocals. If it’s not obvious who’s singing background vocals, you have a problem somewhere. In addition, the "disco strings" (that’s how I describe the texture / sound) are real, not synthesized, and if it’s not immediately apparent, the player is lacking in resolution. Synthesizer counterpoint used in several sections should be fixed in space, and solid. Any wavering indicates an issue with imaging on the player.

Blue Man Group, Audio, Track 2 PVC IV

This disc has tremendous energy. It’s tribal / world music / just plain fun! Things can be a little repetitive, and they often take a sequence and build layers upon a simple foundation. It’s a lot of percussive instruments, with PVC used to make many of the sounds. The instruments are home made and take on interesting and unique tonalities.

I’m not sure where the ideas for the various concoctions of PVC piping, and the mallets used to strike them, originated, but all this disc does is make me want to attend the show. Hey JJ, let’s get tickets for 2002 CES for one of their shows at the Luxor now! I know you wanted this disc, being the bass fiend you are, but I won’t part with it! Get your own copy, editor dude!

So what do I listen for on this track? The roundness of the PVC marimbas as they’re struck. Because the paddles are flat, you get a kind of an "oh" sound from the tubes. The crackle of the drum well across the soundstage, from rear right to front left. The quality of the bass notes out of the PVC pipes. The energy delivered as the piece reaches its climax, then fades down. Is the forward momentum sustained?

Please consider all of my listening sessions as the preliminary impressions they are – time did not permit listening to any of the players for anywhere near as long as we would have liked.


This article joins the previous five to round out our coverage of DVD technology. It is our first attempt, and like the other articles, it will grow and improve over time. If you find new issues or have suggestions regarding DVD-A, please e-mail the staff at Secrets, and we might use them in updates to the article.


1. Alastair Roxburgh, EAD Corporation, Chief Technical Officer. Alastair gave permission to use diagrams from the Theatermaster-8 manual, which we modified to help build our own versions.

2. David Chesky, Chesky Records. David and John Kotches had some e-mail discussions about Chesky’s 6.0 channel setup. We wanted to confirm that our understanding was correct prior to committing this article to print.

3. Michael Bishop, Telarc Records, Chief Mixing Engineer. Michael confirmed that we indeed understood the correct configuration and reasoning behind dual height channels in Telarc’s surround sound mixing room.

4. Manfred Görgen, ‘2+2+2 Information for installation the HiFi-Equipment’, MDG.

5. Manfred Görgen, ‘2+2+2 – Recording "Breakthrough into a new dimension"’, MDG.

6. JR Stuart, PG Craven, MA Gerzon, MJ Law, RJ Wilson, ‘MLP Lossless Compression’, AES 9th Regional Convention Tokyo 1999. Available for download at http://www.meridian-audio.com/w_paper/mlp_jap_aes9_1.PDF

7. RJ Wilson, ‘DVD-Audio, with a focus on MLP’, AES Singapore July 2000.

8. Meridian Audio, "Meridian MLP Encoder User Guide".

If you want to license our Benchmark explanatory articles for use in your company, please send an e-mail to our staff.