Now that you have a basic understanding of where we are going with HDR video, we need to point out that the term “HDR” is unfortunately and confusingly used for something which has little in common with HDR photography. Cripes, even the iPhone and iPad camera apps have an “HDR” mode. There is a good chance a number of you reading this right now are thinking “I know all about HDR. I’ve been working with it for years”. Sadly, that’s not categorically true. Please let me explain.
What most people call “HDR” in the photography world is not true high dynamic range at all. It should have been called exposure combining or something, but of course that just doesn’t sound as cool.
As amazing as today’s DSLR and Mirrorless cameras are, they are still quite limited, relatively speaking, in terms of how wide an exposure they can capture. Typically, if a shot has considerable dark-to-light range in it, you the photographer have to choose between a high F-stop which preserves the highlights but in consequence crushes the shadows, or a low F-stop which does the exact opposite.
“HDR” photography is a colloquial phrase for combining content from at least two photos of the same subject, one taken with a low exposure, the other with a high exposure, so that the final photo has content from both. While you could say that the multiple exposures give you high source-image dynamic range, the act of squashing them together (tone mapping and so forth, which is its own sub-art) produces a finished product with no more dynamic range than a single source photo! Why? Because the common file containers, and moreover your computer monitor and the photo paper of your prints, cannot display any more dynamic range than they already do. I’m sorry if I just hurt your feelings cher photogs. Blame lousy marketing, not me.
HDR video is a completely different paradigm. The container defines more code values, the luminance curve has been completely redesigned from the gamma of old, color has been expanded well beyond the sRGB of mainstream photography, and most importantly the display devices are capable of realizing all of this. When the dust has settled on HDR video, I’m sure you’ll be able to have an HDR monitor on your PC and actually do HDR photography but for now, please realize you don’t.
Before anyone calls me a heretic, I will note that it is possible to generate HDR images with a very large dynamic range by capturing exposure series and then merging them into a high dynamic range image format such as OpenEXR (.exr), LogLuv, JPEG-HDR, etc. However, without the correct image processor, those images can’t be displayed and would need to be mapped to the display by hand (or by a tone mapper which must be carefully controlled). This is why most of these are used in content creation and not for playout.
But what can capture all that juicy dynamic range in the first place?
Film can… and has been for, oh, 100 years or so. That’s right folks, as long as there is a decent 35mm print of Dr. No or Thunderball hiding somewhere, I stand a chance of seeing them in HDR someday soon. Credit card in hand, ready to purchase.
Most of today’s digital cinema cameras can capture HDR, certainly more than we’ve been able to reproduce (until now). We consider any digital cinema camera which can capture 12+ stops of dynamic range to be capable of HDR.
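For those who think in contrast ratios rather than stops, the conversion is just a base-2 logarithm, since each stop is a doubling of light. A minimal Python sketch (the function names are mine, purely for illustration):

```python
import math

def stops(brightest, darkest):
    """Dynamic range in stops between two luminance (or exposure) values."""
    if brightest <= 0 or darkest <= 0:
        raise ValueError("values must be positive")
    return math.log2(brightest / darkest)

def contrast_ratio(n_stops):
    """Contrast ratio covered by a given number of stops."""
    return 2.0 ** n_stops

print(contrast_ratio(12))   # 4096.0, i.e. 12 stops is a 4096:1 range
print(stops(4096, 1))       # 12.0
```

So the 12-stop threshold mentioned above corresponds to a scene contrast of roughly 4,000:1 between the brightest and darkest detail the camera can hold at once.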
For extreme dynamic range capture there are even multi-camera mirror rigs which allow two cameras to capture the exact same scene, one of which uses an ND (neutral density) filter to stop it down. When the two are later combined the resultant dynamic range is crazy-high. Wait, isn’t this exactly what I described as not being HDR in the photography world? Glad you are paying attention. Recall the massive difference between the prevalent HDR photography paradigm and HDR Video: We actually have containers, standards, and most importantly display devices which can realize HDR video. In other words, when in the video world we combine exposures, we don’t have to crush the end product down to SDR the way you do with today’s common photography workflows.
So we have HDR display technology and material with enough inherent dynamic range to be graded as such. Now what?
HDR today starts on the content creation side of things, where there are currently two professional HDR monitors to choose from for grading HDR content: Sony’s Broadcast Video Monitor X300 (aka the BVM-X300), a true 4K (4096×2160) OLED capable of 1,000 nits, and Dolby’s Pulsar, a 1080p LCD which boasts over 2,000 dimming zones and 4,000 nits. These are “just” HDR displays though in that they are “format agnostic”.
The key point to make is that at this stage the director, cinematographer, and colorist are not concerned with a consumer HDR delivery format at all. They are simply grading their work against the capabilities of the reference display.
It’s interesting to note that the Sony actually has a “power cap” warning integral to its function: a clipping/warning light alerts you when too many pixels are at the 1,000 nits peak, roughly 10% of the raster. If pushed further, the overall light output drops because it just doesn’t have the power. While some of you may be thinking “that sucks”, remember what we said about HDR: used properly the APL shouldn’t be crazy high anyway. If someone is farting with the knobs and pushing things that high they need to back off, hardware capability notwithstanding. After all we don’t want HDR blamed for, you know, blinding people or anything.
Don’t no one tell JJ Abrams about HDR….he’ll blind us with his flare! Image Copyright Paramount Pictures.
The behavior of the Sony is a critical point so I’m going to repeat it: Peak output is achievable only over a small area of the screen. Remember that! It’s going to be most relevant later in this article.
Dolby’s Pulsar does better, but this is a display which is liquid cooled and can draw some 80 amps from the wall. Seriously, you could roast a chicken (or two) with the power this thing draws.
So how do we get all this HDR goodness into our homes? Well, first and foremost there is…
Dolby has been working on this stuff longer than anyone. They’ve been developing this technology since long before anyone even heard the term HDR, around 2003. They practically invented the HDR LED local dimming display for crying out loud! The result: the Dolby Vision system. Dolby Vision is a complete and comprehensive system encompassing tools and standards for monitoring and grading of content, through delivery and transmission, and finally to the end user’s display in their home.
On the content creation side, on the surface at least, it’s pretty straightforward: The capabilities of the monitor are known, the artist is free to work their magic, and a ton of metadata is tagged to the content. The real genius comes at the other end.
Here I am with my Dolby Vision TV with, let’s say, a 500 nit capability. I play content which was mastered to 4,000 nits. What happens to the picture data between 500 and 4,000 nits? There is a Dolby Vision processor chip in the TV which, using all that juicy metadata generated during encoding, “remaps” the luminance portion which is above the display’s peak. This is not a canned “4,000-to-500 nits” tone map: it is dynamic and content aware, employed on a per-frame basis.
So for example if one scene has data up to 4,000 nits and the next peaks at 1,500, the two will be tone mapped differently. As cliché as this sounds, it’s going to look as good as it possibly can while being faithful to the source content within the display’s capabilities. Dolby Vision displays are calibrated by referencing something somewhat humorously called the model’s “Golden Reference”. We’ll talk more about calibrating Dolby Vision in just a few minutes.
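To make the idea concrete, here is a toy Python sketch of why knowing each scene’s peak matters. To be clear, this is NOT Dolby’s algorithm (that is proprietary and far more sophisticated); the simple “knee” curve, the `knee_fraction` parameter, and the function name are all my own assumptions for illustration.

```python
def tone_map(nits, content_peak, display_peak, knee_fraction=0.75):
    """Toy knee tone map (an illustrative assumption, not Dolby Vision).

    Below the knee the picture passes through untouched; everything from
    the knee up to content_peak is squeezed linearly into what is left
    of the display's range.
    """
    if content_peak <= display_peak:
        # The display can show this content as mastered.
        return min(nits, display_peak)
    knee = knee_fraction * display_peak
    if nits <= knee:
        return nits
    scale = (display_peak - knee) / (content_peak - knee)
    return knee + (min(nits, content_peak) - knee) * scale

DISPLAY_PEAK = 500  # the 500 nit TV from the example above

# A 1,000 nit highlight in a scene that peaks at 1,500 nits:
fixed = tone_map(1000, content_peak=4000, display_peak=DISPLAY_PEAK)
per_scene = tone_map(1000, content_peak=1500, display_peak=DISPLAY_PEAK)

print(f"mapped as if the scene peaked at 4,000 nits: {fixed:.0f} nits")
print(f"mapped using the scene's real 1,500 nit peak: {per_scene:.0f} nits")
```

With the scene’s real peak in hand, the same highlight lands noticeably higher on the display, because no headroom is being reserved for 4,000 nit content that isn’t actually in that scene.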
Dolby Vision has been crafted and shaped to be very forward looking. Dolby’s research shows that in terms of how bright we should go, 10,000 nits is both doable and desirable, so that is baked right into Dolby Vision. For this reason, they’ve done a lot of work in terms of finding a luminance response curve which can accommodate such an incredible range.
If we simply tried to adapt the existing gamma/log of SDR to HDR, a ton of data gets wasted on picture content which doesn’t need it, or doesn’t need it as much as others. So Dolby came up with the Perceptual Quantizer EOTF, subsequently adopted and formally ratified as ST2084 by SMPTE for all manner of HDR! I wonder… did anyone send Dolby a “Thank You” card?
In the common tongue, as Gandalf would say, Perceptual Quantizer (PQ for short) is all about distributing the data based on our visual perception. In practice it looks a little something like this:
Looks a little funny at first in that about half the word data is devoted to the range of 0-100 nits: You can barely see it when graphed this way (64-509 in 10 bit values). While that sounds like a lot to bank into what, at a glance, looks like a small portion of HDR’s spectrum, remember we said HDR is about highlights, not turning your screen into a substitute for fluorescent overhead lighting (which, incidentally, is roughly how bright 10,000 nits is). Conversely though, is devoting word length to a future 10,000 nit upper end wise? Considering that, with PQ, going from 5,000 nits to 10,000 nits only “costs” 7% of the word length, I’d say that’s a shrewd investment indeed.
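If you’d like to check those numbers yourself, the PQ curve is easy to compute. The constants below are the ones published in ST2084; the function names are mine, and the mapping to 10-bit “video range” codes (64 to 940) is my assumption, chosen because it matches the 64-509 figure quoted above.

```python
# ST 2084 (PQ) constants
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32
PEAK_NITS = 10000.0

def pq_signal(nits):
    """Luminance in nits -> normalized PQ signal (0.0 to 1.0)."""
    y = (nits / PEAK_NITS) ** M1
    return ((C1 + C2 * y) / (1 + C3 * y)) ** M2

def pq_code_10bit(nits):
    """Luminance in nits -> 10-bit code, assuming video range (64 to 940)."""
    return round(64 + 876 * pq_signal(nits))

print(pq_code_10bit(0))       # 64
print(pq_code_10bit(100))     # 509
print(pq_code_10bit(10000))   # 940

# Share of the signal range spent on 0-100 nits (about half)
print(f"{pq_signal(100):.2f}")
# Share of the signal range spent on 5,000-10,000 nits (about 0.07)
print(f"{1 - pq_signal(5000):.2f}")
```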
ST2084 is more efficient than gamma, which has too few code values near black and too many near white. This is why covering the same range with gamma would need more bit depth than it does with PQ.
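To put rough numbers on that, the sketch below compares how much of the signal range each curve spends below 1 nit and above 5,000 nits. Big assumption here: the “gamma” is a plain 2.4 power law hypothetically stretched to a 10,000 nit peak, which is not any real-world standard, just a like-for-like yardstick. Function names are mine.

```python
# ST 2084 (PQ) constants
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32
PEAK_NITS = 10000.0

def pq_signal(nits):
    """Luminance in nits -> normalized PQ signal (0.0 to 1.0)."""
    y = (nits / PEAK_NITS) ** M1
    return ((C1 + C2 * y) / (1 + C3 * y)) ** M2

def gamma_signal(nits, gamma=2.4):
    """Hypothetical: a pure power-law gamma stretched to 10,000 nits."""
    return (nits / PEAK_NITS) ** (1 / gamma)

for name, curve in (("PQ", pq_signal), ("gamma 2.4", gamma_signal)):
    below_1_nit = curve(1.0)
    above_5000_nits = 1.0 - curve(5000.0)
    print(f"{name}: {below_1_nit:.0%} of the signal below 1 nit, "
          f"{above_5000_nits:.0%} above 5,000 nits")
```

Under these assumptions PQ spends roughly 15% of its signal range below 1 nit against about 2% for the stretched gamma, while the gamma curve burns around a quarter of its range on the top half (5,000 to 10,000 nits) where PQ spends about 7%.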
One fundamental tenet of the new curve is that it is an absolute EOTF (code values are explicitly bound to luminance) which is a huge departure from the power gamma curves of yesteryear which are relative EOTFs. There is no fudge factor here, no “I’m going to make it brighter/darker because I want to”. Well, you can I guess but it’s going to look like crap. This makes HDR a nice, tight standard. Content actually has a good chance of being seen the way it was crafted for a change. Of course the more astute among you will note that this conversely makes HDR even less forgiving of poor viewing environments. Wait, didn’t I already say that the ideal viewing environment remains an ambient light controlled one? Yea, I did, but it can’t be overstated so you just got a reminder for free.
So if Dolby has done all the legwork here why is there an alternative on the table at all? Even though we’ve all happily been paying Dolby (in a roundabout way) a couple pennies in licensing each time we hear a movie soundtrack for the past, oh, 50 years or so, some people must have objected and decided to come up with an HDR system which is “free”. Someone is going to throw rotten fruit at me for saying this, but you get what you pay for.
In a nutshell, HDR10 is little more than a collection of HDR related standards, the majority of which are cribbed from Dolby Vision anyway, most notably PQ/ST2084! Again I ask: did anyone send Dolby a Thank You card for giving away all their hard work? This is going to come as a shocker to some of you, but there isn’t actually an official standard called HDR10. People just started referring to HDR10 because it’s HDR and 10-bit. You can think of HDR10 as being merely a subset of Dolby Vision, predicated on 10 bit. The KEY differentiator: HDR10 completely omits the playback side of the chain. HDR10 defines NO standards for tone and gamut mapping. None. Nada. In consequence we cannot comprehensively calibrate an HDR10 display (we’ll talk about exactly what you can and cannot do a little later). Write that down, stick it on your fridge, and tell everyone you can. If I could, I would have that on a sandwich board and walk up and down Hollywood Boulevard.
PQ/ST2084 alone does not an HDR system make. What happens when an HDR10 clip mastered to, say, 1000 nits is played on a display capable of 500 nits? Who the heck knows? That’s up to the TV manufacturer and it’s going to be different from manufacturer to manufacturer, probably from model to model as well. HDR10 does not currently have the dynamic frame by frame metadata of Dolby Vision. As such even the best HDR10 displays employing some sort of tone mapping cannot dynamically adjust it from scene to scene. In other words, while peak-brightness scenes may be well served by a generic tone map, anything short of peak will not.
While there is talk of shoehorning something similar to Dolby Vision’s dynamic per-frame metadata in the future, in the interim the best effort to emulate it would be content detection, but that would be a most inelegant approach (does anyone remember all the trouble we had getting DVD players to detect a simple 3-2 cadence?). I can see the marketing rhetoric piling up already, with each manufacturer claiming they do it “better”, but in the end none will be able to claim faithfulness to the mastered material to the extent Dolby Vision can. Some don’t even try and simply clip everything above the display’s native capability. Ouch! Content creators should be scared to death to see their hard work distributed with HDR10: there is no way for anyone at the other end to know if they are seeing all that hard work done in mastering and grading or if they are seeing an aberrant image. Honestly, it’s a wonder anyone so much as farted around with HDR10, let alone went ahead and built/sold TVs last year employing it (without also including the categorically superior Dolby Vision).