DCI HDR Image Tests Might Be Flawed & Why It Doesn’t Matter

In January and February 2020, ten double-blind image testing sessions were conducted at Sony Pictures Studios to determine the minimum peak brightness that would yield “a substantially differentiated HDR viewing experience.” A Sony CLED was used for mastering and viewing, since it was the only technology available at the time that could reproduce both the very low black levels required and the 800 cd/m² peak brightness needed for the tests. The study included both expert and lay viewers, 157 participants in all. Each clip ran for around 10 seconds, and each sequence consisted of two versions of the clip presented twice, followed by scoring: Version A / Version B / Version A / Version B / Scoring. Participants rated their preferences on a 7-point scale.

Based on the findings, the DCI determined that (a) 300 cd/m² for HDR content provided a sufficiently differentiated viewing experience; (b) brightness levels below 300 cd/m² were insufficient to achieve a compelling HDR experience; and (c) levels above 300 cd/m² performed similarly but did not provide significant additional value. Was the methodology employed by the DCI sound?

Picture credit: DCI. Levels > 300 cd/m² performed similarly and did not provide significant additional value.

In the excerpt from a paper published in the Journal of Electronic Imaging reproduced below, the authors maintain that sequential testing is not as reliable as side-by-side (SBS) comparisons, either in terms of confidence intervals or in differentiating between different stimuli.

“For the traditional test video clips of 10 to 15 seconds duration, it is known that it is much easier to see differences when the video clips are shown SBS than when they are shown sequentially. A recent study verified this by directly comparing the two methods. The experiment was identical for both cases, including display, stimuli, and task. The experiment tested one parameter of display capability: maximum luminance for HDR. In the sequential testing, one Dolby professional reference display (pulsar) was used. For the SBS testing, two pulsar displays were used. The resolution of each was full HD (1920×1080), the diagonal was 42 in., the bit-depth was 12 bits red, green, blue (RGB), the color gamut of the signal was 709, the black level remained constant at 0.005, and the ambient was 20 lux. A hidden upper anchor was used for each comparison. The viewer’s task was to rate the quality (according to their own personal preference) of each of the two stimuli shown using a Likert scale. The maximum luminances tested were 100, 400, 1000, and 4000 cd/m². Six different HDR video clips were used, where two different max luminances were compared in each trial. The main conclusion of the results (shown in Fig. 2) is that sequential comparisons are more difficult than the SBS.”

Picture Credit: Journal of Electronic Imaging

“This shows up both in terms of the confidence intervals and the shape of the curves. The confidence intervals are clearly seen to be on average 2x larger for the sequential comparison task, and the range of quality is reduced. For example, there is not a significant distinction between the 400 and 1000 cd/m² versions in the sequential testing, while there is a clear distinction across all four tested stimuli parameters in the SBS methodology.” Scott Daly, Allison, R., Brunnström, K., Chandler, D., Colett, H., Corriveau, P. et al. (2018). Perspectives on the definition of visually lossless quality for mobile and large format displays. Journal of Electronic Imaging, 27(5): 1-23.
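For a rough sense of what intervals twice as wide imply, the sketch below uses the textbook normal approximation, in which the half-width of a confidence interval for a mean rating shrinks with the square root of the number of viewers. Everything in it is an assumption for illustration: the function names, the standard deviation of 1.0 rating points and the panel size of 25 are hypothetical, and neither study reports its analysis in this form.

```python
import math


def ci_half_width(std_dev: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation 95% CI for a mean rating.

    Assumes independent ratings; std_dev is in rating-scale points.
    """
    return z * std_dev / math.sqrt(n)


def participants_to_match(n_reference: int, width_ratio: float) -> int:
    """Viewers needed by a method whose CIs are `width_ratio` times wider
    (at equal panel size) to match the CI width of the reference method.

    CI width scales as 1/sqrt(n), so the panel must grow by width_ratio**2.
    """
    return math.ceil(n_reference * width_ratio ** 2)


if __name__ == "__main__":
    n_sbs = 25          # hypothetical side-by-side panel size
    std_dev = 1.0       # hypothetical spread of ratings, in scale points
    width_ratio = 2.0   # "on average 2x larger" from the quoted excerpt

    sbs_width = ci_half_width(std_dev, n_sbs)
    n_seq = participants_to_match(n_sbs, width_ratio)
    print(f"SBS half-width with {n_sbs} viewers: {sbs_width:.3f}")
    print(f"Sequential viewers needed for the same width: {n_seq}")
```

Under these assumptions, a method whose intervals are twice as wide at the same panel size would need about four times as many viewers to reach the same precision.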

Accordingly, it’s not unreasonable to expect that there might have been a clearer distinction between the 300 cd/m², 500 cd/m² and 800 cd/m² peak brightness levels had the DCI used SBS testing rather than sequential testing. Why might it not matter? The DCI’s specifications are just minimums: no one is required to grade content to a peak brightness of 300 cd/m², and manufacturers are not prevented from offering displays or projectors that exceed 300 cd/m².

The graph below illustrates why it is so difficult to achieve HDR on big screens.

Picture credit: Anders Ballestad, Ronan Boitard, Gerwin Damberg and Goran Stojmenovik.
The human visual system (HVS) responds to light nearly logarithmically in luminance; therefore, to meaningfully increase perceived brightness, displays must achieve exponentially higher peak luminance levels.
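To make the logarithmic point concrete, here is a minimal Python sketch that expresses the steps between peak luminance levels in stops (doublings), using log2 as a rough stand-in for perceived brightness. That proxy is an assumption for illustration, not the model behind the graph, and the function name `stops_between` is hypothetical. The levels are the ones discussed in this post.

```python
import math


def stops_between(low_nits: float, high_nits: float) -> float:
    """Number of stops (doublings of luminance) from low_nits to high_nits."""
    return math.log2(high_nits / low_nits)


if __name__ == "__main__":
    # Peak luminance levels mentioned in this post, in cd/m².
    levels = [100, 300, 500, 800, 1000, 4000]

    # Print the size of each step between neighbouring levels in stops.
    for low, high in zip(levels, levels[1:]):
        step = stops_between(low, high)
        print(f"{low:>5} -> {high:>5} cd/m²: {step:.2f} stops")
```

Under this simplification, going from 300 to 500 cd/m² or from 500 to 800 cd/m² is less than one stop each, while going from 1000 to 4000 cd/m² is two full stops.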
