Can converting 8-bit 4:2:0 UHD to HD really give you 10-bit 4:4:4 video?

As you might guess from the title, this post is mainly of interest to people involved in digital video, and it’s fairly technical, but experience suggests some people out there will find it interesting, so here goes.

If you have a solid grasp of what the numbers and abbreviations in the title refer to, you may want to jump ahead to the conclusions. Otherwise, here’s a quick review of 4:2:0, 4:4:4, 8-bit versus 10-bit, and all that.

What the numbers mean and why they’re important

Just like a movie on film, digital video is a rapid sequence of pictures. In digital video each picture is a rectangular mosaic of little squares called “pixels.” Actually, in standard definition and in interlaced video, the pixels aren’t necessarily square, but we’re not talking about that here. All we’re concerned about are high definition (specifically 1080p) and ultra-high definition. These are commonly abbreviated HD and UHD.

(UHD is sometimes called “4K,” but I think it’s better to reserve the terms “2K” and “4K” for the 2K and 4K standards for digital cinema. More on that below.)

Each HD picture consists of an array of 1920 pixels horizontally and 1080 vertically. UHD is double that in each dimension, 3840 by 2160, for four times the total number of pixels. (But despite what some advertising hype claims, that gives UHD twice the resolution of HD, not four times, given the way resolution is defined and measured.)

Pixels have color as well as brightness, but because our eyes are less sensitive to resolution in color than in black and white, we can save digital storage space and bandwidth by storing color information for only every other pixel, both vertically and horizontally. For historical reasons this is referred to as “4:2:0 chroma subsampling.” “Chroma” refers to color and “luma” to brightness.

(Or to be pedantic, “chroma” and “luma” are terms for the encoded, typically non-linear representation of the underlying color and brightness, which are “chrominance” and “luminance.”)

If on the other hand we store and transmit chroma information for every pixel, that’s 4:4:4. There’s also 4:2:2, which means that chroma is sampled for every other pixel horizontally but for every pixel vertically. As I explained in painful detail in an earlier post, the vertical dimension was favored mainly because of the needs of interlaced video, in which each picture (field) has half the nominal vertical resolution. For present purposes we can ignore 4:2:2, though it is commonly used in professional production, whether interlaced or not.

Each brightness measurement is encoded as a binary number and each chroma measurement as a pair of them, basically “redness” and “blueness.” If you’re wondering what happened to “greenness,” it can be deduced from redness, blueness, and brightness (the corresponding luma value).
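To make the “greenness” point concrete, here is a minimal sketch. It assumes the Rec. 709 luma weights normally used for HD (the post itself doesn’t depend on any particular matrix), and it works with normalized floating-point values between 0 and 1 rather than real 8-bit code values. The function names are just mine for illustration.

```python
# Rec. 709 luma weights (an assumption; other standards use other weights).
KR, KB = 0.2126, 0.0722
KG = 1.0 - KR - KB  # green's weight is whatever is left over


def rgb_to_ycbcr(r, g, b):
    """Encode non-linear R'G'B' (0..1) as luma plus two color-difference values."""
    y = KR * r + KG * g + KB * b
    cb = (b - y) / (2 * (1 - KB))  # "blueness"
    cr = (r - y) / (2 * (1 - KR))  # "redness"
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Recover R'G'B'; green is deduced from luma, redness, and blueness."""
    r = y + 2 * (1 - KR) * cr
    b = y + 2 * (1 - KB) * cb
    g = (y - KR * r - KB * b) / KG
    return r, g, b


y, cb, cr = rgb_to_ycbcr(0.2, 0.7, 0.4)
print(ycbcr_to_rgb(y, cb, cr))
```

Green never gets its own channel; it falls out of the luma equation once red and blue are known.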

The minimum practical size of each of these binary numbers is 8 bits, which is enough, if just barely, to allow for a smooth transition between brightness levels, at least given the right sort of encoding and not too much digital compression. Even so, you sometimes see what’s called “banding” in the sky and on blank walls when watching DVDs, which use 8 bits per sample to encode the picture.

But if you’re going to tweak the image in post-production, for example for purposes of color correction and grading, it’s better to have a couple more bits per measurement. A 10-bit binary number, for example, has 1024 possible values versus just 256 for an 8-bit number.

All the foregoing applies to both HD and UHD, though the finer spatial resolution of UHD may make some limitations of 8-bit data less obvious.

What does converting UHD to HD get you?

OK, with that preamble out of the way, what’s the story on converting 8-bit 4:2:0 UHD video to 10-bit 4:4:4? Keep in mind that I’m assuming that the down-conversion is done in a reasonably smart way.

The 4:4:4 part is easy. UHD video shot with 4:2:0 chroma subsampling has 1920 by 1080 chroma measurements per image, which is exactly what you get with HD video and 4:4:4 chroma. So this part of the claim is obviously true: Converting UHD with 4:2:0 chroma to HD should indeed lead to 4:4:4, unless the conversion throws away the extra information. Of course, it might well do that, especially when finally exported into a distribution format, but if your editing and effects software and your computer hardware can handle 4:4:4, there’s no reason not to use it.
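The arithmetic behind that claim is simple enough to check in a few lines. This is only a sketch; the helper name and the table of divisors are my own shorthand for the three schemes described earlier.

```python
def chroma_plane_size(width, height, scheme):
    """Return (width, height) of each chroma plane for a subsampling scheme."""
    # (horizontal divisor, vertical divisor) for the luma dimensions
    divisors = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}
    h_div, v_div = divisors[scheme]
    return width // h_div, height // v_div


uhd_420 = chroma_plane_size(3840, 2160, "4:2:0")
hd_444 = chroma_plane_size(1920, 1080, "4:4:4")
print(uhd_420, hd_444, uhd_420 == hd_444)
```

Both calls give 1920 by 1080, so the chroma planes of 4:2:0 UHD already have one sample for every HD pixel.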

The 8-bit to 10-bit business is a little more complicated. Each HD pixel corresponds to a 2 by 2 array of UHD pixels, so to get each HD luma value we would combine the four 8-bit UHD values. But to record that result without any loss of information, we would need to use a 10-bit number (computed by simply summing the four 8-bit samples, in fact). So at least with respect to luma, there is definitely something to be said for the claim of ending up with 10-bit samples.
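Here is a minimal sketch of that summation, assuming the simplest possible down-conversion: each HD sample is the plain sum of one 2 by 2 block of UHD samples. Real scalers generally use fancier filters, so take this as an illustration of the bit-depth argument and not of how any particular software does it. The function name is hypothetical.

```python
def sum_2x2(luma):
    """Sum each 2x2 block of 8-bit luma samples into one 10-bit sample.

    `luma` is a list of rows of values 0..255; even width and height assumed.
    """
    out = []
    for y in range(0, len(luma), 2):
        row = []
        for x in range(0, len(luma[y]), 2):
            total = (luma[y][x] + luma[y][x + 1]
                     + luma[y + 1][x] + luma[y + 1][x + 1])
            row.append(total)  # at most 4 * 255 = 1020, which fits in 10 bits
        out.append(row)
    return out


uhd_patch = [
    [100, 101, 255, 255],
    [100, 101, 255, 255],
]
hd_patch = sum_2x2(uhd_patch)
print(hd_patch)  # [[402, 1020]]
```

The largest possible sum is 1020, just under the 10-bit maximum of 1023, which is why two extra bits are exactly enough.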

On the other hand, suppose the luma is uniform across the four UHD pixels. Then since the UHD pixels we’re starting with are 8-bit pixels, they can each register only one of the 256 possible values for an 8-bit image. Adding those four identical luma values together isn’t going to get any gradations between those values. You can get a 10-bit number all right, but the last two bits of the sum will be 00. (If it’s not obvious why, it’s because you’re in effect multiplying the 8-bit value by 4, which in binary is 100, and just as when multiplying by 100 in decimal arithmetic, the result ends in 00.)

If, however, there’s a gradual gradation in brightness across the area, which there often will be, you would probably end up with 10-bit luma numbers that aren’t just padded 8-bit numbers. That means that using 10-bit samples for luma is likely preserving some information, even if it’s not exactly identical to what you’d get from shooting 10-bit video to start with.
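The difference between the uniform case and the gradient case shows up in the two lowest bits of the sum. A quick sketch, again assuming a plain sum of the four UHD samples (the sample values are made up for illustration):

```python
def low_two_bits(block):
    """Return the two least significant bits of the summed block."""
    return sum(block) & 0b11


flat = [100, 100, 100, 100]  # uniform luma across the 2x2 block
ramp = [100, 100, 101, 101]  # brightness creeping up across the block

print(sum(flat), low_two_bits(flat))  # 400 0
print(sum(ramp), low_two_bits(ramp))  # 402 2
```

The flat block lands on a multiple of 4, which is just a padded 8-bit value. The ramp lands between two such values, which is the extra information the 10-bit result can carry.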

Finally, since with 8-bit 4:2:0 UHD there is only one chroma sample for each four pixels, the chroma samples after conversion are essentially what you’d get from a camera with 8-bit 4:4:4 chroma. So while you do wind up with more spatial chroma resolution, you’re not really getting any more in the way of subtle gradations in color as opposed to brightness. Each pixel is still going to be limited to 256 possible values each for “redness” and “blueness,” even if you pad them out to 10 bits by appending zeros (which is what should happen for consistency with the luma).
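Appending two zero bits is just a left shift by two. This small sketch (the helper name is mine) confirms that the padded chroma values occupy a 10-bit range but still take only 256 distinct levels:

```python
def pad_to_10_bits(sample8):
    """Append two zero bits to an 8-bit sample (equivalent to multiplying by 4)."""
    return sample8 << 2


padded = [pad_to_10_bits(v) for v in range(256)]
print(len(set(padded)))        # 256
print(padded[:4], padded[-1])  # [0, 4, 8, 12] 1020
```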

So basically, in terms of bit depth, converting 8-bit UHD to 10-bit HD doesn’t really give you the same as you’d get from a camera shooting 10-bit HD video.

Now, what does this mean in terms of things we actually care about? I think the advantages for 10-bit 4:4:4 over 8-bit 4:2:0 are

pulling chroma keys in green-screen or blue-screen work,
avoiding banding and blotching artifacts on bare walls, the sky, and the like, and
getting decent results from color correction and grading.

Taking each of these in turn:

Chroma key

Having 4:4:4 chroma should maximize spatial resolution in chroma keying, so this would appear to be an unambiguous true advantage of shooting in UHD and down-converting over shooting HD even when both start out 4:2:0.

Banding and blotching artifacts

With 8-bit images you will sometimes see blotches on a blank wall and banding in a clear blue sky as a result of gradual changes in brightness and color and the limited number of distinct values that can be represented with 8 bits per sample. (I suspect compression enters into this as well.) True 10-bit images are much less likely to be afflicted by this. The good news is that if we’re talking about changes in brightness across an image, adjacent UHD pixels should have slightly different 8-bit values and this should translate into more gradations of 10-bit pixel values after UHD to HD conversion. So I think this should avoid or at least greatly reduce the banding and blotching problem, at least in terms of brightness gradations as opposed to color gradations.

One reason I think this is true is what Dave Dugdale’s tests showed in this YouTube video. Basically, UHD 8-bit video was pretty much as free of banding in a blue sky as 10-bit HD video.

Color correction and grading in post-production

There seems to be agreement that simply transcoding HD 8-bit material to 10-bit for post work helps with color grading (though I’m not going to go off on a tangent about why). With 10-bit HD downconverted from 8-bit UHD I’d expect this to hold true or even slightly improve (because of the additional information in the luma channel).

Note that I haven’t done my own tests to verify my theoretical musings, so don’t take this as authoritative. Then again, shooting UHD and converting to HD may be the only practical way most of us can get 4:4:4 HD, at least with affordable equipment available today.

For more on this subject, including some ideas for how best to do the down-conversion, see the following pages (and search the web for more recent items on the same subject, since I suspect we’re going to be hearing more about it):

Andrew Reid’s article at EOSHD.

Allan Tepper’s article at Pro Video Coalition.

Barry Green’s article at HD Warrior.

A final note on 2K and 4K versus HD and UHD

Some people refer to UHD as “4K,” which I consider a bit misleading. I think it’s better to reserve the term “4K” for the 4K standard for digital cinema. UHD is indeed quite close to 4K in resolution (and HD is close to 2K), but the digital cinema standard has more bit depth and other advantages.

The 2K frame has 2048 by 1080 pixels and 4K has (as you’d expect) 4096 by 2160 pixels. The corresponding numbers for HD and UHD are (as noted a bunch of paragraphs ago) 1920 by 1080 and 3840 by 2160.

Note that in cinema the term 2K is derived from the 2048 horizontal pixels, while in television we refer to 1080p based on the vertical pixel count. They probably do this just to mess with us.

Notice that the pixel counts for cinematic 2K and 4K imply an aspect ratio of about 1.90:1, but in practice the full frame is pretty much never used. Most movies are instead shot for exhibition at 1.85:1 (often called “flat”) or 2.39:1 (“scope”), so in practice either the top and bottom or the sides are blacked out. In 2K the actual projection standard is 2048×858 for scope and 1998×1080 for 1.85:1. In 4K the pixel counts are simply doubled.
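If you want to check the ratios yourself, here is a short sketch. The frame sizes are the 2K figures as I understand the digital cinema standard (2048×1080 full, 1998×1080 flat, 2048×858 scope); the dictionary and function names are just for illustration.

```python
containers_2k = {
    "full": (2048, 1080),
    "flat": (1998, 1080),
    "scope": (2048, 858),
}


def aspect_ratio(width, height):
    """Width divided by height, rounded to two decimal places."""
    return round(width / height, 2)


for name, (w, h) in containers_2k.items():
    # The 4K sizes are the 2K sizes doubled in each dimension.
    print(name, (w, h), aspect_ratio(w, h), "4K:", (2 * w, 2 * h))
```

The three ratios come out at about 1.90, 1.85, and 2.39, and doubling both dimensions for 4K leaves them unchanged.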

You might be wondering about what happened to 2.35:1 and 2.40:1. The old Cinemascope 2.35 standard was actually replaced by 2.39 in 1970, but people still call it “2.35” out of habit. As for 2.40, that’s 2.39 rounded to 2.4 and a zero tacked onto the end. When you see 2.35 and 2.40 in connection with a film made since the Nixon administration, just keep in mind that both really mean 2.39 in modern cinema, no matter what IMDb and the backs of your DVDs or Blu-rays say.

(Updated 2016 June 20, September 12, and November 18 and 2017 August 14 to correct some typos and improve clarity, I hope.)