(Update 2016 June 4: If this subject interests you, you might also be curious about claims that 8-bit 4:2:0 UHD or 4K video can be downconverted into 10-bit 4:4:4 HD or 2K. I think the 4:4:4 part is on the level, but the 10-bit part is only partly true. See this later post for a discussion.)
Among professional videographers, it’s broadly accepted that 4:2:2 chroma subsampling is better than 4:2:0, which of course it is. But people accustomed to shooting standard definition interlaced video may overestimate the difference, or at least that’s what I’m going to argue here. You can skip the details and go right to the conclusion under “To sum up” below.
First, some quick background: As with digital photography, digital video treats each image as a rectangular array of picture elements, or “pixels.” More pixels means more fine detail, that is, more spatial resolution.
(There’s also temporal resolution, basically the number of images per second. You might think that the more temporal resolution the better, but that’s not necessarily true, as witness negative reactions to the look of Peter Jackson’s Hobbit films when they were shown theatrically at 48 frames per second. I’ll have more on that in a later post.)
A typical Blu-ray movie has frames that are 1920 pixels wide and 1080 pixels high. Or more accurately, those are the dimensions of the black and white part of the image. Color information is handled separately and usually at a lower resolution. Consumer digital video formats (DVD, Blu-ray, broadcast HDTV, etc.) use only half as many pixels both vertically and horizontally to carry “chroma” (color) information as they do for “luma” (gray-scale brightness) information.
Our eyes are less sensitive to fine detail in color than in black and white, so we don’t really notice the difference, and it saves a huge amount of space on disk and what’s called “bandwidth” (basically the total amount of information per second that has to be moved around). That’s an even bigger deal than it sounds, because a chroma pixel requires twice as many bits as a luma pixel.
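To put rough numbers on that saving, here is a minimal Python sketch. It assumes 8 bits per sample and two chroma values (commonly called Cb and Cr) for every chroma pixel, and it ignores compression entirely. The function name and the 8-bit depth are illustrative choices, not taken from any particular format's specification.

```python
def average_bits_per_pixel(h_div, v_div, bit_depth=8):
    """Average uncompressed bits per luma pixel.

    Chroma is stored at 1/h_div of the luma resolution horizontally and
    1/v_div vertically. Assumes one luma sample per pixel and two chroma
    samples (Cb and Cr) per chroma pixel, all at the same bit depth.
    """
    luma_bits = bit_depth
    chroma_bits = 2 * bit_depth / (h_div * v_div)
    return luma_bits + chroma_bits


cases = [
    ("full-resolution chroma", 1, 1),
    ("half resolution horizontally", 2, 1),
    ("half resolution both ways", 2, 2),
]
for label, h_div, v_div in cases:
    print(label, average_bits_per_pixel(h_div, v_div), "bits per pixel")
```

Under those assumptions, halving chroma resolution in both directions cuts the raw data per pixel from 24 bits to 12.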
For historical reasons, recording chroma resolution at half the resolution of luma both vertically and horizontally is referred to as “4:2:0 chroma subsampling” or just “4:2:0 color.”
(You sometimes hear it called a “4:2:0 color space,” but that’s not strictly correct. “Color space” technically refers to the range of different possible colors that can be represented, not to the amount of fine color detail on the screen.)
But professional digital production often works with more color resolution than this, up to 4:4:4, which means color is handled at the same resolution as black and white. In practice, the most common color subsampling rate for high-quality professional productions is 4:2:2, meaning that color resolution is the same as black and white from top to bottom but only half as great from side to side. Low-end production uses 4:2:0, half-resolution color in both dimensions, the same as DVD or Blu-ray.
In terms of pixel count, with 1080p (“full HD”) production, 4:4:4 means 1920×1080 pixels of color data per frame, 4:2:2 means 960×1080, and 4:2:0 means 960×540.
If it seems to you that this nomenclature is confusing, I agree. It’s a historical artifact going back to the early days of converting analog video to digital. Luma (brightness) was sampled every scan line at 4 times a specified base rate, while chroma was sampled from a pair of successive scan lines at a possibly different multiple of the same base rate. What 4:2:2 means is that both lines of the pair were sampled for color at twice the base rate, and 4:2:0 means that one line was sampled at twice the base rate and the other was not sampled at all.
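The convention is easier to follow as arithmetic. The sketch below turns a three-number scheme into chroma dimensions, following the description above: the second number relative to the first sets horizontal resolution, and a third number of 0 means one line of each pair carries no chroma of its own. The function name is hypothetical, and the code covers only the common schemes discussed in this post.

```python
def chroma_size(luma_width, luma_height, scheme):
    """Chroma plane (width, height) for a 'J:a:b' subsampling string.

    J is the luma sampling rate, a the chroma rate on the first line of
    each pair, and b the chroma rate on the second line (0 means that
    line gets no chroma samples of its own).
    """
    j, a, b = (int(part) for part in scheme.split(":"))
    width = luma_width * a // j
    height = luma_height // 2 if b == 0 else luma_height
    return width, height


for scheme in ("4:4:4", "4:2:2", "4:2:0"):
    print(scheme, chroma_size(1920, 1080, scheme))
```

For a 1920×1080 frame this reproduces the 1920×1080, 960×1080, and 960×540 figures given earlier.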
It might also strike you as a little odd that 4:2:2 has half-resolution color information horizontally but full resolution vertically. What’s so special about vertical resolution?
The answer is nothing, if we’re talking about so-called progressive video, the kind used today for pretty much everything except certain broadcast work. But for interlaced video, the difference is important.
When analog television was being developed, the engineers came up with the idea of offsetting each successive picture (called a “field”) alternately up and down from the previous one, by half the height of a scan line. When viewed as a continuous stream of images, this “interlacing” of the scan lines in successive fields would effectively convey about 40 percent more vertical detail than existed in any one field. This allowed reducing the number of lines per field to save broadcast bandwidth at little cost in picture sharpness.
(For a more detailed explanation of interlacing, see this post.)
Note that while fields are paired together as video “frames” for editing and other purposes, when we watch interlaced television what we’re seeing is a rapid sequence of fields. (Some sources erroneously claim that our eyes or our brains somehow merge the fields back together into frames, but this simply isn’t true, as explained in the link above.) Chroma subsampling in interlaced video happens at the field level. That is, when we talk about sampling color information from successive scan lines, we mean successive scan lines of a single field, not a frame. (I’ve sometimes seen this described in a way that implies 4:2:0 chroma is sampled from every other field. The sources I trust say this isn’t the case.)
In 1080i video, one of the standards common in broadcast HDTV, each frame has 1080 scan lines, but each picture (field) has only 540 rows of 1920 pixels each. So when chroma is subsampled at 4:2:0, you wind up with just 270 rows per picture of color information. Fortunately our eyes are forgiving enough that we don’t notice. (It helps that interlacing raised the effective vertical resolution for chroma as well as luma.)
It doesn’t even bother us to watch an old 480i television show on DVD, where the 4:2:0 color leaves us with just 120 rows of color information per field.
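The per-field row counts can be checked with a few lines of Python. This is only a sketch of the arithmetic, assuming (as described above) that each field holds half the frame's scan lines and that 4:2:0 halves the vertical chroma resolution within a field. The function name is made up for illustration.

```python
def chroma_rows_per_picture(frame_lines, interlaced, vertical_div=2):
    """Rows of chroma in each displayed picture (a field if interlaced).

    vertical_div=2 corresponds to 4:2:0; use 1 for 4:2:2 or 4:4:4.
    """
    rows_per_picture = frame_lines // 2 if interlaced else frame_lines
    return rows_per_picture // vertical_div


print("1080i 4:2:0:", chroma_rows_per_picture(1080, interlaced=True))
print("480i 4:2:0:", chroma_rows_per_picture(480, interlaced=True))
print("1080p 4:2:0:", chroma_rows_per_picture(1080, interlaced=False))
```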
But now suppose we want to shoot a person or an object in front of a green or blue screen and replace that color background with something else, such as a weather map. Now you have a situation in which color information affects the black and white part of the picture, and that’s where the loss of vertical detail really comes back to bite you.
Actually, low-end standard definition digital video production formats (DV, Sony’s DVCAM, Panasonic’s DVCPRO), at least as used in NTSC countries, make use of 4:1:1 chroma subsampling. With 4:1:1, color information is as detailed as black-and-white vertically but only 1/4 as great horizontally. To see why, consider this comparison of pixel dimensions for standard definition digital video in NTSC countries:
Subsampling | 480i (interlaced), per field | 480p (non-interlaced), per frame |
---|---|---|
4:4:4 chroma / luma | 720×240 | 720×480 |
4:2:2 chroma | 360×240 | 360×480 |
4:2:0 chroma | 360×120 | 360×240 |
4:1:1 chroma | 180×240 | 180×480 |
Note that with interlaced 480i, 4:2:0 makes the ratio of horizontal to vertical chroma resolution 3 to 1, whereas 4:1:1 gives you 3 to 4, which is a lot closer to being the same in both directions. (I’m ignoring for simplicity’s sake the fact that the picture is wider than it is high.) The great majority of cameras shooting DV, DVCAM, and DVCPRO were intended to shoot interlaced video, so 480p was less important.
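Those ratios follow directly from the 480i figures in the table. As a quick check, here is a small sketch using Python's standard `fractions` module. The helper name is hypothetical, and like the text it ignores the shape of the picture and of the pixels.

```python
from fractions import Fraction


def chroma_ratio(chroma_width, chroma_height):
    """Horizontal-to-vertical chroma resolution as a reduced fraction."""
    return Fraction(chroma_width, chroma_height)


# Per-field chroma dimensions for 480i, taken from the table above.
for scheme, (width, height) in {"4:2:0": (360, 120), "4:1:1": (180, 240)}.items():
    ratio = chroma_ratio(width, height)
    print(f"{scheme}: {ratio.numerator} to {ratio.denominator}")
```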
Curiously, the PAL versions of DV and DVCAM used 4:2:0 chroma subsampling if I recall correctly (though Panasonic’s DVCPRO used 4:1:1 for both NTSC and PAL). In the PAL world, vertical resolution is 20 percent greater (576 pixel rows per frame versus 480), so perhaps the engineers were less concerned about the vertical chroma resolution of 576i and liked the close compatibility with the DVD standard’s 4:2:0 chroma.
What’s surprising, by the way, is that you can often get an acceptable key out of 4:1:1 video if you give the edges of the key a little extra “feathering” (blur). It’s not ideal, but it’s not entirely awful. Still, 4:2:2 is clearly preferable.
At this point it should be obvious why anyone shooting standard definition interlaced video would much prefer to shoot 4:2:2 and get 360×240 chroma pixels per field in NTSC or 360×288 in PAL, especially if they need to pull a chroma key.
Stepping up to interlaced high definition at 1080i — which has 1920×540 luma pixels per field — 4:2:2 gives you 960×540 chroma pixels while 4:2:0 leaves you just 960×270. Again the former is clearly better. But remember, in interlaced standard definition it was common to pull decent keys with 4:2:2 color, and that’s even fewer rows of pixels than 1080p at 4:2:0.
Moreover, if you’re shooting 1080p (not interlaced), then 960×540 color pixels (4:2:0) isn’t so bad at all. In fact, some HDTV is originated at 720p (1280×720 black-and-white pixels per image). With 720p at 4:2:2 you have 640×720 chroma pixels, a total count of 460,800. Compare that against 1080p at 4:2:0, which has 518,400 total chroma pixels.
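The comparison is simple enough to verify in a few lines. This sketch just multiplies out the chroma dimensions; the function name and the divisor parameters are illustrative.

```python
def total_chroma_pixels(luma_width, luma_height, h_div, v_div):
    """Chroma pixels per image when chroma resolution is divided by
    h_div horizontally and v_div vertically relative to luma."""
    return (luma_width // h_div) * (luma_height // v_div)


hd720_422 = total_chroma_pixels(1280, 720, h_div=2, v_div=1)    # 640 x 720
hd1080_420 = total_chroma_pixels(1920, 1080, h_div=2, v_div=2)  # 960 x 540

print("720p at 4:2:2:", hd720_422)
print("1080p at 4:2:0:", hd1080_420)
print("ratio:", hd1080_420 / hd720_422)
```

By this count, 1080p at 4:2:0 carries 12.5 percent more chroma pixels per image than 720p at 4:2:2.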
To sum up
My point is that back when professional videography was all about shooting 480i NTSC or 576i PAL, there was a big difference in going from 4:1:1 or 4:2:0 to 4:2:2, and hence pro videographers came to think of anything less than 4:2:2 as simply not professional quality.
But shooting 1080p or even 1080i, the difference between 4:2:0 and 4:2:2 is much less likely to be noticeable.
Once you reach the point of shooting 1080p at 4:2:0, the improvements you see in going to 4:2:2 will be real, but they won’t be dramatic. When higher end formats look better, it’s probably due more to 10-bit (or more) sampling and lower compression than to less aggressive chroma subsampling.
Also, in general, don’t spend a lot of money upgrading your camera if it causes you to sacrifice lighting, lenses, audio, and other things that can often make a more noticeable difference.
That last paragraph is basically it; all the rest is commentary.
Now briefly back to some technical esoterica if you can stand it. You should feel free to ignore the following even more than you should have felt free to ignore what you’ve read so far.
In the discussion of standard definition I omitted to note that the same pixel array can be used for either 4:3 or 16:9 content, depending on the setting of the aspect ratio flag. True 16:9 DVDs, that is, those without embedded letterboxing, are often labeled “anamorphic” or “enhanced for wide-screen televisions.” Go and check your player and make sure it knows your television set is 16:9 (assuming it is), and check the “wide” mode on your television to see that it’s set to “normal” (on Sony TVs) or whatever your brand calls the mode that doesn’t muck with the video any more than necessary to put black bars at the sides of 4:3 video. This may or may not give you a better picture, but it’s worth trying.
Standard-definition video pixels aren’t square. (If they were, a 720×480 array of 1:1 pixels would have an overall aspect ratio of 3:2, the same as a 35mm slide.) There are actually four different pixel shapes used, depending on whether the video is intended for 4:3 or 16:9 material, and on whether it’s PAL or NTSC. Remember that digital pixels are just sets of numbers, so the shape depends on how your television set interprets the data. (There is an aspect ratio flag in the metadata to tell it whether the image is old-school or wide screen.)
The shape of a pixel is called the pixel aspect ratio, and you may see that come up in video editing from time to time. Don’t try to work out the pixel aspect ratio for yourself, because it’s not as simple as you’d think. In NTSC, there are actually 704×486 pixels in a 4:3 picture. That’s because when the digital standard (ITU-R BT.601) was developed, it was decided to overscan each line by 8 pixels on both sides, for a 720×486 pixel raster, and later the top and bottom 3 lines were dropped to get a number of rows that’s a multiple of 8. PAL is overscanned at the sides as well, but I don’t think any lines are dropped, so with PAL the 4:3 picture is 704×576 pixels. I think. (Those good old “historical reasons” again.) Don’t bet on my having remembered this correctly; look up whatever pixel aspect ratio you need if you ever have to know for sure. Even some professional NLEs have managed to get it wrong.
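With that warning in mind, here is the naive calculation, purely as an illustration. It assumes the picture occupies exactly 704 pixels across and 480 rows (NTSC) or 576 rows (PAL), which is the hedged recollection above rather than a checked reading of the standard. Some tools use other conventions and arrive at different values, so treat the function and its results as a sketch, not a reference.

```python
from fractions import Fraction


def pixel_aspect_ratio(display_aspect, active_width, active_height):
    """Width-to-height shape of one pixel, given the picture's display
    aspect ratio and the pixel dimensions assumed to fill that picture."""
    return display_aspect * Fraction(active_height, active_width)


# Assumed active picture sizes: 704x480 (NTSC) and 704x576 (PAL).
for label, rows in (("NTSC", 480), ("PAL", 576)):
    for aspect in (Fraction(4, 3), Fraction(16, 9)):
        par = pixel_aspect_ratio(aspect, 704, rows)
        print(label, f"{aspect.numerator}:{aspect.denominator}", "->", par)
```

Using 720 instead of 704 for the width changes every result, which is one way different programs end up disagreeing.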
While I’m talking about arcane matters, if you’re wondering how SECAM (used in France, parts of Africa, much of the Middle East, and the former Communist bloc) fits into this, it’s effectively identical to PAL for production purposes (even in analog, provided chroma and luma are handled separately, as was generally the case for professional work from the Betacam era forward). SECAM is different from PAL only at the level of composite analog video and analog broadcast. (In fact, there are a ridiculous number of broadcast SECAM standards differing with respect to such things as audio subcarrier frequency and modulation.)
PAL DVDs should work in SECAM countries. I admit I have seen DVDs for sale in Finland labeled as “SECAM,” but I suspect they’re identical to PAL.
Finally, just to mess with us, Brazil’s “PAL-M” is in most respects NTSC (that is, 480-line video at 60 fields per second) but with PAL-type color used in composite video encoding, so production for PAL-M is practically the same as for NTSC.
Now back to our regularly scheduled program of YouTube links and political rants.
“So with 1080p production, 4:4:4 means you have 1920×1080 pixels of color data, 4:2:2 means 960×1080 color pixels, and 4:2:0 means 960×540 color.”
I’m not quite sure how you arrived at those numbers. I would think that at 1080p the effective color resolution for 4:2:2 would be 960×540.
I can certainly see why you’d think 4:2:2 ought to mean half-resolution both vertically and horizontally, but unfortunately the convention doesn’t make that much sense.
Basically, the first number (for most practical purposes almost always a 4) is an indication of the rate at which luma (brightness) is sampled, while the second and third numbers are the rates at which chroma (color) is sampled in alternating scan lines. (Note that if we’re talking about interlaced video, we’re speaking of alternating scan lines within a single field.)
In 4:2:2 video we’re sampling chroma half as often as luma on every scan line. That is, there are half as many chroma samples as luma horizontally, but chroma and luma are sampled at the same rate vertically.
In 4:2:0 video we’re sampling chroma half as often as luma on every other scan line, and not at all on the remaining scan lines. That is, there are half as many chroma samples both horizontally and vertically.
(As you’d expect, exactly how chroma is sampled is more complicated than that and in practice depends, among other things, on whether you’re using a one-chip or a three-chip camera.)
Please explain more about interlacing. Thank you; you are more understandable to me than most of what I found on the net. You explain it better. Thanks for that.
Herbert, I apologize for the long delay in replying to your question. If by chance you read this, I’ve just posted an explanation of interlacing here.
I just wanted to say how much I appreciate you putting this article together. I am working towards starting my own media studio and producing commercials that could possibly end up on TV. This helped clear up the majority of my questions, so I thank you again and great work!