As with my earlier post (link) about color subsampling, this is another essay about video technology that I expect to be of limited interest (though oddly enough that previous post is one of the more frequently viewed ones on this site).
If you don’t particularly care how things worked in the analog world, you can skip the first part of this.
When broadcast electronic television was first introduced in the 1930s, engineers needed to deal with a number of problems simultaneously:
- Transmitting enough pictures per second to create an illusion of motion and to minimize visible flickering.
- Transmitting enough scan lines per picture and enough detail per scan line to produce a clear image.
- Transmitting no more cycles per second than absolutely necessary, so as to allow more TV channels in the available part of the broadcast spectrum.
More detail per scan line, more scan lines per picture, and more pictures per second all require more cycles per second, so obviously something had to give.
And the problem was worse than you might think. The minimum number of pictures per second needed to support an illusion of smooth motion (provided the speed of the motion isn’t too drastic) is about 16, though for compatibility with sound motion pictures it should ideally be 24 or some multiple of that.
However, to keep flicker from being visible, you need a minimum of about 50 pictures per second. (This varies from person to person and with the level of ambient light.) The earliest movies, shown at about 16 frames per second to save film, had awful flickering (which led to calling movies “flickers” or “the flicks,” which we still do today, though most audience members probably don’t know why). For a while Thomas Edison advocated making the frame rate much higher, but a cheaper solution was to interrupt the projector’s light beam two or three times per frame with a two- or three-blade spinning shutter, effectively flashing each picture onto the screen two or three times.
With early television, though, there was no way to store a picture inside the set and display it multiple times. In fact, for decades analog television consisted not of a series of images but rather a continuous spot of varying brightness scanned across the face of a cathode ray tube, leaving a trail of rapidly fading scan lines, and visible as pictures only because of our eyes’ persistence of vision.
Since the receiver and transmitter needed to be in perfect synchronization, there was a lot to be said (especially in the early days) for basing the scanning frequency on that of AC current — 60 Hertz (cycles per second) in North America and 50 in Europe, the two regions where broadcasting began. If you scan the screen 50 or 60 times per second, you need to somehow limit the detail in each scan if you want to hold down bandwidth.
The clever solution was to use a small number of scan lines per picture, but shift successive pictures up or down vertically by half a scan line, so that the scan lines in successive pictures “interlace.” This effectively allows the camera to capture and the eye to see about 40 percent more vertical detail out of the continuing stream of pictures than would be visible in any one of them.
To make this work, each picture (which in interlaced television is called a “field”) contained 242.5 scan lines in the American NTSC system or 287.5 in the European PAL. (For our purposes, SECAM is practically the same as PAL.)
I know, at first blush it seems downright bizarre to have fields with so-many-and-a-half scan lines, but that’s actually what makes the interlacing work! If the first scan line of a picture starts at the left edge, the last line in the picture is abruptly interrupted halfway across, and the top line of the next picture starts halfway across at the top. You can see a nifty picture illustrating the idea on David Stringer’s site (link).
That illustration, like most others meant to describe how analog scanning works, shows the scan lines as very narrow. In reality, scanning is less like drawing lines with a sharp pencil than like painting with a flat brush. In a perfect analog television set, the scanning spot is tall enough that the scan lines of a field just touch each other, leaving no gaps. The scan lines of the next field don’t fill in the breaks in the previous field as diagrams and even some descriptions imply, because ideally there are no breaks to fill in.
Or to use another painting analogy, scanning a television screen is not like painting a wall with stripes and leaving gaps between them to fill in with the second coat; it’s more like painting a wall with horizontal stripes that just touch each other, then going back and painting a second coat of horizontal stripes, but this time positioning the paint roller so that its center rolls along the boundary between the stripes of the previous coat.
There are a couple of reasons for this, but the main one is to avoid a source of potential flicker. Suppose some very narrow bright bit of horizontal detail (or even a tiny dot) exists somewhere in the picture. If you left Venetian-blind gaps between scan lines when you captured the video, then that bright detail might appear in half the fields but fall into the gap in the other half. That means it would show up on the screen only every other field, flashing on and off at 25 or 30 times per second, slow enough for the resulting flickering (called “line twitter”) to be noticeable. If there are no gaps in the picture, that problem doesn’t exist.
In live television you can cut from one camera to another at any instant in time because the scanning for all the cameras is locked in perfect synchronization with a master sync generator (“genlock”). In analog switching there’s nothing wrong or even remarkable about cutting partway down a field. But consider what would happen if the cameras were not in sync: You might cut from the end of a scan line near the top of the picture from one camera to the middle of a scan line 2/3 down on the other camera. This would produce a noticeable glitch on the screen.
The same problem exists for editing analog videotape, except that it’s even worse because of the way the signal is physically laid out on the tape in diagonal stripes (something done in pretty much all tape-based video recording systems to avoid having a ludicrously high linear tape speed). This makes it necessary to cut between fields, and there’s even a restriction on that: You can’t edit in such a way that an upper field is followed immediately by another upper field or a lower field by another lower one. The up-down-up-down march of fields has to be maintained to prevent glitches on the screen.
That means you have to cut consistently, either always between a lower field and an upper one, or always between an upper field and a lower one. This choice is called “field dominance,” and if you’re ever cursed with having to use a tape-based linear editing system you may even have a switch that lets you exercise your own preference rather than the manufacturer’s.
Since it takes a pair of fields to complete a full lower-upper or upper-lower cycle, for many purposes (not just editing but even simply describing the details of the analog television signal) it makes sense to treat a pair of fields as a unit, termed a “frame.” (But keep in mind that in interlaced television, a “frame” is really two successive pictures.)
You may recall hearing that NTSC has 525 scan lines per frame. Half of 525 is 262.5, but didn’t I just say that an NTSC field has 242.5 scan lines? Yeah, I sort of did, and that’s because I left something out: It takes a tiny bit of time to reset the vertical sweep to start drawing the next field at the top of the screen, so each field includes 20 “scan lines” (or more precisely, cycles of the horizontal oscillator) that don’t really scan anything.
Similarly, in the 625-line PAL system, the vertical blanking interval (as it’s called) takes 25 scan lines per field.
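If you’d like to check that arithmetic, here is a minimal Python sketch. The function name is my own invention for illustration; the only inputs are the line counts quoted above.

```python
from fractions import Fraction


def active_lines_per_field(lines_per_frame, blanking_lines_per_field):
    """Scan lines in one field that actually carry picture.

    A frame is two fields, so each field gets half the frame's lines;
    the vertical blanking interval then eats some of those.
    """
    lines_per_field = Fraction(lines_per_frame, 2)
    return lines_per_field - blanking_lines_per_field


ntsc_active = active_lines_per_field(525, 20)
pal_active = active_lines_per_field(625, 25)

print(float(ntsc_active))  # 242.5
print(float(pal_active))   # 287.5
```

Using `Fraction` rather than floating point just keeps the half lines exact.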
Converting analog video to digital
Remember that an analog field either ends halfway through the bottom scan line or starts halfway through the top one. Since a PAL field has 287.5 scan lines, a frame has a total of 574 full scan lines plus two half lines. To capture them all, digital PAL video has 576 pixel rows per frame.
Similarly, a frame of NTSC analog video has a total of 484 full scan lines plus two more half lines, and the original digital standard for NTSC prescribed frames with 486 pixel rows. However, for purposes of compression, it’s helpful to make the number of pixel rows a multiple of 16. Since television sets are normally set to overscan (that is, the picture is a little bit larger than the part we see), in practice most NTSC digital video discards a few analog scan lines and uses just 480 pixel rows.
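The row counts for both systems follow from the same rule: every full scan line gets a pixel row, and so does each of the two half lines. Here is a rough Python sketch of that rule, plus the round-down to a multiple of 16. The helper names are hypothetical, and treating 480 as simply "the largest multiple of 16 that fits" is my reading of the numbers rather than a quotation from any standard.

```python
def pixel_rows_per_frame(active_lines_per_field):
    """Pixel rows needed to hold every full and half scan line of a frame."""
    full_lines_per_field = int(active_lines_per_field)  # drop the half line
    # Two fields of full lines, plus one row for each of the two half lines.
    return 2 * full_lines_per_field + 2


def round_down_to_multiple(rows, block=16):
    """Largest multiple of `block` that does not exceed `rows`."""
    return rows // block * block


pal_rows = pixel_rows_per_frame(287.5)
ntsc_rows = pixel_rows_per_frame(242.5)

print(pal_rows)                           # 576
print(ntsc_rows)                          # 486
print(round_down_to_multiple(ntsc_rows))  # 480
print(round_down_to_multiple(pal_rows))   # 576
```

PAL’s 576 happens to be a multiple of 16 already, which is why it never needed trimming.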
Digital interlacing serves the same purpose as analog interlacing: allowing higher effective vertical resolution for a given number of scan lines per field. In addition, it makes it simpler to convert interlaced analog video to digital.
As far as I know, there are only three interlaced digital formats in actual use:
- 480i (compatible with NTSC),
- 576i (PAL), and
- 1080i (“full” HD, or not quite).
The “i” suffix stands for “interlaced” (duh). Note, incidentally, that the number before the “i” or “p” is always the number of lines (pixel rows) per frame, not per field, even in the case of interlaced video.
Non-interlaced video is usually given a “p” suffix, as in 480p, 576p, 720p, and 1080p. (There’s no 720i in current use that I’m aware of.) The “p” stands for “progressive,” but it strikes me as a bit of a misnomer. “Progressive” ought to mean that the picture is scanned one line after another, but that’s true of all video when captured with “rolling shutter” CMOS chips or the old tube-based imagers, whereas CCDs and “global shutter” CMOS chips capture all the pixels at once, whether shooting interlaced or “progressive.” Be that as it may, we appear to be stuck with using “progressive” (and the “p” suffix) to mean “non-interlaced.”
A single 1080i field contains 540 rows of 1920 pixels per row. As with analog video, a pair of successive fields is termed a “frame,” but there’s more to it than just treating them as a unit for purposes of editing. A 1080i digital frame holds a full array of 1920×1080 pixels (just like a 1080p “progressive” frame), with the odd-numbered rows supplied by one field of the pair and the even-numbered ones by the other. You might suppose that’s how the video is captured and shown on the screen as well, but it’s not, or at least it’s not supposed to be.
To belabor the point, remember that interlaced video is a series of fields. Fields are paired together to make frames for purposes of storage and editing, but the fields are still separate pictures. When you shoot or view interlaced video, you’re capturing or seeing 50 or 60 pictures per second.
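The storage arrangement can be sketched in a few lines of Python. This is only an illustration: the function names are made up, rows are plain lists counted from zero, and calling the field that lands on rows 0, 2, 4... the "top" field is a labeling assumption on my part.

```python
def weave(top_field, bottom_field):
    """Interleave two equal-height fields into one stored frame."""
    frame = []
    for top_row, bottom_row in zip(top_field, bottom_field):
        frame.append(top_row)     # rows 0, 2, 4, ...
        frame.append(bottom_row)  # rows 1, 3, 5, ...
    return frame


def split_fields(frame):
    """Pull the two fields back out of a stored frame."""
    return frame[0::2], frame[1::2]


top = ["t0", "t1", "t2"]
bottom = ["b0", "b1", "b2"]

stored = weave(top, bottom)
print(stored)  # ['t0', 'b0', 't1', 'b1', 't2', 'b2']

recovered_top, recovered_bottom = split_fields(stored)
```

Weaving loses nothing, so splitting the stored frame gives back exactly the two pictures that went in.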
The standard in NTSC countries (North America, Japan, South Korea, Taiwan, the Philippines, parts of South America (including Brazil where the standard is confusingly called PAL-M), and a few other places) is 60 pictures (fields) per second. The rest of the world uses 50 fields per second. You will often see this referred to as 60i or 50i, or sometimes combined with the number of pixel rows per frame into a single designation such as 1080i60.
To be precise, NTSC runs at slightly less than 60 fields per second. For arcane technical reasons the introduction of color in the 1950s required slowing the field rate by a factor of 1000/1001, to about 59.9400599401 fields per second, which for obvious reasons is usually rounded to 59.94 or just plain 60.
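For the curious, the exact figure is easy to reproduce with Python’s `fractions` module. The variable names are mine, and dividing by two to get a frame rate just reflects the two-fields-per-frame pairing described earlier.

```python
from fractions import Fraction

nominal_field_rate = Fraction(60)
slowdown = Fraction(1000, 1001)

field_rate = nominal_field_rate * slowdown
frame_rate = field_rate / 2  # two fields per frame

print(field_rate)                   # 60000/1001
print(round(float(field_rate), 2))  # 59.94
print(frame_rate)                   # 30000/1001
print(round(float(frame_rate), 2))  # 29.97
```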
When you watch interlaced digital video, your television set needs to extract the two fields from each frame and show them to you one after the other — preferably in the correct order! If you watch interlaced video on a computer screen you might sometimes see it displayed as if it were progressive. That is, you see not a field at a time but a frame at a time, with the two fields interwoven. If there’s motion in the frame you get what looks like the teeth of a comb. To avoid that you need to use a video player designed to handle interlaced video, or else the video has to be “de-interlaced” or turned into progressive video. Doing a good job of that requires some fairly sophisticated software.
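To see where the comb teeth come from, here is a toy Python sketch. It draws a vertical bar as text, moves it sideways between two fields, and then shows the two fields interwoven as a single frame. Everything in it (the names, the eight-character width, the bar positions) is made up for illustration.

```python
def draw_field(bar_x, width=8, rows=3):
    """One field: `rows` identical scan lines with a 2-pixel bar at bar_x."""
    line = "".join("#" if bar_x <= x < bar_x + 2 else "." for x in range(width))
    return [line] * rows


def weave(first_field, second_field):
    """Show both fields at once, alternating their scan lines."""
    frame = []
    for a, b in zip(first_field, second_field):
        frame.extend([a, b])
    return frame


earlier = draw_field(bar_x=1)  # bar position when the first field was shot
later = draw_field(bar_x=4)    # bar has moved by the time of the second field

for row in weave(earlier, later):
    print(row)
# .##.....
# ....##..
# .##.....
# ....##..
# .##.....
# ....##..
```

Shown one field at a time, the bar simply moves; shown woven together, its edge breaks into alternating teeth.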
(Material originally shot on film or non-interlaced video at 24 fps is sometimes converted to 60i, a process referred to for historical reasons as “telecine.” Some smart players, both hardware and software, can “de-telecine” this 60i video back to 24p.)
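I haven’t spelled out how 24 frames become 60 fields, but one common approach is a repeating 3:2 cadence: the first film frame supplies three fields, the next supplies two, and so on, so that four film frames fill ten fields. A minimal Python sketch of that idea follows. The function name is hypothetical, and the sketch ignores which field is upper and which is lower, something a real conversion has to keep track of.

```python
def pulldown_fields(film_frames, cadence=(3, 2)):
    """Repeat each film frame as fields, following a repeating cadence."""
    fields = []
    for index, frame in enumerate(film_frames):
        repeats = cadence[index % len(cadence)]
        fields.extend([frame] * repeats)
    return fields


fields = pulldown_fields(["A", "B", "C", "D"])
print(fields)
# ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'D', 'D']

# Pair successive fields into video frames, as the storage format does.
video_frames = [tuple(fields[i:i + 2]) for i in range(0, len(fields), 2)]
print(video_frames)
# [('A', 'A'), ('A', 'B'), ('B', 'C'), ('C', 'C'), ('D', 'D')]
```

Two of the five video frames mix fields from different film frames, which is why de-telecine has to detect the cadence rather than simply pairing fields back up.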
As explained in more detail in the section on analog interlacing, it’s not true (as some descriptions of interlacing suggest) that the scan lines of a field have gaps between them, like looking through a Venetian blind. In fact, the 540 rows are expanded into 1080 for viewing, either by doubling or (ideally) by interpolating new rows to fill what would otherwise be gaps.
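Both ways of expanding a field can be sketched in Python. This is a simplified illustration, not how any particular television does it: the names are my own, each row is a list of brightness values, and the sketch ignores the half-row vertical offset between successive fields.

```python
def line_double(field):
    """Expand a field to full height by showing every row twice."""
    frame = []
    for row in field:
        frame.append(list(row))
        frame.append(list(row))
    return frame


def interpolate(field):
    """Expand a field by inserting the average of each row and the next."""
    frame = []
    for i, row in enumerate(field):
        frame.append(list(row))
        # The last row has no neighbor below, so it is averaged with itself.
        below = field[i + 1] if i + 1 < len(field) else row
        frame.append([(a + b) / 2 for a, b in zip(row, below)])
    return frame


field = [[0, 0], [100, 100], [50, 50]]

print(line_double(field))
# [[0, 0], [0, 0], [100, 100], [100, 100], [50, 50], [50, 50]]

print(interpolate(field))
# [[0, 0], [50.0, 50.0], [100, 100], [75.0, 75.0], [50, 50], [50.0, 50.0]]
```

Either way the output has twice as many rows as the field and no gaps between them.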
Roughly the same thing happens when shooting interlaced video with a digital camera: The imager captures a full 1080p picture, then each pair of rows is averaged together to reduce it down to 540 scan lines.
If you always paired the same two rows with a given scan line, you wouldn’t have interlacing, just video with half the vertical resolution. So instead, successive fields alternate how the pixel rows are combined. That is, in one field, the rows are paired odd-even and in the next field even-odd. (What about a row left over at the top or bottom? Typically there are some extra rows on imaging chips so nobody is left without a dance partner. Or the unpaired row can just stand by itself; it’s not likely you’d notice. With many televisions you don’t see the full raster anyway, which you can blame on those dang historical reasons that keep cropping up.)
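The alternating pairing can be sketched like this in Python. The function name and the `offset` parameter are my own, the "picture" is a single column of brightness values to keep the output readable, and the sketch handles leftover rows by simply dropping them, which is only one of the options mentioned above.

```python
def capture_field(progressive, offset):
    """Average adjacent row pairs, starting the pairing at row `offset`.

    offset=0 pairs rows (0,1), (2,3), ...; offset=1 pairs (1,2), (3,4), ...
    Rows left without a partner are dropped in this sketch.
    """
    field = []
    for top in range(offset, len(progressive) - 1, 2):
        upper, lower = progressive[top], progressive[top + 1]
        field.append([(a + b) / 2 for a, b in zip(upper, lower)])
    return field


picture = [[0], [10], [20], [30], [40], [50]]

first_field = capture_field(picture, offset=0)
second_field = capture_field(picture, offset=1)

print(first_field)   # [[5.0], [25.0], [45.0]]
print(second_field)  # [[15.0], [35.0]]
```

Because every interior row contributes to a scan line in both fields, a one-row-tall bright detail shows up in each of them rather than blinking on and off.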
That’s basically it. There are still complications, of course, many of them having to do with the way compression works, but that, as they say, is beyond the scope of this discussion.
The main thing to know about interlacing is not to use it unless you have a very good reason. As a rule DSLRs don’t even give you the option of shooting interlaced. For most purposes 24p is the best choice since most movies are still shot that way (whether on film or digitally) and pretty much any video delivery medium from broadcast to disk to Internet streaming is capable of handling 24p.
On the other hand, some video cameras (especially older ones) force you to shoot interlaced, and old video may come to you in interlaced form. Though there are exceptions (mainly in broadcast television), you will usually want to de-interlace interlaced video, not least because it looks awful when viewed on the Internet. If the material was originally shot on film and converted to interlaced video you may alternatively need to de-telecine. Doing a good job of either requires decent software. A good free option is the open-source Handbrake (link), which features both de-telecine and a smart de-interlacing filter called “decomb.” For high-end de-interlacing I’ve heard good things about FieldsKit from RE:Vision (link) but never used it myself.
(In an earlier version of this post I wrote “VLC” when I meant “Handbrake.” Sorry about that. They’re both free and both worth getting, since they have different but overlapping purposes. Consider making a donation to the developers to encourage them to keep updating the products.)
I hope this has been useful for somebody, and as always I welcome comments and corrections telling me how wrong I am. (I’ve already revised this post multiple times to correct various misstatements I managed to catch myself on second and third reading, not to mention lots of typos. I hope I haven’t left in anything too serious.)