How JPEG and MPEG picture compression algorithms work

While their names differ only in the first letter, JPEG and MPEG are the names of two separate standards groups.  The Joint Photographic Experts Group (JPEG) created the JPEG image file format (as well as JPEG 2000, a different format), while the Moving Picture Experts Group (MPEG) has standardised a large number of audio, video, container and stream formats for the transmission of digital audio and video, which are grouped into generations (MPEG 1, MPEG 2, MPEG 4).  For instance, the popular MP3 audio file format is a part of MPEG 1 (MPEG 1 audio layer 3), DVDs and digital television usually use standards from MPEG 2, and the popular “MPEG 4” or “DivX” video format is standardised as part of MPEG 4 (MPEG 4 part 2).

All of the formats, including JPEG images and all types of MPEG videos, work in a very similar way, which I’ll call block based DCT image compression.  This involves chopping up the picture into blocks, 8 pixels by 8 pixels, and compressing each block separately using a mathematical transform called a Discrete Cosine Transform (DCT).

Here are some of the steps involved in JPEG and MPEG image compression:

Colour space conversion and resampling

If necessary, the image is first converted into the correct colour space and resolution.  This involves separating the colour image into 3 separate images or ‘channels’: a luma channel which carries the brightness information (like a black and white image), and two chroma channels which carry the colour information.  Furthermore, the luma channel is usually given double or quadruple the resolution of the chroma channels.  This saves some storage space already, and works because humans are much better at seeing fine changes in lightness than in colour.  That is why we prefer reading light text on a dark background, or vice versa, to red text on green or similar.
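Here is a minimal sketch of this step in Python.  It assumes the JFIF-style YCbCr weights that JPEG files commonly use, and it halves the chroma resolution by averaging each 2×2 group of samples.  The function names are my own, and real encoders may use other weights or filters.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to (Y, Cb, Cr) using JFIF-style weights."""
    y = 0.299 * r + 0.587 * g + 0.114 * b                    # luma (brightness)
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b         # blue-difference chroma
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b         # red-difference chroma
    return y, cb, cr


def subsample_2x2(channel):
    """Halve a channel in both directions by averaging each 2x2 group.

    Assumes the channel (a list of rows) has even width and height.
    """
    height, width = len(channel), len(channel[0])
    return [
        [
            (channel[y][x] + channel[y][x + 1]
             + channel[y + 1][x] + channel[y + 1][x + 1]) / 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]
```

Applied to the two chroma channels only, subsample_2x2 leaves the luma channel with double the resolution in each direction, which is the ‘quadruple’ case mentioned above.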

Separate into 8×8 blocks and perform Discrete Cosine Transform

Each channel of the image is then divided into blocks of 8 pixels by 8 pixels (64 pixels per block).  A mathematical transform called a Discrete Cosine Transform (DCT) is then applied to each block.

Before the discrete cosine transform, you have 64 values, each representing a pixel in the block.  After the transform, you have 64 values, each representing a particular frequency of any detail in that block.  It is like taking a piece of audio and constructing a spectrum analysis, where you get a frequency vs amplitude curve, except the curve is more of a surface, because it’s in two dimensions.  The very first resulting value represents a frequency of ‘zero’, i.e. it is proportional to the average value of all of those original 64 pixels.  Together, the 64 values are coefficients: values which you can feed back into the inverse DCT function to get the original pixel values.
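To make the transform concrete, here is a deliberately slow, textbook-style sketch of a two-dimensional DCT on one 8×8 block.  It assumes the orthonormal scaling of the DCT-II.  Real encoders use much faster factorised versions, and the name dct_2d is just my own.

```python
import math

N = 8  # block size


def dct_2d(block):
    """Forward 2-D DCT-II of an 8x8 block (a list of 8 rows of 8 numbers)."""

    def scale(k):
        # Orthonormal scaling: the zero-frequency term gets a smaller weight.
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency index
        for v in range(N):      # horizontal frequency index
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (
                        block[x][y]
                        * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                        * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                    )
            out[u][v] = scale(u) * scale(v) * total
    return out
```

With this scaling, a completely flat block where every pixel is 100 gives a first coefficient of 800 (8 times the average) and zero, to within floating-point error, everywhere else: there is no detail, so there are no higher frequencies.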

Storing these coefficients is no more efficient than storing the original pixel values.  In fact, due to rounding error you need greater precision to store them than to store the pixel values.  How, then, do we achieve any compression?  Well, for that, you need to throw away some data.

Quantisation

Quantisation is the process of reducing the precision of a number so that it is less accurate, but takes fewer digits (that is, bits) to write, or store.  It is, quite simply, dividing a number by a certain value, and throwing away the remainder.  When you multiply by that value again in an attempt to get the original number, chances are that the result is close to, but slightly different to, the number you started with.  The larger the value you divide by, the greater the strength of the quantisation and hence the greater the likely loss in precision once you multiply the number again.
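As a small sketch of the idea, the two functions below divide a number and throw away the remainder, then multiply it back.  The names are my own.  I’ve truncated towards zero to match the description above, whereas real encoders generally round to the nearest whole number instead.

```python
def quantise(value, step):
    """Divide by the step size and throw away the remainder (truncate towards zero)."""
    return int(value / step)


def dequantise(level, step):
    """Multiply back by the step size; the discarded remainder is gone for good."""
    return level * step
```

For example, quantise(157, 16) gives 9, and dequantise(9, 16) gives 144 rather than the original 157.  A small value such as -3 quantises to 0 with the same step, so it disappears entirely.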

So, the previous compression step left us with 64 coefficients arranged in an 8×8 matrix, representing the values of different horizontal and vertical frequencies within that 8×8 block.  Now, we apply quantisation to these coefficients, reducing their precision by dividing them by a specific number and throwing away the remainders.  When we save a JPEG image or MPEG movie we can usually set either a quality slider or quantisation factor – these just vary the number we are dividing by and hence the amount of precision we want to retain.

JPEG and MPEG compression can also apply different amounts of quantisation to the different frequency coefficients within a single block, or different amounts to different blocks in an image.  When this is done, it is all in an attempt to get the best looking picture, as perceived by human eyes, for a given amount of quantisation.
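One way to picture per-frequency quantisation is a table of 64 divisors, one for each coefficient.  The sketch below builds a made-up table whose divisors simply grow with frequency.  It is not one of the tables from the JPEG or MPEG standards, and the base and slope parameters are purely illustrative.

```python
def make_table(base=8, slope=4):
    """Hypothetical 8x8 table: the divisor grows with the frequency index u + v."""
    return [[base + slope * (u + v) for v in range(8)] for u in range(8)]


def quantise_block(coeffs, table):
    """Divide each coefficient by its own table entry, discarding the remainder."""
    return [
        [int(c / q) for c, q in zip(coeff_row, table_row)]
        for coeff_row, table_row in zip(coeffs, table)
    ]
```

With the default parameters, the lowest frequency is divided by 8 and the highest by 64, so fine detail is quantised much more heavily than the overall brightness of the block.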

Encoding the coefficients

Depending on the amount of quantisation applied, some or most of the smallest coefficients will have been reduced to zero: they were small and insignificant to start with, so dividing them gave a value less than 1, which was discarded.  It’s likely that a typical block will only have a few non-zero coefficients remaining after quantisation.

The coefficients, which were laid out in an 8×8 matrix, are now picked out in a specific order for encoding: usually a “zigzag pattern”, chosen so that the smallest coefficients will likely come last; any zero coefficients at the end are not encoded at all.  The remaining coefficients undergo some form of lossless coding, which minimises the bits they take up.  The type of coding chosen varies between sub-types of MPEG video, but the general idea is that each coefficient is represented using a variable number of bits, so that smaller ones take less space.
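The sketch below shows one way to generate a zigzag order, read a block out in that order, and drop the run of zeros at the end.  The last function is only a rough illustration of why small values are cheap: it counts the bits needed for a coefficient’s magnitude, and is not the actual coding scheme of any particular format.  All the names are my own.

```python
def zigzag_order(n=8):
    """Return (row, col) pairs walking the anti-diagonals in alternating directions."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Sort by diagonal first; odd diagonals run top to bottom, even ones bottom to top.
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
    )


def scan_and_trim(block):
    """Read a square block in zigzag order and drop the trailing zeros."""
    sequence = [block[r][c] for r, c in zigzag_order(len(block))]
    while sequence and sequence[-1] == 0:
        sequence.pop()
    return sequence


def magnitude_bits(value):
    """Number of bits needed to write the magnitude of a coefficient."""
    return abs(value).bit_length()
```

A block whose only non-zero values are 50 in the top-left corner, -3 to its right and 2 below it comes out as just [50, -3, 2]; the other 61 positions are never written.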

Motion

The process so far is simplified, but basically true for still images, and for the still component (keyframes or I-frames) of MPEG video.  MPEG video encoding can also involve motion estimation, where a block can be stored more compactly by only encoding the difference between it and a nearby area of some other frame.  Encoding is still done per-block, and the blocks still undergo DCT and quantisation, but the selection of a reference point in another frame can be somewhat complex.
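To give a feel for motion estimation, here is a brute-force sketch.  For one block of the current frame it tries every offset within a small window of a reference frame and keeps the one with the smallest sum of absolute differences.  This exhaustive search and the function names are my own simplification; real encoders use cleverer search strategies and cost measures.

```python
def get_block(frame, top, left, size):
    """Cut a size x size block out of a frame (a list of rows)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def best_match(cur_frame, ref_frame, top, left, size=8, search=4):
    """Return (cost, dy, dx) for the best-matching reference block near (top, left)."""
    target = get_block(cur_frame, top, left, size)
    height, width = len(ref_frame), len(ref_frame[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + size > height or l + size > width:
                continue  # candidate block would fall outside the reference frame
            cost = sad(target, get_block(ref_frame, t, l, size))
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best


def residual(cur_block, ref_block):
    """The difference block that would then go through DCT and quantisation."""
    return [
        [c - r for c, r in zip(cur_row, ref_row)]
        for cur_row, ref_row in zip(cur_block, ref_block)
    ]
```

If the match is good, the residual is mostly zeros or small numbers, which quantise and encode into far fewer bits than the block itself would.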

Decoding the image

Decoding is the reverse of the above process: the stream is decoded to find the coefficients for each block, and these coefficients are multiplied by the same number they were previously divided by, in order to reverse the quantisation.  The quantisation process did involve some precision loss, which shows up at this point as small differences from the original coefficients.  Then, an inverse DCT is applied to get pixel values for the block.
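The decoding side can be sketched the same way.  The code below assumes a single quantisation step for the whole block and the orthonormal DCT scaling, and it leaves out details such as the level shift that JPEG applies to pixel values.  The forward transform is included only so that the round trip can be checked, and all the names are my own.

```python
import math

N = 8  # block size


def _basis(x, u):
    """One orthonormal DCT basis value for sample position x and frequency u."""
    scale = math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)
    return scale * math.cos((2 * x + 1) * u * math.pi / (2 * N))


def dct_2d(block):
    """Forward 2-D DCT: pixels in, coefficients out."""
    return [
        [
            sum(block[x][y] * _basis(x, u) * _basis(y, v)
                for x in range(N) for y in range(N))
            for v in range(N)
        ]
        for u in range(N)
    ]


def idct_2d(coeffs):
    """Inverse 2-D DCT: coefficients in, pixels out."""
    return [
        [
            sum(coeffs[u][v] * _basis(x, u) * _basis(y, v)
                for u in range(N) for v in range(N))
            for y in range(N)
        ]
        for x in range(N)
    ]


def decode_block(levels, step):
    """Reverse the quantisation, apply the inverse DCT, and clamp to 0..255."""
    coeffs = [[level * step for level in row] for row in levels]
    pixels = idct_2d(coeffs)
    return [[min(255, max(0, round(p))) for p in row] for row in pixels]
```

Without quantisation, idct_2d(dct_2d(block)) returns the original pixels to within floating-point error.  All of the loss comes from the divide and multiply in between.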

Finally, if it is a colour image and the two chroma channels used a lower resolution than the luma channel, these are now up-sampled – by interpolation – to match the luma channel, and then the image can be converted into a suitable colour space for viewing, such as RGB (when displaying on a monitor).
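A minimal sketch of this last step is below.  For simplicity it up-samples by repeating each chroma sample, which is the crudest form of interpolation; real decoders usually use something smoother.  It then converts back to RGB with JFIF-style weights.  Both the function names and the choice of weights are my assumptions.

```python
def upsample_2x(channel):
    """Double a channel's width and height by repeating each sample."""
    out = []
    for row in channel:
        wide = [value for value in row for _ in range(2)]  # repeat horizontally
        out.append(wide)
        out.append(list(wide))                             # repeat vertically
    return out


def ycbcr_to_rgb(y, cb, cr):
    """Convert one (Y, Cb, Cr) sample back to 8-bit RGB using JFIF-style weights."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    # Round and clamp, since lossy decoding can push values slightly out of range.
    return tuple(min(255, max(0, round(v))) for v in (r, g, b))
```

A sample with neutral chroma (Cb and Cr both 128) comes back as a grey whose red, green and blue values all equal the luma value.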

More on what’s good about HD

Wired has a blog post called DVD Doomed? The question mark at the end of the title signals the author’s skepticism.

The key sentence, in my opinion, is:

Unlike going from videotape to disc or vinyl to CD, the DVD to hi-def migration isn’t compelling enough to get consumers to re-buy movies they already own.

I have blogged the same thing before. Going from 576 lines of resolution to 1080 lines of resolution is not going to let me see much more acne on an actor’s face in the middle of a gunfight, and the sound on a DVD was already capable of reproducing what was played in the cinema.

The biggest benefit of Blu-ray and HD-DVD, more so with Blu-ray than HD-DVD, is the increase in storage capacity for those using the discs to back up software and share video. You can do so much more with 25GB per layer than with 4.7GB. And I think that players are going to be used increasingly for burning and reading burned media, while playing store-bought videos will decrease as a percentage of use.

The format wars are less of a big deal with hybrid drives. With DVD+R and DVD-R (the previous format war), people don’t even have to care which type they buy anymore because almost any drive can burn or read both. With Blu-ray and HD-DVD the technical details are a little different, but nonetheless hybrid drives are coming out which could make the choice of which format to buy more flexible.

I am skeptical about the extent to which this war will be fought over releases from movie studios – I think the burn-your-own market is going to prove more important in the future.

I also don’t think that any slowing in sales of store-bought DVD videos can be attributed to the ‘success’ of Blu-ray or HD-DVD. If anything, it is partly due to confusion over formats. People don’t want to commit to any technology until they know they have backed the right horse. And it is probably partially just an overall decline in people buying videos, as more and more people watch or purchase videos online or share them on burned discs.

Why HD

One of the most important questions seems to be overshadowed by the media hype surrounding HD formats such as Blu-ray and HD-DVD:

Will it help me enjoy my movie more?

Certainly, throw away your VCRs and buy yourselves DVD players. Apart from the fact you can’t get movies on VHS anymore, DVDs are more reliable, there’s less annoying hiss from the audio and there’s none of the annoying snow on the screen. And you won’t ever get tape stretching or unthreading itself inside your machine.

DVD doesn’t have these problems. And DVD players are dirt cheap. What problems does DVD have that require us to buy Blu-ray? A slightly lower screen resolution? Is there anything in a movie frame that I could see in 720 or 1080 lines that I couldn’t see in 576, that will help me enjoy the movie more?