Linear Predictive Coding (LPC)

Linear Predictive Coding (LPC) predicts the value of an audio sample based on the values of previous samples. Because consecutive samples in audio waveforms are often correlated, the prediction is usually close to the actual value, resulting in smaller prediction errors, known as residuals. These residuals can then be compressed more efficiently than the original samples.

For example, let's say that we have the following signal $x[n]$.

$$x[n] = [125, 150, 155, 180, 160, 135, 110, 100]$$

How many bits do we need to encode the signal $x[n]$? Assuming we store every sample as a fixed-width signed integer, we need 9 bits per sample, because the largest value, 180, exceeds the 8-bit signed maximum of 127. There are 8 samples, so the total size of the encoding is $9 \times 8 = 72$ bits.

Now let's say that we have the following linear autoregressive predictor. (The "hat" on $\hat{x}$ indicates that it is a predicted value.)

$$\hat{x}_{k+1} = x_k$$

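Given our prediction equation, we can encode the signal with LPC in the following way. ($r$ stands for residuals.) Each residual is the actual sample minus its prediction, $r_k = x_k - \hat{x}_k$. The first residual is 0 because $x_0$ has no previous sample and is stored separately.

$$r[n] = [0, 25, 5, 25, -20, -25, -25, -10]$$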
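The encoding step and its inverse can be sketched in a few lines of Python. This is a minimal illustration of the predictor $\hat{x}_{k+1} = x_k$ only; the function names `encode` and `decode` are made up for this example and do not come from any library.

```python
def encode(samples):
    """Return (seed, residuals) for the predictor x_hat[k+1] = x[k]."""
    seed = samples[0]
    residuals = [0]  # r[0] is 0 by convention; the seed is stored separately
    for prev, cur in zip(samples, samples[1:]):
        residuals.append(cur - prev)  # actual value minus predicted value
    return seed, residuals


def decode(seed, residuals):
    """Rebuild the samples by adding each residual to the previous sample."""
    samples = [seed]
    for r in residuals[1:]:
        samples.append(samples[-1] + r)
    return samples


x = [125, 150, 155, 180, 160, 135, 110, 100]
seed, r = encode(x)
print(r)
print(decode(seed, r) == x)
```

Because the decoder applies the same predictor to samples it has already reconstructed, the round trip is exact and no information is lost.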

In our LPC encoding, the residuals fit in 6-bit signed integers, which cover the range $-32$ to $31$. We also need to store $x_0 = 125$ as a 9-bit signed integer so that the decoder can reconstruct the original signal. Therefore, the total size of the compressed LPC encoding is $9 + 8 \times 6 = 57$ bits.
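The bit counting can be checked with a short Python sketch. It assumes fixed-width two's-complement storage, with the seed stored at the same width as the raw samples; the helper `signed_bits` is a hypothetical name introduced for this example.

```python
def signed_bits(values):
    """Smallest two's-complement width that can hold every value."""
    bits = 1
    while not all(-(1 << (bits - 1)) <= v <= (1 << (bits - 1)) - 1 for v in values):
        bits += 1
    return bits


x = [125, 150, 155, 180, 160, 135, 110, 100]
r = [0, 25, 5, 25, -20, -25, -25, -10]

sample_width = signed_bits(x)    # width needed for the raw samples
residual_width = signed_bits(r)  # width needed for the residuals

raw_bits = sample_width * len(x)
# Assumption: the seed x[0] is stored at the raw sample width,
# followed by one residual per sample (including r[0] = 0).
lpc_bits = sample_width + residual_width * len(r)

print(sample_width, residual_width)
print(raw_bits, lpc_bits)
```

Dropping the always-zero first residual would save a further 6 bits, at the cost of treating the first sample as a special case.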

In more realistic uses of LPC coding, the signal will be much longer than 8 samples, so the savings are more dramatic. In practice, the prediction equation has more parameters, and these are estimated from the signal itself. This results in better predictions and smaller residuals, at the expense of storing the prediction parameters in the compressed format.
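As a rough illustration of estimating the parameters from the signal, the sketch below fits a second-order predictor $\hat{x}_n = a_1 x_{n-1} + a_2 x_{n-2}$ using the autocorrelation method and the Levinson-Durbin recursion. This is one common textbook approach, shown here as an assumption rather than as the method of any particular encoder; the function names are hypothetical.

```python
def autocorrelation(x, max_lag):
    """acf[lag] = sum over n of x[n] * x[n - lag]."""
    return [
        sum(x[n] * x[n - lag] for n in range(lag, len(x)))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(acf, order):
    """Solve the normal equations for a[1..order] in
    x_hat[n] = a[1]*x[n-1] + ... + a[order]*x[n-order].
    Assumes acf[0] > 0, i.e. the signal is not all zeros."""
    a = [0.0] * (order + 1)  # a[0] is unused padding
    err = float(acf[0])      # prediction error energy so far
    for i in range(1, order + 1):
        # Reflection coefficient for stepping from order i-1 to order i.
        k = (acf[i] - sum(a[j] * acf[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:]


x = [125, 150, 155, 180, 160, 135, 110, 100]
acf = autocorrelation(x, 2)
a1, a2 = levinson_durbin(acf, 2)
print(a1, a2)
```

The coefficients here are floating-point values. A lossless codec would also need to quantize them to integers so that the decoder reproduces exactly the same predictions; that step is omitted from this sketch.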

Frame Encoding

What happens if the LPC predictions are bad and the residuals aren't smaller than the original samples? To deal with this situation, the signal needs to be decomposed into frames that are independently compressed. This means that the effects of bad predictions are isolated to particular frames. There are four different frame types that can be used for compression. The best type is selected for each frame.

  • Constant Frame: Used when all samples have the same value. This is used when there is a period of complete silence where all samples have value zero.

  • Verbatim Frame: Used when no compression is possible, and the data is stored as-is. This is used when the samples are completely random.

  • Fixed Frame: Uses predefined coefficients for low-order prediction. This is akin to the prediction equation $\hat{x}_{k+1} = x_k$ used above, although the predefined coefficients are different in practice.

  • Custom Frame: Uses a custom predictive model for higher-order predictions. This is used when the samples follow a pattern that is not modeled nicely by the predefined coefficients.
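The per-frame selection can be sketched as a cost comparison. The Python example below assumes a deliberately simple cost model (fixed-width residuals, no header overhead) and uses repeated differencing as the fixed predictors, which corresponds to the polynomial predictors used by FLAC's fixed subframes. The custom frame type is left out because it requires coefficient estimation. All function names are hypothetical.

```python
def signed_bits(values):
    """Smallest two's-complement width that can hold every value."""
    bits = 1
    while not all(-(1 << (bits - 1)) <= v <= (1 << (bits - 1)) - 1 for v in values):
        bits += 1
    return bits


def fixed_residuals(frame, order):
    """Residuals of the order-th difference predictor.
    The first `order` samples are warm-up and produce no residual."""
    res = list(frame)
    for _ in range(order):
        res = [b - a for a, b in zip(res, res[1:])]
    return res


def choose_frame_type(frame, sample_width, max_order=4):
    """Return (frame_type, estimated_bits) under the toy cost model."""
    if all(s == frame[0] for s in frame):
        return "constant", sample_width  # store the single value once
    best = ("verbatim", sample_width * len(frame))
    for order in range(1, min(max_order, len(frame) - 1) + 1):
        res = fixed_residuals(frame, order)
        # Warm-up samples are stored raw, then one residual per remaining sample.
        cost = order * sample_width + signed_bits(res) * len(res)
        if cost < best[1]:
            best = (f"fixed(order={order})", cost)
    return best


print(choose_frame_type([0] * 8, 9))
print(choose_frame_type([125, 150, 155, 180, 160, 135, 110, 100], 9))
print(choose_frame_type([200, -200, 200, -200], 9))
```

A real encoder would use an entropy coder for the residuals rather than a fixed width, so the actual costs would differ, but the selection logic stays the same: estimate the size of each candidate and keep the smallest.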

FLAC Overview

FLAC (Free Lossless Audio Codec) is a lossless audio compression format that is largely based on LPC. Its compression efficiency depends on the audio content. Simple signals (like silence or pure tones) can shrink by roughly 50-70% of their original size, while more complex audio (like live music or noise-heavy recordings) typically shrinks by only 30-50%. Because FLAC stores the compressed data in self-contained frames, each frame can be decoded independently of the others, which makes FLAC well suited to streaming and real-time audio applications.

Copyright © 2024 Audio Internals