The Nyquist Frequency
Aliasing
Sampling converts a continuous analog signal into a discrete digital signal. Intuitively, if the sample rate is too low, then the digital signal will not be a good representation of the analog signal. If a sound wave is oscillating at 10,000 Hz, then sampling at 5,000 Hz would be insufficient since the samples would completely miss half the cycles of the analog signal. With a sample rate of 5,000 Hz, it is impossible to record frequencies greater than or equal to 5,000 Hz. Those frequencies will appear as noise in the audio recording.
However, there is another issue that occurs when sampling sound frequencies that are lower than the sample rate. Consider the waves shown in Module 1. The blue wave has frequency 4.5 Hz, the red wave has frequency 1.5 Hz, and the sample rate is 6.0 Hz. The samples line up perfectly where the blue and red waves intersect. This means that both waves appear the same after sampling. This phenomenon is known as aliasing.
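The coincidence of the samples can be checked numerically. The sketch below is only an illustration: it assumes both waves are zero-phase cosines with unit amplitude (the phase used in Module 1 is not stated here), and the helper name `sample_cosine` is made up for this example.

```python
import math

def sample_cosine(frequency_hz, sample_rate_hz, num_samples):
    """Return samples of cos(2*pi*f*t) taken at t = n / sample_rate_hz."""
    return [
        math.cos(2 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(num_samples)
    ]

SAMPLE_RATE_HZ = 6.0
blue = sample_cosine(4.5, SAMPLE_RATE_HZ, 12)  # higher-frequency wave
red = sample_cosine(1.5, SAMPLE_RATE_HZ, 12)   # lower-frequency wave

# Compare sample by sample, allowing for floating-point rounding.
samples_match = all(
    math.isclose(b, r, abs_tol=1e-9) for b, r in zip(blue, red)
)
print(samples_match)  # True
```

Once sampled, nothing in the data distinguishes the 4.5 Hz wave from the 1.5 Hz wave.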
The practical effect of aliasing is that high frequencies come through as low frequencies in the sampled signal. These aliased frequencies essentially become noise in the audio. The frequency beyond which aliasing occurs is called the Nyquist frequency, which is equal to half of the sample rate. It is also called the folding frequency since the higher frequencies "fold" into the lower frequencies as noise. In Module 1, the Nyquist frequency is 3.0 Hz, half of the sample rate 6.0 Hz.
One interesting property of aliasing is that the specific frequencies that are aliases of each other are symmetric about the Nyquist frequency. For the waves in Module 1, the frequencies of 1.5 Hz and 4.5 Hz are symmetric about the Nyquist frequency 3.0 Hz.
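The folding behavior can be written as a small helper. This is a sketch under the assumption that the signal is a real-valued sinusoid, so only the magnitude of the apparent frequency matters; the function names `nyquist_frequency` and `apparent_frequency` are hypothetical.

```python
def nyquist_frequency(sample_rate_hz):
    """Half of the sample rate."""
    return sample_rate_hz / 2

def apparent_frequency(frequency_hz, sample_rate_hz):
    """Frequency between 0 and the Nyquist frequency that a real
    sinusoid at frequency_hz appears as after sampling."""
    wrapped = frequency_hz % sample_rate_hz
    # Anything above the Nyquist frequency folds back below it.
    return min(wrapped, sample_rate_hz - wrapped)

nyquist = nyquist_frequency(6.0)
print(nyquist)                        # 3.0
print(apparent_frequency(4.5, 6.0))   # 1.5
print(apparent_frequency(1.5, 6.0))   # 1.5
# Both aliases sit the same distance from the Nyquist frequency.
print(4.5 - nyquist == nyquist - 1.5)  # True
```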
Negative Frequencies
Two waves with the same frequency may or may not alias each other depending on the direction that the wave is traveling. The "direction" a wave is traveling is based on the sign of the amplitude, positive or negative. It turns out that a wave with a negative amplitude is the same as a wave with a positive amplitude and a negative frequency because of the following formula.
$-A \sin(2 \pi f t) = A \sin(2 \pi (-f) t)$
Module 2 shows how a negative-frequency wave aliases to multiple positive-frequency waves.
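A quick numerical check of both claims follows. The frequencies used here (-1.5 Hz, 4.5 Hz, and 10.5 Hz at a 6.0 Hz sample rate) are assumptions chosen for illustration and are not necessarily the values used in Module 2. The helper names are also hypothetical.

```python
import math

def sample_sine(amplitude, frequency_hz, sample_rate_hz, num_samples):
    """Return samples of A*sin(2*pi*f*t) taken at t = n / sample_rate_hz."""
    return [
        amplitude * math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(num_samples)
    ]

def same_samples(a, b):
    """True if two sample lists agree up to floating-point rounding."""
    return all(math.isclose(x, y, abs_tol=1e-9) for x, y in zip(a, b))

SAMPLE_RATE_HZ = 6.0
negative = sample_sine(1.0, -1.5, SAMPLE_RATE_HZ, 12)

# A negative frequency is equivalent to a negative amplitude.
print(same_samples(negative, sample_sine(-1.0, 1.5, SAMPLE_RATE_HZ, 12)))  # True

# The same positive frequency with positive amplitude does NOT match.
print(same_samples(negative, sample_sine(1.0, 1.5, SAMPLE_RATE_HZ, 12)))   # False

# The -1.5 Hz wave aliases to several positive frequencies.
print(same_samples(negative, sample_sine(1.0, 4.5, SAMPLE_RATE_HZ, 12)))   # True
print(same_samples(negative, sample_sine(1.0, 10.5, SAMPLE_RATE_HZ, 12)))  # True
```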
Intuition Behind the Nyquist Frequency
Let's say we have an analog clock with an "hour" hand that is pointing at 12 o'clock, which we will call 0 for simplicity. In one rotation around the clock (in 12 hours), we "sample" once per hour. In one period, we collect 12 samples.
Now let's say there are two periodic events that occur: eating every 5 hours and sleeping every 17 hours.
The eating times and sleeping times are exactly the same. Eating and sleeping are aliases of each other. Let's add another periodic event, gaming, that occurs every 7 hours.
The samples are the same, but in reverse. If we had sampled traveling backwards, the samples would have been the same. When traveling backwards, the wave can be described as having a negative frequency.
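The clock example can be reproduced with modular arithmetic. In this sketch every event starts at position 0 and is tracked for 12 occurrences. Both choices are assumptions made for illustration, as is the helper name `clock_positions`.

```python
CLOCK_POSITIONS = 12  # one sample per hour mark on the clock face

def clock_positions(hours_per_event, num_events):
    """Clock-face position (0 to 11) of each occurrence, starting at 0."""
    return [(hours_per_event * k) % CLOCK_POSITIONS for k in range(num_events)]

eating = clock_positions(5, 12)
sleeping = clock_positions(17, 12)
gaming = clock_positions(7, 12)
eating_backwards = clock_positions(-5, 12)  # 5 hours per step, counter-clockwise

print(eating)                      # [0, 5, 10, 3, 8, 1, 6, 11, 4, 9, 2, 7]
print(eating == sleeping)          # True
print(gaming)                      # [0, 7, 2, 9, 4, 11, 6, 1, 8, 3, 10, 5]
print(gaming == eating_backwards)  # True
```

Stepping forward 7 hours lands on the same positions as stepping backward 5 hours, which is why gaming looks like eating played in reverse.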
Under what conditions does aliasing occur? Our three aliases so far have frequencies of -7 Hz, 5 Hz, and 17 Hz. Two frequencies $f_1$ and $f_2$ are aliases of each other at sample rate $f_s$ if the following equation is true.
$f_1 \bmod f_s = f_2 \bmod f_s$
In this equation, "mod" refers to modulus, which means the remainder after dividing. It is sometimes represented as an operator with the percent symbol %. All pairs of frequencies so far follow this property.
This means that the negative frequencies of -6, -7, -8, -9, -10, and -11 alias to the positive frequencies of 6, 5, 4, 3, 2, and 1. Therefore, the frequencies that can be unambiguously captured are 0, 1, 2, 3, 4, and 5. Only the frequencies less than the Nyquist frequency, which is 6 here (half of the 12 samples per rotation), can be unambiguously captured.
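The modulus test is easy to try out. The sketch below assumes the alias condition is that both frequencies leave the same remainder when divided by the sample rate (12 in the clock example) and that frequencies are integers. The function name `are_aliases` is hypothetical. Note that Python's `%` returns a non-negative result for a positive divisor, which is the behavior needed here. Some other languages return a negative remainder for negative inputs.

```python
SAMPLE_RATE = 12  # samples per rotation in the clock example

def are_aliases(f1, f2, sample_rate=SAMPLE_RATE):
    """True if f1 and f2 leave the same remainder modulo the sample rate."""
    return f1 % sample_rate == f2 % sample_rate

print(are_aliases(-7, 5))   # True
print(are_aliases(5, 17))   # True
print(are_aliases(-7, 17))  # True
print(are_aliases(5, 7))    # False

# Each negative frequency and the positive frequency it aliases to.
negative_to_positive = {f: f % SAMPLE_RATE for f in range(-6, -12, -1)}
print(negative_to_positive)
# {-6: 6, -7: 5, -8: 4, -9: 3, -10: 2, -11: 1}
```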
The Nyquist-Shannon Sampling Theorem
The Nyquist-Shannon sampling theorem states the following:
If a function $x(t)$ contains no frequencies higher than $B$ hertz, then it can be completely determined from its ordinates at a sequence of points spaced less than $1/(2B)$ seconds apart.
The Nyquist frequency is a direct consequence of this theorem. The proof of the theorem in the general case relies on the fact that $x(t)$ can be decomposed into a summation of sines and cosines involving complex numbers $a + bi$, where $i = \sqrt{-1}$. Another way to describe the theorem is that it defines the number of samples required to interpolate a sum of sine and cosine waves of different frequencies.
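One way to see the theorem at work is to rebuild a signal between its samples. The sketch below uses the Whittaker-Shannon (sinc) interpolation formula, which is not described in this article and is brought in here only as an illustration. The example signal, the evaluation time, and the truncation width `half_width` are all assumptions. Because the infinite sum is truncated, the result is only approximate.

```python
import math

def sinc(u):
    """Normalized sinc: sin(pi*u) / (pi*u), with sinc(0) = 1."""
    if u == 0:
        return 1.0
    return math.sin(math.pi * u) / (math.pi * u)

def signal(t):
    """Example signal whose highest frequency is 2 Hz."""
    return math.sin(2 * math.pi * 1.5 * t) + 0.5 * math.cos(2 * math.pi * 2.0 * t)

def reconstruct(t, sample_rate_hz, half_width=2000):
    """Approximate signal(t) from its samples with a truncated sinc sum."""
    return sum(
        signal(n / sample_rate_hz) * sinc(sample_rate_hz * t - n)
        for n in range(-half_width, half_width + 1)
    )

BANDWIDTH_HZ = 2.0
SAMPLE_RATE_HZ = 6.0  # spacing of 1/6 s is less than 1/(2*B) = 1/4 s

t = 0.123  # a time that falls between two sample points
error = abs(reconstruct(t, SAMPLE_RATE_HZ) - signal(t))
print(error < 1e-2)  # True
```

The value at `t` is recovered from the samples alone, even though no sample was taken at that time.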
Choosing a Sample Rate
There are two major audio distortions that occur as the sample rate decreases. One distortion is that higher frequencies are no longer present, i.e. the audio becomes lower-pitched overall. The other distortion is that those higher frequencies are "folded" into the lower frequencies, resulting in more noise. Module 3 shows the effects of decreasing the sample rate on audio from a song.
The typical human can hear sound frequencies between 20 Hz and 20,000 Hz. (The upper limit is probably closer to 18,000 Hz in reality, but "20 to 20" is easier to remember.) Since most modern microphones have a sample rate of 48,000 Hz, the Nyquist frequency is 24,000 Hz, which is above the maximum 20,000 Hz that humans can hear.
However, for real-world audio applications the sample rate is chosen to cut corners as much as possible without affecting the user experience. Processing audio can be computationally expensive, especially when using machine learning in real-time applications. The following table shows various applications and the sample rates used for each one.
Application | Sample Rate | Reason |
---|---|---|
Landline Phone Calls | 8,000 Hz | Human voice almost never exceeds 4,000 Hz |
Voice Activity Detection | 8,000 Hz | Accuracy doesn't improve with higher sample rates |
Speech-to-Text Transcription | 16,000 Hz | Accuracy doesn't improve with higher sample rates |
Bose QC 45 Microphones | 16,000 Hz | Bluetooth effectively limits the bit rate of microphone audio |
Compact Disc Audio | 44,100 Hz | Rate was chosen to fit the PAL and NTSC video equipment originally used to store digital audio |
Most Modern Microphones | 48,000 Hz | Allow resampling to lower rates for various reasons |
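To relate the table back to the Nyquist frequency, the short sketch below computes the highest frequency each sample rate can capture and compares it with the 20,000 Hz hearing limit mentioned earlier. The dictionary simply restates the table. The variable names are made up for this example.

```python
HUMAN_HEARING_LIMIT_HZ = 20_000

sample_rates_hz = {
    "Landline Phone Calls": 8_000,
    "Voice Activity Detection": 8_000,
    "Speech-to-Text Transcription": 16_000,
    "Bose QC 45 Microphones": 16_000,
    "Compact Disc Audio": 44_100,
    "Most Modern Microphones": 48_000,
}

# The Nyquist frequency is half of each sample rate.
nyquist_hz = {app: rate / 2 for app, rate in sample_rates_hz.items()}

for app, nyquist in nyquist_hz.items():
    covers_hearing = nyquist >= HUMAN_HEARING_LIMIT_HZ
    print(f"{app}: Nyquist {nyquist:.0f} Hz, covers full hearing range: {covers_hearing}")
```

Only the two highest rates keep the whole audible range. The others trade high frequencies for cheaper processing.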