Digital Audio Resolution (April 2018)

In a recording studio, music starts off as continuous variations in sound pressure level (SPL) and is converted by microphones into electronic signals. Like the variations in SPL, the electronic signals are analogue in nature, varying continuously over time; they retain their analogue nature when ultimately captured on magnetic tape.

On the other hand, signals that have been digitized in the production of commercial CDs or in converting analogue recordings to digital are discrete (i.e., non-continuous) over time. To convert analogue (continuous) signals to digital (discrete), the analogue electronic signal is sampled at regular time intervals, and the strength of the signal when sampled is converted to a numeric value that comes as close to the analogue signal strength as the analogue-to-digital conversion process will allow. How close it comes depends on two factors: bit depth and sampling rate. Let’s explore what those mean.

Bit Depth

Figure 1 depicts one complete cycle of a single-frequency, sinusoidal sound wave, with strength, or amplitude, represented on the vertical axis and time on the horizontal axis. This will be our starting point.

Figure 1. One complete cycle of a single-frequency sound wave.

Let’s say we sample this wave using a bit depth of 4. (A bit is a binary digit, meaning the digit can take on one of two values—0 or 1.) “Sampling with a bit depth of 4” means that every time an analogue wave is sampled, the sample value will be one of 2 x 2 x 2 x 2 = 16 possible values, ranging from 0 to 15. The results appear in Figure 2, with sample values of the amplitude on the vertical axis and time on the horizontal axis. The graph looks like staircases where the digitized values roughly approximate the sound wave’s curve. (You would not hear a squared-off, staircase sound during playback; your Digital-to-Analogue Converter would recreate continuous waves from the samples.)

Figure 2. One complete cycle of a sound wave sampled with a bit depth of 4.
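The staircase in Figure 2 can be reproduced with a few lines of Python. This is only a rough sketch, not how any particular converter works: the function name `quantize_cycle`, the choice of 16 samples per cycle, and the mapping of the wave's range onto the codes 0 to 15 are all assumptions made for illustration.

```python
import math

def quantize_cycle(bit_depth, num_samples=16):
    """Sample one cycle of a sine wave and round each sample to an integer code."""
    levels = 2 ** bit_depth      # 16 possible values for a bit depth of 4
    max_code = levels - 1        # codes run from 0 to levels - 1
    codes = []
    for n in range(num_samples):
        amplitude = math.sin(2 * math.pi * n / num_samples)  # between -1.0 and 1.0
        scaled = (amplitude + 1) / 2 * max_code               # between 0 and max_code
        codes.append(round(scaled))                           # nearest available value
    return codes

codes = quantize_cycle(bit_depth=4)
print(codes)
```

Each printed number is one "stair": the closest of the 16 available values to the wave's amplitude at that instant.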

Can digitization do a better job? Let’s try ratcheting the bit depth up to 6, which yields 64 values (2 to the 6th power). As portrayed in Figure 3, the “staircase” shape is still observable, but the digitized values are closer to the analogue wave’s amplitude.

Figure 3. One complete cycle of a sound wave sampled with a bit depth of 6.

Finally, let’s increase the bit depth to 16, which happens to be the published standard for music CDs. Two to the 16th power permits 65,536 different values, and the results appear in Figure 4 (there are 500 possible values between each of the horizontal lines in the graph). As can be seen, the digital samples are exceedingly close to the sound wave’s values.

Figure 4. One complete cycle of a sound wave sampled with a bit depth of 16.
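To see how quickly the number of available values grows with bit depth, here is a small illustrative calculation. The helper names `levels_for` and `max_rounding_error` are made up for this sketch, and the error figure assumes each sample is simply rounded to the nearest value.

```python
def levels_for(bit_depth):
    """Number of distinct sample values available at a given bit depth."""
    return 2 ** bit_depth

def max_rounding_error(bit_depth):
    """Worst-case rounding error as a fraction of the full amplitude range.

    Assumes samples are rounded to the nearest value, so the error
    can never exceed half of one step.
    """
    steps = levels_for(bit_depth) - 1
    return 0.5 / steps

for bit_depth in (4, 6, 16):
    print(bit_depth, levels_for(bit_depth), max_rounding_error(bit_depth))
```

Every extra bit doubles the number of values, which halves the size of each stair.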

Sampling Rate

The second factor that determines how well digitizing can represent the sound is the sampling rate. Let’s say that the sound wave being sampled is 440 Hz (440 wave cycles per second, “concert A”). If the music is sampled 44,000 times per second, then a single 440 Hz wave cycle is sampled 44,000 / 440 = 100 times (Figure 5).

Figure 5. 440 Hz sound wave sampled at 44 KHz.

While the sampling rate is a fixed value, the number of times an individual sound wave cycle is sampled depends on the sound wave’s frequency. To illustrate, Figure 6 shows what 44 KHz sampling looks like with a 4,400 Hz wave (C# above the highest C on a piano). That wave would be sampled 10 times per cycle.

Figure 6. 4,400 Hz sound wave sampled at 44 KHz.
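The relationship between Figures 5 and 6 is a simple division, sketched below. The function name `samples_per_cycle` is just an illustrative choice.

```python
def samples_per_cycle(sampling_rate_hz, frequency_hz):
    """How many samples land on one cycle of a wave of the given frequency."""
    return sampling_rate_hz / frequency_hz

print(samples_per_cycle(44_000, 440))    # concert A, as in Figure 5
print(samples_per_cycle(44_000, 4_400))  # the higher note in Figure 6
```

The higher the note, the fewer samples each of its cycles receives at a fixed sampling rate.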

One’s initial reaction to Figure 6 might be that digitizing at 10 samples per wave cycle or less must be inadequate to capture the music fully. However, you only need two samples per cycle! The “cardinal theorem of interpolation,” known commonly as the Nyquist-Shannon sampling theorem (or just the Nyquist theorem, or some other combination of the names of Harry Nyquist, Claude Shannon, E.T. Whittaker, and Vladimir Kotelnikov, who each independently derived the theorem) states:

If a function x(t) contains no frequencies higher than B Hertz, it is completely determined by giving its ordinates at a series of points spaced 1/(2B) seconds apart.

In other words, you need at least two samples per cycle of the highest frequency present.

The frequency range of hearing for young people with superb hearing is 20 Hz – 22 KHz, so you’d need a sampling rate of 2 x 22 KHz = 44 KHz to capture every frequency within that range. One of the theorem’s stipulations is that there can be no waves present that have a higher frequency than what you want to capture. Otherwise, those very high frequencies are not simply lost: they show up in the sampled sound as spurious lower-frequency tones, an artifact known as aliasing. To minimize these artifacts, the analogue signal is passed through a low-pass (anti-aliasing) filter before sampling, which severely attenuates any frequencies above 22 KHz. As an additional precaution, the sampling rate is nudged up a little higher than the theorem requires. The published sampling rate standard for music CDs is 44.1 KHz.
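Aliasing can be demonstrated numerically. The sketch below is an illustration under assumed values: a 30 KHz pure tone (above the range of hearing) sampled at the CD rate of 44.1 KHz with no filtering beforehand. The names `alias_frequency` and `sample_tone` are hypothetical helpers, not part of any audio library.

```python
import math

SAMPLING_RATE = 44_100  # CD sampling rate in Hz

def alias_frequency(frequency_hz, sampling_rate_hz):
    """Frequency that a pure tone appears to have once it has been sampled."""
    folded = frequency_hz % sampling_rate_hz
    if folded > sampling_rate_hz / 2:
        # Anything above half the sampling rate folds back down.
        folded = sampling_rate_hz - folded
    return folded

def sample_tone(frequency_hz, count=8):
    """First few samples of a cosine wave of the given frequency."""
    return [math.cos(2 * math.pi * frequency_hz * n / SAMPLING_RATE)
            for n in range(count)]

ultrasonic = 30_000                                   # assumed unfiltered tone
audible = alias_frequency(ultrasonic, SAMPLING_RATE)  # where it ends up

high = sample_tone(ultrasonic)
low = sample_tone(audible)
same = all(math.isclose(h, l, abs_tol=1e-9) for h, l in zip(high, low))
print(audible, same)
```

The two lists of samples match, so once sampled, the inaudible 30 KHz tone cannot be told apart from an audible 14.1 KHz tone. That is why the high frequencies have to be removed before sampling rather than after.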

Bit Rate

While you may never see bit depths and sampling rates associated with your music file, you may run into a measure derived from these two figures: bit rate. This can appear in your music player software, audio downloads, streaming services, or applications like YouTube, and its relevance to MP3 files was discussed in the article Types of Audio File Formats. It represents the number of binary digits that need to be processed per second when playing a digital music file, and it’s simply the bit depth multiplied by the sampling rate and the number of channels. To illustrate, the bit rate for a stereo (two-channel) audio file with the music CD standard bit depth of 16 and sampling rate of 44.1 KHz is

2 stereo channels x 16 bits/sample x 44,100 samples/second = 1,411,200 bits/second

≈ 1,411 Kbits/second (Kbps)
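The same arithmetic can be written as a small function for trying other combinations. The function name `bit_rate` and its parameter names are chosen only for this sketch.

```python
def bit_rate(channels, bit_depth, sampling_rate_hz):
    """Uncompressed bit rate in bits per second."""
    return channels * bit_depth * sampling_rate_hz

cd_bits_per_second = bit_rate(channels=2, bit_depth=16, sampling_rate_hz=44_100)
print(cd_bits_per_second)          # bits per second
print(cd_bits_per_second / 1000)   # Kbits per second
```

This applies to uncompressed audio only; compressed formats such as MP3 use fewer bits per second than the formula gives.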

High-resolution audio is created by using a bit depth greater than 16 and a sampling rate greater than 44.1 KHz. This is explored in the article High-Resolution Audio.
