Perceptual Coding Considering the Threshold of Hearing

Vicente González Ruiz & Savins Puertas Martín & Marcos Lupión Lorente

March 23, 2025

1 A model of the Threshold of (Human) Hearing
2 Dyadic DWT subbands and quantization steps
3 Deliverables
4 Resources

1 A model of the Threshold of (Human) Hearing

Psychoacoustics (see the sound, the human auditory system, and the human sound perception) has determined that the HAS (Human Auditory System) has a sensitivity that depends on the frequency of the sound, the so called ToH ( Threshold of (Human) Hearing). This basically means that some subbands (intervals of frequencies) can be quantized with a larger quantization step than others without a noticeable increase (from a perceptual perspective) of the quantization noise [2].

Figure 1: A model for the threshold of human hearing.

A good approximation of ToH for a 20-year-old person can be obtained with [1]

\begin{equation} T(f)\text {[dB]} = 3.64(f\text {[kHz]})^{-0.8} - 6.5e^{f\text {[kHz]}-3.3)^2} + 10^{-3}(f\text {[kHz]})^4. \label {eq:ToHH} \end{equation}

This equation has been plotted in Fig. 1.

2 Dyadic DWT subbands and quantization steps

The number of dyadic DWT subbands

\begin{equation} N_{\text {sb}} = N_{\text {levels}} + 1 \end{equation}

where \(N_{\text {levels}}\) is the number of levels of the dyadic DWT [3]. Except for the \({\mathbf l}^{N_{\text {levels}}}\) subband (the lowest-pass frequency of the decomposition), it holds that

\begin{equation} W({\mathbf w}_s) = \frac {1}{2}W({\mathbf w}_{s-1}), \end{equation}

being \(W(\cdot )\) the bandwidth of the corresponding subband. Therefore, considering that (by default, in InterCom) the bandwidth of the audio signal is \(22050\) Hz, the bandwidth \(W({\mathbf w}_1)=11025\) Hz, \(W({\mathbf w}_2)=22025/4\), etc. It also holds that

\begin{equation} W({\mathbf l}^{N_{\text {levels}}}) = W({\mathbf w}^{N_{\text {levels}}}). \end{equation}

The idea is to decide, knowing the frequencies represented in each DWT subband and the ToH curve (see InterCom: a Real-Time Digital Audio Full-Duplex Transmitter/Receiver), the QSS (Quantization Step Size) that should be applied to each subband.

This idea is already implemented in a module named dyadic_ToH.py.

3 Deliverables

Subjectively compare the audio quality obtained by dyadic_ToH.py and its predecessor, temporal_overlapped_DWT_coding.py. Subjectively means that, in groups, you must determine, for the same bit-rate and audio-content configuration, which implementation sounds better.

Mark: 1 point.

4 Resources

[1] M. Bosi and R.E. Goldberd. Introduction to Digital Audio Coding and Standards. Kluwer Academic Publishers, 2003.

[2] K. Sayood. Introduction to Data Compression (Slides). Morgan Kaufmann, 2017.

[3] M. Vetterli and J. Kovačević. Wavelets and Subband Coding. Prentice-hall, 1995.