Psychoacoustics (see sound, the human auditory system, and human sound perception) has shown that the sensitivity of the HAS (Human Auditory System) depends on the frequency of the sound, as described by the so-called ToH (Threshold of (Human) Hearing). In practice, this means that some subbands (frequency intervals) can be quantized with a larger quantization step than others without a perceptually noticeable increase in the quantization noise [2].
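A good approximation of the ToH for a 20-year-old person can be obtained with [1]
\[ T(f) = 3.64\left(\frac{f}{1000}\right)^{-0.8} - 6.5\,e^{-0.6\left(\frac{f}{1000}-3.3\right)^{2}} + 10^{-3}\left(\frac{f}{1000}\right)^{4} \quad \text{[dB SPL]}, \]
where \(f\) is the frequency in Hz.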
This equation has been plotted in Fig. 1.
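As a quick numerical check, the ToH approximation of [1] (Terhardt's formula, \(T(f) = 3.64(f/1000)^{-0.8} - 6.5e^{-0.6(f/1000-3.3)^2} + 10^{-3}(f/1000)^4\) dB SPL) can be evaluated with NumPy, for instance to reproduce a curve like the one in Fig. 1. This is a minimal sketch, not code taken from InterCom; the function name `threshold_of_hearing` is hypothetical.

```python
import numpy as np

def threshold_of_hearing(f_hz):
    """Terhardt's approximation of the threshold in quiet, in dB SPL.

    f_hz must be strictly positive: the (f/1000)^-0.8 term diverges at 0 Hz.
    """
    f_khz = np.asarray(f_hz, dtype=float) / 1000.0
    return (3.64 * f_khz ** -0.8
            - 6.5 * np.exp(-0.6 * (f_khz - 3.3) ** 2)
            + 1e-3 * f_khz ** 4)

# Evaluate the curve at a few representative frequencies.
freqs = np.array([100.0, 1000.0, 3300.0, 10000.0, 16000.0])
for f, t in zip(freqs, threshold_of_hearing(freqs)):
    print(f"{f:8.0f} Hz -> {t:6.2f} dB SPL")
```

The printed values follow the expected shape: the threshold is high at low frequencies, dips below 0 dB SPL around 3 to 4 kHz (where the ear is most sensitive), and rises steeply again above about 10 kHz.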
The number of dyadic DWT subbands is
\[ N_{\text{subbands}} = N_{\text{levels}} + 1, \]
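where \(N_{\text {levels}}\) is the number of levels of the dyadic DWT [3]. Except for the \({\mathbf l}^{N_{\text {levels}}}\) subband (the lowest-frequency subband of the decomposition), it holds that
\[ W({\mathbf w}_s) = \frac{W({\mathbf x})}{2^s}, \quad s = 1, 2, \ldots, N_{\text{levels}}, \]
where \(W(\cdot)\) is the bandwidth of the corresponding subband and \({\mathbf x}\) is the audio signal. Therefore, considering that (by default, in InterCom) the bandwidth of the audio signal is \(W({\mathbf x}) = 22050\) Hz, the bandwidths are \(W({\mathbf w}_1)=22050/2=11025\) Hz, \(W({\mathbf w}_2)=22050/4=5512.5\) Hz, etc. It also holds that
\[ W({\mathbf l}^{N_{\text{levels}}}) = W({\mathbf w}_{N_{\text{levels}}}) = \frac{W({\mathbf x})}{2^{N_{\text{levels}}}}. \]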
The idea is to use the frequency range covered by each DWT subband, together with the ToH curve (see InterCom: a Real-Time Digital Audio Full-Duplex Transmitter/Receiver), to decide the QSS (Quantization Step Size) that should be applied to each subband.
This idea is already implemented in the module dyadic_ToH.py.
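To make the idea concrete, the following sketch computes the frequency interval of each dyadic subband (assuming the 22050 Hz signal bandwidth used by default in InterCom and that \({\mathbf w}_s\) spans \([W/2^s, W/2^{s-1}]\)) and derives a relative QSS for each subband from the ToH curve. It is not the code of dyadic_ToH.py: all function names are hypothetical, and the rule used here (take the minimum ToH inside each subband, starting at 20 Hz to avoid the singularity at 0 Hz, and turn its dB distance to the most sensitive subband into an amplitude ratio) is an assumption chosen for illustration.

```python
import numpy as np

def threshold_of_hearing(f_hz):
    """Terhardt's approximation of the threshold in quiet (dB SPL); f_hz > 0."""
    f_khz = np.asarray(f_hz, dtype=float) / 1000.0
    return (3.64 * f_khz ** -0.8
            - 6.5 * np.exp(-0.6 * (f_khz - 3.3) ** 2)
            + 1e-3 * f_khz ** 4)

def dyadic_subband_ranges(n_levels, bandwidth_hz=22050.0):
    """(low, high) frequency interval of each dyadic DWT subband.

    Returned in the order l^N, w_N, ..., w_1 (lowest to highest frequencies).
    """
    ranges = [(0.0, bandwidth_hz / 2 ** n_levels)]  # l^N: lowest-pass subband
    for s in range(n_levels, 0, -1):                # w_N ... w_1
        ranges.append((bandwidth_hz / 2 ** s, bandwidth_hz / 2 ** (s - 1)))
    return ranges

def toh_based_qss(n_levels, base_qss=1.0, bandwidth_hz=22050.0,
                  f_min_hz=20.0, n_points=256):
    """Hypothetical per-subband QSS derived from the ToH curve.

    For each subband, take the lowest ToH value inside its frequency range
    (the point where the ear is most sensitive), convert the dB difference
    with respect to the most sensitive subband into an amplitude ratio, and
    scale base_qss by that ratio.
    """
    min_toh = []
    for low, high in dyadic_subband_ranges(n_levels, bandwidth_hz):
        f = np.linspace(max(low, f_min_hz), high, n_points)  # skip f = 0 Hz
        min_toh.append(threshold_of_hearing(f).min())
    min_toh = np.array(min_toh)
    # dB -> amplitude: a subband whose threshold is 20 dB higher gets a 10x step.
    return base_qss * 10.0 ** ((min_toh - min_toh.min()) / 20.0)

if __name__ == "__main__":
    n_levels = 5
    for (low, high), q in zip(dyadic_subband_ranges(n_levels),
                              toh_based_qss(n_levels, base_qss=32)):
        print(f"[{low:8.1f}, {high:8.1f}] Hz  "
              f"width={high - low:8.1f} Hz  QSS={q:6.1f}")
```

Using the minimum ToH inside each subband is a conservative choice, because the step is tuned to the most sensitive frequency of the band. Other rules (e.g., the ToH at the band center) are equally plausible and would change the resulting QSS values.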
Subjectively compare the audio quality obtained by dyadic_ToH.py and its predecessor, temporal_overlapped_DWT_coding.py. "Subjectively" means that, working in groups, you must determine which implementation sounds better for the same bit-rate and audio-content configuration.
Mark: 1 point.
[1] M. Bosi and R.E. Goldberg. Introduction to Digital Audio Coding and Standards. Kluwer Academic Publishers, 2003.
[2] K. Sayood. Introduction to Data Compression (Slides). Morgan Kaufmann, 2017.
[3] M. Vetterli and J. Kovačević. Wavelets and Subband Coding. Prentice Hall, 1995.