In general, the number of subbands provided by the dyadic wavelet domain [2, 3] (remember, if \(l\) is the number of levels of the DWT, we obtain \(l+1\) subbands) is insufficient to accurately represent the diverse auditory thresholds present in a typical hearing threshold curve [1].
To address this issue, two different approaches can be used: (1) replace the dyadic decomposition with a linear decomposition, and (2) decompose the dyadic subbands that we already have into smaller subbands.
Once the frequency subbands have been created, the remaining task is to determine the corresponding quantization step sizes (QSS) from the threshold of hearing (ToH) curve. We need one QSS per subband.
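As a rough illustration of how one QSS per subband could be derived, the sketch below evaluates Terhardt's well-known approximation of the threshold in quiet at the centre of each subband. The mapping from threshold to QSS (proportional to the linear-amplitude threshold, scaled so that the most sensitive subband gets `min_qss`) is only an assumption made for this example, and the names `threshold_in_quiet_dB`, `qss_per_subband` and `min_qss` are hypothetical.

```python
import numpy as np

def threshold_in_quiet_dB(f_hz):
    """Terhardt's approximation of the threshold of hearing (dB SPL)."""
    f = np.asarray(f_hz, dtype=float) / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * np.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def qss_per_subband(band_edges_hz, min_qss=1.0):
    """One QSS per subband, evaluated at the subband centres.

    Assumed mapping (not taken from the text): the QSS grows with the
    linear-amplitude threshold, and the most sensitive subband
    receives min_qss.
    """
    edges = np.asarray(band_edges_hz, dtype=float)
    centres = 0.5 * (edges[:-1] + edges[1:])
    thr_dB = threshold_in_quiet_dB(centres)
    return min_qss * 10.0 ** ((thr_dB - thr_dB.min()) / 20.0)

# 64 linear subbands between 0 Hz and 24000 Hz (48000 Hz sampling rate).
edges = np.linspace(0, 24000, 65)
qss = qss_per_subband(edges)
```

With this mapping, the smallest step sizes fall in the subbands around 3 to 4 kHz, where the ear is most sensitive, and the largest ones at the extremes of the spectrum.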
As an alternative to the Discrete Wavelet Transform (DWT), the Wavelet Packet Transform (WPT) allows for a linear decomposition of the signal’s frequency range. This is essentially achieved by recursively applying wavelet filters to both the low-frequency and high-frequency subbands (see Milestone Transform Coding for Redundancy Removal). Consequently, if \(l\) represents the number of levels, then a total of \(2^l\) subbands are generated. For instance, with \(l=6\), the DWT yields only \(7\) subbands, whereas the WPT produces \(64\) subbands.
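The difference in the number of subbands can be checked with a small sketch. To stay self-contained, it uses a NumPy-only Haar filter bank (an assumption made for brevity; a real codec would probably rely on a wavelet library and on longer filters). The function names are hypothetical.

```python
import numpy as np

def haar_step(x):
    """One analysis step of the orthonormal Haar filter bank."""
    x = np.asarray(x, dtype=float)
    low = (x[0::2] + x[1::2]) / np.sqrt(2)
    high = (x[0::2] - x[1::2]) / np.sqrt(2)
    return low, high

def haar_dwt(x, levels):
    """Dyadic DWT: only the low-frequency subband is split again.

    Returns [L_l, H_l, ..., H_1], that is, levels + 1 subbands.
    """
    subbands = []
    low = x
    for _ in range(levels):
        low, high = haar_step(low)
        subbands.insert(0, high)
    subbands.insert(0, low)
    return subbands

def haar_wpt(x, levels):
    """WPT: every subband is split at every level (2**levels subbands).

    The subbands are returned in natural order, which is NOT the
    frequency order (a Gray-code permutation would be needed for that).
    """
    bands = [np.asarray(x, dtype=float)]
    for _ in range(levels):
        bands = [half for b in bands for half in haar_step(b)]
    return bands

x = np.random.default_rng(0).standard_normal(1024)
print(len(haar_dwt(x, 6)), len(haar_wpt(x, 6)))  # 7 64
```

Notice that all the WPT subbands have the same number of coefficients (the same bandwidth), whereas the size of the DWT subbands doubles with frequency.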
In more detail, this is what would need to be implemented:
1. Build the extended chunk (see Temporal_Overlapped_DWT).
2. Compute the WPT of the extended chunk (see Linear_ToH_NO).
3. As in Temporal_Overlapped_DWT, extract the central part of each subband, so that the total number of coefficients matches the number of frames in the chunk.
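The extraction of the central coefficients can be sketched as follows. This is only an illustration under several assumptions: mono 1-D chunks, a NumPy-only Haar WPT instead of the transform actually used, and a hypothetical `overlap` parameter (the number of frames borrowed from each neighbouring chunk) that must be a multiple of \(2^l\). None of these names come from Temporal_Overlapped_DWT or Linear_ToH_NO.

```python
import numpy as np

def haar_wpt(x, levels):
    """Haar WPT (natural subband order): 2**levels subbands."""
    bands = [np.asarray(x, dtype=float)]
    for _ in range(levels):
        split = []
        for b in bands:
            split.append((b[0::2] + b[1::2]) / np.sqrt(2))  # low-pass
            split.append((b[0::2] - b[1::2]) / np.sqrt(2))  # high-pass
        bands = split
    return bands

def analyze_overlapped(prev_chunk, chunk, next_chunk, levels, overlap):
    """WPT of the extended chunk, keeping only the central coefficients."""
    n = 2 ** levels
    assert overlap > 0 and overlap % n == 0 and len(chunk) % n == 0
    # Extended chunk: tail of the previous chunk + chunk + head of the next.
    extended = np.concatenate(
        [prev_chunk[-overlap:], chunk, next_chunk[:overlap]])
    bands = haar_wpt(extended, levels)
    # Each subband carries overlap / 2**levels extra coefficients per side.
    skip = overlap // n
    return [b[skip:len(b) - skip] for b in bands]

rng = np.random.default_rng(0)
prev_chunk, chunk, next_chunk = (rng.standard_normal(1024) for _ in range(3))
central = analyze_overlapped(prev_chunk, chunk, next_chunk, levels=6, overlap=64)
print(len(central), sum(len(b) for b in central))  # 64 1024
```

With the Haar filters used here, the central coefficients coincide with those of the non-overlapped WPT, because the filters never straddle a chunk boundary. The overlap only makes a difference with longer filters, whose responses do cross the boundaries.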
Another solution (closer to the Bark scale) is to divide each of the dyadic subbands into a number of subbands. Thus, if we have \(l+1\) dyadic subbands and we decompose each of them into \(n\) (sub)subbands, we get a total of \(n(l+1)\) subbands.
For this, we can use (again) the WPT applied to each dyadic subband generated by the DWT of the extended chunk. The idea here is to:
1. Build the extended chunk (see Temporal_Overlapped_DWT).
2. Compute the DWT of the extended chunk (see Temporal_Overlapped_DWT). Let \(l_{\text {DWT}}\) be the number of levels of this transform.
Notice that the WPT performs a linear decomposition. Therefore, for example, if the sampling frequency is \(48000\) Hz, \(l_{\text {DWT}}=3\), and the WPT applied to each dyadic subband has \(l_{\text {WPT}}=1\) level, the lowest-frequency dyadic subband goes from \(0\) Hz to \(\frac {24000}{2^3}=3000\) Hz, and it will be divided into two subbands, each with a bandwidth of \(1500\) Hz.
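The arithmetic of this example can be reproduced with a short sketch that lists the frequency range of every resulting subband. The function name and its parameters are hypothetical, and the sketch assumes ideal (brick-wall) filters, so the edges are only nominal.

```python
def dyadic_linear_band_edges(sampling_rate, dwt_levels, wpt_levels):
    """Nominal (low, high) edges in Hz of each (sub)subband."""
    nyquist = sampling_rate / 2
    # Dyadic edges: 0, nyquist/2^l, nyquist/2^(l-1), ..., nyquist.
    dyadic = [0.0] + [nyquist / 2 ** k for k in range(dwt_levels, -1, -1)]
    n = 2 ** wpt_levels  # (sub)subbands per dyadic subband
    bands = []
    for low, high in zip(dyadic[:-1], dyadic[1:]):
        width = (high - low) / n  # linear split inside the dyadic subband
        bands += [(low + i * width, low + (i + 1) * width) for i in range(n)]
    return bands

bands = dyadic_linear_band_edges(48000, dwt_levels=3, wpt_levels=1)
for low, high in bands:
    print(f"{low:7.0f} Hz to {high:7.0f} Hz")
```

For these parameters the sketch lists \(2(3+1)=8\) subbands: two of \(1500\) Hz in each of the two lowest dyadic subbands, then two of \(3000\) Hz, and finally two of \(6000\) Hz.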
linear_ToH.py, where the functionality described in Section 2 has been implemented. Notice that the current implementation (linear_ToH_no_overlapped.py) does not overlap the chunks. Mark: 10 points.
dyadic_linear_ToH.py, where the functionality described in Section 3 has been implemented. Mark: 10 points.
[1] M. Bosi and R.E. Goldberg. Introduction to Digital Audio Coding and Standards. Kluwer Academic Publishers, 2003.
[2] K. Sayood. Introduction to Data Compression (Slides). Morgan Kaufmann, 2017.
[3] M. Vetterli and J. Kovačević. Wavelets and Subband Coding. Prentice-Hall, 1995.