Increasing the Number of Subbands

Vicente González Ruiz

March 23, 2025

Contents

1 The dyadic frequency partition is not enough
2 Linear decomposition using Wavelet Packet Transform
3 Linear decomposition of the dyadic subbands
4 Deliverables
5 Resources

1 The dyadic frequency partition is not enough

In general, the number of subbands provided by the dyadic wavelet domain [2, 3] (recall that an \(l\)-level DWT yields \(l+1\) subbands) is too small to follow accurately the variation across frequency of a typical hearing threshold curve [1].

To address this issue, two different approaches can be used: (1) replace the dyadic decomposition with a linear decomposition, or (2) decompose the dyadic subbands that we already have into smaller subbands.

Once the frequency subbands have been created, the remaining task is to determine the corresponding quantization step sizes (QSS) from the threshold of hearing (ToH) curve. We need one QSS per subband.
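As a rough illustration of how one QSS per subband could be derived, the sketch below evaluates Terhardt's well-known approximation of the threshold in quiet at the centre frequency of each subband. This formula is an assumption here and is not necessarily the ToH model used in the course code. The mapping from decibels to a step size (an amplitude ratio relative to the most sensitive subband) is also only an assumption, and the names toh_db and qss_per_subband are hypothetical.

```python
import math

def toh_db(f_hz):
    """Terhardt's approximation of the threshold in quiet (dB SPL)."""
    k = f_hz / 1000.0  # frequency in kHz
    return (3.64 * k ** -0.8
            - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)
            + 1e-3 * k ** 4)

def qss_per_subband(band_edges_hz):
    """One relative QSS per subband, from the ToH at the band centre."""
    thresholds = [toh_db((lo + hi) / 2) for lo, hi in band_edges_hz]
    quietest = min(thresholds)
    # Assumption: the step size scales with the amplitude ratio 10^(dB/20),
    # so the most sensitive subband gets a relative QSS of 1.
    return [10 ** ((t - quietest) / 20) for t in thresholds]

# Dyadic subbands of a 3-level DWT at 48000 Hz (illustrative values).
bands = [(0, 3000), (3000, 6000), (6000, 12000), (12000, 24000)]
steps = qss_per_subband(bands)
```

With only four dyadic subbands, a single step size has to cover a very wide frequency range, which is the limitation that the finer partitions described in the following sections try to reduce.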

2 Linear decomposition using Wavelet Packet Transform

As an alternative to the Discrete Wavelet Transform (DWT), the Wavelet Packet Transform (WPT) allows for a linear decomposition of the signal’s frequency range. This is essentially achieved by recursively applying wavelet filters to both the low-frequency and high-frequency subbands (see Milestone Transform Coding for Redundancy Removal). Consequently, if \(l\) represents the number of levels, then a total of \(2^l\) subbands are generated. For instance, with \(l=6\), the DWT yields only \(7\) subbands, whereas the WPT produces \(64\) subbands.
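The difference between both partitions can be checked with a few lines of arithmetic. The following sketch assumes ideal (brick-wall) filters and a sampling frequency of \(48000\) Hz. The function names are hypothetical and do not belong to the course code.

```python
def dwt_band_edges(fs, levels):
    """Band edges (Hz) of the levels+1 dyadic subbands, low to high."""
    nyquist = fs / 2
    edges = [(0.0, nyquist / 2 ** levels)]  # approximation subband
    for j in range(levels, 0, -1):          # detail subbands, coarse to fine
        edges.append((nyquist / 2 ** j, nyquist / 2 ** (j - 1)))
    return edges

def wpt_band_edges(fs, levels):
    """Band edges (Hz) of the 2**levels equal-width WPT subbands."""
    width = (fs / 2) / 2 ** levels
    return [(k * width, (k + 1) * width) for k in range(2 ** levels)]

dyadic = dwt_band_edges(48000, 6)
linear = wpt_band_edges(48000, 6)
print(len(dyadic), len(linear))
```

Under these assumptions every WPT subband is \(375\) Hz wide, whereas the highest dyadic subband alone spans from \(12000\) Hz to \(24000\) Hz.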

In more detail, this is what would need to be implemented:

  1. Extend the chunk (this can be inherited from the class Temporal_Overlapped_DWT).
  2. Compute the WPT of the extended chunk. An example (although for a non-extended chunk) can be found in the class Linear_ToH_NO.
  3. As we did with the DWT in Temporal_Overlapped_DWT, extract the central part of each subband, so that the total number of frame-coefficients matches the number of frames in the chunk.
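The three steps above can be sketched in plain Python as follows. This is only an illustration under strong assumptions: it uses a hand-written Haar filter bank instead of the wavelet library and filters of the course code, zero-filled lists stand in for the neighbouring chunks, and the overlap is assumed to be divisible by \(2^l\). All names are hypothetical.

```python
import math

def haar_step(x):
    """One level of an orthonormal Haar analysis filter bank."""
    s = 1 / math.sqrt(2)
    low = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return low, high

def haar_wpt(x, levels):
    """Full wavelet packet tree: every subband is split at every level."""
    subbands = [list(x)]
    for _ in range(levels):
        subbands = [half for band in subbands for half in haar_step(band)]
    return subbands  # 2**levels subbands, in natural (not frequency) order

def central_part(subbands, overlap, levels):
    """Drop the coefficients generated by the overlapped samples."""
    skip = overlap >> levels  # each level halves the overlapped coefficients
    return [band[skip:len(band) - skip] for band in subbands]

# Hypothetical sizes: a 32-sample chunk and 8 overlapped samples per side.
levels, overlap = 2, 8
prev_tail = [0.0] * overlap   # placeholder for the end of the previous chunk
chunk = [float(i) for i in range(32)]
next_head = [0.0] * overlap   # placeholder for the start of the next chunk

extended = prev_tail + chunk + next_head          # step 1
packets = haar_wpt(extended, levels)              # step 2
central = central_part(packets, overlap, levels)  # step 3
```

Note that the subbands returned by this sketch follow the natural order of the packet tree. If a frequency-ordered list is needed (for example, to assign one QSS per subband), the subbands would have to be reordered.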

3 Linear decomposition of the dyadic subbands

Another solution (closer to the Bark scale) is to divide each of the dyadic subbands into several subbands. Thus, if we have \(l+1\) dyadic subbands and we decompose each one into \(n\) (sub)subbands, we get a total of \(n(l+1)\) subbands.

For this, we can use (again) the WPT applied to each dyadic subband generated by the DWT of the extended chunk. The idea here is to:

  1. Extend the chunk (this can be inherited from the class Temporal_Overlapped_DWT).
  2. Compute the dyadic DWT of the extended chunk (this can also be reused from the class Temporal_Overlapped_DWT). Let \(l_{\text {DWT}}\) be the number of levels of this transform.
  3. Compute the WPT of each dyadic (extended) subband. Let \(l_{\text {WPT}}=\log _2(n)\) be the number of levels of this transform.
  4. Extract the central part of each packet-subband, so that the total number of frame-coefficients matches the number of frames in the chunk.
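A minimal sketch of these four steps is shown next. As an assumption, it relies on a hand-written Haar filter bank rather than on the wavelet library used in Temporal_Overlapped_DWT, it fills the overlapped areas with zeros instead of real neighbouring samples, and it requires the overlap to be divisible by \(2^{l_{\text {DWT}}+l_{\text {WPT}}}\). The function names are hypothetical.

```python
import math

def haar_step(x):
    """One level of an orthonormal Haar analysis filter bank."""
    s = 1 / math.sqrt(2)
    low = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return low, high

def haar_dwt(x, levels):
    """Dyadic DWT: list of (subband, depth), lowest frequency first."""
    details = []
    low = list(x)
    for depth in range(1, levels + 1):
        low, high = haar_step(low)   # only the low-pass branch is split
        details.append((high, depth))
    return [(low, levels)] + details[::-1]

def haar_wpt(x, levels):
    """Full packet tree of one subband (natural order)."""
    bands = [list(x)]
    for _ in range(levels):
        bands = [half for b in bands for half in haar_step(b)]
    return bands

def dyadic_linear(extended, overlap, l_dwt, l_wpt):
    out = []
    for band, depth in haar_dwt(extended, l_dwt):        # step 2
        skip = overlap >> (depth + l_wpt)  # overlap left at this depth
        for packet in haar_wpt(band, l_wpt):             # step 3
            out.append(packet[skip:len(packet) - skip])  # step 4
    return out

overlap = 16
chunk = [float(i % 7) for i in range(64)]
extended = [0.0] * overlap + chunk + [0.0] * overlap     # step 1
subbands = dyadic_linear(extended, overlap, l_dwt=3, l_wpt=1)
```

In this example \(n=2\) and \(l_{\text {DWT}}=3\), so the sketch produces \(n(l_{\text {DWT}}+1)=8\) subbands whose lengths add up to the \(64\) samples of the chunk.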

Notice that the WPT performs a linear decomposition. Therefore, for example, if the sampling frequency is \(48000\) Hz, \(l_{\text {DWT}}=3\) and \(l_{\text {WPT}}=1\), the lowest-frequency dyadic subband goes from \(0\) Hz to \(3000=\frac {24000}{2^3}\) Hz, and it will be divided into two subbands, each with a bandwidth of \(1500\) Hz.
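The band edges of this example can be reproduced with the following sketch, which assumes ideal filters. The function name is hypothetical.

```python
def dyadic_linear_band_edges(fs, l_dwt, l_wpt):
    """Band edges (Hz) after splitting each dyadic subband into 2**l_wpt."""
    nyquist = fs / 2
    dyadic = [(0.0, nyquist / 2 ** l_dwt)]
    for j in range(l_dwt, 0, -1):
        dyadic.append((nyquist / 2 ** j, nyquist / 2 ** (j - 1)))
    n = 2 ** l_wpt  # number of (sub)subbands per dyadic subband
    edges = []
    for lo, hi in dyadic:
        width = (hi - lo) / n
        edges += [(lo + k * width, lo + (k + 1) * width) for k in range(n)]
    return edges

edges = dyadic_linear_band_edges(48000, l_dwt=3, l_wpt=1)
for lo, hi in edges:
    print(f"{lo:8.1f} Hz .. {hi:8.1f} Hz")
```

The bandwidth is constant inside each dyadic subband, but it still doubles from one dyadic subband to the next (except between the two lowest ones, which have the same width).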

4 Deliverables

  1. Implement a Python module called linear_ToH.py that provides the functionality described in Section 2. Notice that the current implementation (linear_ToH_no_overlapped.py) does not overlap the chunks. Mark: 10 points.
  2. Implement a Python module called dyadic_linear_ToH.py that provides the functionality described in Section 3. Mark: 10 points.

5 Resources

[1]   M. Bosi and R.E. Goldberg. Introduction to Digital Audio Coding and Standards. Kluwer Academic Publishers, 2003.

[2]   K. Sayood. Introduction to Data Compression (Slides). Morgan Kaufmann, 2017.

[3]   M. Vetterli and J. Kovačević. Wavelets and Subband Coding. Prentice-Hall, 1995.