Simultaneous Masking in the Frequency Domain

Vicente González Ruiz & Savins Puertas Martín & Marcos Lupión Lorente

November 18, 2024

1 Frequency Masking

The HAS (Human Auditory System) has a finite frequency resolution, which means that a weaker audio signal (the maskee) becomes inaudible in the presence of (is masked by) a louder one (the masker) when both are close enough [1] in the frequency domain (and, obviously, in time). When this happens, the subband [2] in which the maskee lies can be quantized more coarsely without the quantization noise in that subband being perceived (see Figure 1).

Figure 1: An example of simultaneous masking generated by a 1 kHz tone. In the vicinity of the tone, the ToH (Threshold of Hearing) is raised.

2 Dynamic Computation of the Quantization Step Sizes (QSSs)

  1. Given a decomposition of a chunk \({\mathbf W}=\{{\mathbf w}_s\}\), determine the energy \(\{E({\mathbf w}_s)\}\) of each subband. For this, it is a good idea to have the same bandwidth in all the subbands, so that their energies are directly comparable.
  2. Find the subband with the highest energy: \begin {equation} {\mathbf w}_m = \underset {{\mathbf w}_i \in {\mathbf W}}{\operatorname {arg\,max}}~E({\mathbf w}_i) := \{{\mathbf w}_m \in {\mathbf W} ~:~ E({\mathbf w}_i) \leq E({\mathbf w}_m) \text { for all } {\mathbf w}_i \in {\mathbf W} \}. \end {equation}
  3. With \({\mathbf \Delta }_m\) being the current QSS (Quantization Step Size) of the subband \({\mathbf w}_m\), compute the set of optimal¹ QSSs as \begin {equation} {\mathbf \Delta }^* := \{\cdots ,3{\mathbf \Delta }_{m-2},2{\mathbf \Delta }_{m-1},{\mathbf \Delta }_m,2{\mathbf \Delta }_{m+1},3{\mathbf \Delta }_{m+2}, \cdots \}. \end {equation}
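
The three steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the module requested in the deliverables. It assumes that the energy of a subband is the sum of its squared coefficients, and it reads the definition of the new QSSs literally: the current QSS of each subband is multiplied by one plus its distance (in subbands) to the most energetic subband. The names subband_energy, masked_QSSs, W and current_QSSs are hypothetical and do not come from advanced_ToH.py.

```python
def subband_energy(subband):
    """Energy of one subband, assumed here to be the sum of squared coefficients."""
    return sum(float(c) ** 2 for c in subband)


def masked_QSSs(subbands, QSSs):
    """Return (m, new_QSSs), where m indexes the most energetic subband."""
    # Step 1: energy of each subband.
    energies = [subband_energy(w) for w in subbands]
    # Step 2: index of the subband with the highest energy
    # (the first one is chosen if there is a tie).
    m = max(range(len(subbands)), key=lambda s: energies[s])
    # Step 3: scaling factors ..., 3, 2, 1, 2, 3, ... centred on subband m.
    new_QSSs = [(abs(s - m) + 1) * QSSs[s] for s in range(len(subbands))]
    return m, new_QSSs


# Hypothetical chunk decomposed into 5 subbands of 4 coefficients each.
W = [
    [1, -1, 0, 1],
    [2, 0, -2, 1],
    [9, -8, 7, -9],  # the loudest subband (the masker)
    [1, 1, -1, 0],
    [0, 1, 0, -1],
]
current_QSSs = [4, 4, 4, 4, 4]

m, new_QSSs = masked_QSSs(W, current_QSSs)
print(m, new_QSSs)  # 2 [12, 8, 4, 8, 12]
```

In this example the masker is subband 2, so its QSS is kept, while the QSSs of its neighbours grow linearly with their distance to it.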

3 Deliverables

Implement the algorithm described in Section 2 in a module named simultaneous_masking.py. You should extend the classes defined in advanced_ToH.py.

4 Resources

[1]   M. Bosi and R. E. Goldberg. Introduction to Digital Audio Coding and Standards. Kluwer Academic Publishers, 2003.

[2]   M. Vetterli and J. Kovačević. Wavelets and Subband Coding. Prentice Hall, 1995.

¹ From a perceptual perspective.