Simultaneous Masking in the Frequency Domain

Vicente González Ruiz & Savins Puertas Martín & Marcos Lupión Lorente

March 23, 2025

1 Frequency Masking

The HAS (Human Auditory System) has a finite frequency resolution. As a consequence, a weaker audio signal (the maskee) becomes inaudible in the presence of (is masked by) a louder audio signal (the masker) when both are close enough in the frequency domain [1] (and, obviously, in time, i.e., in the same chunk). When this happens, the subband [2] in which the maskee is placed can be quantized more severely without the quantization noise in that subband being perceived (see Figure 1).
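To see why a masked subband tolerates a coarser quantizer, the following sketch measures the quantization noise energy of the same subband for several QSSs (Quantization Step Sizes). It assumes a plain uniform mid-tread quantizer and random integer coefficients, which may differ from the quantizer and data used in InterCom; the function names are illustrative only.

```python
import random

def quantize(x, step):
    """Uniform mid-tread quantizer: index of the nearest multiple of step."""
    return [round(v / step) for v in x]

def dequantize(k, step):
    """Reconstruct the coefficients from their quantization indices."""
    return [i * step for i in k]

def noise_energy(x, step):
    """Energy of the quantization error x - dequantize(quantize(x))."""
    y = dequantize(quantize(x, step), step)
    return sum((a - b) ** 2 for a, b in zip(x, y))

random.seed(0)
# Hypothetical subband: integer coefficients, as obtained from 16-bit PCM.
subband = [random.randint(-1000, 1000) for _ in range(1024)]

fine = noise_energy(subband, 1)     # step 1 on integer data is lossless
coarse = noise_energy(subband, 16)  # a larger step produces more noise
print(fine, coarse)
```

The noise energy grows with the step size (roughly as the square of the QSS). Simultaneous masking is what allows this extra noise to remain below the raised ToH in the subbands close to the masker.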

Figure 1: An example of simultaneous masking generated by a 1 kHz tone. In the vicinity of the tone, the ToH (Threshold of Hearing) has been raised.

2 A dynamic computation of the Quantization Step Sizes

  1. Given a decomposition of a chunk \({\mathbf W}=\{{\mathbf w}_s\}\), determine the energy \(\{E({\mathbf w}_s)\}\) of each subband. For this, it is a good idea for all the subbands to have the same bandwidth.
  2. Find the subband with the highest energy (the masker):

    \begin{equation} {\mathbf w}_m = \underset {{\mathbf w}_s \in {\mathbf W}}{\operatorname {arg\,max}}~E({\mathbf w}_s) := \{{\mathbf w}_* \in {\mathbf W} ~:~ E({\mathbf w}_i) \leq E({\mathbf w}_*) \text { for all } {\mathbf w}_i \in {\mathbf W} \}. \end{equation}
  3. Let \({\mathbf \Delta }_m\) be the current QSS (Quantization Step Size) of the subband \({\mathbf w}_m\). Compute the set of optimal¹ QSSs as

    \begin{equation} {\mathbf \Delta }^* := \{\cdots ,3{\mathbf \Delta }_{m-2},2{\mathbf \Delta }_{m-1},{\mathbf \Delta }_m,2{\mathbf \Delta }_{m+1},3{\mathbf \Delta }_{m+2}, \cdots \}. \end{equation}
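The three steps above can be sketched in Python as follows. This is a minimal, hypothetical sketch, not the InterCom implementation: it assumes that a decomposition is a list of equal-bandwidth subbands ordered by frequency, that the energy of a subband is the sum of its squared coefficients, and that the general term of the set \({\mathbf \Delta }^*\) is \((k+1){\mathbf \Delta }_{m\pm k}\), i.e., the current QSS of each subband is multiplied by one plus its distance (in subbands) to the masker. The names `subband_energy`, `masker_index` and `masked_QSSs` are illustrative.

```python
def subband_energy(w):
    """Step 1: energy E(w_s) as the sum of squared coefficients."""
    return sum(c * c for c in w)

def masker_index(W):
    """Step 2: index m of the subband with the highest energy.
    Ties are resolved in favour of the lowest-frequency subband."""
    energies = [subband_energy(w) for w in W]
    return max(range(len(W)), key=energies.__getitem__)

def masked_QSSs(W, QSSs):
    """Step 3: multiply the current QSS of subband s by (|s - m| + 1),
    so the masker keeps its QSS and farther subbands get larger ones."""
    assert len(W) == len(QSSs), "one QSS per subband is assumed"
    m = masker_index(W)
    return [(abs(s - m) + 1) * delta for s, delta in enumerate(QSSs)]

# Toy decomposition with 5 subbands (energies 2, 13, 100, 8, 1).
W = [[1, 1], [3, -2], [10, 0], [2, 2], [0, 1]]
print(masker_index(W))                  # → 2
print(masked_QSSs(W, [1, 1, 1, 1, 1]))  # → [3, 2, 1, 2, 3]
```

Note that the sketch leaves \({\mathbf \Delta }_m\) untouched and only coarsens the quantization of the remaining subbands; how the resulting QSSs are passed to the quantizer of InterCom is left open here.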

3 Deliverables

Implement the algorithm described in Section 2 in a module named simultaneous_masking.py, to be used in InterCom.

Mark: 10 points.

4 Resources

[1]   M. Bosi and R. E. Goldberg. Introduction to Digital Audio Coding and Standards. Kluwer Academic Publishers, 2003.

[2]   M. Vetterli and J. Kovačević. Wavelets and Subband Coding. Prentice Hall, 1995.

¹ From a perceptual perspective.