Bit-Rate Control

Vicente González Ruiz & Savins Puertas Martín & Marcos Lupión Lorente

December 2, 2024

Contents

 1 Description
  1.1 Impact of the network throughput
  1.2 Compressing the audio data with zlib
  1.3 Quantization
  1.4 (Bit-)Rate control and distortion
  1.5 The current implementation(s) for the control of the bit-rate
 2 Deliverables
 3 Resources

1 Description

1.1 Impact of the network throughput

Along with latency and its variation (jitter), another main aspect of the transmission link used in an InterCom session is the throughput¹ that it can provide [2, 6]. This bit-rate depends on the maximum capacity of the link (a characteristic closely related to the available bandwidth) and on its congestion level (which basically depends on the load). In general, we can assume that the capacity is constant over time (the bandwidth provided by the link does not vary). The throughput, on the contrary, is time-varying and quite unpredictable, because it depends on the congestion level, which in turn depends on the behavior of the network users.

In this milestone, we will measure the impact of the link throughput on the Quality of Experience (QoE) provided by the current implementation of InterCom (echo_cancellation.py). As we did to measure the impact of latency and jitter, we will use tc [3] to limit the amount of data² that an InterCom instance is allowed to send in a local environment, with the aim of simulating a real one.³
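When choosing the limits to impose with tc, it helps to know the bit-rate of the uncompressed audio payload. The following sketch computes it; the capture parameters (44100 samples/second, 2 channels, 16 bits/sample) are assumptions made for the example, and packet headers are not counted.

```python
# Assumed capture parameters (hypothetical values chosen for this example).
SAMPLES_PER_SECOND = 44_100
NUMBER_OF_CHANNELS = 2
BITS_PER_SAMPLE = 16

def raw_bit_rate(samples_per_second, number_of_channels, bits_per_sample):
    """Bit-rate, in bits per second, of an uncompressed PCM stream (payload only)."""
    return samples_per_second * number_of_channels * bits_per_sample

rate = raw_bit_rate(SAMPLES_PER_SECOND, NUMBER_OF_CHANNELS, BITS_PER_SAMPLE)
print(f"{rate} bit/s = {rate / 1000:.1f} kbit/s")
```

Any limit set below this value forces the sender either to compress the data or to lose chunks.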

1.2 Compressing the audio data with zlib

To reduce the bit-rate, we need some way of compressing the data, an action that will also reduce the data throughput of InterCom. The pack() and unpack() methods can compress and decompress, respectively, the chunks that are processed. To compress and decompress, we will use a free data codec named DEFLATE, which is based on LZSS and Huffman Coding [4] (see this notebook and this notebook). The DEFLATE algorithm is implemented in the zlib module of Python's standard library.
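As a minimal sketch of this idea, the following code compresses and decompresses one chunk with zlib. The chunk contents (two sine tones), the chunk size and the sampling rate are hypothetical; only the zlib calls belong to the standard library.

```python
import array
import math
import zlib

# Hypothetical chunk: 1024 stereo frames of 16-bit samples, interleaved (L, R, L, R, ...).
FRAMES_PER_CHUNK = 1024
samples = array.array("h")
for n in range(FRAMES_PER_CHUNK):
    left = int(8000 * math.sin(2 * math.pi * 440 * n / 44100))
    right = int(8000 * math.sin(2 * math.pi * 660 * n / 44100))
    samples.extend((left, right))

raw_chunk = samples.tobytes()
compressed_chunk = zlib.compress(raw_chunk)             # DEFLATE-encode
decompressed_chunk = zlib.decompress(compressed_chunk)  # DEFLATE-decode

print(len(raw_chunk), "bytes ->", len(compressed_chunk), "bytes")
```

The decompressed chunk is identical to the original one (DEFLATE is lossless); the size of the compressed chunk depends on the audio content.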

In order to compare the performance of different alternatives, the above methods are implemented in the following modules, with different functionality:

  1. DEFLATE_raw.py: Compress the raw chunks with DEFLATE.
  2. DEFLATE_serial.py: Compress the chunk after concatenating the channels (see Fig. 1). Note that with this data-shuffling, the samples are not interleaved and the correlation between consecutive samples is slightly increased. This should also increase the (data) compression ratio.

    Figure 1: Sample reordering to create two independent channels.
  3. DEFLATE_serial_reset.py: Similar to DEFLATE_serial.py, but resetting DEFLATE at each new chunk-channel, i.e., compressing each chunk-channel independently. The idea here is to see if DEFLATE is exploiting the redundancy between consecutive channels.
  4. DEFLATE_byteplanes2.py: Similar to DEFLATE_serial.py (samples are de-interleaved), but 2 code-streams are generated, one for the LSB (Least Significant Byte) plane and another for the MSB (Most Significant Byte) plane, working with 16 bits/sample. The idea here is to see if the MSB plane can be compressed more efficiently because it can contain runs of zeros, especially when the audio sequence is quiet.
  5. DEFLATE_byteplanes3.py: Similar to DEFLATE_byteplanes2.py but considering three byte-planes. This would allow the compression of coefficients⁴ that require more than two bytes to be represented.
  6. DEFLATE_byteplanes4.py: Similar to DEFLATE_byteplanes3.py but considering four byte-planes.
  7. DEFLATE_byteplanes2_interlaced.py: Similar to DEFLATE_byteplanes2.py but using the raw chunks (without concatenating the channels).
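
The reorderings described in the list above can be sketched as follows. This is not the code of the DEFLATE_*.py modules: the function names, the test signal and the chunk size are hypothetical, and the sketch only covers the raw, serial and two-byte-plane cases.

```python
import array
import math
import zlib

def make_chunk(frames=1024):
    """Hypothetical interleaved stereo chunk (L, R, L, R, ...) of 16-bit samples."""
    chunk = array.array("h")
    for n in range(frames):
        chunk.append(int(3000 * math.sin(2 * math.pi * 440 * n / 44100)))
        chunk.append(int(3000 * math.sin(2 * math.pi * 445 * n / 44100)))
    return chunk

def serialize(chunk, channels=2):
    """De-interleave: all samples of channel 0, then all samples of channel 1, ..."""
    serial = array.array("h")
    for c in range(channels):
        serial.extend(chunk[c::channels])
    return serial

def byte_planes(samples):
    """Split 16-bit samples into the plane of low bytes and the plane of high bytes."""
    low = bytes(s & 0xFF for s in samples)
    high = bytes((s >> 8) & 0xFF for s in samples)
    return low, high

def merge_byte_planes(low, high):
    """Inverse of byte_planes(): rebuild the signed 16-bit samples."""
    merged = array.array("h")
    for l, h in zip(low, high):
        value = (h << 8) | l
        merged.append(value - 0x10000 if value >= 0x8000 else value)
    return merged

chunk = make_chunk()
serial = serialize(chunk)
low, high = byte_planes(serial)

sizes = {
    "raw (interleaved)": len(zlib.compress(chunk.tobytes())),
    "serial": len(zlib.compress(serial.tobytes())),
    "serial + 2 byte-planes": len(zlib.compress(low)) + len(zlib.compress(high)),
}
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

The sizes printed depend on the signal, so the ranking obtained with this synthetic chunk should not be taken as a general result.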

Finally, notice that the number of UDP packets sent (which now will be variable in length) remains constant.

1.3 Quantization

At the hardware level, audio samples are usually represented using Pulse Code Modulation (PCM). In a PCM sample, the number of levels the signal can take depends on the number of bits/sample (16 bits in our case).

Another key aspect to consider is that the processing that the Human Auditory System (HAS) performs to understand audio signals has several sources of perceptual redundancy. One of these sources is the finite number of different volume levels that a human being can recognize [1]. In this milestone, we will exploit this fact to decrease the transmission bit-rate by reducing quality. In most lossy compression schemes, quantization is the main source of distortion [7].

Scalar Quantization (SQ) is the process of decreasing the number of discrete levels that a signal can take [5]. Vector Quantization (VQ) is similar, but it is applied to tuples of samples at the same time [8]. SQ is used when the samples are decorrelated or when, although they are correlated, the correlation will be exploited in a subsequent entropy coding stage (which in our case is DEFLATE), because in this context the gain in coding efficiency provided by VQ is marginal [8], and VQ generally requires more computational resources.

Quantizers can also be classified into uniform and non-uniform [5, 8]. A uniform quantizer distributes the available representation levels uniformly over the range of input values. Non-uniform quantizers assign a higher density of representation levels (more output levels per range of input values) to those intervals of input values that occur more often.⁵ Quantizers can also be classified into static and adaptive. In a static quantizer, the distribution of the representation levels remains constant during the quantization stage; in an adaptive one, the quantizer parameters are adapted dynamically to the characteristics of the input signal.

In this milestone we use a uniform dead-zone static scalar quantizer, which can be implemented efficiently (in software) for digital signals. Dead-zone quantizers tend to produce more quantization indices equal to 0 (which increases the compression ratio) at the cost of generating more quantization noise for values of the input signal close to 0 or, equivalently, of decreasing the signal-to-noise ratio (SNR) for small signal values. A priori this could be seen as a problem, but in practice it is not: when the noise is independent of the signal amplitude (which usually happens with electronic noise), the SNR of the input signal is already at its lowest for those values close to 0. Therefore, the quantizer basically replaces electronic noise with quantization noise⁶ (see this introduction to signal quantization document and this comparative between digital scalar quantizers document).

Finally, although this is a feature that we are not going to exploit for now, dead-zone quantization is equivalent to encoding the signal by bit-planes when the quantization step sizes are powers of two, which allows the design of progressive entropy encoding schemes, if required.
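
A dead-zone quantizer can be sketched in a few lines. The version below is an assumption made for illustration (the index is the quotient truncated toward zero, and the reconstruction is the index multiplied by the step size); the step size and the sample values are hypothetical.

```python
def quantize(sample, delta):
    """Dead-zone quantizer: truncate sample/delta toward zero to get the index."""
    return int(sample / delta)

def dequantize(index, delta):
    """Restore the dynamic range (not the original value, unless delta == 1)."""
    return index * delta

delta = 4  # hypothetical quantization step size
samples = [-9, -5, -3, -1, 0, 1, 3, 5, 9]
indexes = [quantize(s, delta) for s in samples]
reconstruction = [dequantize(k, delta) for k in indexes]
print(indexes)         # [-2, -1, 0, 0, 0, 0, 0, 1, 2]
print(reconstruction)  # [-8, -4, 0, 0, 0, 0, 0, 4, 8]
```

All the inputs in the open interval (-4, 4) map to the index 0, so the zero bin is twice as wide as the others: this is the dead zone. Since the step size here is a power of two, each index is the magnitude of the sample with its two least significant bits discarded (the sign is kept).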

1.4 (Bit-)Rate control and distortion

The number of representation levels used by a quantizer depends on the quantization step (size), typically denoted by \(\Delta \). The higher the \(\Delta \), the smaller the number of representation levels and, therefore, the higher the distortion generated by the quantization error and the smaller the output bit-rate. This generates a rate/distortion trade-off that is characteristic of all lossy compressors (more bits, less distortion, and vice versa).

In order to minimize the loss of data, the rate can be controlled in real-time transmission systems by modifying \(\Delta \) when congestion occurs. However, notice that, depending on the entropy coding stage and on the characteristics of the signal (variance, entropy, etc.), there may not be a clear relationship between \(\Delta \) and the output bit-rate. This is the case with DEFLATE.

Notice also that any rate control algorithm based on quantization has a characteristic RD (Rate/Distortion) curve, in which the X axis represents the (in the case of InterCom, received) (bit-)rate, and the Y axis the distortion in the reconstruction (in the case of InterCom, the played audio sequence) obtained after the “de-quantization”.⁷ Some examples can be found in this notebook.
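
To make the shape of an RD curve concrete, the following sketch computes offline RD points for several values of \(\Delta \), using a dead-zone quantizer followed by zlib. It is an illustration under assumptions: the test signal is synthetic, the rate is expressed in bits/sample of the compressed indexes (not the bit-rate received by InterCom), and all function names are hypothetical.

```python
import array
import math
import random
import zlib

def quantize(sample, delta):
    return int(sample / delta)  # dead-zone: truncation toward zero

def dequantize(index, delta):
    return index * delta

def rmse(original, reconstructed):
    """Root Mean Square Error between two sequences of the same length."""
    squared_errors = [(o - r) ** 2 for o, r in zip(original, reconstructed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def rd_point(samples, delta):
    """Return (rate in bits/sample, RMSE) for one quantization step size."""
    indexes = [quantize(s, delta) for s in samples]
    code_stream = zlib.compress(array.array("h", indexes).tobytes())
    reconstruction = [dequantize(k, delta) for k in indexes]
    rate = 8 * len(code_stream) / len(samples)
    return rate, rmse(samples, reconstruction)

# Hypothetical test signal: a tone plus low-amplitude noise.
random.seed(0)
samples = [int(4000 * math.sin(2 * math.pi * 440 * n / 44100) + random.gauss(0, 20))
           for n in range(4096)]

for delta in (1, 2, 4, 8, 16, 32, 64, 128):
    rate, distortion = rd_point(samples, delta)
    print(f"delta={delta:4d}  rate={rate:6.2f} bits/sample  RMSE={distortion:7.2f}")
```

Plotting the printed pairs with the rate on the X axis and the RMSE on the Y axis gives one RD curve for this signal.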

1.5 The current implementation(s) for the control of the bit-rate

Bit-Rate (BR) control through quantization has been implemented in the class BR_Control* of the modules BR_control*.py. This class overrides the inherited methods pack() and unpack(), which now perform (remember that the chunks are already “DEFLATE-encoded and -decoded”):

  def pack(chunk_number, chunk):
      quantized_chunk = quantize(chunk)  # (1)
      packed_chunk = Buffering.pack(chunk_number, quantized_chunk)  # (2)
      return packed_chunk  # (3)

  def unpack(packed_chunk):
      (chunk_number, quantized_chunk) = Buffering.unpack(packed_chunk)  # (1)
      chunk = dequantize(quantized_chunk)  # (2)
      return (chunk_number, chunk)  # (3)
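
The overriding pattern shown above can be tried in isolation with a stand-in parent class. Everything in this sketch is hypothetical (the class names, the 2-byte chunk-number header, the use of array instead of the real chunk type); it only illustrates how quantization wraps the inherited pack() and unpack().

```python
import array
import struct
import zlib

class StandInBuffering:
    """Hypothetical stand-in parent: 2-byte chunk number + DEFLATE payload."""

    def pack(self, chunk_number, chunk):
        return struct.pack("!H", chunk_number) + zlib.compress(chunk.tobytes())

    def unpack(self, packed_chunk):
        (chunk_number,) = struct.unpack("!H", packed_chunk[:2])
        chunk = array.array("h")
        chunk.frombytes(zlib.decompress(packed_chunk[2:]))
        return chunk_number, chunk

class StandInBRControl(StandInBuffering):
    """Adds dead-zone quantization before packing and after unpacking."""

    def __init__(self, delta):
        self.delta = delta

    def quantize(self, chunk):
        return array.array("h", (int(s / self.delta) for s in chunk))

    def dequantize(self, quantized_chunk):
        return array.array("h", (k * self.delta for k in quantized_chunk))

    def pack(self, chunk_number, chunk):
        return super().pack(chunk_number, self.quantize(chunk))

    def unpack(self, packed_chunk):
        chunk_number, quantized_chunk = super().unpack(packed_chunk)
        return chunk_number, self.dequantize(quantized_chunk)

codec = StandInBRControl(delta=8)
chunk = array.array("h", [-100, -7, 0, 7, 100, 1000])
number, received = codec.unpack(codec.pack(3, chunk))
print(number, list(received))  # 3 [-96, 0, 0, 0, 96, 1000]
```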

Notice that, regarding the bit-rate control, you will find four implementations related to this milestone:

  1. BR_control_no.py: Uses a constant \(\Delta >0\).⁸ There is no BR control.
  2. BR_control_add_lost.py: Every second runs:
     \begin{equation}
       \left\{
       \begin{array}{ll}
         \Delta = \Delta + L - 1 & \quad \text{always}, \\
         \Delta = \Delta_{\text{min}} & \quad \text{if}~\Delta < \Delta_{\text{min}},
       \end{array}
       \right.
     \end{equation}
     where \(L\) is the number of lost (not received) chunks in the previous second. Notice that this heuristic (and the following ones) assumes that the interlocutor is losing (on average) the same number of chunks as we are.
  3. BR_control_lost.py: Every second runs:
     \begin{equation}
       \left\{
       \begin{array}{ll}
         \Delta = L - 1 & \quad \text{always}, \\
         \Delta = \Delta_{\text{min}} & \quad \text{if}~\Delta < \Delta_{\text{min}}.
       \end{array}
       \right.
     \end{equation}
  4. BR_control_conservative.py: Every second runs:
     \begin{equation}
       \left\{
       \begin{array}{ll}
         \Delta = 2\Delta & \quad \text{if}~L>2, \\
         \Delta = \frac{10}{11}\Delta & \quad \text{always}, \\
         \Delta = \Delta_{\text{min}} & \quad \text{if}~\Delta < \Delta_{\text{min}}.
       \end{array}
       \right.
     \end{equation}
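
The three update rules listed above can be transcribed directly into code. The sketch below is not the code of the BR_control_*.py modules: the function names and the value of \(\Delta_{\text{min}}\) are hypothetical, and it assumes that the lines of each system are applied in the order in which they are written.

```python
DELTA_MIN = 1  # hypothetical lower bound for the quantization step size

def clamp(delta, delta_min=DELTA_MIN):
    """Last line of each system: never let delta fall below delta_min."""
    return max(delta, delta_min)

def update_add_lost(delta, lost):
    """Rule of item 2: delta = delta + L - 1."""
    return clamp(delta + lost - 1)

def update_lost(delta, lost):
    """Rule of item 3: delta = L - 1 (the previous delta is ignored)."""
    return clamp(lost - 1)

def update_conservative(delta, lost):
    """Rule of item 4: double delta if L > 2, then always scale it by 10/11."""
    if lost > 2:
        delta = 2 * delta
    delta = delta * 10 / 11
    return clamp(delta)

# Hypothetical trace: number of chunks lost in each of six consecutive seconds.
lost_per_second = [0, 5, 5, 0, 0, 0]
delta = DELTA_MIN
for lost in lost_per_second:
    delta = update_add_lost(delta, lost)
    print(f"L={lost}  delta={delta}")
```

With this trace, the first rule raises \(\Delta \) while chunks are being lost and then lowers it by one unit per second once the losses stop.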

2 Deliverables

Remember to report both the experiments and the results.

  1. Which data-ordering performs better?

    Determine empirically which ordering of the chunk data is the most efficient from a lossless data compression point of view (the smaller the bit-rate, the higher the compression). Use any audio sequence you want.⁹ Notice that different audio files can give different results.

  2. Which BR control algorithm shows the best RD (Rate/Distortion) curve?

    Considering the RMSE (Root Mean Square Error) as the distortion measure between the sent and the received audio signals, for a concrete audio sequence, generate the RD curve considering a set of different simulated transmission environments (use tc or any other similar tool).

    Remember that the X-axis must express bit-rate and the Y-axis, distortion.

3 Resources

[1]   M. Bosi and R.E. Goldberg. Introduction to Digital Audio Coding and Standards. Kluwer Academic Publishers, 2003.

[2]   Behrouz Forouzan. Introduction to Data Communications and Networking. McGraw-Hill, 2007.

[3]   Bert Hubert, Thomas Graf, Greg Maxwell, Remco van Mook, Martijn van Oosterhout, Paul B. Schroeder, Jasper Spaans, and Pedro Larroy. Linux Advanced Routing & Traffic Control. Bert Hubert et al., 2012.

[4]   M. Nelson and J. Gailly. The Data Compression Book. M&T Books, 1996.

[5]   K. Sayood. Introduction to Data Compression (Slides). Morgan Kaufmann, 2017.

[6]   Andrew S. Tanenbaum. Computer Networks. Prentice Hall, 2011.

[7]   D.S. Taubman and W.M. Marcellin. JPEG2000. Image Compression Fundamentals, Standards and Practice. Kluwer Academic Publishers, 2002.

[8]   M. Vetterli, J. Kovačević, and V.K. Goyal. Foundations of Signal Processing. Cambridge University Press, 2014.

¹ Measured in bits per second, or in a power-of-10 multiple of that unit (kbit/s, Mbit/s, etc.).

² Notice that this upper bound on the bit-rate will also affect the loss of chunks: if the link capacity is smaller than the audio bit-rate (throughput), sooner or later the transmission link will discard those chunks that cannot be buffered in the intermediate nodes (routers and switches). In this case, we would be at least contributing to, if not causing, the link congestion.

³ If tc (or a similar tool) is not available in your OS, you can use a real transmission environment, but take into consideration that you will need to control the bit-rate in order to obtain the points of the RD curve.

⁴ Used in future improvements of InterCom.

⁵ The decision intervals and the representation levels in each interval can also be optimized using other criteria, such as minimizing the rate/distortion at a given point of the RD curve.

⁶ The error generated by the quantization stage.

⁷ From a signal processing point of view, the term “de-quantization” refers to restoring the original dynamic range of the signal, but notice that this does not imply that the original signal will be recovered. That only happens when \(\Delta =1\).

⁸ \(\Delta \) must always be greater than \(0\), by definition, and this does not depend on the bit-rate control.

⁹ Some samples are stored in the data directory of InterCom.