Bit-Rate Control

Along with latency and its variation (jitter), another key property of the transmission link used in an InterCom session is the throughput¹ that it can provide [2, 6]. This bit-rate depends on the maximum capacity of the link (a characteristic closely related to the available bandwidth) and on its congestion level (which basically depends on the load). In general, we can assume that the capacity is constant over time (the bandwidth provided by the link does not vary). In contrast, the throughput is time-varying and quite unpredictable, because it depends on the congestion level, which in turn depends on the behavior of the network users.

In this milestone, we will measure the impact of the link throughput on the QoE provided by the current implementation of InterCom (echo_cancellation.py). As we did to measure the impact of latency and jitter, we will use tc [3] to limit the amount of data² that an InterCom instance is allowed to send in a local environment, with the aim of simulating a real one.³

1.2 Compressing the audio data with zlib

To reduce the bit-rate, we need some way of compressing the data, which will also reduce the data throughput required by InterCom. The pack() and unpack() methods can compress and decompress, respectively, the chunks that are processed. To do so, we will use a free data codec named DEFLATE, which is based on LZSS and Huffman coding [4] (see this notebook and this notebook). The DEFLATE algorithm is implemented in zlib, a module of Python's standard library.
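As an illustration of the compression step described above, the following minimal sketch compresses and decompresses one chunk with zlib. The function names pack_chunk() and unpack_chunk(), the 2-byte chunk-number header and the use of the standard array module are assumptions made only for this example; they are not the InterCom implementation.

```python
import array
import zlib

def pack_chunk(chunk_number, samples, level=zlib.Z_DEFAULT_COMPRESSION):
    """Compress a chunk of 16-bit samples and prepend a 2-byte chunk number
    (hypothetical packet layout, used only in this sketch)."""
    payload = zlib.compress(samples.tobytes(), level)
    return chunk_number.to_bytes(2, "big") + payload

def unpack_chunk(packed_chunk):
    """Inverse of pack_chunk(): recover the chunk number and the samples."""
    chunk_number = int.from_bytes(packed_chunk[:2], "big")
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(zlib.decompress(packed_chunk[2:]))
    return chunk_number, samples

# A silent stereo chunk of 1024 frames (2048 samples, 4096 bytes).
chunk = array.array("h", [0] * 2048)
packed = pack_chunk(7, chunk)
number, restored = unpack_chunk(packed)
print(len(chunk.tobytes()), "bytes before,", len(packed), "bytes after")
```

DEFLATE is lossless, so the restored chunk is identical to the original one. The length of the packed chunk depends on the content of the chunk, which is why the packets become variable in length.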

In order to compare the performance of different alternatives, the above methods are implemented in the following modules, with different functionality:

Finally, notice that the number of UDP packets sent (which now will be variable in length) remains constant.

1.3 Quantization

At the hardware level, audio samples are usually represented using Pulse Code Modulation (PCM). In a PCM sample, the number of levels the signal can take depends on the number of bits/sample (16 bits in our case).

Scalar Quantization (SQ) is the process of decreasing the number of discrete levels that a signal can take [5]. Vector Quantization (VQ) is similar, but it is applied to tuples of samples at the same time [8]. SQ is used when the samples are decorrelated or when, although they are correlated, the correlation will be exploited in a subsequent entropy coding stage (which in our case is DEFLATE). In this context the gain in coding efficiency provided by VQ is marginal [8], and VQ generally requires more computational resources.

Quantizers can also be classified into uniform and non-uniform [5, 8]. A uniform quantizer distributes the available representation levels uniformly over the range of input values. Non-uniform quantizers assign a higher density of representation levels (more output levels per range of input values) to those intervals of input values that occur more often.⁵ Quantizers can also be classified into static and adaptive. In the first case, the distribution of the representation levels remains constant during the quantization stage; in the second, the quantizer parameters are adapted dynamically to the characteristics of the input signal.

In this milestone we use a uniform dead-zone scalar static quantizer, which can be implemented efficiently (in software) for digital signals. Moreover, dead-zone quantizers tend to produce more quantization indices equal to 0 (which increases the compression ratio) at the cost of generating more quantization noise for values of the input signal close to 0 or, equivalently, of decreasing the SNR for small signal values. A priori, this could be seen as a problem, but in practice it is not: when the noise is independent of the signal amplitude (which usually happens with electronic noise), the SNR of the input signal already has its lowest value for those samples close to 0. Therefore, the quantizer will basically replace electronic noise with quantization noise⁶ (see this introduction to signal quantization document and this comparison of digital scalar quantizers document).

Finally, although this is a feature that we are not going to exploit for now, using a dead-zone quantizer is equivalent to encoding the signal by bit-planes when the quantization step sizes are powers of two, which allows the design of progressive entropy encoding schemes, if required.
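The uniform dead-zone quantizer described above can be sketched as follows. This is a minimal example that assumes the quantization index is obtained by truncating x/Δ toward zero and that the reconstruction is Δ·k; the function names are hypothetical and the real InterCom code may differ.

```python
def quantize(x, delta):
    """Dead-zone quantization: truncate x/delta toward zero, so every
    input in the open interval (-delta, delta) gets the index 0."""
    return int(x / delta)

def dequantize(k, delta):
    """Restore the dynamic range (not the original value) of the sample."""
    return delta * k

def quantize_by_shift(x, n):
    """Same quantizer for delta = 2**n: discard the n least significant
    bit-planes of the magnitude and keep the sign."""
    magnitude = abs(x) >> n
    return -magnitude if x < 0 else magnitude

samples = [-300, -129, -5, 0, 7, 127, 128, 300]
delta = 128  # 2**7
indices = [quantize(x, delta) for x in samples]
reconstructed = [dequantize(k, delta) for k in indices]
print(indices)        # [-2, -1, 0, 0, 0, 0, 1, 2]
print(reconstructed)  # [-256, -128, 0, 0, 0, 0, 128, 256]
```

The interval that maps to the index 0 is twice as wide as the others, which is where the extra zeros come from. The function quantize_by_shift() illustrates the bit-plane interpretation for step sizes that are powers of two.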

1.4 (Bit-)Rate control and distortion

The number of representation levels used by a quantizer depends on the quantization step (size), typically denoted by \(\Delta \). The higher the \(\Delta \), the smaller the number of representation levels, and therefore the higher the distortion generated by the quantization error and the smaller the output bit-rate. This generates a rate/distortion trade-off that is characteristic of all lossy compressors (more bits, less distortion, and vice versa).
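As a worked example, assume (this is an assumption of the example, not a statement about InterCom) that the dead-zone quantizer computes the index as \(k = \mathrm{trunc}(x/\Delta )\) and that the input is 16-bit PCM, that is, \(-2^{15} \le x \le 2^{15}-1\). Under these assumptions, the number of representation levels \(N\) would be

\[ N(\Delta ) = \left\lfloor \frac{2^{15}}{\Delta } \right\rfloor + \left\lfloor \frac{2^{15}-1}{\Delta } \right\rfloor + 1, \]

which gives \(N(1) = 65536\), \(N(256) = 256\) and \(N(1024) = 64\). A fixed-length code for the indices would therefore need \(\lceil \log _2 N \rceil \) = 16, 8 and 6 bits/sample, respectively. An entropy coder can spend fewer bits than this bound when the indices are not equiprobable.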

In order to minimize the loss of data, the rate can be controlled in real-time transmission systems by modifying \(\Delta \) when congestion occurs. However, notice that, depending on the entropy coding stage and on the characteristics of the signal (variance, entropy, etc.), there may not be a clear relationship between \(\Delta \) and the output bit-rate. This is what happens with DEFLATE.
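Since the output bit-rate cannot be predicted directly from \(\Delta \), one possible approach is a feedback loop that reacts to the observed losses. The sketch below is purely hypothetical: the function name, the doubling/decrement policy and the use of sent and received chunk counts as the congestion signal are assumptions of this example, not a description of the InterCom controller.

```python
MIN_DELTA = 1        # delta = 1 means no quantization loss
MAX_DELTA = 2 ** 15  # coarsest step for 16-bit samples

def update_delta(delta, chunks_sent, chunks_received):
    """Return the quantization step for the next control period
    (hypothetical policy, for illustration only)."""
    if chunks_received < chunks_sent:
        # Lost chunks are taken as a sign of congestion:
        # quantize more coarsely to lower the bit-rate quickly.
        delta *= 2
    else:
        # No loss observed: recover quality gradually.
        delta -= max(1, delta // 8)
    return max(MIN_DELTA, min(MAX_DELTA, delta))

# (chunks sent, chunks received by the peer) in consecutive periods.
periods = [(50, 50), (50, 40), (50, 45), (50, 50), (50, 50)]
delta = 1
history = []
for sent, received in periods:
    delta = update_delta(delta, sent, received)
    history.append(delta)
print(history)  # [1, 2, 4, 3, 2]
```

Increasing \(\Delta \) quickly and decreasing it slowly is only one of many possible policies. Its behavior should be checked experimentally against the RD curve of the signal.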

Notice also that any rate control algorithm based on quantization has a characteristic RD (Rate/Distortion) curve, in which the X axis represents the (bit-)rate (in the case of InterCom, the received one) and the Y axis represents the distortion of the reconstruction (in the case of InterCom, the played audio sequence) obtained after the “de-quantization”⁷. Some examples can be found in this notebook.
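The points of an RD curve can be estimated as in the following minimal sketch, which quantizes a synthetic signal with several values of \(\Delta \), compresses the indices with zlib and measures the RMSE of the reconstruction. The test signal (a 440 Hz tone plus Gaussian noise), the truncation-based quantizer and the choice of RMSE as the distortion metric are assumptions made only for this example.

```python
import array
import math
import random
import zlib

def rd_point(samples, delta):
    """Return (rate in bits/sample, RMSE) for one quantization step."""
    # Dead-zone quantization by truncation toward zero (assumed here).
    indices = array.array("h", [int(x / delta) for x in samples])
    compressed = zlib.compress(indices.tobytes())
    rate = 8 * len(compressed) / len(samples)
    mse = sum((x - delta * k) ** 2
              for x, k in zip(samples, indices)) / len(samples)
    return rate, math.sqrt(mse)

# Synthetic 16-bit signal: a 440 Hz tone sampled at 44100 Hz plus noise.
random.seed(0)
samples = [int(8000 * math.sin(2 * math.pi * 440 * n / 44100)
               + random.gauss(0, 50))
           for n in range(4096)]

rd_curve = [(delta, *rd_point(samples, delta))
            for delta in (1, 16, 256, 1024)]
for delta, rate, rmse in rd_curve:
    print(f"delta={delta:5d}  rate={rate:6.2f} bits/sample  RMSE={rmse:8.2f}")
```

With \(\Delta = 1\) the reconstruction is exact, so the distortion is 0. For real measurements, the synthetic signal should be replaced by recorded audio, and the rate should be the one actually received by the peer.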

1.5 The current implementation(s) for the control of the bit-rate

Bit-Rate (BR) control through quantization has been implemented in the class BR_Control* of the modules BR_control*.py. This class overrides the inherited methods pack() and unpack(), performing now (remember that the chunks are already “DEFLATE-encoded and -decoded”):

Notice that, regarding the bit-rate control, you will find four implementations related to this milestone:

2 Deliverables

3 Resources

[1] M. Bosi and R.E. Goldberg. Introduction to Digital Audio Coding and Standards. Kluwer Academic Publishers, 2003.

[2] Behrouz Forouzan. Introduction to Data Communications and Networking. McGraw-Hill, 2007.

[3] Bert Hubert, Thomas Graf, Greg Maxwell, Remco van Mook, Martijn van Oosterhout, Paul B. Schroeder, Jasper Spaans, and Pedro Larroy. Linux Advanced Routing & Traffic Control. Bert Hubert et al., 2012.

[4] M. Nelson and J. Gailly. The Data Compression Book. M&T Books, 1996.

[5] K. Sayood. Introduction to Data Compression (Slides). Morgan Kaufmann, 2017.

[6] Andrew S. Tanenbaum. Computer Networks. Prentice Hall, 2011.

[7] D.S. Taubman and M.W. Marcellin. JPEG2000: Image Compression Fundamentals, Standards and Practice. Kluwer Academic Publishers, 2002.

[8] M. Vetterli, J. Kovačević, and V.K. Goyal. Foundations of Signal Processing. Cambridge University Press, 2014.

¹Measured in bits per second or in a power-of-10 multiple of this unit (kbit/s, Mbit/s, etc.).

²Notice that this upper bound on the bit-rate will also affect the loss of chunks, because if the link capacity is smaller than the audio bit-rate (throughput), sooner or later the transmission link will discard those chunks that cannot be buffered in the intermediate nodes (routers and switches). In this case, we would be at least contributing to, if not causing, the link congestion.

³If tc (or a similar tool) is not available in your OS, you can use a real transmission environment, but you must take into consideration that you will need to control the bit-rate in order to obtain the points of the RD curve.

⁴Used in future improvements of InterCom.

⁵The decision intervals and the representation levels in each interval can also be optimized using other criteria, such as minimizing the rate/distortion at a given point of the RD curve.

⁶The error generated by the quantization stage.

⁷From a signal processing point of view, the term “de-quantization” refers to restoring the original dynamic range of the signal, but notice that this does not imply that the original signal will be restored. This only happens when \(\Delta =1\).

⁸\(\Delta \) must always be greater than \(0\), by definition, and this does not depend on the bit-rate control.

⁹Some samples are stored in the data directory of InterCom.