
Book Chapter

Realisation of an Adaptive Audio Tool

17 Oct 2000-pp 3-13



Arnaud Meylan and Catherine Boutremans
Institute for Computer Communication and Applications (ICA)
EPFL (Swiss Federal Institute of Technology), CH-1015 Lausanne
March 31, 2000
Abstract
Real-time audio over the best-effort Internet often suffers from packet loss. At this time,
Forward Error Correction (FEC) seems to be an efficient way to attenuate the impact of loss.
Nevertheless, to ensure that FEC remains efficient, the source rate must be continuously controlled
to avoid congestion. In this paper, we describe a realisation of adaptive FEC subject to a
TCP-friendly rate control.
1 Introduction
The Internet has changed from a tool mainly used for file transfer and e-mail into a network for
multimedia and commercial applications, among others. This change brought up many new technical
challenges, such as the transport of real-time data over lossy networks that offer no real-time
guarantees, which have been addressed by the Real-time Transport Protocol (RTP) [8] and by Forward
Error Correction (FEC) techniques [3]. Unfortunately, FEC is too often used without rate control,
which leads to more congestion, more loss and, in the end, worse audio quality [6]. The purpose of
this work is to add adaptive FEC to an existing piece of software, the Robust Audio Tool [7]. FEC
will be constrained by the TCP-friendly rate proposed by Mahdavi and Floyd [16].
We first present the general problem of optimizing quality at reception, then deduce a more
specific solution. A short presentation of the software's architecture is intended to help those
interested in improving this work.
2 State of the art
This section is a survey of useful techniques for audio exchange over a network.
2.1 Audio coding
Networked audio applications require two distinct kinds of processing: encoding/decoding and
packetization/transmission.
Encoding transforms the audio signal, which is usually analog, into a digital version with a given
rate and quality. Three parameters affect the output rate: the sampling frequency, the resolution,
and the compression method. The digital audio data is put into packets that are transported through
the network. At the destination, the digital data is decoded (usually by applying the inverse of the
encoding) and played on the audio device. Since both operations are bound together, we do not speak
of an encoder and a decoder separately, but simply of a codec (coder-decoder). Each codec uses a
specific encoding method; for our application, two principal classes can be outlined:
The first class uses differential encoding, which exploits the correlation between successive
samples of an audio signal. Instead of coding the signal itself, the difference between successive
samples is coded. Since the samples are correlated, the range of the differences is smaller than
the range of the whole signal, which enables compression. The most popular examples of this
encoding are DPCM and ADPCM.
The second class, synthesis coding, works with sequences of sound instead of single samples or
differences. A model of the source is built¹, and only the parameters of the model are transmitted.
This enables drastic compression, but also yields a rather artificial sound. Examples include LPC
and the now famous GSM standard.
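To make the differential idea concrete, here is a minimal lossless sketch (function names are hypothetical). It codes each sample as its difference from the previous one; real DPCM and ADPCM codecs additionally quantize the difference, and ADPCM adapts the quantizer step size, which this sketch deliberately omits.

```python
from typing import List


def dpcm_encode(samples: List[int]) -> List[int]:
    """Code each sample as the difference from the previous one (the first against 0)."""
    diffs = []
    previous = 0
    for s in samples:
        diffs.append(s - previous)
        previous = s
    return diffs


def dpcm_decode(diffs: List[int]) -> List[int]:
    """Rebuild the samples by accumulating the differences."""
    samples = []
    previous = 0
    for d in diffs:
        previous += d
        samples.append(previous)
    return samples


signal = [1000, 1010, 1025, 1030, 1028, 1015]
diffs = dpcm_encode(signal)
print(diffs)  # [1000, 10, 15, 5, -2, -13]
assert dpcm_decode(diffs) == signal
```

The samples themselves lie above 1000, whereas every difference after the first lies between -13 and 15 and therefore fits in 5 signed bits (range -16 to 15); this narrower range is where the compression comes from.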
We do not give further details on coding techniques in this paper, because we restricted ourselves
to the codecs already implemented in the software. The key point is that the same audio signal can
be encoded by different encoders, resulting in different qualities and rates. In our context, rate
and quality are the two important codec parameters, since the choice of codec is dictated by the
bandwidth available on the network and by the goal of maximizing audio quality. The next subsection
presents error control schemes.
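As a back-of-the-envelope illustration of how sampling frequency and resolution set the output rate, the sketch below computes the bit rate of a sample-based codec and the payload of one media unit. The figures are standard ones (8 kHz sampling; 16-bit linear PCM, 8-bit G.711 and 4-bit ADPCM samples; GSM full rate emitting a 33-byte frame every 20 ms); the function names are our own.

```python
def sample_codec_rate_bps(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Rate of a sample-based codec: samples per second times bits per encoded sample."""
    return sample_rate_hz * bits_per_sample


def frame_codec_rate_bps(frame_bytes: int, frame_ms: int) -> float:
    """Rate of a frame-based (synthesis) codec emitting frame_bytes every frame_ms."""
    return frame_bytes * 8 * 1000 / frame_ms


def unit_payload_bytes(rate_bps: float, unit_ms: int) -> float:
    """Bytes of encoded audio in one media unit lasting unit_ms milliseconds."""
    return rate_bps * unit_ms / 1000 / 8


pcm16 = sample_codec_rate_bps(8000, 16)  # linear 16-bit PCM
pcmu = sample_codec_rate_bps(8000, 8)    # 8-bit G.711
adpcm = sample_codec_rate_bps(8000, 4)   # 4-bit ADPCM
gsm = frame_codec_rate_bps(33, 20)       # GSM full rate

for name, rate in [("PCM16", pcm16), ("G.711", pcmu), ("ADPCM", adpcm), ("GSM", gsm)]:
    print(f"{name}: {rate / 1000:.1f} kbit/s, {unit_payload_bytes(rate, 20):.0f} bytes per 20 ms unit")
```

This prints 128.0, 64.0, 32.0 and 13.2 kbit/s, that is 320, 160, 80 and 33 bytes per 20 ms unit, which shows the spread of rates the codec choice offers.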
2.2 FEC
2.2.1 Definition
Audio transmission is highly susceptible to the uncertainties of the best-effort Internet service.
Packets may be lost, misordered or delayed, which degrades the sound quality perceived by the
receiver. Typical error control and loss recovery mechanisms based on the retransmission of lost
packets are clearly not acceptable, because they dramatically increase end-to-end latency. Our
purpose is to propose a technique that minimizes the impact of the loss and delay introduced by
transmission.
Quality of Service (QoS) on the Internet is a broad, active research topic. Many network
researchers argue that loss should be explicitly avoided or controlled by re-engineering the
network to provide guaranteed QoS, through techniques such as admission control and resource
reservation [1], [2]. Our objective, however, is to use the "original" best-effort Internet, since
it is cheaper and more accessible.
Forward Error Correction (FEC) relies on adding repair data to a stream, from which the content of
lost packets may be recovered at the destination, at least in part. Two classes of repair data may
be added to a stream [3]: those which are independent of the contents of that stream (e.g. parity
coding, Reed-Solomon codes) and those which use knowledge of the stream to improve the repair
process. Several approaches exist in the second class. The most popular is to add copies of audio
units to the stream; it is also possible to use a special codec that performs layered coding, with
the layers transmitted in different packets. Only the most popular approach is considered here.
2.2.2 SFEC
In order to simplify the following discussion, we distinguish a (media) unit of data from a packet.
A unit is an interval of audio data, as stored internally in an audio tool. A packet comprises one
or more units, encapsulated for transmission over the network.
The principle of the FEC used here is to transmit each unit of audio in multiple packets. If a
packet is lost, another packet containing the same unit can cover the loss and be played, provided
that it arrives. The principle is illustrated in Figure 1. This approach has been advocated in [4]
and [5] for use on the Mbone, and extensively simulated by Podolsky [6], who calls this framework
"Signal-processing based FEC" (SFEC²). Redundant audio units are piggy-backed onto a later packet,
which is preferable to transmitting additional packets, as it decreases the packet overhead and the
number of routing decisions.
¹ Codecs designed especially for voice are called vocoders.
² "because this approach exploits a signal-processing based model of the audio signal to effectively
compute its error-correcting information"

Figure 1: Redundancy principle
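The piggy-backing principle of Figure 1 can be simulated in a few lines. In this minimal sketch (names are hypothetical; it tracks only which unit each copy belongs to, not which codec encodes it), packet n carries a copy of unit n - o for every offset o, so offsets (0, 1) reproduce Figure 1: each packet holds the current unit plus a redundant copy of the previous one. A unit is playable if at least one packet holding one of its copies arrives.

```python
from typing import List, Sequence, Set


def build_packets(num_units: int, offsets: Sequence[int]) -> List[List[int]]:
    """Packet n carries unit n - o for each offset o; offset 0 is the primary copy.
    Only num_units packets are built, so trailing redundant copies are not sent."""
    return [[n - o for o in offsets if n - o >= 0] for n in range(num_units)]


def playable_units(packets: List[List[int]], lost: Set[int]) -> Set[int]:
    """Units with at least one copy in a packet that was not lost."""
    received: Set[int] = set()
    for index, units in enumerate(packets):
        if index not in lost:
            received.update(units)
    return received


packets = build_packets(6, offsets=(0, 1))       # two copies per unit, as in Figure 1
print(packets)                                   # [[0], [1, 0], [2, 1], [3, 2], [4, 3], [5, 4]]
print(sorted(playable_units(packets, {2})))      # [0, 1, 2, 3, 4, 5]
print(sorted(playable_units(packets, {2, 3})))   # [0, 1, 3, 4, 5]
```

A single lost packet is fully repaired, while two consecutive losses leave unit 2 unrecoverable because both of its copies travelled in the lost packets; with bursty losses, larger offsets spread the copies further apart.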
Applying redundancy in response to loss raises a potential problem: when loss occurs, the natural
(user) reflex is to add redundancy at the source. This increases the overall transmission rate, and
hence congestion, probably producing even worse quality. Redundancy can indeed be added when loss
occurs, but at the same time the source encoding must be changed to use less bandwidth, in response
to congestion [6]. Our framework continuously ensures that the overall transmission rate is less
than or equal to the TCP-friendly rate proposed by Mahdavi and Floyd [16]; it is thus a combined
source rate/redundancy control. Two important parameters of SFEC are:
$k$ denotes the number of copies of the same audio unit; in Figure 1, for example, $k = 2$. The
value of $k$ is tied to the loss probability of the network and to the desired apparent loss rate
after reconstruction at reception.
$o = [o_1, o_2, \ldots, o_k]$ denotes the offsets of the copies of a unit: $o_i$ is the offset of
copy $i$ relative to the first transmission of the unit. Note that $o_1 = 0$ always. In Figure 1,
$o_2 = 1$.
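To illustrate how $k$ relates to the network loss probability and the target apparent loss rate, the sketch below assumes, as a simplification, that packet losses are independent with probability $p$ and that the $k$ copies of a unit travel in $k$ different packets. A unit is then lost after reconstruction only if all copies are lost, with probability $p^k$. Real Internet losses are often bursty, so this estimate is optimistic, and the cap `k_max` is a hypothetical parameter standing in for the rate constraint discussed above.

```python
def residual_loss(p: float, k: int) -> float:
    """Probability that all k copies are lost, assuming independent losses of rate p."""
    return p ** k


def min_copies(p: float, target: float, k_max: int = 4) -> int:
    """Smallest k whose residual loss meets the target; returns k_max (a hypothetical
    cap) when even k_max copies do not reach it."""
    for k in range(1, k_max + 1):
        if residual_loss(p, k) <= target:
            return k
    return k_max


print(min_copies(0.2, 0.01))   # 3
print(min_copies(0.05, 0.05))  # 1
```

With 20% loss, reaching a 1% apparent loss rate under this model needs three copies: $0.2^2 = 0.04$ is still too high, while $0.2^3 = 0.008$ is below the target.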
Algorithms used to adapt redundancy are presented in section 4.
2.3 RTP/RTCP overview
TCP is not appropriate for transmitting real-time data, since it performs retransmissions and does
not offer sufficient support for timing. That is why the IETF developed the Real-time Transport
Protocol (RTP). It provides end-to-end delivery services for data with real-time characteristics,
such as interactive audio and video. These services include payload type identification, sequence
numbering, time-stamping and delivery monitoring. RTP also offers a control protocol, RTCP, usually
carried on a separate lower-level transport association. Below we present a selection of useful
fields of RTP packets:
Payload type: Identifies the format of the RTP payload. A profile specifies a default static
mapping of payload type codes to payload formats. In the case of audio, it is also used to identify
the type of redundancy.
Sequence number: Increments by one for each RTP data packet sent, and may be used by the receiver
to detect packet loss and/or restore the packet sequence.
Time-stamp: Derived from a clock at the sender; reflects the sampling instant of the first byte in
the RTP data packet.
Moreover, RTCP Sender Reports (SR) and Receiver Reports (RR) provide reception quality feedback
(namely the fraction of packets lost and the interarrival jitter). They also allow the round-trip
time (RTT) to be computed.
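The sketch below shows, following the conventions of the current RTP specification (RFC 3550), how a sender can turn Receiver Report fields into these quantities. The RTT is A - LSR - DLSR, where A is the arrival time of the report and LSR ("last SR") and DLSR ("delay since last SR") are report fields, all in units of 1/65536 s; the 8-bit fraction lost is the loss fraction of the last interval scaled by 256. The function names are our own, and clamping the field to 255 is a defensive choice of this sketch.

```python
def rtt_seconds(arrival: int, lsr: int, dlsr: int) -> float:
    """Round-trip time from a Receiver Report: RTT = A - LSR - DLSR.

    All inputs are 32-bit values in units of 1/65536 s: `arrival` is the middle
    32 bits of the NTP time at which the report arrived; `lsr` and `dlsr` are
    the report's LSR and DLSR fields.
    """
    rtt_units = (arrival - lsr - dlsr) & 0xFFFFFFFF  # modular arithmetic handles wrap-around
    return rtt_units / 65536.0


def fraction_lost(expected: int, received: int) -> int:
    """8-bit 'fraction lost' for one reporting interval: lost / expected, times 256."""
    lost = expected - received
    if expected == 0 or lost <= 0:
        return 0
    return min((lost << 8) // expected, 255)  # clamp so the value fits in 8 bits


def loss_rate(fraction_lost_field: int) -> float:
    """Convert the 8-bit field back into a loss rate in [0, 1)."""
    return fraction_lost_field / 256.0


print(rtt_seconds(696320, 655360, 32768))  # 0.125
print(loss_rate(fraction_lost(100, 95)))   # 0.046875
```

The truncation to 8 bits limits the resolution of the reported loss rate to 1/256, which is sufficient for rate control but worth keeping in mind at very low loss rates.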

2.4 TCP-Friendly rate control
As networked multimedia applications become widespread, it becomes increasingly important to
ensure that they can share resources fairly with each other and with current TCP-based
applications, the dominant source of Internet traffic. The TCP protocol is designed to reduce its
sending rate when congestion is detected. Networked multimedia applications should exhibit similar
behavior if they wish to co-exist with TCP-based applications.
One way to ensure such co-existence is to implement some form of congestion control that adapts
the transmission rate so as to share congested bandwidth fairly with TCP applications. One
definition of fairness is TCP friendliness [16]: if a non-TCP connection shares a bottleneck link
with TCP connections traveling over the same network path, the non-TCP connection should receive
the same share of bandwidth (namely, achieve the same throughput) as a TCP connection.
Mahdavi and Floyd [16] have derived an expression relating the average TCP throughput (R)
to the packet loss rate:
$$R_{TCP} = \frac{1.22 \times MTU}{RTT \times \sqrt{\pi_1}}$$
where $MTU$ is the packet size used on the connection, $RTT$ is the round-trip time and $\pi_1$ is
the loss rate experienced by the connection.
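A direct transcription of the equation, assuming $MTU$ in bytes, $RTT$ in seconds and $\pi_1$ as a fraction, gives $R_{TCP}$ in bytes per second. When no loss has been measured the equation gives no finite bound, so this sketch returns infinity and leaves the choice of a cap to the caller.

```python
import math


def tcp_friendly_rate(mtu_bytes: float, rtt_s: float, loss: float) -> float:
    """R_TCP = 1.22 * MTU / (RTT * sqrt(pi_1)), in bytes per second."""
    if loss <= 0.0:
        return math.inf  # no measured loss: the equation gives no finite bound
    return 1.22 * mtu_bytes / (rtt_s * math.sqrt(loss))


rate = tcp_friendly_rate(576, 0.1, 0.01)
print(f"{rate:.0f} B/s, i.e. {rate * 8 / 1000:.0f} kbit/s")
```

With a 576-byte MTU, a 100 ms RTT and 1% loss, the bound is about 70 kB/s (roughly 562 kbit/s). Because the rate scales with $1/\sqrt{\pi_1}$, quadrupling the loss rate halves it.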
In order to implement a TCP-friendly congestion control algorithm, our application simply chooses to
send at a rate no higher than the TCP-friendly rate $R_{TCP}$. Computing this value accurately
requires the application to know the $MTU$, the $RTT$ and the current loss rate $\pi_1$. The $MTU$
may be determined using an MTU discovery algorithm, or assumed to be the minimum acceptable value
for TCP, 576 bytes. In our application, we fix the $MTU$ to 576 bytes. Current values of the $RTT$
are obtained using the time-stamp fields of the RTCP Sender Reports and Receiver Reports. The packet
loss rate $\pi_1$ is computed at the receiver and reported to the sender via the Fraction lost field
of the Receiver Reports.
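The combined source rate/redundancy control amounts to checking that the primary encoding, its redundant copies and the packet headers together stay below $R_{TCP}$ (converted to bit/s). The sketch below does this for a caller-supplied list of configurations ordered from most to least preferred. The configuration labels, the standard 8 kHz codec rates and the preference order are illustrative assumptions, not the adaptation algorithm of this work, and the small RFC 2198 redundancy headers are ignored.

```python
from typing import List, Optional, Sequence, Tuple

HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + fixed RTP header, no options

Config = Tuple[str, Sequence[float]]  # (label, codec rates in bit/s, primary first)


def stream_rate_bps(codec_rates_bps: Sequence[float], unit_ms: int, units_per_packet: int) -> float:
    """Each unit is sent once per encoding (primary and redundant copies),
    plus one IP/UDP/RTP header per packet."""
    packets_per_s = 1000 / (unit_ms * units_per_packet)
    return sum(codec_rates_bps) + packets_per_s * HEADER_BYTES * 8


def pick_configuration(candidates: List[Config], r_tcp_bytes_per_s: float,
                       unit_ms: int = 20, units_per_packet: int = 1) -> Optional[str]:
    """Return the first (most preferred) configuration whose rate fits under R_TCP."""
    budget_bps = r_tcp_bytes_per_s * 8
    for label, rates in candidates:
        if stream_rate_bps(rates, unit_ms, units_per_packet) <= budget_bps:
            return label
    return None  # nothing fits: the source must reduce its rate further


# Illustrative rates: 4-bit ADPCM at 32 kbit/s, GSM full rate at 13.2 kbit/s.
CANDIDATES: List[Config] = [
    ("ADPCM + ADPCM", [32000, 32000]),
    ("ADPCM + GSM", [32000, 13200]),
    ("GSM + GSM", [13200, 13200]),
    ("GSM", [13200]),
]

print(pick_configuration(CANDIDATES, r_tcp_bytes_per_s=8750))  # 70 kbit/s budget: ADPCM + GSM
```

With 20 ms units sent one per packet, headers alone cost 16 kbit/s; placing more units in each packet (what `tool.rat.rate` controls) reduces this overhead at the cost of extra delay.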
3 The Robust Audio Tool
3.1 Software presentation
The Robust Audio Tool (RAT) [7] is an open-source audio conferencing and streaming application
that allows users to participate in audio conferences over the Internet. These conferences can be
held directly between two participants or among a group of participants on a multicast group.
This software is based on IETF standards: it uses RTP over UDP/IP as its transport protocol,
following the RTP profile for audio and video conferences with minimal control [8], [9]. RAT
features a range of codecs of different rates and qualities, receiver-based loss concealment to
mask packet losses, and sender-based channel coding in the form of redundant audio transmission.
3.2 Some useful concepts about RAT
3.2.1 The Message bus
RAT comprises three separate processes: the controller, the media engine and the user interface.
Communication between them is provided by the Message bus (Mbus), a mechanism intended to
coordinate the components of multimedia conferencing systems.
The Message bus was proposed by Colin Perkins and Jörg Ott [10]. It addresses the typical situation
in which separate tools provide audio, video and shared workspace functionality. This architecture
maps well onto the underlying RTP media streams, which are also transmitted separately. Given such
an architecture, it is useful to be able to coordinate the separate media tools. For example, it
may be desirable to communicate playout-point information between the audio and video tools.

A further refinement of this architecture relies on a number of media engines and one (or more)
user interface agents, which control the media engines and provide a unified conferencing
experience. This approach allows a flexible user interface, but requires some means by which the
user interface agent can communicate with the media engines. The Message bus, as implemented in
RAT, provides such a communication channel.
A message consists of a header and a body. The header notably indicates the source and destination
addresses; the body contains the message to be delivered to the application. The message is a text
string, and at the destination a function maps it onto a C function call (unmarshalling).
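The unmarshalling step can be pictured as a table mapping command names to handler functions. The sketch below is a hypothetical, simplified model of that idea, not RAT's code: it parses a body of the form `name(arg, ...)`, loosely following the notation used in RAT's command list rather than the exact Mbus wire syntax, and calls the registered handler, much as RAT maps the string onto a C function call.

```python
from typing import Callable, Dict, List, Tuple

Handler = Callable[[List[str]], str]


def parse_command(message: str) -> Tuple[str, List[str]]:
    """Split 'name(a, b)' into ('name', ['a', 'b']); hypothetical simplified syntax."""
    name, _, rest = message.strip().partition("(")
    if not rest.endswith(")"):
        raise ValueError(f"malformed command: {message!r}")
    body = rest[:-1].strip()
    args = [a.strip() for a in body.split(",")] if body else []
    return name.strip(), args


class CommandDispatcher:
    """Maps command names to handlers, in the spirit of string-to-function unmarshalling."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, name: str, handler: Handler) -> None:
        self._handlers[name] = handler

    def dispatch(self, message: str) -> str:
        name, args = parse_command(message)
        handler = self._handlers.get(name)
        if handler is None:
            raise KeyError(f"no handler for {name}")
        return handler(args)


dispatcher = CommandDispatcher()
dispatcher.register("tool.rat.codec", lambda args: f"primary codec set to {args[0]}")
print(dispatcher.dispatch("tool.rat.codec(GSM)"))  # primary codec set to GSM
```

Keeping the mapping in one table means a new command only requires registering a handler, which mirrors how a user interface can drive a media engine purely through messages.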
Messages are transmitted as UDP datagrams, an unreliable delivery mechanism. Since some messages may
need to be delivered reliably, the message bus supports this through retransmission of lost
messages. However, most messages are sent unreliably, since they are not critical for the
application and have a low probability of being lost.
Some useful Mbus commands of RAT are listed below. The first ones are general; the following ones
are specific to RAT.
Command name                Use
mbus.hello()                Sent as a heartbeat message every few seconds to indicate liveness
mbus.quit()                 Sent to indicate that the receiving entity should quit
rtp.cname(.)                Indicates that the receiver should use the specified cname during this session
More specific commands
tool.rat.codec(.)           Specifies the primary codec being used by this source
audio.channel.coding(.)     Specifies the secondary codec, and its offset relative to the primary
tool.rat.rate(.)            Sets the number of "units" (codec frames, typically) placed in each packet
                            when transmitting, assuming a unit is t ms long
tool.rat.playout.max(.)     Sets the maximum playout delay allowed
tool.rat.lecture.mode(.)    Enables/disables the lecture mode
These messages provide powerful, high-level access to the program. Most of the implementation was
done using these commands.³
3.2.2 The channel coder
FEC comes with a specific vocabulary, which we now explain. Recall first that any audio unit can be
encoded with various codecs and then put in packets. To avoid vague descriptions of this process,
let us define the following terms:
A codec is defined as an encoder-decoder performing one type of compression/decompression on the
audio stream. The encoder part takes a buffer of the audio stream as input, encodes it and outputs
a playout buffer of media units. Typically, this buffer is 20, 40 or 60 ms long.
A black box then prepares the channel units, which form the payload of the RTP packets. In RAT there
is a generic framework for this, the channel coder. It performs four different kinds of channel
coding, three of which, called redundancy, interleaving and layered, provide protection against
loss. The first mode uses the scheme described above (SFEC). The second one reorders audio units
before transmission, so that originally adjacent units are separated by a guaranteed distance in
the transmitted stream; this disperses the effect of packet losses. The third one uses special
codecs that perform layered coding. There is also a simple mode⁴ that sends exactly one copy of
each media unit. Channel units are thus composed of media units in a way that depends on the
selected channel coder. In our work, FEC is added in the form of SFEC, using the redundant channel
coder.
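The reordering performed by the interleaving coder can be illustrated with a simple stride interleaver. This is a minimal sketch with hypothetical names, not RAT's implementation: within a block, units are sent in the order 0, d, 2d, ..., then 1, 1 + d, ..., so a burst of consecutive packet losses becomes a set of isolated unit losses, which receiver-based concealment handles more easily. The price is extra delay, since the receiver must wait for a whole block before playing it out.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def interleave(units: Sequence[T], depth: int) -> List[T]:
    """Send every depth-th unit first, so originally adjacent units end up
    len(units) // depth positions apart in the transmitted stream."""
    if depth <= 0 or len(units) % depth != 0:
        raise ValueError("block length must be a positive multiple of depth")
    return [u for start in range(depth) for u in units[start::depth]]


def deinterleave(sent: Sequence[T], depth: int) -> List[Optional[T]]:
    """Invert interleave(): restore the original unit order."""
    span = len(sent) // depth
    restored: List[Optional[T]] = [None] * len(sent)
    for pos, unit in enumerate(sent):
        start, k = divmod(pos, span)  # which stride, and index within that stride
        restored[start + k * depth] = unit
    return restored


sent = interleave(list(range(12)), depth=3)
print(sent)               # [0, 3, 6, 9, 1, 4, 7, 10, 2, 5, 8, 11]
print(sorted(sent[4:6]))  # losing 2 consecutive packets costs units [1, 4]
```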
³ Interested programmers may find it useful to check the files README.mbus, mbus.c, mbus_engine.c,
mbus_ui and mbus_control.c.
⁴ Called "vanilla" in RAT's context.


References

RTP, the real-time transport protocol: end-to-end network transport functions for applications
transmitting real-time data over multicast or unicast network services, augmented by a control
protocol (RTCP). 01 Jul 2003.

Lixia Zhang, Stephen Deering, Deborah Estrin, Scott Shenker, et al. RSVP, a flexible and scalable
receiver-oriented simplex resource reservation protocol. Journal article.

Colin Perkins, Orion Hodson, Vicky Hardman. A survey of packet loss recovery techniques for
streaming audio applications operating using IP multicast. Journal article.

"RTP/AVP", a profile for the use of RTP version 2 and RTCP within audio and video multiparticipant
conferences with minimal control, defining default mappings from payload type numbers to encodings.
01 Jul 2003.

Jean-Chrysostome Bolot, S. Fosse-Parisis, et al. FEC-based error control for Internet telephony that
optimizes perceived audio quality under rate control and playout delay constraints. Proceedings
article, 21 Mar 1999.

