Voice over IP performance monitoring

Robert G. Cole and J. H. Rosenbluth
Vol. 31, Iss. 2, pp. 9-24

VOICE OVER IP PERFORMANCE MONITORING
COLE, R. G. AND ROSENBLUTH, J. H.
AT&T LABORATORIES
MIDDLETOWN, NJ
{rgcole, jrosenbluth}@att.com
ABSTRACT
We describe a method for monitoring Voice over IP (VoIP) applications based upon a reduction of the ITU-T's E-Model to transport-level, measurable quantities. In the process, 1) we identify the relevant transport-level quantities, 2) we discuss the tradeoffs between placing the monitors within the VoIP gateways versus placing them within the transport path, and 3) we identify several areas where further work and consensus within the industry are required. We find that the relevant transport-level quantities are the delay, the network packet loss and the decoder's de-jitter buffer packet loss. We also find that an in-path monitor requires the definition of a reference de-jitter buffer implementation in order to estimate voice quality from observed transport measurements. Finally, we suggest that more studies are required that evaluate the quality of various VoIP codecs in the presence of representative packet loss patterns.
1 INTRODUCTION
There is great interest in supporting voice applications over both the public Internet and private intranets, i.e., Voice over IP (VoIP). Popular Internet implementations include the Video Audio Tool (VAT) [1] and the Robust Audio Tool (RAT) [2], as well as a host of ITU-T H.323 implementations. An important aspect of VoIP is a performance monitoring capability that tracks the quality of the voice transport. In this paper, we discuss one approach to monitoring the performance of conversational voice applications over Internet transport. Specifically, we investigate the use of the ITU-T's E-Model [3] as a tool to relate several transport-level metrics to an estimate of conversational voice quality. To accomplish this, we reduce the existing E-Model to transport-level metrics for the purpose of monitoring conversational voice quality. In the process, we discuss the advantages and shortcomings of our approach and identify a set of issues that we believe need to be addressed in the open literature.
The ITU-T's E-Model is a network planning tool
used in the design of hybrid circuit-switched and
packet-switched networks for carrying high quality
voice applications. The tool estimates the relative
impairments to voice quality when comparing
different network equipment and network designs.
The tool provides a means to estimate the subjective
Mean Opinion Score (MOS) rating of voice quality
over these planned network environments. We
describe the E-Model in more detail in Section 3
below.
The specific method we advocate is to:
- Measure the low-level transport metrics that characterize the channel and impact voice performance, i.e., delay, delay variation and packet loss,
- Combine the packet loss and delay variation measurements, the de-jitter buffer operation, the packet size and the coder frame size into an error mask (the exact sequence of good and bad coder frames) that can be characterized in a simple manner (e.g., average frame loss rate along with some measure of burstiness),
- Combine the characterized error mask with the coder and its frame-loss concealment algorithm via a look-up table (or curve fit) based on
ACM SIGCOMM 9 Computer Communication Review

subjective testing to produce an E-Model equipment impairment factor (Ief), and
- Combine the Ief with the other measurable E-Model elements, i.e., delay and echo, to produce a predicted opinion score for the quality of the voice conversation.
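The third step above maps a characterized error mask to Ief through a table derived from subjective testing. It might be sketched as a simple linear interpolation. This is a minimal sketch under stated assumptions. The helper name `ief_from_table` is hypothetical. The sketch uses only the average frame loss rate, whereas a real table would also be indexed by a burstiness measure. The numbers in the usage line are placeholders, not measured Ief values for any codec.

```python
from bisect import bisect_left
from typing import List, Tuple


def ief_from_table(loss_pct: float, table: List[Tuple[float, float]]) -> float:
    """Hypothetical helper: linearly interpolate an equipment impairment
    factor (Ief) from (frame loss %, Ief) points for one codec and its
    loss concealment algorithm.

    The table must be sorted by strictly increasing loss and should come
    from subjective testing. Burstiness is ignored in this sketch."""
    losses = [p for p, _ in table]
    # Outside the tested range, clamp to the nearest point rather than extrapolate.
    if loss_pct <= losses[0]:
        return table[0][1]
    if loss_pct >= losses[-1]:
        return table[-1][1]
    i = bisect_left(losses, loss_pct)  # first point with loss >= loss_pct
    (x0, y0), (x1, y1) = table[i - 1], table[i]
    return y0 + (y1 - y0) * (loss_pct - x0) / (x1 - x0)


# Placeholder points for illustration only; NOT measured Ief values.
PLACEHOLDER_TABLE = [(0.0, 0.0), (1.0, 10.0), (5.0, 30.0)]
print(ief_from_table(3.0, PLACEHOLDER_TABLE))  # 20.0
```

Clamping at the ends is a deliberate choice. As the paper notes for the E-Model in general, conditions outside those scored by subjective panels cannot be predicted reliably.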
We illustrate this measurement and data reduction methodology in Figure 1 below. In this figure, we capture the channel characteristics via a set of transport-level measurements, e.g., packet loss and delay distributions. We combine these with architectural characteristics of the VoIP gateways, specifically the de-jitter buffer implementation, the transport packet size and the codec frame size.
[Figure 1: A measurement and data reduction methodology for VoIP quality monitoring, which highlights the equipment impairment factor elements. Elements shown: channel characteristics (packet loss distribution, delay jitter distribution); architecture choices (de-jitter buffer, packet size, codec frame size); the resulting frame erasure distribution (error mask); the codec and its loss concealment algorithm; the equipment impairment factor; and opinion.]
The result of combining the channel characterization and the architectural characterization of the gateways is a Frame Erasure Distribution (or Error Mask). The Error Mask characterizes the salient features of the loss distribution as observed by the decoder. (Note: this loss distribution captures both the transport packet loss and the loss in the decoder's de-jitter buffer¹ due to late packet arrivals.) When the Error Mask is combined with the specific loss concealment algorithm implemented within the codec, we obtain an Equipment Impairment Factor, which captures the expected impairment of the codec under the above conditions. From this, the E-Model provides a means to estimate a quality score for the conversational voice application. We discuss the methodology in detail in Section 4 below.

¹ The decoder must intentionally delay the variably delayed arriving voice packets in its de-jitter buffer in order to reconstruct a synchronous bit stream. In some cases this de-jitter buffer delay is not large enough to absorb the transport delay variation, which results in de-jitter buffer losses as observed by the decoder.
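The error-mask construction described above can be sketched as follows. This is a minimal sketch under assumptions that are not taken from the paper:

- A hypothetical reference de-jitter buffer with a fixed playout deadline (`playout_delay_ms`) is applied to each packet's one-way delay.
- Each packet carries a fixed number of coder frames.
- Burstiness is measured as the mean loss-burst length.

Real de-jitter buffers are often adaptive. This is exactly why an in-path monitor would need an agreed reference implementation to emulate.

```python
from typing import List, Optional, Tuple


def error_mask(delays_ms: List[Optional[float]], playout_delay_ms: float,
               frames_per_packet: int = 1) -> List[bool]:
    """Build an error mask with one entry per coder frame
    (True = frame unavailable to the decoder).

    delays_ms holds one entry per packet in sequence-number order: the
    packet's one-way delay, or None if it was lost in the network.
    playout_delay_ms models a hypothetical fixed-deadline reference
    de-jitter buffer: packets arriving later than this are discarded."""
    mask: List[bool] = []
    for d in delays_ms:
        lost = d is None or d > playout_delay_ms  # network loss or late arrival
        mask.extend([lost] * frames_per_packet)   # one packet carries several frames
    return mask


def summarize(mask: List[bool]) -> Tuple[float, float]:
    """Return (average frame loss rate, mean loss-burst length in frames)."""
    if not mask:
        return 0.0, 0.0
    bursts: List[int] = []
    run = 0
    for lost in mask:
        if lost:
            run += 1
        elif run:
            bursts.append(run)
            run = 0
    if run:  # a burst that runs to the end of the mask
        bursts.append(run)
    rate = sum(mask) / len(mask)
    mean_burst = sum(bursts) / len(bursts) if bursts else 0.0
    return rate, mean_burst


# Packet 1 is lost in the network; packet 3 (130 ms) misses the 100 ms deadline.
mask = error_mask([40, None, 45, 130, 50, 42], playout_delay_ms=100,
                  frames_per_packet=2)
print(summarize(mask))
```

In this example, 4 of the 12 frames are erased in two bursts of 2 frames each. The summary is therefore a frame loss rate of 1/3 and a mean burst length of 2. A monitor placed inside the gateway could read the actual buffer's discards instead of emulating them.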
Following this methodology, it is possible to build a relatively simple VoIP performance monitoring capability. A further goal of this paper is to help foster an industry consensus around the specific methodology to follow in developing such a capability. Only then would it be possible to obtain consistent quality estimates from this type of VoIP quality monitor.
The remainder of this paper is organized as follows. In Section 2, we discuss the relationship of the E-Model approach we are advocating to other methods of monitoring VoIP performance. In Section 3, we present an overview of the ITU-T's E-Model. Section 4 covers our efforts to reduce the E-Model's formulae to transport-level, directly measurable quantities in an unambiguous fashion. Section 5 discusses several issues with this reduction and, in particular, the issue of identifying a 'reference model' for performance monitoring. We follow this with a discussion of measurement methodologies and report on some field work we have performed with this approach. We finish the paper with our conclusions.
2 RELATIONSHIP TO OTHER METHODS
We know of a few commercial products that implement monitoring approaches similar to the one we advocate in this paper. The approach we advocate is also not the only approach to monitoring the quality of VoIP applications. Other approaches range from "objective models of quality", which involve injecting sample speech segments across the network, to simple packet-level measurements. In this section we discuss these alternatives.
We have come across several commercial products that monitor VoIP quality in a fashion similar to the approach we advocate. These include the Cisco voice dial control MIB [4] and Telchemy's monitoring software [5]. The available descriptions of these products refer to the E-Model in explaining their approach. However, both products appear to rely on information extracted from within the voice gateways, and as such they must be implemented within the VoIP gateways themselves. In this paper,
we discuss a generalized approach, which is
independent of the implementation location of the
monitors. As such, we attempt to highlight the
relative advantages of each approach.
An alternative method, commonly in use, requires
the injection of sample speech segments across a voice
transport path, i.e., the coder-to-network-to-decoder
path. This method is often referred to as "objective
models of quality". The models compare the output
speech to the input speech, using psycho-acoustic
fundamentals, to produce opinion without reference
to the underlying channel conditions. However, it is
still necessary to overlay the low-level transport
measurements of delay and echo on top of this, using
the E-Model, in order to capture the conversational
speech impairments due to delay. The ITU-T has
standardized one such model [6] and continues to
investigate improved models [7].
The advantages of objective models of quality are:
1) no knowledge of, or assumptions about, the
underlying network is required (coder, de-jitter buffer,
error mask, packet size), and 2) predicted opinion is
based on fundamental psycho-acoustics rather than an
interpolation of subjective testing results as with the
E-Model, and thus the results may be more accurate
and more robust. With regard to accuracy, the E-
Model was intended to be used as a network planning
tool, not a network maintenance tool. As such, it only
needs to be accurate enough to distinguish between
broad ranges of quality (see Table 1). On the other
hand, objective models of quality can often distinguish
between quality levels within a broad range. With
regard to robustness, the E-Model cannot predict
opinion for conditions that have not been previously
scored by subjective panels. On the other hand,
objective models of quality can predict opinion for
such conditions, although their accuracy in such cases
is not always good.
The disadvantages of objective models are: 1) they
are complex and costly, 2) there are some conditions
for which they are known not to be accurate (e.g.,
temporal clipping), 3) they are intrusive whereas the E-
Model can be implemented as either intrusive or non-
intrusive, and 4) they reveal nothing about the
underlying cause of quality problems. Because we
make low-level measurements with the E-Model, we
have causality information.
Another method is to rely on direct packet level
measurements or straightforward combinations of
packet level measurements. Thresholds are then
defined as to when the quality of voice conversations
would degrade to a critical point. The advantage of
this approach is that it is relatively simple to
implement. Its disadvantage is that the thresholds it
relies upon are somewhat arbitrarily chosen. Further,
this approach does not attempt to combine the
transport metrics in a meaningful way with respect to
voice quality.
3 THE E-MODEL: AN END-TO-END VOICE QUALITY MODEL
The E-Model, defined in the ITU-T Rec. G.107
[3] as well as other associated ITU-T
recommendations [8], is an analytic model of voice
quality used for network planning purposes.
Specifically, the E-Model presents a method for
estimating the relative voice quality when comparing
two reference connections [3]. According to [3], the E-Model has proven useful as a transmission planning tool; however, further studies are underway to address the assumptions of the E-Model under specific parameter combinations. For a fuller discussion of the validity of the E-Model, refer to [3].
A basic result of the E-Model is the calculation of the R-factor, a simple measure of voice quality ranging from a best case of 100 to a worst case of 0. The R-factor uniquely determines the familiar Mean Opinion Score (MOS), which is the arithmetic average of opinion when "excellent" quality is given a score of 5, "good" a 4, "fair" a 3, "poor" a 2, and "bad" a 1. The R-factor is defined in terms of several parameters associated with a voice channel across a mixed Switched Circuit Network (SCN) and Packet Switched Network (PSN). The parameters included in the computation of the R-factor are fairly extensive, covering such factors as echo, background noise, signal loss, codec impairments, and others. An excellent discussion of the E-Model is found in [9].
The R-factor is related to the MOS through the following set of expressions:

$$
\mathrm{MOS} =
\begin{cases}
1, & R < 0 \\
1 + 0.035\,R + 7 \times 10^{-6}\,R\,(R - 60)\,(100 - R), & 0 \le R \le 100 \\
4.5, & R > 100
\end{cases}
\qquad \text{(Equation 1)}
$$
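Equation 1 translates directly into code. The following is a minimal Python sketch; the function name `mos_from_r` is ours, not taken from G.107.

```python
def mos_from_r(r: float) -> float:
    """Estimated MOS for an E-Model R-factor, following Equation 1."""
    if r < 0:
        return 1.0
    if r > 100:
        return 4.5
    # Cubic mapping used for 0 <= R <= 100.
    return 1 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r)
```

For example, R = 70 gives 1 + 2.45 + 0.147 = 3.597. This agrees with the 3.60 listed at the Medium/Low boundary of Table 1. The cubic term vanishes at R = 0, 60 and 100, so the three branches join continuously.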
For reference, Eq.(1) is plotted in Figure 2.
[Figure 2: A plot showing the relationship between the R factor (horizontal axis, 0 to 100) and the MOS (see Eq. (1)).]
Typically, the values of the R-factor are categorized as
shown in Table 1 below. Here we see that
connections with R-factors of less than 60 are
expected to provide a 'poor' quality of service to users.
Table 1: R-factor quality ratings and the associated MOS.

R-factor        Quality of voice rating   MOS
90 < R < 100    Best                      4.34 - 4.5
80 < R < 90     High                      4.03 - 4.34
70 < R < 80     Medium                    3.60 - 4.03
60 < R < 70     Low                       3.10 - 3.60
50 < R < 60     Poor                      2.58 - 3.10
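A monitor reporting against Table 1 needs a rule for values that fall exactly on a band boundary, and the table leaves this open. The sketch below assumes that a boundary value belongs to the higher band. It returns `None` below R = 50, which the table does not cover. The name `quality_rating` is hypothetical.

```python
from typing import Optional

# Lower bound of each Table 1 band, highest first.
_BANDS = [(90, "Best"), (80, "High"), (70, "Medium"), (60, "Low"), (50, "Poor")]


def quality_rating(r: float) -> Optional[str]:
    """Table 1 rating for an R-factor; boundary values go to the higher
    band (an assumption), and R < 50 is outside the table (None)."""
    for lower, label in _BANDS:
        if r >= lower:
            return label
    return None
```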
The R-factor is computed from four terms:

$$
R = 100 - I_s - I_d - I_{ef} + A \qquad \text{(Equation 2)}
$$

where Is represents the signal-to-noise impairments associated with typical SCN paths, Id is the impairment associated with the mouth-to-ear delay of the path, Ief is an equipment impairment factor associated with the losses within the gateway codecs, and A is the Expectation Factor.
An interesting aspect of the E-Model is that these terms, i.e., Is, Id and Ief, are additive and, further, that the delay and packet loss contributions are isolated into Id and Ief, respectively. This does not imply that delay and packet loss are uncorrelated in the underlying transport media, only that their contributions to the estimated impairments are separable.
The Expectation Factor covers those intangible quantities that are difficult (or impossible) to quantify. This term accounts for lowered customer expectations of quality because of, e.g., a cell phone user's tendency to tolerate lower quality in exchange for the convenience afforded by mobility, or in exchange for a lower price. For the most part it is difficult to estimate the Expectation Factor, although there appears to be some agreement that an Expectation Factor of around 10 is appropriate for a cellular network [10]. However, no such agreement has been reached for the case of lower prices, as expected with some VoIP services. For this reason, we drop the Expectation Factor from our subsequent discussion of the R-factor.
The signal-to-noise impairment factor, Is, is a function of several parameters, none of which depends on the underlying packet transport. However, ITU-T Rec. G.107 [3] recommends a set of default values for these parameters for planning purposes. Because this is not the focus of our discussion, and depends upon the method used to access the VoIP network, we rely upon the default recommendations for all but a few parameters, e.g., the delay and packet loss parameters. For example, it is sufficient for our purposes to assume that echo cancelers are present and working properly (no echo). Choosing these default values, we can reduce the expression for the R-factor [3] to:
$$
R = 94.2 - I_d - I_{ef} \qquad \text{(Equation 3)}
$$
Not only have we chosen the default values for the
various SCN signal impairments, but we have also
dropped reference to the Expectation Factor.
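With A = 0, equating Equation 2 and Equation 3 shows how much the chosen defaults contribute. This is simple arithmetic on the two equations as written here, not a value quoted from G.107:

$$
100 - I_s - I_d - I_{ef} = 94.2 - I_d - I_{ef} \quad\Longrightarrow\quad I_s = 100 - 94.2 = 5.8
$$

Under these defaults, then, the constant 94.2 already absorbs the SCN signal-to-noise impairment. Only Id and Ief remain to be estimated from transport measurements.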
The delay components within the function Id are:
1) Ta, the average, absolute one-way mouth-to-ear delay,
2) T, the average one-way delay from the receive side to the point in the end-to-end path where signal coupling occurs as a source of echo, and
3) Tr, the average round-trip delay in the four-wire loop.
Note that Ta, T and Tr represent various measures of delay from different points within a general reference
connection. G.107 gives a fully analytical expression for the function Id in terms of Ta, T, Tr and parameters associated with a general reference connection describing various circuit-switched and packet-switched interworking scenarios.
Table 2: Values of the delay impairment for selected one-way delay values. (Note: The one-way delay is defined as mouth-to-ear delay.)

One-way delay (msec)   Id
25                     0.9
50                     1.5
75                     2.1
100                    2.6
125                    3.1
150                    3.7
175                    5.0
200                    7.4
225                    10.6
250                    14.1
275                    17.4
300                    20.6
325                    23.5
350                    26.2
375                    28.7
400                    31.0
Since the focus of this paper is on the development of an IP-based monitoring system, we choose to simplify the expression for Id in three ways (and hence focus our discussion on IP-based transport and VoIP gateway issues). First, for the cases we are interested in, i.e., VoIP with no circuit-switched network interworking, the various measurement points for the delay measures collapse into a single pair of points, such that

$$
d = T_a = T = T_r / 2 \qquad \text{(Equation 4)}
$$

and Id(d) is now a function only of the single delay measurement d. Second, we use the default values listed in [3] for all terms in the Id expression other than Ta, T and Tr. Third, we plot the delay component and then fit the resulting curve to a simple expression for discussion purposes. The full expression for Id, assuming only the default values listed in G.107, could be used for our purposes instead of the simplified expression derived below, but we find the simplified expression much more convenient for discussion and modeling. Table 2 above gives the values of the delay component for selected values of the one-way delay [11].
In Figure 3, we plot these values and find that Id has two roughly linear regions, with a knee in the curve at a delay of 177 msec. For one-way delays less than 177 msec, conversations occur naturally, whereas at delays in excess of 177 msec conversations begin to strain and break down, often degenerating into simplex communications at high delay values.
[Figure 3: A plot of Id as a function of one-way delay (ms), along with a simple fit. Series: delay component; estimate.]
Also on this plot, we fit the values of Id to the expression:

$$
I_d = 0.024\,d + 0.11\,(d - 177.3)\,H(d - 177.3) \qquad \text{(Equation 5)}
$$

Here d is the one-way delay (in milliseconds) and H(x) is the Heaviside (or step) function:

$$
H(x) =
\begin{cases}
0, & x < 0 \\
1, & x \ge 0
\end{cases}
\qquad \text{(Equation 6)}
$$
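The two-segment fit of Equations 5 and 6 is easy to check against Table 2. This is a minimal sketch; the names `heaviside` and `id_fit` are ours.

```python
def heaviside(x: float) -> float:
    """Step function of Equation 6: 0 for x < 0, 1 for x >= 0."""
    return 1.0 if x >= 0 else 0.0


def id_fit(d_ms: float) -> float:
    """Delay impairment Id from the two-segment fit of Equation 5."""
    return 0.024 * d_ms + 0.11 * (d_ms - 177.3) * heaviside(d_ms - 177.3)


# A few (one-way delay in ms, Id) points copied from Table 2 for comparison.
TABLE_2_SAMPLES = [(100, 2.6), (150, 3.7), (200, 7.4), (300, 20.6), (400, 31.0)]

if __name__ == "__main__":
    for d, id_table in TABLE_2_SAMPLES:
        print(f"d={d:3d} ms  table={id_table:5.1f}  fit={id_fit(d):5.1f}")
```

The fit gives roughly 2.4, 3.6, 7.3 and 20.7 at 100, 150, 200 and 300 ms, which is close to Table 2. At 400 ms it gives about 34.1 against the tabulated 31.0. This is adequate for discussion purposes. Where accuracy at very high delays matters, the full G.107 expression remains the reference.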
We can now express the R-factor in the form:

$$
R \approx 94.2 - 0.024\,d - 0.11\,(d - 177.3)\,H(d - 177.3) - I_{ef} \qquad \text{(Equation 7)}
$$
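Equation 7 together with Equation 1 gives an end-to-end estimate from a measured one-way delay and an Ief value. This is a minimal sketch with hypothetical function names. The Ief of 10 used below is an arbitrary illustration, not a value for any particular codec.

```python
def id_fit(d_ms: float) -> float:
    """Delay impairment from Equation 5; the step is H(d - 177.3)."""
    step = 1.0 if d_ms >= 177.3 else 0.0
    return 0.024 * d_ms + 0.11 * (d_ms - 177.3) * step


def r_factor(d_ms: float, ief: float) -> float:
    """Estimated R-factor per Equation 7: R = 94.2 - Id(d) - Ief."""
    return 94.2 - id_fit(d_ms) - ief


def mos_from_r(r: float) -> float:
    """Estimated MOS per Equation 1."""
    if r < 0:
        return 1.0
    if r > 100:
        return 4.5
    return 1 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r)


# Illustrative only: 100 ms one-way delay and a hypothetical Ief of 10.
r = r_factor(100, ief=10)
print(round(r, 1), round(mos_from_r(r), 2))
```

With these inputs, R is 94.2 - 2.4 - 10 = 81.8 (a MOS of about 4.09). Raising the delay to 250 ms, past the knee, drops R to about 70.2, which shows how quickly delay beyond 177 msec consumes the quality budget.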
All that remains is to find suitable estimates for the equipment impairment factors. Currently, no analytic expressions exist for the equipment
References
- Adaptive playout mechanisms for packetized audio applications in wide-area networks (proceedings article)
- Characterizing End-to-End Packet Delay and Loss in the Internet (journal article)
- Packet audio playout delay adjustment: performance bounds and algorithms (journal article)
- Successful multiparty audio communication over the Internet (journal article)
- Overcoming workstation scheduling problems in a real-time audio tool (proceedings article)