
Copyright © 2005 IEEE. Reprinted from
IEEE/ACM Transactions on Networking, 2005; 13(5):947-960

IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 13, NO. 5, OCTOBER 2005 947
Estimating Point-to-Point and Point-to-Multipoint
Traffic Matrices: An Information-Theoretic Approach
Yin Zhang, Member, IEEE, Matthew Roughan, Member, IEEE, Carsten Lund, and David L. Donoho, Member, IEEE
Abstract—Traffic matrices are required inputs for many IP net-
work management tasks, such as capacity planning, traffic engi-
neering, and network reliability analysis. However, it is difficult to
measure these matrices directly in large operational IP networks,
so there has been recent interest in inferring traffic matrices from
link measurements and other more easily measured data. Typi-
cally, this inference problem is ill-posed, as it involves significantly
more unknowns than data. Experience in many scientific and en-
gineering fields has shown that it is essential to approach such ill-
posed problems via “regularization.” This paper presents a new
approach to traffic matrix estimation using a regularization based
on “entropy penalization.” Our solution chooses the traffic matrix
consistent with the measured data that is information-theoretically
closest to a model in which source/destination pairs are stochasti-
cally independent. It applies to both point-to-point and point-to-
multipoint traffic matrix estimation. We use fast algorithms based
on modern convex optimization theory to solve for our traffic ma-
trices. We evaluate our algorithm with real backbone traffic and
routing data, and demonstrate that it is fast, accurate, robust, and
flexible.
Index Terms—Failure analysis, information theory, minimum
mutual information, point-to-multipoint, point-to-point, regular-
ization, SNMP, traffic engineering, traffic matrix estimation.
I. INTRODUCTION
TRAFFIC matrices, which specify the amount of traffic between origin and destination in a network, are required inputs for many IP network management tasks, such as capacity planning, traffic engineering, and network reliability analysis.
However, it is often difficult to measure these matrices directly
in large operational IP networks. So there has been a surge of
interest in inferring traffic matrices from link load statistics and
other more easily measured data [1]–[5].
Traffic matrices may be estimated or measured at varying
levels of detail [6]: between Points-of-Presence (PoPs) [4],
routers [5], links, or even IP prefixes [7]. The finer grained
traffic matrices are generally more useful, for example, in the
analysis of the reliability of a network under a component
failure. During a failure, IP traffic is rerouted to find the new
Manuscript received March 13, 2004; revised November 11, 2004; approved by IEEE/ACM TRANSACTIONS ON NETWORKING Editor J. Crowcroft. An earlier version of this paper appeared in the Proceedings of the ACM SIGCOMM, August 2003.
Y. Zhang was with AT&T Labs-Research, Florham Park, NJ, 07932 USA.
He is now with the Department of Computer Sciences, University of Texas at
Austin, Austin, TX 78712-0233 USA (e-mail: yzhang@cs.utexas.edu).
M. Roughan is with the School of Mathematical Sciences, University
of Adelaide, Adelaide, SA 5005, Australia (e-mail: matthew.roughan@
adelaide.edu.au).
C. Lund is with AT&T Labs-Research, Florham Park, NJ 07932-0971 USA
(e-mail: lund@research.att.com).
D. Donoho is with the Statistics Department, Stanford University, Stanford,
CA 94305 USA (e-mail: donoho@stat.stanford.edu).
Digital Object Identifier 10.1109/TNET.2005.857115
path through the network, and one wishes to test if this would
cause a link overload anywhere in the network. Failure of a
link within a PoP may cause traffic to reroute via alternate links
within the PoP without changing the inter-PoP routing. Thus, to
understand failure loads on the network we must measure traffic
at a router-to-router level. In general, the inference problem
is more challenging at finer levels of detail, the finest so far
considered being router-to-router.
Estimating traffic matrices from link loads is a nontrivial task. The challenge lies in the ill-posed nature of the problem: for a network with N ingress/egress points we need to estimate the N^2 origin/destination demands. At a PoP level N is in the tens, at a router level N may be in the hundreds, at a link level N may be tens of thousands, and at the prefix level N may be of the order of one hundred thousand. However, the number of pieces of information available, the link measurements, remains approximately constant. One can see the difficulty: for large N the problem becomes massively underconstrained.
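To make the imbalance concrete, a toy calculation (the per-node link count below is a made-up figure, not from the paper) shows how quickly the unknowns outpace the measurements:

```python
# Back-of-envelope illustration: the number of unknown origin/destination
# demands grows as N^2, while the number of link measurements grows only
# roughly linearly with network size.
def problem_size(n_nodes, links_per_node=4):
    """Return (unknowns, measurements) for a network with n_nodes
    ingress/egress points; links_per_node is a hypothetical average."""
    unknowns = n_nodes * n_nodes           # one demand per O/D pair
    measurements = n_nodes * links_per_node
    return unknowns, measurements

for n in (20, 200, 2000):                  # roughly PoP / router / link scale
    u, m = problem_size(n)
    print(f"N={n}: {u} unknowns vs ~{m} link measurements")
```

Even at router granularity the system of equations is short by orders of magnitude, which is why some form of prior information is unavoidable.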
There is extensive experience with ill-posed linear inverse
problems from fields as diverse as seismology, astronomy, and
medical imaging [8]–[12], all leading to the conclusion that
some sort of side information must be brought in, with results
that may be good or bad depending on the quality of this infor-
mation. All of the previous work on IP traffic matrix estimation
has incorporated prior information: for instance, Vardi [1] and
Tebaldi and West [2] assume a Poisson traffic model, Cao
et al.
[3] assume a Gaussian traffic model, Zhang et al. [5] assume
an underlying gravity model, and Medina et al. [4] assume a
logit-choice model. Each method is sensitive to the accuracy of
this prior: for instance, [4] showed that the methods in [1]–[3]
were sensitive to their prior assumptions, while [5] showed that
their method improved if the prior (the so-called gravity model)
was generalized to reflect real routing rules more accurately.
In contrast, this paper starts from a regularization formula-
tion of the problem drawn from the field of ill-posed problems,
and derives a prior distribution that is most appropriate to
this problem. Our prior assumes source/destination indepen-
dence, until proven otherwise by measurements. The method
then blends measurements with prior information, producing
the reconstruction closest to independence, but consistent
with the measured data. The method proceeds by solving an
optimization problem that is understandable and intuitively
appealing. This approach allows a convenient implementation
using modern optimization software, with the result that the
algorithm is very efficient.
An advantage of the approach used in this paper is that it
also provides some insight into alternative algorithms. For in-
stance, the simple gravity model of [5] is equivalent to com-
plete independence of source and destination, while the general-
ized gravity model corresponds to independence conditional on
source and destination link classes. Furthermore, the algorithm
of [5] is a first-order approximation of the algorithm presented
here, explaining the success of that algorithm, and suggesting
that it also can be extended to measure point-to-multipoint demand matrices. Our method opens up further opportunities for extensions, given the better understanding of the importance of prior information about network traffic and how it can be incorporated into the process of finding traffic matrices. For instance, an appealing alternative prior generation procedure is suggested in [4]. Alternatively, the Bayesian method of [2] can be placed into the optimization framework here, with a different penalty function, as could the methods of [1], [3].
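The equivalence between the simple gravity model and complete source/destination independence can be sketched numerically; the traffic totals below are invented for illustration:

```python
import numpy as np

# Sketch of the simple gravity model that the paper relates to complete
# source/destination independence: each demand is estimated as
#   x(s, d) = (traffic out of s) * (traffic into d) / (total traffic).
# The row/column totals here are illustrative numbers, not real data.
row_out = np.array([40.0, 25.0, 35.0])   # traffic entering at each source
col_in = np.array([50.0, 30.0, 20.0])    # traffic exiting at each destination
total = row_out.sum()                     # equals col_in.sum() by conservation

gravity = np.outer(row_out, col_in) / total
# Independence: every row of `gravity` is proportional to col_in,
# i.e., p(d|s) = p(d) for each source s.
print(gravity)
```

Note that the marginals of the resulting matrix reproduce the measured totals exactly, which is the defining property of the gravity estimate.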
Our approach also allows us to estimate both point-to-point traffic matrices and point-to-multipoint demand matrices. Prior work on estimating traffic matrices from link data has concentrated on the point-to-point traffic, i.e., the traffic from a single source to a single destination. While point-to-point traffic matrices are of great practical importance, they are not always enough for applications (as shown in [7]). Under some failures the traffic may actually change its origin and destination; its network entry and exit points. The point-to-point traffic matrix will be altered, because the point-to-point traffic matrix describes the carried load on the network between two points.
In contrast, the demand matrix describes the offered traffic demands on the IP network and is therefore invariant under a much larger class of changes. The demand matrix is inherently point-to-multipoint in the sense that traffic coming into the network from a customer may often depart the network via multiple egress points in order to reach its final destination. To understand this, consider a packet entering a backbone ISP through a customer link, destined for another backbone ISP's customer. Large North-American backbone providers typically are connected at multiple peering points. Our packet could reach its final destination through any of these peering links; the actual decision is made through a combination of Border Gateway Protocol (BGP) and Interior Gateway Protocol (IGP) routing protocols. If the normal exit link fails, then the routing protocols would choose a different exit point. In a more complicated scenario, the recipient of the packet might be multi-homed, that is, connected to more than one ISP. In this case the packet may exit the first ISP through multiple sets of peering links. Finally, even single-homed customers may sometimes be reached through multiple inter-AS (Autonomous System) paths.
We test the estimation algorithm extensively on network traffic and topology data from an operational backbone ISP (AT&T's North American IP network). The results show that the algorithm is fast, and accurate for point-to-point traffic matrix estimation. We also test the algorithm on topologies generated through the Rocketfuel project [13]–[15] to resemble alternative ISPs, providing useful insight into where the algorithm will work well. One interesting side result is that there is a relationship between the network traffic and topology that is beneficial in this estimation problem. We also test the sensitivity of the algorithm to measurement errors, demonstrating that the algorithm is highly robust to errors and missing data in the traffic measurements.
We further examine some alternative measurement strategies that could benefit our estimates. We examine two possibilities: the first (suggested in [4]) is to make direct measurements of some rows of the traffic matrix; the second is to measure local traffic matrices as suggested in [16]. Both result in improvements in accuracy; however, we found, in contrast to [4], that the order in which rows of the traffic matrix are included does matter: adding rows in order of the largest row sum first is better than random ordering.
Finally, the results of our evaluation of the algorithm for point-to-multipoint demand matrices are interesting in that these estimates are less accurate than the corresponding point-to-point results, for the very good reason that this estimation problem contains more ambiguity. However, we also show in this paper that the results are far more accurate (than point-to-point results) when used in real applications such as link failure analysis. In fact, the point-to-multipoint estimates produce astoundingly accurate link failure estimates. Likewise, in [17], we have also demonstrated that the resulting accuracy is well within the bounds required for another operational task, IGP route optimization.
To summarize, this paper demonstrates a specific tool that works well on large scale point-to-point and point-to-multipoint traffic matrix estimation. The results show that it is important to add appropriate prior information. Our prior information is based on independence-until-proven-otherwise, which is plausible, computationally convenient, and results in accurate estimates.
The paper begins in Section II with some background: definitions of terminology and descriptions of the types of data available. Section III describes the regularization approach used here, and our algorithm, followed by Section IV, the evaluation methodology, and Section V, which shows the algorithm's performance on a large set of measurements from an operational tier-1 ISP. Section VI examines the algorithm's robustness to errors in its inputs, and Section VII shows the flexibility of the algorithm to incorporate additional information. Section VIII shows the results for point-to-multipoint estimation, and Section IX demonstrates the utility of the point-to-multipoint results in reliability analysis. We conclude the paper in Section X.
II. BACKGROUND
A. Network
An IP network is made up of routers and adjacencies between
those routers, within a single AS or administrative domain. It is
natural to think of the network as a set of nodes and links, as-
sociated with the routers and adjacencies, as shown in Fig. 1.
We refer to routers and links that are wholly internal to the net-
work as Backbone Routers (BRs) and links, and refer to others
as Edge Routers (ERs) and links.
One could compute traffic matrices with different levels of aggregation at the source and destination end-points, for instance, at the level of PoP to PoP, or router to router, or link to link [6]. In this paper, we are primarily interested in computing router to router traffic matrices, which are appropriate for a number of network and traffic engineering applications, and can be used to construct more highly aggregated traffic matrices (e.g., PoP to PoP) using topology information [6]. We may further specify the traffic matrix to be between BRs, by aggregating up to this level.
In addition, it is helpful for IP networks managed by Internet Service Providers (ISPs) to further classify the edge links. We categorize the edge links into access links, connecting customers, and peering links, which connect other (noncustomer) ASs. A significant fraction of the traffic in an ISP is inter-domain and is exchanged between customers and peer networks. Today traffic to peer networks is largely focused on dedicated peering links, as illustrated in Fig. 1. Under the typical routing policies implemented by large ISPs, very little traffic will transit the backbone from one peer to another. Transit traffic between

Fig. 1. IP network components and terminology.
peers may reflect a temporary step in network consolidation following an ISP merger or acquisition, but should not occur under normal operations.
In large IP networks, distributed routing protocols are used to build the forwarding tables within each router. It is possible to predict the results of these distributed computations from data gathered from router configuration files, or a route monitor such as [18]. In our investigation, we employ a routing simulator such as in [19] that makes use of this routing information to compute a routing matrix (defined in Section III-A). Note that this simulation includes load balancing across multiple shortest paths.
B. Traffic Data
In IP networks today, link load measurements are readily
available via the Simple Network Management Protocol
(SNMP). SNMP is unique in that it is supported by essentially
every device in an IP network. The SNMP data that is available
on a device is defined in an abstract data structure known as a Management Information Base (MIB). An SNMP poller periodically requests the appropriate SNMP MIB data from a router (or other device). Since every router maintains a cyclic counter of the number of bytes transmitted and received on each of its interfaces, we can obtain basic traffic statistics for the entire network with little additional infrastructure.
The properties of data gathered via SNMP are important for the implementation of a useful algorithm; SNMP data has many limitations. Data may be lost in transit (SNMP uses unreliable UDP transport; copying to our research archive may also introduce loss). Data may be incorrect (through poor router vendor implementations). The sampling interval is coarse (in our case 5 minutes). Many of the typical problems in SNMP data may be mitigated by using hourly traffic averages (of five-minute data), and we shall use this approach. The problems with the finer time-scale data make time-series approaches to traffic matrix estimation more difficult.
We use flow-level data in this paper for validation purposes. This data is aggregated by IP source and destination address, and port numbers at each router. This level of granularity is sufficient to obtain a real traffic matrix [7], and in the future such measurement may provide direct traffic matrix measurements, but at present limitations in vendor implementations prevent collection of this data from the entire network.
C. Information Theory
Information theory is of course a standard tool in communica-
tions systems [20], but a brief review will set up our terminology.
We begin with basic probabilistic notation: we define p_X(x) to mean the probability that a random variable X is equal to x. We shall typically abuse this notation (where it is clear) and simply write p(x). Suppose that X and Y are independent random variables; then

p(x, y) = p(x) p(y) (1)

i.e., the joint distribution is the product of its marginals. This can be equivalently written using the conditional probability

p(y|x) = p(y). (2)

In this paper we shall typically use the source S and the destination D of a packet (or bit), rather than the standard random variables X and Y. Thus, p(d|s) is the conditional probability of a packet (bit) exiting the network at d, given that it entered at s, and p(d) is the unconditional probability of a packet (bit) going to d.
We can now define the Discrete Shannon Entropy of a discrete random variable X taking values x_i as

H(X) = - sum_i p(x_i) log p(x_i). (3)

The entropy is a measure of the uncertainty about the value of X. For instance, if X = x_1 with certainty, then H(X) = 0, and H(X) takes its maximum value when X is uniformly distributed, when the uncertainty about its value is greatest.
We can also define the conditional entropy of one random variable Y with respect to another X by

H(Y|X) = - sum_j p(x_j) sum_i p(y_i|x_j) log p(y_i|x_j) (4)

where p(y_i|x_j) is the probability that Y = y_i conditional on X = x_j. H(Y|X) can be thought of as the uncertainty remaining about Y given that we know the outcome of X. Notice that the joint entropy of X and Y can be shown to be

H(X, Y) = H(X) + H(Y|X). (5)

We can also define the Shannon information

I(Y; X) = H(Y) - H(Y|X) (6)

which therefore represents the decrease in uncertainty about Y from measurement of X, or the information that we gain about Y from X. The information is symmetric, I(Y; X) = I(X; Y), and so we can refer to this as the mutual information of X and Y, and write it as I(X; Y). Note that I(X; Y) >= 0, with equality if and only if X and Y are independent; when X and Y are independent, X gives us no additional information about Y. The mutual information can be written in a number of ways, but here we write it

I(X; Y) = sum_{i,j} p(x_i, y_j) log [ p(x_i, y_j) / ( p(x_i) p(y_j) ) ] = K( p(x, y) || p(x) p(y) ) (7)

where K(f || g) = sum f log(f/g) is the Kullback–Leibler divergence of f with respect to g, a well-known measure of distance between probability distributions.
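The quantities above are straightforward to compute directly; this short sketch (with made-up distributions) checks the two limiting cases, independence giving I(X;Y) = 0 and perfect dependence giving I(X;Y) = H(X):

```python
import numpy as np

def entropy(p):
    """Discrete Shannon entropy H(X) = -sum p log2 p, with 0 log 0 := 0."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(joint):
    """I(X;Y) = K( p(x,y) || p(x)p(y) ), the Kullback-Leibler divergence
    of the joint distribution from the product of its marginals."""
    px = joint.sum(axis=1)
    py = joint.sum(axis=0)
    prod = np.outer(px, py)
    mask = joint > 0
    return np.sum(joint[mask] * np.log2(joint[mask] / prod[mask]))

# Independent joint distribution: mutual information is zero.
indep = np.outer([0.5, 0.5], [0.25, 0.75])
print(mutual_information(indep))   # 0.0

# Perfectly dependent: I(X;Y) = H(X) = 1 bit.
dep = np.array([[0.5, 0.0], [0.0, 0.5]])
print(mutual_information(dep))     # 1.0
```

These are exactly the two regimes the MMI criterion of Section III interpolates between: the prior assumes the independent case, and link measurements pull the estimate away from it only as far as the data demand.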
Discrete entropy is frequently used in coding because the entropy H(X) gives a measure of the number of bits required to code the values of X. That is, if we had a large number N of randomly-generated instances X_1, X_2, ..., X_N and needed to represent this stream as compactly as possible, we could represent this stream using only N H(X) bits, using entropy coding as practiced, for example, in various standard commercial compression schemes.
Entropy has also been advocated as a tool in the estimation of
probabilities. Simply put, the maximum entropy principle states
that we should estimate an unknown probability distribution

by enumerating all the constraints we know it must obey on physical grounds, and searching for the probability distribution that maximizes the entropy subject to those constraints. It is well known that the probability distributions occurring in many physical situations can be obtained by the maximum entropy principle. Heuristically, if we had no prior information about a random variable X, our uncertainty about X is at its peak, and therefore we should choose a distribution for X which maximizes this uncertainty, or the entropy. In the case where we do have information about the variable, usually in the form of some set of mathematical constraints C, then the principle states that we should maximize the entropy H(X) of X conditional on consistency with these constraints. That is, we choose the solution which maintains the most uncertainty while satisfying the constraints. The principle can also be derived directly from some simple axioms which we wish the solution to obey [21].
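As a worked toy instance of the principle (the support and the mean constraint are hypothetical), the entropy maximizer subject to a mean constraint takes the exponential (Gibbs) form p_i proportional to exp(-lam * v_i), and the multiplier can be found by bisection:

```python
import numpy as np

# Maximum entropy over distributions on values {0, 1, 2} with a
# prescribed mean: the maximizer is a Gibbs distribution
# p_i ∝ exp(-lam * v_i); we locate lam by bisection on the mean.
values = np.array([0.0, 1.0, 2.0])
target_mean = 0.5          # hypothetical constraint

def gibbs(lam):
    w = np.exp(-lam * values)
    return w / w.sum()

lo, hi = 0.0, 50.0          # the mean decreases monotonically as lam grows
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if gibbs(mid) @ values > target_mean:
        lo = mid            # mean still too large: need bigger lam
    else:
        hi = mid

p = gibbs(0.5 * (lo + hi))
print(p, p @ values)        # distribution tilted toward small values, mean ~ 0.5
```

With no constraint beyond normalization (lam = 0) the recipe recovers the uniform distribution, matching the heuristic that entropy is maximized when uncertainty is greatest.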
D. Ill-Posed Linear Inverse Problems
Many scientific and engineering problems can be posed as follows. We observe data y which are thought to follow a system of linear equations

y = A x (8)

where the M by 1 vector y contains the data, and the N by 1 vector x contains unknowns to be estimated. The matrix A is an M by N matrix. In many cases of interest M < N, and so there is no unique solution to the equations. Such problems are called ill-posed linear inverse problems. In addition, frequently the data are noisy, so that it is more accurate to write

y = A x + z. (9)

In that case any reconstruction procedure needs to remain stable under perturbations of the observations. In our case, y are the SNMP link measurements, x is the traffic matrix written as a vector, and A is the routing matrix.
There is extensive experience with ill-posed linear inverse problems from fields as diverse as seismology, astronomy, and medical imaging [8]–[12], all leading to the conclusion that some sort of side information must be brought in, producing a reconstruction which may be good or bad depending on the quality of the prior information. Many such proposals solve the minimization problem
min_x || y - A x ||_2^2 + lambda^2 J(x) (10)

where || . ||_2 denotes the L_2 norm, lambda > 0 is a regularization parameter, and J(x) is a penalization functional. Proposals of this kind have been used in a wide range of fields, with considerable practical and theoretical success when the data matched the assumptions leading to the method, and the regularization functional matched the properties of the estimand. These are generally called strategies for regularization of ill-posed problems (for a more general description of regularization see [22]).
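A minimal numerical sketch of (10), using the simplest penalty J(x) = ||x||^2 (Tikhonov regularization) rather than the entropy penalty developed later in the paper; the matrix, data, and lambda below are arbitrary:

```python
import numpy as np

# Regularized least squares in the sense of (10) with J(x) = ||x||^2.
# Minimizing ||y - Ax||^2 + lam^2 ||x||^2 is equivalent to an ordinary
# least-squares solve on an augmented system. A, y, and lam are made up.
rng = np.random.default_rng(0)
A = rng.random((4, 10))            # underdetermined: 4 equations, 10 unknowns
x_true = rng.random(10)
y = A @ x_true

lam = 0.1
A_aug = np.vstack([A, lam * np.eye(10)])      # stack lam*I under A
y_aug = np.concatenate([y, np.zeros(10)])     # penalize ||x|| toward zero
x_hat, *_ = np.linalg.lstsq(A_aug, y_aug, rcond=None)

print(np.linalg.norm(y - A @ x_hat))   # small residual despite ill-posedness
```

The augmented system always has a unique solution even though A alone does not, which is precisely the stabilizing role the regularization term plays.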
A general approach to deriving such regularization ideas is the Bayesian approach (such as used in [2]), where we model the estimand x as being drawn at random from a so-called prior probability distribution with density pi(x), and the noise z is taken as a Gaussian white noise with variance sigma^2. Then the so-called posterior probability density p(x|y) has its maximum at the solution of

min_x || y - A x ||_2^2 + 2 sigma^2 J(x). (11)

Comparing this with (10) we see the penalized least-squares problems as giving the most likely reconstructions under a given model. Thus, the method of regularization has a Bayesian interpretation, assuming Gaussian noise and assuming pi(x) proportional to exp(-J(x)). We stress that there should be a good match between the regularization functional J(x) and the properties of the estimand, that is, a good choice of prior distribution. The penalization in (10) may be thought of as expressing the fact that reconstructions are very implausible if they have large values of J(x).
Regularization can help us understand approaches such as that of Vardi [1] and Cao et al. [3], which treat this as a maximum likelihood problem where the x(s, d) are independent random variables following a particular model. In these cases they use the model to form a penalty function which measures the distance from the model by considering higher order moments of the distributions.
III. REGULARIZATION OF THE TRAFFIC ESTIMATION PROBLEM USING MINIMUM MUTUAL INFORMATION
The problem of inference of the end-to-end traffic matrix is massively ill-posed because there are so many more routes than links in a network. In this section, we develop a regularization approach using a penalty that seems well-adapted to the structure of actual traffic matrices, and which has some appealing information-theoretic structure. Effectively, among all traffic matrices agreeing with the link measurements, we choose the one that minimizes the mutual information between the source and destination random variables.
Under this criterion, absent any information to the contrary, we assume that the conditional probability p(d|s) that a source s sends traffic to a destination d is the same as p(d), the probability that the network as a whole sends packets or bytes to destination d. There are strong heuristic reasons why the largest-volume links in the network should obey this principle: they are so highly aggregated that they intuitively should behave similarly to the network as a whole.
On the other hand, as evidence accumulates in the link-level statistics, the conditional probabilities are adapted to be consistent with the link-level statistics in such a way as to minimize the mutual information between the source and destination random variables.
This Minimum Mutual Information (MMI) criterion is well-suited to efficient computation. It can be implemented as a convex optimization problem; in effect one simply adds a minimum weighted entropy term to the usual least-squares lack-of-fit criterion. There are several widely-available software packages for solving this optimization problem, even on very large scale problems; some of these packages can take advantage of the sparsity of routing matrices.
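The shape of that convex program can be sketched as follows; this is an illustrative toy solver (plain projected gradient descent, with invented data and prior), not the paper's implementation or a production optimization package:

```python
import numpy as np

# Rough sketch of an entropy-penalized least-squares program of the form
#   min_x ||y - A x||^2 + lam^2 * sum_i x_i log(x_i / g_i),  x >= 0
# where g is a prior (gravity-model-like) traffic vector. The penalty is
# zero at x = g and grows as x departs from the prior; the data term pulls
# x toward consistency with the link measurements. All inputs are made up.
rng = np.random.default_rng(1)
A = rng.random((3, 6))              # toy routing matrix: 3 links, 6 O/D pairs
x_true = rng.random(6) + 0.1
y = A @ x_true                      # simulated link loads
g = np.full(6, x_true.mean())       # crude "independence" prior
lam = 0.1

def objective(x):
    return np.sum((y - A @ x) ** 2) + lam**2 * np.sum(x * np.log(x / g))

x = g.copy()                        # start from the prior
step = 1e-3
for _ in range(2000):
    grad = -2 * A.T @ (y - A @ x) + lam**2 * (np.log(x / g) + 1)
    x = np.clip(x - step * grad, 1e-9, None)   # keep x strictly positive

print(objective(g), "->", objective(x))        # objective decreases
```

Because both the quadratic data term and the relative-entropy penalty are convex, any descent method converges to the global optimum; the dedicated solvers mentioned above simply exploit structure (sparsity, second-order information) to do so far faster.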
A. Traffic-Matrix Estimation
Let x(s, d) denote the traffic volume going from source s to destination d in a unit time. Note that x(s, d) is unknown to us; what can be known is the traffic y_l on link l. Let A_{l,(s,d)} denote the routing matrix, i.e., A_{l,(s,d)} gives the fraction of traffic from s to d which crosses link l (and which is zero if the traffic on this route does not use this link at all). The link-level traffic counts are

y_l = sum_{s,d} A_{l,(s,d)} x(s, d), for l in L (12)

where L is the set of backbone links. We would like to recover the traffic matrix x(s, d) from the link measurements y_l, but this is the same as solving the matrix equation (8), where x is
