
Information-based complexity

Edward W. Packel and J. F. Traub
01 Jul 1987, Vol. 328, Iss. 6125, pp. 29–33
Abstract
Information-based complexity seeks to develop general results about the intrinsic difficulty of solving problems where available information is partial or approximate and to apply these results to specific problems. This allows one to determine what is meant by an optimal algorithm in many practical situations, and offers a variety of interesting and sometimes surprising theoretical results.


BULLETIN (New Series) OF THE
AMERICAN MATHEMATICAL SOCIETY
Volume 26, Number 1, January 1992
PERSPECTIVES ON INFORMATION-BASED COMPLEXITY
J. F. TRAUB AND H. WOZNIAKOWSKI
1. Introduction
Computational complexity studies the intrinsic difficulty of mathematically
posed problems and seeks optimal means for their solutions. This is a rich
and diverse field; for the purpose of this paper we present a greatly simplified
picture.
Computational complexity may be divided into two branches, discrete and
continuous. Discrete computational complexity studies problems such as graph
theoretic, routing, and discrete optimization; see, for example, Garey and John-
son [79]. Continuous computational complexity studies problems such as ordi-
nary and partial differential equations, multivariate integration, matrix multi-
plication, and systems of polynomial equations. Discrete computational com-
plexity often uses the Turing machine model whereas continuous computational
complexity tends to use the real number model.
Continuous computational complexity may again be split into two branches.
The first deals with problems for which the information is complete. Problems
where the information may be complete are those for which the input is specified
by a finite number of parameters. Examples include linear algebraic systems,
matrix multiplication, and systems of polynomial equations. Recently, Blum,
Shub and Smale [89] obtained the first NP-completeness result over the reals
for a problem with complete information.
The other branch of continuous computational complexity is information-
based complexity, which is denoted for brevity as IBC. Typically, IBC studies
infinite-dimensional problems. These are problems where either the input or
the output are elements of infinite-dimensional spaces. Since digital comput-
ers can handle only finite sets of numbers, infinite-dimensional objects such
as functions on the reals must be replaced by finite sets of numbers. Thus,
complete information is not available about such objects. Only partial infor-
mation is available when solving an infinite-dimensional problem on a digital
computer. Typically, information is contaminated with errors such as round-off
error, measurement error, and human error. Thus, the available information is
partial and/or contaminated.
We want to emphasize this point for it is central to IBC. Since only partial
and/or contaminated information is available, we can solve the original problem
only approximately. The goal of IBC is to compute such an approximation as
inexpensively as possible.
In Figure 1 we schematize the structure of computational complexity described
above.
Received by the editors April 1991.
1991 Mathematics Subject Classification. Primary 68Q25.
This research was supported in part by the National Science Foundation.
© 1992 American Mathematical Society

Figure 1. Computational complexity divides into discrete and continuous branches; the continuous branch divides into complete information and partial information (IBC).
Research in the spirit of IBC was initiated in the Soviet Union by Kol-
mogorov in the late 1940s. Nikolskij [50], then a graduate student of Kol-
mogorov, studied optimal quadrature. This line of research was greatly ad-
vanced by Bakhvalov; see, e.g., Bakhvalov [59, 64, 71]. In the United States
research in the spirit of IBC was initiated by Sard [49] and Kiefer [53]. Kiefer
reported the results of his 1948 MIT Master's Thesis that Fibonacci sampling
is optimal when approximating the maximum of a unimodal function. Sard
studied optimal quadrature. Golomb and Weinberger [59] studied optimal ap-
proximation of linear functionals. Schoenberg [64] realized the close connection
between splines and algorithms optimal in the sense of Sard.
IBC is formulated as an abstract theory and it has applications in numerous
areas. The reader may consult TWW [88]¹ for some of the applications. IBC
has benefitted from research in many fields. Influential have been questions,
concepts, and results from complexity theory, algorithmic analysis, applied
mathematics, numerical analysis, statistics, and the theory of approximation
(particularly the work on n-widths and splines).
In this paper we discuss, in particular, IBC research for two problems of
numerical analysis. We first contrast IBC and numerical analysis, limiting our-
selves to just one characteristic of each.
IBC is a branch of computational complexity, and optimal (or almost opti-
mal) information and algorithms are obtained from the theory. In numerical
analysis, particular classes of algorithms are carefully analyzed to see if they
satisfy certain criteria such as convergence, error bounds, efficiency, and stabil-
ity.
Numerical analysis and IBC have different views on the problems which lie
in their common domain. The authors of this paper have worked in both nu-
merical analysis and IBC, and believe the viewpoints are not right or wrong,
just different.
On the other hand, in many research groups around the world, people work on
both numerical analysis and IBC, and do not draw a sharp distinction between
the two. They believe IBC can serve as part of the theoretical foundation of
numerical analysis.
We believe there might be some profit in discussing the views of numerical
analysis and IBC. Unfortunately Parlett [92]² does not serve this purpose since,
as we shall show, this paper ignores relevant literature and is mistaken on issues
of complexity theory.
¹When one of us is a coauthor, the citation will be made using only initials.
²Citation to this paper will be made using only an initial.

For example, P [92] contains a central misconception about IBC which im-
mediately invalidates large portions of the paper. P [92] assumes that the in-
formation is specified (or fixed). Indeed, the first "high level criticism" is that
IBC "is not complexity theory" (see P [92, 2.A]), since "specified information"
is used.
But it is the very essence of IBC that both the information and the algorithms
are varied. Indeed, one of the central problems of IBC is the optimal choice of
information. Significant portions of three monographs, TW [80] and TWW [83,
88], all of which are cited in P [92], are devoted to this issue. We return to this
issue in §3 after notation has been established.
In P [92], the author limits himself to "matrix computations, which is the
area we understand best." We do not object to discussing matrix computations,
although they constitute a small fraction and are atypical of IBC. For example, in
the recent monograph TWW [88], some ten pages, just 2%, are devoted to matrix
computations. Matrix computations are atypical since complete information
can be obtained at finite cost. However, even in this particular area, P [92]
ignores relevant literature and does not exhibit a grasp of the complexity issues.
Since the discussion will, of necessity, assume some rather technical details
concerning matrix computations, we will defer it to §§5 and 6.
We stress that we are not questioning the importance of matrix computations.
On the contrary, they play a central role in scientific computation. Furthermore,
we believe there are some nice results and deep open questions regarding matrix
computations in IBC.
But the real issue is, after all, IBC in its entirety. P [92] is merely using the
two papers TW [84] and Kuczyński [86] on matrix computations to criticize all
of IBC. We therefore respond to general criticisms in §§3 and 4.
To make this paper self-contained we briefly summarize the basic concepts
of IBC in §2. Section 7 deals with possible refinements of IBC. A summary of
our rebuttal to criticisms in P [92] is presented in §8.
2. Outline of IBC
In this section we introduce the basic concepts of IBC and define the notation
which will be used for the remainder of this paper. We illustrate the concepts
with the example of multivariate integration, a typical application of IBC. A
more detailed account may be found in TWW [88]. Expository material may
be found in W [85], PT [87], PW [87], and TW [91]. Let
$$S : F \to G,$$
where F is a subset of a linear space and G is a normed linear space. We wish
to compute an approximation to S(f) for all f from F.
Typically, f is an element from an infinite-dimensional space and it cannot
be represented on a digital computer. We therefore assume that only partial
information³ about f is available. We gather this partial information about
f by computing information operations L(f), where L ∈ Λ. Here the class
Λ denotes a collection of information operations that may be computed. We
illustrate these concepts by an example.
³For simplicity, we will not consider contaminated information in this paper.

Example: Multivariate integration. Let F be the unit ball of the Sobolev class
$W_p^{r,d}$ of real functions defined on the d-dimensional cube $D = [0,1]^d$ whose
rth distributional derivatives exist and are bounded in the $L_p$ norm. Let G = ℝ
and
$$S(f) = \int_D f(t)\,dt.$$
Assume pr > d. To approximate S(f), we assume we can compute only
function values. That is, the class Λ is a collection of L : F → ℝ such that,
for some x from D, L(f) = f(x) for all f ∈ F. □
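To make the class Λ concrete, here is a minimal Python sketch (the names are ours, purely for illustration): the only permissible information operation is point evaluation L(f) = f(x) at a point x of D.

```python
d = 2  # dimension of the cube D = [0,1]^d

def point_evaluation(x):
    """An information operation L in the class Lambda: L(f) = f(x)
    for a fixed point x in D."""
    return lambda f: f(x)

# A sample integrand, assumed to lie in the unit ball F.
f = lambda t: (t[0] + t[1]) / 2

L = point_evaluation((0.25, 0.75))
print(L(f))  # one piece of partial information about f: 0.5
```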
For each f ∈ F, we compute a number of information operations from the
class Λ. Let
$$N(f) = [L_1(f), L_2(f), \dots, L_n(f)], \qquad L_i \in \Lambda,$$
be the computed information about f. We stress that the $L_i$ as well as the number
n can be chosen adaptively. That is, the choice of $L_i$ may depend on the
already computed $L_1(f), L_2(f), \dots, L_{i-1}(f)$. The number n may also depend
on the computed $L_i(f)$. (This permits arbitrary termination criteria.)
N(f) is called the information about f, and N the information operator.
In general, N is many-to-one, and that is why it is impossible to recover the
element f, knowing y = N(f) for f ∈ F. For this reason, the information
N is called partial.
Having computed N(f), we approximate S(f) by an element U(f) =
φ(N(f)), where φ : N(F) → G. A mapping φ is called an algorithm.
The definition of error of the approximation U depends on the setting. We
restrict ourselves here to only two settings. In the worst case setting
$$e(U) = \sup_{f \in F} \|S(f) - U(f)\|,$$
and in the average case setting, given a probability measure μ on F,
$$e(U) = \left( \int_F \|S(f) - U(f)\|^2 \, \mu(df) \right)^{1/2}.$$
Example (continued). The information is given by
$$N(f) = [f(x_1), f(x_2), \dots, f(x_n)]$$
with the points $x_i$ and the number n adaptively chosen. An example of an
algorithm is a linear algorithm given by $U(f) = \varphi(N(f)) = \sum_{i=1}^{n} a_i f(x_i)$ for
some numbers $a_i$.
In the worst case setting, the error is defined as the maximal distance $|S(f) - U(f)|$
over the set F. In the average case setting, the error is the $L_2$ mean of
$|S(f) - U(f)|$ with respect to the probability measure μ. The measure μ is
sometimes taken as a truncated Gaussian measure. □
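As a concrete instance, a product midpoint rule is a linear algorithm of exactly this form; the Python sketch below (our own illustration, with equal weights $a_i = 1/n$) realizes U(f) = φ(N(f)) for the integration example.

```python
from itertools import product

def midpoint_rule(f, m=10, d=2):
    """A linear algorithm U(f) = phi(N(f)) = sum_i a_i f(x_i):
    N samples f at the n = m**d midpoints of a uniform grid on [0,1]^d,
    and phi combines the values with equal weights a_i = 1/n."""
    mids = [(j + 0.5) / m for j in range(m)]
    points = product(mids, repeat=d)            # the n grid midpoints x_i
    return sum(f(x) for x in points) / m ** d   # phi: weighted combination

print(midpoint_rule(lambda t: t[0] * t[1], m=20, d=2))  # near the true value 1/4
```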
To define the computational complexity we need a model of computation. It
is defined by two assumptions:
(1) We are charged for each information operation. That is, for every L ∈ Λ
and for every f ∈ F, the computation of L(f) costs c, where c is
positive and fixed, independent of L and f.

(2) Let Ω denote the set of permissible combinatory operations, including
the addition of two elements in G, multiplication by a scalar in G,
arithmetic operations, comparison of real numbers, and evaluations of
certain elementary functions. We assume that each combinatory operation is performed exactly with unit cost.
In particular, this means that we use the real number model, where we can
perform operations on real numbers exactly and at unit cost. Modulo roundoffs
and the very important concept of numerical stability, this corresponds to float-
ing point arithmetic widely used for solving scientific computational problems.
We now define the cost of the approximations. Let cost(N, f) denote the
cost of computing the information N(f). Note that cost(N, f) ≥ cn, and the
inequality may be strict since adaptive selection of the $L_i$ and n may require
some combinatory operations.
Knowing y = N(f), we compute U(f) = φ(y) by combining the information
$L_i(f)$. Let cost(φ, y) denote the number of combinatory operations from
Ω needed to compute φ(y). We stress that cost(N, f) or cost(φ, y) may be
equal to infinity if N(f) or φ(y) uses an operation outside of, or infinitely
many operations from, Λ or Ω, respectively.
The cost of computing U(f), cost(U, f), is given by
$$\mathrm{cost}(U, f) = \mathrm{cost}(N, f) + \mathrm{cost}(\varphi, N(f)).$$
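For the midpoint rule sketched earlier, this accounting is easy to spell out. The following is a sketch under the stated model, with c per function value and unit cost per combinatory operation; the helper name is ours.

```python
def cost_of_midpoint_rule(m, d, c=1.0):
    """cost(U, f) = cost(N, f) + cost(phi, N(f)) for the midpoint rule:
    n = m**d function evaluations at c apiece (no adaptive overhead),
    then n - 1 additions plus one division at unit cost each."""
    n = m ** d
    cost_N = c * n               # cost(N, f) = c * n
    cost_phi = (n - 1) + 1       # cost(phi, y): n-1 additions + 1 division
    return cost_N + cost_phi

print(cost_of_midpoint_rule(m=10, d=2))  # 100*c + 100 = 200.0 for c = 1
```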
Depending on the setting, the cost of U is defined as follows. In the worst case
setting
$$\mathrm{cost}(U) = \sup_{f \in F} \mathrm{cost}(U, f),$$
and in the average case setting
$$\mathrm{cost}(U) = \int_F \mathrm{cost}(U, f)\, \mu(df).$$
We are ready to define the basic notion of ε-complexity. The ε-complexity
is defined as the minimal cost among all U with error at most ε,
$$\mathrm{comp}(\varepsilon) = \inf\{\mathrm{cost}(U) : U \text{ such that } e(U) \le \varepsilon\}.$$
(Here we use the convention that the infimum of the empty set is taken to be
infinity.) Depending on the setting, this defines the worst case or average case
ε-complexity.
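To see what the definition yields in practice, suppose, purely as an assumption for illustration and not a result quoted from this paper, that the best achievable worst-case error using n function values decays like $e(n) = C n^{-r/d}$ for the integration example. Then comp(ε) is attained at the smallest n with e(n) ≤ ε, as the sketch below works out.

```python
import math

def eps_complexity(eps, r, d, C=1.0, c=1.0, comb_per_eval=2):
    """comp(eps) under the *assumed* error model e(n) = C * n**(-r/d):
    the smallest admissible n is ceil((C/eps)**(d/r)), and the cost is
    roughly (c + a few unit-cost combinatory operations) per evaluation."""
    n = math.ceil((C / eps) ** (d / r))
    return (c + comb_per_eval) * n

# Under this model the cost grows like eps**(-d/r): a curse of dimensionality.
for d in (1, 2, 3):
    print(d, eps_complexity(1e-2, r=1, d=d))
```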
We stress that we take the infimum over all possible U for which the error
does not exceed ε. Since U can be identified with the pair (N, φ), where N is
the information and φ is the algorithm that uses that information, this means
that we take the infimum over all information N consisting of information
operations from the class Λ, and over all algorithms φ that use N such that
(N, φ) computes approximations with error at most ε.
Remark. The complexity depends on the set Λ of permissible information operations and on the set Ω of permissible combinatory operations. Both sets
are necessary to define the complexity of a problem. This is beneficial because
the dependence of complexity on Λ and Ω enriches the theory; it enables us
to study the power of specified information or combinatory operations. We
illustrate the role of Λ and Ω by a number of examples.

References

Nemirovsky, A. S., and Yudin, D. B. Problem Complexity and Method Efficiency in Optimization. Book.

Kiefer, J. Sequential minimax search for a maximum. Journal article.

Traub, J. F., Wasilkowski, G. W., and Wozniakowski, H. Information-Based Complexity. Book.