
Convex piecewise-linear fitting

Alessandro Magnani, Stephen P. Boyd
01 Mar 2009 · Optim Eng, Vol. 10, Iss. 1, pp. 1–17

Optim Eng (2009) 10: 1–17
DOI 10.1007/s11081-008-9045-3
Convex piecewise-linear fitting
Alessandro Magnani · Stephen P. Boyd
Received: 14 April 2006 / Accepted: 4 March 2008 / Published online: 25 March 2008
© Springer Science+Business Media, LLC 2008
Abstract We consider the problem of fitting a convex piecewise-linear function, with some specified form, to given multi-dimensional data. Except for a few special cases, this problem is hard to solve exactly, so we focus on heuristic methods that find locally optimal fits. The method we describe, which is a variation on the K-means algorithm for clustering, seems to work well in practice, at least on data that can be fit well by a convex function. We focus on the simplest function form, a maximum of a fixed number of affine functions, and then show how the methods extend to a more general form.
Keywords Convex optimization · Piecewise-linear approximation · Data fitting
1 Convex piecewise-linear fitting problem
We consider the problem of fitting some given data

$(u_1, y_1), \ldots, (u_m, y_m) \in \mathbf{R}^n \times \mathbf{R}$

with a convex piecewise-linear function $f : \mathbf{R}^n \to \mathbf{R}$ from some set $\mathcal{F}$ of candidate functions. With a least-squares fitting criterion, we obtain the problem

$\begin{array}{ll} \text{minimize} & J(f) = \sum_{i=1}^{m} (f(u_i) - y_i)^2 \\ \text{subject to} & f \in \mathcal{F}, \end{array}$   (1)
A. Magnani · S.P. Boyd (✉)
Electrical Engineering Department, Stanford University, Stanford, CA 94305, USA
e-mail: boyd@stanford.edu
A. Magnani
e-mail: alem@stanford.edu

with variable f. We refer to $(J(f)/m)^{1/2}$ as the RMS (root-mean-square) fit of the function f to the data. The convex piecewise-linear fitting problem (1) is to find the function f, from the given family $\mathcal{F}$ of convex piecewise-linear functions, that gives the best (smallest) RMS fit to the given data.
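As a concrete illustration (not from the paper), the RMS fit $(J(f)/m)^{1/2}$ is straightforward to compute for any candidate function; here `f_affine` is a hypothetical affine candidate on toy data sampled from $y = |u|$.

```python
import numpy as np

def rms_fit(f, U, y):
    """RMS fit (J(f)/m)^(1/2) of f to data (u_i, y_i); U has shape (m, n)."""
    r = np.array([f(u) for u in U]) - y   # residuals f(u_i) - y_i
    return np.sqrt(np.mean(r ** 2))

# toy data sampled from y = |u| (convex), with n = 1, m = 5
U = np.array([[-2.0], [-1.0], [0.0], [1.0], [2.0]])
y = np.abs(U[:, 0])

# a hypothetical affine candidate f(x) = a^T x + b with a = 0, b = 1.2
f_affine = lambda u: 0.0 * u[0] + 1.2
print(rms_fit(f_affine, U, y))
```

Since the data have strong convex curvature, no affine candidate achieves a small RMS fit; a max-affine candidate would do much better here.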
Our main interest is in the case when n (the dimension of the data) is relatively small, say not more than 5 or so, while m (the number of data points) can be relatively large, e.g., $10^4$ or more. The methods we describe, however, work for any values of n and m.
Several special cases of the convex piecewise-linear fitting problem (1) can be solved exactly. When $\mathcal{F}$ consists of the affine functions, i.e., f has the form $f(x) = a^T x + b$, the problem (1) reduces to an ordinary linear least-squares problem in the function parameters $a \in \mathbf{R}^n$ and $b \in \mathbf{R}$ and so is readily solved. As a less trivial example, consider the case when $\mathcal{F}$ consists of all convex piecewise-linear functions from $\mathbf{R}^n$ into $\mathbf{R}$, with no other constraint on the form of f. This is the nonparametric convex piecewise-linear fitting problem. Then the problem (1) can be solved, exactly, via a quadratic program (QP); see (Boyd and Vandenberghe 2004, Sect. 6.5.5). This nonparametric approach, however, has two potential practical disadvantages. First, the QP that must be solved is very large (containing more than mn variables), limiting the method to modest values of m (say, a thousand). The second potential disadvantage is that the piecewise-linear function fit obtained can be very complex, with many terms (up to m).
Of course, not all data can be fit well (i.e., with small RMS fit) with a convex piecewise-linear function. For example, if the data are samples from a function that has strong negative (concave) curvature, then no convex function can fit it well. Moreover, the best fit (which will be poor) will be obtained with an affine function. We can also have the opposite situation: it can occur that the data can be perfectly fit by an affine function, i.e., we can have J = 0. In this case we say that the data are interpolated by the convex piecewise-linear function f.
1.1 Max-affine functions
In this paper we consider the parametric fitting problem, in which the candidate functions are parametrized by a finite-dimensional vector of coefficients $\alpha \in \mathbf{R}^p$, where p is the number of parameters needed to describe the candidate functions. One very simple form is given by $\mathcal{F}^{\mathrm{ma}}_k$, the set of functions on $\mathbf{R}^n$ with the form

$f(x) = \max\{a_1^T x + b_1, \ldots, a_k^T x + b_k\},$   (2)

i.e., a maximum of k affine functions. We refer to a function of this form as 'max-affine', with k terms. The set $\mathcal{F}^{\mathrm{ma}}_k$ is parametrized by the coefficient vector

$\alpha = (a_1, \ldots, a_k, b_1, \ldots, b_k) \in \mathbf{R}^{k(n+1)}.$
In fact, any convex piecewise-linear function on $\mathbf{R}^n$ can be expressed as a max-affine function, for some k, so this form is in a sense universal. Our interest, however, is in the case when the number of terms k is relatively small, say no more than 10, or a few 10s. In this case the max-affine representation (2) is compact, in the sense

that the number of parameters needed to describe f (i.e., p) is much smaller than the number of parameters in the original data set (i.e., m(n + 1)). The methods we describe, however, do not require k to be small.
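For concreteness, a max-affine function (2) is a one-liner to evaluate; the sketch below (illustrative code, not from the paper) stores the k affine terms as rows of a matrix A and a vector b, so the parameter count is p = k(n + 1).

```python
import numpy as np

def max_affine(x, A, b):
    """Evaluate f(x) = max_j (a_j^T x + b_j); A has shape (k, n), b shape (k,)."""
    return np.max(A @ x + b)

# k = 3 affine terms in n = 2 dimensions: p = k(n + 1) = 9 parameters
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([0.0, 0.0, 1.0])

x = np.array([2.0, -1.0])
print(max_affine(x, A, b))   # max{2, -1, 0} = 2
```

Being a pointwise maximum of affine functions, `max_affine` is automatically convex in x for any choice of A and b.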
When $\mathcal{F} = \mathcal{F}^{\mathrm{ma}}_k$, the fitting problem (1) reduces to the nonlinear least-squares problem

$\text{minimize} \quad J(\alpha) = \sum_{i=1}^{m} \Big( \max_{j=1,\ldots,k} (a_j^T u_i + b_j) - y_i \Big)^2,$   (3)

with variables $a_1, \ldots, a_k \in \mathbf{R}^n$, $b_1, \ldots, b_k \in \mathbf{R}$. The function J is a piecewise-quadratic function of $\alpha$. Indeed, for each i, $f(u_i) - y_i$ is piecewise-linear, and J is the sum of squares of these functions, so J is convex quadratic on the (polyhedral) regions on which $f(u_i)$ is affine. But J is not globally convex, so the fitting problem (3) is not convex.
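The objective in (3) can be evaluated in a vectorized way (an illustrative sketch, not from the paper); for data sampled exactly from $y = |u|$, the parameters $a_1 = 1, a_2 = -1, b_1 = b_2 = 0$ give a perfect fit, i.e., $J(\alpha) = 0$.

```python
import numpy as np

def J(A, b, U, y):
    """Nonlinear least-squares objective (3): sum_i (max_j(a_j^T u_i + b_j) - y_i)^2."""
    F = np.max(U @ A.T + b, axis=1)   # f(u_i) for all i, shape (m,)
    return np.sum((F - y) ** 2)

# target data: y_i = |u_i| on a 1-D grid; fit with k = 2 terms
U = np.array([[-1.0], [0.0], [1.0]])
y = np.abs(U[:, 0])

A = np.array([[1.0], [-1.0]])   # a_1 = 1, a_2 = -1
b = np.array([0.0, 0.0])        # b_1 = b_2 = 0
print(J(A, b, U, y))            # f(u) = max{u, -u} = |u| exactly, so J = 0
```

Perturbing A or b away from this point increases J, but since J is only piecewise quadratic, gradient information alone cannot certify a global minimum.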
1.2 A more general parametrization
We will also consider a more general parametrized form for convex piecewise-linear functions,

$f(x) = \psi(\phi(x, \alpha)),$   (4)

where $\psi : \mathbf{R}^q \to \mathbf{R}$ is a (fixed) convex piecewise-linear function, and $\phi : \mathbf{R}^n \times \mathbf{R}^p \to \mathbf{R}^q$ is a (fixed) bi-affine function. (This means that for each x, $\phi(x, \alpha)$ is an affine function of $\alpha$, and for each $\alpha$, $\phi(x, \alpha)$ is an affine function of x.) The simple max-affine parametrization (2) has this form, with $q = k$, $\psi(z_1, \ldots, z_k) = \max\{z_1, \ldots, z_k\}$, and $\phi_i(x, \alpha) = a_i^T x + b_i$.
As an example, consider the set of functions $\mathcal{F}$ that are sums of k terms, each of which is the maximum of two affine functions,

$f(x) = \sum_{i=1}^{k} \max\{a_i^T x + b_i,\; c_i^T x + d_i\},$   (5)

parametrized by $a_1, \ldots, a_k, c_1, \ldots, c_k \in \mathbf{R}^n$ and $b_1, \ldots, b_k, d_1, \ldots, d_k \in \mathbf{R}$. This family corresponds to the general form (4) with

$\psi(z_1, \ldots, z_k, w_1, \ldots, w_k) = \sum_{i=1}^{k} \max\{z_i, w_i\},$

and

$\phi(x, \alpha) = (a_1^T x + b_1, \ldots, a_k^T x + b_k,\; c_1^T x + d_1, \ldots, c_k^T x + d_k).$
Of course we can expand any function with the more general form (4) into its max-affine representation. But the resulting max-affine representation can be very much larger than the original general form representation. For example, the function form (5) requires $p = 2k(n + 1)$ parameters. If the same function is written out as a max-affine function, it requires $2^k$ terms, and therefore $2^k(n + 1)$ parameters. The

hope is that a well chosen general form can give us a more compact fit to the given
data than a max-affine form with the same number of parameters.
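The expansion can be checked numerically (an illustrative sketch, not from the paper): a sum of maxima distributes into a maximum over all $2^k$ ways of picking one affine term from each pair, so the compact form (5) and its expanded max-affine representation agree pointwise.

```python
import itertools
import numpy as np

def f_sum_max(x, A, b, C, d):
    """f(x) = sum_i max{a_i^T x + b_i, c_i^T x + d_i} -- form (5), p = 2k(n+1) params."""
    return np.sum(np.maximum(A @ x + b, C @ x + d))

def f_expanded(x, A, b, C, d):
    """The same function expanded as a max-affine (2) with 2^k terms."""
    k = A.shape[0]
    vals = []
    for choice in itertools.product([0, 1], repeat=k):
        a = sum(A[i] if c else C[i] for i, c in enumerate(choice))
        bb = sum(b[i] if c else d[i] for i, c in enumerate(choice))
        vals.append(a @ x + bb)
    return max(vals)

rng = np.random.default_rng(0)
k, n = 3, 2
A, C = rng.standard_normal((k, n)), rng.standard_normal((k, n))
b, d = rng.standard_normal(k), rng.standard_normal(k)
x = rng.standard_normal(n)
agree = abs(f_sum_max(x, A, b, C, d) - f_expanded(x, A, b, C, d)) < 1e-9
print(agree)
```

Here the compact form uses 2k(n + 1) = 18 parameters while the expanded form uses $2^k(n + 1) = 24$; the gap widens exponentially as k grows.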
As another interesting example of the general form (4), consider the case in which f is given as the optimal value of a linear program (LP) with the right-hand side of the constraints depending bi-affinely on x and the parameters:

$f(x) = \min\{c^T v \mid Av \preceq b + Bx\}.$

Here c and A are fixed; b and B are considered the parameters that define f. This function can be put in the general form (4) using

$\psi(z) = \min\{c^T v \mid Av \preceq z\}, \qquad \phi(x, b, B) = b + Bx.$

The function $\psi$ is convex and piecewise-linear (see, e.g., Boyd and Vandenberghe 2004); the function $\phi$ is evidently bi-affine in x and (b, B).
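A minimal special case makes the LP example concrete (an assumed illustration, not from the paper): take v scalar, c = 1, and every row of A equal to −1. Then $\psi(z) = \min\{v \mid -v \le z_i\} = \max_i(-z_i)$, so the LP value function collapses to an explicit max-affine function and needs no LP solver at all.

```python
import numpy as np

# LP value function f(x) = min{ c^T v | A v <= b + B x } with scalar v,
# c = 1 and A = -1 in each row: psi(z) = max_i(-z_i), a max-affine function.
def psi(z):
    return np.max(-z)

def f(x, b, B):
    return psi(b + B @ x)            # general form (4): f(x) = psi(phi(x, alpha))

b = np.array([0.0, -1.0])
B = np.array([[1.0], [-1.0]])        # n = 1, q = 2 constraints

x = np.array([0.5])
print(f(x, b, B))                    # max{-(0.5), -(-1 - 0.5)} = 1.5
```

For a general fixed c and A, evaluating $\psi$ would require solving an LP, but f would still be convex piecewise-linear in x and bi-affine in the parameters (b, B).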
1.3 Dependent variable transformation and normalization
We can apply a nonsingular affine transformation to the dependent variable u, by forming

$\tilde{u}_i = T u_i + s, \quad i = 1, \ldots, m,$

where $T \in \mathbf{R}^{n \times n}$ is nonsingular and $s \in \mathbf{R}^n$. Defining $\tilde{f}(\tilde{x}) = f(T^{-1}(\tilde{x} - s))$, we have $\tilde{f}(\tilde{u}_i) = f(u_i)$. If f is piecewise-linear and convex, then so is $\tilde{f}$ (and of course, vice versa). Provided $\mathcal{F}$ is invariant under composition with affine functions, the problem of fitting the data $(u_i, y_i)$ with a function $f \in \mathcal{F}$ is the same as the problem of fitting the data $(\tilde{u}_i, y_i)$ with a function $\tilde{f} \in \mathcal{F}$.
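The identity $\tilde{f}(\tilde{u}_i) = f(u_i)$ is easy to verify numerically; the sketch below (illustrative, not from the paper) uses an arbitrary max-affine f and a random transformation, assuming the randomly drawn T is nonsingular (true with probability one).

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 2, 5, 3

A = rng.standard_normal((k, n))
b = rng.standard_normal(k)
f = lambda u: np.max(A @ u + b)              # an arbitrary max-affine f

T = rng.standard_normal((n, n))              # assumed nonsingular
s = rng.standard_normal(n)
Tinv = np.linalg.inv(T)

f_tilde = lambda u: f(Tinv @ (u - s))        # f~(x~) = f(T^{-1}(x~ - s))

U = rng.standard_normal((m, n))
ok = all(np.isclose(f_tilde(T @ u + s), f(u)) for u in U)
print(ok)
```

Since composing a max-affine function with an affine map yields another max-affine function with the same number of terms, $\mathcal{F}^{\mathrm{ma}}_k$ satisfies the invariance assumption in the text.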
This allows us to normalize the dependent variable data in various ways. For example, we can assume that it has zero (sample) mean and unit (sample) covariance,

$\bar{u} = (1/m)\sum_{i=1}^{m} u_i = 0, \qquad \Sigma_u = (1/m)\sum_{i=1}^{m} u_i u_i^T = I,$   (6)

provided the data $u_i$ are affinely independent. (If they are not, we can reduce the problem to an equivalent one with smaller dimension.)
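One standard way to achieve the normalization (6) is a whitening transform (an illustrative sketch, not from the paper): take $T = \Sigma^{-1/2}$ for the centered sample covariance $\Sigma$ and $s = -T\bar{u}$, computing the inverse square root via an eigendecomposition.

```python
import numpy as np

def normalize(U):
    """Affine map u~ = T u + s giving zero sample mean and identity sample covariance (6)."""
    m = U.shape[0]
    ubar = U.mean(axis=0)
    Sigma = (U - ubar).T @ (U - ubar) / m
    # T = Sigma^{-1/2} via eigendecomposition (Sigma is symmetric positive
    # definite when the data are affinely independent)
    w, V = np.linalg.eigh(Sigma)
    T = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    s = -T @ ubar
    return U @ T.T + s, T, s

rng = np.random.default_rng(2)
U = rng.standard_normal((100, 3)) @ rng.standard_normal((3, 3)) + 5.0

Ut, T, s = normalize(U)
print(np.allclose(Ut.mean(axis=0), 0), np.allclose(Ut.T @ Ut / 100, np.eye(3)))
```

If the eigenvalues w contain (near-)zeros, the data are (nearly) affinely dependent and, as the text notes, the problem should first be reduced to a lower-dimensional one.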
1.4 Outline
In Sect. 2 we describe several applications of convex piecewise-linear fitting. In Sect. 3, we describe a basic heuristic algorithm for (approximately) solving the max-affine fitting problem (1). This basic algorithm has several shortcomings, such as convergence to a poor local minimum, or failure to converge at all. By running this algorithm a modest number of times, from different initial points, however, we obtain a fairly reliable algorithm for least-squares fitting of a max-affine function to given data. Finally, we show how the algorithm can be extended to handle the more general function parametrization (4). In Sect. 4 we present some numerical examples.
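Since the abstract describes the heuristic as a variation on K-means, one plausible sketch of such an alternating partition-and-refit iteration is shown below. This is an illustrative simplification under assumptions not spelled out in this section: each data point is assigned to the affine term achieving the maximum, each term is refit by least squares on its assigned points, and (near-)empty partitions are simply reseeded at random points rather than handled as the paper's Sect. 3 would.

```python
import numpy as np

def fit_max_affine(U, y, k, iters=30, seed=0):
    """Heuristic max-affine fit (illustrative sketch): alternate
    (1) assign each point to the term achieving the max,
    (2) least-squares refit each term on its assigned points,
    keeping the best parameters seen."""
    rng = np.random.default_rng(seed)
    m, n = U.shape
    X = np.hstack([U, np.ones((m, 1))])         # append 1 for the offset b_j
    theta = rng.standard_normal((k, n + 1)) * 0.1
    best, best_J = theta.copy(), np.inf
    for _ in range(iters):
        assign = np.argmax(X @ theta.T, axis=1)
        for j in range(k):
            idx = assign == j
            if idx.sum() <= n:                  # (near-)empty partition: reseed
                idx = rng.choice(m, size=n + 1, replace=False)
            theta[j] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
        Jval = np.sum((np.max(X @ theta.T, axis=1) - y) ** 2)
        if Jval < best_J:
            best, best_J = theta.copy(), Jval
    return best

# fit y = |u| with k = 2 terms; the exact fit is max{u, -u}
U = np.linspace(-1, 1, 50).reshape(-1, 1)
y = np.abs(U[:, 0])
theta = fit_max_affine(U, y, k=2)
X = np.hstack([U, np.ones((50, 1))])
rms = np.sqrt(np.mean((np.max(X @ theta.T, axis=1) - y) ** 2))
print(rms)   # RMS fit of the returned max-affine function
```

As the outline warns, a single run can land in a poor local minimum, which is why restarting from several initial points (different `seed` values here) is used to make the method reliable.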

1.5 Previous work
Piecewise-linear functions arise in many areas and contexts. Some general forms for
representing piecewise-linear functions can be found in, e.g., Kang and Chua, Kahlert
and Chua (1978, 1990). Several methods have been proposed for fitting general
piecewise-linear functions to (multidimensional) data. A neural network algorithm is
used in Gothoskar et al. (2002); a Gauss-Newton method is used in Julian et al., Horst
and Beichel (1998, 1997)tofindpiecewise-linearapproximationsofsmoothfunc-
tions. A recent reference on methods for least-squares with semismooth functions is
Kanzow and Petra (2004). An iterative procedure, similar in spirit to our method,
is described in Ferrari-Trecate and Muselli (2002). Software for fitting general
piecewise-linear functions to data include, e.g., Torrisi and Bemporad (2004), Storace
and De Feo (2002).
The special case n = 1, i.e., fitting a function on $\mathbf{R}$ by a piecewise-linear function, has been extensively studied. For example, a method for finding the minimum number of segments to achieve a given maximum error is described in Dunham (1986); the same problem can be approached using dynamic programming (Goodrich 1994; Bellman and Roth 1969; Hakimi and Schmeichel 1991; Wang et al. 1993), or a genetic algorithm (Pittman and Murthy 2000). The problem of simplifying a given piecewise-linear function on $\mathbf{R}$, to one with fewer segments, is considered in Imai and Iri (1986).
Another related problem that has received much attention is the problem of fitting a piecewise-linear curve, or polygon, in $\mathbf{R}^2$ to given data; see, e.g., Aggarwal et al. (1985), Mitchell and Suri (1992). An iterative procedure, closely related to the K-means algorithm and therefore similar in spirit to our method, is described in Phillips and Rosenfeld (1988), Yin (1998).
Piecewise-linear functions and approximations have been used in many applications, such as detection of patterns in images (Rives et al. 1985), contour tracing (Dobkin et al. 1990), extraction of straight lines in aerial images (Venkateswar and Chellappa 1992), global optimization (Mangasarian et al. 2005), compression of chemical process data (Bakshi and Stephanopoulos 1996), and circuit modeling (Julian et al. 1998; Chua and Deng 1986; Vandenberghe et al. 1989).
We are aware of only two papers which consider the problem of fitting a piecewise-linear convex function to given data. Mangasarian et al. (2005) describe a heuristic method for fitting a piecewise-linear convex function of the form $a + b^T x + \|Ax + c\|_1$ to given data (along with the constraint that the function underestimate the data). The focus of their paper is on finding piecewise-linear convex underestimators for known (nonconvex) functions, for use in global optimization; our focus, in contrast, is on simply fitting some given data. The closest related work that we know of is Kim et al. (2004). In this paper, Kim et al. describe a method for fitting a (convex) max-affine function to given data, increasing the number of terms to get a better fit. (In fact they describe a method for fitting a max-monomial function to circuit models; see Sect. 2.3.)
