
Convex piecewise-linear fitting

Alessandro Magnani, Stephen P. Boyd
01 Mar 2009 · Optim Eng, Vol. 10, Iss. 1, pp. 1–17

Optim Eng (2009) 10: 1–17
DOI 10.1007/s11081-008-9045-3
Convex piecewise-linear fitting
Alessandro Magnani · Stephen P. Boyd
Received: 14 April 2006 / Accepted: 4 March 2008 / Published online: 25 March 2008
© Springer Science+Business Media, LLC 2008
Abstract We consider the problem of fitting a convex piecewise-linear function, with some specified form, to given multi-dimensional data. Except for a few special cases, this problem is hard to solve exactly, so we focus on heuristic methods that find locally optimal fits. The method we describe, which is a variation on the K-means algorithm for clustering, seems to work well in practice, at least on data that can be fit well by a convex function. We focus on the simplest function form, a maximum of a fixed number of affine functions, and then show how the methods extend to a more general form.
Keywords Convex optimization · Piecewise-linear approximation · Data fitting
1 Convex piecewise-linear fitting problem
We consider the problem of fitting some given data

$(u_1, y_1), \ldots, (u_m, y_m) \in \mathbf{R}^n \times \mathbf{R}$

with a convex piecewise-linear function $f : \mathbf{R}^n \to \mathbf{R}$ from some set $\mathcal{F}$ of candidate functions. With a least-squares fitting criterion, we obtain the problem

$\begin{array}{ll} \text{minimize} & J(f) = \sum_{i=1}^{m} (f(u_i) - y_i)^2 \\ \text{subject to} & f \in \mathcal{F}, \end{array}$   (1)
A. Magnani · S.P. Boyd (✉)
Electrical Engineering Department, Stanford University, Stanford, CA 94305, USA
e-mail: boyd@stanford.edu
A. Magnani
e-mail: alem@stanford.edu

with variable f. We refer to $(J(f)/m)^{1/2}$ as the RMS (root-mean-square) fit of the function f to the data. The convex piecewise-linear fitting problem (1) is to find the function f, from the given family $\mathcal{F}$ of convex piecewise-linear functions, that gives the best (smallest) RMS fit to the given data.
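As a concrete illustration (not from the paper), the RMS fit $(J(f)/m)^{1/2}$ is straightforward to compute for any candidate function; here `f_affine` is a hypothetical affine candidate on toy data sampled from $y = |u|$.

```python
import numpy as np

def rms_fit(f, U, y):
    """RMS fit (J(f)/m)^(1/2) of f to data (u_i, y_i); U has shape (m, n)."""
    r = np.array([f(u) for u in U]) - y   # residuals f(u_i) - y_i
    return np.sqrt(np.mean(r ** 2))

# toy data sampled from y = |u| (convex), with n = 1, m = 5
U = np.array([[-2.0], [-1.0], [0.0], [1.0], [2.0]])
y = np.abs(U[:, 0])

# a hypothetical affine candidate f(x) = a^T x + b with a = 0, b = 1.2
f_affine = lambda u: 0.0 * u[0] + 1.2
print(rms_fit(f_affine, U, y))
```

Since the data have strong convex curvature, no affine candidate achieves a small RMS fit; a max-affine candidate would do much better here.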
Our main interest is in the case when n (the dimension of the data) is relatively small, say not more than 5 or so, while m (the number of data points) can be relatively large, e.g., $10^4$ or more. The methods we describe, however, work for any values of n and m.
Several special cases of the convex piecewise-linear fitting problem (1) can be solved exactly. When $\mathcal{F}$ consists of the affine functions, i.e., f has the form $f(x) = a^T x + b$, the problem (1) reduces to an ordinary linear least-squares problem in the function parameters $a \in \mathbf{R}^n$ and $b \in \mathbf{R}$ and so is readily solved. As a less trivial example, consider the case when $\mathcal{F}$ consists of all convex piecewise-linear functions from $\mathbf{R}^n$ into $\mathbf{R}$, with no other constraint on the form of f. This is the nonparametric convex piecewise-linear fitting problem. Then the problem (1) can be solved, exactly, via a quadratic program (QP); see (Boyd and Vandenberghe 2004, Sect. 6.5.5). This nonparametric approach, however, has two potential practical disadvantages. First, the QP that must be solved is very large (containing more than mn variables), limiting the method to modest values of m (say, a thousand). The second potential disadvantage is that the piecewise-linear function fit obtained can be very complex, with many terms (up to m).
Of course, not all data can be fit well (i.e., with small RMS fit) with a convex piecewise-linear function. For example, if the data are samples from a function that has strong negative (concave) curvature, then no convex function can fit it well. Moreover, the best fit (which will be poor) will be obtained with an affine function. We can also have the opposite situation: it can occur that the data can be perfectly fit by an affine function, i.e., we can have J = 0. In this case we say that the data are interpolated by the convex piecewise-linear function f.
1.1 Max-affine functions
In this paper we consider the parametric fitting problem, in which the candidate functions are parametrized by a finite-dimensional vector of coefficients $\alpha \in \mathbf{R}^p$, where p is the number of parameters needed to describe the candidate functions. One very simple form is given by $\mathcal{F}^{\mathrm{ma}}_k$, the set of functions on $\mathbf{R}^n$ with the form

$f(x) = \max\{a_1^T x + b_1, \ldots, a_k^T x + b_k\},$   (2)

i.e., a maximum of k affine functions. We refer to a function of this form as 'max-affine', with k terms. The set $\mathcal{F}^{\mathrm{ma}}_k$ is parametrized by the coefficient vector

$\alpha = (a_1, \ldots, a_k, b_1, \ldots, b_k) \in \mathbf{R}^{k(n+1)}.$
In fact, any convex piecewise-linear function on $\mathbf{R}^n$ can be expressed as a max-affine function, for some k, so this form is in a sense universal. Our interest, however, is in the case when the number of terms k is relatively small, say no more than 10, or a few 10s. In this case the max-affine representation (2) is compact, in the sense

that the number of parameters needed to describe f (i.e., p) is much smaller than the number of parameters in the original data set (i.e., m(n + 1)). The methods we describe, however, do not require k to be small.
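For concreteness, a max-affine function (2) is a one-liner to evaluate; the sketch below (illustrative code, not from the paper) stores the k affine terms as rows of a matrix A and a vector b, so the parameter count is p = k(n + 1).

```python
import numpy as np

def max_affine(x, A, b):
    """Evaluate f(x) = max_j (a_j^T x + b_j); A has shape (k, n), b shape (k,)."""
    return np.max(A @ x + b)

# k = 3 affine terms in n = 2 dimensions: p = k(n + 1) = 9 parameters
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([0.0, 0.0, 1.0])

x = np.array([2.0, -1.0])
print(max_affine(x, A, b))   # max{2, -1, 0} = 2
```

Being a pointwise maximum of affine functions, `max_affine` is automatically convex in x for any choice of A and b.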
When $\mathcal{F} = \mathcal{F}^{\mathrm{ma}}_k$, the fitting problem (1) reduces to the nonlinear least-squares problem

$\text{minimize} \quad J(\alpha) = \sum_{i=1}^{m} \Big( \max_{j=1,\ldots,k} (a_j^T u_i + b_j) - y_i \Big)^2,$   (3)

with variables $a_1, \ldots, a_k \in \mathbf{R}^n$, $b_1, \ldots, b_k \in \mathbf{R}$. The function J is a piecewise-quadratic function of $\alpha$. Indeed, for each i, $f(u_i) - y_i$ is piecewise-linear, and J is the sum of squares of these functions, so J is convex quadratic on the (polyhedral) regions on which $f(u_i)$ is affine. But J is not globally convex, so the fitting problem (3) is not convex.
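The objective in (3) can be evaluated in a vectorized way (an illustrative sketch, not from the paper); for data sampled exactly from $y = |u|$, the parameters $a_1 = 1, a_2 = -1, b_1 = b_2 = 0$ give a perfect fit, i.e., $J(\alpha) = 0$.

```python
import numpy as np

def J(A, b, U, y):
    """Nonlinear least-squares objective (3): sum_i (max_j(a_j^T u_i + b_j) - y_i)^2."""
    F = np.max(U @ A.T + b, axis=1)   # f(u_i) for all i, shape (m,)
    return np.sum((F - y) ** 2)

# target data: y_i = |u_i| on a 1-D grid; fit with k = 2 terms
U = np.array([[-1.0], [0.0], [1.0]])
y = np.abs(U[:, 0])

A = np.array([[1.0], [-1.0]])   # a_1 = 1, a_2 = -1
b = np.array([0.0, 0.0])        # b_1 = b_2 = 0
print(J(A, b, U, y))            # f(u) = max{u, -u} = |u| exactly, so J = 0
```

Perturbing A or b away from this point increases J, but since J is only piecewise quadratic, gradient information alone cannot certify a global minimum.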
1.2 A more general parametrization
We will also consider a more general parametrized form for convex piecewise-linear functions,

$f(x) = \psi(\phi(x, \alpha)),$   (4)

where $\psi : \mathbf{R}^q \to \mathbf{R}$ is a (fixed) convex piecewise-linear function, and $\phi : \mathbf{R}^n \times \mathbf{R}^p \to \mathbf{R}^q$ is a (fixed) bi-affine function. (This means that for each x, $\phi(x, \alpha)$ is an affine function of $\alpha$, and for each $\alpha$, $\phi(x, \alpha)$ is an affine function of x.) The simple max-affine parametrization (2) has this form, with $q = k$, $\psi(z_1, \ldots, z_k) = \max\{z_1, \ldots, z_k\}$, and $\phi_i(x, \alpha) = a_i^T x + b_i$.
As an example, consider the set of functions $\mathcal{F}$ that are sums of k terms, each of which is the maximum of two affine functions,

$f(x) = \sum_{i=1}^{k} \max\{a_i^T x + b_i,\; c_i^T x + d_i\},$   (5)

parametrized by $a_1, \ldots, a_k, c_1, \ldots, c_k \in \mathbf{R}^n$ and $b_1, \ldots, b_k, d_1, \ldots, d_k \in \mathbf{R}$. This family corresponds to the general form (4) with

$\psi(z_1, \ldots, z_k, w_1, \ldots, w_k) = \sum_{i=1}^{k} \max\{z_i, w_i\},$

and

$\phi(x, \alpha) = (a_1^T x + b_1, \ldots, a_k^T x + b_k,\; c_1^T x + d_1, \ldots, c_k^T x + d_k).$
Of course we can expand any function with the more general form (4) into its max-affine representation. But the resulting max-affine representation can be very much larger than the original general form representation. For example, the function form (5) requires $p = 2k(n + 1)$ parameters. If the same function is written out as a max-affine function, it requires $2^k$ terms, and therefore $2^k(n + 1)$ parameters. The

hope is that a well chosen general form can give us a more compact fit to the given
data than a max-affine form with the same number of parameters.
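The expansion can be checked numerically (an illustrative sketch, not from the paper): a sum of maxima distributes into a maximum over all $2^k$ ways of picking one affine term from each pair, so the compact form (5) and its expanded max-affine representation agree pointwise.

```python
import itertools
import numpy as np

def f_sum_max(x, A, b, C, d):
    """f(x) = sum_i max{a_i^T x + b_i, c_i^T x + d_i} -- form (5), p = 2k(n+1) params."""
    return np.sum(np.maximum(A @ x + b, C @ x + d))

def f_expanded(x, A, b, C, d):
    """The same function expanded as a max-affine (2) with 2^k terms."""
    k = A.shape[0]
    vals = []
    for choice in itertools.product([0, 1], repeat=k):
        a = sum(A[i] if c else C[i] for i, c in enumerate(choice))
        bb = sum(b[i] if c else d[i] for i, c in enumerate(choice))
        vals.append(a @ x + bb)
    return max(vals)

rng = np.random.default_rng(0)
k, n = 3, 2
A, C = rng.standard_normal((k, n)), rng.standard_normal((k, n))
b, d = rng.standard_normal(k), rng.standard_normal(k)
x = rng.standard_normal(n)
agree = abs(f_sum_max(x, A, b, C, d) - f_expanded(x, A, b, C, d)) < 1e-9
print(agree)
```

Here the compact form uses 2k(n + 1) = 18 parameters while the expanded form uses $2^k(n + 1) = 24$; the gap widens exponentially as k grows.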
As another interesting example of the general form (4), consider the case in which f is given as the optimal value of a linear program (LP) with the right-hand side of the constraints depending bi-affinely on x and the parameters:

$f(x) = \min\{c^T v \mid Av \preceq b + Bx\}.$

Here c and A are fixed; b and B are considered the parameters that define f. This function can be put in the general form (4) using

$\psi(z) = \min\{c^T v \mid Av \preceq z\}, \qquad \phi(x, b, B) = b + Bx.$

The function $\psi$ is convex and piecewise-linear (see, e.g., Boyd and Vandenberghe 2004); the function $\phi$ is evidently bi-affine in x and (b, B).
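A minimal special case makes the LP example concrete (an assumed illustration, not from the paper): take v scalar, c = 1, and every row of A equal to −1. Then $\psi(z) = \min\{v \mid -v \le z_i\} = \max_i(-z_i)$, so the LP value function collapses to an explicit max-affine function and needs no LP solver at all.

```python
import numpy as np

# LP value function f(x) = min{ c^T v | A v <= b + B x } with scalar v,
# c = 1 and A = -1 in each row: psi(z) = max_i(-z_i), a max-affine function.
def psi(z):
    return np.max(-z)

def f(x, b, B):
    return psi(b + B @ x)            # general form (4): f(x) = psi(phi(x, alpha))

b = np.array([0.0, -1.0])
B = np.array([[1.0], [-1.0]])        # n = 1, q = 2 constraints

x = np.array([0.5])
print(f(x, b, B))                    # max{-(0.5), -(-1 - 0.5)} = 1.5
```

For a general fixed c and A, evaluating $\psi$ would require solving an LP, but f would still be convex piecewise-linear in x and bi-affine in the parameters (b, B).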
1.3 Dependent variable transformation and normalization
We can apply a nonsingular affine transformation to the dependent variable u, by forming

$\tilde{u}_i = T u_i + s, \quad i = 1, \ldots, m,$

where $T \in \mathbf{R}^{n \times n}$ is nonsingular and $s \in \mathbf{R}^n$. Defining $\tilde{f}(\tilde{x}) = f(T^{-1}(\tilde{x} - s))$, we have $\tilde{f}(\tilde{u}_i) = f(u_i)$. If f is piecewise-linear and convex, then so is $\tilde{f}$ (and of course, vice versa). Provided $\mathcal{F}$ is invariant under composition with affine functions, the problem of fitting the data $(u_i, y_i)$ with a function $f \in \mathcal{F}$ is the same as the problem of fitting the data $(\tilde{u}_i, y_i)$ with a function $\tilde{f} \in \mathcal{F}$.
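The identity $\tilde{f}(\tilde{u}_i) = f(u_i)$ is easy to verify numerically; the sketch below (illustrative, not from the paper) uses an arbitrary max-affine f and a random transformation, assuming the randomly drawn T is nonsingular (true with probability one).

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 2, 5, 3

A = rng.standard_normal((k, n))
b = rng.standard_normal(k)
f = lambda u: np.max(A @ u + b)              # an arbitrary max-affine f

T = rng.standard_normal((n, n))              # assumed nonsingular
s = rng.standard_normal(n)
Tinv = np.linalg.inv(T)

f_tilde = lambda u: f(Tinv @ (u - s))        # f~(x~) = f(T^{-1}(x~ - s))

U = rng.standard_normal((m, n))
ok = all(np.isclose(f_tilde(T @ u + s), f(u)) for u in U)
print(ok)
```

Since composing a max-affine function with an affine map yields another max-affine function with the same number of terms, $\mathcal{F}^{\mathrm{ma}}_k$ satisfies the invariance assumption in the text.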
This allows us to normalize the dependent variable data in various ways. For example, we can assume that it has zero (sample) mean and unit (sample) covariance,

$\bar{u} = (1/m)\sum_{i=1}^{m} u_i = 0, \qquad \Sigma_u = (1/m)\sum_{i=1}^{m} u_i u_i^T = I,$   (6)

provided the data $u_i$ are affinely independent. (If they are not, we can reduce the problem to an equivalent one with smaller dimension.)
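One standard way to achieve the normalization (6) is a whitening transform (an illustrative sketch, not from the paper): take $T = \Sigma^{-1/2}$ for the centered sample covariance $\Sigma$ and $s = -T\bar{u}$, computing the inverse square root via an eigendecomposition.

```python
import numpy as np

def normalize(U):
    """Affine map u~ = T u + s giving zero sample mean and identity sample covariance (6)."""
    m = U.shape[0]
    ubar = U.mean(axis=0)
    Sigma = (U - ubar).T @ (U - ubar) / m
    # T = Sigma^{-1/2} via eigendecomposition (Sigma is symmetric positive
    # definite when the data are affinely independent)
    w, V = np.linalg.eigh(Sigma)
    T = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    s = -T @ ubar
    return U @ T.T + s, T, s

rng = np.random.default_rng(2)
U = rng.standard_normal((100, 3)) @ rng.standard_normal((3, 3)) + 5.0

Ut, T, s = normalize(U)
print(np.allclose(Ut.mean(axis=0), 0), np.allclose(Ut.T @ Ut / 100, np.eye(3)))
```

If the eigenvalues w contain (near-)zeros, the data are (nearly) affinely dependent and, as the text notes, the problem should first be reduced to a lower-dimensional one.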
1.4 Outline
In Sect. 2 we describe several applications of convex piecewise-linear fitting. In Sect. 3, we describe a basic heuristic algorithm for (approximately) solving the max-affine fitting problem (1). This basic algorithm has several shortcomings, such as convergence to a poor local minimum, or failure to converge at all. By running this algorithm a modest number of times, from different initial points, however, we obtain a fairly reliable algorithm for least-squares fitting of a max-affine function to given data. Finally, we show how the algorithm can be extended to handle the more general function parametrization (4). In Sect. 4 we present some numerical examples.
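Since the abstract describes the heuristic as a variation on K-means, one plausible sketch of such an alternating partition-and-refit iteration is shown below. This is an illustrative simplification under assumptions not spelled out in this section: each data point is assigned to the affine term achieving the maximum, each term is refit by least squares on its assigned points, and (near-)empty partitions are simply reseeded at random points rather than handled as the paper's Sect. 3 would.

```python
import numpy as np

def fit_max_affine(U, y, k, iters=30, seed=0):
    """Heuristic max-affine fit (illustrative sketch): alternate
    (1) assign each point to the term achieving the max,
    (2) least-squares refit each term on its assigned points,
    keeping the best parameters seen."""
    rng = np.random.default_rng(seed)
    m, n = U.shape
    X = np.hstack([U, np.ones((m, 1))])         # append 1 for the offset b_j
    theta = rng.standard_normal((k, n + 1)) * 0.1
    best, best_J = theta.copy(), np.inf
    for _ in range(iters):
        assign = np.argmax(X @ theta.T, axis=1)
        for j in range(k):
            idx = assign == j
            if idx.sum() <= n:                  # (near-)empty partition: reseed
                idx = rng.choice(m, size=n + 1, replace=False)
            theta[j] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
        Jval = np.sum((np.max(X @ theta.T, axis=1) - y) ** 2)
        if Jval < best_J:
            best, best_J = theta.copy(), Jval
    return best

# fit y = |u| with k = 2 terms; the exact fit is max{u, -u}
U = np.linspace(-1, 1, 50).reshape(-1, 1)
y = np.abs(U[:, 0])
theta = fit_max_affine(U, y, k=2)
X = np.hstack([U, np.ones((50, 1))])
rms = np.sqrt(np.mean((np.max(X @ theta.T, axis=1) - y) ** 2))
print(rms)   # RMS fit of the returned max-affine function
```

As the outline warns, a single run can land in a poor local minimum, which is why restarting from several initial points (different `seed` values here) is used to make the method reliable.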

1.5 Previous work
Piecewise-linear functions arise in many areas and contexts. Some general forms for
representing piecewise-linear functions can be found in, e.g., Kang and Chua, Kahlert
and Chua (1978, 1990). Several methods have been proposed for fitting general
piecewise-linear functions to (multidimensional) data. A neural network algorithm is
used in Gothoskar et al. (2002); a Gauss-Newton method is used in Julian et al., Horst
and Beichel (1998, 1997)tofindpiecewise-linearapproximationsofsmoothfunc-
tions. A recent reference on methods for least-squares with semismooth functions is
Kanzow and Petra (2004). An iterative procedure, similar in spirit to our method,
is described in Ferrari-Trecate and Muselli (2002). Software for fitting general
piecewise-linear functions to data include, e.g., Torrisi and Bemporad (2004), Storace
and De Feo (2002).
The special case n = 1, i.e., fitting a function on $\mathbf{R}$ by a piecewise-linear function, has been extensively studied. For example, a method for finding the minimum number of segments to achieve a given maximum error is described in Dunham (1986); the same problem can be approached using dynamic programming (Goodrich 1994; Bellman and Roth 1969; Hakimi and Schmeichel 1991; Wang et al. 1993), or a genetic algorithm (Pittman and Murthy 2000). The problem of simplifying a given piecewise-linear function on $\mathbf{R}$, to one with fewer segments, is considered in Imai and Iri (1986).
Another related problem that has received much attention is the problem of fitting a piecewise-linear curve, or polygon, in $\mathbf{R}^2$ to given data; see, e.g., Aggarwal et al. (1985), Mitchell and Suri (1992). An iterative procedure, closely related to the K-means algorithm and therefore similar in spirit to our method, is described in Phillips and Rosenfeld (1988), Yin (1998).
Piecewise-linear functions and approximations have been used in many applications, such as detection of patterns in images (Rives et al. 1985), contour tracing (Dobkin et al. 1990), extraction of straight lines in aerial images (Venkateswar and Chellappa 1992), global optimization (Mangasarian et al. 2005), compression of chemical process data (Bakshi and Stephanopoulos 1996), and circuit modeling (Julian et al. 1998; Chua and Deng 1986; Vandenberghe et al. 1989).
We are aware of only two papers which consider the problem of fitting a piecewise-linear convex function to given data. Mangasarian et al. (2005) describe a heuristic method for fitting a piecewise-linear convex function of the form $a + b^T x + \|Ax + c\|_1$ to given data (along with the constraint that the function underestimate the data). The focus of their paper is on finding piecewise-linear convex underestimators for known (nonconvex) functions, for use in global optimization; our focus, in contrast, is on simply fitting some given data. The closest related work that we know of is Kim et al. (2004). In this paper, Kim et al. describe a method for fitting a (convex) max-affine function to given data, increasing the number of terms to get a better fit. (In fact they describe a method for fitting a max-monomial function to circuit models; see Sect. 2.3.)
