
Convex piecewise-linear fitting

Alessandro Magnani, Stephen P. Boyd
01 Mar 2009 · Vol. 10, Iss. 1, pp. 1-17
Abstract
We consider the problem of fitting a convex piecewise-linear function, with some specified form, to given multi-dimensional data. Except for a few special cases, this problem is hard to solve exactly, so we focus on heuristic methods that find locally optimal fits. The method we describe, which is a variation on the K-means algorithm for clustering, seems to work well in practice, at least on data that can be fit well by a convex function. We focus on the simplest function form, a maximum of a fixed number of affine functions, and then show how the methods extend to a more general form.


Optim Eng (2009) 10: 1–17
DOI 10.1007/s11081-008-9045-3
Convex piecewise-linear fitting
Alessandro Magnani · Stephen P. Boyd
Received: 14 April 2006 / Accepted: 4 March 2008 / Published online: 25 March 2008
© Springer Science+Business Media, LLC 2008
Keywords Convex optimization · Piecewise-linear approximation · Data fitting
1 Convex piecewise-linear fitting problem
We consider the problem of fitting some given data
$$(u_1, y_1), \ldots, (u_m, y_m) \in \mathbf{R}^n \times \mathbf{R}$$
with a convex piecewise-linear function $f : \mathbf{R}^n \to \mathbf{R}$ from some set $\mathcal{F}$ of candidate functions. With a least-squares fitting criterion, we obtain the problem
$$\begin{array}{ll} \text{minimize} & J(f) = \sum_{i=1}^{m} (f(u_i) - y_i)^2 \\ \text{subject to} & f \in \mathcal{F}, \end{array} \tag{1}$$
A. Magnani · S.P. Boyd (✉)
Electrical Engineering Department, Stanford University, Stanford, CA 94305, USA
e-mail: boyd@stanford.edu
A. Magnani
e-mail: alem@stanford.edu

with variable $f$. We refer to $(J(f)/m)^{1/2}$ as the RMS (root-mean-square) fit of the function $f$ to the data. The convex piecewise-linear fitting problem (1) is to find the function $f$, from the given family $\mathcal{F}$ of convex piecewise-linear functions, that gives the best (smallest) RMS fit to the given data.
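As a concrete reading of the objective $J(f)$ and the RMS fit, here is a minimal NumPy sketch. The helper name `rms_fit` and the toy data are illustrative assumptions, not part of the paper.

```python
import numpy as np

def rms_fit(f, U, y):
    """Return (J(f)/m)**0.5, where J(f) = sum_i (f(u_i) - y_i)**2.

    U has one data point u_i per row; y holds the targets y_i.
    """
    residuals = np.array([f(u) for u in U]) - y
    J = float(np.sum(residuals ** 2))
    return (J / len(y)) ** 0.5

# Toy data sampled from |u| (an illustrative assumption).
U = np.array([[-1.0], [0.0], [1.0], [2.0]])
y = np.array([1.0, 0.0, 1.0, 2.0])

f_abs = lambda u: abs(u[0])   # convex piecewise-linear, matches the data exactly
f_zero = lambda u: 0.0        # affine (constant) candidate

rms_abs = rms_fit(f_abs, U, y)
rms_zero = rms_fit(f_zero, U, y)
```

Here `rms_abs` is zero because $|u|$ passes through every data point, while the constant candidate leaves a nonzero residual.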
Our main interest is in the case when $n$ (the dimension of the data) is relatively small, say not more than 5 or so, while $m$ (the number of data points) can be relatively large, e.g., $10^4$ or more. The methods we describe, however, work for any values of $n$ and $m$.
Several special cases of the convex piecewise-linear fitting problem (1) can be solved exactly. When $\mathcal{F}$ consists of the affine functions, i.e., $f$ has the form $f(x) = a^T x + b$, the problem (1) reduces to an ordinary linear least-squares problem in the function parameters $a \in \mathbf{R}^n$ and $b \in \mathbf{R}$ and so is readily solved. As a less trivial example, consider the case when $\mathcal{F}$ consists of all convex piecewise-linear functions from $\mathbf{R}^n$ into $\mathbf{R}$, with no other constraint on the form of $f$. This is the nonparametric convex piecewise-linear fitting problem. Then the problem (1) can be solved, exactly, via a quadratic program (QP); see (Boyd and Vandenberghe 2004, Sect. 6.5.5). This nonparametric approach, however, has two potential practical disadvantages. First, the QP that must be solved is very large (containing more than $mn$ variables), limiting the method to modest values of $m$ (say, a thousand). The second potential disadvantage is that the piecewise-linear function fit obtained can be very complex, with many terms (up to $m$).
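The affine special case can be illustrated with a short NumPy sketch: appending a column of ones to the data turns the fit of $f(x) = a^T x + b$ into an ordinary least-squares problem. The function name `fit_affine` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def fit_affine(U, y):
    """Least-squares fit of f(x) = a^T x + b to rows U[i] and targets y[i]."""
    m = U.shape[0]
    X = np.hstack([U, np.ones((m, 1))])            # columns: u, then 1 for the offset b
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)   # minimizes ||X coef - y||^2
    return coef[:-1], coef[-1]

# Synthetic data that is exactly affine (illustrative assumption).
rng = np.random.default_rng(0)
U = rng.standard_normal((200, 3))
a_true = np.array([1.0, -2.0, 0.5])
b_true = 0.3
y = U @ a_true + b_true

a_hat, b_hat = fit_affine(U, y)
```

Because the synthetic targets are exactly affine in `U`, the recovered `a_hat` and `b_hat` agree with `a_true` and `b_true` up to rounding.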
Of course, not all data can be fit well (i.e., with small RMS fit) with a convex piecewise-linear function. For example, if the data are samples from a function that has strong negative (concave) curvature, then no convex function can fit it well. Moreover, the best fit (which will be poor) will be obtained with an affine function. We can also have the opposite situation: it can occur that the data can be perfectly fit by a function in $\mathcal{F}$, i.e., we can have $J = 0$. In this case we say that the data are interpolated by the convex piecewise-linear function $f$.
1.1 Max-affine functions
In this paper we consider the parametric fitting problem, in which the candidate functions are parametrized by a finite-dimensional vector of coefficients $\alpha \in \mathbf{R}^p$, where $p$ is the number of parameters needed to describe the candidate functions. One very simple form is given by $\mathcal{F}^k_{\mathrm{ma}}$, the set of functions on $\mathbf{R}^n$ with the form
$$f(x) = \max\{a_1^T x + b_1, \ldots, a_k^T x + b_k\}, \tag{2}$$
i.e., a maximum of $k$ affine functions. We refer to a function of this form as 'max-affine', with $k$ terms. The set $\mathcal{F}^k_{\mathrm{ma}}$ is parametrized by the coefficient vector
$$\alpha = (a_1, \ldots, a_k, b_1, \ldots, b_k) \in \mathbf{R}^{k(n+1)}.$$
In fact, any convex piecewise-linear function on $\mathbf{R}^n$ can be expressed as a max-affine function, for some $k$, so this form is in a sense universal. Our interest, however, is in the case when the number of terms $k$ is relatively small, say no more than 10, or a few 10s. In this case the max-affine representation (2) is compact, in the sense that the number of parameters needed to describe $f$ (i.e., $p$) is much smaller than the number of parameters in the original data set (i.e., $m(n + 1)$). The methods we describe, however, do not require $k$ to be small.
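A max-affine function of the form (2) is straightforward to evaluate in vectorized form. The sketch below assumes the coefficients are stored as a $k \times n$ array `A` (rows $a_j^T$) and a length-$k$ vector `b`; this storage layout and the name `max_affine` are illustrative choices.

```python
import numpy as np

def max_affine(A, b, X):
    """Evaluate f(x) = max_j (a_j^T x + b_j) at each row x of X.

    A: (k, n) array of slopes, b: (k,) offsets, X: (N, n) evaluation points.
    """
    return np.max(X @ A.T + b, axis=1)

# k = 2 terms in dimension n = 1 give f(x) = max(-x, x) = |x|.
A = np.array([[-1.0], [1.0]])
b = np.zeros(2)
X = np.array([[-2.0], [0.5], [3.0]])

vals = max_affine(A, b, X)
p = A.size + b.size   # number of parameters, k(n + 1)
```

For this example `vals` equals the absolute values of the inputs, and `p` is $k(n+1) = 4$.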
When $\mathcal{F} = \mathcal{F}^k_{\mathrm{ma}}$, the fitting problem (1) reduces to the nonlinear least-squares problem
$$\text{minimize} \quad J(\alpha) = \sum_{i=1}^{m} \Bigl( \max_{j=1,\ldots,k} (a_j^T u_i + b_j) - y_i \Bigr)^2, \tag{3}$$
with variables $a_1, \ldots, a_k \in \mathbf{R}^n$, $b_1, \ldots, b_k \in \mathbf{R}$. The function $J$ is a piecewise-quadratic function of $\alpha$. Indeed, for each $i$, $f(u_i) - y_i$ is piecewise-linear, and $J$ is the sum of squares of these functions, so $J$ is convex quadratic on the (polyhedral) regions on which $f(u_i)$ is affine. But $J$ is not globally convex, so the fitting problem (3) is not convex.
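The lack of global convexity can be checked numerically on a tiny instance. In the sketch below (an illustrative construction, not taken from the paper), two coefficient vectors that differ only by swapping the two terms both fit samples of $|u|$ exactly, yet their midpoint gives the zero function and a strictly larger objective, which violates the convexity inequality.

```python
import numpy as np

def J(A, b, U, y):
    """Objective (3): sum of squared residuals of the max-affine fit."""
    r = np.max(U @ A.T + b, axis=1) - y
    return float(r @ r)

# Samples of |u| on [-1, 1] (illustrative data).
U = np.linspace(-1.0, 1.0, 5).reshape(-1, 1)
y = np.abs(U[:, 0])

b0 = np.zeros(2)
A1 = np.array([[-1.0], [1.0]])   # f(x) = max(-x, x)
A2 = A1[::-1].copy()             # same function, terms swapped
A_mid = 0.5 * (A1 + A2)          # midpoint: both slopes are zero

J1, J2, J_mid = J(A1, b0, U, y), J(A2, b0, U, y), J(A_mid, b0, U, y)
```

Both endpoints give `J1 = J2 = 0`, while the midpoint gives `J_mid = 2.5`, so `J_mid > (J1 + J2) / 2`.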
1.2 A more general parametrization
We will also consider a more general parametrized form for convex piecewise-linear functions,
$$f(x) = \psi(\phi(x, \alpha)), \tag{4}$$
where $\psi : \mathbf{R}^q \to \mathbf{R}$ is a (fixed) convex piecewise-linear function, and $\phi : \mathbf{R}^n \times \mathbf{R}^p \to \mathbf{R}^q$ is a (fixed) bi-affine function. (This means that for each $x$, $\phi(x, \alpha)$ is an affine function of $\alpha$, and for each $\alpha$, $\phi(x, \alpha)$ is an affine function of $x$.) The simple max-affine parametrization (2) has this form, with $q = k$, $\psi(z_1, \ldots, z_k) = \max\{z_1, \ldots, z_k\}$, and $\phi_i(x, \alpha) = a_i^T x + b_i$.
As an example, consider the set of functions $\mathcal{F}$ that are sums of $k$ terms, each of which is the maximum of two affine functions,
$$f(x) = \sum_{i=1}^{k} \max\{a_i^T x + b_i,\; c_i^T x + d_i\}, \tag{5}$$
parametrized by $a_1, \ldots, a_k, c_1, \ldots, c_k \in \mathbf{R}^n$ and $b_1, \ldots, b_k, d_1, \ldots, d_k \in \mathbf{R}$. This family corresponds to the general form (4) with
$$\psi(z_1, \ldots, z_k, w_1, \ldots, w_k) = \sum_{i=1}^{k} \max\{z_i, w_i\},$$
and
$$\phi(x, \alpha) = (a_1^T x + b_1, \ldots, a_k^T x + b_k,\; c_1^T x + d_1, \ldots, c_k^T x + d_k).$$
Of course we can expand any function with the more general form (4) into its max-affine representation. But the resulting max-affine representation can be very much larger than the original general form representation. For example, the function form (5) requires $p = 2k(n + 1)$ parameters. If the same function is written out as a max-affine function, it requires $2^k$ terms, and therefore $2^k(n + 1)$ parameters. The hope is that a well chosen general form can give us a more compact fit to the given data than a max-affine form with the same number of parameters.
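The blow-up from $2k(n+1)$ to $2^k(n+1)$ parameters can be seen by expanding a sum-of-max function explicitly: each max-affine term corresponds to one way of choosing, for every $i$, either the $(a_i, b_i)$ branch or the $(c_i, d_i)$ branch. The following sketch (function names and random test data are assumptions for illustration) performs this expansion and compares the two representations at a point.

```python
import itertools
import numpy as np

def sum_of_max(A, b, C, d, x):
    """Evaluate form (5): sum_i max(a_i^T x + b_i, c_i^T x + d_i)."""
    return float(np.sum(np.maximum(A @ x + b, C @ x + d)))

def expand_to_max_affine(A, b, C, d):
    """Return (G, h) with 2**k rows so that max_j (G[j] @ x + h[j]) equals form (5)."""
    k = A.shape[0]
    rows, offsets = [], []
    for choice in itertools.product([0, 1], repeat=k):
        pick_c = np.array(choice, dtype=bool)   # True: use (c_i, d_i); False: use (a_i, b_i)
        rows.append(np.where(pick_c[:, None], C, A).sum(axis=0))
        offsets.append(np.where(pick_c, d, b).sum())
    return np.array(rows), np.array(offsets)

rng = np.random.default_rng(1)
k, n = 3, 2
A, C = rng.standard_normal((k, n)), rng.standard_normal((k, n))
b, d = rng.standard_normal(k), rng.standard_normal(k)

G, h = expand_to_max_affine(A, b, C, d)
x = rng.standard_normal(n)
compact_value = sum_of_max(A, b, C, d, x)
expanded_value = float(np.max(G @ x + h))
```

With $k = 3$ and $n = 2$, the compact form uses 18 parameters while the expanded form has 8 terms and 24 parameters; the gap grows exponentially in $k$.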
As another interesting example of the general form (4), consider the case in which $f$ is given as the optimal value of a linear program (LP) with the right-hand side of the constraints depending bi-affinely on $x$ and the parameters:
$$f(x) = \min\{c^T v \mid Av \le b + Bx\}.$$
Here $c$ and $A$ are fixed; $b$ and $B$ are considered the parameters that define $f$. This function can be put in the general form (4) using
$$\psi(z) = \min\{c^T v \mid Av \le z\}, \qquad \phi(x, b, B) = b + Bx.$$
The function $\psi$ is convex and piecewise-linear (see, e.g., Boyd and Vandenberghe 2004); the function $\phi$ is evidently bi-affine in $x$ and $(b, B)$.
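As a small worked instance, the max-affine form (2) can itself be written as such an LP value function. The derivation below assumes the constraint is read as a componentwise inequality $Av \le b + Bx$, and writes the LP parameters as $\hat b$, $\hat B$ (notation introduced here) to avoid clashing with the offsets $b_j$ of (2).

```latex
% Assumed notation: \hat b, \hat B are the LP parameters (b, B in the text);
% a_j, b_j are the max-affine coefficients of (2).
\begin{align*}
f(x) &= \max_{j=1,\ldots,k} (a_j^T x + b_j)
      = \min\{\, t \mid a_j^T x + b_j \le t,\ j = 1,\ldots,k \,\} \\
     &= \min\{\, c^T v \mid A v \le \hat b + \hat B x \,\},
\end{align*}
\[
v = t \in \mathbf{R}, \quad c = 1, \quad
A = -\mathbf{1} \in \mathbf{R}^{k \times 1}, \quad
\hat b = -(b_1, \ldots, b_k), \quad
\hat B = -\begin{bmatrix} a_1^T \\ \vdots \\ a_k^T \end{bmatrix}.
\]
```

Row $j$ of the constraint reads $-t \le -b_j - a_j^T x$, which is the same as $t \ge a_j^T x + b_j$, so the optimal $t$ is the largest of the $k$ affine terms.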
1.3 Independent variable transformation and normalization
We can apply a nonsingular affine transformation to the independent variable $u$, by forming
$$\tilde u_i = T u_i + s, \quad i = 1, \ldots, m,$$
where $T \in \mathbf{R}^{n \times n}$ is nonsingular and $s \in \mathbf{R}^n$. Defining $\tilde f(\tilde x) = f(T^{-1}(\tilde x - s))$, we have $\tilde f(\tilde u_i) = f(u_i)$. If $f$ is piecewise-linear and convex, then so is $\tilde f$ (and of course, vice versa). Provided $\mathcal{F}$ is invariant under composition with affine functions, the problem of fitting the data $(u_i, y_i)$ with a function $f \in \mathcal{F}$ is the same as the problem of fitting the data $(\tilde u_i, y_i)$ with a function $\tilde f \in \mathcal{F}$.
This allows us to normalize the independent variable data in various ways. For example, we can assume that it has zero (sample) mean and unit (sample) covariance,
$$\bar u = (1/m) \sum_{i=1}^{m} u_i = 0, \qquad \Sigma_u = (1/m) \sum_{i=1}^{m} u_i u_i^T = I, \tag{6}$$
provided the data $u_i$ are affinely independent. (If they are not, we can reduce the problem to an equivalent one with smaller dimension.)
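One way to obtain a transformation $\tilde u = Tu + s$ that achieves the normalization (6) is to whiten the data with a Cholesky factor of the sample covariance. This is a sketch under that assumption (the paper does not prescribe how $T$ and $s$ are computed); the function name `normalize_inputs` and the test data are illustrative. The last lines show how max-affine coefficients fitted in the normalized coordinates map back to the original ones.

```python
import numpy as np

def normalize_inputs(U):
    """Return (T, s) such that u~ = T u + s has zero mean and identity covariance."""
    u_bar = U.mean(axis=0)
    centered = U - u_bar
    Sigma = centered.T @ centered / U.shape[0]   # sample covariance
    L = np.linalg.cholesky(Sigma)                # Sigma = L L^T, needs Sigma > 0
    T = np.linalg.inv(L)                         # T Sigma T^T = I
    s = -T @ u_bar
    return T, s

rng = np.random.default_rng(2)
mix = np.array([[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.5, 3.0]])
U = rng.standard_normal((500, 3)) @ mix + np.array([1.0, -2.0, 0.5])

T, s = normalize_inputs(U)
U_t = U @ T.T + s                                # rows are u~_i = T u_i + s

# Coefficients (a~_j, b~_j) in normalized coordinates map back via
# a_j = T^T a~_j and b_j = b~_j + a~_j^T s.
A_t, b_t = rng.standard_normal((4, 3)), rng.standard_normal(4)
A, b = A_t @ T, b_t + A_t @ s
f_orig = np.max(U @ A.T + b, axis=1)
f_norm = np.max(U_t @ A_t.T + b_t, axis=1)
```

The two evaluations `f_orig` and `f_norm` agree, which reflects the identity $\tilde f(\tilde u_i) = f(u_i)$.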
1.4 Outline
In Sect. 2 we describe several applications of convex piecewise-linear fitting. In Sect. 3, we describe a basic heuristic algorithm for (approximately) solving the max-affine fitting problem (3). This basic algorithm has several shortcomings, such as convergence to a poor local minimum, or failure to converge at all. By running this algorithm a modest number of times, from different initial points, however, we obtain a fairly reliable algorithm for least-squares fitting of a max-affine function to given data. Finally, we show how the algorithm can be extended to handle the more general function parametrization (4). In Sect. 4 we present some numerical examples.
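Section 3 is not reproduced here, so the following is only a rough sketch of one K-means-style alternation that is consistent with the description in the abstract: partition the data by which affine term is active, refit each term by least squares on its partition, and repeat. It is an assumption about how such a heuristic might look, not a transcription of the authors' algorithm; the name `fit_max_affine`, the caller-supplied initial partition `assign`, and the iteration cap `iters` are hypothetical choices.

```python
import numpy as np

def fit_max_affine(U, y, assign, k, iters=50):
    """Alternate between per-partition least squares and reassignment.

    assign[i] in {0, ..., k-1} is the initial partition index of point i.
    Returns slopes A (k, n), offsets b (k,), and the RMS fit.
    """
    m, n = U.shape
    X = np.hstack([U, np.ones((m, 1))])
    coef = np.zeros((k, n + 1))
    for _ in range(iters):
        for j in range(k):
            idx = assign == j
            if idx.any():   # an empty partition keeps its previous coefficients
                coef[j], *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        new_assign = np.argmax(X @ coef.T, axis=1)   # index of the active term
        if np.array_equal(new_assign, assign):
            break
        assign = new_assign
    rms = float(np.sqrt(np.mean((np.max(X @ coef.T, axis=1) - y) ** 2)))
    return coef[:, :n], coef[:, n], rms

# Illustrative data: samples of |u|, with a deliberately misplaced initial split.
U = np.linspace(-1.0, 1.0, 21).reshape(-1, 1)
y = np.abs(U[:, 0])
assign0 = (U[:, 0] > 0.35).astype(int)

A_fit, b_fit, rms = fit_max_affine(U, y, assign0, k=2)
```

On this easy example the split point migrates to zero within a few passes and the fit becomes exact. In general such an alternation carries no convergence guarantee, which matches the shortcomings noted above and motivates running it from several initial partitions.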

1.5 Previous work
Piecewise-linear functions arise in many areas and contexts. Some general forms for representing piecewise-linear functions can be found in, e.g., Kang and Chua (1978), Kahlert and Chua (1990). Several methods have been proposed for fitting general piecewise-linear functions to (multidimensional) data. A neural network algorithm is used in Gothoskar et al. (2002); a Gauss-Newton method is used in Julian et al. (1998), Horst and Beichel (1997) to find piecewise-linear approximations of smooth functions. A recent reference on methods for least-squares with semismooth functions is Kanzow and Petra (2004). An iterative procedure, similar in spirit to our method, is described in Ferrari-Trecate and Muselli (2002). Software for fitting general piecewise-linear functions to data includes, e.g., Torrisi and Bemporad (2004), Storace and De Feo (2002).
The special case $n = 1$, i.e., fitting a function on $\mathbf{R}$, by a piecewise-linear function has been extensively studied. For example, a method for finding the minimum number of segments to achieve a given maximum error is described in Dunham (1986); the same problem can be approached using dynamic programming (Goodrich 1994; Bellman and Roth 1969; Hakimi and Schmeichel 1991; Wang et al. 1993), or a genetic algorithm (Pittman and Murthy 2000). The problem of simplifying a given piecewise-linear function on $\mathbf{R}$, to one with fewer segments, is considered in Imai and Iri (1986).
Another related problem that has received much attention is the problem of fitting a piecewise-linear curve, or polygon, in $\mathbf{R}^2$ to given data; see, e.g., Aggarwal et al. (1985), Mitchell and Suri (1992). An iterative procedure, closely related to the K-means algorithm and therefore similar in spirit to our method, is described in Phillips and Rosenfeld (1988), Yin (1998).
Piecewise-linear functions and approximations have been used in many applications, such as detection of patterns in images (Rives et al. 1985), contour tracing (Dobkin et al. 1990), extraction of straight lines in aerial images (Venkateswar and Chellappa 1992), global optimization (Mangasarian et al. 2005), compression of chemical process data (Bakshi and Stephanopoulos 1996), and circuit modeling (Julian et al. 1998; Chua and Deng 1986; Vandenberghe et al. 1989).
We are aware of only two papers which consider the problem of fitting a piecewise-linear convex function to given data. Mangasarian et al. (2005) describe a heuristic method for fitting a piecewise-linear convex function of the form $a + b^T x + \|Ax + c\|_1$ to given data (along with the constraint that the function underestimate the data). The focus of their paper is on finding piecewise-linear convex underestimators for known (nonconvex) functions, for use in global optimization; our focus, in contrast, is on simply fitting some given data. The closest related work that we know of is Kim et al. (2004). In this paper, Kim et al. describe a method for fitting a (convex) max-affine function to given data, increasing the number of terms to get a better fit. (In fact they describe a method for fitting a max-monomial function to circuit models; see Sect. 2.3.)

References

Convex Optimization (Book)
TL;DR: In this article, the focus is on recognizing convex optimization problems and then finding the most appropriate technique for solving them, and a comprehensive introduction to the subject is given. But the focus of this book is not on the optimization problem itself, but on the problem of finding the appropriate technique to solve it.

Vector Quantization and Signal Compression (Book)
TL;DR: The author explains the design and implementation of the Levinson-Durbin Algorithm, which automates the very labor-intensive and therefore time-heavy and expensive process of designing and implementing a Quantizer.

A tutorial on geometric programming (Journal Article)
TL;DR: This tutorial paper collects together in one place the basic background material needed to do GP modeling, and shows how to recognize functions and problems compatible with GP, and how to approximate functions or data in a form compatible with GP.

Model predictive control based on linear programming - the explicit solution (Journal Article)
TL;DR: The availability of the explicit structure of the MPC controller provides an insight into the type of control action in different regions of the state space, and highlights possible conditions of degeneracies of the LP, such as multiple optima.

HYSDEL - a tool for generating computational hybrid models for analysis and synthesis problems (Journal Article)
TL;DR: Hybrid systems description language (HYSDEL) as discussed by the authors is a high-level modeling language for discrete hybrid automata (DHA) and a set of tools for translating DHA into hybrid models.