
Showing papers in "Mathematical Programming in 2015"


Journal ArticleDOI
TL;DR: A survey of coordinate descent algorithms and their convergence properties, paying particular attention to a problem structure that arises frequently in machine learning applications and showing that efficient implementations of accelerated coordinate descent algorithms are possible for problems of this type.
Abstract: Coordinate descent algorithms solve optimization problems by successively performing approximate minimization along coordinate directions or coordinate hyperplanes. They have been used in applications for many years, and their popularity continues to grow because of their usefulness in data analysis, machine learning, and other areas of current interest. This paper describes the fundamentals of the coordinate descent approach, together with variants and extensions and their convergence properties, mostly with reference to convex objectives. We pay particular attention to a certain problem structure that arises frequently in machine learning applications, showing that efficient implementations of accelerated coordinate descent algorithms are possible for problems of this type. We also present some parallel variants and discuss their convergence properties under several models of parallel execution.

1,198 citations
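
A minimal sketch of the coordinate descent template surveyed above, applied to a least-squares objective with exact minimization along each coordinate; the problem instance and function names are illustrative and not taken from the paper.

    import numpy as np

    def coordinate_descent_lstsq(A, b, num_epochs=50):
        """Minimize 0.5 * ||A x - b||^2 by exact minimization along one coordinate at a time."""
        m, n = A.shape
        x = np.zeros(n)
        r = A @ x - b                      # residual, kept up to date for cheap coordinate steps
        col_sq = (A ** 2).sum(axis=0)      # precomputed squared column norms ||A[:, i]||^2
        for _ in range(num_epochs):
            for i in range(n):             # cyclic sweep; randomized orders are a common variant
                if col_sq[i] == 0.0:
                    continue
                step = (A[:, i] @ r) / col_sq[i]
                x[i] -= step               # exact minimizer of the objective along coordinate i
                r -= step * A[:, i]        # O(m) residual update instead of recomputing A @ x - b
        return x

    # usage sketch on synthetic data
    A = np.random.randn(100, 20)
    b = np.random.randn(100)
    x_cd = coordinate_descent_lstsq(A, b)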


Journal ArticleDOI
TL;DR: In this article, the phase retrieval problem is cast as a nonconvex quadratic program over a complex phase vector, and a tractable relaxation (called PhaseCut), similar to the classical MaxCut semidefinite program, is formulated.
Abstract: Phase retrieval seeks to recover a signal $$x \in {\mathbb {C}}^p$$ from the amplitude $$|Ax|$$ of linear measurements $$Ax \in {\mathbb {C}}^n$$. We cast the phase retrieval problem as a non-convex quadratic program over a complex phase vector and formulate a tractable relaxation (called PhaseCut) similar to the classical MaxCut semidefinite program. We solve this problem using a provably convergent block coordinate descent algorithm whose structure is similar to that of the original greedy algorithm in Gerchberg and Saxton (Optik 35:237–246, 1972), where each iteration is a matrix-vector product. Numerical results show the performance of this approach over three different phase retrieval problems, in comparison with greedy phase retrieval algorithms and matrix completion formulations.

502 citations
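
The block coordinate descent method of the paper shares its per-iteration cost (essentially matrix-vector products) with the classical Gerchberg–Saxton loop. The sketch below shows that classical loop, not the PhaseCut solver itself, under illustrative assumptions (random Gaussian measurements, pseudo-inverse precomputed).

    import numpy as np

    def gerchberg_saxton(A, b, num_iters=200):
        """Classical alternating scheme for recovering x from amplitude measurements b = |A x|.

        Each iteration is dominated by matrix-vector products with A and its pseudo-inverse,
        mirroring the per-iteration cost quoted in the abstract.
        """
        n = A.shape[1]
        A_pinv = np.linalg.pinv(A)                      # precomputed pseudo-inverse (illustrative choice)
        x = np.random.randn(n) + 1j * np.random.randn(n)
        for _ in range(num_iters):
            z = A @ x
            phase = z / np.maximum(np.abs(z), 1e-12)    # keep the current phases
            x = A_pinv @ (b * phase)                    # re-impose the measured amplitudes
        return x

    # usage sketch with synthetic data
    m, n = 128, 32
    A = np.random.randn(m, n) + 1j * np.random.randn(m, n)
    x_true = np.random.randn(n) + 1j * np.random.randn(n)
    b = np.abs(A @ x_true)
    x_hat = gerchberg_saxton(A, b)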


Journal ArticleDOI
TL;DR: New methods for black-box convex minimization are presented, and numerical experiments demonstrate that the fast convergence rate typical of smooth optimization problems can sometimes be achieved even on nonsmooth problem instances.
Abstract: In this paper, we present new methods for black-box convex minimization. They do not need to know in advance the actual level of smoothness of the objective function. Their only essential input parameter is the required accuracy of the solution. At the same time, for each particular problem class they automatically ensure the best possible rate of convergence. We confirm our theoretical results by encouraging numerical experiments, which demonstrate that the fast rate of convergence, typical for the smooth optimization problems, sometimes can be achieved even on nonsmooth problem instances.

302 citations


Journal ArticleDOI
TL;DR: This paper extends Nesterov’s technique for analyzing the RBCD method for minimizing a smooth convex function over a block-separable closed convex set to obtain a sharper expected-value type of convergence rate than the one implied in Richtárik and Takáč (Math Program 144(1–2):1–38, 2014).
Abstract: In this paper we analyze the randomized block-coordinate descent (RBCD) methods proposed in Nesterov (SIAM J Optim 22(2):341–362, 2012), Richtárik and Takáč (Math Program 144(1–2):1–38, 2014) for minimizing the sum of a smooth convex function and a block-separable convex function, and derive improved bounds on their convergence rates. In particular, we extend Nesterov's technique developed in Nesterov (SIAM J Optim 22(2):341–362, 2012) for analyzing the RBCD method for minimizing a smooth convex function over a block-separable closed convex set to the aforementioned more general problem and obtain a sharper expected-value type of convergence rate than the one implied in Richtárik and Takáč (Math Program 144(1–2):1–38, 2014). As a result, we also obtain a better high-probability type of iteration complexity. In addition, for unconstrained smooth convex minimization, we develop a new technique called randomized estimate sequence to analyze the accelerated RBCD method proposed by Nesterov (SIAM J Optim 22(2):341–362, 2012) and establish a sharper expected-value type of convergence rate than the one given in Nesterov (SIAM J Optim 22(2):341–362, 2012).

252 citations


Journal ArticleDOI
TL;DR: Recent results on trust region methods for unconstrained optimization, constrained optimization, nonlinear equations and nonlinear least squares, nonsmooth optimization and optimization without derivatives are reviewed.
Abstract: Trust region methods are a class of numerical methods for optimization. Unlike line search type methods where a line search is carried out in each iteration, trust region methods compute a trial step by solving a trust region subproblem where a model function is minimized within a trust region. Due to the trust region constraint, nonconvex models can be used in trust region subproblems, and trust region algorithms can be applied to nonconvex and ill-conditioned problems. Normally it is easier to establish the global convergence of a trust region algorithm than that of its line search counterpart. In the paper, we review recent results on trust region methods for unconstrained optimization, constrained optimization, nonlinear equations and nonlinear least squares, nonsmooth optimization and optimization without derivatives. Results on trust region subproblems and regularization methods are also discussed.

249 citations


Journal ArticleDOI
TL;DR: In this article, the robust counterpart of a nonlinear uncertain inequality that is concave in the uncertain parameters is constructed using convex analysis and conic duality, yielding an easy, structured way to build robust counterparts of both linear and nonlinear inequalities.
Abstract: In this paper we provide a systematic way to construct the robust counterpart of a nonlinear uncertain inequality that is concave in the uncertain parameters. We use convex analysis (support functions, conjugate functions, Fenchel duality) and conic duality in order to convert the robust counterpart into an explicit and computationally tractable set of constraints. It turns out that to do so one has to calculate the support function of the uncertainty set and the concave conjugate of the nonlinear constraint function. Conveniently, these two computations are completely independent. This approach has several advantages. First, it provides an easy structured way to construct the robust counterpart both for linear and nonlinear inequalities. Second, it shows that for new classes of uncertainty regions and for new classes of nonlinear optimization problems tractable counterparts can be derived. We also study some cases where the inequality is nonconcave in the uncertain parameters.

167 citations
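
In symbols, the reformulation pattern described in the abstract can be sketched as follows (notation illustrative; the precise regularity conditions are in the paper). The robust constraint $$f(a,x) \le 0$$ for all $$a \in U$$, with $$f$$ concave in the uncertain parameter $$a$$, is replaced by the existence of a vector $$v$$ with

    $$\delta ^*(v \mid U) - f_*(v,x) \le 0,$$

where $$\delta ^*(v \mid U) = \sup _{a \in U} v^{\top } a$$ is the support function of the uncertainty set and $$f_*(v,x) = \inf _{a} \{ v^{\top } a - f(a,x)\}$$ is the partial concave conjugate of the constraint function in $$a$$; the two computations are independent of each other, which is the modularity the abstract highlights.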


Journal ArticleDOI
TL;DR: In this article, the authors consider two convex optimization problems where, given a cone K, a norm, and a smooth convex function f, one wants either to minimize the norm over the intersection of the cone and a level set of f, or to minimize over the cone the sum of f and a multiple of the norm.
Abstract: Motivated by some applications in signal processing and machine learning, we consider two convex optimization problems where, given a cone $$K$$, a norm $$\Vert \cdot \Vert $$ and a smooth convex function $$f$$, we want either (1) to minimize the norm over the intersection of the cone and a level set of $$f$$, or (2) to minimize over the cone the sum of $$f$$ and a multiple of the norm. We focus on the case where (a) the dimension of the problem is too large to allow for interior point algorithms, (b) $$\Vert \cdot \Vert $$ is "too complicated" to allow for computationally cheap Bregman projections required in the first-order proximal gradient algorithms. On the other hand, we assume that it is relatively easy to minimize linear forms over the intersection of $$K$$ and the unit $$\Vert \cdot \Vert $$-ball. Motivating examples are given by the nuclear norm with $$K$$ being the entire space of matrices, or the positive semidefinite cone in the space of symmetric matrices, and the Total Variation norm on the space of 2D images. We discuss versions of the Conditional Gradient algorithm capable of handling our problems of interest, provide the related theoretical efficiency estimates and outline some applications.

167 citations
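
A minimal sketch of the conditional gradient (Frank–Wolfe) template the abstract builds on: minimizing a smooth function over a compact convex set using only gradients and a linear minimization oracle. The instance below (least squares over the unit simplex) is an illustrative stand-in, not one of the norm/cone setups of the paper.

    import numpy as np

    def frank_wolfe(grad_f, lmo, x0, num_iters=200):
        """Conditional gradient: only needs gradients of f and a linear minimization oracle."""
        x = x0.copy()
        for k in range(num_iters):
            g = grad_f(x)
            s = lmo(g)                        # argmin over the feasible set of <g, s>
            gamma = 2.0 / (k + 2.0)           # standard open-loop step size
            x = (1 - gamma) * x + gamma * s   # convex combination keeps the iterate feasible
        return x

    # illustrative instance: minimize ||A x - b||^2 over the unit simplex
    A = np.random.randn(50, 10)
    b = np.random.randn(50)
    grad = lambda x: 2 * A.T @ (A @ x - b)

    def simplex_lmo(g):
        s = np.zeros_like(g)
        s[np.argmin(g)] = 1.0                 # a vertex of the simplex minimizes any linear objective
        return s

    x_star = frank_wolfe(grad, simplex_lmo, np.ones(10) / 10)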


Journal ArticleDOI
TL;DR: This work develops an efficient, data-driven technique for estimating the parameters of equilibrium models from observed equilibria, supporting both parametric and nonparametric estimation by leveraging ideas from statistical learning (kernel methods and regularization operators).
Abstract: Equilibrium modeling is common in a variety of fields such as game theory and transportation science. The inputs for these models, however, are often difficult to estimate, while their outputs, i.e., the equilibria they are meant to describe, are often directly observable. By combining ideas from inverse optimization with the theory of variational inequalities, we develop an efficient, data-driven technique for estimating the parameters of these models from observed equilibria. We use this technique to estimate the utility functions of players in a game from their observed actions and to estimate the congestion function on a road network from traffic count data. A distinguishing feature of our approach is that it supports both parametric and nonparametric estimation by leveraging ideas from statistical learning (kernel methods and regularization operators). In computational experiments involving Nash and Wardrop equilibria in a nonparametric setting, we find that a) we effectively estimate the unknown demand or congestion function, respectively, and b) our proposed regularization technique substantially improves the out-of-sample performance of our estimators.

155 citations


Journal ArticleDOI
TL;DR: The watershed between tractability and intractability in ambiguity-averse uncertainty quantification and chance constrained programming is delineated; using tools from distributionally robust optimization, explicit conic reformulations are derived for tractable problem classes and efficiently computable conservative approximations are suggested for intractable ones.
Abstract: The objective of uncertainty quantification is to certify that a given physical, engineering or economic system satisfies multiple safety conditions with high probability. A more ambitious goal is to actively influence the system so as to guarantee and maintain its safety, a scenario which can be modeled through a chance constrained program. In this paper we assume that the parameters of the system are governed by an ambiguous distribution that is only known to belong to an ambiguity set characterized through generalized moment bounds and structural properties such as symmetry, unimodality or independence patterns. We delineate the watershed between tractability and intractability in ambiguity-averse uncertainty quantification and chance constrained programming. Using tools from distributionally robust optimization, we derive explicit conic reformulations for tractable problem classes and suggest efficiently computable conservative approximations for intractable ones.

145 citations


Journal ArticleDOI
TL;DR: In this paper, a lifting technique is proposed that maps a given stochastic program to an equivalent problem on a higher-dimensional probability space, and it is shown that solving the lifted problem in primal and dual linear decision rules provides tighter bounds than those obtained from applying linear decision rules to the original problem.
Abstract: Stochastic programming provides a versatile framework for decision-making under uncertainty, but the resulting optimization problems can be computationally demanding. It has recently been shown that primal and dual linear decision rule approximations can yield tractable upper and lower bounds on the optimal value of a stochastic program. Unfortunately, linear decision rules often provide crude approximations that result in loose bounds. To address this problem, we propose a lifting technique that maps a given stochastic program to an equivalent problem on a higher-dimensional probability space. We prove that solving the lifted problem in primal and dual linear decision rules provides tighter bounds than those obtained from applying linear decision rules to the original problem. We also show that there is a one-to-one correspondence between linear decision rules in the lifted problem and families of nonlinear decision rules in the original problem. Finally, we identify structured liftings that give rise to highly flexible piecewise linear and nonlinear decision rules, and we assess their performance in the context of a dynamic production planning problem.

139 citations


Journal ArticleDOI
TL;DR: A novel distributed method, based on the augmented Lagrangian framework, is proposed for convex optimization problems with a certain separability structure; it compares favorably to two augmented Lagrangian decomposition methods known in the literature.
Abstract: We propose a novel distributed method for convex optimization problems with a certain separability structure. The method is based on the augmented Lagrangian framework. We analyze its convergence and provide an application to two network models, as well as to a two-stage stochastic optimization problem. The proposed method compares favorably to two augmented Lagrangian decomposition methods known in the literature, as well as to decomposition methods based on the ordinary Lagrangian function.

Journal ArticleDOI
TL;DR: It is shown how to obtain an extended formulation for the permutahedron, with $$\varTheta (n\log n)$$ variables and inequalities, from any sorting network, and that this is best possible (up to a multiplicative constant) since any extended formulation has at least $$\varOmega (n \log n)$$ inequalities.
Abstract: In this note, we consider the permutahedron, the convex hull of all permutations of $$\{1,2,\ldots ,n\}$$. We show how to obtain an extended formulation for this polytope from any sorting network. By using the optimal Ajtai–Komlós–Szemerédi sorting network, this extended formulation has $$\varTheta (n\log n)$$ variables and inequalities. Furthermore, from basic polyhedral arguments, we show that this is best possible (up to a multiplicative constant) since any extended formulation has at least $$\varOmega (n \log n)$$ inequalities. The results easily extend to the generalized permutahedron.

Journal ArticleDOI
TL;DR: This work presents a risk-averse multi-dimensional newsvendor model for a class of products whose demands are strongly correlated and subject to fashion trends that are not fully understood at the time when orders are placed and demonstrates that disregarding ambiguity or multimodality can lead to unstable solutions that perform poorly in stress test experiments.
Abstract: We present a risk-averse multi-dimensional newsvendor model for a class of products whose demands are strongly correlated and subject to fashion trends that are not fully understood at the time when orders are placed. The demand distribution is known to be multimodal in the sense that there are spatially separated clusters of probability mass but otherwise lacks a complete description. We assume that the newsvendor hedges against distributional ambiguity by minimizing the worst-case risk of the order portfolio over all distributions that are compatible with the given modality information. We demonstrate that the resulting distributionally robust optimization problem is $$\mathrm{NP}$$-hard but admits an efficient numerical solution in quadratic decision rules. This approximation is conservative and computationally tractable. Moreover, it achieves a high level of accuracy in numerical tests. We further demonstrate that disregarding ambiguity or multimodality can lead to unstable solutions that perform poorly in stress test experiments.

Journal ArticleDOI
TL;DR: The positive semidefinite rank (psd rank) of a nonnegative matrix M is the smallest integer k for which there exist k-by-k positive semidefinite matrices A_i, B_j such that M_ij = trace(A_i B_j); the psd rank has many appealing geometric interpretations, including semidefinite representations of polyhedra and information-theoretic applications.
Abstract: Let $$M \in \mathbb {R}^{p \times q}$$ be a nonnegative matrix. The positive semidefinite rank (psd rank) of M is the smallest integer k for which there exist positive semidefinite matrices $$A_i, B_j$$ of size $$k \times k$$ such that $$M_{ij} = {{\mathrm{trace}}}(A_i B_j)$$. The psd rank has many appealing geometric interpretations, including semidefinite representations of polyhedra and information-theoretic applications. In this paper we develop and survey the main mathematical properties of psd rank, including its geometry, relationships with other rank notions, and computational and algorithmic aspects.
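
A toy illustration of the definition above (not of how psd rank is computed): building random psd factors and forming the nonnegative matrix they certify via M_ij = trace(A_i B_j). The helper names are illustrative.

    import numpy as np

    def psd_factor_product(A_list, B_list):
        """Given k-by-k psd matrices A_1..A_p and B_1..B_q, form M with M[i, j] = trace(A_i B_j)."""
        return np.array([[np.trace(A @ B) for B in B_list] for A in A_list])

    def random_psd(k, rng):
        G = rng.standard_normal((k, k))
        return G @ G.T                      # Gram matrices are positive semidefinite

    rng = np.random.default_rng(0)
    k, p, q = 3, 4, 5
    A_list = [random_psd(k, rng) for _ in range(p)]
    B_list = [random_psd(k, rng) for _ in range(q)]
    M = psd_factor_product(A_list, B_list)
    assert (M >= 0).all()                   # trace of a product of psd matrices is nonnegative
    # such a factorization certifies that the psd rank of M is at most k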

Journal ArticleDOI
TL;DR: A first order interior point algorithm is proposed for a class of non-Lipschitz and nonconvex minimization problems with box constraints, which arise from applications in variable selection and regularized optimization; the objective function value is reduced monotonically along the iteration points.
Abstract: We propose a first order interior point algorithm for a class of non-Lipschitz and nonconvex minimization problems with box constraints, which arise from applications in variable selection and regularized optimization. The objective functions of these problems are continuously differentiable typically at interior points of the feasible set. Our first order algorithm is easy to implement and the objective function value is reduced monotonically along the iteration points. We show that the worst-case iteration complexity for finding an $$\epsilon $$-scaled first order stationary point is $$O(\epsilon ^{-2})$$. Furthermore, we develop a second order interior point algorithm using the Hessian matrix, and solve a quadratic program with a ball constraint at each iteration. Although the second order interior point algorithm costs more computational time than that of the first order algorithm in each iteration, its worst-case iteration complexity for finding an $$\epsilon $$-scaled second order stationary point is reduced to $$O(\epsilon ^{-3/2})$$. Note that an $$\epsilon $$-scaled second order stationary point must also be an $$\epsilon $$-scaled first order stationary point.

Journal ArticleDOI
TL;DR: The new method performs remarkably well for the nearest low-rank correlation matrix problem in terms of speed and solution quality and is considerably competitive with the widely used SCF iteration for the Kohn–Sham total energy minimization.
Abstract: This paper considers optimization problems on the Stiefel manifold $$X^{\mathsf{T}}X=I_p$$, where $$X\in \mathbb {R}^{n \times p}$$ is the variable and $$I_p$$ is the $$p$$-by-$$p$$ identity matrix. A framework of constraint preserving update schemes is proposed by decomposing each feasible point into the range space of $$X$$ and the null space of $$X^{\mathsf{T}}$$. While this general framework can unify many existing schemes, a new update scheme with low complexity cost is also discovered. Then we study a feasible Barzilai–Borwein-like method under the new update scheme. The global convergence of the method is established with an adaptive nonmonotone line search. The numerical tests on the nearest low-rank correlation matrix problem, the Kohn–Sham total energy minimization and a specific problem from statistics demonstrate the efficiency of the new method. In particular, the new method performs remarkably well for the nearest low-rank correlation matrix problem in terms of speed and solution quality and is considerably competitive with the widely used SCF iteration for the Kohn–Sham total energy minimization.
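
The paper's scheme decomposes updates into the range and null space of the current iterate; as a generic illustration of a constraint-preserving (feasible) update on the Stiefel manifold, the sketch below uses a QR-based retraction. It is not the paper's specific update, and the test problem is an illustrative choice.

    import numpy as np

    def qr_retraction(X, D):
        """Map a tangent-like step D at X back onto the Stiefel manifold {X : X^T X = I}."""
        Q, R = np.linalg.qr(X + D)
        return Q * np.sign(np.diag(R))        # fix column signs so the QR factor is well defined

    def feasible_gradient_step(X, grad, step):
        """One feasible descent step: project the Euclidean gradient to the tangent space, then retract."""
        sym = (X.T @ grad + grad.T @ X) / 2
        riem_grad = grad - X @ sym            # tangent-space projection of the gradient
        return qr_retraction(X, -step * riem_grad)

    # illustrative use: one step on f(X) = -trace(X^T A X), whose minimizer spans a leading eigenspace of A
    n, p = 20, 3
    A = np.random.randn(n, n); A = (A + A.T) / 2
    X = np.linalg.qr(np.random.randn(n, p))[0]
    X_next = feasible_gradient_step(X, -2 * A @ X, step=0.1)
    assert np.allclose(X_next.T @ X_next, np.eye(p), atol=1e-8)   # feasibility is preserved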

Journal ArticleDOI
TL;DR: In this paper, two modified primal-dual splitting algorithms for solving monotone inclusion problems are presented; the algorithms are fully decomposable, in the sense that the operators are processed individually at each iteration.
Abstract: We present two modified versions of the primal-dual splitting algorithm relying on forward–backward splitting proposed in Vũ (Adv Comput Math 38(3):667–681, 2013) for solving monotone inclusion problems. Under strong monotonicity assumptions for some of the operators involved we obtain for the sequences of iterates that approach the solution orders of convergence of $$\mathcal{O}(\frac{1}{n})$$ and $$\mathcal{O}(\omega ^n)$$, for $$\omega \in (0,1)$$, respectively. The investigated primal-dual algorithms are fully decomposable, in the sense that the operators are processed individually at each iteration. We also discuss the modified algorithms in the context of convex optimization problems and present numerical experiments in image processing and pattern recognition in cluster analysis.

Journal ArticleDOI
TL;DR: This work considers a risk-averse multi-stage stochastic program using conditional value at risk as the risk measure, and proposes a new approach based on importance sampling, which yields improved upper bound estimators.
Abstract: We consider a risk-averse multi-stage stochastic program using conditional value at risk as the risk measure. The underlying random process is assumed to be stage-wise independent, and a stochastic dual dynamic programming (SDDP) algorithm is applied. We discuss the poor performance of the standard upper bound estimator in the risk-averse setting and propose a new approach based on importance sampling, which yields improved upper bound estimators. Modest additional computational effort is required to use our new estimators. Our procedures allow for significant improvement in terms of controlling solution quality in SDDP-style algorithms in the risk-averse setting. We give computational results for multi-stage asset allocation using a log-normal distribution for the asset returns.
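
For reference, the conditional value at risk used as the risk measure above is commonly written, for a loss $$Z$$ and level $$\alpha \in (0,1)$$, via the Rockafellar–Uryasev variational formula

    $$\mathrm{CVaR}_{\alpha }(Z) = \min _{t \in \mathbb {R}} \left\{ t + \frac{1}{1-\alpha }\,\mathbb {E}\big [(Z-t)_{+}\big ] \right\} ,$$

so that in a sampled subproblem it can be represented with one extra scalar variable and piecewise linear terms; this is a standard identity, not a formula specific to this paper.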

Journal ArticleDOI
TL;DR: This paper significantly extends some existing results that allow one to work in the original space of variables for two relevant special cases, in which the disjunctions corresponding to the logical implications have two terms; the extensions go in two different directions.
Abstract: In this paper we review the relevant literature on mathematical optimization with logical implications, i.e., where constraints can be either active or disabled depending on logical conditions to hold. In the case of convex functions, the theory of disjunctive programming allows one to formulate these logical implications as convex nonlinear programming problems in a space of variables lifted with respect to its original dimension. We concentrate on the attempt of avoiding the issue of dealing with large NLPs. In particular, we review some existing results that allow to work in the original space of variables for two relevant special cases where the disjunctions corresponding to the logical implications have two terms. Then, we significantly extend these special cases in two different directions, one involving more general convex sets and the other with disjunctions involving three terms. Computational experiments comparing disjunctive programming formulations in the original space of variables with straightforward bigM ones show that the former are computationally viable and promising.

Journal ArticleDOI
TL;DR: A homogeneous interior-point algorithm for solving nonsymmetric convex conic optimization problems is presented, and convergence to ε-accuracy in O(√ν log(1/ε)) iterations is proved.
Abstract: A homogeneous interior-point algorithm for solving nonsymmetric convex conic optimization problems is presented. Starting each iteration from the vicinity of the central path, the method steps in the approximate tangent direction and then applies a correction phase to locate the next well-centered primal–dual point. Features of the algorithm include that it makes use only of the primal barrier function, that it is able to detect infeasibilities in the problem and that no phase-I method is needed. We prove convergence to $$\epsilon $$-accuracy in $${\mathcal {O}}(\sqrt{\nu } \log {(1/\epsilon )})$$ iterations. To improve performance, the algorithm employs a new Runge–Kutta type second order search direction suitable for the general nonsymmetric conic problem. Moreover, quasi-Newton updating is used to reduce the number of factorizations needed, implemented so that data sparsity can still be exploited. Extensive and promising computational results are presented for the $$p$$-cone problem, the facility location problem, entropy maximization problems and geometric programs; all formulated as nonsymmetric convex conic optimization problems.

Journal ArticleDOI
TL;DR: A multi-step acceleration scheme is incorporated into the well-known bundle-level method, and the resulting accelerated bundle-level method is shown to achieve the optimal complexity for solving a general class of black-box CP problems without requiring the input of any smoothness information.
Abstract: The main goal of this paper is to develop uniformly optimal first-order methods for convex programming (CP). By uniform optimality we mean that the first-order methods themselves do not require the input of any problem parameters, but can still achieve the best possible iteration complexity bounds. By incorporating a multi-step acceleration scheme into the well-known bundle-level method, we develop an accelerated bundle-level method, and show that it can achieve the optimal complexity for solving a general class of black-box CP problems without requiring the input of any smoothness information, such as, whether the problem is smooth, nonsmooth or weakly smooth, as well as the specific values of Lipschitz constant and smoothness level. We then develop a more practical, restricted memory version of this method, namely the accelerated prox-level (APL) method. We investigate the generalization of the APL method for solving certain composite CP problems and an important class of saddle-point problems recently studied by Nesterov (Math Program 103:127–152, 2005). We present promising numerical results for these new bundle-level methods applied to solve certain classes of semidefinite programming and stochastic programming problems.

Journal ArticleDOI
TL;DR: This paper presents the first fixed-parameter algorithms for classical scheduling problems such as makespan minimization, scheduling with job-dependent cost functions, and scheduling with rejection, and explores a research direction proposed by Dániel Marx.
Abstract: Fixed-parameter tractability analysis and scheduling are two core domains of combinatorial optimization which led to deep understanding of many important algorithmic questions. However, even though fixed-parameter algorithms are appealing for many reasons, no such algorithms are known for many fundamental scheduling problems. In this paper we present the first fixed-parameter algorithms for classical scheduling problems such as makespan minimization, scheduling with job-dependent cost functions--one important example being weighted flow time--and scheduling with rejection. To this end, we identify crucial parameters that determine the problems' complexity. In particular, we manage to cope with the problem complexity stemming from numeric input values, such as job processing times, which is usually a core bottleneck in the design of fixed-parameter algorithms. We complement our algorithms with $$\mathsf {W[1]}$$-hardness results showing that for smaller sets of parameters the respective problems do not allow fixed-parameter algorithms. In particular, our positive and negative results for scheduling with rejection explore a research direction proposed by Dániel Marx.

Journal ArticleDOI
TL;DR: New fractional error bounds for polynomial systems with exponents explicitly determined by the dimension of the underlying space and the number/degree of the involved polynomials are derived.
Abstract: In this paper we derive new fractional error bounds for polynomial systems with exponents explicitly determined by the dimension of the underlying space and the number/degree of the involved polynomials. Our major result extends the existing error bounds from systems involving only a single polynomial to general polynomial systems and does not require any regularity assumptions. In this way we resolve, in particular, some open questions posed in the literature. The developed techniques are largely based on variational analysis and generalized differentiation, which allow us to establish, e.g., a nonsmooth extension of the seminal Łojasiewicz gradient inequality to maxima of polynomials with explicitly determined exponents. Our major applications concern quantitative Hölderian stability of solution maps for parameterized polynomial optimization problems and nonlinear complementarity systems with polynomial data as well as high-order semismooth properties of the eigenvalues of symmetric tensors.
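
For orientation, one standard form of the classical (smooth) Łojasiewicz gradient inequality referred to above: for a real analytic function $$f$$ and a point $$\bar{x}$$, there exist constants $$C>0$$, $$\theta \in [1/2,1)$$ and a neighborhood of $$\bar{x}$$ on which

    $$|f(x) - f(\bar{x})|^{\theta } \le C\, \Vert \nabla f(x)\Vert .$$

The paper's contribution, as described above, is a nonsmooth analogue for maxima of polynomials with explicitly determined exponents.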

Journal ArticleDOI
TL;DR: It is shown that the convex relaxation has no gap with eTRS for arbitrary m as long as the linear constraints are non-intersecting, so in that case the eTRS optimal value can be computed in polynomial time.
Abstract: This paper studies an extended trust region subproblem (eTRS) in which the trust region intersects the unit ball with $$m$$ linear inequality constraints. When $$m=0$$, $$m=1$$, or $$m=2$$ and the linear constraints are parallel, it is known that the eTRS optimal value equals the optimal value of a particular convex relaxation, which is solvable in polynomial time. However, it is also known that, when $$m \ge 2$$ and at least two of the linear constraints intersect within the ball, i.e., some feasible point of the eTRS satisfies both linear constraints at equality, then the same convex relaxation may admit a gap with eTRS. This paper shows that the convex relaxation has no gap for arbitrary $$m$$ as long as the linear constraints are non-intersecting.
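
In symbols, the extended trust region subproblem discussed above can be written as (a sketch in generic notation; the quadratic objective need not be convex)

    $$\min _{x \in \mathbb {R}^{n}} \; x^{\top } Q x + c^{\top } x \quad \text{s.t.} \quad \Vert x\Vert _2 \le 1, \quad a_i^{\top } x \le b_i, \; i=1,\ldots ,m,$$

and "non-intersecting" means that no feasible point satisfies two of the linear inequalities at equality simultaneously.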

Journal ArticleDOI
TL;DR: A general pattern for algorithms that maximize linear weight functions over “independent sets” is identified and it is proved that such algorithms can be adapted to maximize a submodular function.
Abstract: We study the problem of finding a maximum matching in a graph given by an input stream listing its edges in some arbitrary order, where the quantity to be maximized is given by a monotone submodular function on subsets of edges. This problem, which we call maximum submodular-function matching (MSM), is a natural generalization of maximum weight matching (MWM), which is in turn a generalization of maximum cardinality matching. We give two incomparable algorithms for this problem with space usage falling in the semi-streaming range--they store only $$O(n)$$ edges, using $$O(n\log n)$$ working memory--that achieve approximation ratios of 7.75 in a single pass and $$(3+\varepsilon )$$ in $$O(\varepsilon ^{-3})$$ passes respectively. The operations of these algorithms mimic those of Zelke's and McGregor's respective algorithms for MWM; the novelty lies in the analysis for the MSM setting. In fact we identify a general framework for MWM algorithms that allows this kind of adaptation to the broader setting of MSM. Our framework is not specific to matchings. Rather, we identify a general pattern for algorithms that maximize linear weight functions over "independent sets" and prove that such algorithms can be adapted to maximize a submodular function. The notion of independence here is very general; in particular, appealing to known weight-maximization algorithms, we obtain results for submodular maximization over hypermatchings in hypergraphs as well as independent sets in the intersection of multiple matroids.

Journal ArticleDOI
TL;DR: The well-known symmetric rank-one trust-region method is generalized to the problem of minimizing a real-valued function over a Riemannian manifold and is shown to converge globally and (d+1)-step q-superlinearly to stationary points of the objective function.
Abstract: The well-known symmetric rank-one trust-region method--where the Hessian approximation is generated by the symmetric rank-one update--is generalized to the problem of minimizing a real-valued function over a $$d$$-dimensional Riemannian manifold. The generalization relies on basic differential-geometric concepts, such as tangent spaces, Riemannian metrics, and the Riemannian gradient, as well as on the more recent notions of (first-order) retraction and vector transport. The new method, called RTR-SR1, is shown to converge globally and $$(d+1)$$-step q-superlinearly to stationary points of the objective function. A limited-memory version, referred to as LRTR-SR1, is also introduced. In this context, novel efficient strategies are presented to construct a vector transport on a submanifold of a Euclidean space. Numerical experiments--Rayleigh quotient minimization on the sphere and a joint diagonalization problem on the Stiefel manifold--illustrate the value of the new methods.
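
For reference, the Euclidean symmetric rank-one update that this method generalizes builds the next Hessian approximation $$B_{k+1}$$ from the step $$s_k$$ and gradient difference $$y_k$$ via

    $$B_{k+1} = B_k + \frac{(y_k - B_k s_k)(y_k - B_k s_k)^{\top }}{(y_k - B_k s_k)^{\top } s_k};$$

in the Riemannian setting described above, these ingredients live in tangent spaces and are compared through a vector transport.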

Journal ArticleDOI
TL;DR: The global convergence of the nonlinear stepsize control algorithm is proved under the assumption that the norm of the Hessians can grow by a constant amount at each iteration.
Abstract: A nonlinear stepsize control framework for unconstrained optimization was recently proposed by Toint (Optim Methods Softw 28:82–95, 2013), providing a unified setting in which the global convergence can be proved for trust-region algorithms and regularization schemes. The original analysis assumes that the Hessians of the models are uniformly bounded. In this paper, the global convergence of the nonlinear stepsize control algorithm is proved under the assumption that the norm of the Hessians can grow by a constant amount at each iteration. The worst-case complexity is also investigated. The results obtained for unconstrained smooth optimization are extended to some algorithms for composite nonsmooth optimization and unconstrained multiobjective optimization as well.

Journal ArticleDOI
TL;DR: This note provides a simple proof of a worst-case convergence rate measured by the iteration complexity for the Douglas–Rachford operator splitting method for finding a root of the sum of two maximal monotone set-valued operators.
Abstract: This note provides a simple proof of a worst-case convergence rate measured by the iteration complexity for the Douglas–Rachford operator splitting method for finding a root of the sum of two maximal monotone set-valued operators. The accuracy of an iterate to the solution set is measured by the residual of a characterization of the original problem, which is different from conventional measures such as the distance to the solution set.
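
One standard way to write the Douglas–Rachford iteration for finding $$x$$ with $$0 \in A(x)+B(x)$$, using the resolvents $$J_{\gamma A} = (I+\gamma A)^{-1}$$ and $$J_{\gamma B} = (I+\gamma B)^{-1}$$ with a parameter $$\gamma > 0$$ (generic notation, not necessarily the paper's):

    $$x^{k} = J_{\gamma B}(z^{k}), \qquad y^{k} = J_{\gamma A}(2x^{k} - z^{k}), \qquad z^{k+1} = z^{k} + y^{k} - x^{k},$$

with $$x^{k}$$ approaching a root of $$A+B$$ under standard assumptions.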

Journal ArticleDOI
TL;DR: Novel relaxations for cardinality-constrained learning problems, including least-squares regression as a special but important case, are introduced, and it is shown that randomization based on the relaxed solution offers a principled way to generate provably good feasible solutions.
Abstract: We introduce novel relaxations for cardinality-constrained learning problems, including least-squares regression as a special but important case. Our approach is based on reformulating a cardinality-constrained problem exactly as a Boolean program, to which standard convex relaxations such as the Lasserre and Sherali-Adams hierarchies can be applied. We analyze the first-order relaxation in detail, deriving necessary and sufficient conditions for exactness in a unified manner. In the special case of least-squares regression, we show that these conditions are satisfied with high probability for random ensembles satisfying suitable incoherence conditions, similar to results on $$\ell _1$$-relaxations. In contrast to known methods, our relaxations yield lower bounds on the objective, and it can be verified whether or not the relaxation is exact. If it is not, we show that randomization based on the relaxed solution offers a principled way to generate provably good feasible solutions. This property enables us to obtain high quality estimates even if incoherence conditions are not met, as might be expected in real datasets. We numerically illustrate the performance of the relaxation-randomization strategy in both synthetic and real high-dimensional datasets, revealing substantial improvements relative to $$\ell _1$$-based methods and greedy selection heuristics.
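
One way to see the exact Boolean reformulation mentioned above, sketched for the least-squares case in illustrative notation: introducing binary support indicators $$u$$, the cardinality-constrained problem

    $$\min _{w \in \mathbb {R}^{p}} \Vert y - Xw\Vert _2^2 \quad \text{s.t.} \quad \Vert w\Vert _0 \le k$$

has the same optimal value as

    $$\min _{u \in \{0,1\}^{p},\; \mathbf{1}^{\top } u \le k} \; \min _{w \in \mathbb {R}^{p}} \Vert y - X\,\mathrm{Diag}(u)\, w\Vert _2^2,$$

since the mask $$\mathrm{Diag}(u)$$ restricts the support of the effective coefficients; convex relaxations such as the hierarchies named above can then be applied to the Boolean outer problem.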

Journal ArticleDOI
Samuel Burer
TL;DR: This paper illustrates the fundamental connection that allows the reformulation of nonconvex quadratic problems as convex ones in a unified way, by focusing on examples having just a few variables or a few constraints for which the quadratic problem can be formulated as a copositive-style problem, which itself can be recast in terms of linear, second-order-cone, and semidefinite optimization.
Abstract: This paper illustrates the fundamental connection between nonconvex quadratic optimization and copositive optimization--a connection that allows the reformulation of nonconvex quadratic problems as convex ones in a unified way. We focus on examples having just a few variables or a few constraints for which the quadratic problem can be formulated as a copositive-style problem, which itself can be recast in terms of linear, second-order-cone, and semidefinite optimization. A particular highlight is the role played by the geometry of the feasible set.
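
A classical small instance of this connection, stated here for orientation rather than taken from the paper: the standard quadratic program over the simplex satisfies

    $$\min _{x \ge 0,\; \mathbf{1}^{\top } x = 1} x^{\top } Q x \;=\; \min \big \{ \langle Q, X\rangle \;:\; \langle \mathbf{1}\mathbf{1}^{\top }, X\rangle = 1,\; X \ \text{completely positive} \big \},$$

where the right-hand side is a linear conic problem whose only hard ingredient is the completely positive (dual copositive) cone constraint.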