Convex Analysisの二,三の進展について

Home
/
Papers
/
Convex Analysisの二,三の進展について

Convex Analysisの二,三の進展について

徹丸山

01 Feb 1977-Vol. 70, Iss: 1, pp 97-119

About: The article was published on 1977-02-01 and is currently open access. It has received 5933 citations till now.

...read moreread less

Content maybe subject to copyright Report

Citations

PDF

Open Access

More filters

Book•

Deep Learning

[...]

Ian Goodfellow¹, Yoshua Bengio², Aaron Courville²•Institutions (2)

Google¹, Université de Montréal²

18 Nov 2016

TL;DR: Deep learning as mentioned in this paper is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts, and it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.

...read moreread less

Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

...read moreread less

38,208 citations

Journal Article•DOI•

Increasing Returns and Long-Run Growth

[...]

Paul M. Romer

01 Oct 1986-Journal of Political Economy

TL;DR: In this paper, the authors present a fully specified model of long-run growth in which knowledge is assumed to be an input in production that has increasing marginal productivity, which is essentially a competitive equilibrium model with endogenous technological change.

...read moreread less

Abstract: This paper presents a fully specified model of long-run growth in which knowledge is assumed to be an input in production that has increasing marginal productivity. It is essentially a competitive equilibrium model with endogenous technological change. In contrast to models based on diminishing returns, growth rates can be increasing over time, the effects of small disturbances can be amplified by the actions of private agents, and large countries may always grow faster than small countries. Long-run evidence is offered in support of the empirical relevance of these possibilities.

...read moreread less

18,200 citations

Book•

Distributed Optimization and Statistical Learning Via the Alternating Direction Method of Multipliers

[...]

Stephen Boyd¹, Neal Parikh¹, Eric Chu¹, Borja Peleato¹, Jonathan Eckstein² - Show less +1 more•Institutions (2)

Stanford University¹, Rutgers University²

23 May 2011

TL;DR: It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.

...read moreread less

Abstract: Many problems of recent interest in statistics and machine learning can be posed in the framework of convex optimization. Due to the explosion in size and complexity of modern datasets, it is increasingly important to be able to solve problems with a very large number of features or training examples. As a result, both the decentralized collection or storage of these datasets as well as accompanying distributed solution methods are either necessary or at least highly desirable. In this review, we argue that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas. The method was developed in the 1970s, with roots in the 1950s, and is equivalent or closely related to many other algorithms, such as dual decomposition, the method of multipliers, Douglas–Rachford splitting, Spingarn's method of partial inverses, Dykstra's alternating projections, Bregman iterative algorithms for l1 problems, proximal methods, and others. After briefly surveying the theory and history of the algorithm, we discuss applications to a wide variety of statistical and machine learning problems of recent interest, including the lasso, sparse logistic regression, basis pursuit, covariance selection, support vector machines, and many others. We also discuss general distributed optimization, extensions to the nonconvex setting, and efficient implementation, including some details on distributed MPI and Hadoop MapReduce implementations.

...read moreread less

17,433 citations

Cites background from "Convex Analysisの二,三の進展について"

...The algorithm solves problems in the form minimize f(x) + g(z) subject to Ax+Bz = c (9) with variables x ∈ Rn and z ∈ Rm, where A ∈ Rp×n, B ∈ Rp×m, and c ∈ Rp....
[...]

Pattern Recognition and Machine Learning

[...]

Christopher M. Bishop¹•Institutions (1)

Microsoft¹

01 Jan 2006

TL;DR: Probability distributions of linear models for regression and classification are given in this article, along with a discussion of combining models and combining models in the context of machine learning and classification.

...read moreread less

Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

...read moreread less

10,141 citations

Journal Article•DOI•

An Algorithm for Vector Quantizer Design

[...]

Y. Linde¹, A. Buzo², Robert M. Gray³•Institutions (3)

Codex Corporation¹, National Autonomous University of Mexico², Stanford University³

01 Jan 1980-IEEE Transactions on Communications

TL;DR: An efficient and intuitive algorithm is presented for the design of vector quantizers based either on a known probabilistic model or on a long training sequence of data.

...read moreread less

Abstract: An efficient and intuitive algorithm is presented for the design of vector quantizers based either on a known probabilistic model or on a long training sequence of data. The basic properties of the algorithm are discussed and demonstrated by examples. Quite general distortion measures and long blocklengths are allowed, as exemplified by the design of parameter vector quantizers of ten-dimensional vectors arising in Linear Predictive Coded (LPC) speech compression with a complicated distortion measure arising in LPC analysis that does not depend only on the error vector.

...read moreread less

7,935 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse

References

PDF

Open Access

More filters

Book Chapter•DOI•

Proximal Splitting Methods in Signal Processing

[...]

Patrick L. Combettes¹, Jean-Christophe Pesquet•Institutions (1)

University of Paris¹

01 Jan 2011

TL;DR: The basic properties of proximity operators which are relevant to signal processing and optimization methods based on these operators are reviewed and proximal splitting methods are shown to capture and extend several well-known algorithms in a unifying framework.

...read moreread less

Abstract: The proximity operator of a convex function is a natural extension of the notion of a projection operator onto a convex set. This tool, which plays a central role in the analysis and the numerical solution of convex optimization problems, has recently been introduced in the arena of inverse problems and, especially, in signal processing, where it has become increasingly important. In this paper, we review the basic properties of proximity operators which are relevant to signal processing and present optimization methods based on these operators. These proximal splitting methods are shown to capture and extend several well-known algorithms in a unifying framework. Applications of proximal methods in signal recovery and synthesis are discussed.

...read moreread less

1,942 citations

Journal Article•DOI•

Regulating a monopolist with unknown costs

[...]

David P. Baron, Roger B. Myerson

01 Jul 1982-Econometrica

TL;DR: In this paper, the authors consider the problem of how to regulate a monopolistic firm whose costs are unknown to the regulator, and derive an optimal regulatory policy for the case in which the regulator does not know the costs of the firm.

...read moreread less

Abstract: We consider the problem of how to regulate a monopolistic firm whose costs are unknown to the regulator. The regulator's objective is to maximize a linear social welfare function of the consumers' surplus and the firm's profit. In the optimal regulatory policy, prices and subsidies are designed as functions of the firm's cost report so that expected social welfare is maximized, subject to the constraints that the firm has nonnegative profit and has no incentive to misrepresent its costs. We explicitly derive the optimal policy and analyze its properties. IN THEIR CLASSIC PAPERS Dupuit [2] and Hotelling [5] considered pricing policies for a bridge that had a fixed cost of construction and zero marginal cost. They demonstrated that the pricing policy that maximizes consumer well-being is to set price equal to marginal cost and to provide a subsidy to the supplier equal to the fixed cost, so that a firm would be willing to provide the bridge. This first-best solution is based on a number of informational assumptions. First, the demand function is assumed to be known to both the regulator and to the firm. While the assumption of complete information may be too strong, the assumption that information about demand is as available to the regulator as it is to the firm does not seem unnatural. A second informational assumption is that the regulator has complete information about the cost of the firm or at least has the same information about cost as does the firm. This assumption is unlikely to be met in reality, since the firm would be expected to have better information about costs than would the regulator. As Weitzman has stated, "An essential feature of the regulatory environment I am trying to describe is uncertainty about the exact specification of each firm's cost function. In most cases even the managers and engineers most closely associated with production will be unable to precisely specify beforehand the cheapest way to generate various hypothetical output levels. Because they are yet removed from the production process, the regulators are likely to be vaguer still about a firm's cost function" [12, p. 684]. As this observation suggests, it is natural to expect that a firm would have better information regarding its costs than would a regulator. The purpose of this paper is to develop an optimal regulatory policy for the case in which the regulator does not know the costs of the firm. One strategy that a regulator could use in the absence of full information about costs is to give the firm the title to the total social surplus and to delegate the pricing decision to the firm. In pursuing its own interests, which would then be to maximize the total social surplus, the firm would adopt the same marginal cost pricing strategy that the regulator would have imposed if the regulator had

...read moreread less

1,791 citations

Proceedings Article•DOI•

Feature selection, L1 vs. L2 regularization, and rotational invariance

[...]

Andrew Y. Ng¹•Institutions (1)

Stanford University¹

04 Jul 2004

TL;DR: A lower-bound is given showing that any rotationally invariant algorithm---including logistic regression with L1 regularization, SVMs, and neural networks trained by backpropagation---has a worst case sample complexity that grows at least linearly in the number of irrelevant features.

...read moreread less

Abstract: We consider supervised learning in the presence of very many irrelevant features, and study two different regularization methods for preventing overfitting. Focusing on logistic regression, we show that using L1 regularization of the parameters, the sample complexity (i.e., the number of training examples required to learn "well,") grows only logarithmically in the number of irrelevant features. This logarithmic rate matches the best known bounds for feature selection, and indicates that L1 regularized logistic regression can be effective even if there are exponentially many irrelevant features as there are training examples. We also give a lower-bound showing that any rotationally invariant algorithm---including logistic regression with L2 regularization, SVMs, and neural networks trained by backpropagation---has a worst case sample complexity that grows at least linearly in the number of irrelevant features.

...read moreread less

1,742 citations

Proceedings Article•DOI•

Clustering with Bregman Divergences

[...]

Arindam Banerjee¹, Srujana Merugu¹, Inderjit S. Dhillon¹, Joydeep Ghosh¹•Institutions (1)

University of Texas at Austin¹

01 Dec 2005

TL;DR: This paper proposes and analyzes parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergences, and shows that there is a bijection between regular exponential families and a largeclass of BRegman diverGences, that is called regular Breg man divergence.

...read moreread less

Abstract: A wide variety of distortion functions, such as squared Euclidean distance, Mahalanobis distance, Itakura-Saito distance and relative entropy, have been used for clustering. In this paper, we propose and analyze parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergences. The proposed algorithms unify centroid-based parametric clustering approaches, such as classical kmeans , the Linde-Buzo-Gray (LBG) algorithm and information-theoretic clustering, which arise by special choices of the Bregman divergence. The algorithms maintain the simplicity and scalability of the classical kmeans algorithm, while generalizing the method to a large class of clustering loss functions. This is achieved by first posing the hard clustering problem in terms of minimizing the loss in Bregman information, a quantity motivated by rate distortion theory, and then deriving an iterative algorithm that monotonically decreases this loss. In addition, we show that there is a bijection between regular exponential families and a large class of Bregman divergences, that we call regular Bregman divergences. This result enables the development of an alternative interpretation of an efficient EM scheme for learning mixtures of exponential family distributions, and leads to a simple soft clustering algorithm for regular Bregman divergences. Finally, we discuss the connection between rate distortion theory and Bregman clustering and present an information theoretic analysis of Bregman clustering algorithms in terms of a trade-off between compression and loss in Bregman information.

...read moreread less

1,723 citations

Journal Article•DOI•

Stability and generalization

[...]

Olivier Bousquet¹, André Elisseeff•Institutions (1)

École Polytechnique¹

01 Mar 2002-Journal of Machine Learning Research

TL;DR: These notions of stability for learning algorithms are defined and it is shown how to use these notions to derive generalization error bounds based on the empirical error and the leave-one-out error.

...read moreread less

Abstract: We define notions of stability for learning algorithms and show how to use these notions to derive generalization error bounds based on the empirical error and the leave-one-out error. The methods we use can be applied in the regression framework as well as in the classification one when the classifier is obtained by thresholding a real-valued function. We study the stability properties of large classes of learning algorithms such as regularization based algorithms. In particular we focus on Hilbert space regularization and Kullback-Leibler regularization. We demonstrate how to apply the results to SVM for regression and classification.

...read moreread less

1,690 citations

1
2
…
3
4
5
6
7
8
9
…
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse