Home
/
Authors
/
Haesun Park

Author

Haesun Park

Other affiliations: University of Zagreb, Cornell University, Korea Institute for Advanced Study ...read more

Bio: Haesun Park is an academic researcher from Georgia Institute of Technology. The author has contributed to research in topics: Non-negative matrix factorization & Cluster analysis. The author has an hindex of 53, co-authored 235 publications receiving 12188 citations. Previous affiliations of Haesun Park include University of Zagreb & Cornell University.

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2003
2002
2001
2000
1999
1997
1996
1995
1994
1993
1992
1991
1990
1989
1988
1986

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

Orthogonal nonnegative matrix t-factorizations for clustering

[...]

Chris Ding¹, Tao Li², Wei Peng², Haesun Park³•Institutions (3)

Lawrence Berkeley National Laboratory¹, Florida International University², Georgia Institute of Technology³

20 Aug 2006

TL;DR: This work provides a new approach of evaluating the quality of clustering on words using class aggregate distribution and multi-peak distribution and provides new rules for updating $F,S, G$ and proves the convergence of these algorithms.

...read moreread less

Abstract: Currently, most research on nonnegative matrix factorization (NMF)focus on 2-factor $X=FG^T$ factorization. We provide a systematicanalysis of 3-factor $X=FSG^T$ NMF. While it unconstrained 3-factor NMF is equivalent to it unconstrained 2-factor NMF, itconstrained 3-factor NMF brings new features to it constrained 2-factor NMF. We study the orthogonality constraint because it leadsto rigorous clustering interpretation. We provide new rules for updating $F,S, G$ and prove the convergenceof these algorithms. Experiments on 5 datasets and a real world casestudy are performed to show the capability of bi-orthogonal 3-factorNMF on simultaneously clustering rows and columns of the input datamatrix. We provide a new approach of evaluating the quality ofclustering on words using class aggregate distribution andmulti-peak distribution. We also provide an overview of various NMF extensions andexamine their relationships.

...read moreread less

1,211 citations

Journal Article•DOI•

Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis

[...]

Hyun-Chul Kim¹, Haesun Park¹•Institutions (1)

Georgia Institute of Technology¹

01 Jun 2007-Bioinformatics

TL;DR: The experimental results illustrate that the proposed sparse NMF algorithm often achieves better clustering performance with shorter computing time compared to other existing NMF algorithms.

...read moreread less

Abstract: Motivation: Many practical pattern recognition problems require non-negativity constraints. For example, pixels in digital images and chemical concentrations in bioinformatics are non-negative. Sparse non-negative matrix factorizations (NMFs) are useful when the degree of sparseness in the non-negative basis matrix or the non-negative coefficient matrix in an NMF needs to be controlled in approximating high-dimensional data in a lower dimensional space. Results: In this article, we introduce a novel formulation of sparse NMF and show how the new formulation leads to a convergent sparse NMF algorithm via alternating non-negativity-constrained least squares. We apply our sparse NMF algorithm to cancer-class discovery and gene expression data analysis and offer biological analysis of the results obtained. Our experimental results illustrate that the proposed sparse NMF algorithm often achieves better clustering performance with shorter computing time compared to other existing NMF algorithms. Availability: The software is available as supplementary material. Contact:hskim@cc.gatech.edu, hpark@acc.gatech.edu Supplementary information: Supplementary data are available at Bioinformatics online.

...read moreread less

813 citations

Sparse Non-negative Matrix Factorizations via Alternating Non-negativity-constrained Least Squares

[...]

Hyun-Chul Kim¹, Haesun Park¹•Institutions (1)

Georgia Institute of Technology¹

01 Jan 2006

TL;DR: In this paper, the authors introduced sparse NMFs via alternating non-negativity-constrained least squares (NNCS) for cancer class discovery and gene expression data analysis.

...read moreread less

Abstract: Many practical pattern recognition problems require non-negativity constraints. For example, pixels in digital images and chemical concentrations in bioinformatics are non-negative. Non-negative matrix factorization (NMF) is a useful technique in approximating these high dimensional data. Sparse NMFs are also useful when we need to control the degree of sparseness in non-negative basis vectors or non-negative lower-dimensional representations. In this paper, we introduce novel sparse NMFs via alternating non-negativity-constrained least squares. We applied one of the proposed sparse NMFs to cancer class discovery and gene expression data analysis. Our experimental results illustrate that our proposed method achieves better clustering performance than NMF based on multiplicative update rules and sparse NMFs based on the gradient descent method.

...read moreread less

662 citations

Journal Article•DOI•

Nonnegative Matrix Factorization Based on Alternating Nonnegativity Constrained Least Squares and Active Set Method

[...]

Hyun-Chul Kim¹, Haesun Park•Institutions (1)

Georgia Institute of Technology¹

01 May 2008-SIAM Journal on Matrix Analysis and Applications

TL;DR: This paper introduces an algorithm for NMF based on alternating nonnegativity constrained least squares (NMF/ANLS) and the active set-based fast algorithm for nonNegativity constrained most squares with multiple right-hand side vectors, and discusses its convergence properties and a rigorous convergence criterion based on the Karush-Kuhn-Tucker (KKT) conditions.

...read moreread less

Abstract: Nonnegative matrix factorization (NMF) determines a lower rank approximation of a matrix $A \in \mathbb{R}^{m \times n} \approx WH$ where an integer $k \ll \min(m,n)$ is given and nonnegativity is imposed on all components of the factors $W \in \mathbb{R}^{m \times k}$ and $H \in \mathbb{R}^{k \times n}$. NMF has attracted much attention for over a decade and has been successfully applied to numerous data analysis problems. In applications where the components of the data are necessarily nonnegative, such as chemical concentrations in experimental results or pixels in digital images, NMF provides a more relevant interpretation of the results since it gives nonsubtractive combinations of nonnegative basis vectors. In this paper, we introduce an algorithm for NMF based on alternating nonnegativity constrained least squares (NMF/ANLS) and the active set-based fast algorithm for nonnegativity constrained least squares with multiple right-hand side vectors, and we discuss its convergence properties and a rigorous convergence criterion based on the Karush-Kuhn-Tucker (KKT) conditions. In addition, we also describe algorithms for sparse NMFs and regularized NMF. We show how we impose a sparsity constraint on one of the factors by $L_1$-norm minimization and discuss its convergence properties. Our algorithms are compared to other commonly used NMF algorithms in the literature on several test data sets in terms of their convergence behavior.

...read moreread less

612 citations

Journal Article•DOI•

Missing value estimation for DNA microarray gene expression data: local least squares imputation

[...]

Hyunsoo Kim¹, Gene H. Golub², Haesun Park¹•Institutions (2)

University of Minnesota¹, Stanford University²

15 Jan 2005-Bioinformatics

TL;DR: Imputation methods based on the least squares formulation are proposed to estimate missing values in the gene expression data, which exploit local similarity structures in the data as well as least squares optimization process.

...read moreread less

Abstract: Motivation: Gene expression data often contain missing expression values. Effective missing value estimation methods are needed since many algorithms for gene expression data analysis require a complete matrix of gene array values. In this paper, imputation methods based on the least squares formulation are proposed to estimate missing values in the gene expression data, which exploit local similarity structures in the data as well as least squares optimization process. Results: The proposed local least squares imputation method (LLSimpute) represents a target gene that has missing values as a linear combination of similar genes. The similar genes are chosen by k-nearest neighbors or k coherent genes that have large absolute values of Pearson correlation coefficients. Non-parametric missing values estimation method of LLSimpute are designed by introducing an automatic k-value estimator. In our experiments, the proposed LLSimpute method shows competitive results when compared with other imputation methods for missing value estimation on various datasets and percentages of missing values in the data. Availability: The software is available at http://www.cs.umn.edu/~hskim/tools.html Contact: hpark@cs.umn.edu

...read moreread less

493 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48

Collapse

Cited by

PDF

Open Access

More filters

Pattern Recognition and Machine Learning

[...]

Christopher M. Bishop¹•Institutions (1)

Microsoft¹

01 Jan 2006

TL;DR: Probability distributions of linear models for regression and classification are given in this article, along with a discussion of combining models and combining models in the context of machine learning and classification.

...read moreread less

Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

...read moreread less

10,141 citations

“Bioinformatics” 특집을 내면서

[...]

장병탁, 김삼묘, 허철구

01 Aug 2000

TL;DR: Assessment of medical technology in the context of commercialization with Bioentrepreneur course, which addresses many issues unique to biomedical products.

...read moreread less

Abstract: BIOE 402. Medical Technology Assessment. 2 or 3 hours. Bioentrepreneur course. Assessment of medical technology in the context of commercialization. Objectives, competition, market share, funding, pricing, manufacturing, growth, and intellectual property; many issues unique to biomedical products. Course Information: 2 undergraduate hours. 3 graduate hours. Prerequisite(s): Junior standing or above and consent of the instructor.

...read moreread less

4,833 citations

Book•

Support Vector Machines

[...]

Ingo Steinwart, Andreas Christmann

12 Aug 2008

TL;DR: This book explains the principles that make support vector machines (SVMs) a successful modelling and prediction tool for a variety of applications and provides a unique in-depth treatment of both fundamental and recent material on SVMs that so far has been scattered in the literature.

...read moreread less

Abstract: This book explains the principles that make support vector machines (SVMs) a successful modelling and prediction tool for a variety of applications. The authors present the basic ideas of SVMs together with the latest developments and current research questions in a unified style. They identify three reasons for the success of SVMs: their ability to learn well with only a very small number of free parameters, their robustness against several types of model violations and outliers, and their computational efficiency compared to several other methods. Since their appearance in the early nineties, support vector machines and related kernel-based methods have been successfully applied in diverse fields of application such as bioinformatics, fraud detection, construction of insurance tariffs, direct marketing, and data and text mining. As a consequence, SVMs now play an important role in statistical machine learning and are used not only by statisticians, mathematicians, and computer scientists, but also by engineers and data analysts. The book provides a unique in-depth treatment of both fundamental and recent material on SVMs that so far has been scattered in the literature. The book can thus serve as both a basis for graduate courses and an introduction for statisticians, mathematicians, and computer scientists. It further provides a valuable reference for researchers working in the field. The book covers all important topics concerning support vector machines such as: loss functions and their role in the learning process; reproducing kernel Hilbert spaces and their properties; a thorough statistical analysis that uses both traditional uniform bounds and more advanced localized techniques based on Rademacher averages and Talagrand's inequality; a detailed treatment of classification and regression; a detailed robustness analysis; and a description of some of the most recent implementation techniques. To make the book self-contained, an extensive appendix is added which provides the reader with the necessary background from statistics, probability theory, functional analysis, convex analysis, and topology.

...read moreread less

4,664 citations

Social Network Analysis

[...]

Tom A. B. Snijders

01 Jan 2012

3,692 citations

Journal Article•DOI•

Graph Embedding and Extensions: A General Framework for Dimensionality Reduction

[...]

Yan, Xu, Zhang, Yang, Lin - Show less +1 more

01 Jan 2007-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A new supervised dimensionality reduction algorithm called marginal Fisher analysis is proposed in which the intrinsic graph characterizes the intraclass compactness and connects each data point with its neighboring points of the same class, while the penalty graph connects the marginal points and characterizing the interclass separability.

...read moreread less

Abstract: A large family of algorithms - supervised or unsupervised; stemming from statistics or geometry theory - has been designed to provide different solutions to the problem of dimensionality reduction. Despite the different motivations of these algorithms, we present in this paper a general formulation known as graph embedding to unify them within a common framework. In graph embedding, each algorithm can be considered as the direct graph embedding or its linear/kernel/tensor extension of a specific intrinsic graph that describes certain desired statistical or geometric properties of a data set, with constraints from scale normalization or a penalty graph that characterizes a statistical or geometric property that should be avoided. Furthermore, the graph embedding framework can be used as a general platform for developing new dimensionality reduction algorithms. By utilizing this framework as a tool, we propose a new supervised dimensionality reduction algorithm called marginal Fisher analysis in which the intrinsic graph characterizes the intraclass compactness and connects each data point with its neighboring points of the same class, while the penalty graph connects the marginal points and characterizes the interclass separability. We show that MFA effectively overcomes the limitations of the traditional linear discriminant analysis algorithm due to data distribution assumptions and available projection directions. Real face recognition experiments show the superiority of our proposed MFA in comparison to LDA, also for corresponding kernel and tensor extensions

...read moreread less

2,751 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse