Author

Leonard K. M. Poon

Bio: Leonard K. M. Poon is an academic researcher from the University of Hong Kong. He has contributed to research on the topics of latent variables and cluster analysis, has an h-index of 12, and has co-authored 36 publications receiving 353 citations. His previous affiliations include the Hong Kong University of Science and Technology and the Hong Kong Institute of Education.

Papers
Journal ArticleDOI
TL;DR: In this article, a hierarchical topic detection method is proposed in which topics are obtained by clustering documents in multiple ways: each latent variable gives a soft partition of the documents, and the document clusters in these partitions are interpreted as topics.

41 citations
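The TL;DR above describes topics as soft partitions of documents given by latent variables. A minimal sketch of that reading, assuming a TF-IDF representation and a plain scikit-learn Gaussian mixture as a stand-in for the paper's latent tree model (all data and sizes here are illustrative):

```python
# Toy sketch: a soft partition of documents, with each cluster characterized
# by its top-weighted words (read as a "topic"). Not the paper's algorithm.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.mixture import GaussianMixture

docs = [
    "neural networks learn representations",
    "deep learning trains neural models",
    "stocks and bonds moved the market",
    "the market rallied on trade news",
]

vec = TfidfVectorizer()
X = vec.fit_transform(docs).toarray()

gmm = GaussianMixture(n_components=2, covariance_type="diag", random_state=0)
gmm.fit(X)
soft = gmm.predict_proba(X)   # soft partition: P(cluster | document)

terms = np.array(vec.get_feature_names_out())
for k in range(2):
    profile = soft[:, k] @ X  # word usage weighted by membership in cluster k
    print(f"topic {k}:", terms[np.argsort(profile)[::-1][:3]])
```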

Posted Content
TL;DR: The latent tree variational autoencoder (LTVAE), as discussed by the authors, is a variant of the variational autoencoder (VAE) in which a superstructure of discrete latent variables sits on top of the latent features.
Abstract: We investigate a variant of variational autoencoders where there is a superstructure of discrete latent variables on top of the latent features. In general, our superstructure is a tree structure of multiple super latent variables and it is automatically learned from data. When there is only one latent variable in the superstructure, our model reduces to one that assumes the latent features to be generated from a Gaussian mixture model. We call our model the latent tree variational autoencoder (LTVAE). Whereas previous deep learning methods for clustering produce only one partition of data, LTVAE produces multiple partitions of data, each being given by one super latent variable. This is desirable because high dimensional data usually have many different natural facets and can be meaningfully partitioned in multiple ways.

37 citations
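A toy sketch of the generative structure the abstract describes, in the single-super-latent-variable case where LTVAE reduces to a Gaussian mixture over the latent features. The dimensions and the linear "decoder" below are illustrative assumptions, not the paper's network:

```python
# Generative story: discrete component y -> Gaussian latent features z -> data x.
import numpy as np

rng = np.random.default_rng(0)
K, D_Z, D_X = 3, 2, 5                    # mixture components, latent dim, data dim

pi = np.full(K, 1.0 / K)                 # p(y): uniform over components
mu = rng.normal(size=(K, D_Z))           # per-component means of the latent Gaussian
W = rng.normal(size=(D_Z, D_X))          # linear stand-in for the decoder network

def sample(n):
    y = rng.choice(K, size=n, p=pi)              # pick a cluster per sample
    z = mu[y] + rng.normal(size=(n, D_Z))        # latent features ~ N(mu_y, I)
    x = z @ W + 0.1 * rng.normal(size=(n, D_X))  # decode with observation noise
    return y, z, x

y, z, x = sample(4)
print(y, x.shape)
```

Each super latent variable y yields one partition of the data, which is how LTVAE produces multiple partitions.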

Journal ArticleDOI
TL;DR: This paper proposes a generalization of Gaussian mixture models, demonstrates its ability to automatically identify natural facets of data and to cluster the data along each of those facets simultaneously, and shows that facet determination usually leads to better clustering results than variable selection.

37 citations

Proceedings Article
21 Jun 2010
TL;DR: A generalization of the Gaussian mixture model is proposed, its ability to cluster data along multiple facets is shown, and it is demonstrated that it is often more reasonable to facilitate variable selection than to perform it.
Abstract: Variable selection for cluster analysis is a difficult problem. The difficulty originates not only from the lack of class information but also from the fact that high-dimensional data are often multifaceted and can be meaningfully clustered in multiple ways. In such a case the effort to find one subset of attributes that presumably gives the "best" clustering may be misguided. It makes more sense to facilitate variable selection by domain experts, that is, to systematically identify various facets of a data set (each being based on a subset of attributes), cluster the data along each one, and present the results to the domain experts for appraisal and selection. In this paper, we propose a generalization of the Gaussian mixture model, show its ability to cluster data along multiple facets, and demonstrate it is often more reasonable to facilitate variable selection than to perform it.

33 citations
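A hedged illustration of the multifaceted-clustering idea, using plain scikit-learn Gaussian mixtures rather than the paper's model: two disjoint attribute subsets act as facets, and clustering along each facet recovers a different, equally natural partition (the data and facet assignments are synthetic assumptions):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
n = 200
a = rng.integers(0, 2, n)   # grouping visible in facet A (columns 0-1)
b = rng.integers(0, 2, n)   # independent grouping visible in facet B (columns 2-3)
X = np.column_stack([
    a[:, None] * 4 + rng.normal(size=(n, 2)),
    b[:, None] * 4 + rng.normal(size=(n, 2)),
])

facets = {"A": [0, 1], "B": [2, 3]}
for name, cols in facets.items():
    labels = GaussianMixture(n_components=2, random_state=0).fit_predict(X[:, cols])
    # agreement up to label switching
    agree_a = max(np.mean(labels == a), np.mean(labels != a))
    agree_b = max(np.mean(labels == b), np.mean(labels != b))
    print(f"facet {name}: agreement with grouping A={agree_a:.2f}, B={agree_b:.2f}")
```

Each facet's mixture recovers its own grouping and is near chance on the other, which is the sense in which searching for one "best" clustering over all attributes can be misguided.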

Journal ArticleDOI
TL;DR: This paper proposes an algorithm called BI that can deal with data sets with hundreds of attributes; BI is compared empirically with EAST and other more efficient LTM learning algorithms, and its clustering results compare favorably with those of alternative methods that are not based on LTMs.
Abstract: Real-world data are often multifaceted and can be meaningfully clustered in more than one way. There is a growing interest in obtaining multiple partitions of data. In previous work we learnt from data a latent tree model (LTM) that contains multiple latent variables (Chen et al. 2012). Each latent variable represents a soft partition of data and hence multiple partitions result. The LTM approach can, through model selection, automatically determine how many partitions there should be, what attributes define each partition, and how many clusters there should be for each partition. It has been shown to yield rich and meaningful clustering results. Our previous algorithm EAST for learning LTMs is only efficient enough to handle data sets with dozens of attributes. This paper proposes an algorithm called BI that can deal with data sets with hundreds of attributes. We empirically compare BI with EAST and other more efficient LTM learning algorithms, and show that BI outperforms its competitors on data sets with hundreds of attributes. In terms of clustering results, BI compares favorably with alternative methods that are not based on LTMs.

31 citations
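The abstract notes that the LTM approach determines the number of clusters per partition through model selection. As a rough analogue (not the BI algorithm itself, which learns full latent tree models), the sketch below selects the number of clusters for a single partition by minimizing BIC over plain Gaussian mixtures:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc=c, size=(100, 3)) for c in (-4, 0, 4)])  # 3 true clusters

bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
        for k in range(1, 7)}
best = min(bics, key=bics.get)               # smallest BIC wins
print("selected number of clusters:", best)  # typically 3 for this data
```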


Cited by

Journal ArticleDOI
TL;DR: A general framework is proposed for designing VAEs suitable for fitting incomplete heterogeneous data, which includes likelihood models for real-valued, positive real-valued, interval, categorical, ordinal, and count data, and allows accurate estimation of missing data.

177 citations
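One ingredient such a framework needs is a likelihood evaluated only over observed entries, so missing values require no prior imputation. A minimal sketch under that assumption; the Gaussian likelihood and shapes are chosen purely for illustration:

```python
import numpy as np

def masked_gaussian_loglik(x, x_hat, observed, sigma=1.0):
    """Sum of Gaussian log-densities over observed entries only."""
    ll = -0.5 * (((x - x_hat) / sigma) ** 2 + np.log(2 * np.pi * sigma**2))
    return np.sum(ll * observed)              # mask zeroes out missing entries

x = np.array([[1.0, np.nan, 3.0]])
observed = ~np.isnan(x)
x_filled = np.nan_to_num(x)                   # placeholder values, masked out below
x_hat = np.array([[0.9, 0.0, 2.8]])           # e.g. a decoder's reconstruction
print(masked_gaussian_loglik(x_filled, x_hat, observed))
```

Handling heterogeneous types would swap in the appropriate log-density per column (Bernoulli, categorical, count, and so on), as the TL;DR lists.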

Book
01 Jul 2019
TL;DR: In this paper, the authors frame cluster analysis and classification in terms of statistical models, thus yielding principled estimation, testing, and prediction methods, and sound answers to the central questions: How many clusters are there? Which method should I use? How should I handle outliers?
Abstract: Cluster analysis finds groups in data automatically. Most methods have been heuristic and leave open such central questions as: how many clusters are there? Which method should I use? How should I handle outliers? Classification assigns new observations to groups given previously classified observations, and also has open questions about parameter tuning, robustness and uncertainty assessment. This book frames cluster analysis and classification in terms of statistical models, thus yielding principled estimation, testing and prediction methods, and sound answers to the central questions. It builds the basic ideas in an accessible but rigorous way, with extensive data examples and R code; describes modern approaches to high-dimensional data and networks; and explains such recent advances as Bayesian regularization, non-Gaussian model-based clustering, cluster merging, variable selection, semi-supervised and robust classification, clustering of functional data, text and images, and co-clustering. Written for advanced undergraduates in data science, as well as researchers and practitioners, it assumes basic knowledge of multivariate calculus, linear algebra, probability and statistics.

134 citations
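A small Python analogue of the book's model-based view of classification (the book itself works in R): each class is fit with its own Gaussian density, and a new observation is assigned, with explicit probabilities, to the class under which it is most likely. The toy data are assumptions for illustration:

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(-2, 1, size=(50, 2)), rng.normal(2, 1, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = QuadraticDiscriminantAnalysis().fit(X, y)  # one Gaussian per class
print(clf.predict([[1.5, 2.0]]))                 # assign a new observation
print(clf.predict_proba([[0.0, 0.0]]))           # with uncertainty, not just a label
```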

Journal ArticleDOI
TL;DR: Insight is provided into the progression of fear sentiment over time as COVID-19 approached peak levels in the United States, using descriptive textual analytics supported by textual data visualizations, and a methodological overview of two essential machine learning classification methods is provided.
Abstract: Along with the Coronavirus pandemic, another crisis has manifested itself in the form of mass fear and panic phenomena, fueled by incomplete and often inaccurate information. There is therefore a tremendous need to address and better understand COVID-19’s informational crisis and gauge public sentiment, so that appropriate messaging and policy decisions can be implemented. In this research article, we identify public sentiment associated with the pandemic using Coronavirus specific Tweets and R statistical software, along with its sentiment analysis packages. We demonstrate insights into the progress of fear-sentiment over time as COVID-19 approached peak levels in the United States, using descriptive textual analytics supported by necessary textual data visualizations. Furthermore, we provide a methodological overview of two essential machine learning (ML) classification methods, in the context of textual analytics, and compare their effectiveness in classifying Coronavirus Tweets of varying lengths. We observe a strong classification accuracy of 91% for short Tweets, with the Naive Bayes method. We also observe that the logistic regression classification method provides a reasonable accuracy of 74% with shorter Tweets, and both methods showed relatively weaker performance for longer Tweets. This research provides insights into Coronavirus fear sentiment progression, and outlines associated methods, implications, limitations and opportunities.

118 citations
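A toy sketch of the two classifiers the article compares; the article's Coronavirus Tweet corpus and its R-based pipeline are not reproduced here, so the tweets and labels below are made-up stand-ins:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

tweets = ["so scared of this virus", "stay safe everyone", "panic everywhere",
          "feeling hopeful today", "terrified to go outside", "we will get through this"]
labels = ["fear", "calm", "fear", "calm", "fear", "calm"]

for model in (MultinomialNB(), LogisticRegression(max_iter=1000)):
    clf = make_pipeline(CountVectorizer(), model)   # bag-of-words + classifier
    clf.fit(tweets, labels)
    print(type(model).__name__, clf.predict(["I am so scared"]))
```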