Author

Hong Cheng

Other affiliations: University of Illinois at Urbana–Champaign, Hong Kong Baptist University, Universidade Nova de Lisboa ...

Bio: Hong Cheng is an academic researcher from The Chinese University of Hong Kong. The author has contributed to research in topics including Large Hadron Collider and Lepton. The author has an h-index of 50 and has co-authored 267 publications receiving 12,119 citations. Previous affiliations of Hong Cheng include University of Illinois at Urbana–Champaign and Hong Kong Baptist University.

Papers published on a yearly basis (chart not reproduced; years shown: 2002 to 2023)

Papers

Journal Article•DOI•

Frequent pattern mining: current status and future directions

Jiawei Han¹, Hong Cheng¹, Dong Xin¹, Xifeng Yan¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

01 Aug 2007-Data Mining and Knowledge Discovery

TL;DR: The authors believe that frequent pattern mining research has substantially broadened the scope of data analysis and will have a deep impact on data mining methodologies and applications in the long run. However, some challenging research issues remain to be solved before frequent pattern mining can claim to be a cornerstone approach in data mining applications.

Abstract: Frequent pattern mining has been a focused theme in data mining research for over a decade. Abundant literature has been dedicated to this research and tremendous progress has been made, ranging from efficient and scalable algorithms for frequent itemset mining in transaction databases to numerous research frontiers, such as sequential pattern mining, structured pattern mining, correlation mining, associative classification, and frequent pattern-based clustering, as well as their broad applications. In this article, we provide a brief overview of the current status of frequent pattern mining and discuss a few promising research directions. We believe that frequent pattern mining research has substantially broadened the scope of data analysis and will have deep impact on data mining methodologies and applications in the long run. However, there are still some challenging research issues that need to be solved before frequent pattern mining can claim a cornerstone approach in data mining applications.
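
To make the notion of a frequent itemset in a transaction database concrete, here is a minimal sketch. It is a brute-force enumeration for illustration only, not one of the efficient and scalable algorithms the abstract refers to. The function name `frequent_itemsets`, the `max_size` cap, and the toy transactions are all assumptions introduced for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Return {itemset: count} for every itemset (up to max_size items)
    that occurs in at least min_support transactions.

    Brute force: enumerates every small subset of every transaction,
    so it is only suitable for tiny illustrative inputs.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # de-duplicate items within a transaction
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {itemset: c for itemset, c in counts.items() if c >= min_support}


# Toy, made-up transaction database (hypothetical data).
transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers"],
]

result = frequent_itemsets(transactions, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With a minimum support of 2, the pair {bread, milk} is frequent because it appears in two transactions, while {bread, beer} appears in only one and is dropped.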

1,448 citations

Journal Article•DOI•

Graph clustering based on structural/attribute similarities

Yang Zhou¹, Hong Cheng¹, Jeffrey Xu Yu¹•Institutions (1)

The Chinese University of Hong Kong¹

01 Aug 2009

TL;DR: This paper proposes a novel graph clustering algorithm, SA-Cluster, based on both structural and attribute similarities through a unified distance measure, which partitions a large graph associated with attributes into k clusters so that each cluster contains a densely connected subgraph with homogeneous attribute values.

Abstract: The goal of graph clustering is to partition vertices in a large graph into different clusters based on various criteria such as vertex connectivity or neighborhood similarity. Graph clustering techniques are very useful for detecting densely connected groups in a large graph. Many existing graph clustering methods mainly focus on the topological structure for clustering, but largely ignore the vertex properties which are often heterogeneous. In this paper, we propose a novel graph clustering algorithm, SA-Cluster, based on both structural and attribute similarities through a unified distance measure. Our method partitions a large graph associated with attributes into k clusters so that each cluster contains a densely connected subgraph with homogeneous attribute values. An effective method is proposed to automatically learn the degree of contributions of structural similarity and attribute similarity. Theoretical analysis is provided to show that SA-Cluster is converging. Extensive experimental results demonstrate the effectiveness of SA-Cluster through comparison with the state-of-the-art graph clustering and summarization methods.

865 citations

Proceedings Article•DOI•

RankClus: integrating clustering with ranking for heterogeneous information network analysis

Yizhou Sun¹, Jiawei Han¹, Peixiang Zhao¹, Zhijun Yin¹, Hong Cheng², Tianyi Wu¹•Institutions (2)

University of Illinois at Urbana–Champaign¹, The Chinese University of Hong Kong²

24 Mar 2009

TL;DR: This paper addresses the problem of generating clusters for a specified type of objects, as well as ranking information for all types of objects based on these clusters in a multi-typed information network, and proposes a novel clustering framework called RankClus that directly generates clusters integrated with ranking.

Abstract: As information networks become ubiquitous, extracting knowledge from information networks has become an important task. Both ranking and clustering can provide overall views on information network data, and each has been a hot topic by itself. However, ranking objects globally without considering which clusters they belong to often leads to dumb results, e.g., ranking database and computer architecture conferences together may not make much sense. Similarly, clustering a huge number of objects (e.g., thousands of authors) in one huge cluster without distinction is dull as well. In this paper, we address the problem of generating clusters for a specified type of objects, as well as ranking information for all types of objects based on these clusters in a multi-typed (i.e., heterogeneous) information network. A novel clustering framework called RankClus is proposed that directly generates clusters integrated with ranking. Based on initial K clusters, ranking is applied separately, which serves as a good measure for each cluster. Then, we use a mixture model to decompose each object into a K-dimensional vector, where each dimension is a component coefficient with respect to a cluster, which is measured by rank distribution. Objects then are reassigned to the nearest cluster under the new measure space to improve clustering. As a result, quality of clustering and ranking are mutually enhanced, which means that the clusters are getting more accurate and the ranking is getting more meaningful. Such a progressive refinement process iterates until little change can be made. Our experiment results show that RankClus can generate more accurate clusters and in a more efficient way than the state-of-the-art link-based clustering methods. Moreover, the clustering results with ranks can provide more informative views of data compared with traditional clustering.

399 citations

Proceedings Article•DOI•

Querying k-truss community in large and dynamic graphs

Xin Huang¹, Hong Cheng¹, Lu Qin², Wentao Tian¹, Jeffrey Xu Yu¹•Institutions (2)

The Chinese University of Hong Kong¹, University of Technology, Sydney²

18 Jun 2014

TL;DR: A novel community model based on the k-truss concept is proposed, which brings nice structural and computational properties, together with a compact and elegant index structure that supports efficient search of k-truss communities at a cost linear in the community size.

Abstract: Community detection which discovers densely connected structures in a network has been studied a lot. In this paper, we study online community search which is practically useful but less studied in the literature. Given a query vertex in a graph, the problem is to find meaningful communities that the vertex belongs to in an online manner. We propose a novel community model based on the k-truss concept, which brings nice structural and computational properties. We design a compact and elegant index structure which supports the efficient search of k-truss communities with a linear cost with respect to the community size. In addition, we investigate the k-truss community search problem in a dynamic graph setting with frequent insertions and deletions of graph vertices and edges. Extensive experiments on large real-world networks demonstrate the effectiveness and efficiency of our community model and search algorithms.
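
For readers unfamiliar with the k-truss concept the abstract builds on, the sketch below uses the commonly cited definition: the k-truss of a graph is the largest subgraph in which every edge is contained in at least k - 2 triangles of that subgraph. This is a naive peeling loop written for illustration, not the paper's index structure or its community search algorithm. The function name `k_truss` and the example graph are assumptions.

```python
def k_truss(edges, k):
    """Return the edges of the k-truss as a set of frozenset({u, v}).

    Repeatedly removes any edge whose endpoints share fewer than k - 2
    common neighbours, until no such edge remains (naive peeling).
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    changed = True
    while changed:
        changed = False
        for u in list(adj):
            for v in list(adj[u]):  # snapshot, since we remove edges below
                # Number of triangles through edge (u, v) = common neighbours.
                support = len(adj[u] & adj[v])
                if support < k - 2:
                    adj[u].discard(v)
                    adj[v].discard(u)
                    changed = True

    return {frozenset((u, v)) for u in adj for v in adj[u]}


# Hypothetical example: a 4-clique on vertices 1..4 plus a pendant edge (4, 5).
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]

truss4 = k_truss(edges, 4)
truss5 = k_truss(edges, 5)
print(len(truss4), len(truss5))
```

In the example, every edge of the 4-clique lies in two triangles, so the clique survives for k = 4, while the pendant edge lies in no triangle and is peeled away. No edge lies in three triangles, so the 5-truss is empty.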

381 citations

Proceedings Article•DOI•

Discriminative Frequent Pattern Analysis for Effective Classification

Hong Cheng¹, Xifeng Yan², Jiawei Han¹, Hsu Chih-Wei¹•Institutions (2)

University of Illinois at Urbana–Champaign¹, IBM²

15 Apr 2007

TL;DR: This paper develops a strategy to set minimum support in frequent pattern mining for generating useful patterns, and demonstrates that the frequent pattern-based classification framework can achieve good scalability and high accuracy in classifying large datasets.

Abstract: The application of frequent patterns in classification appeared in sporadic studies and achieved initial success in the classification of relational data, text documents and graphs. In this paper, we conduct a systematic exploration of frequent pattern-based classification, and provide solid reasons supporting this methodology. It was well known that feature combinations (patterns) could capture more underlying semantics than single features. However, inclusion of infrequent patterns may not significantly improve the accuracy due to their limited predictive power. By building a connection between pattern frequency and discriminative measures such as information gain and Fisher score, we develop a strategy to set minimum support in frequent pattern mining for generating useful patterns. Based on this strategy, coupled with a proposed feature selection algorithm, discriminative frequent patterns can be generated for building high quality classifiers. We demonstrate that the frequent pattern-based classification framework can achieve good scalability and high accuracy in classifying large datasets. Empirical studies indicate that significant improvement in classification accuracy is achieved (up to 12% in UCI datasets) using the so-selected discriminative frequent patterns.
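
The link between pattern frequency and information gain mentioned in the abstract can be illustrated with the textbook definition of information gain for a binary feature (pattern present or absent) and a binary class label. This is a sketch under those assumptions, not the paper's derivation or its minimum-support strategy. The function names and the example counts are hypothetical.

```python
from math import log2


def entropy(pos, neg):
    """Shannon entropy (in bits) of a two-class count distribution."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:  # skip empty classes, treating 0 * log(0) as 0
            p = count / total
            h -= p * log2(p)
    return h


def information_gain(pos_with, neg_with, pos_without, neg_without):
    """Information gain of a binary pattern feature with respect to the class.

    pos_with / neg_with: positive / negative examples containing the pattern.
    pos_without / neg_without: examples that do not contain it.
    """
    n = pos_with + neg_with + pos_without + neg_without
    h_class = entropy(pos_with + pos_without, neg_with + neg_without)
    n_with = pos_with + neg_with
    n_without = n - n_with
    h_cond = 0.0
    if n_with:
        h_cond += n_with / n * entropy(pos_with, neg_with)
    if n_without:
        h_cond += n_without / n * entropy(pos_without, neg_without)
    return h_class - h_cond


# Hypothetical counts on a balanced dataset of 10 examples (5 positive, 5 negative).
perfect = information_gain(5, 0, 0, 5)      # pattern occurs in exactly the positives
rare = information_gain(1, 0, 4, 5)         # pattern occurs in a single positive example
independent = information_gain(2, 2, 3, 3)  # pattern is unrelated to the class

print(perfect, rare, independent)
```

Even though the rare pattern is perfectly pure (it only ever appears in a positive example), it covers so little of the data that its information gain stays far below that of the frequent, perfectly separating pattern. This matches the abstract's point that infrequent patterns have limited predictive power.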

379 citations

Cited by

Data Mining - Concepts and Techniques.

Petra Perner

01 Jan 2002

9,314 citations

Book•

计量经济分析 = Econometric analysis

William H. Greene, 成思张

01 Jan 2009

8,216 citations

Proceedings Article•DOI•

Random graphs

Alan Frieze¹•Institutions (1)

Carnegie Mellon University¹

22 Jan 2006

TL;DR: Some of the major results in random graphs and some of the more challenging open problems are reviewed, including those related to the WWW.

Abstract: We will review some of the major results in random graphs and some of the more challenging open problems. We will cover algorithmic and structural questions. We will touch on newer models, including those related to the WWW.

7,116 citations

Journal Article•

The ATLAS Experiment at the CERN Large Hadron Collider

A.T. Goshaw

30 Oct 2008-Bulletin of the American Physical Society

TL;DR: In this paper, the ATLAS experiment is described as installed in its experimental cavern at point 1 at CERN, and a brief overview of the expected performance of the detector is given.

Abstract: This paper describes the ATLAS experiment as installed in its experimental cavern at point 1 at CERN. It also presents a brief overview of the expected performance of the detector.

2,798 citations

Data Mining: Concepts and Techniques (2nd edition)

Jiawei Han, Micheline Kamber

01 Jan 2006

TL;DR: There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], and Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linoff [BL99].

Abstract: The book Knowledge Discovery in Databases, edited by Piatetsky-Shapiro and Frawley [PSF91], is an early collection of research papers on knowledge discovery from data. The book Advances in Knowledge Discovery and Data Mining, edited by Fayyad, Piatetsky-Shapiro, Smyth, and Uthurusamy [FPSSe96], is a collection of later research results on knowledge discovery and data mining. There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linoff [BL99], Building Data Mining Applications for CRM by Berson, Smith, and Thearling [BST99], Data Mining: Practical Machine Learning Tools and Techniques by Witten and Frank [WF05], Principles of Data Mining (Adaptive Computation and Machine Learning) by Hand, Mannila, and Smyth [HMS01], The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman [HTF01], Data Mining: Introductory and Advanced Topics by Dunham, and Data Mining: Multimedia, Soft Computing, and Bioinformatics by Mitra and Acharya [MA03]. There are also books containing collections of papers on particular aspects of knowledge discovery, such as Machine Learning and Data Mining: Methods and Applications edited by Michalski, Bratko, and Kubat [MBK98], and Relational Data Mining edited by Dzeroski and Lavrac [De01], as well as many tutorial notes on data mining in major database, data mining and machine learning conferences.

2,591 citations
