Author

Reynold Cheng

Bio: Reynold Cheng is an academic researcher from the University of Hong Kong. The author has contributed to research in topics: Uncertain data & Probabilistic logic. The author has an h-index of 44 and has co-authored 188 publications receiving 7,717 citations. Previous affiliations of Reynold Cheng include the University of New South Wales & Hong Kong Polytechnic University.


Papers
Proceedings ArticleDOI
09 Jun 2003
TL;DR: This paper addresses the important issue of measuring the quality of answers to queries evaluated over uncertain data, and provides algorithms for efficiently pulling data from relevant sensors or moving objects in order to improve the quality of the executing queries.
Abstract: Many applications employ sensors for monitoring entities such as temperature and wind speed. A centralized database tracks these entities to enable query processing. Due to continuous changes in these values and limited resources (e.g., network bandwidth and battery power), it is often infeasible to store the exact values at all times. A similar situation exists for moving object environments that track the constantly changing locations of objects. In this environment, it is possible for database queries to produce incorrect or invalid results based upon old data. However, if the degree of error (or uncertainty) between the actual value and the database value is controlled, one can place more confidence in the answers to queries. More generally, query answers can be augmented with probabilistic estimates of the validity of the answers. In this paper we study probabilistic query evaluation based upon uncertain data. A classification of queries is made based upon the nature of the result set. For each class, we develop algorithms for computing probabilistic answers. We address the important issue of measuring the quality of the answers to these queries, and provide algorithms for efficiently pulling data from relevant sensors or moving objects in order to improve the quality of the executing queries. Extensive experiments are performed to examine the effectiveness of several data update policies.

632 citations
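As a rough illustration of the probabilistic answers described in the paper above: when a sensor value is only known to lie within an interval, a threshold query returns a probability rather than a yes/no result. The uniform uncertainty model and all names below are assumptions made for this sketch, not the paper's actual algorithms or data-update policies.

```python
# Illustrative sketch: probabilistic threshold query over uncertain sensor values.
# Each reading is modelled as uniformly distributed over [lo, hi] (a simplifying
# assumption; the paper covers more general uncertainty and update policies).

from dataclasses import dataclass

@dataclass
class UncertainReading:
    sensor_id: str
    lo: float   # lowest possible current value
    hi: float   # highest possible current value

def prob_exceeds(reading: UncertainReading, threshold: float) -> float:
    """Probability that the true value exceeds `threshold` under the uniform model."""
    if threshold <= reading.lo:
        return 1.0
    if threshold >= reading.hi:
        return 0.0
    return (reading.hi - threshold) / (reading.hi - reading.lo)

readings = [
    UncertainReading("t1", lo=18.0, hi=22.0),
    UncertainReading("t2", lo=25.0, hi=31.0),
]

# Probabilistic answer to "which sensors currently read above 21 degrees?"
answer = {r.sensor_id: prob_exceeds(r, 21.0) for r in readings}
print(answer)   # {'t1': 0.25, 't2': 1.0}
```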

Journal ArticleDOI
05 Mar 2003
TL;DR: Algorithms for computing these queries are presented for a generic object movement model and detailed solutions are discussed for two common models of uncertainty in moving object databases.
Abstract: In moving object environments, it is infeasible for the database tracking the movement of objects to store the exact locations of objects at all times. Typically, the location of an object is known with certainty only at the time of the update. The uncertainty in its location increases until the next update. In this environment, it is possible for queries to produce incorrect results based upon old data. However, if the degree of uncertainty is controlled, then the error of the answers to queries can be reduced. More generally, query answers can be augmented with probabilistic estimates of the validity of the answer. We study the execution of probabilistic range and nearest-neighbor queries. The imprecision in answers to queries is an inherent property of these applications due to uncertainty in data, unlike the techniques for approximate nearest-neighbor processing that trade accuracy for performance. Algorithms for computing these queries are presented for a generic object movement model and detailed solutions are discussed for two common models of uncertainty in moving object databases. We study the performance of these queries through extensive simulations.

441 citations
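To make the probabilistic range query studied above concrete, the sketch below models an object's uncertainty region as a disk around its last reported location and estimates, by Monte Carlo sampling, the probability that the object lies inside a rectangular query window. The disk model and the sampling approach are assumptions for this example; the paper derives exact solutions for specific movement and uncertainty models.

```python
import random

def prob_in_range(center, radius, rect, samples=20000, seed=0):
    """Estimate P(object inside rect) when its location is uniform over a disk.

    center: (x, y) last reported location, radius: uncertainty bound,
    rect: (xmin, ymin, xmax, ymax) query window.
    """
    rng = random.Random(seed)
    cx, cy = center
    xmin, ymin, xmax, ymax = rect
    hits = 0
    for _ in range(samples):
        # Rejection-sample a point uniformly from the uncertainty disk.
        while True:
            dx = rng.uniform(-radius, radius)
            dy = rng.uniform(-radius, radius)
            if dx * dx + dy * dy <= radius * radius:
                break
        x, y = cx + dx, cy + dy
        if xmin <= x <= xmax and ymin <= y <= ymax:
            hits += 1
    return hits / samples

# Object last seen at (0, 0), may have moved up to 5 units;
# query window is the square [0, 10] x [0, 10], so roughly a quarter of the disk qualifies.
print(round(prob_in_range((0.0, 0.0), 5.0, (0.0, 0.0, 10.0, 10.0)), 2))  # ~0.25
```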

Journal ArticleDOI
01 Jan 2017
TL;DR: It is believed that the truth inference problem is not fully solved; the limitations of existing algorithms are identified and promising research directions are pointed out.
Abstract: Crowdsourcing has emerged as a novel problem-solving paradigm, which facilitates addressing problems that are hard for computers, e.g., entity resolution and sentiment analysis. However, due to the openness of crowdsourcing, workers may yield low-quality answers, and a redundancy-based method is widely employed, which first assigns each task to multiple workers and then infers the correct answer (called truth) for the task based on the answers of the assigned workers. A fundamental problem in this method is Truth Inference, which decides how to effectively infer the truth. Recently, the database community and data mining community independently study this problem and propose various algorithms. However, these algorithms are not compared extensively under the same framework and it is hard for practitioners to select appropriate algorithms. To alleviate this problem, we provide a detailed survey on 17 existing algorithms and perform a comprehensive evaluation using 5 real datasets. We make all codes and datasets public for future research. Through experiments we find that existing algorithms are not stable across different datasets and there is no algorithm that outperforms others consistently. We believe that the truth inference problem is not fully solved, and identify the limitations of existing algorithms and point out promising research directions.

376 citations
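The baseline that the surveyed truth-inference algorithms improve on is plain majority voting; a slightly stronger variant iteratively reweights workers by their estimated accuracy. The sketch below shows that iterative weighted-voting idea for single-choice tasks. It is a deliberately simplified, EM-like loop, not one of the 17 algorithms evaluated in the paper.

```python
from collections import defaultdict

def weighted_majority_truth(answers, iterations=10):
    """answers: list of (task_id, worker_id, label). Returns {task_id: inferred truth}.

    Starts from majority voting, then alternates between re-inferring truths and
    re-estimating each worker's reliability (a simplified, EM-like loop).
    """
    workers = {w for _, w, _ in answers}
    weight = {w: 1.0 for w in workers}          # initial trust in each worker
    truth = {}
    for _ in range(iterations):
        # Step 1: infer truths by weighted voting.
        votes = defaultdict(lambda: defaultdict(float))
        for task, worker, label in answers:
            votes[task][label] += weight[worker]
        truth = {task: max(labels, key=labels.get) for task, labels in votes.items()}
        # Step 2: re-estimate worker weights as agreement with the current truths.
        correct, total = defaultdict(int), defaultdict(int)
        for task, worker, label in answers:
            total[worker] += 1
            if truth[task] == label:
                correct[worker] += 1
        weight = {w: correct[w] / total[w] if total[w] else 1.0 for w in workers}
    return truth

answers = [("q1", "w1", "A"), ("q1", "w2", "A"), ("q1", "w3", "B"),
           ("q2", "w1", "C"), ("q2", "w3", "C"), ("q2", "w2", "D")]
print(weighted_majority_truth(answers))   # {'q1': 'A', 'q2': 'C'}
```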

Book ChapterDOI
28 Jun 2006
TL;DR: A data model that augments location data with uncertainty is suggested, and imprecise queries that hide the location of the query issuer and yield probabilistic results are proposed; the evaluation and quality aspects of a range query are investigated.
Abstract: Location-based services, such as finding the nearest gas station, require users to supply their location information. However, a user's location can be tracked without her consent or knowledge. Lowering the spatial and temporal resolution of location data sent to the server has been proposed as a solution. Although this technique is effective in protecting privacy, it may be overkill and the quality of desired services can be severely affected. In this paper, we suggest a framework where uncertainty can be controlled to provide high quality and privacy-preserving services, and investigate how such a framework can be realized in the GPS and cellular network systems. Based on this framework, we suggest a data model to augment uncertainty to location data, and propose imprecise queries that hide the location of the query issuer and yields probabilistic results. We investigate the evaluation and quality aspects for a range query. We also provide novel methods to protect our solutions against trajectory-tracing. Experiments are conducted to examine the effectiveness of our approaches.

345 citations
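One simple way to realize the "lowered spatial resolution" that the paper above builds on is to report a grid cell instead of an exact coordinate, and then evaluate the range query conservatively against the whole cell. The grid-snapping scheme and function names below are illustrative assumptions, not the paper's actual framework or cloaking method.

```python
def cloak_location(x, y, cell_size):
    """Replace an exact location with the bounding box of its grid cell."""
    gx, gy = int(x // cell_size), int(y // cell_size)
    return (gx * cell_size, gy * cell_size,
            (gx + 1) * cell_size, (gy + 1) * cell_size)

def imprecise_range_overlap(cloaked_box, poi, radius):
    """Conservative check: could a point of interest lie within `radius` of the
    (unknown) true location anywhere inside the cloaked box?"""
    xmin, ymin, xmax, ymax = cloaked_box
    px, py = poi
    # Distance from the POI to the nearest point of the box.
    dx = max(xmin - px, 0.0, px - xmax)
    dy = max(ymin - py, 0.0, py - ymax)
    return (dx * dx + dy * dy) ** 0.5 <= radius

box = cloak_location(123.4, 87.9, cell_size=10.0)
print(box)                                                            # (120.0, 80.0, 130.0, 90.0)
print(imprecise_range_overlap(box, poi=(135.0, 85.0), radius=6.0))    # True: distance 5 <= 6
```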

Proceedings Article
30 Aug 2005
TL;DR: The U-tree, an access method designed to optimize both the I/O and CPU time of range retrieval on multi-dimensional imprecise data, is proposed; the structure is fully dynamic and places no constraints on the data pdfs.
Abstract: In an "uncertain database", an object o is associated with a multi-dimensional probability density function(pdf), which describes the likelihood that o appears at each position in the data space. A fundamental operation is the "probabilistic range search" which, given a value pq and a rectangular area rq, retrieves the objects that appear in rq with probabilities at least pq. In this paper, we propose the U-tree, an access method designed to optimize both the I/O and CPU time of range retrieval on multi-dimensional imprecise data. The new structure is fully dynamic (i.e., objects can be incrementally inserted/deleted in any order), and does not place any constraints on the data pdfs. We verify the query and update efficiency of U-trees with extensive experiments.

310 citations
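The idea an index like the U-tree exploits is that cheap conservative bounds on an object's appearance probability can prune most objects before any expensive pdf integration. The filter-then-refine sketch below illustrates that idea using each object's uncertainty bounding box for the filter step and a caller-supplied integration routine for refinement; the structure, bounds, and names here are assumptions of this example, not the actual U-tree algorithms.

```python
def overlap_area(a, b):
    """Area of intersection of two axis-aligned rectangles (xmin, ymin, xmax, ymax)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)

def prob_range_search(objects, query_rect, p_q, exact_prob):
    """Filter-then-refine probabilistic range search.

    objects: {obj_id: uncertainty_bbox}; exact_prob(obj_id, query_rect) is the
    expensive pdf integration, invoked only for objects that survive the filter.
    """
    results = []
    for obj_id, bbox in objects.items():
        inter = overlap_area(bbox, query_rect)
        if inter == 0.0:                       # no overlap: probability is 0, prune
            continue
        area = (bbox[2] - bbox[0]) * (bbox[3] - bbox[1])
        if inter == area:                      # bbox fully inside: probability is 1
            results.append(obj_id)
            continue
        if exact_prob(obj_id, query_rect) >= p_q:   # refine with the real pdf
            results.append(obj_id)
    return results

objects = {"o1": (0, 0, 2, 2), "o2": (5, 5, 6, 6), "o3": (1, 1, 4, 4)}
query = (0, 0, 3, 3)
# Uniform-pdf stand-in for the expensive integration step.
uniform = lambda oid, q: overlap_area(objects[oid], q) / (
    (objects[oid][2] - objects[oid][0]) * (objects[oid][3] - objects[oid][1]))
print(prob_range_search(objects, query, p_q=0.4, exact_prob=uniform))  # ['o1', 'o3']
```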


Cited by
Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

01 Jan 2002

9,314 citations

01 Jan 2012

3,692 citations

Proceedings ArticleDOI
04 Aug 2017
TL;DR: Two scalable representation learning models, namely metapath2vec and metapath2vec++, are developed that not only outperform state-of-the-art embedding models in various heterogeneous network mining tasks, but also discern the structural and semantic correlations between diverse network objects.
Abstract: We study the problem of representation learning in heterogeneous networks. Its unique challenges come from the existence of multiple types of nodes and links, which limit the feasibility of the conventional network embedding techniques. We develop two scalable representation learning models, namely metapath2vec and metapath2vec++. The metapath2vec model formalizes meta-path-based random walks to construct the heterogeneous neighborhood of a node and then leverages a heterogeneous skip-gram model to perform node embeddings. The metapath2vec++ model further enables the simultaneous modeling of structural and semantic correlations in heterogeneous networks. Extensive experiments show that metapath2vec and metapath2vec++ are able to not only outperform state-of-the-art embedding models in various heterogeneous network mining tasks, such as node classification, clustering, and similarity search, but also discern the structural and semantic correlations between diverse network objects.

1,794 citations
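The first stage of metapath2vec is generating random walks constrained to follow a meta-path such as Author-Paper-Author; the walks are then fed to a heterogeneous skip-gram model. The sketch below shows only the walk-generation step on a toy heterogeneous graph; the node names, graph layout, and helper functions are illustrative assumptions rather than the authors' implementation.

```python
import random

# Toy heterogeneous graph: node -> neighbours, plus a type ("A" author, "P" paper) per node.
neighbors = {
    "a1": ["p1"], "a2": ["p1", "p2"], "a3": ["p2"],
    "p1": ["a1", "a2"], "p2": ["a2", "a3"],
}
node_type = {"a1": "A", "a2": "A", "a3": "A", "p1": "P", "p2": "P"}

def metapath_walk(start, metapath, length, rng):
    """Random walk that only moves to neighbours matching the next meta-path type.

    metapath: sequence of node types, e.g. ("A", "P", "A") for Author-Paper-Author;
    the pattern is repeated (skipping its first element) until `length` is reached.
    """
    assert node_type[start] == metapath[0]
    walk, step = [start], 0
    while len(walk) < length:
        next_type = metapath[1 + step % (len(metapath) - 1)]
        candidates = [n for n in neighbors[walk[-1]] if node_type[n] == next_type]
        if not candidates:
            break
        walk.append(rng.choice(candidates))
        step += 1
    return walk

rng = random.Random()
# Prints one possible walk alternating authors and papers, e.g. ['a1', 'p1', 'a2', 'p2', ...]
print(metapath_walk("a1", ("A", "P", "A"), length=7, rng=rng))
```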

Journal ArticleDOI
TL;DR: The Analysis of Time Series: An Introduction, 4th edn., by C. Chatfield. Chapman and Hall, London, 1989. ISBN 0 412 31820 2.
Abstract: The Analysis of Time Series: An Introduction, 4th edn. By C. Chatfield. ISBN 0 412 31820 2. Chapman and Hall, London, 1989. 242 pp. £13.50.

1,583 citations