scispace - formally typeset
Author

Byung Suk Lee

Bio: Byung Suk Lee is an academic researcher from the University of Vermont. The author has contributed to research in topics: Query optimization & Tuple. The author has an h-index of 15, has co-authored 82 publications receiving 758 citations. Previous affiliations of Byung Suk Lee include KAIST & University of St. Thomas (Minnesota).


Papers
Proceedings ArticleDOI
09 Jul 2007
TL;DR: This paper presents a novel algorithm for maintaining the reservoir sample after the reservoir size is adjusted such that the resulting uniformity confidence exceeds a given threshold.
Abstract: Reservoir sampling is a well-known technique for sequential random sampling over data streams. Conventional reservoir sampling assumes a fixed-size reservoir. There are situations, however, in which it is necessary and/or advantageous to adaptively adjust the size of a reservoir in the middle of sampling due to changes in data characteristics and/or application behavior. This paper studies adaptive-size reservoir sampling over data streams considering two main factors: reservoir size and sample uniformity. First, the paper conducts a theoretical study on the effects of adjusting the size of a reservoir while sampling is in progress. The theoretical results show that such an adjustment may negatively impact the probability of the sample being uniform (called uniformity confidence herein). Second, the paper presents a novel algorithm for maintaining the reservoir sample after the reservoir size is adjusted such that the resulting uniformity confidence exceeds a given threshold. Third, the paper extends the proposed algorithm to an adaptive multi-reservoir sampling algorithm for a practical application in which samples are collected from memory-limited wireless sensor networks using a mobile sink. Finally, the paper empirically examines the adaptivity of the multi-reservoir sampling algorithm with regard to reservoir size and sample uniformity using real sensor network data sets.
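For context, the fixed-size reservoir sampling that the paper generalizes is the classic Algorithm R. The sketch below shows that baseline only, not the paper's adaptive-size or multi-reservoir algorithms; the function name and signature are illustrative.

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Classic fixed-size reservoir sampling (Algorithm R).

    Maintains a uniform random sample of k items over a stream of
    unknown length. The paper's contribution is allowing k to change
    mid-stream while bounding the loss of uniformity (not shown here).
    """
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Keep item i with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
```

Each item ends up in the reservoir with equal probability k/n, which is exactly the uniformity property that naive mid-stream resizing can break.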

60 citations

Book ChapterDOI
24 Jul 2003
TL;DR: This work is the first comprehensive performance study of main-memory R-tree variants, and provides a useful guideline in selecting the most suitable index structure in various cases.
Abstract: There have been several techniques proposed for improving the performance of main-memory spatial indexes, but there has not been a comparative study of their performance. In this paper we compare the performance of six main-memory R-tree variants: R-tree, R*-tree, Hilbert R-tree, CR-tree, CR*-tree, and Hilbert CR-tree. CR*-trees and Hilbert CR-trees are respectively natural extensions of R*-trees and Hilbert R-trees, incorporating the CR-tree's quantized relative minimum bounding rectangle (QRMBR) technique. Additionally, we apply the optimistic, latch-free index traversal (OLFIT) concurrency control mechanism for B-trees to the R-tree variants while using the GiST-link technique. We perform extensive experiments in the two categories of sequential accesses and concurrent accesses, and pick the following best trees. In sequential accesses, CR*-trees are the best for search, Hilbert R-trees for update, and Hilbert CR-trees for a mixture of them. In concurrent accesses, Hilbert CR-trees are best for search if data is uniformly distributed, CR*-trees for search if data is skewed, Hilbert R-trees for update, and Hilbert CR-trees for a mixture of them. We also provide detailed observations of the experimental results and rationalize them based on the characteristics of the individual trees. As far as we know, our work is the first comprehensive performance study of main-memory R-tree variants. The results of our study provide a useful guideline for selecting the most suitable index structure in various cases.
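The QRMBR technique mentioned above stores a child MBR relative to its parent MBR on a small integer grid, so more entries fit in a cache line. A toy sketch of that idea, assuming 2-D boxes as (xlo, ylo, xhi, yhi) tuples; this illustrates the quantization principle only, not the CR-tree's exact encoding.

```python
import math

def quantize_rmbr(child, parent, bits=8):
    """Quantize a child MBR relative to its parent MBR onto a
    2^bits grid, rounding outward so the quantized box still
    encloses the child (a conservative, lossy compression).
    Boxes are (xlo, ylo, xhi, yhi) in the same coordinate space.
    """
    grid = (1 << bits) - 1
    pxlo, pylo, pxhi, pyhi = parent
    cxlo, cylo, cxhi, cyhi = child
    sx = grid / (pxhi - pxlo)
    sy = grid / (pyhi - pylo)
    return (
        math.floor((cxlo - pxlo) * sx),  # round lower bounds down
        math.floor((cylo - pylo) * sy),
        math.ceil((cxhi - pxlo) * sx),   # round upper bounds up
        math.ceil((cyhi - pylo) * sy),
    )
```

Outward rounding means a search can get false positives (which are re-checked against exact keys) but never false negatives, which is what makes the compression safe for index traversal.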

58 citations

Journal ArticleDOI
TL;DR: A rigorous system model is developed to facilitate the mapping between an object-oriented model and the relational model and reduces the number of left outer joins and the filters so that the query can be processed more efficiently.
Abstract: One of the approaches for integrating object-oriented programs with databases is to instantiate objects from relational databases by evaluating view queries. In that approach, it is often necessary to evaluate some joins of the query by left outer joins to prevent information loss caused by the tuples discarded by inner joins. It is also necessary to filter some relations with selection conditions to prevent the retrieval of unwanted nulls. The system should automatically prescribe joins as inner or left outer joins and generate the filters, rather than letting them be specified manually for every view definition. We develop such a mechanism in this paper. We first develop a rigorous system model to facilitate the mapping between an object-oriented model and the relational model. The system model provides a well-defined context for developing a simple mechanism. The mechanism requires only one piece of information from users: null options on an object attribute. The semantics of these options are mapped to non-null constraints on the query result. Then the system prescribes joins and generates filters accordingly. We also address reducing the number of left outer joins and the filters so that the query can be processed more efficiently.
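The information-loss problem that motivates the mechanism is easy to see concretely: an inner join silently drops tuples with no match, while a left outer join preserves them with nulls. A minimal demonstration using Python's built-in sqlite3; the schema and data are hypothetical, not from the paper.

```python
import sqlite3

# Hypothetical schema: an employee may lack a department. An inner
# join drops such employees; a LEFT OUTER JOIN keeps them with NULL,
# which is the choice the paper's mechanism prescribes automatically
# from per-attribute null options.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (id INTEGER, name TEXT, dept_id INTEGER);
    CREATE TABLE dept (id INTEGER, name TEXT);
    INSERT INTO emp VALUES (1, 'Ann', 10), (2, 'Bob', NULL);
    INSERT INTO dept VALUES (10, 'Sales');
""")
inner = conn.execute(
    "SELECT emp.name, dept.name FROM emp "
    "JOIN dept ON emp.dept_id = dept.id"
).fetchall()
outer = conn.execute(
    "SELECT emp.name, dept.name FROM emp "
    "LEFT OUTER JOIN dept ON emp.dept_id = dept.id"
).fetchall()
print(inner)  # Bob is lost by the inner join
print(outer)  # Bob survives with a NULL department
```

Conversely, when the object attribute is declared non-null, the system generates a filter (here, `WHERE dept.name IS NOT NULL`) so unwanted nulls never reach the application.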

56 citations

Journal ArticleDOI
TL;DR: An aggregation protocol and related algorithms are presented for reaching a quality-of-service (QoS) goal that combines lifetime and error objectives; the key idea is to periodically adjust a filter threshold for each sensor in a way that is optimal with respect to the user objective.
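The per-sensor filter threshold mentioned above works by suppressing transmissions whose readings have not moved far enough from the last reported value: a wider threshold saves energy (lifetime) at the cost of aggregate error. A toy sketch of that trade-off; the function, readings, and threshold value are illustrative, not the paper's protocol.

```python
def should_transmit(last_sent, new_value, threshold):
    """In-network filtering: transmit only when the new reading
    deviates from the last reported value by more than the sensor's
    current filter threshold."""
    return abs(new_value - last_sent) > threshold

# Simulate one sensor with a threshold of 1.0: small fluctuations
# are suppressed, large changes are reported.
readings = [20.0, 20.2, 20.1, 23.0, 23.1, 19.0]
last = readings[0]
sent = [last]
for r in readings[1:]:
    if should_transmit(last, r, threshold=1.0):
        sent.append(r)
        last = r
print(sent)  # only the initial reading and changes larger than 1.0
```

The paper's contribution is deciding, periodically and per sensor, how wide each threshold should be so the combined lifetime/error objective is optimized.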

46 citations

Journal ArticleDOI
01 Jul 2019
TL;DR: It is asserted that NETS opens a new possibility to real-time data stream outlier detection by realizing set-based early identification of outliers or inliers and taking advantage of the "net effect" between expired and new data points.
Abstract: This paper addresses the problem of efficiently detecting outliers from a data stream as old data points expire from and new data points enter the window incrementally. The proposed method is based on a newly discovered characteristic of a data stream that the change in the locations of data points in the data space is typically very insignificant. This observation has led to the finding that the existing distance-based outlier detection algorithms perform excessive unnecessary computations that are repetitive and/or canceling out the effects. Thus, in this paper, we propose a novel set-based approach to detecting outliers, whereby data points at similar locations are grouped and the detection of outliers or inliers is handled at the group level. Specifically, a new algorithm NETS is proposed to achieve a remarkable performance improvement by realizing set-based early identification of outliers or inliers and taking advantage of the "net effect" between expired and new data points. Additionally, NETS is capable of achieving the same efficiency even for a high-dimensional data stream through two-level dimensional filtering. Comprehensive experiments using six real-world data streams show 5 to 25 times faster processing time than state-of-the-art algorithms with comparable memory consumption. We assert that NETS opens a new possibility to real-time data stream outlier detection.
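The distance-based outlier definition that NETS accelerates can be stated in a few lines: a point is an outlier if it has fewer than k neighbors within distance r. The naive baseline below (1-D for brevity) recomputes everything per window, which is exactly the repeated pairwise work NETS avoids via grouping and the "net effect"; the sketch is the baseline only, not NETS.

```python
def distance_outliers(window, r, k):
    """Naive distance-based outlier detection over one window.

    A point is an outlier if fewer than k other points lie within
    distance r of it. O(n^2) per window -- the cost NETS reduces by
    handling groups of nearby points at once (not shown here).
    """
    outliers = []
    for i, p in enumerate(window):
        neighbors = sum(
            1 for j, q in enumerate(window)
            if i != j and abs(p - q) <= r
        )
        if neighbors < k:
            outliers.append(p)
    return outliers
```

When the window slides, most points keep their neighbor counts, and expirations and arrivals in the same region largely cancel; that observation is what makes the set-based incremental approach pay off.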

40 citations


Cited by
01 Jan 2002

9,314 citations

Journal ArticleDOI
TL;DR: A mediator is a software module that exploits encoded knowledge about certain sets or subsets of data to create information for a higher layer of applications; mediation simplifies, abstracts, reduces, merges, and explains data.
Abstract: For single databases, primary hindrances for end-user access are the volume of data that is becoming available, the lack of abstraction, and the need to understand the representation of the data. When information is combined from multiple databases, the major concern is the mismatch encountered in information representation and structure. Intelligent and active use of information requires a class of software modules that mediate between the workstation applications and the databases. It is shown that mediation simplifies, abstracts, reduces, merges, and explains data. A mediator is a software module that exploits encoded knowledge about certain sets or subsets of data to create information for a higher layer of applications. A model of information processing and information system components is described. The mediator architecture, including mediator interfaces, sharing of mediator modules, distribution of mediators, and triggers for knowledge maintenance, are discussed.
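A minimal sketch of the mediator idea described above: a module between applications and heterogeneous sources that uses encoded knowledge (here, a unit conversion) to merge their data into one abstraction. The source names, schema, and conversion are illustrative assumptions, not from the paper.

```python
class Mediator:
    """Toy mediator: merges rows from heterogeneous sources into a
    single normalized view, applying per-source encoded knowledge."""

    def __init__(self, sources, converters):
        self.sources = sources        # name -> callable returning rows
        self.converters = converters  # name -> row-normalizing function

    def query(self):
        merged = []
        for name, fetch in self.sources.items():
            normalize = self.converters.get(name, lambda row: row)
            merged.extend(normalize(row) for row in fetch())
        return merged

# Two hypothetical sources report temperature in different units;
# the mediator's encoded knowledge reconciles the mismatch.
sources = {
    "celsius_db": lambda: [{"city": "Oslo", "temp_c": 5}],
    "fahrenheit_db": lambda: [{"city": "Austin", "temp_f": 77}],
}
converters = {
    "fahrenheit_db": lambda r: {"city": r["city"],
                                "temp_c": (r["temp_f"] - 32) * 5 / 9},
}
m = Mediator(sources, converters)
print(m.query())
```

The application layer sees only the merged, unit-consistent view; knowledge about each source's representation lives in the mediator, which is the layering the paper argues for.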

2,441 citations

Journal ArticleDOI
TL;DR: The survey covers the different facets of concept drift in an integrated way to reflect on the existing scattered state of the art and aims at providing a comprehensive introduction to the concept drift adaptation for researchers, industry analysts, and practitioners.
Abstract: Concept drift primarily refers to an online supervised learning scenario when the relation between the input data and the target variable changes over time. Assuming a general knowledge of supervised learning in this article, we characterize adaptive learning processes; categorize existing strategies for handling concept drift; overview the most representative, distinct, and popular techniques and algorithms; discuss evaluation methodology of adaptive algorithms; and present a set of illustrative applications. The survey covers the different facets of concept drift in an integrated way to reflect on the existing scattered state of the art. Thus, it aims at providing a comprehensive introduction to concept drift adaptation for researchers, industry analysts, and practitioners.
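One family of strategies the survey categorizes detects drift from the learner's error stream: if the recent error rate rises well above the long-run rate, the input/target relation has likely changed. A deliberately simple sketch of that signal; the window size and tolerance are arbitrary assumptions, and real detectors (e.g., DDM, ADWIN) use statistically grounded tests instead.

```python
from collections import deque

def drift_alarms(stream_errors, window=30, tol=0.15):
    """Toy drift signal: flag time steps where the error rate over
    the last `window` predictions exceeds the overall error rate by
    more than `tol`. Errors are 0 (correct) or 1 (mistake)."""
    recent = deque(maxlen=window)
    total_err = 0
    alarms = []
    for i, err in enumerate(stream_errors, 1):
        recent.append(err)
        total_err += err
        overall = total_err / i
        recent_rate = sum(recent) / len(recent)
        if i >= window and recent_rate - overall > tol:
            alarms.append(i)
    return alarms
```

On an error stream that is clean for 100 steps and then all wrong (an abrupt drift), the alarms begin shortly after step 100; an adaptive learner would react by retraining or reweighting at that point.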

2,374 citations

Book ChapterDOI
01 Jan 1996
TL;DR: Exploring and identifying structure is even more important for multivariate data than univariate data, given the difficulties in graphically presenting multivariateData and the comparative lack of parametric models to represent it.
Abstract: Exploring and identifying structure is even more important for multivariate data than univariate data, given the difficulties in graphically presenting multivariate data and the comparative lack of parametric models to represent it. Unfortunately, such exploration is also inherently more difficult.

920 citations