Home
/
Authors
/
Nikos Mamoulis

Author

Nikos Mamoulis

Other affiliations: University of Hong Kong, Max Planck Society, University of California, Riverside ...read more

Bio: Nikos Mamoulis is an academic researcher from University of Ioannina. The author has contributed to research in topics: Joins & Spatial query. The author has an hindex of 56, co-authored 282 publications receiving 11121 citations. Previous affiliations of Nikos Mamoulis include University of Hong Kong & Max Planck Society.

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2003
2002
2001
1999
1998
1996

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Iterative projected clustering by subspace mining

[...]

Man Lung Yiu¹, Nikos Mamoulis¹•Institutions (1)

University of Hong Kong¹

01 Feb 2005-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This work proposes a technique that improves the efficiency of a projected clustering algorithm (DOC), an optimized adaptation of the frequent pattern tree growth method used for mining frequent itemsets that significantly improves on the accuracy and speed of previous techniques.

...read moreread less

Abstract: Irrelevant attributes add noise to high-dimensional clusters and render traditional clustering techniques inappropriate. Recently, several algorithms that discover projected clusters and their associated subspaces have been proposed. We realize the analogy between mining frequent itemsets and discovering dense projected clusters around random points. Based on this, we propose a technique that improves the efficiency of a projected clustering algorithm (DOC). Our method is an optimized adaptation of the frequent pattern tree growth method used for mining frequent itemsets. We propose several techniques that employ the branch and bound paradigm to efficiently discover the projected clusters. An experimental study with synthetic and real data demonstrates that our technique significantly improves on the accuracy and speed of previous techniques.

...read moreread less

79 citations

Proceedings Article•

Algorithms for Querying by Spatial Structure

[...]

Dimitris Papadias, Nikos Mamoulis, Vasilis Delis

24 Aug 1998

TL;DR: A flexible framework is described which permits the representation of configurations in different resolution levels and supports the automatic derivation of similarity measures and three algorithms for structural query processing which integrate constraint satisfaction with spatial indexing (R-trees) are proposed.

...read moreread less

Abstract: Structural queries constitute a special form of content-based retrieval where the user specifies a set of spatial constraints among query variables and asks for all configurations of actual objects that (totally or partially) match these constraints. Processing such queries can be thought of as a general form of spatial joins, i.e., instead of pairs, the result consists of n-tuples of objects, where n is the number of query variables. In this paper we describe a flexible framework which permits the representation of configurations in different resolution levels and supports the automatic derivation of similarity measures. We subsequently propose three algorithms for structural query processing which integrate constraint satisfaction with spatial indexing (R-trees). For each algorithm we apply several optimization techniques and experimentally evaluate performance using real data.

...read moreread less

79 citations

Posted Content•

Privacy Preservation by Disassociation

[...]

Manolis Terrovitis, John Liagouris, Nikos Mamoulis, Spiros Skiadopoulos

30 Jun 2012-arXiv: Databases

TL;DR: In this article, the authors proposed an anonymization technique termed disassociation that preserves the original terms but hides the fact that two or more different terms appear in the same record.

...read moreread less

Abstract: In this work, we focus on protection against identity disclosure in the publication of sparse multidimensional data. Existing multidimensional anonymization techniquesa) protect the privacy of users either by altering the set of quasi-identifiers of the original data (e.g., by generalization or suppression) or by adding noise (e.g., using differential privacy) and/or (b) assume a clear distinction between sensitive and non-sensitive information and sever the possible linkage. In many real world applications the above techniques are not applicable. For instance, consider web search query logs. Suppressing or generalizing anonymization methods would remove the most valuable information in the dataset: the original query terms. Additionally, web search query logs contain millions of query terms which cannot be categorized as sensitive or non-sensitive since a term may be sensitive for a user and non-sensitive for another. Motivated by this observation, we propose an anonymization technique termed disassociation that preserves the original terms but hides the fact that two or more different terms appear in the same record. We protect the users' privacy by disassociating record terms that participate in identifying combinations. This way the adversary cannot associate with high probability a record with a rare combination of terms. To the best of our knowledge, our proposal is the first to employ such a technique to provide protection against identity disclosure. We propose an anonymization algorithm based on our approach and evaluate its performance on real and synthetic datasets, comparing it against other state-of-the-art methods based on generalization and differential privacy.

...read moreread less

77 citations

Proceedings Article•

One-pass wavelet synopses for maximum-error metrics

[...]

Panagiotis Karras¹, Nikos Mamoulis¹•Institutions (1)

University of Hong Kong¹

30 Aug 2005

TL;DR: This paper develops efficient and practicable wavelet thresholding algorithms for maximum-error metrics, for both a static and a streaming case, that achieve near-optimal accuracy and superior runtime performance, under frugal space requirements in both contexts.

...read moreread less

Abstract: We study the problem of computing wavelet-based synopses for massive data sets in static and streaming environments. A compact representation of a data set is obtained after a thresholding process is applied on the coefficients of its wavelet decomposition. Existing polynomial-time thresholding schemes that minimize maximum error metrics are disadvantaged by impracticable time and space complexities and are not applicable in a data stream context. This is a cardinal issue, as the problem at hand in its most practically interesting form involves the time-efficient approximation of huge amounts of data, potentially in a streaming environment. In this paper we fill this gap by developing efficient and practicable wavelet thresholding algorithms for maximum-error metrics, for both a static and a streaming case. Our algorithms achieve near-optimal accuracy and superior runtime performance, as our experiments show, under frugal space requirements in both contexts.

...read moreread less

77 citations

Journal Article•DOI•

Efficient Evaluation of Probabilistic Advanced Spatial Queries on Existentially Uncertain Data

[...]

Man Lung Yiu¹, Nikos Mamoulis², Xiangyuan Dai², Yufei Tao², Michail Vaitis³ - Show less +1 more•Institutions (3)

Aalborg University¹, University of Hong Kong², University of the Aegean³

01 Jan 2009-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This work proposes adaptations of spatial access methods and search algorithms for probabilistic versions of range queries, nearest neighbors, spatial skylines, and reverse nearest neighbors and conducts an extensive experimental study, which evaluates the effectiveness of proposed solutions.

...read moreread less

Abstract: We study the problem of answering spatial queries in databases where objects exist with some uncertainty and they are associated with an existential probability. The goal of a thresholding probabilistic spatial query is to retrieve the objects that qualify the spatial predicates with probability that exceeds a threshold. Accordingly, a ranking probabilistic spatial query selects the objects with the highest probabilities to qualify the spatial predicates. We propose adaptations of spatial access methods and search algorithms for probabilistic versions of range queries, nearest neighbors, spatial skylines, and reverse nearest neighbors and conduct an extensive experimental study, which evaluates the effectiveness of proposed solutions.

...read moreread less

77 citations

1
2
3
4
5
…
6
7
8
9
10
11
12
…
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59

Collapse

Cited by

PDF

Open Access

More filters

Data Mining - Concepts and Techniques.

[...]

Petra Perner

01 Jan 2002

9,314 citations

“Bioinformatics” 특집을 내면서

[...]

장병탁, 김삼묘, 허철구

01 Aug 2000

TL;DR: Assessment of medical technology in the context of commercialization with Bioentrepreneur course, which addresses many issues unique to biomedical products.

...read moreread less

Abstract: BIOE 402. Medical Technology Assessment. 2 or 3 hours. Bioentrepreneur course. Assessment of medical technology in the context of commercialization. Objectives, competition, market share, funding, pricing, manufacturing, growth, and intellectual property; many issues unique to biomedical products. Course Information: 2 undergraduate hours. 3 graduate hours. Prerequisite(s): Junior standing or above and consent of the instructor.

...read moreread less

4,833 citations

Data Mining: Concepts and Techniques (2nd edition)

[...]

Jiawei Han, Micheline Kamber

01 Jan 2006

TL;DR: There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linofi [BL99].

...read moreread less

Abstract: The book Knowledge Discovery in Databases, edited by Piatetsky-Shapiro and Frawley [PSF91], is an early collection of research papers on knowledge discovery from data. The book Advances in Knowledge Discovery and Data Mining, edited by Fayyad, Piatetsky-Shapiro, Smyth, and Uthurusamy [FPSSe96], is a collection of later research results on knowledge discovery and data mining. There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linofi [BL99], Building Data Mining Applications for CRM by Berson, Smith, and Thearling [BST99], Data Mining: Practical Machine Learning Tools and Techniques by Witten and Frank [WF05], Principles of Data Mining (Adaptive Computation and Machine Learning) by Hand, Mannila, and Smyth [HMS01], The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman [HTF01], Data Mining: Introductory and Advanced Topics by Dunham, and Data Mining: Multimedia, Soft Computing, and Bioinformatics by Mitra and Acharya [MA03]. There are also books containing collections of papers on particular aspects of knowledge discovery, such as Machine Learning and Data Mining: Methods and Applications edited by Michalski, Brakto, and Kubat [MBK98], and Relational Data Mining edited by Dzeroski and Lavrac [De01], as well as many tutorial notes on data mining in major database, data mining and machine learning conferences.

...read moreread less

2,591 citations

Matrix Factorization Techniques for Recommender Systems

[...]

Patrick Seemann

01 Jan 2014

2,080 citations

Journal Article•

When is nearest neighbor meaningful

[...]

Kevin S. Beyer, Jonathan Goldstein, Raghu Ramakrishnan, Uri Shaft

01 Jan 1999-Lecture Notes in Computer Science

TL;DR: In this article, the authors explore the effect of dimensionality on the nearest neighbor problem and show that under a broad set of conditions (much broader than independent and identically distributed dimensions), as dimensionality increases, the distance to the nearest data point approaches the distance of the farthest data point.

...read moreread less

Abstract: We explore the effect of dimensionality on the nearest neighbor problem. We show that under a broad set of conditions (much broader than independent and identically distributed dimensions), as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point. To provide a practical perspective, we present empirical results on both real and synthetic data sets that demonstrate that this effect can occur for as few as 10-15 dimensions. These results should not be interpreted to mean that high-dimensional indexing is never meaningful; we illustrate this point by identifying some high-dimensional workloads for which this effect does not occur. However, our results do emphasize that the methodology used almost universally in the database literature to evaluate high-dimensional indexing techniques is flawed, and should be modified. In particular, most such techniques proposed in the literature are not evaluated versus simple linear scan, and are evaluated over workloads for which nearest neighbor is not meaningful. Often, even the reported experiments, when analyzed carefully, show that linear scan would outperform the techniques being proposed on the workloads studied in high (10-15) dimensionality!.

...read moreread less

1,992 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse