Home
/
Authors
/
Nikos Mamoulis

Author

Nikos Mamoulis

Other affiliations: University of Hong Kong, Max Planck Society, University of California, Riverside ...read more

Bio: Nikos Mamoulis is an academic researcher from University of Ioannina. The author has contributed to research in topics: Joins & Spatial query. The author has an hindex of 56, co-authored 282 publications receiving 11121 citations. Previous affiliations of Nikos Mamoulis include University of Hong Kong & Max Planck Society.

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2003
2002
2001
1999
1998
1996

Papers

PDF

Open Access

More filters

Proceedings Article•

Improving microblog retrieval from exterior corpus by automatically constructing a microblogging corpus

[...]

Wenting Tu¹, David W. Cheung¹, Nikos Mamoulis¹•Institutions (1)

University of Hong Kong¹

25 Jan 2015

TL;DR: This work proposes a methodology that constructs a simulated microblog-ging corpus rather than directly building a model from the exterior corpus, and demonstrates the superiority of this technique compared to the previous approaches.

...read moreread less

Abstract: A large-scale training corpus consisting of microblogs belonging to a desired category is important for high-accuracy microblog retrieval. Obtaining such a large-scale microblgging corpus manually is very time and labor-consuming. Therefore, some models for the automatic retrieval of microblogs from an exterior corpus have been proposed. However, these approaches may fail in considering microblog-specific features. To alleviate this issue, we propose a methodology that constructs a simulated microblog-ging corpus rather than directly building a model from the exterior corpus. The performance of our model is better since the microblog-special knowledge of the microblogging corpus is used in the end by the retrieval model. Experimental results on real-world microblogs demonstrate the superiority of our technique compared to the previous approaches.

...read moreread less

1 citations

Posted Content•

Flow Computation in Temporal Interaction Networks

[...]

Chrysanthi Kosyfaki¹, Nikos Mamoulis¹, Evaggelia Pitoura¹, Panayiotis Tsaparas¹•Institutions (1)

University of Ioannina¹

04 Mar 2020-arXiv: Databases

TL;DR: This paper introduces the flow computation problem between two vertrices in an interaction network and proposes and studies two models of flow computation, one based on a greedy flow transfer assumption and one that finds the maximum possible flow.

...read moreread less

Abstract: Temporal interaction networks capture the history of activities between entities along a timeline. At each interaction, some quantity of data (money, information, kbytes, etc.) flows from one vertex of the network to another. Flow-based analysis can reveal important information. For instance, financial intelligent units (FIUs) are interested in finding subgraphs in transactions networks with significant flow of money transfers. In this paper, we introduce the flow computation problem in an interaction network or a subgraph thereof. We propose and study two models of flow computation, one based on a greedy flow transfer assumption and one that finds the maximum possible flow. We show that the greedy flow computation problem can be easily solved by a single scan of the interactions in time order. For the harder maximum flow problem, we propose graph precomputation and simplification approaches that can greatly reduce its complexity in practice. As an application of flow computation, we formulate and solve the problem of flow pattern search, where, given a graph pattern, the objective is to find its instances and their flows in a large interaction network. We evaluate our algorithms using real datasets. The results show that the techniques proposed in this paper can greatly reduce the cost of flow computation and pattern enumeration.

...read moreread less

1 citations

Proceedings Article•DOI•

Maximizing a record's standing in a relation

[...]

Yu Tang¹, Yilun Cai, Nikos Mamoulis²•Institutions (2)

University of Oxford¹, University of Hong Kong²

16 May 2016

TL;DR: A linear-time algorithm for determining the optimal selection range for an ordinal attribute and techniques for choosing and prioritizing the most promising selection predicates to apply are proposed.

...read moreread less

Abstract: Given a database table with records that can be ranked, an interesting problem is to identify selection conditions, which are qualified by an input record and render its ranking as high as possible among the qualifying tuples. In this paper, we study this standing maximization problem, which finds application in object promotion and characterization. We propose greedy methods, which are experimentally shown to achieve high accuracy compared to exhaustive enumeration, while scaling very well to the problem size. Our contributions include a lineartime algorithm for determining the optimal selection range for an attribute and techniques for choosing and prioritizing the most promising selection predicates to apply. Experiments on real datasets confirm the effectiveness and efficiency of our techniques.

...read moreread less

1 citations

Journal Article•DOI•

Three-dimensional Geospatial Interlinking with JedAI-spatial

[...]

Marios Papamichalopoulos, George Papadakis, G. Mandilaras, M. Siampou, Nikos Mamoulis, Manolis Koubarakis - Show less +2 more

04 May 2022-arXiv.org

TL;DR: JedAI-spatial is presented, a novel, open-source system that organizes interlinking algorithms according to three dimensions according to Space Tiling, Budget-awareness, and Execution mode, which discerns between serial algorithms, running on a single CPU-core, and parallel ones,Running on top of Apache Spark.

...read moreread less

Abstract: Geospatial data constitutes a considerable part of (Semantic) Web data, but so far, its sources are inadequately interlinked in the Linked Open Data cloud. Geospatial Interlinking aims to cover this gap by associating geometries with topological relations like those of the Dimensionally Extended 9-Intersection Model. Due to its quadratic time complexity, various algorithms aim to carry out Geospatial Interlinking efficiently. We present JedAI-spatial , a novel, open-source system that organizes these algorithms according to three dimensions: (i) Space Tiling , which determines the approach that reduces the search space, (ii) Budget-awareness , which distinguishes interlinking algorithms into batch and progressive ones, and (iii) Execution mode , which discerns between serial algorithms, running on a single CPU-core, and parallel ones, running on top of Apache Spark. We analytically describe JedAI-spatial’s architecture and capabilities and perform thorough experiments to provide in-teresting insights about the relative performance of its algorithms.

...read moreread less

1 citations

Privacy Preservation by Disassociation TR-IMIS-2011-1

[...]

Manolis Terrovitis, John Liagouris, Nikos Mamoulis, Spiros Skiadopoulos

01 Jan 2011

TL;DR: This work proposes an anonymization technique termed disassociation that preserves the original terms but hides the fact that two or more different terms appear in the same record, and presents an algorithm that anonymizes the data by first clustering them and then locally disassociating identifying combinations of terms.

...read moreread less

Abstract: In this work, we focus on the preservation of user privacy in the publication of sparse multidimensional data. Existing works protect the users’ sensitive information by generalizing or suppressing quasi identifiers in the original data. In many real world cases, neither generalization nor the distinction between sensitive and non-sensitive items is appropriate. For example, web search query logs contain millions of terms that are very hard to categorize as sensitive or non sensitive. At the same time, a generalization-based anonymization would remove the most valuable information in the dataset; the original terms. Motivated by this problem, we propose an anonymization technique termed disassociation that preserves the original terms but hides the fact that two or more different terms appear in the same record. Up to now, such techniques were used to sever the link between quasiidentifiers and sensitive values in settings with a clear distinction between these types of values. Our proposal generalizes these techniques for sparse multidimensional data, where no such distinction holds. We protect the users’ privacy by disassociating combinations of terms that can act as quasi-identifiers from the rest of the record or by disassociating the constituent terms, so that the identifying combination cannot be accurately recognized. To this end, we present an algorithm that anonymizes the data by first clustering them and then locally disassociating identifying combinations of terms. We analyze the attack model and extend the km-anonymity guaranty to the aforementioned setting. We empirically evaluate our method on real and synthetic datasets.

...read moreread less

1 citations

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
…
48
49
50
51
52
53
54
…
55
56
57
58
59

Collapse

Cited by

PDF

Open Access

More filters

Data Mining - Concepts and Techniques.

[...]

Petra Perner

01 Jan 2002

9,314 citations

“Bioinformatics” 특집을 내면서

[...]

장병탁, 김삼묘, 허철구

01 Aug 2000

TL;DR: Assessment of medical technology in the context of commercialization with Bioentrepreneur course, which addresses many issues unique to biomedical products.

...read moreread less

Abstract: BIOE 402. Medical Technology Assessment. 2 or 3 hours. Bioentrepreneur course. Assessment of medical technology in the context of commercialization. Objectives, competition, market share, funding, pricing, manufacturing, growth, and intellectual property; many issues unique to biomedical products. Course Information: 2 undergraduate hours. 3 graduate hours. Prerequisite(s): Junior standing or above and consent of the instructor.

...read moreread less

4,833 citations

Data Mining: Concepts and Techniques (2nd edition)

[...]

Jiawei Han, Micheline Kamber

01 Jan 2006

TL;DR: There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linofi [BL99].

...read moreread less

Abstract: The book Knowledge Discovery in Databases, edited by Piatetsky-Shapiro and Frawley [PSF91], is an early collection of research papers on knowledge discovery from data. The book Advances in Knowledge Discovery and Data Mining, edited by Fayyad, Piatetsky-Shapiro, Smyth, and Uthurusamy [FPSSe96], is a collection of later research results on knowledge discovery and data mining. There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linofi [BL99], Building Data Mining Applications for CRM by Berson, Smith, and Thearling [BST99], Data Mining: Practical Machine Learning Tools and Techniques by Witten and Frank [WF05], Principles of Data Mining (Adaptive Computation and Machine Learning) by Hand, Mannila, and Smyth [HMS01], The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman [HTF01], Data Mining: Introductory and Advanced Topics by Dunham, and Data Mining: Multimedia, Soft Computing, and Bioinformatics by Mitra and Acharya [MA03]. There are also books containing collections of papers on particular aspects of knowledge discovery, such as Machine Learning and Data Mining: Methods and Applications edited by Michalski, Brakto, and Kubat [MBK98], and Relational Data Mining edited by Dzeroski and Lavrac [De01], as well as many tutorial notes on data mining in major database, data mining and machine learning conferences.

...read moreread less

2,591 citations

Matrix Factorization Techniques for Recommender Systems

[...]

Patrick Seemann

01 Jan 2014

2,080 citations

Journal Article•

When is nearest neighbor meaningful

[...]

Kevin S. Beyer, Jonathan Goldstein, Raghu Ramakrishnan, Uri Shaft

01 Jan 1999-Lecture Notes in Computer Science

TL;DR: In this article, the authors explore the effect of dimensionality on the nearest neighbor problem and show that under a broad set of conditions (much broader than independent and identically distributed dimensions), as dimensionality increases, the distance to the nearest data point approaches the distance of the farthest data point.

...read moreread less

Abstract: We explore the effect of dimensionality on the nearest neighbor problem. We show that under a broad set of conditions (much broader than independent and identically distributed dimensions), as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point. To provide a practical perspective, we present empirical results on both real and synthetic data sets that demonstrate that this effect can occur for as few as 10-15 dimensions. These results should not be interpreted to mean that high-dimensional indexing is never meaningful; we illustrate this point by identifying some high-dimensional workloads for which this effect does not occur. However, our results do emphasize that the methodology used almost universally in the database literature to evaluate high-dimensional indexing techniques is flawed, and should be modified. In particular, most such techniques proposed in the literature are not evaluated versus simple linear scan, and are evaluated over workloads for which nearest neighbor is not meaningful. Often, even the reported experiments, when analyzed carefully, show that linear scan would outperform the techniques being proposed on the workloads studied in high (10-15) dimensionality!.

...read moreread less

1,992 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse