Paolo Ciaccia

Bio: Paolo Ciaccia is an academic researcher at the University of Bologna. The author has contributed to research on nearest neighbor search and skyline queries. The author has an h-index of 26 and has co-authored 122 publications, which have received 4,212 citations.

Papers published on a yearly basis (chart omitted; years shown: 1988 to 2022, excluding 1990)

Papers

Proceedings Article•

M-tree: An Efficient Access Method for Similarity Search in Metric Spaces

Paolo Ciaccia¹, Marco Patella, Pavel Zezula•Institutions (1)

University of Bologna¹

25 Aug 1997

TL;DR: The results demonstrate that the M-tree indeed extends the domain of applicability beyond the traditional vector spaces, performs reasonably well in high-dimensional data spaces, and scales well in case of growing files.

Abstract: A new access method, called M-tree, is proposed to organize and search large data sets from a generic “metric space”, i.e. where object proximity is only defined by a distance function satisfying the positivity, symmetry, and triangle inequality postulates. We detail algorithms for insertion of objects and split management, which keep the M-tree always balanced; several heuristic split alternatives are considered and experimentally evaluated. Algorithms for similarity (range and k-nearest neighbors) queries are also described. Results from extensive experimentation with a prototype system are reported, considering as the performance criteria the number of page I/Os and the number of distance computations. The results demonstrate that the M-tree indeed extends the domain of applicability beyond the traditional vector spaces, performs reasonably well in high-dimensional data spaces, and scales well in case of growing files.
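Illustration (not the authors' implementation): the abstract states that the index relies only on a distance function obeying the triangle inequality. The sketch below shows how that property alone lets a range query skip distance computations. It uses a flat, single-level set of balls rather than a balanced tree, and the names `Ball`, `build_balls`, `range_query`, and `euclidean` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Dist = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    # Any metric works here; Euclidean distance is just a convenient example.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


@dataclass
class Ball:
    pivot: Point                      # routing object of the region
    radius: float = 0.0               # covering radius: max distance pivot -> member
    members: List[Tuple[Point, float]] = field(default_factory=list)


def build_balls(objects: Sequence[Point], pivots: Sequence[Point], dist: Dist) -> List[Ball]:
    # Assign every object to its closest pivot and remember the distance to it.
    balls = [Ball(p) for p in pivots]
    for o in objects:
        best = min(balls, key=lambda b: dist(o, b.pivot))
        d = dist(o, best.pivot)
        best.members.append((o, d))
        best.radius = max(best.radius, d)
    return balls


def range_query(balls: List[Ball], q: Point, r: float, dist: Dist):
    hits, computed = [], 0
    for b in balls:
        dqp = dist(q, b.pivot)
        computed += 1
        # Triangle inequality: no member can be within r of q, skip the whole ball.
        if dqp > r + b.radius:
            continue
        for o, dop in b.members:
            # |d(q,p) - d(o,p)| is a lower bound on d(q,o); prune without computing it.
            if abs(dqp - dop) > r:
                continue
            computed += 1
            if dist(q, o) <= r:
                hits.append(o)
    return hits, computed
```

Calling `range_query(build_balls(points, pivots, euclidean), q, r, euclidean)` returns the matching objects together with the number of distance computations, one of the two cost measures the abstract mentions.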

1,792 citations

Journal Article•DOI•

WARP: accurate retrieval of shapes using phase of Fourier descriptors and time warping distance

Ilaria Bartolini¹, Paolo Ciaccia¹, Marco Patella¹•Institutions (1)

University of Bologna¹

01 Jan 2005-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This work proposes a novel Fourier-based approach, called WARP, for matching and retrieving similar shapes, which exploits the phase of Fourier coefficients and the use of the dynamic time warping distance to compare shape descriptors.

Abstract: Effective and efficient retrieval of similar shapes from large image databases is still a challenging problem in spite of the high relevance that shape information can have in describing image contents. We propose a novel Fourier-based approach, called WARP, for matching and retrieving similar shapes. The unique characteristics of WARP are the exploitation of the phase of Fourier coefficients and the use of the dynamic time warping (DTW) distance to compare shape descriptors. While phase information provides a more accurate description of object boundaries than using only the amplitude of Fourier coefficients, the DTW distance permits us to accurately match images even in the presence of (limited) phase shiftings. In terms of classical precision/recall measures, we experimentally demonstrate that WARP can gain, say, up to 35 percent in precision at a 20 percent recall level with respect to Fourier-based techniques that use neither phase nor DTW distance.
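Illustration (not the WARP pipeline itself): the abstract names the dynamic time warping distance as the comparison measure. The sketch below is the textbook DTW recurrence on two plain numeric sequences, with absolute difference as the assumed local cost. The function name `dtw_distance` is hypothetical, and how WARP derives its sequences from Fourier descriptors is not reproduced here.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping with |x - y| as local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # prev[j] holds the best cost of aligning the first i-1 items of a with the first j of b.
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # aligning two empty prefixes costs nothing
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: stretch a, stretch b, or advance both.
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]
```

Because one element may be matched to several elements of the other sequence, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero even though the sequences differ in length, which is the tolerance to small shifts that the abstract refers to.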

225 citations

Journal Article•DOI•

Efficient sort-based skyline evaluation

Ilaria Bartolini¹, Paolo Ciaccia¹, Marco Patella¹•Institutions (1)

University of Bologna¹

12 Dec 2008-ACM Transactions on Database Systems

TL;DR: SaLSa is a novel skyline algorithm that exploits the idea of presorting the input data so as to effectively limit the number of tuples to be read and compared, which also makes SaLSa attractive when skyline queries are executed on top of systems that do not understand skyline semantics.

Abstract: Skyline queries compute the set of Pareto-optimal tuples in a relation, that is, those tuples that are not dominated by any other tuple in the same relation. Although several algorithms have been proposed for efficiently evaluating skyline queries, they either necessitate the relation to have been indexed or have to perform the dominance tests on all the tuples in order to determine the result. In this article we introduce SaLSa, a novel skyline algorithm that exploits the idea of presorting the input data so as to effectively limit the number of tuples to be read and compared. This makes SaLSa also attractive when skyline queries are executed on top of systems that do not understand skyline semantics, or when the skyline logic runs on clients with limited power and/or bandwidth. We prove that, if one considers symmetric sorting functions, the number of tuples to be read is minimized by sorting data according to a “minimum coordinate,” minC, criterion, and that performance can be further improved if data distribution is known and an asymmetric sorting function is used. Experimental results obtained on synthetic and real datasets show that SaLSa consistently outperforms state-of-the-art sequential skyline algorithms and that its performance can be accurately predicted.
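Illustration (a simplified reading of the sort-and-limit idea, not the published algorithm): the sketch below presorts tuples by their minimum coordinate and stops reading once no remaining tuple can be in the skyline. It assumes that smaller values are preferred on every attribute and breaks ties by coordinate sum. The names `dominates` and `sort_and_limit_skyline` are hypothetical.

```python
from typing import List, Sequence, Tuple

Row = Tuple[int, ...]


def dominates(a: Row, b: Row) -> bool:
    # a dominates b if it is no worse everywhere and strictly better somewhere.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def sort_and_limit_skyline(rows: Sequence[Row]) -> Tuple[List[Row], int]:
    # Sort by minimum coordinate, ties broken by sum: a tuple can then never be
    # dominated by one that comes later in the order.
    ordered = sorted(rows, key=lambda t: (min(t), sum(t)))
    skyline: List[Row] = []
    stop_value = float("inf")  # smallest maximum coordinate among skyline tuples
    read = 0
    for t in ordered:
        # Every coordinate of t (and of all later tuples) exceeds every coordinate
        # of the stop tuple, so all of them are dominated: stop reading.
        if min(t) > stop_value:
            break
        read += 1
        if not any(dominates(s, t) for s in skyline):
            skyline.append(t)
            stop_value = min(stop_value, max(t))
    return skyline, read
```

The second return value counts how many tuples were actually read, which is the quantity the abstract says the minC ordering minimizes among symmetric sorting functions.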

206 citations

Proceedings Article•DOI•

PAC nearest neighbor queries: Approximate and controlled search in high-dimensional and metric spaces

Paolo Ciaccia¹, Marco Patella¹•Institutions (1)

University of Bologna¹

29 Feb 2000

TL;DR: This paper describes sequential and index-based PAC-NN algorithms that exploit the distance distribution of the query object in order to determine a stopping condition that respects the error bound, and provides experimental evidence that indexing can further speed-up the retrieval process by up to 1-2 orders of magnitude without giving up the accuracy of the result.

Abstract: In high-dimensional and complex metric spaces, determining the nearest neighbor (NN) of a query object q can be a very expensive task, because of the poor partitioning operated by index structures (the so-called "curse of dimensionality"). This also affects approximately correct (AC) algorithms, which return as result a point whose distance from q is less than (1+ε) times the distance between q and its true NN. In this paper we introduce a new approach to approximate similarity search, called PAC-NN queries, where the error bound ε can be exceeded with probability δ and both ε and δ parameters can be tuned at query time to trade the quality of the result for the cost of the search. We describe sequential and index-based PAC-NN algorithms that exploit the distance distribution of the query object in order to determine a stopping condition that respects the error bound. Analysis and experimental evaluation of the sequential algorithm confirm that, for moderately large data sets and suitable ε and δ values, PAC-NN queries can be efficiently solved and the error controlled. Then, we provide experimental evidence that indexing can further speed up the retrieval process by up to 1-2 orders of magnitude without giving up the accuracy of the result.
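Illustration (one possible reading of the sequential stopping rule, not the paper's exact algorithm): the sketch below derives a radius from the query's distance distribution such that the true nearest neighbor lies within it with probability δ, then stops scanning as soon as an object within (1+ε) times that radius is found. It assumes the n distances are independent draws from a distribution with a known inverse CDF, here passed in as the hypothetical parameter `inv_cdf`. The names `radius_for_delta` and `pac_nn_scan` are also hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")


def radius_for_delta(inv_cdf: Callable[[float], float], n: int, delta: float) -> float:
    # Assumption: distances from q to the n objects are i.i.d. with CDF F.
    # Then P(NN distance <= r) = 1 - (1 - F(r))**n; solve this for probability delta.
    p = 1.0 - (1.0 - delta) ** (1.0 / n)
    return inv_cdf(p)


def pac_nn_scan(
    objects: Iterable[T],
    q: T,
    dist: Callable[[T, T], float],
    eps: float,
    r_delta: float,
) -> Tuple[Optional[T], float, int]:
    best, best_d, seen = None, float("inf"), 0
    for o in objects:
        seen += 1
        d = dist(q, o)
        if d < best_d:
            best, best_d = o, d
        # Early stop: the current best is within (1 + eps) of the radius that the
        # true nearest neighbor falls below only with probability delta.
        if best_d <= (1.0 + eps) * r_delta:
            break
    return best, best_d, seen
```

Raising `eps` or `delta` enlarges the stopping threshold, so the scan ends earlier at the price of a looser guarantee, which is the quality versus cost trade-off described in the abstract.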

161 citations

Proceedings Article•DOI•

SaLSa: computing the skyline without scanning the whole sky

Ilaria Bartolini¹, Paolo Ciaccia¹, Marco Patella¹•Institutions (1)

University of Bologna¹

06 Nov 2006

TL;DR: This paper introduces SaLSa (Sort and Limit Skyline algorithm), which exploits the sorting machinery of a relational engine to order tuples so that only a subset of them needs to be examined for computing the skyline result.

Abstract: Skyline queries compute the set of Pareto-optimal tuples in a relation, i.e., those tuples that are not dominated by any other tuple in the same relation. Although several algorithms have been proposed for efficiently evaluating skyline queries, they either require extending the relational server with specialized access methods (which is not always feasible) or have to perform the dominance tests on all the tuples in order to determine the result. In this paper we introduce SaLSa (Sort and Limit Skyline algorithm), which exploits the sorting machinery of a relational engine to order tuples so that only a subset of them needs to be examined for computing the skyline result. This makes SaLSa particularly attractive when skyline queries are executed on top of systems that do not understand skyline semantics or when the skyline logic runs on clients with limited power and/or bandwidth.

140 citations

Cited by

Data Mining - Concepts and Techniques.

Petra Perner

01 Jan 2002

9,314 citations

Journal Article•DOI•

Content-based image retrieval at the end of the early years

Arnold W. M. Smeulders¹, Marcel Worring¹, Simone Santini², Amarnath Gupta², Ramesh Jain•Institutions (2)

University of Amsterdam¹, University of California, San Diego²

01 Dec 2000-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: The working conditions of content-based retrieval: patterns of use, types of pictures, the role of semantics, and the sensory gap are discussed, as well as aspects of system engineering: databases, system architecture, and evaluation.

Abstract: Presents a review of 200 references in content-based image retrieval. The paper starts with discussing the working conditions of content-based retrieval: patterns of use, types of pictures, the role of semantics, and the sensory gap. Subsequent sections discuss computational steps for image retrieval systems. Step one of the review is image processing for retrieval sorted by color, texture, and local geometry. Features for retrieval are discussed next, sorted by: accumulative and global features, salient points, object and shape features, signs, and structural combinations thereof. Similarity of pictures and objects in pictures is reviewed for each of the feature types, in close connection to the types and means of feedback the user of the systems is capable of giving by interaction. We briefly discuss aspects of system engineering: databases, system architecture, and evaluation. In the concluding section, we present our view on: the driving force of the field, the heritage from computer vision, the influence on computer vision, the role of similarity and of interaction, the need for databases, the problem of evaluation, and the role of the semantic gap.

6,447 citations

A Discussion of the Architecture of Object-Oriented Software (Object-Oriented Software Construction)

簡聰富

01 Dec 1989

4,898 citations

Integrative analysis of 111 reference human epigenomes

Anshul Kundaje, Wouter Meuleman, Jason Ernst, Angela Yen, Pouya Kheradpour, Zhizhuo Zhang, Jianrong Wang, Lucas D. Ward, Abhishek Sarkar, Gerald Quon, Matthew L. Eaton, Yi-Chieh Wu, Andreas R. Pfenning, Xinchen Wang, Melina Claussnitzer, Yaping Liu, Mukul S. Bansal, Soheil Feizi-Khankandi, Ah Ram Kim, Richard C Sallari, Nicholas A Sinnott-Armstrong, Laurie A. Boyer, Elizabeta Gjoneska, Li-Huei Tsai, Manolis Kellis (+21 more)

01 Feb 2015

TL;DR: In this article, the authors describe the integrative analysis of 111 reference human epigenomes generated as part of the NIH Roadmap Epigenomics Consortium, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression.

Abstract: The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but epigenomic studies lack a similar reference. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and tissues. Here we describe the integrative analysis of 111 reference human epigenomes generated as part of the programme, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. We establish global maps of regulatory elements, define regulatory modules of coordinated activity, and their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation and human disease.

4,409 citations

Journal Article•DOI•

OPTICS: ordering points to identify the clustering structure

Mihael Ankerst¹, Markus M. Breunig¹, Hans-Peter Kriegel¹, Jörg Sander¹•Institutions (1)

Ludwig Maximilian University of Munich¹

01 Jun 1999

TL;DR: A new algorithm is introduced for the purpose of cluster analysis which does not produce a clustering of a data set explicitly; but instead creates an augmented ordering of the database representing its density-based clustering structure.

Abstract: Cluster analysis is a primary method for database mining. It is either used as a stand-alone tool to get insight into the distribution of a data set, e.g. to focus further analysis and data processing, or as a preprocessing step for other algorithms operating on the detected clusters. Almost all of the well-known clustering algorithms require input parameters which are hard to determine but have a significant influence on the clustering result. Furthermore, for many real-data sets there does not even exist a global parameter setting for which the result of the clustering algorithm describes the intrinsic clustering structure accurately. We introduce a new algorithm for the purpose of cluster analysis which does not produce a clustering of a data set explicitly; but instead creates an augmented ordering of the database representing its density-based clustering structure. This cluster-ordering contains information which is equivalent to the density-based clusterings corresponding to a broad range of parameter settings. It is a versatile basis for both automatic and interactive cluster analysis. We show how to automatically and efficiently extract not only 'traditional' clustering information (e.g. representative points, arbitrary shaped clusters), but also the intrinsic clustering structure. For medium sized data sets, the cluster-ordering can be represented graphically and for very large data sets, we introduce an appropriate visualization technique. Both are suitable for interactive exploration of the intrinsic clustering structure offering additional insights into the distribution and correlation of the data.

4,020 citations
