
Showing papers by "Paolo Ciaccia" published in 2000


Proceedings ArticleDOI
29 Feb 2000
TL;DR: This paper describes sequential and index-based PAC-NN algorithms that exploit the distance distribution of the query object in order to determine a stopping condition that respects the error bound, and provides experimental evidence that indexing can further speed up the retrieval process by up to 1-2 orders of magnitude without giving up the accuracy of the result.
Abstract: In high-dimensional and complex metric spaces, determining the nearest neighbor (NN) of a query object q can be a very expensive task, because of the poor partitioning operated by index structures (the so-called "curse of dimensionality"). This also affects approximately correct (AC) algorithms, which return as results a point whose distance from q is less than (1+ε) times the distance between q and its true NN. In this paper we introduce a new approach to approximate similarity search, called PAC-NN queries, where the error bound ε can be exceeded with probability δ and both ε and δ parameters can be tuned at query time to trade the quality of the result for the cost of the search. We describe sequential and index-based PAC-NN algorithms that exploit the distance distribution of the query object in order to determine a stopping condition that respects the error bound. Analysis and experimental evaluation of the sequential algorithm confirm that, for moderately large data sets and suitable ε and δ values, PAC-NN queries can be efficiently solved and the error controlled. Then, we provide experimental evidence that indexing can further speed up the retrieval process by up to 1-2 orders of magnitude without giving up the accuracy of the result.
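The stopping condition mentioned in the abstract can be illustrated with a minimal Python sketch of a sequential scan. It is not the authors' implementation, and it rests on assumptions the abstract does not state: that the NN-distance distribution for a data set of n objects is G(r) = 1 - (1 - F(r))^n, where F is the distribution of distances from the query; that F is approximated by a sample of distances; and that the scan may stop once a point within (1+ε)·r_δ is found, where r_δ is the largest radius with G(r_δ) ≤ δ. The names `estimate_r_delta` and `pac_nn_scan` are hypothetical.

```python
import math
from typing import Callable, Iterable, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def estimate_r_delta(sample_dists: Sequence[float], n: int, delta: float) -> float:
    """Estimate r_delta from a sample of distances to the query (assumption).

    G(r) = 1 - (1 - F(r))**n <= delta  is equivalent to
    F(r) <= p  with  p = 1 - (1 - delta)**(1/n).
    Returns the largest sampled distance whose empirical CDF value is at
    most p (ties ignored), or 0.0 if no sampled distance qualifies.
    """
    p = 1.0 - (1.0 - delta) ** (1.0 / n)
    dists = sorted(sample_dists)
    k = int(p * len(dists))  # how many sampled distances may lie within r_delta
    return dists[k - 1] if k > 0 else 0.0


def pac_nn_scan(
    data: Iterable[Point],
    q: Point,
    dist: Callable[[Point, Point], float],
    eps: float,
    r_delta: float,
) -> Tuple[Optional[Point], float, int]:
    """Sequential scan that stops early once the current best is 'good enough'.

    Returns (best point, its distance to q, number of objects scanned).
    With r_delta == 0.0 the scan degenerates to an exact NN search,
    unless a point at distance 0 from q is met.
    """
    best, best_d, scanned = None, math.inf, 0
    for x in data:
        scanned += 1
        d = dist(x, q)
        if d < best_d:
            best, best_d = x, d
            # Hypothesized stopping condition: within (1 + eps) * r_delta.
            if best_d <= (1.0 + eps) * r_delta:
                break
    return best, best_d, scanned
```

For example, `pac_nn_scan([(5.0,), (1.0,), (0.9,), (3.0,)], (0.0,), lambda a, b: abs(a[0] - b[0]), eps=0.5, r_delta=1.0)` stops after two objects and returns the point at distance 1.0, whereas the true NN lies at distance 0.9; larger ε or δ values trade accuracy for fewer distance computations.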

161 citations


Book ChapterDOI
14 Feb 2000
TL;DR: An integrated algebraic framework which allows many relevant aspects of similarity query processing to be dealt with, and defines a generic similarity algebra, SAMEW, where semantics of operators is deliberately left unspecified in order to better adapt to specific scenarios.
Abstract: Specification and efficient processing of similarity queries on multimedia databases have recently attracted several research efforts, even if most of them have considered specific aspects, such as indexing, of this new exciting scenario. In this paper we try to remedy this by presenting an integrated algebraic framework which allows many relevant aspects of similarity query processing to be dealt with. As a starting point, we assume the more general case where "imprecision" is already present at the data level, typically because of the ambiguous nature of multimedia objects' content. We then define a generic similarity algebra, SAMEW, where semantics of operators is deliberately left unspecified in order to better adapt to specific scenarios. A basic feature of SAMEW is that it allows user preferences, expressed in the form of weights, to be specified so as to alter the default behavior of most operators. Finally, we discuss some issues related to "approximation" and to "user evaluation" of query results.

38 citations


01 Jan 2000
TL;DR: The proposed approach combines within a single index structure information from multiple metric spaces, thus being able to efficiently support queries on arbitrary combinations of indexed features.
Abstract: Motivated by the needs for efficient similarity retrieval in multimedia digital libraries, we present basic principles of a new paged and balanced index structure, the M-tree. The M-tree can be applied whenever “complex” range and/or best matches queries over different descriptions (features) of objects need to be solved. The proposed approach combines within a single index structure information from multiple metric spaces, thus being able to efficiently support queries on arbitrary combinations of indexed features. Efficiency of the structure is presented through preliminary experimental results over a real-world data-set.
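To make the notion of a query over a combination of features concrete, the following Python sketch evaluates a "complex" range query by linear scan. It only illustrates the query semantics, not the index structure described in the abstract. The weighted sum used to combine per-feature distances, the feature names, and the functions `combined_distance` and `range_query` are all assumptions introduced for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Obj = Dict[str, Vector]  # feature name -> feature vector
Metric = Callable[[Vector, Vector], float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Vector, b: Vector) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def combined_distance(
    a: Obj, b: Obj, metrics: Dict[str, Metric], weights: Dict[str, float]
) -> float:
    """Weighted sum of per-feature distances (an assumed combination rule).

    Only the features listed in `weights` take part, so the same data can be
    queried on any subset of its features.
    """
    return sum(w * metrics[f](a[f], b[f]) for f, w in weights.items())


def range_query(
    objects: List[Obj],
    q: Obj,
    radius: float,
    metrics: Dict[str, Metric],
    weights: Dict[str, float],
) -> List[Obj]:
    """Linear-scan range query: every object within `radius` of q."""
    return [
        o for o in objects
        if combined_distance(o, q, metrics, weights) <= radius
    ]
```

An index that serves such queries has to prune on the per-feature metrics without knowing in advance which combination a query will use; the linear scan above sidesteps that problem and is meant only as a reference for what the result set should be.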

34 citations


Proceedings ArticleDOI
06 Sep 2000
TL;DR: This work proposes the first provably sound algorithm for performing region-based similarity search when regions are accessed through an index, and demonstrates the effectiveness of this approach as also compared to alternative retrieval strategies.
Abstract: Region-based image retrieval systems aim to improve the effectiveness of content-based search by decomposing each image into a set of "homogeneous" regions. Thus, similarity between images is assessed by computing similarity between pairs of regions and then combining the results at the image level. We propose the first provably sound algorithm for performing region-based similarity search when regions are accessed through an index. Experimental results demonstrate the effectiveness of our approach, as also compared to alternative retrieval strategies.
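The two-level scheme in the abstract (region-to-region similarity, then combination at the image level) can be sketched in Python as follows. This is a simplified illustration and not the algorithm proposed in the paper: the region similarity 1/(1 + Euclidean distance), the combination rule (average, over query regions, of the best-matching region in the image), and the names `region_similarity`, `image_similarity`, and `rank_images` are all assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Region = Sequence[float]  # feature vector describing one region


def region_similarity(r1: Region, r2: Region) -> float:
    """Similarity in (0, 1]; equals 1.0 for identical feature vectors."""
    d = math.sqrt(sum((x - y) ** 2 for x, y in zip(r1, r2)))
    return 1.0 / (1.0 + d)


def image_similarity(query: List[Region], image: List[Region]) -> float:
    """Average, over query regions, of the best match found in the image.

    Returns 0.0 when either image has no regions.
    """
    if not query or not image:
        return 0.0
    best = [max(region_similarity(qr, ir) for ir in image) for qr in query]
    return sum(best) / len(best)


def rank_images(
    query: List[Region], images: Dict[str, List[Region]]
) -> List[Tuple[str, float]]:
    """Score every image by exhaustive scan, most similar first."""
    scored = [(name, image_similarity(query, regs)) for name, regs in images.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)
```

The exhaustive scan compares every query region with every stored region. The difficulty addressed in the abstract is obtaining a correct image-level ranking when regions are instead retrieved through an index, which this sketch does not attempt.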

29 citations



01 Jan 2000
TL;DR: In this paper, the authors propose multidimensional unbalanced wavelets to store the parameters determined during the feedback process and to predict, by interpolation, parameter settings for queries similar to earlier ones.
Abstract: User feedback has proven very successful for querying large multimedia databases. Due to the nature of the data representation and the mismatch between mathematical models and human perception, the query techniques benefit substantially from interactively modifying a query. Typical examples are generalized ellipsoid queries where optimal ratios and orientations of the half-axes are determined by relevance feedback. However, no information about the outcome of a feedback process is stored whatsoever once the process is terminated. Accordingly, the entire feedback loop has to be repeated, starting out with default parameters, if the same query is posed again. In this paper we present preliminary results on how to preserve feedback results in a space-efficient way and learn from user feedback. The cornerstone of our system is a set of multidimensional unbalanced wavelets that are used to store the parameters determined during the feedback process. Using wavelets lets us not only store parameter combinations but also predict parameter settings for queries similar to earlier ones by interpolation: the feedback process for an entirely new query can be started with a parameter setting that is usually much closer to the optimal one than the default parameters. As a result, after an initial learning phase, feedback is needed for fine tuning only, improving the effectiveness and response time of multimedia databases.

1 citation