
Showing papers by "Eugene J. Shekita" published in 2005


Patent
10 Aug 2005
TL;DR: In this patent, a thread processor analyzes the EMT threads and records the thread configuration data, and a query manager uses the thread configuration data to conduct selective searches of the EMT volume.
Abstract: A method includes describing the thread configurations of a volume of well-ordered electronic message transmissions (EMT) and utilizing the thread configuration data to conduct selective searches of the EMT volume. An apparatus includes a thread processor and a query manager. The thread processor analyzes the EMT threads and records the thread configuration data. The query manager utilizes the thread configuration data to conduct selective searches of the EMT volume.

66 citations
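
The abstract separates recording thread structure from using it at query time. The sketch below is illustrative only. It assumes that each message carries a hypothetical `parent_id` reply link, and that "thread configuration" means each message's thread root and reply depth. `record_thread_config` and `selective_search` are hypothetical names, not the patent's claims. Because the volume is well-ordered (a parent precedes its replies), one pass is enough to record the configuration, and a query scoped to one thread never touches messages from other threads.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class Message:
    msg_id: str
    parent_id: Optional[str]  # hypothetical field: the message this one replies to
    text: str

def record_thread_config(messages: List[Message]):
    """Hypothetical 'thread processor': record (thread root, depth) for each message.

    Assumes the volume is well-ordered, i.e. a parent always precedes its replies.
    """
    config: Dict[str, Tuple[str, int]] = {}
    threads: Dict[str, List[str]] = defaultdict(list)  # thread root -> message ids
    for m in messages:
        if m.parent_id is None or m.parent_id not in config:
            config[m.msg_id] = (m.msg_id, 0)  # message starts a new thread
        else:
            root, depth = config[m.parent_id]
            config[m.msg_id] = (root, depth + 1)
        threads[config[m.msg_id][0]].append(m.msg_id)
    return config, threads

def selective_search(by_id, config, threads, term, thread_root, max_depth=None):
    """Hypothetical 'query manager': scan only the messages of one thread."""
    hits = []
    for mid in threads.get(thread_root, []):  # other threads are never touched
        _, depth = config[mid]
        if max_depth is not None and depth > max_depth:
            continue
        if term.lower() in by_id[mid].text.lower():
            hits.append(mid)
    return hits

msgs = [
    Message("a", None, "Budget draft"),
    Message("b", "a", "Re: budget looks fine"),
    Message("c", None, "Lunch?"),
    Message("d", "b", "Re: budget approved"),
]
config, threads = record_thread_config(msgs)
by_id = {m.msg_id: m for m in msgs}
print(selective_search(by_id, config, threads, "budget", "a"))               # ['a', 'b', 'd']
print(selective_search(by_id, config, threads, "budget", "a", max_depth=1))  # ['a', 'b']
```

A real system would persist the configuration alongside a full-text index. The dictionaries here only show how recorded structure narrows the set of messages a search must inspect.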


Proceedings Article
31 Oct 2005
TL;DR: This paper describes TwigOptimal, a new holistic twig join algorithm with optimal cursor movement that can use information in the return clause of XQuery to boost its performance; experimental results show TwigOptimal's superiority over existing holistic twig join algorithms.
Abstract: Holistic twig join algorithms represent the state of the art for evaluating path expressions in XML queries. Using inverted indexes on XML elements, holistic twig joins move a set of index cursors in a coordinated way to quickly find structural matches. Because each cursor move can trigger I/O, the performance of a holistic twig join is largely determined by how many cursor moves it makes, yet, surprisingly, existing join algorithms have not been optimized along these lines. In this paper, we describe TwigOptimal, a new holistic twig join algorithm with optimal cursor movement. We sketch the proof of TwigOptimal's optimality, and describe how TwigOptimal can use information in the return clause of XQuery to boost its performance. Finally, experimental results are presented, showing TwigOptimal's superiority over existing holistic twig join algorithms.

49 citations
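
The abstract's cost model, where index cursors move over inverted lists and each move may trigger I/O, can be made concrete with the simplest case: a single ancestor//descendant step. The sketch below is a classic stack-based binary structural join, not TwigOptimal. It assumes a region encoding in which each element is a `(start, end)` pair and regions either nest or are disjoint. The `Cursor` class and its `moves` counter are hypothetical and exist only to expose the number of cursor moves the join makes.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # (start, end) of an element under an assumed region encoding

class Cursor:
    """Cursor over one element's inverted list, sorted by start; counts moves."""
    def __init__(self, postings: List[Region]):
        self.postings = postings
        self.pos = 0
        self.moves = 0  # each move could trigger an I/O in a disk-based index

    def eof(self) -> bool:
        return self.pos >= len(self.postings)

    def current(self) -> Region:
        return self.postings[self.pos]

    def advance(self) -> None:
        self.pos += 1
        self.moves += 1

def ancestor_descendant_join(anc: Cursor, desc: Cursor) -> List[Tuple[Region, Region]]:
    """Stack-based binary structural join for the path A//D (not TwigOptimal)."""
    stack: List[Region] = []  # chain of nested ancestor candidates
    out: List[Tuple[Region, Region]] = []
    while not desc.eof():
        d = desc.current()
        # Push every ancestor candidate that starts before d.
        while not anc.eof() and anc.current()[0] < d[0]:
            a = anc.current()
            while stack and stack[-1][1] < a[0]:
                stack.pop()  # disjoint from a, so it cannot contain any later element
            stack.append(a)
            anc.advance()
        # Drop candidates that end before d starts.
        while stack and stack[-1][1] < d[0]:
            stack.pop()
        # Regions nest or are disjoint, so every remaining candidate contains d.
        out.extend((a, d) for a in stack)
        desc.advance()
    return out

# Fragment <a> <b/> <a> <b/> </a> </a> <b/> with start/end positions 1..10
a_list = Cursor([(1, 8), (4, 7)])
b_list = Cursor([(2, 3), (5, 6), (9, 10)])
print(ancestor_descendant_join(a_list, b_list))
print(a_list.moves, b_list.moves)  # 2 3
```

This join still visits the trailing `b` at (9, 10) after no ancestor candidates remain. Avoiding unnecessary moves like that, across a whole twig pattern rather than one step, is the kind of cost the abstract says TwigOptimal minimizes.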


Proceedings Article
05 Apr 2005
TL;DR: This work proposes a new family of inverted list indices and associated query algorithms that can support SVR efficiently in update-intensive databases, where the structured data values (and hence the scores of documents) change frequently.
Abstract: We propose a new ranking paradigm for relational databases called Structured Value Ranking (SVR). SVR uses structured data values to score (rank) the results of keyword search queries over text columns. Our main contribution is a new family of inverted list indices and associated query algorithms that can support SVR efficiently in update-intensive databases, where the structured data values (and hence the scores of documents) change frequently. Our experimental results on real and synthetic data sets using BerkeleyDB show that we can support SVR efficiently in relational databases.

13 citations
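
To see why frequent value updates make SVR hard to index, consider a naive baseline. This is not the paper's index family. Documents match keywords in a text column and are ranked by a structured value, such as a hypothetical price or popularity column. Keeping postings independent of score makes an update O(1), but every query must then score all matching documents. The class and method names below are illustrative assumptions.

```python
import heapq
from collections import defaultdict

class SVRBaseline:
    """Naive baseline: keyword search over a text column, ranked by a structured value."""
    def __init__(self):
        self.postings = defaultdict(set)  # term -> doc ids (order independent of score)
        self.score = {}                   # doc id -> structured value used as the score

    def insert(self, doc_id, text, value):
        for term in text.lower().split():
            self.postings[term].add(doc_id)
        self.score[doc_id] = value

    def update_value(self, doc_id, value):
        # O(1): postings are untouched because they are not ordered by score
        self.score[doc_id] = value

    def query(self, keywords, k):
        terms = [t.lower() for t in keywords]
        if not terms:
            return []
        # conjunctive match, then rank every match by its current structured value
        matches = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return heapq.nlargest(k, matches, key=lambda d: self.score[d])

idx = SVRBaseline()
idx.insert(1, "red wool sweater", 40)
idx.insert(2, "red cotton shirt", 25)
idx.insert(3, "blue wool sweater", 60)
print(idx.query(["red"], 2))   # [1, 2]
idx.update_value(2, 90)
print(idx.query(["red"], 2))   # [2, 1]
```

The opposite design, postings sorted by score, allows early termination at query time but must reorganize lists on every update. The abstract positions the proposed indices as supporting SVR efficiently when scores change frequently.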


Patent
12 Jan 2005
TL;DR: In this patent, a method for indexing a plurality of documents that includes a plurality of duplicate documents first identifies one or more duplicate groups among them and then creates one index of content for each duplicate group instead of indexing the content of every document within the group.
Abstract: A method for indexing a plurality of documents that includes a plurality of duplicate documents first identifies one or more duplicate groups of documents from among the plurality of documents. Then, one index of content for each duplicate group is created instead of indexing the content from every document within the group. However, in contrast to the content index, an index of metadata is created for each of the documents in the duplicate group. Thus, the content of each duplicate group is indexed only once, while a search engine using such indexing techniques retains the capability to answer queries as if the duplicated content had been indexed for each document of the group.

12 citations
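
The split between a per-group content index and a per-document metadata index can be sketched as follows. The sketch makes several assumptions: duplicates are exact matches detected with a SHA-256 content hash, terms come from whitespace tokenization, and `DedupIndex` and its methods are hypothetical names. At query time, matching groups expand to their member documents, so results look as if each copy had been indexed on its own.

```python
import hashlib
from collections import defaultdict

class DedupIndex:
    """Sketch: content indexed once per duplicate group, metadata indexed per document."""
    def __init__(self):
        self.group_of_hash = {}                # content hash -> group id
        self.members = defaultdict(list)       # group id -> doc ids sharing that content
        self.content_index = defaultdict(set)  # term -> group ids
        self.meta_index = defaultdict(set)     # (field, value) -> doc ids

    def add(self, doc_id, content, metadata):
        h = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if h not in self.group_of_hash:
            gid = len(self.group_of_hash)
            self.group_of_hash[h] = gid
            for term in set(content.lower().split()):
                self.content_index[term].add(gid)  # only the first copy's content is indexed
        gid = self.group_of_hash[h]
        self.members[gid].append(doc_id)
        for field, value in metadata.items():
            self.meta_index[(field, value)].add(doc_id)  # metadata kept for every copy

    def search(self, term, **filters):
        # Expand matching groups to their member documents, then apply metadata filters.
        docs = {d for g in self.content_index.get(term.lower(), ()) for d in self.members[g]}
        for field, value in filters.items():
            docs &= self.meta_index.get((field, value), set())
        return sorted(docs)

idx = DedupIndex()
idx.add("u1", "annual report 2005", {"site": "a.example"})
idx.add("u2", "annual report 2005", {"site": "b.example"})
idx.add("u3", "quarterly report", {"site": "a.example"})
print(idx.search("annual"))                    # ['u1', 'u2']
print(idx.search("report", site="a.example"))  # ['u1', 'u3']
```

Near-duplicate detection, such as shingling, would replace the exact hash in practice. That choice is outside what the abstract specifies.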


Proceedings Article
Chavdar Botev, Nadav Eiron, Marcus Fontoura, Ning Li, Eugene J. Shekita
31 Oct 2005
TL;DR: This paper shows that a new index organization based on static score bucketing significantly improves index build performance while having minimal impact on the quality of search results.
Abstract: Maintaining strict static score order of inverted lists is a heuristic used by search engines to improve the quality of query results when entire inverted lists cannot be processed. This heuristic, however, increases the cost of index generation and requires complex index build algorithms. In this paper, we study a new index organization based on static score bucketing. We show that this new technique significantly improves index build performance while having minimal impact on the quality of search results.

1 citation
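
Static score bucketing can be illustrated with a minimal sketch. Postings are not kept in strict score order. Instead, each posting goes into a coarse score bucket, buckets are ordered from highest to lowest score, and documents are ordered by ID within each bucket. The bucket boundaries, the `(doc_id, static_score)` posting format, and the early-termination rule below are all assumptions for illustration, not the paper's exact design.

```python
from bisect import bisect_right

def build_bucketed_list(postings, boundaries):
    """Group (doc_id, static_score) postings into score buckets.

    boundaries: ascending score cut points, e.g. [0.3, 0.7] gives buckets
    [0, 0.3), [0.3, 0.7), [0.7, ...). Buckets are returned highest-score first,
    each holding doc ids in ascending order rather than strict score order.
    """
    buckets = [[] for _ in range(len(boundaries) + 1)]
    for doc_id, score in postings:
        buckets[bisect_right(boundaries, score)].append(doc_id)
    return [sorted(b) for b in reversed(buckets)]

def early_terminated_scan(bucketed, k):
    """Read whole buckets, best first, until at least k candidates are collected."""
    out = []
    for bucket in bucketed:
        out.extend(bucket)
        if len(out) >= k:
            break
    return out

postings = [(1, 0.9), (2, 0.1), (3, 0.5), (4, 0.8), (5, 0.4)]
lists = build_bucketed_list(postings, [0.3, 0.7])
print(lists)                            # [[1, 4], [3, 5], [2]]
print(early_terminated_scan(lists, 3))  # [1, 4, 3, 5]
```

The builder only has to place each posting in a bucket instead of maintaining a total order by score. The price is that query processing can only stop at bucket boundaries and sees documents within a bucket in ID order, which is the quality trade-off the abstract reports as minimal.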