Showing papers by "Eugene J. Shekita" published in 2005

Open Access

Patent

Indexing and searching of electronic message transmission thread sets

Andrei Z. Broder¹, Nadav Eiron¹, Marcus Fontoura¹, Michael Herscovici¹, Ronny Lempel¹, John McPherson¹, Eugene J. Shekita¹

IBM¹

10 Aug 2005

TL;DR: A thread processor analyzes the threads of a volume of electronic message transmissions (EMT) and records thread configuration data, which a query manager then uses to conduct selective searches of the EMT volume.

Abstract: A method includes describing the thread configurations of a volume of well-ordered electronic message transmissions (EMT) and utilizing the thread configuration data to conduct selective searches of the EMT volume. An apparatus includes a thread processor and a query manager. The thread processor analyzes the EMT threads and records the thread configuration data. The query manager utilizes the thread configuration data to conduct selective searches of the EMT volume.

66 citations

Proceedings Article

Optimizing cursor movement in holistic twig joins

Marcus Fontoura¹, Vanja Josifovski¹, Eugene J. Shekita¹, Beverly Yang¹

IBM¹

31 Oct 2005

TL;DR: TwigOptimal, a new holistic twig join algorithm with optimal cursor movement, is described. It can use information in the return clause of XQuery to boost its performance, and experimental results show TwigOptimal's superiority over existing holistic twig join algorithms.

Abstract: Holistic twig join algorithms represent the state of the art for evaluating path expressions in XML queries. Using inverted indexes on XML elements, holistic twig joins move a set of index cursors in a coordinated way to quickly find structural matches. Because each cursor move can trigger I/O, the performance of a holistic twig join is largely determined by how many cursor moves it makes, yet, surprisingly, existing join algorithms have not been optimized along these lines. In this paper, we describe TwigOptimal, a new holistic twig join algorithm with optimal cursor movement. We sketch the proof of TwigOptimal's optimality, and describe how TwigOptimal can use information in the return clause of XQuery to boost its performance. Finally, experimental results are presented, showing TwigOptimal's superiority over existing holistic twig join algorithms.

49 citations

Proceedings Article

Efficient inverted lists and query algorithms for structured value ranking in update-intensive relational databases

G. Guo¹, Jayavel Shanmugasundaram¹, Kevin Scott Beyer², Eugene J. Shekita²

Cornell University¹, IBM²

05 Apr 2005

TL;DR: This work proposes a new family of inverted list indices and associated query algorithms that can support SVR efficiently in update-intensive databases, where the structured data values (and hence the scores of documents) change frequently.

Abstract: We propose a new ranking paradigm for relational databases called Structured Value Ranking (SVR). SVR uses structured data values to score (rank) the results of keyword search queries over text columns. Our main contribution is a new family of inverted list indices and associated query algorithms that can support SVR efficiently in update-intensive databases, where the structured data values (and hence the scores of documents) change frequently. Our experimental results on real and synthetic data sets using BerkeleyDB show that we can support SVR efficiently in relational databases.

13 citations

Patent

Generic architecture for indexing document groups in an inverted text index

Andrei Z. Broder¹, Marcus Fontoura, Michael Herscovici, Ronny Lempel, John Ai McPherson, Andreas Neumann, Runping Qi, Eugene J. Shekita

IBM¹

12 Jan 2005

TL;DR: A method for indexing a plurality of documents that includes duplicates first identifies one or more duplicate groups, then creates one content index per duplicate group instead of indexing the content of every document within the group.

Abstract: A method for indexing a plurality of documents, that includes a plurality of duplicate documents, first identifies one or more duplicate groups of documents from among the plurality of documents. Then, one index of content for the duplicate group is created instead of indexing the content from every document within the duplicate group. However, in contrast to the content index, an index of metadata for each of the documents in the duplicate group is created. Thus the content of each duplicate group is indexed only once, while a search engine using such indexing techniques retains the capability to answer queries as if the duplicated content was indexed for each document of the group.

12 citations
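
The idea in the abstract above (content indexed once per duplicate group, metadata indexed per document) can be illustrated with a minimal Python sketch. This is not the patented method: the function names `build_indexes` and `search`, the whitespace tokenizer, and the use of an exact-content SHA-1 hash as the duplicate detector are all assumptions made for illustration.

```python
from collections import defaultdict
import hashlib


def build_indexes(docs):
    """docs maps doc_id -> (content, metadata dict). All names here are hypothetical."""
    members = defaultdict(list)        # duplicate-group id -> doc ids in that group
    content_index = defaultdict(set)   # term -> duplicate-group ids
    metadata_index = defaultdict(set)  # (field, value) -> doc ids

    for doc_id, (content, metadata) in docs.items():
        # Assumption: an exact-content hash stands in for real duplicate detection.
        gid = hashlib.sha1(content.encode("utf-8")).hexdigest()
        if gid not in members:
            # First document of this group: index its content exactly once.
            for term in content.lower().split():
                content_index[term].add(gid)
        members[gid].append(doc_id)
        # Metadata is still indexed separately for every document.
        for field, value in metadata.items():
            metadata_index[(field, value)].add(doc_id)
    return content_index, metadata_index, members


def search(term, content_index, members, metadata_index=None, meta=None):
    """Expand matching groups back to documents, optionally filtered by metadata."""
    hits = set()
    for gid in content_index.get(term.lower(), ()):
        hits.update(members[gid])
    if meta is not None:
        hits &= metadata_index.get(meta, set())
    return sorted(hits)
```

In this sketch a content query is answered at the group level and then expanded to the group's members, so results look as if every duplicate had been indexed, while a metadata filter still distinguishes individual documents.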

Proceedings Article

Static score bucketing in inverted indexes

Chavdar Botev¹, Nadav Eiron², Marcus Fontoura², Ning Li², Eugene J. Shekita²

Cornell University¹, IBM²

31 Oct 2005

TL;DR: This paper shows that a new index organization based on static score bucketing significantly improves index build performance while having minimal impact on the quality of search results.

Abstract: Maintaining strict static score order of inverted lists is a heuristic used by search engines to improve the quality of query results when the entire inverted lists cannot be processed. This heuristic, however, increases the cost of index generation and requires complex index build algorithms. In this paper, we study a new index organization based on static score bucketing. We show that this new technique significantly improves index build performance while having minimal impact on the quality of search results.

1 citation
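
As a rough illustration of the bucketing idea in the abstract above, the following Python sketch orders postings by coarse score bucket rather than by strict static score. It is a sketch under stated assumptions, not the paper's index organization: the names `bucket_of`, `build_bucketed_index`, and `top_postings`, the equal-width buckets over scores in [0, 1], and the doc-id order within a bucket are all hypothetical choices.

```python
from collections import defaultdict


def bucket_of(score, num_buckets):
    """Map a static score in [0, 1] to a bucket; bucket 0 holds the highest scores."""
    b = int((1.0 - score) * num_buckets)
    return min(max(b, 0), num_buckets - 1)


def build_bucketed_index(docs, static_score, num_buckets=4):
    """docs maps doc_id -> text; static_score maps doc_id -> score in [0, 1]."""
    index = defaultdict(list)  # term -> list of (bucket, doc_id)
    for doc_id, text in docs.items():
        b = bucket_of(static_score[doc_id], num_buckets)
        for term in set(text.lower().split()):
            index[term].append((b, doc_id))
    for postings in index.values():
        # Order by bucket only; inside a bucket, fall back to doc id
        # instead of maintaining strict static score order.
        postings.sort()
    return index


def top_postings(index, term, limit):
    """Read only the first `limit` postings, as if processing stopped early."""
    return [doc_id for _, doc_id in index.get(term, [])[:limit]]
```

With scores `{1: 0.1, 2: 0.9, 3: 0.95}` and four buckets, documents 2 and 3 share the top bucket and come back in doc-id order even though 3 has the higher score, which is the kind of small ordering approximation the bucketed layout accepts in exchange for a simpler build.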