
Showing papers by "Eugene J. Shekita" published in 2008


Proceedings Article
11 Feb 2008
TL;DR: This paper extends traditional faceted search to support richer information discovery tasks over more complex data models. It adds flexible, dynamic business intelligence aggregations to the faceted application, giving users insight into their data that is far richer than just knowing the number of documents belonging to each facet.
Abstract: This paper extends traditional faceted search to support richer information discovery tasks over more complex data models. Our first extension adds flexible, dynamic business intelligence aggregations to the faceted application, enabling users to gain insight into their data that is far richer than just knowing the quantities of documents belonging to each facet. We see this capability as a step toward bringing OLAP capabilities, traditionally supported by databases over relational data, to the domain of free-text queries over metadata-rich content. Our second extension shows how one can efficiently extend a faceted search engine to support correlated facets - a more complex information model in which the values associated with a document across multiple facets are not independent. We show that by reducing the problem to a recently solved tree-indexing scenario, data with correlated facets can be efficiently indexed and retrieved.

174 citations
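
To make the first extension in the abstract above concrete, here is a minimal sketch of a faceted aggregation that reports an average alongside the usual per-facet document count. The corpus, the field names (`brand`, `price`), and the function `facet_aggregate` are hypothetical illustrations, not the paper's data model or implementation, and the sketch does not cover correlated facets.

```python
from collections import defaultdict

# Hypothetical toy corpus: free text, one facet ("brand"), one numeric measure ("price").
DOCS = [
    {"id": 1, "text": "red running shoes", "brand": "Acme", "price": 40.0},
    {"id": 2, "text": "blue running shoes", "brand": "Acme", "price": 60.0},
    {"id": 3, "text": "red hiking boots", "brand": "Bolt", "price": 90.0},
]


def facet_aggregate(docs, query_term, facet, measure):
    """Return {facet value: (count, average of measure)} for docs matching query_term."""
    totals = defaultdict(lambda: [0, 0.0])  # facet value -> [count, running sum]
    for doc in docs:
        # Free-text match: the query term must appear as a whole word in the text.
        if query_term in doc["text"].split():
            entry = totals[doc[facet]]
            entry[0] += 1
            entry[1] += doc[measure]
    return {value: (count, total / count) for value, (count, total) in totals.items()}


if __name__ == "__main__":
    print(facet_aggregate(DOCS, "red", "brand", "price"))
```

A plain faceted search would return only the counts; the aggregate per facet value is the kind of additional insight the abstract describes.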


Proceedings Article
Vuk Ercegovac, Vanja Josifovski, Ning Li, Mauricio Mediano, Eugene J. Shekita
26 Oct 2008
TL;DR: A novel self-optimizing query execution algorithm is described to efficiently join the sections of a document in the inverted index, showing that sections can dramatically improve overall system throughput on a mixed workload of updates and queries.
Abstract: Inverted indexes have become the standard indexing method for supporting search queries in a variety of content-based applications. Examples of such applications include enterprise document management, e-mail, web search, and social networks. One shortcoming in current inverted index designs is that they support only document-level updates, forcing a full document to be reindexed even if just part of it changes. This paper describes a new inverted index design that enables applications to break a document into semantically meaningful sub-documents or "sections". Each section of a document can be updated separately, but search queries can still work seamlessly across sections. Our index design is motivated by applications where there is metadata associated with each document that tends to be smaller and more frequently updated than the document's content, but at the same time, it is desirable to search the metadata and content with the same index structure. A novel self-optimizing query execution algorithm is described to efficiently join the sections of a document in the inverted index. Experimental results on TREC and patent data are provided, showing that sections can dramatically improve overall system throughput on a mixed workload of updates and queries.

53 citations
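
The section idea in the abstract above can be illustrated with a small in-memory sketch: postings are keyed by (document, section), so updating one section leaves the other sections' postings untouched, while a query still matches a document whose terms are spread across sections. The class `SectionedIndex` and its methods are hypothetical names chosen for illustration. The search here is a naive set intersection, not the self-optimizing join algorithm the paper describes.

```python
from collections import defaultdict


class SectionedIndex:
    """Toy inverted index whose postings point at (doc_id, section) pairs."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> {(doc_id, section)}
        self.section_terms = {}           # (doc_id, section) -> terms currently indexed

    def update_section(self, doc_id, section, text):
        """Reindex a single section; other sections of the document are not touched."""
        key = (doc_id, section)
        for term in self.section_terms.get(key, ()):
            self.postings[term].discard(key)
        terms = set(text.lower().split())
        self.section_terms[key] = terms
        for term in terms:
            self.postings[term].add(key)

    def search(self, *terms):
        """Return doc ids where every term occurs in at least one section."""
        doc_sets = [
            {doc_id for doc_id, _ in self.postings.get(term.lower(), ())}
            for term in terms
        ]
        return set.intersection(*doc_sets) if doc_sets else set()


if __name__ == "__main__":
    index = SectionedIndex()
    index.update_section(1, "content", "inverted index design")
    index.update_section(1, "metadata", "status draft")
    print(index.search("index", "draft"))
    index.update_section(1, "metadata", "status final")  # content is not reindexed
    print(index.search("index", "final"))
```

In this sketch a metadata change costs work proportional to the metadata section only, which is the update pattern the abstract says motivates the design.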