Showing papers by "Jayant Madhavan" published in 2011


Journal ArticleDOI
01 Jun 2011
TL;DR: A system that attempts to recover the semantics of tables by enriching them with additional annotations, leveraging a database of class labels and relationships automatically extracted from the Web.
Abstract: The Web offers a corpus of over 100 million tables [6], but the meaning of each table is rarely explicit from the table itself. Header rows exist in few cases and even when they do, the attribute names are typically useless. We describe a system that attempts to recover the semantics of tables by enriching the table with additional annotations. Our annotations facilitate operations such as searching for tables and finding related tables. To recover semantics of tables, we leverage a database of class labels and relationships automatically extracted from the Web. The database of classes and relationships has very wide coverage, but is also noisy. We attach a class label to a column if a sufficient number of the values in the column are identified with that label in the database of class labels, and analogously for binary relationships. We describe a formal model for reasoning about when we have seen sufficient evidence for a label, and show that it performs substantially better than a simple majority scheme. We describe a set of experiments that illustrate the utility of the recovered semantics for table search and show that it performs substantially better than previous approaches. In addition, we characterize what fraction of tables on the Web can be annotated using our approach.

381 citations
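
The column-labeling idea in the abstract above can be illustrated with a minimal sketch of the simple majority baseline it mentions. This is not the paper's formal evidence model; the function name `label_column`, the toy `class_db` dictionary, and the `min_fraction` threshold are all assumptions made for illustration.

```python
from collections import Counter

def label_column(values, class_db, min_fraction=0.5):
    """Return class labels supported by more than min_fraction of the values.

    class_db is a hypothetical stand-in for a database of class labels:
    a dict mapping a normalized value to an iterable of labels.
    """
    if not values:
        return []
    counts = Counter()
    for value in values:
        key = value.strip().lower()
        # Each cell votes at most once per label.
        for label in set(class_db.get(key, ())):
            counts[label] += 1
    n = len(values)
    return sorted(label for label, c in counts.items() if c / n > min_fraction)


# Toy data, invented for this example only.
toy_db = {
    "paris": ["city", "capital"],
    "lyon": ["city"],
    "berlin": ["city", "capital"],
    "apple": ["company", "fruit"],
}
print(label_column(["Paris", "Lyon", "Berlin", "Apple"], toy_db))
```

A fixed threshold like this is sensitive to noise in the label database, which is the weakness the abstract's formal model is said to address.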


Journal ArticleDOI
01 Aug 2011
TL;DR: A retrospective on the 2001 VLDB paper that proposed treating generic schema matching as an independent problem; it briefly summarizes the new techniques developed since then, their applications in the commercial world, and future trends.
Abstract: In a paper published in the 2001 VLDB Conference, we proposed treating generic schema matching as an independent problem. We developed a taxonomy of existing techniques, a new schema matching algorithm, and an approach to comparative evaluation. Since then, the field has grown into a major research topic. We briefly summarize the new techniques that have been developed and applications of the techniques in the commercial world. We conclude by discussing future trends and recommendations for further work.

288 citations


Journal ArticleDOI
TL;DR: Fusion Tables is described, a recently launched data-management service that lets users create and visualize structured data easily and emphasizes the ability to collaborate with other data owners.
Abstract: Google's Web Tables and Deep Web Crawler identify and deliver this otherwise inaccessible resource directly to end users.

109 citations


Patent
Hector Gonzalez1, Jayant Madhavan1, Andrin von Richenberg1, Anno Langen1, Alon Halevy1 
06 Jun 2011
TL;DR: A service for data management and integration across a wide range of applications is described, in which data files are maintained as tables and a composite table of related information is created and maintained in response to user queries.
Abstract: Aspects of the invention provide a service for data management and integration across a wide range of applications. Clustered computers may be arranged in a cloud-type configuration for storing and handling large amounts of user data under the control of a front-end management server. Communities of distributed users may collaborate on the data across multiple enterprises. Very large tabular data files are uploaded to the storage facilities. The data files are maintained as tables, and a composite table of related information is created and maintained in response to user queries. Different ways of visualizing the data are provided. Depending on the amount of information that can be displayed, features in a spatial index may be thinned for presentation. Spatial and structured queries are processed and results are intersected to obtain information for display.

52 citations



Journal ArticleDOI
Fei Wu1, Jayant Madhavan1, Alon Halevy1
TL;DR: The ASPECTOR system, which computes aspects for a given query, is described; it combines two sources of information to compute aspects that are orthogonal to each other and have high combined coverage.
Abstract: Many web-search queries serve as the beginning of an exploration of an unknown space of information, rather than looking for a specific web page. To answer such queries effectively, the search engine should attempt to organize the space of relevant information in a way that facilitates exploration. We describe the ASPECTOR system that computes aspects for a given query. Each aspect is a set of search queries that together represent a distinct information need relevant to the original search query. To serve as an effective means to explore the space, ASPECTOR computes aspects that are orthogonal to each other and have high combined coverage. ASPECTOR combines two sources of information to compute aspects. We discover candidate aspects by analyzing query logs, and cluster them to eliminate redundancies. We then use a mass-collaboration knowledge base (e.g., Wikipedia) to compute candidate aspects for queries that occur less frequently and to group together aspects that are likely to be "semantically" related. We present a user study that indicates that the aspects we compute are rated favorably against three competing alternatives: related searches proposed by Google, cluster labels assigned by the Clusty search engine, and navigational searches proposed by Bing.

19 citations
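
The goal stated in the abstract above, aspects that are orthogonal to each other and have high combined coverage, can be illustrated with a small greedy sketch. This is a generic stand-in and not ASPECTOR's actual algorithm: the function names, the Jaccard overlap measure, the `max_overlap` threshold, and the toy candidate aspects are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard overlap between two sets of queries."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def select_aspects(candidates, k, max_overlap=0.5):
    """Greedily pick up to k aspects with high coverage and low mutual overlap.

    candidates maps a (hypothetical) aspect name to its set of queries.
    """
    selected = []
    covered = set()
    remaining = dict(candidates)
    while remaining and len(selected) < k:
        best_name, best_gain = None, 0
        for name in sorted(remaining):  # sorted for deterministic tie-breaking
            queries = remaining[name]
            # Skip candidates that overlap too much with a chosen aspect.
            if any(jaccard(queries, candidates[s]) > max_overlap for s in selected):
                continue
            gain = len(queries - covered)  # newly covered queries
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:
            break  # nothing left adds coverage without redundancy
        selected.append(best_name)
        covered |= remaining.pop(best_name)
    return selected


# Toy candidate aspects, invented for this example only.
toy = {
    "hotels": {"vietnam hotels", "hanoi hotels", "saigon hotels"},
    "lodging": {"vietnam hotels", "hanoi hotels"},
    "food": {"vietnam food", "pho recipe"},
}
print(select_aspects(toy, k=3))
```

In this toy run, "lodging" is dropped because it largely duplicates "hotels", mirroring the redundancy elimination the abstract describes only at a high level.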