Showing papers by "Jayant Madhavan" published in 2011

Open Access

Journal Article

Recovering semantics of tables on the web

Petros Venetis¹, Alon Halevy², Jayant Madhavan², Marius Pasca², Warren Shen², Fei Wu², Gengxin Miao³, Chung Wu²•Institutions (3)

Stanford University¹, Google², University of California, Santa Barbara³

01 Jun 2011

TL;DR: A system that attempts to recover the semantics of tables by enriching them with additional annotations, leveraging a database of class labels and relationships automatically extracted from the Web.

Abstract: The Web offers a corpus of over 100 million tables [6], but the meaning of each table is rarely explicit from the table itself. Header rows exist in few cases and even when they do, the attribute names are typically useless. We describe a system that attempts to recover the semantics of tables by enriching the table with additional annotations. Our annotations facilitate operations such as searching for tables and finding related tables. To recover semantics of tables, we leverage a database of class labels and relationships automatically extracted from the Web. The database of classes and relationships has very wide coverage, but is also noisy. We attach a class label to a column if a sufficient number of the values in the column are identified with that label in the database of class labels, and analogously for binary relationships. We describe a formal model for reasoning about when we have seen sufficient evidence for a label, and show that it performs substantially better than a simple majority scheme. We describe a set of experiments that illustrate the utility of the recovered semantics for table search and show that it performs substantially better than previous approaches. In addition, we characterize what fraction of tables on the Web can be annotated using our approach.
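The "simple majority scheme" that the abstract uses as a baseline can be sketched roughly as follows. This is only an illustration, not the paper's formal model: the function name `label_column`, the `min_fraction` threshold, and the toy `class_db` dictionary are all assumptions introduced here.

```python
from collections import Counter


def label_column(values, class_db, min_fraction=0.5):
    """Return labels attached to more than min_fraction of the column values.

    class_db is a hypothetical stand-in for the database of class labels:
    a dict mapping a normalized cell value to the set of labels it carries.
    """
    counts = Counter()
    for value in values:
        # Normalize the cell text; each label counts at most once per cell.
        for label in set(class_db.get(value.strip().lower(), ())):
            counts[label] += 1
    threshold = min_fraction * len(values)
    return sorted(label for label, n in counts.items() if n > threshold)


# Toy label database (invented for illustration, deliberately noisy).
class_db = {
    "paris": {"city", "capital"},
    "berlin": {"city", "capital"},
    "munich": {"city"},
    "amazon": {"company", "river"},
}
column = ["Paris", "Berlin", "Munich", "Amazon"]
labels = label_column(column, class_db)
print(labels)
```

With these toy inputs only the label carried by three of the four cells clears the default threshold; lowering `min_fraction` admits weaker labels, which is where a noisy database would start to hurt a plain majority vote.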

381 citations

Journal Article

Generic schema matching, ten years later

Philip A. Bernstein¹, Jayant Madhavan², Erhard Rahm³•Institutions (3)

Microsoft¹, Google², Leipzig University³

01 Aug 2011

TL;DR: Revisits the authors' 2001 VLDB paper, which developed a taxonomy of existing techniques, a new schema matching algorithm, and an approach to comparative evaluation, and summarizes the new techniques developed since then and their applications in the commercial world.

Abstract: In a paper published in the 2001 VLDB Conference, we proposed treating generic schema matching as an independent problem. We developed a taxonomy of existing techniques, a new schema matching algorithm, and an approach to comparative evaluation. Since then, the field has grown into a major research topic. We briefly summarize the new techniques that have been developed and applications of the techniques in the commercial world. We conclude by discussing future trends and recommendations for further work.

288 citations

Journal Article

Structured data on the web

Michael Cafarella¹, Alon Halevy², Jayant Madhavan²•Institutions (2)

University of Michigan¹, Google²

01 Feb 2011-Communications of the ACM

TL;DR: Describes Fusion Tables, a recently launched data-management service that lets users create and visualize structured data easily and that emphasizes the ability to collaborate with other data owners.

Abstract: Google's Web Tables and Deep Web Crawler identify and deliver this otherwise inaccessible resource directly to end users.

109 citations

Patent

Scalable rendering of large spatial databases

Hector Gonzalez¹, Jayant Madhavan¹, Andrin von Richenberg¹, Anno Langen¹, Alon Halevy¹•Institutions (1)

Google¹

06 Jun 2011

TL;DR: Describes a service for data management and integration across a wide range of applications, in which data files are maintained as tables and a composite table of related information is created and maintained in response to user queries.

Abstract: Aspects of the invention provide a service for data management and integration across a wide range of applications. Clustered computers may be arranged in a cloud-type configuration for storing and handling large amounts of user data under the control of a front-end management server. Communities of distributed users may collaborate on the data across multiple enterprises. Very large tabular data files are uploaded to the storage facilities. The data files are maintained as tables, and a composite table of related information is created and maintained in response to user queries. Different ways of visualizing the data are provided. Depending on the amount of information that can be displayed, features in a spatial index may be thinned for presentation. Spatial and structured queries are processed and results are intersected to obtain information for display.
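The last two steps in the abstract, intersecting spatial and structured query results and thinning features to fit the display, might look roughly like the sketch below. It is not the patented method: the function `features_to_display`, the use of integer row IDs, and the evenly spaced sampling used for thinning are all assumptions made here for illustration.

```python
def features_to_display(spatial_hits, structured_hits, max_features):
    """Intersect two result sets of row IDs, then thin to at most max_features.

    Thinning here is naive evenly spaced sampling over the sorted IDs; a real
    system would thin inside the spatial index itself.
    """
    if max_features <= 0:
        return []
    # Keep only the rows that satisfy both the spatial and structured query.
    matching = sorted(set(spatial_hits) & set(structured_hits))
    if len(matching) <= max_features:
        return matching
    # Pick max_features entries spread evenly across the matching rows.
    step = len(matching) / max_features
    return [matching[int(i * step)] for i in range(max_features)]


# Hypothetical row IDs returned by each query.
spatial_hits = [1, 2, 3, 4, 5, 6, 7, 8]
structured_hits = [2, 4, 6, 8, 10]
shown = features_to_display(spatial_hits, structured_hits, max_features=2)
print(shown)
```

Intersecting before thinning means the display budget is spent only on rows that satisfy both queries.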

52 citations

Patent

Table search using recovered semantic information

Jayant Madhavan, Chung M. Wu, Alon Halevy, Gengxin Miao, Marius Pasca, Warren Shen

08 Jul 2011

23 citations

Journal Article

Identifying aspects for web-search queries

Fei Wu¹, Jayant Madhavan¹, Alon Halevy¹•Institutions (1)

Google¹

01 Jan 2011-Journal of Artificial Intelligence Research

TL;DR: Describes the ASPECTOR system, which computes aspects for a given query by combining two sources of information, aiming for aspects that are orthogonal to each other and have high combined coverage.

Abstract: Many web-search queries serve as the beginning of an exploration of an unknown space of information, rather than looking for a specific web page. To answer such queries effectively, the search engine should attempt to organize the space of relevant information in a way that facilitates exploration. We describe the ASPECTOR system that computes aspects for a given query. Each aspect is a set of search queries that together represent a distinct information need relevant to the original search query. To serve as an effective means to explore the space, ASPECTOR computes aspects that are orthogonal to each other and have high combined coverage. ASPECTOR combines two sources of information to compute aspects. We discover candidate aspects by analyzing query logs, and cluster them to eliminate redundancies. We then use a mass-collaboration knowledge base (e.g., Wikipedia) to compute candidate aspects for queries that occur less frequently and to group together aspects that are likely to be "semantically" related. We present a user study that indicates that the aspects we compute are rated favorably against three competing alternatives: related searches proposed by Google, cluster labels assigned by the Clusty search engine, and navigational searches proposed by Bing.

19 citations