Author

G. De Giacomo

Bio: G. De Giacomo is an academic researcher from Sapienza University of Rome. The author has contributed to research in topics: Web query classification & Query optimization. The author has an h-index of 10 and has co-authored 14 publications receiving 786 citations.

Papers
Proceedings ArticleDOI
20 Aug 1998
TL;DR: The authors present a general architecture for information integration that explicitly includes a conceptual representation of the application and provide various arguments in favor of the conceptual level in the architecture and of automated reasoning over the conceptual representation.
Abstract: Information integration is one of the core problems in cooperative information systems. The authors argue that two critical factors for the design and maintenance of applications requiring information integration are conceptual modeling of the domain, and reasoning support over the conceptual representation. In particular they present a general architecture for information integration that explicitly includes a conceptual representation of the application. They illustrate how the architecture can express several integration settings and existing systems. They provide various arguments in favor of the conceptual level in the architecture and of automated reasoning over the conceptual representation. Finally, they present a specific proposal of an integration system which realizes the general architecture and is equipped with decidable reasoning procedures.

173 citations

Proceedings ArticleDOI
01 Feb 2000
TL;DR: This work is the first to exhibit decidability in cases where the language for expressing the query and the views allows for recursion, and it characterizes the data, expression, and combined complexity of the problem, showing that the proposed algorithms are essentially optimal.
Abstract: Query answering using views amounts to computing the answer to a query having information only on the extension of a set of views. This problem is relevant in several fields, such as information integration, data warehousing, query optimization, mobile computing, and maintaining physical data independence. We address query answering using views in a context where queries and views are regular path queries, i.e., regular expressions that denote the pairs of objects in the database connected by a matching path. Regular path queries are the basic query mechanism when the database is conceived as a graph, such as in semistructured data and data on the Web. We study algorithms for answering regular path queries using views under different assumptions, namely, closed and open domain, and sound, complete, and exact information on view extensions. We characterize data, expression, and combined complexity of the problem, showing that the proposed algorithms are essentially optimal. Our results are the first to exhibit decidability in cases where the language for expressing the query and the views allows for recursion.

141 citations
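
The abstract above describes regular path queries as regular expressions that denote the pairs of objects connected by a matching path. A minimal sketch of how such a query could be evaluated is given below. It is an illustration only, not the algorithm from the paper: it assumes the graph is a list of `(source, label, target)` triples and that the regular expression has already been compiled by hand into a small NFA. The names `eval_rpq`, `EDGES`, and `DELTA` are hypothetical.

```python
from collections import deque


def eval_rpq(edges, nfa_start, nfa_accept, nfa_delta):
    """Return all pairs (x, y) such that some path from x to y spells a
    word accepted by the NFA.

    edges: iterable of (source, label, target) triples (the graph database).
    nfa_delta: dict mapping (state, label) to the set of next states.
    """
    # Index outgoing edges by node and collect all nodes.
    out = {}
    nodes = set()
    for s, a, t in edges:
        out.setdefault(s, []).append((a, t))
        nodes.update((s, t))

    answers = set()
    for x in nodes:
        # Breadth-first search over the product of graph and automaton.
        seen = {(x, nfa_start)}
        queue = deque(seen)
        while queue:
            node, state = queue.popleft()
            if state in nfa_accept:
                answers.add((x, node))
            for label, nxt in out.get(node, []):
                for q in nfa_delta.get((state, label), ()):
                    if (nxt, q) not in seen:
                        seen.add((nxt, q))
                        queue.append((nxt, q))
    return answers


# Toy graph (assumed data) and a hand-built NFA for the query  child* ref
EDGES = [("a", "child", "b"), ("b", "child", "c"), ("c", "ref", "d")]
DELTA = {(0, "child"): {0}, (0, "ref"): {1}}

result = eval_rpq(EDGES, 0, {1}, DELTA)
```

Because the search tracks (node, automaton state) pairs rather than whole paths, it terminates even when the graph has cycles and the query uses the star operator.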

Journal ArticleDOI
01 Dec 2003
TL;DR: It is demonstrated that the basic services for reasoning about two way regular path queries are decidable, thus showing that the limited form of recursion expressible by these queries does not endanger the decidability of reasoning.
Abstract: Current information systems are required to deal with more complex data with respect to traditional relational data. The database community has already proposed abstractions for these kinds of data, in particular in terms of semistructured data models. A semistructured model conceives a database essentially as a finite directed labeled graph whose nodes represent objects, and whose edges represent relationships between objects. In the same way as conjunctive queries form the core of any query language for the relational model, regular path queries (RPQs) and their variants are considered the basic querying mechanisms for semistructured data. Besides the basic task of query answering, i.e., evaluating a query over a database, databases should support other reasoning services related to querying. One of the most important is query containment, i.e., verifying whether for all databases the answer to a query is a subset of the answer to a second query. Another important reasoning service that has received considerable attention in the recent years is view-based query processing, which amounts to processing queries based on a set of materialized views, rather than on the raw data in the database. The goal of this paper is to describe basic results and techniques concerning query containment and view based query processing for the class of two-way regular-path queries (which extend RPQs with the inverse operator). We will demonstrate that the basic services for reasoning about two way regular path queries are decidable, thus showing that the limited form of recursion expressible by these queries does not endanger the decidability of reasoning. Besides the specific results, our methods show the power of two-way automata in reasoning on complex queries.

98 citations

Proceedings ArticleDOI
26 Jun 2000
TL;DR: This work studies view based query processing for the case of regular-path queries, which are the basic querying mechanisms for the emergent field of semistructured data and presents two methods for computing PTIME rewritings of specific forms.
Abstract: View-based query processing requires answering a query posed to a database only on the basis of the information on a set of views, which are again queries over the same database. This problem is relevant in many aspects of database management, and has been addressed by means of two basic approaches: query rewriting and query answering. In the former approach, one tries to compute a rewriting of the query in terms of the views, whereas in the latter, one aims at directly answering the query based on the view extensions. We study view based query processing for the case of regular-path queries, which are the basic querying mechanisms for the emergent field of semistructured data. Based on recent results, we first show that a rewriting is in general a co-NP function with respect to the size of view extensions. Hence, the problem arises of characterizing which instances of the problem admit a rewriting that is PTIME. A second contribution of the work is to establish a tight connection between view based query answering and constraint satisfaction problems, which allows us to show that the above characterization is going to be difficult. As a third contribution, we present two methods for computing PTIME rewritings of specific forms. The first method, which is based on the established connection with constraint satisfaction problems, gives us rewritings expressed in Datalog with a fixed number of variables. The second method, based on automata-theoretic techniques, gives us rewritings that are formulated as unions of conjunctive regular-path queries with a fixed number of variables.

90 citations

Proceedings ArticleDOI
26 Aug 1998
TL;DR: This work presents a novel approach to conceptual modeling for source integration, which allows for suitably modeling the global concepts of the application, the individual information sources, and the constraints among different sources.
Abstract: Source integration is one of the core problems in data warehousing. Two critical factors for the design and maintenance of applications requiring source integration, and in particular data warehouse applications, are conceptual modeling of the domain, and reasoning support over the conceptual representation. We present a novel approach to conceptual modeling for source integration, which allows for suitably modeling the global concepts of the application, the individual information sources, and the constraints among different sources. Our methodological framework relies on the reasoning services associated with the modeling formalism to support an incremental source integration phase within the data warehouse design process.

87 citations


Cited by
Proceedings ArticleDOI
03 Jun 2002
TL;DR: The tutorial is focused on some of the theoretical issues that are relevant for data integration: modeling a data integration application, processing queries in data integration, dealing with inconsistent data sources, and reasoning on queries.
Abstract: Data integration is the problem of combining data residing at different sources, and providing the user with a unified view of these data. The problem of designing data integration systems is important in current real world applications, and is characterized by a number of issues that are interesting from a theoretical point of view. This document presents an overview of the material to be presented in a tutorial on data integration. The tutorial is focused on some of the theoretical issues that are relevant for data integration. Special attention will be devoted to the following aspects: modeling a data integration application, processing queries in data integration, dealing with inconsistent data sources, and reasoning on queries.

2,716 citations

Proceedings ArticleDOI
26 Apr 2010
TL;DR: This work models personalized recommendation of news articles as a contextual bandit problem, a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks.
Abstract: Personalized web services strive to adapt their services (advertisements, news articles, etc.) to individual users by making use of both content and user information. Despite a few recent advances, this problem remains challenging for at least two reasons. First, web service is featured with dynamically changing pools of content, rendering traditional collaborative filtering methods inapplicable. Second, the scale of most web services of practical interest calls for solutions that are both fast in learning and computation. In this work, we model personalized recommendation of news articles as a contextual bandit problem, a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks. The contributions of this work are three-fold. First, we propose a new, general contextual bandit algorithm that is computationally efficient and well motivated from learning theory. Second, we argue that any bandit algorithm can be reliably evaluated offline using previously recorded random traffic. Finally, using this offline evaluation method, we successfully applied our new algorithm to a Yahoo! Front Page Today Module dataset containing over 33 million events. Results showed a 12.5% click lift compared to a standard context-free bandit algorithm, and the advantage becomes even greater when data gets more scarce.

2,467 citations
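
The abstract above argues that a bandit algorithm can be evaluated offline using previously recorded random traffic. One way such an evaluation could be sketched is shown below. It is a simplified illustration under stated assumptions, not the paper's implementation: the logged arm in each event is assumed to have been chosen uniformly at random, and the `GreedyPolicy` class is a hypothetical context-free baseline introduced only to make the example runnable.

```python
def replay_evaluate(policy, logged_events):
    """Estimate a policy's average reward from randomly logged traffic.

    logged_events: iterable of (context, logged_arm, reward) triples.
    policy: object with choose(context) -> arm and update(context, arm, reward).
    """
    total_reward = 0.0
    matched = 0
    for context, logged_arm, reward in logged_events:
        if policy.choose(context) != logged_arm:
            continue  # the log holds no reward for the arm the policy wanted
        policy.update(context, logged_arm, reward)
        total_reward += reward
        matched += 1
    return total_reward / matched if matched else float("nan")


class GreedyPolicy:
    """Hypothetical context-free baseline: pick the best observed mean."""

    def __init__(self, arms):
        self.arms = list(arms)
        self.clicks = {a: 0.0 for a in self.arms}
        self.pulls = {a: 0 for a in self.arms}

    def choose(self, context):
        # Try each arm once, then take the highest empirical click rate.
        for a in self.arms:
            if self.pulls[a] == 0:
                return a
        return max(self.arms, key=lambda a: self.clicks[a] / self.pulls[a])

    def update(self, context, arm, reward):
        self.pulls[arm] += 1
        self.clicks[arm] += reward


# Tiny made-up log: (context, arm shown at random, click or no click)
LOG = [(None, "x", 1), (None, "x", 0), (None, "y", 0),
       (None, "y", 1), (None, "x", 0)]

estimate = replay_evaluate(GreedyPolicy(["x", "y"]), LOG)
```

Events where the policy disagrees with the log are discarded rather than counted as zero reward, so the estimate is an average over the matched events only.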

Book
18 Nov 2009
TL;DR: This introduction presents the main motivations for the development of Description Logics as a formalism for representing knowledge, as well as some important basic notions underlying all systems that have been created in the DL tradition.
Abstract: This introduction presents the main motivations for the development of Description Logics (DLs) as a formalism for representing knowledge, as well as some important basic notions underlying all systems that have been created in the DL tradition. In addition, we provide the reader with an overview of the entire book and some guidelines for reading it. We first address the relationship between Description Logics and earlier semantic network and frame systems, which represent the original heritage of the field. We delve into some of the key problems encountered with the older efforts. Subsequently, we introduce the basic features of DL languages and related reasoning techniques. DL languages are then viewed as the core of knowledge representation systems, considering both the structure of a DL knowledge base and its associated reasoning services. The development of some implemented knowledge representation systems based on Description Logics and the first applications built with such systems are then reviewed. Finally, we address the relationship of Description Logics to other fields of Computer Science. We also discuss some extensions of the basic representation language machinery; these include features proposed for incorporation in the formalism that originally arose in implemented systems, and features proposed to cope with the needs of certain application domains.

1,966 citations

Journal Article
TL;DR: This work classifies data quality problems that are addressed by data cleaning and provides an overview of the main solution approaches and discusses current tool support for data cleaning.
Abstract: We classify data quality problems that are addressed by data cleaning and provide an overview of the main solution approaches. Data cleaning is especially required when integrating heterogeneous data sources and should be addressed together with schema-related data transformations. In data warehouses, data cleaning is a major part of the so-called ETL process. We also discuss current tool support for data cleaning.

1,675 citations

Journal ArticleDOI
01 Dec 2001
TL;DR: This article surveys the state of the art on the problem of answering queries using views, describes the algorithms proposed to solve it, and synthesizes the disparate works into a coherent framework.
Abstract: The problem of answering queries using views is to find efficient methods of answering a query using a set of previously defined materialized views over the database, rather than accessing the database relations. The problem has recently received significant attention because of its relevance to a wide variety of data management problems. In query optimization, finding a rewriting of a query using a set of materialized views can yield a more efficient query execution plan. To support the separation of the logical and physical views of data, a storage schema can be described using views over the logical schema. As a result, finding a query execution plan that accesses the storage amounts to solving the problem of answering queries using views. Finally, the problem arises in data integration systems, where data sources can be described as precomputed views over a mediated schema. This article surveys the state of the art on the problem of answering queries using views, and synthesizes the disparate works into a coherent framework. We describe the different applications of the problem, the algorithms proposed to solve it and the relevant theoretical results.

1,642 citations