Showing papers by "Phokion G. Kolaitis" published in 2014

Open Access

Journal Article•DOI•

The Complexity of Mining Maximal Frequent Subgraphs

Benny Kimelfeld, Phokion G. Kolaitis¹•Institutions (1)

30 Dec 2014

TL;DR: A comprehensive investigation of the computational complexity of mining maximal frequent subgraphs, focusing on specific classes of connected graphs and establishing that the following problem is NP-complete: given two unlabeled trees, do they have more than one maximal subtree in common?

Abstract: A frequent subgraph of a given collection of graphs is a graph that is isomorphic to a subgraph of at least as many graphs in the collection as a given threshold. Frequent subgraphs generalize frequent itemsets and arise in various contexts, from bioinformatics to the Web. Since the space of frequent subgraphs is typically extremely large, research in graph mining has focused on special types of frequent subgraphs that can be orders of magnitude smaller in number, yet encapsulate the space of all frequent subgraphs. Maximal frequent subgraphs (i.e., the ones not properly contained in any frequent subgraph) constitute the most useful such type. In this article, we embark on a comprehensive investigation of the computational complexity of mining maximal frequent subgraphs. Our study is carried out by considering the effect of three different parameters: possible restrictions on the class of graphs; a fixed bound on the threshold; and a fixed bound on the number of desired answers. We focus on specific classes of connected graphs: general graphs, planar graphs, graphs of bounded degree, and graphs of bounded treewidth (trees being a special case). Moreover, each class has two variants: that in which the nodes are unlabeled, and that in which they are uniquely labeled. We delineate the complexity of the enumeration problem for each of these variants by determining when it is solvable in (total or incremental) polynomial time and when it is NP-hard. Specifically, for the labeled classes, we show that bounding the threshold yields tractability but, in most cases, bounding the number of answers does not, unless P=NP; an exception is the case of labeled trees, where bounding either of these two parameters yields tractability. The state of affairs turns out to be quite different for the unlabeled classes. 
The main (and most challenging to prove) result concerns unlabeled trees: we show NP-hardness, even if the input consists of two trees and both the threshold and the number of desired answers are equal to just two. In other words, we establish that the following problem is NP-complete: given two unlabeled trees, do they have more than one maximal subtree in common?
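
The definitions in this abstract can be made concrete with a small brute-force sketch. It is not the authors' algorithm, and it covers only the uniquely labeled case, where a label-preserving subgraph test reduces to edge-set containment. It assumes each graph is given as a set of edges over node labels, ignores single-node subgraphs, and enumerates candidates exhaustively, so it is exponential and only meant for tiny inputs. The function names (`edge`, `is_connected`, `frequent_subgraphs`, `maximal_frequent_subgraphs`) are hypothetical.

```python
from itertools import combinations


def edge(u, v):
    """An undirected edge between two uniquely labeled nodes."""
    return frozenset((u, v))


def is_connected(edges):
    """True if the non-empty edge set forms a single connected component."""
    edges = list(edges)
    if not edges:
        return False
    seen = set(edges[0])
    changed = True
    while changed:
        changed = False
        for e in edges:
            # Grow the reached node set along edges that touch it.
            if e & seen and not e <= seen:
                seen |= e
                changed = True
    return all(e <= seen for e in edges)


def frequent_subgraphs(graphs, threshold):
    """Connected edge sets contained in at least `threshold` input graphs."""
    all_edges = sorted(set().union(*graphs), key=sorted)
    result = []
    for k in range(1, len(all_edges) + 1):
        for combo in combinations(all_edges, k):
            cand = frozenset(combo)
            support = sum(1 for g in graphs if cand <= g)
            if support >= threshold and is_connected(cand):
                result.append(cand)
    return result


def maximal_frequent_subgraphs(graphs, threshold):
    """Frequent subgraphs not properly contained in another frequent subgraph."""
    freq = frequent_subgraphs(graphs, threshold)
    return [f for f in freq if not any(f < g for g in freq)]


# Assumed toy input: three small uniquely labeled graphs.
g1 = frozenset({edge("a", "b"), edge("b", "c"), edge("c", "d")})
g2 = frozenset({edge("a", "b"), edge("b", "c"), edge("d", "e")})
g3 = frozenset({edge("b", "c"), edge("c", "d")})
maximal = maximal_frequent_subgraphs([g1, g2, g3], threshold=2)
```

With threshold 2, the frequent subgraphs of this toy collection include the single edges a-b, b-c, and c-d, but only the two paths a-b-c and b-c-d are maximal. For unlabeled graphs the containment test would have to be a subgraph-isomorphism check, which this sketch does not attempt.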

25 citations

Proceedings Article•DOI•

Nested dependencies: structure and reasoning

Phokion G. Kolaitis¹, Reinhard Pichler², Emanuel Sallinger², Vadim Savenkov²•Institutions (2)

University of California, Santa Cruz¹, Vienna University of Technology²

18 Jun 2014

TL;DR: This paper focuses on the basic reasoning tasks, algorithmic problems, and structural properties of nested GLAV mappings, and shows that the following problem is also decidable: given a nested GLAV mapping, is it logically equivalent to a GLAV mapping?

Abstract: During the past decade, schema mappings have been extensively used in formalizing and studying such critical data interoperability tasks as data exchange and data integration. Much of the work has focused on GLAV mappings, i.e., schema mappings specified by source-to-target tuple-generating dependencies (s-t tgds), and on schema mappings specified by second-order tgds (SO tgds), which constitute the closure of GLAV mappings under composition. In addition, nested GLAV mappings have also been considered, i.e., schema mappings specified by nested tgds, which have expressive power intermediate between s-t tgds and SO tgds. Even though nested GLAV mappings have been used in data exchange systems, such as IBM's Clio, no systematic investigation of this class of schema mappings has been carried out so far. In this paper, we embark on such an investigation by focusing on the basic reasoning tasks, algorithmic problems, and structural properties of nested GLAV mappings. One of our main results is the decidability of the implication problem for nested tgds. We also analyze the structure of the core of universal solutions with respect to nested GLAV mappings and develop useful tools for telling apart SO tgds from nested tgds. By discovering deeper structural properties of nested GLAV mappings, we show that also the following problem is decidable: given a nested GLAV mapping, is it logically equivalent to a GLAV mapping?

21 citations

Book Chapter•DOI•

Exchange-Repairs: Managing Inconsistency in Data Exchange

Balder ten Cate¹, Richard L. Halpert¹, Phokion G. Kolaitis¹,²•Institutions (2)

University of California, Santa Cruz¹, IBM²

15 Sep 2014

TL;DR: This paper introduces and explores the notion of exchange-repairs for data exchange with target constraints, giving meaningful semantics in cases where a source instance has no solutions and the certain answers of every target query trivially evaluate to true.

Abstract: In a data exchange setting with target constraints, it is often the case that a given source instance has no solutions. Intuitively, this happens when data sources contain inconsistent or conflicting information that is exposed by the target constraints at hand. In such cases, the semantics of target queries trivialize, because the certain answers of every target query over the given source instance evaluate to “true”. The aim of this paper is to introduce and explore a new framework that gives meaningful semantics in such cases by using the notion of exchange-repairs. Informally, an exchange-repair of a source instance is another source instance that differs minimally from the first, but has a solution. In turn, exchange-repairs give rise to a natural notion of exchange-repair certain answers (in short, XR-certain answers) for target queries in the context of data exchange with target constraints.
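
The idea of an exchange-repair can be illustrated with a toy sketch under strong assumptions that are not taken from the paper: the mapping copies every source fact S(x, y) to a target fact T(x, y), the only target constraint is that x is a key of T, and "differs minimally" is read as "is a subset-maximal subinstance". Under these assumptions a source instance has a solution exactly when no x is paired with two different values. All function names are hypothetical.

```python
from itertools import combinations


def has_solution(source):
    """For the assumed copy mapping with a key on T, a solution exists
    iff no x value is paired with two different y values."""
    seen = {}
    for x, y in source:
        if seen.setdefault(x, y) != y:
            return False
    return True


def exchange_repairs(source):
    """Subset-maximal subinstances of `source` that have a solution."""
    source = frozenset(source)
    consistent = [
        frozenset(c)
        for k in range(len(source) + 1)
        for c in combinations(sorted(source), k)
        if has_solution(c)
    ]
    return [c for c in consistent if not any(c < d for d in consistent)]


def xr_certain_answers(source):
    """Answers to the query T(x, y) that hold in every exchange-repair.
    For the copy mapping, each repair's answers are just its own facts."""
    return frozenset.intersection(*exchange_repairs(source))


# Assumed toy source instance with conflicting facts for "ann".
source = {("ann", "nyc"), ("ann", "sf"), ("bob", "la")}
repairs = exchange_repairs(source)
answers = xr_certain_answers(source)
```

Here the source instance has no solution because "ann" violates the key. It has two exchange-repairs, one keeping each of the conflicting facts, and only the fact about "bob" survives in both, so it is the only XR-certain answer in this toy setting.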

8 citations

Book Chapter•DOI•

Schema Mappings: A Case of Logical Dynamics in Database Theory

Balder ten Cate¹, Phokion G. Kolaitis²•Institutions (2)

University of California, Santa Cruz¹, IBM²

01 Jan 2014

TL;DR: This chapter discusses a series of results concerning fundamental structural properties of schema mappings and shows that these structural properties can be used to obtain characterizations of various schema-mapping languages, in the spirit of abstract model theory.

Abstract: A schema mapping is a high-level specification of the structural relationships between two database schemas. This specification is expressed in a schema-mapping language, which is typically a fragment of first-order logic or second-order logic. Schema mappings have played an essential role in the study of important data-interoperability tasks, such as data integration and data exchange. In this chapter, we examine schema mappings as a case of logical dynamics in action. We provide a self-contained introduction to this area of research in the context of logic and databases, and focus on some of the concepts and results that may be of particular interest to the readers of this volume. After a basic introduction to schema mappings and schema-mapping languages, we discuss a series of results concerning fundamental structural properties of schema mappings. We then show that these structural properties can be used to obtain characterizations of various schema-mapping languages, in the spirit of abstract model theory. We conclude this chapter by highlighting the surprisingly subtle picture regarding compositions of schema mappings and the languages needed to express them.

2 citations