
Showing papers in "Journal on Data Semantics in 2009"


Book Chapter DOI
TL;DR: A reference reconciliation approach that combines a logical method called L2R with a numerical one called N2R, exploiting schema and data semantics translated into a set of Horn FOL reconciliation rules.
Abstract: The reference reconciliation problem consists in deciding whether different identifiers refer to the same data, i.e. correspond to the same real-world entity. In this article we present a reference reconciliation approach which combines a logical method for reference reconciliation called L2R and a numerical one called N2R. This approach exploits the schema and data semantics, which is translated into a set of Horn FOL rules of reconciliation. These rules are used in L2R to infer exact decisions of both reconciliation and non-reconciliation. In the second method, N2R, the semantics of the schema is translated into an informed similarity measure which is used by a numerical computation of the similarity of reference pairs. This similarity measure is expressed in a nonlinear equation system, which is solved using an iterative method. Experiments with the methods on two different domains show good results for both recall and precision. The two methods can be used separately or in combination; we show that their combination improves runtime performance.
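To make the numerical side concrete, here is a minimal Python sketch of the kind of iterative fixpoint computation N2R describes: reference-pair similarities depend on one another, and the system is iterated until the values stabilise. The aggregation function and data structures are illustrative assumptions, not the paper's actual equations.

```python
# Minimal sketch: iterative resolution of interdependent reference-pair
# similarities, in the spirit of N2R (illustrative, not the paper's equations).

def solve_similarities(pairs, deps, base_sim, max_iter=100, eps=1e-6):
    """pairs: pair ids; deps: pair -> pairs whose similarity influences it;
    base_sim: pair -> local attribute-level similarity."""
    sim = dict(base_sim)
    for _ in range(max_iter):
        new_sim = {}
        for p in pairs:
            strongest = max((sim[q] for q in deps.get(p, [])), default=0.0)
            # combine local attribute evidence with the strongest dependent pair
            new_sim[p] = max(base_sim[p], 0.5 * base_sim[p] + 0.5 * strongest)
        delta = max(abs(new_sim[p] - sim[p]) for p in pairs)
        sim = new_sim
        if delta < eps:  # fixpoint reached
            break
    return sim

pairs = ["(a1,b1)", "(a2,b2)"]
deps = {"(a1,b1)": ["(a2,b2)"], "(a2,b2)": ["(a1,b1)"]}
print(solve_similarities(pairs, deps, {"(a1,b1)": 0.9, "(a2,b2)": 0.4}))
```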

95 citations


Book Chapter DOI
TL;DR: The Semantic Data Warehouse is proposed as a repository of ontologies and semantically annotated data resources, together with an ontology-driven framework to design multidimensional analysis models for Semantic Data Warehouses.
Abstract: The Semantic Web enables organizations to attach semantic annotations taken from domain and application ontologies to the information they generate. The concepts in these ontologies could describe the facts, dimensions and categories implied in the analysis subjects of a data warehouse. In this paper we propose the Semantic Data Warehouse as a repository of ontologies and semantically annotated data resources. We also propose an ontology-driven framework to design multidimensional analysis models for Semantic Data Warehouses. This framework provides means for building a Multidimensional Integrated Ontology (MIO) including the classes, relationships and instances that represent interesting analysis dimensions, and it can also be used to check the properties required by current multidimensional databases (e.g., dimension orthogonality, category satisfiability, etc.). We also sketch how the instance data of a MIO can be translated into OLAP cubes for analysis purposes. Finally, some implementation issues of the overall framework are discussed.
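As a rough illustration of the kind of multidimensional properties the framework checks, the following sketch tests category satisfiability and a strict, covering roll-up on a toy dimension; the paper performs such checks by ontology reasoning, so this is only an analogy.

```python
# Toy checks of two multidimensional properties mentioned above
# (the paper verifies these over a Multidimensional Integrated Ontology).

members = {"City": ["Paris", "Lyon", "Berlin"], "Country": ["France", "Germany"]}
rollup = {"Paris": "France", "Lyon": "France", "Berlin": "Germany"}

def satisfiable(category):
    # category satisfiability: the category can have at least one instance
    return bool(members.get(category))

def strict_and_covering(child_cat, parent_cat):
    # every child member rolls up to exactly one member of the parent category
    # (a dict gives at most one parent; the check enforces at least one)
    return all(rollup.get(m) in members[parent_cat] for m in members[child_cat])

print(satisfiable("City"), strict_and_covering("City", "Country"))  # True True
```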

82 citations


Book Chapter DOI
TL;DR: An advanced method for on-demand construction of OLAP cubes for ROLAP systems that covers the steps from cube design to ETL but focuses on ETL, together with an ontology-based tool proposed to work as a user interface to the system from design to actual analysis.
Abstract: In this paper, we present an advanced method for on-demand construction of OLAP cubes for ROLAP systems. The method covers the steps from cube design to ETL but focuses on ETL. Actual data analysis can then be done using the tools and methods of the OLAP software at hand. The method is based on RDF/OWL ontologies and design tools. The ontology serves as a basis for designing and creating the OLAP schema, its corresponding database tables, and finally populating the database. Our starting point is heterogeneous and distributed data sources that are eventually used to populate the OLAP cubes. Mapping between the source data and its OLAP form is done by first converting the data to RDF using ontology maps. The data are then extracted from their RDF form by queries generated using the ontology of the OLAP schema. Finally, the extracted data are stored in the database tables and analysed using OLAP software. Algorithms and examples are provided for all these steps. In our tests, we have used an open-source OLAP implementation and a database server. The performance of the system was found satisfactory when testing with a data source of 450,000 RDF statements. We also propose an ontology-based tool that will work as a user interface to the system, from design to actual analysis.
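A compressed sketch of the extract-and-populate step described above, assuming the rdflib and sqlite3 Python libraries; the SPARQL query, URIs, and star-schema table are invented for illustration.

```python
# Sketch of the extract step: a SPARQL query over RDF source data populates
# a relational fact table (rdflib + sqlite3; schema and URIs are illustrative).
import sqlite3
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.sale1, EX.product, Literal("widget")))
g.add((EX.sale1, EX.amount, Literal(42)))

# query generated, in the paper's approach, from the OLAP schema ontology
rows = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?product ?amount WHERE { ?s ex:product ?product ; ex:amount ?amount . }
""")

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales_fact (product TEXT, amount INTEGER)")
db.executemany("INSERT INTO sales_fact VALUES (?, ?)",
               [(str(p), int(a)) for p, a in rows])
print(db.execute("SELECT product, SUM(amount) FROM sales_fact "
                 "GROUP BY product").fetchall())
```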

58 citations


Book Chapter DOI
TL;DR: This paper proposes a customizable and extensible ontology-driven approach for the conceptual design of ETL processes, with a graph-based representation serving as a conceptual model for the source and target data stores, and presents a method for devising flows of ETL operations by means of graph transformations.
Abstract: One of the main tasks during the early steps of a data warehouse project is the identification of the appropriate transformations and the specification of inter-schema mappings from the source to the target data stores. This is a challenging task, requiring firstly the semantic and secondly the structural reconciliation of the information provided by the available sources. This task is a part of the Extract-Transform-Load (ETL) process, which is responsible for the population of the data warehouse. In this paper, we propose a customizable and extensible ontology-driven approach for the conceptual design of ETL processes. A graph-based representation is used as a conceptual model for the source and target data stores. We then present a method for devising flows of ETL operations by means of graph transformations. In particular, the operations comprising the ETL process are derived through graph transformation rules, the choice and applicability of which are determined by the semantics of the data with respect to an attached domain ontology. Finally, we present our experimental findings that demonstrate the applicability of our approach.
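A toy rendition of a single graph-transformation rule of the kind described: when source and target attributes are annotated with the same ontology concept but different units, a conversion operation is inserted into the ETL flow. The rule and annotations are illustrative assumptions, not the paper's rule set.

```python
# Illustrative graph-transformation step: ontology annotations on source and
# target attributes determine which ETL operations the rule derives.

source = {"price": {"concept": "Price", "unit": "EUR"}}
target = {"price": {"concept": "Price", "unit": "USD"}}

def derive_etl_flow(source, target):
    ops = []
    for attr, s_ann in source.items():
        t_ann = target.get(attr)
        if t_ann and s_ann["concept"] == t_ann["concept"]:
            if s_ann["unit"] != t_ann["unit"]:
                # rule fires: same concept, mismatched units -> insert convert
                ops.append(("convert", attr, s_ann["unit"], t_ann["unit"]))
            ops.append(("load", attr))
    return ops

print(derive_etl_flow(source, target))
# [('convert', 'price', 'EUR', 'USD'), ('load', 'price')]
```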

42 citations


Book Chapter DOI
TL;DR: The problem of performing impact prediction for changes that occur in the schema/structure of the data warehouse sources is discussed, and rules are presented so that both the syntactic and semantic correctness of activities are retained.
Abstract: In this paper, we discuss the problem of performing impact prediction for changes that occur in the schema/structure of the data warehouse sources. We abstract Extract-Transform-Load (ETL) activities as queries and sequences of views. ETL activities and their sources are uniformly modeled as a graph that is annotated with policies for the management of evolution events. Given a change at an element of the graph, our method detects the parts of the graph that are affected by this change and highlights the way they are tuned to respond to it. For many cases of ETL source evolution, we present rules so that both the syntactic and semantic correctness of activities are retained. Finally, we evaluate our approach on real-world ETL workflows used in the Greek public sector.
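The propagation mechanism can be pictured with a short sketch: a change event is pushed through the annotated graph, and each node's policy decides whether the event is propagated, blocked, or flagged for manual handling. Node names and the policy vocabulary are invented for illustration.

```python
# Sketch of policy-annotated impact propagation over the activity graph.
from collections import deque

edges = {"source.tbl": ["view1"], "view1": ["activity1"], "activity1": []}
policy = {"view1": "propagate", "activity1": "prompt"}

def affected_by(change_node):
    seen, queue, report = set(), deque([change_node]), []
    while queue:
        n = queue.popleft()
        for succ in edges.get(n, []):
            if succ in seen:
                continue
            seen.add(succ)
            action = policy.get(succ, "propagate")
            report.append((succ, action))
            if action != "block":   # blocked nodes absorb the event
                queue.append(succ)
    return report

print(affected_by("source.tbl"))  # [('view1', 'propagate'), ('activity1', 'prompt')]
```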

40 citations


Book Chapter DOI
TL;DR: Using fuzzy DLs, the proposed reasoning framework captures the vagueness of the extracted image descriptions and accomplishes their semantic interpretation, while resolving inconsistencies arising from contradictory descriptions.
Abstract: Statistical learning approaches, bounded mainly to knowledge related to perceptual manifestations of semantics, fall short of adequately utilising the meaning and logical connotations pertaining to the extracted image semantics. Instigated by the Semantic Web, ontologies have appealed to a significant share of synergistic approaches towards the combined use of statistical learning and explicit semantics. While the relevant literature tends to disregard the uncertainty involved and treats the extracted image descriptions as coherent, two-valued propositions, this paper explores reasoning under uncertainty towards a more accurate and pragmatic handling of the underlying semantics. Using fuzzy DLs, the proposed reasoning framework captures the vagueness of the extracted image descriptions and accomplishes their semantic interpretation, while resolving inconsistencies arising from contradictory descriptions. To evaluate the proposed reasoning framework, an experimental implementation using the fuzzyDL Description Logic reasoner has been carried out. Experiments in the domain of outdoor images illustrate the added value, while outlining challenges to be further addressed.
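As a toy analogy of the fuzzy treatment (the paper uses the full fuzzyDL reasoner, not this code), image labels carry membership degrees, conjunction follows the Gödel t-norm, and a contradictory pair of labels is resolved in favour of the better-supported one.

```python
# Toy fuzzy interpretation of image labels: degrees in [0,1], min for
# conjunction, contradiction resolved by keeping the higher degree.

labels = {"sky": 0.8, "sea": 0.6}
disjoint = [("sky", "sea")]  # assumed disjointness axiom from the ontology

def conj(*degrees):          # Gödel t-norm
    return min(degrees)

def resolve(labels, disjoint):
    out = dict(labels)
    for a, b in disjoint:
        if a in out and b in out:        # contradictory description
            weaker = a if out[a] < out[b] else b
            del out[weaker]              # keep the better-supported label
    return out

print(conj(labels["sky"], 0.9), resolve(labels, disjoint))  # 0.8 {'sky': 0.8}
```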

30 citations


Book Chapter DOI
TL;DR: A novel approach to probabilistic description logic programs for the Semantic Web in which disjunctive logic programs under the answer set semantics are tightly coupled with description logics and Bayesian probabilities is presented.
Abstract: We present a novel approach to probabilistic description logic programs for the Semantic Web in which disjunctive logic programs under the answer set semantics are tightly coupled with description logics and Bayesian probabilities. The approach has several nice features. In particular, it is a logic-based representation formalism that naturally fits into the landscape of Semantic Web languages. Tightly coupled probabilistic description logic programs can especially be used for representing mappings between ontologies, which are a common way of approaching the semantic heterogeneity problem on the Semantic Web. In this application, they allow in particular for resolving inconsistencies and for merging mappings from different matchers based on the level of confidence assigned to different rules. Furthermore, tightly coupled probabilistic description logic programs also provide a natural integration of ontologies, action languages, and Bayesian probabilities towards Web Services. We explore the computational aspects of consistency checking and query processing in tightly coupled probabilistic description logic programs. We show that these problems are decidable and computable, respectively, and that they can be reduced to consistency checking and cautious/brave reasoning, respectively, in tightly coupled disjunctive description logic programs. Using these results, we also provide an anytime algorithm for tight query processing. Furthermore, we analyze the complexity of consistency checking and query processing in the new probabilistic description logic programs, and we present a special case of these problems with polynomial data complexity.
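One application mentioned above, merging mappings from different matchers by confidence, can be caricatured in a few lines; the propositional encoding and the probabilities below are purely illustrative assumptions, not the paper's formalism.

```python
# Illustrative confidence-weighted mapping rules: an inconsistent pair of
# mapping rules is resolved by discarding the less confident one.

rules = [  # (head, body, confidence) - simplified propositional view
    ("O2:Publication(x)", "O1:Article(x)", 0.9),
    ("O2:Person(x)", "O1:Article(x)", 0.2),   # spurious matcher output
]

def merge(rules, inconsistent_pairs):
    kept = list(rules)
    for h1, h2 in inconsistent_pairs:
        clash = [r for r in kept if r[0] in (h1, h2)]
        if len(clash) == 2:                   # keep the more confident rule
            kept.remove(min(clash, key=lambda r: r[2]))
    return kept

# O2:Publication and O2:Person are assumed disjoint in the target ontology
print(merge(rules, [("O2:Publication(x)", "O2:Person(x)")]))
```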

25 citations


Book Chapter DOI
TL;DR: A new ontology mapping algorithm called Semantic Coordinator (SECCO) that paves the way towards a comprehensive semantic P2P solution for content sharing and retrieval, semantic query answering and query routing; the advantages of integrating SECCO in the K-link+ system are reported.
Abstract: Ontology mapping is a mandatory requirement for enabling semantic interoperability among different agents and services relying on different ontologies. This aspect becomes more critical in Peer-to-Peer (P2P) networks for several reasons: (i) the number of different ontologies can dramatically increase; (ii) mappings among peer ontologies have to be discovered on the fly and only on the parts of the ontologies "contextual" to a specific interaction in which peers are involved; (iii) complex mapping strategies (e.g., structural mapping based on graph matching) cannot be exploited since peers are not aware of one another's ontologies. In order to address these issues, we developed a new ontology mapping algorithm called Semantic Coordinator (SECCO). SECCO is composed of three individual matchers: syntactic, lexical and contextual. The syntactic matcher discovers mappings by exploiting different kinds of linguistic information (e.g., comments, labels) encoded in ontology entities. The lexical matcher enables discovering mappings in a semantic way, since it "interprets" the semantic meaning of the concepts to be compared. The contextual matcher relies on a "how it fits" strategy, inspired by the contextual theory of meaning, and refines similarity values by taking into account the contexts in which the concepts to be compared are used. We show through experimental results that SECCO fulfills two important requirements: speed and accuracy (i.e., quality of mappings). Unlike other semantic P2P applications (e.g., Piazza, GridVine), which assume the preexistence of mappings for achieving semantic interoperability, SECCO focuses on the problem of finding mappings. Therefore, if coupled with a P2P platform, it paves the way towards a comprehensive semantic P2P solution for content sharing and retrieval, semantic query answering and query routing. We report on the advantages of integrating SECCO in the K-link+ system.
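A sketch of the three-matcher combination, with deliberately naive stand-ins (token overlap, a toy synonym table, context overlap) for SECCO's actual syntactic, lexical and contextual matchers.

```python
# Naive stand-ins for SECCO-style matchers, combined by a weighted average.

SYNONYMS = {"car": {"automobile"}, "automobile": {"car"}}

def syntactic(a, b):
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def lexical(a, b):
    # "interprets" meaning via a toy synonym table
    return 1.0 if a == b or b in SYNONYMS.get(a, set()) else 0.0

def contextual(ctx_a, ctx_b):
    # "how it fits": overlap of the contexts the concepts are used in
    return len(set(ctx_a) & set(ctx_b)) / max(len(set(ctx_a) | set(ctx_b)), 1)

def secco_like(a, b, ctx_a, ctx_b, w=(0.3, 0.4, 0.3)):
    return w[0]*syntactic(a, b) + w[1]*lexical(a, b) + w[2]*contextual(ctx_a, ctx_b)

print(secco_like("car", "automobile", ["engine", "wheel"], ["wheel", "driver"]))
```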

21 citations


Book Chapter DOI
TL;DR: MISM (Model Independent Schema Management), a platform for model management offering a set of operators to manipulate schemas in a manner that is both model-independent and model-aware, is proposed.
Abstract: Model management is a metadata-based approach to database problems aimed at supporting the productivity of developers by providing schema manipulation operators. Here we propose MISM (Model Independent Schema Management), a platform for model management offering a set of operators to manipulate schemas, in a manner that is both model-independent (in the sense that operators are generic and apply to schemas of different data models) and model-aware (in the sense that it is possible to say whether a schema is allowed for a data model). This is the first proposal for model management in this direction. We consider the main operators in model management: merge, diff, and modelgen. These operators play a major role in solving various problems related to schema evolution (such as data integration, data exchange or forward engineering), and we show in detail a solution to a major representative of the class, the round-trip engineering problem.
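The model-independent flavour of the operators can be conveyed by treating a schema as a set of generic constructs, so that diff and merge reduce to set operations; this mirrors the idea, not MISM's actual supermodel.

```python
# Minimal sketch: schemas as sets of generic constructs (kind, name), so
# model-independent diff and merge become set operations.

s1 = {("abstract", "Person"), ("lexical", "Person.name")}
s2 = {("abstract", "Person"), ("lexical", "Person.email")}

def diff(a, b):      # constructs of a absent from b
    return a - b

def merge(a, b):     # union of constructs; duplicates collapse
    return a | b

print(diff(s1, s2))            # {('lexical', 'Person.name')}
print(sorted(merge(s1, s2)))   # all three constructs, once each
```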

20 citations


Book Chapter DOI
TL;DR: This work proposes a metadata approach with semantic networks that reveals additional relevant artifacts the user might not have been aware of and applies contextual information to filter out results unrelated to the user's contexts, thus improving the precision of the search results.
Abstract: The discovery of relevant software artifacts can increase software reuse and reduce the cost of software development and maintenance. Furthermore, change requests, which are a leading cause of project failures, can be better classified and handled when all relevant artifacts are available to the decision makers. However, traditional full-text and similarity search techniques often fail to provide the full set of relevant documents because they do not take into consideration existing relationships between software artifacts. We propose a metadata approach with semantic networks which convey such relationships. Our approach reveals additional relevant artifacts that the user might not have been aware of. We also apply contextual information to filter out results unrelated to the user's contexts, thus improving the precision of the search results. Experimental results show that the combination of semantic networks and context significantly improves the precision and recall of the search results.
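A minimal sketch of the two-stage retrieval idea: full-text hits are expanded along typed relationships in the semantic network, then filtered by the user's working context. Artifact names, relation types and context tags are invented for illustration.

```python
# Sketch: expand seed hits over a semantic network, then context-filter.

network = {  # artifact -> [(relation, artifact)]
    "LoginSpec.doc": [("implemented_by", "login.py"), ("tested_by", "test_login.py")],
    "login.py": [("depends_on", "auth.py")],
}
context_tags = {"login.py": {"auth"}, "auth.py": {"auth"}, "test_login.py": {"qa"}}

def search(seed_hits, user_context, depth=2):
    results, frontier = set(seed_hits), set(seed_hits)
    for _ in range(depth):
        frontier = {t for a in frontier for _, t in network.get(a, [])}
        results |= frontier
    # keep artifacts whose tags overlap the user's working context
    return {a for a in results
            if context_tags.get(a, set()) & user_context or a in seed_hits}

print(search({"LoginSpec.doc"}, {"auth"}))
# {'LoginSpec.doc', 'login.py', 'auth.py'}  (test_login.py filtered out)
```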

18 citations


Book Chapter DOI
TL;DR: A software tool that visualizes the impact of schema evolution through the use of triggers and stored procedures is described, and a formalism for representing data warehouse schemas and determining the validity of schema evolution operators applied to a schema is contributed.
Abstract: Models for conceptual design of data warehouse schemas have been proposed, but few researchers have addressed schema evolution in a formal way and none have presented software tools for enforcing the correctness of multidimensional schema evolution operators. We generalize the core features typically found in data warehouse data models, along with modeling extended hierarchy semantics. The advanced features include multiple hierarchies, non-covering hierarchies, non-onto hierarchies, and non-strict hierarchies. We model the constructs in the Uni-level Description Language (ULD) as well as using a multilevel dictionary definition (MDD) approach. The ULD representation provides a formal foundation to specify transformation rules for the semantics of schema evolution operators. The MDD gives a basis for direct implementation in a relational database system; we define model constraints and then use the constraints to maintain integrity when schema evolution operators are applied. This paper contributes a formalism for representing data warehouse schemas and determining the validity of schema evolution operators applied to a schema. We describe a software tool that allows for visualization of the impact of schema evolution through the use of triggers and stored procedures.
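As a small illustration of operator validity checking (which the paper enforces inside the database via triggers and stored procedures), the sketch below rejects deleting an inner hierarchy level unless its members can roll up past it; the operator and constraint are simplified assumptions.

```python
# Illustrative validity check for one schema evolution operator: deleting a
# hierarchy level must not orphan the levels below it.

hierarchy = ["Day", "Month", "Year"]   # bottom-up levels of one dimension

def can_delete_level(level, has_rollup_bypass=False):
    i = hierarchy.index(level)
    inner = 0 < i < len(hierarchy) - 1
    # an inner level may only go if children can roll up past it
    return (not inner) or has_rollup_bypass

print(can_delete_level("Month"))                          # False
print(can_delete_level("Month", has_rollup_bypass=True))  # True
```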

Book Chapter DOI
TL;DR: Treating mapping discovery as example-driven search in a space of transformations, Tupelo generates queries encompassing the full range of structural and semantic heterogeneities encountered in relational data mapping; extensive empirical validation indicates that the system is both viable and effective.
Abstract: Automating the discovery of mappings between structured data sources is a long standing and important problem in data management. We discuss the rich history of the problem and the variety of technical solutions advanced in the database community over the previous four decades. Based on this discussion, we develop a basic statement of the data mapping problem and a general framework for reasoning about the design space of system solutions to the problem. We then concretely illustrate the framework with the Tupelo system for data mapping discovery, focusing on the important common case of relational data sources. Treating mapping discovery as example-driven search in a space of transformations, Tupelo generates queries encompassing the full range of structural and semantic heterogeneities encountered in relational data mapping. Hence, Tupelo is applicable in a wide range of data mapping scenarios. Finally, we present the results of extensive empirical validation, both on synthetic and real-world datasets, indicating that the system is both viable and effective.
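The search formulation can be sketched briefly: breadth-first search through a space of transformations until the source example rows reach the target example. The two operators here are toy stand-ins for Tupelo's transformation space.

```python
# Sketch of example-driven mapping discovery as BFS over transformations.
from collections import deque

def rename(rows):   # toy operator: fullname -> name
    return [{"name": r["fullname"]} if "fullname" in r else r for r in rows]

def upper(rows):    # toy operator: uppercase all values
    return [{k: v.upper() for k, v in r.items()} for r in rows]

OPS = {"rename": rename, "upper": upper}

def discover(source, target, max_depth=3):
    queue = deque([(source, [])])
    while queue:
        rows, path = queue.popleft()
        if rows == target:          # examples match: mapping found
            return path
        if len(path) < max_depth:
            for name, op in OPS.items():
                queue.append((op(rows), path + [name]))
    return None

print(discover([{"fullname": "ada"}], [{"name": "ADA"}]))  # ['rename', 'upper']
```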

Book Chapter DOI
TL;DR: This model is able to handle most of the hierarchies which have been suggested to take real situations into account and to characterize certain properties of summarizability; a complete development cycle of a multidimensional system is also proposed.
Abstract: Models for representing multidimensional systems usually consider that facts and dimensions are two different things. In this paper we propose a model based on UML which unifies the representations of facts and of dimension members. Since a given element can play the role of a fact or of a dimension member, this model allows for more flexibility in the design and the implementation of multidimensional systems. Moreover, this model offers the possibility to express various constraints to guarantee desirable properties for data. We then show that this model is able to handle most of the hierarchies which have been suggested to take real situations into account and to characterize certain properties of summarizability. Using this model we propose a complete development cycle of a multidimensional system. It appears that this cycle can be partially automated and that an end user can control the design and the implementation of the system.
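One summarizability property such a model can characterize is that a roll-up total is only safe when the parent mapping is covering; the toy example below (our construction, not the paper's) shows a non-covering roll-up silently dropping facts.

```python
# Toy summarizability check: a roll-up total is only reusable if every
# bottom-level member is covered by the parent mapping.

sales = {"Paris": 10, "Lyon": 5, "Online": 7}      # facts at the bottom level
parent = {"Paris": "France", "Lyon": "France"}     # "Online" has no parent

def rollup_total(region):
    return sum(v for city, v in sales.items() if parent.get(city) == region)

covering = all(city in parent for city in sales)
# France totals 15, but the grand total is 22: the non-covering
# hierarchy drops the "Online" facts from any per-country roll-up.
print(rollup_total("France"), "covering:", covering)   # 15 covering: False
```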

Book Chapter DOI
TL;DR: A formal intensional FOL is introduced by fusing Bealer's intensional algebraic FOL with a possible-world semantics from Montague's modal FOL approach to natural language, and an intensional equivalence relation between views of peer databases is defined.
Abstract: The meaning of concepts and views defined over a database ontology can be considered as intensional objects which have a particular extension in a given possible world: for instance, in the actual world. Thus, non-invasive mapping between completely independent peer databases in a P2P system can be naturally specified by the set of pairs of views which have the same meaning (intension) over two different peers. Such a kind of mapping has very different semantics from standard view-based mappings based on material implication, commonly used for Data Integration Systems. The introduction of an intensional equivalence generates the quotient intensional FOL fundamental for query answering in P2P systems. In this paper we introduce this formal intensional FOL by fusing Bealer's intensional algebraic FOL with a possible-world semantics from Montague's modal FOL approach to natural language. We modify Bealer's intensional algebra in order to deal with relational databases and views, by introducing the join operation of relational algebra. We then adopt the S5 Kripke frame in order to define an intensional equivalence relation between views of peer databases. Finally, we define an embedding of a P2P database system into this quotient intensional FOL, and the computation of its extensionalization mapping in the actual Montague world.
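As we read it, the central notion can be stated compactly; the following formulation is our paraphrase, with W the set of worlds of the S5 frame and ||v||_w the extension of view v in world w, not the paper's exact notation.

```latex
% Intensional equivalence of two peer views (paraphrase): v1 and v2 are
% equivalent iff they have the same extension in every possible world,
% not merely in the actual world w_0 as in standard view-based mappings.
\[
  v_1 \approx v_2 \;\iff\; \forall w \in \mathcal{W}:\; \|v_1\|_w = \|v_2\|_w
\]
```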

Book Chapter DOI
TL;DR: A new, coherent approach to worklist visualisation is described, via analysis and development of a resource-centric view of the worklist information, creating an effective mapping between a task and the capabilities of the resources.
Abstract: Although business process management has been a major area of ICT research, no coherent approach has been developed to address the problem of business process visualisation to aid workers in the process of task prioritisation. In this paper we describe the development of a new, coherent approach to worklist visualisation, via analysis and development of a resource-centric view of the worklist information. We use instances of generic resource types as workflow elements that may be considered by workers when interacting with worklists. We then propose a generic 2D framework for visualising the resources, creating an effective mapping between a task and the capabilities of the resources. This aims to aid the process of task selection and prioritisation by workers. A worklist visualisation system has been implemented as an extension to an open-source workflow system, YAWL (Yet Another Workflow Language).
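One plausible instantiation of the resource-centric 2D mapping (the axes and scoring below are our assumptions, not the paper's framework): x encodes how well the worker's capabilities match the task, y encodes urgency.

```python
# Assumed 2D placement for worklist items: (suitability, urgency).

tasks = [
    {"name": "approve order", "needs": {"finance"}, "due_in_h": 4},
    {"name": "fix typo", "needs": {"editing"}, "due_in_h": 48},
]
worker_caps = {"finance", "editing"}

def position(task, caps, horizon_h=72.0):
    suitability = len(task["needs"] & caps) / len(task["needs"])
    urgency = max(0.0, 1.0 - task["due_in_h"] / horizon_h)
    return (suitability, urgency)

for t in tasks:
    print(t["name"], position(t, worker_caps))
```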

Book Chapter DOI
TL;DR: This paper uses description logics to formalize the problem of query rewriting using views in the presence of value constraints, shows that the technique of query rewriting can be used to process queries under the certain answer semantics, and proposes a sound and complete query rewriting Bucket-like algorithm.
Abstract: In this paper, we investigate the problem of query rewriting using views in a hybrid language allowing nominals (i.e., individual names) to occur in intensional descriptions. Of particular interest, a restricted form of nominals where individual names refer to simple values enables the specification of value constraints, i.e., sets of allowed values for attributes. Such constraints are very useful in practice, enabling, for example, fine-grained descriptions of queries and views in integration systems, and can thus be exploited to reduce the query processing cost. We use description logics to formalize the problem of query rewriting using views in the presence of value constraints and show that the technique of query rewriting can be used to process queries under the certain answer semantics. We propose a sound and complete query rewriting Bucket-like algorithm. Data mining techniques have been used to favor scalability w.r.t. the number of views. Experiments on synthetic datasets have been conducted.
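A compact sketch of the Bucket idea extended with value constraints: a view enters a subgoal's bucket only if its allowed-value set intersects the query's, and candidate rewritings come from the cross product of buckets. The relation and constraint encodings are simplified assumptions.

```python
# Sketch of a Bucket-like step with value constraints on one attribute.
from itertools import product

views = {  # view -> (covered relation, allowed values for 'country')
    "v1": ("Customer", {"FR", "DE"}),
    "v2": ("Customer", {"US"}),
    "v3": ("Order", None),           # unconstrained
}
query = [("Customer", {"FR"}), ("Order", None)]   # subgoals with constraints

def bucket(subgoal):
    rel, vals = subgoal
    # a view is usable only if it covers the relation and its value
    # constraint intersects the query's (None means unconstrained)
    return [v for v, (r, c) in views.items()
            if r == rel and (c is None or vals is None or vals & c)]

rewritings = list(product(*(bucket(sg) for sg in query)))
print(rewritings)   # [('v1', 'v3')] - v2 pruned by its value constraint
```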

Book Chapter DOI
TL;DR: Conceptual modeling concepts to specify complex connected 3D objects are introduced, with their semantics defined using F-logic, a full-fledged logic following the object-oriented paradigm, which allows applying reasoners to check the consistency of the specifications and to investigate properties before the application is actually built.
Abstract: Virtual Reality (VR) allows creating interactive three-dimensional computer worlds in which objects have a sense of spatial and physical presence and can be manipulated by the user as such. Different software tools have been developed to build virtual worlds. However, most tools require considerable background knowledge about VR, and the virtual world needs to be expressed in low-level VR primitives. This is one of the reasons why developing a virtual world is complex, time-consuming and expensive. Introducing a conceptual design phase in the development process will reduce the complexity and provide an abstraction layer that hides the VR implementation details. However, virtual worlds contain features not present in classical software. Therefore, new modeling concepts, currently not available in classical conceptual modeling languages such as ORM or UML, are required. Next to introducing these new modeling concepts, it is also necessary to define their semantics to ensure unambiguousness and to allow code generation. In this paper, we introduce conceptual modeling concepts to specify complex connected 3D objects. Their semantics are defined using F-logic, a full-fledged logic following the object-oriented paradigm. F-logic allows applying reasoners to check the consistency of the specifications and to investigate properties before the application is actually built.
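A toy consistency check in the spirit of the connection concepts described (the paper specifies such semantics in F-logic; this Python stand-in only conveys the flavour): two objects may only be joined at connection points of compatible types.

```python
# Toy check: 3D objects declare typed connection points; a join is
# consistent only when the two points have the same connector type.

objects = {
    "table_top": {"connectors": {"underside": "flat-mount"}},
    "leg":       {"connectors": {"top": "flat-mount", "bottom": "floor"}},
}

def can_connect(obj_a, point_a, obj_b, point_b):
    ta = objects[obj_a]["connectors"].get(point_a)
    tb = objects[obj_b]["connectors"].get(point_b)
    return ta is not None and ta == tb

print(can_connect("table_top", "underside", "leg", "top"))     # True
print(can_connect("table_top", "underside", "leg", "bottom"))  # False
```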