scispace - formally typeset
Search or ask a question

Integration of Multidimensional Data in

01 Jan 2012-
About: The article was published on 2012-01-01 and is currently open access. It has received 4 citations till now.
Citations
More filters
Journal Article
TL;DR: On-Line Analytical Processing systems based on multidimensional databases are essential elements of decision support, but most existing data is stored in “ordinary” relational OLTP databases, i.e., data has to be (re-) modeled as multiddimensional cubes before the advantages of OLAP tools are available.
Abstract: On-Line Analytical Processing (OLAP) systems based on multidimensional databases are essential elements of decision support. However, most existing data is stored in ordinary relational OLTP databases, i.e., data has to be (re-) modeled as multidimensional cubes before the advantages of OLAP tools are available. In this paper we present an approach for the automatic construction of multidimensional OLAP database schemas from existing relational OLTP databases, enabling easy OLAP design and analysis for most existing data sources. This is achieved through a set of practical and effective algorithms for discovering multidimensional schemas from relational databases. The algorithms take a wide range of available metadata into account in the discovery process, including functional and inclusion dependencies, and key and cardinality information.

77 citations

Journal Article
TL;DR: A taxonomy of inaccurate summary factors and practical rules for handling them is presented, which could help designers and users of OLAP systems reduce unnecessary constraints caused by imposing rules to eliminate all summarizability violations and give designers a means to prioritize problems.
Abstract: Accurate summarizability is an important property in OLAP systems because inaccurate summaries can result in poor decisions. Furthermore, it is important to understand and identify the potential sources of inaccurate summaries. In this paper, we present a taxonomy of inaccurate summary factors and practical rules for handling them. We consolidate relevant terms and concepts in statistical databases with those in OLAP systems and explore factors that are important for measuring the impact of erroneous summaries. We discuss these issues from the perspectives of schema, data, and computation. This paper contributes to a comprehensive understanding of summarizability and its impact on decision-making. Our work could help designers and users of OLAP systems reduce unnecessary constraints caused by imposing rules to eliminate all summarizability violations and give designers a means to prioritize problems.

26 citations

Journal Article
TL;DR: In this paper, the authors present a procedure that enables the matching process for schema structures specific to the multidimensional model of data warehouses: facts, measures, dimensions, aggregation levels and dimensional attributes.
Abstract: A federated data warehouse is a logical integration of data warehouses applicable when physical integration is impossible due to privacy policy or legal restrictions. In order to enable the translation of queries in a federated approach, schemas of the federated and the local warehouses must be matched. In this paper we present a procedure that enables the matching process for schema structures specific to the multidimensional model of data warehouses: facts, measures, dimensions, aggregation levels and dimensional attributes. Similarities between warehouse-specific structures are computed by using linguistic and structural comparison, where calculated values are used to create necessary mappings. We present restriction rules and recommendations for aggregation level matching, which builds the most complex part of the process. A software implementation of the entire process is provided in order to perform its verification, as well as to determine the proper selection metric for mapping different multidimensional structures.

19 citations

Posted Content
TL;DR: In this paper, the deductive object manager ConceptBase is proposed to enrich the semantics of data warehouse solutions by including an explicit enterprise-centered concept of quality, and CoDecide, an Internet-based toolkit for the flexible visualization of multiple, interrelated data cubes.
Abstract: We show three interrelated tools intended to improve different aspects of the quality of data warehouse solutions. Firstly, the deductive object manager ConceptBase is intended to enrich the semantics of data warehouse solutions by including an explicit enterprise-centered concept of quality. The positive impact of precise multidimensional data models on the client interface is demonstrated by CoDecide, an Internet-based toolkit for the flexible visualization of multiple, interrelated data cubes. Finally, MIDAS is a hybrid data mining system which analyses multi-dimensional data to further enrich the semantics of the meta database, using a combination of neural network techniques, fuzzy logic and machine learning.

7 citations

References
More filters
Book
08 Sep 2000
TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Abstract: The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, it's still always evolving and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third of edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining stream, mining social networks, and mining spatial, multimedia and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges. * Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects. * Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields. *Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data

23,600 citations

Journal ArticleDOI
01 Dec 2001
TL;DR: A taxonomy is presented that distinguishes between schema-level and instance-level, element- level and structure- level, and language-based and constraint-based matchers and is intended to be useful when comparing different approaches to schema matching, when developing a new match algorithm, and when implementing a schema matching component.
Abstract: Schema matching is a basic problem in many database application domains, such as data integration, E-business, data warehousing, and semantic query processing. In current implementations, schema matching is typically performed manually, which has significant limitations. On the other hand, previous research papers have proposed many techniques to achieve a partial automation of the match operation for specific application domains. We present a taxonomy that covers many of these existing approaches, and we describe the approaches in some detail. In particular, we distinguish between schema-level and instance-level, element-level and structure-level, and language-based and constraint-based matchers. Based on our classification we review some previous match implementations thereby indicating which part of the solution space they cover. We intend our taxonomy and review of past work to be useful when comparing different approaches to schema matching, when developing a new match algorithm, and when implementing a schema matching component.

3,693 citations

Journal ArticleDOI
01 Mar 1997
TL;DR: An overview of data warehousing and OLAP technologies, with an emphasis on their new requirements, is provided, based on a tutorial presented at the VLDB Conference, 1996.
Abstract: Data warehousing and on-line analytical processing (OLAP) are essential elements of decision support, which has increasingly become a focus of the database industry. Many commercial products and services are now available, and all of the principal database management system vendors now have offerings in these areas. Decision support places some rather different requirements on database technology compared to traditional on-line transaction processing applications. This paper provides an overview of data warehousing and OLAP technologies, with an emphasis on their new requirements. We describe back end tools for extracting, cleaning and loading data into a data warehouse; multidimensional data models typical of OLAP; front end client tools for querying and data analysis; server extensions for efficient query processing; and tools for metadata management and for managing the warehouse. In addition to surveying the state of the art, this paper also identifies some promising research issues, some of which are related to problems that the database research community has worked on for years, but others are only just beginning to be addressed. This overview is based on a tutorial that the authors presented at the VLDB Conference, 1996.

2,835 citations

Book
01 Jan 1992
TL;DR: This Second Edition of Building the Data Warehouse is revised and expanded to include new techniques and applications of data warehouse technology and update existing topics to reflect the latest thinking.
Abstract: From the Publisher: The data warehouse solves the problem of getting information out of legacy systems quickly and efficiently. If designed and built right, data warehouses can provide significant freedom of access to data, thereby delivering enormous benefits to any organization. In this unique handbook, W. H. Inmon, "the father of the data warehouse," provides detailed discussion and analysis of all major issues related to the design and construction of the data warehouse, including granularity of data, partitioning data, metadata, lack of creditability of decision support systems (DSS) data, the system of record, migration and more. This Second Edition of Building the Data Warehouse is revised and expanded to include new techniques and applications of data warehouse technology and update existing topics to reflect the latest thinking. It includes a useful review checklist to help evaluate the effectiveness of the design.

2,820 citations

01 Jan 2006
TL;DR: There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linofi [BL99].
Abstract: The book Knowledge Discovery in Databases, edited by Piatetsky-Shapiro and Frawley [PSF91], is an early collection of research papers on knowledge discovery from data. The book Advances in Knowledge Discovery and Data Mining, edited by Fayyad, Piatetsky-Shapiro, Smyth, and Uthurusamy [FPSSe96], is a collection of later research results on knowledge discovery and data mining. There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linofi [BL99], Building Data Mining Applications for CRM by Berson, Smith, and Thearling [BST99], Data Mining: Practical Machine Learning Tools and Techniques by Witten and Frank [WF05], Principles of Data Mining (Adaptive Computation and Machine Learning) by Hand, Mannila, and Smyth [HMS01], The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman [HTF01], Data Mining: Introductory and Advanced Topics by Dunham, and Data Mining: Multimedia, Soft Computing, and Bioinformatics by Mitra and Acharya [MA03]. There are also books containing collections of papers on particular aspects of knowledge discovery, such as Machine Learning and Data Mining: Methods and Applications edited by Michalski, Brakto, and Kubat [MBK98], and Relational Data Mining edited by Dzeroski and Lavrac [De01], as well as many tutorial notes on data mining in major database, data mining and machine learning conferences.

2,591 citations