
Showing papers on "Online analytical processing" published in 2006


Proceedings ArticleDOI
Martin Wattenberg
22 Apr 2006
TL;DR: This paper introduces PivotGraph, a software tool that uses a new technique for visualizing and analyzing graph structures designed specifically for graphs that are "multivariate," i.e., where each node is associated with several attributes.
Abstract: This paper introduces PivotGraph, a software tool that uses a new technique for visualizing and analyzing graph structures. The technique is designed specifically for graphs that are "multivariate," i.e., where each node is associated with several attributes. Unlike visualizations which emphasize global graph topology, PivotGraph uses a simple grid-based approach to focus on the relationship between node attributes and connections. The interaction technique is derived from an analogy with methods seen in spreadsheet pivot tables and in online analytical processing (OLAP). Finally, several examples are presented in which PivotGraph was applied to real-world data sets.

250 citations
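
PivotGraph's core operation, roll-up onto node attributes, is easy to sketch. The following is a minimal Python illustration under our own data layout (nodes as attribute dicts, edges as weighted pairs; the names are ours, not Wattenberg's): nodes sharing an attribute value collapse into one grid cell, and the weights of edges running between cells are summed, much as a pivot table aggregates rows.

    from collections import defaultdict

    def pivot_rollup(nodes, edges, attr):
        # Collapse nodes sharing an attribute value into one cell and
        # sum the weights of all edges running between cells.
        cell_of = {n: attrs[attr] for n, attrs in nodes.items()}
        weights = defaultdict(float)
        for u, v, w in edges:
            weights[(cell_of[u], cell_of[v])] += w
        return dict(weights)

    nodes = {"ann": {"gender": "F"}, "bob": {"gender": "M"},
             "eve": {"gender": "F"}}
    edges = [("ann", "bob", 1.0), ("eve", "bob", 2.0), ("ann", "eve", 1.0)]
    print(pivot_rollup(nodes, edges, "gender"))
    # {('F', 'M'): 3.0, ('F', 'F'): 1.0}

Rolling up on a pair of attributes instead of one yields the two-dimensional grid the paper visualizes.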


Journal ArticleDOI
01 Nov 2006
TL;DR: This paper formally defines the MultiDimER model, a conceptual multidimensional model that represents facts with measures as well as the different kinds of hierarchies classified in the authors' previous work.
Abstract: Hierarchies are used in data warehouses (DWs) and on-line analytical processing (OLAP) systems to see data at different levels of detail. However, many kinds of hierarchies arising in real-world situations are not addressed by current OLAP systems. Further, there is still no agreement on a conceptual model for DW and OLAP design that offers both a graphical representation and a formal definition. In this paper, we formally define the MultiDimER model, a conceptual multidimensional model that allows the representation of facts with measures as well as the different kinds of hierarchies already classified in our previous work [E. Malinowski, E. Zimanyi, OLAP hierarchies: a conceptual perspective, in: Proceedings of the 16th International Conference on Advanced Information Systems Engineering, 2004, pp. 477-491]. We also present the mapping of such hierarchies to the relational model, as well as their implementation in commercial DW products.

206 citations


Book
01 Jan 2006
TL;DR: This book surveys data mining principles, trends, and applications, covering data mining in business, banking, commercial applications, and insurance, as well as major and privacy issues in data mining and knowledge discovery, and active data mining.
Abstract: Contents: Introduction to Data Mining Principles- Data Warehousing, Data Mining, and OLAP- Data Marts and Data Warehouse- Evolution and Scaling of Data Mining Algorithms- Emerging Trends and Applications of Data Mining- Data Mining Trends and Knowledge Discovery- Data Mining Tasks, Techniques, and Applications- Data Mining: an Introduction - Case Study- Data Mining & KDD- Statistical Themes and Lessons for Data Mining- Theoretical Frameworks for Data Mining- Major and Privacy Issues in Data Mining and Knowledge Discovery- Active Data Mining- Decomposition in Data Mining - A Case Study- Data Mining System Products and Research Prototypes- Data Mining in Customer Value and Customer Relationship Management- Data Mining in Business- Data Mining in Sales Marketing and Finance- Banking and Commercial Applications- Data Mining for Insurance- Data Mining in Biomedicine and Science- Text and Web Mining- Data Mining in Information Analysis and Delivery- Data Mining in Telecommunications and Control- Data Mining in Security

206 citations


Patent
28 Aug 2006
TL;DR: In this paper, a system for capturing the context of, and translating, or mapping, from data in an originating database presentation, or in an originating format, to data in a target database presentation or target format is presented.
Abstract: A system for capturing the context of, and translating, or mapping, from data in an originating database presentation, or in an originating format, to data in a target database presentation, or target format. The translation uses the context of the originating report/query in terms of the originating database as a basis for the translation. The originating context is translated to the target context and is used to accurately map data from one presentation to another. By using a context and a translation map, which define specifics of the translation between contexts against different data sources, the invention is able to achieve a mapping engine that can efficiently map data between databases of different types. The translation map includes rules set automatically by the system, or set by a human administrator. The rules permit special treatment of different mapping scenarios. For example, specified types of mappings can be prevented so that selected users will be denied the ability to access restricted target information. Member exceptions are used that permit mapping between different data models, as, for example, where rows or columns in an originating data source (e.g. OLAP) are not present in a target data source (e.g. Relational). Other aspects of the invention include using supplemental member translations, translating items in an OLAP level to more than one translation object, delegating data items in cases where there is little or no correspondence between data models, translating a data item to a plurality of data items, translating a data item to a range, and additional aspects. An administrator interface is provided to create, modify, monitor and manage a mapping system.

179 citations


Journal ArticleDOI
TL;DR: This paper presents a multidimensional conceptual Object-Oriented model for Data Warehousing and OLAP tools, its structures, integrity constraints and query operations, and investigates the representation of several semantically related star schemas.

176 citations


Book
30 Oct 2006
TL;DR: This text provides theoretical frameworks, presents challenges and their possible solutions, and examines the latest empirical research findings in the area of data warehousing.
Abstract: Covering a wide range of technical, technological, and research issues, this text provides theoretical frameworks, presents challenges and their possible solutions, and examines the latest empirical research findings in the area of data warehousing.

158 citations


Journal ArticleDOI
01 Dec 2006
TL;DR: This paper presents a UML-based data warehouse design method that spans the three design phases (conceptual, logical and physical), and represents all the metamodels using UML, and illustrates the formal specification of the transformations based on OMG's Object Constraint Language (OCL).
Abstract: Data warehouses are a major component of data-driven decision support systems (DSS). They rely on multidimensional models. The latter provide decision makers with a business-oriented view to data, thereby easing data navigation and analysis via On-Line Analytical Processing (OLAP) tools. They also determine how the data are stored in the data warehouse for subsequent use, not only by OLAP tools, but also by other decision support tools. Data warehouse design is a complex task, which requires a systematic method. Few such methods have been proposed to date. This paper presents a UML-based data warehouse design method that spans the three design phases (conceptual, logical and physical). Our method comprises a set of metamodels used at each phase, as well as a set of transformations that can be semi-automated. Following our object orientation, we represent all the metamodels using UML, and illustrate the formal specification of the transformations based on OMG's Object Constraint Language (OCL). Throughout the paper, we illustrate the application of our method to a case study.

149 citations


Proceedings ArticleDOI
01 Sep 2006
TL;DR: This work introduces a deferred approach for detecting and correcting RFID data anomalies, and develops two novel rewrite methods, both of which reduce the amount of data to be cleaned, by exploiting predicates in application queries while guaranteeing correct answers.
Abstract: Radio Frequency Identification is gaining broader adoption in many areas. One of the challenges in implementing an RFID-based system is dealing with anomalies in RFID reads. A small number of anomalies can translate into large errors in analytical results. Conventional "eager" approaches cleanse all data upfront and then apply queries on cleaned data. However, this approach is not feasible when several applications define anomalies and corrections on the same data set differently and not all anomalies can be defined beforehand. This necessitates anomaly handling at query time. We introduce a deferred approach for detecting and correcting RFID data anomalies. Each application specifies the detection and the correction of relevant anomalies using declarative sequence-based rules. An application query is then automatically rewritten based on the cleansing rules that the application has specified, to provide answers over cleaned data. We show that a naive approach to deferred cleansing that applies rules without leveraging query information can be prohibitive. We develop two novel rewrite methods, both of which reduce the amount of data to be cleaned, by exploiting predicates in application queries while guaranteeing correct answers. We leverage standardized SQL/OLAP functionality to implement rules specified in a declarative sequence-based language. This allows efficient evaluation of cleansing rules using existing query processing capabilities of a DBMS. Our experimental results show that deferred cleansing is affordable for typical analytic queries over RFID data.

134 citations
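
To make the deferred approach concrete, here is a hedged Python sketch under simplifications of our own (the paper expresses rules in a declarative sequence-based language and rewrites SQL/OLAP queries; this is not its implementation). The query's predicate is applied before cleansing, so only reads the query can touch are cleaned, here by one sample rule that collapses duplicate reads of a tag within a time window.

    from itertools import groupby
    from operator import itemgetter

    def deferred_clean(reads, predicate, window=5):
        # Deferred cleansing: restrict to tuples the query can touch,
        # then apply an anomaly rule (keep the earliest of duplicate
        # reads of the same tag within `window` time units).
        relevant = sorted((r for r in reads if predicate(r)),
                          key=itemgetter("tag", "time"))
        cleaned = []
        for _, group in groupby(relevant, key=itemgetter("tag")):
            last_kept = None
            for r in group:
                if last_kept is None or r["time"] - last_kept > window:
                    cleaned.append(r)
                    last_kept = r["time"]
        return cleaned

    reads = [{"tag": "t1", "loc": "dock", "time": 0},
             {"tag": "t1", "loc": "dock", "time": 2},   # duplicate read
             {"tag": "t1", "loc": "dock", "time": 30},
             {"tag": "t2", "loc": "shelf", "time": 1}]
    # The query only touches the dock, so shelf reads are never cleaned.
    print(deferred_clean(reads, lambda r: r["loc"] == "dock"))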


Patent
23 Jun 2006
TL;DR: In this article, the authors propose a method for mapping a data source of an unknown configuration to that of a known configuration, comprising the steps of submitting a request for metadata to the data source, generating a relational schema from the known configuration based on the metadata received from the source of the unknown configuration, and returning the metadata of the generated relational schema which maps the data from the unknown source to the known source.
Abstract: A method for mapping a data source of an unknown configuration to that of a known configuration, comprising the steps of submitting a request for metadata to the data source of the unknown configuration; generating a relational schema from the known configuration based on the metadata received from the data source of the unknown configuration; and returning the metadata of the generated relational schema which maps the data source of the unknown configuration to the known configuration. In a preferred embodiment, the data source of the unknown configuration is a multidimensional database and the known configuration is a star or snowflake relational schema.

134 citations
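
The claimed mapping can be pictured with a toy generator, using illustrative names of our own rather than the patent's: read the cube's multidimensional metadata and emit DDL for a star schema, one dimension table per dimension plus a fact table referencing them.

    def star_schema_from_cube(cube):
        # Emit a star schema: dim_<name> tables for each dimension,
        # then a fact table with one FK per dimension plus the measures.
        stmts = []
        fk_cols = []
        for dim in cube["dimensions"]:
            levels = ", ".join(f"{lvl} VARCHAR(100)" for lvl in dim["levels"])
            stmts.append(f"CREATE TABLE dim_{dim['name']} "
                         f"({dim['name']}_key INT PRIMARY KEY, {levels});")
            fk_cols.append(f"{dim['name']}_key INT REFERENCES "
                           f"dim_{dim['name']}({dim['name']}_key)")
        measures = ", ".join(f"{m} DECIMAL(18,2)" for m in cube["measures"])
        stmts.append(f"CREATE TABLE fact_{cube['name']} "
                     f"({', '.join(fk_cols)}, {measures});")
        return stmts

    cube = {"name": "sales",
            "dimensions": [{"name": "time", "levels": ["year", "month"]},
                           {"name": "store", "levels": ["region", "city"]}],
            "measures": ["amount", "quantity"]}
    print("\n".join(star_schema_from_cube(cube)))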


Proceedings ArticleDOI
01 Sep 2006
TL;DR: This paper presents a generic, DBMS independent, and highly extensible relational data generation tool that can efficiently generate realistic test data for OLTP, OLAP, and data streaming applications.
Abstract: This paper presents a generic, DBMS independent, and highly extensible relational data generation tool. The tool can efficiently generate realistic test data for OLTP, OLAP, and data streaming applications. The tool uses a graph model to direct the data generation. This model makes it very simple to generate data even for large database schemas with complex inter- and intra-table relationships. The model also makes it possible to generate data with very accurate characteristics.

134 citations
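
The graph model can be read as a dependency graph over tables: parents are generated before children so every foreign key can be drawn from already-generated keys. A minimal sketch under that assumption (names are ours; real generators additionally model per-column value distributions and inter-row correlations):

    import random

    def generate(schema, sizes, seed=42):
        # `schema` maps table -> parent table (or None), listed in
        # topological order; children draw FKs from generated parents.
        random.seed(seed)
        rows = {}
        for table in schema:
            parent = schema[table]
            rows[table] = [
                {"id": i,
                 "fk": random.choice(rows[parent])["id"] if parent else None}
                for i in range(sizes[table])]
        return rows

    schema = {"customer": None, "order": "customer", "lineitem": "order"}
    data = generate(schema, {"customer": 3, "order": 5, "lineitem": 10})
    print(data["order"])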


Patent
27 Nov 2006
TL;DR: In this paper, a system and a method for creating an analytical report on top of a multidimensional data model built on a relational or multi-dimensional database is presented. But the system is not able to defer the initial query of the data source, as is the case with conventional reporting tools and methods until after the report has been defined.
Abstract: A system and a method for creating an analytical report on top of a multidimensional data model built on top of a relational or multidimensional database. The database operates in a computer system and provides returned values responsive to queries. Such a query is generated automatically and is deduced from a report definition. According to one embodiment, a report specification is used by the system and method of the present invention is able to defer the initial query of the data source, as is the case with conventional reporting tools and methods, until after the report has been defined. That is, the manner in which a analytic report is defined provides for an automatically generated query. Once the report has been defined, the data to populate such a report is retrieved to build the document.

Journal ArticleDOI
01 Nov 2006
TL;DR: The basic concept of document warehousing is discussed and its formal definitions are presented; a general system framework is proposed, and some useful applications are elaborated to illustrate the importance of document warehousing.
Abstract: During the past decade, data warehousing has been widely adopted in the business community. It provides multi-dimensional analyses of cumulated historical business data for helping contemporary administrative decision-making. Nevertheless, it is believed that only about 20% of information can be extracted from data warehouses, which concern numeric data only; the other 80% is hidden in non-numeric data or even in documents. Therefore, many researchers now advocate that it is time to conduct research work on document warehousing to capture complete business intelligence. Document warehouses, unlike traditional document management systems, include extensive semantic information about documents, cross-document feature relations, and document grouping or clustering to provide more accurate and more efficient access to text-oriented business intelligence. In this paper, we discuss the basic concept of document warehousing and present its formal definitions. Then, we propose a general system framework and elaborate some useful applications to illustrate the importance of document warehousing. The work is essential for establishing an infrastructure to help combine text processing with numeric OLAP processing technologies. The combination of data warehousing and document warehousing will be one of the most important kernels of knowledge management and customer relationship management applications.

Proceedings ArticleDOI
03 Apr 2006
TL;DR: The core components of the solution include: modeling of required workflows, active enforcement of control activities, auditing of actual workflows to verify compliance with internal controls, and discovery-driven OLAP to identify irregularities in financial data.
Abstract: The Sarbanes-Oxley Act instituted a series of corporate reforms to improve the accuracy and reliability of financial reporting. Sections 302 and 404 of the Act require SEC-reporting companies to implement internal controls over financial reporting, periodically assess the effectiveness of these internal controls, and certify the accuracy of their financial statements. We suggest that database technology can play an important role in assisting compliance with the internal control provisions of the Act. The core components of our solution include: (i) modeling of required workflows, (ii) active enforcement of control activities, (iii) auditing of actual workflows to verify compliance with internal controls, and (iv) discovery-driven OLAP to identify irregularities in financial data. We illustrate how the features of our solution fulfill Sarbanes-Oxley requirements using several real-life scenarios. In the process, we identify opportunities for new database research.

Patent
10 Oct 2006
TL;DR: In this article, the authors present a system for implementing an OLAP system that has increased query execution speed, requires reduced data storage capacity for OLAP systems and/or facilitates scaling of OLAP applications to large data cubes.
Abstract: Systems or apparatus, methods, data structures and/or computer programs are provided for implementing an OLAP system that has increased query execution speed, requires reduced data storage capacity for an OLAP system and/or facilitates scaling of an OLAP system to large data cubes. The OLAP system can advantageously be implemented as an all-main memory OLAP system.

Journal ArticleDOI
TL;DR: It has been shown that the BI concept may contribute to improved quality of decision-making in any organisation, better customer service, and increased customer loyalty.
Abstract: The paper aims at analysing Business Intelligence Systems (BI) in the context of opportunities for improving decision-making in a contemporary organisation. The authors, taking the specifics of a decision-making process together with the heterogeneity and dispersion of information sources into consideration, present Business Intelligence Systems as a holistic infrastructure for decision-making. It has been shown that the BI concept may contribute to improved quality of decision-making in any organisation, better customer service, and increased customer loyalty. The paper is focused on three fundamental components of BI systems, i.e. key information technologies (including ETL tools and data warehouses), the potential of key information technologies (OLAP techniques and data mining), and BI applications that support making different decisions in an organisation. A major part of the paper is devoted to discussing basic business analyses that are not only offered by BI systems but also applied frequently in business practice.

Patent
18 Jul 2006
TL;DR: In this article, a data warehouse is provided for gaming systems located in multiple jurisdictions, where each jurisdiction collects gaming data from one or more gaming devices, and data from each jurisdiction is extracted, transformed and loaded into a Data Warehouse.
Abstract: A data warehouse is provided for gaming systems located in multiple jurisdictions. Each jurisdiction collects gaming data from one or more gaming devices. Data from each jurisdiction is extracted, transformed and loaded into a data warehouse. A network, such as the Internet, may be used to transfer the data to the data warehouse. An on-line analytical processing (OLAP) application provides analysis services, such as point-in-time data reports, summary data reports, comparison reports, trend analysis reports and profitability reports, and other data analysis and data mining applications.

Journal ArticleDOI
TL;DR: This paper proposes a fundamentally new class of measures, compressible measures, in order to support efficient computation of the statistical models, and substantially reduces the memory usage and the overall response time for statistical analysis of multidimensional data.
Abstract: As OLAP engines are widely used to support multidimensional data analysis, it is desirable to support in data cubes advanced statistical measures, such as regression and filtering, in addition to the traditional simple measures such as count and average. Such new measures allow users to model, smooth, and predict the trends and patterns of data. Existing algorithms for simple distributive and algebraic measures are inadequate for efficient computation of statistical measures in a multidimensional space. In this paper, we propose a fundamentally new class of measures, compressible measures, in order to support efficient computation of the statistical models. For compressible measures, we compress each cell into an auxiliary matrix with a size independent of the number of tuples. We can then compute the statistical measures for any data cell from the compressed data of the lower-level cells without accessing the raw data. Time- and space-efficient lossless aggregation formulae are derived for regression and filtering measures. Our analytical and experimental studies show that the resulting system, regression cube, substantially reduces the memory usage and the overall response time for statistical analysis of multidimensional data.
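
The lossless-aggregation idea is easy to demonstrate for simple linear regression, whose sufficient statistics (n, Σx, Σy, Σx², Σxy) occupy constant space per cell and add component-wise. This is a minimal sketch of the property, not the paper's more general construction:

    import numpy as np

    def compress(xs, ys):
        # Auxiliary statistics for a base cell; their size is fixed,
        # independent of the number of tuples in the cell.
        xs, ys = np.asarray(xs, float), np.asarray(ys, float)
        return np.array([len(xs), xs.sum(), ys.sum(),
                         (xs * xs).sum(), (xs * ys).sum()])

    def aggregate(cells):
        # Lossless aggregation: a parent cell's statistics are just
        # the sum of its children's, so raw data is never re-read.
        return np.sum(cells, axis=0)

    def slope_intercept(stats):
        n, sx, sy, sxx, sxy = stats
        slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
        return slope, (sy - slope * sx) / n

    c1 = compress([1, 2, 3], [2.0, 4.1, 5.9])
    c2 = compress([4, 5], [8.1, 9.9])
    # Regression over the union of both cells, from compressed data only:
    print(slope_intercept(aggregate([c1, c2])))

Here the regression over the union of two cells is obtained from their compressed statistics alone, without revisiting the raw tuples.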

Proceedings ArticleDOI
27 Jun 2006
TL;DR: This paper presents a principled framework for efficient processing of ad-hoc top-k (ranking) aggregate queries, which provide the k groups with the highest aggregates as results, and addresses the challenges in realizing the framework and implementing new query operators, enabling efficient group-aware and rank-aware query plans.
Abstract: This paper presents a principled framework for efficient processing of ad-hoc top-k (ranking) aggregate queries, which provide the k groups with the highest aggregates as results. Essential support of such queries is lacking in current systems, which process the queries in a naive materialize-group-sort scheme that can be prohibitively inefficient. Our framework is based on three fundamental principles. The Upper-Bound Principle dictates the requirements of early pruning, and the Group-Ranking and Tuple-Ranking Principles dictate group-ordering and tuple-ordering requirements. They together guide the query processor toward a provably optimal tuple schedule for aggregate query processing. We propose a new execution framework to apply the principles and requirements. We address the challenges in realizing the framework and implementing new query operators, enabling efficient group-aware and rank-aware query plans. The experimental study validates our framework by demonstrating orders of magnitude performance improvement in the new query plans, compared with the traditional plans.
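
The Upper-Bound Principle can be sketched as follows, assuming every tuple value is bounded above by a known max_val (a simplification of the paper's framework; the code and names are ours): a group is abandoned as soon as its best achievable final sum can no longer beat the current k-th best completed group.

    import heapq

    def topk_sum_groups(groups, k, max_val):
        # `best` is a min-heap of (sum, group) holding the current top k.
        # Every tuple value is assumed to be at most max_val.
        best = []
        for g, vals in groups.items():
            partial, remaining = 0.0, len(vals)
            for v in vals:
                kth = best[0][0] if len(best) == k else float("-inf")
                if partial + remaining * max_val <= kth:
                    break               # provably outside the top k: prune
                partial += v
                remaining -= 1
            else:
                heapq.heappush(best, (partial, g))
                if len(best) > k:
                    heapq.heappop(best)
        return sorted(best, reverse=True)

    groups = {"a": [9, 9, 9], "c": [10, 8], "b": [1, 1, 1, 1]}
    print(topk_sum_groups(groups, k=2, max_val=10))
    # [(27.0, 'a'), (18.0, 'c')] -- group 'b' is pruned before its last tuple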

Proceedings ArticleDOI
Pat Hanrahan
27 Jun 2006
TL;DR: VizQL enables a new generation of visual analysis tools that closely couple query, analysis and visualization into a single framework, and it permits an unlimited number of picture expressions.
Abstract: Conventional query languages such as SQL and MDX have limited formatting and visualization capabilities. Thus, although powerful queries can be composed, another layer of software is needed to report or present the results in a useful form to the analyst. VizQL™ is designed to fill that gap. VizQL evolved from the Polaris system at Stanford, which combined query, analysis and visualization into a single framework [1]. VizQL is a formal language for describing tables, charts, graphs, maps, time series and tables of visualizations. These different types of visual representations are unified into one framework, making it easy to switch from one visual representation to another (e.g. from a list view to a cross-tab to a chart). Unlike current charting packages and like query languages, VizQL permits an unlimited number of picture expressions. Visualizations can thus be easily customized and controlled. VizQL is a declarative language. The desired picture is described; the low-level operations needed to retrieve the results, to perform analytical calculations, to map the results to a visual representation, and to render the image are generated automatically by the query analyzer. The query analyzer compiles VizQL expressions to SQL and MDX and thus VizQL can be used with relational databases and datacubes. The current implementation supports Hyperion Essbase, Microsoft SQL Server, Microsoft Analysis Services, MySQL, Oracle, as well as desktop data sources such as CSV and Excel files. This analysis phase includes many optimizations that allow large databases to be browsed interactively. VizQL enables a new generation of visual analysis tools that closely couple query, analysis and visualization.

Book
23 Jan 2006
TL;DR: In this book, the authors describe how to transform disparate enterprise data into actionable business intelligence using Microsoft SQL Server 2005 and the Unified Dimensional Model (UDM), covering data mining, warehousing, and scripting techniques.
Abstract: Transform disparate enterprise data into actionable business intelligence. Put timely, mission-critical information in the hands of employees across your organization using Microsoft SQL Server 2005 and the comprehensive information in this unique resource. Delivering Business Intelligence with Microsoft SQL Server 2005 shows you, step by step, how to author, customize, and distribute information that will give your company the competitive edge. It's all right here--from data mining, warehousing, and scripting techniques to MDX queries, KPI analysis, and the all-new Unified Dimensional Model. Real-world examples, start-to-finish exercises, and downloadable code throughout illustrate all of the integration, analysis, and reporting capabilities of SQL Server 2005.
Table of contents:
PART I: Business Intelligence. Chapter 1: Equipping the Organization for Effective Decision Making; Chapter 2: Making the Most of What You've Got -- Using Business Intelligence; Chapter 3: Searching for the Source -- The Source of Business Intelligence; Chapter 4: One-Stop Shopping -- The Unified Dimensional Model; Chapter 5: First Steps -- Beginning the Development of Business Intelligence.
PART II: Defining Business Intelligence Structures. Chapter 6: Building Foundations -- Creating and Populating Data Marts; Chapter 7: Fill 'er Up -- Using Integration Services for Populating Data Marts.
PART III: Analyzing Cube Content. Chapter 8: Cubism -- Measures and Dimensions; Chapter 9: Bells and Whistles -- Special Features of OLAP Cubes; Chapter 10: Writing a New Script -- MDX Scripting; Chapter 11: Pulling It Out and Building It Up -- MDX Queries.
PART IV: Mining. Chapter 12: Panning for Gold -- Introduction to Data Mining; Chapter 13: Building the Mine -- Working with the Data Mining Model; Chapter 14: Spelunking -- Exploration Using Data Mining.
PART V: Delivering. Chapter 15: On Report -- Delivering Business Intelligence with Reporting Services; Chapter 16: Let's Get Together -- Integrating OLAP with Your Applications; Chapter 17: Another Point of View -- Excel Pivot Tables and Pivot Charts.

Proceedings ArticleDOI
10 Nov 2006
TL;DR: This paper proposes a framework for mining inter-dimensional association rules from data cubes according to a sum-based aggregate measure more general than simple frequencies provided by the traditional COUNT measure.
Abstract: On-line analytical processing (OLAP) provides tools to explore and navigate into data cubes in order to extract interesting information. Nevertheless, OLAP is not capable of explaining relationships that could exist in a data cube. Association rules are a data mining technique that finds associations among data. In this paper, we propose a framework for mining inter-dimensional association rules from data cubes according to a sum-based aggregate measure more general than the simple frequencies provided by the traditional COUNT measure. Our mining process is guided by a meta-rule context driven by analysis objectives and exploits aggregate measures to revisit the definition of support and confidence. We also evaluate the interestingness of mined association rules according to the Lift and Loevinger criteria and propose an efficient algorithm for mining inter-dimensional association rules directly from multidimensional data.
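
The revisited support and confidence can be computed directly over aggregated cells. A minimal sketch under a data layout of our own choosing (a list of (cell, SUM-measure) pairs; this is not the paper's algorithm), including the Lift criterion:

    def rule_metrics(cube, antecedent, consequent):
        # Support and confidence measured in the aggregate (e.g. sales
        # amount) rather than in the raw COUNT of facts.
        def matches(cell, cond):
            return all(cell.get(d) == v for d, v in cond.items())
        total = sum(m for _, m in cube)
        m_a  = sum(m for c, m in cube if matches(c, antecedent))
        m_ab = sum(m for c, m in cube if matches(c, antecedent)
                                      and matches(c, consequent))
        support = m_ab / total
        confidence = m_ab / m_a if m_a else 0.0
        lift = confidence / (sum(m for c, m in cube
                                 if matches(c, consequent)) / total)
        return support, confidence, lift

    cube = [({"city": "Lyon",  "product": "game"}, 120.0),
            ({"city": "Lyon",  "product": "book"},  30.0),
            ({"city": "Paris", "product": "game"},  50.0)]
    # Rule: city=Lyon => product=game, weighted by the SUM of sales
    print(rule_metrics(cube, {"city": "Lyon"}, {"product": "game"}))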

Patent
24 Feb 2006
TL;DR: In this paper, an automated data warehousing and OLAP cube building process is presented, which allows a person who is not a database query language expert to build validated data warehouses, and to build OLAP cubes based on such data warehouses.
Abstract: The present invention provides an automated data warehousing and OLAP cube building process. The invention allows a person who is not a database query language expert to build validated data warehouses, and OLAP cubes based on such data warehouses.

Proceedings ArticleDOI
08 Jul 2006
TL;DR: This work applies two non-elitist multiobjective evolutionary algorithms (MOEAs) to view selection under a size constraint and observes that the evolutionary process mimics that of the greedy in terms of the convergence process in the population.
Abstract: On-Line Analytical Processing (OLAP) tools are frequently used in business, science and health to extract useful knowledge from massive databases. An important and hard optimization problem in OLAP data warehouses is the view selection problem, consisting of selecting a set of aggregate views of the data for speeding up future query processing. A common variant of the view selection problem addressed in the literature minimizes the sum of maintenance cost and query time on the view set. Converting what is inherently an optimization problem with multiple conflicting objectives into one with a single objective ignores the need and value of a variety of solutions offering various levels of trade-off between the objectives. We apply two non-elitist multiobjective evolutionary algorithms (MOEAs) to view selection under a size constraint. Our emphasis is to determine the suitability of the combination of MOEAs with constraint handling to the view selection problem, compared to a widely used greedy algorithm. We observe that the evolutionary process mimics that of the greedy in terms of the convergence process in the population. The MOEAs are competitive with the greedy on a variety of problem instances, often finding solutions dominating it in a reasonable amount of time.
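
For reference, the greedy baseline such studies compare against picks, at each step, the candidate view with the best marginal query-time saving per unit of storage until the size budget is spent. A hedged sketch with an artificial benefit function of our own (a real cost model would derive savings from the view lattice):

    def greedy_view_selection(candidates, benefit, budget):
        # Repeatedly materialize the view with the highest marginal
        # saving per unit of size that still fits in the budget.
        selected, used = [], 0
        while True:
            best, best_ratio = None, 0.0
            for v in candidates:
                if v in selected or used + v["size"] > budget:
                    continue
                ratio = benefit(v, selected) / v["size"]
                if ratio > best_ratio:
                    best, best_ratio = v, ratio
            if best is None:
                return selected
            selected.append(best)
            used += best["size"]

    # Toy lattice: a view's saving shrinks once another view is chosen.
    views = [{"name": "by_day",   "size": 80, "saving": 100},
             {"name": "by_month", "size": 10, "saving": 40},
             {"name": "by_year",  "size": 2,  "saving": 10}]
    def benefit(v, selected):
        return v["saving"] * (0.5 if selected else 1.0)  # crude interaction
    print([v["name"] for v in greedy_view_selection(views, benefit, budget=50)])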

Journal ArticleDOI
01 Feb 2006
TL;DR: This paper presents a technique based on an analytical interpretation of multidimensional data and on the well-known least squares approximation (LSA) method for supporting approximate aggregate query answering in OLAP, which represents the most common application interface for a DWS.
Abstract: Inefficient query answering is the main drawback in Decision Support Systems (DSS), due to the very large size of the multidimensional data stored in the underlying Data Warehouse Server (DWS). Aggregate queries are the most frequent and useful kind for such systems, as they support several analyses based on the multidimensionality and multi-resolution of data. As a consequence, providing fast answers to aggregate queries (by trading off accuracy for efficiency, if possible) has become a very important requirement in improving the effectiveness of DSS-based applications. In this paper we present a technique based on an analytical interpretation of multidimensional data and on the well-known least squares approximation (LSA) method for supporting approximate aggregate query answering in OLAP, which represents the most common application interface for a DWS. Our technique consists of building data synopses by interpreting the original data distributions as a set of discrete functions. These synopses, called Δ-Syn, are obtained by approximating data with a set of polynomial coefficients, and by storing these coefficients instead of the original data. Queries are issued on the compressed representation, thus reducing the number of disk accesses needed to evaluate the answers.
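
The synopsis idea can be miniaturized to one dimension: fit a least-squares polynomial to a slice of measure values, keep only the coefficients, and answer range aggregates from them. A toy sketch of ours (the paper treats multidimensional data and accuracy control):

    import numpy as np

    def build_synopsis(values, degree=3):
        # Treat a 1-D slice of the cube as a discrete function and keep
        # only its least-squares polynomial coefficients.
        x = np.arange(len(values))
        return np.polyfit(x, values, degree), len(values)

    def approx_range_sum(synopsis, lo, hi):
        # Answer SUM(values[lo:hi]) from the coefficients alone,
        # without touching the original data.
        coeffs, _ = synopsis
        return float(sum(np.polyval(coeffs, i) for i in range(lo, hi)))

    data = [3, 4, 6, 9, 13, 18, 24, 31]          # smooth measure values
    syn = build_synopsis(data, degree=2)
    print(approx_range_sum(syn, 2, 6), sum(data[2:6]))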

Proceedings ArticleDOI
03 Jul 2006
TL;DR: The proposed technique can be efficiently used in QoA-based OLAP tools, where OLAP users/applications and DW servers are allowed to mediate on the accuracy of (approximate) answers, similarly to what happens in QoS-based systems for the quality of services.
Abstract: An innovative technique supporting accuracy control in compressed multidimensional data cubes is presented in this paper. The proposed technique can be efficiently used in QoA-based OLAP tools, where OLAP users/applications and DW servers are allowed to mediate on the accuracy of (approximate) answers, similarly to what happens in QoS-based systems for the quality of services. The compressed data structure KLSA, which implements the technique, is also extensively presented and discussed. We complement our analytical contributions with an experimental evaluation on several kinds of synthetic multidimensional data cubes, demonstrating the superiority of our approach in comparison with other similar techniques.

Journal Article
TL;DR: Ontology-based integration of BI is discussed for semantic interoperability in integrating DW, OLAP and DM, and a hybrid ontological structure is introduced which includes a conceptual view, an analytical view and a physical view.
Abstract: The integration of Business Intelligence (BI) has been taken by business decision-makers as an effective means to enhance enterprise "soft power" and added value in the reconstruction and revolution of traditional industries. The existing solutions based on structural integration are to pack together data warehouse (DW), OLAP, data mining (DM) and reporting systems from different vendors. BI system users are finally delivered a reporting system in which reports, data models, dimensions and measures are predefined by system designers. As a result of a survey in the US, 85% of DW projects based on the above solutions failed to meet their intended objectives. In this paper, we summarize our investigation on the integration of BI on the basis of semantic integration and structural interaction. Ontology-based integration of BI is discussed for semantic interoperability in integrating DW, OLAP and DM. A hybrid ontological structure is introduced which includes a conceptual view, an analytical view and a physical view. These views are matched with user interfaces, DW and enterprise information systems, respectively. Relevant ontological engineering techniques are developed for ontology namespace, semantic relationships, and ontological transformation, mapping and query in this ontological space. The approach is promising for business-oriented, adaptive and automatic integration of BI in the real world. Operational decision-making experiments within a telecom company have demonstrated that a BI system utilizing the proposed approach is more flexible.

Journal ArticleDOI
TL;DR: This paper discusses the cgmCUBE Project, a multi-year effort to design and implement a multi-processor platform for data cube generation that targets the relational database model (ROLAP), and presents new algorithmic and system optimizations relating to a thorough optimization of the underlying sequential cube construction method.
Abstract: On-line Analytical Processing (OLAP) has become one of the most powerful and prominent technologies for knowledge discovery in VLDB (Very Large Database) environments. Central to the OLAP paradigm is the data cube, a multi-dimensional hierarchy of aggregate values that provides a rich analytical model for decision support. Various sequential algorithms for the efficient generation of the data cube have appeared in the literature. However, given the size of contemporary data warehousing repositories, multi-processor solutions are crucial for the massive computational demands of current and future OLAP systems. In this paper we discuss the cgmCUBE Project, a multi-year effort to design and implement a multi-processor platform for data cube generation that targets the relational database model (ROLAP). More specifically, we discuss new algorithmic and system optimizations relating to (1) a thorough optimization of the underlying sequential cube construction method and (2) a detailed and carefully engineered cost model for improved parallel load balancing and faster sequential cube construction. These optimizations were key in allowing us to build a prototype that is able to produce data cube output at a rate of over one TeraByte per hour.
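
To fix ideas about the computational object: a full data cube materializes one aggregate table per subset of the d dimensions, 2^d group-bys in all. The sequential kernel below is a minimal sketch of what cgmCUBE parallelizes and load-balances, not their optimized pipeline:

    from itertools import combinations
    from collections import defaultdict

    def full_cube(rows, dims, measure):
        # Compute one SUM aggregate per subset of the dimensions.
        cube = {}
        for r in range(len(dims) + 1):
            for grouping in combinations(dims, r):
                agg = defaultdict(float)
                for row in rows:
                    key = tuple(row[d] for d in grouping)
                    agg[key] += row[measure]
                cube[grouping] = dict(agg)
        return cube

    rows = [{"store": "s1", "item": "a", "sales": 5.0},
            {"store": "s1", "item": "b", "sales": 3.0},
            {"store": "s2", "item": "a", "sales": 2.0}]
    cube = full_cube(rows, ["store", "item"], "sales")
    print(cube[("store",)])    # {('s1',): 8.0, ('s2',): 2.0}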

Book ChapterDOI
04 Sep 2006
TL;DR: A wide set of experimental results conducted on several kinds of synthetic two-dimensional OLAP views clearly confirm the effectiveness and the efficiency of the proposed technique, also in comparison with state-of-the-art proposals.
Abstract: In this paper, we investigate the problem of visualizing multidimensional data cubes, and propose a novel technique for supporting advanced OLAP visualization of such data structures. Founding on very efficient data compression solutions for two-dimensional data domains, the proposed technique relies on the amenity of generating “semantics-aware” compressed representations of two-dimensional OLAP views extracted from multidimensional data cubes via the so-called OLAP dimension flattening process. A wide set of experimental results conducted on several kinds of synthetic two-dimensional OLAP views clearly confirm the effectiveness and the efficiency of our technique, also in comparison with state-of-the-art proposals.

Journal ArticleDOI
TL;DR: A new multidimensional model is proposed that can manage imprecision in both dimensions and facts and hide the complexity from the end user.
Abstract: As a result of the use of OLAP technology in new fields of knowledge and the merging of data from different sources, it has become necessary for models to support this technology. In this paper, we shall propose a new multidimensional model that can manage imprecision in both dimensions and facts and hide the complexity from the end user. The multidimensional structure is therefore able to model data imprecision resulting from the integration of data from different sources or even information from experts, which it does by means of fuzzy logic.
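
One way such imprecision surfaces is in roll-ups where a fact belongs to several parent members with partial membership degrees, so aggregation weights each measure by degree instead of assigning it to exactly one parent. A minimal sketch under that reading (illustrative names of ours; the paper's fuzzy model is richer):

    def fuzzy_rollup(facts, membership):
        # `membership[child]` maps parent category -> degree in [0, 1];
        # each measure contributes to every parent, weighted by degree.
        totals = {}
        for child, value in facts:
            for category, degree in membership[child].items():
                totals[category] = totals.get(category, 0.0) + degree * value
        return totals

    # A 19:55 screening is "mostly evening, somewhat afternoon".
    membership = {"19:55": {"evening": 0.8, "afternoon": 0.2},
                  "22:10": {"evening": 1.0}}
    facts = [("19:55", 100.0), ("22:10", 50.0)]
    print(fuzzy_rollup(facts, membership))  # {'evening': 130.0, 'afternoon': 20.0}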

Proceedings ArticleDOI
20 Aug 2006
TL;DR: A novel approach to deal with interestingness of discovered rules is proposed, which casts rule analysis as OLAP operations and general impression mining, which enables the user to explore the knowledge space to find useful knowledge easily and systematically.
Abstract: The problem of interestingness of discovered rules has been investigated by many researchers. The issue is that data mining algorithms often generate too many rules, which make it very hard for the user to find the interesting ones. Over the years many techniques have been proposed. However, few have made it to real-life applications. Since August 2004, we have been working on a major application for Motorola. The objective is to find causes of cellular phone call failures from a large amount of usage log data. Class association rules have been shown to be suitable for this type of diagnostic data mining application. We were also able to put several existing interestingness methods to the test, which revealed some major shortcomings. One of the main problems is that most existing methods treat rules individually. However, we discovered that users seldom regard a single rule to be interesting by itself. A rule is only interesting in the context of some other rules. Furthermore, in many cases, each individual rule may not be interesting, but a group of them together can represent an important piece of knowledge. This led us to discover a deficiency of the current rule mining paradigm. Using non-zero minimum support and non-zero minimum confidence eliminates a large amount of context information, which makes rule analysis difficult. This paper proposes a novel approach to deal with all of these issues, which casts rule analysis as OLAP operations and general impression mining. This approach enables the user to explore the knowledge space to find useful knowledge easily and systematically. It also provides a natural framework for visualization. As an evidence of its effectiveness, our system, called Opportunity Map, based on these ideas has been deployed, and it is in daily use in Motorola for finding actionable knowledge from its engineering and other types of data sets.