
Showing papers on "Online analytical processing published in 2008"


Book
15 Jan 2008
TL;DR: This book serves as an introduction to the state of the art on data warehouse design, with many references to more detailed sources, and may help experienced data warehouse designers to enlarge their analysis possibilities by incorporating spatial and temporal information.
Abstract: A data warehouse stores large volumes of historical data required for analytical purposes. This data is extracted from operational databases; transformed into a coherent whole using a multidimensional model that includes measures, dimensions, and hierarchies; and loaded into a data warehouse during the extraction-transformation-loading (ETL) process. Malinowski and Zimányi explain in detail conventional data warehouse design, covering in particular complex hierarchy modeling. Additionally, they address two innovative domains recently introduced to extend the capabilities of data warehouse systems, namely the management of spatial and temporal information. Their presentation covers different phases of the design process, such as requirements specification, conceptual, logical, and physical design. They include three different approaches for requirements specification depending on whether users, operational data sources, or both are the driving force in the requirements gathering process, and they show how each approach leads to the creation of a conceptual multidimensional model. Throughout the book the concepts are illustrated using many real-world examples and complemented by sample implementations for Microsoft's Analysis Services 2005 and Oracle 10g with the OLAP and Spatial extensions. For researchers this book serves as an introduction to the state of the art on data warehouse design, with many references to more detailed sources. Providing a clear and concise presentation of the major concepts and results of data warehouse design, it can also be used as the basis of a graduate or advanced undergraduate course. The book may help experienced data warehouse designers to expand their analysis possibilities by incorporating spatial and temporal information. Finally, experts in spatial databases or in geographical information systems could benefit from the data warehouse vision for building innovative spatial analytical applications.
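The ETL-then-aggregate pattern the abstract describes can be sketched in a few lines. The fact rows, the month-to-quarter hierarchy step, and all names below are invented for illustration; a real warehouse would of course use a DBMS, not in-memory dictionaries:

```python
from collections import defaultdict

# Hypothetical fact rows loaded by an ETL step: (store, month, sales).
facts = [
    ("Paris", "2008-01", 120.0),
    ("Paris", "2008-02", 80.0),
    ("Lyon",  "2008-01", 60.0),
    ("Lyon",  "2008-04", 90.0),
]

def month_to_quarter(month):
    """One level of a time-dimension hierarchy: month -> quarter."""
    year, m = month.split("-")
    return f"{year}-Q{(int(m) - 1) // 3 + 1}"

def roll_up(rows):
    """Aggregate the sales measure from the month level up to the quarter level."""
    cube = defaultdict(float)
    for store, month, sales in rows:
        cube[(store, month_to_quarter(month))] += sales
    return dict(cube)

quarterly = roll_up(facts)
# Paris 2008-Q1 -> 200.0, Lyon 2008-Q1 -> 60.0, Lyon 2008-Q2 -> 90.0
```

The same shape generalizes to any hierarchy (city to country, product to category): a roll-up is a group-by on a coarser dimension level.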

223 citations


Proceedings ArticleDOI
11 Feb 2008
TL;DR: This paper extends traditional faceted search to support richer information discovery tasks over more complex data models, and adds flexible, dynamic business intelligence aggregations to the faceted application, enabling users to gain insight into their data that is far richer than just knowing the quantities of documents belonging to each facet.
Abstract: This paper extends traditional faceted search to support richer information discovery tasks over more complex data models. Our first extension adds flexible, dynamic business intelligence aggregations to the faceted application, enabling users to gain insight into their data that is far richer than just knowing the quantities of documents belonging to each facet. We see this capability as a step toward bringing OLAP capabilities, traditionally supported by databases over relational data, to the domain of free-text queries over metadata-rich content. Our second extension shows how one can efficiently extend a faceted search engine to support correlated facets - a more complex information model in which the values associated with a document across multiple facets are not independent. We show that by reducing the problem to a recently solved tree-indexing scenario, data with correlated facets can be efficiently indexed and retrieved.
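The first extension can be made concrete with a toy sketch: instead of returning only a document count per facet value, return a BI-style aggregate (here an average) alongside it. The documents, facet names, and measure are all invented for illustration:

```python
def facet_aggregate(docs, facet, measure):
    """For each value of `facet`, return (document count, average of `measure`).
    A plain faceted engine would return only the count."""
    groups = {}
    for d in docs:
        groups.setdefault(d[facet], []).append(d[measure])
    return {value: (len(xs), sum(xs) / len(xs)) for value, xs in groups.items()}

# Hypothetical metadata-rich documents.
docs = [
    {"brand": "A", "color": "red",  "price": 10.0},
    {"brand": "A", "color": "blue", "price": 30.0},
    {"brand": "B", "color": "red",  "price": 20.0},
]

by_brand = facet_aggregate(docs, "brand", "price")
# brand A -> (2 docs, avg price 20.0), brand B -> (1 doc, avg price 20.0)
```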

174 citations


Journal ArticleDOI
TL;DR: The paper addresses the application of information retrieval technology in a DW to exploit text-rich document collections and introduces the problem of dealing with semi-structured data in a DW.
Abstract: This paper surveys the most relevant research on combining Data Warehouse (DW) and Web data. It studies the XML technologies that are currently being used to integrate, store, query and retrieve web data, and their application to DWs. The paper reviews different DW distributed architectures and the use of XML languages as an integration tool in these systems. It also introduces the problem of dealing with semi-structured data in a DW. It studies Web data repositories, the design of multidimensional databases for XML data sources and the XML extensions of On-Line Analytical Processing techniques. The paper addresses the application of information retrieval technology in a DW to exploit text-rich document collections. The authors hope that the paper will help to uncover the main limitations and opportunities offered by the combination of the DW and Web fields, as well as to identify open research lines.

160 citations


Proceedings ArticleDOI
15 Dec 2008
TL;DR: A novel graph OLAP framework is developed, which presents a multi-dimensional and multi-level view over graphs and shows how a graph cube can be materialized by calculating a special kind of measure called aggregated graph and how to implement it efficiently.
Abstract: OLAP (On-Line Analytical Processing) is an important notion in data analysis. Recently, more and more graph or networked data sources have come into being. There exists a similar need to deploy graph analysis from different perspectives and with multiple granularities. However, traditional OLAP technology cannot handle such demands because it does not consider the links among individual data tuples. In this paper, we develop a novel graph OLAP framework, which presents a multi-dimensional and multi-level view over graphs. The contributions of this work are two-fold. First, starting from basic definitions, i.e., what are dimensions and measures in the graph OLAP scenario, we develop a conceptual framework for data cubes on graphs. We also look into different semantics of OLAP operations, and classify the framework into two major subcases: informational OLAP and topological OLAP. Then, with more emphasis on informational OLAP (topological OLAP will be covered in a future study due to lack of space), we show how a graph cube can be materialized by calculating a special kind of measure called aggregated graph and how to implement it efficiently. This includes both full materialization and partial materialization where constraints are enforced to obtain an iceberg cube. We can see that the aggregated graphs, which depend on the graph properties of underlying networks, are much harder to compute than their traditional OLAP counterparts, due to the increased structural complexity of data. Empirical studies show insightful results on real datasets and demonstrate the efficiency of our proposed optimizations.
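An informational roll-up of the kind described can be sketched as follows: nodes sharing a dimension value are merged, and edge weights are summed to form a tiny "aggregated graph". The collaboration graph, the 'field' dimension, and all values are invented for illustration, and this ignores the materialization and iceberg-cube machinery of the paper:

```python
from collections import defaultdict

# Hypothetical collaboration graph: each node carries a 'field' dimension
# value; weighted edges count interactions between individuals.
fields = {"alice": "DB", "bob": "DB", "carol": "AI"}
edges = [("alice", "bob", 3), ("alice", "carol", 1), ("bob", "carol", 2)]

def aggregate_graph(dim, edges):
    """Informational roll-up: merge nodes with equal dimension values and
    sum edge weights between the merged groups (the 'aggregated graph' measure)."""
    agg = defaultdict(int)
    for u, v, w in edges:
        key = tuple(sorted((dim[u], dim[v])))  # undirected group pair
        agg[key] += w
    return dict(agg)

summary = aggregate_graph(fields, edges)
# ('DB', 'DB') -> 3 (alice-bob), ('AI', 'DB') -> 3 (alice-carol + bob-carol)
```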

151 citations


Proceedings ArticleDOI
15 Dec 2008
TL;DR: This paper proposes a text-cube model for multidimensional text databases and conducts systematic studies on efficient text-cube implementation, OLAP execution, and query processing, showing the high promise of the methods.
Abstract: Since Jim Gray introduced the concept of "data cube" in 1997, the data cube, associated with online analytical processing (OLAP), has become a driving engine in the data warehouse industry. Because the boom of the Internet has given rise to an ever increasing amount of text data associated with other multidimensional information, it is natural to propose a data cube model that integrates the power of traditional OLAP and IR techniques for text. In this paper, we propose a text-cube model on multidimensional text databases and study effective OLAP over such data. Two kinds of hierarchies are distinguishable inside: dimensional hierarchy and term hierarchy. By incorporating these hierarchies, we conduct systematic studies on efficient text-cube implementation, OLAP execution and query processing. Our performance study shows the high promise of our methods.
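A text-cube cell can be pictured as the aggregated term statistics of all documents sharing the same dimension values. The records, dimensions, and texts below are invented for illustration, and term hierarchies are omitted:

```python
from collections import Counter

# Hypothetical multidimensional text database: (year, region, free text).
records = [
    ("2008", "US", "olap cube cube"),
    ("2008", "US", "text cube"),
    ("2008", "EU", "ir text"),
]

def text_cube_cell(records, year, region):
    """Materialize one cell of a text cube: merge the term frequencies of
    every document whose dimension values match (year, region)."""
    terms = Counter()
    for y, r, text in records:
        if y == year and r == region:
            terms.update(text.split())
    return terms

cell = text_cube_cell(records, "2008", "US")
# Counter({'cube': 3, 'olap': 1, 'text': 1})
```

Rolling up a dimension (e.g. region to "world") just widens the match condition, reusing the same merge of term counters.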

125 citations


Proceedings ArticleDOI
26 Oct 2008
TL;DR: A novel "navigational" expectation that's particularly useful in the context of faceted search, and a novel interestingness measure through judicious application of p-values are proposed.
Abstract: We propose a dynamic faceted search system for discovery-driven analysis on data with both textual content and structured attributes. From a keyword query, we want to dynamically select a small set of "interesting" attributes and present aggregates on them to a user. Similar to work in OLAP exploration, we define "interestingness" as how surprising an aggregated value is, based on a given expectation. We make two new contributions by proposing a novel "navigational" expectation that's particularly useful in the context of faceted search, and a novel interestingness measure through judicious application of p-values. Through a user survey, we find the new expectation and interestingness metric quite effective. We develop an efficient dynamic faceted search system by improving a popular open source engine, Solr. Our system exploits compressed bitmaps for caching the posting lists in an inverted index, and a novel directory structure called a bitset tree for fast bitset intersection. We conduct a comprehensive experimental study on large real data sets and show that our engine performs 2 to 3 times faster than Solr.
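The bitmap machinery behind the engine's speedup can be sketched with plain Python integers as bitmaps: a facet count under a keyword query reduces to one bitwise AND plus a popcount. The posting lists and term names are invented for illustration, and the bitset-tree directory itself is not modeled here:

```python
def bitmap(doc_ids):
    """Build an integer bitmap: bit i set means document i matches."""
    b = 0
    for i in doc_ids:
        b |= 1 << i
    return b

# Hypothetical cached posting lists of an inverted index.
postings = {
    "keyword:laptop": bitmap([0, 1, 2, 4]),
    "brand:A":        bitmap([0, 2, 5]),
}

def facet_count(query_term, facet_term):
    """Documents matching both terms: a single AND, then count the set bits."""
    both = postings[query_term] & postings[facet_term]
    return bin(both).count("1")

n = facet_count("keyword:laptop", "brand:A")  # documents {0, 2} -> 2
```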

113 citations


Proceedings ArticleDOI
09 Jun 2008
TL;DR: This paper proposes the concept of Sequence OLAP, or S-OLAP for short, a system in which a sequence can be characterized not only by the attribute values of its constituent items, but also by the subsequence/substring patterns it possesses.
Abstract: Many kinds of real-life data exhibit logical ordering among their data items and are thus sequential in nature. However, traditional online analytical processing (OLAP) systems and techniques were not designed for sequence data and they are incapable of supporting sequence data analysis. In this paper, we propose the concept of Sequence OLAP, or S-OLAP for short. The biggest distinction of S-OLAP from traditional OLAP is that a sequence can be characterized not only by the attribute values of its constituent items, but also by the subsequence/substring patterns it possesses. This paper studies many aspects related to Sequence OLAP. The concepts of sequence cuboid and sequence data cube are introduced. A prototype S-OLAP system is built in order to validate the proposed concepts. The prototype is able to support "pattern-based" grouping and aggregation, which is currently not supported by any OLAP system. The implementation details of the prototype system as well as experimental results are presented.
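"Pattern-based" grouping, the capability traditional OLAP lacks, can be illustrated minimally: sequences are bucketed by whether they contain a given substring pattern, and an aggregate (here a count) is computed per bucket. The sequences and the pattern are invented for illustration:

```python
# Hypothetical event sequences (e.g. encoded click streams).
sequences = ["abcab", "bca", "aab", "cc"]

def group_by_pattern(seqs, pattern):
    """Group sequences by substring-pattern possession and count each group,
    the simplest form of S-OLAP's pattern-based grouping and aggregation."""
    hits = [s for s in seqs if pattern in s]
    return {"contains": len(hits), "not": len(seqs) - len(hits)}

counts = group_by_pattern(sequences, "ab")
# 'abcab' and 'aab' contain "ab" -> {'contains': 2, 'not': 2}
```

A sequence cuboid would extend this by crossing such pattern predicates with ordinary item-attribute dimensions.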

86 citations


Proceedings ArticleDOI
13 Jun 2008
TL;DR: This work investigates how the traditional data cube model is adapted to trajectory warehouses in order to transform raw location data into valuable information.
Abstract: The flow of data generated from low-cost modern sensing technologies and wireless telecommunication devices enables novel research fields related to the management of this new kind of data and the implementation of appropriate analytics for knowledge extraction. In this work, we investigate how the traditional data cube model is adapted to trajectory warehouses in order to transform raw location data into valuable information. In particular, we focus our research on three issues that are critical to trajectory data warehousing: (a) the trajectory reconstruction procedure that takes place when loading a moving object database with sampled location data originated e.g. from GPS recordings, (b) the ETL procedure that feeds a trajectory data warehouse, and (c) the aggregation of cube measures for OLAP purposes. We provide design solutions for all these issues and we test their applicability and efficiency in real world settings.
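Issue (c), aggregating location data into cube measures, can be sketched by snapping raw GPS samples to a spatial grid and counting samples per cell. The samples, grid size, and object ids are invented for illustration, and the trajectory reconstruction step (a) is not modeled:

```python
from collections import defaultdict

# Hypothetical GPS samples: (object id, x, y) in some projected coordinates.
samples = [("obj1", 0.2, 0.7), ("obj1", 0.8, 0.1), ("obj2", 0.3, 0.9)]

def cube_by_cell(samples, cell_size=0.5):
    """Roll raw locations up to a coarse spatial grid: one way a simple
    count measure of a trajectory data cube can be fed."""
    counts = defaultdict(int)
    for _, x, y in samples:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return dict(counts)

grid = cube_by_cell(samples)
# cell (0, 1) -> 2 samples, cell (1, 0) -> 1 sample
```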

86 citations


Book
25 Feb 2008
TL;DR: In this article, the authors present an overview of the state of the art on data warehouse design, including three different approaches for requirements specification depending on whether users, operational data sources, or both are the driving force in the requirements gathering process, and how each approach leads to the creation of a conceptual multidimensional model.
Abstract: A data warehouse stores large volumes of historical data required for analytical purposes. This data is extracted from operational databases; transformed into a coherent whole using a multidimensional model that includes measures, dimensions, and hierarchies; and loaded into a data warehouse during the extraction-transformation-loading (ETL) process. Malinowski and Zimányi explain in detail conventional data warehouse design, covering in particular complex hierarchy modeling. Additionally, they address two innovative domains recently introduced to extend the capabilities of data warehouse systems, namely the management of spatial and temporal information. Their presentation covers different phases of the design process, such as requirements specification, conceptual, logical, and physical design. They include three different approaches for requirements specification depending on whether users, operational data sources, or both are the driving force in the requirements gathering process, and they show how each approach leads to the creation of a conceptual multidimensional model. Throughout the book the concepts are illustrated using many real-world examples and complemented by sample implementations for Microsoft's Analysis Services 2005 and Oracle 10g with the OLAP and Spatial extensions. For researchers this book serves as an introduction to the state of the art on data warehouse design, with many references to more detailed sources. Providing a clear and concise presentation of the major concepts and results of data warehouse design, it can also be used as the basis of a graduate or advanced undergraduate course. The book may help experienced data warehouse designers to expand their analysis possibilities by incorporating spatial and temporal information. Finally, experts in spatial databases or in geographical information systems could benefit from the data warehouse vision for building innovative spatial analytical applications.

86 citations


Book
23 May 2008
TL;DR: This six-volume set offers tools, designs, and outcomes of the utilization of data warehousing and mining technologies, such as algorithms, concept lattices, multidimensional data, and online analytical processing.
Abstract: Data Warehousing and Mining: Concepts, Methodologies, Tools and Applications provides the most comprehensive compilation of research available in this emerging and increasingly important field. This six-volume set offers tools, designs, and outcomes of the utilization of data warehousing and mining technologies, such as algorithms, concept lattices, multidimensional data, and online analytical processing. With more than 300 chapters contributed by over 575 experts from around the globe, this authoritative collection will provide libraries with the essential reference on data warehousing and mining.

85 citations


Journal ArticleDOI
TL;DR: An extension of the OnLine Analytical Processing framework with causal explanation is described, making it possible to automatically generate explanations for exceptional cell values; this promises improved decision-making by managers, since the current tedious and error-prone manual analysis process is augmented with automated problem identification and explanation generation.

Journal ArticleDOI
TL;DR: This article deals with multidimensional analyses, where analyzed data are designed according to a conceptual model as a constellation of facts and dimensions composed of multi-hierarchies; this model supports a query algebra defining a minimal core of operators that produce multidimensional tables for displaying analyzed data.
Abstract: This article deals with multidimensional analyses. Analyzed data are designed according to a conceptual model as a constellation of facts and dimensions, which are composed of multi-hierarchies. This model supports a query algebra defining a minimal core of operators, which produce multidimensional tables for displaying analyzed data. This user-oriented algebra supports complex analyses through advanced operators and binary operators. A graphical language, based on this algebra, is also provided to ease the specification of multidimensional queries. These graphical manipulations are expressed from a constellation schema and they produce multidimensional tables.

Proceedings ArticleDOI
08 Dec 2008
TL;DR: With MRBench, users can evaluate the performance of MapReduce systems while varying environmental parameters such as data size and the number of (map/reduce) tasks; experiments show that MRBench is a useful tool for benchmarking the capability of answering critical business questions.
Abstract: MapReduce is Google's programming model for easy development of scalable parallel applications which process huge quantities of data on many clusters. Due to its convenience and efficiency, MapReduce is used in various applications (e.g., Web search services and online analytical processing). However, there are only a few good benchmarks to evaluate MapReduce implementations with realistic test sets. In this paper, we present MRBench, a benchmark for evaluating MapReduce systems. MRBench focuses on processing business oriented queries and concurrent data modifications. To this end, we build MRBench to deal with large volumes of relational data and execute highly complex queries. With MRBench, users can evaluate the performance of MapReduce systems while varying environmental parameters such as data size and the number of (map/reduce) tasks. Our extensive experimental results show that MRBench is a useful tool to benchmark the capability of answering critical business questions.
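The kind of business-oriented aggregate query MRBench runs can be pictured with a toy in-process map and reduce phase over relational rows. The rows, column names, and query (sum of amounts per order status) are invented for illustration; a real run would distribute both phases across a cluster:

```python
from collections import defaultdict

# Hypothetical relational rows: (order status, amount).
orders = [("open", 10), ("closed", 5), ("open", 7)]

def map_phase(rows):
    """Map: emit (key, value) pairs; here, (status, amount)."""
    for status, amount in rows:
        yield status, amount

def reduce_phase(pairs):
    """Reduce: sum values per key, as the shuffled reducers would."""
    acc = defaultdict(int)
    for key, value in pairs:
        acc[key] += value
    return dict(acc)

totals = reduce_phase(map_phase(orders))
# {'open': 17, 'closed': 5}
```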

Journal ArticleDOI
01 Apr 2008
TL;DR: The traditional corporate data warehouse is integrated with a document warehouse, resulting in a contextualized warehouse, in which the user first selects an analysis context by supplying some keywords; the analysis is then performed on a novel type of OLAP cube, called an R-cube.
Abstract: Current data warehouse and OLAP technologies are applied to analyze the structured data that companies store in databases. The context that helps to understand data over time is usually described separately in text-rich documents. This paper proposes to integrate the traditional corporate data warehouse with a document warehouse, resulting in a contextualized warehouse. Thus, the user first selects an analysis context by supplying some keywords. Then, the analysis is performed on a novel type of OLAP cube, called an R-cube, which is materialized by retrieving and ranking the documents and corporate facts related to the selected context.

Proceedings ArticleDOI
30 Oct 2008
TL;DR: This paper presents a generic framework that allows recommending OLAP queries based on the OLAP server query log, shows how to use this framework for recommending simple MDX queries, and provides some experimental results to validate the approach.
Abstract: An OLAP analysis session can be defined as an interactive session during which a user launches queries to navigate within a cube. Very often choosing which part of the cube to navigate further, and thus designing the forthcoming query, is a difficult task. In this paper, we propose to use what OLAP users did during their former explorations of the cube as a basis for recommending OLAP queries to the user. We present a generic framework that allows recommending OLAP queries based on the OLAP server query log. This framework is generic in the sense that changing its parameters changes the way the recommendations are computed. We show how to use this framework for recommending simple MDX queries and we provide some experimental results to validate our approach.
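One very simplified stand-in for such log-based recommendation: represent each logged query as the set of cube references it touches and recommend the logged query most similar (by Jaccard overlap) to the current one. The log entries and reference names are invented, and the paper's framework is far more parameterizable than this:

```python
# Hypothetical query log: each query as a set of referenced measures/levels.
log = [
    {"sales", "year", "region"},
    {"sales", "month", "product"},
    {"stock", "warehouse"},
]

def jaccard(a, b):
    """Set similarity: |intersection| / |union|."""
    return len(a & b) / len(a | b)

def recommend(current, log):
    """Recommend the logged query with the highest overlap with the
    current session's query."""
    return max(log, key=lambda q: jaccard(current, q))

suggestion = recommend({"sales", "year"}, log)
# -> {'sales', 'year', 'region'}
```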

Book ChapterDOI
02 Sep 2008
TL;DR: A robust sampling-based framework for privacy preserving OLAP is introduced and experimentally assessed; it deals with the problem of preserving the privacy of OLAP aggregations rather than that of individual data cube cells, which results in greater theoretical soundness and lower computational overheads when processing massive data cubes.
Abstract: A robust sampling-based framework for privacy preserving OLAP is introduced and experimentally assessed in this paper. The most distinctive characteristic of the proposed framework is its innovative privacy OLAP notion, which deals with the problem of preserving the privacy of OLAP aggregations rather than that of data cube cells, as conventional perturbation-based privacy preserving OLAP techniques do. This results in greater theoretical soundness and lower computational overheads when processing massive data cubes. Also, the performance of our privacy preserving OLAP technique is compared with that of Zero-Sum, the state-of-the-art perturbation-based privacy preserving OLAP technique, under several perspectives of analysis. The experimental results confirm the benefits of adopting the proposed framework for preserving the privacy of OLAP data cubes.
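To make only the sampling idea concrete (this is an illustration, not the paper's actual technique): an aggregate can be answered from a random sample of cells, scaled by the sampling rate, so that exact individual values never leave the system. Function names and parameters are invented:

```python
import random

def approximate_sum(values, rate=0.5, seed=42):
    """Estimate SUM from a Bernoulli sample of the values, scaled up by the
    sampling rate. Individual cell values are never released exactly."""
    rng = random.Random(seed)  # fixed seed for reproducibility of the sketch
    sample = [v for v in values if rng.random() < rate]
    return sum(sample) / rate if sample else 0.0

# With rate=1.0 every value is sampled, so the estimate is exact:
exact = approximate_sum([10.0] * 100, rate=1.0)  # 1000.0
```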

Patent
Eric Williamson1
28 Aug 2008
TL;DR: In this paper, a transform engine can combine or aggregate the set of data sources using common dimensions or data points, and build an index into a transform table reflecting the hierarchical level of dimension from each data source in a combined hierarchical mapping.
Abstract: Embodiments relate to systems and methods for aggregating data from data sources according to a hierarchical mapping generated from dimensions of the data sources. A set of applications such as online analytical processing (OLAP) applications can access the combined data of a set of multi-dimensional data sources via a transform engine. The set of data sources can be configured with diverse dimensions and associated data, which in general do not reflect a strictly hierarchical structure. In embodiments, the transform engine can combine or aggregate the set of data sources using common dimensions or data points, and build an index into a transform table reflecting the hierarchical level of dimension from each data source in a combined hierarchical mapping. An OLAP or other application can therefore perform searches, sorts, and/or other operations on the combined hierarchical mapping based on the resulting ordering of data, even when the original multi-dimensional data sources do not contain an explicit common hierarchy.

Book ChapterDOI
01 Jan 2008
TL;DR: This chapter highlights the high relevance and attractiveness, at present and in the future, of the problem of visualizing multidimensional data sets, with challenging research findings accompanied by significant spin-offs in the Information Technology (IT) industry.
Abstract: INTRODUCTION The problem of efficiently visualizing multidimensional data sets produced by scientific and statistical tasks/ processes is becoming increasingly challenging, and is attracting the attention of a wide multidisciplinary community of researchers and practitioners. Basically, this problem consists in visualizing multidimensional data sets by capturing the dimensionality of data, which is the most difficult aspect to be considered. Human analysts interacting with high-dimensional data often experience disorientation and cognitive overload. Analysis of high-dimensional data is a challenge encountered in a wide set of real-life applications such as (i) biological databases storing massive gene and protein data sets, (ii) real-time monitoring systems accumulating data sets produced by multiple, multi-rate streaming sources, (iii) advanced Business Intelligence (BI) systems collecting business data for decision making purposes etc. Traditional DBMS front-end tools, which are usually tuple-bag-oriented, are completely inadequate to fulfill the requirements posed by an interactive exploration of high-dimensional data sets due to two major reasons: (i) DBMS implement the OLTP paradigm, which is optimized for transaction processing and deliberately neglects the dimensionality of data; (ii) DBMS operators are very poor and offer nothing beyond the capability of conventional SQL statements, which makes such tools very inefficient with respect to the goal of visualizing and, above all, interacting with multidimensional data sets embedding a large number of dimensions. Despite the above-highlighted practical relevance of the problem of visualizing multidimensional data sets, the literature in this field is rather scarce, due to the fact that, for many years, this problem has been of relevance for life science research communities only, and interaction of the latter with the computer science research community has been insufficient.
Following the enormous growth of scientific disciplines like Bio-Informatics, this problem has since become a fundamental field of academic as well as industrial computer science research. At the same time, a number of proposals dealing with the multidimensional data visualization problem appeared in the literature, with the merit of stimulating novel and exciting application fields such as the visualization of Data Mining results generated by challenging techniques like clustering and association rule discovery. The above-mentioned issues are meant to facilitate understanding of the high relevance and attractiveness of the problem of visualizing multidimensional data sets at present and in the future, with challenging research findings accompanied by significant spin-offs in the Information Technology (IT) industrial field. A possible solution to tackle this problem is represented by well-known OLAP techniques (Codd …

Journal ArticleDOI
01 Aug 2008
TL;DR: Multidimensional Content eXploration (MCX) is about effectively analyzing and exploring large amounts of content by combining keyword search with OLAP-style aggregation, navigation, and reporting; the paper formally presents how CMS content and metadata should be organized in a well-defined multidimensional structure.
Abstract: Content Management Systems (CMS) store enterprise data such as insurance claims, insurance policies, legal documents, patent applications, or archival data like in the case of digital libraries. Search over content allows for information retrieval, but does not provide users with great insight into the data. A more analytical view is needed through analysis, aggregations, groupings, trends, pivot tables or charts, and so on. Multidimensional Content eXploration (MCX) is about effectively analyzing and exploring large amounts of content by combining keyword search with OLAP-style aggregation, navigation, and reporting. We focus on unstructured data or generally speaking documents or content with limited metadata, as it is typically encountered in CMS. We formally present how CMS content and metadata should be organized in a well-defined multidimensional structure, so that sophisticated queries can be expressed and evaluated. The CMS metadata provide traditional OLAP static dimensions that are combined with dynamic dimensions discovered from the analyzed keyword search result, as well as measures for document scores based on the link structure between the documents. In addition, we provide means for multidimensional content exploration through traditional OLAP rollup/drilldown operations on the static and dynamic dimensions, solutions for multi-cube analysis and dynamic navigation of the content. We present our prototype, called DBPubs, which stores research publications as documents that can be searched and, most importantly, analyzed and explored. Finally, we present experimental results of the efficiency and effectiveness of our approach.

Patent
Eric Williamson1
28 Aug 2008
TL;DR: In this paper, a set of applications such as online analytical processing (OLAP) applications can access the combined data of the data sources via an aggregation engine, where the set of data sources can be configured with diverse dimensions and associated data.
Abstract: Embodiments relate to systems and methods for analyzing data extracted from a set of data sources. A set of applications such as online analytical processing (OLAP) applications can access the combined data of a set of data sources via an aggregation engine. The set of data sources can be configured with diverse dimensions and associated data. In general the data sources may not be expected to reflect a strictly consistent structure. In embodiments, the aggregation engine can combine or aggregate the set of data sources using common dimensions or data points, and build an index into a transform table reflecting a combined mapping. An OLAP or other application can then perform statistical computations, searches, sorts, and/or other operations on the combined mapping, even when the original data sources do not contain identical dimensions or other formats.

Book ChapterDOI
02 Sep 2008
TL;DR: A new aggregation function that aggregates textual data in an OLAP environment is presented; it represents a set of documents by their most significant terms using a weighting function from information retrieval: tf.idf.
Abstract: For more than a decade, research on OLAP and multidimensional databases has generated methodologies, tools and resource management systems for the analysis of numeric data. With the growing availability of digital documents, there is a need for incorporating text-rich documents within multidimensional databases as well as an adapted framework for their analysis. This paper presents a new aggregation function that aggregates textual data in an OLAP environment. The Top_Keyword function (Top_Kw for short) represents a set of documents by their most significant terms using a weighting function from information retrieval: tf.idf.
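A minimal sketch of the Top_Kw idea: score the terms of an aggregated document group by tf times idf over the corpus, and keep the top k. The corpus, the group, and the value of k are invented for illustration, and the exact weighting variant used in the paper may differ:

```python
import math
from collections import Counter

# Hypothetical document corpus for the idf statistics.
corpus = [
    "olap cube olap",
    "cube query",
    "text mining text",
]

def top_kw(doc_group, corpus, k=2):
    """Represent a group of documents by its k highest tf.idf terms."""
    n = len(corpus)
    df = Counter()                       # document frequency per term
    for doc in corpus:
        df.update(set(doc.split()))
    tf = Counter(" ".join(doc_group).split())  # term frequency in the group
    scores = {t: tf[t] * math.log(n / df[t]) for t in tf}
    return [t for t, _ in sorted(scores.items(), key=lambda x: -x[1])[:k]]

keywords = top_kw(["olap cube olap"], corpus, k=2)
# 'olap' is frequent in the group and rare in the corpus, so it ranks first.
```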

Journal ArticleDOI
TL;DR: The Dynameomics project is an effort to characterize the native-state dynamics and folding/unfolding pathways of representatives of all known protein folds by way of molecular dynamics simulations, as described by Beck et al.
Abstract: The Dynameomics project is our effort to characterize the native-state dynamics and folding/unfolding pathways of representatives of all known protein folds by way of molecular dynamics simulations, as described by Beck et al. (in Protein Eng. Des. Select., the first paper in this series). The data produced by these simulations are highly multidimensional in structure and multi-terabytes in size. Both of these features present significant challenges for storage, retrieval and analysis. For optimal data modeling and flexibility, we needed a platform that supported both multidimensional indices and hierarchical relationships between related types of data and that could be integrated within our data warehouse, as described in the accompanying paper directly preceding this one. For these reasons, we have chosen On-line Analytical Processing (OLAP), a multi-dimensional analysis optimized database, as an analytical platform for these data. OLAP is a mature technology in the financial sector, but it has not been used extensively for scientific analysis. Our project is furthermore unusual for its focus on the multidimensional and analytical capabilities of OLAP rather than its aggregation capacities. The dimensional data model and hierarchies are very flexible. The query language is concise for complex analysis and rapid data retrieval. OLAP shows great promise for dynamic protein analysis for bioengineering and biomedical applications. In addition, OLAP may have similar potential for other scientific and engineering applications involving large and complex datasets.

Book
10 Dec 2008
TL;DR: In this book, the authors present a one-stop guide for transforming disparate data into actionable insight for users throughout the organization, including the use of data mining to identify data patterns, correlations, and clustering.
Abstract: Maximize the business intelligence tools in Microsoft SQL Server 2008. Manage, analyze, and distribute enterprise data with help from this expert resource. Delivering Business Intelligence with Microsoft SQL Server 2008 covers the entire BI lifecycle and explains how to build robust data integration, reporting, and analysis solutions. Real-world examples illustrate all of the powerful BI capabilities of SQL Server 2008. This is your one-stop guide for transforming disparate data into actionable insight for users throughout your organization. The book shows how to:
Understand the goals and benefits of business intelligence
Design and create relational data marts and OLAP cubes
Manage Analysis Services databases using BI Development Studio
Cleanse data and populate data marts with SQL Server Integration Services
Take advantage of the flexibility of the Unified Dimensional Model
Manipulate and analyze data using MDX scripts and queries
Use data mining to identify data patterns, correlations, and clustering
Develop and distribute interactive reports with SQL Server 2008 Reporting Services
Integrate business intelligence into enterprise applications using ADOMD.NET and the Report Viewer Control
Table of contents:
Part I: Business Intelligence
Chapter 1. Equipping the Organization for Effective Decision Making
Chapter 2. Making the Most of What You've Got--Using Business Intelligence
Chapter 3. Seeking the Source--The Source of Business Intelligence
Chapter 4. One-Stop Shopping--The Unified Dimensional Model
Chapter 5. First Steps--Beginning the Development of Business Intelligence
Part II: Defining Business Intelligence Structures
Chapter 6. Building Foundations--Creating Data Marts
Chapter 7. Transformers--Integration Services Structure and Components
Chapter 8. Fill'er Up--Using Integration Services for Populating Data Marts
Part III: Analyzing Cube Content
Chapter 9. Cubism--Measures and Dimensions
Chapter 10. Bells and Whistles--Special Features of OLAP Cubes
Chapter 11. Writing a New Script--MDX Scripting
Chapter 12. Pulling It Out and Building It Up--MDX Queries
Part IV: Mining
Chapter 13. Panning for Gold--Introduction to Data Mining
Chapter 14. Building the Mine--Working with the Data Mining Model
Chapter 15. Spelunking--Exploration Using Data Mining
Part V: Delivering
Chapter 16. On Report--Delivering Business Intelligence with Reporting Systems
Chapter 17. Falling into Place--Managing Reporting Systems Reports
Chapter 18. Let's Get Together--Excel Pivot Tables and Pivot Charts
Index

Proceedings ArticleDOI
09 Jun 2008
TL;DR: A Sampling Cube framework is proposed that efficiently calculates confidence intervals for any multidimensional query and uses the OLAP structure to group similar segments to increase sampling size when needed.
Abstract: Sampling is a popular method of data collection when it is impossible or too costly to reach the entire population. For example, television show ratings in the United States are gathered from a sample of roughly 5,000 households. To use the results effectively, the samples are further partitioned in a multidimensional space based on multiple attribute values. This naturally leads to the desirability of OLAP (Online Analytical Processing) over sampling data. However, unlike traditional data, sampling data is inherently uncertain, i.e., it does not represent the full population. Thus, it is desirable to return not only query results but also confidence intervals indicating the reliability of the results. Moreover, a certain segment in a multidimensional space may contain no samples or too few, which requires additional analysis to return trustworthy results. In this paper we propose a Sampling Cube framework, which efficiently calculates confidence intervals for any multidimensional query and uses the OLAP structure to group similar segments to increase sampling size when needed. Further, to handle high-dimensional data, a Sampling Cube Shell method is proposed to effectively reduce the storage requirement while still preserving query result quality.
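The roll-up behaviour described above can be sketched in a few lines: if a queried cell holds too few samples, the query is re-answered at a coarser cell so that the confidence interval is computed over a larger sample. This is an illustrative sketch only, not the paper's algorithm; the drop-last-filter roll-up rule and all names are assumptions.

```python
import math

def mean_ci(values, z=1.96):
    """Sample mean with a normal-approximation 95% confidence interval."""
    n = len(values)
    mean = sum(values) / n
    if n < 2:
        return mean, (mean, mean)  # too few points for a spread estimate
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    half = z * math.sqrt(var / n)
    return mean, (mean - half, mean + half)

def cell_query(samples, filters, min_n=30):
    """Answer a cell query over (attributes, value) samples; if the cell
    holds fewer than min_n samples, drop a filter (roll up to a coarser
    cell) to enlarge the sample, in the spirit of grouping similar
    segments.  Dropping the last filter is a naive illustrative choice."""
    match = [v for attrs, v in samples
             if all(attrs.get(k) == x for k, x in filters.items())]
    if len(match) >= min_n or not filters:
        return mean_ci(match) if match else None
    coarser = dict(list(filters.items())[:-1])
    return cell_query(samples, coarser, min_n)
```

A query against a sparse cell thus transparently returns an estimate from the coarser cell, widening the effective sample instead of reporting an unreliable interval.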

Proceedings ArticleDOI
25 Mar 2008
TL;DR: This paper develops an algorithm that effectively and efficiently prunes the space of potentially beneficial views and indexes when given realistic-size instances of the OLAP view- and index-selection problem, and provides formal proofs that the resulting integer-programming model is guaranteed to find an optimal solution.
Abstract: In on-line analytical processing (OLAP), precomputing (materializing as views) and indexing auxiliary data aggregations is a common way of reducing query-evaluation time costs for important data-analysis queries. We consider an OLAP view- and index-selection problem stated as an optimization problem, where (i) the inputs include the data-warehouse schema, a set of data-analysis queries of interest, and a storage-limit constraint, and (ii) the output is a set of views and indexes that minimizes the costs of the input queries, subject to the storage limit. While greedy and other heuristic strategies for choosing views or indexes might help to some extent in improving the costs, it is highly nontrivial to arrive at a globally optimum solution, one that reduces the processing costs of typical OLAP queries as much as is theoretically possible. In fact, as observed in [17] and to the best of our knowledge, there is no known approximation algorithm for OLAP view or index selection with nontrivial performance guarantees. In this paper we propose a systematic study of the OLAP view- and index-selection problem. Our specific contributions are as follows: (1) We develop an algorithm that effectively and efficiently prunes the space of potentially beneficial views and indexes when given realistic-size instances of the problem. (2) We provide formal proofs that our pruning algorithm keeps at least one globally optimum solution in the search space, so the resulting integer-programming model is guaranteed to find an optimal solution. (3) We develop a family of algorithms to further reduce the size of the search space, so that we are able to solve larger problem instances, although we no longer guarantee the global optimality of the resulting solution. (4) Finally, we present an experimental comparison of our proposed approaches with the state-of-the-art approaches of [2, 12].
Our experiments show that our approaches to view and index selection result in high-quality solutions --- in fact, in globally optimum solutions for many realistic-size problem instances. Thus, they compare favorably with the well-known OLAP-centered approach of [12] and provide for a winning combination with the end-to-end framework of [2] for generic view and index selection.
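The underlying optimization problem can be illustrated with a brute-force sketch: pick the subset of candidate views that fits within the storage limit and minimizes the total cost of the workload queries. This is only an illustration of the problem statement, with hypothetical names and cost numbers; the paper's contribution is precisely to avoid such enumeration by pruning the candidate space and solving an integer program.

```python
from itertools import combinations

def total_cost(subset, views, queries):
    """Each query uses its cheapest available materialized view, falling
    back to its base cost against the raw fact table."""
    return sum(
        min([base] + [views[v][1][q] for v in subset if q in views[v][1]])
        for q, base in queries.items()
    )

def best_view_set(views, queries, storage_limit):
    """Exhaustive view selection: views maps name -> (size, {query: cost
    when answered by that view}); queries maps query -> base cost.
    Returns the cheapest feasible subset and its total workload cost."""
    names = list(views)
    best, best_cost = set(), total_cost((), views, queries)
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            if sum(views[v][0] for v in subset) > storage_limit:
                continue  # violates the storage-limit constraint
            c = total_cost(subset, views, queries)
            if c < best_cost:
                best, best_cost = set(subset), c
    return best, best_cost
```

The search space here is exponential in the number of candidate views, which is why pruning it while provably retaining an optimal solution, as the paper does, matters for realistic instances.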

Proceedings ArticleDOI
01 Nov 2008
TL;DR: This paper defines a personalization framework for OLAP database systems based on context-aware user preferences and considers a qualitative preference model that handles user preferences on the multidimensional schema.
Abstract: Personalization of information systems brings new challenges to OLAP technology. A key characteristic of emerging OLAP database systems will be the customizability of their behaviour, taking into account users' preferences as well as their context of analysis. In this paper we define a personalization framework for OLAP database systems based on context-aware user preferences. We consider a qualitative preference model which handles user preferences on the multidimensional schema. Context is modelled as a tree of the multidimensional components of an OLAP analysis (fact, measures, dimensions, parameters). We define some OLAP operations that support personalization. User queries are dynamically enhanced with the user's preferences and made aware of the analysis context.

Patent
09 May 2008
TL;DR: In this patent, the indexing component analyzes the attributes involved in the predicate conditions of filter requests to form slice indexes, wherein the resulting data sets share the same filtering criteria in the form of attributes.
Abstract: Systems and methods that employ auxiliary data structures in the form of indexes (e.g., slice indexes) to process incoming queries in query retrieval systems (e.g., Online Analytical Processing (OLAP) environments). The indexing component analyzes the attributes involved in the predicate conditions of filter requests to form slice indexes for the same filtering criteria, wherein the resulting data sets share those filtering criteria in the form of attributes. The indexes of the subject innovation can be created on-the-fly, and typically without intervention by system administrators.
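The idea of a slice index can be sketched as a mapping from a predicate value to the set of matching row ids, so that repeated filter requests sharing the same criterion reuse the index instead of rescanning the data. A minimal sketch with hypothetical names, built eagerly for clarity rather than on-the-fly as the patent describes:

```python
from collections import defaultdict

def build_slice_index(rows, attribute):
    """Slice index on one attribute: each distinct value maps to the set
    of row ids satisfying the predicate attribute == value."""
    index = defaultdict(set)
    for rid, row in enumerate(rows):
        index[row[attribute]].add(rid)
    return index

def filter_rows(rows, index, value):
    """Answer a filter request via the slice index, avoiding a full scan."""
    return [rows[rid] for rid in sorted(index.get(value, ()))]
```

Because the index keys are the filtering criteria themselves, any later query with the same predicate resolves to a set lookup.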

Proceedings ArticleDOI
11 Feb 2008
TL;DR: This work logs user interaction and performs usage mining using OLAP to discover context-dependent preferences for different information types, building a more generic and adaptive system that automatically selects the most relevant content and presents it to the user in a succinct manner that supports ease of consumption and comprehension.
Abstract: What constitutes relevant information to an individual may vary widely under different contexts. However, previous work on pervasive information systems has mostly focused on context-aware delivery of application-specific information. Such systems are only able to operate within narrow application domains and cannot be generalized to handle other heterogeneous types of information. To fill this gap, we propose a context-aware system for information integration that can handle arbitrary information types and determine their relevance to the user's current context. In contrast to existing model-based approaches to context reasoning, we log user interaction and perform usage mining using OLAP to discover context-dependent preferences for different information types. This allows us to build a more generic and adaptive system that automatically selects the most relevant content and presents it to the user in a succinct manner that supports ease of consumption and comprehension.
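At its core, the usage-mining step amounts to an OLAP-style group-by over the interaction log: count accesses per (context, information type) and keep the most-used type for each context. A minimal sketch under that assumption; the flat context representation and all names are hypothetical, not the paper's model.

```python
from collections import Counter

def preferred_types(interaction_log):
    """Mine context-dependent preferences from (context, info_type)
    interaction records: aggregate access counts per pair, then keep
    the most frequently used information type for each context."""
    counts = Counter((ctx, itype) for ctx, itype in interaction_log)
    best = {}
    for (ctx, itype), n in counts.items():
        if n > best.get(ctx, (None, 0))[1]:
            best[ctx] = (itype, n)
    return {ctx: itype for ctx, (itype, _) in best.items()}
```

In a real cube, context would itself be multidimensional (location, time, activity), allowing the same counts to be rolled up or drilled down before selecting the preferred type.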

01 Jan 2008
TL;DR: A prototype clinical decision support system which combines the strengths of both OLAP and data mining is presented, which provides a rich knowledge environment which is not achievable by using OLAP or data mining alone.
Abstract: The healthcare industry collects huge amounts of data which, unfortunately, are not turned into useful information for effective decision making. Decision support systems (DSS) can now use advanced technologies such as On-Line Analytical Processing (OLAP) and data mining to deliver advanced capabilities. This paper presents a prototype clinical decision support system which combines the strengths of both OLAP and data mining. It provides a rich knowledge environment which is not achievable by using OLAP or data mining alone.

Journal ArticleDOI
TL;DR: A user-centered approach was taken to design a spatial data warehouse and online analytical processing (OLAP) tools for data exploration in ecological research, resulting in a multidimensional data model with a fact table representing biological stream survey measurements and dimension tables representing spatial and categorical site and landscape variables.