Showing papers on "Online analytical processing" published in 2009

Open Access

Journal Article•DOI•

A Survey of Uncertain Data Algorithms and Applications

Charu C. Aggarwal¹, Philip S. Yu¹•Institutions (1)

01 May 2009-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This paper provides a survey of uncertain data mining and management applications, and discusses different methodologies to process and mine uncertain data in a variety of forms.

Abstract: In recent years, a number of indirect data collection methodologies have led to the proliferation of uncertain data. Such data points are often represented in the form of a probabilistic function, since the corresponding deterministic value is not known. This increases the challenge of mining and managing uncertain data, since the precise behavior of the underlying data is no longer known. In this paper, we provide a survey of uncertain data mining and management applications. In the field of uncertain data management, we will examine traditional methods such as join processing, query processing, selectivity estimation, OLAP queries, and indexing. In the field of uncertain data mining, we will examine traditional mining problems such as classification and clustering. We will also examine a general transform based technique for mining uncertain data. We discuss the models for uncertain data, and how they can be leveraged in a variety of applications. We discuss different methodologies to process and mine uncertain data in a variety of forms.

497 citations

Proceedings Article•DOI•

A common database approach for OLTP and OLAP using an in-memory column database

Hasso Plattner¹•Institutions (1)

Hasso Plattner Institute¹

29 Jun 2009

TL;DR: This paper will question some of the fundamentals of the OLAP and OLTP separation and present a new proposal for an enterprise data management concept that will revolutionize transactional applications while providing an optimal platform for analytical data processing.

Abstract: When SQL and the relational data model were introduced 25 years ago as a general data management concept, enterprise software migrated quickly to this new technology. It is fair to say that SQL and the various implementations of RDBMSs became the backbone of enterprise systems. In those days, we believed that business planning, transaction processing and analytics should reside in one single system. Despite the incredible improvements in computer hardware, high-speed networks, display devices and the associated software, speed and flexibility remained an issue. The nature of RDBMSs, being organized along rows, prohibited us from providing instant analytical insight and finally led to the introduction of so-called data warehouses. This paper will question some of the fundamentals of the OLAP and OLTP separation. Based on the analysis of real customer environments and experience in some prototype implementations, a new proposal for an enterprise data management concept will be presented. In our proposal, the participants in enterprise applications (customers, orders, accounting documents, products, employees, etc.) will be modeled as objects and also stored and maintained as such. Despite that, the vast majority of business functions will operate on an in-memory representation of their objects. Using the relational algebra and a column-based organization of data storage will allow us to revolutionize transactional applications while providing an optimal platform for analytical data processing. The unification of OLTP and OLAP workloads on a shared architecture and the reintegration of planning activities promise significant gains in application development while simplifying enterprise systems drastically. The latest trends in computer technology (e.g., blade architecture, multiple CPUs per blade with multiple cores per CPU) allow for a significant parallelization of application processes. The organization of data in columns supports the parallel use of cores for filtering and aggregation. Elements of application logic can be implemented as highly efficient stored procedures operating on columns. The vast increase in main memory, combined with improvements in L1, L2, and L3 caching and the high data compression rate of column storage, will allow us to support substantial data volumes on one single blade. Distributing data across multiple blades using a shared-nothing approach provides further scalability.
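As a rough illustration of why a column-based layout suits filtering and aggregation, the following minimal sketch stores each attribute as its own list and scans only the columns a query names. The table `orders`, its column names, and the helper `sum_by` are hypothetical and chosen only for illustration; this is not the system described in the paper, and it omits compression and parallel execution entirely.

```python
from collections import defaultdict

# Hypothetical column-oriented table: each attribute is stored as its own list,
# so a query reads only the columns it needs (assumed data, for illustration).
orders = {
    "customer": ["acme", "bolt", "acme", "core", "bolt"],
    "region":   ["EU",   "US",   "EU",   "EU",   "US"],
    "amount":   [120.0,  80.0,   50.0,   200.0,  40.0],
}


def sum_by(table, group_col, value_col, filter_col=None, filter_value=None):
    """Sum value_col per distinct group_col value, optionally filtering rows.

    Only the named columns are touched; all other columns are never read.
    """
    groups = table[group_col]
    values = table[value_col]
    flags = table[filter_col] if filter_col is not None else None
    totals = defaultdict(float)
    for i in range(len(values)):
        # Skip rows that do not pass the (optional) equality filter.
        if flags is not None and flags[i] != filter_value:
            continue
        totals[groups[i]] += values[i]
    return dict(totals)


# Total order amount per customer, restricted to the EU region.
eu_totals = sum_by(orders, "customer", "amount",
                   filter_col="region", filter_value="EU")
```

Because each column is an independent array, a scan like this could in principle be split into ranges and handed to separate cores, which is the property the abstract points to.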

404 citations

Patent•

Relational database management system having integrated non-relational multi-dimensional data store of aggregated data elements

Reuven Bakalash, Guy Shaked, Joseph Caspi

31 Mar 2009

TL;DR: An improved method of and apparatus for joining and aggregating data elements integrated within a relational database management system (RDBMS) using a non-relational multi-dimensional data structure (MDD) is presented.

Abstract: Improved method of and apparatus for joining and aggregating data elements integrated within a relational database management system (RDBMS) using a non-relational multi-dimensional data structure (MDD). The improved RDBMS system of the present invention can be used to achieve a significant increase in system performance (e.g., decreased access/search time), user flexibility and ease of use. The improved RDBMS system of the present invention can be used to realize an improved Data Warehouse for supporting on-line analytical processing (OLAP) operations or to realize an improved informational database system or the like.

265 citations

Patent•

Online analytic processing cube with time stamping

Steven Wagner¹•Institutions (1)

Thomson Reuters¹

23 Mar 2009

TL;DR: A system and method are presented for receiving data that include a time stamp and building an Online Analytical Processing (OLAP) cube that includes a dimension, the dimension acting as a schema for the data that include the time stamp.

Abstract: In one example embodiment, a system and method are shown for receiving data that include a time stamp. The system and method also include building an Online Analytical Processing (OLAP) cube that includes a dimension, the dimension acting as a schema for the data that include the time stamp. The system and method may also include populating the OLAP cube with an object, the object including the data and the time stamp as at least one attribute. The system and method may also include storing the OLAP cube.

112 citations

Proceedings Article•

Topic Cube: Topic Modeling for OLAP on Multidimensional Text Databases.

Duo Zhang¹, ChengXiang Zhai¹, Jiawei Han¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

01 Dec 2009

TL;DR: A new data model called topic cube is proposed to combine OLAP with probabilistic topic modeling and enable OLAP on the dimension of text data in a multidimensional text database and a heuristic method to speed up the iterative EM algorithm for estimating topic models is proposed.

Abstract: As the amount of textual information grows explosively in various kinds of business systems, it becomes more and more desirable to analyze both structured data records and unstructured text data simultaneously. While online analytical processing (OLAP) techniques have been proven very useful for analyzing and mining structured data, they face challenges in handling text data. On the other hand, probabilistic topic models are among the most effective approaches to latent topic analysis and mining on text data. In this paper, we propose a new data model called topic cube to combine OLAP with probabilistic topic modeling and enable OLAP on the dimension of text data in a multidimensional text database. Topic cube extends the traditional data cube to cope with a topic hierarchy and store probabilistic content measures of text documents learned through a probabilistic topic model. To materialize topic cubes efficiently, we propose a heuristic method to speed up the iterative EM algorithm for estimating topic models by leveraging the models learned on component data cells to choose a good starting point for iteration. Experiment results show that this heuristic method is much faster than the baseline method of computing each topic cube from scratch. We also discuss potential uses of topic cube and show sample experimental results.

108 citations

Proceedings Article•DOI•

Query recommendations for OLAP discovery driven analysis

Arnaud Giacometti¹, Patrick Marcel¹, Elsa Negre¹, Arnaud Soulet¹•Institutions (1)

François Rabelais University¹

06 Nov 2009

TL;DR: This paper presents a framework for a recommender system for OLAP users, that leverages former users' investigations to enhance discovery driven analysis.

Abstract: Recommending database queries is an emerging and promising field of investigation. This is of particular interest in the domain of OLAP systems where the user is left with the tedious process of navigating large datacubes. In this paper we present a framework for a recommender system for OLAP users, that leverages former users' investigations to enhance discovery driven analysis. The main idea is to recommend to the user the discoveries detected in those former sessions that investigated the same unexpected data as the current session.

96 citations

Book Chapter•DOI•

Mining Heterogeneous Information Networks by Exploring the Power of Links

Jiawei Han¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

07 Oct 2009

TL;DR: This work explores the power of links in mining heterogeneous information networks through several tasks, including link-based object distinction, veracity analysis, multidimensional online analytical processing of heterogeneous information networks, and rank-based clustering.

Abstract: Knowledge is power, but for interrelated data, knowledge is often hidden in massive links in heterogeneous information networks. We explore the power of links in mining heterogeneous information networks with several interesting tasks, including link-based object distinction, veracity analysis, multidimensional online analytical processing of heterogeneous information networks, and rank-based clustering. Some recent results of our research that explore the crucial information hidden in links will be introduced, including (1) Distinct for object distinction analysis, (2) TruthFinder for veracity analysis, (3) Infonet-OLAP for online analytical processing of information networks, and (4) RankClus for integrated ranking-based clustering. We also discuss some of our on-going studies in this direction.

83 citations

Journal Article•DOI•

Graph OLAP: a multi-dimensional framework for graph data analysis

Chen Chen¹, Xifeng Yan², Feida Zhu¹, Jiawei Han¹, Philip S. Yu³•Institutions (3)

University of Illinois at Urbana–Champaign¹, University of California, Santa Barbara², University of Illinois at Chicago³

01 Oct 2009-Knowledge and Information Systems

TL;DR: It is argued that OLAP on graph-structured data is critically important, and a novel Graph OLAP framework is proposed, together with a discovery-driven multi-dimensional analysis model that ensures OLAP is performed in an intelligent manner, guided by expert rules and knowledge discovery processes.

Abstract: Databases and data warehouse systems have been evolving from handling normalized spreadsheets stored in relational databases, to managing and analyzing diverse application-oriented data with complex interconnecting structures. Responding to this emerging trend, graphs have been growing rapidly and showing their critical importance in many applications, such as the analysis of XML, social networks, Web, biological data, multimedia data and spatiotemporal data. Can we extend useful functions of databases and data warehouse systems to handle graph-structured data? In particular, OLAP (On-Line Analytical Processing) has been a popular tool for fast and user-friendly multi-dimensional analysis of data warehouses. Can we OLAP graphs? Unfortunately, to the best of our knowledge, there are no OLAP tools available that can interactively view and analyze graph data from different perspectives and with multiple granularities. In this paper, we argue that it is critically important to OLAP graph-structured data and propose a novel Graph OLAP framework. According to this framework, given a graph dataset with its nodes and edges associated with respective attributes, a multi-dimensional model can be built to enable efficient on-line analytical processing so that any portion of the graphs can be generalized/specialized dynamically, offering multiple, versatile views of the data. The contributions of this work are three-fold. First, starting from basic definitions, i.e., what are dimensions and measures in the Graph OLAP scenario, we develop a conceptual framework for data cubes on graphs. We also look into different semantics of OLAP operations, and classify the framework into two major subcases: informational OLAP and topological OLAP. Second, we show how a graph cube can be materialized by calculating a special kind of measure called aggregated graph and how to implement it efficiently. This includes both full materialization and partial materialization where constraints are enforced to obtain an iceberg cube. As we can see, due to the increased structural complexity of data, aggregated graphs that depend on the underlying “network” properties of the graph dataset are much harder to compute than their traditional OLAP counterparts. Third, to provide more flexible, interesting and informative OLAP of graphs, we further propose a discovery-driven multi-dimensional analysis model to ensure that OLAP is performed in an intelligent manner, guided by expert rules and knowledge discovery processes. We outline such a framework and discuss some challenging research issues for discovery-driven Graph OLAP.

83 citations

Book Chapter•DOI•

Recommending Multidimensional Queries

Arnaud Giacometti¹, Patrick Marcel¹, Elsa Negre¹•Institutions (1)

François Rabelais University¹

30 Aug 2009

TL;DR: This paper proposes to apply a Collaborative Work approach that leverages former explorations of the cube to recommend OLAP queries, and adapts Approximate String Matching, a technique popular in Information Retrieval, to match the current analysis with the former explorations and help suggest a query to the user.

Abstract: Interactive analysis of a datacube, in which a user navigates a cube by launching a sequence of queries, is often tedious since the user may have no idea of what the forthcoming query should be in his current analysis. To better support this process we propose in this paper to apply a Collaborative Work approach that leverages former explorations of the cube to recommend OLAP queries. The system that we have developed adapts Approximate String Matching, a technique popular in Information Retrieval, to match the current analysis with the former explorations and help suggest a query to the user. Our approach has been implemented with the open source Mondrian OLAP server to recommend MDX queries and we have carried out some preliminary experiments that show its efficiency for generating effective query recommendations.
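The idea of matching the current analysis against former explorations can be sketched, under strong simplifying assumptions, by treating each session as a sequence of query labels and comparing sequences with Levenshtein edit distance. The functions `edit_distance` and `recommend_next` and the query labels below are hypothetical; the sketch is not the authors' algorithm and does not involve Mondrian or MDX.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (here, of query labels)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def recommend_next(current, former_sessions):
    """Suggest the query that followed the former prefix closest to `current`.

    Every proper prefix of every former session is compared with the current
    session; the query right after the best-matching prefix is returned.
    Returns None when no former session has at least two queries.
    """
    best = None  # (distance, recommended query)
    for session in former_sessions:
        for cut in range(1, len(session)):
            d = edit_distance(current, session[:cut])
            if best is None or d < best[0]:
                best = (d, session[cut])
    return None if best is None else best[1]


# Hypothetical former sessions, each a sequence of query labels.
former = [
    ["sales_by_year", "sales_by_quarter", "sales_by_month"],
    ["sales_by_year", "sales_by_region", "sales_by_store"],
]
suggestion = recommend_next(["sales_by_year", "sales_by_region"], former)
```

Here the current session matches the first two queries of the second former session exactly, so the sketch suggests the query that came next in that session. Comparing whole query labels for equality is the crudest possible choice; a real system would need a similarity measure between queries themselves.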

74 citations

Patent•

System with a data aggregation module generating aggregated data for responding to OLAP analysis queries in a user transparent manner

Reuven Bakalash, Guy Shaked, Joseph Caspi

22 Oct 2009

TL;DR: A system for supporting OLAP analysis over a network is described, comprising an OLAP server and a data aggregation module with a multi-dimensional datastore, an aggregation engine integrated with the multi-dimensional datastore, and an interface for loading base data from a data source to the aggregation engine.

Abstract: A system for supporting OLAP analysis over a network. The system comprises an OLAP server for enabling an OLAP user to perform OLAP analysis via interaction with a client machine on the network. The system also includes a data aggregation module comprising a multi-dimensional datastore, an aggregation engine integrated with the multi-dimensional datastore, and a first interface for loading base data from a data source to the aggregation engine. The aggregation engine performs data aggregation operations on loaded base data, generates aggregated data from the base data, and stores the aggregated data in the multi-dimensional datastore. A second interface receives requests for OLAP analysis from the OLAP server, accesses the aggregation engine to retrieve from the multi-dimensional datastore the aggregated data corresponding to the requests, and communicates the retrieved aggregated data to the OLAP server for query servicing, in a manner transparent to the OLAP user.
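The load, aggregate, store, and retrieve flow described above can be sketched as a small in-memory store that pre-computes sums for every combination of dimensions at load time and answers later requests by lookup. The class `AggregationStore`, its methods, and the sample rows are hypothetical and chosen only for illustration; this is not the patented design.

```python
from collections import defaultdict
from itertools import combinations


class AggregationStore:
    """Toy multi-dimensional store holding pre-aggregated sums (illustrative)."""

    def __init__(self, dimensions, measure):
        self.dimensions = tuple(dimensions)
        self.measure = measure
        # Maps (dimension subset, value tuple) -> aggregated sum.
        self.cells = defaultdict(float)

    def load(self, rows):
        """Aggregate base rows into every group-by over the dimensions."""
        for row in rows:
            for k in range(len(self.dimensions) + 1):
                for dims in combinations(self.dimensions, k):
                    key = (dims, tuple(row[d] for d in dims))
                    self.cells[key] += row[self.measure]

    def query(self, **fixed):
        """Return the stored sum for the given dimension values (0.0 if absent)."""
        dims = tuple(d for d in self.dimensions if d in fixed)
        return self.cells.get((dims, tuple(fixed[d] for d in dims)), 0.0)


store = AggregationStore(["region", "year"], "sales")
store.load([
    {"region": "EU", "year": 2008, "sales": 10.0},
    {"region": "EU", "year": 2009, "sales": 5.0},
    {"region": "US", "year": 2009, "sales": 7.0},
])
eu_total = store.query(region="EU")   # aggregated over all years
grand_total = store.query()           # aggregated over all dimensions
```

Materializing every group-by grows exponentially with the number of dimensions, so this full materialization is only reasonable for a handful of dimensions.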

68 citations

Book Chapter•DOI•

An ETL process for OLAP using RDF/OWL ontologies

Marko Niinimäki¹, Tapio Niemi¹•Institutions (1)

Helsinki Institute of Physics¹

01 Jan 2009-Journal on Data Semantics

TL;DR: An advanced method for on-demand construction of OLAP cubes for ROLAP systems is presented; it covers the steps from cube design to ETL but focuses on ETL. An ontology-based tool is also proposed as a user interface to the system, from design to actual analysis.

Abstract: In this paper, we present an advanced method for on-demand construction of OLAP cubes for ROLAP systems. The method contains the steps from cube design to ETL but focuses on ETL. Actual data analysis can then be done using the tools and methods of the OLAP software at hand. The method is based on RDF/OWL ontologies and design tools. The ontology serves as a basis for designing and creating the OLAP schema, its corresponding database tables, and finally populating the database. Our starting point is heterogeneous and distributed data sources that are eventually used to populate the OLAP cubes. Mapping between the source data and its OLAP form is done by converting the data first to RDF using ontology maps. Then the data are extracted from their RDF form by queries that are generated using the ontology of the OLAP schema. Finally, the extracted data are stored in the database tables and analysed using OLAP software. Algorithms and examples are provided for all these steps. In our tests, we have used an open source OLAP implementation and a database server. The performance of the system is found satisfactory when testing with a data source of 450 000 RDF statements. We also propose an ontology-based tool that will work as a user interface to the system, from design to actual analysis.

Journal Article•DOI•

Enabling OLAP in mobile environments via intelligent data cube compression techniques

Alfredo Cuzzocrea¹, Filippo Furfaro¹, Domenico Saccà¹•Institutions (1)

University of Calabria¹

01 Oct 2009

TL;DR: This paper introduces a very effective compression technique for multidimensional data cubes, and the system Hand-OLAP, which exploits this technique to allow handheld devices to extract and browse compressed two-dimensional OLAP views coming from multidimensional data cubes stored on a remote OLAP server located on the wired network.

Abstract: The main drawbacks of handheld devices (small storage space, small size of the display screen, discontinuance of the connection to the WLAN, etc.) are often incompatible with the need to query and browse information extracted from enormous amounts of data accessible through the network. In this application scenario, data compression and summarization have a leading role: data in a lossy compressed format can be transmitted more efficiently than the original data, and can be effectively stored on handheld devices (setting the compression ratio accordingly). In this paper, we introduce a very effective compression technique for multidimensional data cubes, and the system Hand-OLAP, which exploits this technique to allow handheld devices to extract and browse compressed two-dimensional OLAP views coming from multidimensional data cubes stored on a remote OLAP server located on the wired network. Hand-OLAP effectively and efficiently enables OLAP in mobile environments, and also extends the potential of Decision Support Systems by taking advantage of the "naturally" decentralized nature of such environments. The idea on which the system is based is that, rather than querying the original multidimensional data cubes, it may be more convenient to generate a compressed OLAP view of them, store this view on the handheld device, and query it locally (off-line), thus obtaining approximate answers that are suitable for OLAP applications.

Journal Article•DOI•

Spatial aggregation: Data model and implementation

Leticia I. Gómez¹, Sofie Haesevoets, Bart Kuijpers², Alejandro A. Vaisman³•Institutions (3)

Instituto Tecnológico de Buenos Aires¹, University of Hasselt², University of Buenos Aires³

01 Sep 2009-Information Systems

TL;DR: This work defines the notion of geometric aggregation, a general framework for aggregate queries in a GIS setting, and presents an implementation, denoted Piet, which supports four kinds of queries: standard GIS, standard OLAP, geometric aggregation and integrated GIS-OLAP queries.

Book Chapter•DOI•

Preference-Based Recommendations for OLAP Analysis

Houssem Jerbi, Franck Ravat, Olivier Teste, Gilles Zurfluh

30 Aug 2009

TL;DR: This paper presents a framework for integrating OLAP and recommendations, and focuses on the anticipatory recommendation process that assists the user during his OLAP analysis by proposing to him the forthcoming analysis step.

Abstract: This paper presents a framework for integrating OLAP and recommendations. We focus on the anticipatory recommendation process that assists the user during his OLAP analysis by proposing to him the forthcoming analysis step. We present a context-aware preference model that matches decision-makers' intuition, and we discuss a preference-based approach for generating personalized recommendations.

Book•

Data Mining Methods

Rajan Chattamvelli

30 Jan 2009

TL;DR: This book covers data mining methods, including data visualisation techniques, data warehousing and OLAP, decision trees, association rules, cluster analysis, genetic algorithms, neural networks, web mining, support vector machines, and latent semantic indexing.

Abstract: Preface / List of Figures / List of Tables / Basic Concepts in Data Mining / Data Visualisation Techniques / Probability and Statistics / Datawarehousing and OLAP / Decision Trees / Association Rules / Cluster Analysis / Genetic Algorithms / Neural Networks / Web Mining / Support Vector Machines / Latent Semantic Indexing / Appendix-A / Solution to Selected Exercises / Index.

Journal Article•DOI•

Using contextual information and multidimensional approach for recommendation

Sung-Shun Weng¹, Binshan Lin², Wen-Tien Chen¹•Institutions (2)

Fu Jen Catholic University¹, Louisiana State University in Shreveport²

01 Mar 2009-Expert Systems With Applications

TL;DR: This study proposes integrated contextual information as the foundational concept of a multidimensional recommendation model, and uses the online analytical processing (OLAP) capability of data warehousing to resolve contradictions among hierarchy ratings.

Abstract: It has been recognized that recommendation systems are a very important and indispensable topic in E-commerce. Many famous E-commerce websites utilize recommendation systems to convert browsers into buyers. The forms of recommendation include suggesting products/services to the customer, providing personalized product/service information, summarizing community opinion, and providing community critiques. Personalized recommendation methods are mainly classified into content-based and collaborative filtering approaches. Both recommendation approaches, however, have their own drawbacks. This study proposes integrated contextual information as the foundational concept of a multidimensional recommendation model, and uses the online analytical processing (OLAP) capability of data warehousing to resolve contradictions among hierarchy ratings. The evaluation studies show that by establishing additional customer profiles and using multidimensional analyses to find the key factors affecting customer perceptions, the proposed approach increases the recommendation quality.

Book Chapter•DOI•

Expressing OLAP Preferences

Matteo Golfarelli¹, Stefano Rizzi¹•Institutions (1)

University of Bologna¹

02 Jun 2009

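TL;DR: A preference algebra for OLAP is proposed that takes into account three peculiarities of the OLAP domain: preferences on both numerical and categorical domains, preferences on the aggregation level of facts, and a preference space that includes both elemental and aggregated facts.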

Abstract: Multidimensional databases play a relevant role in statistical and scientific applications, as well as in business intelligence systems. Their users express complex OLAP queries, often returning huge volumes of facts, sometimes providing little or no information. Thus, expressing preferences could be highly valuable in this domain. The OLAP domain is representative of an unexplored class of preference queries, characterized by three peculiarities: preferences can be expressed on both numerical and categorical domains; they can also be expressed on the aggregation level of facts; the space on which preferences are expressed includes both elemental and aggregated facts. In this paper we propose a preference algebra for OLAP, that takes into account the three peculiarities above.

Journal Article•DOI•

Parallel OLAP query processing in database clusters with data replication

Alexandre A. B. Lima, Camille Furtado¹, Patrick Valduriez², Marta Mattoso¹•Institutions (2)

Federal University of Rio de Janeiro¹, French Institute for Research in Computer Science and Automation²

01 Apr 2009-Distributed and Parallel Databases

TL;DR: More efficient distributed database design alternatives that combine physical/virtual partitioning with partial replication are proposed, along with a new load balancing strategy that takes advantage of adaptive virtual partitioning to redistribute the load to the replicas.

Abstract: We consider the problem of improving the performance of OLAP applications in a database cluster (DBC), which is a low cost and effective parallel solution for query processing. Current DBC solutions for OLAP query processing provide for intra-query parallelism only, at the cost of full replication of the database. In this paper, we propose more efficient distributed database design alternatives which combine physical/virtual partitioning with partial replication. We also propose a new load balancing strategy that takes advantage of an adaptive virtual partitioning to redistribute the load to the replicas. Our experimental validation is based on the implementation of our solution on the SmaQSS DBC middleware prototype. Our experimental results using the TPC-H benchmark and a 32-node cluster show very good speedup.

Book Chapter•DOI•

Privacy Preserving OLAP and OLAP Security

Alfredo Cuzzocrea¹, Vincenzo Russo¹•Institutions (1)

University of Calabria¹

01 Jan 2009

Journal Article•DOI•

A Survey of Open Source Tools for Business Intelligence

Christian Thomsen¹, Torben Bach Pedersen¹•Institutions (1)

Aalborg University¹

01 Jul 2009-International Journal of Data Warehousing and Mining

TL;DR: In this paper, the authors consider the capabilities of a number of open source tools for BI, including Extract-Transform-Load (ETL) tools, database management systems (DBMSs), On-Line Analytical Processing (OLAP) servers, and OLAP clients.

Abstract: The industrial use of open source Business Intelligence (BI) tools is becoming more common, but is still not as widespread as for other types of software. It is therefore of interest to explore which possibilities are available for open source BI and compare the tools. In this survey article, we consider the capabilities of a number of open source tools for BI. In the article, we consider a number of Extract-Transform-Load (ETL) tools, database management systems (DBMSs), On-Line Analytical Processing (OLAP) servers, and OLAP clients. We find that, unlike the situation a few years ago, there now exist mature and powerful tools in all these categories. However, the functionality still falls somewhat short of that found in commercial tools.

Proceedings Article•

Interactive Analysis of Web-Scale Data.

Christopher Olston¹, Edward Bortnikov², Khaled Elmeleegy³, Flavio Junqueira¹, Benjamin Reed¹•Institutions (3)

Yahoo!¹, Technion – Israel Institute of Technology², Rice University³

01 Jan 2009

TL;DR: The aim is to build a general two-phase query system that supports interactive querying over web-scale data: a query template is supplied first, and specific instantiations of the template are supplied later.

Abstract: We consider how to support interactive querying over web-scale data. The basic approach is to view querying as a two-phase activity: first supply a query template, and later supply specific instantiations of the template. Interactive responsiveness is offered in the second phase only. While instances of this problem have been studied in the past, e.g., OLAP and web search, we pursue a more general formulation. Our aim is to build a general two-phase query system.

Book Chapter•DOI•

CAMS: OLAPing Multidimensional Data Streams Efficiently

Alfredo Cuzzocrea¹•Institutions (1)

University of Calabria¹

30 Aug 2009

TL;DR: Both analytical and experimental results clearly connote CAMS as an enabling component for next-generation Data Stream Management Systems.

Abstract: In the context of data stream research, taming the multidimensionality of real-life data streams in order to efficiently support OLAP analysis/mining tasks is a critical challenge. Inspired by this fundamental motivation, in this paper we introduce CAMS (Cube-based Acquisition model for Multidimensional Streams), a model for efficiently OLAPing multidimensional data streams. CAMS combines a set of data stream processing methodologies, namely (i) the OLAP dimension flattening process, which allows us to obtain dimensionality reduction of multidimensional data streams, and (ii) the OLAP stream aggregation scheme, which aggregates data stream readings according to an OLAP-hierarchy-based membership approach. We complete our analytical contribution by means of experimental assessment and analysis of both the efficiency and the scalability of OLAPing capabilities of CAMS on synthetic multidimensional data streams. Both analytical and experimental results clearly connote CAMS as an enabling component for next-generation Data Stream Management Systems.

Journal Article•DOI•

Fragmenting very large XML data warehouses via K-means clustering algorithm

Alfredo Cuzzocrea¹, Jérôme Darmont², Hadj Mahboubi²•Institutions (2)

University of Calabria¹, University of Lyon²

01 Nov 2009-International Journal of Business Intelligence and Data Mining

TL;DR: This paper proposes the use of the K-means clustering algorithm for effectively and efficiently supporting the fragmentation of very large XML data warehouses and complements the analytical contribution with a comprehensive experimental assessment where the efficiency of the proposal is compared against existing fragmentation algorithms.

Abstract: XML data sources are gaining popularity in the context of Business Intelligence and On-Line Analytical Processing (OLAP) applications, due to the amenities of XML in representing and managing complex and heterogeneous data. However, XML-native database systems currently suffer from limited performance, both in terms of volumes of manageable data and query response time. Therefore, recent research efforts are focusing on horizontal fragmentation techniques, which are able to overcome the above limitations. However, classical fragmentation algorithms are not suitable to control the number of originated fragments, which instead plays a critical role in data warehouses. In this paper, we propose the use of the K-means clustering algorithm for effectively and efficiently supporting the fragmentation of very large XML data warehouses. We complement our analytical contribution with a comprehensive experimental assessment where we compare the efficiency of our proposal against existing fragmentation algorithms.
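
As a rough illustration of why K-means gives control over the number of fragments, the sketch below clusters feature vectors into exactly `k` groups using a plain Lloyd-style loop. It is a simplified stand-in, not the authors' procedure: the encoding of each fact as a 0/1 vector of satisfied workload predicates, the function names, and the sample data are assumptions made for this example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_fragments(vectors, k, max_iter=100):
    """Assign each vector to one of k fragments; returns a list of labels."""
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    # Deterministic initialisation (assumption): the first k vectors.
    centroids = [list(map(float, v)) for v in vectors[:k]]
    labels = [-1] * len(vectors)
    for _ in range(max_iter):
        # Assignment step: nearest centroid, ties go to the lower index.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the loop has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical encoding: one row per fact, one 0/1 column per query predicate.
facts = [
    (1, 1, 0, 0), (0, 0, 1, 1), (1, 1, 0, 0),
    (0, 0, 1, 1), (1, 0, 0, 0), (0, 0, 0, 1),
]
labels = kmeans_fragments(facts, k=2)
```

Because `k` is an input, the number of resulting fragments is fixed in advance, which is the property the abstract says classical fragmentation algorithms lack.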

Journal Issue•DOI•

Topic modeling for OLAP on multidimensional text databases: topic cube and its applications

Duo Zhang¹, ChengXiang Zhai¹, Jiawei Han¹, Ashok N. Srivastava², Nikunj C. Oza²•Institutions (2)

University of Illinois at Urbana–Champaign¹, Ames Research Center²

01 Dec 2009-Statistical Analysis and Data Mining

TL;DR: A new data model called topic cube is studied to combine OLAP with probabilistic topic modeling and enable OLAP on the dimension of text data in a multidimensional text database, and two heuristic aggregations are proposed to speed up the iterative Expectation-Maximization (EM) algorithm for estimating topic models.

Abstract: As the amount of textual information grows explosively in various kinds of business systems, it becomes more and more desirable to analyze both structured data records and unstructured text data simultaneously. Although online analytical processing (OLAP) techniques have been proven very useful for analyzing and mining structured data, they face challenges in handling text data. On the other hand, probabilistic topic models are among the most effective approaches to latent topic analysis and mining on text data. In this paper, we study a new data model called topic cube to combine OLAP with probabilistic topic modeling and enable OLAP on the dimension of text data in a multidimensional text database. Topic cube extends the traditional data cube to cope with a topic hierarchy and stores probabilistic content measures of text documents learned through a probabilistic topic model. To materialize topic cubes efficiently, we propose two heuristic aggregations to speed up the iterative Expectation-Maximization (EM) algorithm for estimating topic models by leveraging the models learned on component data cells to choose a good starting point for iteration. Experimental results show that these heuristic aggregations are much faster than the baseline method of computing each topic cube from scratch. We also discuss some potential uses of topic cube and show sample experimental results. Copyright © 2009 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 2: 378-395, 2009

Journal Article•DOI•

Compression and Aggregation for Logistic Regression Analysis in Data Cubes

Ruibin Xi¹, Nan Lin¹, Yixin Chen¹•Institutions (1)

Washington University in St. Louis¹

01 Apr 2009-IEEE Transactions on Knowledge and Data Engineering

TL;DR: A novel scheme to compress the data in such a way that it can reconstruct logistic regression models to answer any OLAP query without accessing the raw data, and it is proved that the compression is nearly lossless in the sense that the aggregated estimator deviates from the true model by an error that is bounded and approaches zero as the data size increases.

Abstract: Logistic regression is an important technique for analyzing and predicting data with categorical attributes. In this paper, we consider supporting online analytical processing (OLAP) of logistic regression analysis for multi-dimensional data in a data cube where it is expensive in time and space to build logistic regression models for each cell from the raw data. We propose a novel scheme to compress the data in such a way that we can reconstruct logistic regression models to answer any OLAP query without accessing the raw data. Based on a first-order approximation to the maximum likelihood estimating equations, we develop a compression scheme that compresses each base cell into a small compressed data block with essential information to support the aggregation of logistic regression models. Aggregation formulae for deriving high-level logistic regression models from lower level component cells are given. We prove that the compression is nearly lossless in the sense that the aggregated estimator deviates from the true model by an error that is bounded and approaches zero as the data size increases. The results show that the proposed compression and aggregation scheme can make OLAP of logistic regression in a data cube feasible. Further, it supports real-time logistic regression analysis of stream data, which can only be scanned once and cannot be permanently retained. Experimental results validate our theoretical analysis and demonstrate that our method can dramatically save time and space costs with almost no degradation of the modeling accuracy.

Journal Article•DOI•

Easier surveillance of climate-related health vulnerabilities through a Web-based spatial OLAP application

Eveline Bernier¹, Pierre Gosselin¹, Thierry Badard¹, Yvan Bédard¹•Institutions (1)

Laval University¹

03 Apr 2009-International Journal of Health Geographics

TL;DR: A spatio-temporal web-based application that goes beyond GIS applications with regard to speed, ease of use, and interactive analysis capabilities, and supports the multi-scale exploration and analysis of integrated socio-economic, health and environmental geospatial data over several periods.

Abstract: Climate change has a significant impact on population health. Population vulnerabilities depend on several determinants of different types, including biological, psychological, environmental, social and economic ones. Surveillance of climate-related health vulnerabilities must take into account these different factors, their interdependence, as well as their inherent spatial and temporal aspects on several scales, for informed analyses. Currently used technology includes commercial off-the-shelf Geographic Information Systems (GIS) and Database Management Systems with spatial extensions. It has been widely recognized that such OLTP (On-Line Transaction Processing) systems were not designed to support complex, multi-temporal and multi-scale analysis as required above. On-Line Analytical Processing (OLAP) is central to the field known as BI (Business Intelligence), a key field for such decision-support systems. In the last few years, we have seen a few projects that combine OLAP and GIS to improve spatio-temporal analysis and geographic knowledge discovery. This has given rise to SOLAP (Spatial OLAP) and a new research area. This paper presents how SOLAP and climate-related health vulnerability data were investigated and combined to facilitate surveillance. Based on recent spatial decision-support technologies, this paper presents a spatio-temporal web-based application that goes beyond GIS applications with regard to speed, ease of use, and interactive analysis capabilities. It supports the multi-scale exploration and analysis of integrated socio-economic, health and environmental geospatial data over several periods. This project was meant to validate the potential of recent technologies to contribute to a better understanding of the interactions between public health and climate change, and to facilitate future decision-making by public health agencies and municipalities in Canada and elsewhere. 
The project also aimed at integrating an initial collection of geo-referenced multi-scale indicators that were identified by Canadian specialists and end-users as relevant for the surveillance of the public health impacts of climate change. This system was developed in a multidisciplinary context involving researchers, policy makers and practitioners, using BI and web-mapping concepts (more particularly SOLAP technologies), while exploring new solutions for frequent automatic updating of data and for providing contextual warnings for users (to minimize the risk of data misinterpretation). According to the project participants, the final system succeeds in facilitating surveillance activities in a way not achievable with today's GIS. Regarding the experiments on frequent automatic updating and contextual user warnings, the results obtained indicate that these are meaningful and achievable goals but they still require research and development for their successful implementation in the context of surveillance and multiple organizations. Surveillance of climate-related health vulnerabilities may be more efficiently supported using a combination of BI and GIS concepts, and more specifically, SOLAP technologies (in that it facilitates and accelerates multi-scale spatial and temporal analysis to a point where a user can maintain an uninterrupted train of thought by focussing on "what" she/he wants (not on "how" to get it) and always obtain instant answers, including to the most complex queries that take minutes or hours with OLTP systems (e.g., aggregated, temporal, comparative)). The developed system respects Newell's cognitive band of 10 seconds when performing knowledge discovery (exploring data, looking for hypotheses, validating models). 
The developed system provides new operators for easily and rapidly exploring multidimensional data at different levels of granularity, for different regions and epochs, and for visualizing the results in synchronized maps, tables and charts. It is naturally adapted to deal with multiscale indicators such as those used in the surveillance community, as confirmed by this project's end-users.

Book Chapter•DOI•

A Conceptual Modeling Approach for OLAP Personalization

Irene Garrigós¹, Jesús Pardillo¹, Jose-Norberto Mazón¹, Juan Trujillo¹•Institutions (1)

University of Alicante¹

10 Nov 2009

TL;DR: A novel approach to personalizing OLAP systems at the conceptual level based on the underlying multidimensional model of the data warehouse, a user model and a set of personalization rules is presented.

Abstract: Data warehouses rely on multidimensional models in order to provide decision makers with appropriate structures to intuitively analyze data with OLAP technologies. However, data warehouses may be potentially large, and multidimensional structures become increasingly complex to be understood at a glance. Even if a departmental data warehouse (also known as a data mart) is used, these structures would also be too complex. As a consequence, acquiring the required information is more costly than expected, and decision makers using OLAP tools may get frustrated. In this context, current approaches for data warehouse design are focused on deriving a unique OLAP schema for all analysts from their previously stated information requirements, which is not enough to lighten the complexity of the decision making process. To overcome this drawback, we argue for personalizing multidimensional models for OLAP technologies according to the continuously changing user characteristics, context, requirements and behaviour. In this paper, we present a novel approach to personalizing OLAP systems at the conceptual level based on the underlying multidimensional model of the data warehouse, a user model and a set of personalization rules. The great advantage of our approach is that a personalized OLAP schema is provided for each decision maker, contributing to better satisfy their specific analysis needs. Finally, we show the applicability of our approach through a sample scenario based on our CASE tool for data warehouse development.

Book•

New Trends In Data Warehousing And Data Analysis

Stanislaw Kozielski, Robert Wrembel

01 Jan 2009

TL;DR: The objective of New Trends in Data Warehousing and Data Analysis is to bring together the most recent research and practical achievements in DW and OLAP technologies, and to open and discuss new, emerging areas of further development.

Abstract: Most modern enterprises, institutions, and organizations rely on knowledge-based management systems. In these systems, knowledge is gained from data analysis. Nowadays, knowledge-based management systems include data warehouses as their core components. The purpose of building a data warehouse is twofold. Firstly, to integrate multiple heterogeneous, autonomous, and distributed data sources within an enterprise. Secondly, to provide a platform for advanced, complex, and efficient data analysis. Data integrated in a data warehouse are analyzed by so-called On-Line Analytical Processing (OLAP) applications designed, among other things, for discovering trends, patterns of behavior, and anomalies as well as for finding dependencies between data. Massive amounts of integrated data, and the complexity of integrated data that increasingly come from Web-based, XML-based, spatio-temporal, object, and multimedia systems, make data integration and processing challenging. The objective of NEW TRENDS IN DATA WAREHOUSING AND DATA ANALYSIS is fourfold: First, to bring together the most recent research and practical achievements in the DW and OLAP technologies. Second, to open and discuss new, just emerging areas of further development. Third, to provide an up-to-date bibliography of published works and a resource of research achievements for anyone interested in up-to-date data warehouse issues. And, finally, to assist in the dissemination of knowledge in the field of advanced DW and OLAP.

Proceedings Article•DOI•

A multidimensional model representing continuous fields in spatial data warehouses

Alejandro A. Vaisman¹, Esteban Zimányi²•Institutions (2)

University of Buenos Aires¹, Université libre de Bruxelles²

04 Nov 2009

TL;DR: This paper extends a conceptual multidimensional model with continuous fields, showing that this can be achieved by defining an appropriate data type that encapsulates the different operations needed for manipulating such fields, and defines a query language based on relational calculus that allows expressing spatial OLAP queries involving continuous fields.

Abstract: Data warehouses and On-Line Analytical Processing (OLAP) provide an analysis framework supporting the decision making process. In many application domains, complex analysis tasks often require taking geographical information into account. Several proposals exist for integrating OLAP and Geographic Information Systems (GIS). However, there are very few attempts to support continuous fields, i.e., phenomena that are perceived as having a value at each point in space and/or time. Examples of such phenomena include temperature, altitude, or land use. In this paper, we extend a conceptual multidimensional model with continuous fields, showing that this can be achieved by defining an appropriate data type that encapsulates the different operations needed for manipulating such fields. We also define a query language based on relational calculus that allows expressing spatial OLAP queries involving continuous fields, and use this language to formally characterize this class of queries.

Evaluating XML-Extended OLAP Queries Based on Physical Algebra.

Xuepeng Yin¹, Torben Bach Pedersen¹•Institutions (1)

Aalborg University¹

01 Jan 2009

TL;DR: Previous work on the logical federation of OLAP and XML data sources is extended by presenting a simplified query semantics, a physical query algebra and a robust OLAP-XML query engine.

Abstract: In today's OLAP systems, physically integrating fast-changing data (e.g., stock quotes) into a cube is complex and time-consuming. The data is likely to be available in XML format on the World Wide Web (WWW); thus, instead of physical integration, making XML data logically federated with OLAP systems is desirable. In this article, we extend previous work on the logical federation of OLAP and XML data sources by presenting simplified query semantics, a physical query algebra, and a robust OLAP-XML query engine, as well as the query evaluation techniques. Performance experiments with a prototypical implementation suggest that the performance for OLAP-XML federations is comparable to queries on physically integrated data.
