
Showing papers on "Online analytical processing published in 2010"


Proceedings ArticleDOI
06 Jun 2010
TL;DR: The focus of this work is on transaction processing (i.e., read and update workloads), rather than analytics or OLAP workloads, which have recently gained a great deal of attention.
Abstract: Cloud computing promises a number of advantages for the deployment of data-intensive applications. One important promise is reduced cost with a pay-as-you-go business model. Another promise is (virtually) unlimited throughput by adding servers if the workload increases. This paper lists alternative architectures to effect cloud computing for database applications and reports on the results of a comprehensive evaluation of existing commercial cloud services that have adopted these architectures. The focus of this work is on transaction processing (i.e., read and update workloads), rather than analytics or OLAP workloads, which have recently gained a great deal of attention. The results are surprising in several ways. Most importantly, it seems that all major vendors have adopted a different architecture for their cloud services. As a result, the cost and performance of the services vary significantly depending on the workload.

291 citations


Journal ArticleDOI
TL;DR: The CDW platform would be a promising infrastructure to make full use of the TCM clinical data for scientific hypothesis generation, and promote the development of TCM from individualized empirical knowledge to large-scale evidence-based medicine.

210 citations


Journal ArticleDOI
01 Nov 2010
TL;DR: A user-centered approach to support the end-user requirements elicitation and the data warehouse multidimensional design tasks is introduced, based on a reengineering process that derives the multidimensional schema from a conceptual formalization of the domain.
Abstract: The data warehouse design task needs to consider both the end-user requirements and the organization data sources. For this reason, the data warehouse design has been traditionally considered a reengineering process, guided by requirements, from the data sources. Most current design methods available demand highly-expressive end-user requirements as input, in order to carry out the exploration and analysis of the data sources. However, eliciting the end-user information requirements can prove to be an arduous task. Importantly, in the data warehousing context, the analysis capabilities of the target data warehouse depend on what kind of data is available in the data sources. Thus, in those scenarios where the analysis capabilities of the data sources are not (fully) known, it is possible to help the data warehouse designer to identify and elicit unknown analysis capabilities. In this paper, we introduce a user-centered approach to support the end-user requirements elicitation and the data warehouse multidimensional design tasks. Our proposal is based on a reengineering process that derives the multidimensional schema from a conceptual formalization of the domain. It starts by fully analyzing the data sources to identify, without considering requirements yet, the multidimensional knowledge they capture (i.e., data likely to be analyzed from a multidimensional point of view). Next, we propose to exploit this knowledge in order to support the requirements elicitation task. In this way, we are already conciliating requirements with the data sources, and we are able to fully exploit the analysis capabilities of the sources. Once requirements are clear, we automatically create the data warehouse conceptual schema according to the multidimensional knowledge extracted from the sources.

96 citations


01 Jan 2010
TL;DR: In this paper, a query algebra for multidimensional analyses is presented, which supports complex analyses through advanced operators and binary operators, and a graphical language, based on this algebra, is also provided to ease the specification of multidimensional queries.
Abstract: This article deals with multidimensional analyses. Analyzed data are designed according to a conceptual model as a constellation of facts and dimensions, which are composed of multi-hierarchies. This model supports a query algebra defining a minimal core of operators, which produce multidimensional tables for displaying analyzed data. This user-oriented algebra supports complex analyses through advanced operators and binary operators. A graphical language, based on this algebra, is also provided to ease the specification of multidimensional queries. These graphical manipulations are expressed from a constellation schema and they produce multidimensional tables.

91 citations


Journal ArticleDOI
TL;DR: The book offers a principled overview of key implementation techniques that are particularly important to multidimensional databases, including materialized views, bitmap indices, join indices, and star join processing.
Abstract: The present book's subject is multidimensional data models and data modeling concepts as they are applied in real data warehouses. The book aims to present the most important concepts within this subject in a precise and understandable manner. The book's coverage of fundamental concepts includes data cubes and their elements, such as dimensions, facts, and measures and their representation in a relational setting; it includes architecture-related concepts; and it includes the querying of multidimensional databases. The book also covers advanced multidimensional concepts that are considered to be particularly important. This coverage includes advanced dimension-related concepts such as slowly changing dimensions, degenerate and junk dimensions, outriggers, parent-child hierarchies, and unbalanced, non-covering, and non-strict hierarchies. The book offers a principled overview of key implementation techniques that are particularly important to multidimensional databases, including materialized views, bitmap indices, join indices, and star join processing. The book ends with a chapter that presents the literature on which the book is based and offers further readings for those readers who wish to engage in more in-depth study of specific aspects of the book's subject. Table of Contents: Introduction / Fundamental Concepts / Advanced Concepts / Implementation Issues / Further Readings
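To make the cube vocabulary concrete, here is a loose illustration (not taken from the book): a Python/pandas sketch of a tiny fact table with two dimensions and one measure, rolled up with subtotals, which is roughly what a materialized view or star join would precompute in a relational setting. All table and column names are invented.

```python
import pandas as pd

# A tiny fact table: two dimensions (region, product) and one measure (sales).
facts = pd.DataFrame({
    "region":  ["North", "North", "South", "South"],
    "product": ["Bolt",  "Nut",   "Bolt",  "Nut"],
    "sales":   [100,     150,     200,     50],
})

# Aggregate the measure along both dimensions; margins=True adds the
# subtotal and grand-total cells a data cube would also contain.
cube = facts.pivot_table(index="region", columns="product",
                         values="sales", aggfunc="sum", margins=True)
print(cube)
```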

89 citations


Journal ArticleDOI
01 Sep 2010
TL;DR: The most relevant step in the framework is Multidimensional Design by Examples (MDBE), which is a novel method for deriving multidimensional conceptual schemas from relational sources according to end-user requirements, and is a fully automatic approach that handles and analyzes the end-user requirements.
Abstract: It is widely accepted that the conceptual schema of a data warehouse must be structured according to the multidimensional model. Moreover, it has been suggested that the ideal scenario for deriving the multidimensional conceptual schema of the data warehouse would consist of a hybrid approach (i.e., a combination of data-driven and requirement-driven paradigms). Thus, the resulting multidimensional schema would satisfy the end-user requirements and would be conciliated with the data sources. Most current methods follow either a data-driven or requirement-driven paradigm and only a few use a hybrid approach. Furthermore, hybrid methods are unbalanced and do not benefit from all of the advantages brought by each paradigm. In this paper we present our approach for multidimensional design. The most relevant step in our framework is Multidimensional Design by Examples (MDBE), which is a novel method for deriving multidimensional conceptual schemas from relational sources according to end-user requirements. MDBE introduces several advantages over previous approaches, which can be summarized as three main contributions. (i) The MDBE method is a fully automatic approach that handles and analyzes the end-user requirements automatically. (ii) Unlike data-driven methods, we focus on data of interest to the end-user. However, the user may not be aware of all the potential analyses of the data sources and, in contrast to requirement-driven approaches, MDBE can propose new multidimensional knowledge related to concepts already queried by the user. (iii) Finally, MDBE proposes meaningful multidimensional schemas derived from a validation process. Therefore, the proposed schemas are sound and meaningful.

85 citations


Journal Article
TL;DR: In this paper, the authors present a case study in which the evaluation of an investment in on-line analytical processing (OLAP) technology in the company Melamin was made through a qualitative approach.
Abstract: Several arguments can be found in business intelligence literature that the use of business intelligence systems can bring multiple benefits, for example, via faster and easier access to information, savings in information technology (‘IT’) and greater customer satisfaction all the way through to the improved competitiveness of enterprises. Yet, most of these benefits are often very difficult to measure because of their indirect and delayed effects on business success. On top of the difficulties in justifying investments in IT, particularly business intelligence (‘BI’), business executives generally want to know whether the investment is worth the money and if it can be economically justified. In looking for an answer to this question, various methods of evaluating investments can be employed. We can use the classic return on investment (‘ROI’) calculation, cost-benefit analysis, the net present value (‘NPV’) method, the internal rate of return (‘IRR’) and others. However, it often appears in business practice that the use of these methods alone is inappropriate, insufficient or unfeasible for evaluating an investment in business intelligence systems. Therefore, for this purpose, more appropriate methods are those based mainly on a qualitative approach, such as case studies, empirical analyses, user satisfaction analyses, and others that can be employed independently or can help us complete the whole picture in conjunction with the previously mentioned methods. Since there is no universal approach to the evaluation of an investment in information technology and business intelligence, it is necessary to approach each case in a different way based on the specific circumstances and purpose of the evaluation. This paper presents a case study in which the evaluation of an investment in on-line analytical processing (‘OLAP’) technology in the company Melamin was made through a qualitative approach.
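For readers unfamiliar with the classic quantitative methods the paper contrasts with its qualitative approach, here is a minimal Python sketch of NPV and IRR; the cash flows are invented and the code is illustrative only, not part of the case study.

```python
def npv(rate, cashflows):
    """Net present value: cash flows discounted at `rate`; cashflows[0]
    is the (negative) initial investment at t = 0."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

def irr(cashflows, lo=-0.99, hi=10.0, tol=1e-6):
    """Internal rate of return via bisection: the rate at which NPV = 0."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(mid, cashflows) > 0:
            lo = mid  # NPV still positive: the break-even rate is higher
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical BI project: 100k invested, 40k of yearly benefits for 4 years.
flows = [-100_000, 40_000, 40_000, 40_000, 40_000]
print(f"NPV at 10%: {npv(0.10, flows):,.0f}")  # positive, so worth doing
print(f"IRR: {irr(flows):.1%}")
```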

81 citations


Journal ArticleDOI
TL;DR: The authors propose the GeoCube model, which enriches the SOLAP concepts of spatial measure and spatial dimension and takes into account the semantic component of geographic information.
Abstract: Introducing spatial data into multidimensional models leads to the concept of Spatial OLAP (SOLAP). Existing SOLAP models do not completely integrate the semantic component of geographic information (alphanumeric attributes and relationships) or the flexibility of spatial analysis into multidimensional analysis. In this paper, the authors propose the GeoCube model and its associated operators to overcome these limitations. GeoCube enriches the SOLAP concepts of spatial measure and spatial dimension and takes into account the semantic component of geographic information. The authors define geographic measures and dimensions as geographic and/or complex objects belonging to hierarchy schemas. GeoCube's algebra extends SOLAP operators with five new operators, i.e., Classify, Specialize, Permute, OLAP-Buffer and OLAP-Overlay. In addition to classical drill-and-slice OLAP operators, GeoCube provides two operators for navigating the hierarchy of the measures, and two spatial analysis operators that dynamically modify the structure of the geographic hypercube. Finally, to exploit the symmetrical representation of dimensions and measures, GeoCube provides an operator capable of permuting dimension and measure. In this paper, GeoCube is presented using environmental data on the pollution of the Venetian Lagoon.

63 citations


01 Jan 2010
TL;DR: In this paper, the authors describe the development and implementation of an Air Quality Data Mart for Ontario, Canada using an Online Analytical Processing (OLAP) tool and evaluate the functionality of the tool by extracting the data across several dimensions.
Abstract: This thesis describes the development and implementation of an Air Quality Data Mart for Ontario, Canada using an Online Analytical Processing (OLAP) tool. It is followed by a case study which presents comparisons of air quality between the urban and rural areas, peak and non-peak hours, and working days and weekends for various cities in Ontario. The purpose of this study is to develop a user-friendly tool for historical air quality data and evaluate the functionality of the tool by extracting the data across several dimensions. The data for air quality is available on the Ontario Ministry of Environment website for 43 monitoring stations across Ontario. This data is in the form of static Hyper Text Markup Language (HTML) pages which cannot be used for analytical purposes. The air quality data mart was developed using open source OLAP. The database was designed using a multidimensional modeling approach. The OLAP server “Mondrian” was used as the presentation server, whereas the “Openi” client was used as an end-user tool for this study. The different functions available in this data mart are: rollup, drill down, and slice and dice the data across several dimensions such as time, location and pollutant. The most important conclusion of this thesis is the successful implementation of an air quality data mart with the possibility to extract accurate historical air quality data. The data in the form of a data mart provides numerous advantages, where it can be analyzed according to the required analytical perspective for a given city or cities. The only drawback of having data in the form of a data mart is that, if the data is drilled down to the finest precision, i.e., to the hour (depending on the number of dimensions selected), the resulting chart will be very crowded, but the generated report will present a complete overview of the analysis.
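To make the data mart's operations concrete, the following Python/pandas sketch imitates roll-up, drill-down and slice on an air-quality-shaped table; the column names and values are invented, and the thesis itself used Mondrian and the Openi client rather than pandas.

```python
import pandas as pd

# Invented sample shaped like the thesis' dimensions: time, location, pollutant.
df = pd.DataFrame({
    "city":      ["Toronto", "Toronto", "Ottawa", "Ottawa"],
    "hour":      [8, 14, 8, 14],
    "pollutant": ["O3", "NO2", "O3", "NO2"],
    "reading":   [31.0, 18.5, 27.2, 12.9],
})

# Roll-up: average reading per city, across all hours and pollutants.
print(df.groupby("city")["reading"].mean())

# Drill-down: bring the hour level back in.
print(df.groupby(["city", "hour"])["reading"].mean())

# Slice: fix one member of the pollutant dimension.
print(df[df["pollutant"] == "O3"])
```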

60 citations


Book
07 Dec 2010
TL;DR: Business Intelligence Techniques is a compilation of chapters written by experts in the various areas that provide a comprehensive overview of how to exploit accounting data in the business environment.
Abstract: Modern businesses generate huge volumes of accounting data on a daily basis. The recent advancements in information technology have given organizations the ability to capture and store data in an efficient and effective manner. However, there is a widening gap between this data storage and usage of the data. Business intelligence techniques can help an organization obtain and process relevant accounting data quickly and cost efficiently. Such techniques include: query and reporting tools, online analytical processing (OLAP), statistical analysis, text mining, data mining, and visualization. Business Intelligence Techniques is a compilation of chapters written by experts in the various areas. While these chapters stand on their own, taken together they provide a comprehensive overview of how to exploit accounting data in the business environment.

60 citations


Journal ArticleDOI
01 Jul 2010
TL;DR: The compression strategy proposed in ECM-DS lays the foundation for a novel class of intelligent applications over data streams where the knowledge on actual streams is integrated with and correlated to the knowledge related to expired events that are considered critical for the target OLAP analysis scenario.
Abstract: An innovative event-based lossy compression model for effective and efficient OLAP over data streams, called ECM-DS, is presented and experimentally assessed in this paper. The main novelty of our compression approach with respect to traditional data stream compression techniques relies on exploiting the semantics of the reference application scenario in order to drive the compression process by means of the “degree of interestingness” of events occurring in the target stream. This finally improves the quality of retrieved approximate answers to OLAP queries over data streams, and, in turn, the quality of complex knowledge discovery tasks over data streams developed on top of ECM-DS, and implemented via ad-hoc data stream mining algorithms. Overall, the compression strategy we propose in this research lays the foundation for a novel class of intelligent applications over data streams where the knowledge on actual streams is integrated with and correlated to the knowledge related to expired events that are considered critical for the target OLAP analysis scenario. Finally, a comprehensive experimental evaluation over several classes of data stream sets clearly confirms the benefits deriving from the event-based data stream compression approach proposed in ECM-DS.
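ECM-DS itself is not specified here in reproducible detail, but the core idea, keeping events deemed interesting exact while lossily summarizing the rest, can be sketched as below; the scoring function and threshold are invented placeholders, not the paper's model.

```python
# Sketch of event-driven lossy compression: events scoring above a threshold
# are kept verbatim; the rest are collapsed into a lossy aggregate.

def interestingness(event):
    return event["value"]          # placeholder: real scoring is domain-specific

def compress(stream, threshold=100):
    kept, summary = [], {"count": 0, "total": 0}
    for event in stream:
        if interestingness(event) >= threshold:
            kept.append(event)     # critical events survive exactly
        else:
            summary["count"] += 1  # the rest only as aggregated statistics
            summary["total"] += event["value"]
    return kept, summary

events = [{"value": v} for v in (5, 250, 12, 180, 7)]
print(compress(events))  # keeps the 250 and 180 events; 3 others summarized
```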

01 Jan 2010
TL;DR: This paper provides an overview of Data warehousing, Data Mining, OLAP, OLTP technologies, exploring the features, applications and the architecture of Data Warehousing.
Abstract: This paper provides an overview of Data warehousing, Data Mining, OLAP, OLTP technologies, exploring the features, applications and the architecture of Data Warehousing. The data warehouse supports on-line analytical processing (OLAP), the functional and performance requirements of which are quite different from those of the on-line transaction processing (OLTP) applications traditionally supported by the operational databases. Data warehouses provide on-line analytical processing (OLAP) tools for the interactive analysis of multidimensional data of varied granularities, which facilitates effective data mining. Data warehousing and on-line analytical processing (OLAP) are essential elements of decision support, which has increasingly become a focus of the database industry. OLTP is customer-oriented and is used for transaction and query processing by clerks, clients and information technology professionals. An OLAP system is market-oriented and is used for data analysis by knowledge workers, including managers, executives and analysts. Data warehousing and OLAP have emerged as leading technologies that facilitate data storage, organization and then, significant retrieval. Decision support places some rather different requirements on database technology compared to traditional on-line transaction processing applications.

Proceedings ArticleDOI
30 Oct 2010
TL;DR: An innovative framework based on flexible sampling-based data cube compression techniques for computing privacy preserving OLAP aggregations on data cubes while allowing approximate answers to be efficiently evaluated over such aggregations is proposed.
Abstract: In this paper we propose an innovative framework based on flexible sampling-based data cube compression techniques for computing privacy preserving OLAP aggregations on data cubes while allowing approximate answers to be efficiently evaluated over such aggregations. In our proposal, this scenario is accomplished by means of the so-called accuracy/privacy contract, which determines how OLAP aggregations must be accessed while balancing the accuracy of approximate answers against the privacy of sensitive ranges of multidimensional data.
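As a rough illustration of the general sampling idea (not the authors' specific accuracy/privacy contract), the following Python sketch answers a SUM query from a random sample and scales by the inverse sampling rate, so individual cells never need to be returned exactly; all numbers are invented.

```python
import random

random.seed(0)
cells = [random.randint(1, 100) for _ in range(10_000)]  # invented cube cells

rate = 0.05                                # 5% sampling rate
sample = random.sample(cells, int(len(cells) * rate))

estimate = sum(sample) / rate              # scale up by the inverse rate
print(f"approximate SUM: {estimate:,.0f}  (exact: {sum(cells):,})")
```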

Proceedings ArticleDOI
06 Jun 2010
TL;DR: This tutorial presents an organized picture on how to turn a database into one or a set of organized heterogeneous information networks, how information networks can be used for data cleaning, data consolidation, and data quality improvement, and how to discover various kinds of knowledge from information networks.
Abstract: Most people consider a database to be merely a data repository that supports data storage and retrieval. Actually, a database contains rich, inter-related, multi-typed data and information, forming one or a set of gigantic, interconnected, heterogeneous information networks. Much knowledge can be derived from such information networks if we systematically develop an effective and scalable database-oriented information network analysis technology. In this tutorial, we introduce database-oriented information network analysis methods and demonstrate how information networks can be used to improve data quality and consistency, facilitate data integration, and generate interesting knowledge. This tutorial presents an organized picture on how to turn a database into one or a set of organized heterogeneous information networks, how information networks can be used for data cleaning, data consolidation, and data quality improvement, how to discover various kinds of knowledge from information networks, how to perform OLAP in information networks, and how to transform database data into knowledge by information network analysis. Moreover, we present interesting case studies on real datasets, including DBLP and Flickr, and show how interesting and organized knowledge can be generated from database-oriented information networks.

Journal ArticleDOI
TL;DR: A decision support system containing the methodology, Weighted and Layered workflow evaluation (WaLwFA), extended to incorporate business intelligence using C4.5 and association rule algorithms is described.
Abstract: Business performance measurements, decision support systems (DSS) and online analytical processing (OLAP) have a common goal, i.e., to assist decision-makers during the decision-making process. Integrating DSS and OLAP into existing business performance measurements aims to improve the accuracy of analysis and provide an in-depth, multi-angle view of data. This paper describes a decision support system containing our methodology, Weighted and Layered workflow evaluation (WaLwFA), extended to incorporate business intelligence using C4.5 and association rule algorithms. C4.5 produces more comprehensible decision trees by showing only important attributes. Furthermore, C4.5 trees can be transformed into IF-THEN rules. However, association rules are preferred as data can be described in rules of multiple granularities. Sorting rules based on the rules' complexities permits OLAP to navigate through layers of complexities to extract rules of relevant sizes and to view data from multidimensional perspectives in each layer. Experimental results on an airline domain are presented.
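The layering idea, sorting rules by complexity so that OLAP can navigate from coarse to detailed rules, can be sketched in a few lines of Python; the rules below are invented and this is not the WaLwFA implementation.

```python
from collections import defaultdict

# Invented IF-THEN rules: (antecedent attributes, consequent).
rules = [
    ({"route"},                         "delay"),
    ({"route", "weather"},              "delay"),
    ({"route", "weather", "aircraft"},  "on-time"),
    ({"season"},                        "full-flight"),
]

# Layer rules by complexity (antecedent size) so an OLAP front end can
# drill from simple, general rules down to detailed, specific ones.
layers = defaultdict(list)
for antecedent, consequent in rules:
    layers[len(antecedent)].append((sorted(antecedent), consequent))

for depth in sorted(layers):
    print(f"layer {depth}: {layers[depth]}")
```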

Journal ArticleDOI
TL;DR: A solution to this drawback is presented, consisting of an extension to the Object Constraint Language (OCL) that has been developed to include a set of predefined OLAP operators, which can be used to define platform-independent OLAP queries as a part of the specification of the data warehouse conceptual multidimensional model.

Journal ArticleDOI
06 Dec 2010
TL;DR: In this article, the authors review web-based business intelligence approaches for small and middle-sized enterprises in decision making, and discuss the existing approaches and tools working in main memory and/or with web interfaces (including freeware tools).
Abstract: Data warehouses are the core of decision support systems, which nowadays are used by all kinds of enterprises around the world. Although many studies have been conducted on the need of decision support systems (DSSs) for small businesses, most of them adopt existing solutions and approaches, which are appropriate for large-scale enterprises but are inadequate for small and middle-sized enterprises. Small enterprises require cheap, lightweight architectures and tools (hardware and software) providing online data analysis. In order to ensure these features, we review web-based business intelligence approaches. For real-time analysis, the traditional OLAP architecture is cumbersome and storage-costly; therefore, we also review in-memory processing. Consequently, this paper discusses the existing approaches and tools working in main memory and/or with web interfaces (including freeware tools), relevant for small and middle-sized enterprises in decision making.

Proceedings ArticleDOI
26 Apr 2010
TL;DR: UDFs are proposed that re-factor analytical processing on RDF graphs in a way that enables more parallelized processing and perform a look-ahead processing to reduce the cost of subsequent operators in the query execution plan.
Abstract: In order to exploit the growing amount of RDF data in decision-making, there is an increasing demand for analytics-style processing of such data. RDF data is modeled as a labeled graph that represents a collection of binary relations (triples). In this context, analytical queries can be interpreted as consisting of three main constructs, namely pattern matching, grouping and aggregation, and require several join operations to reassemble them into n-ary relations relevant to the given query, unlike traditional OLAP systems where data is suitably organized. MapReduce-based parallel processing systems like Pig have gained success in processing scalable analytical workloads. However, these systems offer only relational algebra style operators, which would require an iterative n-tuple reassembly process in which intermediate results need to be materialized. This leads to high I/O costs that negatively impact performance. In this paper, we propose UDFs that (i) re-factor analytical processing on RDF graphs in a way that enables more parallelized processing and (ii) perform a look-ahead processing to reduce the cost of subsequent operators in the query execution plan. These functions have been integrated into the Pig Latin function library, and the experimental results show up to 50% improvement in execution times for certain classes of queries. An important impact of this work is that it could serve as the foundation for additional physical operators in systems such as Pig for more efficient graph processing.
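The key move described above, replacing iterative self-joins over the triple table with a grouping pass that reassembles each subject's properties into an n-ary record, can be loosely sketched in plain Python (the actual work is a set of Pig Latin UDFs; the triples and property names here are invented):

```python
from collections import defaultdict

# Invented RDF triples: (subject, predicate, object).
triples = [
    ("p1", "type",  "Product"), ("p1", "price", 30), ("p1", "region", "EU"),
    ("p2", "type",  "Product"), ("p2", "price", 50), ("p2", "region", "EU"),
]

# One grouping pass reassembles each subject's binary relations into an
# n-ary record, instead of joining the triple table against itself per property.
records = defaultdict(dict)
for s, p, o in triples:
    records[s][p] = o

# Grouping + aggregation over the reassembled records: total price per region.
totals = defaultdict(int)
for rec in records.values():
    totals[rec["region"]] += rec["price"]
print(dict(totals))   # {'EU': 80}
```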

Journal ArticleDOI
TL;DR: This survey paper presents an overview of the different proposals that use XML within data warehousing technology, which range from using XML data sources for regular warehouses to those using full XML warehousing solutions.

Proceedings ArticleDOI
01 Mar 2010
TL;DR: This work demonstrates a framework that transforms the traditional data cube model into a trajectory warehouse, T-WAREHOUSE, a system that incorporates all the required steps for Visual Trajectory Data Warehousing, from trajectory reconstruction and ETL processing to Visual OLAP analysis on mobility data.
Abstract: Technological advances in sensing technologies and wireless telecommunication devices enable novel research fields related to the management of trajectory data. As it usually happens in the data management world, the challenge after storing the data is the implementation of appropriate analytics for extracting useful knowledge. However, traditional data warehousing systems and techniques were not designed for analyzing trajectory data. Thus, in this work, we demonstrate a framework that transforms the traditional data cube model into a trajectory warehouse. As a proof-of-concept, we implemented T-WAREHOUSE, a system that incorporates all the required steps for Visual Trajectory Data Warehousing, from trajectory reconstruction and ETL processing to Visual OLAP analysis on mobility data.

Proceedings ArticleDOI
30 Oct 2010
TL;DR: This work presents a GPU-based cube data structure and algorithms for fast multidimensional aggregation, implemented using Nvidia's CUDA framework, and shows a substantial speedup over state-of-the-art sequential algorithms.
Abstract: Multidimensional aggregation is one of the most important computational building blocks and hence also a potential performance bottleneck in Online Analytic Processing (OLAP). In order to deliver fast query responses for interactive operations such as slicing, dicing, roll-up and drill-down, it is essential that aggregates along the relevant dimensions of a data cube can be calculated as efficiently as possible. General-purpose computing on graphics processing units (GPGPU) is a recent trend used in many computing domains with the potential for tremendous speedups through the massively data-parallel computation available on such devices. We present a GPU-based cube data structure and algorithms for fast multidimensional aggregation, implemented using Nvidia's CUDA framework. Our experimental tests show a substantial speedup over state-of-the-art sequential algorithms. Moreover, the performance gain is particularly high in cases exposing the weaknesses of traditional algorithms, i.e. when the number of base cells involved in an aggregation is large.
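The paper's implementation is CUDA-based; as a rough CPU stand-in for the operation the GPU parallelizes, the numpy sketch below aggregates a small invented 3-dimensional cube along one or more dimensions, which is the per-cell reduction a GPU kernel would distribute across threads.

```python
import numpy as np

rng = np.random.default_rng(0)
# Invented 3-D cube: dimensions (time, region, product); base cells hold a measure.
cube = rng.integers(0, 100, size=(12, 4, 50))

# Roll-up over the product dimension: each output cell aggregates 50 base cells.
by_time_region = cube.sum(axis=2)      # shape (12, 4)

# Aggregating away two dimensions at once yields a coarser cuboid.
by_time = cube.sum(axis=(1, 2))        # shape (12,)
print(by_time_region.shape, by_time.shape)
```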

Proceedings ArticleDOI
01 Mar 2010
TL;DR: A novel E-Cube model is demonstrated that combines CEP and OLAP techniques for multi-dimensional event pattern analysis at different abstraction levels and a London transit scenario is given to demonstrate the utility and performance of this proposed technology.
Abstract: Many modern applications including tag based mass transit systems, RFID-based supply chain management systems and online financial feeds require special purpose event stream processing technology to analyze vast amounts of sequential multi-dimensional data available in real-time data feeds. Traditional online analytical processing (OLAP) systems are not designed for real-time pattern-based operations, while Complex Event Processing (CEP) systems are designed for sequence detection and do not support OLAP operations. We will demonstrate a novel E-Cube model that combines CEP and OLAP techniques for multi-dimensional event pattern analysis at different abstraction levels. A London transit scenario will be given to demonstrate the utility and performance of this proposed technology.

Proceedings ArticleDOI
26 Oct 2010
TL;DR: Visual Cube and multi-dimensional OLAP of image collections, such as web images indexed in search engines, product images and photos shared on social networks, are proposed and efficient algorithms are developed to construct Visual Cube.
Abstract: On-Line Analytical Processing (OLAP) has shown great success in many industry applications, including sales, marketing, management, financial data analysis, etc. In this paper, we propose Visual Cube and multi-dimensional OLAP of image collections, such as web images indexed in search engines (e.g., Google and Bing), product images (e.g. Amazon) and photos shared on social networks (e.g., Facebook and Flickr). It provides online responses to user requests with summarized statistics of image information and handles rich semantics related to image visual features. A clustering structure measure is proposed to help users freely navigate and explore images. Efficient algorithms are developed to construct Visual Cube. In addition, we introduce the new issue of Cell Overlapping in data cube and present efficient solutions for Visual Cube computation and OLAP operations. Extensive experiments are conducted and the results show good performance of our algorithms.

Journal ArticleDOI
TL;DR: An analytical model using Petri Net for distributed data management in a data warehouse to ease the OLAP (Online Analytical Processing) operations is proposed and some of the properties like safeness, boundedness, liveness and conservativeness are verified.
Abstract: A decision maker wants a pool of data at their fingertips while making decisions. In state-of-the-art applications, decision making is no longer a centralized process. Distribution of resources is a challenge for system designers. Besides, for timely analysis of the distributed data, a robust query processing system along with the physical storage and schema definitions is also necessary. In the present state of business practice, skills, technologies, processes and applications all fall under the umbrella of competitive intelligence. In short, BI aims to support better business decision-making. In this paper we propose an analytical model using Petri Nets for distributed data management in a data warehouse to ease OLAP (Online Analytical Processing) operations. Some of the properties of the model, like safeness, boundedness, liveness and conservativeness, are also verified.

Patent
22 Nov 2010
TL;DR: In this paper, the authors present an OLAP execution model using relational operations, where the first query is generated by the OLAP and the second query is received in a relational engine coupled to the datastore.
Abstract: In one embodiment the present invention includes an OLAP execution model using relational operations. In one embodiment, the present invention includes, a method comprising receiving a first query in an online analytic processor (OLAP) executing on one or more computers, the OLAP generating and comprising a model specifying a graph defining a plurality of nodes and a plurality of tiers, each node corresponding to a different operation on data. A second query is generated by the OLAP. The second query includes a plurality of layered subqueries each corresponding to one of the nodes in the graph for specifying the different operations on data. The second query is received in a relational engine coupled to the datastore. The relational engine executes the second query, and in accordance therewith, retrieves data.
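To illustrate the layered-subqueries idea in this patent abstract, here is a hedged Python sketch that walks a tiny node graph tier by tier and emits one nested relational query; the node operations and table names are invented, and this is not the patented implementation.

```python
# Invented node graph: each tier's node consumes the previous tier's output;
# tier 0 reads the base table, later tiers wrap it in layered subqueries.
nodes = [
    ("t0", "SELECT region, product, sales FROM sales_fact"),
    ("t1", "SELECT region, SUM(sales) AS sales FROM {prev} GROUP BY region"),
    ("t2", "SELECT region FROM {prev} WHERE sales > 1000"),
]

query = None
for name, template in nodes:
    # Each tier wraps the previous tier's query as a layered subquery.
    query = template.format(prev=f"({query}) AS {name}_in") if query else template

print(query)  # one relational query, one nested subquery per later tier
```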

Book ChapterDOI
12 Dec 2010
TL;DR: The storage system and the processing engine are loosely coupled, and have been designed to handle two types of workload simultaneously, namely data-intensive analytical jobs and online transactions (commonly referred to as OLAP and OLTP respectively).
Abstract: The Cloud is fast gaining popularity as a platform for deploying Software as a Service (SaaS) applications. In principle, the Cloud provides unlimited compute resources, enabling deployed services to scale seamlessly. Moreover, the pay-as-you-go model in the Cloud reduces the maintenance overhead of the applications. Given the advantages of the Cloud, it is attractive to migrate existing software to this new platform. However, challenges remain as most software applications need to be redesigned to embrace the Cloud. In this paper, we present an overview of our current on-going work in developing epiC - an elastic and efficient power-aware data-intensive Cloud system. We discuss the design issues and the implementation of epiC's storage system and processing engine. The storage system and the processing engine are loosely coupled, and have been designed to handle two types of workload simultaneously, namely data-intensive analytical jobs and online transactions (commonly referred to as OLAP and OLTP respectively). The processing of large-scale analytical jobs in epiC adopts a phase-based processing strategy, which provides fine-grained fault tolerance, while the processing of queries adopts indexing and filter-and-refine strategies.

Journal ArticleDOI
TL;DR: A generic multidimensional schema is proposed to analyze the results of a simulation model, which can guide modelers in designing specific data warehouses, and an adaptation of an OLAP client tool to provide an adequate visualization of data is proposed.
Abstract: This paper examines the multidimensional modeling of a data warehouse for simulation results. Environmental dynamics modeling is used to study complex scenarios like urbanization, climate change and deforestation while allowing decision makers to understand and predict the evolution of the environment in response to potential value changes in a large number of influence variables. In this context, exploring simulation models produces a huge volume of data, which must often be studied extensively at different levels of aggregation; there is thus a great need for tools and methodologies specifically adapted to the storage and analysis of such complex data. Data warehousing systems provide technologies for managing simulation results from different sources. Moreover, OLAP technologies allow one to analyze and compare these results and their corresponding models. In this paper, the authors propose a generic multidimensional schema to analyze the results of a simulation model, which can guide modelers in designing specific data warehouses, and an adaptation of an OLAP client tool to provide an adequate visualization of data. As an example, a data warehouse for the analysis of results produced from a savanna simulation model is implemented using a Relational OLAP architecture.

Proceedings ArticleDOI
01 Nov 2010
TL;DR: A MapReduceMerge-based parallel data cube construction method with a read-optimized data storage strategy which is more suitable for OLAP and can ensure good load balancing and reduce the large amount of data movement compared with traditional approaches.
Abstract: The pre-computation of data cubes is critical to improving the response time of On-Line Analytical Processing (OLAP) systems. However, as the size of data grows, the time it takes to construct data cubes becomes a significant performance bottleneck. Therefore, we need a parallel pre-computation approach to further improve the performance of OLAP. Current parallel approaches can be grouped into two categories: work partitioning and data partitioning. However, the first cannot guarantee load balance among processors, and the second produces massive data movement between processors. This paper proposes a MapReduceMerge-based parallel data cube construction method with a read-optimized data storage strategy which is more suitable for OLAP. Our method can ensure good load balancing and reduce the large amount of data movement compared with traditional approaches. MapReduceMerge is an extension of MapReduce, a programming model that enables easy development of parallel applications to process massive data on large clusters, and the key element of Hadoop (a cloud computing framework) used to support the businesses of Facebook in cloud environments. We modify the original MapReduceMerge framework to make it meet the needs of cuboid construction and show the implementation in detail through an example of 2-dimension cuboid construction. In the meantime, we discuss the optimization for the construction of multi-dimension cuboids.
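The map/reduce split the method builds on can be sketched in plain Python: a mapper emits (group-by key, measure) pairs and a reducer sums them into the cells of a 2-dimension cuboid. The records are invented, and this omits the merge phase and the read-optimized storage strategy that are the paper's actual contribution.

```python
from collections import defaultdict

# Invented fact records: (date, store, amount).
records = [("2010-01", "s1", 10), ("2010-01", "s2", 20),
           ("2010-02", "s1", 5),  ("2010-01", "s1", 7)]

def mapper(record):
    date, store, amount = record
    yield (date, store), amount          # key = the cuboid's dimension values

def reducer(key, values):
    return key, sum(values)              # one aggregated cuboid cell

# Shuffle: group mapper output by key, as the framework would between phases.
groups = defaultdict(list)
for rec in records:
    for key, value in mapper(rec):
        groups[key].append(value)

cuboid = dict(reducer(k, vs) for k, vs in groups.items())
print(cuboid)  # {('2010-01','s1'): 17, ('2010-01','s2'): 20, ('2010-02','s1'): 5}
```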

Journal ArticleDOI
TL;DR: The UML class diagram of a GDW metamodel and its formal specifications are discussed and the Geographical Multidimensional Query Language (GeoMDQL) is proposed, based on well-known standards such as the MultiDimensional eXpressions (MDX) language and OGC simple features specification for SQL.

Proceedings ArticleDOI
19 Nov 2010
TL;DR: Integrated contextual information is proposed as the foundation concept of a multidimensional recommendation model, and the Online Analytical Processing (OLAP) ability of data warehousing is used to resolve the conflicts among hierarchy ratings.
Abstract: Recommender systems utilize the past experiences and preferences of target customers as a basis to offer personalized recommendations for them, as well as to resolve the information overload problem. Personalized recommendation methods are primarily classified into the content-based recommendation approach and the collaborative filtering recommendation approach. Both recommendation approaches have their own advantages, drawbacks and complementarities. Because conventional recommendation techniques don't consider contextual information, the real reason why a customer likes a specific product cannot be understood. Therefore, in reality, this often decreases the accuracy of the recommendation results and degrades the recommendation quality. In this paper, we propose integrated contextual information as the foundation concept of a multidimensional recommendation model and use the Online Analytical Processing (OLAP) ability of data warehousing to resolve the conflicts among hierarchy ratings. By establishing additional user profiles and multidimensional analysis, this work hopes to find the key factors affecting user perceptions.