
Showing papers on "Online analytical processing published in 2013"


Patent
16 May 2013
TL;DR: In this article, the authors present a method for generating a database query for stored information and then generating an Online Analytical Processing (OLAP) element to represent the information received from the query.
Abstract: A method is provided in one example and includes generating a query for a database for information stored in the database. The information relates to data discovered through a capture system. The method further includes generating an Online Analytical Processing (OLAP) element to represent information received from the query. A rule based on the OLAP element is generated and the rule affects data management for one or more documents that satisfy the rule. In more specific embodiments, the method further includes generating a capture rule that defines items the capture system should capture. The method also includes generating a discovery rule that defines objects the capture system should register. In still other embodiments, the method includes developing a policy based on the rule, where the policy identifies how one or more documents are permitted to traverse a network.

147 citations


Proceedings ArticleDOI
28 Oct 2013
TL;DR: Open problems and actual research trends in the field of Data Warehousing and OLAP over Big Data are highlighted and several novel research directions arising in this field are derived.
Abstract: In this paper, we highlight open problems and actual research trends in the field of Data Warehousing and OLAP over Big Data, an emerging term in Data Warehousing and OLAP research. We also derive several novel research directions arising in this field, and put emphasis on possible contributions to be achieved by future research efforts.

120 citations


Proceedings ArticleDOI
09 Oct 2013
TL;DR: Three important aspects of Big Data research are discussed, namely OLAP over Big Data, Big Data Posting, and Privacy of Big Data; future research directions are also depicted, hence implicitly defining a research agenda aimed at leading future challenges in this research field.
Abstract: Recently, a great deal of interest for Big Data has risen, mainly driven from a widespread number of research problems strongly related to real-life applications and systems, such as representing, modeling, processing, querying and mining massive, distributed, large-scale repositories (mostly being of unstructured nature). Inspired by this main trend, in this paper we discuss three important aspects of Big Data research, namely OLAP over Big Data, Big Data Posting, and Privacy of Big Data. We also depict future research directions, hence implicitly defining a research agenda aiming at leading future challenges in this research field.

113 citations


Book ChapterDOI
29 Aug 2013
TL;DR: This paper proposes the notion of process cubes where events and process models are organized using different dimensions and each cell in the process cube corresponds to a set of events and can be used to discover a process model, to check conformance with respect to some process models, or to discover bottlenecks.
Abstract: Recent breakthroughs in process mining research make it possible to discover, analyze, and improve business processes based on event data. The growth of event data provides many opportunities but also imposes new challenges. Process mining is typically done for an isolated well-defined process in steady-state. However, the boundaries of a process may be fluid and there is a need to continuously view event data from different angles. This paper proposes the notion of process cubes where events and process models are organized using different dimensions. Each cell in the process cube corresponds to a set of events and can be used to discover a process model, to check conformance with respect to some process model, or to discover bottlenecks. The idea is related to the well-known OLAP (Online Analytical Processing) data cubes and associated operations such as slice, dice, roll-up, and drill-down. However, there are also significant differences because of the process-related nature of event data. For example, process discovery based on events is incomparable to computing the average or sum over a set of numerical values. Moreover, dimensions related to process instances (e.g. cases are split into gold and silver customers), subprocesses (e.g. acquisition versus delivery), organizational entities (e.g. backoffice versus frontoffice), and time (e.g., 2010, 2011, 2012, and 2013) are semantically different and it is challenging to slice, dice, roll-up, and drill-down process mining results efficiently.

95 citations
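The slice and dice operations on a process cube can be sketched on a toy event log. This is a minimal illustration with hypothetical dimension names, not the paper's implementation:

```python
from collections import defaultdict

# Toy event log: each event carries dimension attributes (all hypothetical).
events = [
    {"case": 1, "activity": "register", "customer": "gold",   "year": 2012},
    {"case": 1, "activity": "ship",     "customer": "gold",   "year": 2012},
    {"case": 2, "activity": "register", "customer": "silver", "year": 2013},
    {"case": 3, "activity": "register", "customer": "gold",   "year": 2013},
]

def dice(events, **filters):
    """Keep only events whose attributes match all given filter values."""
    return [e for e in events
            if all(e[dim] in vals for dim, vals in filters.items())]

def cells(events, *dims):
    """Group events into process-cube cells along the given dimensions."""
    cube = defaultdict(list)
    for e in events:
        cube[tuple(e[d] for d in dims)].append(e)
    return cube

gold_2013 = dice(events, customer={"gold"}, year={2013})
by_customer = cells(events, "customer")
```

Each cell's event list would then feed a process discovery or conformance checking algorithm, which is the step that has no counterpart in classical numerical OLAP.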


Journal ArticleDOI
Tim Kraska1
TL;DR: With the increasing importance of big data, many new systems have been developed to "solve" the big data challenge but at the same time, famous database researchers argue that there is nothing new about these systems and that they're actually a step backward.
Abstract: With the increasing importance of big data, many new systems have been developed to "solve" the big data challenge. At the same time, famous database researchers argue that there is nothing new about these systems and that they're actually a step backward. This article sheds some light on this discussion.

79 citations


Proceedings ArticleDOI
22 Jul 2013
TL;DR: This paper explores the convergence of Data Warehousing, OLAP and data-intensive Cloud Infrastructures in the context of so-called analytics over Big Data.
Abstract: This paper explores the convergence of Data Warehousing, OLAP and data-intensive Cloud Infrastructures in the context of so-called analytics over Big Data. The paper briefly reviews some state-of-the-art proposals, highlights open research issues and, finally, draws possible research directions in this scientific field.

76 citations


Journal ArticleDOI
TL;DR: A hybrid OLAP-association rule mining based quality management system (HQMS) to extract defect patterns in the garment industry and the results indicate that the HQMS contributes significantly to the formulation of quality improvement in the industry.
Abstract: In today's garment industry, garment defects have to be minimized so as to fulfill the expectations of demanding customers who seek products of high quality but low cost. However, without any data mining tools to manage massive data related to quality, it is difficult to investigate the hidden patterns among defects, which are important information for improving the quality of garments. This paper presents a hybrid OLAP-association rule mining based quality management system (HQMS) to extract defect patterns in the garment industry. The mined results indicate the relationships among defects, which serve as a reference for defect prediction, root cause identification and the formulation of proactive measures for quality improvement. Because real-time access to the desired information is crucial for survival under severe competition, the system is equipped with Online Analytical Processing (OLAP) features so that manufacturers are able to explore the required data in a timely manner. The integration of OLAP and association rule mining allows data mining to be applied on a multidimensional basis. A pilot run of the HQMS is undertaken in a garment manufacturing company to demonstrate how OLAP and association rule mining are effective in discovering patterns among product defects. The results indicate that the HQMS contributes significantly to the formulation of quality improvement in the industry.

57 citations
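The association-rule side of such a hybrid system reduces to support and confidence computed over defect records; a minimal sketch with hypothetical defect codes:

```python
# Hypothetical defect records: the set of defect codes observed per garment.
records = [
    {"broken_stitch", "open_seam"},
    {"broken_stitch", "open_seam", "stain"},
    {"broken_stitch"},
    {"stain"},
]

def support(itemset, records):
    """Fraction of records containing every item in the itemset."""
    return sum(itemset <= r for r in records) / len(records)

def confidence(antecedent, consequent, records):
    """Confidence of the rule antecedent -> consequent."""
    return support(antecedent | consequent, records) / support(antecedent, records)

s = support({"broken_stitch", "open_seam"}, records)
c = confidence({"broken_stitch"}, {"open_seam"}, records)
```

In the HQMS setting, the OLAP layer would restrict `records` to one cube cell (e.g., a fabric type and time period) before mining, which is what makes the rule mining multidimensional.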


Journal ArticleDOI
TL;DR: A user-centric conceptual model for data warehouses and OLAP systems, called the Cube Algebra, that takes the cube metaphor literally and provides the knowledge worker with high-level cube objects and related concepts is proposed.
Abstract: The lack of an appropriate conceptual model for data warehouses and OLAP systems has led to the tendency to deploy logical models, for example star, snowflake, and constellation schemas, as conceptual models. ER model extensions, UML extensions, special graphical user interfaces, and dashboards have been proposed as conceptual approaches. However, they introduce their own problems, are somewhat complex and difficult to understand, and are not always user-friendly. They also impose a steep learning curve, and most of them address only structural design, without considering the associated operations. Therefore, they are not really an improvement and, in the end, only represent a reflection of the logical model. The essential drawback of offering this system-centric view as a user concept is that knowledge workers are confronted with the full and overwhelming complexity of these systems as well as with complicated and user-unfriendly query languages such as SQL OLAP and MDX. In this article, the authors propose a user-centric conceptual model for data warehouses and OLAP systems, called the Cube Algebra. It takes the cube metaphor literally and provides the knowledge worker with high-level cube objects and related concepts. A novel query language leverages well-known high-level operations such as roll-up, drill-down, slice, and drill-across. As a result, the logical and physical levels are hidden from the unskilled end user.

49 citations


Book ChapterDOI
25 Sep 2013
TL;DR: This paper investigates solutions relying on data partitioning schemes for parallel building of OLAP data cubes, suitable to novel Big Data environments, and proposes the framework OLAP*, along with the associated benchmark TPC-H*d, a suitable transformation of the well-known data warehouse benchmark T PC-H.
Abstract: In this paper, we investigate solutions relying on data partitioning schemes for parallel building of OLAP data cubes, suitable to novel Big Data environments, and we propose the framework OLAP*, along with the associated benchmark TPC-H*d, a suitable transformation of the well-known data warehouse benchmark TPC-H. We demonstrate through performance measurements the efficiency of the proposed framework, developed on top of the ROLAP server Mondrian.

46 citations
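The core idea behind partition-based parallel cube building, namely that partial aggregates computed independently per partition merge into the full cube, can be sketched as follows (toy data; the actual OLAP* framework and its Mondrian integration are not shown):

```python
from collections import Counter

# Hypothetical fact rows: (product, region, sales).
rows = [("shirt", "EU", 10), ("shirt", "US", 5),
        ("pants", "EU", 7),  ("shirt", "EU", 3)]

def partition(rows, n):
    """Hash-partition rows so each worker can aggregate its share alone."""
    parts = [[] for _ in range(n)]
    for r in rows:
        parts[hash(r[:2]) % n].append(r)
    return parts

def local_aggregate(part):
    """Per-partition cube fragment: grouping key -> summed measure."""
    agg = Counter()
    for product, region, sales in part:
        agg[(product, region)] += sales
    return agg

# Partial aggregates merge by addition, so the result is independent of how
# rows were partitioned -- the property parallel cube building relies on.
cube = sum((local_aggregate(p) for p in partition(rows, 3)), Counter())
```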


Book ChapterDOI
26 May 2013
TL;DR: An empirical argument for the need of OLAP-to-SPARQL engines for analytical query capabilities in industry is given, and the performance gain of RDF aggregate views that, similar to aggregate tables in ROLAP, materialise parts of the data cube is measured.
Abstract: Statistics published as Linked Data promise efficient extraction, transformation and loading (ETL) into a database for decision support. The predominant way to implement analytical query capabilities in industry are specialised engines that translate OLAP queries to SQL queries on a relational database using a star schema (ROLAP). A more direct approach than ROLAP is to load Statistical Linked Data into an RDF store and to answer OLAP queries using SPARQL. However, we assume that general-purpose triple stores – just as typical relational databases – are no perfect fit for analytical workloads and need to be complemented by OLAP-to-SPARQL engines. To give an empirical argument for the need of such an engine, we first compare the performance of our generated SPARQL and of ROLAP SQL queries. Second, we measure the performance gain of RDF aggregate views that, similar to aggregate tables in ROLAP, materialise parts of the data cube.

44 citations
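An OLAP-to-SPARQL translation of the kind the paper benchmarks can be sketched as a roll-up rendered into a SPARQL GROUP BY query. The `ex:` property IRIs below are hypothetical placeholders, loosely in the style of statistical Linked Data, not the paper's generated queries:

```python
def rollup_to_sparql(measure, dims):
    """Render a roll-up over the given dimensions as a SPARQL aggregation
    query (property IRIs are hypothetical placeholders)."""
    vars_ = " ".join(f"?{d}" for d in dims)
    patterns = "\n".join(f"  ?obs ex:{d} ?{d} ." for d in dims)
    return (f"SELECT {vars_} (SUM(?m) AS ?total)\n"
            f"WHERE {{\n  ?obs ex:{measure} ?m .\n{patterns}\n}}\n"
            f"GROUP BY {vars_}")

q = rollup_to_sparql("sales", ["year", "region"])
```

The paper's point is that running such generated queries directly on a general-purpose triple store tends to lose against ROLAP SQL, which motivates materialised RDF aggregate views.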


Journal ArticleDOI
TL;DR: Semantically enriching process execution data can successfully raise analysis from the syntactic to the semantic level, and enable multiple perspectives of analysis on business processes.
Abstract: Purpose – The purpose of this paper is to propose a solution to the problem of a lack of machine-processable semantics in business process management. Design/methodology/approach – The paper introduces a methodology that combines domain and company‐specific ontologies and databases to obtain multiple levels of abstraction for process mining and analysis. The authors evaluated this approach with a real case study from the apparel domain, using a prototype system and techniques developed in the Process Mining Framework (ProM). The results of this approach are compared with similar research. Findings – Semantically enriching process execution data can successfully raise analysis from the syntactic to the semantic level, and enable multiple perspectives of analysis on business processes. Combining this approach with complementary research in semantic business process management (SBPM) can provide results comparable to multidimensional analysis in data warehouse and online analytical processing (OLAP) technologies.


Journal Article
TL;DR: The HyPerScript transaction programming language, the main-memory indexing technique ART, which is decisive for high transaction processing performance, and HyPer’s transaction management that allows heterogeneous workloads consisting of short pre-canned transactions, OLAP-style queries, and long interactive transactions are surveyed.
Abstract: Two emerging hardware trends have re-initiated the development of in-core database systems: ever increasing main-memory capacities and vast multi-core parallel processing power. Main-memory capacities of several TB make it possible to retain all transactional data of even the largest applications in-memory on one (or a few) servers. The vast computational power, in combination with low data management overhead, yields unprecedented transaction performance, which allows transaction processing to be pushed (away from application servers) into the database server and still “leaves room” for additional query processing directly on the transactional data. Thereby, the often postulated goal of real-time business intelligence, where decision makers have access to the latest version of the transactional state, becomes feasible. In this paper we survey the HyPerScript transaction programming language, the main-memory indexing technique ART, which is decisive for high transaction processing performance, and HyPer’s transaction management that allows heterogeneous workloads consisting of short pre-canned transactions, OLAP-style queries, and long interactive transactions.

Proceedings ArticleDOI
25 Aug 2013
TL;DR: This work attempts to extend the established OLAP technology to allow multidimensional analysis of social media data by integrating text and opinion mining methods into the data warehousing system and by exploiting various knowledge discovery techniques to deal with semi-structured and unstructured data from social media.
Abstract: Social networks are platforms where millions of users interact frequently and share a variety of digital content with each other. Users express their feelings and opinions on every topic of interest. These opinions carry important value for personal, academic and commercial applications, but the volume and the speed at which they are produced make it challenging for researchers and the underlying technologies to provide useful insights into such data. We attempt to extend the established OLAP (On-line Analytical Processing) technology to allow multidimensional analysis of social media data by integrating text and opinion mining methods into the data warehousing system and by exploiting various knowledge discovery techniques to deal with semi-structured and unstructured data from social media. The capabilities of OLAP are extended by semantic enrichment of the underlying dataset to discover new measures and dimensions for building data cubes, and by supporting up-to-date analysis of evolving as well as historical social media data. The benefits of such an analysis platform are demonstrated by building a data warehouse for the Twitter social network, dynamically enriching the underlying dataset and enabling multidimensional analysis.

Proceedings ArticleDOI
23 Jun 2013
TL;DR: ScyPer is presented, a Scale-out of the authors' HyPer main memory database system that horizontally scales out on shared-nothing hardware, aimed at sustaining the superior OLTP throughput of a single HyPer server, and providing elastic OLAP throughput by provisioning additional servers on-demand, e.g., in the Cloud.
Abstract: Ever increasing main memory sizes and the advent of multi-core parallel processing have fostered the development of in-core databases. Even the transactional data of large enterprises can be retained in-memory on a single server. Modern in-core databases like our HyPer system achieve best-of-breed OLTP throughput that is sufficient for the lion's share of applications. Remaining server resources are used for OLAP query processing on the latest transactional data, i.e., real-time business analytics. While OLTP performance of a single server is sufficient, an increasing demand for OLAP throughput can only be satisfied economically by a scale-out. In this work we present ScyPer, a Scale-out of our HyPer main memory database system that horizontally scales out on shared-nothing hardware. With ScyPer we aim at (i) sustaining the superior OLTP throughput of a single HyPer server, and (ii) providing elastic OLAP throughput by provisioning additional servers on-demand, e.g., in the Cloud.

Proceedings ArticleDOI
19 Jun 2013
TL;DR: Current state-of-the-art BI components (tools) are discussed, and hospitals' advances in their businesses through BI solutions are outlined, focusing on the inter-relationship of business needs and IT technologies.
Abstract: The healthcare environment is growing to include not only traditional information systems, but also a business intelligence platform. For executive leaders, consultants, and analysts, there is no longer a need to spend hours designing and developing typical reports or charts; the entire solution can be completed using Business Intelligence (BI) software. This paper discusses current state-of-the-art BI components (tools) and outlines hospitals' advances in their businesses through BI solutions, focusing on the inter-relationship of business needs and IT technologies. We also present a case study that illustrates the transformation of a traditional online transactional processing (OLTP) system towards an online analytical processing (OLAP) solution.

Book ChapterDOI
30 Aug 2013
TL;DR: This paper proposes comparative process mining using process cubes, and focuses on educational data, which allows for the comparison of students watching video lectures given by the first author and differences between male and female students, between different parts of the course, and between Dutch students and international students.
Abstract: Process mining techniques enable the analysis of a wide variety of processes using event data. For example, event logs can be used to automatically learn a process model (e.g., a Petri net or BPMN model). Next to the automated discovery of the real underlying process, there are process mining techniques to analyze bottlenecks, to uncover hidden inefficiencies, to check compliance, to explain deviations, to predict performance, and to guide users towards “better” processes. Dozens (if not hundreds) of process mining techniques are available and their value has been proven in many case studies. However, existing techniques focus on the analysis of a single process rather than the comparison of different processes. In this paper, we propose comparative process mining using process cubes. An event has attributes referring to the dimensions of the process cube. Through slicing, dicing, rolling-up, and drilling-down we can view event data from different angles and produce process mining results that can be compared. To illustrate the process cube concept, we focus on educational data. In particular, we analyze data of students watching video lectures given by the first author. The dimensions of the process cube allow us to compare the process of students that passed the course versus the process of students that failed. We can also analyze differences between male and female students, between different parts of the course, and between Dutch students and international students. The initial analysis provided in this paper is used to elicit requirements for better tool support facilitating comparative process mining.

Journal ArticleDOI
TL;DR: A multidimensional data model to integrate sentiment data extracted from opinion posts in a traditional corporate data warehouse and a new sentiment data extraction method that applies semantic annotation as a means to facilitate the integration of both types of data is presented.
Abstract: Web opinion feeds have become one of the most popular information sources users consult before buying products or contracting services. Negative opinions about a product can have a high impact in its sales figures. As a consequence, companies are more and more concerned about how to integrate opinion data in their business intelligence models so that they can predict sales figures or define new strategic goals. After analysing the requirements of this new application, this paper proposes a multidimensional data model to integrate sentiment data extracted from opinion posts in a traditional corporate data warehouse. Then, a new sentiment data extraction method that applies semantic annotation as a means to facilitate the integration of both types of data is presented. In this method, Wikipedia is used as the main knowledge resource, together with some well-known lexicons of opinion words and other corporate data and metadata stores describing the company products like, for example, technical specifications and user manuals. The resulting information system allows users to perform new analysis tasks by using the traditional OLAP-based data warehouse operators. We have developed a case study over a set of real opinions about digital devices which are offered by a wholesale dealer. Over this case study, the quality of the extracted sentiment data is evaluated, and some query examples that illustrate the potential uses of the integrated model are provided.

Proceedings ArticleDOI
28 Oct 2013
TL;DR: A contextual text cube model denoted CXT-Cube is proposed which considers several contextual factors during the OLAP analysis in order to better consider the contextual information associated with textual data.
Abstract: Traditional data warehousing technologies and On-Line Analytical Processing (OLAP) are unable to analyze textual data. Moreover, as the OLAP queries of a decision-maker are generally related to a context, contextual information must be taken into account during the exploitation of data warehouses. Thus, we propose a contextual text cube model denoted CXT-Cube, which takes several contextual factors into account during OLAP analysis in order to better exploit the contextual information associated with textual data. CXT-Cube is characterized by several contextual dimensions, each one related to a contextual factor. In addition, we extend our aggregation OLAP operator for textual data, ORank (OLAP-Rank), to consider all the contextual factors defined in our CXT-Cube model. To validate our model, we perform an experimental study, and the preliminary results show the importance of our approach for integrating textual data into a data warehouse and improving decision-making.

Book ChapterDOI
02 Apr 2013
TL;DR: A text cube approach to studying different kinds of human, social and cultural behavior (HSCB) embedded in the Twitter stream is discussed, including public sentiment in a U.S. city and political sentiment in the Arab Spring.
Abstract: Twitter is a microblogging website that has been useful as a source for human social behavioral analysis, such as political sentiment analysis, user influence, and spread of news. In this paper, we discuss a text cube approach to studying different kinds of human, social and cultural behavior (HSCB) embedded in the Twitter stream. Text cube is a new way to organize data (e.g., Twitter text) in multiple dimensions and multiple hierarchies for efficient information query and visualization. With the HSCB measures defined in a cube, users are able to view statistical reports and perform online analytical processing. Along with viewing and analyzing Twitter text using cubes and charts, we have also added the capability to display the contents of the cube on a heat map. The degree of opacity is directly proportional to the value of the behavioral, social or cultural measure. This kind of map allows the analyst to focus attention on hotspots of concern in a region of interest. In addition, the text cube architecture supports the development of data mining models using the data taken from cubes. We provide several case studies to illustrate the text cube approach, including public sentiment in a U.S. city and political sentiment in the Arab Spring.
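The heat-map rule described, opacity directly proportional to the measure, can be sketched with simple min-max scaling. The `floor` parameter is our assumption, added so that low-valued cells stay faintly visible; the paper does not specify the scaling:

```python
def opacities(measures, floor=0.1):
    """Map cube-cell measure values to opacities in [floor, 1],
    proportional to the value (min-max scaling; `floor` is our choice)."""
    lo, hi = min(measures), max(measures)
    if hi == lo:
        return [1.0] * len(measures)
    return [floor + (1 - floor) * (m - lo) / (hi - lo) for m in measures]

vals = opacities([0, 5, 10])
```

On a map, each region's opacity would come from the sentiment or behavioral measure of its corresponding text-cube cell, so hotspots stand out.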

01 Jan 2013
TL;DR: This demo shows that ScyPer achieves a near-linear scale-out of OLAP query throughput with the number of active nodes, sustains a constant OLTP throughput, and offers real-time analytical capabilities through market-leading query response times and periodically forked TX-consistent virtual memory snapshots with sub-second lifetime durations.
Abstract: ScyPer is an abbreviation for Scaled-out HyPer, a version of the HyPer main memory hybrid OLTP&OLAP database system that horizontally scales out on shared-nothing commodity hardware. Our demo shows that ScyPer a) achieves a near-linear scale-out of OLAP query throughput with the number of active nodes, b) sustains a constant OLTP throughput, c) is resilient to node failures, and d) offers real-time analytical capabilities through market-leading query response times and periodically forked TX-consistent virtual memory snapshots with sub-second lifetime durations.

Proceedings ArticleDOI
18 Mar 2013
TL;DR: Results are presented for an automatic design tool that is aimed at column-oriented DBMSes on OLAP workloads and the key problem is selecting proper sort orders and compression schemes for the columns as well as appropriate pre-join views.
Abstract: Good database design is typically a very difficult and costly process. As database systems get more complex and as the amount of data under management grows, the stakes increase accordingly. Past research produced a number of design tools capable of automatically selecting secondary indexes and materialized views for a known workload. However, a significant bulk of research on automated database design has been done in the context of row-store DBMSes. While this work has produced effective design tools, new specialized database architectures demand a rethinking of automated design algorithms. In this paper, we present results for an automatic design tool that is aimed at column-oriented DBMSes on OLAP workloads. In particular, we have chosen a commercial column store DBMS that supports data sorting. In this setting, the key problem is selecting proper sort orders and compression schemes for the columns as well as appropriate pre-join views. This paper describes our automatic design algorithms as well as the results of some experiments using it on realistic data sets.
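Why sort-order selection matters for column-store compression can be illustrated with run counts under run-length encoding, one of the compression schemes such a design tool must weigh. This is a toy illustration, not the tool's actual cost model:

```python
def rle_runs(column):
    """Number of runs under run-length encoding: fewer runs means the
    column compresses better, which is why sort order matters."""
    return sum(1 for i, v in enumerate(column) if i == 0 or v != column[i - 1])

col = ["b", "a", "b", "a"]
unsorted_runs = rle_runs(col)        # one run per value here
sorted_runs = rle_runs(sorted(col))  # equal values now adjacent
```

A design tool generalizes this trade-off: a sort order that compresses one column well may scatter another, so orders, compression schemes, and pre-join views must be chosen jointly against the workload.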

Book ChapterDOI
26 Aug 2013
TL;DR: This paper proposes a framework to predict the most likely next query and recommend this to the user based on a probabilistic user behavior model built by analyzing previous OLAP sessions and exploiting a query similarity metric.
Abstract: In Business Intelligence systems, users interact with data warehouses by formulating OLAP queries aimed at exploring multidimensional data cubes. Being able to predict the most likely next queries would provide a way to recommend interesting queries to users on the one hand, and could improve the efficiency of OLAP sessions on the other. In particular, query recommendation would proactively guide users in data exploration and improve the quality of their interactive experience. In this paper, we propose a framework to predict the most likely next query and recommend this to the user. Our framework relies on a probabilistic user behavior model built by analyzing previous OLAP sessions and exploiting a query similarity metric. To gain insight in the recommendation precision and on what parameters it depends, we evaluate our approach using different quality assessments.
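A first-order probabilistic user behavior model of the kind such a framework builds can be sketched as transition counts over past sessions. The session data is hypothetical, and the paper's query-similarity metric and quality assessments are omitted:

```python
from collections import Counter, defaultdict

# Hypothetical past OLAP sessions, each a sequence of query identifiers.
sessions = [["q1", "q2", "q3"], ["q1", "q2", "q4"], ["q2", "q3"]]

# Count observed transitions between consecutive queries.
transitions = defaultdict(Counter)
for s in sessions:
    for cur, nxt in zip(s, s[1:]):
        transitions[cur][nxt] += 1

def recommend(query):
    """Most likely next query after `query`, or None if never observed."""
    nxt = transitions.get(query)
    return nxt.most_common(1)[0][0] if nxt else None

best = recommend("q2")
```

The full framework goes further, smoothing these estimates with a similarity metric so that queries never seen verbatim can still borrow statistics from similar ones.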

Proceedings ArticleDOI
28 Oct 2013
TL;DR: ProtOLAP is proposed, a tool-assisted fast prototyping methodology that enables quick and reliable test and validation of data warehouse schemata in situations where data supply is collected on users' demand and users' ICT skills are minimal.
Abstract: The approaches to data warehouse design are based on the assumption that source data are known in advance and available. While this assumption is true in common project situations, in some peculiar contexts it is not. This is the case of the French national project for analysis of energetic agricultural farms, that is the case study of this paper. Here, the above-mentioned methods can hardly be applied because source data can only be identified and collected once user requirements indicate a need. Besides, the users involved in this project found it very hard to express their analysis needs in abstract terms, i.e., without visualizing sample results of queries, which in turn would require availability of source data. To solve this deadlock we propose ProtOLAP, a tool-assisted fast prototyping methodology that enables quick and reliable test and validation of data warehouse schemata in situations where data supply is collected on users' demand and users' ICT skills are minimal. To this end, users manually feed sample realistic data into a prototype created by designers, then they access and explore these sample data using pivot tables to validate the prototype.

Proceedings ArticleDOI
23 Dec 2013
TL;DR: CR-OLAP is introduced, a Cloud based Real-time OLAP system based on a new distributed index structure for OLAP, the distributed PDCR tree, that utilizes a cloud infrastructure consisting of (m + 1) multi-core processors.
Abstract: In contrast to queries for on-line transaction processing (OLTP) systems that typically access only a small portion of a database, OLAP queries may need to aggregate large portions of a database, which often leads to performance issues. In this paper we introduce CR-OLAP, a Cloud based Real-time OLAP system based on a new distributed index structure for OLAP, the distributed PDCR tree, that utilizes a cloud infrastructure consisting of (m + 1) multi-core processors. With increasing database size, CR-OLAP dynamically increases m to maintain performance. Our distributed PDCR tree data structure supports multiple dimension hierarchies and efficient query processing on the elaborate dimension hierarchies which are so central to OLAP systems. It is particularly efficient for complex OLAP queries that need to aggregate large portions of the data warehouse, such as “report the total sales in all stores located in California and New York during the months February-May of all years”. We evaluated CR-OLAP on the Amazon EC2 cloud, using the TPC-DS benchmark data set. The tests demonstrate that CR-OLAP scales well with increasing number of processors, even for complex queries. For example, on an Amazon EC2 cloud instance with eight processors, for a TPC-DS OLAP query stream on a data warehouse with 80 million tuples where every OLAP query aggregates more than 50% of the database, CR-OLAP achieved a query latency of 0.3 seconds, which can be considered a real-time response.
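The semantics of the example query, aggregating over selections on dimension hierarchies, can be sketched on toy facts. The distributed PDCR tree that makes such queries fast at scale is not modeled here; this only shows what the query computes:

```python
# Hypothetical fact rows mirroring "total sales in all stores located in
# California and New York during the months February-May of all years".
facts = [
    {"state": "CA", "month": 3,  "year": 2011, "sales": 100},
    {"state": "NY", "month": 2,  "year": 2012, "sales": 50},
    {"state": "TX", "month": 4,  "year": 2012, "sales": 70},
    {"state": "CA", "month": 11, "year": 2012, "sales": 40},
]

def total_sales(facts, states, months):
    """Aggregate the measure over hierarchy selections; the year level is
    left unconstrained, as in the example query."""
    return sum(f["sales"] for f in facts
               if f["state"] in states and f["month"] in months)

t = total_sales(facts, {"CA", "NY"}, set(range(2, 6)))
```

A naive scan like this touches every fact row; the point of the distributed PDCR tree is to prune whole subtrees of the dimension hierarchies across (m + 1) processors instead.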

Journal ArticleDOI
01 Jan 2013
TL;DR: In this paper, a multi-dimensional model for opinion mining is proposed that integrates customers' characteristics and their opinions about products. The model captures subjective expressions from product reviews and transfers them to a fact table before representing them along multiple dimensions: customers, products, time, and location.
Abstract: Online business, or Electronic Commerce (EC), is popular among customers today; as a result, a large number of product reviews have been posted online. This information is valuable not only to prospective customers deciding whether to buy a product, but also to companies gathering information on customer satisfaction with their products. Opinion mining captures customer reviews and separates them into subjective expressions (sentiment words) and objective expressions (no sentiment words). This paper proposes a novel multi-dimensional model for opinion mining that integrates customers’ characteristics and their opinions about products. The model captures subjective expressions from product reviews and transfers them to a fact table before representing them along multiple dimensions: customers, products, time, and location. Data warehouse techniques such as OLAP and data cubes are used to analyze the opinionated sentences. A comprehensive way to calculate customers’ orientation on product features and attributes is also presented.
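The orientation calculation can be illustrated with a minimal sketch, assuming each opinionated sentence has already been reduced to a (product, feature, polarity) fact; the paper's cube adds customer, time, and location dimensions on top of the same idea. All data and names here are hypothetical.

```python
# Illustrative scoring of customers' orientation on product features from
# pre-extracted sentiment facts: (product, feature, polarity in {-1, +1}).

reviews = [
    ("phone", "battery", +1),
    ("phone", "battery", -1),
    ("phone", "battery", +1),
    ("phone", "screen",  +1),
]

def orientation(facts, product, feature):
    """Mean polarity in [-1, 1] for one (product, feature) cell of the cube;
    0.0 when no opinionated sentence mentions that feature."""
    scores = [p for prod, feat, p in facts if prod == product and feat == feature]
    return sum(scores) / len(scores) if scores else 0.0
```

In the full model, the same aggregation would be sliced by the customer, time, and location dimensions, e.g. "orientation on battery life among repeat customers in Q4".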

Patent
20 Mar 2013
TL;DR: In this paper, an on-line analytical processing (OLAP) massive multidimensional data dimension storage method is presented: data are partitioned by dimension, dimension hierarchical encoding is built, and an HD File dimension storage file structure is designed so that only the data corresponding to the relevant dimensions need to be accessed for aggregation calculation, avoiding the retrieval of unrelated data.
Abstract: The invention discloses an on-line analytical processing (OLAP) dimension storage method for massive multidimensional data. First, the OLAP multidimensional data are partitioned by dimension, dimension hierarchical encoding is built, and an HD File dimension storage file structure is designed, so that only the data corresponding to the relevant dimensions need to be accessed for aggregation calculation and the retrieval of unrelated data is avoided. Second, a B+ tree index based on the dimension hierarchical encoding is built to rapidly locate the dimension storage data, saving input/output (I/O) overhead. Finally, an efficient parallel query algorithm is designed, further improving OLAP query performance. The result is an efficient, easy-to-use, and scalable dimension storage method for massive-data analysis applications such as scientific experimental statistics, environmental meteorology, and bioinformatics computing.

Proceedings Article
03 May 2013
TL;DR: The indexed table-at-a-time processing model allows the efficient construction of composed operators like the multi-way select-join-group and speeds up the processing of complex OLAP queries, so that the approach outperforms state-of-the-art in-memory databases.
Abstract: Modern database systems have to process huge amounts of data while providing results with low latency. To achieve this, data is nowadays typically held entirely in main memory, to benefit from its high bandwidth and low access latency, which could never be reached with disks. Current in-memory databases are usually column stores that exchange columns or vectors between operators and suffer from high tuple reconstruction overhead. In this paper, we present the indexed table-at-a-time processing model, which makes indexes first-class citizens of the database system. The processing model comprises the concepts of intermediate indexed tables and cooperative operators, which make indexes the common data exchange format between plan operators. To keep the intermediate index materialization costs low, we employ optimized prefix trees that offer balanced read/write performance. The indexed table-at-a-time processing model allows the efficient construction of composed operators like the multi-way select-join-group. Such operators speed up the processing of complex OLAP queries, so that our approach outperforms state-of-the-art in-memory databases.
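The "indexes as the data exchange format" idea can be sketched as follows. A plain dict stands in for the paper's optimized prefix tree, and the operator names are invented for illustration: the select operator hands its result downstream already indexed, so the cooperative group operator probes the index directly instead of rebuilding a hash table.

```python
# Sketch of cooperative operators exchanging an intermediate *indexed*
# table: the filter emits rows pre-indexed on the grouping column, and the
# aggregation consumes that index without any re-partitioning step.

orders = [  # (customer_id, amount); values are illustrative
    (1, 10.0), (2, 5.0), (1, 7.5), (3, 3.0),
]

def select_and_index(rows, min_amount):
    """Filter rows and hand them downstream already indexed by customer_id."""
    index = {}
    for cid, amount in rows:
        if amount >= min_amount:
            index.setdefault(cid, []).append(amount)
    return index

def group_sum(indexed):
    """Cooperative operator: consumes the index produced upstream."""
    return {cid: sum(amounts) for cid, amounts in indexed.items()}

totals = group_sum(select_and_index(orders, min_amount=5.0))  # {1: 17.5, 2: 5.0}
```

A real implementation would fuse further steps (e.g. the join of a multi-way select-join-group) against the same intermediate index rather than materializing each stage.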

Proceedings ArticleDOI
11 Aug 2013
TL;DR: The proposed EventCube demo will show the power of the system not only on the originally targeted ASRS (Aviation Safety Reporting System) data sets, but also on news datasets collected from multiple news agencies and academic datasets constructed from DBLP and web data.
Abstract: A large portion of real-world data is either text or structured (e.g., relational) data. Moreover, such data objects are often linked together (e.g., structured specifications of products linked with the corresponding product descriptions and customer comments). Even for text data such as news, typed entities can be extracted with entity extraction tools. The EventCube project constructs TextCube and TopicCube from interconnected structured and text data (or from text data via entity extraction and dimension building), and performs multidimensional search and analysis on such datasets in an informative, powerful, and user-friendly manner. The proposed EventCube demo will show the power of the system not only on the originally targeted ASRS (Aviation Safety Reporting System) data sets, but also on news datasets collected from multiple news agencies and academic datasets constructed from DBLP and web data. The system has high potential to be extended in many powerful ways and to serve as a general platform for search, OLAP (online analytical processing), and data mining on integrated text and structured data. After the system demo at the conference, the system will be put on the web for public access and evaluation.
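A text cube of this kind can be sketched minimally as cells addressed by tuples of dimension values, each holding a term-frequency measure that rolls up along any dimension. This is an illustrative simplification of the TextCube idea, with made-up documents and only two dimensions (source, year), not EventCube's actual data model.

```python
# Minimal text-cube sketch: each cell is keyed by a dimension tuple and
# stores term frequencies, so multidimensional search can roll term counts
# up along any dimension.

from collections import Counter

docs = [  # (source, year, text); contents are illustrative
    ("ASRS", 2011, "runway incursion at night"),
    ("ASRS", 2012, "runway excursion in rain"),
    ("news", 2012, "weather delay and rain"),
]

def build_cube(docs):
    cube = {}
    for source, year, text in docs:
        cell = cube.setdefault((source, year), Counter())
        cell.update(text.split())
    return cube

def rollup(cube, source=None):
    """Aggregate term counts over all cells matching the given source slice;
    source=None rolls up over everything."""
    out = Counter()
    for (src, _year), cell in cube.items():
        if source is None or src == source:
            out += cell
    return out

asrs_terms = rollup(build_cube(docs), source="ASRS")  # "runway" counted twice
```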

Book ChapterDOI
01 Jan 2013
TL;DR: The prototypical implementation of the DMIS, focusing on the data warehouse and OLAP functionalities for customer feedback processes, proved the suitability and effectiveness of the proposed overall architecture.
Abstract: Information and communication technologies (ICTs) play a crucial role in increasing the knowledge base of destination stakeholders. Organisational learning and managerial effectiveness can particularly be enhanced by applying methods of business intelligence (BI). Although huge amounts of data are available in tourism destinations, these valuable knowledge sources typically remain unused. The described problem is solved by conceptualizing, prototypically implementing, and testing a novel destination management information system (DMIS) that applies methods of BI and data warehousing for the leading Swedish ski destination, Åre. As a central DMIS component, the destination-wide data warehouse (DW), its underlying multi-dimensional data model, the technical architecture, and critical implementation issues are discussed. Finally, the prototypical implementation of the DMIS, focusing on the data warehouse and OLAP functionalities for customer feedback processes, proved the suitability and effectiveness of the proposed overall architecture.