
Showing papers on "Online analytical processing" published in 2011


Proceedings ArticleDOI
11 Apr 2011
TL;DR: This work presents an efficient hybrid system, called HyPer, that can handle both OLTP and OLAP simultaneously by using hardware-assisted replication mechanisms to maintain consistent snapshots of the transactional data.
Abstract: The two areas of online transaction processing (OLTP) and online analytical processing (OLAP) present different challenges for database architectures. Currently, customers with high rates of mission-critical transactions have split their data into two separate systems: one database for OLTP and one so-called data warehouse for OLAP. While allowing for decent transaction rates, this separation has many disadvantages, including data freshness issues, due to the delay caused by only periodically initiating the Extract-Transform-Load (ETL) data staging, and excessive resource consumption, due to maintaining two separate information systems. We present an efficient hybrid system, called HyPer, that can handle both OLTP and OLAP simultaneously by using hardware-assisted replication mechanisms to maintain consistent snapshots of the transactional data. HyPer is a main-memory database system that guarantees the ACID properties of OLTP transactions and executes OLAP query sessions (multiple queries) on the same, arbitrarily current and consistent snapshot. The utilization of the processor-inherent support for virtual memory management (address translation, caching, copy on update) yields both at the same time: unprecedented transaction rates of up to 100,000 per second and very fast OLAP query response times on a single system executing both workloads in parallel. The performance analysis is based on a combined TPC-C and TPC-H benchmark.
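
The virtual-memory snapshot mechanism described above maps naturally onto the POSIX fork() call: the forked child shares all pages copy-on-write, so an OLAP session in the child sees a frozen, consistent state while OLTP updates in the parent trigger page replication. Below is a minimal sketch of that control flow with an invented `accounts` table; note that CPython's reference counting dirties pages even on reads, so the memory economics HyPer reports require a lower-level engine — only the mechanism is illustrated.

```python
# Sketch of HyPer-style virtual-memory snapshotting via fork():
# the child inherits all memory pages copy-on-write and sees a
# consistent snapshot while the parent keeps applying OLTP updates.
import os

accounts = {i: 100 for i in range(1_000_000)}  # "transactional" state

pid = os.fork()
if pid == 0:
    # Child: OLAP session on the snapshot taken at fork time.
    print("snapshot total:", sum(accounts.values()))
    os._exit(0)
else:
    # Parent: OLTP continues; updated pages are replicated by the OS
    # (copy on update), leaving the child's snapshot untouched.
    accounts[0] += 42
    os.waitpid(pid, 0)
    print("current total:", sum(accounts.values()))
```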

674 citations


Proceedings ArticleDOI
28 Oct 2011
TL;DR: This paper provides an overview of state-of-the-art research issues and achievements in the field of analytics over big data, and extends the discussion to analytics over big multidimensional data as well, highlighting open problems and current research trends.
Abstract: In this paper, we provide an overview of state-of-the-art research issues and achievements in the field of analytics over big data, and we extend the discussion to analytics over big multidimensional data as well, by highlighting open problems and current research trends. Our analytical contribution is finally completed by several novel research directions arising in this field, which plays a leading role in next-generation Data Warehousing and OLAP research.

321 citations


Proceedings ArticleDOI
12 Jun 2011
TL;DR: Graph Cube is introduced, a new data warehousing model that supports OLAP queries effectively on large multidimensional networks and is shown to be a powerful and efficient tool for decision support on such networks.
Abstract: We consider extending decision support facilities toward large sophisticated networks, upon which multidimensional attributes are associated with network entities, thereby forming the so-called multidimensional networks. Data warehouses and OLAP (Online Analytical Processing) technology have proven to be effective tools for decision support on relational data. However, they are not well-equipped to handle the new yet important multidimensional networks. In this paper, we introduce Graph Cube, a new data warehousing model that supports OLAP queries effectively on large multidimensional networks. By taking account of both attribute aggregation and structure summarization of the networks, Graph Cube goes beyond the traditional data cube model involved solely with numeric value based group-by's, thus resulting in a more insightful and structure-enriched aggregate network within every possible multidimensional space. Besides traditional cuboid queries, a new class of OLAP queries, crossboid, is introduced that is uniquely useful in multidimensional networks and has not been studied before. We implement Graph Cube by combining special characteristics of multidimensional networks with the existing well-studied data cube techniques. We perform extensive experimental studies on a series of real world data sets and Graph Cube is shown to be a powerful and efficient tool for decision support on large multidimensional networks.
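
The core Graph Cube aggregation can be sketched concretely: rolling the network up to a subset of dimensions groups vertices by their attribute values and aggregates the edges crossing those groups into an aggregate network. A minimal sketch with invented attributes (gender, city); the paper's crossboid queries and cube materialization are not shown.

```python
# One cuboid of a Graph Cube: vertices become attribute-value groups,
# edge weights count the original edges crossing those groups.
from collections import Counter

nodes = {1: {"gender": "M", "city": "NY"}, 2: {"gender": "F", "city": "NY"},
         3: {"gender": "F", "city": "LA"}}
edges = [(1, 2), (2, 3), (1, 3)]

def graph_cuboid(dims):
    """Aggregate network for one cuboid, e.g. dims=("gender",)."""
    group = lambda n: tuple(nodes[n][d] for d in dims)
    agg_nodes = Counter(group(n) for n in nodes)             # group sizes
    agg_edges = Counter(tuple(sorted((group(u), group(v))))  # cross-group
                        for u, v in edges)                   # edge counts
    return agg_nodes, agg_edges

print(graph_cuboid(("gender",)))
# ({('M',): 1, ('F',): 2}, {(('F',), ('M',)): 2, (('F',), ('F',)): 1})
```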

179 citations


Proceedings ArticleDOI
13 Jun 2011
TL;DR: This paper defines a new, complex, mixed-workload benchmark, called CH-benCHmark, which bridges the gap between the established single-workload suites of TPC-C for OLTP and TPC-H for OLAP by executing a complex mixed workload.
Abstract: While standardized and widely used benchmarks address either operational or real-time Business Intelligence (BI) workloads, the lack of a hybrid benchmark led us to the definition of a new, complex, mixed workload benchmark, called mixed workload CH-benCHmark. This benchmark bridges the gap between the established single-workload suites of TPC-C for OLTP and TPC-H for OLAP, and executes a complex mixed workload: a transactional workload based on the order entry processing of TPC-C and a corresponding TPC-H-equivalent OLAP query suite run in parallel on the same tables in a single database system. As it is derived from these two most widely used TPC benchmarks, the CH-benCHmark produces results highly relevant to both hybrid and classic single-workload systems.

133 citations


Journal ArticleDOI
TL;DR: A computation- and storage-efficient algorithm for estimating equation (EE) estimation in massive data sets using a “divide-and-conquer” strategy; the resulting aggregated EE estimator is strongly consistent and asymptotically equivalent to the EE estimator.
Abstract: Motivated by the recent active research on online analytical processing (OLAP), we develop a computation and storage efficient algorithm for estimating equation (EE) estimation in massive data sets using a “divide-and-conquer” strategy. In each partition of the data set, we compress the raw data into some low dimensional statistics and then discard the raw data. Then, we obtain an approximation to the EE estimator, the aggregated EE (AEE) estimator, by solving an equation aggregated from the saved low dimensional statistics in all partitions. Such low dimensional statistics are taken as the EE estimates and first-order derivatives of the estimating equations in each partition. We show that, under proper partitioning and some regularity conditions, the AEE estimator is strongly consistent and asymptotically equivalent to the EE estimator. A major application of the AEE technique is to support fast OLAP of EE estimations for data warehousing technologies such as data cubes and data streams. It can also be used to reduce the computation time and conquer the memory constraint problem posed by massive data sets. Simulation studies show that the AEE estimator provides efficient storage and a remarkable reduction in computation time, especially in its applications to data cubes and data streams.
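
The aggregation step can be written out concretely. This is a hedged reconstruction from the abstract's own description (each partition k retains only its EE estimate and the first-order derivative of its estimating equation M_k; the notation is ours, not the paper's):

```latex
% Each partition k keeps only its EE estimate \hat\theta_k and the
% first-order derivative A_k of its estimating equation M_k at \hat\theta_k.
% Linearizing M_k(\theta) \approx -A_k(\theta - \hat\theta_k) and solving
% \sum_k M_k(\theta) = 0 yields the aggregated (AEE) estimator:
\[
  \hat{\theta}_{\mathrm{AEE}}
  \;=\;
  \Bigl(\sum_{k=1}^{K} A_k\Bigr)^{-1} \sum_{k=1}^{K} A_k\,\hat{\theta}_k ,
  \qquad
  A_k \;=\; -\,\frac{\partial M_k(\theta)}{\partial \theta}\Big|_{\theta=\hat{\theta}_k}.
\]
```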

129 citations


Journal ArticleDOI
TL;DR: This work proposes two categories of novel anonymization methods: one based on approximate nearest-neighbor (NN) search in high-dimensional spaces, performed efficiently through locality-sensitive hashing (LSH), and one based on two data transformations that capture the correlation in the underlying data: reduction to a band matrix and Gray encoding-based sorting.
Abstract: Existing research on privacy-preserving data publishing focuses on relational data: in this context, the objective is to enforce privacy-preserving paradigms, such as k-anonymity and l-diversity, while minimizing the information loss incurred in the anonymizing process (i.e., maximize data utility). Existing techniques work well for fixed-schema data, with low dimensionality. Nevertheless, certain applications require privacy-preserving publishing of transactional data (or basket data), which involve hundreds or even thousands of dimensions, rendering existing methods unusable. We propose two categories of novel anonymization methods for sparse high-dimensional data. The first category is based on approximate nearest-neighbor (NN) search in high-dimensional spaces, which is efficiently performed through locality-sensitive hashing (LSH). In the second category, we propose two data transformations that capture the correlation in the underlying data: 1) reduction to a band matrix and 2) Gray encoding-based sorting. These representations facilitate the formation of anonymized groups with low information loss, through an efficient linear-time heuristic. We show experimentally, using real-life data sets, that all our methods clearly outperform existing state of the art. Among the proposed techniques, NN-search yields superior data utility compared to the band matrix transformation, but incurs higher computational overhead. The data transformation based on Gray code sorting performs best in terms of both data utility and execution time.
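
The Gray-encoding-based sorting idea lends itself to a compact sketch: encode each transaction as a bitmap over the item universe, order transactions by their bitmap's rank in the binary-reflected Gray-code sequence (so neighbors differ in few items), and cut the ordered list into k-sized anonymization groups. Item names and k are illustrative; the paper's information-loss heuristic is not shown.

```python
# Gray-code sorting for transactional data: similar item sets end up
# adjacent, so consecutive k-sized groups lose little information.
ITEMS = ["bread", "milk", "beer", "nappies"]  # illustrative universe

def bitmap(txn):
    return sum(1 << i for i, it in enumerate(ITEMS) if it in txn)

def gray_rank(g):
    """Inverse of b -> b ^ (b >> 1): position of bitmap g in Gray order."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b

txns = [{"bread", "milk"}, {"beer"}, {"bread"}, {"milk", "beer"}]
ordered = sorted(txns, key=lambda t: gray_rank(bitmap(t)))
k = 2
groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
print(groups)
```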

97 citations


Proceedings ArticleDOI
11 Apr 2011
TL;DR: This paper presents ES2, the elastic data storage system of epiC, which is designed to support both OLTP and OLAP functionalities within the same storage, together with experimental results which demonstrate the efficiency of the system.
Abstract: Cloud computing represents a paradigm shift driven by the increasing demand of Web-based applications for elastic, scalable and efficient system architectures that can efficiently support their ever-growing data volume and large-scale data analysis. A typical data management system has to deal with real-time updates by individual users, as well as periodic large-scale analytical processing, indexing, and data extraction. While such operations may take place in the same domain, the design and development of the systems have somehow evolved independently for transactional and periodical analytical processing. Such a system-level separation has resulted in problems such as data freshness as well as serious data storage redundancy. Ideally, it would be more efficient to apply ad-hoc analytical processing on the same data directly. However, to the best of our knowledge, such an approach has not been adopted in real implementation. Intrigued by such an observation, we have designed and implemented epiC, an elastic power-aware data-intensive Cloud platform for supporting both data intensive analytical operations (ref. as OLAP) and online transactions (ref. as OLTP). In this paper, we present ES2 - the elastic data storage system of epiC, which is designed to support both functionalities within the same storage. We present the system architecture and the functions of each system component, and experimental results which demonstrate the efficiency of the system.

96 citations


Proceedings ArticleDOI
12 Jun 2011
TL;DR: This work proposes a novel E-Cube model which combines CEP and OLAP techniques for efficient multi-dimensional event pattern analysis at different abstraction levels, and designs a cost-driven adaptive optimizer called Chase that exploits reuse strategies for optimal E-Cube hierarchy execution.
Abstract: Many modern applications, including online financial feeds, tag-based mass transit systems and RFID-based supply chain management systems transmit real-time data streams. There is a need for event stream processing technology to analyze this vast amount of sequential data to enable online operational decision making. Existing techniques such as traditional online analytical processing (OLAP) systems are not designed for real-time pattern-based operations, while state-of-the-art Complex Event Processing (CEP) systems designed for sequence detection do not support OLAP operations. We propose a novel E-Cube model which combines CEP and OLAP techniques for efficient multi-dimensional event pattern analysis at different abstraction levels. Our analysis of the interrelationships in both concept abstraction and pattern refinement among queries facilitates the composition of these queries into an integrated E-Cube hierarchy. Based on this E-Cube hierarchy, strategies of drill-down (refinement from abstract to more specific patterns) and of roll-up (generalization from specific to more abstract patterns) are developed for efficient workload evaluation. Our proposed execution strategies reuse intermediate results along both the concept and the pattern refinement relationships between queries. On this foundation, we design a cost-driven adaptive optimizer called Chase that exploits the above reuse strategies for optimal E-Cube hierarchy execution. Our experimental studies comparing alternate strategies on a real-world financial data stream under different workload conditions demonstrate the superiority of the Chase method. In particular, our Chase execution in many cases performs tenfold faster than the state-of-the-art strategy for real stock market query workloads.

78 citations


Proceedings ArticleDOI
07 Sep 2011
TL;DR: An extract-transform-load (ETL) pipeline is used to convert statistical Linked Data into a format suitable for loading into an open-source OLAP system, demonstrating how standard OLAP infrastructure can be used for elaborate querying and visualisation of integrated statistical Linked Data.
Abstract: The amount of available Linked Data on the Web is increasing, and data providers start to publish statistical datasets that comprise numerical data. Such statistical datasets differ significantly from the currently predominant network-style data published on the Web. We explore the possibility of integrating statistical data from multiple Linked Data sources. We provide a mapping from statistical Linked Data into the Multidimensional Model used in data warehouses. We use an extract-transform-load (ETL) pipeline to convert statistical Linked Data into a format suitable for loading into an open-source OLAP system, and thus demonstrate how standard OLAP infrastructure can be used for elaborate querying and visualisation of integrated statistical Linked Data. We discuss lessons learned from three experiments and identify areas which require future work to ultimately arrive at a well-interlinked set of statistical data from multiple sources which is processable with standard OLAP systems.
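
One step of such a pipeline can be sketched: extracting qb:Observation instances from an RDF Data Cube graph and flattening them into fact-table rows that an OLAP engine can load. The dimension and measure property URIs below are placeholders, not the vocabulary any particular dataset uses, and the local file name is hypothetical.

```python
# Flatten RDF Data Cube (qb:) observations into fact-table rows.
from rdflib import Graph

g = Graph()
g.parse("dataset.ttl", format="turtle")  # hypothetical local copy

rows = g.query("""
    PREFIX qb: <http://purl.org/linked-data/cube#>
    PREFIX ex: <http://example.org/>
    SELECT ?area ?year ?value WHERE {
        ?obs a qb:Observation ;
             ex:refArea   ?area ;   # dimension (placeholder URI)
             ex:refPeriod ?year ;   # dimension (placeholder URI)
             ex:measure   ?value .  # measure   (placeholder URI)
    }
""")
fact_table = [(str(a), str(y), float(v)) for a, y, v in rows]
print(fact_table[:3])
```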

75 citations


Journal ArticleDOI
TL;DR: This paper presents myOLAP, an approach for expressing and evaluating OLAP preferences, devised by taking into account the three peculiarities of the OLAP domain, and proposes an algorithm called WeSt that relies on a novel graph representation where two types of domination between sets of facts may be expressed.
Abstract: Multidimensional databases are the core of business intelligence systems. Their users express complex OLAP queries, often returning large volumes of facts that sometimes provide little or no information. Thus, expressing preferences could be highly valuable in this domain. The OLAP domain is representative of an unexplored class of preference queries, characterized by three peculiarities: preferences can be expressed on both numerical and categorical domains; they can also be expressed on the aggregation level of facts; and the space on which preferences are expressed includes both elemental and aggregated facts. In this paper, we present myOLAP, an approach for expressing and evaluating OLAP preferences, devised by taking into account the three peculiarities above. We first propose a preference algebra where users are enabled to express their preferences not only on attributes and measures but also on the aggregation level of facts, for instance by stating that monthly data are preferred to yearly and daily data. Then, with respect to preference evaluation, we propose an algorithm called WeSt that relies on a novel graph representation where two types of domination between sets of facts may be expressed, which considerably improves efficiency. The approach is extensively tested for efficiency and effectiveness on real data, and compared against two other approaches in the literature.

73 citations


Proceedings Article
01 Jan 2011
TL;DR: This work proposes a new type of database system coined OctopusDB, which uses a logical event log as its primary storage structure and introduces the concept of Storage Views (SV), i.e. secondary, alternative physical data representations covering all or subsets of the primary log.
Abstract: We propose a new type of database system coined OctopusDB. Our approach suggests a unified, one-size-fits-all data processing architecture for OLTP, OLAP, streaming systems, and scan-oriented database systems. OctopusDB radically departs from existing architectures in the following way: it uses a logical event log as its primary storage structure. To make this approach efficient we introduce the concept of Storage Views (SV), i.e., secondary, alternative physical data representations covering all or subsets of the primary log. OctopusDB (1) allows us to use different types of SVs for different subsets of the data; and (2) eliminates the need to use different types of database systems for different applications. Thus, based on the workload, OctopusDB emulates different types of systems (row stores, column stores, streaming systems, and more importantly, any hybrid combination of these). This is a feature impossible to achieve with traditional DBMSs.
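
The log-plus-Storage-Views idea can be illustrated in a few lines: a single ordered event log serves as primary storage, with a row-oriented and a column-oriented SV derived from it on demand. This toy sketch omits updates, deletes, and incremental SV maintenance.

```python
# A logical event log as primary storage, with Storage Views (SVs)
# materialized from it on demand.
log = []  # primary storage: ordered log of insert events

def append(record: dict):
    log.append(("INSERT", record))

def row_view():
    """SV #1: row-store representation replayed from the log."""
    return [rec for op, rec in log if op == "INSERT"]

def column_view(cols):
    """SV #2: column-store representation over a subset of columns."""
    return {c: [rec[c] for op, rec in log if op == "INSERT"] for c in cols}

append({"id": 1, "price": 9.5})
append({"id": 2, "price": 3.0})
print(row_view())              # OLTP-style access
print(column_view(["price"]))  # OLAP-style scan
```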

Patent
12 May 2011
TL;DR: A testing framework is presented that automates the querying, extraction and loading of test data into a test result database from a plurality of data sources and application interfaces using source-specific adaptors.
Abstract: The present method and apparatus provide for automated testing of data integration and business intelligence projects using an Extract, Load and Validate (ELV) architecture. The method and computer program product provide a testing framework that automates the querying, extraction and loading of test data into a test result database from a plurality of data sources and application interfaces using source-specific adaptors. The test data available for extraction using the adaptors includes metadata, such as the database queries generated by OLAP tools, that is critical for validating changes in business intelligence systems. A validation module helps define validation rules for verifying the test data loaded into the test result database. The validation module further provides a framework for comparing the test data with previously archived test data as well as benchmark test data.

Patent
11 Jul 2011
TL;DR: In this paper, the authors present a system that simplifies the creation of multidimensional OLAP models from one or more semantically enabled data sources, including web-enabled OLAP interfaces.
Abstract: This system comprises methods that simplify the creation of multidimensional OLAP models from one or more semantically enabled data sources. The system also comprises methods enabling interoperability between existing OLAP end-user interfaces, the system's representation of OLAP and the underlying data sources. This includes web-enabled OLAP interfaces.

Patent
04 Apr 2011
TL;DR: A hybrid OLTP and OLAP database is maintained by using hardware-assisted replication mechanisms to keep consistent snapshots of the transactional data, where the updated data object is accessible for OLTP transactions while the non-updated data object remains accessible for OLAP queries.
Abstract: There is provided a method of maintaining a hybrid OLTP and OLAP database, the method comprising: executing one or more OLTP transactions; creating a virtual memory snapshot; and executing one or more OLAP queries using the virtual memory snapshot. Preferably, the method further comprises replicating a virtual memory page on which a data object is stored in response to an update to the data object, whereby the updated data object is accessible for OLTP transactions, while the non-updated data object remains accessible for OLAP queries. Accordingly, the present invention provides a hybrid system that can handle both OLTP and OLAP simultaneously by using hardware-assisted replication mechanisms to maintain consistent snapshots of the transactional data.

Patent
07 Dec 2011
TL;DR: A client application redirects each query either to an OLTP database server or to an OLAP database server, according to its mode of operation (e.g., read or update) and the synchronization status of the OLAP database server.
Abstract: A computer system provides access to both an online transaction processing (OLTP) database server and an online analytics processing (OLAP) database server. The computer system includes a client application adapted to receive a query. According to (a) mode of operation (e.g., read or update) of the client application and (b) synchronization status of the OLAP database server, the client application redirects the query to the OLTP database server or to the OLAP database server. The client application redirects the query to the OLTP database server when the mode of operation is other than a read-only operation or the synchronization status is "unsynchronized". The client application redirects the query to the OLAP database server when the mode of operation is a read-only operation and the synchronization status is "synchronized". The computer system further includes an OLTP application server (e.g., Enovia V6) comprising an OLTP adapter and an OLAP adapter. The OLAP adapter is formed of a mapping component adapted to map data between OLTP semantics and OLAP semantics.
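
The routing rule itself is simple enough to state as code. A minimal sketch, with placeholder server handles and the mode and status strings taken from the abstract:

```python
# Patent's routing rule: OLAP only when the query is read-only AND the
# OLAP replica is synchronized; everything else goes to OLTP.
def route(query, mode, olap_status):
    if mode == "read-only" and olap_status == "synchronized":
        return "OLAP", query
    return "OLTP", query

print(route("SELECT ...", "read-only", "synchronized"))    # -> OLAP
print(route("SELECT ...", "read-only", "unsynchronized"))  # -> OLTP
print(route("UPDATE ...", "update",    "synchronized"))    # -> OLTP
```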

Book ChapterDOI
22 Apr 2011
TL;DR: Two effective computational techniques, T-Distributiveness and T-Monotonicity, are proposed to achieve efficient query processing and cube materialization, and a T-OLAP query processing framework into which these techniques are weaved is provided.
Abstract: We propose a framework for efficient OLAP on information networks with a focus on the most interesting kind, the topological OLAP (called "T-OLAP"), which incurs topological changes in the underlying networks. T-OLAP operations generate new networks from the original ones by rolling up a subset of nodes chosen by certain constraint criteria. The key challenge is to efficiently compute measures for the newly generated networks and handle user queries with varied constraints. Two effective computational techniques, T-Distributiveness and T-Monotonicity, are proposed to achieve efficient query processing and cube materialization. We also provide a T-OLAP query processing framework into which these techniques are weaved. To the best of our knowledge, this is the first work to give a framework study for topological OLAP on information networks. Experimental results demonstrate both the effectiveness and efficiency of our proposed framework.

Journal ArticleDOI
TL;DR: A novel Secure Multiparty Computation (SMC)-based privacy-preserving OLAP framework for distributed collections of XML documents is proposed, which has many novel features, ranging from nice theoretical properties to an effective and efficient protocol, called the Secure Distributed OLAP aggregation protocol (SDO).

Proceedings ArticleDOI
29 Aug 2011
TL;DR: It is shown how NoSQL databases such as MongoDB and its key-value stores, thanks to the native MapReduce algorithm, can provide an efficient framework to aggregate large volumes of data.
Abstract: Data aggregation is one of the key features used in databases, especially for Business Intelligence (e.g., ETL, OLAP) and analytics/data mining. When considering SQL databases, aggregation is used to prepare and visualize data for deeper analyses. However, these operations are often impossible on very large volumes of data in terms of memory and time consumption. In this paper, we show how NoSQL databases such as MongoDB and its key-value stores, thanks to the native MapReduce algorithm, can provide an efficient framework to aggregate large volumes of data. We provide basic material about the MapReduce algorithm and the different NoSQL database types (read-intensive vs. write-intensive). We investigate how to efficiently model the data framework for BI and analytics. For this purpose, we focus on read-intensive NoSQL databases using MongoDB and we show how NoSQL and MapReduce can help handle large volumes of data.
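
The kind of aggregation the paper studies can be sketched with MongoDB's native MapReduce: a JavaScript map function emits (key, measure) pairs and a reduce function sums them. Collection and field names are invented; the map_reduce helper shown matches pymongo versions contemporary with the paper (it was removed in pymongo 4, where the aggregation pipeline replaces it).

```python
# OLAP-style aggregation (total sales per product) via MongoDB MapReduce.
from pymongo import MongoClient
from bson.code import Code

db = MongoClient()["shop"]  # hypothetical database

mapper = Code("function () { emit(this.product, this.amount); }")
reducer = Code("function (key, values) { return Array.sum(values); }")

result = db.sales.map_reduce(mapper, reducer, "sales_by_product")
for doc in result.find():
    print(doc["_id"], doc["value"])
```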

Book ChapterDOI
20 Jul 2011
TL;DR: A novel framework for estimating OLAP queries over uncertain and imprecise multidimensional data streams is introduced, along with a probabilistic data stream model that exploits "natural" features of OLAP data, such as the presence of clusters and high correlations.
Abstract: In this paper, we introduce a novel framework for estimating OLAP queries over uncertain and imprecise multidimensional data streams, along with three relevant research contributions: (i) a probabilistic data stream model, which describes both precise and imprecise multidimensional data stream readings in terms of nice confidence-interval-based Probability Distribution Functions (PDF); (ii) a possible-world semantics for uncertain and imprecise multidimensional data streams, which is based on an innovative data-driven approach that exploits "natural" features of OLAP data, such as the presence of clusters and high correlations; (iii) an innovative approach for providing theoretically-founded estimates to OLAP queries over uncertain and imprecise multidimensional data streams that exploits the well-recognized probabilistic estimators theory.

Posted Content
TL;DR: This paper discusses the existing approaches and tools working in main memory and/or with web interfaces (including freeware tools) that are relevant to decision making in small and middle-sized enterprises.
Abstract: Data warehouses are the core of decision support systems, which nowadays are used by all kinds of enterprises across the entire world. Although many studies have been conducted on the need of decision support systems (DSSs) for small businesses, most of them adopt existing solutions and approaches, which are appropriate for large-scale enterprises but inadequate for small and middle-sized enterprises. Small enterprises require cheap, lightweight architectures and tools (hardware and software) providing online data analysis. In order to ensure these features, we review web-based business intelligence approaches. For real-time analysis, the traditional OLAP architecture is cumbersome and storage-costly; therefore, we also review in-memory processing. Consequently, this paper discusses the existing approaches and tools working in main memory and/or with web interfaces (including freeware tools), relevant for small and middle-sized enterprises in decision making.

Book ChapterDOI
20 Sep 2011
TL;DR: A proactive approach is proposed that couples an MDX-based language for expressing OLAP preferences with a mining technique for automatically deriving preferences; experiments prove the effectiveness and efficiency of the approach.
Abstract: The goal of personalization is to deliver information that is relevant to an individual or a group of individuals in the most appropriate format and layout. In the OLAP context personalization is quite beneficial, because queries can be very complex and they may return huge amounts of data. Aimed at making the user's experience with OLAP as plain as possible, in this paper we propose a proactive approach that couples an MDX-based language for expressing OLAP preferences to a mining technique for automatically deriving preferences. First, the log of past MDX queries issued by that user is mined to extract a set of association rules that relate sets of frequent query fragments; then, given a specific query, a subset of pertinent and effective rules is selected; finally, the selected rules are translated into a preference that is used to annotate the user's query. A set of experimental results proves the effectiveness and efficiency of our approach.
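
The mining step can be sketched: treat each logged MDX query as a set of fragments, count co-occurring pairs, and keep rules whose confidence clears a threshold. Fragment names and thresholds are illustrative, and a real implementation would use a proper frequent-itemset miner rather than this pair-counting shortcut.

```python
# Toy association-rule mining over an MDX query log: each query is a
# set of fragments; rules relate frequently co-occurring fragments.
from itertools import combinations
from collections import Counter

log = [{"Year", "Sales"}, {"Year", "Sales", "Region"},
       {"Year", "Region"}, {"Sales", "Region"}]

pair_support = Counter()
for query in log:
    for a, b in combinations(sorted(query), 2):
        pair_support[(a, b)] += 1

single = Counter(f for q in log for f in q)
rules = [(a, b, pair_support[(a, b)] / single[a])
         for (a, b) in pair_support
         if pair_support[(a, b)] / single[a] >= 0.5]
print(rules)  # (antecedent, consequent, confidence)
```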

Journal ArticleDOI
TL;DR: A framework for a recommender system for OLAP users that leverages former users' investigations to enhance discovery-driven analysis and is implemented in a system that uses the open source Mondrian server and recommends MDX queries.
Abstract: Recommending database queries is an emerging and promising field of research and is of particular interest in the domain of OLAP systems, where the user is left with the tedious process of navigating large datacubes. In this paper, the authors present a framework for a recommender system for OLAP users that leverages former users' investigations to enhance discovery-driven analysis. This framework recommends the discoveries detected in former sessions that investigated the same unexpected data as the current session. This task is accomplished by (1) analysing the query log to discover pairs of cells at various levels of detail for which the measure values differ significantly, and (2) analysing a current query to detect if a particular pair of cells for which the measure values differ significantly can be related to what is discovered in the log. This framework is implemented in a system that uses the open source Mondrian server and recommends MDX queries. Preliminary experiments were conducted to assess the quality of the recommendations in terms of precision and recall, as well as the efficiency of their on-line computation.

Patent
Xue C. Li, Xiao J. Fu, Xue F. Gao, Xin Xin
23 Feb 2011
TL;DR: In this paper, a method and system for validating data is presented, where a data cube is generated by transforming the warehouse data via an OLAP transformation model, and a reference dataset (S) is generated from the source data.
Abstract: A method and system for validating data. Warehouse data is generated by transforming source data via an ETL transformation model. A data cube is generated by transforming the warehouse data via an OLAP transformation model. A report dataset (MDS1) is generated from the data cube. A reference dataset (S) is generated from the source data. Whether MDS1 matches S is determined. If MDS1 doesn't match S, then an OLAP inverse transformation is performed on MDS1 to generate an OLAP dataset (MDS2) and whether MDS2 matches S is determined. If MDS1 doesn't match S and MDS2 does not match S, then an ETL inverse transformation is performed on MDS2 to generate an ETL dataset (MDS3) and whether MDS2 matches MDS1 and whether MDS3 matches S is determined. If MDS1 doesn't match S and MDS2 does not match S and MDS3 does not match S, then whether MDS3 matches MDS2 is determined.
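
The validation cascade reads naturally as control flow. A simplified sketch, assuming placeholder inverse-transformation functions; the fault-localization labels are our interpretation of the patent's comparison chain, not wording from the patent itself.

```python
def validate(mds1, s, olap_inverse, etl_inverse):
    """Walk the comparison chain MDS1 -> MDS2 -> MDS3 against source S."""
    if mds1 == s:
        return "report dataset matches source reference"
    mds2 = olap_inverse(mds1)          # undo the OLAP transformation
    if mds2 == s:
        return "mismatch likely introduced by the OLAP transformation"
    mds3 = etl_inverse(mds2)           # undo the ETL transformation
    if mds3 == s:
        return "mismatch likely introduced by the ETL transformation"
    # Final check from the claim: does MDS3 at least match MDS2?
    return ("inconclusive; MDS3 %s MDS2"
            % ("matches" if mds3 == mds2 else "does not match"))
```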

Proceedings ArticleDOI
28 Oct 2011
TL;DR: This paper explores the possibility of having data in a cloud by using BigTable to store the corporate historical data and MapReduce as an agile mechanism to deploy cubes in ad-hoc Data Marts, and compares three different approaches to retrieve data cubes from BigTable by means of MapReduce.
Abstract: In recent years, the problems of using generic storage techniques for very specific applications have been detected and outlined. Thus, some alternatives to relational DBMSs (e.g., BigTable) are blooming. On the other hand, cloud computing is already a reality that helps to save money by eliminating fixed hardware and software costs in favor of paying per use. Indeed, specific software tools to exploit a cloud are also here. The trend in this case is toward using tools based on the MapReduce paradigm developed by Google. In this paper, we explore the possibility of having data in a cloud by using BigTable to store the corporate historical data and MapReduce as an agile mechanism to deploy cubes in ad-hoc Data Marts. Our main contribution is the comparison of three different approaches to retrieve data cubes from BigTable by means of MapReduce and the definition of criteria to choose among them.
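
The cube-from-BigTable idea reduces to a familiar MapReduce pattern: map each stored row to (dimension values, measure) and sum in the reduce. A minimal local sketch with an invented row layout; the paper's three approaches differ in how this job is organized over BigTable, which is not shown.

```python
# Local stand-in for a MapReduce job that builds one cuboid.
from collections import defaultdict

def map_phase(rows, dims, measure):
    for row in rows:
        yield tuple(row[d] for d in dims), row[measure]

def reduce_phase(pairs):
    acc = defaultdict(float)
    for key, value in pairs:
        acc[key] += value
    return dict(acc)

rows = [{"shop": "NY", "year": 2011, "sales": 10.0},
        {"shop": "NY", "year": 2011, "sales": 5.0},
        {"shop": "LA", "year": 2011, "sales": 7.0}]
cuboid = reduce_phase(map_phase(rows, ("shop",), "sales"))
print(cuboid)  # {('NY',): 15.0, ('LA',): 7.0}
```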

Patent
14 Mar 2011
TL;DR: In this article, the authors present a computer implemented method of relating data and generating reports, which includes storing, by an OLAP system, a network data structure that relates a plurality of data objects.
Abstract: In one embodiment the present invention includes a computer implemented method of relating data and generating reports. The method includes storing, by an OLAP system, a network data structure that relates a plurality of data objects. The method further includes storing transactional data in an in-memory database in the OLAP system. The method further includes generating, by the OLAP system, a report using the stored transactional data according to the network data structure. In this manner, deficiencies of the traditional star schema paradigm of data warehousing may be avoided.

Patent
06 Apr 2011
TL;DR: A database access model and storage structure that efficiently support concurrent OLTP and OLAP activity independently of the data model or schema used are described; they avoid the need to design schemas for particular workloads or query patterns.
Abstract: A database access model and storage structure that efficiently support concurrent OLTP and OLAP activity independently of the data model or schema used are described. The storage structure and access model presented avoid the need to design schemas for particular workloads or query patterns and avoid the need to design or implement indexing to support specific queries. Indeed, the access model presented is independent of the database model used and can equally support relational, object and hierarchical models, amongst others.

Patent
18 Jan 2011
TL;DR: Systems and methods are provided for graphically distinguishing levels from a multidimensional database, such as by associating two or more of the database's levels with a plurality of different visual indicators.
Abstract: In accordance with the teachings described herein, systems and methods are provided for graphically distinguishing levels from a multidimensional database. Levels from a multidimensional database are distinguished, such as by associating two or more of the database's levels with a plurality of different visual indicators.

Journal ArticleDOI
TL;DR: A methodology is proposed for providing linguistic answers to queries involving the comparison of time series obtained from data cubes with a time dimension, based on linguistically quantified statements and pointwise definitions of the degree and sign of local change.
Abstract: In this paper, we propose a methodology for providing linguistic answers to queries involving the comparison of time series obtained from data cubes with a time dimension. Time series related to events which are interesting for the user are obtained by querying data cubes using OnLine Analytical Processing (OLAP) operations on the time dimension. The comparison of these query results can be summarized so that an appropriate short linguistic description of the series is provided to the user. Our approach is based on linguistically quantified statements and pointwise definitions of the degree and sign of local change. Our linguistic summaries are well suited to be included in an interface layer of a data warehouse system, improving the quality of human-machine interaction and the understandability of the results.
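
The core of such a summary can be sketched: compute the sign of local change between consecutive points of two series obtained from OLAP queries, then evaluate a linguistically quantified statement via Zadeh's calculus, truth = Q(mean membership). The piecewise-linear quantifier "most" below is invented for illustration; the paper's degree-of-change machinery is richer.

```python
# Truth degree of "most of the time, series A grows faster than B".
def most(p):  # fuzzy quantifier: 0 below 0.3, 1 above 0.8, linear between
    return min(1.0, max(0.0, (p - 0.3) / 0.5))

def local_changes(series):
    return [b - a for a, b in zip(series, series[1:])]

a = [10, 12, 15, 15, 19]
b = [10, 11, 12, 14, 15]
faster = [1.0 if da > db else 0.0
          for da, db in zip(local_changes(a), local_changes(b))]
print("truth of 'A mostly grows faster than B':",
      most(sum(faster) / len(faster)))   # -> 0.9
```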

Journal ArticleDOI
TL;DR: This paper claims that a spreadsheet-like query model, where formulation is done in a column-wise fashion, can intuitively express a large class of useful and practical RFDM queries; it proposes a simple SQL extension to that end and shows how these queries can be evaluated efficiently.

Proceedings ArticleDOI
13 Jun 2011
TL;DR: An in-memory database system that separates transaction processing from OLAP query processing via periodically refreshed snapshots is designed, so that OLAP queries can be executed without any synchronization and OLTP transaction processing follows the lock-free, mostly serial processing paradigm of H-Store.
Abstract: The quest for real-time business intelligence requires executing mixed transaction and query processing workloads on the same current database state. However, as Harizopoulos et al. [6] showed for transactional processing, co-execution using classical concurrency control techniques will not yield the necessary performance -- even in re-emerging main memory database systems. Therefore, we designed an in-memory database system that separates transaction processing from OLAP query processing via periodically refreshed snapshots. Thus, OLAP queries can be executed without any synchronization and OLTP transaction processing follows the lock-free, mostly serial processing paradigm of H-Store [8]. In this paper, we analyze different snapshot mechanisms: Hardware-supported Page Shadowing, which lazily copies memory pages when changed by transactions, software controlled Tuple Shadowing, which generates a new version when a tuple is modified, software controlled Twin Tuple, which constantly maintains two versions of each tuple and HotCold Shadowing, which effectively combines Tuple Shadowing and hardware-supported Page Shadowing by clustering update-intensive objects. We evaluate their performance based on the mixed workload CH-BenCHmark which combines the TPC-C and the TPC-H benchmarks on the same database schema and state.