Showing papers on "Online analytical processing published in 2021"

PDF

Open Access

Proceedings Article•DOI•

Maximizing Persistent Memory Bandwidth Utilization for OLAP Workloads

[...]

Björn Daase¹, Lars Jonas Bollmeier¹, Lawrence Benson¹, Tilmann Rabl¹•Institutions (1)

09 Jun 2021

TL;DR: In this article, the authors show that PMEM is suitable for large, read-heavy OLAP workloads with an average query runtime slowdown of 1.66x compared to DRAM.

...read moreread less

Abstract: Modern database systems for online analytical processing (OLAP) typically rely on in-memory processing. Keeping all active data in DRAM severely limits the data capacity and makes larger deployments much more expensive than disk-based alternatives. Byte-addressable persistent memory (PMEM) is an emerging storage technology that bridges the gap between slow-but-cheap SSDs and fast-but-expensive DRAM. Thus, research and industry have identified it as a promising alternative to pure in-memory data warehouses. However, recent work shows that PMEM's performance is strongly dependent on access patterns and does not always yield good results when simply treated like DRAM. To characterize PMEM's behavior in OLAP workloads, we systematically evaluate PMEM on a large, multi-socket server commonly used for OLAP workloads. Our evaluation shows that PMEM can be treated like DRAM for most read access but must be used differently when writing. To support our findings, we run the Star Schema Benchmark on PMEM and DRAM. We show that PMEM is suitable for large, read-heavy OLAP workloads with an average query runtime slowdown of 1.66x compared to DRAM. Following our evaluation, we present 7 best practices on how to maximize PMEM's bandwidth utilization in future system designs.

...read moreread less

33 citations

Proceedings Article•DOI•

MB2: Decomposed Behavior Modeling for Self-Driving Database Management Systems

[...]

Lin Ma¹, William Zhang¹, Jie Jiao¹, Wuwen Wang¹, Matthew Butrovich¹, Wan Shen Lim¹, Prashanth Menon¹, Andrew Pavlo¹ - Show less +4 more•Institutions (1)

Carnegie Mellon University¹

09 Jun 2021

TL;DR: ModelBot2 as mentioned in this paper is an end-to-end framework for constructing and maintaining prediction models using machine learning (ML) in self-driving DBMSs, which decomposes a DBMS's architecture into fine-grained operating units that make it easier to estimate the system's behavior for configurations that it has never seen before.

...read moreread less

Abstract: Database management systems (DBMSs) are notoriously difficult to deploy and administer. The goal of a self-driving DBMS is to remove these impediments by managing itself automatically. However, a critical problem in achieving full autonomy is how to predict the DBMS's runtime behavior and resource consumption. These predictions guide a self-driving DBMS's decision-making components to tune and optimize all aspects of the system. We present the ModelBot2 end-to-end framework for constructing and maintaining prediction models using machine learning (ML) in self-driving DBMSs. Our approach decomposes a DBMS's architecture into fine-grained operating units that make it easier to estimate the system's behavior for configurations that it has never seen before. ModelBot2 then provides an offline execution environment to exercise the system to produce the training data used to train its models. We integrated ModelBot2 in an in-memory DBMS and measured its ability to predict its performance for OLTP and OLAP workloads running in dynamic environments. We also compare ModelBot2 against state-of-the-art ML models and show that our models are up to 25x more accurate in multiple scenarios.

...read moreread less

22 citations

Proceedings Article•DOI•

Greenplum: A Hybrid Database for Transactional and Analytical Workloads

[...]

Zhenghua Lyu¹, Huan Hubert Zhang¹, Gang Xiong¹, Gang Guo¹, Haozhou Wang¹, Jinbao Chen¹, Asim Praveen¹, Yu Yang¹, Xiaoming Gao¹, Alexandra Wang¹, Wen Lin¹, Ashwin Agrawal¹, Junfeng Yang¹, Hao Wu¹, Xiaoliang Li¹, Feng Guo¹, Jiang Wu¹, Jesse Zhang¹, Venkatesh Raghavan¹ - Show less +15 more•Institutions (1)

VMware¹

09 Jun 2021

TL;DR: In this paper, the authors propose a global deadlock detector to increase the concurrency of query processing and a one-phase commit to speed up query processing for OLTP queries.

...read moreread less

Abstract: Demand for enterprise data warehouse solutions to support real-time Online Transaction Processing (OLTP) queries as well as long-running Online Analytical Processing (OLAP) workloads is growing. Greenplum database is traditionally known as an OLAP data warehouse system with limited ability to process OLTP workloads. In this paper, we augment Greenplum into a hybrid system to serve both OLTP and OLAP workloads. The challenge we address here is to achieve this goal while maintaining the ACID properties with minimal performance overhead. In this effort, we identify the engineering and performance bottlenecks such as the under-performing restrictive locking and the two-phase commit protocol. Next we solve the resource contention issues between transactional and analytical queries. We propose a global deadlock detector to increase the concurrency of query processing. When transactions that update data are guaranteed to reside on exactly one segment we introduce one-phase commit to speed up query processing. Our resource group model introduces the capability to separate OLAP and OLTP workloads into more suitable query processing mode. Our experimental evaluation on the TPC-B and CH-benCHmark benchmarks demonstrates the effectiveness of our approach in boosting the OLTP performance without sacrificing the OLAP performance.

...read moreread less

18 citations

Journal Article•DOI•

Requirements-driven data warehouse design based on enhanced pivot tables

[...]

Sandro Bimonte, Leandro Antonelli¹, Stefano Rizzi²•Institutions (2)

National University of La Plata¹, University of Bologna²

01 Mar 2021-Requirements Engineering

TL;DR: The requirements analysis process is iterative and relies on both unstructured and structured interviews; particular attention is given to enable the design of irregular multidimensional schemata, which are often present in real-world DWs but can hardly be understood by unskilled users.

...read moreread less

Abstract: The design of data warehouses (DWs) is based on both their data sources and users’ requirements. The more closely the DW multidimensional schema reflects the stakeholders’ needs, the more effectively they will make use of the DW content for their OLAP analyses. Thus, considerable attention has been given in the literature to DW requirements analysis, including requirements elicitation, specification and validation. Unfortunately, traditional approaches are based on complex formalisms that cannot be used with decision makers who have no previous experience with DWs and OLAP. This forces a sharp separation between elicitation and specification. To cope with this problem, we propose a new requirements analysis process where pivot tables, a well-known representation for multidimensional data often used by decision makers, are enhanced to be used both for elicitation and as a specification formalism. A pivot table is a two-dimensional spreadsheet that supports the analyses of multidimensional data by nesting several dimensions on the x- or y-axis and displaying data on multiple pages. The requirements analysis process we propose is iterative and relies on both unstructured and structured interviews; particular attention is given to enable the design of irregular multidimensional schemata, which are often present in real-world DWs but can hardly be understood by unskilled users. Finally, we validate our proposal using a real case study in the biodiversity domain.

...read moreread less

13 citations

Proceedings Article•DOI•

Performance Characterization of HTAP Workloads

[...]

Utku Sirin¹, Sandhya Dwarkadas², Anastasia Ailamaki¹•Institutions (2)

École Polytechnique Fédérale de Lausanne¹, University of Rochester²

01 Jan 2021

TL;DR: In this article, the authors show that the OLTP throughput drops by up to 42% due to sharing the hardware resources in a hybrid transactional and analytical processing (HTAP) system.

...read moreread less

Abstract: Hybrid Transactional and Analytical Processing (HTAP) systems have become popular in the past decade. HTAP systems allow running transactional and analytical processing workloads on the same data and hardware. As a result, they suffer from workload interference. Despite the large body of existing work in HTAP systems and architectures, none of the existing work has systematically analyzed workload interference for HTAP systems.In this work, we characterize workload interference for HTAP systems. We show that the OLTP throughput drops by up to 42% due to sharing the hardware resources. Partitioning the last-level cache (LLC) among the OLTP and OLAP workloads can significantly improve the OLTP throughput without hurting the OLAP throughput. The OLAP throughput is significantly reduced due to sharing the data. The OLAP execution time is exponentially increased if the OLTP workload generates fresh tuples faster than the HTAP system propagates them. Therefore, in order to minimize the workload interference, HTAP systems should isolate the OLTP and OLAP workloads in the shared hardware resources and should allocate enough resources to fresh tuple propagation to propagate the fresh tuples faster than they are generated.

...read moreread less

11 citations

Proceedings Article•DOI•

JSON Tiles: Fast Analytics on Semi-Structured Data

[...]

Dominik Durner¹, Viktor Leis², Thomas Neumann¹•Institutions (2)

Technische Universität München¹, University of Jena²

09 Jun 2021

TL;DR: In this article, the authors present JSON tiles, which, without losing the flexibility of JSON, enables relational systems to perform analytics on JSON data at native speed by automatically detecting the most important keys and extracting them transparently.

...read moreread less

Abstract: Developers often prefer flexibility over upfront schema design, making semi-structured data formats such as JSON increasingly popular. Large amounts of JSON data are therefore stored and analyzed by relational database systems. In existing systems, however, JSON's lack of a fixed schema results in slow analytics. In this paper, we present JSON tiles, which, without losing the flexibility of JSON, enables relational systems to perform analytics on JSON data at native speed. JSON tiles automatically detects the most important keys and extracts them transparently - often achieving scan performance similar to columnar storage. At the same time, JSON tiles is capable of handling heterogeneous and changing data. Furthermore, we automatically collect statistics that enable the query optimizer to find good execution plans. Our experimental evaluation compares against state-of-the-art systems and research proposals and shows that our approach is both robust and efficient.

...read moreread less

10 citations

Journal Article•DOI•

MR-MVPP: A map-reduce-based approach for creating MVPP in data warehouses for big data applications

[...]

Hossein Azgomi¹, Mohammad Karim Sohrabi¹•Institutions (1)

Islamic Azad University¹

01 Sep 2021-Information Sciences

TL;DR: The MR-MVPP (Map-Reduce-based construction of the MVPP) is the proposed method to address the problem of selecting an appropriate set of views to be materialized to speed up analytical query processing of data warehouses.

...read moreread less

10 citations

Journal Article•DOI•

Influence of Strategic Interrelationships and Decision-Making in Chilean Port Networks on Their Degree of Sustainability

[...]

Claudia A. Durán, Fredi Edgardo Palominos, Raúl Carrasco, Eduardo Carrillo

02 Apr 2021-Sustainability

TL;DR: The conclusions state that the development of this green technology requires cultural changes, public policy initiatives and the incorporation of new actors and more research is needed in this area to identify other relevant sustainable variables.

...read moreread less

Abstract: An extensive literary review is carried out to determine the strategic and business advantages, and difficulties that non-smart ports must face to develop sustainability. Based on a two-port case study, the strategic text of the corporate missions of port administrators and operators is analyzed and classified in order to understand to what extend economic, social and environmental aspects are fulfilled. A conceptual model is designed for an information system based on indicators that can determine the state or degree of sustainability in the critical operational activities of the ports studied. A system is proposed that is based on a data warehouse core and a multidimensional database, which can be implemented in the ROLAP mode, allowing taking advantage of the good characteristics of relational databases without losing the OLAP approach. A discussion of the strategic feasibility of implementing this conceptual model of case study monitoring and its long-term benefits is delivered. The conclusions state that the development of this green technology requires cultural changes, public policy initiatives and the incorporation of new actors. In addition, more research is needed in this area to identify other relevant sustainable variables.

...read moreread less

9 citations

Proceedings Article•DOI•

Efficient Deep Learning Pipelines for Accurate Cost Estimations Over Large Scale Query Workload

[...]

Johan Kok Zhi Kang¹, Gaurav, Sien Yi Tan, Feng Cheng, Shixuan Sun¹, Bingsheng He¹ - Show less +2 more•Institutions (1)

National University of Singapore¹

09 Jun 2021

TL;DR: Prestroid as discussed by the authors is a tree convolution based data science pipeline that accurately predicts resource consumption patterns of query traces, but at a much lower cost than traditional deep learning models.

...read moreread less

Abstract: The use of deep learning models for forecasting the resource consumption patterns of SQL queries have recently been a popular area of study. While these models have demonstrated promising accuracy, training them over large scale industry workloads are expensive. Space inefficiencies of encoding techniques over large numbers of queries and excessive padding used to enforce shape consistency across diverse query plans implies 1) longer model training time and 2) the need for expensive, scaled up infrastructure to support batched training. In turn, we developed Prestroid, a tree convolution based data science pipeline that accurately predicts resource consumption patterns of query traces, but at a much lower cost. We evaluated our pipeline over 19K Presto OLAP queries, on a data lake of more than 20PB of data from Grab. Experimental results imply that our pipeline outperforms benchmarks on predictive accuracy, contributing to more precise resource prediction for large-scale workloads, yet also reduces per-batch memory footprint by 13.5x and per-epoch training time by 3.45x. We demonstrate direct cost savings of up to 13.2x for large batched model training over Microsoft Azure VMs.

...read moreread less

9 citations

Journal Article•DOI•

Designing business intelligence (BI) for production, distribution and customer services: a case study of a UAE-based organization.

[...]

Mohammed T. Nuseir

01 Feb 2021-Business Process Management Journal

TL;DR: This study highlights the process of BI in the production, distribution and customer services based on the National Food Products Company (NFPC) in the United Arab Emirates (UAE) and refers to graphical illustrations of the business needs and the organization's target key performance indicators (KPI).

...read moreread less

Abstract: Business intelligence (BI) is a strategic approach that can use analytical tools to collect and integrate information, apply business rules and ensure the appropriate visible output of organizational information This study aims to present the design and implementation of BI in areas of business process improvement for production, distribution and customer services,This study highlights the process of BI in the production, distribution and customer services based on the National Food Products Company (NFPC) in the United Arab Emirates (UAE) This study discusses the step-by-step development process of BI and refers to graphical illustrations of the business needs and the organization's target key performance indicators (KPI),Based on the business needs and chosen KPIs to maximize production and improve distribution and customer services the BI tool shows that the “star scheme” is the most appropriate one Relational Online Analytical Processing (ROLAP) based on Mondrian system is employed as Online Analytical Processing (OLAP) architecture since the NFPC's technological infrastructure was better adapted to this vision The analysis starts with data retrieval from two databases' customer' and production and distribution databases Finally, visualization and reporting processes that respect the end-users improve the NFPC's decisions,The study will help other organizations, BI developers, data warehouses (DW) developers and administrators, project managers as well as academic researchers understand how to develop a successful BI framework and implement BI based on business needs,This is a unique and original study on the BI experience from a UAE-based organization and will encourage other organizations to apply BI in their business process

...read moreread less

8 citations

Journal Article•DOI•

Knowledge Graph OLAP: A Multidimensional Model and Query Operations for Contextualized Knowledge Graphs

[...]

Christoph G. Schuetz¹, Loris Bozzato², Bernd Neumayr¹, Michael Schrefl¹, Luciano Serafini² - Show less +1 more•Institutions (2)

Johannes Kepler University of Linz¹, fondazione bruno kessler²

01 Jan 2021-Social Work

TL;DR: This paper decomposes the roll-up operation from traditional OLAP into a merge and an abstraction operation, which corresponds to the selection of knowledge from different contexts whereas abstraction replaces entities with more general entities.

...read moreread less

Abstract: A knowledge graph (KG) represents real-world entities and their relationships. The represented knowledge is often context-dependent, leading to the construction of contextualized KGs. The multidimensional and hierarchical nature of context invites comparison with the OLAP cube model from multidimensional data analysis. Traditional systems for online analytical processing (OLAP) employ multidimensional models to represent numeric values for further analysis using dedicated query operations. In this paper, along with an adaptation of the OLAP cube model for KGs, we introduce an adaptation of the traditional OLAP query operations for the purposes of performing analysis over KGs. In particular, we decompose the roll-up operation from traditional OLAP into a merge and an abstraction operation. The merge operation corresponds to the selection of knowledge from different contexts whereas abstraction replaces entities with more general entities. The result of such a query is a more abstract, high-level view – a management summary – of the knowledge.

...read moreread less

Journal Article•DOI•

A dataspace-based framework for OLAP analyses in a high-variety multistore

[...]

Chiara Forresi¹, Enrico Gallinucci¹, Matteo Golfarelli¹, Hamdi Ben Hamadou²•Institutions (2)

University of Bologna¹, Aalborg University²

31 Jul 2021

TL;DR: This paper proposes an approach to support data analysis within a high-variety multistore, with heterogeneous schemas and overlapping records, by automatically handling both data model and schema heterogeneity through a dataspace layer on top of the underlying DBMSs.

...read moreread less

Abstract: The success of NoSQL DBMSs has pushed the adoption of polyglot storage systems that take advantage of the best characteristics of different technologies and data models. While operational applications take great benefit from this choice, analytical applications suffer the absence of schema consistency, not only between different DBMSs but within a single NoSQL system as well. In this context, the discipline of data science is steering analysts away from traditional data warehousing and toward a more flexible and lightweight approach to data analysis. The idea is to perform OLAP analyses in a pay-as-you-go manner across heterogeneous schemas and data models, where the integration is progressively carried out by the user as the available data is explored. In this paper, we propose an approach to support data analysis within a high-variety multistore, with heterogeneous schemas and overlapping records. Our approach supports relational, document, wide-column, and key-value data models by automatically handling both data model and schema heterogeneity through a dataspace layer on top of the underlying DBMSs. The expressiveness we enable corresponds to GPSJ queries, which are the most common class of queries in OLAP applications. We rely on nested relational algebra to define a cross-database execution plan. The system has been prototyped on Apache Spark.

...read moreread less

Journal Article•DOI•

A Semi-Automatic Design Methodology for (Big) Data Warehouse Transforming Facts into Dimensions

[...]

Lucile Sautot¹, Sandro Bimonte, Ludovic Journaux•Institutions (1)

Agro ParisTech¹

01 Jan 2021-IEEE Transactions on Knowledge and Data Engineering

TL;DR: A theoretical view of the proposed methodology supported by a case study, an implemented prototype and a complete evaluation based on a standard benchmark are presented.

...read moreread less

Abstract: A decision support system is used by decision makers for a long time. But, in some cases, the originally designed multidimensional schema does not cover the entire needs of decision makers, which can change over time. One such unfulfilled need, is using facts to describe dimension members. In this article, we propose a methodology to transform the constellation schema of a data warehouse by integrating factual data into a dimension. The proposed methodology and algorithms enrich a constellation multidimensional schema with new analytical possibilities for decision makers. This enrichment has repercussions for the entire multidimensional schema that are managed by multidimensional modeling, hierarchy calculation and the hierarchy version. In this article, we present a theoretical view of the proposed methodology supported by a case study, an implemented prototype and a complete evaluation based on a standard benchmark.

...read moreread less

Book Chapter•DOI•

Efficient Data Cube Materialization

[...]

Raghu Prashant¹, Mann Suman¹, Raghu Eashwaran¹•Institutions (1)

Maharaja Surajmal Institute of Technology¹

01 Jan 2021

TL;DR: This paper proposes an algorithm for cuboid materialization starting from a source cuboid to the target cuboid in an optimal way such that the intermediate cuboids consume less space and require lower time to generate by making sure those cuboids have the least number of rows compared to other valid cuboids available for selection.

...read moreread less

Abstract: In the field of business intelligence, we require the analysis of multidimensional data with the need for it being fast and interactive. Data warehousing and OLAP approaches have been developed for this purpose in which the data is viewed in the form of a multidimensional data cube which allows interactive analysis of the data in various levels of abstraction presented in a graphical manner. In data cube, there may arise a need to materialize a particular cuboid given that some other cuboid is presently materialized, in this paper, we propose an algorithm for cuboid materialization starting from a source cuboid to the target cuboid in an optimal way such that the intermediate cuboids consume less space and require lower time to generate by making sure those cuboids have the least number of rows compared to other valid cuboids available for selection, by sorting them based on the product of cardinalities of dimensions present in each cuboid.

...read moreread less

Book Chapter•DOI•

Operationalizing Analytics with NewSQL

[...]

Ionela Chereja, Sarah Myriam Lydia Hahn, Oliviu Matei, Anca Avram

01 Apr 2021

TL;DR: This paper aims to provide a structured look into the features and capabilities offered by NewSQL systems that can be leveraged to allow Data Analysis over a variety of data types and an overview of Realtime Analytics offerings, Map Reduce capabilities and hybrid (transactional and analytical) features.

...read moreread less

Abstract: Operational data and analytical data are no longer two separate disciplines and discussions. Data Analysis is gaining more ground and more request from companies that begin to base their strategies - as well as decision intelligence and decision management - on factual information. In surveying the state of the data industry and the trends in data management technologies, NewSQL systems appear to be more present. They are able to answer the question of bridging operational data storage and administration with providing real-time access to analytical data. This paper aims to provide a structured look into the features and capabilities offered by NewSQL systems that can be leveraged to allow Data Analysis over a variety of data types. Furthermore, it provides an overview of Realtime Analytics offerings, Map Reduce capabilities and hybrid (transactional and analytical) features.

...read moreread less

Journal Article•DOI•

Collect and analysis of agro-biodiversity data in a participative context: A business intelligence framework

[...]

Sandro Bimonte, Olivier Billaud, Benoît Fontaine, Thomy Martin, Frédéric Flouvat, Ali Hassan, Nora Rouillier, Lucile Sautot¹ - Show less +4 more•Institutions (1)

Agro ParisTech¹

01 Mar 2021-Ecological Informatics

TL;DR: In this article, a real life citizen science program tailored for farmers is presented, where a data warehouse stores the data collected by citizens and a standard OLAP tool enables citizens and scientists to explore data.

...read moreread less

Journal Article•DOI•

Growth of relational model: Interdependence and complementary to big data

[...]

Sucharitha Shetty¹, B. Dinesh Rao, Srikanth Prabhu¹•Institutions (1)

Manipal Institute of Technology¹

01 Apr 2021-International Journal of Electrical and Computer Engineering

TL;DR: This paper aims to provide a complete model of a relational database that is still being widely used because of its well known ACID properties namely, atomicity, consistency, integrity and durability, to highlight the adoption of relational model approaches by bigdata techniques.

...read moreread less

Abstract: A database management system is a constant application of science that provides a platform for the creation, movement, and use of voluminous data. The area has witnessed a series of developments and technological advancements from its conventional structured database to the recent buzzword, bigdata. This paper aims to provide a complete model of a relational database that is still being widely used because of its well known ACID properties namely, atomicity, consistency, integrity and durability. Specifically, the objective of this paper is to highlight the adoption of relational model approaches by bigdata techniques. Towards addressing the reason for this in corporation, this paper qualitatively studied the advancements done over a while on the relational data model. First, the variations in the data storage layout are illustrated based on the needs of the application. Second, quick data retrieval techniques like indexing, query processing and concurrency control methods are revealed. The paper provides vital insights to appraise the efficiency of the structured database in the unstructured environment, particularly when both consistency and scalability become an issue in the working of the hybrid transactional and analytical database management system.

...read moreread less

Journal Article•DOI•

Improving the Performance of Multidimensional Clinical Data for OLAP using an Optimized Data Clustering approach

[...]

Anjana Yadav et.al

11 Apr 2021

TL;DR: A Dragon Fly Optimization based Clustering (DFOC) approach is proposed to enhance the efficiency of data clustering by generating optimal clusters from multidimensional clinical data for OLAP.

...read moreread less

Abstract: Medicine is a fresh way to utilize for curing, analyzing and detecting the diseases through data clustering with OLAP (Online Analytical Processing). The large amount of multidimensional clinical data is reduced the efficiency of OLAP query processing by enhancing the query accessing time. Hence, the performance of OLAP model is improved by using data clustering in which huge data is divided into several groups (clusters) with cluster heads to achieve fast query processing in least time. In this paper, a Dragon Fly Optimization based Clustering (DFOC) approach is proposed to enhance the efficiency of data clustering by generating optimal clusters from multidimensional clinical data for OLAP. The results are evaluated on MATLAB 2019a tool and shown the better performance of DFOC against other clustering methods ACO, GA and K-Means in terms of intra-cluster distance, purity index, F-measure, and standard deviation

...read moreread less

Journal Article•DOI•

Enhancing Cubes with Models to Describe Multidimensional Data

[...]

Matteo Francia¹, Patrick Marcel², Verónika Peralta², Stefano Rizzi¹•Institutions (2)

University of Bologna¹, François Rabelais University²

11 Jun 2021-Information Systems Frontiers

TL;DR: In this article, the authors present an end-to-end implementation of describe, one of the five intention operators introduced by IAM, and assess the validity of their approach in terms of user effort for formulating intentions, effectiveness, efficiency and scalability.

...read moreread less

Abstract: The Intentional Analytics Model (IAM) has been recently envisioned as a new paradigm to couple OLAP and analytics. It relies on two basic ideas: (i) letting the user explore data by expressing her analysis intentions rather than the data she needs, and (ii) returning enhanced cubes, i.e., multidimensional data annotated with knowledge insights in the form of interesting model components (e.g., clusters). In this paper we contribute to give a proof-of-concept for the IAM vision by delivering an end-to-end implementation of describe, one of the five intention operators introduced by IAM. Among the research challenges left open in IAM, those we address are (i) automatically tuning the size of models (e.g., the number of clusters), (ii) devising a measure to estimate the interestingness of model components, (iii) selecting the most effective chart or graph for visualizing each enhanced cube depending on its features, and (iv) devising a visual metaphor to display enhanced cubes and interact with them. We assess the validity of our approach in terms of user effort for formulating intentions, effectiveness, efficiency, and scalability.

...read moreread less

Journal Article•DOI•

Research and Application of a Multidimensional Association Rules Mining Method Based on OLAP

[...]

Hairong Wang¹, Pan Huang¹, Xu Chen¹•Institutions (1)

Minzu University of China¹

01 Jan 2021-International Journal of Information Technology and Web Engineering

Journal Article•DOI•

TopoGraph: an End-To-End Framework to Build and Analyze Graph Cubes

[...]

Amine Ghrab, Oscar Romero¹, Sabri Skhiri, Esteban Zimányi²•Institutions (2)

Polytechnic University of Catalonia¹, Université libre de Bruxelles²

01 Feb 2021-Information Systems Frontiers

TL;DR: TopoGraph is introduced, an end-to-end framwork for building and analyzing graph cubes that extends the existing graph cube models by defining new types of dimensions and measures and organizing them within a multidimensional space that guarantees multiddimensional integrity constraints.

...read moreread less

Abstract: Graphs are a fundamental structure that provides an intuitive abstraction for modeling and analyzing complex and highly interconnected data. Given the potential complexity of such data, some approaches proposed extending decision-support systems with multidimensional analysis capabilities over graphs. In this paper, we introduce TopoGraph, an end-to-end framwork for building and analyzing graph cubes. TopoGraph extends the existing graph cube models by defining new types of dimensions and measures and organizing them within a multidimensional space that guarantees multidimensional integrity constraints. This results in defining three new types of graph cubes: property graph cubes, topological graph cubes, and graph-structured cubes. Afterwards, we define the algebraic OLAP operations for such novel cubes. We implement and experimentally validate TopoGraph with different types of real-world datasets.

...read moreread less

Journal Article•DOI•

Implementing Online Analytical Processing in Hotel Customer Relationship Management

[...]

M Taufik, Faiza Renaldi, Fajri Rakhmat Umbara

01 Mar 2021

TL;DR: An OLAP-based analytical CRM system to analyze customer data and classify it into two main segments: geographically and demographically is developed and developed by integrating data from the hotel transactional system so that the analytical process can run in real-time.

...read moreread less

Abstract: Online Analytical Processing (OLAP) is increasingly being used by applying technology concepts that use a multidimensional view of grouping data to provide quick access to strategic information for analysis purposes. In the tourism industry, especially hospitality, this is very useful, especially in processing hotel operational data. Although OLAP technology has been widely applied in the hotel industry in reporting business sales, marketing, and reporting management analysis, there is still little research that discusses customer activity analysis. This study develops an OLAP-based analytical CRM system to analyze customer data and classify it into two main segments: geographically and demographically. Transactional data for hotels that have existed for the last three years are converted into a data warehouse, including the ETL process from the original database to a star schema database. It has a fact table and dimensions. Furthermore, OLAP cube operations are performed and generate customer reports. Testing in this study was carried out by calculating the total data’s access speed and accuracy from 100 to 5000 customers, and the estimated value was 45.50 to 80 milliseconds in 30 experiments. For further research, it can be developed by integrating data from the hotel transactional system so that the analytical process can run in real-time.

...read moreread less

Journal Article•DOI•

A privacy-preserving National Clinical Data Warehouse: Architecture and analysis

[...]

Md. Raihan Mia¹, Abu Sayed Md. Latiful Hoque¹, Shahidul Islam Khan, Sheikh Iqbal Ahamed²•Institutions (2)

Bangladesh University of Engineering and Technology¹, Marquette University²

01 Dec 2021-Smart Health

TL;DR: In this article, the anonymous National Clinical Data Warehouse (NCDW) framework is designed to reinforce research and analysis in a rapidly developing country, where the existing Electronic Health Records are stored in unconnected, heterogeneous sources with no unique patient identifier and consistency.

...read moreread less

Proceedings Article•DOI•

HeuristicDB: a hybrid storage database system using a non-volatile memory block device

[...]

Jinfeng Yang¹, Bingzhe Li², David J. Lilja¹•Institutions (2)

University of Minnesota¹, Oklahoma State University–Stillwater²

14 Jun 2021

TL;DR: HeuristicDB as discussed by the authors uses an emerging non-volatile memory (NVM) block device as an extension of the database buffer pool to support cache-priority for database requests.

...read moreread less

Abstract: Hybrid storage systems are widely used in big data fields to balance system performance and cost. However, due to a poor understanding of the characteristics of database block requests, past studies in this area cannot fully utilize the performance gain from emerging storage devices. This study presents a hybrid storage database system, called HeuristicDB, which uses an emerging non-volatile memory (NVM) block device as an extension of the database buffer pool. To consider the unique performance behaviors of NVM block devices and the block-level characteristics of database requests, a set of heuristic rules that associate database (block) requests with the appropriate quality of service for the purpose of caching priority are proposed. Using online analytical processing (OLAP) and online transactional processing (OLTP) benchmarks, both trace-based examination and system implementation on MySQL are carried out to evaluate the effectiveness of the proposed design. The experimental results indicate that HeuristicDB provides up to 75% higher performance and migrates 18X fewer data between storage and the NVM block device than existing systems.

...read moreread less

Book Chapter•DOI•

Analyzing Banking Data Using Business Intelligence: A Data Mining Approach

[...]

Anusha Aziz¹, Suman Saha¹, Mohammad Arifuzzaman²•Institutions (2)

College of Business Administration¹, East West University²

01 Jan 2021

TL;DR: In this paper, a data warehouse for bank data relating to consumers, goods, services, etc. has been presented, where the implementation steps of Kimball lifecycle have been presented followed by the ETL process for bank customer's data.

...read moreread less

Abstract: In today’s world, the banking sector has played a key role in the financial development of a country. Generally, in banking sector, there are many types of historical data in multiple heterogeneous databases, and posing queries on these heterogeneous databases is a very complex process. Since banks are running digitally—and generating numerous data—it is a simple transformation to attain a better way to use that data. Therefore, the increasing competition of market changes has demanded bank intelligence for analyzing those enormous data. In this paper, we construct a data warehouse and present the data warehouse applicability in the investigation of the banking data relating to consumers, goods, services, etc. At first, the implementation steps of Kimball lifecycle have been presented followed by the ETL process for bank customer’s data. Afterward, OLAP cube has been developed using Microsoft Visual Studio 2019. Finally, OLAP analysis has been done using Microsoft power BI. The experimental result has unveiled the uniformity and strength of OLAP-based solutions to expansible bank intelligence.

...read moreread less

Proceedings Article•DOI•

OLAP Patterns: A pattern-based approach to multidimensional data analysis

[...]

Ilko Kovacic¹, Christoph G. Schuetz¹, Bernd Neumayr¹, Michael Schrefl¹•Institutions (1)

Johannes Kepler University of Linz¹

25 Nov 2021

TL;DR: In this paper, the authors introduce a formal definition of OLAP patterns as well as an expressive, flexible, and generally applicable definition language for OLAP query composition, which can be used to describe a generic solution for composing a query that allows a BI user to satisfy a certain type of information need given fragments of a conceptual model.

...read moreread less

Abstract: Users of a business intelligence (BI) system employ an approach referred to as online analytical processing (OLAP) to view multidimensional data from different perspectives. Query languages, e.g., SQL or MDX, allow for flexible querying of multidimensional data but query formulation is often time-consuming and cognitively challenging for many users. Alternatives to using a query language, e.g., graphical OLAP clients, parameterized reports, or dashboards, are often not a full-blown alternative to using a query language. Experience in cooperative research projects with industry led to the following observations regarding the use of OLAP queries in practice. First, within the same organization, similar OLAP queries are repeatedly composed from scratch in order to satisfy similar information needs.Second, across different organizations and even domains, OLAP queries with similar structures are repeatedly composed from scratch. Finally, vague requirements regarding frequently composed OLAP queries in the early stages of a project potentially lead to rushed development in later stages, which can be alleviated by following best practices for OLAP query composition. In engineering, knowledge about best-practice solutions to frequently arising challenges is often documented and represented using patterns. In that spirit, an OLAP pattern describes a generic solution for composing a query that allows a BI user to satisfy a certain type of information need given fragments of a conceptual model. This paper introduces a formal definition of OLAP patterns as well as an expressive, flexible, and generally applicable definition language.

...read moreread less

Journal Article•DOI•

Multi-Temperate Logical Data Warehouse Design for Large-Scale Healthcare Data

[...]

Bryan Martin¹, Karen C. Davis²•Institutions (2)

University of Cincinnati¹, Miami University²

15 Jul 2021-Big Data Research

TL;DR: In this article, the authors present an OLAP workload for use in supporting and evaluating logical data warehouse (LDW) design algorithms for a large healthcare organization in multi-temperate storage systems.

...read moreread less

Journal Article•DOI•

Missing Data in OLAP Cubes: Challenges and Strategies

[...]

Monica Chiarini Tremblay¹, Alan R. Hevner²•Institutions (2)

College of William & Mary¹, University of South Florida²

01 Jul 2021-Journal of Database Management

Journal Article•DOI•

On-line Analytical Processing (OLAP) Operation for Outpatient Healthcare

[...]

Talib M. J. Al Taleb¹, Sami Hasan¹, Yaqoob Yousif Mahd¹•Institutions (1)

Nahrain University¹

14 Jan 2021-Iraqi journal of science

TL;DR: An architecture for the data warehouse of outpatient healthcare (DWOP) as a data repository collects data from two different sources and provides storage, functionality and responsiveness to queries to meet decision makers requirements.

...read moreread less

Abstract: This paper presents an architecture for the data warehouse of outpatient healthcare (DWOP) as a data repository collects data from two different sources (Databases of outpatient healthcare and Excel files from hospitals) and provides storage, functionality and responsiveness to queries to meet decision makers requirements.Successfully supporting managerial decision-making is critically dependent upon the availability of integrated, high quality information organized and presented in a timely and easily understood manner. â€œOn-Line Analytical Processing (OLAP) is utilized for decision support to get interesting informationâ€ from the data warehouse with a rapid execution time. OLAP is considered one of Business Intelligence tools.

...read moreread less

Journal Article•DOI•

Research on multidimensional information collection algorithm of marine buoy wireless communication network

[...]

Zhuoyu Hu, Tuanfeng Wu

01 Feb 2021-Microprocessors and Microsystems

TL;DR: A marine radio is the fastest solution to resolve the sea of systematically interconnected problems; it is still in the early stages of development, need a discussion on the development of maritime radio communication networks.

...read moreread less