Showing papers on "Online analytical processing published in 2022"

PDF

Open Access

Journal Article•DOI•

Data variety, come as you are in multi-model data warehouses

[...]

Sandro Bimonte, Enrico Gallinucci¹, Patrick Marcel², Stefano Rizzi¹•Institutions (2)

University of Bologna¹, François Rabelais University²

01 Feb 2022-Information Systems

TL;DR: This paper investigates the performances of an MMDBMS when used to store multidimensional data for OLAP analyses, and proposes and compares three logical solutions implemented on the PostgreSQL multi-model DBMS.

...read moreread less

10 citations

Proceedings Article•DOI•

Enabling CXL Memory Expansion for In-Memory Database Management Systems

[...]

Min Seon Ahn, Andrew Liang Ping Chang, Donghun Lee, Jongmin Gim, Jungmin Kim, Jae-Wuk Jung, Oliver Rebholz, Vincent Pham, Krishna Teja Malladi, Yang Seok Ki - Show less +6 more

12 Jun 2022

TL;DR: This work introduces flexible CXL memory expansion using a CXL type 3 prototype and evaluates its performance in an IMDBMS, showing that C XL memory devices interfaced with PCIe Gen5 are appropriate for memory expansion with nearly no throughput degradation in OLTP workloads and less than 8% throughput degraded in OLAP workloads.

...read moreread less

Abstract: Limited memory volume is always a performance bottleneck in an in-memory database management system (IMDBMS) as the data size keeps increasing. To overcome the physical memory limitation, heterogeneous and disaggregated computing platforms are proposed, such as Gen-Z, CCIX, OpenCAPI, and CXL. In this work, we introduce flexible CXL memory expansion using a CXL type 3 prototype and evaluate its performance in an IMDBMS. Our evaluation shows that CXL memory devices interfaced with PCIe Gen5 are appropriate for memory expansion with nearly no throughput degradation in OLTP workloads and less than 8% throughput degradation in OLAP workloads. Thus, CXL memory is a good candidate for memory expansion with lower TCO in IMDBMSs.

...read moreread less

10 citations

Journal Article•DOI•

Data variety, come as you are in multi-model data warehouses

[...]

01 Feb 2022-Information Systems

TL;DR: In this article , the authors investigate the performance of a multi-model DBMS when used to store multidimensional data for OLAP analyses, and propose and compare three logical solutions implemented on the PostgreSQL Multi-Model DBMS: one that extends a star schema with JSON, XML, graph-based, and keyvalue data; one based on a classical (fully relational) star schema; and one where all data are kept in their native form (no relational data are introduced).

...read moreread less

7 citations

Journal Article•DOI•

Building a novel physical design of a distributed big data warehouse over a Hadoop cluster to enhance OLAP cube query performance

[...]

Yassine Ramdane, Omar Boussaïd, Doulkifli Boukraâ, Nadia Kabachi, Fadila Bentayeb - Show less +1 more

01 Mar 2022-Parallel Computing

TL;DR: Wang et al. as mentioned in this paper proposed a data placement strategy for a big data warehouse over a Hadoop cluster, which enhances the projection, selection, and star-join operations of an OLAP query, such that the system optimizer can perform a star join process locally, in only one spark stage without a shuffle phase.

...read moreread less

6 citations

Journal Article•DOI•

COOL: A framework for conversational OLAP

[...]

Matteo Francia¹, Enrico Gallinucci¹, Matteo Golfarelli¹•Institutions (1)

University of Bologna¹

01 Feb 2022-Information Systems

TL;DR: This paper introduces COOL, a framework devised for COnversational OLap applications that interprets and translates a natural language dialogue into an OLAP session that starts with a GPSJ query and continues with the application of OLAP operators.

...read moreread less

6 citations

Proceedings Article•DOI•

Proteus: Autonomous Adaptive Storage for Mixed Workloads

[...]

Michael Abebe, Horatiu Lazu, Khuzaima Daudjee

10 Jun 2022

TL;DR: Proteus is presented, a distributed HTAP database system that adaptively and autonomously selects and changes its storage layout to optimize for mixed workloads and delivers superior HTAP performance while providing OLTP and OLAP performance on par with designs specialized for either type of workload.

...read moreread less

Abstract: Enterprises use distributed database systems to meet the demands of mixed or hybrid transaction/analytical processing (HTAP) workloads that contain both transactional (OLTP) and analytical (OLAP) requests. Distributed HTAP systems typically maintain a complete copy of data in row-oriented storage format that is well-suited for OLTP workloads and a second complete copy in column-oriented storage format optimized for OLAP workloads. Maintaining these data copies consumes significant storage space and system resources. Conversely, if a system stores data in a single format, OLTP or OLAP workload performance suffers. This paper presents Proteus, a distributed HTAP database system that adaptively and autonomously selects and changes its storage layout to optimize for mixed workloads. Proteus generates physical execution plans that utilize storage-aware operators for efficient transaction execution. Using comprehensive HTAP workloads and state-of-the-art comparison systems, we demonstrate that Proteus delivers superior HTAP performance while providing OLTP and OLAP performance on par with designs specialized for either type of workload.

...read moreread less

4 citations

Journal Article•DOI•

A multi-source spatio-temporal data cube for large-scale geospatial analysis

[...]

Fan Gao, Peng Yue, Zhipeng Cao, Shuaifeng Zhao, Boyi Shangguan, Liangcun Jiang, Lei Hu, Zhe Fang, Zheheng Liang - Show less +5 more

14 Jun 2022-International Journal of Geographical Information Science

TL;DR: The proposed infrastructure, GeoCube, extends the capacity of data cubes to multi-source big vector and raster data and improves EO data cube management and keeps connections with the business intelligence cube, which provides supplementary information for Eo data cube processing.

...read moreread less

Abstract: Abstract Data management and analysis are challenging with big Earth observation (EO) data. Expanding upon the rising promises of data cubes for analysis-ready big EO data, we propose a new geospatial infrastructure layered over a data cube to facilitate big EO data management and analysis. Compared to previous work on data cubes, the proposed infrastructure, GeoCube, extends the capacity of data cubes to multi-source big vector and raster data. GeoCube is developed in terms of three major efforts: formalize cube dimensions for multi-source geospatial data, process geospatial data query along these dimensions, and organize cube data for high-performance geoprocessing. This strategy improves EO data cube management and keeps connections with the business intelligence cube, which provides supplementary information for EO data cube processing. The paper highlights the major efforts and key research contributions to online analytical processing for dimension formalization, distributed cube objects for tiles, and artificial intelligence enabled prediction of computational intensity for data cube processing. Case studies with data from Landsat, Gaofen, and OpenStreetMap demonstrate the capabilities and applicability of the proposed infrastructure.

...read moreread less

4 citations

Journal Article•DOI•

SQLite

[...]

01 Aug 2022-Proceedings of The Vldb Endowment

TL;DR: In this paper , the authors discuss SQLite in the context of this changing workload landscape and discuss the future of SQLite, envisioning how it will evolve to meet new demands and challenges.

...read moreread less

Abstract: In the two decades following its initial release, SQLite has become the most widely deployed database engine in existence. Today, SQLite is found in nearly every smartphone, computer, web browser, television, and automobile. Several factors are likely responsible for its ubiquity, including its in-process design, standalone codebase, extensive test suite, and cross-platform file format. While it supports complex analytical queries, SQLite is primarily designed for fast online transaction processing (OLTP), employing row-oriented execution and a B-tree storage format. However, fueled by the rise of edge computing and data science, there is a growing need for efficient in-process online analytical processing (OLAP). DuckDB, a database engine nicknamed "the SQLite for analytics", has recently emerged to meet this demand. While DuckDB has shown strong performance on OLAP benchmarks, it is unclear how SQLite compares. Furthermore, we are aware of no work that attempts to identify root causes for SQLite's performance behavior on OLAP workloads. In this paper, we discuss SQLite in the context of this changing workload landscape. We describe how SQLite evolved from its humble beginnings to the full-featured database engine it is today. We evaluate the performance of modern SQLite on three benchmarks, each representing a different flavor of in-process data management, including transactional, analytical, and blob processing. We delve into analytical data processing on SQLite, identifying key bottlenecks and weighing potential solutions. As a result of our optimizations, SQLite is now up to 4.2X faster on SSB. Finally, we discuss the future of SQLite, envisioning how it will evolve to meet new demands and challenges.

...read moreread less

3 citations

Journal Article•DOI•

Evaluating English Language Teaching Quality in Classrooms Using OLAP and SVM Algorithms

[...]

Jingping Liu¹•Institutions (1)

Sichuan University¹

09 Apr 2022-Mobile Information Systems

TL;DR: In this paper , online analytical processing (OLAP) in combination with a support vector machine (SVM) classification algorithm is adopted, and then the algorithm constructs the linear optimal decision function in the feature space.

...read moreread less

Abstract: Currently, the existing techniques of evaluating quality of teaching quality colleges and universities within China are based on classical and statistical methods. However, these methods are unable to accurately reflect the real and actual evaluation of teaching quality. Furthermore, in this era of computerization, education has also revamped itself and is not limited to the conventional lecturing approach. Nowadays, English has become one of the most important skills for foreigners as countries move toward internationalization resulting in a huge amount of data being collected in educational databases, which remains unused. Powerful tools are required to improve the quality of English teaching and reap the benefits of big data generated in classrooms. In this paper, online analytical processing (OLAP) in combination with a support vector machine (SVM) classification algorithm is adopted, and then the algorithm constructs the linear optimal decision function in the feature space. Through the training of sample data by the SVM algorithm, relatively high-quality classification results can be obtained on the target object, especially for high-dimensional cases. The proposed approach has an extremely efficient application value. Compared with the existing methods, the error of evaluation results can be greatly improved, which leads to further improvement of accuracy in terms of evaluation results.

...read moreread less

3 citations

Journal Article•DOI•

COOL: A framework for conversational OLAP

[...]

01 Feb 2022-Information Systems

TL;DR: In this article , the authors introduce COOL, a framework devised for COnversational OLAP applications that interprets and translates a natural language dialog into an OLAP session that starts with a GPSJ (Generalized Projection, Selection, and Join) query and continues with the application of operators.

...read moreread less

3 citations

Journal Article•DOI•

OLAP Patterns: A pattern-based approach to multidimensional data analysis

[...]

01 Mar 2022-Data and Knowledge Engineering

TL;DR: In this article , the authors introduce a formal definition of OLAP patterns as well as an expressive, flexible, and generally applicable definition language for OLAP query composition, which can be used to describe a generic solution for composing a query that allows a BI user to satisfy a certain type of information need given fragments of a conceptual model.

...read moreread less

Abstract: Users of a business intelligence (BI) system employ an approach referred to as online analytical processing (OLAP) to view multidimensional data from different perspectives. Query languages, e.g., SQL or MDX, allow for flexible querying of multidimensional data but query formulation is often time-consuming and cognitively challenging for many users. Alternatives to using a query language, e.g., graphical OLAP clients, parameterized reports, or dashboards, are often not a full-blown alternative to using a query language. Experience in cooperative research projects with industry led to the following observations regarding the use of OLAP queries in practice. First, within the same organization, similar OLAP queries are repeatedly composed from scratch in order to satisfy similar information needs. Second, across different organizations and even domains, OLAP queries with similar structures are repeatedly composed from scratch. Finally, vague requirements regarding frequently composed OLAP queries in the early stages of a project potentially lead to rushed development in later stages, which can be alleviated by following best practices for OLAP query composition. In engineering, knowledge about best-practice solutions to frequently arising challenges is often documented and represented using patterns. In that spirit, an OLAP pattern describes a generic solution for composing a query that allows a BI user to satisfy a certain type of information need given fragments of a conceptual model. This paper introduces a formal definition of OLAP patterns as well as an expressive, flexible, and generally applicable definition language.

...read moreread less

Journal Article•DOI•

A privacy-preserving National Clinical Data Warehouse: Architecture and analysis

[...]

01 Mar 2022-Smart Health

TL;DR: In this paper , the anonymous National Clinical Data Warehouse (NCDW) framework is designed to reinforce research and analysis in a rapidly developing country, where the existing Electronic Health Records are stored in unconnected, heterogeneous sources with no unique patient identifier and consistency.

...read moreread less

Journal Article•DOI•

Performance of NoSQL Graph Implementations of Star vs. Snowflake Schemas

[...]

01 Jan 2022-IEEE Access

TL;DR: In this article , a set of formal rules to convert a multidimensional data model into a graph data model (MDM2G) is proposed, which allows conventional star and snowflake schemas to fit into NoSQL graph databases.

...read moreread less

Abstract: Nowadays, the data used for decision-making come from a wide variety of sources which are difficult to manage using relational databases. To address this problem, many researchers have turned to Not only SQL (NoSQL) databases to provide scalability and flexibility for On-Line Analytical Processing (OLAP) systems. In this paper, we propose a set of formal rules to convert a multidimensional data model into a graph data model (MDM2G). These rules allow conventional star and snowflake schemas to fit into NoSQL graph databases. We apply the proposed rules to implement star-like and snowflake-like graph data warehouses. We compare their performances to similar relational ones focusing on the data model, dimensionality, and size. The experimental results show large differences between relational and graph implementations of a data warehouse. A relational implementation performs better for queries on a couple of tables, but conversely, a graph implementation is better when queries involve many tables. Surprisingly the performances of a star-like and snowflake-like graph data warehouses are very close. Hence a snowflake schema could be used in order to easily consider new sub-dimensions in a graph data warehouse.

...read moreread less

Book Chapter•DOI•

Towards Unification of Statistical Reasoning, OLAP and Association Rule Mining: Semantics and Pragmatics

[...]

Rahul Kumar Sharma, Minakshi Kaushik, Sijo Arakkal Peious, Mahtab Shahin, Amrendra Singh Yadav, Dirk Draheim - Show less +2 more

01 Jan 2022-Lecture Notes in Computer Science

TL;DR: In this paper , the semantic correspondences between decision support techniques, such as statistical reasoning, OLAP, and association rule mining, have been investigated and the unification of these techniques can serve as a foundation for designing next-generation multi-paradigm data mining tools.

...read moreread less

Abstract: Over the last decades, various decision support technologies have gained massive ground in practice and theory. Out of these technologies, statistical reasoning was used widely to elucidate insights from data. Later, we have seen the emergence of online analytical processing (OLAP) and association rule mining, which both come with specific rationales and objectives. Unfortunately, both OLAP and association rule mining have been introduced with their own specific formalizations and terminologies. This made and makes it always hard to reuse results from one domain in another. In particular, it is not always easy to see the potential of statistical results in OLAP and association rule mining application scenarios. This paper aims to bridge the artificial gaps between the three decision support techniques, i.e., statistical reasoning, OLAP, and association rule mining and contribute by elaborating the semantic correspondences between their foundations, i.e., probability theory, relational algebra, and the itemset apparatus. Based on the semantic correspondences, we provide that the unification of these techniques can serve as a foundation for designing next-generation multi-paradigm data mining tools.KeywordsData miningAssociation rule miningOnline analytical processingStatistical reasoning

...read moreread less

Journal Article•DOI•

Beyond TPC-DS, a benchmark for Big Data OLAP systems (BDOLAP-Bench)

[...]

Roberto Tardío, Alejandro Maté, Juan Trujillo

01 Feb 2022-Future Generation Computer Systems

TL;DR: In this paper , the authors proposed a new benchmark specifically designed for Big Data OLAP systems and based on the widely adopted TPC-DS benchmark, which can be used to evaluate and compare them using a benchmark.

...read moreread less

Journal Article•DOI•

An Innovative Model for Extracting OLAP Cubes from NOSQL Database Based on Scalable Naïve Bayes Classifier

[...]

Farnaz Davardoost, Amin Babazadeh Sangar, Kambiz Majidzadeh

11 Apr 2022-Mathematical Problems in Engineering

TL;DR: This article presents a model for extracting OLAP cubes from a document-oriented NOSQL database based on the Naïve Bayes classifier (NBC) and the MapReduce (MR) programming model, which is appropriate and suitable for large-scale data sets.

...read moreread less

Abstract: Due to unstructured and large amounts of data, relational databases are no longer suitable for data management. As a result, new databases known as NOSQL have been introduced. The issue is that such a database is difficult to analyze. Online analytical processing (OLAP) is the foundational technology for data analysis in business intelligence. Because these technologies were designed primarily for relational database systems, performing OLAP in NOSQL is difficult. We present a model for extracting OLAP cubes from a document-oriented NOSQL database in this article. A scalable Naïve Bayes classifier method was used for this purpose. The proposed solution is divided into three stages of preparation, Naïve Bayes, and NBMR. Our proposed algorithm, NBMR, is based on the Naïve Bayes classifier (NBC) and the MapReduce (MR) programming model. Each NOSQL database document with nearly the same attribute will belong to the same class, and as a result, OLAP cubes can be used to perform data analysis. Because the proposed model allows for distributed and parallel Naïve Bayes Classifier computing, it is appropriate and suitable for large-scale data sets. Our proposed model is a proper and efficient approach when considering the speed and reduced the number of required comparisons.

...read moreread less

Proceedings Article•DOI•

GHive: accelerating analytical query processing in apache hive via CPU-GPU heterogeneous computing

[...]

Haotian Liu, Bo Tang, Jiashu Zhang, Yangshen Deng, Xiao Yan, Xinying Zheng, Qiaomu Shen, Danyun Zeng, Zunyao Mao, Chaozu Zhang, Zhengxin You, Zhihao Wang, Runzhe Jiang, Fang Wang, Man Lung Yiu, Huan Li, Mingji Han, Qian Li, Zhenghai Luo - Show less +15 more

07 Nov 2022

TL;DR: GHive as discussed by the authors enhances CPU-based Hive via CPU-GPU heterogeneous computing, which enables efficient data movement between CPU memory and GPU memory, and provides a complete set of SQL operators with extensively optimized GPU implementations.

...read moreread less

Abstract: As a popular distributed data warehouse system, Apache Hive has been widely used for big data analytics in many organizations. Meanwhile, exploiting the massive parallelism of GPU to accelerate online analytical processing (OLAP) has been extensively explored in the database community. In this paper, we present GHive, which enhances CPU-based Hive via CPU-GPU heterogeneous computing. GHive is designed for the business intelligence applications and provides the same API as Hive for compatibility. To run SQL queries jointly on both CPU and GPU, GHive comes with three key techniques: (i) a novel data model gTable, which is column-based and enables efficient data movement between CPU memory and GPU memory; (ii) a GPU-based operator library Panda, which provides a complete set of SQL operators with extensively optimized GPU implementations; (iii) a hardware-aware MapReduce job placement scheme, which puts jobs judiciously on either GPU or CPU via a cost-based approach. In the experiments, we observe that GHive outperforms Hive in both query processing speed and operating expense on the Star Schema Benchmark (SSB).

...read moreread less

Book Chapter•DOI•

Automatic Machine Learning-Based OLAP Measure Detection for Tabular Data

[...]

Yuzhao Yang, Fatma Abdelhedi, Jérôme Darmont, Franck Ravat, Olivier Teste - Show less +1 more

01 Jan 2022

TL;DR: In this article , a machine learning-based method to detect measures by defining three categories of features for numerical columns is proposed, which is tested on real-world datasets and with various machine learning algorithms, concluding that random forest performs best for measure detection.

...read moreread less

Abstract: Nowadays, it is difficult for companies and organisations without Business Intelligence (BI) experts to carry out data analyses. Existing automatic data warehouse design methods cannot treat with tabular data commonly defined without schema. Dimensions and hierarchies can still be deduced by detecting functional dependencies, but the detection of measures remains a challenge. To solve this issue, we propose a machine learning-based method to detect measures by defining three categories of features for numerical columns. The method is tested on real-world datasets and with various machine learning algorithms, concluding that random forest performs best for measure detection.

...read moreread less

Book Chapter•DOI•

Mathematical Modeling of Information System Designing Master Plan of the Building Territory Based on OLAP Technology

[...]

01 Jan 2022

TL;DR: In this article , the authors proposed to use a multidimensional data model as a hypercube, the edges of which are sequences of values of the analyzed parameters for multi-dimensional analysis and design of the two subject areas.

...read moreread less

Abstract: AbstractThis study is devoted to the problem of the ability to represent multidimensional design of master plan of building territory based on OLAP technology. Building territory is viewed as complex information system which relate with other information systems of surrounding areas. Mathematical modeling of such information system for multidimensional design of master plan based on OLAP technology is proposed to solve this problem. It is proposed to use a multidimensional data model as a hypercube, the edges of which are sequences of values of the analyzed parameters for multidimensional analysis and design of the two subject areas. The first subject area contains background information about the topography, geodesy and geology of the building territory. The second subject area contains information about the surrounding areas to the building territory. OLAP cubes represent multidimensional information objects with associated properties. Each OLTP system is an implementation of a domain model and is designed by a multidimensional matrix, which is represented as an OLAP cube. The application of this concept in a Web-based environment is covered.KeywordsMathematical modelingMaster planBuilding territoryOLAP

...read moreread less

Proceedings Article•DOI•

OLxPBench: Real-time, Semantically Consistent, and Domain-specific are Essential in Benchmarking, Designing, and Implementing HTAP Systems

[...]

01 May 2022

TL;DR: OLxPBench as discussed by the authors is a composite HTAP benchmark suite that includes real-time queries, semantically consistent schema, and domain-specific workloads for benchmarking, designing, and implementing HTAP systems.

...read moreread less

Abstract: As real-time analysis on the fresh data become in-creasingly compelling, more organizations deploy Hybrid Trans-actional/Analytical Processing (HTAP) systems to support real-time queries on data recently generated by online transaction processing. This paper argues that real-time queries, semantically consistent schema, and domain-specific workloads are essential in benchmarking, designing, and implementing HTAP systems. However, most state-of-the-art and state-of-the-practice bench-marks ignore those critical factors. Hence, at best, they are incommensurable and, at worst, misleading in benchmarking, designing, and implementing HTAP systems. This paper presents OLxPBench, a composite HTAP benchmark suite. OLxPBench proposes: (1) the abstraction of a hybrid transaction, performing a real-time query in-between an online transaction, to model widely-observed behavior pattern - making a quick decision while consulting real-time analysis; (2) a semantically consistent schema to express the relationships between OLTP and OLAP schema; (3) the combination of domain-specific and general benchmarks to characterize diverse application scenarios with varying resource demands. Our evaluations justify the three design decisions of OLxPBench and pinpoint the bottlenecks of two mainstream distributed HTAP DBMSs. International Open Benchmark Council (Bench Council) sets up the OLxP-Bench homepage at https://www.benchcouncil.org/olxpbench/. Its source code is available from https://github.com/BenchCouncil/olxpbench.git.

...read moreread less

Book Chapter•DOI•

A Proposed Big Data Architecture Using Data Lakes for Education Systems

[...]

Lamya Oukhouya¹•Institutions (1)

Université Ibn Zohr¹

01 Sep 2022

Journal Article•DOI•

Penerapan Online Analytical Processing (OLAP) dalam Pengelolaan Data Karyawan

[...]

Rima Tamara Aldisa

25 Feb 2022-JURIKOM (Jurnal Riset Komputer)

TL;DR: The Online Analytical Processing (OLAP) method is an approach method to provide answers to requests for dimensional analysis processes quickly, the result to be achieved is to be able to design applications that are expected to provide convenience for companies to carry out the process of recording employee data, employee attendance to presentation as mentioned in this paper .

...read moreread less

Abstract: At this time many companies manage employee data manually from the process of recording employee data, employee absences which can result in data errors. Because of several things, it is necessary to have a system that can record every incoming data as well as a method that can control it so as not to experience excess or deficiency. The method used is the Online Analytical Processing (OLAP) method. The Online Analytical Processing (OLAP) method is an approach method to provide answers to requests for dimensional analysis processes quickly, the result to be achieved is to be able to design applications that are expected to provide convenience for companies to carry out the process of recording employee data, employee attendance to presentation. reports quickly, precisely and accurately

...read moreread less

Journal Article•DOI•

Context-aware OLAP for textual data warehouses

[...]

Santanu Roy, Agostino Cortesi, Soumya Sen

01 Nov 2022

TL;DR: In this article , a context-aware model for textual data warehouses and defines OLAP operations that improve the efficiency of information management within decision support systems is introduced. But the model is limited to capturing contextual relationships when comparing only strongly related documents.

...read moreread less

Abstract: • This paper introduces a context-aware model for textual data warehouses and defines OLAP operations that improve the efficiency of information management within decision support systems. • The proposed model supports business applications that represent contextual information as well as where the dimensions are organized in hierarchical form to represent the same data in different abstract forms. • The model dynamically categorizes the documents using word embedding and agglomerative hierarchical clustering algorithms. • Documents are arranged according to the concept hierarchy existing among the contextual dimensions. • Extensive experimental results show the effiectiveness of our proposal in capturing semantic textual similarity and enhancing decision-making by speeding up of the execution time of OLAP operations. Decision Support Systems (DSS) that leverage business intelligence are based on numerical data and On-line Analytical Processing (OLAP) is often used to implement it. However, business decisions are increasingly dependent on textual data as well. Existing research work on textual data warehouses has the limitation of capturing contextual relationships when comparing only strongly related documents. This paper proposes an Information System (IS) based context-aware model that uses word embedding in conjunction with agglomerative hierarchical clustering algorithms to dynamically categorize documents in order to form the concept hierarchy. The results of the experimental evaluation provide evidence of the effectiveness of integrating textual data into a data warehouse and improving decision making through various OLAP operations.

...read moreread less

Journal Article•DOI•

Modeling and Querying Fuzzy SOLAP-Based Framework

[...]

James E. Brown¹•Institutions (1)

Yüzüncü Yıl University¹

11 Mar 2022

TL;DR: In this paper , the authors present the method used to model the FSOLAP and manage various types of complex and fuzzy spatio-temporal queries using the Fuzzy Logic and Spatial Online Analytical Processing (FSOLAP) framework.

...read moreread less

Abstract: Nowadays, with the rise of sensor technology, the amount of spatial and temporal data is increasing day by day. Modeling data in a structured way and performing effective and efficient complex queries has become more essential than ever. Online analytical processing (OLAP), developed for this purpose, provides appropriate data structures and supports querying multidimensional numeric and alphanumeric data. However, uncertainty and fuzziness are inherent in the data in many complex database applications, especially in spatiotemporal database applications. Therefore, there is always a need to support flexible queries and analyses on uncertain and fuzzy data, due to the nature of the data in these complex spatiotemporal applications. FSOLAP is a new framework based on fuzzy logic technologies and spatial online analytical processing (SOLAP). In this study, we use crisp measures as input for this framework, apply fuzzy operations to obtain the membership functions and fuzzy classes, and then generate fuzzy association rules. Therefore, FSOLAP does not need to use predefined sets of fuzzy inputs. This paper presents the method used to model the FSOLAP and manage various types of complex and fuzzy spatiotemporal queries using the FSOLAP framework. In this context, we describe how to handle non-spatial and fuzzy spatial queries, as well as spatiotemporal fuzzy query types. Additionally, while FSOLAP primarily includes historical data and associated queries and analyses, we also describe how to handle predictive fuzzy spatiotemporal queries, which typically require an inference mechanism.

...read moreread less

Journal Article•DOI•

OCP: an OLAP-based bus crowdedness smart-perceiving mechanism for urban transportation

[...]

Shiwen Gong, Dandan Liu, Zakirul Alam Bhuiyan, Xiaoguang Niu

30 Jun 2022-Connection Science

TL;DR: Li et al. as mentioned in this paper proposed an attribute-enriched and meta-path-based model with machine learning to capture similarity based on the object connectivity, visibility and features, which can be obtained from the participatory passenger's sensor data through deep-neural network-based posture recognition.

...read moreread less

Abstract: In this paper, we deal with the problem of similarity search about crowdedness for participatory-sensing buses for urban transportation. Similarity search is usually applied for measuring similarities in heterogeneous information networks. However, many models implement similarity search in a global setting, without taking object attributes into consideration. OCP, a novel OLAP-based crowdedness perception, is an attribute-enriched and meta-path-based model with machine learning to capture similarity based on the object connectivity, visibility and features. A set of common crowdedness attribute dimensions are defined across different types of objects, which can be obtained from the participatory passenger’s sensor data through deep-neural-network-based posture recognition. Accordingly, an object can be described as a series of node vectors from different dimensions. In such framework, OLAP is applied in analysing multiple resolutions and improving efficiency of similarity search. In addition, our data sources are based on participatory-sensing instead of using vehicle GPS systems. As more data be collected through participatory-sensing, more accurate crowdedness for a bus can be estimated. The experiment results further demonstrate the efficiency of our analytical approaches.

...read moreread less

Journal Article•DOI•

A foundation for spatio-textual-temporal cube analytics

[...]

Thackeray, Henry Robert¹•Institutions (1)

Munich Center for Quantum Science and Technology¹

01 Feb 2022-Information Systems

TL;DR: In this article , the Spatio-Textual-Temporal Cube (STTCube) structure is defined and formalized to enable combined effective and efficient analytical queries over STT data.

...read moreread less

Book Chapter•DOI•

Selection of OLAP Materialized Cube by Using a Fruit Fly Optimization (FFO) Approach: A Multidimensional Data Model

[...]

Anjana Yadav, Anand Tripathi

01 Jan 2022

Proceedings Article•DOI•

Designing an IoT-enabled Data Warehouse for Indoor Radon Time Series Analytics

[...]

Rolando Azevedo, Joaquim P. Silva, Nuno Lopes, António Curado, Leonel J. R. Nunes, Sérgio I. Lopes - Show less +2 more

22 Jun 2022

TL;DR: This article describes the development and implementation of a Data Analytical Processing Module of the RnMonitor platform, composed of an ETL process, a multidimensional database and an OLAP server.

...read moreread less

Abstract: Recently we have seen an increase in the frequency of radon concentration measurement campaigns, largely driven by directives adopted at a European level, and by increased public awareness of the problem. Many of the conducted assessments are simple concentration averages over a time range, where passive sensors are mostly used. Other assessments use portable devices that measure radon in a continuous mode, which can later be downloaded for analysis. However, other types of systems emerged, which continuously measure indoor radon concentrations, and using wireless communications make these data available in real time. This is the case of the RnMonitor platform, designed for online indoor radon monitoring in public buildings in northern Portugal using IoT devices with LoRa communication. The project provides a time series database to collect raw data and a multidimensional data warehouse to store a large amount of these data in the long term, and to make possible the use of OLAP analytical query tools to explore the data in a multidimensional way. Thus, this article describes the development and implementation of a Data Analytical Processing Module of the RnMonitor platform. This module is composed by an ETL process, a multidimensional database and an OLAP server.

...read moreread less

Journal Article•DOI•

ByteHTAP

[...]

Jianjun Chen, Yonghua Ding, Ye Liu, Fangshi Li, Li Hui Zhang, Kui Wei, Lixun Cao, Dan Zou, Yang Liu, Lei Zhang, Rui Shi, Weiping Ding, Kai-Gui Wu, Shangyu Luo, Jason Sun, Yu Liang - Show less +12 more

01 Aug 2022-Proceedings of The Vldb Endowment

TL;DR: The journey of building ByteHTAP, an HTAP system with high data freshness and strong data consistency, which adopts a separate-engine and shared-storage architecture and fully utilizes an existing ByteDance's OLTP system and an open source OLAP system.

...read moreread less

Abstract: In recent years, at ByteDance, we see more and more business scenarios that require performing complex analysis over freshly imported data, together with transaction support and strong data consistency. In this paper, we describe our journey of building ByteHTAP, an HTAP system with high data freshness and strong data consistency. It adopts a separate-engine and shared-storage architecture. Its modular system design fully utilizes an existing ByteDance's OLTP system and an open source OLAP system. This choice saves us a lot of resources and development time and allows easy future extensions such as replacing the query processing engine with other alternatives. ByteHTAP can provide high data freshness with less than one second delay, which enables many new business opportunities for our customers. Customers can also configure different data freshness thresholds based on their business needs. ByteHTAP also provides strong data consistency through global timestamps across its OLTP and OLAP system, which greatly relieves application developers from handling complex data consistency issues by themselves. In addition, we introduce some important performance optimizations to ByteHTAP, such as pushing computations to the storage layer and using delete bitmaps to efficiently handle deletes. Lastly, we will share our lessons and best practices in developing and running ByteHTAP in production.

...read moreread less

Proceedings Article•DOI•

A Query-Level Distributed Database Tuning System with Machine Learning

[...]

01 Aug 2022

TL;DR: Wang et al. as discussed by the authors proposed a query-level tuning system for distributed database with the machine learning method, which can efficiently recommend knobs according to the feature of the query and achieve higher performance under typical OLAP workload.

...read moreread less

Abstract: Knob tuning is important to improve the performance of database management system. However, the traditional manual tuning method by DBA is time-consuming and error-prone, and can not meet the requirements of different database instances. In recent years, the research on automatic knob tuning using machine learning algorithm has gradually sprung up, but most of them only support workload-level knob tuning, and the studies on query-level tuning is still in the initial stage. Furthermore, few works are focus on the knob tuning for distributed database. In this paper, we propose a query-level tuning system for distribute database with the machine learning method. This system can efficiently recommend knobs according to the feature of the query. We deployed our techniques onto CockroachDB, a distribute database, and experimental results show that our system achieves higher performance under typical OLAP workload. For all categories of queries, our system reduces the latency by 9.2% on average, and for some categories of queries, this system reduces the latency by more than 60%.

...read moreread less