
Showing papers on "Online analytical processing published in 2022"


Journal ArticleDOI
TL;DR: This paper investigates the performance of a multi-model DBMS (MMDBMS) when used to store multidimensional data for OLAP analyses, and proposes and compares three logical solutions implemented on the PostgreSQL multi-model DBMS.

10 citations


Proceedings ArticleDOI
12 Jun 2022
TL;DR: This work introduces flexible CXL memory expansion using a CXL type 3 prototype and evaluates its performance in an IMDBMS, showing that CXL memory devices interfaced with PCIe Gen5 are appropriate for memory expansion with nearly no throughput degradation in OLTP workloads and less than 8% throughput degradation in OLAP workloads.
Abstract: Limited memory volume is always a performance bottleneck in an in-memory database management system (IMDBMS) as the data size keeps increasing. To overcome the physical memory limitation, heterogeneous and disaggregated computing platforms are proposed, such as Gen-Z, CCIX, OpenCAPI, and CXL. In this work, we introduce flexible CXL memory expansion using a CXL type 3 prototype and evaluate its performance in an IMDBMS. Our evaluation shows that CXL memory devices interfaced with PCIe Gen5 are appropriate for memory expansion with nearly no throughput degradation in OLTP workloads and less than 8% throughput degradation in OLAP workloads. Thus, CXL memory is a good candidate for memory expansion with lower TCO in IMDBMSs.

10 citations


Journal ArticleDOI
TL;DR: In this article, the authors investigate the performance of a multi-model DBMS when used to store multidimensional data for OLAP analyses, and propose and compare three logical solutions implemented on the PostgreSQL multi-model DBMS: one that extends a star schema with JSON, XML, graph-based, and key-value data; one based on a classical (fully relational) star schema; and one where all data are kept in their native form (no relational data are introduced).

7 citations
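A minimal sketch of the first design above, assuming a local PostgreSQL instance and invented table and column names: a relational star schema whose dimension keeps JSONB and XML attributes in their native form, so OLAP queries can slice on multi-model data.

```python
# Sketch only: one plausible reading of the "star schema extended with
# multi-model data" design; the paper's actual schema is not given here.
import psycopg2  # assumes PostgreSQL with a reachable database "olap_test"

conn = psycopg2.connect("dbname=olap_test")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE dim_customer (
        customer_key SERIAL PRIMARY KEY,
        profile      JSONB,  -- document / key-value attributes kept native
        contacts     XML     -- XML attributes kept native
    );
    CREATE TABLE fact_sales (
        customer_key INT REFERENCES dim_customer,
        date_key     INT NOT NULL,
        amount       NUMERIC NOT NULL
    );
""")
# An OLAP-style aggregation that groups on a native JSONB attribute:
cur.execute("""
    SELECT d.profile->>'segment' AS segment, SUM(f.amount)
    FROM fact_sales f JOIN dim_customer d USING (customer_key)
    GROUP BY 1;
""")
conn.commit()
```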


Journal ArticleDOI
TL;DR: Wang et al., as mentioned in this paper, proposed a data placement strategy for a big data warehouse over a Hadoop cluster that enhances the projection, selection, and star-join operations of an OLAP query, such that the system optimizer can perform the star-join process locally, in only one Spark stage without a shuffle phase.

6 citations
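The paper's own placement strategy (co-locating fact and dimension data across the Hadoop cluster) is not reproduced here; the sketch below shows the related, well-known way to obtain a shuffle-free star join in Spark by broadcasting the dimension tables, with paths and column names invented for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("star-join-sketch").getOrCreate()
fact = spark.read.parquet("hdfs:///warehouse/fact_lineorder")   # assumed paths
dates = spark.read.parquet("hdfs:///warehouse/dim_date")
stores = spark.read.parquet("hdfs:///warehouse/dim_store")

result = (fact
          .join(broadcast(dates), "date_key")    # map-side hash join, no shuffle
          .join(broadcast(stores), "store_key")  # still within one stage
          .groupBy("region")                     # only the final aggregate shuffles
          .sum("revenue"))
result.show()
```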


Journal ArticleDOI
TL;DR: This paper introduces COOL, a framework devised for COnversational OLAP applications that interprets and translates a natural language dialogue into an OLAP session that starts with a GPSJ query and continues with the application of OLAP operators.

6 citations


Proceedings ArticleDOI
10 Jun 2022
TL;DR: This paper presents Proteus, a distributed HTAP database system that adaptively and autonomously selects and changes its storage layout to optimize for mixed workloads, delivering superior HTAP performance while providing OLTP and OLAP performance on par with designs specialized for either type of workload.
Abstract: Enterprises use distributed database systems to meet the demands of mixed or hybrid transaction/analytical processing (HTAP) workloads that contain both transactional (OLTP) and analytical (OLAP) requests. Distributed HTAP systems typically maintain a complete copy of data in row-oriented storage format that is well-suited for OLTP workloads and a second complete copy in column-oriented storage format optimized for OLAP workloads. Maintaining these data copies consumes significant storage space and system resources. Conversely, if a system stores data in a single format, OLTP or OLAP workload performance suffers. This paper presents Proteus, a distributed HTAP database system that adaptively and autonomously selects and changes its storage layout to optimize for mixed workloads. Proteus generates physical execution plans that utilize storage-aware operators for efficient transaction execution. Using comprehensive HTAP workloads and state-of-the-art comparison systems, we demonstrate that Proteus delivers superior HTAP performance while providing OLTP and OLAP performance on par with designs specialized for either type of workload.

4 citations
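To make the adaptivity idea concrete, here is a toy illustration (my own, not Proteus's actual policy): a per-partition monitor that tracks the mix of point operations and scans and flips the partition between row and column layout when the workload shifts.

```python
from dataclasses import dataclass

@dataclass
class PartitionStats:
    point_ops: int = 0      # OLTP-style point reads/writes
    scan_ops: int = 0       # OLAP-style column scans
    layout: str = "row"

    def record(self, op: str) -> None:
        if op == "point":
            self.point_ops += 1
        else:
            self.scan_ops += 1

    def maybe_adapt(self, threshold: float = 0.7, min_sample: int = 100) -> None:
        total = self.point_ops + self.scan_ops
        if total < min_sample:                 # wait for a stable sample
            return
        scan_ratio = self.scan_ops / total
        if scan_ratio > threshold and self.layout == "row":
            self.layout = "column"             # one conversion, amortized by scans
        elif scan_ratio < 1 - threshold and self.layout == "column":
            self.layout = "row"
```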


Journal ArticleDOI
TL;DR: The proposed infrastructure, GeoCube, extends the capacity of data cubes to multi-source big vector and raster data, improves EO data cube management, and keeps connections with the business intelligence cube, which provides supplementary information for EO data cube processing.
Abstract: Data management and analysis are challenging with big Earth observation (EO) data. Expanding upon the rising promises of data cubes for analysis-ready big EO data, we propose a new geospatial infrastructure layered over a data cube to facilitate big EO data management and analysis. Compared to previous work on data cubes, the proposed infrastructure, GeoCube, extends the capacity of data cubes to multi-source big vector and raster data. GeoCube is developed in terms of three major efforts: formalize cube dimensions for multi-source geospatial data, process geospatial data query along these dimensions, and organize cube data for high-performance geoprocessing. This strategy improves EO data cube management and keeps connections with the business intelligence cube, which provides supplementary information for EO data cube processing. The paper highlights the major efforts and key research contributions to online analytical processing for dimension formalization, distributed cube objects for tiles, and artificial intelligence enabled prediction of computational intensity for data cube processing. Case studies with data from Landsat, Gaofen, and OpenStreetMap demonstrate the capabilities and applicability of the proposed infrastructure.

4 citations


Journal ArticleDOI


TL;DR: In this paper, the authors discuss SQLite in the context of a changing workload landscape, envisioning how it will evolve to meet new demands and challenges.
Abstract: In the two decades following its initial release, SQLite has become the most widely deployed database engine in existence. Today, SQLite is found in nearly every smartphone, computer, web browser, television, and automobile. Several factors are likely responsible for its ubiquity, including its in-process design, standalone codebase, extensive test suite, and cross-platform file format. While it supports complex analytical queries, SQLite is primarily designed for fast online transaction processing (OLTP), employing row-oriented execution and a B-tree storage format. However, fueled by the rise of edge computing and data science, there is a growing need for efficient in-process online analytical processing (OLAP). DuckDB, a database engine nicknamed "the SQLite for analytics", has recently emerged to meet this demand. While DuckDB has shown strong performance on OLAP benchmarks, it is unclear how SQLite compares. Furthermore, we are aware of no work that attempts to identify root causes for SQLite's performance behavior on OLAP workloads. In this paper, we discuss SQLite in the context of this changing workload landscape. We describe how SQLite evolved from its humble beginnings to the full-featured database engine it is today. We evaluate the performance of modern SQLite on three benchmarks, each representing a different flavor of in-process data management, including transactional, analytical, and blob processing. We delve into analytical data processing on SQLite, identifying key bottlenecks and weighing potential solutions. As a result of our optimizations, SQLite is now up to 4.2X faster on SSB (the Star Schema Benchmark). Finally, we discuss the future of SQLite, envisioning how it will evolve to meet new demands and challenges.

3 citations
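The kind of head-to-head measurement the paper reports can be sketched in a few lines, since both engines are in-process Python libraries; the query and database files below are placeholders, not the SSB queries used in the paper.

```python
import sqlite3
import time

import duckdb  # pip install duckdb

QUERY = "SELECT category, SUM(amount) FROM sales GROUP BY category"

def time_query(con):
    t0 = time.perf_counter()
    con.execute(QUERY).fetchall()
    return time.perf_counter() - t0

# Both files are assumed to already contain a comparable "sales" table.
print("sqlite:", time_query(sqlite3.connect("sales.db")))
print("duckdb:", time_query(duckdb.connect("sales.duckdb")))
```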


Journal ArticleDOI
Jingping Liu
TL;DR: In this paper, online analytical processing (OLAP) is adopted in combination with a support vector machine (SVM) classification algorithm, which constructs the linear optimal decision function in the feature space.
Abstract: Currently, the existing techniques for evaluating teaching quality in Chinese colleges and universities are based on classical statistical methods. However, these methods are unable to accurately reflect the real evaluation of teaching quality. Furthermore, in this era of computerization, education has also revamped itself and is not limited to the conventional lecturing approach. As countries move toward internationalization, English has become one of the most important skills, and a huge amount of data is being collected in educational databases that remains unused. Powerful tools are required to improve the quality of English teaching and reap the benefits of the big data generated in classrooms. In this paper, online analytical processing (OLAP) is adopted in combination with a support vector machine (SVM) classification algorithm, which constructs the linear optimal decision function in the feature space. Through the training of sample data by the SVM algorithm, relatively high-quality classification results can be obtained on the target object, especially in high-dimensional cases. The proposed approach has considerable practical value: compared with the existing methods, the error of the evaluation results can be greatly reduced, further improving the accuracy of the evaluation.

3 citations
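A hedged sketch of the classification step described above, with synthetic data standing in for the OLAP-aggregated evaluation indicators (the paper's actual features and labels are not given here):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))               # 12 indicators per course (invented)
y = (X[:, :3].sum(axis=1) > 0).astype(int)   # synthetic quality label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
# The SVM learns the (kernelized) optimal decision function in feature space.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X_tr, y_tr)
print(f"held-out accuracy: {model.score(X_te, y_te):.2f}")
```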


Journal ArticleDOI
TL;DR: In this article, the authors introduce COOL, a framework devised for COnversational OLAP applications that interprets and translates a natural language dialog into an OLAP session that starts with a GPSJ (Generalized Projection, Selection, and Join) query and continues with the application of OLAP operators.

3 citations


Journal ArticleDOI
TL;DR: In this article, the authors introduce a formal definition of OLAP patterns as well as an expressive, flexible, and generally applicable definition language; an OLAP pattern describes a generic solution for composing a query that allows a BI user to satisfy a certain type of information need given fragments of a conceptual model.
Abstract: Users of a business intelligence (BI) system employ an approach referred to as online analytical processing (OLAP) to view multidimensional data from different perspectives. Query languages, e.g., SQL or MDX, allow for flexible querying of multidimensional data, but query formulation is often time-consuming and cognitively challenging for many users. Alternatives to using a query language, e.g., graphical OLAP clients, parameterized reports, or dashboards, often fall short of a full-blown query language. Experience in cooperative research projects with industry led to the following observations regarding the use of OLAP queries in practice. First, within the same organization, similar OLAP queries are repeatedly composed from scratch in order to satisfy similar information needs. Second, across different organizations and even domains, OLAP queries with similar structures are repeatedly composed from scratch. Finally, vague requirements regarding frequently composed OLAP queries in the early stages of a project potentially lead to rushed development in later stages, which can be alleviated by following best practices for OLAP query composition. In engineering, knowledge about best-practice solutions to frequently arising challenges is often documented and represented using patterns. In that spirit, an OLAP pattern describes a generic solution for composing a query that allows a BI user to satisfy a certain type of information need given fragments of a conceptual model. This paper introduces a formal definition of OLAP patterns as well as an expressive, flexible, and generally applicable definition language.

Journal ArticleDOI
TL;DR: In this paper , the anonymous National Clinical Data Warehouse (NCDW) framework is designed to reinforce research and analysis in a rapidly developing country, where the existing Electronic Health Records are stored in unconnected, heterogeneous sources with no unique patient identifier and consistency.

Journal ArticleDOI
TL;DR: In this article, a set of formal rules is proposed to convert a multidimensional data model into a graph data model (MDM2G), allowing conventional star and snowflake schemas to fit into NoSQL graph databases.
Abstract: Nowadays, the data used for decision-making come from a wide variety of sources which are difficult to manage using relational databases. To address this problem, many researchers have turned to Not only SQL (NoSQL) databases to provide scalability and flexibility for On-Line Analytical Processing (OLAP) systems. In this paper, we propose a set of formal rules to convert a multidimensional data model into a graph data model (MDM2G). These rules allow conventional star and snowflake schemas to fit into NoSQL graph databases. We apply the proposed rules to implement star-like and snowflake-like graph data warehouses. We compare their performance to similar relational ones, focusing on the data model, dimensionality, and size. The experimental results show large differences between relational and graph implementations of a data warehouse. A relational implementation performs better for queries on a couple of tables but, conversely, a graph implementation is better when queries involve many tables. Surprisingly, the performance of star-like and snowflake-like graph data warehouses is very close. Hence, a snowflake schema could be used in order to easily accommodate new sub-dimensions in a graph data warehouse.
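An illustrative sketch of the conversion idea (the rule below is a paraphrase, not the paper's formal MDM2G rules): each fact row becomes a node connected by an edge to the node of every dimension it references, expressed as generated Cypher.

```python
# Dimension keys and graph labels are invented for illustration.
dim_keys = ["date_key", "store_key"]

def fact_to_cypher(row):
    """Generate one Cypher statement for a single fact row."""
    matches = " ".join(
        f"MATCH (d{i}:Dimension {{key: {row[k]}}})"
        for i, k in enumerate(dim_keys))
    edges = " ".join(
        f"CREATE (f)-[:HAS_DIM]->(d{i})" for i in range(len(dim_keys)))
    return f"{matches} CREATE (f:Fact {{amount: {row['amount']}}}) {edges}"

print(fact_to_cypher({"amount": 42.0, "date_key": 20220101, "store_key": 7}))
# A snowflake schema would chain further MATCH/CREATE pairs for sub-dimensions.
```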

Book ChapterDOI
TL;DR: In this paper, the semantic correspondences between decision support techniques, such as statistical reasoning, OLAP, and association rule mining, are investigated; the unification of these techniques can serve as a foundation for designing next-generation multi-paradigm data mining tools.
Abstract: Over the last decades, various decision support technologies have gained massive ground in practice and theory. Out of these technologies, statistical reasoning was widely used to elucidate insights from data. Later, we have seen the emergence of online analytical processing (OLAP) and association rule mining, which both come with specific rationales and objectives. Unfortunately, both OLAP and association rule mining were introduced with their own specific formalizations and terminologies, which has always made it hard to reuse results from one domain in another. In particular, it is not always easy to see the potential of statistical results in OLAP and association rule mining application scenarios. This paper aims to bridge the artificial gaps between the three decision support techniques, i.e., statistical reasoning, OLAP, and association rule mining, and contributes by elaborating the semantic correspondences between their foundations, i.e., probability theory, relational algebra, and the itemset apparatus. Based on the semantic correspondences, we show that the unification of these techniques can serve as a foundation for designing next-generation multi-paradigm data mining tools. Keywords: Data mining, Association rule mining, Online analytical processing, Statistical reasoning
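A toy illustration of one correspondence the chapter draws: association-rule support and confidence are empirical probabilities, computable with the same grouping machinery OLAP relies on (the transactions below are invented).

```python
transactions = [{"bread", "butter"}, {"bread"},
                {"bread", "butter", "jam"}, {"jam"}]

def support(itemset):
    # support(X) is the empirical probability P(X)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    # confidence(X => Y) is the conditional probability P(Y|X) = P(X,Y)/P(X)
    return support(lhs | rhs) / support(lhs)

print(support({"bread", "butter"}))       # 0.5
print(confidence({"bread"}, {"butter"}))  # 0.666...
```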

Journal ArticleDOI
TL;DR: In this paper, the authors propose a new benchmark specifically designed for Big Data OLAP systems, based on the widely adopted TPC-DS benchmark, which can be used to evaluate and compare such systems.

Journal ArticleDOI
TL;DR: This article presents a model for extracting OLAP cubes from a document-oriented NOSQL database based on the Naïve Bayes classifier (NBC) and the MapReduce (MR) programming model, which is well suited to large-scale data sets.
Abstract: Due to unstructured and large amounts of data, relational databases are no longer suitable for data management. As a result, new databases known as NOSQL have been introduced. The issue is that such databases are difficult to analyze. Online analytical processing (OLAP) is the foundational technology for data analysis in business intelligence. Because these technologies were designed primarily for relational database systems, performing OLAP on NOSQL data is difficult. In this article, we present a model for extracting OLAP cubes from a document-oriented NOSQL database. A scalable Naïve Bayes classifier method is used for this purpose. The proposed solution is divided into three stages: preparation, Naïve Bayes, and NBMR. Our proposed algorithm, NBMR, is based on the Naïve Bayes classifier (NBC) and the MapReduce (MR) programming model. NOSQL database documents with nearly the same attributes will belong to the same class, so OLAP cubes can then be used to perform data analysis. Because the proposed model allows for distributed and parallel Naïve Bayes classifier computing, it is appropriate for large-scale data sets. Our proposed model is an efficient approach in terms of both speed and the number of required comparisons.
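A minimal sketch of how Naïve Bayes training decomposes into MapReduce (the stage names are mine, read off the abstract; the real NBMR algorithm is surely more involved): mappers emit per-class attribute counts from each document, and the reducer sums them into the frequencies the classifier needs.

```python
from collections import Counter

def mapper(doc, label):
    # Emit one count per (class, attribute, value) plus a class prior count.
    for attr, value in doc.items():
        yield ((label, attr, value), 1)
    yield (("__class__", label, None), 1)

def reducer(pairs):
    # Sum the partial counts from all mappers.
    counts = Counter()
    for key, n in pairs:
        counts[key] += n
    return counts

docs = [({"type": "invoice", "region": "EU"}, "sales"),
        ({"type": "ticket", "region": "EU"}, "support")]
model = reducer(kv for d, y in docs for kv in mapper(d, y))
print(model)  # conditional and prior counts for the Naive Bayes classifier
```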

Proceedings ArticleDOI
07 Nov 2022
TL;DR: GHive, as discussed by the authors, enhances CPU-based Hive via CPU-GPU heterogeneous computing, enabling efficient data movement between CPU memory and GPU memory and providing a complete set of SQL operators with extensively optimized GPU implementations.
Abstract: As a popular distributed data warehouse system, Apache Hive has been widely used for big data analytics in many organizations. Meanwhile, exploiting the massive parallelism of GPU to accelerate online analytical processing (OLAP) has been extensively explored in the database community. In this paper, we present GHive, which enhances CPU-based Hive via CPU-GPU heterogeneous computing. GHive is designed for the business intelligence applications and provides the same API as Hive for compatibility. To run SQL queries jointly on both CPU and GPU, GHive comes with three key techniques: (i) a novel data model gTable, which is column-based and enables efficient data movement between CPU memory and GPU memory; (ii) a GPU-based operator library Panda, which provides a complete set of SQL operators with extensively optimized GPU implementations; (iii) a hardware-aware MapReduce job placement scheme, which puts jobs judiciously on either GPU or CPU via a cost-based approach. In the experiments, we observe that GHive outperforms Hive in both query processing speed and operating expense on the Star Schema Benchmark (SSB).

Book ChapterDOI
01 Jan 2022
TL;DR: In this article, a machine learning-based method is proposed to detect measures by defining three categories of features for numerical columns; the method is tested on real-world datasets with various machine learning algorithms, concluding that random forest performs best for measure detection.
Abstract: Nowadays, it is difficult for companies and organisations without Business Intelligence (BI) experts to carry out data analyses. Existing automatic data warehouse design methods cannot handle tabular data, which is commonly defined without a schema. Dimensions and hierarchies can still be deduced by detecting functional dependencies, but the detection of measures remains a challenge. To solve this issue, we propose a machine learning-based method to detect measures by defining three categories of features for numerical columns. The method is tested on real-world datasets and with various machine learning algorithms, concluding that random forest performs best for measure detection.
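A hedged sketch of the approach: the idea of three feature categories for numeric columns is the paper's, but the concrete features, toy columns, and labels below are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def column_features(values, name):
    v = np.asarray(values, dtype=float)
    return [
        v.mean(), v.std(), len(np.unique(v)) / len(v),   # 1) value statistics
        float(np.allclose(v, np.round(v))),              # 2) type/format hints
        float(any(w in name.lower()                      # 3) header/metadata hints
                  for w in ("amount", "price", "qty"))),
    ]

cols = {"order_amount": [10.5, 20.0, 13.2], "zip_code": [75011, 75012, 75011]}
labels = [1, 0]   # 1 = measure, 0 = not a measure
X = [column_features(v, n) for n, v in cols.items()]
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
```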

Book ChapterDOI
01 Jan 2022
TL;DR: In this article, the authors propose to use a multidimensional data model as a hypercube, the edges of which are sequences of values of the analyzed parameters, for multidimensional analysis and design of two subject areas.
Abstract: This study addresses the problem of representing the multidimensional design of a master plan for a building territory based on OLAP technology. The building territory is viewed as a complex information system related to the information systems of surrounding areas. Mathematical modeling of such an information system for the multidimensional design of a master plan based on OLAP technology is proposed to solve this problem. A multidimensional data model is used as a hypercube whose edges are sequences of values of the analyzed parameters, enabling multidimensional analysis and design across two subject areas. The first subject area contains background information about the topography, geodesy, and geology of the building territory. The second contains information about the areas surrounding the building territory. OLAP cubes represent multidimensional information objects with associated properties. Each OLTP system is an implementation of a domain model and is designed via a multidimensional matrix, which is represented as an OLAP cube. The application of this concept in a Web-based environment is also covered. Keywords: Mathematical modeling, Master plan, Building territory, OLAP

Proceedings ArticleDOI
01 May 2022
TL;DR: OLxPBench as discussed by the authors is a composite HTAP benchmark suite that includes real-time queries, semantically consistent schema, and domain-specific workloads for benchmarking, designing, and implementing HTAP systems.
Abstract: As real-time analysis of fresh data becomes increasingly compelling, more organizations deploy Hybrid Transactional/Analytical Processing (HTAP) systems to support real-time queries on data recently generated by online transaction processing. This paper argues that real-time queries, semantically consistent schemas, and domain-specific workloads are essential in benchmarking, designing, and implementing HTAP systems. However, most state-of-the-art and state-of-the-practice benchmarks ignore those critical factors. Hence, at best, they are incommensurable and, at worst, misleading in benchmarking, designing, and implementing HTAP systems. This paper presents OLxPBench, a composite HTAP benchmark suite. OLxPBench proposes: (1) the abstraction of a hybrid transaction, performing a real-time query in-between an online transaction, to model the widely observed behavior pattern of making a quick decision while consulting real-time analysis; (2) a semantically consistent schema to express the relationships between OLTP and OLAP schemas; (3) the combination of domain-specific and general benchmarks to characterize diverse application scenarios with varying resource demands. Our evaluations justify the three design decisions of OLxPBench and pinpoint the bottlenecks of two mainstream distributed HTAP DBMSs. The International Open Benchmark Council (BenchCouncil) hosts the OLxPBench homepage at https://www.benchcouncil.org/olxpbench/. Its source code is available from https://github.com/BenchCouncil/olxpbench.git.
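The hybrid-transaction abstraction is easy to picture in code. A toy sketch, using in-process SQLite as a stand-in for a distributed HTAP DBMS and invented tables: an analytical query runs in-between the statements of an online transaction, inside the same transaction boundary.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders(id INTEGER PRIMARY KEY, cust INT, amount REAL);
INSERT INTO orders VALUES (1, 7, 30.0), (2, 7, 45.0);
""")

with con:  # one transaction
    con.execute("INSERT INTO orders VALUES (3, 7, 99.0)")          # OLTP write
    avg = con.execute("SELECT AVG(amount) FROM orders WHERE cust = 7"
                      ).fetchone()[0]                              # real-time analysis
    if avg > 50:   # decide while consulting fresh analytics, then continue
        con.execute("UPDATE orders SET amount = amount * 0.9 WHERE id = 3")
```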


Journal ArticleDOI
TL;DR: The Online Analytical Processing (OLAP) method is an approach that quickly answers requests for dimensional analysis; as discussed in this paper, the goal is to design applications that make it easy for companies to record employee data and attendance and to present reports.
Abstract: Many companies still manage employee data manually, from recording employee records to tracking absences, which can result in data errors. A system is therefore needed that can record every piece of incoming data, together with a method that can control it so that neither excess nor deficiency occurs. The method used is Online Analytical Processing (OLAP), an approach that provides fast answers to requests for dimensional analysis. The goal is to design an application that makes it easy for companies to record employee data and attendance and to present reports quickly, precisely, and accurately.

Journal ArticleDOI
01 Nov 2022
TL;DR: In this article, a context-aware model for textual data warehouses is introduced, along with OLAP operations that improve the efficiency of information management within decision support systems. Existing models are limited to capturing contextual relationships only between strongly related documents.
Abstract:
• This paper introduces a context-aware model for textual data warehouses and defines OLAP operations that improve the efficiency of information management within decision support systems.
• The proposed model supports business applications that represent contextual information, with dimensions organized in hierarchical form to represent the same data in different abstract forms.
• The model dynamically categorizes the documents using word embedding and agglomerative hierarchical clustering algorithms.
• Documents are arranged according to the concept hierarchy existing among the contextual dimensions.
• Extensive experimental results show the effectiveness of our proposal in capturing semantic textual similarity and enhancing decision-making by speeding up the execution of OLAP operations.
Decision Support Systems (DSS) that leverage business intelligence are based on numerical data, and On-line Analytical Processing (OLAP) is often used to implement them. However, business decisions are increasingly dependent on textual data as well. Existing research work on textual data warehouses has the limitation of capturing contextual relationships only when comparing strongly related documents. This paper proposes an Information System (IS) based context-aware model that uses word embedding in conjunction with agglomerative hierarchical clustering algorithms to dynamically categorize documents in order to form the concept hierarchy. The results of the experimental evaluation provide evidence of the effectiveness of integrating textual data into a data warehouse and improving decision making through various OLAP operations.
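A minimal sketch of the categorization step, with TF-IDF standing in for the word embeddings the paper uses and toy documents in place of real ones:

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["quarterly revenue grew", "revenue and profit fell",
        "new hiring policy", "hiring and benefits policy"]
X = TfidfVectorizer().fit_transform(docs).toarray()

# The dendrogram levels of the agglomerative clustering give the concept
# hierarchy for the contextual dimension (roll-up / drill-down in OLAP).
labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)
print(labels)  # expected to separate the revenue docs from the hiring docs
```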

Journal ArticleDOI
11 Mar 2022
TL;DR: In this paper, the authors present the method used to model FSOLAP and manage various types of complex and fuzzy spatiotemporal queries using the Fuzzy Logic and Spatial Online Analytical Processing (FSOLAP) framework.
Abstract: Nowadays, with the rise of sensor technology, the amount of spatial and temporal data is increasing day by day. Modeling data in a structured way and performing effective and efficient complex queries has become more essential than ever. Online analytical processing (OLAP), developed for this purpose, provides appropriate data structures and supports querying multidimensional numeric and alphanumeric data. However, uncertainty and fuzziness are inherent in the data in many complex database applications, especially in spatiotemporal database applications. Therefore, there is always a need to support flexible queries and analyses on uncertain and fuzzy data, due to the nature of the data in these complex spatiotemporal applications. FSOLAP is a new framework based on fuzzy logic technologies and spatial online analytical processing (SOLAP). In this study, we use crisp measures as input for this framework, apply fuzzy operations to obtain the membership functions and fuzzy classes, and then generate fuzzy association rules. Therefore, FSOLAP does not need to use predefined sets of fuzzy inputs. This paper presents the method used to model the FSOLAP and manage various types of complex and fuzzy spatiotemporal queries using the FSOLAP framework. In this context, we describe how to handle non-spatial and fuzzy spatial queries, as well as spatiotemporal fuzzy query types. Additionally, while FSOLAP primarily includes historical data and associated queries and analyses, we also describe how to handle predictive fuzzy spatiotemporal queries, which typically require an inference mechanism.
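A minimal sketch of the fuzzification step described above: a crisp measure is mapped to membership degrees in fuzzy classes via membership functions (triangular here; the class names and breakpoints are illustrative, not FSOLAP's actual ones).

```python
def triangular(x, a, b, c):
    """Membership degree of x in the triangular fuzzy set (a, b, c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

classes = {"low": (-1, 5, 15), "moderate": (10, 20, 30), "high": (25, 35, 50)}
reading = 22.0  # a crisp sensor measure
memberships = {name: triangular(reading, *abc) for name, abc in classes.items()}
print(memberships)  # {'low': 0.0, 'moderate': 0.8, 'high': 0.0}
```

Fuzzy association rules are then mined over these class memberships rather than over the raw crisp values.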

Journal ArticleDOI
TL;DR: Li et al., as mentioned in this paper, proposed an attribute-enriched and meta-path-based model with machine learning to capture similarity based on object connectivity, visibility, and features, which can be obtained from participatory passengers' sensor data through deep-neural-network-based posture recognition.
Abstract: In this paper, we deal with the problem of similarity search about crowdedness for participatory-sensing buses in urban transportation. Similarity search is usually applied for measuring similarities in heterogeneous information networks. However, many models implement similarity search in a global setting, without taking object attributes into consideration. OCP, a novel OLAP-based crowdedness perception model, is attribute-enriched and meta-path-based, using machine learning to capture similarity based on object connectivity, visibility, and features. A set of common crowdedness attribute dimensions are defined across different types of objects, which can be obtained from participatory passengers' sensor data through deep-neural-network-based posture recognition. Accordingly, an object can be described as a series of node vectors from different dimensions. In such a framework, OLAP is applied to analyse multiple resolutions and improve the efficiency of similarity search. In addition, our data sources are based on participatory sensing instead of vehicle GPS systems. As more data are collected through participatory sensing, more accurate crowdedness for a bus can be estimated. The experimental results further demonstrate the efficiency of our analytical approaches.

Journal ArticleDOI
TL;DR: In this article, the Spatio-Textual-Temporal Cube (STTCube) structure is defined and formalized to enable effective and efficient combined analytical queries over STT data.


Proceedings ArticleDOI
22 Jun 2022
TL;DR: This article describes the development and implementation of a Data Analytical Processing Module of the RnMonitor platform, composed of an ETL process, a multidimensional database and an OLAP server.
Abstract: Recently we have seen an increase in the frequency of radon concentration measurement campaigns, largely driven by directives adopted at the European level and by increased public awareness of the problem. Many of the conducted assessments are simple concentration averages over a time range, mostly using passive sensors. Other assessments use portable devices that measure radon continuously, whose data can later be downloaded for analysis. However, other types of systems have emerged that continuously measure indoor radon concentrations and, using wireless communications, make these data available in real time. This is the case of the RnMonitor platform, designed for online indoor radon monitoring in public buildings in northern Portugal using IoT devices with LoRa communication. The project provides a time series database to collect raw data and a multidimensional data warehouse to store a large amount of these data in the long term, making it possible to use OLAP query tools to explore the data in a multidimensional way. Thus, this article describes the development and implementation of the Data Analytical Processing Module of the RnMonitor platform. This module is composed of an ETL process, a multidimensional database, and an OLAP server.

Journal ArticleDOI
TL;DR: The authors describe their journey of building ByteHTAP, an HTAP system with high data freshness and strong data consistency, which adopts a separate-engine and shared-storage architecture and fully utilizes an existing ByteDance OLTP system and an open-source OLAP system.
Abstract: In recent years, at ByteDance, we see more and more business scenarios that require performing complex analysis over freshly imported data, together with transaction support and strong data consistency. In this paper, we describe our journey of building ByteHTAP, an HTAP system with high data freshness and strong data consistency. It adopts a separate-engine and shared-storage architecture. Its modular system design fully utilizes an existing ByteDance OLTP system and an open-source OLAP system. This choice saves us a lot of resources and development time and allows easy future extensions, such as replacing the query processing engine with other alternatives. ByteHTAP can provide high data freshness with less than one second delay, which enables many new business opportunities for our customers. Customers can also configure different data freshness thresholds based on their business needs. ByteHTAP also provides strong data consistency through global timestamps across its OLTP and OLAP systems, which greatly relieves application developers from handling complex data consistency issues by themselves. In addition, we introduce some important performance optimizations to ByteHTAP, such as pushing computations to the storage layer and using delete bitmaps to efficiently handle deletes. Lastly, we share our lessons and best practices in developing and running ByteHTAP in production.
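The delete-bitmap optimization mentioned above lends itself to a toy illustration (the real system's storage layout and API are of course far more involved): deleted rows are marked in a bitmap consulted by scans, so a columnar block never needs rewriting on delete.

```python
class ColumnBlock:
    def __init__(self, values):
        self.values = values
        self.deleted = 0                      # delete bitmap, packed in an int

    def delete(self, row_id):
        self.deleted |= 1 << row_id           # O(1), no block rewrite

    def scan(self):
        for i, v in enumerate(self.values):
            if not (self.deleted >> i) & 1:   # skip rows marked deleted
                yield v

block = ColumnBlock([10, 20, 30])
block.delete(1)
print(list(block.scan()))  # [10, 30]
```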

Proceedings ArticleDOI
01 Aug 2022
TL;DR: Wang et al., as discussed in this paper, proposed a query-level tuning system for distributed databases based on machine learning, which can efficiently recommend knobs according to the features of a query and achieve higher performance under a typical OLAP workload.
Abstract: Knob tuning is important for improving the performance of a database management system. However, the traditional manual tuning method by a DBA is time-consuming and error-prone, and cannot meet the requirements of different database instances. In recent years, research on automatic knob tuning using machine learning algorithms has gradually sprung up, but most approaches only support workload-level knob tuning, and studies on query-level tuning are still at an initial stage. Furthermore, few works focus on knob tuning for distributed databases. In this paper, we propose a query-level tuning system for distributed databases based on machine learning. This system can efficiently recommend knobs according to the features of a query. We deployed our techniques onto CockroachDB, a distributed database, and experimental results show that our system achieves higher performance under a typical OLAP workload. For all categories of queries, our system reduces latency by 9.2% on average, and for some categories of queries, it reduces latency by more than 60%.
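A hedged sketch of query-level knob recommendation (the query features, the knob, and the model are placeholders; the paper's actual pipeline on CockroachDB is not described in this abstract):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Each row: features extracted from one query (e.g., join count, aggregate
# count, estimated cardinality); target: the best observed value of a
# hypothetical memory-budget knob for that query.
rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 3))
y = 64 + 192 * X[:, 2]          # synthetic "best memory budget" in MB

model = GradientBoostingRegressor().fit(X, y)
new_query = [[0.2, 0.1, 0.8]]   # features of an incoming OLAP query
print(f"recommended memory budget: {model.predict(new_query)[0]:.0f} MB")
```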
Abstract: Knob tuning is important to improve the performance of database management system. However, the traditional manual tuning method by DBA is time-consuming and error-prone, and can not meet the requirements of different database instances. In recent years, the research on automatic knob tuning using machine learning algorithm has gradually sprung up, but most of them only support workload-level knob tuning, and the studies on query-level tuning is still in the initial stage. Furthermore, few works are focus on the knob tuning for distributed database. In this paper, we propose a query-level tuning system for distribute database with the machine learning method. This system can efficiently recommend knobs according to the feature of the query. We deployed our techniques onto CockroachDB, a distribute database, and experimental results show that our system achieves higher performance under typical OLAP workload. For all categories of queries, our system reduces the latency by 9.2% on average, and for some categories of queries, this system reduces the latency by more than 60%.