Building a novel physical design of a distributed big data warehouse over a Hadoop cluster to enhance OLAP cube query performance

doi:10.1016/j.parco.2022.102918

Open Access Journal Article

Yassine Ramdane, +4 more

- 01 Mar 2022 -

Parallel Computing

- Vol. 111, pp 102918-102918

TLDR

Ramdane et al. propose a data placement strategy for a big data warehouse over a Hadoop cluster that enhances the projection, selection, and star-join operations of an OLAP query, so that the system optimizer can perform the star join locally, in a single Spark stage without a shuffle phase.
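The idea of a local star join without a shuffle can be illustrated with a minimal sketch. It is not the paper's placement algorithm: it assumes a single join key, a simple modulo placement rule, and toy tables, and every name in it (`NUM_NODES`, `node_of`, `place`, `local_star_join`) is hypothetical. It only shows that when fact and dimension rows are placed by the same rule on the join key, each node can join its own data and the union of the local results equals the centralized join.

```python
from collections import defaultdict

NUM_NODES = 3  # hypothetical cluster size


def node_of(key, num_nodes=NUM_NODES):
    """Map an integer join key to a node (assumed placement rule: modulo)."""
    return key % num_nodes


def place(rows, key_field, num_nodes=NUM_NODES):
    """Distribute rows across nodes according to their join key."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_of(row[key_field], num_nodes)].append(row)
    return nodes


def local_star_join(fact_part, dim_part, key_field):
    """Join fact rows with the dimension rows held in the same partition."""
    lookup = {d[key_field]: d for d in dim_part}
    return [
        {**f, **lookup[f[key_field]]}
        for f in fact_part
        if f[key_field] in lookup
    ]


# Toy dimension and fact tables (illustrative data only).
customers = [{"cust_id": i, "region": "EU" if i % 2 else "US"} for i in range(6)]
sales = [{"sale_id": s, "cust_id": s % 6, "amount": 10 * s} for s in range(12)]

# Both tables are placed with the same rule on the same key.
fact_nodes = place(sales, "cust_id")
dim_nodes = place(customers, "cust_id")

# Each node joins only the rows it stores; no rows move between nodes.
distributed = [
    row
    for n in range(NUM_NODES)
    for row in local_star_join(fact_nodes[n], dim_nodes[n], "cust_id")
]

# Reference result: one join over the full, unpartitioned tables.
centralized = local_star_join(sales, customers, "cust_id")
```

Because matching keys always land on the same node, no fact row needs a dimension row stored elsewhere, which is the property that lets an engine skip the shuffle.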

About:

This article is published in Parallel Computing. The article was published on 2022-03-01 and is currently open access. It has received 6 citations to date. The article focuses on the topics: Computer science & Online analytical processing.

Citations

Book Chapter

Security Scheduling Method of Cloud Network Big Data Cluster Based on Association Rule Algorithm

Te-Fu Peng, +1 more

- 01 Jan 2023 -

Lecture Notes in Computer Science

TL;DR: The authors propose a cloud network big data cluster security scheduling method based on an association rule algorithm, which combines the obtained big data information parameters, sets the link bandwidth time list structure, and solves for the specific value of the packet routing index according to the connection form of the SDN scheduling system.

Journal Article

Decision-Tree-Based Horizontal Fragmentation Method for Data Warehouses

Nidia Rodríguez Mazahua, +4 more

- 28 Oct 2022 -

Applied Sciences

TL;DR: In this paper, a horizontal fragmentation method called FTree is presented; it uses decision trees to fragment data warehouses, taking advantage of the effectiveness of this technique in classification.

Proceedings Article

Application of IoT and Artificial Intelligence Technology in Smart Parking Management

TL;DR: In this paper, an intelligent parking system architecture based on IoT technology is presented, and an improved artificial intelligence ACA is used for parking space path planning, taking a parking lot as a real scene under simulation conditions.

Journal Article

Data Storage Optimization Model Based on Improved Simulated Annealing Algorithm

Qiang Wang, +3 more

- 28 Apr 2023 -

Sustainability

TL;DR: In this article, the authors propose a data storage optimization model for smart grids based on the Hadoop architecture that combines the characteristics of distributed storage in cloud computing; the smart grid data are treated as equivalent to a task-oriented data set.

Proceedings Article

Application of IoT and Artificial Intelligence Technology in Smart Parking Management

Yong Ling Chu, +1 more

TL;DR: In this paper, an intelligent parking system architecture based on IoT technology is presented. To improve the parking efficiency of drivers in the parking lot, an improved artificial intelligence ACA is presented for parking space path planning, and, taking a parking lot as a real scene under simulation conditions, the intelligent parking system development environment is deployed.

References

Proceedings Article

Spark: cluster computing with working sets

Matei Zaharia, +4 more

TL;DR: Spark can outperform Hadoop by 10x in iterative machine learning jobs, and can be used to interactively query a 39 GB dataset with sub-second response time.

Proceedings Article

Spark SQL: Relational Data Processing in Spark

Michael Armbrust, +10 more

TL;DR: Spark SQL is a new module in Apache Spark that integrates relational processing with Spark's functional programming API, and includes a highly extensible optimizer, Catalyst, built using features of the Scala programming language.

Proceedings Article

Hive - a petabyte scale data warehouse using Hadoop

Ashish Thusoo, +8 more

TL;DR: Hive is presented, an open-source data warehousing solution built on top of Hadoop that supports queries expressed in a SQL-like declarative language, HiveQL, which are compiled into map-reduce jobs that are executed using Hadoop.

Journal Article

HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads

Azza Abouzeid, +4 more

TL;DR: This paper explores the feasibility of building a hybrid system that takes the best features from both technologies; the prototype built approaches parallel databases in performance and efficiency, yet still yields the scalability, fault tolerance, and flexibility of MapReduce-based systems.

Proceedings Article

BlinkDB: queries with bounded errors and bounded response times on very large data

Sameer Agarwal, +5 more

TL;DR: BlinkDB allows users to trade off query accuracy for response time, enabling interactive queries over massive data by running queries on data samples and presenting results annotated with meaningful error bars.

Related Papers (5)

Exploiting versions for on-line data warehouse maintenance in MOLAP servers

Data Warehouse Striping: Improved Query Response Time.

Improved multi-level association rule in mining algorithm based on a multidimensional data cube

Data Warehousing and Knowledge Discovery: First International Conference, DaWaK'99 Florence, Italy, August 30 - September 1, 1999 Proceedings

Meta Galaxy: A Flexible and Efficient Cube Model for Data Retrieval in OLAP