Open Access · Journal Article

Building a novel physical design of a distributed big data warehouse over a Hadoop cluster to enhance OLAP cube query performance

TL;DR
Wang et al. proposed a data placement strategy for a big data warehouse over a Hadoop cluster that improves the projection, selection, and star-join operations of OLAP queries, so that the system optimizer can perform the star join locally, in a single Spark stage without a shuffle phase.
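The property described above, a star join that each node evaluates on its own data with no shuffle, can be illustrated with a small Python sketch. It assumes a generic placement scheme chosen purely for illustration, not the paper's actual strategy: the fact table and its largest dimension are hash-partitioned on their shared key, and the small dimensions are replicated to every node. The functions `hash_partition`, `place`, and `local_star_join`, and the toy `sales`, `product`, and `date` tables, are hypothetical.

```python
"""Sketch: shuffle-free star join via co-partitioning plus dimension replication.

Hypothetical placement scheme for illustration only; not the paper's strategy.
"""


def hash_partition(rows, key, n_nodes):
    """Split rows (dicts) into n_nodes buckets by hashing the value of `key`."""
    buckets = [[] for _ in range(n_nodes)]
    for row in rows:
        buckets[hash(row[key]) % n_nodes].append(row)
    return buckets


def place(fact, big_dim, small_dims, big_key, n_nodes):
    """Co-partition fact and big_dim on big_key; replicate every small dimension.

    small_dims maps a join-key column name to that dimension's rows.
    """
    fact_parts = hash_partition(fact, big_key, n_nodes)
    dim_parts = hash_partition(big_dim, big_key, n_nodes)
    return [
        {"fact": fact_parts[i], "big_dim": dim_parts[i], "small_dims": small_dims}
        for i in range(n_nodes)
    ]


def local_star_join(node, big_key):
    """Inner star join that reads only the data stored on one node."""
    big_index = {d[big_key]: d for d in node["big_dim"]}
    small_index = {
        key: {d[key]: d for d in rows} for key, rows in node["small_dims"].items()
    }
    joined = []
    for f in node["fact"]:
        match = big_index.get(f[big_key])
        if match is None:
            continue  # no dimension row for this key: dropped by the inner join
        row = {**f, **match}
        for key, index in small_index.items():
            dim_row = index.get(f[key])
            if dim_row is None:
                break  # missing small-dimension row: drop this fact row
            row.update(dim_row)
        else:
            joined.append(row)
    return joined


if __name__ == "__main__":
    product = [{"product_id": p, "category": f"cat{p % 3}"} for p in range(10)]
    date = [{"date_id": d, "year": 2020 + d % 2} for d in range(4)]
    sales = [{"product_id": i % 10, "date_id": i % 4, "amount": i} for i in range(100)]

    nodes = place(sales, product, {"date_id": date}, "product_id", n_nodes=3)
    results = [r for node in nodes for r in local_star_join(node, "product_id")]
    print(len(results))  # 100: every sale matches one product and one date
```

Because every fact row lands on the same node as its matching dimension rows, no node needs data held by another; in an engine such as Spark, that locality is what lets a join run without a shuffle. The cost of this particular scheme is storage: replicated dimensions are copied to every node, which is only cheap when they are small.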
About
This article was published in Parallel Computing on 2022-03-01 and is currently open access. It has received 6 citations to date. Its topics are computer science and online analytical processing (OLAP).


Citations
Book Chapter

Security Scheduling Method of Cloud Network Big Data Cluster Based on Association Rule Algorithm

TL;DR: Wang et al. proposed a security scheduling method for cloud network big data clusters based on an association rule algorithm. The method combines the collected big data information parameters, sets up a link bandwidth time list structure, and computes the packet routing index according to the connection form of the SDN scheduling system.
Journal Article

Decision-Tree-Based Horizontal Fragmentation Method for Data Warehouses

TL;DR: This paper presents FTree, a horizontal fragmentation method that uses decision trees to fragment data warehouses, taking advantage of the effectiveness of decision trees in classification.
Journal Article

Data Storage Optimization Model Based on Improved Simulated Annealing Algorithm

TL;DR: The authors proposed a data storage optimization model for smart grids based on the Hadoop architecture. Drawing on the characteristics of distributed storage in cloud computing, the model treats smart grid data as a task-oriented data set.
Proceedings Article

Application of IoT and Artificial Intelligence Technology in Smart Parking Management

Yong Ling Chu and one co-author
TL;DR: This paper presents an intelligent parking system architecture based on IoT technology. To improve drivers' parking efficiency in the parking lot, it introduces an improved artificial intelligence algorithm, ACA, for parking space path planning, and it deploys the intelligent parking system's development environment under simulation conditions, using a real parking lot as the scenario.
References
Proceedings Article

Spark: cluster computing with working sets

TL;DR: Spark can outperform Hadoop by 10x in iterative machine learning jobs, and can be used to interactively query a 39 GB dataset with sub-second response time.
Proceedings Article

Spark SQL: Relational Data Processing in Spark

TL;DR: Spark SQL is a new module in Apache Spark that integrates relational processing with Spark's functional programming API, and includes a highly extensible optimizer, Catalyst, built using features of the Scala programming language.
Proceedings Article

Hive - a petabyte scale data warehouse using Hadoop

TL;DR: This paper presents Hive, an open-source data warehousing solution built on top of Hadoop. Hive supports queries expressed in HiveQL, a SQL-like declarative language; these queries are compiled into MapReduce jobs that are executed on Hadoop.
Journal Article

HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads

TL;DR: This paper explores the feasibility of building a hybrid system that takes the best features from both technologies; the prototype built approaches parallel databases in performance and efficiency, yet still yields the scalability, fault tolerance, and flexibility of MapReduce-based systems.
Proceedings Article

BlinkDB: queries with bounded errors and bounded response times on very large data

TL;DR: BlinkDB allows users to trade off query accuracy for response time, enabling interactive queries over massive data by running queries on data samples and presenting results annotated with meaningful error bars.
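BlinkDB's central idea, answering an aggregate from a sample and attaching an error bar, can be sketched in a few lines of Python. This is a simplified illustration, assuming a single uniform sample drawn without replacement and a normal-approximation confidence interval for AVG; BlinkDB itself maintains multiple samples, including stratified ones, and selects among them at query time. The function name `approx_avg` and its parameters are hypothetical.

```python
"""Sketch: approximate AVG from a uniform sample, with an error bar.

Simplified illustration only; BlinkDB's actual sampling is more elaborate.
"""
import math
import random
import statistics


def approx_avg(values, sample_size, z=1.96, seed=None):
    """Estimate mean(values) from a uniform random sample without replacement.

    Returns (estimate, half_width): estimate +/- half_width is a
    normal-approximation confidence interval (z=1.96 is roughly 95%).
    """
    n, N = sample_size, len(values)
    if not 2 <= n <= N:
        raise ValueError("sample_size must be between 2 and len(values)")
    sample = random.Random(seed).sample(values, n)
    estimate = statistics.fmean(sample)
    # Standard error of the sample mean, with the finite-population correction.
    fpc = math.sqrt((N - n) / (N - 1))
    half_width = z * statistics.stdev(sample) / math.sqrt(n) * fpc
    return estimate, half_width


if __name__ == "__main__":
    data = [float(i % 100) for i in range(100_000)]
    est, err = approx_avg(data, 1_000, seed=42)
    print(f"AVG ~= {est:.2f} +/- {err:.2f}  (exact: {statistics.fmean(data):.2f})")
```

A larger `sample_size` narrows the interval at the cost of reading more rows, which is the accuracy-versus-response-time trade-off the entry describes; when the sample is the whole table, the finite-population correction drives the error bar to zero.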