Book Chapter

Leveraging the Data Lake: Current State and Challenges

TLDR
This work investigates existing data lake literature and discusses design and realization aspects of data lakes, such as governance and data models, to identify challenges and research gaps concerning data lake architecture, data lake governance, and a comprehensive strategy to realize data lakes.
Abstract
The digital transformation leads to massive amounts of heterogeneous data challenging traditional data warehouse solutions in enterprises. In order to exploit these complex data for competitive advantages, the data lake recently emerged as a concept for more flexible and powerful data analytics. However, existing literature on data lakes is rather vague and incomplete, and the various realization approaches that have been proposed neither cover all aspects of data lakes nor do they provide a comprehensive design and realization strategy. Hence, enterprises face multiple challenges when building data lakes. To address these shortcomings, we investigate existing data lake literature and discuss various design and realization aspects for data lakes, such as governance or data models. Based on these insights, we identify challenges and research gaps concerning (1) data lake architecture, (2) data lake governance, and (3) a comprehensive strategy to realize data lakes. These challenges still need to be addressed to successfully leverage the data lake in practice.

Citations
Journal Article

On data lake architectures and metadata management

TL;DR: This paper provides a comprehensive state of the art of the different approaches to data lake design, particularly on data lake architectures and metadata management, which are key issues in successful data lakes.
Journal Article

There is no AI without data

TL;DR: In this article, the authors present industry experiences on the data challenges of AI and the call for a data ecosystem for industrial enterprises, and propose a framework for industrial data ecosystems based on AI.
Book Chapter

HANDLE - A Generic Metadata Model for Data Lakes

TL;DR: This work presents HANDLE, a generic metadata model for data lakes, which supports the flexible integration of metadata, data lake zones, metadata on various granular levels, and any metadata categorization and enables comprehensive metadata management in data lakes.
Journal Article

CEBA: A Data Lake for Data Sharing and Environmental Monitoring

TL;DR: In this article, the authors present a platform for environmental data named "Environmental Cloud for the Benefit of Agriculture" (CEBA) for sharing, searching, storing, and visualizing heterogeneous scientific data related to environmental and agricultural research.
References
Journal Article

Service Innovation and Smart Analytics for Industry 4.0 and Big Data Environment

TL;DR: This paper addresses the trends of manufacturing service transformation in a big data environment, as well as the readiness of smart predictive informatics tools to manage big data, thereby achieving transparency and productivity.
Book

Big Data: Principles and best practices of scalable realtime data systems

Nathan Marz, +1 more
TL;DR: Big Data describes a scalable, easy-to-understand approach to big data systems that can be built and run by a small team, taking advantage of clustered hardware along with new tools designed specifically to capture and analyze web-scale data.
Proceedings Article

Constance: An Intelligent Data Lake System

TL;DR: Constance is a data lake system with sophisticated metadata management over raw data extracted from heterogeneous data sources. It discovers, extracts, and summarizes the structural metadata from the data sources, and annotates data and metadata with semantic information to avoid ambiguities.
Proceedings Article

Data Wrangling: The Challenging Journey from the Wild to the Lake.

TL;DR: This paper proposes that what is really needed is a curated data lake, where the lake contents have undergone a curation process that enables their use and delivers the promise of ad hoc data accessibility to users beyond the enterprise IT staff.
Proceedings Article

Managing data lakes in big data era: What's a data lake and why has it became popular in data management ecosystem

Huang Fang
TL;DR: The concept of a data lake is emerging as a popular way to organize and build the next generation of systems to master new big data challenges, but large enterprises still face many concerns and open questions when implementing data lakes.