Book Chapter
Leveraging the Data Lake: Current State and Challenges
Corinna Giebler, Christoph Gröger, Eva Hoos, Holger Schwarz, Bernhard Mitschang
pp. 179-188
TL;DR: This work investigates existing data lake literature and discusses design and realization aspects of data lakes, such as governance and data models, to identify challenges and research gaps concerning data lake architecture, data lake governance, and a comprehensive strategy to realize data lakes.
Abstract:
The digital transformation leads to massive amounts of heterogeneous data challenging traditional data warehouse solutions in enterprises. In order to exploit these complex data for competitive advantages, the data lake recently emerged as a concept for more flexible and powerful data analytics. However, existing literature on data lakes is rather vague and incomplete, and the various realization approaches that have been proposed neither cover all aspects of data lakes nor do they provide a comprehensive design and realization strategy. Hence, enterprises face multiple challenges when building data lakes. To address these shortcomings, we investigate existing data lake literature and discuss various design and realization aspects for data lakes, such as governance or data models. Based on these insights, we identify challenges and research gaps concerning (1) data lake architecture, (2) data lake governance, and (3) a comprehensive strategy to realize data lakes. These challenges still need to be addressed to successfully leverage the data lake in practice.
Citations
Journal Article
On data lake architectures and metadata management
TL;DR: This paper provides a comprehensive state of the art of the different approaches to data lake design, particularly on data lake architectures and metadata management, which are key issues in successful data lakes.
Journal Article
There is no AI without data
TL;DR: In this article, the authors present industry experiences on the data challenges of AI and the call for a data ecosystem for industrial enterprises, and propose a framework for industrial data ecosystems based on AI.
Book Chapter
HANDLE - A Generic Metadata Model for Data Lakes
TL;DR: This work presents HANDLE, a generic metadata model for data lakes, which supports the flexible integration of metadata, data lake zones, metadata on various granular levels, and any metadata categorization and enables comprehensive metadata management in data lakes.
Journal Article
CEBA: A Data Lake for Data Sharing and Environmental Monitoring
TL;DR: In this article, the authors present a platform for environmental data named "Environmental Cloud for the Benefit of Agriculture" (CEBA) for sharing, searching, storing, and visualizing heterogeneous scientific data related to environmental and agricultural research.
References
Journal Article
Service Innovation and Smart Analytics for Industry 4.0 and Big Data Environment
Jay Lee, Hung An Kao, Shanhu Yang
TL;DR: This paper addresses the trends of manufacturing service transformation in a big data environment, as well as the readiness of smart predictive informatics tools to manage big data, thereby achieving transparency and productivity.
Book
Big Data: Principles and best practices of scalable realtime data systems
Nathan Marz, James Warren
TL;DR: Big Data describes a scalable, easy to understand approach to big data systems that can be built and run by a small team that takes advantage of clustered hardware along with new tools designed specifically to capture and analyze web-scale data.
Proceedings Article
Constance: An Intelligent Data Lake System
TL;DR: Constance is a Data Lake system with sophisticated metadata management over raw data extracted from heterogeneous data sources that discovers, extracts, and summarizes the structural metadata from the data sources, and annotates data and metadata with semantic information to avoid ambiguities.
Proceedings Article
Data Wrangling: The Challenging Yourney from the Wild to the Lake.
TL;DR: This paper proposes that what is really needed is a curated data lake, where the lake contents have undergone a curation process that enables their use and delivers on the promise of ad-hoc data accessibility to users beyond the enterprise IT staff.
Proceedings Article
Managing data lakes in big data era: What's a data lake and why has it became popular in data management ecosystem
TL;DR: The concept of a data lake is emerging as a popular way to organize and build the next generation of systems to master new big data challenges, but there are lots of concerns and questions for large enterprises to implement data lakes.