Book Chapter
A DaQL to Monitor Data Quality in Machine Learning Applications
Lisa Ehrlinger, Verena Haunschmid, Davide Palazzini, Christian Lettner
- pp 227-237
TL;DR: DaQL, a generally applicable tool that continuously monitors data quality to increase the prediction accuracy of machine learning models, is presented, then demonstrated and evaluated within a real-world industrial machine learning application at Siemens.
Abstract: Machine learning models can only be as good as the data used to train them. Despite this obvious correlation, there is little research about data quality measurement to ensure the reliability and trustworthiness of machine learning models. Especially in industrial settings, where sensors produce large amounts of highly volatile data, a one-time measurement of the data quality is not sufficient since errors in new data should be detected as early as possible. Thus, in this paper, we present DaQL (Data Quality Library), a generally applicable tool to continuously monitor the quality of data to increase the prediction accuracy of machine learning models. We demonstrate and evaluate DaQL within an industrial real-world machine learning application at Siemens.
Citations
Journal Article
AI System Engineering—Key Challenges and Lessons Learned
Lukas Fischer, Lisa Ehrlinger, Verena Geist, Rudolf Ramler, Florian Sobieczky, Werner Zellinger, David Brunner, Mohit Kumar, Bernhard Moser
TL;DR: This paper discusses the main challenges of deep learning systems together with lessons learned from past and ongoing research along the development cycle of machine learning systems, taking into account the intrinsic conditions of today's deep learning models, data and software quality issues, and human-centered artificial intelligence postulates.
Journal Article
Design matters in patient-level prediction: evaluation of a cohort vs. case-control design when developing predictive models in observational healthcare datasets
TL;DR: It is shown that careful construction of a case-control design can lead to discriminative performance comparable to that of a cohort design, but case-control designs over-represent the outcome class, leading to miscalibration.
Proceedings Article
A Layered Quality Framework for Machine Learning-driven Data and Information Models.
Shelernaz Azimi, Claus Pahl
TL;DR: This work introduces a framework that presents a range of quality factors for data and the resulting machine-learning-generated information models, taking into account the different types of machine learning information models as well as the value types that these models provide.
Journal Article
Constructing Dependable Data-Driven Software With Machine Learning
Claus Pahl, Shelernaz Azimi
TL;DR: This work investigates machine learning-driven construction techniques combined with pattern-based architecture, showing that system dependability is linked to data and function quality.
Book Chapter
Applying AI in Practice: Key Challenges and Lessons Learned
Lukas Fischer, Lisa Ehrlinger, Verena Geist, Rudolf Ramler, Florian Sobieczky, Werner Zellinger, Bernhard Moser
TL;DR: The analysis outlines a fundamental theory-practice gap which superimposes the challenges of AI system engineering at the level of data quality assurance, model building, software engineering and deployment.
References
Journal Article
Beyond accuracy: what data quality means to data consumers
Richard Y. Wang, Diane M. Strong
TL;DR: Using this framework, IS managers were able to better understand and meet their data consumers' data quality needs, and this research provides a basis for future studies that measure data quality along the dimensions of this framework.
Journal Article
The impact of poor data quality on the typical enterprise
TL;DR: The typical executive is already besieged by too many problems: low customer satisfaction, high costs, a data warehouse project that is late, and so forth. This article aims to increase awareness by summarizing the impacts of poor data quality on a typical enterprise.
Journal Article
A Review of Methods for Missing Data
TL;DR: Model-based methods such as maximum likelihood using the EM algorithm and multiple imputation hold more promise for dealing with difficulties caused by missing data.
Book
Measuring Data Quality for Ongoing Improvement: A Data Quality Assessment Framework
TL;DR: The Data Quality Assessment Framework shows you how to measure and monitor data quality, ensuring quality over time through a detailed framework of more than three dozen measurement types related to five objective dimensions of quality.
The Effects of Data Quality on Machine Learning Algorithms.
TL;DR: This research investigates the effects of data quality on these algorithms in an effort to demonstrate that data quality is a large factor in their outcomes and should be given more consideration in their design.