Book Chapter

A DaQL to Monitor Data Quality in Machine Learning Applications

TLDR
DaQL is a generally applicable tool that continuously monitors data quality in order to increase the prediction accuracy of machine learning models. It is demonstrated and evaluated within a real-world industrial machine learning application at Siemens.
Abstract
Machine learning models can only be as good as the data used to train them. Despite this obvious correlation, there is little research on data quality measurement to ensure the reliability and trustworthiness of machine learning models. Especially in industrial settings, where sensors produce large amounts of highly volatile data, a one-time measurement of data quality is not sufficient, since errors in new data should be detected as early as possible. Thus, in this paper, we present DaQL (Data Quality Library), a generally applicable tool to continuously monitor the quality of data in order to increase the prediction accuracy of machine learning models. We demonstrate and evaluate DaQL within a real-world industrial machine learning application at Siemens.
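To make the idea of continuous monitoring concrete, the following is a minimal Python sketch that checks every incoming batch of sensor readings rather than measuring quality once. It is not DaQL's API, which the abstract does not describe. The names `BatchReport`, `check_batch`, and `monitor`, the two checks (missing values and range violations), and the threshold values are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass
class BatchReport:
    """Quality summary for one batch (hypothetical structure, not from DaQL)."""
    n_records: int
    missing_ratio: float        # share of readings that are None
    out_of_range_ratio: float   # share of present readings outside [low, high]
    passed: bool


def check_batch(
    values: Iterable[Optional[float]],
    low: float,
    high: float,
    max_missing: float = 0.05,        # assumed tolerance, chosen for illustration
    max_out_of_range: float = 0.01,   # assumed tolerance, chosen for illustration
) -> BatchReport:
    """Apply two simple checks to a single batch of readings."""
    readings: List[Optional[float]] = list(values)
    n = len(readings)
    if n == 0:
        # An empty batch is treated as a failure in this sketch.
        return BatchReport(0, 0.0, 0.0, False)

    present = [v for v in readings if v is not None]
    missing_ratio = (n - len(present)) / n
    n_out = sum(1 for v in present if not (low <= v <= high))
    out_of_range_ratio = n_out / len(present) if present else 0.0

    passed = missing_ratio <= max_missing and out_of_range_ratio <= max_out_of_range
    return BatchReport(n, missing_ratio, out_of_range_ratio, passed)


def monitor(
    batches: Iterable[Iterable[Optional[float]]], low: float, high: float
) -> Iterator[BatchReport]:
    """Yield one report per batch as it arrives, so errors surface early."""
    for batch in batches:
        yield check_batch(batch, low, high)


if __name__ == "__main__":
    # Made-up readings: the second batch has a missing value and an implausible spike.
    stream = [
        [20.1, 20.4, 19.8, 20.0],
        [20.2, None, 250.0, 20.1],
    ]
    for i, report in enumerate(monitor(stream, low=-40.0, high=120.0)):
        print(i, report)
```

With these made-up readings, the first batch passes and the second fails both checks. In a real deployment the checks and thresholds would be chosen per data source, and a failed report would trigger an alert before the data reaches model training or inference.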


Citations
Journal Article

AI System Engineering—Key Challenges and Lessons Learned

TL;DR: In this paper, the main challenges of deep learning systems are discussed together with the lessons learned from past and ongoing research along the development cycle of machine learning systems, taking into account the intrinsic conditions of today's deep learning models, data and software quality issues, and human-centered artificial intelligence postulates.
Journal Article

Design matters in patient-level prediction: evaluation of a cohort vs. case-control design when developing predictive models in observational healthcare datasets

TL;DR: It is shown that careful construction of a case-control design can lead to discriminative performance comparable to that of a cohort design, but case-control designs over-represent the outcome class, leading to miscalibration.
Proceedings Article

A Layered Quality Framework for Machine Learning-driven Data and Information Models.

TL;DR: This work introduces a framework that presents a range of quality factors for data and the resulting machine-learning-generated information models, taking into account the different types of machine learning information models as well as the value types that these models provide.
Journal Article

Constructing Dependable Data-Driven Software With Machine Learning

TL;DR: This work investigates machine learning-driven construction techniques combined with pattern-based architecture, showing that system dependability is linked to data and function quality.
Book Chapter

Applying AI in Practice: Key Challenges and Lessons Learned

TL;DR: The analysis outlines a fundamental theory-practice gap which superimposes the challenges of AI system engineering at the level of data quality assurance, model building, software engineering and deployment.