SciSpace - formally typeset

Data quality

About: Data quality is a research topic. Over its lifetime, 17,235 publications have been published within this topic, receiving 331,716 citations.


Open access · Journal Article
Abstract:
- Understand the need for analyses of large, complex, information-rich data sets.
- Identify the goals and primary tasks of the data-mining process.
- Describe the roots of data-mining technology.
- Recognize the iterative character of a data-mining process and specify its basic steps.
- Explain the influence of data quality on a data-mining process.
- Establish the relation between data warehousing and data mining.

Data mining is an iterative process within which progress is defined by discovery, through either automatic or manual methods. Data mining is most useful in an exploratory analysis scenario in which there are no predetermined notions about what will constitute an "interesting" outcome. Data mining is the search for new, valuable, and nontrivial information in large volumes of data. It is a cooperative effort of humans and computers. Best results are achieved by balancing the knowledge of human experts in describing problems and goals with the search capabilities of computers. In practice, the two primary goals of data mining tend to be prediction and description. Prediction involves using some variables or fields in the data set to predict unknown or future values of other variables of interest. Description, on the other hand, focuses on finding patterns describing the data that can be interpreted by humans. Therefore, it is possible to put data-mining activities into one of two categories: predictive data mining, which produces the model of the system described by the given data set, or descriptive data mining, which produces new, nontrivial information based on the available data set.

Topics: Data stream mining (68%), Concept mining (66%), Data pre-processing (65%)

4,646 Citations
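The predictive/descriptive split described in the abstract above can be illustrated with a small sketch (not from the paper itself; the toy data set and function names are hypothetical): a nearest-neighbour vote stands in for predictive mining, and a simple pattern summary stands in for descriptive mining.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy data set: (hours_studied, passed) records.
records = [(1, 0), (2, 0), (3, 1), (4, 1), (5, 1), (6, 1)]

def predict_pass(hours, data, k=3):
    """Predictive mining sketch: use known fields to predict an unknown value
    via a k-nearest-neighbour vote."""
    nearest = sorted(data, key=lambda r: abs(r[0] - hours))[:k]
    return round(mean(label for _, label in nearest))

def describe(data):
    """Descriptive mining sketch: summarise human-interpretable patterns
    found in the available data."""
    class_counts = Counter(label for _, label in data)
    avg_hours_when_passed = mean(h for h, label in data if label == 1)
    return {"class_counts": dict(class_counts),
            "avg_hours_when_passed": avg_hours_when_passed}
```

Calling `predict_pass(5.5, records)` predicts a label for an unseen input, while `describe(records)` produces a pattern summary a human can read, mirroring the two categories in the abstract.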

Journal Article · DOI: 10.1080/07421222.1996.11518099
Abstract: Poor data quality (DQ) can have substantial social and economic impacts. Although firms are improving data quality with practical approaches and tools, their improvement efforts tend to focus narrowly on accuracy. We believe that data consumers have a much broader data quality conceptualization than IS professionals realize. The purpose of this paper is to develop a framework that captures the aspects of data quality that are important to data consumers.

A two-stage survey and a two-phase sorting study were conducted to develop a hierarchical framework for organizing data quality dimensions. This framework captures dimensions of data quality that are important to data consumers. Intrinsic DQ denotes that data have quality in their own right. Contextual DQ highlights the requirement that data quality must be considered within the context of the task at hand. Representational DQ and accessibility DQ emphasize the importance of the role of systems. These findings are consistent with our understanding that high-quality data should be intrinsically good, contextually appropriate for the task, clearly represented, and accessible to the data consumer.

Our framework has been used effectively in industry and government. Using this framework, IS managers were able to better understand and meet their data consumers' data quality needs. The salient feature of this research study is that quality attributes of data are collected from data consumers instead of being defined theoretically or based on researchers' experience. Although exploratory, this research provides a basis for future studies that measure data quality along the dimensions of this framework.

Topics: Data quality (68%), Data governance (66%), Quality (business) (58%)

3,716 Citations
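The four-category hierarchy from the abstract above can be sketched as a simple lookup table. The dimension names are those reported in Wang and Strong's framework; the `category_of` helper is a hypothetical illustration, not code from the paper.

```python
# The four DQ categories and their dimensions, per the Wang-Strong framework.
DQ_FRAMEWORK = {
    "intrinsic": ["accuracy", "objectivity", "believability", "reputation"],
    "contextual": ["relevancy", "timeliness", "completeness", "value-added",
                   "appropriate amount of data"],
    "representational": ["interpretability", "ease of understanding",
                         "consistent representation", "concise representation"],
    "accessibility": ["accessibility", "access security"],
}

def category_of(dimension):
    """Return the DQ category a given dimension belongs to, or None."""
    for category, dims in DQ_FRAMEWORK.items():
        if dimension in dims:
            return category
    return None
```

A lookup such as `category_of("timeliness")` returns `"contextual"`, reflecting the paper's point that some quality attributes only make sense relative to the task at hand.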

Open access · Journal Article · DOI: 10.1107/S0907444913000061
Philip R. Evans, Garib N. Murshudov
Abstract: Following integration of the observed diffraction spots, the process of 'data reduction' initially aims to determine the point-group symmetry of the data and the likely space group. This can be performed with the program POINTLESS. The scaling program then puts all the measurements on a common scale, averages measurements of symmetry-related reflections (using the symmetry determined previously) and produces many statistics that provide the first important measures of data quality. A new scaling program, AIMLESS, implements scaling models similar to those in SCALA but adds some additional analyses. From the analyses, a number of decisions can be made about the quality of the data and whether some measurements should be discarded. The effective 'resolution' of a data set is a difficult and possibly contentious question (particularly with referees of papers) and this is discussed in the light of tests comparing the data-processing statistics with trials of refinement against observed and simulated data, and automated model-building and comparison of maps calculated with different resolution limits. These trials show that adding weak high-resolution data beyond the commonly used limits may make some improvement and does no harm.

Topics: Data quality (54%), Data set (51%)

2,810 Citations
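Among the merging statistics a scaling program like AIMLESS reports, two standard ones are Rmerge and the redundancy-corrected Rmeas. The minimal sketch below computes both from groups of symmetry-related intensity measurements; it is an illustration of the textbook formulas, not the program's own implementation.

```python
from math import sqrt

def merging_r_factors(groups):
    """Compute (Rmerge, Rmeas) from symmetry-related intensity groups.

    groups: list of lists; each inner list holds the repeated intensity
    measurements of one unique reflection.
    Rmerge = sum |I_i - <I>| / sum I_i
    Rmeas  = sum sqrt(n/(n-1)) |I_i - <I>| / sum I_i  (multiplicity-corrected)
    """
    num_merge = num_meas = denom = 0.0
    for intensities in groups:
        n = len(intensities)
        if n < 2:
            continue  # a single measurement contributes no spread
        mean_i = sum(intensities) / n
        spread = sum(abs(i - mean_i) for i in intensities)
        num_merge += spread
        num_meas += sqrt(n / (n - 1)) * spread
        denom += sum(intensities)
    return num_merge / denom, num_meas / denom
```

Because of the sqrt(n/(n-1)) factor, Rmeas is always at least as large as Rmerge, which is why it is preferred as a multiplicity-independent quality measure.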

Journal Article · DOI: 10.1145/248603.248616
Surajit Chaudhuri, Umeshwar Dayal
01 Mar 1997
Abstract: Data warehousing and on-line analytical processing (OLAP) are essential elements of decision support, which has increasingly become a focus of the database industry. Many commercial products and services are now available, and all of the principal database management system vendors now have offerings in these areas. Decision support places some rather different requirements on database technology compared to traditional on-line transaction processing applications. This paper provides an overview of data warehousing and OLAP technologies, with an emphasis on their new requirements. We describe back-end tools for extracting, cleaning and loading data into a data warehouse; multidimensional data models typical of OLAP; front-end client tools for querying and data analysis; server extensions for efficient query processing; and tools for metadata management and for managing the warehouse. In addition to surveying the state of the art, this paper also identifies some promising research issues, some related to problems that the database research community has worked on for years, while others are only just beginning to be addressed. This overview is based on a tutorial that the authors presented at the VLDB Conference, 1996.

Topics: Online analytical processing (66%), Database design (61%), Data warehouse (61%)

2,770 Citations
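The multidimensional aggregation at the heart of OLAP can be sketched as a group-by rollup over a small fact table, in the spirit of the CUBE/ROLLUP operators the survey discusses. The fact table, dimension names, and `rollup` helper below are hypothetical illustrations, not any vendor's implementation.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table: (region, product, quarter, sales).
facts = [
    ("EU", "widget", "Q1", 100),
    ("EU", "gadget", "Q1", 150),
    ("US", "widget", "Q1", 200),
    ("US", "widget", "Q2", 120),
]

DIM_NAMES = ("region", "product", "quarter")

def rollup(facts, dims):
    """Aggregate the sales measure over every subset of the given dimensions,
    mimicking an OLAP cube: the empty key () holds the grand total."""
    cube = defaultdict(float)
    for r in range(len(dims) + 1):
        for keep in combinations(dims, r):
            for *coords, sales in facts:
                key = tuple((d, coords[DIM_NAMES.index(d)]) for d in keep)
                cube[key] += sales
    return dict(cube)

cube = rollup(facts, ["region", "quarter"])
```

Here `cube[()]` is the grand total, `cube[(("region", "EU"),)]` the EU subtotal, and `cube[(("region", "US"), ("quarter", "Q1"))]` a cell of the full cross-tabulation, so one pass materialises every aggregation level a slice-and-dice query might ask for.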

Open access · Book
21 Aug 1986
Abstract (contents):
- Geographical information systems
- Data structures for thematic maps
- Digital elevation models
- Data input, verification, storage, and output
- Methods of data analysis and spatial modelling
- Data quality, errors, and natural variation: sources of error
- Errors arising through processing
- The nature of boundaries
- Classification methods
- Methods of spatial interpolation
- Choosing a geographical information system
- Appendices
- Index

Topics: Geographic information system (60%), Geospatial analysis (58%), Data quality (56%)

2,505 Citations
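Among the book's topics is "Methods of spatial interpolation"; one classic such method is inverse distance weighting (IDW), sketched minimally below. The sample coordinates and the `idw` helper are hypothetical illustrations, not code from the book.

```python
from math import dist

def idw(points, query, power=2):
    """Inverse-distance-weighted estimate at `query` from (x, y, value) samples:
    each sample's weight is 1 / distance**power, so nearer points dominate."""
    num = den = 0.0
    for x, y, value in points:
        d = dist((x, y), query)
        if d == 0:
            return value  # query coincides with a sample point
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

samples = [(0, 0, 10.0), (1, 0, 20.0), (0, 1, 30.0)]
```

At a location equidistant from all samples, the weights are equal and IDW reduces to the plain mean, which makes the `power` parameter the knob controlling how local the interpolation is.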

[Chart: number of papers in the topic in previous years]

Top Attributes


Topic's top 5 most impactful authors:
- Mario Piattini: 47 papers, 775 citations
- Ismael Caballero: 45 papers, 502 citations
- Markus Helfert: 33 papers, 343 citations
- Monica Scannapieco: 29 papers, 1.4K citations
- Richard Y. Wang: 24 papers, 8.3K citations

Network Information

Related Topics (5):
- Data management: 31.5K papers, 424.3K citations (90% related)
- Information system: 107.5K papers, 1.8M citations (85% related)
- Decision support system: 54.8K papers, 921.5K citations (85% related)
- Missing data: 21.3K papers, 784.9K citations (85% related)
- Random forest: 13.3K papers, 345.3K citations (85% related)