Topic

Test data

About: Test data is a research topic. Over its lifetime, 22,460 publications have been published within this topic, receiving 260,060 citations.


Papers
Journal ArticleDOI
07 Nov 2019-PLOS ONE
TL;DR: The authors' simulations show that K-fold cross-validation (CV) produces strongly biased performance estimates with small sample sizes, and the bias is still evident with a sample size of 1000, while nested CV and train/test split approaches produce robust and unbiased performance estimates regardless of sample size.
Abstract: Advances in neuroimaging, genomic, motion tracking, eye-tracking and many other technology-based data collection methods have led to a torrent of high dimensional datasets, which commonly have a small number of samples because of the intrinsic high cost of data collection involving human participants. High dimensional data with a small number of samples is of critical importance for identifying biomarkers and conducting feasibility and pilot work; however, it can lead to biased machine learning (ML) performance estimates. Our review of studies which have applied ML to predict autistic from non-autistic individuals showed that small sample size is associated with higher reported classification accuracy. Thus, we have investigated whether this bias could be caused by the use of validation methods which do not sufficiently control overfitting. Our simulations show that K-fold Cross-Validation (CV) produces strongly biased performance estimates with small sample sizes, and the bias is still evident with a sample size of 1000. Nested CV and train/test split approaches produce robust and unbiased performance estimates regardless of sample size. We also show that feature selection, if performed on pooled training and testing data, contributes considerably more to bias than parameter tuning. In addition, the contribution to bias by data dimensionality, hyper-parameter space and number of CV folds was explored, and validation methods were compared with discriminable data. The results suggest how to design robust testing methodologies when working with small datasets and how to interpret the results of other studies based on what validation method was used.
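The pitfall the abstract describes can be illustrated with a short, hedged scikit-learn sketch (not the authors' code; the random data, the k=10 selection and the SVC classifier are illustrative assumptions): selecting features on the pooled data before cross-validation leaks label information and inflates accuracy even on pure noise, whereas nested CV keeps selection and tuning inside each training fold.

    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.model_selection import GridSearchCV, cross_val_score
    from sklearn.pipeline import Pipeline
    from sklearn.svm import SVC

    rng = np.random.RandomState(0)
    X = rng.randn(40, 1000)           # 40 samples, 1000 features, no real signal
    y = rng.randint(0, 2, size=40)    # random binary labels

    # Biased protocol: features are chosen using all labels, then cross-validated.
    X_leaky = SelectKBest(f_classif, k=10).fit_transform(X, y)
    biased = cross_val_score(SVC(), X_leaky, y, cv=5).mean()

    # Nested CV: selection and tuning are confined to each outer training fold.
    pipe = Pipeline([("select", SelectKBest(f_classif, k=10)), ("svc", SVC())])
    inner = GridSearchCV(pipe, {"svc__C": [0.1, 1, 10]}, cv=3)
    unbiased = cross_val_score(inner, X, y, cv=5).mean()

    print(f"pooled-selection accuracy: {biased:.2f}")    # typically well above 0.5
    print(f"nested-CV accuracy:        {unbiased:.2f}")  # near chance

On such noise labels the pooled-selection estimate usually lands far above chance while the nested estimate stays near 0.5, mirroring the bias pattern reported above.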

622 citations

Journal ArticleDOI
TL;DR: Experimental results confirmed the effectiveness and the reliability of both the DASVM technique and the proposed circular validation strategy for validating the learning of domain adaptation classifiers when no true labels for the target-domain instances are available.
Abstract: This paper addresses pattern classification in the framework of domain adaptation by considering methods that solve problems in which training data are assumed to be available only for a source domain different (even if related) from the target domain of (unlabeled) test data. Two main novel contributions are proposed: 1) a domain adaptation support vector machine (DASVM) technique which extends the formulation of support vector machines (SVMs) to the domain adaptation framework and 2) a circular indirect accuracy assessment strategy for validating the learning of domain adaptation classifiers when no true labels for the target-domain instances are available. Experimental results, obtained on a series of two-dimensional toy problems and on two real data sets related to brain computer interface and remote sensing applications, confirmed the effectiveness and the reliability of both the DASVM technique and the proposed circular validation strategy.
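The circular validation idea can be sketched roughly as follows (a hedged, simplified illustration, not the paper's DASVM implementation; a plain SVC stands in for the adapted classifier and the function name is hypothetical): target predictions are treated as pseudo-labels, a second classifier is trained back from the target domain, and its agreement with the known source labels serves as an indirect accuracy estimate.

    from sklearn.metrics import accuracy_score
    from sklearn.svm import SVC

    def circular_score(X_src, y_src, X_tgt, base=SVC):
        """Indirect accuracy estimate when no target-domain labels exist."""
        # Forward step: a classifier adapted from source to target; a plain SVC
        # trained on the source stands in here for the adapted (e.g. DASVM) model.
        forward = base().fit(X_src, y_src)
        y_tgt_pseudo = forward.predict(X_tgt)      # pseudo-labels on the target
        # Backward step: train on the target pseudo-labels and check how well the
        # known source labels are reproduced; high agreement suggests a consistent
        # adaptation, low agreement flags an unreliable solution.
        backward = base().fit(X_tgt, y_tgt_pseudo)
        return accuracy_score(y_src, backward.predict(X_src))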

599 citations

Journal ArticleDOI
Mitsuru Ohba
TL;DR: Improvements to conventional software reliability analysis models by making the assumptions on which they are based more realistic are discussed, including the delayed S-shaped growth model, the inflection S-shaped model, and the hyperexponential model.
Abstract: This paper discusses improvements to conventional software reliability analysis models by making the assumptions on which they are based more realistic. In an actual project environment, sometimes no more information is available than reliability data obtained from a test report. The models described here are designed to resolve the problems caused by this constraint on the availability of reliability data. By utilizing the technical knowledge about a program, a test, and test data, we can select an appropriate software reliability analysis model for accurate quality assessment. The delayed S-shaped growth model, the inflection S-shaped model, and the hyperexponential model are proposed.
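For reference, the mean-value functions conventionally associated with these three models can be written down directly. The sketch below uses the standard parameterizations from the software reliability literature (a: expected total faults, b: detection rate, psi: inflection factor, p_i: module-class weights), which are assumptions rather than details taken from this abstract.

    import numpy as np

    def delayed_s_shaped(t, a, b):
        # Expected cumulative faults by time t: a * (1 - (1 + b*t) * exp(-b*t)).
        return a * (1.0 - (1.0 + b * t) * np.exp(-b * t))

    def inflection_s_shaped(t, a, b, psi):
        # a * (1 - exp(-b*t)) / (1 + psi * exp(-b*t)); psi shapes the inflection point.
        return a * (1.0 - np.exp(-b * t)) / (1.0 + psi * np.exp(-b * t))

    def hyperexponential(t, a, p, b):
        # a * sum_i p_i * (1 - exp(-b_i*t)) for module classes with weights p_i (summing to 1).
        t = np.atleast_1d(np.asarray(t, dtype=float))
        p, b = np.asarray(p, dtype=float), np.asarray(b, dtype=float)
        return a * np.sum(p * (1.0 - np.exp(-b * t[:, None])), axis=1)

    # Example: expected faults detected after 10 time units under each model.
    print(delayed_s_shaped(10.0, a=100, b=0.3))
    print(inflection_s_shaped(10.0, a=100, b=0.3, psi=2.0))
    print(hyperexponential(10.0, a=100, p=[0.6, 0.4], b=[0.5, 0.05]))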

596 citations

Journal ArticleDOI
TL;DR: This paper presents a technique that uses a genetic algorithm for automatic test-data generation, a heuristic that mimics the evolution of natural species in searching for the optimal solution to a problem.
Abstract: This paper presents a technique that uses a genetic algorithm for automatic test-data generation. A genetic algorithm is a heuristic that mimics the evolution of natural species in searching for the optimal solution to a problem. In the test-data generation application, the solution sought by the genetic algorithm is test data that causes execution of a given statement, branch, path, or definition-use pair in the program under test. The test-data-generation technique was implemented in a tool called TGen, in which parallel processing was used to improve the performance of the search. To experiment with TGen, a random test-data generator called Random was also implemented. Both TGen and Random were used to experiment with the generation of test data for statement and branch coverage of six programs.
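A toy version of the approach (not TGen itself; the program under test, the branch condition and the GA parameters are invented for illustration) shows how a branch-distance fitness can guide the search toward test data that covers a specific branch:

    import random

    def program_under_test(x):
        return "target" if 4711 <= x <= 4720 else "other"   # branch we want to cover

    def fitness(x):
        # Distance to the branch condition; 0 means the target branch is taken.
        return 0 if 4711 <= x <= 4720 else min(abs(x - 4711), abs(x - 4720))

    def generate_test_data(pop_size=50, generations=200, domain=(0, 100000)):
        pop = [random.randint(*domain) for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=fitness)
            if fitness(pop[0]) == 0:
                return pop[0]                      # covering input found
            parents = pop[: pop_size // 2]         # selection: keep the fitter half
            children = []
            while len(children) < pop_size - len(parents):
                a, b = random.sample(parents, 2)
                child = (a + b) // 2               # crossover: blend two parents
                if random.random() < 0.2:          # mutation: small random jump
                    child += random.randint(-500, 500)
                children.append(min(max(child, domain[0]), domain[1]))
            pop = parents + children
        return None                                # branch not covered in the budget

    x = generate_test_data()
    if x is not None:
        print(f"covering input found: {x} -> {program_under_test(x)}")
    else:
        print("no covering input found within the budget")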

586 citations

Journal ArticleDOI
TL;DR: The belief that the tester is routinely able to determine whether or not the test output is correct is the oracle assumption.
Abstract: It is widely accepted that the fundamental limitation of using program testing techniques to determine the correctness of a program is the inability to extrapolate from the correctness of results for a proper subset of the input domain to the program's correctness for all elements of the domain. In particular, for any proper subset of the domain there are infinitely many programs which produce the correct output on those elements, but produce an incorrect output for some other domain element. Nonetheless, we routinely test programs to increase our confidence in their correctness, and a great deal of research is currently being devoted to improving the effectiveness of program testing. These efforts fall into three primary categories: (1) the development of a sound theoretical basis for testing; (2) devising and improving testing methodologies, particularly mechanizable ones; (3) the definition of accurate measures of and criteria for test data adequacy. Almost all of the research on software testing therefore focuses on the development and analysis of input data. In particular there is an underlying assumption that once this phase is complete, the remaining tasks are straightforward. These consist of running the program on the selected data, producing output which is then examined to determine the program's correctness on the test data. The mechanism which checks this correctness is known as an oracle, and the belief that the tester is routinely able to determine whether or not the test output is correct is the oracle assumption.
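The oracle assumption can be made concrete with a small hedged sketch (names and data are illustrative only): an automated test run needs some mechanism that judges whether each observed output is correct, and here that mechanism is a simple property check that real projects often lack.

    from collections import Counter

    def implementation_under_test(xs):
        # Program whose correctness on the selected test data we want to judge.
        return sorted(xs)

    def oracle(xs, output):
        # The "oracle": decides whether the observed output is correct. Here it is a
        # property check (ascending order plus same multiset of elements); in practice
        # such a trusted decision procedure often does not exist, which is the point above.
        return all(a <= b for a, b in zip(output, output[1:])) and Counter(output) == Counter(xs)

    for xs in ([3, 1, 2], [], [5, 5, 1]):
        out = implementation_under_test(xs)
        assert oracle(xs, out), f"oracle rejected output {out} for input {xs}"
    print("oracle accepted every test output")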

582 citations


Network Information
Related Topics (5)
Artificial neural network: 207K papers, 4.5M citations, 86% related
Cluster analysis: 146.5K papers, 2.9M citations, 82% related
Deep learning: 79.8K papers, 2.1M citations, 81% related
Feature extraction: 111.8K papers, 2.1M citations, 80% related
Image processing: 229.9K papers, 3.5M citations, 80% related
Performance Metrics
No. of papers in the topic in previous years:
Year    Papers
2023    143
2022    328
2021    728
2020    1,254
2019    1,577
2018    1,401