Topic

Test data

About: Test data is a research topic. Over its lifetime, 22,460 publications have been published within this topic, receiving 260,060 citations.


Papers
Journal ArticleDOI
07 Nov 2019-PLOS ONE
TL;DR: The authors' simulations show that K-fold cross-validation (CV) produces strongly biased performance estimates with small sample sizes, and the bias is still evident with a sample size of 1000, while nested CV and train/test split approaches produce robust and unbiased performance estimates regardless of sample size.
Abstract: Advances in neuroimaging, genomic, motion tracking, eye-tracking and many other technology-based data collection methods have led to a torrent of high dimensional datasets, which commonly have a small number of samples because of the intrinsic high cost of data collection involving human participants. High dimensional data with a small number of samples is of critical importance for identifying biomarkers and conducting feasibility and pilot work; however, it can lead to biased machine learning (ML) performance estimates. Our review of studies which have applied ML to predict autistic from non-autistic individuals showed that small sample size is associated with higher reported classification accuracy. Thus, we have investigated whether this bias could be caused by the use of validation methods which do not sufficiently control overfitting. Our simulations show that K-fold Cross-Validation (CV) produces strongly biased performance estimates with small sample sizes, and the bias is still evident with a sample size of 1000. Nested CV and train/test split approaches produce robust and unbiased performance estimates regardless of sample size. We also show that feature selection, if performed on pooled training and testing data, contributes considerably more to bias than parameter tuning. In addition, the contribution to bias by data dimensionality, hyper-parameter space and number of CV folds was explored, and validation methods were compared with discriminable data. The results suggest how to design robust testing methodologies when working with small datasets and how to interpret the results of other studies based on what validation method was used.
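The pitfall the abstract describes can be illustrated with a short, hedged scikit-learn sketch (not the authors' code; the random data, the k=10 selection and the SVC classifier are illustrative assumptions): selecting features on the pooled data before cross-validation leaks label information and inflates accuracy even on pure noise, whereas nested CV keeps selection and tuning inside each training fold.

    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.model_selection import GridSearchCV, cross_val_score
    from sklearn.pipeline import Pipeline
    from sklearn.svm import SVC

    rng = np.random.RandomState(0)
    X = rng.randn(40, 1000)           # 40 samples, 1000 features, no real signal
    y = rng.randint(0, 2, size=40)    # random binary labels

    # Biased protocol: features are chosen using all labels, then cross-validated.
    X_leaky = SelectKBest(f_classif, k=10).fit_transform(X, y)
    biased = cross_val_score(SVC(), X_leaky, y, cv=5).mean()

    # Nested CV: selection and tuning are confined to each outer training fold.
    pipe = Pipeline([("select", SelectKBest(f_classif, k=10)), ("svc", SVC())])
    inner = GridSearchCV(pipe, {"svc__C": [0.1, 1, 10]}, cv=3)
    unbiased = cross_val_score(inner, X, y, cv=5).mean()

    print(f"pooled-selection accuracy: {biased:.2f}")    # typically well above 0.5
    print(f"nested-CV accuracy:        {unbiased:.2f}")  # near chance

On such noise labels the pooled-selection estimate usually lands far above chance while the nested estimate stays near 0.5, mirroring the bias pattern reported above.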

622 citations

Journal ArticleDOI
TL;DR: Experimental results confirmed the effectiveness and the reliability of both the DASVM technique and the proposed circular validation strategy for validating the learning of domain adaptation classifiers when no true labels for the target-domain instances are available.
Abstract: This paper addresses pattern classification in the framework of domain adaptation by considering methods that solve problems in which training data are assumed to be available only for a source domain different (even if related) from the target domain of (unlabeled) test data. Two main novel contributions are proposed: 1) a domain adaptation support vector machine (DASVM) technique which extends the formulation of support vector machines (SVMs) to the domain adaptation framework and 2) a circular indirect accuracy assessment strategy for validating the learning of domain adaptation classifiers when no true labels for the target-domain instances are available. Experimental results, obtained on a series of two-dimensional toy problems and on two real data sets related to brain computer interface and remote sensing applications, confirmed the effectiveness and the reliability of both the DASVM technique and the proposed circular validation strategy.
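The circular validation idea can be sketched roughly as follows (a hedged, simplified illustration, not the paper's DASVM implementation; a plain SVC stands in for the adapted classifier and the function name is hypothetical): target predictions are treated as pseudo-labels, a second classifier is trained back from the target domain, and its agreement with the known source labels serves as an indirect accuracy estimate.

    from sklearn.metrics import accuracy_score
    from sklearn.svm import SVC

    def circular_score(X_src, y_src, X_tgt, base=SVC):
        """Indirect accuracy estimate when no target-domain labels exist."""
        # Forward step: a classifier adapted from source to target; a plain SVC
        # trained on the source stands in here for the adapted (e.g. DASVM) model.
        forward = base().fit(X_src, y_src)
        y_tgt_pseudo = forward.predict(X_tgt)      # pseudo-labels on the target
        # Backward step: train on the target pseudo-labels and check how well the
        # known source labels are reproduced; high agreement suggests a consistent
        # adaptation, low agreement flags an unreliable solution.
        backward = base().fit(X_tgt, y_tgt_pseudo)
        return accuracy_score(y_src, backward.predict(X_src))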

599 citations

Journal ArticleDOI
Mitsuru Ohba
TL;DR: Improvements to conventional software reliability analysis models by making the assumptions on which they are based more realistic are discussed, including the delayed S-shaped growth model, the inflection S-shaped model, and the hyperexponential model.
Abstract: This paper discusses improvements to conventional software reliability analysis models by making the assumptions on which they are based more realistic. In an actual project environment, sometimes no more information is available than reliability data obtained from a test report. The models described here are designed to resolve the problems caused by this constraint on the availability of reliability data. By utilizing the technical knowledge about a program, a test, and test data, we can select an appropriate software reliability analysis model for accurate quality assessment. The delayed S-shaped growth model, the inflection S-shaped model, and the hyperexponential model are proposed.
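For reference, the mean-value functions conventionally associated with these three models can be written down directly. The sketch below uses the standard parameterizations from the software reliability literature (a: expected total faults, b: detection rate, psi: inflection factor, p_i: module-class weights), which are assumptions rather than details taken from this abstract.

    import numpy as np

    def delayed_s_shaped(t, a, b):
        # Expected cumulative faults by time t: a * (1 - (1 + b*t) * exp(-b*t)).
        return a * (1.0 - (1.0 + b * t) * np.exp(-b * t))

    def inflection_s_shaped(t, a, b, psi):
        # a * (1 - exp(-b*t)) / (1 + psi * exp(-b*t)); psi shapes the inflection point.
        return a * (1.0 - np.exp(-b * t)) / (1.0 + psi * np.exp(-b * t))

    def hyperexponential(t, a, p, b):
        # a * sum_i p_i * (1 - exp(-b_i*t)) for module classes with weights p_i (summing to 1).
        t = np.atleast_1d(np.asarray(t, dtype=float))
        p, b = np.asarray(p, dtype=float), np.asarray(b, dtype=float)
        return a * np.sum(p * (1.0 - np.exp(-b * t[:, None])), axis=1)

    # Example: expected faults detected after 10 time units under each model.
    print(delayed_s_shaped(10.0, a=100, b=0.3))
    print(inflection_s_shaped(10.0, a=100, b=0.3, psi=2.0))
    print(hyperexponential(10.0, a=100, p=[0.6, 0.4], b=[0.5, 0.05]))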

596 citations

Journal ArticleDOI
TL;DR: This paper presents a technique that uses a genetic algorithm for automatic test-data generation, a heuristic that mimics the evolution of natural species in searching for the optimal solution to a problem.
Abstract: This paper presents a technique that uses a genetic algorithm for automatic test-data generation. A genetic algorithm is a heuristic that mimics the evolution of natural species in searching for the optimal solution to a problem. In the test-data generation application, the solution sought by the genetic algorithm is test data that causes execution of a given statement, branch, path, or definition-use pair in the program under test. The test-data-generation technique was implemented in a tool called TGen, in which parallel processing was used to improve the performance of the search. To experiment with TGen, a random test-data generator called Random was also implemented. Both TGen and Random were used to experiment with the generation of test data for statement and branch coverage of six programs.
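A toy version of the approach (not TGen itself; the program under test, the branch condition and the GA parameters are invented for illustration) shows how a branch-distance fitness can guide the search toward test data that covers a specific branch:

    import random

    def program_under_test(x):
        return "target" if 4711 <= x <= 4720 else "other"   # branch we want to cover

    def fitness(x):
        # Distance to the branch condition; 0 means the target branch is taken.
        return 0 if 4711 <= x <= 4720 else min(abs(x - 4711), abs(x - 4720))

    def generate_test_data(pop_size=50, generations=200, domain=(0, 100000)):
        pop = [random.randint(*domain) for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=fitness)
            if fitness(pop[0]) == 0:
                return pop[0]                      # covering input found
            parents = pop[: pop_size // 2]         # selection: keep the fitter half
            children = []
            while len(children) < pop_size - len(parents):
                a, b = random.sample(parents, 2)
                child = (a + b) // 2               # crossover: blend two parents
                if random.random() < 0.2:          # mutation: small random jump
                    child += random.randint(-500, 500)
                children.append(min(max(child, domain[0]), domain[1]))
            pop = parents + children
        return None                                # branch not covered in the budget

    x = generate_test_data()
    if x is not None:
        print(f"covering input found: {x} -> {program_under_test(x)}")
    else:
        print("no covering input found within the budget")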

586 citations

Journal ArticleDOI
TL;DR: The belief that the tester is routinely able to determine whether or not the test output is correct is the oracle assumption.
Abstract: It is widely accepted that the fundamental limitation of using program testing techniques to determine the correctness of a program is the inability to extrapolate from the correctness of results for a proper subset of the input domain to the program's correctness for all elements of the domain. In particular, for any proper subset of the domain there are infinitely many programs which produce the correct output on those elements, but produce an incorrect output for some other domain element. Nonetheless, we routinely test programs to increase our confidence in their correctness, and a great deal of research is currently being devoted to improving the effectiveness of program testing. These efforts fall into three primary categories: (1) the development of a sound theoretical basis for testing; (2) devising and improving testing methodologies, particularly mechanizable ones; (3) the definition of accurate measures of and criteria for test data adequacy. Almost all of the research on software testing therefore focuses on the development and analysis of input data. In particular there is an underlying assumption that once this phase is complete, the remaining tasks are straightforward. These consist of running the program on the selected data, producing output which is then examined to determine the program's correctness on the test data. The mechanism which checks this correctness is known as an oracle, and the belief that the tester is routinely able to determine whether or not the test output is correct is the oracle assumption.
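The oracle assumption can be made concrete with a small hedged sketch (names and data are illustrative only): an automated test run needs some mechanism that judges whether each observed output is correct, and here that mechanism is a simple property check that real projects often lack.

    from collections import Counter

    def implementation_under_test(xs):
        # Program whose correctness on the selected test data we want to judge.
        return sorted(xs)

    def oracle(xs, output):
        # The "oracle": decides whether the observed output is correct. Here it is a
        # property check (ascending order plus same multiset of elements); in practice
        # such a trusted decision procedure often does not exist, which is the point above.
        return all(a <= b for a, b in zip(output, output[1:])) and Counter(output) == Counter(xs)

    for xs in ([3, 1, 2], [], [5, 5, 1]):
        out = implementation_under_test(xs)
        assert oracle(xs, out), f"oracle rejected output {out} for input {xs}"
    print("oracle accepted every test output")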

582 citations


Network Information
Related Topics (5)
Artificial neural network: 207K papers, 4.5M citations, 86% related
Cluster analysis: 146.5K papers, 2.9M citations, 82% related
Deep learning: 79.8K papers, 2.1M citations, 81% related
Feature extraction: 111.8K papers, 2.1M citations, 80% related
Image processing: 229.9K papers, 3.5M citations, 80% related
Performance Metrics
No. of papers in the topic in previous years:
Year    Papers
2023    143
2022    328
2021    728
2020    1,254
2019    1,577
2018    1,401