
Showing papers on "False positive paradox" published in 2003


Journal ArticleDOI
TL;DR: This work proposes an approach to measuring statistical significance in genomewide studies based on the concept of the false discovery rate, which offers a sensible balance between the number of true and false positives that is automatically calibrated and easily interpreted.
Abstract: With the increase in genomewide experiments and the sequencing of multiple genomes, the analysis of large data sets has become commonplace in biology. It is often the case that thousands of features in a genomewide data set are tested against some null hypothesis, where a number of features are expected to be significant. Here we propose an approach to measuring statistical significance in these genomewide studies based on the concept of the false discovery rate. This approach offers a sensible balance between the number of true and false positives that is automatically calibrated and easily interpreted. In doing so, a measure of statistical significance called the q value is associated with each tested feature. The q value is similar to the well known p value, except it is a measure of significance in terms of the false discovery rate rather than the false positive rate. Our approach avoids a flood of false positive results, while offering a more liberal criterion than what has been used in genome scans for linkage.

9,239 citations
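
As an illustration of how a q-value can be computed from a vector of p-values, the sketch below implements a simplified Storey-style estimator. The function name `qvalues`, the single tuning constant `lam` used to estimate the proportion of true nulls, and the absence of any smoothing are assumptions made for brevity; the paper's actual procedure is more refined.

```python
import numpy as np

def qvalues(p, lam=0.5):
    """Simplified Storey-style q-values (illustrative sketch, not the paper's exact method)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    # Estimate the proportion of true null hypotheses from the flat right tail of the p-value distribution.
    pi0 = min(1.0, np.mean(p > lam) / (1.0 - lam))
    order = np.argsort(p)
    ranked = p[order]
    # FDR estimate at each p-value threshold, then enforce monotonicity from the largest p-value down.
    q = pi0 * m * ranked / np.arange(1, m + 1)
    q = np.minimum.accumulate(q[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(q, 1.0)
    return out
```

Calling a feature significant whenever its q-value falls below, say, 0.05 then aims to keep the false discovery rate among the called features near 5%.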


Journal ArticleDOI
TL;DR: The Positive False Discovery Rate (pFDR) as mentioned in this paper is a modified version of the false discovery rate (FDR), which is used for exploratory analyses in which one is interested in finding several significant results among many tests.
Abstract: Multiple hypothesis testing is concerned with controlling the rate of false positives when testing several hypotheses simultaneously. One multiple hypothesis testing error measure is the false discovery rate (FDR), which is loosely defined to be the expected proportion of false positives among all significant hypotheses. The FDR is especially appropriate for exploratory analyses in which one is interested in finding several significant results among many tests. In this work, we introduce a modified version of the FDR called the “positive false discovery rate” (pFDR). We discuss the advantages and disadvantages of the pFDR and investigate its statistical properties. When assuming the test statistics follow a mixture distribution, we show that the pFDR can be written as a Bayesian posterior probability and can be connected to classification theory. These properties remain asymptotically true under fairly general conditions, even under certain forms of dependence. Also, a new quantity called the “q-value” is introduced and investigated, which is a natural “Bayesian posterior p-value,” or rather the pFDR analogue of the p-value. 1. Introduction. When testing a single hypothesis, one is usually concerned with controlling the false positive rate while maximizing the probability of detecting an effect when one really exists. In statistical terms, we maximize the power conditional on the Type I error rate being at or below some level. The field of multiple hypothesis testing tries to extend this basic paradigm to the situation where several hypotheses are tested simultaneously. One must define an appropriate compound error measure according to the rate of false positives one is willing to encounter. Then a procedure is developed that allows one to control the error rate at a desired level, while maintaining the power of each test as much as possible. The most commonly controlled quantity when testing multiple hypotheses is the familywise error rate (FWER), which is the probability of yielding one or more false positives out of all hypotheses tested. The most familiar example of this is the Bonferroni method. If there are m hypothesis tests, each test is controlled so that the probability of a false positive is less than or equal to α/m for some chosen value of α. It then follows that the overall FWER is less than or equal to α.

1,952 citations
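
The abstract's closing point about Bonferroni can be checked numerically. The simulation below, with all constants illustrative (m = 10,000 tests, α = 0.05, 2,000 replicate experiments), tests each hypothesis at α/m under a global null and confirms that the familywise error rate stays near α.

```python
import numpy as np

rng = np.random.default_rng(1)
m, alpha = 10_000, 0.05

# One experiment in which every null hypothesis is true: p-values are uniform on (0, 1).
p = rng.uniform(size=m)
rejected = p <= alpha / m          # Bonferroni: each test at level alpha / m

# Over many such experiments, the chance of *any* false positive (the FWER) is at most
# m * (alpha / m) = alpha by the union bound; the simulation should give roughly 0.05.
fwer = np.mean([np.any(rng.uniform(size=m) <= alpha / m) for _ in range(2_000)])
print(rejected.sum(), round(fwer, 3))
```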


Journal ArticleDOI
TL;DR: It is concluded that there are probably many common variants in the human genome with modest but real effects on common disease risk, and that studies using large samples will convincingly identify such variants.
Abstract: Association studies offer a potentially powerful approach to identify genetic variants that influence susceptibility to common disease1,2,3,4, but are plagued by the impression that they are not consistently reproducible5,6. In principle, the inconsistency may be due to false positive studies, false negative studies or true variability in association among different populations4,5,6,7,8. The critical question is whether false positives overwhelmingly explain the inconsistency. We analyzed 301 published studies covering 25 different reported associations. There was a large excess of studies replicating the first positive reports, inconsistent with the hypothesis of no true positive associations (P < 10⁻¹⁴). This excess of replications could not be reasonably explained by publication bias and was concentrated among 11 of the 25 associations. For 8 of these 11 associations, pooled analysis of follow-up studies yielded statistically significant replication of the first report, with modest estimated genetic effects. Thus, a sizable fraction (but under half) of reported associations have strong evidence of replication; for these, false negative, underpowered studies probably contribute to inconsistent replication. We conclude that there are probably many common variants in the human genome with modest but real effects on common disease risk, and that studies using large samples will convincingly identify such variants.

1,928 citations


Journal ArticleDOI
TL;DR: It is found that Bonferroni-related tests offer little improvement over Bonferroni, while the permutation method offers substantial improvement over the random field method for low smoothness and low degrees of freedom.
Abstract: Functional neuroimaging data embodies a massive multiple testing problem, where 100,000 correlated test statistics must be assessed. The familywise error rate, the chance of any false positives, is ...

1,146 citations
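
The paper compares random field, Bonferroni-related, and permutation approaches; the sketch below shows only the permutation (maximum-statistic) idea, using a plain mean-difference statistic and simulated null data rather than the t statistics and real imaging data used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def permutation_fwer_threshold(data_a, data_b, n_perm=1000, alpha=0.05):
    """Familywise-error threshold for voxelwise group differences: record the maximum absolute
    mean difference over all voxels for each relabelling of the subjects, then take a quantile."""
    combined = np.vstack([data_a, data_b])
    n_a = data_a.shape[0]
    max_stats = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(combined.shape[0])
        ga, gb = combined[perm[:n_a]], combined[perm[n_a:]]
        max_stats[i] = np.max(np.abs(ga.mean(axis=0) - gb.mean(axis=0)))
    # Any voxel exceeding this threshold is significant with FWER ~ alpha.
    return np.quantile(max_stats, 1.0 - alpha)

# Toy example with pure-noise "images": 12 + 12 subjects, 5,000 voxels.
a = rng.normal(size=(12, 5000))
b = rng.normal(size=(12, 5000))
print(permutation_fwer_threshold(a, b))
```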


Journal ArticleDOI
TL;DR: The occurrence of false positives and false negatives in a microarray analysis could be easily estimated if the distribution of p-values were approximated and then expressed as a mixture of null and alternative densities.
Abstract: Motivation: The occurrence of false positives and false negatives in a microarray analysis could be easily estimated if the distribution of p-values were approximated and then expressed as a mixture of null and alternative densities. Essentially any distribution of p-values can be expressed as such a mixture by extracting a uniform density from it. Results: A model is introduced that frequently describes very accurately the distribution of a set of p-values arising from an array analysis. The model is used to obtain an estimated distribution that is easily expressed as a mixture of null and alternative densities. Given a threshold of significance, the estimated distribution is partitioned into regions corresponding to the occurrences of false positives, false negatives, true positives, and true negatives. Availability: An S-plus function library is available from

443 citations
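
One common way to realize the abstract's idea is a beta-uniform mixture of p-values; the sketch below fits such a mixture by maximum likelihood and partitions the fitted distribution at a significance threshold. The parametric form, the function names, and the starting values are assumptions; the paper's own model may differ in detail.

```python
import numpy as np
from scipy.optimize import minimize

def fit_bum(p):
    """Fit f(p) = lam + (1 - lam) * a * p**(a - 1), a beta-uniform mixture, by maximum likelihood."""
    p = np.clip(np.asarray(p, float), 1e-12, 1.0)

    def nll(theta):
        lam, a = theta
        return -np.sum(np.log(lam + (1.0 - lam) * a * p ** (a - 1.0)))

    res = minimize(nll, x0=[0.5, 0.5], method="L-BFGS-B",
                   bounds=[(1e-6, 1 - 1e-6), (1e-6, 1 - 1e-6)])
    return res.x  # (lam, a)

def partition(p, tau, lam, a):
    """Split the fitted distribution at threshold tau into expected FP/TP/TN/FN counts."""
    m = len(p)
    pi_null = lam + (1.0 - lam) * a               # upper bound on the uniform (null) proportion
    F_tau = lam * tau + (1.0 - lam) * tau ** a    # fitted CDF at tau
    fp = pi_null * tau
    tp = F_tau - fp
    tn = pi_null * (1.0 - tau)
    fn = (1.0 - F_tau) - tn
    return {k: v * m for k, v in dict(FP=fp, TP=tp, TN=tn, FN=fn).items()}

# Typical use, assuming `pvals` holds the p-values from an array analysis:
#   lam_hat, a_hat = fit_bum(pvals)
#   print(partition(pvals, tau=0.01, lam=lam_hat, a=a_hat))
```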


Proceedings ArticleDOI
Robert O'Callahan, Jong-Deok Choi
11 Jun 2003
TL;DR: A formalization of lockset-based and happens-before-based approaches in a common framework is presented, allowing us to prove a "folk theorem" that happens-before detection reports fewer false positives than lockset-based detection (but can report more false negatives), and to prove that two key optimizations are correct.
Abstract: We present a new method for dynamically detecting potential data races in multithreaded programs. Our method improves on the state of the art in accuracy, in usability, and in overhead. We improve accuracy by combining two previously known race detection techniques -- lockset-based detection and happens-before-based detection -- to obtain fewer false positives than lockset-based detection alone. We enhance usability by reporting more information about detected races than any previous dynamic detector. We reduce overhead compared to previous detectors -- particularly for large applications such as Web application servers -- by not relying on happens-before detection alone, by introducing a new optimization to discard redundant information, and by using a "two phase" approach to identify error-prone program points and then focus instrumentation on those points. We justify our claims by presenting the results of applying our tool to a range of Java programs, including the widely-used Web application servers Resin and Apache Tomcat. Our paper also presents a formalization of lockset-based and happens-before-based approaches in a common framework, allowing us to prove a "folk theorem" that happens-before detection reports fewer false positives than lockset-based detection (but can report more false negatives), and to prove that two key optimizations are correct.

442 citations
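
The lockset half of the combination is easy to sketch. The toy detector below implements the classic lockset-refinement idea (in the style of Eraser) that the paper pairs with happens-before tracking; on its own it over-reports races on accesses ordered by other synchronization, which is exactly the false positive problem the paper addresses. All class and method names are illustrative.

```python
from collections import defaultdict

class LocksetDetector:
    """Minimal lockset-refinement sketch: warn when no single lock protects every access to a variable."""

    def __init__(self):
        self.held = defaultdict(set)   # locks currently held, per thread
        self.candidates = {}           # candidate lockset, per shared variable

    def acquire(self, thread, lock):
        self.held[thread].add(lock)

    def release(self, thread, lock):
        self.held[thread].discard(lock)

    def access(self, thread, var):
        # Refine the candidate lockset to locks held on *every* access of this variable.
        if var not in self.candidates:
            self.candidates[var] = set(self.held[thread])
        else:
            self.candidates[var] &= self.held[thread]
        if not self.candidates[var]:
            print(f"potential race on {var!r} (no lock protects all accesses)")

# Two threads touching 'counter', only one of them holding lock 'L':
d = LocksetDetector()
d.acquire("T1", "L"); d.access("T1", "counter"); d.release("T1", "L")
d.access("T2", "counter")   # candidate lockset becomes empty -> potential race reported
```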


Journal ArticleDOI
TL;DR: A network-based statistical algorithm is presented that allows functions of unannotated proteins to be derived from large-scale interaction data and is able to recover almost all (≈89%) of the original associations.
Abstract: Interpreting data from large-scale protein interaction experiments has been a challenging task because of the widespread presence of random false positives. Here, we present a network-based statistical algorithm that overcomes this difficulty and allows us to derive functions of unannotated proteins from large-scale interaction data. Our algorithm uses the insight that if two proteins share a significantly larger number of common interaction partners than expected at random, they have close functional associations. Analysis of publicly available data from Saccharomyces cerevisiae reveals >2,800 reliable functional associations, 29% of which involve at least one unannotated protein. By further analyzing these associations, we derive tentative functions for 81 unannotated proteins with high certainty. Our method is not overly sensitive to the false positives present in the data. Even after adding 50% randomly generated interactions to the measured data set, we are able to recover almost all (≈89%) of the original associations.

298 citations
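
The core statistical step -- asking whether two proteins share more interaction partners than expected by chance -- can be approximated with a hypergeometric tail probability, as sketched below. The hypergeometric null and the function name are assumptions; the paper's exact statistic may differ.

```python
from scipy.stats import hypergeom

def shared_partner_pvalue(neighbors_a, neighbors_b, n_proteins):
    """P-value for seeing at least the observed number of common interaction partners if the two
    neighbor sets were drawn at random from all n_proteins proteins (hypergeometric null)."""
    shared = len(neighbors_a & neighbors_b)
    return hypergeom.sf(shared - 1, n_proteins, len(neighbors_a), len(neighbors_b))

# Toy example: 6,000 proteins, two proteins with 20 and 30 partners sharing 5 of them.
a = set(range(20))
b = set(range(15, 45))
print(shared_partner_pvalue(a, b, 6_000))   # tiny p-value, so the pair is a candidate functional link
```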


Journal ArticleDOI
TL;DR: Some of the many diagnostic efficiency statistics that can be derived from a 2 × 2 table, including the overall correct classification rate, kappa, phi, the odds ratio, positive and negative predictive power and some variants of them, and likelihood ratios, are reviewed.
Abstract: Tests can be used either diagnostically (i.e., to confirm or rule out the presence of a condition in people suspected of having it) or as a screening instrument (determining who in a large group of people has the condition and often when those people are unaware of it or unwilling to admit to it). Tests that may be useful and accurate for diagnosis may actually do more harm than good when used as a screening instrument. The reason is that the proportion of false negatives may be high when the prevalence is high, and the proportion of false positives tends to be high when the prevalence of the condition is low (the usual situation with screening tests). My first aim in this article is to discuss the effects of the base rate, or prevalence, of a disorder on the accuracy of test results. My second aim is to review some of the many diagnostic efficiency statistics that can be derived from a 2 × 2 table, including the overall correct classification rate, kappa, phi, the odds ratio, positive and negative predictive power and some variants of them, and likelihood ratios. In the last part of this article, I review the recent Standards for Reporting of Diagnostic Accuracy guidelines (Bossuyt et al., 2003) for reporting the results of diagnostic tests and extend them to cover the types of tests used by psychologists.

287 citations
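
To make the base-rate point concrete, the snippet below computes several of the statistics named in the abstract from a 2 × 2 table and then shows how positive predictive value collapses as prevalence falls, holding sensitivity and specificity fixed at 0.90 (illustrative numbers, not values from the article).

```python
def diagnostic_stats(tp, fp, fn, tn):
    """A few of the diagnostic efficiency statistics discussed in the article, from a 2 x 2 table."""
    sens = tp / (tp + fn)                      # sensitivity
    spec = tn / (tn + fp)                      # specificity
    return {
        "sensitivity": sens,
        "specificity": spec,
        "PPV": tp / (tp + fp),                 # positive predictive power
        "NPV": tn / (tn + fn),                 # negative predictive power
        "overall correct classification": (tp + tn) / (tp + fp + fn + tn),
        "odds ratio": (tp * tn) / (fp * fn),
        "LR+": sens / (1 - spec),              # positive likelihood ratio
        "LR-": (1 - sens) / spec,              # negative likelihood ratio
    }

def ppv_from_prevalence(sens, spec, prev):
    """Positive predictive value as a function of prevalence (the base-rate effect)."""
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))

print(diagnostic_stats(tp=90, fp=10, fn=10, tn=90))
print(ppv_from_prevalence(0.90, 0.90, 0.50))   # ~0.90 when the condition is common
print(ppv_from_prevalence(0.90, 0.90, 0.01))   # ~0.08 when the condition is rare: most positives are false
```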


01 Jan 2003
TL;DR: This approach avoids a flood of false positive results while offering a more liberal criterion than what has been used in genome scans for linkage, and it associates a measure of statistical significance called the q-value with each tested feature in addition to the traditional p-value.
Abstract: With the increase in genome-wide experiments and the sequencing of multiple genomes, the analysis of large data sets has become commonplace in biology. It is often the case that thousands of features in a genome-wide data set are tested against some null hypothesis, where many features are expected to be significant. Here we propose an approach to statistical significance in the analysis of genome-wide data sets, based on the concept of the false discovery rate. This approach offers a sensible balance between the number of true findings and the number of false positives that is automatically calibrated and easily interpreted. In doing so, a measure of statistical significance called the q-value is associated with each tested feature in addition to the traditional p-value. Our approach avoids a flood of false positive results, while offering a more liberal criterion than what has been used in genome scans for linkage.

201 citations


Journal ArticleDOI
Per Broberg
TL;DR: A method for finding an optimal test statistic with which to rank genes with respect to differential expression is outlined; it allows generation of top gene lists that give few false positives and few false negatives.
Abstract: In the analysis of microarray data the identification of differential expression is paramount. Here I outline a method for finding an optimal test statistic with which to rank genes with respect to differential expression. Tests of the method show that it allows generation of top gene lists that give few false positives and few false negatives. Estimation of the false-negative as well as the false-positive rate lies at the heart of the method.

122 citations


Journal ArticleDOI
TL;DR: A novel method is presented that is well suited for TFBS profiles, and measures that help in judging profile quality, based on both sensitivity and selectivity of a profile are developed, which can be efficiently computed.
Abstract: Transcription factor binding site (TFBS) detection plays an important role in computational biology, with applications in gene finding and gene regulation. The sites are often modeled by gapless profiles, also known as position-weight matrices. Past research has focused on the significance of profile scores (the ability to avoid false positives), but this alone is not enough: The profile must also possess the power to detect the true positive signals. Several completed genomes are now available, and the search for TFBSs is moving to a large scale; so discriminating signal from noise becomes even more challenging. Since TFBS profiles are usually estimated from only a few experimentally confirmed instances, careful regularization is an important issue. We present a novel method that is well suited for this situation. We further develop measures that help in judging profile quality, based on both sensitivity and selectivity of a profile. It is shown that these quality measures can be efficiently computed, and we propose statistically well-founded methods to choose score thresholds. Our findings are applied to the TRANSFAC database of transcription factor binding sites. The results are disturbing: If we insist on a significance level of 5% in sequences of length 500, only 19% of the profiles detect a true signal instance with 95% success probability under varying background sequence compositions.
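
For readers unfamiliar with gapless profiles, the sketch below builds a log-odds position weight matrix from a handful of aligned sites (with a pseudocount as a crude stand-in for the regularization the abstract discusses) and scans a sequence for its best-scoring window. The counts, background, and pseudocount are illustrative; the paper's regularization and threshold-selection methods are more sophisticated.

```python
import numpy as np

def log_odds_pwm(counts, background, pseudocount=0.5):
    """Turn a 4 x L count matrix (rows A, C, G, T) into a log-odds position weight matrix."""
    counts = np.asarray(counts, float) + pseudocount       # simple pseudocount regularization
    probs = counts / counts.sum(axis=0)
    return np.log2(probs / np.asarray(background)[:, None])

def best_hit(pwm, seq, alphabet="ACGT"):
    """Slide the PWM over the sequence and return the best score and its offset."""
    idx = {c: i for i, c in enumerate(alphabet)}
    L = pwm.shape[1]
    scores = [sum(pwm[idx[seq[i + j]], j] for j in range(L)) for i in range(len(seq) - L + 1)]
    best = int(np.argmax(scores))
    return scores[best], best

# Toy 4-position motif (consensus roughly ACGT) and a uniform background.
counts = [[8, 0, 1, 0],   # A
          [0, 7, 0, 1],   # C
          [1, 1, 8, 0],   # G
          [1, 2, 1, 9]]   # T
pwm = log_odds_pwm(counts, background=[0.25, 0.25, 0.25, 0.25])
print(best_hit(pwm, "TTACGTAAACGT"))
```

Choosing the score threshold is then the trade-off the abstract describes: a high threshold keeps false positives rare but may miss true sites, and vice versa.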

01 Jan 2003
TL;DR: In this article, the authors used a Bayesian model to show how the potential for a false positive affects the evidentiary value of DNA evidence and the sufficiency of the DNA evidence to meet traditional legal standards for conviction.
Abstract: Errors in sample handling or test interpretation may cause false positives in forensic DNA testing. This article uses a Bayesian model to show how the potential for a false positive affects the evidentiary value of DNA evidence and the sufficiency of DNA evidence to meet traditional legal standards for conviction. The Bayesian analysis is contrasted with the "false positive fallacy," an intuitively appealing but erroneous alternative interpretation. The findings show the importance of having accurate information about both the random match probability and the false positive probability when evaluating DNA evidence. It is argued that ignoring or underestimating the potential for a false positive can lead to serious errors of interpretation, particularly when the suspect is identified through a "DNA dragnet" or database search, and that ignorance of the true rate of error creates an important element of uncertainty about the value of DNA evidence.
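
A minimal version of the Bayesian argument can be written down directly. In the sketch below, P(match | suspect is source) is taken to be 1 and the false positive probability is folded into P(reported match | suspect is not source); the prior, random match probability, and false positive probability are illustrative numbers, not values from the article.

```python
def posterior_probability_source(prior, rmp, fpp):
    """Posterior probability that the suspect is the source, given a reported DNA match.
    rmp: random match probability (coincidental match); fpp: false positive probability
    from sample handling or interpretation error."""
    p_match_not_source = rmp + (1 - rmp) * fpp
    likelihood_ratio = 1.0 / p_match_not_source
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# Illustrative: 1% prior, one-in-a-billion random match probability, one-in-1000 error rate.
print(posterior_probability_source(prior=0.01, rmp=1e-9, fpp=1e-3))  # ~0.91, not ~0.9999999
```

Even with a one-in-a-billion random match probability, a one-in-a-thousand chance of a handling or interpretation error caps the posterior well below certainty, which is the caution the article makes.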

Proceedings ArticleDOI
11 Jul 2003
TL;DR: Experimental results show that the filtering using a naive Bayes classifier greatly improves precision with slight loss of recall, resulting in a much better F-score.
Abstract: Dictionary-based protein name recognition is the first step for practical information extraction from biomedical documents because it provides ID information of recognized terms unlike machine-learning-based approaches. However, dictionary-based approaches have two serious problems: (1) a large number of false recognitions mainly caused by short names. (2) low recall due to spelling variation. In this paper, we tackle the former problem by using a machine learning method to filter out false positives. We also present an approximate string searching method to alleviate the latter problem. Experimental results using the GENIA corpus show that the filtering using a naive Bayes classifier greatly improves precision with slight loss of recall, resulting in a much better F-score.
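
A minimal sketch of the filtering step, under the assumption that one trains a naive Bayes classifier on labelled dictionary hits, is shown below using scikit-learn. The character n-gram features, the toy training examples, and the pipeline itself are illustrative stand-ins for the features and data used in the paper.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training data: strings matched by the dictionary, labelled as true protein
# mentions (1) or false positives (0), e.g. short words that happen to match entries.
candidates = ["p53 protein", "interleukin-2", "to", "can", "NF-kappa B", "II"]
labels = [1, 1, 0, 0, 1, 0]

filter_clf = make_pipeline(CountVectorizer(analyzer="char_wb", ngram_range=(2, 3)),
                           MultinomialNB())
filter_clf.fit(candidates, labels)

# Keep only dictionary hits that the classifier judges to be genuine protein names.
hits = ["p53 protein", "an", "NF-kappa B"]
kept = [h for h in hits if filter_clf.predict([h])[0] == 1]
print(kept)
```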

01 Jan 2003
TL;DR: A novel technique for the detection of QRS complexes in electrocardiographic signals that is based on a feature obtained by counting the number of zero crossings per segment, which provides a computationally efficient solution to the QRS detection problem.
Abstract: A novel technique for the detection of QRS complexes in electrocardiographic signals is presented that is based on a feature obtained by counting the number of zero crossings per segment. It is well known that zero crossing methods are robust against noise and are particularly useful for finite precision arithmetic. The new detection method inherits this robustness and provides a high degree of detection performance even in cases of very noisy electrocardiographic signals. Furthermore, due to the simplicity of detecting and counting zero crossings, the proposed technique provides a computationally efficient solution to the QRS detection problem. The excellent performance of the algorithm is confirmed by a sensitivity of 99.70% (277 false negatives) and a positive predictivity of 99.57% (390 false positives) against the MIT-BIH arrhythmia database.
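
The two performance figures quoted here are simple ratios, computed as in the sketch below. The true positive count in the example is a hypothetical round number chosen only so that the ratios land near the quoted percentages; the actual beat counts are not given in the abstract.

```python
def detection_stats(tp, fp, fn):
    """Sensitivity and positive predictivity, the two figures quoted in the abstract."""
    sensitivity = tp / (tp + fn)            # Se = TP / (TP + FN)
    positive_predictivity = tp / (tp + fp)  # +P = TP / (TP + FP)
    return sensitivity, positive_predictivity

# Hypothetical counts, for illustration only (not the paper's actual totals):
se, pp = detection_stats(tp=91_000, fp=390, fn=277)
print(f"Se = {se:.2%}, +P = {pp:.2%}")   # roughly 99.70% and 99.57%
```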

Journal ArticleDOI
TL;DR: It is concluded unreservedly that indiscriminate coagulation testing is not useful in a surgical or a medical setting, owing to the limited sensitivity and specificity of the tests coupled with the low prevalence of bleeding disorders, which results in a high number of false positives, poor positive predictive value for bleeding, and numerous false negatives that give false reassurance.
Abstract: Coagulation testing is employed widely prior to open surgery and invasive procedures. This is based on the assumption that such testing is of clinical value in the prediction of bleeding. In order to improve the clinical understanding of the potential limitations of first-line coagulation tests used in this way, we have systematically reviewed the literature that addresses the value of routine coagulation testing in helping to predict bleeding risk. We conclude unreservedly that indiscriminate coagulation testing is not useful in a surgical or a medical setting. This is due to the limited sensitivity and specificity of the tests, coupled with the low prevalence of bleeding disorders resulting in a high number of false positives, poor positive predictive value for bleeding and numerous false negatives resulting in false reassurance. Since most abnormal results can be predicted and most cases of significant bleeding disorder identified from a complete clinical assessment, the employment of selective laboratory testing is more cost-effective and represents evidence-based clinical practice.

Journal ArticleDOI
TL;DR: A new overlap-integral comparison method is described and applied, and human vs. human accuracies that can be used as goals for algorithms are quantified.

Journal ArticleDOI
TL;DR: An algorithm for performing early detection of disease outbreaks by searching a database of emergency department cases for anomalous patterns by using an anomaly detection algorithm that would characterize each anomalous pattern with a rule.
Abstract: This article presents an algorithm for performing early detection of disease outbreaks by searching a database of emergency department cases for anomalous patterns. Traditional techniques for anomaly detection are unsatisfactory for this problem because they identify individual data points that are rare due to particular combinations of features. Thus, these traditional algorithms discover isolated outliers of particularly strange events, such as someone accidentally shooting their ear, that are not indicative of a new outbreak. Instead, we would like to detect groups with specific characteristics that have a recent pattern of illness that is anomalous relative to historical patterns. We propose using an anomaly detection algorithm that would characterize each anomalous pattern with a rule. The significance of each rule would be carefully evaluated using the Fisher exact test and a randomization test. In this study, we compared our algorithm with a standard detection algorithm by measuring the number of false positives and the timeliness of detection. Simulated data, produced by a simulator that creates the effects of an epidemic on a city, were used for evaluation. The results indicate that our algorithm has significantly better detection times for common significance thresholds while having a slightly higher false positive rate.
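
The significance evaluation of a single candidate rule can be sketched with SciPy's Fisher exact test, as below. The counts and the rule are invented for illustration; the paper additionally applies a randomization test to account for searching over many rules.

```python
from scipy.stats import fisher_exact

# A candidate rule such as "home location = NE and symptom = respiratory": compare how often
# records matching the rule appear today versus in a comparable historical baseline.
#                         matches rule   does not match
recent   = [40, 960]      # today's emergency department visits
baseline = [200, 19_800]  # historical visits for comparable days

odds_ratio, p_value = fisher_exact([recent, baseline], alternative="greater")
print(odds_ratio, p_value)   # a small p-value flags the rule as an anomalous recent pattern
```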

Journal ArticleDOI
TL;DR: It is proved that statistical inference can be based on controlling the false discovery rate (FDR), which is defined as the expected number of false rejections divided by the number of rejections, and a computationally efficient form of forward stepwise regression is compared against the FDR methods.
Abstract: It is increasingly recognized that multiple genetic variants, within the same or different genes, combine to affect liability for many common diseases. Indeed, the variants may interact among themselves and with environmental factors. Thus realistic genetic/statistical models can include an extremely large number of parameters, and it is by no means obvious how to find the variants contributing to liability. For models of multiple candidate genes and their interactions, we prove that statistical inference can be based on controlling the false discovery rate (FDR), which is defined as the expected number of false rejections divided by the number of rejections. Controlling the FDR automatically controls the overall error rate in the special case that all the null hypotheses are true. So do more standard methods such as Bonferroni correction. However, when some null hypotheses are false, the goals of Bonferroni and FDR differ, and FDR will have better power. Model selection procedures, such as forward stepwise regression, are often used to choose important predictors for complex models. By analysis of simulations of such models, we compare a computationally efficient form of forward stepwise regression against the FDR methods. We show that model selection includes numerous genetic variants having no impact on the trait, whereas FDR maintains a false-positive rate very close to the nominal rate. With good control over false positives and better power than Bonferroni, the FDR-based methods we introduce present a viable means of evaluating complex, multivariate genetic models. Naturally, as for any method seeking to explore complex genetic models, the power of the methods is limited by sample size and model complexity.


Journal ArticleDOI
TL;DR: This work considers a framework for detection and judgment of evidence of well-characterized hazards, using the concepts of sensitivity, specificity, positive predictive value, and negative predictive value that are well established for medical diagnosis.
Abstract: Risk management, done well, should be inherently precautionary. Adopting an appropriate degree of precaution with respect to feared health and environmental hazards is fundamental to risk management. The real problem is in deciding how precautionary to be in the face of inevitable uncertainties, demanding that we understand the equally inevitable false positives and false negatives from screening evidence. We consider a framework for detection and judgment of evidence of well-characterized hazards, using the concepts of sensitivity, specificity, positive predictive value, and negative predictive value that are well established for medical diagnosis. Our confidence in predicting the likelihood of a true danger inevitably will be poor for rare hazards because of the predominance of false positives; failing to detect a true danger is less likely because false negatives must be rarer than the danger itself. Because most controversial environmental hazards arise infrequently, this truth poses a dilemma for risk management.

Journal ArticleDOI
TL;DR: The use of receiver operating characteristic methodology is described to evaluate the implications of the different types of regulatory policy for invasive weeds that may be adopted once weed risk assessment is in place.

Patent
01 Jul 2003
TL;DR: In this article, phase-space dissimilarity analysis of data from biomedical equipment, mechanical devices, and other physical processes is used to forewarn of critical events.
Abstract: This invention teaches further method improvements to forewarn of critical events via phase-space dissimilarity analysis of data from biomedical equipment, mechanical devices, and other physical processes. One improvement involves conversion of time-serial data into equiprobable symbols. A second improvement is a method to maximize the channel-consistent total-true rate of forewarning from a plurality of data channels over multiple data sets from the same patient or process. This total-true rate requires resolution of the forewarning indications into true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN) relative to a forewarning window. A third improvement is the use of various objective functions, as derived from the phase-space dissimilarity measures, to give the best forewarning indication. A fourth improvement uses various search strategies over the phase-space analysis parameters to maximize said objective functions. A fifth improvement shows the usefulness of the method for various biomedical and machine applications.

Proceedings ArticleDOI
TL;DR: In this paper, it is shown that the maximal number of detections that can be performed in a geometrical search is bounded by the maximum false positive detection probability required by the watermark application.
Abstract: One way of recovering watermarks in geometrically distorted images is by performing a geometrical search. In addition to the computational cost required for this method, this paper considers the more important problem of false positives. The maximal number of detections that can be performed in a geometrical search is bounded by the maximum false positive detection probability required by the watermark application. We show that image and key dependency in the watermark detector leads to different false positive detection probabilities for geometrical searches for different images and keys. Furthermore, the image and key dependency of the tested watermark detector increases the random-image-random-key false positive detection probability, compared to the Bernoulli experiment that was used as a model.
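
The budgeting argument in the first half of the abstract reduces to a union bound, sketched below with illustrative probabilities: the number of detections a geometrical search may perform is limited by the application's overall false positive budget divided by the per-detection false positive probability.

```python
# If a single detection has false positive probability p_fp and a geometrical search performs
# N detections, a union bound gives an overall false positive probability of at most N * p_fp
# (exactly 1 - (1 - p_fp)**N for independent, Bernoulli-modelled detections).
p_fp_single = 1e-9     # per-detection false positive probability (illustrative)
p_fp_budget = 1e-6     # maximum acceptable for the application (illustrative)

max_detections = int(p_fp_budget / p_fp_single)   # bound on the size of the geometrical search
print(max_detections)   # 1000 detections may be spent on the search under these assumptions
```

The abstract's second point is that image- and key-dependent detectors can push the actual per-detection false positive probability above the Bernoulli model, shrinking this budget further.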

Journal ArticleDOI
TL;DR: Ultrasound (US) is increasingly used for detecting lesions due to cystic and alveolar echinococcosis (CE and AE), and portable US scanners facilitate community-based mass screening surveys in remote rural communities.

Journal ArticleDOI
TL;DR: The authors designed several modified false belief tasks to eliminate a confound present in the traditional tasks that allowed children to answer correctly without reasoning about beliefs, by using a "perceptual access" approach to knowing in which they reason that a person who has not seen the true state of affairs will not know and will act incorrectly.
Abstract: We designed several modified false belief tasks to eliminate a confound present in the traditional tasks. The confound would allow children to answer correctly without reasoning about beliefs, by using a "perceptual access" approach to knowing in which they reason that a person who has not seen the true state of affairs will not know and will act incorrectly. The modified tasks incorporated 3 response alternatives (knowledge of the real state of affairs, the false belief, and an irrelevant or unjustified belief), and a yes-no question asked of each alternative. They included versions of the common Maxi, Smarties(r), representational change, and appearance-reality tasks, plus a new ("plate") task. In 3 studies (N = 164), children at both 5 and 6 years performed substantially worse on modified tasks compared to traditional versions and gave perceptual access responses in addition to belief-based and reality-based responses. These findings call into question the validity of the traditional false belief task ...

Journal ArticleDOI
TL;DR: Repeated nested PCR tests for PSA and appropriate handling of the data allow numeric quantification of the performance of the assay and differentiation between analytical false and true positives at a predefined accuracy.
Abstract: Background: Inappropriate quality management of reverse transcription-PCR (RT-PCR) assays for the detection of blood-borne prostate cancer (PCa) cells hampers clinical conclusions. Improvement of the RT-PCR methodology for prostate-specific antigen (PSA) mRNA should focus on an appropriate numeric definition of the performance of the assay and correction for PSA mRNA that is not associated with PCa cells. Methods and Results: Repeated (RT-)PCR tests for PSA mRNA in single blood specimens from PCa patients and PCa-free controls, performed by four international institutions, showed a large percentage (≈50%) of divergent test results. The best estimates of the mean, λ (SD), of the expected Poisson frequency distributions of the number of positive tests among five replicate assays of samples from PCa-free individuals were 1.0 (0.2) for 2 × 35 PCR cycles and 0.2 (0.1) for 2 × 25 PCR cycles. Assessment of the numeric value of the mean can be considered as a new indicator of the performance of a RT-PCR assay for PSA mRNA under clinical conditions. Moreover, it determines the required number of positive test repetitions to differentiate between true and false positives for circulating prostate cells. At a predefined diagnostic specificity of ≥98%, repeated PCRs with λ of either 1.0 or 0.2 require, respectively, more than three or more than one positive tests to support the conclusion that PSA mRNA-containing cells are present. Conclusions: Repeated nested PCR tests for PSA and appropriate handling of the data allow numeric quantification of the performance of the assay and differentiation between analytical false and true positives at a predefined accuracy. This new approach may contribute to introduction of PSA RT-PCR assays in clinical practice.
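
The "more than three" and "more than one" decision rules quoted in the abstract follow directly from Poisson tail probabilities at 98% specificity, as the short computation below shows (SciPy's `poisson.sf(k, lam)` is P(X > k)).

```python
from scipy.stats import poisson

def positives_required(lam, specificity=0.98):
    """Smallest k such that requiring *more than* k positive replicate tests keeps the chance of a
    false conclusion for a PCa-free sample (positive-test count ~ Poisson(lam)) at or below 1 - specificity."""
    k = 0
    while poisson.sf(k, lam) > 1 - specificity:   # sf(k) = P(X > k)
        k += 1
    return k

print(positives_required(1.0))   # 3 -> more than three positives needed when lambda = 1.0
print(positives_required(0.2))   # 1 -> more than one positive needed when lambda = 0.2
```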

Journal Article
TL;DR: A prospective survey of horses with colic referred to a university hospital was undertaken to elaborate on a simple clinical decision support system capable of predicting whether or not horses require surgical intervention.
Abstract: A prospective survey of horses with colic referred to a university hospital was undertaken to elaborate on a simple clinical decision support system capable of predicting whether or not horses require surgical intervention. Cases were classified as requiring surgical intervention or not on the basis of intraoperative findings or necropsy reports. Logistic regression analysis was applied to identify predictors with the strongest association with treatment needed. The classification and regression tree (CART) methodology was used to combine the variables in a simple classification system. The performance of the elaborated algorithms, as diagnostic instruments, was recorded as test sensitivity and specificity. The CART method generated 5 different classification trees with a similar basic structure consisting of: degree of pain, peritoneal fluid colour, and rectal temperature. The tree, constructed at a prevalence of 15% surgical cases, appeared to be the best proposal made by CART. In this classification tree, further discrimination of cases was obtained by including the findings of rectal examination and packed cell volume. When regarded as a test system, the sensitivity and specificity was 52% and 95%, respectively, corresponding to positive and negative predictive values of 68% and 91%. The variables examined in the present study did not provide a safe clinical decision rule. The classification tree constructed at 15% surgical cases was considered feasible, the proportion of horses incorrectly predicted to be without need of immediate surgery (false negatives) was small, whereas the proportion of horses incorrectly predicted to be in need of immediate surgery (false positives) was large. Some of the false positive horses were amenable to surgical treatment, although these cases did not conform to the strict definition of a surgical case. A less rigorous definition of a surgical case than that used in the present study would lower the percentage of false positives.

Journal ArticleDOI
TL;DR: A logistic regression model is proposed to estimate both the probability of at least one false positive and the expected number of false positives associated with n cancer screenings; the methodology can be used both for informed decision making at the individual level and for planning of health services.
Abstract: When evaluating cancer screening it is important to estimate the cumulative risk of false positives from periodic screening. Because the data typically come from studies in which the number of screenings varies by subject, estimation must take into account dropouts. A previous approach to estimate the probability of at least one false positive in n screenings unrealistically assumed that the probability of dropout does not depend on prior false positives. By redefining the random variables, we obviate the unrealistic dropout assumption. We also propose a relatively simple logistic regression and extend estimation to the expected number of false positives in n screenings. We illustrate our methodology using data from women ages 40 to 64 who received up to four annual breast cancer screenings in the Health Insurance Program of Greater New York study, which began in 1963. Covariates were age, time since previous screening, screening number, and whether or not a previous false positive occurred. Defining a false positive as an unnecessary biopsy, the only statistically significant covariate was whether or not a previous false positive occurred. Because the effect of screening number was not statistically significant, extrapolation beyond 4 screenings was reasonable. The estimated mean number of unnecessary biopsies in 10 years per woman screened is .11 with 95% confidence interval of (.10, .12). Defining a false positive as an unnecessary work-up, all the covariates were statistically significant and the estimated mean number of unnecessary work-ups in 4 years per woman screened is .34 with 95% confidence interval (.32, .36). Using data from multiple cancer screenings with dropouts, and allowing dropout to depend on previous history of false positives, we propose a logistic regression model to estimate both the probability of at least one false positive and the expected number of false positives associated with n cancer screenings. The methodology can be used for both informed decision making at the individual level, as well as planning of health services.
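
A simplified version of the cumulative quantities being estimated is sketched below: given per-screening false positive probabilities (which in the paper come from a logistic regression conditioned on covariates such as a previous false positive), the chance of at least one false positive and the expected count follow directly. The independence assumption and the example probabilities are simplifications, not the paper's fitted model.

```python
import numpy as np

def cumulative_fp_probability(per_screen_probs):
    """Probability of at least one false positive over a sequence of screenings,
    treating the per-screening probabilities as independent (a simplification)."""
    per_screen_probs = np.asarray(per_screen_probs, float)
    return 1.0 - np.prod(1.0 - per_screen_probs)

def expected_fp_count(per_screen_probs):
    """Expected number of false positives over the same sequence."""
    return float(np.sum(per_screen_probs))

# Illustrative per-screening probabilities of an unnecessary work-up over 4 annual screens:
p = [0.10, 0.09, 0.08, 0.08]
print(cumulative_fp_probability(p), expected_fp_count(p))   # ~0.31 and 0.35
```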

Patent
31 Dec 2003
TL;DR: The combination of multiple independent mammography tests, performed effectively at the same time and co-registered, can produce substantially more reliable detection performance than that of the individual tests, as discussed by the authors.
Abstract: X-ray mammography has been the standard for breast cancer screening for three decades, but offers poor statistical reliability; it also requires a radiologist for interpretation, employs ionizing radiation, and is expensive. The combination of multiple independent tests, performed effectively at the same time and co-registered, can produce substantially more reliable detection performance than that of the individual tests. The multi-sensor approach offers greatly improved reliability for detection of early breast tumors, with few false positives, and also can be designed to support machine decision, thus enabling screening by general practitioners and clinicians; it avoids ionizing radiation, and can ultimately be relatively inexpensive.
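
The abstract's claim about combining independent tests can be illustrated with a simple fusion rule. The sketch below assumes an AND rule over independent sensors with invented sensitivities and false positive rates; the patent's actual co-registration and decision logic is not described in the abstract.

```python
# Combining K independent tests with an AND rule (flag a lesion only if every sensor flags it):
# the combined false positive rate is the product of the individual rates, at the cost of a
# lower combined sensitivity (an OR rule trades off in the opposite direction).
sens = [0.90, 0.85, 0.88]   # illustrative per-sensor sensitivities
fpr  = [0.10, 0.15, 0.12]   # illustrative per-sensor false positive rates

and_sens, and_fpr = 1.0, 1.0
for s, f in zip(sens, fpr):
    and_sens *= s
    and_fpr *= f
print(and_sens, and_fpr)    # ~0.67 sensitivity, ~0.0018 false positive rate
```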