Showing papers on "False positive paradox published in 1977"


Journal Article
TL;DR: The reduced omissions in Focused Search were the result of less stringent criteria for reporting the presence of abnormal findings, rather than an enhanced ability to detect abnormalities.
Abstract: Selected difficult chest radiographs were interpreted by 10 radiologists and then reread in a Focused Search condition that directed readers' attention to film regions containing frequently omitted findings. The percentage of true abnormalities reported at any level of confidence increased from 49.3 to 68.3 between the usual and Focused Search conditions. However, the corresponding percentage of significant false positives also increased from 4.6 to 10.6. The separate ROC curves from each condition could be superimposed and the data fit by a single ROC curve. Thus, the reduced omissions in Focused Search were the result of less stringent criteria for reporting the presence of abnormal findings, rather than an enhanced ability to detect abnormalities. There was no evidence that the original omissions were abnormalities simply overlooked in faulty initial searches.
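The reading of these figures as a criterion shift rather than a gain in detectability can be illustrated with the equal-variance Gaussian signal detection model. The sketch below is an illustration under stated assumptions, not the authors' analysis: it treats the reported percentages as hit and false-alarm rates, assumes equal-variance Gaussian distributions, and uses a hypothetical function name, `dprime_and_criterion`.

```python
from statistics import NormalDist


def dprime_and_criterion(hit_rate, false_alarm_rate):
    """Return (d', c) under the equal-variance Gaussian model (an assumption)."""
    z = NormalDist().inv_cdf          # inverse of the standard normal CDF
    z_hit = z(hit_rate)
    z_fa = z(false_alarm_rate)
    d_prime = z_hit - z_fa            # separation of the two distributions
    criterion = -(z_hit + z_fa) / 2   # larger value = stricter reporting criterion
    return d_prime, criterion


# Percentages from the abstract, read here as rates (an assumption).
usual = dprime_and_criterion(0.493, 0.046)
focused = dprime_and_criterion(0.683, 0.106)

print("usual:   d' = %.2f, c = %.2f" % usual)
print("focused: d' = %.2f, c = %.2f" % focused)
```

Under these assumptions d' is roughly 1.7 in both conditions while the criterion falls, which is the pattern the abstract describes: the same ROC curve, read at a less stringent operating point.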

62 citations


Journal Article
TL;DR: This article shows how false negatives affect costs and why false positives are not so important. Its conclusions are worth considering in the inspection of industrial product, in situations where the final test is relatively expensive and a cheap screening test can be contrived.
Abstract: A study will be conducted to estimate the overall proportion of people that are affected with some defined psychopathology. The final determination of the psychiatric and other medical characteristics of a person will be made by a psychiatrist. A plan to use the services of trained interviewers to screen and separate into two classes (with and without apparent psychopathology) a large preliminary sample in order to conserve the time of the psychiatrist, by letting him test mainly cases that are almost surely afflicted with psychopathology, is appealing wherever the cost per case is much lower for the screening than for the psychiatric examination. It is not generally appreciated, however, that the screening-test, to be economical, must be relatively cheap and must admit only a low proportion of false negatives. This principle is not new, but illustrative calculations that show how false negatives affect costs, and why false positives are not so important, are hard to find in the literature (Kish, 1965). Guidance in any problem comes from calculations based on the appropriate theory. The purpose here is to present some theory and a simple illustration encountered in recent practice. The conclusions drawn here will be valid within a moderately wide band of conditions that border on those used here for illustration. Conditions far afield from those studied here might require fresh calculations by use of the appropriate costs and proportions in the equations that follow, or in modifications thereof. The conclusions drawn here are worthy of consideration in the inspection of industrial product, in situations where the final test is relatively very expensive, and where a cheap screening test can be contrived. It is presumed that a demographic screening has already taken place in which a roster is made of each family by age of person. People of age 60 or over can be serialized. These serial numbers constitute the frame. 
The statistical procedure for screening (sometimes called two-phase sampling) may be described briefly in two steps. Step 1 (1st phase). Screening. Draw from the frame a preliminary sample of N' people. Interview by a cheap test every person in the preliminary sample. Allot each person interviewed to one of two strata: Stratum 1: negative on screening (no psychopathology indicated). Stratum 2: positive on screening (psychopathology indicated). Step 2 (2nd phase). Psychiatric interviews. A psychiatrist interviews samples from both strata. His decisions are final. Some people in Stratum 1, the psychiatrist will find, are pathologic. These are false negatives. Conversely, he will find that some people put into Stratum 2 are in his judgment not pathologic. These are false positives. The final sample for the psychiatrist is drawn partly from Stratum 1 and partly from Stratum 2. The selections from each stratum are made by simple random sampling, one person at a time. Textbooks on statistical procedures describe two main ways to draw for the
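The two-phase procedure described in this abstract can be sketched as a stratified estimate of prevalence. This is a minimal illustration, not the cost calculation the article develops: the function name, the argument names, and every count in the example are hypothetical.

```python
def two_phase_prevalence(n_stratum1, n_stratum2, verdicts1, verdicts2):
    """Estimate overall prevalence from a screen-then-examine sample.

    n_stratum1, n_stratum2: people screened negative / positive in phase 1.
    verdicts1, verdicts2: the psychiatrist's final decisions (True = pathologic)
    for the simple random subsamples drawn from each stratum in phase 2.
    """
    total = n_stratum1 + n_stratum2
    w1 = n_stratum1 / total                 # weight of Stratum 1
    w2 = n_stratum2 / total                 # weight of Stratum 2
    p1 = sum(verdicts1) / len(verdicts1)    # proportion of false negatives in Stratum 1
    p2 = sum(verdicts2) / len(verdicts2)    # proportion confirmed pathologic in Stratum 2
    return w1 * p1 + w2 * p2


# Hypothetical counts, for illustration only.
verdicts_stratum1 = [True] * 2 + [False] * 38    # 2 of 40 screened negatives are pathologic
verdicts_stratum2 = [True] * 45 + [False] * 15   # 15 of 60 screened positives are not
estimate = two_phase_prevalence(800, 200, verdicts_stratum1, verdicts_stratum2)
print(f"estimated prevalence: {estimate:.3f}")
```

Because Stratum 1 is usually much the larger stratum, even a small proportion of false negatives there contributes heavily to the estimate, which is one way to see why the abstract stresses false negatives over false positives.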

58 citations


Journal Article
TL;DR: A psychometric analysis of the criterion problem in neo-Piagetian concept development research shows that false negative and false positive criterion errors have the same effect on the null hypotheses tested in such studies.
Abstract: BRAINERD, CHARLES J. Response Criteria in Concept Development Research. CHILD DEVELOPMENT, 1977, 48, 360-366. A psychometric analysis of the criterion problem in neo-Piagetian concept development research is conducted. It has previously been supposed that the effects of false negative and false positive criterion errors on the findings of concept development studies are different. But it is shown that, in fact, both forms of error have the same effect on the null hypotheses tested in such studies. It is concluded that the correct solution to the criterion problem is to select the criterion with the lowest error rate, regardless of whether the actual errors being committed are false positives or false negatives. Existing data are consistent with the assumption that error rates for judgments-only criteria are lower than corresponding error rates for judgments-plus-explanations criteria.

49 citations


Journal Article
TL;DR: The bone marrow acid phosphatase test is a test of poor specificity and should not be used as the sole test on which vital decisions regarding management of the patient are based.

17 citations


Journal Article
TL;DR: Methods for producing estimates of error rates in cervical cell classification are investigated and classification performance curves calculated using these methods are given for several classification schemes used to classify 1500 cervical cells.
Abstract: The performance of a cell recognition system on unknown data is often estimated in terms of its error rates on a test set. This paper investigates methods for producing estimates of error rates in cervical cell classification. Classification performance curves calculated using these methods are given for several classification schemes used to classify 1500 cervical cells.

11 citations


Proceedings Article
27 Dec 1977
TL;DR: If the observer is able to formulate a trade-off between false negatives and false positives which would, in his mind, equalize the penalty for being wrong in either case, he can develop a diagnostic strategy that will be reproducible and that will maximize the utility of a test as he sees it.
Abstract: The clinical value of radionuclide images depends on many factors, some controllable and some not. Noncontrollable factors include signal-to-background ratio, contrast gradient, and just-perceptible density difference. Several controllable factors affect the level and proportion of false negative and false positive results: the number of counts in the image, the signal-to-noise ratio, film gamma, contrast, viewing distance, computer and other image manipulation techniques, and selection of the criterion for calling a study abnormal. It is not always true that the more counts an image has, the better it is. For typical clinical situations the number of counts required in an image ranges from 50,000 to 2,000,000, depending on signal-to-noise ratio and film gamma. The clinical value of the images does not necessarily increase as the number of counts increases, the improved resolution and SNR being balanced off by patient motion, field nonuniformity, film nonuniformity, and loss of apparent contrast at high count density. Signal-to-noise ratio varies between two and five in typical clinical situations, and can be improved by better counting statistics. The signal-to-noise ratio is to be distinguished from the signal-to-background ratio, which is not a controllable parameter. Correct viewing distance is an often overlooked parameter. The eye acts as a bandpass filter, perceiving objects with greatest sensitivity when the object occupies between 5 and 10 minutes of visual arc. The correct viewing distance depends on the size of the image being examined, which in the case of Polaroid images varies between 0.1 and 1 cm, corresponding to optimum viewing distances of 30 cm to 3 m. Some organs, such as the liver, may contain filling defects that are smaller than the lower limit of resolution of the imaging system. 
A technique has been developed in which the observed fluctuation of count rate over the liver is compared with the expected fluctuation (as derived from the number of counts per unit area). When the observed fluctuation is significantly greater than the expected fluctuation, liver uptake is said to be "nonuniform" despite its appearance to the naked eye. Work is still in progress in this area but initial results have shown a reduction in false negatives in widespread fine metastases and degenerative liver disease in the absence of focal defects as seen on liver scans. Finally, the incidence of false negatives and false positives is affected by the choice of selection criterion by which one decides when to call an image "abnormal." If one knows the approximate distribution of values (such as relative counts per unit area) in normal and abnormal studies, one can maximize the utility of the diagnostic study by selecting an appropriate criterion that is based on the utility of false negative and false positive outcomes. In other words, if the observer is able to formulate a trade-off between false negatives and false positives which would, in his mind, equalize the penalty for being wrong in either case, he can develop a diagnostic strategy that will be reproducible and that will maximize the utility of a test as he sees it. Another observer having a different trade-off can maximize his own utility using a different criterion, and the two observers can estimate what fraction of the normal and abnormal population they would disagree on, based on their separate criteria. © (1977) COPYRIGHT SPIE--The International Society for Optical Engineering. Downloading of the abstract is permitted for personal use only.
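The criterion-selection idea in this abstract can be sketched as a search for the threshold with the lowest expected penalty. The sketch rests on assumptions that are not in the abstract: measured values are taken to be Gaussian in both populations, higher values are taken to indicate abnormality, and the distributions, prevalence, penalties, and function names are all hypothetical.

```python
from statistics import NormalDist


def expected_cost(threshold, normal, abnormal, prevalence, cost_fn, cost_fp):
    """Expected penalty per study when values >= threshold are called abnormal."""
    p_fn = abnormal.cdf(threshold)         # abnormal study called normal
    p_fp = 1.0 - normal.cdf(threshold)     # normal study called abnormal
    return prevalence * cost_fn * p_fn + (1.0 - prevalence) * cost_fp * p_fp


def best_threshold(normal, abnormal, prevalence, cost_fn, cost_fp,
                   lo=-5.0, hi=5.0, steps=1001):
    """Grid search for the threshold that minimizes the expected penalty."""
    candidates = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return min(candidates, key=lambda t: expected_cost(
        t, normal, abnormal, prevalence, cost_fn, cost_fp))


# Hypothetical populations and trade-offs for two observers.
normal = NormalDist(mu=0.0, sigma=1.0)
abnormal = NormalDist(mu=2.0, sigma=1.0)
t_a = best_threshold(normal, abnormal, prevalence=0.5, cost_fn=1.0, cost_fp=1.0)
t_b = best_threshold(normal, abnormal, prevalence=0.5, cost_fn=5.0, cost_fp=1.0)

# Fraction of each population falling between the two criteria,
# i.e. the cases on which the observers would disagree.
disagree_normal = normal.cdf(t_a) - normal.cdf(t_b)
disagree_abnormal = abnormal.cdf(t_a) - abnormal.cdf(t_b)
print(t_a, t_b, disagree_normal, disagree_abnormal)
```

An observer who weights false negatives more heavily ends up with a lower threshold, and the studies lying between the two thresholds are the ones on which the observers would disagree.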

6 citations


Journal Article
TL;DR: In western societies, where the prevalence of hepatoma is low, a higher, less sensitive but more specific diagnostic S-AFP concentration is appropriate and is preferable in terms of cost benefit.
Abstract: A rational comparison of different serum concentrations of alpha1-fetoprotein (S-AFP) in the diagnosis of hepatoma must be made. We took data on the sensitivity and specificity of different diagnostic S-AFP concentrations from the literature and evaluated them statistically and by Bayesian analysis. In our patients (hepatoma prevalence 0.028) a sensitive diagnostic concentration (30-50 ng/ml) will misdiagnose hepatoma so often that a positive test will indicate hepatoma in only 10% of cases. A positive test at a specific diagnostic concentration (500 ng/ml) indicates hepatoma in 100% of cases and is preferable in terms of cost benefit. Although the lower concentration will diagnose a larger proportion of patients with hepatoma (74% compared with 59%) the 'costs' of excluding false positives are considerable (A$2545 per extra case with 2.5% of patients suffering significant morbidity). In western societies, where the prevalence of hepatoma is low, a higher, less sensitive but more specific diagnostic S-AFP concentration is appropriate.
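The Bayesian step behind these figures can be sketched with the standard formula for positive predictive value. The prevalence (0.028) and the sensitivities (74% and 59%) come from the abstract. The specificities do not: 0.808 is an assumed value chosen so that the result is close to the reported 10%, and 1.0 is the value implied by a 100% predictive value. The function name is hypothetical.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | positive test) by Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


PREVALENCE = 0.028  # from the abstract

# Sensitive cutoff (30-50 ng/ml): specificity of 0.808 is an assumption.
ppv_sensitive = positive_predictive_value(0.74, 0.808, PREVALENCE)

# Specific cutoff (500 ng/ml): specificity of 1.0 is implied, not reported.
ppv_specific = positive_predictive_value(0.59, 1.0, PREVALENCE)

print(f"sensitive cutoff PPV: {ppv_sensitive:.2f}")
print(f"specific cutoff PPV:  {ppv_specific:.2f}")
```

At low prevalence the false-positive term dominates the denominator, so even a modest loss of specificity pulls the predictive value down sharply.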

5 citations


Journal Article
TL;DR: A comparison of the primary data of the true positives, the false positives and all the negatives revealed the necessity of more stringent referral criteria, and it is predicted that the systematic application of these criteria would result in an increase in the rate of true positives.
Abstract: Data presented in a previous paper pointed to the necessity for improving the overall sensitivity of the psychological screening program. The present report indicates possibilities for such an improvement without changing the screening methods. A comparison of the primary data of the true positives, the false positives and all the negatives (non-referred) revealed the necessity of more stringent referral criteria. It is predicted that the systematic application of these criteria would result in an increase in the rate of true positives from 2.8 to 4.8% of the screened population. In addition, a strategy is presented that aims to reduce costs without impairing effectiveness and is based on a differential application of the various elements of the screening program.

2 citations


Journal Article
TL;DR: The thrombocytopenia test is a useful exposure test but should be regarded as a supplement to skin tests and other methods of allergy diagnosis.
Abstract: 100 thrombocytopenia tests (TT) from 73 patients with suspected food allergies were critically examined to establish the value of this method compared with case history, skin tests and, where possible, with the RAST for determination of specific IgE antibodies. 85% of the tests most probably, or definitely, provided the correct result. 10% of the results were either false positives or false negatives, and 5% were inconclusive. A flare-up was observed after 18% of the tests. No positive TT for milk, as the most frequent allergen, was observed in a control group of 20 healthy persons. The thrombocytopenia test is a useful exposure test but should be regarded as a supplement to skin tests and other methods of allergy diagnosis.

1 citation