
Showing papers on "False positive paradox published in 2000"


Journal ArticleDOI
TL;DR: An automated method to locate and outline blood vessels in images of the ocular fundus that uses local and global vessel features cooperatively to segment the vessel network is described.
Abstract: Describes an automated method to locate and outline blood vessels in images of the ocular fundus. Such a tool should prove useful to eye care specialists for purposes of patient screening, treatment evaluation, and clinical study. The authors' method differs from previously known methods in that it uses local and global vessel features cooperatively to segment the vessel network. The authors evaluate their method using hand-labeled ground truth segmentations of 20 images. A plot of the operating characteristic shows that the authors' method reduces false positives by as much as 15 times over basic thresholding of a matched filter response (MFR), at up to a 75% true positive rate. For a baseline, they also compared the ground truth against a second hand-labeling, yielding a 90% true positive and a 4% false positive detection rate, on average. These numbers suggest there is still room for a 15% true positive rate improvement, with the same false positive rate, over the authors' method. They are making all their images and hand labelings publicly available for interested researchers to use in evaluating related methods.
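To make the evaluation concrete, the operating characteristic described above (true positive rate versus false positive rate of a thresholded response map, judged against a hand-labeled ground-truth mask) can be computed with a few lines of array code. The sketch below is a generic illustration on synthetic data, not the authors' segmentation method; the response map, mask, and thresholds are all invented for the example.

import numpy as np

def roc_points(response, truth, thresholds):
    """Per-pixel operating characteristic of a vessel response map.

    response : 2-D float array (e.g. a matched-filter response)
    truth    : 2-D boolean array, hand-labeled vessel pixels
    """
    points = []
    for t in thresholds:
        detected = response >= t
        tp = np.sum(detected & truth)
        fp = np.sum(detected & ~truth)
        tpr = tp / max(truth.sum(), 1)      # true positive rate
        fpr = fp / max((~truth).sum(), 1)   # false positive rate
        points.append((fpr, tpr))
    return points

# Toy example: a noisy synthetic "response" with one bright vertical vessel.
rng = np.random.default_rng(0)
truth = np.zeros((64, 64), dtype=bool)
truth[:, 30:33] = True
response = rng.normal(0.0, 1.0, truth.shape) + 3.0 * truth
for fpr, tpr in roc_points(response, truth, thresholds=[0.5, 1.5, 2.5]):
    print(f"FPR={fpr:.3f}  TPR={tpr:.3f}")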

2,206 citations


Journal ArticleDOI
17 Jun 2000-BMJ
TL;DR: Women are aware of false positives and seem to view them as an acceptable consequence of screening mammography, in contrast, most women are unaware that screening can detect cancers that may never progress but feel that such information would be relevant.
Abstract: Objective: To determine women's attitudes to and knowledge of both false positive mammography results and the detection of ductal carcinoma in situ after screening mammography. Design: Cross sectional survey. Setting: United States. Participants: 479 women aged 18–97 years who did not report a history of breast cancer. Main outcome measures: Attitudes to and knowledge of false positive results and the detection of ductal carcinoma in situ after screening mammography. Results: Women were aware that false positive results do occur. Their median estimate of the false positive rate for 10 years of annual screening was 20% (25th percentile estimate, 10%; 75th percentile estimate, 45%). The women were highly tolerant of false positives: 63% thought that 500 or more false positives per life saved was reasonable and 37% would tolerate 10 000 or more. Women who had had a false positive result (n=76) expressed the same high tolerance: 39% would tolerate 10 000 or more false positives. 62% of women did not want to take false positive results into account when deciding about screening. Only 8% of women thought that mammography could harm a woman without breast cancer, and 94% doubted the possibility of non-progressive breast cancers. Few had heard about ductal carcinoma in situ, a cancer that may not progress, but when informed, 60% of women wanted to take into account the possibility of it being detected when deciding about screening. Conclusions: Women are aware of false positives and seem to view them as an acceptable consequence of screening mammography. In contrast, most women are unaware that screening can detect cancers that may never progress but feel that such information would be relevant. Education should perhaps focus less on false positives and more on the less familiar outcome of detection of ductal carcinoma in situ.

216 citations


Journal ArticleDOI
TL;DR: A new algorithm for the automatic clustering of protein sequence datasets has been developed that represents all similarity relationships within the dataset in a binary matrix and can hence quickly and accurately cluster large protein datasets into families.
Abstract: Motivation: Efficient, accurate and automatic clustering of large protein sequence datasets, such as complete proteomes, into families, according to sequence similarity. Detection and correction of false positive and negative relationships with subsequent detection and resolution of multi-domain proteins. Results: A new algorithm for the automatic clustering of protein sequence datasets has been developed. This algorithm represents all similarity relationships within the dataset in a binary matrix. Removal of false positives is achieved through subsequent symmetrification of the matrix using a Smith-Waterman dynamic programming alignment algorithm. Detection of multi-domain protein families and further false positive relationships within the symmetrical matrix is achieved through iterative processing of matrix elements with successive rounds of Smith-Waterman dynamic programming alignments. Recursive single-linkage clustering of the corrected matrix allows efficient and accurate family representation for each protein in the dataset. Initial clusters containing multi-domain families are split into their constituent clusters using the information obtained by the multi-domain detection step. This algorithm can hence quickly and accurately cluster large protein datasets into families. Problems due to the presence of multi-domain proteins are minimized, allowing more precise clustering information to be obtained automatically. Availability: GeneRAGE (version 1.0) executable binaries for most platforms may be obtained from the authors on request. The system is available to academic users free of charge under license.
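A rough sketch of the two core matrix operations the abstract describes, removing non-reciprocal (putative false positive) hits by symmetrifying a binary similarity matrix and then single-linkage clustering the result, might look as follows. This is an illustration of the idea only, not the GeneRAGE implementation; it omits the Smith-Waterman verification and the multi-domain resolution steps, and the toy hit matrix is invented.

import numpy as np

def symmetrify(hits):
    """Keep only reciprocal hits: i~j survives only if both (i,j) and (j,i) were reported."""
    hits = np.asarray(hits, dtype=bool)
    return hits & hits.T

def single_linkage_clusters(sym):
    """Connected components of the symmetric hit matrix = single-linkage families."""
    n = sym.shape[0]
    labels = [-1] * n
    current = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        stack = [seed]
        labels[seed] = current
        while stack:
            i = stack.pop()
            for j in range(n):
                if sym[i, j] and labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Toy matrix: protein 0 hits 1 reciprocally; 2 hits 0 only one way (treated as a false positive).
hits = np.array([[1, 1, 0],
                 [1, 1, 0],
                 [1, 0, 1]])
print(single_linkage_clusters(symmetrify(hits)))   # -> [0, 0, 1]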

196 citations


Journal ArticleDOI
TL;DR: The search strategy using the MeSH term "sensitivity and specificity" (exploded) and the text words "specificity," "false negative," and "accuracy" has both higher sensitivity and specificity than the previously published strategy.

169 citations


Journal ArticleDOI
TL;DR: Four simple strategies are identified that can assist in determining whether a protein is likely to have been selected in a two-hybrid screen because of indirect metabolic effects, including altered growth rate and cell permeability, that bias perceived activity of LacZ reporters.
Abstract: While many novel associations predicted by two-hybrid library screens reflect actual biological associations of two proteins in vivo, at times the functional co-relevance of two proteins scored as interacting in the two-hybrid system is unlikely. The reason for this positive score remains obscure, which leads to designating such clones as false positives. After investigating the effect of overexpressing a series of putative false positives in yeast, we determined that expression of some of these clones induces an array of biological effects in yeast, including altered growth rate and cell permeability, that bias perceived activity of LacZ reporters. Based on these observations, we identify four simple strategies that can assist in determining whether a protein is likely to have been selected in a two-hybrid screen because of indirect metabolic effects.

87 citations


01 Jan 2000
TL;DR: This work automatically constructs multiple neural network classifiers which can detect unknown Win32 viruses, following a technique described in previous work on boot virus heuristics, by combining the individual classifier outputs using a voting procedure.
Abstract: Heuristic classifiers which distinguish between uninfected and infected members of some class of program objects have usually been constructed by hand. We automatically construct multiple neural network classifiers which can detect unknown Win32 viruses, following a technique described in previous work (Kephart et al., 1995) on boot virus heuristics. These individual classifiers have a false positive rate too high for real-world deployment. We find that, by combining the individual classifier outputs using a voting procedure, the risk of false positives is reduced to an arbitrarily low level, with only a slight increase in the false negative rate. Regular heuristics retraining on updated sets of exemplars (both infected and uninfected) is practical if the false positive rate is low enough.
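Under an idealized independence assumption, the benefit of the voting step can be quantified directly: requiring k of n classifiers to agree drives the combined false positive rate down rapidly while the false negative rate rises only modestly. The sketch below illustrates that arithmetic with made-up per-classifier rates; it is not the authors' network, data, or exact voting rule.

from math import comb

def vote_rate(p, n, k):
    """Probability that at least k of n independent detectors fire,
    when each fires with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n, k = 5, 4                          # flag a file only if 4 of 5 classifiers agree
fp_single, tp_single = 0.03, 0.95    # hypothetical per-classifier rates

fp_vote = vote_rate(fp_single, n, k)       # combined false positive rate
fn_vote = 1 - vote_rate(tp_single, n, k)   # combined false negative rate

print(f"single: FP={fp_single:.4%}  FN={1 - tp_single:.4%}")
print(f"voting: FP={fp_vote:.6%}  FN={fn_vote:.4%}")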

84 citations


Journal ArticleDOI
TL;DR: A simple analytical criterion is provided for deciding whether a human or automation is best for a failure detection task, based on expected-value decision theory in much the same way as is signal detection.
Abstract: A simple analytical criterion is provided for deciding whether a human or automation is best for a failure detection task. The method is based on expected-value decision theory in much the same way as is signal detection. It requires specification of the probabilities of misses (false negatives) and false alarms (false positives) for both human and automation being considered, as well as factors independent of the choice--namely, costs and benefits of incorrect and correct decisions as well as the prior probability of failure. The method can also serve as a basis for comparing different modes of automation. Some limiting cases of application are discussed, as are some decision criteria other than expected value. Actual or potential applications include the design and evaluation of any system in which either humans or automation are being considered.
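The expected-value comparison the abstract describes reduces to a few lines of arithmetic: for each candidate (human or automation), weight the probabilities of the four outcomes by their costs and benefits and prefer the option with the smaller expected cost. The probabilities and costs below are hypothetical placeholders, not values from the paper.

def expected_cost(p_fail, p_miss, p_fa, c_miss, c_fa, c_correct=0.0):
    """Expected cost of a detector with the given miss / false alarm probabilities.

    p_fail : prior probability that a failure actually occurs
    p_miss : P(no alarm | failure), incurring c_miss
    p_fa   : P(alarm | no failure), incurring c_fa
    """
    return (p_fail * (p_miss * c_miss + (1 - p_miss) * c_correct)
            + (1 - p_fail) * (p_fa * c_fa + (1 - p_fa) * c_correct))

# Hypothetical example: compare a human monitor with an automated detector.
prior = 0.01
human = expected_cost(prior, p_miss=0.10, p_fa=0.02, c_miss=1000.0, c_fa=5.0)
auto = expected_cost(prior, p_miss=0.02, p_fa=0.15, c_miss=1000.0, c_fa=5.0)
print(f"human E[cost]={human:.3f}  automation E[cost]={auto:.3f}")
print("prefer:", "automation" if auto < human else "human")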

53 citations


Journal ArticleDOI
TL;DR: Analysis of the distribution of proximal false positives indicated that the splice signals used by the algorithms are not strong enough to discriminate particularly those false predictions that occur within +/- 25 nt around the real sites.
Abstract: The performance of computational tools that can predict human splice sites is reviewed using a test set of EST-confirmed splice sites. The programs (namely HMMgene, NetGene2, HSPL, NNSPLICE, SpliceView and GeneID-3) differ from one another in the degree of discriminatory information used for prediction. The results indicate that, as expected, HMMgene and NetGene2 (which use global as well as local coding information and splice signals) followed by HSPL (which uses local coding information and splice signals) performed better than the other three programs (which use only splice signals). For the former three programs, one in every three false positive splice sites was predicted in the vicinity of true splice sites while only one in every 12 was expected to occur in such a region by chance. The persistence of this observation for programs (namely FEXH, GRAIL2, MZEF, GeneID-3, HMMgene and GENSCAN) that can predict all the potential exons (including optimal and sub-optimal) was assessed. In a high proportion (>50%) of the partially correct predicted exons, the incorrect exon ends were located in the vicinity of the real splice sites. Analysis of the distribution of proximal false positives indicated that the splice signals used by the algorithms are not strong enough to discriminate particularly those false predictions that occur within ± 25 nt around the real sites. It is therefore suggested that specialised statistics that can discriminate real splice sites from proximal false positives be incorporated in gene prediction programs.
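The proximity analysis is straightforward to reproduce: count the predicted sites that are not exact matches but fall within ±25 nt of a true site. A minimal sketch, assuming splice-site positions are available as simple integer coordinates (the example positions are invented):

def classify_predictions(predicted, true_sites, window=25):
    """Split predicted splice-site positions into exact hits, proximal false
    positives (within +/- window nt of a true site) and distant false positives."""
    true_set = set(true_sites)
    exact, proximal, distant = [], [], []
    for p in predicted:
        if p in true_set:
            exact.append(p)
        elif any(abs(p - t) <= window for t in true_sites):
            proximal.append(p)
        else:
            distant.append(p)
    return exact, proximal, distant

true_sites = [1200, 2050, 3311]
predicted = [1200, 1212, 2080, 2900, 3311]
exact, proximal, distant = classify_predictions(predicted, true_sites)
print(len(exact), "exact,", len(proximal), "proximal FP,", len(distant), "distant FP")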

48 citations


Journal ArticleDOI
TL;DR: In the paper, a heuristic algorithm based on tabu search is proposed that has low complexity and high accuracy for both types of errors: false negatives and false positives.

47 citations


Journal ArticleDOI
TL;DR: This work proposes, within the Diagnosis Model of garden-path processing, that reanalysis triggered by a Case mismatch guides the parser more effectively toward the correct structure, and explains why Case and number features differ in these two ways in their effects on sentence processing.
Abstract: Meng and Bader have presented evidence that a Case conflict is a more effective cue for garden-path reanalysis than a number conflict is, for German wh-sentences with subject–object ambiguities. The preferred first-pass analysis has the wh-trace in subject position, although object position is correct. In a speeded grammaticality judgment task, perceivers accepted Case-disambiguated examples more often and more rapidly than number-disambiguated examples, although comprehension questions indicated that both were eventually understood correctly. For ungrammatical sentences, a Case mismatch error resulted in more false positive grammaticality judgments than a number mismatch error. We offer an explanation for why Case and number features differ in these two ways in their effects on sentence processing. We propose, within the Diagnosis Model of garden-path processing, that reanalysis triggered by a Case mismatch guides the parser more effectively toward the correct structure. Case is a positive symptom, which carries information about the new structure that must be built. By contrast, a number mismatch is a negative symptom; it invalidates the incorrect structure without showing how to rebuild it. This difference in the transparency of garden-path repair can also account for the greater overacceptance of Case-disambiguated ungrammatical sentences. The speeded grammaticality judgment task is designed to encourage hasty responses. Usually, these are hasty rejections of garden path sentences that, on calmer reflection, the parser would find acceptable. Conversely, over-hasty acceptance could occur if some initial progress is made in resolving a grammatical problem. Thus, a higher rate of false positives on ungrammaticals is to be expected where reanalysis proceeds successfully for a while before blocking.

45 citations


Journal ArticleDOI
TL;DR: Immunoscintigraphy with 99mTc-Fab' fragments in combination with TEE improves diagnostic accuracy compared with TTE/TEE in patients with subacute infective endocarditis.

Klaus Julisch
01 Jan 2000
TL;DR: This talk will present a hybrid approach to building filters that autonomously remove false positives to relieve the security personnel and examine the ability of these filters to capture the alarm patterns that a given view reveals.
Abstract: Many of today's Intrusion Detection Systems (IDSs) suffer from high rates of false positives [1]. False positives are a severe problem because investigating them takes time and energy. Even worse, if the load of false positives is too high, security personnel might become negligent and start to ignore alarms. Improving this situation is difficult [1, 2]. One possible solution is to use highly specialized IDSs [3, 4] that excel at detecting one narrow class of intrusions and rely upon other IDSs for detecting what is out of their scope. Even though this is probably the way to go in the long run, for the time being, the necessary IDSs as well as the infrastructure to manage them are still in their infancy. Therefore, it has been suggested to build filters that autonomously remove false positives to relieve the security personnel [5]. This is the approach I will discuss in my talk. Note that a filter can be considered a second-level IDS. Accordingly, there are two fundamental ways to build a filter: Either one uses knowledge about how to detect noteworthy alarms (knowledge-based approach), or one models the normal alarm behavior and flags everything that stands out from the norm (behavior-based approach). S. Manganaris et al. have used the second approach to build a filter [5]. I will present a hybrid approach. The rationale for a hybrid approach stems from first experiments I have conducted on nearly 40 MB of NetRanger [6] alarm data collected from five different sensors over a period of ten days. As a general rule, I observed that the five most frequent alarms account for approximately 95% of all the alarms a given NetRanger sensor triggers. Modeling such alarm behavior is difficult because all the important characteristics of normal alarm behavior get lost in a flood of highly dominant and repetitive alarms. Thus, it is unlikely that the behavior-based approach to filtering is adequate for this kind of alarm stream. To solve this problem, I apply a knowledge-based pre-filter that handles the most frequent alarms. As NetRanger alarms contain the context in which they were triggered, it is frequently possible to exploit this context to identify false alarms. Furthermore, external information such as the system administrator's knowledge of the network can provide helpful guidance for building filters. Nevertheless, building a knowledge-based filter is a labor-intensive knowledge-engineering task. Fortunately, this only has to be done for the most frequent alarms. Indeed, using a knowledge-based pre-filter results in a much smaller alarm stream, which is suited for post-processing by a behavior-based filter. To make this filter as effective as possible, it proved to be advantageous not to model the alarm stream per se. Instead, different views of the alarm stream are modeled separately. A view rearranges the alarm stream in order to emphasize one particular aspect, such as which alarms were triggered for which IP connection. With respect to modeling, I currently investigate several models in terms of their ability to capture the alarm patterns that a given view reveals. Frequent episode rules [7] turned out to be a very versatile model, but other models are also under investigation. Specifically, my talk will start off by giving examples of knowledge-based filters. Next, the views and models will be discussed that proved to be most effective for building behavior-based filters. Thereupon, I will address how well hybrid filters work for NetRanger and other IDSs.
The talk will be concluded by a discussion of the extent to which filters are site-specific and how frequently they have to be adjusted to changes in the computing environment.
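As a concrete, purely illustrative picture of the knowledge-based pre-filter stage, one can imagine a small set of hand-written rules that match the handful of dominant alarm signatures in their triggering context and discard them before any behavior model sees the stream. The alarm fields, signature names, and rule contents below are invented examples, not the filters described in the talk.

# Hypothetical alarm records: (signature, source_ip, destination_port)
alarms = [
    ("ICMP flood", "10.0.0.5", 0),
    ("WWW IIS unicode attack", "10.0.0.7", 80),
    ("ICMP flood", "10.0.0.5", 0),
    ("SYN flood", "192.168.1.9", 25),
]

# Knowledge-based rules: each returns True if the alarm is a known false positive
# in this environment (e.g. a monitoring host that legitimately pings the network).
rules = [
    lambda a: a[0] == "ICMP flood" and a[1] == "10.0.0.5",        # network monitor
    lambda a: a[0] == "WWW IIS unicode attack" and a[2] != 80,    # no web server there
]

def pre_filter(alarms, rules):
    """Drop alarms matched by any rule; the remainder goes to the behavior-based stage."""
    return [a for a in alarms if not any(rule(a) for rule in rules)]

remaining = pre_filter(alarms, rules)
print(f"{len(alarms) - len(remaining)} alarms filtered, {len(remaining)} passed on")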

Journal ArticleDOI
TL;DR: One particular approach is given to partition a heterogeneous group into internally more homogeneous subgroups, using Kendall's coefficient of concordance W; such group partition and "purification" will help subsequent inferential methods to deal more efficiently with false positives.
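For reference, Kendall's coefficient of concordance for m raters ranking n items is W = 12S / (m^2 (n^3 - n)), where S is the sum of squared deviations of the item rank sums from their mean. The sketch below computes W only; it ignores tie corrections and says nothing about the paper's partitioning strategy itself, and the example rankings are invented.

import numpy as np

def kendalls_w(ranks):
    """Kendall's coefficient of concordance W (no tie correction).

    ranks : (m, n) array, each row one rater's ranking of n items (1..n).
    """
    ranks = np.asarray(ranks, dtype=float)
    m, n = ranks.shape
    rank_sums = ranks.sum(axis=0)
    s = np.sum((rank_sums - rank_sums.mean()) ** 2)
    return 12.0 * s / (m ** 2 * (n ** 3 - n))

# Three raters ranking four items; higher W means more concordant raters.
ranks = [[1, 2, 3, 4],
         [1, 3, 2, 4],
         [2, 1, 3, 4]]
print(f"W = {kendalls_w(ranks):.3f}")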

Journal ArticleDOI
TL;DR: The method allows efficient identification of the true signals in a genome scan, uses the smallest possible sample sizes, saves the excess to confirm those findings, controls both types of error, and provides one elegant solution to the debate over the best way to balance between false positives and negatives in genome scans.
Abstract: Inflation of type I error occurs when conducting a large number of statistical tests in genome-wide linkage scans. Stringent α-levels protect against the high numbers of expected false positives but at the cost of more false negatives. A more balanced tradeoff is provided by the theory of sequential analysis, which can be used in a genome scan even when the data are collected using a fixed-sample design. Sequential tests allow complete, simultaneous control of both the type I and II errors of each individual test while using the smallest possible sample size for analysis. For fixed samples, the excess N “saved” can be used in a confirmatory, replication phase of the original findings. Using the theory of sequential multiple decision procedures [Bechhofer et al., 1968], we can replace the series of individual marker tests with a new single, simultaneous genome-wide test that has multiple possible outcomes and partitions all markers into two subsets: the “signal” versus the “noise,” with an a priori specifiable genome-wide error rate. These tests are demonstrated for the Haseman-Elston approach, are applied to real data, and are contrasted with traditional fixed-sampling tests in Monte Carlo simulations of repeated genome-wide scans. The method allows efficient identification of the true signals in a genome scan, uses the smallest possible sample sizes, saves the excess to confirm those findings, controls both types of error, and provides one elegant solution to the debate over the best way to balance between false positives and negatives in genome scans. Genet. Epidemiol. 19:301–322, 2000. © 2000 Wiley-Liss, Inc.
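As background for the sequential-analysis idea, Wald's sequential probability ratio test makes the tradeoff explicit: the stopping boundaries are set directly from the desired type I and type II error rates, and sampling stops as soon as the evidence crosses either boundary. The sketch below is a textbook SPRT for a normal mean, offered only as an illustration of simultaneous error control with minimal sample size; it is not the Bechhofer-style multiple decision procedure or the Haseman-Elston analysis used in the paper, and all numbers are invented.

import math
import random

def sprt_normal(samples, mu0, mu1, sigma, alpha, beta):
    """Wald SPRT for H0: mean=mu0 vs H1: mean=mu1 (known sigma).

    Returns ('accept H0' | 'accept H1' | 'undecided', number of observations used).
    """
    a = math.log(beta / (1 - alpha))   # lower boundary (accept H0)
    b = math.log((1 - beta) / alpha)   # upper boundary (accept H1)
    llr = 0.0
    for i, x in enumerate(samples, start=1):
        # log-likelihood ratio increment for one normal observation
        llr += (mu1 - mu0) * (x - (mu0 + mu1) / 2.0) / sigma**2
        if llr <= a:
            return "accept H0", i
        if llr >= b:
            return "accept H1", i
    return "undecided", len(samples)

random.seed(1)
data = [random.gauss(0.5, 1.0) for _ in range(200)]   # true mean 0.5
print(sprt_normal(data, mu0=0.0, mu1=0.5, sigma=1.0, alpha=0.05, beta=0.10))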

Journal ArticleDOI
TL;DR: A screening protocol based on the sequential use of the cotransformation approach followed by the genetic method for verifying true two-hybrid interactions is reported, which has screened a cDNA library and been able to isolate true positives from the yeast two-hybrid screen.

Proceedings ArticleDOI
06 Jun 2000
TL;DR: The result indicates that the new method is effective in reducing false positives due to normal anatomic structures, and thus can improve the performance of the CAD scheme for detection of pulmonary nodules in chest radiographs.
Abstract: We have developed a novel method called local contralateral subtraction for reduction of false positives reported by a computer-aided diagnosis (CAD) scheme for detection of lung nodules in chest radiographs. Our method is based on the removal of normal structures in the regions of interest (ROIs), based on symmetry between the left and right lungs. In our method, two ROIs were extracted, one from the position where a candidate of a nodule is located, and the other from the anatomically corresponding location in the opposite lung, which contains similar normal structures. We employed a wavelet-based multiresolution image registration method to match the two ROIs, and subtraction was performed. A signal-to-noise ratio (SNR) between a central region and the adjacent background region was calculated for quantification of the remaining structures in the subtracted ROI. The SNR was then used for distinction between nodules and false positives. In an analysis of 550 ROIs consisting of 51 nodules and 499 false positives reported as detected nodules by our current CAD scheme, we were able to eliminate 44% of false positives with loss of only one nodule with this new method. This result indicates that our new method is effective in reducing false positives due to normal anatomic structures, and thus can improve the performance of our CAD scheme for detection of pulmonary nodules in chest radiographs.
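A simplified version of the discriminating feature, the signal-to-noise ratio between a central region and the surrounding background of the subtracted ROI, can be written in a few lines. The region geometry and the exact SNR definition here are assumptions for illustration, not the paper's specification, and the ROIs are synthetic.

import numpy as np

def central_snr(roi, inner=8):
    """Contrast of a central patch relative to the surrounding background.

    roi   : 2-D subtracted ROI
    inner : half-width of the central square region
    """
    h, w = roi.shape
    cy, cx = h // 2, w // 2
    mask = np.zeros_like(roi, dtype=bool)
    mask[cy - inner:cy + inner, cx - inner:cx + inner] = True
    signal = roi[mask].mean() - roi[~mask].mean()
    noise = roi[~mask].std() + 1e-9
    return signal / noise

rng = np.random.default_rng(3)
background = rng.normal(0, 1, (64, 64))

nodule_roi = background.copy()
nodule_roi[24:40, 24:40] += 4.0   # a true nodule survives the contralateral subtraction
vessel_roi = background           # a normal structure largely cancels out

print(f"nodule SNR = {central_snr(nodule_roi):.2f}")
print(f"normal SNR = {central_snr(vessel_roi):.2f}")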

Journal ArticleDOI
TL;DR: In a sequential SP-based test, the pass/fail cutoff score of the screening test should be stringent, to considerably reduce testing time, while keeping the percentage of false positives at an acceptably low level.
Abstract: Purpose: Educators who use standardized-patient-based (SP-based) tests may save resources by using sequential testing. In this approach, students take a short screening test; only those who fail take a second test. This study investigated whether sequential testing increases efficiency with only a minor decrease of validity. Method: In 1994-95, first- through fourth-year (Group 1) and sixth-year (Group 2) medical students at the University of Maastricht took SP-based tests. Each test took two days. In a simulation experiment based on the data from those tests, the authors considered the first day as the screening test and the second day as the second test. They investigated efficiency and validity as a function of the cutoff score of the screening test. They developed and evaluated a new method to determine the optimum cutoff score of the screening test, a method based on minimization of the loss represented by the (weighted) numbers of false positives and negatives in the screening test. Results: The negative predictive value (probability that a student would fail the complete test if he or she had failed the screening test) was low ( 96%). Accordingly, stringent pass/fail cutoff scores in the screening test (75% for Group 1 and 80% for Group 2) produced optimum results. Using those cutoff values, only 26% and 11% of the students would have had to take the complete test to get a "true" score, while only 0.2% and 0.0% of the students who passed the screening test went on to fail the complete test (false positives). Conclusions: In a sequential SP-based test, the pass/fail cutoff score of the screening test should be stringent. This can considerably reduce testing time (30% to 40%), while keeping the percentage of false positives at an acceptably low level of less than 0.2%. As an alternative to receiver operator characteristic analysis, minimization of the loss function was found to be an appropriate method to determine the optimum cutoff value of the screening test.
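The cutoff-selection rule described in the Method section, minimizing a weighted sum of false positives (students who pass the screen but would fail the complete test) and false negatives (students who fail the screen but would pass), is easy to express directly. The scores, weights, pass mark, and candidate cutoffs below are invented for illustration and are not the study's data or parameters.

import random

def choose_cutoff(screen, complete, pass_mark, candidate_cutoffs, w_fp=3.0, w_fn=1.0):
    """Pick the screening-test cutoff minimizing the weighted loss
    w_fp * (#pass screen, fail complete) + w_fn * (#fail screen, pass complete)."""
    best = None
    for c in candidate_cutoffs:
        fp = sum(1 for s, t in zip(screen, complete) if s >= c and t < pass_mark)
        fn = sum(1 for s, t in zip(screen, complete) if s < c and t >= pass_mark)
        loss = w_fp * fp + w_fn * fn
        if best is None or loss < best[1]:
            best = (c, loss)
    return best

random.seed(7)
complete = [random.gauss(70, 10) for _ in range(500)]   # "true" complete-test scores
screen = [t + random.gauss(0, 6) for t in complete]     # noisier short screening test
cutoff, loss = choose_cutoff(screen, complete, pass_mark=60,
                             candidate_cutoffs=range(55, 86, 5))
print(f"optimum screening cutoff = {cutoff}, weighted loss = {loss:.0f}")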

Journal ArticleDOI
TL;DR: The trade off between exposure to undiagnosed diabetes and false positive results is quantified to inform the debate about screening for type 2 diabetes; the balance is dependent on characteristics of the disease and the screening programme.
Abstract: Objectives: The aims of this study were to quantify the proportion of people diagnosed as having type 2 diabetes by standard 75 g oral glucose tolerance test, in a hypothetical screening programme, who would actually be false positives (false positive percentage), and the effect on the false positive percentage of varying the time between repeat screens. We also calculated the duration in person years of exposure to undiagnosed disease in the population for each screening interval. Setting: Ely, Cambridgeshire, UK. Methods: We used the glucose tolerance data from 965 participants of the Ely Diabetes Project, who were tested 4.5 years apart, to calculate the population's between and within person variance for 2 hour plasma glucose, and constructed a probability matrix of observed v true glucose tolerance categories. The progression of the population between glucose tolerance categories was modelled assuming exponential times to progression. Results: After one year, 47.5% of test positives were disease free: almost half of those labelled with diabetes would not have the disease. For a 5 year interval, the false positive percentage was 27.6%, but the population would have been exposed to undiagnosed diabetes for 144 person years. Conclusions: Screening can be associated with both benefit and harm; the balance is dependent on characteristics of the disease and the screening programme. This study has quantified the trade off between exposure to undiagnosed diabetes and false positive results to inform the debate about screening for type 2 diabetes. (J Med Screen 2000;7:91-96)
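The mechanism behind the reported false positive percentage, within-person (measurement) variation around a stable between-person level pushing non-diabetic individuals over the diagnostic threshold on a single test, can be illustrated with a small Monte Carlo simulation. The variance figures and population mean below are placeholders, not the Ely study's estimates; only the 11.1 mmol/l 2-hour diagnostic cutoff is standard.

import random

random.seed(11)
N = 100_000
threshold = 11.1   # 2 h plasma glucose diagnostic cut-off, mmol/l

false_pos = true_pos = 0
for _ in range(N):
    true_level = random.gauss(6.5, 1.5)            # between-person variation (hypothetical)
    observed = true_level + random.gauss(0, 1.2)   # within-person / measurement variation
    if observed >= threshold:                      # screen positive on a single test
        if true_level >= threshold:
            true_pos += 1
        else:
            false_pos += 1

positives = true_pos + false_pos
print(f"screen positives: {positives}, of which {false_pos / positives:.0%} are false")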

Patent
Thomas Mcgee, Nevenka Dimitrova
15 Dec 2000
TL;DR: In this paper, a video indexing method and device for selecting keyframes from each detected scene in the video is presented, which determines whether a scene change has occurred between two frames of video or whether the change between the two frames is merely a uniform change in luminance values.
Abstract: A video indexing method and device for selecting keyframes from each detected scene in the video. The method and device determine whether a scene change has occurred between two frames of video or whether the change between the two frames is merely a uniform change in luminance values.
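One simple way to realize the distinction the abstract draws, a true scene change versus a uniform shift in luminance (for example a lighting change or fade), is to compare the frames again after removing each frame's mean luminance: a uniform brightness shift vanishes, a content change does not. The sketch below is an illustrative heuristic on synthetic frames, not the patented procedure, and the thresholds are invented.

import numpy as np

def frame_change(prev, curr, cut_thresh=25.0, residual_thresh=10.0):
    """Classify the change between two grayscale frames.

    Returns 'scene change', 'luminance change', or 'no change'.
    """
    prev = prev.astype(float)
    curr = curr.astype(float)
    raw_diff = np.abs(curr - prev).mean()
    if raw_diff < cut_thresh:
        return "no change"
    # Remove each frame's mean luminance; a uniform brightness shift vanishes here.
    residual = np.abs((curr - curr.mean()) - (prev - prev.mean())).mean()
    return "scene change" if residual > residual_thresh else "luminance change"

rng = np.random.default_rng(5)
frame_a = rng.integers(0, 200, (120, 160)).astype(float)
brighter = np.clip(frame_a + 40, 0, 255)                    # same scene, uniformly brighter
frame_b = rng.integers(0, 200, (120, 160)).astype(float)    # unrelated content

print(frame_change(frame_a, brighter))   # -> luminance change
print(frame_change(frame_a, frame_b))    # -> scene change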


01 Jan 2000
TL;DR: In this paper, a new NAA test for disease due to active cytomegalovirus (CMV) was proposed and compared to an insensitive, older test, for which the test statistics are being determined.
Abstract: Nucleic acid testing has arrived in the diagnostic microbiology laboratory, and it has brought along new questions about the statistical evaluation of tests. Few microbiologists are fond of statistics, but we should pay close attention to the use of statistics in the evaluation of new tests: patient care depends on it. Many of the molecular diagnostic tests used in microbiology include amplification of bacterial or viral nucleic acids. Tests such as PCR and ligase chain reaction depend on amplification of nucleic acid before the detection stage of the test. Nucleic acid amplification (NAA) tests have become common for Mycobacterium tuberculosis, Neisseria gonorrhoeae, Chlamydia trachomatis, and human immunodeficiency virus (HIV) (6-8, 14). The signal amplification of PCR is extraordinarily efficient, so that even a single organism may be detected, at least in theory. Moreover, because nucleic acid is detected, replication of the bacteria or virus is not needed. Even dead bugs can be detected. These are strong reasons for thinking that NAA tests may be more sensitive than conventional methods, particularly for detection of bacteria or viruses that are difficult to grow. The great sensitivity of NAA tests may increase the risk of false-positive results (15). The difficulty in evaluating the new tests arises from this quandary: how can a new test, expected to be highly sensitive, be compared to an insensitive, older test? Specifically, what can be done when samples are negative by an insensitive culture method but positive by an NAA test? Many investigators have chosen to perform further testing specifically on this puzzling group of samples; this practice is known as discrepant analysis (4). Let's take a hypothetical example. Suppose that a new NAA test for disease due to active cytomegalovirus (CMV) is to be evaluated (in this article the "new test" is a test under evaluation, for which the test statistics are being determined). Culture of CMV on cell lines is used as the "gold standard" (the test against which the new test is measured). The results for 1,000 samples tested are given in Fig. 1A. The sensitivity of the new test is equal to the true positives (TP) divided by the sum of the TP and the false negatives (FN) (12), as in the example: sensitivity = TP/(TP + FN) = 155/(155 + 15) = 91.2%. The specificity of the new test is equal to the true negatives (TN) divided by the sum of the TN and the false positives (FP) (12), as in the example: specificity = TN/(TN + FP) = 790/(790 + 40) = 95.2%. If the specificity of the gold standard test is thought to be excellent (near 100%), the investigators would conclude that the discrepant results in which the NAA test was negative but culture was positive were indeed false negatives for the NAA test. These discrepant results would be accepted, and no further analysis would be done on the samples. While discrepant analysis could include further testing on the culture-positive, NAA-negative samples with a third test, this seems to be uncommon in microbiology (11, 13). The more problematic discrepant results are the 40 samples in which the NAA test is positive but the gold standard test is negative. If the investigators believe that the NAA test is more sensitive than the old test, they might do an additional test on these discrepant samples. Suppose that a CMV antigen assay is done using these 40 samples and that the antigen test is positive for CMV in 38 of the 40 retested samples.
Using the results of the antigen assay to create a new "polished" gold standard, the authors would then analyze the data as shown in Fig. 1B. The sensitivity of the NAA test would now be 92.8% (a gain of 1.6%), and the specificity would be 99.7% (a gain of 4.5%). Is this a reasonable approach? To answer this question, consider what would have happened if a ridiculous test were used to resolve the 40 discrepant results. If a fair coin were tossed to resolve each of the 40 problematic results, 20 of the discrepant results would become "true" positives and 20 would remain "true" negatives. The apparent sensitivity and specificity of the new test would become 92.1 and 97.5%, respectively (improving by 0.9 and 2.3%). In fact, any test used to resolve the 40 discrepant results can only improve or leave unchanged the apparent sensitivity and specificity of the new test.
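The arithmetic in the hypothetical CMV example is easy to reproduce, and doing so makes the bias of discrepant analysis concrete: any rule that reclassifies only the NAA-positive, culture-negative discrepants, even a coin toss, pushes the apparent sensitivity and specificity upward. The counts below are taken directly from the example above; only the small helper function is added here.

def sens_spec(tp, fn, tn, fp):
    """Sensitivity and specificity from a 2x2 table."""
    return tp / (tp + fn), tn / (tn + fp)

# Figure 1A: NAA test versus the culture gold standard.
tp, fn, tn, fp = 155, 15, 790, 40
print("original:             sens=%.1f%%  spec=%.1f%%"
      % tuple(100 * x for x in sens_spec(tp, fn, tn, fp)))

# Discrepant analysis with the antigen assay: 38 of the 40 FPs reclassified as true positives.
print("antigen 'polished':   sens=%.1f%%  spec=%.1f%%"
      % tuple(100 * x for x in sens_spec(tp + 38, fn, tn, fp - 38)))

# The same procedure with a fair coin: 20 of the 40 FPs reclassified at random.
print("coin-toss 'polished': sens=%.1f%%  spec=%.1f%%"
      % tuple(100 * x for x in sens_spec(tp + 20, fn, tn, fp - 20)))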