
Showing papers on "False positive paradox published in 2008"


Journal ArticleDOI
TL;DR: A simple correction for multiple testing is demonstrated, which can easily be calculated from the pairwise correlation and gives far more realistic estimates for the effective number of tests than previous formulae.
Abstract: The interpretation of the results of large association studies encompassing much or all of the human genome faces the fundamental statistical problem that a correspondingly large number of single nucleotide polymorphism (SNP) markers will be spuriously flagged as significant. A common method of dealing with these false positives is to raise the significance level for the individual tests for association of each marker. Any such adjustment for multiple testing is ultimately based on a more or less precise estimate for the actual overall type I error probability. We estimate this probability for association tests for correlated markers and show that it depends in a nonlinear way on the significance level for the individual tests. This dependence of the effective number of tests on the significance level is not taken into account by existing multiple-testing corrections, leading to widely overestimated results. We demonstrate a simple correction for multiple testing, which can easily be calculated from the pairwise correlation and gives far more realistic estimates for the effective number of tests than previous formulae. The calculation is considerably faster than with other methods and hence applicable on a genome-wide scale. The efficacy of our method is shown on a constructed example with highly correlated markers as well as on real data sets, including a full genome scan where a conservative estimate only 8% above the permutation estimate is obtained in about 1% of the computation time. As the calculation is based on pairwise correlations between markers, it can be performed at the stage of study design using public databases.
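The correction above is built from pairwise marker correlations, but the paper's own formula is not reproduced in this listing. As an illustration of the general idea of replacing the raw marker count with an effective number of independent tests, here is a minimal sketch using the well-known eigenvalue-based estimate of Li and Ji as a stand-in (not the authors' method); the correlation matrix r is a toy example.

```python
import numpy as np

def effective_number_of_tests(r: np.ndarray) -> float:
    """Estimate the effective number of independent tests from a
    pairwise marker correlation matrix (Li & Ji style estimate,
    used here only as an illustrative stand-in)."""
    eigvals = np.abs(np.linalg.eigvalsh(r))
    # Each eigenvalue contributes 1 if it is >= 1, plus its fractional part.
    return float(np.sum((eigvals >= 1).astype(float) + (eigvals - np.floor(eigvals))))

# Toy example: three highly correlated markers behave like roughly two tests.
r = np.array([[1.0, 0.9, 0.8],
              [0.9, 1.0, 0.85],
              [0.8, 0.85, 1.0]])
m_eff = effective_number_of_tests(r)
alpha_adjusted = 0.05 / m_eff   # Bonferroni-style correction using M_eff
print(m_eff, alpha_adjusted)
```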

253 citations


Journal ArticleDOI
TL;DR: A package of algorithms and software is presented that makes use of control input data to reduce false positives and estimate confidence in ChIP-Seq peaks; a binomial p-value/q-value and an empirical FDR are shown to be more reliable estimators of confidence than a global Poisson p-value.
Abstract: High throughput signature sequencing holds many promises, one of which is the ready identification of in vivo transcription factor binding sites, histone modifications, changes in chromatin structure and patterns of DNA methylation across entire genomes. In these experiments, chromatin immunoprecipitation is used to enrich for particular DNA sequences of interest and signature sequencing is used to map the regions to the genome (ChIP-Seq). Elucidation of these sites of DNA-protein binding/modification is proving instrumental in reconstructing networks of gene regulation and chromatin remodelling that direct development, response to cellular perturbation, and neoplastic transformation. Here we present a package of algorithms and software that makes use of control input data to reduce false positives and estimate confidence in ChIP-Seq peaks. Several different methods were compared using two simulated spike-in datasets. Use of control input data and a normalized difference score were found to more than double the recovery of ChIP-Seq peaks at a 5% false discovery rate (FDR). Moreover, both a binomial p-value/q-value and an empirical FDR were found to predict the true FDR within 2–3 fold and are more reliable estimators of confidence than a global Poisson p-value. These methods were then used to reanalyze Johnson et al.'s neuron-restrictive silencer factor (NRSF) ChIP-Seq data without relying on extensive qPCR-validated NRSF sites or the presence of NRSF binding motifs for setting thresholds. The methods developed and tested here show considerable promise for reducing false positives and estimating confidence in ChIP-Seq data without any prior knowledge of the ChIP target. They are part of a larger open source package freely available from http://useq.sourceforge.net/ .
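The control-input idea above lends itself to a short sketch: at any score threshold, the number of peak calls produced by the control (input) data approximates the number of false positives among the ChIP calls, giving an empirical FDR. The score arrays and threshold scan below are hypothetical and only illustrate the general strategy, not the USeq implementation; they assume the control scores are computed on comparably scaled data.

```python
import numpy as np

def empirical_fdr(chip_scores, control_scores, threshold):
    """Empirical FDR at a threshold: peaks called in the control (input)
    data approximate false positives among the ChIP calls."""
    chip_calls = np.sum(np.asarray(chip_scores) >= threshold)
    control_calls = np.sum(np.asarray(control_scores) >= threshold)
    return control_calls / max(chip_calls, 1)

def threshold_for_fdr(chip_scores, control_scores, target_fdr=0.05):
    """Pick the lowest score threshold whose empirical FDR meets the target."""
    for t in sorted(set(chip_scores)):
        if empirical_fdr(chip_scores, control_scores, t) <= target_fdr:
            return t
    return None  # no threshold reaches the target FDR
```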

237 citations


Journal ArticleDOI
TL;DR: It is shown that Bloom's analysis of the Bloom filter, a randomized data structure for membership queries dating back to 1970, is incorrect, and a correct analysis is given.
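For context, the data structure in question is the Bloom filter, and the disputed quantity is its false positive probability. The sketch below computes the classical (Bloom, 1970) approximation; the 2008 paper referenced above shows that this classical analysis is not exact for finite filters, so treat the value as an estimate rather than the paper's corrected result.

```python
import math

def bloom_fp_classical(n_items: int, m_bits: int, k_hashes: int) -> float:
    """Classical approximation of the false positive probability after
    inserting n items into an m-bit filter with k hash functions:
    (1 - e^(-kn/m))^k. The 2008 paper shows this analysis is not exact."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes

def optimal_k(n_items: int, m_bits: int) -> int:
    """Number of hash functions minimizing the classical estimate."""
    return max(1, round((m_bits / n_items) * math.log(2)))

# Example: 1,000 items in a 10,000-bit filter.
k = optimal_k(1_000, 10_000)                     # about 7 hash functions
print(k, bloom_fp_classical(1_000, 10_000, k))   # roughly 0.8% false positives
```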

184 citations


Journal ArticleDOI
TL;DR: The preliminary results indicate that while LIBS is able to discriminate biomaterials with similar elemental compositions at standoff distances based on differences in key intensity ratios, further work is needed to reduce the number of false positives/negatives by refining the PLS-DA model to include a sufficient range of material classes and carefully selecting a detection threshold.
Abstract: Laser-induced breakdown spectroscopy (LIBS) is a promising technique for real-time chemical and biological warfare agent detection in the field. We have demonstrated the detection and discrimination of the biological warfare agent surrogates Bacillus subtilis (BG) (2% false negatives, 0% false positives) and ovalbumin (0% false negatives, 1% false positives) at 20 meters using standoff laser-induced breakdown spectroscopy (ST-LIBS) and linear correlation. Unknown interferent samples (not included in the model), samples on different substrates, and mixtures of BG and Arizona road dust have been classified with reasonable success using partial least squares discriminant analysis (PLS-DA). A few of the samples tested such as the soot (not included in the model) and the 25% BG:75% dust mixture resulted in a significant number of false positives or false negatives, respectively. Our preliminary results indicate that while LIBS is able to discriminate biomaterials with similar elemental compositions at standoff distances based on differences in key intensity ratios, further work is needed to reduce the number of false positives/negatives by refining the PLS-DA model to include a sufficient range of material classes and carefully selecting a detection threshold. In addition, we have demonstrated that LIBS can distinguish five different organophosphate nerve agent simulants at 20 meters, despite their similar stoichiometric formulas. Finally, a combined PLS-DA model for chemical, biological, and explosives detection using a single ST-LIBS sensor has been developed in order to demonstrate the potential of standoff LIBS for universal hazardous materials detection.

150 citations


Proceedings ArticleDOI
24 Jun 2008
TL;DR: This paper presents a system that uses invariants to improve the coverage and latency of existing detection techniques for permanent faults, and uses training inputs to create likely invariants based on value ranges of selected program variables and then uses them to identify faults at runtime.
Abstract: In the near future, hardware is expected to become increasingly vulnerable to faults due to continuously decreasing feature size. Software-level symptoms have previously been used to detect permanent hardware faults. However, they cannot detect a small fraction of faults, which may lead to silent data corruptions (SDCs). In this paper, we present a system that uses invariants to improve the coverage and latency of existing detection techniques for permanent faults. The basic idea is to use training inputs to create likely invariants based on value ranges of selected program variables and then use them to identify faults at runtime. Likely invariants, however, can have false positives, which makes them challenging to use for permanent faults. We use our on-line diagnosis framework to detect false positives at runtime and limit their number, keeping the associated overhead minimal. Experimental results using microarchitecture-level fault injections in full-system simulation show a 28.6% reduction in the number of undetected faults and a 74.2% reduction in the number of SDCs over existing techniques, with reasonable overhead for checking code.
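A minimal sketch of the value-range "likely invariant" idea described above: record the minimum and maximum of selected variables over fault-free training runs, then flag runtime values outside a slightly widened range as possible fault symptoms to be passed to a diagnosis step. The training values and the slack factor are hypothetical; the paper's actual instrumentation and on-line diagnosis framework are not reproduced here.

```python
class RangeInvariant:
    """Likely invariant over the observed value range of one variable
    (illustrative sketch only)."""
    def __init__(self, slack=0.05):
        self.lo = float("inf")
        self.hi = float("-inf")
        self.slack = slack  # widen the range slightly to reduce false positives

    def train(self, value):
        self.lo = min(self.lo, value)
        self.hi = max(self.hi, value)

    def check(self, value):
        """Return True if the value violates the trained invariant."""
        width = (self.hi - self.lo) or 1.0
        return not (self.lo - self.slack * width
                    <= value <= self.hi + self.slack * width)

# Training phase: observe a monitored variable on fault-free inputs.
inv = RangeInvariant()
for v in [10, 12, 15, 11, 14]:
    inv.train(v)

# Runtime phase: an out-of-range value is a possible fault symptom; a
# violation would be handed to a diagnosis step to rule out false positives.
print(inv.check(13))    # False: within the trained range
print(inv.check(5000))  # True: suspicious, trigger diagnosis
```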

142 citations


Journal ArticleDOI
TL;DR: The impact of this method, especially for heterogeneous human populations, is to reduce the false-positive rate, inflate other spuriously small p values, and have little impact on the p values associated with true positive loci.
Abstract: Resources being amassed for genome-wide association (GWA) studies include "control databases" genotyped with a large-scale SNP array. How to use these databases effectively is an open question. We develop a method to match, by genetic ancestry, controls to affected individuals (cases). The impact of this method, especially for heterogeneous human populations, is to reduce the false-positive rate, inflate other spuriously small p values, and have little impact on the p values associated with true positive loci. Thus, it highlights true positives by downplaying false positives. We perform a GWA by matching Americans with type 1 diabetes (T1D) to controls from Germany. Despite the complex study design, these analyses identify numerous loci known to confer risk for T1D.

135 citations


Journal ArticleDOI
TL;DR: The observed low rates of positives provide empirical evidence that the type I error rate is well controlled by current commonly used correction procedures in imaging genetics, at least in the context of the imaging paradigms the authors have used.

122 citations


Journal ArticleDOI
TL;DR: Bayesian logic and empirical data suggested that association studies in complex disease should involve at least 2000 cases and 2000 controls, at which level they predicted that p values of less than 5×10⁻⁷ would more commonly signify true positives than false positives.
Abstract: Genome-wide association studies involve several hundred thousand markers and, even when quality control is scrupulous, are invariably confounded by residual uncorrected errors that can falsely inflate the apparent difference between cases and controls (so-called genomic inflation). As a consequence such studies inevitably generate false positives alongside genuine associations. By use of Bayesian logic and empirical data, the Wellcome Trust Case Control Consortium suggested that association studies in complex disease should involve at least 2000 cases and 2000 controls, at which level they predicted that p values of less than 5×10⁻⁷ would more commonly signify true positives than false positives.
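The Bayesian argument summarized above can be made concrete with a small worked example (the numbers are illustrative assumptions, not the Consortium's exact figures): if roughly 1 in 10,000 tested variants is truly associated and the study has 50% power at the threshold, a hit at p < 5×10⁻⁷ carries posterior odds of about 100:1 of being a true positive.

```python
def posterior_odds_true_association(prior_odds: float,
                                    power: float,
                                    alpha: float) -> float:
    """Posterior odds that a significant hit is a true positive:
    prior odds multiplied by the likelihood ratio power/alpha."""
    return prior_odds * (power / alpha)

# Illustrative assumptions (not the Consortium's exact figures):
prior_odds = 1 / 10_000     # ~1 in 10,000 tested variants truly associated
power = 0.5                 # power to detect a real effect at this threshold
alpha = 5e-7                # genome-wide significance threshold

odds = posterior_odds_true_association(prior_odds, power, alpha)
prob = odds / (1 + odds)
print(f"posterior odds ~ {odds:.0f}:1, P(true positive) ~ {prob:.3f}")
```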

106 citations


Journal ArticleDOI
TL;DR: Results show that selection can be detected in vivo with high specificity using the new method proposed here, allowing greater insight into the existence and direction of antigen-driven selection.
Abstract: Statistical methods based on the relative frequency of replacement mutations in B lymphocyte Ig V region sequences have been widely used to detect the forces of selection that shape the B cell repertoire. However, current methods produce an unexpectedly high frequency of false positives and are sensitive to intrinsic biases of somatic hypermutation that can give the appearance of selection. The new statistical test proposed here provides a better trade-off between sensitivity and specificity compared with previous approaches. The low specificity of existing methods was shown in silico to result from an interaction between the effects of positive and negative selection. False detection of positive selection was confirmed in vivo through a re-analysis of published sequence data from diffuse large B cell lymphomas, highlighting the need for re-analysis of some existing studies. The sensitivity of the proposed method to detect selection was validated using new Ig transgenic mouse models in which positive selection was expected to be a significant force, as well as with a simulation-based approach. Previous concerns that intrinsic biases of somatic hypermutation could give the appearance of selection were addressed by extending the current mutation models to more fully account for the impact of microsequence on relative mutability and to include transition bias. High specificity was confirmed using a large set of non-productively rearranged Ig sequences. These results show that selection can be detected in vivo with high specificity using the new method proposed here, allowing greater insight into the existence and direction of antigen-driven selection.

88 citations


Journal ArticleDOI
TL;DR: An application-independent framework for accurately identifying compromised sensor nodes is proposed, together with alert reasoning algorithms that are optimal in the sense that they identify the largest number of compromised nodes without introducing false positives.
Abstract: Sensor networks are often subject to physical attacks. Once a node's cryptographic key is compromised, an attacker may completely impersonate it and introduce arbitrary false information into the network. Basic cryptographic mechanisms are often not effective in this situation. Most techniques to address this problem focus on detecting and tolerating false information introduced by compromised nodes. They cannot pinpoint exactly where the false information is introduced and who is responsible for it. In this article, we propose an application-independent framework for accurately identifying compromised sensor nodes. The framework provides an appropriate abstraction of application-specific detection mechanisms and models the unique properties of sensor networks. Based on the framework, we develop alert reasoning algorithms to identify compromised nodes. The algorithm assumes that compromised nodes may collude at will. We show that our algorithm is optimal in the sense that it identifies the largest number of compromised nodes without introducing false positives. We evaluate the effectiveness of the designed algorithm through comprehensive experiments.

85 citations


Journal ArticleDOI
TL;DR: The authors advocate that investigators seek information on the measurement process and request all observed data from laboratories (including the data below the threshold) in order to determine the appropriate treatment of those data.
Abstract: Epidemiological investigations of health effects related to chronic low-level exposures or other circumstances often face the difficult task of dealing with levels of biomarkers that are hard to detect and/or quantify. In these cases instrumentation may not adequately measure biomarker levels. Reasons include a failure of instruments to detect levels below a certain value or, alternatively, interference by error or ‘noise’. Current laboratory practice determines a ‘limit of detection (LOD)’, or some other detection threshold, as a function of the distribution of instrument ‘noise’. Although measurements are produced both above and below this threshold in many circumstances, all points observed below the threshold may be reported as ‘not detected’ rather than as numerical data. The focus of this process of determining the LOD is instrument noise and avoiding false positives. Moreover, uncertainty is assumed to apply only to the lowest values, which are treated differently from above-threshold values, thereby potentially creating a false dichotomy. In this paper we discuss the application of thresholds to the measurement of biomarkers and illustrate how conventional approaches, though appropriate for certain settings, may fail epidemiological investigations. Rather than automated procedures that subject observed data to a standard threshold, the authors advocate that investigators seek information on the measurement process and request all observed data from laboratories (including the data below the threshold) to determine the appropriate treatment of those data.

Journal ArticleDOI
TL;DR: A newly developed, easy-to-use software tool enables quality evaluation by generating composite target-decoy databases usable with all relevant protein search engines, allowing even nonexperienced users to reliably determine peptides and proteins of high quality.
Abstract: One of the major challenges for large scale proteomics research is the quality evaluation of results. Protein identification from complex biological samples or experimental setups is often a manual and subjective task which lacks profound statistical evaluation. This is not feasible for high-throughput proteomic experiments which result in large datasets of thousands of peptides and proteins and their corresponding mass spectra. To improve the quality, reliability and comparability of scientific results, an estimation of the rate of erroneously identified proteins is advisable. Moreover, scientific journals increasingly stipulate that articles containing considerable MS data should be subject to stringent statistical evaluation. We present a newly developed easy-to-use software tool enabling quality evaluation by generating composite target-decoy databases usable with all relevant protein search engines. This tool, when used in conjunction with relevant statistical quality criteria, enables even nonexperienced users (e.g. laboratory staff, researchers without programming knowledge) to reliably determine peptides and proteins of high quality. Different strategies for building decoy databases are implemented and the resulting databases are characterized and compared. The quality of protein identification in high-throughput proteomics is usually measured by the false positive rate (FPR), but it is shown that the false discovery rate (FDR) delivers a more meaningful, robust and comparable value.
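The target-decoy strategy described above reduces to a very short calculation: search a concatenated target+decoy database and estimate the FDR at a score threshold from the number of decoy hits. The sketch below assumes a concatenated database in which a false match is equally likely to land on a target or a decoy sequence, and that the estimated FDR grows as the threshold is relaxed; it illustrates the general strategy, not the tool's implementation.

```python
def target_decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimated FDR among identifications above a score threshold for a
    concatenated target-decoy search (false matches assumed equally likely
    to hit a target or a decoy sequence)."""
    n_target = sum(s >= threshold for s in target_scores)
    n_decoy = sum(s >= threshold for s in decoy_scores)
    if n_target == 0:
        return 0.0
    return min(1.0, 2 * n_decoy / (n_target + n_decoy))

def loosest_threshold_at_fdr(target_scores, decoy_scores, max_fdr=0.01):
    """Lowest score threshold whose estimated FDR stays within max_fdr,
    scanning from strict to loose thresholds."""
    best = None
    for t in sorted(set(target_scores), reverse=True):
        if target_decoy_fdr(target_scores, decoy_scores, t) <= max_fdr:
            best = t
        else:
            break
    return best
```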

Journal ArticleDOI
TL;DR: An improved classification approach is proposed for automatic oil spill detection in synthetic aperture radar images, and an automatic confidence estimator has been developed to allow the user to tune the system with respect to the tradeoff between the number of true positive alarms and the number of false positives.
Abstract: An improved classification approach is proposed for automatic oil spill detection in synthetic aperture radar images. The performance of statistical classifiers and support vector machines is compared. Regularized statistical classifiers prove to perform the best on this problem. To allow the user to tune the system with respect to the tradeoff between the number of true positive alarms and the number of false positives, an automatic confidence estimator has been developed. Combining the regularized classifier with confidence estimation leads to acceptable performance.

Patent
06 Nov 2008
TL;DR: Systems and methods are presented for detecting anomalies in network traffic based on time-series activity; upon detection of an anomaly, significant changes can be analyzed to determine the cause and the impact of the anomaly on the network traffic.
Abstract: Some embodiments of the present invention provide systems and methods for detecting anomalies in network traffic. Some embodiments detect anomalies based on time-series activity in network traffic. Upon detection of an anomaly, significant changes can be analyzed to identify abnormal changes in network traffic across different network entities. The identified changes can then be used to determine the cause and the impact of the detected anomaly on the network traffic.
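As a minimal illustration of time-series-based anomaly detection of the kind sketched in this patent abstract (not the patented method itself), the snippet below flags time points whose traffic volume deviates strongly from a rolling baseline; in a fuller system, the flagged times would then be cross-examined across network entities to localize the cause. The window size and threshold are arbitrary choices.

```python
import numpy as np

def rolling_zscore_anomalies(series, window=60, z_thresh=4.0):
    """Flag indices where traffic deviates strongly from a rolling baseline.
    Illustrative only; window and threshold are arbitrary."""
    x = np.asarray(series, dtype=float)
    anomalies = []
    for i in range(window, len(x)):
        hist = x[i - window:i]
        mu, sigma = hist.mean(), hist.std()
        if sigma > 0 and abs(x[i] - mu) / sigma > z_thresh:
            anomalies.append(i)
    return anomalies

# Example: a sudden traffic spike injected at t=200 is expected to be flagged.
rng = np.random.default_rng(0)
traffic = rng.poisson(100, 300).astype(float)
traffic[200] += 500
print(rolling_zscore_anomalies(traffic))
```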

Journal ArticleDOI
TL;DR: In this article, the authors propose a hybrid statistical approach using data mining and decision tree classification to reduce the misclassification of false positives and to distinguish between attacks and false positives in the KDD Cup 99 data.
Abstract: Although intelligent intrusion detection strategies are used to detect false alarms within the critical segments of network infrastructures, reducing false positives is still a major challenge. To date, these strategies have focused on either detection or response features, but rarely on both together. Without considering both features, intrusion detection systems are unlikely to achieve high detection rates at low false alarm rates. To address these constraints, this paper proposes a strategy focused on detection that involves statistical analysis of both attack and normal traffic based on the training data of KDD Cup 99. The strategy includes a hybrid statistical approach which uses data mining and decision tree classification. As a result, the statistical analysis can be used to reduce the misclassification of false positives and to distinguish between attacks and false positives in the KDD Cup 99 data. This strategy can therefore be used to evaluate and enhance the capability of an IDS to detect and, at the same time, respond to threats and benign traffic in critical segments of network, application and database infrastructures.

Journal ArticleDOI
TL;DR: Improved assay design and evaluation methods presented herein will expedite adoption of real-time PCR in the clinical lab and produce new signatures that are predicted to have higher sensitivity and specificity.
Abstract: Background: In recent years real-time PCR has become a leading technique for nucleic acid detection and quantification. These assays have the potential to greatly enhance efficiency in the clinical laboratory. Choice of primer and probe sequences is critical for accurate diagnosis in the clinic, yet current primer/probe signature design strategies are limited, and signature evaluation methods are lacking. Methods: We assessed the quality of a signature by predicting the number of true positive, false positive and false negative hits against all available public sequence data. We found real-time PCR signatures described in recent literature and used a BLAST search based approach to collect all hits to the primer-probe combinations that should be amplified by real-time PCR chemistry. We then compared our hits with the sequences in the NCBI taxonomy tree that the signature was designed to detect. Results: We found that many published signatures have high specificity (almost no false positives) but low sensitivity (high false negative rate). Where high sensitivity is needed, we offer a revised methodology for signature design which may designate that multiple signatures are required to detect all sequenced strains. We use this methodology to produce new signatures that are predicted to have higher sensitivity and specificity. Conclusion: We show that current methods for real-time PCR assay design have unacceptably low sensitivities for most clinical applications. Additionally, as new sequence data becomes available, old assays must be reassessed and redesigned. A standard protocol for both generating and assessing the quality of these assays is therefore of great value. Real-time PCR has the capacity to greatly improve clinical diagnostics. The improved assay design and evaluation methods presented herein will expedite adoption of this technique in the clinical lab.
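The evaluation described above turns predicted true positive, false positive and false negative hit counts into quality figures per signature. The sketch below shows that bookkeeping as sensitivity and precision (precision corresponding to the sense in which the abstract uses "specificity": few off-target hits). The counts are hypothetical, and the BLAST-based hit collection itself is not reproduced.

```python
def signature_quality(tp: int, fp: int, fn: int):
    """Sensitivity and precision of a real-time PCR signature from
    predicted hit counts against available sequence data."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # fraction of target strains detected
    precision = tp / (tp + fp) if (tp + fp) else 0.0    # fraction of predicted hits that are on-target
    return sensitivity, precision

# Hypothetical signature: detects 40 of 60 sequenced target strains,
# with a single off-target (false positive) hit.
sens, prec = signature_quality(tp=40, fp=1, fn=20)
print(f"sensitivity={sens:.2f}, precision={prec:.2f}")  # high precision, low sensitivity
```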

Proceedings ArticleDOI
01 Apr 2008
TL;DR: iTrustPage is an anti-phishing tool that does not rely completely on automation to detect phishing; instead, it relies on user input and external repositories of information to prevent users from filling out phishing Web forms.
Abstract: Despite the many solutions proposed by industry and the research community to address phishing attacks, this problem continues to cause enormous damage. Because of our inability to deter phishing attacks, the research community needs to develop new approaches to anti-phishing solutions. Most of today's anti-phishing technologies focus on automatically detecting and preventing phishing attacks. While automation makes anti-phishing tools user-friendly, automation also makes them suffer from false positives, false negatives, and various practical hurdles. As a result, attackers often find simple ways to escape automatic detection.This paper presents iTrustPage - an anti-phishing tool that does not rely completely on automation to detect phishing. Instead, iTrustPage relies on user input and external repositories of information to prevent users from filling out phishing Web forms. With iTrustPage, users help to decide whether or not a Web page is legitimate. Because iTrustPage is user-assisted, iTrustPage avoids the false positives and the false negatives associated with automatic phishing detection. We implemented iTrustPage as a downloadable extension to FireFox. After being featured on the Mozilla website for FireFox extensions, iTrustPage was downloaded by more than 5,000 users in a two week period. We present an analysis of our tool's effectiveness and ease of use based on our examination of usage logs collected from the 2,050 users who used iTrustPage for more than two weeks. Based on these logs, we find that iTrustPage disrupts users on fewer than 2% of the pages they visit, and the number of disruptions decreases over time.

Journal ArticleDOI
TL;DR: Generalized lambda distributions are used to model frequency distributions of database search scores computed by MASCOT, X!TANDEM with k-score plug-in, OMSSA, and InsPecT, in order to estimate p values and false discovery rates with high accuracy.

Journal ArticleDOI
TL;DR: It is shown that the results of the DCA and of a self-organizing map when applied to the detection of SYN port scans are comparable, and both produce false positives for the same processes.
Abstract: The dendritic cell algorithm (DCA) is an immune-inspired algorithm, developed for the purpose of anomaly detection. The algorithm performs multi-sensor data fusion and correlation which results in a ‘context aware’ detection system. Previous applications of the DCA have included the detection of potentially malicious port scanning activity, where it has produced high rates of true positives and low rates of false positives. In this work we aim to compare the performance of the DCA and of a self-organizing map (SOM) when applied to the detection of SYN port scans, through experimental analysis. A SOM is an ideal candidate for comparison as it shares similarities with the DCA in terms of the data fusion method employed. It is shown that the results of the two systems are comparable, and both produce false positives for the same processes. This shows that the DCA can produce anomaly detection results to the same standard as an established technique.

Journal ArticleDOI
TL;DR: The results show that the combined use of SOR and the Yoshino code number allows personal identification with a small probability of false positives (p < 10⁻⁶), even when kinship is taken into account.
Abstract: The aims of this study were to verify if frontal sinuses can uniquely identify individuals belonging to family groups using Cameriere methods and to test if kinship can affect the proportion of erroneous identifications. For this purpose, we compared the proportion of false-positive identifications in a sample of 99 individuals within 20 families with a control sample of 98 other individuals without kinship. The results show that the combined use of SOR and the Yoshino code number allows personal identification with a small probability of false positives (p < 10⁻⁶), even when kinship is taken into account. The present research confirms the importance of studying anthropological frameworks for identification, which leads to reliable methods and allows for both quick and economic procedures.

Proceedings Article
01 Jan 2008
TL;DR: A novel architecture for word-spotting is presented that is trained from a small number of examples to classify an utterance as containing a target keyword or not, together with a method for training a support vector machine classifier to separate keyword from nonkeyword patch feature responses.
Abstract: We present a novel architecture for word-spotting which is trained from a small number of examples to classify an utterance as containing a target keyword or not. The word-spotting architecture relies on a novel feature set consisting of a set of ordered spectro-temporal patches which are extracted from exemplar mel-spectra of target keywords. A local pooling operation across frequency and time is introduced which endows the extracted patch features with the flexibility to match novel unseen keywords. Finally, we describe how to train a support vector machine classifier to separate between keyword and nonkeyword patch feature responses. We present preliminary results indicating that our word-spotting architecture achieves a detection rate of 70-95% with false positive rates of about 0.252 false positives per minute.

Proceedings ArticleDOI
05 May 2008
TL;DR: A novel anomaly detection scheme is proposed that uses the correlation information contained in groups of network traffic samples; experimental results show promising detection rates while maintaining false positives at very low rates.
Abstract: During the last decade, anomaly detection has attracted the attention of many researchers as a way to overcome the weakness of signature-based IDSs in detecting novel attacks. However, because of its relatively high false alarm rate, anomaly detection has not been widely used in real networks. In this paper, we propose a novel anomaly detection scheme using the correlation information contained in groups of network traffic samples. Our experimental results show promising detection rates while maintaining false positives at very low rates.

Proceedings ArticleDOI
20 Jul 2008
TL;DR: An extended race detection technique based on a combination of lockset analysis and the happens-before relation is described, which provides more accurate warnings and significantly reduces the number of false positives, while limiting thenumber of false negatives.
Abstract: Multi-core chips enable parallel processing for general purpose applications. Unfortunately, parallel programs may contain synchronization defects. Such defects are difficult to detect due to nondeterministic interleavings of parallel threads. Current tools for detecting these defects produce numerous false alarms, thereby concealing the true defects. This paper describes an extended race detection technique based on a combination of lockset analysis and the happens-before relation. The approach provides more accurate warnings and significantly reduces the number of false positives, while limiting the number of false negatives. The technique is implemented in Helgrind+, an extension of the open source dynamic race detector Helgrind. Experimental results with several applications and benchmarks demonstrate a significant reduction in false alarms at a moderate runtime increase.
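The lockset half of the hybrid analysis described above can be sketched in a few lines, using the textbook Eraser-style lockset refinement rather than Helgrind+'s actual implementation, and omitting the happens-before component: each shared variable's candidate lockset is intersected with the locks held at every access, and an empty lockset signals a possible race.

```python
class LocksetChecker:
    """Eraser-style lockset refinement (illustrative sketch only; the
    happens-before component of a hybrid detector is omitted)."""
    def __init__(self):
        self.candidate = {}   # variable -> set of locks consistently held

    def access(self, var, locks_held):
        locks_held = set(locks_held)
        if var not in self.candidate:
            self.candidate[var] = locks_held      # first access: initialize
        else:
            self.candidate[var] &= locks_held     # refine by intersection
        # Empty candidate set: no single lock protects every access.
        return len(self.candidate[var]) == 0      # True => possible race

checker = LocksetChecker()
print(checker.access("counter", {"L1"}))          # False
print(checker.access("counter", {"L1", "L2"}))    # False: still protected by L1
print(checker.access("counter", {"L2"}))          # True: no common lock -> warning
```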

Journal ArticleDOI
TL;DR: A novel algorithm for the detection of masses in mammographic computer-aided diagnosis systems using a Bayesian detection methodology providing a mathematical sound framework, flexible enough to include additional information and the use of a two-dimensional principal components analysis approach to facilitate false positive reduction.
Abstract: The purpose of this article is to present a novel algorithm for the detection of masses in mammographic computer-aided diagnosis systems. Four key points provide the novelty of our approach: (1) the use of eigenanalysis for describing variation in mass shape and size; (2) a Bayesian detection methodology providing a mathematically sound framework, flexible enough to include additional information; (3) the use of a two-dimensional principal components analysis approach to facilitate false positive reduction; and (4) the incorporation of breast density information, a parameter correlated with the performance of most mass detection algorithms and which is not considered in existing approaches. To study the performance of the system two experiments were carried out. The first is related to the ability of the system to detect masses, and thus free-response receiver operating characteristic analysis was used, showing that the method is able to give high accuracy at a high specificity (80% detection at 1.40 false positives per image). Second, the ability of the system to highlight the pixels belonging to a mass is studied using receiver operating characteristic analysis, resulting in Az = 0.89 ± 0.04. In addition, the robustness of the approach is demonstrated in an experiment where we used the Digital Database for Screening Mammography database for training and the Mammographic Image Analysis Society database for testing the algorithm.

Journal ArticleDOI
TL;DR: A simple and elegant approach is proposed to resolve the problem of model corruption in PSI-BLAST searches: combining results from the first (least-corrupted) profile with results from later (most sensitive) iterations of PSI-BLAST provides a better discriminator between true and false hits.
Abstract: Motivation: The deluge of biological information from different genomic initiatives and the rapid advancement in biotechnologies have made bioinformatics tools an integral part of modern biology. Among the widely used sequence alignment tools, BLAST and PSI-BLAST are arguably the most popular. PSI-BLAST, which uses an iterative profile position specific score matrix (PSSM)-based search strategy, is more sensitive than BLAST in detecting weak homologies, thus making it suitable for remote homolog detection. Many refinements have been made to improve PSI-BLAST, and its computational efficiency and high specificity have been much touted. Nevertheless, corruption of its profile via the incorporation of false positive sequences remains a major challenge. Results: We have developed a simple and elegant approach to resolve the problem of model corruption in PSI-BLAST searches. We hypothesized that combining results from the first (least-corrupted) profile with results from later (most sensitive) iterations of PSI-BLAST provides a better discriminator for true and false hits. Accordingly, we have derived a formula that utilizes the E-values from these two PSI-BLAST iterations to obtain a figure of merit for rank-ordering the hits. Our verification results based on a ‘gold-standard’ test set indicate that this figure of merit does indeed delineate true positives from false positives better than PSI-BLAST E-values. Perhaps what is most notable about this strategy is that it is simple and straightforward to implement. Contact: bundschuh@mps.ohio-state.edu

Journal ArticleDOI
TL;DR: In this commentary, the views on how mass spectrometry (MS) could be applied to the discovery of elusive biomarkers are set forth.
Abstract: A biomarker is defined as a biological substance (i.e., protein, metabolite, specific post-translational modification) that can be used to detect a disease, measure its progression or the effects of a treatment. Importantly, a biomarker should be readily accessible (i.e., present within body fluids); it must also provide sufficient sensitivity and specificity to accurately distinguish between true positives, false positives, and false negatives. Even more importantly, detection of the biomarker should provide clinical benefits to the patient (i.e., improved survival and/or quality of life). Due to recent technical advances in biomolecular mass spectrometry, a great deal of effort has gone into the discovery of biomarkers at an international level. In this commentary we set forth our views on how mass spectrometry (MS) could be applied to the discovery of elusive biomarkers (Figure 1).

Journal ArticleDOI
TL;DR: The modified LPLS-regression method proposed here may take background knowledge on variables into account, thereby increasing the accuracy of estimates and reducing the number of false positives, and the potential gain is better variable selection and prediction.

Journal ArticleDOI
TL;DR: A wavelet-based postprocessor can substantially reduce the false positive rate of the CTC CAD for this important polyp size range, and may be important for clinical patient management.
Abstract: Computed tomographic colonography (CTC) computer aided detection (CAD) is a new method to detect colon polyps. Colonic polyps are abnormal growths that may become cancerous. Detection and removal of colonic polyps, particularly larger ones, has been shown to reduce the incidence of colorectal cancer. While high sensitivities and low false positive rates are consistently achieved for the detection of polyps sized 1 cm or larger, lower sensitivities and higher false positive rates occur when the goal of CAD is to identify “medium”-sized polyps, 6–9 mm in diameter. Such medium-sized polyps may be important for clinical patient management. We have developed a wavelet-based postprocessor to reduce false positives for this polyp size range. We applied the wavelet-based postprocessor to CTC CAD findings from 44 patients in whom 45 polyps with sizes of 6–9 mm were found at segmentally unblinded optical colonoscopy and visible on retrospective review of the CT colonography images. Prior to the application of the wavelet-based postprocessor, the CTC CAD system detected 33 of the polyps (sensitivity 73.33%) with 12.4 false positives per patient, a sensitivity comparable to that of expert radiologists. Fourfold cross validation with 5000 bootstraps showed that the wavelet-based postprocessor could reduce the false positives by 56.61% (p < 0.001), to 5.38 per patient (95% confidence interval [4.41, 6.34]), without significant sensitivity degradation (32/45, 71.11%; 95% confidence interval [66.39%, 75.74%]; p = 0.1713). We conclude that this wavelet-based postprocessor can substantially reduce the false positive rate of our CTC CAD for this important polyp size range.

Proceedings ArticleDOI
31 Oct 2008
TL;DR: Evaluation is carried out on publicly available on-line videos, showing that the adaptive multiple-model approach outperforms static methods in classification precision and suppression of false positives.
Abstract: We propose a straightforward skin detection method for online videos. To overcome varying illumination circumstances and a variety of skin colors, we introduce a multiple model approach which can be carried out independently per model. The color models are initiated by skin detection based on face detection and adapted in real time. Our approach outperforms static approaches both in precision and runtime. If we detect a face in a scene, the number of false positives can be diminished significantly. Evaluation is carried out on publicly available on-line videos showing that adaptive multiple model outperforms static methods in classification precision and suppression of false positives.

Journal ArticleDOI
TL;DR: A new function is proposed that evaluates the goodness of an attribute by considering the significance of error types and can successfully choose an attribute that suppresses false positives from the given attribute set and the effectiveness of using it is confirmed experimentally.
Abstract: Machine learning or data mining technologies are often used in network intrusion detection systems. An intrusion detection system based on machine learning utilizes a classifier to infer the current state from the observed traffic attributes. The problem with learning-based intrusion detection is that it leads to false positives and so incurs unnecessary additional operation costs. This paper investigates a method to decrease the false positives generated by an intrusion detection system that employs a decision tree as its classifier. The paper first points out that the information-gain criterion used in previous studies to select the attributes in the tree-constructing algorithm is not effective in achieving low false positive rates. Instead of the information-gain criterion, this paper proposes a new function that evaluates the goodness of an attribute by considering the significance of error types. The proposed function can successfully choose an attribute that suppresses false positives from the given attribute set and the effectiveness of using it is confirmed experimentally. This paper also examines the more trivial leaf rewriting approach to benchmark the proposed method. The comparison shows that the proposed attribute evaluation function yields better solutions than the leaf rewriting approach.
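The proposal above replaces the information-gain criterion with an evaluation function that weights error types according to their significance. The sketch below shows one hypothetical cost-sensitive split criterion of that general flavor (the weighting scheme and counts are illustrative, not the paper's actual function): false positives on normal traffic are penalized more heavily than missed attacks when scoring a candidate attribute's split.

```python
def weighted_misclassification_cost(splits, fp_cost=5.0, fn_cost=1.0):
    """Cost-sensitive goodness of a candidate split for a two-class
    (normal vs. attack) intrusion detection tree. Each split is a list of
    (n_normal, n_attack) counts per branch; lower cost is better.
    The false-positive weight is an illustrative choice."""
    total = sum(n + a for n, a in splits)
    cost = 0.0
    for n_normal, n_attack in splits:
        # Each branch predicts its majority class; the minority counts
        # become errors of the corresponding type.
        if n_attack >= n_normal:
            cost += fp_cost * n_normal   # normal traffic flagged as attack
        else:
            cost += fn_cost * n_attack   # attack traffic missed
    return cost / total

# Comparing two candidate attributes on the same node: the one whose split
# sends fewer normal connections into attack-predicting branches wins,
# even if its information gain is lower.
split_a = [(880, 120), (20, 150)]   # few normals end up in the attack branch
split_b = [(750, 20), (150, 250)]   # many normals end up in the attack branch
print(weighted_misclassification_cost(split_a),
      weighted_misclassification_cost(split_b))
```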