
Showing papers on "False positive paradox published in 2010"


Journal ArticleDOI
TL;DR: It is found that insertions and deletions do not cause excessive false positives if the alignment is correct, but alignment errors can lead to unacceptably high false positives, and it is important to use reliable alignment methods.
Abstract: The detection of positive Darwinian selection affecting protein-coding genes remains a topic of great interest and importance. The "branch-site" test is designed to detect localized episodic bouts of positive selection that affect only a few amino acid residues on particular lineages and has been shown to have reasonable power and low false-positive rates for a wide range of selection schemes. Previous simulations examining the performance of the test, however, were conducted under idealized conditions without insertions, deletions, or alignment errors. As the test is sometimes used to analyze divergent sequences, the impact of indels and alignment errors is a major concern. Here, we used a recently developed indel-simulation program to examine the false-positive rate and power of the branch-site test. We find that insertions and deletions do not cause excessive false positives if the alignment is correct, but alignment errors can lead to unacceptably high false positives. Of the alignment methods evaluated, PRANK consistently outperformed MUSCLE, MAFFT, and ClustalW, mostly because the latter programs tend to place nonhomologous codons (or amino acids) into the same column, producing shorter and less accurate alignments and giving the false impression that many amino acid substitutions have occurred at those sites. Our examination of two previous studies suggests that alignment errors may impact the analysis of mammalian and vertebrate genes by the branch-site test, and it is important to use reliable alignment methods.

252 citations


Journal ArticleDOI
TL;DR: In this article, a procedure called BLENDER was proposed to model the photometry in terms of a "blend" rather than a planet orbiting a star, where a blend may consist of a background or foreground eclipsing binary (or star-planet pair) whose eclipses are attenuated by the light of the candidate and possibly other stars within the photometric aperture.
Abstract: Light curves from the Kepler Mission contain valuable information on the nature of the phenomena producing the transit-like signals. To assist in exploring the possibility that they are due to an astrophysical false positive, we describe a procedure (BLENDER) to model the photometry in terms of a "blend" rather than a planet orbiting a star. A blend may consist of a background or foreground eclipsing binary (or star-planet pair) whose eclipses are attenuated by the light of the candidate and possibly other stars within the photometric aperture. We apply BLENDER to the case of Kepler-9, a target harboring two previously confirmed Saturn-size planets (Kepler-9b and Kepler-9c) showing transit timing variations, and an additional shallower signal with a 1.59-day period suggesting the presence of a super-Earth-size planet. Using BLENDER together with constraints from other follow-up observations we are able to rule out all blends for the two deeper signals, and provide independent validation of their planetary nature. For the shallower signal we rule out a large fraction of the false positives that might mimic the transits. The false alarm rate for remaining blends depends in part (and inversely) on the unknown frequency of small-size planets. Based on several realistic estimates of this frequency we conclude with very high confidence that this small signal is due to a super-Earth-size planet (Kepler-9d) in a multiple system, rather than a false positive. The radius is determined to be 1.64 (+0.19/-0.14) R_Earth, and current spectroscopic observations are as yet insufficient to establish its mass.

250 citations


Proceedings ArticleDOI
16 May 2010
TL;DR: This paper presents an automatic technique for extracting optimally discriminative specifications, which uniquely identify a class of programs and can be used by a behavior-based malware detector.
Abstract: Fueled by an emerging underground economy, malware authors are exploiting vulnerabilities at an alarming rate. To make matters worse, obfuscation tools are commonly available, and much of the malware is open source, leading to a huge number of variants. Behavior-based detection techniques are a promising solution to this growing problem. However, these detectors require precise specifications of malicious behavior that do not result in an excessive number of false alarms. In this paper, we present an automatic technique for extracting optimally discriminative specifications, which uniquely identify a class of programs. Such a discriminative specification can be used by a behavior-based malware detector. Our technique, based on graph mining and concept analysis, scales to large classes of programs due to probabilistic sampling of the specification space. Our implementation, called Holmes, can synthesize discriminative specifications that accurately distinguish between programs, sustaining an 86% detection rate on new, unknown malware, with 0 false positives, in contrast with 55% for commercial signature-based antivirus (AV) and 62-64% for behavior-based AV (commercial or research).

226 citations


Journal ArticleDOI
TL;DR: Results indicated that the addition of WIF progress monitoring and dynamic assessment, but not running records or oral reading fluency, significantly decreased false positives.
Abstract: The purposes of this study were (a) to identify measures that, when added to a base 1st-grade screening battery, help eliminate false positives and (b) to investigate gains in efficiency associated with a 2-stage gated screening procedure. We tested 355 children in the fall of 1st grade, and assessed for reading difficulty at the end of 2nd grade. The base screening model included measures of phonemic awareness, rapid naming skill, oral vocabulary, and initial word identification fluency (WIF). Short-term WIF progress monitoring (intercept and slope), dynamic assessment, running records, and oral reading fluency were each considered as an additional screening measure in contrasting models. Results indicated that the addition of WIF progress monitoring and dynamic assessment, but not running records or oral reading fluency, significantly decreased false positives. The 2-stage gated screening process using phonemic decoding efficiency in the first stage significantly reduced the number of children requiring the full screening battery.

186 citations


Journal ArticleDOI
TL;DR: Under a wide range of scenarios, BAYESCAN appears to be more efficient than the other methods, usually detecting a high percentage of true selective loci while yielding less than 1% outliers (false positives) under a fully neutral model.

Abstract: We carried out a simulation study to compare the efficiency of three alternative programs (DFDIST, DETSELD and BAYESCAN) to detect loci under directional selection from genome-wide scans using dominant markers. We also evaluated the efficiency of correcting for multiple testing in those methods that use a classical probability approach. Under a wide range of scenarios, we conclude that BAYESCAN appears to be more efficient than the other methods, usually detecting a high percentage of true selective loci as well as less than 1% of outliers (false positives) under a fully neutral model. In addition, the percentage of outliers detected by this software is always correlated with the true percentage of selective loci in the genome. Our results show, nevertheless, that false positives are common even with a combination of methods and multitest correction, suggesting that conclusions obtained from this approach should be taken with extreme caution.

184 citations


Journal ArticleDOI
TL;DR: A new learning algorithm for adaptive network intrusion detection using a naive Bayesian classifier and a decision tree is presented; it performs balanced detection and keeps false positives at an acceptable level for different types of network attacks, and it eliminates redundant attributes as well as contradictory examples from training data.

Abstract: In this paper, a new learning algorithm for adaptive network intrusion detection using a naive Bayesian classifier and a decision tree is presented, which performs balanced detection and keeps false positives at an acceptable level for different types of network attacks, and eliminates redundant attributes as well as contradictory examples from training data that make the detection model complex. The proposed algorithm also addresses some difficulties of data mining such as handling continuous attributes, dealing with missing attribute values, and reducing noise in training data. Due to the large volumes of security audit data as well as the complex and dynamic properties of intrusion behaviours, several data mining-based intrusion detection techniques have been applied to network-based traffic data and host-based data in recent decades. However, various issues in current intrusion detection systems (IDS) still need to be examined. We tested the performance of our proposed algorithm against existing learning algorithms on the KDD99 benchmark intrusion detection dataset. The experimental results show that the proposed algorithm achieves high detection rates (DR) and significantly reduces false positives (FP) for different types of network intrusions using limited computational resources.

167 citations


Journal ArticleDOI
TL;DR: Association or linkage disequilibrium mapping has become a very popular method for dissecting the genetic basis of complex traits in plants, as discussed in this paper; it offers relatively detailed mapping resolution and is far less time consuming, since no mapping populations need to be generated.

Abstract: Association or linkage disequilibrium mapping has become a very popular method for dissecting the genetic basis of complex traits in plants. The benefits of association mapping, compared with traditional quantitative trait locus mapping, include a relatively detailed mapping resolution and the fact that it is far less time consuming, since no mapping populations need to be generated. The surge of interest in association mapping has been fueled by recent developments in genomics that allow for rapid identification and scoring of genetic markers, which has traditionally limited mapping experiments. With the decreasing cost of genotyping, future emphasis will likely focus on phenotyping, which can be both costly and time consuming but which is crucial for obtaining reliable results in association mapping studies. In addition, association mapping studies are prone to the identification of false positives, especially if the experimental design is not rigorously controlled. For example, population structure has long been known to induce many false positives, and accounting for population structure has become one of the main issues when implementing association mapping in plants. Also, with increasing numbers of genetic markers used, the problem becomes separating true from false positives, and this highlights the need for independent validation of identified associations. With these caveats in mind, association mapping nevertheless shows great promise for helping us understand the genetic basis of complex traits of both economic and ecological importance.
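The population structure problem mentioned here can be illustrated with a toy simulation (not from the review; all numbers are arbitrary): a marker whose allele frequency differs between two subpopulations will appear associated with any trait whose mean also differs between those subpopulations, even if the marker has no effect on the trait.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 500                                   # individuals per subpopulation

# Subpopulation 0: allele frequency 0.8, higher trait mean
# Subpopulation 1: allele frequency 0.2, lower trait mean
geno = np.concatenate([rng.binomial(2, 0.8, n), rng.binomial(2, 0.2, n)])
trait = np.concatenate([rng.normal(10, 1, n), rng.normal(8, 1, n)])
pop = np.concatenate([np.zeros(n), np.ones(n)])

# Naive association test: the marker looks strongly associated with the trait
r, p_naive = stats.pearsonr(geno, trait)
print(f"naive test:    r = {r:.2f}, p = {p_naive:.1e}   (spurious association)")

# Crude structure correction: test within each subpopulation separately,
# where the marker has no real effect and p-values behave like noise
for k in (0, 1):
    r_k, p_k = stats.pearsonr(geno[pop == k], trait[pop == k])
    print(f"within pop {k}: r = {r_k:.2f}, p = {p_k:.2f}")
```

In practice, mixed models or principal components are used instead of stratified testing, but the mechanism producing the false positive is the same.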

156 citations


Journal ArticleDOI
03 Sep 2010-PLOS ONE
TL;DR: A comparison of eight tests representative of variance modeling strategies in gene expression data shows that two tests offer a significant improvement over the t-test, in particular with small sample sizes, and that limma presents several practical advantages, so its application to analyze gene expression data is advocated.

Abstract: High-throughput post-genomic studies are now routinely and promisingly investigated in biological and biomedical research. The main statistical approach to select genes differentially expressed between two groups is to apply a t-test, which has been the subject of criticism in the literature. Numerous alternatives have been developed based on different and innovative variance modeling strategies. However, a critical issue is that selecting a different test usually leads to a different gene list. In this context, and given the current tendency to apply the t-test, identifying the most efficient approach in practice remains crucial. To provide elements of an answer, we conducted a comparison of eight tests representative of variance modeling strategies in gene expression data: Welch's t-test, ANOVA [1], Wilcoxon's test, SAM [2], RVM [3], limma [4], VarMixt [5] and SMVar [6]. Our comparison process relies on four steps (gene list analysis, simulations, spike-in data and re-sampling) to formulate comprehensive and robust conclusions about test performance, in terms of statistical power, false-positive rate, execution time and ease of use. Our results raise concerns about the ability of some methods to control the expected number of false positives at a desirable level. Moreover, two tests (limma and VarMixt) show a significant improvement over the t-test, in particular for small sample sizes. In addition, limma presents several practical advantages, so we advocate its application to analyze gene expression data.
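One of those four steps, estimating the false positive rate by simulation under the null, is easy to illustrate outside the authors' pipeline. The sketch below uses only Welch's t-test and the Wilcoxon (Mann-Whitney) test as stand-ins, with arbitrary simulation settings, so it is not a reproduction of the paper's comparison:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_genes, n_per_group, alpha = 10_000, 5, 0.05   # small samples, as in microarrays

group_a = rng.normal(size=(n_genes, n_per_group))
group_b = rng.normal(size=(n_genes, n_per_group))   # no gene is truly differential

welch_p = stats.ttest_ind(group_a, group_b, axis=1, equal_var=False).pvalue
wilcox_p = np.array([stats.mannwhitneyu(a, b, alternative="two-sided").pvalue
                     for a, b in zip(group_a, group_b)])

print("Welch false-positive rate:   ", np.mean(welch_p < alpha))   # ~0.05 expected
print("Wilcoxon false-positive rate:", np.mean(wilcox_p < alpha))  # conservative at n=5
```

A method that reports noticeably more than 5% of these null genes as significant is failing to control false positives at the nominal level.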

155 citations


Journal ArticleDOI
TL;DR: This study presents novel Y2H vectors that significantly decrease the number of false negatives and help to mitigate the false positive problem and suggests that future interaction screens should use such vector combinations on a routine basis.
Abstract: Yeast two-hybrid (Y2H) screens have been among the most powerful methods to detect and analyze protein-protein interactions. However, they suffer from a significant degree of false negatives, i.e. true interactions that are not detected, and to a certain degree from false positives, i.e. interactions that appear to take place only in the context of the Y2H assay. While the fraction of false positives remains difficult to estimate, the fraction of false negatives in typical Y2H screens is on the order of 70-90%. Here we present novel Y2H vectors that significantly decrease the number of false negatives and help to mitigate the false positive problem. We have constructed two new vectors (pGBKCg and pGADCg) that allow us to make C-terminal fusion proteins of both the DNA-binding and activation domains. Both vectors can be combined with existing vectors for N-terminal fusions and thus allow four different bait-prey combinations: NN, CC, NC, and CN. We have tested all ~4,900 pairwise combinations of the 70 Varicella-Zoster-Virus (VZV) proteins for interactions, using all possible combinations. About 20,000 individual Y2H tests resulted in 182 NN, 89 NC, 149 CN, and 144 CC interactions. Overlap between screens ranged from 17% (NC-CN) to 43% (CN-CC). Performing four screens (i.e. permutations) instead of one resulted in about twice as many interactions and thus far fewer false negatives. In addition, interactions that are found in multiple combinations confirm each other and thus provide a quality score. This study is the first systematic analysis of such N- and C-terminal Y2H vectors. Permutations of C- and N-terminal Y2H vectors dramatically increase the coverage of interactome studies and thus significantly reduce the number of false negatives. We suggest that future interaction screens should use such vector combinations on a routine basis, not least because they provide a built-in quality score for Y2H interactions that can provide a measure of reproducibility without additional assays.

138 citations


Journal ArticleDOI
TL;DR: In this paper, the effect of low statistical power on the likelihood that a statistically significant finding is actually a false positive is examined; the authors also note that, where there is a high probability that the null hypothesis is true, statistically significant findings are even more likely to be false positives.

Abstract: It is well recognised that low statistical power increases the probability of type II error, that is, it reduces the probability of detecting a difference between groups where a difference exists. Paradoxically, low statistical power also increases the likelihood that a statistically significant finding is actually a false positive (for a given p-value). Hence, ethical concerns regarding studies with low statistical power should include the increased risk of type I error in such studies reporting statistically significant effects. This paper illustrates the effect of low statistical power by comparing hypothesis testing with diagnostic test evaluation, using concepts familiar to clinicians such as positive and negative predictive values. We also note that, where there is a high probability that the null hypothesis is true, statistically significant findings are even more likely to be false positives.
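The paradox can be made concrete by treating the hypothesis test as a diagnostic test whose sensitivity is the study's power and whose false positive rate is α, then applying Bayes' rule, exactly as the paper's analogy with predictive values suggests. A minimal sketch with illustrative numbers (not figures from the paper):

```python
# Probability that a statistically significant finding is a false positive,
# treating the hypothesis test as a diagnostic test:
#   "sensitivity"     = statistical power  (P(significant | real effect))
#   "1 - specificity" = alpha              (P(significant | no effect))
# The priors and power values below are illustrative, not from the paper.

def prob_false_positive(power: float, alpha: float, prior_real: float) -> float:
    """P(no real effect | significant result) via Bayes' rule."""
    p_sig_and_real = power * prior_real
    p_sig_and_null = alpha * (1.0 - prior_real)
    return p_sig_and_null / (p_sig_and_real + p_sig_and_null)

alpha = 0.05
for prior_real in (0.5, 0.1):          # how plausible the hypothesis was a priori
    for power in (0.8, 0.2):           # well-powered vs. under-powered study
        fp = prob_false_positive(power, alpha, prior_real)
        print(f"prior={prior_real:.1f} power={power:.1f} "
              f"-> P(false positive | significant) = {fp:.2f}")
```

With power 0.2 and a prior probability of 0.1 that the effect is real, about two thirds of "significant" findings are false positives even at α = 0.05, whereas a well-powered test of a plausible hypothesis yields only about 6%.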

Journal ArticleDOI
TL;DR: This work puts forward a new strategy designed for situations when there is no a priori information about 'when' and 'where' these differences appear in the spatio-temporal domain, which requires simultaneously testing numerous hypotheses and thus increases the risk of false positives.

Abstract: Current analysis of event-related potentials (ERP) data is usually based on the a priori selection of channels and time windows of interest for studying the differences between experimental conditions in the spatio-temporal domain. In this work we put forward a new strategy designed for situations when there is no a priori information about 'when' and 'where' these differences appear in the spatio-temporal domain, which requires simultaneously testing numerous hypotheses and increases the risk of false positives. This issue is known as the problem of multiple comparisons and has been managed with procedures such as the permutation test and methods that control the false discovery rate (FDR). Although the former has been applied previously, to our knowledge FDR methods have not been introduced into ERP data analysis. Here we compare the performance (on simulated and real data) of the permutation test and two FDR methods (Benjamini and Hochberg (BH) and local-fdr, by Efron). All these methods have been shown to be valid for dealing with the problem of multiple comparisons in ERP analysis, avoiding the ad hoc selection of channels and/or time windows. FDR methods are a good alternative to the common and computationally more expensive permutation test. The BH method for independent tests gave the best overall performance regarding the balance between type I and type II errors. The local-fdr method is preferable for high dimensional (multichannel) problems where most of the tests conform to the empirical null hypothesis. Differences among the methods according to assumptions, null distributions and dimensionality of the problem are also discussed.
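For reference, the Benjamini-Hochberg (BH) step-up procedure mentioned here takes only a few lines to state. The sketch below applies it to a vector of p-values (e.g. one per channel × time-point comparison); it is an illustrative implementation, not the authors' code, and the simulated p-values are arbitrary:

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of p-values declared significant by the
    Benjamini-Hochberg step-up procedure at FDR level q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                       # ascending p-values
    thresholds = q * (np.arange(1, m + 1) / m)  # BH critical values i*q/m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()          # largest i with p_(i) <= i*q/m
        reject[order[:k + 1]] = True            # reject the k+1 smallest p-values
    return reject

# Example: p-values from many channel/time-point comparisons (simulated here)
rng = np.random.default_rng(0)
pvals = np.concatenate([rng.uniform(size=950),           # true nulls
                        rng.uniform(0, 0.001, size=50)])  # genuine effects
print(benjamini_hochberg(pvals, q=0.05).sum(), "tests declared significant")
```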

Journal ArticleDOI
TL;DR: A post-processing filter is proposed to reduce false positives in network-based intrusion detection systems; it can significantly reduce the number and percentage of false positives produced by Snort (Roesch, 1999).

Journal ArticleDOI
TL;DR: Recent years have seen the development of straightforward decision analytic techniques that evaluate prediction models in terms of their consequences, and hold the promise of determining whether clinical implementation of prediction models would do more good than harm.

Journal ArticleDOI
TL;DR: This paper will focus on the structural classes and known mechanisms of nonleadlike false positives, together with experimental and computational methods for identifying such compounds.
Abstract: High-throughput screening (HTS) is one of the most powerful approaches available for identifying new lead compounds for the growing catalogue of validated drug targets. However, just as virtual and experimental HTS have accelerated lead identification and changed drug discovery, they have also introduced a large number of peculiar molecules. Some of these have turned out to be interesting for further optimization, others to be dead ends when attempts are made to optimize their activity, typically after a great deal of time and resources have been devoted. Such false positive hits are still one of the key problems in the field of HTS and in the early stages of drug discovery in general. Many studies have been devoted to understanding the origins of false-positives, and the findings have been incorporated in filters and methods that can predict and eliminate problematic molecules from further consideration. This paper will focus on the structural classes and known mechanisms of nonleadlike false positives, together with experimental and computational methods for identifying such compounds.

Journal ArticleDOI
TL;DR: Psychiatry has so far failed to systematically adjust its diagnostic practices to confront the problem of false positives, and the degree of concern, systematicity and thoroughness with which recent revisions of the DSM have attended to the challenge of avoiding false positive diagnoses is considered.
Abstract: Background: In psychiatry's transformation from primarily an asylum-based profession to a community-oriented profession, false positive diagnoses that mistakenly classify normal intense reactions to stress as mental disorders became a major challenge to the validity of psychiatric diagnosis. The shift to symptom-based operationalized diagnostic criteria in DSM-III further exacerbated this difficulty because of the contextually based nature of the distinction between normal distress and mental disorder, which often display similar symptoms. The problem has particular urgency because the DSM's symptom-based criteria are often applied in studies and screening instruments outside of the clinical context and by non-mental-health professionals. Aims: To consider, through selected examples, the degree of concern, systematicity and thoroughness – and the degree of success – with which recent revisions of the DSM have attended to the challenge of avoiding false positive diagnoses. Method: Conceptual analysis of sele...

Journal ArticleDOI
TL;DR: The problem of "false research findings" in medical research has attracted much attention in the last few years (Ioannidis, 2005), but the same types of problems occur in biostatistics and bioinformatics research.

Abstract: The problem of "false research findings" in medical research has attracted much attention in the last few years (Ioannidis, 2005). One of the main problems, termed "fishing for significance" in the present letter, is that researchers often (consciously or subconsciously) report results that are in fact the product of an intensive optimization, i.e. of multiple comparisons. Such results are typically unlikely to be reproduced in an independent study and have a high probability of being false (Ioannidis, 2005). The "fishing for significance" problem is amplified by the so-called "publication bias": positive results have a much higher chance of getting published than negative results, as already acknowledged fifty years ago (Sterling, 1959). In short, many false positive results are produced through multiple comparisons, and false positives have a higher chance of getting published than true negatives. Moreover, the difficulty of publishing negative results obviously encourages authors to find something positive in their study by performing numerous analyses until one of them yields positive results by chance, i.e. to fish for significance. Although this issue is far less acknowledged and publicly admitted than in the medical context, the same types of problems occur in biostatistics and bioinformatics research.
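The core of the "fishing for significance" mechanism is easy to reproduce numerically: running m independent analyses on pure noise at level α yields at least one "significant" result with probability 1 − (1 − α)^m. A small illustrative simulation (not taken from the letter; the numbers are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, m, n_studies = 0.05, 20, 2000   # 20 analyses per "study", all on pure noise

false_hits = 0
for _ in range(n_studies):
    # Two groups with no real difference, compared 20 different ways
    # (20 independent noise variables stand in for trying many outcomes,
    #  subgroups, or model variants until something is significant).
    pvals = [stats.ttest_ind(rng.normal(size=30), rng.normal(size=30)).pvalue
             for _ in range(m)]
    false_hits += min(pvals) < alpha

print("Analytical P(at least one 'significant' result):", 1 - (1 - alpha) ** m)
print("Simulated fraction of studies with a publishable 'finding':",
      false_hits / n_studies)
```

With 20 looks at pure noise, roughly 64% of such "studies" produce at least one nominally significant result.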

Journal ArticleDOI
TL;DR: The proposed approach, applied to the security domain of anomaly-based network intrusion detection, correctly classifies different types of attacks in the KDD99 benchmark dataset with high classification rates and short response times, and reduces false positives using limited computational resources.

Abstract: Recently, research on intrusion detection in computer systems has received much attention from the computational intelligence community. Many intelligent learning algorithms have been applied to huge volumes of complex and dynamic data to construct efficient intrusion detection systems (IDSs). Despite the many advances achieved in existing IDSs, some difficulties remain, such as correctly classifying large intrusion detection datasets, unbalanced detection accuracy in high-speed network traffic, and reducing false positives. This paper presents a new approach to alert classification that reduces false positives in intrusion detection using an improved self-adaptive Bayesian algorithm (ISABA). The proposed approach is applied to the security domain of anomaly-based network intrusion detection; it correctly classifies different types of attacks in the KDD99 benchmark dataset with high classification rates and short response times, and reduces false positives using limited computational resources.

Proceedings ArticleDOI
19 Feb 2010
TL;DR: A novel culling algorithm uses deforming non-penetration filters to improve the performance of continuous collision detection (CCD) algorithms; it can reduce the number of false positives significantly and improve the overall performance of CCD algorithms by 1.5–8.2x.

Abstract: We present a novel culling algorithm that uses deforming non-penetration filters to improve the performance of continuous collision detection (CCD) algorithms. The underlying idea is to use a simple and effective filter that reduces both the number of false positives and the number of elementary tests between the primitives. This filter is derived from the coplanarity condition and can be easily combined with other methods used to accelerate CCD. We have implemented the algorithm and tested its performance on many non-rigid simulations. In practice, we can reduce the number of false positives significantly and improve the overall performance of CCD algorithms by 1.5–8.2x.
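The "coplanarity condition" refers to the fact that, with linearly interpolated vertex positions over a time step, the four points of a vertex-triangle (or edge-edge) pair can only touch at a time t in [0, 1] where their scalar triple product, a cubic in t, vanishes. A conservative cull can therefore discard pairs whose cubic provably has no root in [0, 1], for example by checking the signs of its Bernstein coefficients. The sketch below illustrates this general idea; it is a simplified reconstruction, not the authors' filter, and the Bernstein sign check is just one possible choice:

```python
import numpy as np

def coplanarity_cubic(p, q):
    """Coefficients (c0, c1, c2, c3) of f(t) = det[v1(t), v2(t), v3(t)],
    where v_i(t) = p[i] + t * q[i] are the positions of three points
    relative to a fourth, linearly interpolated over the time step."""
    p1, p2, p3 = p
    q1, q2, q3 = q
    c0 = np.dot(p1, np.cross(p2, p3))
    c1 = (np.dot(q1, np.cross(p2, p3))
          + np.dot(p1, np.cross(q2, p3)) + np.dot(p1, np.cross(p2, q3)))
    c2 = (np.dot(p1, np.cross(q2, q3))
          + np.dot(q1, np.cross(q2, p3)) + np.dot(q1, np.cross(p2, q3)))
    c3 = np.dot(q1, np.cross(q2, q3))
    return c0, c1, c2, c3

def may_become_coplanar(p, q):
    """Conservative filter: returns False only if the coplanarity cubic
    provably has no root in [0, 1], i.e. all Bernstein coefficients of the
    cubic share a strict sign."""
    c0, c1, c2, c3 = coplanarity_cubic(p, q)
    b = np.array([c0,
                  c0 + c1 / 3.0,
                  c0 + 2.0 * c1 / 3.0 + c2 / 3.0,
                  c0 + c1 + c2 + c3])          # Bernstein form on [0, 1]
    return not (np.all(b > 0.0) or np.all(b < 0.0))
```

Pairs for which may_become_coplanar returns False can be culled before running the exact, more expensive elementary tests.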

Journal ArticleDOI
TL;DR: Results indicate that the proposed method, which consists of a rule-based method, a level set method, and a support vector machine, would be useful for assisting neuroradiologists in assessing MS in clinical practice.

Journal ArticleDOI
TL;DR: Current EIA(A/B) tests for CDI are of inadequate sensitivity and should be replaced; however, this may result in apparent changes in CDI rates that would need to be explained in national surveillance statistics.

Proceedings ArticleDOI
01 Sep 2010
TL;DR: In this article, the authors extend Atia and Saligrama's results to a model containing both false positives and false negatives, and develop simple information theoretic bounds on the number of tests required.
Abstract: An information theoretic perspective on group testing problems has recently been proposed by Atia and Saligrama in order to characterise the optimal number of tests. Their results hold in the noiseless case, in the case where only false positives occur, and in the case where only false negatives occur. We extend their results to a model containing both false positives and false negatives, developing simple information theoretic bounds on the number of tests required. Based on these bounds, we obtain an improved order of convergence in the case of false negatives only. Since these results are based on (computationally infeasible) joint typicality decoding, we propose a belief propagation algorithm for the detection of defective items and compare its actual performance to the theoretical bounds.
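As a baseline for the scale involved, the classical counting bound for noiseless group testing states that identifying K defectives among N items requires at least log2 C(N, K) ≈ K log2(N/K) tests, since each binary test outcome provides at most one bit; the noisy-model bounds developed in the paper add further terms for false positives and false negatives. A quick calculation of that noiseless baseline (illustrative parameters only):

```python
from math import comb, log2

def counting_bound(n_items: int, k_defective: int) -> float:
    """Information-theoretic minimum number of noiseless tests:
    each binary test outcome yields at most 1 bit, and the scheme must
    distinguish C(N, K) possible defective sets."""
    return log2(comb(n_items, k_defective))

for n, k in [(1000, 10), (10_000, 10), (10_000, 100)]:
    print(f"N={n:>6}, K={k:>3}: at least {counting_bound(n, k):7.1f} tests "
          f"(vs {n} individual tests)")
```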

Journal ArticleDOI
TL;DR: The direct costs for breast-related procedures following false positive screening mammograms may contribute substantially to US healthcare spending.
Abstract: Objective:We sought to estimate the direct cost, from the perspective of the health insurer or purchaser, of breast-care services in the year following a false positive screening mammogram compared with a true negative examination.Design:We identified 21,125 women aged 40 to 80 years enrolled in an

Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors proposed a novel Bloom filter scheme, which increases the ratio of bits set to a value larger than one without decreasing the ratios of bits sets to zero, which can reduce the number of exposed false negatives as well as decrease the likelihood of false positives.
Abstract: Bloom filter is effective, space-efficient data structure for concisely representing a data set and supporting approximate membership queries. Traditionally, researchers often believe that it is possible that a Bloom filter returns a false positive, but it will never return a false negative under well-behaved operations. By investigating the mainstream variants, however, we observe that a Bloom filter does return false negatives in many scenarios. In this work, we show that the undetectable incorrect deletion of false positive items and detectable incorrect deletion of multiaddress items are two general causes of false negative in a Bloom filter. We then measure the potential and exposed false negatives theoretically and practically. Inspired by the fact that the potential false negatives are usually not fully exposed, we propose a novel Bloom filter scheme, which increases the ratio of bits set to a value larger than one without decreasing the ratio of bits set to zero. Mathematical analysis and comprehensive experiments show that this design can reduce the number of exposed false negatives as well as decrease the likelihood of false positives. To the best of our knowledge, this is the first work dealing with both the false positive and false negative problems of Bloom filter systematically when supporting standard usages of item insertion, query, and deletion operations.
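For readers unfamiliar with the structure under discussion, a textbook Bloom filter (insertion and query only) fits in a few lines; this sketch is the standard design, not the scheme proposed in the paper, and the sizes and hash construction are arbitrary illustrative choices:

```python
import hashlib

class BloomFilter:
    """Textbook Bloom filter: k hash positions over an m-bit array.
    Supports insert and approximate membership query (no deletion, which
    is where the false-negative issues discussed above arise)."""

    def __init__(self, m_bits: int = 8192, k_hashes: int = 4):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray(m_bits)          # one byte per bit, for clarity

    def _positions(self, item: str):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # May return True for items never added (false positive), but never
        # False for an item that was added, as long as nothing is deleted.
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
for addr in ("10.0.0.1", "10.0.0.2", "10.0.0.3"):
    bf.add(addr)
print("10.0.0.2" in bf)     # True
print("192.168.1.9" in bf)  # almost always False; True would be a false positive
```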

Proceedings ArticleDOI
23 May 2010
TL;DR: A detection approach based on time-series decomposition is proposed, which divides the original time series into trend and random components and applies a double autocorrelation technique and an improved cumulative sum technique to the trend and random components, respectively, to detect anomalies in both.
Abstract: Recently, many new types of distributed denial of service (DDoS) attacks have emerged, posing a great challenge to intrusion detection systems. In this paper, we introduce a new type of DDoS attacks called stealthy DDoS attacks, which can be launched by sophisticated attackers. Such attacks are different from traditional DDoS attacks in that they cannot be detected by previous detection methods effectively. In response to this type of DDoS attacks, we propose a detection approach based on time-series decomposition, which divides the original time series into trend and random components. It then applies a double autocorrelation technique and an improved cumulative sum technique to the trend and random components, respectively, to detect anomalies in both components. By separately examining each component and synthetically evaluating the overall results, the proposed approach can greatly reduce not only false positives and negatives but also detection latency. In addition, to make our method more generally applicable, we apply an adaptive sliding-window to our real-time algorithm. We evaluate the performance of the proposed approach using real Internet traces, demonstrating its effectiveness.
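For reference, the plain one-sided cumulative-sum (CUSUM) detector that the "improved cumulative sum technique" builds on can be sketched in a few lines. This is the textbook CUSUM with illustrative reference value k and threshold h, not the authors' improved variant or their decomposition pipeline:

```python
import numpy as np

def cusum_alarms(x, mean, std, k=0.5, h=5.0):
    """One-sided CUSUM on a 1-D series x: returns indices where the
    accumulated deviation above (mean + k*std) exceeds h*std, i.e.
    candidate anomalies such as a sustained traffic surge."""
    s, alarms = 0.0, []
    for i, value in enumerate(x):
        s = max(0.0, s + (value - mean - k * std))
        if s > h * std:
            alarms.append(i)
            s = 0.0                      # restart accumulation after an alarm
    return alarms

rng = np.random.default_rng(2)
traffic = rng.normal(100, 5, size=300)   # baseline packet rate
traffic[200:] += 12                      # small sustained shift (stealthy surge)
alarms = cusum_alarms(traffic, mean=100, std=5)
print("first alarm at t =", alarms[0] if alarms else None)   # shortly after t = 200
```

Because CUSUM accumulates small deviations over time, it can flag low-rate, sustained shifts that a per-sample threshold would miss, at the cost of tuning k and h to balance false positives against detection latency.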

Proceedings ArticleDOI
02 May 2010
TL;DR: It is shown that this approach can suggest which of the warnings reported by static analysis are likely false positives and can provide input vectors that expose actual vulnerabilities, to be used as test cases in security testing.

Abstract: Cross-site scripting is considered the major threat to the security of web applications. Removing vulnerabilities from existing web applications is a manual, expensive task that would benefit from some level of automatic assistance. Static analysis represents a valuable support for security review by suggesting candidate vulnerable points to be checked manually. However, the potential benefits are quite limited when too many false positives (safe portions of code classified as vulnerable) are reported. In this paper, we present a preliminary investigation on the integration of static analysis with genetic algorithms. We show that this approach can suggest which of the warnings reported by static analysis are likely false positives and can provide input vectors that expose actual vulnerabilities, to be used as test cases in security testing.

Journal ArticleDOI
Andreas Diekmann, Ben Jann
TL;DR: It is very doubtful whether the Benford distribution is an appropriate tool to discriminate between manipulated and non-manipulated estimates.
Abstract: Is Benford's law a good instrument to detect fraud in reports of statistical and scientific data? For a valid test, the probability of "false positives" and "false negatives" has to be low. However, it is very doubtful whether the Benford distribution is an appropriate tool to discriminate between manipulated and non-manipulated estimates. Further research should focus more on the validity of the test, and test results should be interpreted more carefully.
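For context, the test under discussion compares observed leading-digit frequencies with the Benford distribution P(d) = log10(1 + 1/d), typically via a chi-square statistic. The sketch below is a generic version with synthetic data; it is not the authors' analysis, and it deliberately shows how clean but non-Benford data get flagged (a "false positive" in the fraud-detection sense):

```python
import math
import random
from collections import Counter
from scipy import stats

def benford_test(values):
    """Chi-square goodness-of-fit of observed leading digits against
    Benford's law P(d) = log10(1 + 1/d). A small p-value flags the data
    as suspicious -- possibly a false positive, as the abstract cautions."""
    digits = [int(10 ** (math.log10(abs(v)) % 1)) for v in values if v != 0]
    counts = Counter(digits)
    observed = [counts.get(d, 0) for d in range(1, 10)]
    expected = [len(digits) * math.log10(1 + 1 / d) for d in range(1, 10)]
    return stats.chisquare(observed, expected)

random.seed(0)
benford_like = [10 ** random.uniform(0, 4) for _ in range(500)]   # spans magnitudes
uniform_like = [random.uniform(100, 999) for _ in range(500)]     # honest, non-Benford

print(benford_test(benford_like))   # large p: consistent with Benford
print(benford_test(uniform_like))   # tiny p: flagged, though nothing was manipulated
```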

Patent
08 Jan 2010
TL;DR: An anti-malware system is presented that reduces the likelihood of detecting a false positive by comparing files on hosts suspected of containing malware to control versions of those files.
Abstract: An anti-malware system that reduces the likelihood of detecting a false positive. The system is applied in an enterprise network in which a server receives reports of suspected malware from multiple hosts. Files on hosts suspected of containing malware are compared to control versions of those files. A match between a suspected file and a control version is used as an indication that the malware report is a false positive. Such an indication may be used in conjunction with other information, such as whether other hosts similarly report suspect files that match control versions or whether the malware report is generated by a recently changed component of the anti-malware system.
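A minimal illustration of the mechanism (hypothetical file paths and a plain hash comparison, not the patented system): hash the suspect file and compare it against known-good control versions; a match is taken as evidence that the malware report is a false positive.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large binaries need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def looks_like_false_positive(suspect: Path, control_versions: list[Path]) -> bool:
    """True if the suspect file is byte-identical to a known-good control copy,
    which is treated as evidence that the malware report is a false positive."""
    suspect_hash = sha256_of(suspect)
    return any(sha256_of(ctrl) == suspect_hash for ctrl in control_versions)

# Hypothetical example paths (placeholders, not from the patent):
# looks_like_false_positive(Path("/hosts/h42/bin/tool.exe"),
#                           [Path("/golden_image/bin/tool.exe")])
```

In the patented setting this signal is combined with others, such as whether multiple hosts report the same matching file or whether the report follows a recent update of the anti-malware component itself.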