
Showing papers on "False positive paradox published in 2017"



Posted Content
TL;DR: This work presents a framework to automatically detect and localize tumors as small as 100 x 100 pixels in gigapixel microscopy images sized 100,000 x 100,000 pixels and achieves image-level AUC scores above 97% on both the Camelyon16 test set and an independent set of 110 slides.
Abstract: Each year, the treatment decisions for more than 230,000 breast cancer patients in the U.S. hinge on whether the cancer has metastasized away from the breast. Metastasis detection is currently performed by pathologists reviewing large expanses of biological tissues. This process is labor intensive and error-prone. We present a framework to automatically detect and localize tumors as small as 100 x 100 pixels in gigapixel microscopy images sized 100,000 x 100,000 pixels. Our method leverages a convolutional neural network (CNN) architecture and obtains state-of-the-art results on the Camelyon16 dataset in the challenging lesion-level tumor detection task. At 8 false positives per image, we detect 92.4% of the tumors, relative to 82.7% by the previous best automated approach. For comparison, a human pathologist attempting exhaustive search achieved 73.2% sensitivity. We achieve image-level AUC scores above 97% on both the Camelyon16 test set and an independent set of 110 slides. In addition, we discover that two slides in the Camelyon16 training set were erroneously labeled normal. Our approach could considerably reduce false negative rates in metastasis detection.
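
To make the patch-based detection idea above concrete, here is a minimal, hedged sketch of tiling a slide into patches, scoring each patch, and taking the maximum probability as a slide-level score. The `score_patch` stub, patch size, and stride are placeholders, not the paper's trained CNN.

```python
# Hedged sketch of patch-based tumor scoring over a large slide image.
# The CNN itself is stubbed out with `score_patch`; in the paper a trained
# convolutional network produces these per-patch tumor probabilities.
import numpy as np

PATCH = 128          # hypothetical patch size in pixels
STRIDE = 128         # non-overlapping tiling, for brevity

def score_patch(patch: np.ndarray) -> float:
    """Placeholder for a CNN forward pass returning P(tumor) for one patch."""
    return float(patch.mean() > 0.8)  # dummy rule, NOT the paper's model

def slide_heatmap(slide: np.ndarray) -> np.ndarray:
    h, w = slide.shape[:2]
    rows, cols = h // STRIDE, w // STRIDE
    heat = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            patch = slide[i * STRIDE:i * STRIDE + PATCH, j * STRIDE:j * STRIDE + PATCH]
            heat[i, j] = score_patch(patch)
    return heat

# Slide-level score: max patch probability; image-level AUC is then computed
# across many slides by comparing these scores against slide labels.
slide = np.random.rand(1024, 1024)      # small stand-in for a gigapixel image
print(slide_heatmap(slide).max())
```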

518 citations


Journal ArticleDOI
TL;DR: In this paper, the author explores the meaning and limitations of a p-value, proposes a simple alternative (the minimum Bayes factor), presents guidelines for a robust, transparent research culture in financial economics, and offers some thoughts on the importance of risk-taking (from the perspective of authors and editors).
Abstract: Given the competition for top journal space, there is an incentive to produce “significant” results. With the combination of unreported tests, lack of adjustment for multiple tests, and direct and indirect p-hacking, many of the results being published will fail to hold up in the future. In addition, there are basic issues with the interpretation of statistical significance. Increasing thresholds may be necessary, but still may not be sufficient: if the effect being studied is rare, even t > 3 will produce a large number of false positives. Here I explore the meaning and limitations of a p-value. I offer a simple alternative (the minimum Bayes factor). I present guidelines for a robust, transparent research culture in financial economics. Finally, I offer some thoughts on the importance of risk-taking (from the perspective of authors and editors) to advance our field. SUMMARY Empirical research in financial economics relies too much on p-values, which are poorly understood in the first place. Journals want to publish papers with positive results and this incentivizes researchers to engage in data mining and “p-hacking.” The outcome will likely be an embarrassing number of false positives—effects that will not be repeated in the future. The minimum Bayes factor (which is a function of the p-value) combined with prior odds provides a simple solution that can be reported alongside the usual p-value. The Bayesianized p-value answers the question: What is the probability that the null is true? The same technique can be used to answer: What threshold of t-statistic do I need so that there is only a 5% chance that the null is true? The threshold depends on the economic plausibility of the hypothesis.
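
The minimum Bayes factor mentioned above can be computed directly from a test statistic. The sketch below uses the exp(-t^2/2) bound and combines it with prior odds to give a "Bayesianized" probability that the null is true; treat it as an illustration of the formula, not the paper's full procedure.

```python
# Hedged sketch of the minimum Bayes factor and a "Bayesianized" p-value.
# Uses the exp(-t^2/2) bound discussed in the abstract; prior odds are an input.
import math

def min_bayes_factor(t_stat: float) -> float:
    """Lower bound on P(data|H0)/P(data|H1) as a function of the t-statistic."""
    return math.exp(-t_stat ** 2 / 2.0)

def prob_null(t_stat: float, prior_odds_null: float) -> float:
    """Posterior probability that the null is true, given prior odds P(H0)/P(H1)."""
    post_odds = min_bayes_factor(t_stat) * prior_odds_null
    return post_odds / (1.0 + post_odds)

# Example: t = 3 with even prior odds vs. skeptical prior odds of 4:1 for the null.
print(prob_null(3.0, prior_odds_null=1.0))   # ~0.01
print(prob_null(3.0, prior_odds_null=4.0))   # ~0.04
```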

253 citations


Journal ArticleDOI
TL;DR: It is shown that proper experimental design and analysis parameters can reduce false positives, provide greater resolution of species in complex metagenomic samples, and improve the interpretation of results.
Abstract: One of the main challenges in metagenomics is the identification of microorganisms in clinical and environmental samples. While an extensive and heterogeneous set of computational tools is available to classify microorganisms using whole-genome shotgun sequencing data, comprehensive comparisons of these methods are limited. In this study, we use the largest-to-date set of laboratory-generated and simulated controls across 846 species to evaluate the performance of 11 metagenomic classifiers. Tools were characterized on the basis of their ability to identify taxa at the genus, species, and strain levels, quantify relative abundances of taxa, and classify individual reads to the species level. Strikingly, the number of species identified by the 11 tools can differ by over three orders of magnitude on the same datasets. Various strategies can ameliorate taxonomic misclassification, including abundance filtering, ensemble approaches, and tool intersection. Nevertheless, these strategies were often insufficient to completely eliminate false positives from environmental samples, which are especially important where they concern medically relevant species. Overall, pairing tools with different classification strategies (k-mer, alignment, marker) can combine their respective advantages. This study provides positive and negative controls, titrated standards, and a guide for selecting tools for metagenomic analyses by comparing ranges of precision, accuracy, and recall. We show that proper experimental design and analysis parameters can reduce false positives, provide greater resolution of species in complex metagenomic samples, and improve the interpretation of results.
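
As a rough illustration of two of the false-positive controls mentioned above (abundance filtering and tool intersection), here is a small sketch over hypothetical classifier outputs; the species names, abundances, and 0.1% cutoff are invented for the example.

```python
# Hedged sketch of two false-positive controls from the abstract:
# abundance filtering and tool intersection. Tool outputs are hypothetical.
calls = {
    "kmer_tool":    {"E. coli": 0.40, "S. aureus": 0.0004, "B. subtilis": 0.20},
    "marker_tool":  {"E. coli": 0.45, "B. subtilis": 0.18},
    "aligner_tool": {"E. coli": 0.38, "S. aureus": 0.0002, "B. subtilis": 0.22},
}

MIN_ABUNDANCE = 0.001  # drop calls below 0.1% relative abundance

filtered = {tool: {sp for sp, ab in hits.items() if ab >= MIN_ABUNDANCE}
            for tool, hits in calls.items()}

# Intersection: keep only species reported by every tool after filtering.
consensus = set.intersection(*filtered.values())
print(consensus)  # E. coli and B. subtilis survive; the trace S. aureus call does not
```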

247 citations


Journal ArticleDOI
TL;DR: New algorithms that carry out the sequential construction of EICs and detection of EIC peaks are developed, and evidence is presented that these new algorithms detect significantly fewer false positives.
Abstract: False positive and false negative peaks detected from extracted ion chromatograms (EIC) are an urgent problem with existing software packages that preprocess untargeted liquid or gas chromatography–mass spectrometry metabolomics data because they can translate downstream into spurious or missing compound identifications. We have developed new algorithms that carry out the sequential construction of EICs and detection of EIC peaks. We compare the new algorithms to two popular software packages XCMS and MZmine 2 and present evidence that these new algorithms detect significantly fewer false positives. Regarding the detection of compounds known to be present in the data, the new algorithms perform at least as well as XCMS and MZmine 2. Furthermore, we present evidence that mass tolerance in m/z should be favored rather than mass tolerance in ppm in the process of constructing EICs. The mass tolerance parameter plays a critical role in the EIC construction process and can have immense impact on the detection ...
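
The point about mass tolerance can be seen with a few lines of arithmetic: a ppm tolerance widens with m/z, while an absolute m/z tolerance stays constant. The sketch below only illustrates that scaling; the tolerance values themselves are arbitrary.

```python
# Hedged illustration of why the abstract favors an absolute m/z tolerance
# over a ppm tolerance when constructing EICs: a ppm window widens with mass.
def ppm_window(mz: float, ppm: float) -> float:
    return mz * ppm * 1e-6       # half-width in Da

def mz_window(mz: float, tol_da: float) -> float:
    return tol_da                # constant half-width in Da

for mz in (100.0, 500.0, 1000.0):
    print(f"m/z {mz:7.1f}:  10 ppm = ±{ppm_window(mz, 10):.4f} Da,  "
          f"fixed = ±{mz_window(mz, 0.005):.4f} Da")
```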

232 citations


Posted Content
TL;DR: The capsule network has shown its potential by achieving a state-of-the-art result of 0.25% test error on MNIST without data augmentation such as rotation and scaling, better than the previous baseline of 0.39%.
Abstract: In recent years, convolutional neural networks (CNN) have played an important role in the field of deep learning. Variants of CNNs have proven to be very successful in classification tasks across different domains. However, there are two big drawbacks to CNNs: their failure to take into account important spatial hierarchies between features, and their lack of rotational invariance. As long as certain key features of an object are present in the test data, CNNs classify the test data as the object, disregarding features' relative spatial orientation to each other. This causes false positives. The lack of rotational invariance in CNNs would cause the network to incorrectly assign the object another label, causing false negatives. To address this concern, Hinton et al. propose a novel type of neural network using the concept of capsules in a recent paper. With the use of dynamic routing and reconstruction regularization, the capsule network model would be both rotation invariant and spatially aware. The capsule network has shown its potential by achieving a state-of-the-art result of 0.25% test error on MNIST without data augmentation such as rotation and scaling, better than the previous baseline of 0.39%. To further test out the application of capsule networks on data with higher dimensionality, we attempt to find the best set of configurations that yield the optimal test error on the CIFAR10 dataset.

163 citations


Journal ArticleDOI
TL;DR: A computational investigation of the various types of statistical errors that can occur in studies of reading behavior using Monte Carlo simulations shows that, contrary to conventional wisdom, false positives are increased to unacceptable levels when no corrections are applied.

103 citations


Journal ArticleDOI
TL;DR: Simulation results show that this lightweight anomaly detection outperforms current anomaly detection techniques, since in scaling mode it requires low energy consumption to detect the attacks with high detection and low false positive rates, almost 93% and 2%, respectively.
Abstract: The Internet of Things (IoT) technology incorporates a large number of heterogeneous devices connected to untrusted networks. Nevertheless, securing IoT devices is a fundamental issue due to the relevant information handled in IoT networks. The intrusion detection system (IDS) is the most commonly used technique to detect intruders and acts as a second wall of defense when cryptography is broken. This is achieved by combining the advantages of anomaly and signature detection techniques, which are characterized by high detection rates and low false positives, respectively. To achieve a high detection rate, the anomaly detection technique relies on a learning algorithm to model the normal behavior of a node, and when a new attack pattern (often known as a signature) is detected, it is modeled with a set of rules. The latter is used by the signature detection technique for attack confirmation. Activating the anomaly detection technique at each low-resource IoT device simultaneously and all the time could generate high energy consumption. Therefore, we propose a game-theoretic technique to activate the anomaly detection technique only when a new attack signature is expected to occur; hence, a balance is achieved between detection and false positive rates on the one hand and energy consumption on the other. Even when combining these two detection techniques, we observed that the number of false positives is still nonzero (almost 5%). Therefore, to further decrease the false positive rate, a reputation model based on game theory is proposed. Simulation results show that this lightweight anomaly detection outperforms current anomaly detection techniques, since in scaling mode (i.e., when the numbers of IoT devices and attackers are high) it requires low energy consumption to detect the attacks with high detection and low false positive rates, almost 93% and 2%, respectively.

87 citations


Journal ArticleDOI
TL;DR: A two-stage occupancy-detection model is developed and examined for the analysis of species detection data with false-positive and false-negative errors at multiple levels, and it is demonstrated that the model is not identifiable if only survey data prone to false positives are available.
Abstract: Summary Accurate knowledge of species occurrence is fundamental to a wide variety of ecological, evolutionary and conservation applications. Assessing the presence or absence of species at sites is often complicated by imperfect detection, with different mechanisms potentially contributing to false-negative and/or false-positive errors at different sampling stages. Ambiguities in the data mean that estimation of relevant parameters might be confounded unless additional information is available to resolve those uncertainties. Here, we consider the analysis of species detection data with false-positive and false-negative errors at multiple levels. We develop and examine a two-stage occupancy-detection model for this purpose. We use profile likelihoods for identifiability analysis and estimation, and study the types of additional data required for reliable estimation. We test the model with simulated data, and then analyse data from environmental DNA (eDNA) surveys of four Australian frog species. In our case study, we consider that false positives may arise due to contamination at the water sample and quantitative PCR-sample levels, whereas false negatives may arise due to eDNA not being captured in a field sample, or due to the sensitivity of laboratory tests. We augment our eDNA survey data with data from aural surveys and laboratory calibration experiments. We demonstrate that the two-stage model with false-positive and false-negative errors is not identifiable if only survey data prone to false positives are available. At least two sources of extra information are required for reliable estimation (e.g. records from a survey method with unambiguous detections, and a calibration experiment). Alternatively, identifiability can be achieved by setting plausible bounds on false detection rates as prior information in a Bayesian setting. The results of our case study matched our simulations with respect to data requirements, and revealed false-positive rates greater than zero for all species. We provide statistical modelling tools to account for uncertainties in species occurrence survey data when false negatives and false positives could occur at multiple sampling stages. Such data are often needed to support management and policy decisions. Dealing with these uncertainties is relevant for traditional survey methods, but also for promising new techniques, such as eDNA sampling.
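
For readers unfamiliar with occupancy models that allow false positives, the hedged sketch below fits a simpler single-stage version by maximum likelihood (site occupancy psi, detection probability p11, false positive probability p10). The paper's actual model adds a second sampling stage and uses profile likelihoods, so this is only a conceptual stand-in.

```python
# Hedged sketch: a single-stage occupancy model with both false negatives and
# false positives (the paper's two-stage model adds another sampling level).
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(1)
n_sites, n_visits = 200, 5
psi_true, p11_true, p10_true = 0.4, 0.6, 0.05

z = rng.random(n_sites) < psi_true                      # true occupancy state
p = np.where(z[:, None], p11_true, p10_true)            # per-visit detection prob
y = (rng.random((n_sites, n_visits)) < p).astype(int)   # detection histories

def negloglik(theta):
    psi, p11, p10 = expit(theta)                        # keep parameters in (0, 1)
    det = y.sum(axis=1)
    miss = n_visits - det
    lik_occ = p11 ** det * (1 - p11) ** miss
    lik_unocc = p10 ** det * (1 - p10) ** miss
    return -np.sum(np.log(psi * lik_occ + (1 - psi) * lik_unocc))

fit = minimize(negloglik, x0=np.zeros(3), method="Nelder-Mead")
# Estimates of (psi, p11, p10); identifiability is fragile without extra
# information, which echoes the abstract's point about auxiliary data.
print(expit(fit.x))
```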

82 citations


Journal ArticleDOI
TL;DR: This paper seeks to explicate and adopt a parametric approach through linear mixed-effects (LME) modeling for studying intersubject correlation (ISC) values, building on the previous correlation framework, with the benefit that the LME platform offers wider adaptability, more powerful interpretation, and quality-control checking capability compared with nonparametric methods.

75 citations


Proceedings ArticleDOI
01 Oct 2017
TL;DR: ChromaTag is a fiducial marker and detection algorithm that uses opponent colors to limit and quickly reject initial false detections and grayscale for precise localization; it is significantly faster than current fiducial marker detection algorithms.
Abstract: Current fiducial marker detection algorithms rely on marker IDs for false positive rejection. Time is wasted on potential detections that will eventually be rejected as false positives. We introduce ChromaTag, a fiducial marker and detection algorithm designed to use opponent colors to limit and quickly reject initial false detections and grayscale for precise localization. Through experiments, we show that ChromaTag is significantly faster than current fiducial markers while achieving similar or better detection accuracy. We also show how tag size and viewing direction affect detection accuracy. Our contribution is significant because fiducial markers are often used in real-time applications (e.g. marker assisted robot navigation) where heavy computation is required by other parts of the system.

Journal ArticleDOI
TL;DR: This paper discusses how Bayesian multiple-regression methods that are used for whole-genome prediction can be adapted for GWAS and argues that controlling the posterior type I error rate is more suitable than controlling the genomewise error rate for controlling false positives in GWAS.
Abstract: Data that are collected for whole-genome prediction can also be used for genome-wide association studies (GWAS). This paper discusses how Bayesian multiple-regression methods that are used for whole-genome prediction can be adapted for GWAS. It is argued here that controlling the posterior type I error rate (PER) is more suitable than controlling the genomewise error rate (GER) for controlling false positives in GWAS. It is shown here that under ideal conditions, i.e., when the model is correctly specified, PER can be controlled by using Bayesian posterior probabilities that are easy to obtain. Computer simulation was used to examine the properties of this Bayesian approach when the ideal conditions were not met. Results indicate that even then useful inferences can be made.
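
The contrast between a posterior error rate and a genomewise error rate can be illustrated with posterior probabilities of association: declare associations above a threshold and report the expected proportion of false positives among them. The probabilities below are hypothetical, not output of the paper's Bayesian regression.

```python
# Hedged sketch of controlling a posterior error rate using Bayesian posterior
# probabilities of association (PPA), as opposed to a genomewise error rate.
import numpy as np

ppa = np.array([0.99, 0.97, 0.90, 0.60, 0.30])   # hypothetical posterior probs
threshold = 0.95

declared = ppa >= threshold
# Expected proportion of false positives among the declared associations:
per = np.mean(1.0 - ppa[declared]) if declared.any() else 0.0
print(declared, per)    # two declared associations, PER ~= 0.02
```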

Journal ArticleDOI
TL;DR: A novel combination of algorithms for automated microaneurysm (MA) detection in retinal images is presented; the proposed classifier is superior at eliminating false positive MA detections from the initial set of candidates and maintains consistent performance across datasets.

Journal ArticleDOI
TL;DR: The logistic model adjusted via the Binomial Boosting algorithm (LRMBB model) is better suited to describe the problem of binary response, because it provides more accurate information regarding the problem considered.
Abstract: The task of classifying is natural to humans, but there are situations in which a person is not best suited to perform this function, which creates the need for automatic methods of classification. Traditional methods, such as logistic regression, are commonly used in this type of situation, but they lack robustness and accuracy. These methods do not work well when there is noise in the data, a situation that is common in expert and intelligent systems. Due to the importance and the increasing complexity of problems of this type, there is a need for methods that provide greater accuracy and interpretability of the results. Among these methods is Boosting, which operates sequentially by applying a classification algorithm to reweighted versions of the training data set. It was recently shown that Boosting may also be viewed as a method for functional estimation. The purpose of the present study was to compare the logistic regression estimated by maximum likelihood (LRMML) and the logistic regression model estimated using the Boosting algorithm, specifically the Binomial Boosting algorithm (LRMBB), and to select the model with the better fit and discrimination capacity in the situation of presence (absence) of a given property (in this case, binary classification). To illustrate this situation, the example used was to classify the presence (absence) of coronary heart disease (CHD) as a function of various biological variables collected from patients. The simulation results indicate that the LRMBB model is more appropriate than the LRMML model for fitting data sets with several covariables and noisy data. The LRMBB model yields lower values of the information criteria AIC and BIC, and the Hosmer–Lemeshow test exhibits no evidence of a bad fit for it. The LRMBB model also presented higher AUC, sensitivity, specificity and accuracy and lower false positive and false negative rates, making it a model with better discrimination power compared to the LRMML model. Based on these results, the logistic model adjusted via the Binomial Boosting algorithm (LRMBB model) is better suited to describe the problem of binary response, because it provides more accurate information regarding the problem considered.
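
As a loose analogue of the comparison above, the sketch below fits a plain logistic regression and a boosted classifier to noisy simulated data and compares AUC. Scikit-learn's gradient boosting is used as a stand-in for the Binomial Boosting algorithm, so the results are only illustrative.

```python
# Hedged sketch comparing plain logistic regression with a boosted classifier
# on noisy binary data; gradient boosting with log-loss stands in for the
# paper's Binomial Boosting algorithm.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)   # 10% label noise
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

for name, model in [("logistic", LogisticRegression(max_iter=1000)),
                    ("boosting", GradientBoostingClassifier(random_state=0))]:
    model.fit(Xtr, ytr)
    auc = roc_auc_score(yte, model.predict_proba(Xte)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```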

Journal ArticleDOI
TL;DR: It is demonstrated that inappropriate methodology in acoustic analysis can yield false positives with effect sizes as large as, or even larger than, those reported in published studies, and that psychological observer biases also lead to false positives.
Abstract: Summary Numerous studies over the past decade have reported correlations between elevated levels of anthropogenic noise and a rise in the minimum frequency of acoustic signals of animals living in noisy habitats. This pattern appears to be occurring globally, and higher pitched signals have been hypothesized to be adaptive changes that reduce masking by low-frequency traffic noise. However, the sound analysis methods most often used in these studies are prone to measurement errors that can result in false positives. In addition, the commonly used method of measuring frequencies visually from spectrograms might also lead to observer-expectancy biases that could exacerbate measurement errors. We conducted an experiment to (i) quantify the size and type of errors that result from ‘eye-balling’ frequency measurements with cursors placed manually on spectrograms of signals recorded in noise and no-noise conditions, and (ii) to test whether observer expectations lead to significant errors in frequency measurements. We asked 54 volunteers, blind to the true intention of our study, to visually measure the minimum frequency of a variety of natural and synthesized bird sounds, recorded either in noise, or no-noise conditions. Test subjects were either informed or uninformed about the hypothesized results of the measurements. Our results demonstrate that inappropriate methodology in acoustic analysis can yield false positives with effect sizes as large, or even larger, than those reported in published studies. In addition to these measurement artefacts, psychological observer biases also led to false positives – when observers expected signals to have higher minimum frequencies in noise, they measured significantly higher minimum frequencies than uninformed observers, who had not been primed with any expectation. The use of improper analysis methods in bioacoustics can lead to the publication of spurious results. We discuss alternative methods that yield unbiased frequency measures and we caution that it is imperative for researchers to familiarize themselves both with the functions and limitations of their sound analysis programmes. In addition, observer-expectancy biases are a potential source of error not only in the field of bioacoustics, but in any situation where measurements can be influenced by human subjectivity.
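
One objective alternative to measuring minimum frequency by eye is a threshold relative to the spectral peak; the sketch below applies an illustrative -24 dB criterion to a synthetic tone. This is a generic example, not necessarily the unbiased method the authors recommend.

```python
# Hedged sketch of a threshold-based minimum-frequency measurement taken from a
# power spectrum, as an alternative to placing cursors on a spectrogram by eye.
# The -24 dB criterion is an illustrative choice, not the paper's prescription.
import numpy as np

def minimum_frequency(signal, sr, threshold_db=-24.0):
    spec = np.abs(np.fft.rfft(signal * np.hanning(len(signal)))) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    power_db = 10 * np.log10(spec / spec.max() + 1e-12)
    above = np.flatnonzero(power_db >= threshold_db)
    return freqs[above[0]]          # lowest bin exceeding the threshold

sr = 44100
t = np.arange(0, 0.5, 1.0 / sr)
tone = np.sin(2 * np.pi * 2000 * t)                  # synthetic 2 kHz "song"
print(minimum_frequency(tone, sr))                   # close to 2000 Hz
```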

Journal ArticleDOI
TL;DR: In this paper, Monte Carlo simulations were used to explore the functioning of three commonly used tests proposed by Roisman et al. (2012), and a revised test based on a broader window of proportion of interaction index values (between .20 and .80) was proposed.
Abstract: Statistical tests of differential susceptibility have become standard in the empirical literature, and are routinely used to adjudicate between alternative developmental hypotheses. However, their performance and limitations have never been systematically investigated. In this paper I employ Monte Carlo simulations to explore the functioning of three commonly used tests proposed by Roisman et al. (2012). Simulations showed that critical tests of differential susceptibility require considerably larger samples than standard power calculations would suggest. The results also showed that existing criteria for differential susceptibility based on the proportion of interaction index (i.e., values between .40 and .60) are especially likely to produce false negatives and highly sensitive to assumptions about interaction symmetry. As an initial response to these problems, I propose a revised test based on a broader window of proportion of interaction index values (between .20 and .80). Additional simulations showed that the revised test outperforms existing tests of differential susceptibility, considerably improving detection with little effect on the rate of false positives. I conclude by noting the limitations of a purely statistical approach to differential susceptibility, and discussing the implications of the present results for the interpretation of published findings and the design of future studies in this area.
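
The sample-size point can be illustrated with a generic Monte Carlo power check for an interaction term in ordinary regression. This is not a reimplementation of the Roisman et al. (2012) tests or of the proportion of interaction index, and the effect sizes are arbitrary.

```python
# Hedged sketch of a Monte Carlo power check for detecting a crossover
# (environment x susceptibility) interaction; a generic illustration only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

def power(n, beta_int=0.15, reps=500, alpha=0.05):
    hits = 0
    for _ in range(reps):
        env = rng.normal(size=n)
        susc = rng.normal(size=n)
        y = 0.1 * env + beta_int * env * susc + rng.normal(size=n)
        X = sm.add_constant(np.column_stack([env, susc, env * susc]))
        pval = sm.OLS(y, X).fit().pvalues[-1]   # p-value of the interaction term
        hits += pval < alpha
    return hits / reps

# Power rises steeply with n for a modest interaction effect.
for n in (100, 300, 600):
    print(n, power(n))
```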

Journal ArticleDOI
TL;DR: A simulation study in an exposome context compares the performance of several statistical methods that have been proposed to detect statistical interactions, finding that GLINTERNET and DSA provide better performance in detecting two-way interactions than other existing methods.
Abstract: There is growing interest in examining the simultaneous effects of multiple exposures and, more generally, the effects of mixtures of exposures, as part of the exposome concept (being defined as the totality of human environmental exposures from conception onwards). Uncovering such combined effects is challenging owing to the large number of exposures, several of them being highly correlated. We performed a simulation study in an exposome context to compare the performance of several statistical methods that have been proposed to detect statistical interactions. Simulations were based on an exposome including 237 exposures with a realistic correlation structure. We considered several statistical regression-based methods, including two-step Environment-Wide Association Study (EWAS2), the Deletion/Substitution/Addition (DSA) algorithm, the Least Absolute Shrinkage and Selection Operator (LASSO), Group-Lasso INTERaction-NET (GLINTERNET), a three-step method based on regression trees and finally Boosted Regression Trees (BRT). We assessed the performance of each method in terms of model size, predictive ability, sensitivity and false discovery rate. GLINTERNET and DSA had better overall performance than the other methods, with GLINTERNET having better properties in terms of selecting the true predictors (sensitivity) and of predictive ability, while DSA had a lower number of false positives. In terms of ability to capture interaction terms, GLINTERNET and DSA had again the best performances, with the same trade-off between sensitivity and false discovery proportion. When GLINTERNET and DSA failed to select an exposure truly associated with the outcome, they tended to select a highly correlated one. When interactions were not present in the data, using variable selection methods that allowed for interactions had only slight costs in performance compared to methods that only searched for main effects. GLINTERNET and DSA provided better performance in detecting two-way interactions, compared to other existing methods.
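
As a minimal stand-in for the interaction-selection task, the sketch below expands a simulated exposure matrix with all two-way products and runs a cross-validated LASSO. It is not GLINTERNET or DSA, and the exposome here is far smaller and less correlated than the 237-exposure simulation in the paper.

```python
# Hedged sketch: selecting main effects and two-way interactions with a plain
# LASSO over an expanded design matrix (a stand-in for GLINTERNET/DSA).
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)
n, p = 500, 30
X = rng.normal(size=(n, p))
y = 1.0 * X[:, 0] + 0.8 * X[:, 1] + 1.2 * X[:, 0] * X[:, 1] + rng.normal(size=n)

poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
Xint = StandardScaler().fit_transform(poly.fit_transform(X))
names = poly.get_feature_names_out([f"x{i}" for i in range(p)])

lasso = LassoCV(cv=5, random_state=0).fit(Xint, y)
selected = [nm for nm, c in zip(names, lasso.coef_) if abs(c) > 1e-3]
print(selected)   # ideally x0, x1 and the x0 x1 interaction, plus a few extras
```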

Journal ArticleDOI
TL;DR: SmartSVA corrects the limitation of traditional SVA under highly confounded scenarios by imposing an explicit convergence criterion and improves the computational efficiency for large datasets and can be applied to other genomic studies to capture unknown sources of variability.
Abstract: One problem that plagues epigenome-wide association studies is the potential confounding due to cell mixtures when purified target cells are not available. Reference-free adjustment of cell mixtures has become increasingly popular due to its flexibility and simplicity. However, existing methods are still not optimal: increased false positive rates and reduced statistical power have been observed in many scenarios. We develop SmartSVA, an optimized surrogate variable analysis (SVA) method, for fast and robust reference-free adjustment of cell mixtures. SmartSVA corrects the limitation of traditional SVA under highly confounded scenarios by imposing an explicit convergence criterion and improves the computational efficiency for large datasets. Compared to traditional SVA, SmartSVA achieves an order-of-magnitude speedup and better false positive control. It protects the signals when capturing the cell mixtures, resulting in significant power increase while controlling for false positives. Through extensive simulations and real data applications, we demonstrate a better performance of SmartSVA than the existing methods. SmartSVA is a fast and robust method for reference-free adjustment of cell mixtures for epigenome-wide association studies. As a general method, SmartSVA can be applied to other genomic studies to capture unknown sources of variability.

Journal ArticleDOI
TL;DR: The trained PLEIC-SVM model is able to capture important interaction patterns between ligand and protein residues for one specific target, which is helpful in discarding false positives in postdocking filtering.
Abstract: A major shortcoming of empirical scoring functions is that they often fail to predict binding affinity properly. Removing false positives from docking results is one of the most challenging tasks in structure-based virtual screening. Postdocking filters, making use of all kinds of experimental structure and activity information, may help in solving the issue. We describe a new method based on detailed protein–ligand interaction decomposition and machine learning. Protein–ligand empirical interaction components (PLEIC) are used as descriptors for support vector machine learning to develop a classification model (PLEIC-SVM) to discriminate false positives from true positives. Experimentally derived activity information is used for model training. An extensive benchmark study on 36 diverse data sets from the DUD-E database has been performed to evaluate the performance of the new method. The results show that the new method performs much better than standard empirical scoring functions in structure-based virtual screening.
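
A generic version of the post-docking filtering idea is sketched below: an SVM trained on per-residue interaction descriptors to separate actives from decoys. The synthetic features and labels merely stand in for the PLEIC descriptors and experimental activity data.

```python
# Hedged sketch of an SVM post-docking filter trained on per-residue
# interaction descriptors; features and labels below are synthetic stand-ins.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_active, n_decoy, n_residues = 200, 800, 50

actives = rng.normal(loc=0.5, size=(n_active, n_residues))   # stand-in descriptors
decoys = rng.normal(loc=0.0, size=(n_decoy, n_residues))
X = np.vstack([actives, decoys])
y = np.r_[np.ones(n_active), np.zeros(n_decoy)]

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", class_weight="balanced"))
print(cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean())
```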

Journal ArticleDOI
12 Oct 2017
TL;DR: An algorithm that uses identifier names to detect argument selection defects, in which the programmer has chosen the wrong argument to a method call in Java programs, is presented and it is shown that the probability of an argument selection defect increases markedly when methods have more than five arguments.
Abstract: Identifier names are often used by developers to convey additional information about the meaning of a program over and above the semantics of the programming language itself. We present an algorithm that uses this information to detect argument selection defects, in which the programmer has chosen the wrong argument to a method call in Java programs. We evaluate our algorithm at Google on 200 million lines of internal code and 10 million lines of predominantly open-source external code and find defects even in large, mature projects such as OpenJDK, ASM, and the MySQL JDBC. The precision and recall of the algorithm vary depending on a sensitivity threshold. Higher thresholds increase precision, giving a true positive rate of 85%, reporting 459 true positives and 78 false positives. Lower thresholds increase recall but lower the true positive rate, reporting 2,060 true positives and 1,207 false positives. We show that this is an order of magnitude improvement on previous approaches. By analyzing the defects found, we are able to quantify best practice advice for API design and show that the probability of an argument selection defect increases markedly when methods have more than five arguments.
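
The core heuristic, comparing call-site argument identifiers to formal parameter names, can be sketched in a few lines. The similarity measure, threshold, and example call are illustrative and certainly differ from the production analysis described above.

```python
# Hedged sketch of flagging a possible argument-swap defect by comparing
# call-site identifiers to formal parameter names; the similarity measure and
# threshold are illustrative, not the paper's exact scoring.
from difflib import SequenceMatcher
from itertools import combinations

def sim(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def swap_suspects(params, args, margin=0.25):
    """Report pairs (i, j) where swapping args i and j better matches params."""
    suspects = []
    for i, j in combinations(range(len(params)), 2):
        current = sim(params[i], args[i]) + sim(params[j], args[j])
        swapped = sim(params[i], args[j]) + sim(params[j], args[i])
        if swapped - current > margin:
            suspects.append((i, j))
    return suspects

# rect(width, height) called with the arguments reversed -> flagged
print(swap_suspects(["width", "height"], ["imageHeight", "imageWidth"]))
# -> [(0, 1)]: the two arguments look swapped
```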

Journal ArticleDOI
12 Oct 2017
TL;DR: In this article, the authors reproduce more than 30,000 merge cases from 50 open source projects, identifying conflicts incorrectly reported by one approach but not by the other (false positives), and conflicts correctly reported by one approach but missed by the other (false negatives).
Abstract: While unstructured merge tools rely only on textual analysis to detect and resolve conflicts, semistructured merge tools go further by partially exploiting the syntactic structure and semantics of the involved artifacts. Previous studies compare these merge approaches with respect to the number of reported conflicts, showing, for most projects and merge situations, reduction in favor of semistructured merge. However, these studies do not investigate whether this reduction actually leads to integration effort reduction (productivity) without negative impact on the correctness of the merging process (quality). To analyze that, and better understand how merge tools could be improved, in this paper we reproduce more than 30,000 merges from 50 open source projects, identifying conflicts incorrectly reported by one approach but not by the other (false positives), and conflicts correctly reported by one approach but missed by the other (false negatives). Our results and complementary analysis indicate that, in the studied sample, the number of false positives is significantly reduced when using semistructured merge. We also find evidence that its false positives are easier to analyze and resolve than those reported by unstructured merge. However, we find no evidence that semistructured merge leads to fewer false negatives, and we argue that they are harder to detect and resolve than unstructured merge false negatives. Driven by these findings, we implement an improved semistructured merge tool that further combines both approaches to reduce the false positives and false negatives of semistructured merge. We find evidence that the improved tool, when compared to unstructured merge in our sample, reduces the number of reported conflicts by half, has no additional false positives, has at least 8% fewer false negatives, and is not prohibitively slower.

Proceedings ArticleDOI
02 Apr 2017
TL;DR: This paper proposes to extract conditional formulas as higher-level semantic features from the raw binary code to conduct the code search, and shows that XMATCH outperforms the existing bug search techniques in terms of accuracy.
Abstract: With the recent increase in security breaches in embedded systems and IoT devices, it becomes increasingly important to search for vulnerabilities directly in binary executables in a cross-platform setting. However, very little has been explored in this domain. The existing efforts are prone to producing considerable false positives, and their results cannot provide explainable evidence for human analysts to eliminate these false positives. In this paper, we propose to extract conditional formulas as higher-level semantic features from the raw binary code to conduct the code search. A conditional formula explicitly captures two cardinal factors of a bug: 1) erroneous data dependencies and 2) missing or invalid condition checks. As a result, binary code search on conditional formulas produces significantly higher accuracy and provide meaningful evidence for human analysts to further examine the search results. We have implemented a prototype, XMATCH, and evaluated it using well-known software, including OpenSSL and BusyBox. Experimental results have shown that XMATCH outperforms the existing bug search techniques in terms of accuracy. Moreover, by evaluating 5 recent vulnerabilities, XMATCH provides clear evidence for human analysts to determine if a matched candidate is indeed vulnerable or has been patched.

Journal ArticleDOI
TL;DR: It is very difficult to achieve high performance metrics using only a single feature class; therefore, a hybrid approach to feature selection remains a better choice.
Abstract: Purpose: The aim of this study was to develop a novel technique for lung nodule detection using an optimized feature set. This feature set has been achieved after rigorous experimentation, which has helped in reducing the false positives significantly. Method: The proposed method starts with preprocessing, removing any present noise from input images, followed by lung segmentation using optimal thresholding. Then the image is enhanced using multiscale dot enhancement filtering prior to nodule detection and feature extraction. Finally, classification of lung nodules is achieved using Support Vector Machine (SVM) classifier. The feature set consists of intensity, shape (2D and 3D) and texture features, which have been selected to optimize the sensitivity and reduce false positives. In addition to SVM, some other supervised classifiers like K‐Nearest‐Neighbor (KNN), Decision Tree and Linear Discriminant Analysis (LDA) have also been used for performance comparison. The extracted features have also been compared class‐wise to determine the most relevant features for lung nodule detection. The proposed system has been evaluated using 850 scans from Lung Image Database Consortium (LIDC) dataset and k‐fold cross‐validation scheme. Results: The overall sensitivity has been improved compared to the previous methods and false positives per scan have been reduced significantly. The achieved sensitivities at detection and classification stages are 94.20% and 98.15%, respectively, with only 2.19 false positives per scan. Conclusions: It is very difficult to achieve high performance metrics using only a single feature class therefore hybrid approach in feature selection remains a better choice. Choosing right set of features can improve the overall accuracy of the system by improving the sensitivity and reducing false positives.
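
The final classification stage can be sketched generically as an SVM over a hybrid feature vector, evaluated with k-fold cross-validation and summarized as sensitivity and false positives per scan. The synthetic features below merely stand in for the intensity, shape, and texture features described in the paper.

```python
# Hedged sketch of the final classification stage: an SVM over a hybrid feature
# set with k-fold cross-validation, reporting sensitivity and false positives
# per scan. Features and labels here are synthetic placeholders.
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_candidates, n_scans = 5000, 850           # candidate detections, total scans
X = rng.normal(size=(n_candidates, 40))     # intensity + shape + texture stand-ins
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.8, size=n_candidates)) > 1.0

clf = make_pipeline(StandardScaler(), SVC(class_weight="balanced"))
pred = cross_val_predict(clf, X, y, cv=5)

tn, fp, fn, tp = confusion_matrix(y, pred).ravel()
print("sensitivity:", tp / (tp + fn), " false positives per scan:", fp / n_scans)
```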

Journal ArticleDOI
TL;DR: An event-related potential (ERP) brain-computer interface (BCI)-based environmental control system that integrates household electrical appliances, a nursing bed, and an intelligent wheelchair is proposed to provide daily assistance to paralyzed patients with severe spinal cord injuries.
Abstract: Objective: This study proposes an event-related potential (ERP) brain-computer interface (BCI)-based environmental control system that integrates household electrical appliances, a nursing bed, and an intelligent wheelchair to provide daily assistance to paralyzed patients with severe spinal cord injuries (SCIs). Methods: An asynchronous mode is used to switch the environmental control system on or off or to select a device (e.g., a TV) for achieving self-paced control. In the asynchronous mode, we introduce several pseudo-keys and a verification mechanism to effectively reduce the false operation rate. By contrast, when the user selects a function of the device (e.g., a TV channel), a synchronous mode is used to improve the accuracy and speed of BCI detection. Two experiments involving six SCI patients were conducted separately in a nursing bed and a wheelchair, and the patients were instructed to control the nursing bed, the wheelchair, and household electrical appliances (an electric light, an air conditioner, and a TV). Results: The average false rate of BCI commands in the control state was 10.4%, whereas the average false operation ratio was 4.9% (a false BCI command might not necessarily result in a false operation according to our system design). During the idle state, there was an average of 0.97 false positives/min, which did not result in any false operations. Conclusion: All SCI patients could use the proposed ERP BCI-based environmental control system satisfactorily. Significance: The proposed ERP-based environmental control system could be used to assist patients with severe SCIs in their daily lives.

Journal ArticleDOI
TL;DR: This work proposes to smooth the outputs of anomaly detectors by online Local Adaptive Multivariate Smoothing (LAMS), which can remove a large portion of the false positives introduced by the anomaly detector by replacing its output on a network event with an aggregate of its outputs on all similar network events observed previously.

Journal ArticleDOI
TL;DR: The results indicate that collecting at least three data points in the first phase (Phase A) and at least five data points in the second phase (Phase B) is generally sufficient to produce acceptable levels of false positives.
Abstract: The purpose of our study was to examine the probability of observing false positives in nonsimulated data using the dual-criteria methods. We extracted data from published studies to produce a series of 16,927 datasets and then assessed the proportion of false positives for various phase lengths. Our results indicate that collecting at least three data points in the first phase (Phase A) and at least five data points in the second phase (Phase B) is generally sufficient to produce acceptable levels of false positives.
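
For context, the dual-criteria logic works roughly as sketched below: the Phase A mean line and trend line are extended into Phase B, and the number of Phase B points above both lines is compared with a binomial criterion. The implementation details (e.g., the exact criterion) may differ from those used in the study.

```python
# Hedged sketch of the dual-criteria (DC) logic: extend the Phase A mean line
# and OLS trend line into Phase B and count Phase B points above both lines
# (assuming an expected increase). The criterion count comes from a binomial
# distribution; details may differ from the authors' exact implementation.
import numpy as np
from scipy.stats import binom

def dual_criteria(phase_a, phase_b, alpha=0.05):
    a, b = np.asarray(phase_a, float), np.asarray(phase_b, float)
    xa = np.arange(len(a))
    slope, intercept = np.polyfit(xa, a, 1)          # Phase A trend line
    xb = np.arange(len(a), len(a) + len(b))
    trend = intercept + slope * xb
    mean_line = np.full(len(b), a.mean())
    exceed = int(np.sum((b > trend) & (b > mean_line)))
    # Smallest count k such that P(X >= k) <= alpha for X ~ Binomial(n_B, 0.5)
    crit = int(binom.ppf(1 - alpha, len(b), 0.5)) + 1
    return exceed, crit, exceed >= crit

print(dual_criteria([2, 3, 2], [5, 6, 6, 7, 8]))
# -> (5, 5, True): all five Phase B points exceed both lines
```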

Journal ArticleDOI
TL;DR: Using moral judgment fMRI data, voxelwise thresholding with familywise error correction based on Random Field Theory provides a more precise overlap than clusterwise thresholding, Bonferroni correction, or false discovery rate correction methods.
Abstract: In fMRI research, the goal of correcting for multiple comparisons is to identify areas of activity that reflect true effects, and thus would be expected to replicate in future studies. Finding an appropriate balance between trying to minimize false positives (Type I error) while not being too stringent and omitting true effects (Type II error) can be challenging. Furthermore, the advantages and disadvantages of these types of errors may differ for different areas of study. In many areas of social neuroscience that involve complex processes and considerable individual differences, such as the study of moral judgment, effects are typically smaller and statistical power weaker, leading to the suggestion that less stringent corrections that allow for more sensitivity may be beneficial and also result in more false positives. Using moral judgment fMRI data, we evaluated four commonly used methods for multiple comparison correction implemented in Statistical Parametric Mapping 12 by examining which method produced the most precise overlap with results from a meta-analysis of relevant studies and with results from nonparametric permutation analyses. We found that voxelwise thresholding with familywise error correction based on Random Field Theory provides a more precise overlap (i.e., without omitting too few regions or encompassing too many additional regions) than either clusterwise thresholding, Bonferroni correction, or false discovery rate correction methods.
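
Two of the corrections discussed above, Bonferroni and false discovery rate control, are easy to illustrate on simulated p-values. Random Field Theory and clusterwise inference need the spatial structure of the images and are not reproduced in this sketch.

```python
# Hedged sketch contrasting Bonferroni (familywise error) with
# Benjamini-Hochberg (false discovery rate) on simulated voxel-level p-values.
import numpy as np

rng = np.random.default_rng(0)
n_null, n_signal = 9000, 1000
p = np.concatenate([rng.uniform(size=n_null),                 # true nulls
                    rng.beta(0.5, 30, size=n_signal)])        # true effects
alpha = 0.05

bonferroni = p < alpha / p.size

# Benjamini-Hochberg: reject the k smallest p-values, where k is the largest
# rank i with p_(i) <= alpha * i / m.
order = np.argsort(p)
thresh = alpha * np.arange(1, p.size + 1) / p.size
passed = p[order] <= thresh
k = passed.nonzero()[0].max() + 1 if passed.any() else 0
bh = np.zeros(p.size, bool)
bh[order[:k]] = True

print("Bonferroni detections:", bonferroni.sum(), " BH detections:", bh.sum())
```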

Journal ArticleDOI
TL;DR: In this article, the authors present a comparative validation design that is able to detect false positives without the need for an individual-level validation criterion, which is often unavailable, and show that the most widely used crosswise-model implementation produces false positives to a non-ignorable extent.
Abstract: Validly measuring sensitive issues such as norm violations or stigmatizing traits through self-reports in surveys is often problematic. Special techniques for sensitive questions like the Randomized Response Technique (RRT) and, among its variants, the recent crosswise model should generate more honest answers by providing full response privacy. Different types of validation studies have examined whether these techniques actually improve data validity, with varying results. Yet, most of these studies did not consider the possibility of false positives, i.e. that respondents are misclassified as having a sensitive trait even though they actually do not. Assuming that respondents only falsely deny but never falsely admit possessing a sensitive trait, higher prevalence estimates have typically been interpreted as more valid estimates. If false positives occur, however, conclusions drawn under this assumption might be misleading. We present a comparative validation design that is able to detect false positives without the need for an individual-level validation criterion – which is often unavailable. Results show that the most widely used crosswise-model implementation produced false positives to a non-ignorable extent. This defect was not revealed by several previous validation studies that did not consider false positives - apparently a blind spot in past sensitive question research.
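
For readers unfamiliar with the crosswise model, the sketch below shows the standard estimator and one way false positives can arise: random answering pushes the observed agreement rate toward 0.5, inflating the prevalence estimate for a rare trait. The simulation parameters are illustrative.

```python
# Hedged sketch of the crosswise-model estimator. Respondents report only
# whether their answers to the sensitive question and to an innocuous question
# with known prevalence p agree; under honest answering
#   P(same) = pi*p + (1 - pi)*(1 - p),  so  pi_hat = (lambda_hat + p - 1)/(2p - 1).
import numpy as np

def crosswise_estimate(same_rate: float, p: float) -> float:
    return (same_rate + p - 1.0) / (2.0 * p - 1.0)

rng = np.random.default_rng(0)
n, pi_true, p = 2000, 0.10, 0.25
sensitive = rng.random(n) < pi_true
innocuous = rng.random(n) < p

honest = (sensitive == innocuous)
print(crosswise_estimate(honest.mean(), p))              # near the true 0.10

# Random answering drives the "same" rate toward 0.5 and inflates the estimate,
# one mechanism by which false positives can arise for a rare trait.
random_answers = rng.random(n) < 0.5
print(crosswise_estimate(random_answers.mean(), p))      # near 0.5, a false inflation
```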

Journal ArticleDOI
29 Mar 2017
TL;DR: This paper examines evidence for false negatives in nonsignificant results in three different ways, proposes an adapted Fisher test to detect the presence of at least one false negative in a set of statistically nonsignificant results, and concludes that false negatives deserve more attention in the current debate on statistical practices in psychology.
Abstract: Due to its probabilistic nature, Null Hypothesis Significance Testing (NHST) is subject to decision errors. The concern for false positives has overshadowed the concern for false negatives in the recent debates in psychology. This might be unwarranted, since reported statistically nonsignificant findings may just be ‘too good to be false’. We examined evidence for false negatives in nonsignificant results in three different ways. We adapted the Fisher test to detect the presence of at least one false negative in a set of statistically nonsignificant results. Simulations show that the adapted Fisher method generally is a powerful method to detect false negatives. We examined evidence for false negatives in the psychology literature in three applications of the adapted Fisher method. These applications indicate that (i) the observed effect size distribution of nonsignificant effects exceeds the expected distribution assuming a null-effect, and approximately two out of three (66.7%) psychology articles reporting nonsignificant results contain evidence for at least one false negative, (ii) nonsignificant results on gender effects contain evidence of true nonzero effects, and (iii) the statistically nonsignificant replications from the Reproducibility Project Psychology (RPP) do not warrant strong conclusions about the absence or presence of true zero effects underlying these nonsignificant results. We conclude that false negatives deserve more attention in the current debate on statistical practices in psychology. Potentially neglecting effects due to a lack of statistical power can lead to a waste of research resources and stifle the scientific discovery process.
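
One way to build such an adapted Fisher test is sketched below: conditional on being nonsignificant, a null p-value is uniform on (alpha, 1), so rescaled p-values can be combined with Fisher's method. The exact procedure in the paper may differ; treat this as a plausible reading, not a reimplementation.

```python
# Hedged sketch of a Fisher-style test for "at least one false negative" among
# k nonsignificant results: conditional on p > alpha, a null p-value is uniform
# on (alpha, 1), so p* = (p - alpha)/(1 - alpha) is uniform on (0, 1) under the
# null and -2*sum(log p*) ~ chi-square with 2k df.
import numpy as np
from scipy.stats import chi2

def adapted_fisher(p_values, alpha=0.05):
    p = np.asarray(p_values, float)
    p = p[p > alpha]                                   # nonsignificant results only
    p_star = (p - alpha) / (1.0 - alpha)
    stat = -2.0 * np.sum(np.log(p_star))
    return stat, chi2.sf(stat, df=2 * len(p))

# Nonsignificant p-values that pile up just above .05 yield a small combined
# p-value, flagging at least one likely false negative.
print(adapted_fisher([0.06, 0.08, 0.07, 0.11, 0.09]))
```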

Proceedings ArticleDOI
Georgios Kathareios, Andreea Anghel, Akos Mate, Rolf Clauberg, Mitch Gusat
01 Dec 2017
TL;DR: A real-time network unsupervised anomaly detection system that reduces the manual workload by coupling 2 learning stages and achieves 98.5% true and 1.3% false positive rates, while reducing the human intervention rate by 5x.
Abstract: Unsupervised anomaly detection (AD) has shown promise against the frequently new cyberattacks. But, as anomalies are not always malicious, such systems generate prodigious false alarm rates. The resulting manual validation workload often overwhelms the IT operators: it slows down the system reaction by orders of magnitude and ultimately thwarts its applicability. Therefore, we propose a real-time network AD system that reduces the manual workload by coupling 2 learning stages. The first stage performs adaptive unsupervised AD using a shallow autoencoder. The second stage uses a custom nearest-neighbor classifier to filter the false positives by modeling the manual classification. We implement a prototype for 10-50Gbps speeds and evaluate it with traffic from a national network operator: we achieve 98.5% true and 1.3% false positive rates, while reducing the human intervention rate by 5x.
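
The two-stage design can be sketched with off-the-shelf components: an autoencoder flags events with high reconstruction error, and a nearest-neighbor classifier trained on operator-validated alerts filters the false positives. The models, features, and labels below are stand-ins, not the authors' 10-50Gbps prototype.

```python
# Hedged sketch of the two-stage idea: an autoencoder flags anomalies by
# reconstruction error, then a nearest-neighbor classifier trained on manually
# validated alerts filters the false positives. Data and models are stand-ins.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, size=(2000, 16))                # baseline traffic features
odd = rng.normal(3, 1, size=(60, 16))                     # anomalous-looking flows
X = StandardScaler().fit_transform(np.vstack([normal, odd]))

# Stage 1: shallow autoencoder trained on (mostly) normal traffic.
ae = MLPRegressor(hidden_layer_sizes=(4,), max_iter=2000, random_state=0)
ae.fit(X[:2000], X[:2000])
err = np.mean((ae.predict(X) - X) ** 2, axis=1)
alerts = np.flatnonzero(err > np.quantile(err[:2000], 0.99))

# Stage 2: kNN over operator-labelled alerts (labels are simulated here).
labels = (alerts >= 2000).astype(int)                     # 1 = true positive
knn = KNeighborsClassifier(n_neighbors=3).fit(X[alerts], labels)
print("alerts:", len(alerts), " kept as true positives:", int(knn.predict(X[alerts]).sum()))
```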