
Showing papers on "False positive paradox published in 2017"



Posted Content
TL;DR: This work presents a framework to automatically detect and localize tumors as small as 100 x 100 pixels in gigapixel microscopy images sized 100,000 x 100,000 pixels and achieves image-level AUC scores above 97% on both the Camelyon16 test set and an independent set of 110 slides.
Abstract: Each year, the treatment decisions for more than 230,000 breast cancer patients in the U.S. hinge on whether the cancer has metastasized away from the breast. Metastasis detection is currently performed by pathologists reviewing large expanses of biological tissues. This process is labor intensive and error-prone. We present a framework to automatically detect and localize tumors as small as 100 x 100 pixels in gigapixel microscopy images sized 100,000 x 100,000 pixels. Our method leverages a convolutional neural network (CNN) architecture and obtains state-of-the-art results on the Camelyon16 dataset in the challenging lesion-level tumor detection task. At 8 false positives per image, we detect 92.4% of the tumors, relative to 82.7% by the previous best automated approach. For comparison, a human pathologist attempting exhaustive search achieved 73.2% sensitivity. We achieve image-level AUC scores above 97% on both the Camelyon16 test set and an independent set of 110 slides. In addition, we discover that two slides in the Camelyon16 training set were erroneously labeled normal. Our approach could considerably reduce false negative rates in metastasis detection.
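
To make the patch-based detection idea above concrete, here is a minimal, hedged sketch of tiling a slide into patches, scoring each patch, and taking the maximum probability as a slide-level score. The `score_patch` stub, patch size, and stride are placeholders, not the paper's trained CNN.

```python
# Hedged sketch of patch-based tumor scoring over a large slide image.
# The CNN itself is stubbed out with `score_patch`; in the paper a trained
# convolutional network produces these per-patch tumor probabilities.
import numpy as np

PATCH = 128          # hypothetical patch size in pixels
STRIDE = 128         # non-overlapping tiling, for brevity

def score_patch(patch: np.ndarray) -> float:
    """Placeholder for a CNN forward pass returning P(tumor) for one patch."""
    return float(patch.mean() > 0.8)  # dummy rule, NOT the paper's model

def slide_heatmap(slide: np.ndarray) -> np.ndarray:
    h, w = slide.shape[:2]
    rows, cols = h // STRIDE, w // STRIDE
    heat = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            patch = slide[i * STRIDE:i * STRIDE + PATCH, j * STRIDE:j * STRIDE + PATCH]
            heat[i, j] = score_patch(patch)
    return heat

# Slide-level score: max patch probability; image-level AUC is then computed
# across many slides by comparing these scores against slide labels.
slide = np.random.rand(1024, 1024)      # small stand-in for a gigapixel image
print(slide_heatmap(slide).max())
```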

518 citations


Journal ArticleDOI
TL;DR: In this paper, the author explores the meaning and limitations of a p-value, proposes a simple alternative (the minimum Bayes factor), presents guidelines for a robust, transparent research culture in financial economics, and offers some thoughts on the importance of risk-taking (from the perspective of authors and editors).
Abstract: Given the competition for top journal space, there is an incentive to produce “significant” results. With the combination of unreported tests, lack of adjustment for multiple tests, and direct and indirect p-hacking, many of the results being published will fail to hold up in the future. In addition, there are basic issues with the interpretation of statistical significance. Increasing thresholds may be necessary, but still may not be sufficient: if the effect being studied is rare, even t > 3 will produce a large number of false positives. Here I explore the meaning and limitations of a p-value. I offer a simple alternative (the minimum Bayes factor). I present guidelines for a robust, transparent research culture in financial economics. Finally, I offer some thoughts on the importance of risk-taking (from the perspective of authors and editors) to advance our field. SUMMARY Empirical research in financial economics relies too much on p-values, which are poorly understood in the first place. Journals want to publish papers with positive results and this incentivizes researchers to engage in data mining and “p-hacking.” The outcome will likely be an embarrassing number of false positives—effects that will not be repeated in the future. The minimum Bayes factor (which is a function of the p-value) combined with prior odds provides a simple solution that can be reported alongside the usual p-value. The Bayesianized p-value answers the question: What is the probability that the null is true? The same technique can be used to answer: What threshold of t-statistic do I need so that there is only a 5% chance that the null is true? The threshold depends on the economic plausibility of the hypothesis.
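
The minimum Bayes factor mentioned above can be computed directly from a test statistic. The sketch below uses the exp(-t^2/2) bound and combines it with prior odds to give a "Bayesianized" probability that the null is true; treat it as an illustration of the formula, not the paper's full procedure.

```python
# Hedged sketch of the minimum Bayes factor and a "Bayesianized" p-value.
# Uses the exp(-t^2/2) bound discussed in the abstract; prior odds are an input.
import math

def min_bayes_factor(t_stat: float) -> float:
    """Lower bound on P(data|H0)/P(data|H1) as a function of the t-statistic."""
    return math.exp(-t_stat ** 2 / 2.0)

def prob_null(t_stat: float, prior_odds_null: float) -> float:
    """Posterior probability that the null is true, given prior odds P(H0)/P(H1)."""
    post_odds = min_bayes_factor(t_stat) * prior_odds_null
    return post_odds / (1.0 + post_odds)

# Example: t = 3 with even prior odds vs. skeptical prior odds of 4:1 for the null.
print(prob_null(3.0, prior_odds_null=1.0))   # ~0.01
print(prob_null(3.0, prior_odds_null=4.0))   # ~0.04
```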

253 citations


Journal ArticleDOI
TL;DR: It is shown that proper experimental design and analysis parameters can reduce false positives, provide greater resolution of species in complex metagenomic samples, and improve the interpretation of results.
Abstract: One of the main challenges in metagenomics is the identification of microorganisms in clinical and environmental samples. While an extensive and heterogeneous set of computational tools is available to classify microorganisms using whole-genome shotgun sequencing data, comprehensive comparisons of these methods are limited. In this study, we use the largest-to-date set of laboratory-generated and simulated controls across 846 species to evaluate the performance of 11 metagenomic classifiers. Tools were characterized on the basis of their ability to identify taxa at the genus, species, and strain levels, quantify relative abundances of taxa, and classify individual reads to the species level. Strikingly, the number of species identified by the 11 tools can differ by over three orders of magnitude on the same datasets. Various strategies can ameliorate taxonomic misclassification, including abundance filtering, ensemble approaches, and tool intersection. Nevertheless, these strategies were often insufficient to completely eliminate false positives from environmental samples, which are especially important where they concern medically relevant species. Overall, pairing tools with different classification strategies (k-mer, alignment, marker) can combine their respective advantages. This study provides positive and negative controls, titrated standards, and a guide for selecting tools for metagenomic analyses by comparing ranges of precision, accuracy, and recall. We show that proper experimental design and analysis parameters can reduce false positives, provide greater resolution of species in complex metagenomic samples, and improve the interpretation of results.
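
As a rough illustration of two of the false-positive controls mentioned above (abundance filtering and tool intersection), here is a small sketch over hypothetical classifier outputs; the species names, abundances, and 0.1% cutoff are invented for the example.

```python
# Hedged sketch of two false-positive controls from the abstract:
# abundance filtering and tool intersection. Tool outputs are hypothetical.
calls = {
    "kmer_tool":    {"E. coli": 0.40, "S. aureus": 0.0004, "B. subtilis": 0.20},
    "marker_tool":  {"E. coli": 0.45, "B. subtilis": 0.18},
    "aligner_tool": {"E. coli": 0.38, "S. aureus": 0.0002, "B. subtilis": 0.22},
}

MIN_ABUNDANCE = 0.001  # drop calls below 0.1% relative abundance

filtered = {tool: {sp for sp, ab in hits.items() if ab >= MIN_ABUNDANCE}
            for tool, hits in calls.items()}

# Intersection: keep only species reported by every tool after filtering.
consensus = set.intersection(*filtered.values())
print(consensus)  # E. coli and B. subtilis survive; the trace S. aureus call does not
```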

247 citations


Journal ArticleDOI
TL;DR: New algorithms that carry out the sequential construction of EICs and detection of EIC peaks are developed, and evidence is presented that these new algorithms detect significantly fewer false positives.
Abstract: False positive and false negative peaks detected from extracted ion chromatograms (EIC) are an urgent problem with existing software packages that preprocess untargeted liquid or gas chromatography–mass spectrometry metabolomics data because they can translate downstream into spurious or missing compound identifications. We have developed new algorithms that carry out the sequential construction of EICs and detection of EIC peaks. We compare the new algorithms to two popular software packages XCMS and MZmine 2 and present evidence that these new algorithms detect significantly fewer false positives. Regarding the detection of compounds known to be present in the data, the new algorithms perform at least as well as XCMS and MZmine 2. Furthermore, we present evidence that mass tolerance in m/z should be favored rather than mass tolerance in ppm in the process of constructing EICs. The mass tolerance parameter plays a critical role in the EIC construction process and can have immense impact on the detection ...
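
The point about mass tolerance can be seen with a few lines of arithmetic: a ppm tolerance widens with m/z, while an absolute m/z tolerance stays constant. The sketch below only illustrates that scaling; the tolerance values themselves are arbitrary.

```python
# Hedged illustration of why the abstract favors an absolute m/z tolerance
# over a ppm tolerance when constructing EICs: a ppm window widens with mass.
def ppm_window(mz: float, ppm: float) -> float:
    return mz * ppm * 1e-6       # half-width in Da

def mz_window(mz: float, tol_da: float) -> float:
    return tol_da                # constant half-width in Da

for mz in (100.0, 500.0, 1000.0):
    print(f"m/z {mz:7.1f}:  10 ppm = ±{ppm_window(mz, 10):.4f} Da,  "
          f"fixed = ±{mz_window(mz, 0.005):.4f} Da")
```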

232 citations


Posted Content
TL;DR: The capsule network has shown its potential by achieving a state-of-the-art result of 0.25% test error on MNIST without data augmentation such as rotation and scaling, better than the previous baseline of 0.39%.
Abstract: In recent years, convolutional neural networks (CNN) have played an important role in the field of deep learning. Variants of CNNs have proven to be very successful in classification tasks across different domains. However, there are two big drawbacks to CNNs: their failure to take into account important spatial hierarchies between features, and their lack of rotational invariance. As long as certain key features of an object are present in the test data, CNNs classify the test data as the object, disregarding features' relative spatial orientation to each other. This causes false positives. The lack of rotational invariance in CNNs would cause the network to incorrectly assign the object another label, causing false negatives. To address this concern, Hinton et al. propose a novel type of neural network using the concept of capsules in a recent paper. With the use of dynamic routing and reconstruction regularization, the capsule network model would be both rotation invariant and spatially aware. The capsule network has shown its potential by achieving a state-of-the-art result of 0.25% test error on MNIST without data augmentation such as rotation and scaling, better than the previous baseline of 0.39%. To further test out the application of capsule networks on data with higher dimensionality, we attempt to find the best set of configurations that yield the optimal test error on the CIFAR10 dataset.

163 citations


Journal ArticleDOI
TL;DR: A computational investigation of the various types of statistical errors that can occur in studies of reading behavior using Monte Carlo simulations shows that, contrary to conventional wisdom, false positives are increased to unacceptable levels when no corrections are applied.

103 citations


Journal ArticleDOI
TL;DR: Simulation results show that this lightweight anomaly detection outperforms current anomaly detection techniques, since in scaling mode it requires low energy consumption to detect the attacks with high detection and low false positive rates, almost 93% and 2%, respectively.
Abstract: The Internet of Things (IoT) technology incorporates a large number of heterogeneous devices connected to untrusted networks. Nevertheless, securing IoT devices is a fundamental issue due to the relevant information handled in IoT networks. The intrusion detection system (IDS) is the most commonly used technique to detect intruders and acts as a second wall of defense when cryptography is broken. This is achieved by combining the advantages of anomaly and signature detection techniques, which are characterized by high detection rates and low false positives, respectively. To achieve a high detection rate, the anomaly detection technique relies on a learning algorithm to model the normal behavior of a node, and when a new attack pattern (often known as a signature) is detected, it is modeled with a set of rules. The latter is used by the signature detection technique for attack confirmation. Activating the anomaly detection technique at each low-resource IoT device simultaneously and all the time could generate high energy consumption. Therefore, we propose a game-theoretic technique to activate the anomaly detection technique only when a new attack signature is expected to occur; hence, a balance is achieved between detection and false positive rates on the one hand and energy consumption on the other. Even when combining these two detection techniques, we observed that the number of false positives is still nonzero (almost 5%). Therefore, to further decrease the false positive rate, a reputation model based on game theory is proposed. Simulation results show that this lightweight anomaly detection outperforms current anomaly detection techniques, since in scaling mode (i.e., when the numbers of IoT devices and attackers are high) it requires low energy consumption to detect the attacks with high detection and low false positive rates, almost 93% and 2%, respectively.

87 citations


Journal ArticleDOI
TL;DR: A two-stage occupancy-detection model is developed and examined for the analysis of species detection data with false-positive and false-negative errors at multiple levels, and it is demonstrated that the model is not identifiable if only survey data prone to false positives are available.
Abstract: Summary Accurate knowledge of species occurrence is fundamental to a wide variety of ecological, evolutionary and conservation applications. Assessing the presence or absence of species at sites is often complicated by imperfect detection, with different mechanisms potentially contributing to false-negative and/or false-positive errors at different sampling stages. Ambiguities in the data mean that estimation of relevant parameters might be confounded unless additional information is available to resolve those uncertainties. Here, we consider the analysis of species detection data with false-positive and false-negative errors at multiple levels. We develop and examine a two-stage occupancy-detection model for this purpose. We use profile likelihoods for identifiability analysis and estimation, and study the types of additional data required for reliable estimation. We test the model with simulated data, and then analyse data from environmental DNA (eDNA) surveys of four Australian frog species. In our case study, we consider that false positives may arise due to contamination at the water sample and quantitative PCR-sample levels, whereas false negatives may arise due to eDNA not being captured in a field sample, or due to the sensitivity of laboratory tests. We augment our eDNA survey data with data from aural surveys and laboratory calibration experiments. We demonstrate that the two-stage model with false-positive and false-negative errors is not identifiable if only survey data prone to false positives are available. At least two sources of extra information are required for reliable estimation (e.g. records from a survey method with unambiguous detections, and a calibration experiment). Alternatively, identifiability can be achieved by setting plausible bounds on false detection rates as prior information in a Bayesian setting. The results of our case study matched our simulations with respect to data requirements, and revealed false-positive rates greater than zero for all species. We provide statistical modelling tools to account for uncertainties in species occurrence survey data when false negatives and false positives could occur at multiple sampling stages. Such data are often needed to support management and policy decisions. Dealing with these uncertainties is relevant for traditional survey methods, but also for promising new techniques, such as eDNA sampling.
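
For readers unfamiliar with occupancy models that allow false positives, the hedged sketch below fits a simpler single-stage version by maximum likelihood (site occupancy psi, detection probability p11, false positive probability p10). The paper's actual model adds a second sampling stage and uses profile likelihoods, so this is only a conceptual stand-in.

```python
# Hedged sketch: a single-stage occupancy model with both false negatives and
# false positives (the paper's two-stage model adds another sampling level).
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(1)
n_sites, n_visits = 200, 5
psi_true, p11_true, p10_true = 0.4, 0.6, 0.05

z = rng.random(n_sites) < psi_true                      # true occupancy state
p = np.where(z[:, None], p11_true, p10_true)            # per-visit detection prob
y = (rng.random((n_sites, n_visits)) < p).astype(int)   # detection histories

def negloglik(theta):
    psi, p11, p10 = expit(theta)                        # keep parameters in (0, 1)
    det = y.sum(axis=1)
    miss = n_visits - det
    lik_occ = p11 ** det * (1 - p11) ** miss
    lik_unocc = p10 ** det * (1 - p10) ** miss
    return -np.sum(np.log(psi * lik_occ + (1 - psi) * lik_unocc))

fit = minimize(negloglik, x0=np.zeros(3), method="Nelder-Mead")
# Estimates of (psi, p11, p10); identifiability is fragile without extra
# information, which echoes the abstract's point about auxiliary data.
print(expit(fit.x))
```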

82 citations


Journal ArticleDOI
TL;DR: This paper seeks to explicate and adopt a parametric approach through linear mixed-effects (LME) modeling for studying intersubject correlation (ISC) values, building on the previous correlation framework, with the benefit that the LME platform offers wider adaptability, more powerful interpretation, and quality-control checking capability compared with nonparametric methods.

75 citations


Proceedings ArticleDOI
01 Oct 2017
TL;DR: ChromaTag is a fiducial marker and detection algorithm that uses opponent colors to limit and quickly reject initial false detections and grayscale for precise localization; it is significantly faster than current fiducial marker detection algorithms.
Abstract: Current fiducial marker detection algorithms rely on marker IDs for false positive rejection. Time is wasted on potential detections that will eventually be rejected as false positives. We introduce ChromaTag, a fiducial marker and detection algorithm designed to use opponent colors to limit and quickly reject initial false detections and grayscale for precise localization. Through experiments, we show that ChromaTag is significantly faster than current fiducial markers while achieving similar or better detection accuracy. We also show how tag size and viewing direction affect detection accuracy. Our contribution is significant because fiducial markers are often used in real-time applications (e.g. marker assisted robot navigation) where heavy computation is required by other parts of the system.

Journal ArticleDOI
TL;DR: This paper discusses how Bayesian multiple-regression methods that are used for whole-genome prediction can be adapted for GWAS and argues that controlling the posterior type I error rate is more suitable than controlling the genomewise error rate for controlling false positives in GWAS.
Abstract: Data that are collected for whole-genome prediction can also be used for genome-wide association studies (GWAS). This paper discusses how Bayesian multiple-regression methods that are used for whole-genome prediction can be adapted for GWAS. It is argued here that controlling the posterior type I error rate (PER) is more suitable than controlling the genomewise error rate (GER) for controlling false positives in GWAS. It is shown here that under ideal conditions, i.e., when the model is correctly specified, PER can be controlled by using Bayesian posterior probabilities that are easy to obtain. Computer simulation was used to examine the properties of this Bayesian approach when the ideal conditions were not met. Results indicate that even then useful inferences can be made.
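
The contrast between a posterior error rate and a genomewise error rate can be illustrated with posterior probabilities of association: declare associations above a threshold and report the expected proportion of false positives among them. The probabilities below are hypothetical, not output of the paper's Bayesian regression.

```python
# Hedged sketch of controlling a posterior error rate using Bayesian posterior
# probabilities of association (PPA), as opposed to a genomewise error rate.
import numpy as np

ppa = np.array([0.99, 0.97, 0.90, 0.60, 0.30])   # hypothetical posterior probs
threshold = 0.95

declared = ppa >= threshold
# Expected proportion of false positives among the declared associations:
per = np.mean(1.0 - ppa[declared]) if declared.any() else 0.0
print(declared, per)    # two declared associations, PER ~= 0.02
```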

Journal ArticleDOI
TL;DR: A novel combination of algorithms for automated microaneurysm (MA) detection in retinal images is presented; the proposed classifier is superior at eliminating false positive MA detections from the initial set of candidates and maintains consistent performance across datasets.

Journal ArticleDOI
TL;DR: The logistic model adjusted via the Binomial Boosting algorithm (LRMBB model) is better suited to describe the problem of binary response, because it provides more accurate information regarding the problem considered.
Abstract: The task of classifying is natural to humans, but there are situations in which a person is not best suited to perform this function, which creates the need for automatic methods of classification. Traditional methods, such as logistic regression, are commonly used in this type of situation, but they lack robustness and accuracy. These methods do not work well when there is noise in the data, a situation that is common in expert and intelligent systems. Due to the importance and the increasing complexity of problems of this type, there is a need for methods that provide greater accuracy and interpretability of the results. Among these methods is Boosting, which operates sequentially by applying a classification algorithm to reweighted versions of the training data set. It was recently shown that Boosting may also be viewed as a method for functional estimation. The purpose of the present study was to compare the logistic regression estimated by maximum likelihood (LRMML) and the logistic regression model estimated using the Boosting algorithm, specifically the Binomial Boosting algorithm (LRMBB), and to select the model with the better fit and discrimination capacity in the situation of presence (absence) of a given property (in this case, binary classification). To illustrate this situation, the example used was to classify the presence (absence) of coronary heart disease (CHD) as a function of various biological variables collected from patients. The simulation results indicate that the LRMBB model is more appropriate than the LRMML model for fitting data sets with several covariables and noisy data. The LRMBB model yields lower values of the information criteria AIC and BIC, and the Hosmer–Lemeshow test exhibits no evidence of a bad fit for it. The LRMBB model also presented higher AUC, sensitivity, specificity and accuracy and lower false positive and false negative rates, making it a model with better discrimination power compared to the LRMML model. Based on these results, the logistic model adjusted via the Binomial Boosting algorithm (LRMBB model) is better suited to describe the problem of binary response, because it provides more accurate information regarding the problem considered.
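
As a loose analogue of the comparison above, the sketch below fits a plain logistic regression and a boosted classifier to noisy simulated data and compares AUC. Scikit-learn's gradient boosting is used as a stand-in for the Binomial Boosting algorithm, so the results are only illustrative.

```python
# Hedged sketch comparing plain logistic regression with a boosted classifier
# on noisy binary data; gradient boosting with log-loss stands in for the
# paper's Binomial Boosting algorithm.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)   # 10% label noise
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

for name, model in [("logistic", LogisticRegression(max_iter=1000)),
                    ("boosting", GradientBoostingClassifier(random_state=0))]:
    model.fit(Xtr, ytr)
    auc = roc_auc_score(yte, model.predict_proba(Xte)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```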

Journal ArticleDOI
TL;DR: It is demonstrated that inappropriate methodology in acoustic analysis can yield false positives with effect sizes as large as, or even larger than, those reported in published studies, and that psychological observer biases also lead to false positives.
Abstract: Summary Numerous studies over the past decade have reported correlations between elevated levels of anthropogenic noise and a rise in the minimum frequency of acoustic signals of animals living in noisy habitats. This pattern appears to be occurring globally, and higher pitched signals have been hypothesized to be adaptive changes that reduce masking by low-frequency traffic noise. However, the sound analysis methods most often used in these studies are prone to measurement errors that can result in false positives. In addition, the commonly used method of measuring frequencies visually from spectrograms might also lead to observer-expectancy biases that could exacerbate measurement errors. We conducted an experiment to (i) quantify the size and type of errors that result from ‘eye-balling’ frequency measurements with cursors placed manually on spectrograms of signals recorded in noise and no-noise conditions, and (ii) to test whether observer expectations lead to significant errors in frequency measurements. We asked 54 volunteers, blind to the true intention of our study, to visually measure the minimum frequency of a variety of natural and synthesized bird sounds, recorded either in noise, or no-noise conditions. Test subjects were either informed or uninformed about the hypothesized results of the measurements. Our results demonstrate that inappropriate methodology in acoustic analysis can yield false positives with effect sizes as large, or even larger, than those reported in published studies. In addition to these measurement artefacts, psychological observer biases also led to false positives – when observers expected signals to have higher minimum frequencies in noise, they measured significantly higher minimum frequencies than uninformed observers, who had not been primed with any expectation. The use of improper analysis methods in bioacoustics can lead to the publication of spurious results. We discuss alternative methods that yield unbiased frequency measures and we caution that it is imperative for researchers to familiarize themselves both with the functions and limitations of their sound analysis programmes. In addition, observer-expectancy biases are a potential source of error not only in the field of bioacoustics, but in any situation where measurements can be influenced by human subjectivity.
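
One objective alternative to measuring minimum frequency by eye is a threshold relative to the spectral peak; the sketch below applies an illustrative -24 dB criterion to a synthetic tone. This is a generic example, not necessarily the unbiased method the authors recommend.

```python
# Hedged sketch of a threshold-based minimum-frequency measurement taken from a
# power spectrum, as an alternative to placing cursors on a spectrogram by eye.
# The -24 dB criterion is an illustrative choice, not the paper's prescription.
import numpy as np

def minimum_frequency(signal, sr, threshold_db=-24.0):
    spec = np.abs(np.fft.rfft(signal * np.hanning(len(signal)))) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    power_db = 10 * np.log10(spec / spec.max() + 1e-12)
    above = np.flatnonzero(power_db >= threshold_db)
    return freqs[above[0]]          # lowest bin exceeding the threshold

sr = 44100
t = np.arange(0, 0.5, 1.0 / sr)
tone = np.sin(2 * np.pi * 2000 * t)                  # synthetic 2 kHz "song"
print(minimum_frequency(tone, sr))                   # close to 2000 Hz
```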

Journal ArticleDOI
TL;DR: In this paper, Monte Carlo simulations were used to explore the functioning of three commonly used tests proposed by Roisman et al. (2012), and a revised test based on a broader window of proportion of interaction index values (between .20 and .80) was proposed.
Abstract: Statistical tests of differential susceptibility have become standard in the empirical literature, and are routinely used to adjudicate between alternative developmental hypotheses. However, their performance and limitations have never been systematically investigated. In this paper I employ Monte Carlo simulations to explore the functioning of three commonly used tests proposed by Roisman et al. (2012). Simulations showed that critical tests of differential susceptibility require considerably larger samples than standard power calculations would suggest. The results also showed that existing criteria for differential susceptibility based on the proportion of interaction index (i.e., values between .40 and .60) are especially likely to produce false negatives and highly sensitive to assumptions about interaction symmetry. As an initial response to these problems, I propose a revised test based on a broader window of proportion of interaction index values (between .20 and .80). Additional simulations showed that the revised test outperforms existing tests of differential susceptibility, considerably improving detection with little effect on the rate of false positives. I conclude by noting the limitations of a purely statistical approach to differential susceptibility, and discussing the implications of the present results for the interpretation of published findings and the design of future studies in this area.
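
The sample-size point can be illustrated with a generic Monte Carlo power check for an interaction term in ordinary regression. This is not a reimplementation of the Roisman et al. (2012) tests or of the proportion of interaction index, and the effect sizes are arbitrary.

```python
# Hedged sketch of a Monte Carlo power check for detecting a crossover
# (environment x susceptibility) interaction; a generic illustration only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

def power(n, beta_int=0.15, reps=500, alpha=0.05):
    hits = 0
    for _ in range(reps):
        env = rng.normal(size=n)
        susc = rng.normal(size=n)
        y = 0.1 * env + beta_int * env * susc + rng.normal(size=n)
        X = sm.add_constant(np.column_stack([env, susc, env * susc]))
        pval = sm.OLS(y, X).fit().pvalues[-1]   # p-value of the interaction term
        hits += pval < alpha
    return hits / reps

# Power rises steeply with n for a modest interaction effect.
for n in (100, 300, 600):
    print(n, power(n))
```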

Journal ArticleDOI
TL;DR: A simulation study in an exposome context compares the performance of several statistical methods that have been proposed to detect statistical interactions, finding that GLINTERNET and DSA provide better performance in detecting two-way interactions than other existing methods.
Abstract: There is growing interest in examining the simultaneous effects of multiple exposures and, more generally, the effects of mixtures of exposures, as part of the exposome concept (being defined as the totality of human environmental exposures from conception onwards). Uncovering such combined effects is challenging owing to the large number of exposures, several of them being highly correlated. We performed a simulation study in an exposome context to compare the performance of several statistical methods that have been proposed to detect statistical interactions. Simulations were based on an exposome including 237 exposures with a realistic correlation structure. We considered several statistical regression-based methods, including two-step Environment-Wide Association Study (EWAS2), the Deletion/Substitution/Addition (DSA) algorithm, the Least Absolute Shrinkage and Selection Operator (LASSO), Group-Lasso INTERaction-NET (GLINTERNET), a three-step method based on regression trees and finally Boosted Regression Trees (BRT). We assessed the performance of each method in terms of model size, predictive ability, sensitivity and false discovery rate. GLINTERNET and DSA had better overall performance than the other methods, with GLINTERNET having better properties in terms of selecting the true predictors (sensitivity) and of predictive ability, while DSA had a lower number of false positives. In terms of ability to capture interaction terms, GLINTERNET and DSA had again the best performances, with the same trade-off between sensitivity and false discovery proportion. When GLINTERNET and DSA failed to select an exposure truly associated with the outcome, they tended to select a highly correlated one. When interactions were not present in the data, using variable selection methods that allowed for interactions had only slight costs in performance compared to methods that only searched for main effects. GLINTERNET and DSA provided better performance in detecting two-way interactions, compared to other existing methods.
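
As a minimal stand-in for the interaction-selection task, the sketch below expands a simulated exposure matrix with all two-way products and runs a cross-validated LASSO. It is not GLINTERNET or DSA, and the exposome here is far smaller and less correlated than the 237-exposure simulation in the paper.

```python
# Hedged sketch: selecting main effects and two-way interactions with a plain
# LASSO over an expanded design matrix (a stand-in for GLINTERNET/DSA).
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)
n, p = 500, 30
X = rng.normal(size=(n, p))
y = 1.0 * X[:, 0] + 0.8 * X[:, 1] + 1.2 * X[:, 0] * X[:, 1] + rng.normal(size=n)

poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
Xint = StandardScaler().fit_transform(poly.fit_transform(X))
names = poly.get_feature_names_out([f"x{i}" for i in range(p)])

lasso = LassoCV(cv=5, random_state=0).fit(Xint, y)
selected = [nm for nm, c in zip(names, lasso.coef_) if abs(c) > 1e-3]
print(selected)   # ideally x0, x1 and the x0 x1 interaction, plus a few extras
```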

Journal ArticleDOI
TL;DR: SmartSVA corrects the limitation of traditional SVA under highly confounded scenarios by imposing an explicit convergence criterion and improves the computational efficiency for large datasets and can be applied to other genomic studies to capture unknown sources of variability.
Abstract: One problem that plagues epigenome-wide association studies is the potential confounding due to cell mixtures when purified target cells are not available. Reference-free adjustment of cell mixtures has become increasingly popular due to its flexibility and simplicity. However, existing methods are still not optimal: increased false positive rates and reduced statistical power have been observed in many scenarios. We develop SmartSVA, an optimized surrogate variable analysis (SVA) method, for fast and robust reference-free adjustment of cell mixtures. SmartSVA corrects the limitation of traditional SVA under highly confounded scenarios by imposing an explicit convergence criterion and improves the computational efficiency for large datasets. Compared to traditional SVA, SmartSVA achieves an order-of-magnitude speedup and better false positive control. It protects the signals when capturing the cell mixtures, resulting in significant power increase while controlling for false positives. Through extensive simulations and real data applications, we demonstrate a better performance of SmartSVA than the existing methods. SmartSVA is a fast and robust method for reference-free adjustment of cell mixtures for epigenome-wide association studies. As a general method, SmartSVA can be applied to other genomic studies to capture unknown sources of variability.

Journal ArticleDOI
TL;DR: The trained PLEIC-SVM model is able to capture important interaction patterns between ligand and protein residues for one specific target, which is helpful in discarding false positives in postdocking filtering.
Abstract: A major shortcoming of empirical scoring functions is that they often fail to predict binding affinity properly. Removing false positives from docking results is one of the most challenging tasks in structure-based virtual screening. Postdocking filters, making use of all kinds of experimental structure and activity information, may help in solving the issue. We describe a new method based on detailed protein–ligand interaction decomposition and machine learning. Protein–ligand empirical interaction components (PLEIC) are used as descriptors for support vector machine learning to develop a classification model (PLEIC-SVM) to discriminate false positives from true positives. Experimentally derived activity information is used for model training. An extensive benchmark study on 36 diverse data sets from the DUD-E database has been performed to evaluate the performance of the new method. The results show that the new method performs much better than standard empirical scoring functions in structure-based virtual screening.
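
A generic version of the post-docking filtering idea is sketched below: an SVM trained on per-residue interaction descriptors to separate actives from decoys. The synthetic features and labels merely stand in for the PLEIC descriptors and experimental activity data.

```python
# Hedged sketch of an SVM post-docking filter trained on per-residue
# interaction descriptors; features and labels below are synthetic stand-ins.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_active, n_decoy, n_residues = 200, 800, 50

actives = rng.normal(loc=0.5, size=(n_active, n_residues))   # stand-in descriptors
decoys = rng.normal(loc=0.0, size=(n_decoy, n_residues))
X = np.vstack([actives, decoys])
y = np.r_[np.ones(n_active), np.zeros(n_decoy)]

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", class_weight="balanced"))
print(cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean())
```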

Journal ArticleDOI
12 Oct 2017
TL;DR: An algorithm that uses identifier names to detect argument selection defects, in which the programmer has chosen the wrong argument to a method call in Java programs, is presented and it is shown that the probability of an argument selection defect increases markedly when methods have more than five arguments.
Abstract: Identifier names are often used by developers to convey additional information about the meaning of a program over and above the semantics of the programming language itself. We present an algorithm that uses this information to detect argument selection defects, in which the programmer has chosen the wrong argument to a method call in Java programs. We evaluate our algorithm at Google on 200 million lines of internal code and 10 million lines of predominantly open-source external code and find defects even in large, mature projects such as OpenJDK, ASM, and the MySQL JDBC. The precision and recall of the algorithm vary depending on a sensitivity threshold. Higher thresholds increase precision, giving a true positive rate of 85%, reporting 459 true positives and 78 false positives. Lower thresholds increase recall but lower the true positive rate, reporting 2,060 true positives and 1,207 false positives. We show that this is an order of magnitude improvement on previous approaches. By analyzing the defects found, we are able to quantify best practice advice for API design and show that the probability of an argument selection defect increases markedly when methods have more than five arguments.
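
The core heuristic, comparing call-site argument identifiers to formal parameter names, can be sketched in a few lines. The similarity measure, threshold, and example call are illustrative and certainly differ from the production analysis described above.

```python
# Hedged sketch of flagging a possible argument-swap defect by comparing
# call-site identifiers to formal parameter names; the similarity measure and
# threshold are illustrative, not the paper's exact scoring.
from difflib import SequenceMatcher
from itertools import combinations

def sim(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def swap_suspects(params, args, margin=0.25):
    """Report pairs (i, j) where swapping args i and j better matches params."""
    suspects = []
    for i, j in combinations(range(len(params)), 2):
        current = sim(params[i], args[i]) + sim(params[j], args[j])
        swapped = sim(params[i], args[j]) + sim(params[j], args[i])
        if swapped - current > margin:
            suspects.append((i, j))
    return suspects

# rect(width, height) called with the arguments reversed -> flagged
print(swap_suspects(["width", "height"], ["imageHeight", "imageWidth"]))
# -> [(0, 1)]: the two arguments look swapped
```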

Journal ArticleDOI
12 Oct 2017
TL;DR: In this article, the authors reproduce more than 30,000 merge cases from 50 open source projects, identifying conflicts incorrectly reported by one approach but not by the other (false positives), and conflicts correctly reported by one approach but missed by the other (false negatives).
Abstract: While unstructured merge tools rely only on textual analysis to detect and resolve conflicts, semistructured merge tools go further by partially exploiting the syntactic structure and semantics of the involved artifacts. Previous studies compare these merge approaches with respect to the number of reported conflicts, showing, for most projects and merge situations, reduction in favor of semistructured merge. However, these studies do not investigate whether this reduction actually leads to integration effort reduction (productivity) without negative impact on the correctness of the merging process (quality). To analyze that, and better understand how merge tools could be improved, in this paper we reproduce more than 30,000 merges from 50 open source projects, identifying conflicts incorrectly reported by one approach but not by the other (false positives), and conflicts correctly reported by one approach but missed by the other (false negatives). Our results and complementary analysis indicate that, in the studied sample, the number of false positives is significantly reduced when using semistructured merge. We also find evidence that its false positives are easier to analyze and resolve than those reported by unstructured merge. However, we find no evidence that semistructured merge leads to fewer false negatives, and we argue that they are harder to detect and resolve than unstructured merge false negatives. Driven by these findings, we implement an improved semistructured merge tool that further combines both approaches to reduce the false positives and false negatives of semistructured merge. We find evidence that the improved tool, when compared to unstructured merge in our sample, reduces the number of reported conflicts by half, has no additional false positives, has at least 8% fewer false negatives, and is not prohibitively slower.

Proceedings ArticleDOI
02 Apr 2017
TL;DR: This paper proposes to extract conditional formulas as higher-level semantic features from the raw binary code to conduct the code search, and shows that XMATCH outperforms the existing bug search techniques in terms of accuracy.
Abstract: With the recent increase in security breaches in embedded systems and IoT devices, it becomes increasingly important to search for vulnerabilities directly in binary executables in a cross-platform setting. However, very little has been explored in this domain. The existing efforts are prone to producing considerable false positives, and their results cannot provide explainable evidence for human analysts to eliminate these false positives. In this paper, we propose to extract conditional formulas as higher-level semantic features from the raw binary code to conduct the code search. A conditional formula explicitly captures two cardinal factors of a bug: 1) erroneous data dependencies and 2) missing or invalid condition checks. As a result, binary code search on conditional formulas produces significantly higher accuracy and provide meaningful evidence for human analysts to further examine the search results. We have implemented a prototype, XMATCH, and evaluated it using well-known software, including OpenSSL and BusyBox. Experimental results have shown that XMATCH outperforms the existing bug search techniques in terms of accuracy. Moreover, by evaluating 5 recent vulnerabilities, XMATCH provides clear evidence for human analysts to determine if a matched candidate is indeed vulnerable or has been patched.

Journal ArticleDOI
TL;DR: It is very difficult to achieve high performance metrics using only a single feature class; therefore, a hybrid approach to feature selection remains a better choice.
Abstract: Purpose: The aim of this study was to develop a novel technique for lung nodule detection using an optimized feature set. This feature set has been achieved after rigorous experimentation, which has helped in reducing the false positives significantly. Method: The proposed method starts with preprocessing, removing any present noise from input images, followed by lung segmentation using optimal thresholding. Then the image is enhanced using multiscale dot enhancement filtering prior to nodule detection and feature extraction. Finally, classification of lung nodules is achieved using Support Vector Machine (SVM) classifier. The feature set consists of intensity, shape (2D and 3D) and texture features, which have been selected to optimize the sensitivity and reduce false positives. In addition to SVM, some other supervised classifiers like K‐Nearest‐Neighbor (KNN), Decision Tree and Linear Discriminant Analysis (LDA) have also been used for performance comparison. The extracted features have also been compared class‐wise to determine the most relevant features for lung nodule detection. The proposed system has been evaluated using 850 scans from Lung Image Database Consortium (LIDC) dataset and k‐fold cross‐validation scheme. Results: The overall sensitivity has been improved compared to the previous methods and false positives per scan have been reduced significantly. The achieved sensitivities at detection and classification stages are 94.20% and 98.15%, respectively, with only 2.19 false positives per scan. Conclusions: It is very difficult to achieve high performance metrics using only a single feature class therefore hybrid approach in feature selection remains a better choice. Choosing right set of features can improve the overall accuracy of the system by improving the sensitivity and reducing false positives.
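
The final classification stage can be sketched generically as an SVM over a hybrid feature vector, evaluated with k-fold cross-validation and summarized as sensitivity and false positives per scan. The synthetic features below merely stand in for the intensity, shape, and texture features described in the paper.

```python
# Hedged sketch of the final classification stage: an SVM over a hybrid feature
# set with k-fold cross-validation, reporting sensitivity and false positives
# per scan. Features and labels here are synthetic placeholders.
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_candidates, n_scans = 5000, 850           # candidate detections, total scans
X = rng.normal(size=(n_candidates, 40))     # intensity + shape + texture stand-ins
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.8, size=n_candidates)) > 1.0

clf = make_pipeline(StandardScaler(), SVC(class_weight="balanced"))
pred = cross_val_predict(clf, X, y, cv=5)

tn, fp, fn, tp = confusion_matrix(y, pred).ravel()
print("sensitivity:", tp / (tp + fn), " false positives per scan:", fp / n_scans)
```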

Journal ArticleDOI
TL;DR: An event-related potential (ERP) brain-computer interface (BCI)-based environmental control system that integrates household electrical appliances, a nursing bed, and an intelligent wheelchair is proposed to provide daily assistance to paralyzed patients with severe spinal cord injuries.
Abstract: Objective: This study proposes an event-related potential (ERP) brain-computer interface (BCI)-based environmental control system that integrates household electrical appliances, a nursing bed, and an intelligent wheelchair to provide daily assistance to paralyzed patients with severe spinal cord injuries (SCIs). Methods: An asynchronous mode is used to switch the environmental control system on or off or to select a device (e.g., a TV) for achieving self-paced control. In the asynchronous mode, we introduce several pseudo-keys and a verification mechanism to effectively reduce the false operation rate. By contrast, when the user selects a function of the device (e.g., a TV channel), a synchronous mode is used to improve the accuracy and speed of BCI detection. Two experiments involving six SCI patients were conducted separately in a nursing bed and a wheelchair, and the patients were instructed to control the nursing bed, the wheelchair, and household electrical appliances (an electric light, an air conditioner, and a TV). Results: The average false rate of BCI commands in the control state was 10.4%, whereas the average false operation ratio was 4.9% (a false BCI command might not necessarily result in a false operation according to our system design). During the idle state, there was an average of 0.97 false positives/min, which did not result in any false operations. Conclusion: All SCI patients could use the proposed ERP BCI-based environmental control system satisfactorily. Significance: The proposed ERP-based environmental control system could be used to assist patients with severe SCIs in their daily lives.

Journal ArticleDOI
TL;DR: This work proposes to smooth the outputs of anomaly detectors by online Local Adaptive Multivariate Smoothing (LAMS), which can remove a large portion of the false positives introduced by the anomaly detector by replacing its output on a network event with an aggregate of its outputs on all similar network events observed previously.

Journal ArticleDOI
TL;DR: The results indicate that collecting at least three data points in the first phase (Phase A) and at least five data points in the second phase (Phase B) is generally sufficient to produce acceptable levels of false positives.
Abstract: The purpose of our study was to examine the probability of observing false positives in nonsimulated data using the dual-criteria methods. We extracted data from published studies to produce a series of 16,927 datasets and then assessed the proportion of false positives for various phase lengths. Our results indicate that collecting at least three data points in the first phase (Phase A) and at least five data points in the second phase (Phase B) is generally sufficient to produce acceptable levels of false positives.
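
For context, the dual-criteria logic works roughly as sketched below: the Phase A mean line and trend line are extended into Phase B, and the number of Phase B points above both lines is compared with a binomial criterion. The implementation details (e.g., the exact criterion) may differ from those used in the study.

```python
# Hedged sketch of the dual-criteria (DC) logic: extend the Phase A mean line
# and OLS trend line into Phase B and count Phase B points above both lines
# (assuming an expected increase). The criterion count comes from a binomial
# distribution; details may differ from the authors' exact implementation.
import numpy as np
from scipy.stats import binom

def dual_criteria(phase_a, phase_b, alpha=0.05):
    a, b = np.asarray(phase_a, float), np.asarray(phase_b, float)
    xa = np.arange(len(a))
    slope, intercept = np.polyfit(xa, a, 1)          # Phase A trend line
    xb = np.arange(len(a), len(a) + len(b))
    trend = intercept + slope * xb
    mean_line = np.full(len(b), a.mean())
    exceed = int(np.sum((b > trend) & (b > mean_line)))
    # Smallest count k such that P(X >= k) <= alpha for X ~ Binomial(n_B, 0.5)
    crit = int(binom.ppf(1 - alpha, len(b), 0.5)) + 1
    return exceed, crit, exceed >= crit

print(dual_criteria([2, 3, 2], [5, 6, 6, 7, 8]))
# -> (5, 5, True): all five Phase B points exceed both lines
```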

Journal ArticleDOI
TL;DR: Using moral judgment fMRI data, voxelwise thresholding with familywise error correction based on Random Field Theory provides a more precise overlap than clusterwise thresholding, Bonferroni correction, or false discovery rate correction methods.
Abstract: In fMRI research, the goal of correcting for multiple comparisons is to identify areas of activity that reflect true effects, and thus would be expected to replicate in future studies. Finding an appropriate balance between trying to minimize false positives (Type I error) while not being too stringent and omitting true effects (Type II error) can be challenging. Furthermore, the advantages and disadvantages of these types of errors may differ for different areas of study. In many areas of social neuroscience that involve complex processes and considerable individual differences, such as the study of moral judgment, effects are typically smaller and statistical power weaker, leading to the suggestion that less stringent corrections that allow for more sensitivity may be beneficial and also result in more false positives. Using moral judgment fMRI data, we evaluated four commonly used methods for multiple comparison correction implemented in Statistical Parametric Mapping 12 by examining which method produced the most precise overlap with results from a meta-analysis of relevant studies and with results from nonparametric permutation analyses. We found that voxelwise thresholding with familywise error correction based on Random Field Theory provides a more precise overlap (i.e., without omitting too few regions or encompassing too many additional regions) than either clusterwise thresholding, Bonferroni correction, or false discovery rate correction methods.
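
Two of the corrections discussed above, Bonferroni and false discovery rate control, are easy to illustrate on simulated p-values. Random Field Theory and clusterwise inference need the spatial structure of the images and are not reproduced in this sketch.

```python
# Hedged sketch contrasting Bonferroni (familywise error) with
# Benjamini-Hochberg (false discovery rate) on simulated voxel-level p-values.
import numpy as np

rng = np.random.default_rng(0)
n_null, n_signal = 9000, 1000
p = np.concatenate([rng.uniform(size=n_null),                 # true nulls
                    rng.beta(0.5, 30, size=n_signal)])        # true effects
alpha = 0.05

bonferroni = p < alpha / p.size

# Benjamini-Hochberg: reject the k smallest p-values, where k is the largest
# rank i with p_(i) <= alpha * i / m.
order = np.argsort(p)
thresh = alpha * np.arange(1, p.size + 1) / p.size
passed = p[order] <= thresh
k = passed.nonzero()[0].max() + 1 if passed.any() else 0
bh = np.zeros(p.size, bool)
bh[order[:k]] = True

print("Bonferroni detections:", bonferroni.sum(), " BH detections:", bh.sum())
```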

Journal ArticleDOI
TL;DR: In this article, the authors present a comparative validation design that is able to detect false positives without the need for an individual-level validation criterion, which is often unavailable, and show that the most widely used crosswise-model implementation produces false positives to a non-ignorable extent.
Abstract: Validly measuring sensitive issues such as norm violations or stigmatizing traits through self-reports in surveys is often problematic. Special techniques for sensitive questions like the Randomized Response Technique (RRT) and, among its variants, the recent crosswise model should generate more honest answers by providing full response privacy. Different types of validation studies have examined whether these techniques actually improve data validity, with varying results. Yet, most of these studies did not consider the possibility of false positives, i.e. that respondents are misclassified as having a sensitive trait even though they actually do not. Assuming that respondents only falsely deny but never falsely admit possessing a sensitive trait, higher prevalence estimates have typically been interpreted as more valid estimates. If false positives occur, however, conclusions drawn under this assumption might be misleading. We present a comparative validation design that is able to detect false positives without the need for an individual-level validation criterion – which is often unavailable. Results show that the most widely used crosswise-model implementation produced false positives to a non-ignorable extent. This defect was not revealed by several previous validation studies that did not consider false positives - apparently a blind spot in past sensitive question research.
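
For readers unfamiliar with the crosswise model, the sketch below shows the standard estimator and one way false positives can arise: random answering pushes the observed agreement rate toward 0.5, inflating the prevalence estimate for a rare trait. The simulation parameters are illustrative.

```python
# Hedged sketch of the crosswise-model estimator. Respondents report only
# whether their answers to the sensitive question and to an innocuous question
# with known prevalence p agree; under honest answering
#   P(same) = pi*p + (1 - pi)*(1 - p),  so  pi_hat = (lambda_hat + p - 1)/(2p - 1).
import numpy as np

def crosswise_estimate(same_rate: float, p: float) -> float:
    return (same_rate + p - 1.0) / (2.0 * p - 1.0)

rng = np.random.default_rng(0)
n, pi_true, p = 2000, 0.10, 0.25
sensitive = rng.random(n) < pi_true
innocuous = rng.random(n) < p

honest = (sensitive == innocuous)
print(crosswise_estimate(honest.mean(), p))              # near the true 0.10

# Random answering drives the "same" rate toward 0.5 and inflates the estimate,
# one mechanism by which false positives can arise for a rare trait.
random_answers = rng.random(n) < 0.5
print(crosswise_estimate(random_answers.mean(), p))      # near 0.5, a false inflation
```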

Journal ArticleDOI
29 Mar 2017
TL;DR: This paper examines evidence for false negatives in nonsignificant results in three different ways, proposes an adapted Fisher test to detect the presence of at least one false negative in a set of statistically nonsignificant results, and concludes that false negatives deserve more attention in the current debate on statistical practices in psychology.
Abstract: Due to its probabilistic nature, Null Hypothesis Significance Testing (NHST) is subject to decision errors. The concern for false positives has overshadowed the concern for false negatives in the recent debates in psychology. This might be unwarranted, since reported statistically nonsignificant findings may just be ‘too good to be false’. We examined evidence for false negatives in nonsignificant results in three different ways. We adapted the Fisher test to detect the presence of at least one false negative in a set of statistically nonsignificant results. Simulations show that the adapted Fisher method generally is a powerful method to detect false negatives. We examined evidence for false negatives in the psychology literature in three applications of the adapted Fisher method. These applications indicate that (i) the observed effect size distribution of nonsignificant effects exceeds the expected distribution assuming a null-effect, and approximately two out of three (66.7%) psychology articles reporting nonsignificant results contain evidence for at least one false negative, (ii) nonsignificant results on gender effects contain evidence of true nonzero effects, and (iii) the statistically nonsignificant replications from the Reproducibility Project Psychology (RPP) do not warrant strong conclusions about the absence or presence of true zero effects underlying these nonsignificant results. We conclude that false negatives deserve more attention in the current debate on statistical practices in psychology. Potentially neglecting effects due to a lack of statistical power can lead to a waste of research resources and stifle the scientific discovery process.
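
One way to build such an adapted Fisher test is sketched below: conditional on being nonsignificant, a null p-value is uniform on (alpha, 1), so rescaled p-values can be combined with Fisher's method. The exact procedure in the paper may differ; treat this as a plausible reading, not a reimplementation.

```python
# Hedged sketch of a Fisher-style test for "at least one false negative" among
# k nonsignificant results: conditional on p > alpha, a null p-value is uniform
# on (alpha, 1), so p* = (p - alpha)/(1 - alpha) is uniform on (0, 1) under the
# null and -2*sum(log p*) ~ chi-square with 2k df.
import numpy as np
from scipy.stats import chi2

def adapted_fisher(p_values, alpha=0.05):
    p = np.asarray(p_values, float)
    p = p[p > alpha]                                   # nonsignificant results only
    p_star = (p - alpha) / (1.0 - alpha)
    stat = -2.0 * np.sum(np.log(p_star))
    return stat, chi2.sf(stat, df=2 * len(p))

# Nonsignificant p-values that pile up just above .05 yield a small combined
# p-value, flagging at least one likely false negative.
print(adapted_fisher([0.06, 0.08, 0.07, 0.11, 0.09]))
```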

Proceedings ArticleDOI
Georgios Kathareios, Andreea Anghel, Akos Mate, Rolf Clauberg, Mitch Gusat
01 Dec 2017
TL;DR: A real-time network unsupervised anomaly detection system that reduces the manual workload by coupling 2 learning stages and achieves 98.5% true and 1.3% false positive rates, while reducing the human intervention rate by 5x.
Abstract: Unsupervised anomaly detection (AD) has shown promise against the frequently new cyberattacks. But, as anomalies are not always malicious, such systems generate prodigious false alarm rates. The resulting manual validation workload often overwhelms the IT operators: it slows down the system reaction by orders of magnitude and ultimately thwarts its applicability. Therefore, we propose a real-time network AD system that reduces the manual workload by coupling 2 learning stages. The first stage performs adaptive unsupervised AD using a shallow autoencoder. The second stage uses a custom nearest-neighbor classifier to filter the false positives by modeling the manual classification. We implement a prototype for 10-50Gbps speeds and evaluate it with traffic from a national network operator: we achieve 98.5% true and 1.3% false positive rates, while reducing the human intervention rate by 5x.
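
The two-stage design can be sketched with off-the-shelf components: an autoencoder flags events with high reconstruction error, and a nearest-neighbor classifier trained on operator-validated alerts filters the false positives. The models, features, and labels below are stand-ins, not the authors' 10-50Gbps prototype.

```python
# Hedged sketch of the two-stage idea: an autoencoder flags anomalies by
# reconstruction error, then a nearest-neighbor classifier trained on manually
# validated alerts filters the false positives. Data and models are stand-ins.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, size=(2000, 16))                # baseline traffic features
odd = rng.normal(3, 1, size=(60, 16))                     # anomalous-looking flows
X = StandardScaler().fit_transform(np.vstack([normal, odd]))

# Stage 1: shallow autoencoder trained on (mostly) normal traffic.
ae = MLPRegressor(hidden_layer_sizes=(4,), max_iter=2000, random_state=0)
ae.fit(X[:2000], X[:2000])
err = np.mean((ae.predict(X) - X) ** 2, axis=1)
alerts = np.flatnonzero(err > np.quantile(err[:2000], 0.99))

# Stage 2: kNN over operator-labelled alerts (labels are simulated here).
labels = (alerts >= 2000).astype(int)                     # 1 = true positive
knn = KNeighborsClassifier(n_neighbors=3).fit(X[alerts], labels)
print("alerts:", len(alerts), " kept as true positives:", int(knn.predict(X[alerts]).sum()))
```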