
Showing papers on "Bayes' theorem published in 2021"


Journal ArticleDOI
14 Jan 2021
TL;DR: This Primer on Bayesian statistics summarizes the most important aspects of determining prior distributions, likelihood functions and posterior distributions, in addition to discussing different applications of the method across disciplines.
Abstract: Bayesian statistics is an approach to data analysis based on Bayes’ theorem, where available knowledge about parameters in a statistical model is updated with the information in observed data. The background knowledge is expressed as a prior distribution and combined with observational data in the form of a likelihood function to determine the posterior distribution. The posterior can also be used for making predictions about future events. This Primer describes the stages involved in Bayesian analysis, from specifying the prior and data models to deriving inference, model checking and refinement. We discuss the importance of prior and posterior predictive checking, selecting a proper technique for sampling from a posterior distribution, variational inference and variable selection. Examples of successful applications of Bayesian analysis across various research fields are provided, including in social sciences, ecology, genetics, medicine and more. We propose strategies for reproducibility and reporting standards, outlining an updated WAMBS (when to Worry and how to Avoid the Misuse of Bayesian Statistics) checklist. Finally, we outline the impact of Bayesian analysis on artificial intelligence, a major goal in the next decade. This Primer on Bayesian statistics summarizes the most important aspects of determining prior distributions, likelihood functions and posterior distributions, in addition to discussing different applications of the method across disciplines.
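As a compact reference for the update described above, a minimal statement of Bayes' theorem and the posterior predictive distribution, written with generic symbols (theta for the model parameters, y for the observed data, y-tilde for future data):

```latex
\[
p(\theta \mid y) \;=\; \frac{p(y \mid \theta)\,p(\theta)}{p(y)}
\;\propto\; p(y \mid \theta)\,p(\theta),
\qquad
p(\tilde{y} \mid y) \;=\; \int p(\tilde{y} \mid \theta)\,p(\theta \mid y)\,d\theta .
\]
```

The first expression is the prior-times-likelihood update that yields the posterior; the second is the posterior predictive distribution used for making predictions about future events.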

337 citations


Journal ArticleDOI
TL;DR: A data-driven methodology for fault detection and diagnosis is proposed that integrates principal component analysis (PCA) with a Bayesian network (BN) and a combination of vine copula and Bayes' theorem; the results suggest that the proposed framework provides superior performance.

86 citations


Journal ArticleDOI
TL;DR: The use of Bayes factors is becoming increasingly common in the psychological sciences, so it is important that researchers understand the logic behind the Bayes factor in order to interpret it correctly, along with the strengths and weaknesses of the Bayesian approach.
Abstract: The use of Bayes factors is becoming increasingly common in psychological sciences. Thus, it is important that researchers understand the logic behind the Bayes factor in order to correctly interpret it, and the strengths and weaknesses of the Bayesian approach. As education for psychological scientists focuses on frequentist statistics, resources are needed for researchers and students who want to learn more about this alternative approach. The aim of the current article is to provide such an overview to a psychological researcher. We cover the general logic behind Bayesian statistics, explain how the Bayes factor is calculated, how to set the priors in popular software packages to reflect the prior beliefs of the researcher, and finally provide a set of recommendations and caveats for interpreting Bayes factors. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
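For reference, the quantity being interpreted has a simple form; a minimal statement with generic hypotheses H1 and H0 and data D:

```latex
\[
\mathrm{BF}_{10} \;=\; \frac{p(D \mid H_1)}{p(D \mid H_0)},
\qquad
\frac{p(H_1 \mid D)}{p(H_0 \mid D)}
\;=\; \mathrm{BF}_{10} \times \frac{p(H_1)}{p(H_0)} .
\]
```

Each marginal likelihood averages the likelihood over the prior specified under that hypothesis, which is why the prior settings discussed in the article directly affect the resulting Bayes factor.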

83 citations


Journal ArticleDOI
TL;DR: This work designs artificial neural networks (ANNs) that take as input single nucleotide polymorphic sites (SNPs) found in individuals sampled from a single population and infer the past effective population size history, and shows that combining deep learning and ABC can improve performance while taking advantage of both frameworks.
Abstract: For the past decades, simulation-based likelihood-free inference methods have enabled researchers to address numerous population genetics problems. As the richness and amount of simulated and real genetic data keep increasing, the field has a strong opportunity to tackle tasks that current methods hardly solve. However, high data dimensionality forces most methods to summarize large genomic data sets into a relatively small number of handcrafted features (summary statistics). Here, we propose an alternative to summary statistics, based on the automatic extraction of relevant information using deep learning techniques. Specifically, we design artificial neural networks (ANNs) that take as input single nucleotide polymorphic sites (SNPs) found in individuals sampled from a single population and infer the past effective population size history. First, we provide guidelines to construct artificial neural networks that comply with the intrinsic properties of SNP data such as invariance to permutation of haplotypes, long scale interactions between SNPs and variable genomic length. Thanks to a Bayesian hyperparameter optimization procedure, we evaluate the performance of multiple networks and compare them to well-established methods like Approximate Bayesian Computation (ABC). Even without the expert knowledge of summary statistics, our approach compares fairly well to an ABC approach based on handcrafted features. Furthermore, we show that combining deep learning and ABC can improve performance while taking advantage of both frameworks. Finally, we apply our approach to reconstruct the effective population size history of cattle breed populations.

50 citations


Journal ArticleDOI
TL;DR: In this article, the authors cast synaptic plasticity as a problem of Bayesian inference, and thus provide a normative view of learning, and propose two hypotheses to explain the large variability in the size of postsynaptic potentials.
Abstract: Learning, especially rapid learning, is critical for survival. However, learning is hard; a large number of synaptic weights must be set based on noisy, often ambiguous, sensory information. In such a high-noise regime, keeping track of probability distributions over weights is the optimal strategy. Here we hypothesize that synapses take that strategy; in essence, when they estimate weights, they include error bars. They then use that uncertainty to adjust their learning rates, with more uncertain weights having higher learning rates. We also make a second, independent, hypothesis: synapses communicate their uncertainty by linking it to variability in postsynaptic potential size, with more uncertainty leading to more variability. These two hypotheses cast synaptic plasticity as a problem of Bayesian inference, and thus provide a normative view of learning. They generalize known learning rules, offer an explanation for the large variability in the size of postsynaptic potentials and make falsifiable experimental predictions.
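As a loose illustration of the general idea, and not the authors' specific plasticity rule, a scalar Kalman-filter-style update in which the learning rate grows with the synapse's current uncertainty could look like the sketch below; the variable names and noise levels are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" synaptic weight the synapse tries to track.
true_w = 0.8
obs_noise = 0.5        # std of the noisy feedback signal (assumed)

# The synapse's belief about its weight: mean and variance ("error bars").
w_mean, w_var = 0.0, 1.0

for t in range(200):
    # Noisy evidence about the true weight (stand-in for a teaching signal).
    evidence = true_w + obs_noise * rng.normal()

    # Kalman-style update: learning rate is larger when uncertainty is larger.
    learning_rate = w_var / (w_var + obs_noise**2)
    w_mean += learning_rate * (evidence - w_mean)
    w_var *= (1.0 - learning_rate)   # uncertainty shrinks as evidence accumulates

print(f"estimated weight ~ {w_mean:.3f}, remaining variance ~ {w_var:.4f}")
```

The update has the qualitative property the abstract hypothesizes: weights with larger uncertainty (w_var) receive larger learning rates, and the uncertainty itself shrinks as evidence accumulates.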

44 citations


Journal ArticleDOI
22 Apr 2021
TL;DR: In this paper, a neural-network conditional density estimator is trained to model posterior probability distributions over the full 15-dimensional space of binary black hole system parameters, given detector strain data from multiple detectors.
Abstract: The LIGO and Virgo gravitational-wave observatories have detected many exciting events over the past five years. To infer the system parameters, iterative sampling algorithms such as MCMC are typically used with Bayes' theorem to obtain posterior samples—by repeatedly generating waveforms and comparing to measured strain data. However, as the rate of detections grows with detector sensitivity, this poses a growing computational challenge. To confront this challenge, as well as that of fast multimessenger alerts, in this work we apply deep learning to learn non-iterative surrogate models for the Bayesian posterior. We train a neural-network conditional density estimator to model posterior probability distributions over the full 15-dimensional space of binary black hole system parameters, given detector strain data from multiple detectors. We use the method of normalizing flows—specifically, a neural spline flow—which allows for rapid sampling and density estimation. Training the network is likelihood-free, requiring samples from the data generative process, but no likelihood evaluations. Through training, the network learns a global set of posteriors: it can generate thousands of independent posterior samples per second for any strain data consistent with the training distribution. We demonstrate our method by performing inference on GW150914, and obtain results in close agreement with standard techniques.
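Schematically, the likelihood-free training described above fits the conditional density estimator q_phi(theta | d) by minimizing an expected negative log density over simulated parameter-strain pairs:

```latex
\[
\phi^{*} \;=\; \arg\min_{\phi}\;
\mathbb{E}_{\theta \sim p(\theta),\; d \sim p(d \mid \theta)}
\bigl[-\log q_{\phi}(\theta \mid d)\bigr],
\]
```

which in expectation drives q_phi(theta | d) toward the true posterior p(theta | d) while requiring only samples from the data generative process, never likelihood evaluations.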

42 citations


Journal ArticleDOI
TL;DR: This work proposes a novel statistical construction of the finite element method that provides a means of synthesising measurement data and finite element models, and uses Bayes' rule to choose the most suitable finite element model in light of the observed data by computing the model posteriors.

38 citations


Journal ArticleDOI
TL;DR: Wang et al. propose SC-MEB, an empirical Bayes approach for spatial clustering analysis using a hidden Markov random field, and derive an efficient expectation-maximization algorithm for SC-MEB based on an iterative conditional mode.
Abstract: Spatial transcriptomics has been emerging as a powerful technique for resolving gene expression profiles while retaining tissue spatial information. These spatially resolved transcriptomics make it feasible to examine the complex multicellular systems of different microenvironments. To answer scientific questions with spatial transcriptomics and expand our understanding of how cell types and states are regulated by microenvironment, the first step is to identify cell clusters by integrating the available spatial information. Here, we introduce SC-MEB, an empirical Bayes approach for spatial clustering analysis using a hidden Markov random field. We have also derived an efficient expectation-maximization algorithm based on an iterative conditional mode for SC-MEB. In contrast to BayesSpace, a recently developed method, SC-MEB is not only computationally efficient and scalable to large sample sizes but is also capable of choosing the smoothness parameter and the number of clusters. We performed comprehensive simulation studies to demonstrate the superiority of SC-MEB over some existing methods. We applied SC-MEB to analyze the spatial transcriptome of human dorsolateral prefrontal cortex tissues and mouse hypothalamic preoptic region. Our analysis results showed that SC-MEB can achieve a similar or better clustering performance to BayesSpace, which uses the true number of clusters and a fixed smoothness parameter. Moreover, SC-MEB is scalable to large 'sample sizes'. We then employed SC-MEB to analyze a colon dataset from a patient with colorectal cancer (CRC) and COVID-19, and further performed differential expression analysis to identify signature genes related to the clustering results. The heatmap of identified signature genes showed that the clusters identified using SC-MEB were more separable than those obtained with BayesSpace. Using pathway analysis, we identified three immune-related clusters, and in a further comparison, found the mean expression of COVID-19 signature genes was greater in immune than non-immune regions of colon tissue. SC-MEB provides a valuable computational tool for investigating the structural organizations of tissues from spatial transcriptomic data.
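For context on the hidden Markov random field ingredient, spatial clustering methods of this kind typically place a Potts-type prior on the latent cluster labels of neighbouring spots; a generic form (not necessarily SC-MEB's exact parameterization) is:

```latex
\[
p(z_1, \dots, z_n) \;\propto\;
\exp\!\Bigl(\beta \sum_{i \sim j} \mathbb{1}\{z_i = z_j\}\Bigr),
\]
```

where i ~ j runs over neighbouring spatial locations and beta is the smoothness parameter that, as the abstract notes, SC-MEB can choose automatically.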

28 citations


Journal ArticleDOI
TL;DR: A way to estimate the anomaly base rate is proposed by combining two key insights: Empirical Bayes methods capture the implicit process by which researchers form priors about the likelihood that a new variable is a tradable anomaly based on their past experience, and under certain conditions, a one-to-one mapping exists between these prior beliefs and the best-fit tuning parameter in a penalized regression.

27 citations


Journal ArticleDOI
TL;DR: The purpose of this paper is to compare the empirical Bayesian and approximation theoretic approaches to hierarchical learning, in terms of large data consistency, variance of estimators, robustness of the estimators to model misspecification, and computational cost.
Abstract: Hierarchical modeling and learning has proven very powerful in the field of Gaussian process regression and kernel methods, especially for machine learning applications and, increasingly, within the field of inverse problems more generally. The classical approach to learning hierarchical information is through Bayesian formulations of the problem, implying a posterior distribution on the hierarchical parameters or, in the case of empirical Bayes, providing an optimization criterion for them. Recent developments in the machine learning literature have suggested new criteria for hierarchical learning, based on approximation theoretic considerations that can be interpreted as variants of cross-validation, and exploiting approximation consistency in data splitting. The purpose of this paper is to compare the empirical Bayesian and approximation theoretic approaches to hierarchical learning, in terms of large data consistency, variance of estimators, robustness of the estimators to model misspecification, and computational cost. Our analysis is rooted in the setting of Matern-like Gaussian random field priors, with smoothness, amplitude and inverse lengthscale as hierarchical parameters, in the regression setting. Numerical experiments validate the theory and extend the scope of the paper beyond the Matern setting.
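For orientation, the empirical Bayes route referred to here chooses the hierarchical parameters (amplitude, smoothness, inverse lengthscale) by maximizing the Gaussian process log marginal likelihood, whereas the approximation-theoretic criteria are variants of cross-validation; the standard marginal likelihood objective in the regression setting is:

```latex
\[
\log p(y \mid \theta)
\;=\;
-\tfrac{1}{2}\, y^{\top}\bigl(K_{\theta} + \sigma^{2} I\bigr)^{-1} y
\;-\; \tfrac{1}{2}\,\log\det\bigl(K_{\theta} + \sigma^{2} I\bigr)
\;-\; \tfrac{n}{2}\,\log 2\pi,
\]
```

with K_theta the Matern-like covariance matrix over the n training inputs and sigma^2 the observation noise variance.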

26 citations


Journal ArticleDOI
TL;DR: A phylogenetic extension to the MDS model is implemented and it is learned that subtype H3N2 spreads most effectively, consistent with its epidemic success relative to other seasonal influenza subtypes.
Abstract: Big Bayes is the computationally intensive co-application of big data and large, expressive Bayesian models for the analysis of complex phenomena in scientific inference and statistical learning. S...

Proceedings ArticleDOI
01 Aug 2021
TL;DR: This paper empirically investigates the properties of MBR decoding on a number of previously reported biases and failure cases of beam search, and finds that MBR still exhibits a length and token frequency bias, but that MBR also increases robustness against copy noise in the training data and against domain shift.
Abstract: Neural Machine Translation (NMT) currently exhibits biases such as producing translations that are too short and overgenerating frequent words, and shows poor robustness to copy noise in training data or domain shift. Recent work has tied these shortcomings to beam search – the de facto standard inference algorithm in NMT – and Eikema & Aziz (2020) propose to use Minimum Bayes Risk (MBR) decoding on unbiased samples instead. In this paper, we empirically investigate the properties of MBR decoding on a number of previously reported biases and failure cases of beam search. We find that MBR still exhibits a length and token frequency bias, owing to the MT metrics used as utility functions, but that MBR also increases robustness against copy noise in the training data and domain shift.
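To make the decision rule concrete, MBR decoding picks, from a pool of sampled candidate translations, the candidate with the highest average utility against the others used as pseudo-references; a minimal sketch with a toy overlap utility (a real system would use an MT metric as the utility function, as the paper does):

```python
from typing import Callable, List

def mbr_decode(candidates: List[str],
               utility: Callable[[str, str], float]) -> str:
    """Return the candidate with the highest expected utility,
    using the candidate pool itself as pseudo-references."""
    def expected_utility(hyp: str) -> float:
        refs = [c for c in candidates if c is not hyp]
        return sum(utility(hyp, r) for r in refs) / max(len(refs), 1)
    return max(candidates, key=expected_utility)

def toy_utility(hyp: str, ref: str) -> float:
    # Stand-in utility: unigram overlap (illustrative only).
    h, r = set(hyp.split()), set(ref.split())
    return len(h & r) / max(len(h | r), 1)

samples = ["the cat sat on the mat",
           "a cat sat on the mat",
           "the cat is on a mat",
           "cats sit on mats"]
print(mbr_decode(samples, toy_utility))
```

The choice of utility is exactly where the length and token frequency biases discussed in the paper enter: MBR inherits the preferences of whatever metric it optimizes in expectation.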

Journal ArticleDOI
01 Feb 2021
TL;DR: In this paper, the statistical inference of multicomponent stress-strength reliability under adaptive Type-II hybrid progressive censored samples for the Weibull distribution is considered, and the performances of the different methods are compared using Monte Carlo simulations.
Abstract: The statistical inference of multicomponent stress-strength reliability under adaptive Type-II hybrid progressive censored samples for the Weibull distribution is considered. It is assumed that both stress and strength are independent Weibull random variables. We study the problem in three cases. First, assuming that the stress and strength have the same shape parameter and different scale parameters, the maximum likelihood estimation (MLE), approximate maximum likelihood estimation (AMLE) and two Bayes approximations, due to the lack of explicit forms, are derived. Also, the asymptotic confidence intervals, two bootstrap confidence intervals and highest posterior density (HPD) credible intervals are obtained. In the second case, when the shape parameter is known, MLE, exact Bayes estimation, the uniformly minimum variance unbiased estimator (UMVUE) and different confidence intervals (asymptotic and HPD) are studied. Finally, assuming that the stress and strength have different shape and scale parameters, ML, AML and Bayesian estimation of multicomponent reliability are considered. The performances of the different methods are compared using Monte Carlo simulations and, for illustrative purposes, one data set is investigated.
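As a quick numerical illustration of the quantity being estimated (not the paper's estimators), the multicomponent stress-strength reliability R_(s,k) — the probability that at least s of k Weibull strengths exceed a common Weibull stress — can be approximated by simple Monte Carlo; the shape and scale values below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def weibull(shape, scale, size):
    # numpy's weibull draws have unit scale; rescale to the desired scale parameter.
    return scale * rng.weibull(shape, size)

def multicomponent_reliability(s, k, n_sim=200_000,
                               stress=(1.5, 1.0), strength=(1.5, 1.3)):
    """Monte Carlo estimate of R_{s,k} = P(at least s of k strengths > stress)."""
    x = weibull(*stress, n_sim)                         # common stress per trial
    y = weibull(strength[0], strength[1], (n_sim, k))   # k independent strengths
    exceed = (y > x[:, None]).sum(axis=1)
    return (exceed >= s).mean()

print(f"R_(2,4) ~ {multicomponent_reliability(s=2, k=4):.3f}")
```

The paper's ML, AML and Bayes procedures target this same quantity analytically from censored samples rather than by brute-force simulation.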

Journal ArticleDOI
TL;DR: In this paper, a mean-field spike and slab variational Bayes (VB) approximation to Bayesian model selection priors in sparse high-dimensional linear regression was proposed.
Abstract: We study a mean-field spike and slab variational Bayes (VB) approximation to Bayesian model selection priors in sparse high-dimensional linear regression. Under compatibility conditions on the desi...

Posted Content
TL;DR: In this article, the authors provide a workflow to test the strengths and limitations of Bayes factors as a way to quantify evidence in support of scientific hypotheses, and illustrate this workflow using an example from the cognitive sciences.
Abstract: Inferences about hypotheses are ubiquitous in the cognitive sciences. Bayes factors provide one general way to compare different hypotheses by their compatibility with the observed data. Those quantifications can then also be used to choose between hypotheses. While Bayes factors provide an immediate approach to hypothesis testing, they are highly sensitive to details of the data/model assumptions. Moreover, it is not clear how straightforwardly this approach can be implemented in practice, and in particular how sensitive it is to the details of the computational implementation. Here, we investigate these questions for Bayes factor analyses in the cognitive sciences. We explain the statistics underlying Bayes factors as a tool for Bayesian inferences and discuss that utility functions are needed for principled decisions on hypotheses. Next, we study how Bayes factors misbehave under different conditions. This includes a study of errors in the estimation of Bayes factors. Importantly, it is unknown whether Bayes factor estimates based on bridge sampling are unbiased for complex analyses. We are the first to use simulation-based calibration as a tool to test the accuracy of Bayes factor estimates. Moreover, we study how stable Bayes factors are against different MCMC draws. We further study how Bayes factors depend on variation in the data. We also look at variability of decisions based on Bayes factors and how to optimize decisions using a utility function. We outline a Bayes factor workflow that researchers can use to study whether Bayes factors are robust for their individual analysis, and we illustrate this workflow using an example from the cognitive sciences. We hope that this study will provide a workflow to test the strengths and limitations of Bayes factors as a way to quantify evidence in support of scientific hypotheses. Reproducible code is available from this https URL.

Journal ArticleDOI
TL;DR: This paper delivers a reliability-based method for assessing the elastic modulus (EM) of concrete in simply supported girders from dynamic identification, using the correlation between the natural frequencies of the first bending modes and the concrete EM.
Abstract: This paper delivers a reliability-based method for the assessment of the elastic modulus (EM) of concrete in simply supported girders from dynamic identification. The correlation between the natura...

Journal ArticleDOI
TL;DR: In this article, the problem of estimating the parameters, survival and hazard rate functions of the two-parameter Hjorth distribution under adaptive type-II progressive hybrid censoring scheme using maximum likelihood and Bayesian approaches is addressed.
Abstract: Adaptive Type-II progressive hybrid censoring scheme has been proposed to increase the efficiency of statistical analysis and save the total test time on a life-testing experiment. This article deals with the problem of estimating the parameters, survival and hazard rate functions of the two-parameter Hjorth distribution under adaptive Type-II progressive hybrid censoring scheme using maximum likelihood and Bayesian approaches. The two-sided approximate confidence intervals of the unknown quantities are constructed. Under the assumption of independent gamma priors, the Bayes estimators are obtained using squared error loss function. Since the Bayes estimators cannot be expressed in closed forms, Lindley’s approximation and Markov chain Monte Carlo methods are considered, and the highest posterior density credible intervals are also obtained. To study the behavior of the various estimators, a Monte Carlo simulation study is performed. The performances of the different estimators have been compared on the basis of their average root mean squared error and relative absolute bias. Finally, to show the applicability of the proposed estimators, a data set of industrial devices has been analyzed.

Journal ArticleDOI
TL;DR: Interactions among laboratory-tested water quality parameters, observations of tap water, and household characteristics, including plumbing type, source water, household location, and on-site water treatment, are explored to develop features for predicting water lead levels.

Posted ContentDOI
TL;DR: In this article, the authors compare three approaches for finding evidence for equivalence: the frequentist two one-sided tests procedure, the Bayesian highest density interval region of practical equivalence procedure, and the Bayes factor interval null procedure.
Abstract: Some important research questions require the ability to find evidence for two conditions being practically equivalent. This is impossible to accomplish within the traditional frequentist null hypothesis significance testing framework; hence, other methodologies must be utilized. We explain and illustrate three approaches for finding evidence for equivalence: The frequentist two one-sided tests procedure, the Bayesian highest density interval region of practical equivalence procedure, and the Bayes factor interval null procedure. We compare the classification performances of these three approaches for various plausible scenarios. The results indicate that the Bayes factor interval null approach compares favorably to the other two approaches in terms of statistical power. Critically, compared with the Bayes factor interval null procedure, the two one-sided tests and the highest density interval region of practical equivalence procedures have limited discrimination capabilities when the sample size is relatively small: Specifically, in order to be practically useful, these two methods generally require over 250 cases within each condition when rather large equivalence margins of approximately .2 or .3 are used; for smaller equivalence margins even more cases are required. Because of these results, we recommend that researchers rely more on the Bayes factor interval null approach for quantifying evidence for equivalence, especially for studies that are constrained on sample size. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
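For reference, the frequentist two one-sided tests (TOST) procedure mentioned above declares equivalence only if the observed difference is significantly above the lower margin and significantly below the upper margin; a minimal independent-samples sketch, assuming equal variances and using a hypothetical margin and synthetic data:

```python
import numpy as np
from scipy import stats

def tost_independent(x, y, margin):
    """Two one-sided tests for equivalence of two independent means
    within +/- margin (pooled-variance t tests for simplicity)."""
    nx, ny = len(x), len(y)
    diff = np.mean(x) - np.mean(y)
    sp2 = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    se = np.sqrt(sp2 * (1 / nx + 1 / ny))
    df = nx + ny - 2
    t_lower = (diff + margin) / se        # H0: diff <= -margin
    t_upper = (diff - margin) / se        # H0: diff >= +margin
    p_lower = stats.t.sf(t_lower, df)     # evidence that diff > -margin
    p_upper = stats.t.cdf(t_upper, df)    # evidence that diff < +margin
    return max(p_lower, p_upper)          # equivalence if this is below alpha

rng = np.random.default_rng(2)
a = rng.normal(0.0, 1.0, 300)
b = rng.normal(0.05, 1.0, 300)
print(f"TOST p-value (margin = 0.3): {tost_independent(a, b, margin=0.3):.4f}")
```

The sample size of 300 per group here is in line with the article's observation that, for margins around .2 or .3, these procedures need upwards of 250 cases per condition to discriminate reliably.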

Journal ArticleDOI
TL;DR: This article uses statistical theory and simulated data to show that EB estimates are biased toward zero, a phenomenon known as “shrinkage,” and illustrates these issues using an empirical data set on emotion regulation and neuroticism.
Abstract: Empirical Bayes (EB) estimates of the random effects in multilevel models represent how individuals deviate from the population averages and are often extracted to detect outliers or used as predictors in follow-up analysis. However, little research has examined whether EB estimates are indeed reliable and valid measures of individual traits. In this article, we use statistical theory and simulated data to show that EB estimates are biased toward zero, a phenomenon known as "shrinkage." The degree of shrinkage and reliability of EB estimates depend on a number of factors, including Level-1 residual variance, Level-1 predictor variance, Level-2 random effects variance, and number of within-person observations. As a result, EB estimates may not be ideal for detecting outliers, and they produce biased regression coefficients when used as predictors. We illustrate these issues using an empirical data set on emotion regulation and neuroticism.
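The shrinkage described here has a closed form in the simplest random-intercept case: with tau^2 the Level-2 random-effects variance, sigma^2 the Level-1 residual variance, n_j the number of observations for person j, and y-bar_j and gamma-hat the person and grand means, the EB estimate of the random effect is

```latex
\[
\hat{u}_j \;=\; \lambda_j\bigl(\bar{y}_j - \hat{\gamma}\bigr),
\qquad
\lambda_j \;=\; \frac{\tau^{2}}{\tau^{2} + \sigma^{2}/n_j} \;\in\; (0,1),
\]
```

so the estimate is pulled toward zero exactly as the factors listed in the article dictate: more shrinkage when the residual variance is large, the random-effects variance is small, or the number of within-person observations is small.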

Journal ArticleDOI
TL;DR: In this article, the authors provide a simple explanation of the interpretation and use of Bayes' rule in diagnosis, and show how both the prior probability and the measurement properties of diagnostic tests (sensitivity and specificity) are crucial determinants of the posterior probability of disease (predictive value).
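A worked example of the calculation the article explains, with purely illustrative numbers (sensitivity 0.90, specificity 0.95, prior probability of disease 1%):

```latex
\[
P(D^{+} \mid T^{+})
\;=\;
\frac{\mathrm{Se}\cdot P(D^{+})}
     {\mathrm{Se}\cdot P(D^{+}) + (1-\mathrm{Sp})\cdot\bigl(1-P(D^{+})\bigr)}
\;=\;
\frac{0.90 \times 0.01}{0.90 \times 0.01 + 0.05 \times 0.99}
\;\approx\; 0.15,
\]
```

so even a fairly accurate test gives only about a 15% post-test probability when the disease is rare, which is why the prior probability matters as much as sensitivity and specificity.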

MonographDOI
17 Jun 2021
TL;DR: The twenty-first century has seen a breathtaking expansion of statistical methodology, both in scope and influence, as statistical methods are brought to bear upon the enormous data sets of modern science and commerce.
Abstract: The twenty-first century has seen a breathtaking expansion of statistical methodology, both in scope and influence. 'Data science' and 'machine learning' have become familiar terms in the news, as statistical methods are brought to bear upon the enormous data sets of modern science and commerce. How did we get here? And where are we going? How does it all fit together? Now in paperback and fortified with exercises, this book delivers a concentrated course in modern statistical thinking. Beginning with classical inferential theories - Bayesian, frequentist, Fisherian - individual chapters take up a series of influential topics: survival analysis, logistic regression, empirical Bayes, the jackknife and bootstrap, random forests, neural networks, Markov Chain Monte Carlo, inference after model selection, and dozens more. The distinctly modern approach integrates methodology and algorithms with statistical inference. Each chapter ends with class-tested exercises, and the book concludes with speculation on the future direction of statistics and data science.

Journal ArticleDOI
TL;DR: Using generalized linear mixed models, it is demonstrated that reparametrized variational Bayes (RVB) provides improvements in both accuracy and convergence rate compared to state of the art Gaussian variational approximation methods.
Abstract: We propose using model reparametrization to improve variational Bayes inference for hierarchical models whose variables can be classified as global (shared across observations) or local (observation‐specific). Posterior dependence between local and global variables is minimized by applying an invertible affine transformation on the local variables. The functional form of this transformation is deduced by approximating the posterior distribution of each local variable conditional on the global variables by a Gaussian density via a second order Taylor expansion. Variational Bayes inference for the reparametrized model is then obtained using stochastic approximation. Our approach can be readily extended to large datasets via a divide and recombine strategy. Using generalized linear mixed models, we demonstrate that reparametrized variational Bayes (RVB) provides improvements in both accuracy and convergence rate compared to state of the art Gaussian variational approximation methods.

Journal ArticleDOI
30 May 2021
TL;DR: In this article, the estimating problems of the model parameters, reliability and hazard functions of an inverted Nadarajah-Haghighi distribution when sample is available from Type-II progressive censoring scheme have been considered.
Abstract: In this paper, the problems of estimating the model parameters, reliability and hazard functions of an inverted Nadarajah–Haghighi distribution when a sample is available from a Type-II progressive censoring scheme are considered. The maximum likelihood and maximum product of spacings estimators are obtained for any function of the model parameters. The normality property of the classical estimators is used to construct approximate confidence intervals for the unknown parameters and some related functions of them, such as the reliability characteristics. Using independent gamma informative priors, the Bayes estimators of the unknown parameters are derived under the squared error loss function. Since the Bayes estimators are obtained in a complex form, two approximation techniques, namely the Tierney–Kadane approximation method and the Metropolis–Hastings algorithm, are used to compute the Bayes estimates and to construct the associated highest posterior density credible intervals. To evaluate the performance of the proposed methods, a Monte Carlo simulation study is carried out. To suggest the optimum censoring scheme among different competing censoring plans, four optimality criteria are considered. One real-life data set is analyzed to illustrate the applicability of the proposed methods.

Journal ArticleDOI
TL;DR: Fuzzy set theory has embraced uncertainty modeling since membership functions were reinterpreted as possibility distributions, and human behavior is highly consistent with Bayesian probabilistic inference in the sensory, motor, and cognitive domains.
Abstract: Human interaction with the world is dominated by uncertainty. Probability theory is a valuable tool to face such uncertainty. According to the Bayesian definition, probabilities are personal beliefs. Experimental evidence supports the notion that human behavior is highly consistent with Bayesian probabilistic inference in both the sensory and motor and cognitive domain. All the higher-level psychophysical functions of our brain are believed to take the activities of interconnected and distributed networks of neurons in the neocortex as their physiological substrate. Neurons in the neocortex are organized in cortical columns that behave as fuzzy sets. Fuzzy sets theory has embraced uncertainty modeling when membership functions have been reinterpreted as possibility distributions. The terms of Bayes’ formula are conceivable as fuzzy sets and Bayes’ inference becomes a fuzzy inference. According to the QBism, quantum probabilities are also Bayesian. They are logical constructs rather than physical realities. It derives that the Born rule is nothing but a kind of Quantum Law of Total Probability. Wavefunctions and measurement operators are viewed epistemically. Both of them are similar to fuzzy sets. The new link that is established between fuzzy logic, neuroscience, and quantum mechanics through Bayesian probability could spark new ideas for the development of artificial intelligence and unconventional computing.

Journal ArticleDOI
TL;DR: The results reveal that visibility, location, and time of the day are the main environmental factors affecting the occurrence probability of human errors.

Journal ArticleDOI
TL;DR: The model was applied to data of lane change conflicts collected from 11 basic freeway segments in Guangdong Province, China and showed that Bayesian hierarchical extreme value models significantly outperform the at-site models in terms of crash estimation accuracy and precision.

Journal ArticleDOI
TL;DR: Simulations and applications to three cancer genomics studies and one Alzheimer metabolomics study show that, if the partitioning of the features is informative, classification performance and feature selection are indeed enhanced.
Abstract: In high-dimensional data settings, additional information on the features is often available. Examples of such external information in omics research are: (i) p-values from a previous study and (ii) omics annotation. The inclusion of this information in the analysis may enhance classification performance and feature selection but is not straightforward. We propose a group-regularized (logistic) elastic net regression method, where each penalty parameter corresponds to a group of features based on the external information. The method, termed gren, makes use of the Bayesian formulation of logistic elastic net regression to estimate both the model and penalty parameters in an approximate empirical-variational Bayes framework. Simulations and applications to three cancer genomics studies and one Alzheimer metabolomics study show that, if the partitioning of the features is informative, classification performance and feature selection are indeed enhanced.

Journal ArticleDOI
TL;DR: This paper develops a Bayesian approach that dynamically integrates imputation and estimation for line list data and can accurately estimate the epidemic curve and instantaneous reproduction numbers, even when most symptom onset dates are missing.
Abstract: Surveillance is critical to mounting an appropriate and effective response to pandemics. However, aggregated case report data suffers from reporting delays and can lead to misleading inferences. Unlike aggregated case report data, line list data is a table that contains individual features, such as the dates of symptom onset and reporting for each reported case, and is a good source for modeling delays. Current methods for modeling reporting delays are not particularly appropriate for line list data, which typically has missing symptom onset dates that are non-ignorable for modeling reporting delays. In this paper, we develop a Bayesian approach that dynamically integrates imputation and estimation for line list data. Specifically, this Bayesian approach can accurately estimate the epidemic curve and instantaneous reproduction numbers, even with most symptom onset dates missing. The Bayesian approach is also robust to deviations from model assumptions, such as changes in the reporting delay distribution or incorrect specification of the maximum reporting delay. We apply the Bayesian approach to COVID-19 line list data in Massachusetts and find that the reproduction number estimates correspond more closely to the control measures than the estimates based on the reported curve.
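For context, the instantaneous reproduction number referred to here is conventionally defined through the renewal equation, with the paper's contribution being to estimate it from imputed rather than reported incidence:

```latex
\[
R_t \;=\; \frac{I_t}{\sum_{s=1}^{S} w_s\, I_{t-s}},
\]
```

where I_t is the incidence of new cases (here reconstructed from imputed symptom onset dates) on day t and w_s is the serial-interval or generation-interval distribution truncated at S days.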

Journal ArticleDOI
TL;DR: This paper proposes a simple, straightforward and efficient approach called BHFS (Bagging Homogeneous Feature Selection), based upon ensemble data-perturbation feature selection methods, which makes better predictions than the standard NB.
Abstract: The success of an organization greatly depends upon its consumers and their relationship with the organization. Knowledge of consumer behavior and an excellent understanding of consumer expectations are important for developing strategic management decisions that improve business value. CRM is intensively applied in the analysis of consumer behavior patterns with the use of Machine Learning (ML) techniques. Naive Bayes (NB), one of the supervised ML classification models, is used to analyze customer behavior predictions. In some domains, the performance of NB degrades because of redundant, noisy and irrelevant attributes in the dataset, which violate the underlying assumption made by naive Bayes. Different enhancements have been suggested to relax the primary assumption of the NB classifier: the independence assumption between the attributes given the class label. In this research, we suggest a simple, straightforward and efficient approach called BHFS (Bagging Homogeneous Feature Selection), which is based upon ensemble data-perturbation feature selection methods. The BHFS method is applied to eliminate the correlated and irrelevant attributes in the dataset and to select a stable feature subset for improving the prediction performance of the NB model. An advantage of the BHFS method is that it requires less running time and selects the most relevant attributes for the evaluation of naive Bayes. The experimental outcomes demonstrate that the BHFS-naive Bayes model makes better predictions than the standard NB. The running time is also lower with BHFS-NB, since the naive Bayes model is constructed using only the features selected by BHFS.
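A hedged sketch of the general bagging-plus-feature-selection idea described above, not the authors' exact BHFS algorithm: select features on bootstrap resamples, keep those selected most often, and train naive Bayes on that stable subset. The scoring function, thresholds and synthetic data below are illustrative choices.

```python
import numpy as np
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.utils import resample

X, y = make_classification(n_samples=600, n_features=40, n_informative=8,
                           n_redundant=10, random_state=0)

n_bags, k_per_bag, vote_threshold = 20, 10, 0.5
votes = Counter()
for b in range(n_bags):
    Xb, yb = resample(X, y, random_state=b)                     # bootstrap resample
    selector = SelectKBest(mutual_info_classif, k=k_per_bag).fit(Xb, yb)
    votes.update(np.flatnonzero(selector.get_support()))

# Keep features selected in at least half of the bags ("stable" subset).
stable = [f for f, c in votes.items() if c / n_bags >= vote_threshold]
if not stable:                                                  # fallback if nothing is stable
    stable = [f for f, _ in votes.most_common(k_per_bag)]

nb_all = cross_val_score(GaussianNB(), X, y, cv=5).mean()
nb_stable = cross_val_score(GaussianNB(), X[:, stable], y, cv=5).mean()
print(f"NB on all features:    {nb_all:.3f}")
print(f"NB on stable features: {nb_stable:.3f}")
```

The design mirrors the abstract's motivation: pruning correlated and irrelevant attributes both reduces the violation of the independence assumption and shrinks the running time of the final naive Bayes fit.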