Author

Joseph Antonelli

Other affiliations: Harvard University
Bio: Joseph Antonelli is an academic researcher from the University of Florida. The author has contributed to research in topics: Estimator & Covariate. The author has an h-index of 8 and has co-authored 35 publications receiving 253 citations. Previous affiliations of Joseph Antonelli include Harvard University.

Papers
Journal Article
TL;DR: In a nationally representative sample of Medicare enrollees, changes in exposure to PM2.5, even at levels consistently below standards, are associated with increases in hospital admissions for all causes and cardiovascular and respiratory diseases.
Abstract: Background: In 2012, the EPA enacted more stringent National Ambient Air Quality Standards (NAAQS) for fine particulate matter (PM2.5). Few studies have characterized the health effects of air pollution levels lower than the most recent NAAQS for long-term exposure to PM2.5 (now 12 μg/m3). Methods: We...

76 citations

Journal Article
TL;DR: In this paper, the authors proposed matching on both the estimated propensity score and the estimated prognostic scores when the number of covariates is large relative to the total number of observations, and derived asymptotic results for the matching estimator.
Abstract: Valid estimation of treatment effects from observational data requires proper control of confounding. If the number of covariates is large relative to the number of observations, then controlling for all available covariates is infeasible. In cases where a sparsity condition holds, variable selection or penalization can reduce the dimension of the covariate space in a manner that allows for valid estimation of treatment effects. In this article, we propose matching on both the estimated propensity score and the estimated prognostic scores when the number of covariates is large relative to the number of observations. We derive asymptotic results for the matching estimator and show that it is doubly robust in the sense that only one of the two score models need be correct to obtain a consistent estimator. We show via simulation its effectiveness in controlling for confounding and highlight its potential to address nonlinear confounding. Finally, we apply the proposed procedure to analyze the effect of gender on prescription opioid use using insurance claims data.

49 citations

Journal Article
TL;DR: In this paper, a step-by-step framework for pursuing statistical analyses of cohort-based human metabolomics data, with a focus on feature selection, is presented. The authors discuss the range of options and approaches that may be employed at each stage of data management, analysis, and interpretation, and offer guidance on the analytical decisions that need to be considered over the course of implementing a data analysis workflow.
Abstract: High-throughput metabolomics investigations, when conducted in large human cohorts, represent a potentially powerful tool for elucidating the biochemical diversity underlying human health and disease. Large-scale metabolomics data sources, generated using either targeted or nontargeted platforms, are becoming more common. Appropriate statistical analysis of these complex high-dimensional data will be critical for extracting meaningful results from such large-scale human metabolomics studies. Therefore, we consider the statistical analytical approaches that have been employed in prior human metabolomics studies. Based on the lessons learned and collective experience to date in the field, we offer a step-by-step framework for pursuing statistical analyses of cohort-based human metabolomics data, with a focus on feature selection. We discuss the range of options and approaches that may be employed at each stage of data management, analysis, and interpretation and offer guidance on the analytical decisions that need to be considered over the course of implementing a data analysis workflow. Certain pervasive analytical challenges facing the field warrant ongoing focused research. Addressing these challenges, particularly those related to analyzing human metabolomics data, will allow for more standardization of as well as advances in how research in the field is practiced. In turn, such major analytical advances will lead to substantial improvements in the overall contributions of human metabolomics investigations.

46 citations

Journal Article
TL;DR: In this paper, the authors proposed continuous spike and slab priors on the regression coefficients corresponding to the potential confounders Xj to estimate the causal effects of persistent pesticide exposure on triglyceride levels.
Abstract: In observational studies, estimation of a causal effect of a treatment on an outcome relies on proper adjustment for confounding. If the number of the potential confounders (p) is larger than the number of observations (n), then direct control for all potential confounders is infeasible. Existing approaches for dimension reduction and penalization are generally aimed at predicting the outcome, and are less suited for estimation of causal effects. Under standard penalization approaches (e.g. Lasso), if a variable Xj is strongly associated with the treatment T but weakly with the outcome Y, the coefficient βj will be shrunk towards zero, thus leading to confounding bias. Under the assumption of a linear model for the outcome and sparsity, we propose continuous spike and slab priors on the regression coefficients βj corresponding to the potential confounders Xj. Specifically, we introduce a prior distribution that does not heavily shrink to zero the coefficients βj of the Xj that are strongly associated with T but weakly associated with Y. We compare our proposed approach to several state-of-the-art methods proposed in the literature. Our proposed approach has the following features: 1) it reduces confounding bias in high dimensional settings; 2) it shrinks towards zero coefficients of instrumental variables; and 3) it achieves good coverages even in small sample sizes. We apply our approach to the National Health and Nutrition Examination Survey (NHANES) data to estimate the causal effects of persistent pesticide exposure on triglyceride levels.

29 citations

Journal Article
TL;DR: The SSGL is extended to sparse generalized additive models (GAMs), thereby introducing the first nonparametric variant of the spike-and-slab lasso methodology and developing theory to uniquely characterize the global posterior mode under the SSGL and introducing a highly efficient block coordinate ascent algorithm for maximum a posteriori estimation.
Abstract: We introduce the spike-and-slab group lasso (SSGL) for Bayesian estimation and variable selection in linear regression with grouped variables. We further extend the SSGL to sparse generali...

25 citations


Cited by
Journal Article
TL;DR: In the entire Medicare population, there was significant evidence of adverse effects related to exposure to PM2.5 and ozone at concentrations below current national standards.
Abstract: Background: Studies have shown that long-term exposure to air pollution increases mortality. However, evidence is limited for air-pollution levels below the most recent National Ambient Air Quality Standards. Previous studies involved predominantly urban populations and did not have the statistical power to estimate the health effects in underrepresented groups. Methods: We constructed an open cohort of all Medicare beneficiaries (60,925,443 persons) in the continental United States from the years 2000 through 2012, with 460,310,521 person-years of follow-up. Annual averages of fine particulate matter (particles with a mass median aerodynamic diameter of less than 2.5 μm [PM2.5]) and ozone were estimated according to the ZIP Code of residence for each enrollee with the use of previously validated prediction models. We estimated the risk of death associated with exposure to increases of 10 μg per cubic meter for PM2.5 and 10 parts per billion (ppb) for ozone using a two-pollutant Cox proportional-hazards model...

985 citations

Journal Article
TL;DR: This paper introduces an open-source software package in the R programming language, the bkmr R package, and demonstrates methods for visualizing high-dimensional exposure-response functions, and for estimating scientifically relevant summaries of Bayesian kernel machine regression.
Abstract: Estimating the health effects of multi-pollutant mixtures is of increasing interest in environmental epidemiology. Recently, a new approach for estimating the health effects of mixtures, Bayesian kernel machine regression (BKMR), has been developed. This method estimates the multivariable exposure-response function in a flexible and parsimonious way, conducts variable selection on the (potentially high-dimensional) vector of exposures, and allows for a grouped variable selection approach that can accommodate highly correlated exposures. However, the application of this novel method has been limited by a lack of available software, the need to derive interpretable output in a computationally efficient manner, and the inability to apply the method to non-continuous outcome variables. This paper addresses these limitations by (i) introducing an open-source software package in the R programming language, the bkmr R package, (ii) demonstrating methods for visualizing high-dimensional exposure-response functions, and for estimating scientifically relevant summaries, (iii) illustrating a probit regression implementation of BKMR for binary outcomes, and (iv) describing a fast version of BKMR that utilizes a Gaussian predictive process approach. All of the methods are illustrated using fully reproducible examples with the provided R code. Applying the methods to a continuous outcome example illustrated the ability of the BKMR implementation to estimate the health effects of multi-pollutant mixtures in the context of a highly nonlinear, biologically-based dose-response function, and to estimate overall, single-exposure, and interactive health effects. The Gaussian predictive process method led to a substantial reduction in the runtime, without a major decrease in accuracy. 
In the setting of a larger number of exposures and a dichotomous outcome, the probit BKMR implementation was able to correctly identify the variables included in the exposure-response function and yielded interpretable quantities on the scale of a latent continuous outcome or on the scale of the outcome probability. This newly developed software, integrated suite of tools, and extended methodology makes BKMR accessible for use across a broad range of epidemiological applications in which multiple risk factors have complex effects on health.

407 citations

Journal Article
01 Nov 2019
TL;DR: This study adds to known causes of death associated with PM2.5 by identifying 3 new causes (death due to chronic kidney disease, hypertension, and dementia); racial and socioeconomic disparities in the burden were also evident.
Abstract: Importance: Ambient fine particulate matter (PM2.5) air pollution is associated with increased risk of several causes of death. However, epidemiologic evidence suggests that current knowledge does not comprehensively capture all causes of death associated with PM2.5 exposure. Objective: To systematically identify causes of death associated with PM2.5 pollution and estimate the burden of death for each cause in the United States. Design, Setting, and Participants: In a cohort study of US veterans followed up between 2006 and 2016, ensemble modeling was used to identify and characterize morphology of the association between PM2.5 and causes of death. Burden of death associated with PM2.5 exposure in the contiguous United States and for each state was then estimated by application of estimated risk functions to county-level PM2.5 estimates from the US Environmental Protection Agency and cause-specific death rate data from the Centers for Disease Control and Prevention. Main Outcomes and Measures: Nonlinear exposure-response functions of the association between PM2.5 and causes of death and burden of death associated with PM2.5. Exposures: Annual mean PM2.5 levels. Results: A cohort of 4 522 160 US veterans (4 243 462 [93.8%] male; median [interquartile range] age, 64.1 [55.7-75.5] years; 3 702 942 [82.0%] white, 667 550 [14.8%] black, and 145 593 [3.2%] other race) was followed up for a median (interquartile range) of 10.0 (6.8-10.2) years.
In the contiguous United States, PM2.5 exposure was associated with excess burden of death due to cardiovascular disease (56 070.1 deaths [95% uncertainty interval {UI}, 51 940.2-60 318.3 deaths]), cerebrovascular disease (40 466.1 deaths [95% UI, 21 770.1-46 487.9 deaths]), chronic kidney disease (7175.2 deaths [95% UI, 5910.2-8371.9 deaths]), chronic obstructive pulmonary disease (645.7 deaths [95% UI, 300.2-2490.9 deaths]), dementia (19 851.5 deaths [95% UI, 14 420.6-31 621.4 deaths]), type 2 diabetes (501.3 deaths [95% UI, 447.5-561.1 deaths]), hypertension (30 696.9 deaths [95% UI, 27 518.1-33 881.9 deaths]), lung cancer (17 545.3 deaths [95% UI, 15 055.3-20 464.5 deaths]), and pneumonia (8854.9 deaths [95% UI, 7696.2-10 710.6 deaths]). Burden exhibited substantial geographic variation. Estimated burden of death due to nonaccidental causes was 197 905.1 deaths (95% UI, 183 463.3-213 644.9 deaths); mean age-standardized death rates (per 100 000) due to nonaccidental causes were higher among black individuals (55.2 [95% UI, 50.5-60.6]) than nonblack individuals (51.0 [95% UI, 46.4-56.1]) and higher among those living in counties with high (65.3 [95% UI, 56.2-75.4]) vs low (46.1 [95% UI, 42.3-50.4]) socioeconomic deprivation; 99.0% of the burden of death due to nonaccidental causes was associated with PM2.5 levels below standards set by the US Environmental Protection Agency. Conclusions and Relevance: In this study, 9 causes of death were associated with PM2.5 exposure. The burden of death associated with PM2.5 was disproportionally borne by black individuals and socioeconomically disadvantaged communities. Effort toward cleaner air might reduce the burden of PM2.5-associated deaths.

192 citations

Journal Article
TL;DR: In this article, the authors explore different integrative machine learning methods which have been used to provide an in-depth understanding of biological systems during normal physiological functioning and in the presence of a disease.

164 citations

Journal Article
TL;DR: Different racial/ethnic populations and income groups are found to have been exposed to different levels of air pollution in the USA during the years 2000 to 2016, and it is suggested that more-targeted PM2.5 reductions are necessary to provide all people with a similar degree of protection from environmental hazards.

142 citations