A Guide for Population-based Analysis of the Adolescent Brain Cognitive Development (ABCD) Study Baseline Data

10 Feb 2020 - bioRxiv (Cold Spring Harbor Laboratory)
TL;DR: This guide presents an empirical investigation that compares the statistical efficiency of multi-level modeling and distribution-free design-based approaches (both weighted and unweighted) for analyses of the ABCD baseline data.

Summary

1. Introduction

  • The Adolescent Brain Cognitive Development Study (ABCD) is a prospective cohort study of a baseline sample of U.S. children born during the period 2006-2008.
  • Eligible children, ages 9-10, were recruited from the household populations in defined catchment areas for each of 21 study sites during the roughly two-year period beginning September 2016 and ending in October of 2018.
  • This methodological paper describes alternative approaches to analysis of the rich array of social, behavioral, environmental, genetic and summary-level neuroimaging data that is collected in the ABCD study.
  • Features of the ABCD design and data that are statistically "informative" and complicate population estimation and inference are the subject of Section 3.

2. Population orientation to ABCD analysis

  • As described in Garavan et al. (2018), within each of the 21 ABCD study sites, a probability sample of the public and private schools was selected as the basis for the recruitment of the majority of eligible children to the ABCD baseline cohort.
  • The process of obtaining school cooperation and then parental consent could selectively impact the final characteristics of the sample that was actually observed.
  • The following sections will describe two approaches, propensity-based weighting and use of appropriate covariate controls in modeling, that aim to address potential selectivity that may have entered the ABCD cohort through the site selection or school/parental consent gateways to actual study participation.

5. Properties of the ABCD Baseline Sample Cohort in Comparison to ACS

  • The unweighted distribution of reported annual family incomes for the ABCD baseline cohort differs from the nationally representative ACS estimates for the U.S. population of 9 and 10 year olds.
  • In nominal dollars, the family incomes of the ABCD children are higher on average than ACS estimates for the comparable U.S.
  • The ABCD Passive Data Work Group is currently in the process of acquiring external data on school, community and environmental characteristics that can be linked to individual child data and used to analyze the role that these contextual effects may have on the current status and development trajectories of the children in the ABCD baseline cohort.
  • At this stage, the propensity-based population weighting methodology described in the next section does not incorporate calibration based on detailed characteristics of children's residences, schools or communities.

6. Weighting the ABCD Sample to ACS Population Controls

  • Following the step of trimming the extremes of the weight distribution, the R Rake iterative proportional fitting algorithm was used to "rake" the trimmed initial weights to exact ACS population counts for the marginal categories of age (9, 10), sex (female, male), and race/ethnicity (Hispanic, Black, White, Asian and all Other persons); see Table 3.
  • Figure 1 is a histogram display of the frequency distribution of the final population weights for the ABCD baseline children.
  • Figure 2 provides a boxplot comparison of the distribution of weights separately for boys and girls.
  • The Figure 3 boxplots of weights by family income category show a very different pattern.
  • Compared to the national population, children from families with lower incomes are underrepresented and the population weights for children in these lower income categories have higher average values and a greater variance than the weights for the children from higher income families.
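
The raking step can be illustrated with a minimal Python sketch of iterative proportional fitting. It is not the R routine used to build the ABCD weights: the function name `rake`, the simulated sample and the target totals are all hypothetical and merely stand in for the trimmed initial weights and the ACS control totals. The sketch assumes every category of every margin is observed in the sample.

```python
import numpy as np

def rake(weights, groups, targets, max_iter=100, tol=1e-8):
    """Iterative proportional fitting of weights to marginal totals.

    weights: initial (e.g. trimmed) weights, shape (n,)
    groups:  dict of margin name -> integer category codes, shape (n,)
    targets: dict of margin name -> target population total per category
    """
    w = np.asarray(weights, dtype=float).copy()
    for _ in range(max_iter):
        max_gap = 0.0
        for name, codes in groups.items():
            target = np.asarray(targets[name], dtype=float)
            # Current weighted total in each category of this margin.
            current = np.bincount(codes, weights=w, minlength=len(target))
            max_gap = max(max_gap, np.max(np.abs(current / target - 1.0)))
            # Scale the weights so this margin matches its targets exactly.
            w *= (target / current)[codes]
        if max_gap < tol:
            break
    return w

# Hypothetical toy sample and control totals (NOT the ACS counts).
rng = np.random.default_rng(0)
n = 1000
sex = rng.integers(0, 2, n)   # 0 = female, 1 = male
age = rng.integers(0, 2, n)   # 0 = age 9, 1 = age 10
targets = {"sex": [4000.0, 4200.0], "age": [4100.0, 4100.0]}
raked = rake(np.ones(n), {"sex": sex, "age": age}, targets)
```

After convergence the weighted totals for each margin match the control totals, while the joint distribution within cells is inherited from the sample.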

7. Comparison of Analysis Methods

  • The comparative results for these two regression models suggest that when the special ABCD twin sample data are pooled with the general population sample and an LMM approach is used, it is important to apply the three-level DEAP model that includes a level-two contribution for clustering within family unit.
  • When a two-level model is applied to these pooled data and family-level clustering is ignored, the results of these example analyses suggest the parameter estimates will be attenuated and estimated standard errors will be seriously overestimated.
  • If the two-level LMM is fitted using only data for the general population sample (excluding the special twin sample cases), the resulting parameter estimates and standard errors are more consistent with those for the three-level model.
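
For orientation, a three-level random-intercept model of the kind contrasted above can be written as follows. The notation is an assumption made for illustration (child i in family j in site k); the exact DEAP specification is not reproduced in this summary.

```latex
y_{ijk} = \mathbf{x}_{ijk}^{\top}\boldsymbol{\beta} + u_{k} + v_{jk} + \varepsilon_{ijk},
\qquad
u_{k} \sim N(0, \sigma_{u}^{2}),\quad
v_{jk} \sim N(0, \sigma_{v}^{2}),\quad
\varepsilon_{ijk} \sim N(0, \sigma_{\varepsilon}^{2})
```

Here $u_{k}$ is the site (level three) effect and $v_{jk}$ is the family (level two) effect. The two-level model corresponds to dropping $v_{jk}$, so that any within-family correlation is absorbed into the residual term.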

7.B.3 Three-level LMM vs. Design-based Population weighted LS and Robust SEs

  • As noted above, the unweighted LMM and weighted design-based approaches compared in Table 7 aim to capture/model the complex variance structure of clustering and non-independence of the baseline observations for the ABCD child cohort.
  • The design-based estimation approaches employ the population weights described in Section 6 above and use a weighted least squares (WLS) methodology to estimate the population regression parameters.
  • Unlike the LMM approach, the components of variance associated with each level of clustering are estimated as a single weighted aggregate for the residual variance and not as individual components of variance attributable to each level of the clustering.
  • Section 7.C (Generalized Linear Model) repeats the comparison of model fitting methods for Poisson regression.
  • Here again, as in the previous comparisons based on the linear regression model, the three-level DEAP LMM and the design-based estimation for the pooled data show minor differences in the estimated relative risks and confidence intervals but the magnitude of these differences would not be judged to be substantively important.
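
A minimal sketch of the design-based idea, assuming a single stratum and sites as the clusters: weighted least squares point estimates combined with a cluster-robust (sandwich) covariance in which the residual variation is aggregated by cluster rather than split into components. The function name `wls_cluster_robust` and the simulated data are hypothetical, and this is not the software used for Table 7.

```python
import numpy as np

def wls_cluster_robust(X, y, w, cluster):
    """Weighted least squares with a cluster-robust (sandwich) covariance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    bread = np.linalg.inv(X.T @ (w[:, None] * X))
    beta = bread @ (X.T @ (w * y))
    resid = y - X @ beta
    # Weighted score contribution of each observation, shape (n, p).
    scores = X * (w * resid)[:, None]
    _, idx = np.unique(np.asarray(cluster), return_inverse=True)
    n_clusters = int(idx.max()) + 1
    # Sum the scores within each cluster.
    totals = np.zeros((n_clusters, X.shape[1]))
    np.add.at(totals, idx, scores)
    meat = n_clusters / (n_clusters - 1) * (totals.T @ totals)
    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))

# Simulated data (hypothetical): 21 sites with a shared site-level effect.
rng = np.random.default_rng(1)
n_sites, per_site = 21, 50
site = np.repeat(np.arange(n_sites), per_site)
x = rng.normal(size=site.size)
site_effect = rng.normal(scale=0.5, size=n_sites)[site]
y = 1.0 + 2.0 * x + site_effect + rng.normal(size=site.size)
w = rng.uniform(0.5, 3.0, size=site.size)
X = np.column_stack([np.ones_like(x), x])
beta, se = wls_cluster_robust(X, y, w, site)
```

Because the model includes an intercept, the cluster score totals sum to zero, so the uncentered cross-product used for the "meat" equals the centered between-cluster variation.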

8. Summary: Recommendations for research analysts

  • Researchers are encouraged to consider each of the informative features of the ABCD design (clustering, sample selectivity, twin sample pooling) as they may apply to their analytic aims.
  • Sensitivity analyses such as those underlying the comparisons in Section 7 should provide good insight into the degree to which results for descriptive estimates or fitted models are influenced by clustering, weighting and twin sample pooling.

A Guide for Population-based Analysis of the Adolescent Brain Cognitive
Development (ABCD) Study Baseline Data
Steven G. Heeringa and Patricia A. Berglund.
Institute for Social Research, University of Michigan
June, 2019
Abstract: ABCD is a longitudinal, observational study of U.S. children, ages 9-10 at baseline,
recruited at random from the household populations in defined catchment areas for each of 21
study sites. The 21 geographic locations that comprise the ABCD research sites are nationally
distributed and generally represent the range of demographic and socio-economic diversity of the
U.S. birth cohorts that comprise the ABCD study population. The clustering of participants and
the potential for selection bias in study site selection and enrollment are features of the ABCD
observational study design that are informative for statistical estimation and inference. Both
multi-level modeling and robust survey design-based methods can be used to account for
clustering of sampled ABCD children in the 21 study sites. Covariate controls in analytical
models and propensity weighting methods that calibrate ABCD weighted distributions to
nationally-representative controls from the American Community Survey (ACS) can be
employed in analysis to account for known informative sample design features or to attenuate
potential demographic and socio-economic selection bias in the national sampling and
recruitment of eligible children. This guide will present results of an empirical investigation of
the ABCD baseline data that compares the statistical efficiency of multi-level modeling and
distribution-free design-based approaches (both weighted and unweighted) to analyses of the
ABCD baseline data. Specific recommendations will be provided for researchers on robust,
efficient approaches to both descriptive and multivariate analyses of the ABCD baseline data.
1. Introduction
The Adolescent Brain Cognitive Development Study (ABCD) is a prospective cohort study of a
baseline sample of U.S. children born during the period 2006-2008. Eligible children, ages 9-10,
were recruited from the household populations in defined catchment areas for each of 21
study sites during the roughly two-year period beginning September 2016 and ending in October
of 2018. Within study sites, consenting parents and assenting children were primarily recruited
through a probability sample of public and private schools augmented to a small extent by
special recruitment through summer camp programs and community volunteers. Approximately
9500 eligible, single-born children and 1600 eligible twins completed the ABCD baseline
imaging studies and assessments. The sample design and procedures employed in the
recruitment of the baseline sample are described in detail in Garavan et al. (2018).
bioRxiv preprint, doi: https://doi.org/10.1101/2020.02.10.942011; this version posted February 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
This methodological paper describes alternative approaches to analysis of the rich array of social,
behavioral, environmental, genetic and summary-level neuroimaging data that is collected in the
ABCD study. Section 2 will attempt to frame a response to the broad question, why should
ABCD analysts be concerned about estimation and inference for the population of U.S.
children—does external validity matter? Features of the ABCD design and data that are
statistically “informative” and complicate population estimation and inference are the subject of
Section 3. Section 4 will attempt to address the specific methodological question, “If inference
to the U.S. population is important, what are the appropriate choices of methods for estimating
population characteristics and relationships based on the ABCD data?”, describing both model-
based and design-based approaches to ABCD estimation and inference. A summary of the
general demographic and socio-economic characteristics for the ABCD baseline cohort before
any weighting adjustments are applied is presented in Section 5. Section 6 describes the
propensity-based weighting adjustment methodology that is used to calibrate the baseline sample
cohort to key demographic and socio-economic distributions for U.S. children ages 9 and 10
estimated from the American Community Survey (ACS). Section 7 presents results of an
empirical investigation of the ABCD baseline data that compares the statistical efficiency of
multi-level modeling and distribution-free design-based approaches (both weighted and
unweighted) to analyses of the ABCD baseline data. The paper concludes with specific
recommendations for researchers on approaches to both descriptive and multivariate analyses of
the ABCD baseline data. Appendices to this paper will contain illustrations of recommended
command syntax for analysis of the ABCD data using the major software packages.
2. Population orientation to ABCD analysis
As defined in Garavan et al. (2018), the label “population neuroscience” when applied to
observational studies such as ABCD refers to the application of epidemiological research
practices including large-scale representative samples to assessments of target populations. It is
a study in neuroscience in that it focuses on brain and neurological system development,
morphology and function. It is a population study in that observational data are gathered in
such a way that they can be used to understand real population distributions and the biological,
familial, social and environmental factors that can govern how individuals actually live and
grow in today’s society.
From the outset, ABCD’s primary sponsor, the National Institute on Drug Abuse (NIDA), and
the ABCD scientific investigators were motivated to develop a baseline sample that reflected the
sociodemographic variation present in the U.S. population of 9 and 10 year-old children. ABCD
is an observational study sharing many aspects of its longitudinal design with existing
population-based survey programs such as the National Longitudinal Study of Adolescent to
Adult Health (Add Health, https://www.cpc.unc.edu/projects/addhealth), the Early Childhood
Longitudinal Surveys (ECLS, https://nces.ed.gov/ecls/) or the Child Development Supplement
(CDS, http://src.isr.umich.edu/src/childdevelopment/home.html) to the Panel Study of Income
Dynamics (PSID).
Population representativeness or, more correctly, the absence of uncorrected selective or informative
bias in the subject pool, is important in achieving external validity—the ability to generalize
specific results of the study to the world at large. However, even with good, representative
samples of populations, failure to measure or control key factors or to recognize important
moderating and/or mediating relationships can impact external validity of study findings. The
ABCD data are observational and although propensity-based methods may be used to control for
characteristics of “treated” and “control” participants, in the strictest sense insights gained from
the data—even in longitudinal studies such as ABCD—will be associative.
The ABCD baseline recruitment effort worked very hard to maintain a nationally distributed set
of controls on the age, sex and race/ethnicity of the children in the study. In year 2, additional
monitoring and targeted recruitment were put in place to raise the proportion of children from
lower income families. The predominantly probability sampling methodology for recruiting
children within each study site was intended to randomize over confounding factors that were not
explicitly controlled (or subsequently reflected in the propensity weighting). Nevertheless,
school consent and parental consent were strong forces that certainly may have altered the
effectiveness of the randomization over these uncontrolled confounders.
The purpose of covariate adjustments in models or the propensity weighting described below in
Section 6 is in fact to control specific sources of selection bias and restore unbiasedness to
descriptive and analytical estimates of the population characteristics and relationships. For many
measures of substantive interest, the success of this effort will never be fully known except in
rare cases where comparative national benchmarks exist (e.g. children's height) from
administrative records or very large surveys or population censuses. The effectiveness of
weighting adjustments to eliminate bias in population estimates depends of course on the
relationship of the substantive variable of interest (e.g. amygdala volume) to the variables that
were explicitly used to derive the propensity weights, namely age, sex, race/ethnicity, family
type, parental employment status, family size and Census region. These are the types of
variables that are available and are identically measured in a national source (American
Community Survey) and ABCD. It would have been ideal to have detailed population level data
on many other characteristics that may be highly correlated with the ABCD variable of interest
(e.g. the child's parents' amygdala volume when mom and dad were age 9 or 10). Only rarely and
in large two-phase studies will we ever have population level statistical controls of this nature for
a small group such as 9 and 10 year olds.
"Representative" is a strong adjective to apply to any data set. The accuracy of the descriptor
will vary by variable, by subpopulation and by the extent to which the weighting methodology or
model covariates capture factors that truly affect the outcome of interest (both in terms of the
variables and their functional relationship to the outcome). All forms of statistical estimation
and inference make assumptions. No study gets an uncontestable stamp of approval on the
unbiasedness of their survey estimates. In both approaches—propensity weighting or covariate
adjustment in modeling—it is easy to overlook a selective factor that influences the outcome or
modifies the effect of other variables. That is an inherent challenge in population inference from
a national study such as ABCD. The position that we take here is that multilevel models that
include appropriate statistical controls for demographic and socio-economic factors or propensity
weighted estimates of descriptive statistics from the ABCD baseline are in fact publishable
estimates for the population of U.S. children so long as authors acknowledge the design and
accurately describe the underlying methodology and its assumptions.
3. Properties of the ABCD design and data to consider in analysis
This section describes three features of the ABCD design that must be considered in any analysis
of the baseline data.
Clustering and non-independence of observations: Cohort recruitment for the ABCD study
design was distinguished by the constraint that eligible children must live within reasonable
travel distance (e.g. 50 miles) of a major medical center or research facility where MRI and
fMRI imaging could be performed. The geographically-clustered observations on individual
children are not independent and the intraclass (“intra-site”) correlations for the many variables
must be accounted for to correctly estimate variances of descriptive estimates and analytical
model parameters. Correlations among the ABCD observations for individual children are also
introduced by other sources of clustering in the ABCD recruitment and measurement protocols:
selection of multiple students from schools, multiple children (including twins) recruited from
the same family, multiple children imaged on the same MRI scanner.
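To give a sense of why the intraclass correlation matters, the following sketch applies the standard Kish approximation for the design effect of a clustered sample, deff = 1 + (m - 1) * rho, where m is the average cluster size. It assumes roughly equal-sized clusters and treats the 21 sites as the only level of clustering; the value of rho is hypothetical and is not an estimate from the ABCD data.

```python
def design_effect(avg_cluster_size, icc):
    """Kish approximation: variance inflation caused by clustering."""
    return 1.0 + (avg_cluster_size - 1.0) * icc

n_children = 11874   # baseline cohort size reported in this guide
n_sites = 21         # number of ABCD study sites
icc = 0.01           # hypothetical intra-site correlation, for illustration only

deff = design_effect(n_children / n_sites, icc)
n_effective = n_children / deff   # approximate effective sample size
print(f"design effect = {deff:.2f}, effective n = {n_effective:.0f}")
```

Even a small intra-site correlation produces a large design effect here because the average cluster is large, which is why variance estimates that ignore clustering can be badly misstated.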
Selection bias in site choice and within-site subject enrollment: While the 21 geographic
locations that comprise the ABCD research sites are nationally distributed and generally
represent the range of demographic and socio-economic diversity of the U.S. birth cohorts that
comprise the ABCD study population, in the strict sense they do not constitute the primary
stage of a multi-stage probability sample such as those employed in major population-based
epidemiological surveys. To achieve population representativeness for statistical analyses, a
mechanism (e.g. modeling site characteristics, assuming pseudo-randomization) is needed to
calibrate the broader geographic, demographic and socio-economic characteristics of the set of
21 sites to the larger U.S. population framework (Olsen et al., 2013).
As described in Garavan et al. (2018), within each of the 21 ABCD study sites, a probability
sample of the public and private schools was selected as the basis for the recruitment of the
majority of eligible children to the ABCD baseline cohort. Although this school-based
recruitment approach within each site introduced randomization to the sample of students who
could be recruited to ABCD, the process of obtaining school cooperation and then parental
consent could selectively impact the final characteristics of the sample that was actually
observed. The following sections will describe two approaches, propensity-based weighting and
use of appropriate covariate controls in modeling, that aim to address potential selectivity that
may have entered the ABCD cohort through the site selection or school/parental consent gateways
to actual study participation.
Special twin supplement: A final feature of the ABCD design that deserves attention in the
analysis of the baseline cohort data is the special oversample of twin pairs in four of the 21
ABCD sites. Although twins were eligible to be recruited in all sites that used the school-based
recruitment sampling methodology, in the four special twin sites supplemental samples of 150-
250 twin pairs per site were enrolled in ABCD using samples selected from state registries
(Garavan et al., 2018). These special samples of twin pairs can be distinguished in the final
baseline cohort of n=11,874 children; however, the study has chosen not to explicitly segregate
these twin data from the general population sample of single births and incidental twins recruited
through the school-based sampling protocol.
By a default decision of the study team, the propensity-based population weighting methodology
described in Section 6 and incorporated in the ABCD Data Exploration and Analysis Portal
(DEAP) descriptive estimation does assume a pooled analysis of the general and special twin
samples. Section 7 will apply multiple analytic approaches to investigate this assumption that
the special twin samples are in fact “exchangeable” with the ABCD general population sample.
4. Design-based and model-based approaches to ABCD analysis
Analysts may choose several approaches to estimation and inference that address the challenges
posed by the clustering, selection bias and special twin sample properties of the ABCD data.
The first approach is to assume that the multi-stage sample selection for ABCD follows a quasi-
probability design and employ design-based methodology similar to that typically used to
analyze large probability sample epidemiological surveys such as the U.S. National Health and
Nutrition Examination Survey (NHANES). Design-based analysis will employ population
weighting to estimate population statistics and model parameters and non-parametric methods
(Taylor Series Linearization, Jackknife, and Bootstrap) to compute robust estimates of standard
errors. Any quasi-probability approach to analysis of the ABCD data requires a minimum of two
things: 1) assignment of cases to ultimate cluster (UC) groupings to account for
non-independence of observations; and 2) modeling to derive case-specific analysis weights that
account for differential selection factors and permit the observed sample to be “mapped” to the
U.S. population of interest (Heeringa et al., 2017).
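The two ingredients listed above (ultimate cluster assignments and case-specific weights) are all that a Taylor Series Linearization estimate needs. The sketch below computes a weighted mean and its linearized standard error, assuming a single stratum and ignoring finite population corrections; the function name `weighted_mean_tsl` and the simulated data are hypothetical and are not taken from the paper's appendices.

```python
import numpy as np

def weighted_mean_tsl(y, w, cluster):
    """Weighted mean with a Taylor-linearized, ultimate-cluster standard error."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    total_w = w.sum()
    mean = np.sum(w * y) / total_w
    # Linearized contribution of each case to the ratio estimator.
    z = w * (y - mean) / total_w
    _, idx = np.unique(np.asarray(cluster), return_inverse=True)
    n_clusters = int(idx.max()) + 1
    # Sum the contributions within each ultimate cluster.
    z_cluster = np.bincount(idx, weights=z, minlength=n_clusters)
    var = n_clusters / (n_clusters - 1) * np.sum((z_cluster - z_cluster.mean()) ** 2)
    return mean, np.sqrt(var)

# Simulated example (hypothetical): 21 clusters with a shared cluster effect.
rng = np.random.default_rng(2)
cluster = np.repeat(np.arange(21), 40)
y = rng.normal(size=21)[cluster] + rng.normal(size=cluster.size)
w = rng.uniform(0.5, 3.0, size=cluster.size)
mean_hat, se_hat = weighted_mean_tsl(y, w, cluster)
```

When every case is its own cluster and the weights are equal, the result reduces to the familiar simple random sampling variance, the sample variance divided by n, which is a convenient sanity check.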
References
More filters
Journal ArticleDOI
TL;DR: A description of the assumed context and objectives of multiple imputation is provided, and a review of the multiple imputations framework and its standard results are reviewed.
Abstract: Multiple imputation was designed to handle the problem of missing data in public-use data bases where the data-base constructor and the ultimate user are distinct entities. The objective is valid frequency inference for ultimate users who in general have access only to complete-data software and possess limited knowledge of specific reasons and models for nonresponse. For this situation and objective, I believe that multiple imputation by the data-base constructor is the method of choice. This article first provides a description of the assumed context and objectives, and second, reviews the multiple imputation framework and its standard results. These preliminary discussions are especially important because some recent commentaries on multiple imputation have reflected either misunderstandings of the practical objectives of multiple imputation or misunderstandings of fundamental theoretical results. Then, criticisms of multiple imputation are considered, and, finally, comparisons are made to alt...

3,495 citations


"A Guide for Population-based Analys..." refers background in this paper

  • ...…of sample inclusion or differential nonresponse, some authors have suggested that the weight variable itself be included as a covariate in the model specification as further protection against sample selectivity that may not be captured in the observed covariate controls (Rubin, 1996)....


Journal ArticleDOI
TL;DR: A suite of quantitative and qualitative methods is described for assessing whether measured baseline covariates are balanced between treatment groups in the weighted sample, contributing towards an evolving concept of ‘best practice’ when using IPTW to estimate causal treatment effects from observational data.
Abstract: The propensity score is defined as a subject's probability of treatment selection, conditional on observed baseline covariates. Weighting subjects by the inverse probability of treatment received creates a synthetic sample in which treatment assignment is independent of measured baseline covariates. Inverse probability of treatment weighting (IPTW) using the propensity score allows one to obtain unbiased estimates of average treatment effects. However, these estimates are only valid if there are no residual systematic differences in observed baseline characteristics between treated and control subjects in the sample weighted by the estimated inverse probability of treatment. We report on a systematic literature review, in which we found that the use of IPTW has increased rapidly in recent years, but that in the most recent year, a majority of studies did not formally examine whether weighting balanced measured covariates between treatment groups. We then proceed to describe a suite of quantitative and qualitative methods that allow one to assess whether measured baseline covariates are balanced between treatment groups in the weighted sample. The quantitative methods use the weighted standardized difference to compare means, prevalences, higher-order moments, and interactions. The qualitative methods employ graphical methods to compare the distribution of continuous baseline covariates between treated and control subjects in the weighted sample. Finally, we illustrate the application of these methods in an empirical case study. We propose a formal set of balance diagnostics that contribute towards an evolving concept of 'best practice' when using IPTW to estimate causal treatment effects using observational data.

2,602 citations
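
The weighted standardized difference mentioned in the abstract above can be sketched as follows. This is a minimal illustration on simulated data, not the authors' code: the covariate, the propensity model, and the helper names `weighted_mean_var` and `weighted_std_diff` are all assumptions made for the example, and the true propensity score is used in place of an estimated one to keep the sketch short.

```python
import numpy as np


def weighted_mean_var(x, w):
    """Weighted mean and weighted sample variance of x under weights w."""
    mean = np.sum(w * x) / np.sum(w)
    # Weighted sample variance with the usual correction for unequal weights.
    scale = np.sum(w) / (np.sum(w) ** 2 - np.sum(w ** 2))
    var = scale * np.sum(w * (x - mean) ** 2)
    return mean, var


def weighted_std_diff(x, treated, w):
    """Standardized difference in weighted means between the two groups."""
    m1, v1 = weighted_mean_var(x[treated], w[treated])
    m0, v0 = weighted_mean_var(x[~treated], w[~treated])
    return (m1 - m0) / np.sqrt((v1 + v0) / 2.0)


# Simulated example (hypothetical data): one covariate drives treatment.
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
propensity = 1.0 / (1.0 + np.exp(-0.5 * x))  # true P(treated | x)
treated = rng.random(n) < propensity

# Inverse probability of treatment weights: 1/e for treated, 1/(1-e) otherwise.
iptw = np.where(treated, 1.0 / propensity, 1.0 / (1.0 - propensity))

smd_unweighted = weighted_std_diff(x, treated, np.ones(n))
smd_weighted = weighted_std_diff(x, treated, iptw)
```

In this simulation the unweighted difference reflects the imbalance built into the treatment assignment, while the weighted difference should shrink towards zero. In practice the propensity score would be estimated, and balance would be checked for every measured covariate.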


"A Guide for Population-based Analys..." refers methods in this paper

  • ...…population estimation and inference, ABCD has used a second weighting method that is closely related to the inverse propensity score weighting methodology (IPSW) that is employed to reduce confounding and estimate average treatment effects (ATE) from observational data (Austin and Stuart, 2015)....


  • ...…attention paid to weighted estimation of multi-level models for observational data both in the context of inverse probability weighting (IPW) for exposure probability in estimates of treatment effects (Austin and Stuart, 2015) or in analysis of multi-level data (Pfeffermann, et al., 1998)....


Book
22 Nov 2006
TL;DR: The Implied Marginal Variance-Covariance Matrix for the Final Model Diagnostics for the Final Model Software Notes and Recommendations Other Analytic Approaches Recommendations.
Abstract (table of contents):
  • INTRODUCTION: What Are Linear Mixed Models (LMMs)?; A Brief History of Linear Mixed Models
  • LINEAR MIXED MODELS: AN OVERVIEW: Introduction; Specification of LMMs; The Marginal Linear Model; Estimation in LMMs; Computational Issues; Tools for Model Selection; Model-Building Strategies; Checking Model Assumptions (Diagnostics); Other Aspects of LMMs; Power Analysis for Linear Mixed Models; Chapter Summary
  • TWO-LEVEL MODELS FOR CLUSTERED DATA: THE RAT PUP EXAMPLE: Introduction; The Rat Pup Study; Overview of the Rat Pup Data Analysis; Analysis Steps in the Software Procedures; Results of Hypothesis Tests; Comparing Results across the Software Procedures; Interpreting Parameter Estimates in the Final Model; Estimating the Intraclass Correlation Coefficients (ICCs); Calculating Predicted Values; Diagnostics for the Final Model; Software Notes and Recommendations
  • THREE-LEVEL MODELS FOR CLUSTERED DATA: THE CLASSROOM EXAMPLE: Introduction; The Classroom Study; Overview of the Classroom Data Analysis; Analysis Steps in the Software Procedures; Results of Hypothesis Tests; Comparing Results across the Software Procedures; Interpreting Parameter Estimates in the Final Model; Estimating the Intraclass Correlation Coefficients (ICCs); Calculating Predicted Values; Diagnostics for the Final Model; Software Notes; Recommendations
  • MODELS FOR REPEATED-MEASURES DATA: THE RAT BRAIN EXAMPLE: Introduction; The Rat Brain Study; Overview of the Rat Brain Data Analysis; Analysis Steps in the Software Procedures; Results of Hypothesis Tests; Comparing Results across the Software Procedures; Interpreting Parameter Estimates in the Final Model; The Implied Marginal Variance-Covariance Matrix for the Final Model; Diagnostics for the Final Model; Software Notes; Other Analytic Approaches; Recommendations
  • RANDOM COEFFICIENT MODELS FOR LONGITUDINAL DATA: THE AUTISM EXAMPLE: Introduction; The Autism Study; Overview of the Autism Data Analysis; Analysis Steps in the Software Procedures; Results of Hypothesis Tests; Comparing Results across the Software Procedures; Interpreting Parameter Estimates in the Final Model; Calculating Predicted Values; Diagnostics for the Final Model; Software Note: Computational Problems with the D Matrix; An Alternative Approach: Fitting the Marginal Model with an Unstructured Covariance Matrix
  • MODELS FOR CLUSTERED LONGITUDINAL DATA: THE DENTAL VENEER EXAMPLE: Introduction; The Dental Veneer Study; Overview of the Dental Veneer Data Analysis; Analysis Steps in the Software Procedures; Results of Hypothesis Tests; Comparing Results across the Software Procedures; Interpreting Parameter Estimates in the Final Model; The Implied Marginal Variance-Covariance Matrix for the Final Model; Diagnostics for the Final Model; Software Notes and Recommendations; Other Analytic Approaches
  • MODELS FOR DATA WITH CROSSED RANDOM FACTORS: THE SAT SCORE EXAMPLE: Introduction; The SAT Score Study; Overview of the SAT Score Data Analysis; Analysis Steps in the Software Procedures; Results of Hypothesis Tests; Comparing Results across the Software Procedures; Interpreting Parameter Estimates in the Final Model; The Implied Marginal Variance-Covariance Matrix for the Final Model; Recommended Diagnostics for the Final Model; Software Notes and Additional Recommendations
  • APPENDIX A: STATISTICAL SOFTWARE RESOURCES
  • APPENDIX B: CALCULATION OF THE MARGINAL VARIANCE-COVARIANCE MATRIX
  • APPENDIX C: ACRONYMS/ABBREVIATIONS
  • BIBLIOGRAPHY
  • INDEX

1,680 citations

Book
01 Mar 2010
TL;DR: Complex Surveys is a practical guide to the analysis of complex survey data using R, the freely available and downloadable statistical programming language, and a practical reference guide for applied statisticians and practitioners in the social and health sciences who use statistics in their everyday work.
Abstract: A complete guide to carrying out complex survey analysis using R As survey analysis continues to serve as a core component of sociological research, researchers are increasingly relying upon data gathered from complex surveys to carry out traditional analyses. Complex Surveys is a practical guide to the analysis of this kind of data using R, the freely available and downloadable statistical programming language. As creator of the specific survey package for R, the author provides the ultimate presentation of how to successfully use the software for analyzing data from complex surveys while also utilizing the most current data from health and social sciences studies to demonstrate the application of survey research methods in these fields. The book begins with coverage of basic tools and topics within survey analysis such as simple and stratified sampling, cluster sampling, linear regression, and categorical data regression. Subsequent chapters delve into more technical aspects of complex survey analysis, including post-stratification, two-phase sampling, missing data, and causal inference. Throughout the book, an emphasis is placed on graphics, regression modeling, and two-phase designs. In addition, the author supplies a unique discussion of epidemiological two-phase designs as well as probability-weighting for causal inference. All of the book's examples and figures are generated using R, and a related Web site provides the R code that allows readers to reproduce the presented content. Each chapter concludes with exercises that vary in level of complexity, and detailed appendices outline additional mathematical and computational descriptions to assist readers with comparing results from various software systems. Complex Surveys is an excellent book for courses on sampling and complex surveys at the upper-undergraduate and graduate levels. 
It is also a practical reference guide for applied statisticians and practitioners in the social and health sciences who use statistics in their everyday work.

600 citations


"A Guide for Population-based Analys..." refers methods in this paper

  • ...  R software, Survey library (Lumley, 2010) http://www....


  • ...The ABCD DEAP employs the R Survey Package (Lumley, 2010) as the default for producing descriptive estimates for the population of ABCD children....

