A Guide for Population-based Analysis of the Adolescent Brain Cognitive
Development (ABCD) Study Baseline Data
Steven G. Heeringa and Patricia A. Berglund.
Institute for Social Research, University of Michigan
June, 2019
Abstract: ABCD is a longitudinal, observational study of U.S. children, ages 9-10 at baseline,
recruited at random from the household populations in defined catchment areas for each of 21
study sites. The 21 geographic locations that comprise the ABCD research sites are nationally
distributed and generally represent the range of demographic and socio-economic diversity of the
U.S. birth cohorts that comprise the ABCD study population. The clustering of participants and
the potential for selection bias in study site selection and enrollment are features of the ABCD
observational study design that are informative for statistical estimation and inference. Both
multi-level modeling and robust survey design-based methods can be used to account for
clustering of sampled ABCD children in the 21 study sites. Covariate controls in analytical
models and propensity weighting methods that calibrate ABCD weighted distributions to
nationally-representative controls from the American Community Survey (ACS) can be
employed in analysis to account for known informative sample design features or to attenuate
potential demographic and socio-economic selection bias in the national sampling and
recruitment of eligible children. This guide will present results of an empirical investigation of
the ABCD baseline data that compares the statistical efficiency of multi-level modeling and
distribution-free design-based approaches (both weighted and unweighted) to analyses of the
ABCD baseline data. Specific recommendations will be provided for researchers on robust,
efficient approaches to both descriptive and multivariate analyses of the ABCD baseline data.
1. Introduction
The Adolescent Brain Cognitive Development Study (ABCD) is a prospective cohort study of a
baseline sample of U.S. children born during the period 2006-2008. Eligible children, ages 9-10,
were recruited from the household populations in defined catchment areas for each of 21
study sites during the roughly two-year period beginning September 2016 and ending in October
of 2018. Within study sites, consenting parents and assenting children were primarily recruited
through a probability sample of public and private schools augmented to a small extent by
special recruitment through summer camp programs and community volunteers. Approximately
9500 eligible, single-born children and 1600 eligible twins completed the ABCD baseline
imaging studies and assessments. The sample design and procedures employed in the
recruitment of the baseline sample are described in detail in Garavan et al. (2018).
bioRxiv preprint, doi: https://doi.org/10.1101/2020.02.10.942011; this version posted February 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
This methodological paper describes alternative approaches to analysis of the rich array of social,
behavioral, environmental, genetic and summary-level neuroimaging data that is collected in the
ABCD study. Section 2 will attempt to frame a response to the broad question, why should
ABCD analysts be concerned about estimation and inference for the population of U.S.
children—does external validity matter? Features of the ABCD design and data that are
statistically “informative” and complicate population estimation and inference are the subject of
Section 3. Section 4 will attempt to address the specific methodological question, “If inference
to the U.S. population is important, what are the appropriate choices of methods for estimating
population characteristics and relationships based on the ABCD data?”, describing both model-
based and design-based approaches to ABCD estimation and inference. A summary of the
general demographic and socio-economic characteristics for the ABCD baseline cohort before
any weighting adjustments are applied is presented in Section 5. Section 6 describes the
propensity-based weighting adjustment methodology that is used to calibrate the baseline sample
cohort to key demographic and socio-economic distributions for U.S. children ages 9 and 10
estimated from the American Community Survey (ACS). Section 7 presents results of an
empirical investigation of the ABCD baseline data that compares the statistical efficiency of
multi-level modeling and distribution-free design-based approaches (both weighted and
unweighted) to analyses of the ABCD baseline data. The paper concludes with specific
recommendations for researchers on approaches to both descriptive and multivariate analyses of
the ABCD baseline data. Appendices to this paper will contain illustrations of recommended
command syntax for analysis of the ABCD data using the major software packages.
2. Population orientation to ABCD analysis
As defined in Garavan et al. (2018), the label “population neuroscience” when applied to
observational studies such as ABCD refers to the application of epidemiological research
practices including large-scale representative samples to assessments of target populations. It is
a study in neuroscience in that it focuses on brain and neurological system development,
morphology and function. It is a population study in that observational data are gathered in
such a way that they can be used to understand real population distributions and the biological,
familial, social and environmental factors that can govern how individuals actually live and
grow in today’s society.
From the outset, ABCD’s primary sponsor, the National Institute on Drug Abuse (NIDA), and
the ABCD scientific investigators were motivated to develop a baseline sample that reflected the
sociodemographic variation present in the U.S. population of 9- and 10-year-old children. ABCD
is an observational study sharing many aspects of its longitudinal design with existing
population-based survey programs such as the National Longitudinal Study of Adolescent to
Adult Health (Add Health, https://www.cpc.unc.edu/projects/addhealth), the Early Childhood
Longitudinal Surveys (ECLS, https://nces.ed.gov/ecls/) or the Child Development Supplement
(CDS, http://src.isr.umich.edu/src/childdevelopment/home.html) to the Panel Study of Income
Dynamics (PSID).
Population representativeness, or more correctly, absence of uncorrected selective or informative
bias in the subject pool, is important in achieving external validity, that is, the ability to generalize
specific results of the study to the world at large. However, even with good, representative
samples of populations, failure to measure or control key factors or to recognize important
moderating and/or mediating relationships can impact external validity of study findings. The
ABCD data are observational and although propensity-based methods may be used to control for
characteristics of “treated” and “control” participants, in the strictest sense insights gained from
the data—even in longitudinal studies such as ABCD—will be associative.
The ABCD baseline recruitment effort worked very hard to maintain a nationally distributed set
of controls on the age, sex and race/ethnicity of the children in the study. In year 2, additional
monitoring and targeted recruitment were put in place to raise the proportion of children from
lower income families. The predominantly probability sampling methodology for recruiting
children within each study site was intended to randomize over confounding factors that were not
explicitly controlled (or subsequently reflected in the propensity weighting). Nevertheless,
school consent and parental consent were strong forces that certainly may have altered the
effectiveness of the randomization over these uncontrolled confounders.
The purpose of covariate adjustments in models or the propensity weighting described below in
Section 6 is in fact to control specific sources of selection bias and restore unbiasedness to
descriptive and analytical estimates of the population characteristics and relationships. For many
measures of substantive interest, the success of this effort will never be fully known except in
rare cases where comparative national benchmarks exist (e.g. children's height) from
administrative records or very large surveys or population censuses. The effectiveness of
weighting adjustments to eliminate bias in population estimates depends of course on the
relationship of the substantive variable of interest (e.g. amygdala volume) to the variables that
were explicitly used to derive the propensity weights, namely age, sex, race/ethnicity, family
type, parental employment status, family size and Census region. These are the types of
variables that are available and are identically measured in a national source (American
Community Survey) and ABCD. It would have been ideal to have detailed population level data
on many other characteristics that may be highly correlated with the ABCD variable of interest
(e.g. the child's parents' amygdala volume when mom and dad were ages 9 and 10). Only rarely and
in large two-phase studies will we ever have population level statistical controls of this nature for
a small group such as 9- and 10-year-olds.
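The dependence of a weighting adjustment on the relationship between the outcome and the weighting variables can be illustrated with a small sketch. The code below is not the propensity model described in Section 6; it assumes a much simpler cell-based calibration (post-stratification) on a single hypothetical income indicator, and the sample counts, benchmark shares and outcome values are all made up for illustration.

```python
from collections import Counter


def poststratification_weights(sample_cells, population_shares):
    """Return one weight per sampled case so that weighted cell shares
    match the supplied population shares.

    sample_cells: list of cell labels, one per sampled child.
    population_shares: dict mapping cell label -> population proportion.
    """
    n = len(sample_cells)
    counts = Counter(sample_cells)
    # Adjustment factor = population share / sample share of the cell.
    factor = {cell: population_shares[cell] / (counts[cell] / n) for cell in counts}
    return [factor[cell] for cell in sample_cells]


def weighted_mean(values, weights):
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical sample that over-represents the higher income cell.
sample = ["higher_income"] * 70 + ["lower_income"] * 30
population = {"higher_income": 0.55, "lower_income": 0.45}  # made-up benchmark
weights = poststratification_weights(sample, population)

# Outcome related to the weighting cell: weighting shifts the estimate.
related = [10.0] * 70 + [8.0] * 30
unweighted_related = sum(related) / len(related)
weighted_related = weighted_mean(related, weights)

# Outcome unrelated to the weighting cell: weighting leaves it unchanged.
unrelated = [1.0, 2.0] * 35 + [1.0, 2.0] * 15
unweighted_unrelated = sum(unrelated) / len(unrelated)
weighted_unrelated = weighted_mean(unrelated, weights)

print(unweighted_related, weighted_related)
print(unweighted_unrelated, weighted_unrelated)
```

In this toy setting the weights can only correct bias that operates through the cell variable; an outcome whose distribution does not differ across cells is estimated identically with and without weights.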
"Representative" is a strong adjective to apply to any data set. The accuracy of the descriptor
will vary by variable, by subpopulation and by the extent to which the weighting methodology or
model covariates capture factors that truly affect the outcome of interest (both in terms of the
variables and their functional relationship to the outcome). All forms of statistical estimation
and inference make assumptions. No study gets an uncontestable stamp of approval on the
unbiasedness of its survey estimates. In both approaches (propensity weighting or covariate
adjustment in modeling) it is easy to overlook a selective factor that influences the outcome or
modifies the effect of other variables. That is an inherent challenge in population inference from
a national study such as ABCD. The position that we take here is that multilevel models that
include appropriate statistical controls for demographic and socio-economic factors or propensity
weighted estimates of descriptive statistics from the ABCD baseline are in fact publishable
estimates for the population of U.S. children so long as authors acknowledge the design and
accurately describe the underlying methodology and its assumptions.
3. Properties of the ABCD design and data to consider in analysis
This section describes three features of the ABCD design that must be considered in any analysis
of the baseline data.
Clustering and non-independence of observations: Cohort recruitment for the ABCD study
design was distinguished by the constraint that eligible children must live within reasonable
travel distance (e.g. 50 miles) of a major medical center or research facility where MRI and
fMRI imaging could be performed. The geographically-clustered observations on individual
children are not independent and the intraclass (“intra-site”) correlations for the many variables
must be accounted for to correctly estimate variances of descriptive estimates and analytical
model parameters. Correlations among the ABCD observations for individual children are also
introduced by other sources of clustering in the ABCD recruitment and measurement protocols:
selection of multiple students from schools, multiple children (including twins) recruited from
the same family, multiple children imaged on the same MRI scanner.
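To give a sense of why intra-site correlation matters for variance estimation, the sketch below applies the familiar design effect approximation for clustered samples, deff = 1 + (m - 1) * rho, where m is the average cluster size and rho is the intraclass correlation. It assumes equal-sized clusters and treats the 21 sites as the only source of clustering; the intraclass correlation of 0.01 is a hypothetical value chosen for illustration, not an estimate from the ABCD data.

```python
def kish_design_effect(avg_cluster_size: float, icc: float) -> float:
    """Approximate design effect for a mean under equal-sized clusters."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


def effective_sample_size(n: int, n_clusters: int, icc: float) -> float:
    """Nominal sample size divided by the approximate design effect."""
    return n / kish_design_effect(n / n_clusters, icc)


# Baseline cohort size and number of sites as reported in this paper;
# the intraclass correlation is an assumed, illustrative value.
n_children, n_sites, assumed_icc = 11874, 21, 0.01

deff = kish_design_effect(n_children / n_sites, assumed_icc)
n_eff = effective_sample_size(n_children, n_sites, assumed_icc)
print(f"design effect = {deff:.2f}, effective sample size = {n_eff:.0f}")
```

Because the average site contributes several hundred children, even a small intraclass correlation inflates the variance of a descriptive estimate substantially relative to what an analysis assuming independent observations would report.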
Selection bias in site choice and within-site subject enrollment: While the 21 geographic
locations that comprise the ABCD research sites are nationally distributed and generally
represent the range of demographic and socio-economic diversity of the U.S. birth cohorts that
comprise the ABCD study population, in the strict sense they do not constitute the primary
stage of a multi-stage probability sample such as those employed in major population-based
epidemiological surveys. To achieve population representativeness for statistical analyses, a
mechanism (e.g. modeling site characteristics, assuming pseudo-randomization) is needed to
calibrate the broader geographic, demographic and socio-economic characteristics of the set of
21 sites to the larger U.S. population framework (Olsen et al., 2013).
As described in Garavan et al. (2018), within each of the 21 ABCD study sites, a probability
sample of the public and private schools was selected as the basis for the recruitment of the
majority of eligible children to the ABCD baseline cohort. Although this school-based
recruitment approach within each site introduced randomization to the sample of students who
could be recruited to ABCD, the process of obtaining school cooperation and then parental
consent could selectively impact the final characteristics of the sample that was actually
observed. The following sections will describe two approaches, propensity-based weighting and
use of appropriate covariate controls in modeling, that aim to address potential selectivity that
may have entered the ABCD cohort through the site selection or school/parental consent gateways
to actual study participation.
Special twin supplement: A final feature of the ABCD design that deserves attention in the
analysis of the baseline cohort data is the special oversample of twin pairs in four of the 21
ABCD sites. Although twins were eligible to be recruited in all sites that used the school-based
recruitment sampling methodology, in the four special twin sites supplemental samples of
150-250 twin pairs per site were enrolled in ABCD using samples selected from state registries
(Garavan et al., 2018). These special samples of twin pairs can be distinguished in the final
baseline cohort of n=11,874 children; however, the study has chosen not to explicitly segregate
these twin data from the general population sample of single births and incidental twins recruited
through the school-based sampling protocol.
By a default decision of the study team, the propensity-based population weighting methodology
described in Section 6 and incorporated in the ABCD Data Exploration and Analysis Portal
(DEAP) descriptive estimation does assume a pooled analysis of the general and special twin
samples. Section 7 will apply multiple analytic approaches to investigate this assumption that
the special twin samples are in fact “exchangeable” with the ABCD general population sample.
4. Design-based and model-based approaches to ABCD analysis
Analysts may choose several approaches to estimation and inference that address the challenges
posed by the clustering, selection bias and special twin sample properties of the ABCD data.
The first approach is to assume that the multi-stage sample selection for ABCD follows a quasi-
probability design and employ design-based methodology similar to that typically used to
analyze large probability sample epidemiological surveys such as the U.S. National Health and
Nutrition Examination Survey (NHANES). Design-based analysis will employ population
weighting to estimate population statistics and model parameters and non-parametric methods
(Taylor Series Linearization, Jackknife, and Bootstrap) to compute robust estimates of standard
errors. Any quasi-probability approach to analysis of the ABCD data requires a minimum of two
things: 1) assignment of cases to ultimate cluster (UC) groupings to account for
non-independence of observations; and 2) modeling to derive case-specific analysis weights that
account for differential selection factors and permit the observed sample to be “mapped” to the
U.S. population of interest (Heeringa et al., 2017).
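The two ingredients listed above, ultimate cluster assignments and case-specific weights, are all that a linearized variance estimate for a weighted mean requires. The following minimal sketch illustrates the idea under simplifying assumptions that are ours and not part of the ABCD design documentation: a single stratum, ultimate clusters treated as sampled with replacement, and toy data with hypothetical site labels. Survey software packages implement the same logic with many more options.

```python
from collections import defaultdict
from math import sqrt


def weighted_mean_with_cluster_se(y, w, cluster):
    """Weighted mean and its Taylor series linearized standard error.

    Assumes a single stratum with ultimate clusters treated as a
    with-replacement sample (an assumption made for this sketch).
    """
    total_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w

    # Cluster totals of the linearized variable z_i = w_i * (y_i - mean) / sum(w).
    z_totals = defaultdict(float)
    for yi, wi, ci in zip(y, w, cluster):
        z_totals[ci] += wi * (yi - mean) / total_w

    c = len(z_totals)
    if c < 2:
        raise ValueError("at least two clusters are needed to estimate variance")
    # The cluster totals of z sum to zero, so no centering is required.
    variance = c / (c - 1) * sum(z * z for z in z_totals.values())
    return mean, sqrt(variance)


def naive_se(y):
    """Standard error of an unweighted mean assuming independent observations."""
    n = len(y)
    ybar = sum(y) / n
    s2 = sum((yi - ybar) ** 2 for yi in y) / (n - 1)
    return sqrt(s2 / n)


# Hypothetical toy data: three clusters of two cases each, equal weights.
y = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
w = [1.0] * 6
site = ["A", "A", "B", "B", "C", "C"]

mean, se = weighted_mean_with_cluster_se(y, w, site)
print(mean, se, naive_se(y))
```

In this toy example all of the variation lies between clusters, so the cluster-based standard error is noticeably larger than the one obtained by treating the six observations as independent.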