A Guide for Population-based Analysis of the Adolescent Brain Cognitive Development (ABCD) Study Baseline Data

doi:10.1101/2020.02.10.942011




Steven G. Heeringa and Patricia A. Berglund.

Institute for Social Research, University of Michigan

June, 2019

Abstract: ABCD is a longitudinal, observational study of U.S. children, ages 9-10 at baseline, recruited at random from the household populations in defined catchment areas for each of 21 study sites. The 21 geographic locations that comprise the ABCD research sites are nationally distributed and generally represent the range of demographic and socio-economic diversity of the U.S. birth cohorts that comprise the ABCD study population. The clustering of participants and the potential for selection bias in study site selection and enrollment are features of the ABCD observational study design that are informative for statistical estimation and inference. Both multi-level modeling and robust survey design-based methods can be used to account for clustering of sampled ABCD children in the 21 study sites. Covariate controls in analytical models and propensity weighting methods that calibrate ABCD weighted distributions to nationally representative controls from the American Community Survey (ACS) can be employed in analysis to account for known informative sample design features or to attenuate potential demographic and socio-economic selection bias in the national sampling and recruitment of eligible children. This guide presents results of an empirical investigation of the ABCD baseline data that compares the statistical efficiency of multi-level modeling and distribution-free design-based approaches, both weighted and unweighted, to analyses of the ABCD baseline data. Specific recommendations are provided for researchers on robust, efficient approaches to both descriptive and multivariate analyses of the ABCD baseline data.

1. Introduction

The Adolescent Brain Cognitive Development Study (ABCD) is a prospective cohort study of a baseline sample of U.S. children born during the period 2006-2008. Eligible children, ages 9-10, were recruited from the household populations in defined catchment areas for each of 21 study sites during the roughly two-year period beginning in September 2016 and ending in October 2018. Within study sites, consenting parents and assenting children were primarily recruited through a probability sample of public and private schools, augmented to a small extent by special recruitment through summer camp programs and community volunteers. Approximately 9,500 eligible, single-born children and 1,600 eligible twins completed the ABCD baseline imaging studies and assessments. The sample design and procedures employed in the recruitment of the baseline sample are described in detail in Garavan et al. (2018).

This methodological paper describes alternative approaches to analysis of the rich array of social, behavioral, environmental, genetic and summary-level neuroimaging data that are collected in the ABCD study. Section 2 frames a response to the broad question of why ABCD analysts should be concerned about estimation and inference for the population of U.S. children: does external validity matter? Features of the ABCD design and data that are statistically “informative” and complicate population estimation and inference are the subject of Section 3. Section 4 addresses the specific methodological question, “If inference to the U.S. population is important, what are the appropriate choices of methods for estimating population characteristics and relationships based on the ABCD data?”, describing both model-based and design-based approaches to ABCD estimation and inference. A summary of the general demographic and socio-economic characteristics of the ABCD baseline cohort before any weighting adjustments are applied is presented in Section 5. Section 6 describes the propensity-based weighting adjustment methodology that is used to calibrate the baseline sample cohort to key demographic and socio-economic distributions for U.S. children ages 9 and 10 estimated from the American Community Survey (ACS). Section 7 presents results of an empirical investigation of the ABCD baseline data that compares the statistical efficiency of multi-level modeling and distribution-free design-based approaches, both weighted and unweighted, to analyses of the ABCD baseline data. The paper concludes with specific recommendations for researchers on approaches to both descriptive and multivariate analyses of the ABCD baseline data. Appendices to this paper contain illustrations of recommended command syntax for analysis of the ABCD data using the major software packages.

bioRxiv preprint doi: https://doi.org/10.1101/2020.02.10.942011; this version posted February 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.

2. Population orientation to ABCD analysis

As defined in Garavan et al. (2018), the label “population neuroscience” when applied to observational studies such as ABCD refers to the application of epidemiological research practices, including large-scale representative samples, to assessments of target populations. It is a study in neuroscience in that it focuses on brain and neurological system development, morphology and function. It is a population study in that observational data are gathered in such a way that they can be used to understand real population distributions and the biological, familial, social and environmental factors that can govern how individuals actually live and grow in today’s society.

From the outset, ABCD’s primary sponsor, the National Institute on Drug Abuse (NIDA), and the ABCD scientific investigators were motivated to develop a baseline sample that reflected the sociodemographic variation present in the U.S. population of 9- and 10-year-old children. ABCD is an observational study sharing many aspects of its longitudinal design with existing population-based survey programs such as the National Longitudinal Study of Adolescent to Adult Health (Add Health, https://www.cpc.unc.edu/projects/addhealth), the Early Childhood Longitudinal Study (ECLS, https://nces.ed.gov/ecls/) or the Child Development Supplement (CDS, http://src.isr.umich.edu/src/child-development/home.html) to the Panel Study of Income Dynamics (PSID).


Population representativeness or, more correctly, the absence of uncorrected selective or informative bias in the subject pool, is important in achieving external validity, the ability to generalize specific results of the study to the world at large. However, even with good, representative samples of populations, failure to measure or control key factors or to recognize important moderating and/or mediating relationships can impact the external validity of study findings. The ABCD data are observational, and although propensity-based methods may be used to control for characteristics of “treated” and “control” participants, in the strictest sense insights gained from the data, even in longitudinal studies such as ABCD, will be associative.

The ABCD baseline recruitment effort worked very hard to maintain a nationally distributed set of controls on the age, sex and race/ethnicity of the children in the study. In year 2, additional monitoring and targeted recruitment were put in place to raise the proportion of children from lower-income families. The predominantly probability-based methodology for recruiting children within each study site was intended to randomize over confounding factors that were not explicitly controlled (or subsequently reflected in the propensity weighting). Nevertheless, school consent and parental consent were strong forces that may well have altered the effectiveness of the randomization over these uncontrolled confounders.

The purpose of covariate adjustments in models, or of the propensity weighting described in Section 6, is in fact to control specific sources of selection bias and restore unbiasedness to descriptive and analytical estimates of population characteristics and relationships. For many measures of substantive interest, the success of this effort will never be fully known except in rare cases where comparative national benchmarks exist (e.g., children's height) from administrative records, very large surveys or population censuses. The effectiveness of weighting adjustments in eliminating bias in population estimates depends, of course, on the relationship of the substantive variable of interest (e.g., amygdala volume) to the variables that were explicitly used to derive the propensity weights, namely age, sex, race/ethnicity, family type, parental employment status, family size and Census region. These are the types of variables that are available and identically measured in both a national source (the American Community Survey) and ABCD. It would have been ideal to have detailed population-level data on many other characteristics that may be highly correlated with the ABCD variable of interest (e.g., the amygdala volumes of the child's parents when they were ages 9-10). Only rarely, and only in large two-phase studies, will population-level statistical controls of this nature ever be available for a narrow age group such as 9- and 10-year-olds.
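
The dependence of weighting on the variables used to build it can be illustrated with a small example. The following is a minimal sketch of propensity-odds weighting, assuming a stacked file in which each ABCD child counts with weight 1 and each record from a reference survey (standing in for the ACS) carries its survey weight. It is not the procedure of Section 6: rather than fitting a propensity model to all of the weighting variables listed above, it estimates the probability of ABCD membership within adjustment cells of a single categorical variable, which is equivalent to a saturated logistic model in that variable. The function name and toy data are hypothetical.

```python
from collections import defaultdict


def propensity_odds_weights(abcd_cells, ref_cells, ref_weights):
    """Hypothetical sketch of propensity-odds weighting within adjustment cells.

    abcd_cells  : cell label for each ABCD child (each child counts with weight 1)
    ref_cells   : cell label for each record in the weighted reference sample
    ref_weights : survey weight of each reference record
    Returns one weight per ABCD child, equal to (1 - p) / p, where p is the
    estimated probability of belonging to ABCD rather than to the reference
    population, given the cell. Estimating p within cells is equivalent to a
    saturated logistic model on the cell variable.
    """
    n_abcd = defaultdict(float)  # unweighted ABCD count per cell
    w_ref = defaultdict(float)   # weighted reference total per cell
    for cell in abcd_cells:
        n_abcd[cell] += 1.0
    for cell, w in zip(ref_cells, ref_weights):
        w_ref[cell] += w

    weights = []
    for cell in abcd_cells:
        p = n_abcd[cell] / (n_abcd[cell] + w_ref[cell])  # estimated propensity
        # p > 0 because the child's own cell holds at least one ABCD case;
        # a cell with no reference mass gets p = 1 and therefore weight 0.
        weights.append((1.0 - p) / p)
    return weights


# Hypothetical toy data: ABCD over-represents girls relative to the reference file.
abcd_sex = ["F", "F", "F", "M"]
ref_sex = ["F", "M"]
ref_wt = [100.0, 100.0]  # each reference record represents 100 children

w = propensity_odds_weights(abcd_sex, ref_sex, ref_wt)
share_f = sum(wi for wi, s in zip(w, abcd_sex) if s == "F") / sum(w)
print(f"unweighted share female: {3 / 4:.2f}, weighted: {share_f:.2f}")
```

Within each cell the weights sum to that cell's weighted reference total, so in the toy example the weighted share of girls moves from 0.75 to 0.50, matching the reference file. As the paragraph above notes, calibrating the cell variable in this way removes bias in an outcome such as amygdala volume only to the extent that the outcome is related to the variables used to form the cells.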

"Representative" is a strong adjective to apply to any data set. The accuracy of the descriptor will vary by variable, by subpopulation and by the extent to which the weighting methodology or model covariates capture factors that truly affect the outcome of interest (both in terms of the variables and their functional relationship to the outcome). All forms of statistical estimation and inference make assumptions. No study gets an uncontestable stamp of approval on the unbiasedness of its survey estimates. In both approaches, propensity weighting or covariate adjustment in modeling, it is easy to overlook a selective factor that influences the outcome or modifies the effect of other variables. That is an inherent challenge in population inference from a national study such as ABCD. The position we take here is that multilevel models that include appropriate statistical controls for demographic and socio-economic factors, or propensity-weighted estimates of descriptive statistics from the ABCD baseline, are in fact publishable estimates for the population of U.S. children so long as authors acknowledge the design and accurately describe the underlying methodology and its assumptions.

3. Properties of the ABCD design and data to consider in analysis

This section describes three features of the ABCD design that must be considered in any analysis of the baseline data.

Clustering and non-independence of observations: Cohort recruitment for the ABCD study design was distinguished by the constraint that eligible children must live within reasonable travel distance (e.g., 50 miles) of a major medical center or research facility where MRI and fMRI imaging could be performed. The geographically clustered observations on individual children are not independent, and the intraclass (“intra-site”) correlations for the many variables must be accounted for to correctly estimate variances of descriptive estimates and analytical model parameters. Correlations among the ABCD observations for individual children are also introduced by other sources of clustering in the ABCD recruitment and measurement protocols: selection of multiple students from the same school, multiple children (including twins) recruited from the same family, and multiple children imaged on the same MRI scanner.
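
The cost of ignoring intra-site correlation can be gauged with the design effect for cluster samples. The sketch below is an illustration rather than a procedure used in this guide: it estimates the intraclass correlation ρ with the one-way ANOVA estimator and converts it into Kish's approximate design effect, deff = 1 + (m - 1)ρ, where m is the average cluster size. The function names and the ICC value of 0.01 are hypothetical, and the Kish approximation assumes roughly equal cluster sizes, which the 21 ABCD sites do not strictly satisfy.

```python
def anova_icc(clusters):
    """One-way ANOVA estimator of the intraclass correlation.

    clusters: list of lists, one list of outcome values per cluster (e.g. per site).
    Uses the adjusted average cluster size m0 to allow unequal cluster sizes.
    """
    k = len(clusters)
    n = sum(len(c) for c in clusters)
    grand_mean = sum(sum(c) for c in clusters) / n
    means = [sum(c) / len(c) for c in clusters]
    ss_between = sum(len(c) * (m - grand_mean) ** 2 for c, m in zip(clusters, means))
    ss_within = sum(sum((x - m) ** 2 for x in c) for c, m in zip(clusters, means))
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    m0 = (n - sum(len(c) ** 2 for c in clusters) / n) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (m0 - 1) * ms_within)


def kish_design_effect(icc, mean_cluster_size):
    """Approximate variance inflation relative to simple random sampling."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


# Toy data: two clusters whose means differ strongly.
print(round(anova_icc([[1, 2, 3], [4, 5, 6]]), 3))

# Illustrative scale only: 11,874 children in 21 sites with a hypothetical ICC of 0.01.
n_children, n_sites, icc = 11874, 21, 0.01
deff = kish_design_effect(icc, n_children / n_sites)
print(f"design effect {deff:.2f}, effective n {n_children / deff:.0f}")
```

Under these assumptions, an intraclass correlation of only 0.01 with sites as clusters gives a design effect of about 6.6, so the 11,874 baseline children would carry roughly the precision of about 1,790 independent observations for that variable.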

Selection bias in site choice and within-site subject enrollment: While the 21 geographic locations that comprise the ABCD research sites are nationally distributed and generally represent the range of demographic and socio-economic diversity of the U.S. birth cohorts that comprise the ABCD study population, in the strict sense they do not constitute the primary stage of a multi-stage probability sample such as those employed in major population-based epidemiological surveys. To achieve population representativeness for statistical analyses, a mechanism (e.g., modeling site characteristics, assuming pseudo-randomization) is needed to calibrate the broader geographic, demographic and socio-economic characteristics of the set of 21 sites to the larger U.S. population framework (Olsen et al., 2013).

As described in Garavan et al. (2018), within each of the 21 ABCD study sites a probability sample of public and private schools was selected as the basis for recruiting the majority of eligible children to the ABCD baseline cohort. Although this school-based recruitment approach introduced randomization into the sample of students within each site who could be recruited to ABCD, the process of obtaining school cooperation and then parental consent could selectively affect the final characteristics of the sample that was actually observed. The following sections describe two approaches, propensity-based weighting and the use of appropriate covariate controls in modeling, that aim to address potential selectivity that may have entered the ABCD cohort through the site selection or school/parental consent gateways to actual study participation.

Special twin supplement: A final feature of the ABCD design that deserves attention in the analysis of the baseline cohort data is the special oversample of twin pairs in four of the 21 ABCD sites. Although twins were eligible to be recruited in all sites that used the school-based recruitment sampling methodology, in the four special twin sites supplemental samples of 150-250 twin pairs per site were enrolled in ABCD using samples selected from state registries (Garavan et al., 2018). These special samples of twin pairs can be distinguished in the final baseline cohort of n=11,874 children; however, the study has chosen not to explicitly segregate these twin data from the general population sample of single births and incidental twins recruited through the school-based sampling protocol.

By a default decision of the study team, the propensity-based population weighting methodology described in Section 6, and incorporated in the descriptive estimation of the ABCD Data Exploration and Analysis Portal (DEAP), assumes a pooled analysis of the general and special twin samples. Section 7 applies multiple analytic approaches to investigate the assumption that the special twin samples are in fact “exchangeable” with the ABCD general population sample.

4. Design-based and model-based approaches to ABCD analysis

Analysts may choose among several approaches to estimation and inference that address the challenges posed by the clustering, selection bias and special twin sample properties of the ABCD data. The first approach is to assume that the multi-stage sample selection for ABCD follows a quasi-probability design and to employ design-based methodology similar to that typically used to analyze large probability-sample epidemiological surveys such as the U.S. National Health and Nutrition Examination Survey (NHANES). Design-based analysis employs population weighting to estimate population statistics and model parameters, and non-parametric methods (Taylor Series Linearization, Jackknife, and Bootstrap) to compute robust estimates of standard errors. Any quasi-probability approach to analyzing the ABCD data requires a minimum of two things: 1) assignment of cases to ultimate cluster (UC) groupings to account for non-independence of observations; and 2) modeling to derive case-specific analysis weights that account for differential selection factors and permit the observed sample to be “mapped” to the U.S. population of interest (Heeringa et al., 2017).
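
A minimal sketch of how these two requirements combine in a replication variance estimator follows. It computes a weighted mean and a delete-one-cluster (JK1) jackknife standard error, dropping one ultimate cluster at a time; it assumes an unstratified design in which cluster codes and analysis weights have already been assigned, and the function names are hypothetical. In practice, analysts would normally define strata as well and rely on the survey procedures of a standard statistical package.

```python
def weighted_mean(y, w):
    """Ratio-type estimator: sum(w * y) / sum(w)."""
    return sum(wi * yi for yi, wi in zip(y, w)) / sum(w)


def jk1_se_weighted_mean(y, w, cluster):
    """Delete-one-cluster (JK1) jackknife standard error of a weighted mean.

    y       : outcome values
    w       : case-specific analysis weights
    cluster : ultimate-cluster code for each case (unstratified design assumed)
    """
    codes = sorted(set(cluster))
    k = len(codes)
    if k < 2:
        raise ValueError("need at least two ultimate clusters")
    theta = weighted_mean(y, w)
    replicates = []
    for dropped in codes:
        keep = [i for i, c in enumerate(cluster) if c != dropped]
        # JK1 rescales retained weights by k / (k - 1); the factor cancels in a
        # ratio such as the weighted mean but matters for estimated totals.
        w_rep = [w[i] * k / (k - 1) for i in keep]
        replicates.append(weighted_mean([y[i] for i in keep], w_rep))
    # Variance around the full-sample estimate (one common JK1 convention).
    variance = (k - 1) / k * sum((t - theta) ** 2 for t in replicates)
    return theta, variance ** 0.5


# Hypothetical toy data: three clusters of two cases each, equal weights.
y = [1, 2, 3, 4, 5, 6]
w = [1.0] * 6
cluster = ["a", "a", "b", "b", "c", "c"]
est, se = jk1_se_weighted_mean(y, w, cluster)
print(f"estimate {est:.2f}, JK1 SE {se:.4f}")
```

With equal weights and equal-sized clusters, as in the toy data, the JK1 standard error reduces to the standard error of the mean of the cluster means (here 2/sqrt(3), about 1.155), which shows that the ultimate clusters, not the individual children, carry the information about sampling variability.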
