# A Guide for Population-based Analysis of the Adolescent Brain Cognitive Development (ABCD) Study Baseline Data

TL;DR: This guide presents results of an empirical investigation of the ABCD baseline data that compares the statistical efficiency of multi-level modeling and distribution-free design-based approaches, both weighted and unweighted.

Abstract: ABCD is a longitudinal, observational study of U.S. children, ages 9-10 at baseline, recruited at random from the household populations in defined catchment areas for each of 21 study sites. The 21 geographic locations that comprise the ABCD research sites are nationally distributed and generally represent the range of demographic and socio-economic diversity of the U.S. birth cohorts that make up the ABCD study population. The clustering of participants and the potential for selection bias in study site selection and enrollment are features of the ABCD observational study design that are informative for statistical estimation and inference. Both multi-level modeling and robust survey design-based methods can be used to account for clustering of sampled ABCD children in the 21 study sites. Covariate controls in analytical models and propensity weighting methods that calibrate ABCD weighted distributions to nationally representative controls from the American Community Survey (ACS) can be employed in analysis to account for known informative sample design features or to attenuate potential demographic and socio-economic selection bias in the national sampling and recruitment of eligible children. This guide presents results of an empirical investigation of the ABCD baseline data that compares the statistical efficiency of multi-level modeling and distribution-free design-based approaches, both weighted and unweighted. Specific recommendations are provided for researchers on robust, efficient approaches to both descriptive and multivariate analyses of the ABCD baseline data.

## Summary (3 min read)

### 1. Introduction

- The Adolescent Brain Cognitive Development Study (ABCD) is a prospective cohort study of a baseline sample of U.S. children born during the period 2006-2008.
- Eligible children, ages 9-10, were recruited from the household populations in defined catchment areas for each of 21 study sites during the roughly two-year period from September 2016 to October 2018.
- This methodological paper describes alternative approaches to analysis of the rich array of social, behavioral, environmental, genetic, and summary-level neuroimaging data collected in the ABCD study.
- Features of the ABCD design and data that are statistically "informative" and complicate population estimation and inference are the subject of Section 3.

### 2. Population orientation to ABCD analysis

- As described in Garavan et al. (2018), within each of the 21 ABCD study sites, a probability sample of the public and private schools was selected as the basis for the recruitment of the majority of eligible children to the ABCD baseline cohort.
- The process of obtaining school cooperation and then parental consent could selectively impact the final characteristics of the sample that was actually observed.
- The following sections describe two approaches, propensity-based weighting and the use of appropriate covariate controls in modeling, that aim to address potential selectivity that may have entered the ABCD cohort through the site selection or school/parental consent gateways to actual study participation.
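One common way to build propensity-based weights, in the spirit of inverse propensity score weighting, is to stack the study sample with records from a reference survey such as the ACS, model the probability that a record comes from the study sample, and weight each sampled child by the inverse odds (1 - p)/p. The sketch below is a minimal illustration under stated assumptions, not the ABCD production procedure: it assumes an unweighted reference file, a design matrix that already contains an intercept column, and a hypothetical `pop_total` used to scale the weights. The function names are illustrative only.

```python
import numpy as np

def fit_logistic(X, z, n_iter=50, tol=1e-10):
    """Newton-Raphson fit of an (unweighted) logistic regression of z on X."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (z - p)                           # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])    # observed information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def propensity_weights(X_sample, X_ref, pop_total):
    """Inverse-odds weights for sample records, scaled to a hypothetical pop_total.

    Assumes each reference record represents an equal share of the population.
    """
    X = np.vstack([X_sample, X_ref])
    z = np.concatenate([np.ones(len(X_sample)), np.zeros(len(X_ref))])  # 1 = study sample
    beta = fit_logistic(X, z)
    p = 1.0 / (1.0 + np.exp(-X_sample @ beta))  # P(record is in the study sample | x)
    w = (1.0 - p) / p                           # inverse odds of sample membership
    return w * pop_total / w.sum()

# Toy example: the sample over-represents x = 1 (80 of 100) versus the reference (50 of 100)
x_sample = np.r_[np.ones(80), np.zeros(20)]
x_ref = np.r_[np.ones(50), np.zeros(50)]
X_sample = np.column_stack([np.ones_like(x_sample), x_sample])
X_ref = np.column_stack([np.ones_like(x_ref), x_ref])
w = propensity_weights(X_sample, X_ref, pop_total=1000.0)
share = w[x_sample == 1].sum() / w.sum()
print(round(share, 4))  # 0.5
```

With a saturated model like this one, the weighted share of x = 1 in the sample reproduces the reference share exactly; with richer covariates the match is only approximate, which is one reason a calibration step can follow.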

### 5. Properties of the ABCD Baseline Sample Cohort in Comparison to ACS

- The unweighted distribution of reported annual family incomes for the ABCD baseline cohort differs from the nationally representative ACS estimates for the U.S. population of 9- and 10-year-olds.
- In nominal dollars, the family incomes of the ABCD children are higher on average than ACS estimates for the comparable U.S. population.
- The ABCD Passive Data Work Group is currently in the process of acquiring external data on school, community and environmental characteristics that can be linked to individual child data and used to analyze the role that these contextual effects may have on the current status and development trajectories of the children in the ABCD baseline cohort.
- At this stage, the propensity-based population weighting methodology described in the next section does not incorporate calibration based on detailed characteristics of children's residences, schools or communities.

### 6. Weighting the ABCD Sample to ACS Population Controls

- After the extremes of the weight distribution were trimmed, the R rake iterative proportional fitting algorithm was used to "rake" the trimmed initial weights to exact ACS population counts for the marginal categories of age (9, 10), sex (female, male), and race/ethnicity (Hispanic, Black, White, Asian, and all Other persons); see Table 3.
- Figure 1 is a histogram display of the frequency distribution of the final population weights for the ABCD baseline children.
- Figure 2 provides a boxplot comparison of the distribution of weights separately for boys and girls.
- The Figure 3 boxplots of weights by family income category show a very different pattern.
- Compared to the national population, children from families with lower incomes are underrepresented and the population weights for children in these lower income categories have higher average values and a greater variance than the weights for the children from higher income families.
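The trim-then-rake sequence can be sketched with plain iterative proportional fitting: each pass rescales the weights so that their weighted totals match the control counts for one margin, cycling through the margins until all of them agree. This is a NumPy illustration, not the R rake implementation the guide used; the 1st/99th percentile trimming cutoffs, the category codes, and the control totals below are hypothetical.

```python
import numpy as np

def trim_weights(w, lower_q=0.01, upper_q=0.99):
    """Cap weights at hypothetical lower/upper quantiles of their distribution."""
    lo, hi = np.quantile(w, [lower_q, upper_q])
    return np.clip(w, lo, hi)

def rake(w, margins, targets, max_iter=100, tol=1e-10):
    """Iterative proportional fitting of weights to marginal population counts.

    margins: list of integer category-code arrays, one per raking variable.
    targets: list of population-count arrays indexed by those category codes.
    """
    w = np.asarray(w, dtype=float).copy()
    for _ in range(max_iter):
        worst = 0.0
        for codes, target in zip(margins, targets):
            current = np.bincount(codes, weights=w, minlength=len(target))
            factors = np.asarray(target, dtype=float) / current
            w *= factors[codes]  # every record in a category gets the same factor
            worst = max(worst, np.max(np.abs(factors - 1.0)))
        if worst < tol:  # a full sweep left every margin essentially unchanged
            break
    return w

# Toy example with two raking variables
age = np.array([0, 0, 0, 1, 1, 1])   # 0 = age 9, 1 = age 10
sex = np.array([0, 1, 1, 0, 0, 1])   # 0 = female, 1 = male
w0 = trim_weights(np.array([1.0, 1.2, 0.9, 5.0, 1.1, 1.0]))
w = rake(w0, [age, sex], [np.array([50.0, 50.0]), np.array([60.0, 40.0])])
print(np.bincount(age, weights=w), np.bincount(sex, weights=w))  # weighted margins match the controls
```

Raking only fixes the margins it is given, so characteristics left out of the raking (family income, for example) can still show the uneven weight distributions described above.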

### 7. Comparison of Analysis Methods

- The comparative results for these two regression models suggest that when the special ABCD twin sample data are pooled with the general population sample and a linear mixed model (LMM) approach is used, it is important to apply the three-level DEAP model that includes a level-two contribution for clustering within the family unit.
- When a two-level model is applied to these pooled data and family-level clustering is ignored, the results of these example analyses suggest the parameter estimates will be attenuated and estimated standard errors will be seriously overestimated.
- If the two-level LMM is fitted using only data for the general population sample (excluding the special twin sample cases), the resulting parameter estimates and standard errors are more consistent with those for the three-level model.
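For readers less familiar with the distinction, a generic three-level random-intercept model for child $i$ in family $j$ within site $k$ can be written as below. This is a sketch of the general form only; the covariates and any additional terms in the DEAP specification are not reproduced here.

$$
y_{ijk} = \mathbf{x}_{ijk}^{\top}\boldsymbol{\beta} + u_{k} + v_{jk} + \varepsilon_{ijk},
\qquad u_{k} \sim N(0,\sigma_u^{2}),\quad v_{jk} \sim N(0,\sigma_v^{2}),\quad \varepsilon_{ijk} \sim N(0,\sigma_\varepsilon^{2})
$$

Conditional on the covariates, the model implies the residual correlations

$$
\operatorname{corr}(y_{ijk}, y_{i'jk}) = \frac{\sigma_u^{2} + \sigma_v^{2}}{\sigma_u^{2} + \sigma_v^{2} + \sigma_\varepsilon^{2}} \quad (i \neq i'),
\qquad
\operatorname{corr}(y_{ijk}, y_{i'j'k}) = \frac{\sigma_u^{2}}{\sigma_u^{2} + \sigma_v^{2} + \sigma_\varepsilon^{2}} \quad (j \neq j').
$$

The two-level model drops $v_{jk}$, which forces siblings, including twins, to be no more alike than unrelated children at the same site; pooling the twin sample makes that restriction more consequential.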

### 7.B.3 Three-level LMM vs. Design-based Population-weighted LS and Robust SEs

- As noted above, the unweighted LMM and weighted design-based approaches compared in Table 7 aim to model the complex variance structure that arises from clustering and non-independence of the baseline observations for the ABCD child cohort.
- The design-based estimation approaches employ the population weights described in Section 6 above and use a weighted least squares (WLS) methodology to estimate the population regression parameters.
- Unlike the LMM approach, the components of variance associated with each level of clustering are estimated as a single weighted aggregate for the residual variance and not as individual components of variance attributable to each level of the clustering.

### 7.C Generalized Linear Model: Poisson Regression, Comparison of Model Fitting Methods

- Here again, as in the previous comparisons based on the linear regression model, the three-level DEAP LMM and the design-based estimation for the pooled data show minor differences in the estimated relative risks and confidence intervals but the magnitude of these differences would not be judged to be substantively important.
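The design-based side of these comparisons can be sketched as weighted least squares with a cluster-robust (sandwich) covariance. In the sketch below, sites are treated as clusters in a single stratum, an assumption made for illustration; under that assumption the G/(G - 1) factor makes the estimator coincide with the usual with-replacement linearization variance for a clustered sample. Function and variable names are hypothetical, the data are simulated, and in practice a survey package would normally do this work.

```python
import numpy as np

def wls_cluster_robust(X, y, w, cluster):
    """Weighted least squares with cluster-robust (sandwich) standard errors.

    X: (n, p) design matrix including an intercept; y: (n,) outcome;
    w: (n,) population weights; cluster: (n,) cluster labels (e.g. site IDs).
    """
    Xw = X * w[:, None]
    bread = np.linalg.inv(X.T @ Xw)        # (X' W X)^{-1}
    beta = bread @ (Xw.T @ y)              # WLS point estimates
    resid = y - X @ beta
    scores = Xw * resid[:, None]           # per-record weighted score contributions
    labels = np.unique(cluster)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in labels:
        s_g = scores[cluster == g].sum(axis=0)  # cluster total of the scores
        meat += np.outer(s_g, s_g)
    n_clusters = len(labels)
    meat *= n_clusters / (n_clusters - 1)  # with-replacement cluster correction
    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))

# Simulated example with 21 hypothetical sites and a shared site effect
rng = np.random.default_rng(0)
n = 2100
site = np.repeat(np.arange(21), n // 21)
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=21)[site] + rng.normal(size=n)
w = rng.uniform(0.5, 2.0, size=n)
X = np.column_stack([np.ones(n), x])
beta, se = wls_cluster_robust(X, y, w, site)
print(beta, se)
```

Calling the same function with `w` set to all ones gives an unweighted, cluster-robust counterpart. As the bullet above notes, this approach yields a single aggregate residual variance rather than separate site- and family-level components.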

### 8. Summary: Recommendations for research analysts

- Researchers are encouraged to consider each of the informative features of the ABCD design (clustering, sample selectivity, twin sample pooling) as they may apply to their analytic aims.
- Sensitivity analyses such as those underlying the comparisons in Section 7 should provide good insight into the degree to which results for descriptive estimates or fitted models are influenced by clustering, weighting and twin sample pooling.
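A minimal version of such a sensitivity check puts a weighted and an unweighted estimate side by side and reports the Kish effective sample size, $(\sum w)^2 / \sum w^2$, which summarizes how much precision unequal weights cost (relevant to the high-variance weights for lower-income children described in Section 6). The helper names and toy values below are hypothetical.

```python
import numpy as np

def kish_effective_n(w):
    """Kish approximation to the effective sample size under unequal weights."""
    w = np.asarray(w, dtype=float)
    return float(w.sum() ** 2 / np.sum(w ** 2))

def weighting_sensitivity(y, w):
    """Return (unweighted mean, weighted mean, weighted minus unweighted)."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    unweighted = y.mean()
    weighted = np.sum(w * y) / np.sum(w)
    return float(unweighted), float(weighted), float(weighted - unweighted)

y = [1.0, 2.0, 3.0, 4.0]
w = [1.0, 1.0, 1.0, 3.0]
print(weighting_sensitivity(y, w))  # (2.5, 3.0, 0.5)
print(kish_effective_n(w))          # 3.0
```

A large gap between the two means points to selectivity that the weights are correcting; a small effective sample size relative to the nominal one signals that the correction comes at a real cost in precision.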


##### References

Passages from the guide that cite its key references:

- "…of sample inclusion or differential nonresponse, some authors have suggested that the weight variable itself be included as a covariate in the model specification as further protection against sample selectivity that may not be captured in the observed covariate controls (Rubin, 1996)."
- "…population estimation and inference, ABCD has used a second weighting method that is closely related to the inverse propensity score weighting methodology (IPSW) that is employed to reduce confounding and estimate average treatment effects (ATE) from observational data (Austin and Stuart, 2015)."
- "…attention paid to weighted estimation of multi-level models for observational data both in the context of inverse probability weighting (IPW) for exposure probability in estimates of treatment effects (Austin and Stuart, 2015) or in analysis of multi-level data (Pfeffermann, et al., 1998)."
- "Weighted estimation of multi-level models requires a disaggregation and scaling of the individual propensity or population weights for each level of the model (Rabe-Hesketh and Skrondal, 2006)."
- "Presently, there is no empirical evidence from preliminary comparative analysis trials (results not reported here) that methods for multi-level weighting (Rabe-Hesketh and Skrondal, 2006) will improve the accuracy or precision of the model fit although additional research on this topic is ongoing."