Showing papers by "George E. Dahl" published in 2022

Open Access

Journal Article•DOI•

Adaptive Gradient Methods at the Edge of Stability

Jeremy M. Cohen, Behrooz Ghorbani, S. Kartik Krishnan, Naman Agarwal, Sourabh Medapati, Michal Badura, Daniel Suo, David Enrique Fabrega Cardoze, Zachary Nado, George E. Dahl, Justin Gilmer

29 Jul 2022-arXiv.org

TL;DR: This paper sheds light on the behavior of adaptive gradient methods in the full-batch and sufficiently large batch settings, and empirically demonstrates that during full-batch training the maximum eigenvalue of the preconditioned Hessian typically equilibrates at a certain numerical value: the stability threshold of a gradient descent algorithm.

Abstract: Very little is known about the training dynamics of adaptive gradient methods like Adam in deep learning. In this paper, we shed light on the behavior of these algorithms in the full-batch and sufficiently large batch settings. Specifically, we empirically demonstrate that during full-batch training, the maximum eigenvalue of the preconditioned Hessian typically equilibrates at a certain numerical value -- the stability threshold of a gradient descent algorithm. For Adam with step size $\eta$ and $\beta_1 = 0.9$, this stability threshold is $38/\eta$. Similar effects occur during minibatch training, especially as the batch size grows. Yet, even though adaptive methods train at the ``Adaptive Edge of Stability'' (AEoS), their behavior in this regime differs in a significant way from that of non-adaptive methods at the EoS. Whereas non-adaptive algorithms at the EoS are blocked from entering high-curvature regions of the loss landscape, adaptive gradient methods at the AEoS can keep advancing into high-curvature regions, while adapting the preconditioner to compensate. Our findings can serve as a foundation for the community's future understanding of adaptive gradient methods in deep learning.
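
The 38/η threshold quoted in the abstract can be sanity-checked on a toy problem. The sketch below is not the paper's experimental setup. It assumes a one-dimensional quadratic loss, a frozen (identity) preconditioner, and gradient descent with EMA-style momentum. For that toy case a standard linear stability analysis gives a curvature threshold of (2 + 2β1) / ((1 − β1) η), which evaluates to 38/η when β1 = 0.9. The function names are illustrative only.

```python
def ema_momentum_threshold(eta, beta1):
    """Largest stable curvature for EMA-momentum GD on a quadratic (toy analysis)."""
    return (2.0 + 2.0 * beta1) / ((1.0 - beta1) * eta)


def final_distance(curvature, eta, beta1, steps=200):
    """Run EMA-momentum GD on f(x) = curvature * x**2 / 2 and return the final |x|."""
    x, m = 1.0, 0.0
    for _ in range(steps):
        grad = curvature * x
        m = beta1 * m + (1.0 - beta1) * grad  # exponential moving average of gradients
        x -= eta * m
    return abs(x)


eta, beta1 = 0.1, 0.9
threshold = ema_momentum_threshold(eta, beta1)  # equals 38 / eta for beta1 = 0.9
below = final_distance(0.9 * threshold, eta, beta1)  # curvature under the threshold
above = final_distance(1.1 * threshold, eta, beta1)  # curvature over the threshold
print(threshold, below, above)
```

In this toy setting the iterate shrinks towards zero when the curvature sits below the threshold and blows up when it sits above, which is the sense in which the threshold marks an edge of stability.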

16 citations

Proceedings Article•

A Loss Curvature Perspective on Training Instabilities of Deep Learning Models

Justin Gilmer, Behrooz Ghorbani, A. Kumar Garg, Sneha Kudugunta, Behnam Neyshabur, David Enrique Fabrega Cardoze, George E. Dahl, Zachary Nado, Orhan Firat

10 citations

Journal Article•DOI•

Pre-training helps Bayesian optimization too

Zichen Wang, George E. Dahl, Kevin Jordan Swersky, Chansoo Lee, Zelda Mariet, Zachary Nado, Justin Gilmer, J. Snoek, Zoubin Ghahramani

07 Jul 2022-arXiv.org

TL;DR: In this paper, a Gaussian process prior is pre-trained on data from similar functions, targeting hyperparameter tuning problems on complex machine learning models, where the landscapes of tuning objectives are often difficult to comprehend and selecting a prior is not an easy task.

Abstract: Bayesian optimization (BO) has become a popular strategy for global optimization of many expensive real-world functions. Contrary to a common belief that BO is suited to optimizing black-box functions, it actually requires domain knowledge on characteristics of those functions to deploy BO successfully. Such domain knowledge often manifests in Gaussian process priors that specify initial beliefs on functions. However, even with expert knowledge, it is not an easy task to select a prior. This is especially true for hyperparameter tuning problems on complex machine learning models, where landscapes of tuning objectives are often difficult to comprehend. We seek an alternative practice for setting these functional priors. In particular, we consider the scenario where we have data from similar functions that allow us to pre-train a tighter distribution a priori. To verify our approach in realistic model training setups, we collected a large multi-task hyperparameter tuning dataset by training tens of thousands of configurations of near-state-of-the-art models on popular image and text datasets, as well as a protein sequence dataset. Our results show that on average, our method is able to locate good hyperparameters at least 3 times more efficiently than the best competing methods.

5 citations

Journal Article•DOI•

A mobile-optimized artificial intelligence system for gestational age and fetal malpresentation assessment

[...]

11 Oct 2022-Communications medicine

TL;DR: In this article, the authors investigated the use of artificial intelligence for fetal ultrasound in under-resourced settings and developed artificial intelligence (AI) models that used blind sweeps to predict gestational age and fetal malpresentation.

Abstract: Fetal ultrasound is an important component of antenatal care, but shortage of adequately trained healthcare workers has limited its adoption in low-to-middle-income countries. This study investigated the use of artificial intelligence for fetal ultrasound in under-resourced settings. Blind sweep ultrasounds, consisting of six freehand ultrasound sweeps, were collected by sonographers in the USA and Zambia, and novice operators in Zambia. We developed artificial intelligence (AI) models that used blind sweeps to predict gestational age (GA) and fetal malpresentation. AI GA estimates and standard fetal biometry estimates were compared to a previously established ground truth, and evaluated for difference in absolute error. Fetal malpresentation (non-cephalic vs cephalic) was compared to sonographer assessment. On-device AI model run-times were benchmarked on Android mobile phones. Here we show that GA estimation accuracy of the AI model is non-inferior to standard fetal biometry estimates (error difference -1.4 ± 4.5 days, 95% CI -1.8, -0.9, n = 406). Non-inferiority is maintained when blind sweeps are acquired by novice operators performing only two of six sweep motion types. Fetal malpresentation AUC-ROC is 0.977 (95% CI, 0.949, 1.00, n = 613); sonographers and novices have similar AUC-ROC. Software run-times on mobile phones for both diagnostic models are less than 3 s after completion of a sweep. The gestational age model is non-inferior to the clinical standard and the fetal malpresentation model has high AUC-ROCs across operators and devices. Our AI models are able to run on-device, without internet connectivity, and provide feedback scores to assist in upleveling the capabilities of lightly trained ultrasound operators in low resource settings.
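
The reported interval can be roughly reproduced from the summary statistics alone. The sketch below assumes that "± 4.5" is the standard deviation of the paired differences in absolute error and that a normal approximation is adequate at n = 406. The 1.0-day margin follows the a priori criterion described in the preprint text of this study. Comparing the upper confidence bound against that margin is a common convention assumed here, not necessarily the authors' exact procedure. The function names are illustrative.

```python
import math


def mean_difference_ci(mean_diff, sd, n, z=1.96):
    """Normal-approximation confidence interval for a mean of paired differences."""
    half_width = z * sd / math.sqrt(n)
    return mean_diff - half_width, mean_diff + half_width


def is_non_inferior(ci_upper, margin_days):
    """True if the upper bound on (AI error - reference error) is below the margin."""
    return ci_upper < margin_days


# Summary statistics quoted in the abstract, in days.
lower, upper = mean_difference_ci(mean_diff=-1.4, sd=4.5, n=406)
print(round(lower, 2), round(upper, 2))
print(is_non_inferior(upper, margin_days=1.0))  # margin is an assumed input
```

The computed bounds land close to the reported -1.8 and -0.9; small differences are expected because the inputs are themselves rounded.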

4 citations

Journal Article•DOI•

Leave Graphs Alone: Addressing Over-Squashing without Rewiring

Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Andrew J. Ballard, Justin Gilmer, George E. Dahl, Ashish Teku Vaswani, Nicolas Heess, Daniel Pieter Wierstra, Robert F. Garnett, Danai Koutra

13 Dec 2022-arXiv.org

TL;DR: In this article, a graph echo state network (GESN) is proposed to solve the problem of over-squashing in message-passing graph neural networks, where node embeddings are recursively computed by an untrained message-passing function.

Abstract: Recent works have investigated the role of graph bottlenecks in preventing long-range information propagation in message-passing graph neural networks, causing the so-called ‘over-squashing’ phenomenon. As a remedy, graph rewiring mechanisms have been proposed as preprocessing steps. Graph Echo State Networks (GESNs) are a reservoir computing model for graphs, where node embeddings are recursively computed by an untrained message-passing function. In this paper, we show that GESNs can achieve a significantly better accuracy on six heterophilic node classification tasks without altering the graph connectivity, thus suggesting a different route for addressing the over-squashing problem.
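
To illustrate what "recursively computed by an untrained message-passing function" can look like, here is a minimal sketch. It assumes a tanh nonlinearity, sum aggregation over neighbours, and small random recurrent weights so that the recursion settles. The function name, weight ranges, and hyperparameters are hypothetical and not taken from the paper. In reservoir computing only a readout on top of such embeddings is usually trained; that step is omitted here.

```python
import math
import random


def gesn_embeddings(adjacency, features, hidden=4, iterations=20, scale=0.1, seed=0):
    """Iterate a fixed random message-passing map and return one embedding per node."""
    rng = random.Random(seed)
    n_feat = len(features[0])
    # Untrained weights: drawn once at random and never updated.
    w_in = [[rng.uniform(-1, 1) for _ in range(n_feat)] for _ in range(hidden)]
    w_rec = [[rng.uniform(-scale, scale) for _ in range(hidden)] for _ in range(hidden)]
    state = [[0.0] * hidden for _ in adjacency]
    for _ in range(iterations):
        new_state = []
        for node, neighbours in enumerate(adjacency):
            # Sum the previous embeddings of the node's neighbours.
            agg = [sum(state[u][j] for u in neighbours) for j in range(hidden)]
            row = []
            for i in range(hidden):
                pre = sum(w_in[i][f] * features[node][f] for f in range(n_feat))
                pre += sum(w_rec[i][j] * agg[j] for j in range(hidden))
                row.append(math.tanh(pre))
            new_state.append(row)
        state = new_state
    return state


# Path graph 0-1-2-3 given as adjacency lists, with one input feature per node.
path_graph = [[1], [0, 2], [1, 3], [2]]
node_features = [[1.0], [0.0], [0.0], [1.0]]
embeddings = gesn_embeddings(path_graph, node_features)
print(len(embeddings), len(embeddings[0]))
```

The graph connectivity is used as given; nothing in the sketch adds or removes edges.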

1 citation

A loss curvature perspective on training instability in deep learning

Behrooz Ghorbani, A. Kumar Garg, Sneha Kudugunta, Behnam Neyshabur, David Enrique Fabrega Cardoze, George E. Dahl, Zachary Nado, Orhan Firat

TL;DR: In this paper, the authors study the effect of the curvature of the loss on the training dynamics and demonstrate that successful model and hyperparameter choices allow the early optimization trajectory to either avoid, or navigate out of, regions of high curvature and into flatter regions that tolerate a higher learning rate.

Abstract: In this work, we study the evolution of the loss Hessian across many classification tasks in order to understand the effect the curvature of the loss has on the training dynamics. Whereas prior work has focused on how different learning rates affect the loss Hessian observed during training, we also analyze the effects of model initialization, architectural choices, and common training heuristics such as gradient clipping and learning rate warmup. Our results demonstrate that successful model and hyperparameter choices allow the early optimization trajectory to either avoid, or navigate out of, regions of high curvature and into flatter regions that tolerate a higher learning rate. Our results suggest a unifying perspective on how disparate mitigation strategies for training instability ultimately address the same underlying failure mode of neural network optimization, namely poor conditioning. Inspired by the conditioning perspective, we show that learning rate warmup can improve training stability just as much as batch normalization, layer normalization, MetaInit, GradInit, and Fixup initialization.
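
The link between curvature and tolerable learning rate can be made concrete on a quadratic loss, where plain gradient descent is stable only when the learning rate is below 2 divided by the largest Hessian eigenvalue. The sketch below estimates that eigenvalue by power iteration. The 2x2 matrix is a hand-written stand-in for a Hessian, not data from the paper, and the function names are illustrative.

```python
def matvec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def top_eigenvalue(matrix, iterations=100):
    """Estimate the largest eigenvalue of a symmetric PSD matrix by power iteration."""
    vector = [1.0] + [0.5] * (len(matrix) - 1)
    for _ in range(iterations):
        product = matvec(matrix, vector)
        norm = sum(p * p for p in product) ** 0.5
        vector = [p / norm for p in product]
    # Rayleigh quotient of the unit-norm vector after the final iteration.
    return sum(v * p for v, p in zip(vector, matvec(matrix, vector)))


hessian = [[3.0, 1.0], [1.0, 3.0]]  # toy Hessian with eigenvalues 2 and 4
sharpness = top_eigenvalue(hessian)
max_stable_lr = 2.0 / sharpness  # above this, GD diverges on the quadratic
print(sharpness, max_stable_lr)
```

Under this view, moving into a flatter region (smaller top eigenvalue) raises the largest learning rate that gradient descent can tolerate.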

Bad priors, their threats to Bayesian optimization and some remedies via prior learning

Zichen Wang, George E. Dahl, Kevin Jordan Swersky, Chansoo Lee, Zelda Mariet, Zachary Nado, Justin Gilmer, J. Snoek, Zoubin Ghahramani

TL;DR: This work focuses on the common but potentially costly task of tuning optimizer parameters for training neural networks, and develops practical improvements that boost its performance by leveraging tuning results on multiple tasks without requiring observations for the same meta-parameter points across all tasks.

Abstract: The performance of deep neural networks can be highly sensitive to the choice of a variety of meta-parameters, such as optimizer parameters and model hyperparameters. Tuning these well, however, often requires extensive and costly experimentation. Bayesian optimization (BO) is a principled approach to solve such expensive hyperparameter tuning problems efficiently. Key to the performance of BO is specifying and refining a distribution over functions, which is used to reason about the optima of the underlying function being optimized. In this work, we consider the scenario where we have data from similar functions that allows us to specify a tighter distribution a priori. Specifically, we focus on the common but potentially costly task of tuning optimizer parameters for training neural networks. Building on the meta BO method from Wang et al. (2018b), we develop practical improvements that (a) boost its performance by leveraging tuning results on multiple tasks without requiring observations for the same meta-parameter points across all tasks, and (b) retain its regret bound for a special case of our method. As a result, we provide a coherent BO solution for iterative optimization of continuous optimizer parameters. To verify our approach in realistic model training setups, we collected a large multi-task hyperparameter tuning dataset by training tens of thousands of configurations of near-state-of-the-art models on popular image and text datasets, as well as a protein sequence dataset. Our results show that on average, our method is able to locate good hyperparameters at least 3 times more efficiently than the best competing methods.

Journal Article•DOI•

AI system for fetal ultrasound in low-resource settings

[...]

18 Mar 2022-arXiv.org

TL;DR: An artificial intelligence (AI) system is developed and validated that uses novice-acquired “blind sweep” ultrasound videos to estimate gestational age (GA) and fetal malpresentation, and model performance is shown to generalize to minimally trained novice ultrasound operators using low cost ultrasound devices with on-device AI integration.

Abstract: Despite considerable progress in maternal healthcare, maternal and perinatal deaths remain high in low-to-middle income countries. Fetal ultrasound is an important component of antenatal care, but shortage of adequately trained healthcare workers has limited its adoption. We developed and validated an artificial intelligence (AI) system that uses novice-acquired “blind sweep” ultrasound videos to estimate gestational age (GA) and fetal malpresentation. We further addressed obstacles that may be encountered in low-resourced settings. Using a simplified sweep protocol with real-time AI feedback on sweep quality, we have demonstrated the generalization of model performance to minimally trained novice ultrasound operators using low cost ultrasound devices with on-device AI integration. The GA model was non-inferior to standard fetal biometry estimates with as few as two sweeps, and the fetal malpresentation model had high AUC-ROCs across operators and devices. Our AI models have the potential to assist in upleveling the capabilities of lightly trained ultrasound operators in low resource settings.

Introduction

Despite considerable progress in maternal healthcare in recent decades, maternal and perinatal deaths remain high with 295,000 maternal deaths during and following pregnancy and 2.4 million neonatal deaths each year. The majority of these deaths occur in low-to-middle-income countries (LMICs) [1–3]. The lack of antenatal care and limited access to facilities that can provide lifesaving treatment for the mother, fetus and newborn contribute to inequities in quality of care and outcomes in these regions [4,5]. Obstetric ultrasound is an important component of quality antenatal care.
The WHO recommends one routine early ultrasound scan for all pregnant women, but up to 50% of women in developing countries receive no ultrasound screening during pregnancy [6]. Fetal ultrasounds can be used to estimate gestational age (GA), which is critical in scheduling and planning for screening tests throughout pregnancy and interventions for pregnancy complications such as preeclampsia and preterm labor. Fetal ultrasounds later in pregnancy can also be used to diagnose fetal malpresentation, which affects up to 3-4% of pregnancies at term and is associated with trauma-related injury during birth, perinatal mortality, and maternal morbidity [7–11].

Though ultrasound devices have traditionally been costly, the recent commercial availability of low-cost, battery powered handheld devices could greatly expand access [12–14]. However, current ultrasound training programs require months of supervised evaluation as well as indefinite continuing education visits for quality assurance [13–18]. To address these barriers, prior studies have introduced a protocol where fetal ultrasounds can be acquired by minimally trained operators via a “blind sweep” protocol, consisting of 6 predefined freehand sweeps over the abdomen [19–23].

In this study, we used two prospectively collected fetal ultrasound datasets to estimate gestational age and fetal malpresentation while demonstrating key considerations for use by novice users in LMICs: a) validating that it is possible to build blind sweep GA and fetal malpresentation models that run in real-time on mobile devices; b) evaluating generalization of these models to minimally trained ultrasound operators and low cost ultrasound devices; c) describing a modified 2-sweep blind sweep protocol to simplify novice acquisition; d) adding feedback scores to provide real-time information on sweep quality.

Blind sweep procedure

Blind sweep ultrasounds consisted of a fixed number of predefined freehand ultrasound sweeps over the gravid abdomen.
Certified sonographers completed up to 15 sweeps. Novice operators (“novices”), with 8 hours of blind sweep ultrasound acquisition training, completed 6 sweeps. Evaluation of both sonographers and novices was limited to a set of 6 sweeps: 3 vertical and 3 horizontal (Figure 1B).

Fetal Age Machine Learning Initiative (FAMLI) and Novice User Study Datasets

Data was analyzed from the Fetal Age Machine Learning Initiative cohort, which collected ultrasound data from study sites at Chapel Hill, NC (USA) and the Novice User Study collected from Lusaka, Zambia (Figure 1A) [24]. The goal of this prospectively collected dataset was to empower development of technology to estimate gestational age [25]. Data collection occurred between September 2018 and June 2021. All study participants provided written informed consent, and the research was approved by the UNC institutional review board and the biomedical research ethics committee at the University of Zambia. Studies also included standard clinical assessments of GA and fetal malpresentation performed by a trained sonographer [26]. Blind sweep data were collected with standard ultrasound devices (SonoSite M-Turbo or GE Voluson) as well as a low cost portable ultrasound device (ButterflyIQ).

Evaluation was performed on the FAMLI (sonographer-acquired) and Novice User Study (novice-acquired) datasets. Test sets consisted of patients independent of those used for AI development (Figure 1A). For our GA model evaluation, the primary FAMLI test set comprised 407 women in 657 study visits in the USA. A second test set, “Novice User Study”, included 114 participants in 140 study visits in Zambia. Novice blind sweep studies were exclusively performed at Zambian sites. Sweeps collected with standard ultrasound devices were available for 406 of 407 participants in the sonographer-acquired test set, and 112 of 114 participants in the novice-acquired test set.
Sweeps collected with the low cost device were available for 104 of 407 participants in the sonographer-acquired test set, and 56 of 114 participants in the novice-acquired test set. Analyzable data from the low cost device became available later during the study, and this group of patients is representative of the full patient set. We randomly selected one study visit per patient for each analysis group to avoid combining correlated measurements from the same patient. For our fetal malpresentation model, the test set included 613 patients from the sonographer-acquired and novice-acquired datasets, resulting in 65 instances of non-cephalic presentation (10.6%). For each patient, the last study visit of the third trimester was included. Of note, there are more patients in the malpresentation model test set since the ground truth is not dependent on a prior visit. The disposition of study participants is summarized in STARD diagrams (Extended Data Figure 1) and Extended Data Table 1.

Mobile-device-optimized AI gestational age and fetal malpresentation estimation

We calculated the mean difference in absolute error between the GA model estimate and estimated gestational age as determined by standard fetal biometry measurements using imaging from traditional ultrasound devices operated by sonographers [26]. The reference ground truth GA was established as described above (Figure 1A). When conducting pairwise statistical comparisons between blind sweep and standard fetal biometry absolute errors, we established an a priori criterion for non-inferiority, which was confirmed if the blind sweep mean absolute error (MAE) was less than 1.0 day greater than the standard fetal biometry’s MAE. Statistical estimates and comparisons were computed after randomly selecting one study visit per patient for each analysis group, to avoid combining correlated measurements from the same patient.
We conducted a supplemental analysis of GA model prediction error with mixed effects regression on all test data, combining sonographer-acquired and novice-acquired test sets. Fixed effect terms accounted for the ground truth GA, the type of ultrasound machine used (standard vs. low cost), and the training level of the ultrasound operator (sonographer vs. novice). All patient studies were included in the analysis, and random effects terms accounted for intra-patient and intra-study effects. GA analysis results are summarized in Table 1. The MAE for the GA model estimate with blind sweeps collected by sonographers using standard ultrasound devices was significantly lower than the MAE for the standard fetal biometry estimates (mean difference -1.4 ± 4.5 days, 95% CI -1.8, -0.9 days). There was a trend towards increasing error for blind sweep and standard fetal biometry procedures with gestational week (Figure 2, top left).

The accuracy of the fetal malpresentation model for predicting non-cephalic fetal presentation from third trimester blind sweeps was assessed using a reference standard determined by sonographers equipped with traditional ultrasound imagery (described above). We selected the latest study visit in the third trimester for each patient. Data from sweeps performed by the sonographers and novices were analyzed separately. We evaluated the fetal malpresentation model’s area under the receiver operating characteristic curve (AUC-ROC) on the test set in addition to non-cephalic sensitivity and specificity. The fetal malpresentation model attained an AUC-ROC of 0.977 (95% CI 0.949, 1.00), sensitivity of 0.938 (95% CI 0.848, 0.983), and specificity of 0.973 (95% CI 0.955, 0.985) (Table 2 and Figure 3).

Generalization of GA and malpresentation estimation to novices

Our models were trained on up to 15 blind sweeps per study performed by sonographers. No novice-acquired blind sweeps were used to train our models.
We assessed GA model generalization to blind sweeps performed by novice operators that performed 6 sweeps. We compared the MAE between novice-performed blind sweep AI estimates and the standard fetal biometry. For the malpresentation model, we reported the AUC-ROC for blind sweeps performed by novices, along with the sensitivity and specificity at the same operating point used for evaluating blind sweeps performed by sonographers. In this novice-acquired dataset, the difference in MAE between blind sweep AI estimates and the standard fetal bio