
Showing papers by "Madhav V. Marathe" published in 2020


Journal ArticleDOI
TL;DR: A coarse taxonomy of models is discussed and the context and significance of the Imperial College and other models in contributing to the analysis of COVID-19 are explored.

1,189 citations


Journal ArticleDOI
TL;DR: This article reviews some of the important mathematical models used to support the ongoing planning and response efforts in the COVID-19 pandemic and discusses their use, their mathematical form and their scope.
Abstract: The COVID-19 pandemic represents a global health crisis unprecedented in the last 100 years. Its economic, social and health impact continues to grow and is likely to end up as one of the worst global disasters since the 1918 pandemic and the World Wars. Mathematical models have played an important role in the ongoing crisis; they have been used to inform public policies and have been instrumental in many of the social distancing measures that were instituted worldwide. In this article, we review some of the important mathematical models used to support the ongoing planning and response efforts. These models differ in their use, their mathematical form and their scope.

100 citations


Posted ContentDOI
02 Mar 2020-medRxiv
TL;DR: A well-established metric known as effective distance is applied to global air traffic data from IATA to quantify the risk of emergence for different countries as a consequence of direct importation from China, and the result is compared against arrival times for the first 24 countries.
Abstract: Global airline networks play a key role in the global importation of emerging infectious diseases. Detailed information on air traffic between international airports has been demonstrated to be useful in retrospectively validating and prospectively predicting case emergence in other countries. In this paper, we use a well-established metric known as effective distance on the global air traffic data from IATA to quantify risk of emergence for different countries as a consequence of direct importation from China, and compare it against arrival times for the first 24 countries. Using this model trained on official first reports from WHO, we estimate time of arrival (ToA) for all other countries. We then incorporate data on airline suspensions to recompute the effective distance and assess the effect of such cancellations in delaying the estimated arrival time for all other countries. Finally we use the infectious disease vulnerability indices to explain some of the estimated reporting delays.

36 citations
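
The effective-distance metric named in the entry above can be illustrated with a minimal sketch. It assumes the widely used definition in which the length of a link from m to n is 1 - ln(P_mn), with P_mn the fraction of travellers leaving m who go to n, and the effective distance is the shortest path under that length. The abstract does not state the formula, so this definition, the function name `effective_distances`, and the toy flux table are assumptions for illustration, not the authors' implementation or the IATA data.

```python
import heapq
import math


def effective_distances(flux, source):
    """Shortest-path effective distance from `source` to each reachable node.

    `flux[m][n]` is a hypothetical passenger flux from m to n (non-negative).
    Assumed link length: 1 - ln(P_mn), where P_mn = flux[m][n] / total flux out of m.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, m = heapq.heappop(heap)
        if d > dist.get(m, math.inf):
            continue  # stale heap entry
        outgoing = flux.get(m, {})
        total = sum(outgoing.values())
        for n, f in outgoing.items():
            if f <= 0:
                continue  # no travellers on this link
            step = 1.0 - math.log(f / total)
            if d + step < dist.get(n, math.inf):
                dist[n] = d + step
                heapq.heappush(heap, (d + step, n))
    return dist


# Toy network (made-up numbers): most travellers from A go to B, few go to C.
toy_flux = {"A": {"B": 90, "C": 10}, "B": {"C": 100}}
toy_dist = effective_distances(toy_flux, "A")
```

In this toy network the route from A to C through B is effectively shorter than the direct link, because the direct link carries only a small share of A's outgoing traffic. Recomputing the distances after removing links is one way such a sketch could mimic the effect of airline suspensions described in the abstract.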


Journal ArticleDOI
TL;DR: An individual based model and national level epidemic simulations are used to estimate the medical costs of keeping the US economy open during COVID-19 pandemic under different counterfactual scenarios and show the tradeoffs between deaths, costs, infections, compliance and the duration of stay-home order.
Abstract: We use an individual based model and national level epidemic simulations to estimate the medical costs of keeping the US economy open during COVID-19 pandemic under different counterfactual scenarios. We model an unmitigated scenario and 12 mitigation scenarios which differ in compliance behavior to social distancing strategies and in the duration of the stay-home order. Under each scenario we estimate the number of people who are likely to get infected and require medical attention, hospitalization, and ventilators. Given the per capita medical cost for each of these health states, we compute the total medical costs for each scenario and show the tradeoffs between deaths, costs, infections, compliance and the duration of stay-home order. We also consider the hospital bed capacity of each Hospital Referral Region (HRR) in the US to estimate the deficit in beds each HRR will likely encounter given the demand for hospital beds. We consider a case where HRRs share hospital beds among the neighboring HRRs during a surge in demand beyond the available beds and the impact it has in controlling additional deaths.

34 citations


Journal ArticleDOI
TL;DR: Theory-guided Deep Learning-based Epidemic Forecasting with Synthetic Information (TDEFSI) is presented, an epidemic forecasting framework that integrates the strengths of deep neural networks and high-resolution simulations of epidemic processes over networks; it performs comparably to or better than several state-of-the-art methods at the state level and outperforms the other methods at the county level in the USA.
Abstract: Influenza-like illness (ILI) places a heavy social and economic burden on our society. Traditionally, ILI surveillance data are updated weekly and provided at a spatially coarse resolution. Producing timely and reliable high-resolution spatiotemporal forecasts for ILI is crucial for local preparedness and optimal interventions. We present Theory-guided Deep Learning-based Epidemic Forecasting with Synthetic Information (TDEFSI), an epidemic forecasting framework that integrates the strengths of deep neural networks and high-resolution simulations of epidemic processes over networks. TDEFSI yields accurate high-resolution spatiotemporal forecasts using low-resolution time-series data. During the training phase, TDEFSI uses high-resolution simulations of epidemics that explicitly model spatial and social heterogeneity inherent in urban regions as one component of training data. We train a two-branch recurrent neural network model to take both within-season and between-season low-resolution observations as features and output high-resolution detailed forecasts. The resulting forecasts are not just driven by observed data but also capture the intricate social, demographic, and geographic attributes of specific urban regions and mathematical theories of disease propagation over networks. We focus on forecasting the incidence of ILI and evaluate TDEFSI’s performance using synthetic and real-world testing datasets at the state and county levels in the USA. The results show that, at the state level, our method achieves comparable/better performance than several state-of-the-art methods. At the county level, TDEFSI outperforms the other methods. The proposed method can be applied to other infectious diseases as well.

26 citations


Posted ContentDOI
30 Nov 2020-medRxiv
TL;DR: The findings show that a low compliance to interventions can be overcome by a longer shutdown period and vice versa to arrive at similar epidemiological impact but their net effect on economic loss depends on the interplay between the marginal gains from averting infections and deaths, versus the marginal loss from having healthy workers stay at home during the shutdown.
Abstract: This research measures the epidemiological and economic impact of COVID-19 spread in the US under different mitigation scenarios comprising non-pharmaceutical interventions. A detailed disease model of COVID-19 is combined with a model of the US economy to estimate the direct impact of the labor supply shock to each sector arising from morbidity, mortality, and lockdown, as well as the indirect impact caused by the interdependencies between sectors. During a lockdown, estimates of jobs that are workable from home in each sector are used to modify the shock to labor supply. Results show that the trade-offs between economic losses on the one hand and lives saved and infections averted on the other are non-linear in compliance with social distancing and in the duration of the lockdown. The sectors that are worst hit are not the labor-intensive sectors such as Agriculture and Construction, but the ones with high-valued jobs such as Professional Services, even after the teleworkability of jobs is accounted for. Additionally, the findings show that low compliance with interventions can be offset by a longer shutdown period, and vice versa, to arrive at a similar epidemiological impact, but their net effect on economic loss depends on the interplay between the marginal gains from averting infections and deaths and the marginal loss from having healthy workers stay at home during the shutdown.

19 citations


Posted ContentDOI
19 Jul 2020-medRxiv
TL;DR: An individual based model and national level epidemic simulations are used to estimate the medical costs of keeping the US economy open during COVID-19 pandemic under different counterfactual scenarios and show the tradeoffs between deaths, costs, infections, compliance and the duration of stay-home order.
Abstract: We use an individual based model and national level epidemic simulations to estimate the medical costs of keeping the US economy open during COVID-19 pandemic under different counterfactual scenarios. We model an unmitigated scenario and 12 mitigation scenarios which differ in compliance behavior to social distancing strategies and to the duration of the stay-home order. Under each scenario we estimate the number of people who are likely to get infected and require medical attention, hospitalization, and ventilators. Given the per capita medical cost for each of these health states, we compute the total medical costs for each scenario and show the tradeoffs between deaths, costs, infections, compliance and the duration of stay-home order. We also consider the hospital bed capacity of each Hospital Referral Region (HRR) in the US to estimate the deficit in beds each HRR will likely encounter given the demand for hospital beds. We consider a case where HRRs share hospital beds among the neighboring HRRs during a surge in demand beyond the available beds and the impact it has in controlling additional deaths.

15 citations


Journal ArticleDOI
TL;DR: The observation that increased team performance in the game, resulting in increased monetary earnings for all players, did not produce a measured increase in collective identity among them, serves as an exemplar of using abductive looping in the social sciences.
Abstract: Group or collective identity is an individual’s cognitive, moral, and emotional connection with a broader community, category, practice, or institution. There are many different contexts in which collective identity operates, and a host of application domains where collective identity is important. Collective identity is studied across myriad academic disciplines. Consequently, there is interest in understanding the collective identity formation process. In laboratory and other settings, collective identity is fostered through priming a group of human subjects. However, there has been no work on developing agent-based models for simulating collective identity formation processes. Our focus is understanding a game that is designed to produce collective identity within a group. To study this process, we build an online game platform; perform and analyze controlled laboratory experiments involving teams; build, exercise, and evaluate network-based agent-based models; and form and evaluate hypotheses about collective identity. We conduct these steps in multiple abductive iterations of experiments and modeling to improve our understanding of collective identity as this looping process unfolds. Our work serves as an exemplar of using abductive looping in the social sciences. Findings on collective identity include the observation that increased team performance in the game, resulting in increased monetary earnings for all players, did not produce a measured increase in collective identity among them.

15 citations


Posted ContentDOI
24 Aug 2020-medRxiv
TL;DR: This work is the first to model in near real-time, the interplay of human mobility, epidemic dynamics and public policies across multiple spatial resolutions and at a global scale and finds that population mixing has decreased considerably as the pandemic has progressed.
Abstract: This work quantifies mobility changes observed during the different phases of the pandemic world-wide at multiple resolutions -- county, state, country -- using an anonymized aggregate mobility map that captures population flows between geographic cells of size 5 km². As we overlay the global mobility map with epidemic incidence curves and dates of government interventions, we observe that as case counts rose, mobility fell and has since then seen a slow but steady increase in flows. Further, in order to understand mixing within a region, we propose a new metric to quantify the effect of social distancing on the basis of mobility. Taking two very different countries sampled from the global spectrum, we analyze in detail the mobility patterns of the United States (US) and India. We then carry out a counterfactual analysis of delaying the lockdown and show that a one week delay would have doubled the reported number of cases in the US and India. Finally, we quantify the effect of college students returning to school for the fall semester on COVID-19 dynamics in the surrounding community. We employ the data from a recent university outbreak (reported on August 16, 2020) to infer possible R_eff values and mobility flows combined with daily prevalence data and census data to obtain an estimate of new cases that might arrive on a college campus. We find that maintaining social distancing at existing levels would be effective in mitigating the extra seeding of cases. However, potential behavioral change and increased social interaction amongst students (30% increase in R_eff) along with extra seeding can increase the number of cases by 20% over a period of one month in the encompassing county. To our knowledge, this work is the first to model in near real-time, the interplay of human mobility, epidemic dynamics and public policies across multiple spatial resolutions and at a global scale.

14 citations


Journal ArticleDOI
TL;DR: In this article, the authors survey the data landscape around COVID-19, with a focus on how such datasets have aided modeling and response through different stages so far in the pandemic.
Abstract: Some of the key questions of interest during the COVID-19 pandemic (and all outbreaks) include: where did the disease start, how is it spreading, who are at risk, and how to control the spread. There are a large number of complex factors driving the spread of pandemics, and, as a result, multiple modeling techniques play an increasingly important role in shaping public policy and decision-making. As different countries and regions go through phases of the pandemic, the questions and data availability also change. Especially of interest is aligning model development and data collection to support response efforts at each stage of the pandemic. The COVID-19 pandemic has been unprecedented in terms of real-time collection and dissemination of a number of diverse datasets, ranging from disease outcomes, to mobility, behaviors, and socio-economic factors. The data sets have been critical from the perspective of disease modeling and analytics to support policymakers in real time. In this overview article, we survey the data landscape around COVID-19, with a focus on how such datasets have aided modeling and response through different stages so far in the pandemic. We also discuss some of the current challenges and the needs that will arise as we plan our way out of the pandemic.

13 citations


Posted ContentDOI
15 Dec 2020-medRxiv
TL;DR: This work proposes a recurrent message passing graph neural network that embeds spatio-temporal disease dynamics and human mobility dynamics for daily state-level new confirmed cases forecasting and shows that the spatial and temporal dynamic mobility graph leveraged by the graph neural network enables better long-term forecasting performance compared to baselines.
Abstract: Disease dynamics, human mobility, and public policies co-evolve during a pandemic such as COVID-19. Understanding dynamic human mobility changes and spatial interaction patterns is crucial for understanding and forecasting COVID-19 dynamics. We introduce a novel graph-based neural network (GNN) to incorporate global aggregated mobility flows for a better understanding of the impact of human mobility on COVID-19 dynamics as well as better forecasting of disease dynamics. We propose a recurrent message passing graph neural network that embeds spatio-temporal disease dynamics and human mobility dynamics for daily state-level new confirmed cases forecasting. This work represents one of the early papers on the use of GNNs to forecast COVID-19 incidence dynamics and our methods are competitive with existing methods. We show that the spatial and temporal dynamic mobility graph leveraged by the graph neural network enables better long-term forecasting performance compared to baselines.

Posted Content
TL;DR: This article reviews some of the important mathematical models used to support the ongoing planning and response efforts in the COVID-19 pandemic and discusses their use, their mathematical form and their scope.
Abstract: The COVID-19 pandemic represents a global health crisis unprecedented in the last 100 years. Its economic, social and health impact continues to grow and is likely to end up as one of the worst global disasters since the 1918 pandemic and the World Wars. Mathematical models have played an important role in the ongoing crisis; they have been used to inform public policies and have been instrumental in many of the social distancing measures that were instituted worldwide. In this article we review some of the important mathematical models used to support the ongoing planning and response efforts. These models differ in their use, their mathematical form and their scope.

Posted Content
TL;DR: This work designs and analyzes multiple recurrent neural network-based deep learning models and combines them using the stacking ensemble technique to incorporate the effects of multiple factors in COVID-19 spread and proposes clustering-based training for high-resolution forecasting.
Abstract: The COVID-19 pandemic represents the most significant public health disaster since the 1918 influenza pandemic. During pandemics such as COVID-19, timely and reliable spatio-temporal forecasting of epidemic dynamics is crucial. Deep learning-based time series models for forecasting have recently gained popularity and have been successfully used for epidemic forecasting. Here we focus on the design and analysis of deep learning-based models for COVID-19 forecasting. We implement multiple recurrent neural network-based deep learning models and combine them using the stacking ensemble technique. In order to incorporate the effects of multiple factors in COVID-19 spread, we consider multiple sources such as COVID-19 confirmed and death case count data and testing data for better predictions. To overcome the sparsity of training data and to address the dynamic correlation of the disease, we propose clustering-based training for high-resolution forecasting. The methods help us to identify the similar trends of certain groups of regions due to various spatio-temporal effects. We examine the proposed method for forecasting weekly COVID-19 new confirmed cases at county-, state-, and country-level. A comprehensive comparison between different time series models in COVID-19 context is conducted and analyzed. The results show that simple deep learning models can achieve comparable or better performance when compared with more complicated models. We are currently integrating our methods as a part of our weekly forecasts that we provide state and federal authorities.

Posted ContentDOI
31 Oct 2020-medRxiv
TL;DR: The utility of the COVID-19 dashboard is illustrated by describing how it can be used to support data story-telling - an important emerging area in data science.
Abstract: The COVID-19 pandemic brought to the forefront an unprecedented need for experts, as well as citizens, to visualize spatio-temporal disease surveillance data. Web application dashboards were quickly developed to fill this gap, including those built by JHU, WHO, and CDC, but all of these dashboards supported a particular niche view of the pandemic (i.e., current status or specific regions). In this paper, we describe our work developing our own COVID-19 Surveillance Dashboard, available at https://nssac.bii.virginia.edu/covid-19/dashboard/, which offers a universal view of the pandemic while also allowing users to focus on the details that interest them. From the beginning, our goal was to provide a simple visual way to compare, organize, and track near-real-time surveillance data as the pandemic progresses. Our dashboard includes a number of advanced features for zooming, filtering, categorizing and visualizing multiple time series on a single canvas. In developing this dashboard, we have also identified 6 key metrics we call the 6Cs standard which we propose as a standard for the design and evaluation of real-time epidemic science dashboards. Our dashboard was one of the first released to the public, and remains one of the most visited and highly used. Our group uses it to support federal, state and local public health authorities, and it is used by people worldwide to track the pandemic evolution, build their own dashboards, and support their organizations as they plan their responses to the pandemic. We illustrate the utility of our dashboard by describing how it can be used to support data story-telling - an important emerging area in data science.

Proceedings ArticleDOI
10 Dec 2020
TL;DR: The COVID-19 Surveillance Dashboard offers a unique view of the pandemic while also allowing users to focus on the details that interest them, in contrast to dashboards that support only a niche view such as current status or specific regions.
Abstract: The COVID-19 pandemic brought to the forefront an unprecedented need for experts, as well as citizens, to visualize spatio-temporal disease surveillance data. Web application dashboards were quickly developed to fill this gap, but all of these dashboards supported a particular niche view of the pandemic (i.e., current status or specific regions). In this paper, we describe our work developing our COVID-19 Surveillance Dashboard, which offers a unique view of the pandemic while also allowing users to focus on the details that interest them. From the beginning, our goal was to provide a simple visual tool for comparing, organizing, and tracking near-real-time surveillance data as the pandemic progresses. In developing this dashboard, we also identified 6 key metrics which we propose as a standard for the design and evaluation of real-time epidemic science dashboards. Our dashboard was one of the first released to the public, and continues to be actively visited. Our own group uses it to support federal, state and local public health authorities, and it is used by individuals worldwide to track the evolution of the COVID-19 pandemic, build their own dashboards, and support their organizations as they plan their responses to the pandemic.

Posted Content
TL;DR: Theory Guided Deep Learning Based Epidemic Forecasting with Synthetic Information (TDEFSI) is an epidemic forecasting framework that integrates the strengths of deep neural networks and high-resolution simulations of epidemic processes over networks.
Abstract: Influenza-like illness (ILI) places a heavy social and economic burden on our society. Traditionally, ILI surveillance data is updated weekly and provided at a spatially coarse resolution. Producing timely and reliable high-resolution spatiotemporal forecasts for ILI is crucial for local preparedness and optimal interventions. We present TDEFSI (Theory Guided Deep Learning Based Epidemic Forecasting with Synthetic Information), an epidemic forecasting framework that integrates the strengths of deep neural networks and high-resolution simulations of epidemic processes over networks. TDEFSI yields accurate high-resolution spatiotemporal forecasts using low-resolution time series data. During the training phase, TDEFSI uses high-resolution simulations of epidemics that explicitly model spatial and social heterogeneity inherent in urban regions as one component of training data. We train a two-branch recurrent neural network model to take both within-season and between-season low-resolution observations as features, and output high-resolution detailed forecasts. The resulting forecasts are not just driven by observed data but also capture the intricate social, demographic and geographic attributes of specific urban regions and mathematical theories of disease propagation over networks. We focus on forecasting the incidence of ILI and evaluate TDEFSI's performance using synthetic and real-world testing datasets at the state and county levels in the USA. The results show that, at the state level, our method achieves comparable/better performance than several state-of-the-art methods. At the county level, TDEFSI outperforms the other methods. The proposed method can be applied to other infectious diseases as well.

Journal ArticleDOI
TL;DR: Analysis indicates that regional trade plays an important role in the spread of T. absoluta, a devastating pest of tomato in Nepal, and a robust network-based approach is proposed to model seasonal flow of agricultural produce and examine its role in pest spread.

Posted ContentDOI
23 Nov 2020-medRxiv
TL;DR: This work focuses on 50 land-grant university counties across the country and shows high correlation between proximity statistics and COVID-19 case rates for several LGUCs during the period around Fall 2020 reopenings, and shows how features such as total population, population affiliated with university, median income and case rate intensity could explain some of the observed high correlation.
Abstract: Reopening of colleges and universities for the Fall semester of 2020 across the United States has caused significant COVID-19 case spikes, requiring reactive responses such as temporary closures and switching to online learning. Until sufficient levels of immunity are reached through vaccination, Institutions of Higher Education will need to balance academic operations with COVID-19 spread risk within and outside the student community. In this work, we study the impact of proximity statistics obtained from high resolution mobility traces in predicting case rate surges in university counties. We focus on 50 land-grant university counties (LGUCs) across the country and show high correlation (PCC > 0.6) between proximity statistics and COVID-19 case rates for several LGUCs during the period around Fall 2020 reopenings. These observations provide a lead time of up to ∼3 weeks in preparing resources and planning containment efforts. We also show how features such as total population, population affiliated with university, median income and case rate intensity could explain some of the observed high correlation. We believe these easily explainable mobility metrics along with other disease surveillance indicators can help universities be better prepared for the Spring 2021 semester.
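
The correlation and lead-time idea in the abstract above can be sketched as a lagged Pearson correlation between two daily series. This is a minimal illustration under stated assumptions: the helper names `pearson` and `best_lead`, the daily sampling, and the choice to pick the lag with the highest coefficient are hypothetical and are not taken from the paper, which does not describe its procedure in this listing.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient (PCC) of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant series
    return sxy / math.sqrt(sxx * syy)


def best_lead(proximity, cases, max_lag):
    """Return (lag, r) with the highest PCC between proximity[t] and cases[t + lag].

    Returns None if no lag yields a defined correlation.
    """
    best = None
    for lag in range(max_lag + 1):
        m = min(len(proximity), len(cases) - lag)
        if m < 2:
            break
        r = pearson(proximity[:m], cases[lag:lag + m])
        if math.isnan(r):
            continue
        if best is None or r > best[1]:
            best = (lag, r)
    return best
```

A lag of up to 21 days would correspond to the roughly three-week lead time mentioned in the abstract; the 0.6 threshold quoted there could then be applied to the returned coefficient.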

Proceedings ArticleDOI
10 Dec 2020
TL;DR: In this article, a deep learning-based time series model was used for epidemic forecasting in the COVID-19 pandemic, where multiple recurrent neural network-based deep learning models were implemented and combined using the stacking ensemble technique.
Abstract: The COVID-19 pandemic represents the most significant public health disaster since the 1918 influenza pandemic. During pandemics such as COVID-19, timely and reliable spatio-temporal forecasting of epidemic dynamics is crucial. Deep learning-based time series models for forecasting have recently gained popularity and have been successfully used for epidemic forecasting. Here we focus on the design and analysis of deep learning-based models for COVID-19 forecasting. We implement multiple recurrent neural network-based deep learning models and combine them using the stacking ensemble technique. In order to incorporate the effects of multiple factors in COVID-19 spread, we consider multiple sources such as COVID-19 confirmed and death case count data and testing data for better predictions. To overcome the sparsity of training data and to address the dynamic correlation of the disease, we propose clustering-based training for high-resolution forecasting. The methods help us to identify the similar trends of certain groups of regions due to various spatio-temporal effects. We examine the proposed method for forecasting weekly COVID-19 new confirmed cases at county-, state-, and country-level. A comprehensive comparison between different time series models in COVID-19 context is conducted and analyzed. The results show that simple deep learning models can achieve comparable or better performance when compared with more complicated models. We are currently integrating our methods as a part of our weekly forecasts that we provide state and federal authorities.

Posted Content
TL;DR: The data landscape around COVID-19 is surveyed, with a focus on how a number of diverse datasets have aided modeling and response through different stages so far in the pandemic.
Abstract: Some of the key questions of interest during the COVID-19 pandemic (and all outbreaks) include: where did the disease start, how is it spreading, who is at risk, and how to control the spread. There are a large number of complex factors driving the spread of pandemics, and, as a result, multiple modeling techniques play an increasingly important role in shaping public policy and decision making. As different countries and regions go through phases of the pandemic, the questions and data availability also change. Especially of interest is aligning model development and data collection to support response efforts at each stage of the pandemic. The COVID-19 pandemic has been unprecedented in terms of real-time collection and dissemination of a number of diverse datasets, ranging from disease outcomes, to mobility, behaviors, and socio-economic factors. The data sets have been critical from the perspective of disease modeling and analytics to support policymakers in real-time. In this overview article, we survey the data landscape around COVID-19, with a focus on how such datasets have aided modeling and response through different stages so far in the pandemic. We also discuss some of the current challenges and the needs that will arise as we plan our way out of the pandemic.

Proceedings ArticleDOI
10 Dec 2020
TL;DR: A methodology is proposed to generate realistic synthetic power distribution networks for a given geographical region; a generated network is not the actual distribution system, but its functionality is very similar to that of the real distribution network.
Abstract: It is well known that physical interdependencies exist between networked civil infrastructures such as transportation and power system networks. In order to analyze complex nonlinear correlations between such networks, datasets pertaining to such real infrastructures are required. However, such data are not readily available due to their proprietary nature. This work proposes a methodology to generate realistic synthetic power distribution networks for a given geographical region. A network generated in this manner is not the actual distribution system, but its functionality is very similar to the real distribution network. The synthetic network connects high voltage substations to individual residential consumers through primary and secondary distribution networks. Here, the distribution network is generated by solving an optimization problem which minimizes the overall length of the network subject to structural and power flow constraints. This work also incorporates identification of long high voltage feeders originating from substations and connecting remotely situated customers in rural geographic locations while maintaining voltage regulation within acceptable limits. The proposed methodology is applied to the state of Virginia and creates synthetic distribution networks which are validated by comparing them to actual power distribution networks at the same location.

Journal ArticleDOI
24 Nov 2020-PLOS ONE
TL;DR: This work describes the design and implementation of a software system to automate many of the steps involved in analyzing social science experimental data, building models to capture the behavior of human subjects, and providing data to test hypotheses.
Abstract: There is large interest in networked social science experiments for understanding human behavior at scale. Significant effort is required to perform data analytics on experimental outputs and for computational modeling of custom experiments. Moreover, experiments and modeling are often performed in a cycle, enabling iterative experimental refinement and data modeling to uncover interesting insights and to generate/refute hypotheses about social behaviors. The current practice for social analysts is to develop tailor-made computer programs and analytical scripts for experiments and modeling. This often leads to inefficiencies and duplication of effort. In this work, we propose a pipeline framework to take a significant step towards overcoming these challenges. Our contribution is to describe the design and implementation of a software system to automate many of the steps involved in analyzing social science experimental data, building models to capture the behavior of human subjects, and providing data to test hypotheses. The proposed pipeline framework consists of formal models, formal algorithms, and theoretical models as the basis for the design and implementation. We propose a formal data model, such that if an experiment can be described in terms of this model, then our pipeline software can be used to analyze data efficiently. The merits of the proposed pipeline framework are elaborated by several case studies of networked social science experiments.

Posted ContentDOI
05 Oct 2020-medRxiv
TL;DR: The Mobility-Augmented SEIR model (MA-SEIR) is developed that leverages Google's aggregate and anonymized mobility data to augment classic compartmental models and provides insight into the role of near real-time aggregate mobility data in disease spread modeling by quantifying substantial changes in how populations move both locally and globally.
Abstract: Timely interventions and early preparedness of healthcare resources are crucial measures to tackle the COVID-19 disease. To aid these efforts, we developed the Mobility-Augmented SEIR model (MA-SEIR) that leverages Google’s aggregate and anonymized mobility data to augment classic compartmental models. We show in a retrospective analysis how this method can be applied at an early stage in the COVID-19 epidemic to forecast its subsequent spread and onset in different geographic regions, with minimal parameterization of the model. This provides insight into the role of near real-time aggregate mobility data in disease spread modeling by quantifying substantial changes in how populations move both locally and globally. These changes would be otherwise very hard to capture using less timely data.
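
The idea of augmenting a compartmental model with mobility data can be illustrated with a minimal sketch. The abstract does not specify how MA-SEIR incorporates the mobility signal, so the code below assumes the simplest possible coupling: a baseline transmission rate multiplied by a daily mobility factor. The function name `simulate_seir`, its parameters, and the scaling rule are all hypothetical and are not taken from the paper.

```python
def simulate_seir(beta0, sigma, gamma, mobility, s0, e0, i0, r0):
    """Discrete-time (daily step) SEIR model with mobility-scaled transmission.

    ASSUMPTION: the transmission rate on each day is beta0 * mobility[day].
    This is an illustrative coupling only, not the MA-SEIR formulation.
    """
    n = s0 + e0 + i0 + r0
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    history = [(s, e, i, r)]
    for m in mobility:
        beta = beta0 * m                 # mobility-adjusted transmission rate
        new_exposed = beta * s * i / n   # S -> E
        new_infectious = sigma * e       # E -> I
        new_recovered = gamma * i        # I -> R
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        history.append((s, e, i, r))
    return history


# Compare a baseline scenario with one where mobility is halved for 60 days.
baseline = simulate_seir(0.5, 0.2, 0.1, [1.0] * 60, 9990, 0, 10, 0)
reduced = simulate_seir(0.5, 0.2, 0.1, [0.5] * 60, 9990, 0, 10, 0)
```

Under these assumptions, the reduced-mobility run leaves more people susceptible after 60 days than the baseline run, which is the qualitative effect a mobility-augmented model is meant to capture.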

Journal ArticleDOI
31 May 2020
TL;DR: This article presents message passing interface-based distributed memory parallel algorithms for generating random scale-free networks using the preferential-attachment model that are experimentally verified to scale very well to a large number of processing elements (PEs), providing near-linear speedups.
Abstract: Recently, there has been substantial interest in the study of various random networks as mathematical models of complex systems. As real-life complex systems grow larger, the ability to generate progressively large random networks becomes all the more important. This motivates the need for efficient parallel algorithms for generating such networks. Naive parallelization of sequential algorithms for generating random networks is inefficient due to inherent dependencies among the edges and the possibility of creating duplicate (parallel) edges. In this article, we present message passing interface-based distributed memory parallel algorithms for generating random scale-free networks using the preferential-attachment model. Our algorithms are experimentally verified to scale very well to a large number of processing elements (PEs), providing near-linear speedups. The algorithms have been exercised with regard to scale and speed to generate scale-free networks with one trillion edges in 6 minutes using 1,000 PEs.
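
For reference, the preferential-attachment model itself can be sketched sequentially. The code below is not the distributed MPI algorithm described in the article; it is a simple single-process version, with a hypothetical function name and parameters, that shows the two difficulties the abstract mentions: each new edge depends on the degrees created by earlier edges, and duplicate (parallel) edges must be avoided.

```python
import random


def preferential_attachment(n, m, seed=None):
    """Sequential sketch of a preferential-attachment graph (requires n > m >= 1).

    Each new node attaches to m distinct existing nodes, chosen with
    probability proportional to their current degree.
    """
    rng = random.Random(seed)
    # Start from a clique on m + 1 nodes so every seed node has degree m.
    edges = [(u, v) for u in range(m + 1) for v in range(u)]
    # Each node appears in `endpoints` once per incident edge, so a uniform
    # draw from this list is a degree-proportional draw.
    endpoints = [x for edge in edges for x in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:          # the set rejects duplicate targets
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


edges = preferential_attachment(100, 3, seed=1)
```

The dependence of every draw on the `endpoints` list built by all previous steps is what makes naive parallelization inefficient.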

Posted Content
TL;DR: It is proved that the consistency and correct-consistency of an ensemble learner are not less than the average consistency and correct-consistency of the individual learners, and that correct-consistency can be improved with a probability by combining learners with accuracy not less than the average accuracy of the ensemble component learners.
Abstract: Deep learning classifiers are assisting humans in making decisions and hence the user's trust in these models is of paramount importance. Trust is often a function of constant behavior. From an AI model perspective it means given the same input the user would expect the same output, especially for correct outputs, or in other words consistently correct outputs. This paper studies a model behavior in the context of periodic retraining of deployed models where the outputs from successive generations of the models might not agree on the correct labels assigned to the same input. We formally define consistency and correct-consistency of a learning model. We prove that consistency and correct-consistency of an ensemble learner is not less than the average consistency and correct-consistency of individual learners and correct-consistency can be improved with a probability by combining learners with accuracy not less than the average accuracy of ensemble component learners. To validate the theory using three datasets and two state-of-the-art deep learning classifiers we also propose an efficient dynamic snapshot ensemble method and demonstrate its value.
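
The two quantities can be illustrated empirically for a pair of model generations. The abstract does not reproduce the paper's formal definitions, so the sketch below assumes a natural reading: consistency is the fraction of inputs on which two models agree, and correct-consistency is the fraction on which both agree with the true label. The function names are hypothetical.

```python
def consistency(preds_a, preds_b):
    """Fraction of inputs on which two model generations give the same output."""
    return sum(a == b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def correct_consistency(preds_a, preds_b, labels):
    """Fraction of inputs on which both generations give the correct output."""
    return sum(a == b == y for a, b, y in zip(preds_a, preds_b, labels)) / len(labels)


labels = [0, 1, 1, 0]
old_model = [0, 1, 0, 0]   # predictions before retraining
new_model = [0, 1, 0, 1]   # predictions after retraining

print(consistency(old_model, new_model))                   # 0.75
print(correct_consistency(old_model, new_model, labels))   # 0.5
```

In this toy case the two generations agree on three of four inputs, but only two of those agreements are on the correct label.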

Posted ContentDOI
24 Dec 2020-medRxiv
TL;DR: In this article, the authors apply federated learning methods to clinical and epidemiological research across a spectrum of units of federation and model architectures, and show that federated models can achieve the same level of accuracy, precision, and generalizability as standard centralized statistical models whilst achieving significantly stronger privacy protections.
Abstract: Privacy protection is paramount in conducting health research. However, studies often rely on data stored in a centralized repository, where analysis is done with full access to the sensitive underlying content. Recent advances in federated learning enable building complex machine-learned models that are trained in a distributed fashion. These techniques facilitate the calculation of research study endpoints such that private data never leaves a given device or healthcare system. We show on a diverse set of health studies that federated models can achieve the same level of accuracy, precision, and generalizability, and result in the same interpretation as standard centralized statistical models whilst achieving significantly stronger privacy protections. This work is the first to apply modern and general federated learning methods to clinical and epidemiological research -- across a spectrum of units of federation and model architectures. As a result, it enables health research participants to remain in control of their data and still contribute to advancing science -- aspects that used to be at odds with each other.
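
The principle that a study endpoint can be computed without raw data leaving a site can be shown with a deliberately simple sketch. This is not the federated learning method used in the paper; it assumes the endpoint is a plain mean and that each site shares only a local sum and count. The function names and data are hypothetical.

```python
def local_summary(values):
    """Computed at each site: only the sum and count leave the site."""
    return sum(values), len(values)


def federated_mean(summaries):
    """Computed centrally from per-site summaries, never from raw records."""
    total = sum(s for s, _ in summaries)
    count = sum(c for _, c in summaries)
    return total / count


# Two hypothetical sites holding private measurements.
site_data = [[1.0, 2.0, 3.0], [4.0, 5.0]]
summaries = [local_summary(values) for values in site_data]
print(federated_mean(summaries))   # 3.0
```

The result equals the mean that a centralized analysis over the pooled records would produce, while the central party only ever sees the per-site aggregates.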

Journal ArticleDOI
03 Apr 2020
TL;DR: Using a discrete dynamical system model for a networked social system, this work establishes bounds on the number of queries needed to learn the local functions under both active query and PAC learning models.
Abstract: Using a discrete dynamical system model for a networked social system, we consider the problem of learning a class of local interaction functions in such networks. Our focus is on learning local functions which are based on pairwise disjoint coalitions formed from the neighborhood of each node. Our work considers both active query and PAC learning models. We establish bounds on the number of queries needed to learn the local functions under both models. We also establish a complexity result regarding efficient consistent learners for such functions. Our experimental results on synthetic and real social networks demonstrate how the number of queries depends on the structure of the underlying network and number of coalitions.

Proceedings Article
01 Jan 2020
TL;DR: In this article, the authors study a model behavior in the context of periodic retraining of deployed models where the outputs from successive generations of the models might not agree on the correct labels assigned to the same input.
Abstract: Deep learning classifiers are assisting humans in making decisions and hence the user's trust in these models is of paramount importance. Trust is often a function of constant behavior. From an AI model perspective it means given the same input the user would expect the same output, especially for correct outputs, or in other words consistently correct outputs. This paper studies a model behavior in the context of periodic retraining of deployed models where the outputs from successive generations of the models might not agree on the correct labels assigned to the same input. We formally define consistency and correct-consistency of a learning model. We prove that consistency and correct-consistency of an ensemble learner is not less than the average consistency and correct-consistency of individual learners and correct-consistency can be improved with a probability by combining learners with accuracy not less than the average accuracy of ensemble component learners. To validate the theory using three datasets and two state-of-the-art deep learning classifiers we also propose an efficient dynamic snapshot ensemble method and demonstrate its value.

Proceedings ArticleDOI
15 Jun 2020
TL;DR: This talk will describe the group's work developing scalable and pervasive computing-based concepts, theories and tools for planning, forecasting and response in the event of epidemics and outline directions for future work.
Abstract: The COVID-19 pandemic represents an unprecedented global crisis and serves as a reminder of the social, economic and health burden of infectious diseases. The ongoing trends towards urbanization, global travel, climate change and a generally older and immuno-compromised population continue to make epidemic planning and control challenging. Recent advances in computing, AI, and big data have created new opportunities for realizing the vision of real-time epidemic science. In this talk I will describe our group's work developing scalable and pervasive computing-based concepts, theories and tools for planning, forecasting and response in the event of epidemics. I will draw on our work in supporting federal agencies as they plan and respond to the COVID-19 pandemic. I will end the talk by outlining directions for future work.

Proceedings ArticleDOI
10 Dec 2020
TL;DR: This paper presents QueST, an agent-based discrete event queuing network simulation system, and STEERS, an iterative routing algorithm that uses QueST for designing and evaluating large-scale evacuation plans in terms of total egress time and congestion/bottlenecks occurring during evacuation.
Abstract: Evacuation planning methods aim to design routes and schedules to relocate people to safety in the event of natural or man-made disasters. The primary goal is to minimize casualties, which often requires the evacuation process to be completed as soon as possible. In this paper, we present QueST, an agent-based discrete event queuing network simulation system, and STEERS, an iterative routing algorithm that uses QueST for designing and evaluating large-scale evacuation plans in terms of total egress time and congestion/bottlenecks occurring during evacuation. We use the Houston Metropolitan Area, which consists of nine US counties and spans an area of 9,444 square miles, as a case study, and compare the performance of STEERS with two existing route planning methods. We find that STEERS is either better than or comparable to these methods in terms of total evacuation time and congestion faced by the evacuees. We also analyze the large volume of data generated by the simulation process to gain insights about the scenarios arising from following the evacuation routes prescribed by these methods.