
Showing papers in "arXiv: Applications in 2020"


Journal Article
TL;DR: In this article, the authors identify obstacles hindering transparent and reproducible AI research, as faced by McKinney et al., and provide solutions with implications for the broader field.
Abstract: In their study, McKinney et al. showed the high potential of artificial intelligence for breast cancer screening. However, the lack of detailed methods and computer code undermines its scientific value. We identify obstacles hindering transparent and reproducible AI research as faced by McKinney et al. and provide solutions with implications for the broader field.

166 citations


Repository
Fotios Petropoulos, Daniele Apiletti1, Vassilios Assimakopoulos2, Mohamed Zied Babai3, Devon K. Barrow4, Souhaib Ben Taieb5, Christoph Bergmeir6, Ricardo J. Bessa, Jakub Bijak7, John E. Boylan8, Jethro Browell9, Claudio Carnevale10, Jennifer L. Castle11, Pasquale Cirillo12, Michael P. Clements13, Clara Cordeiro14, Clara Cordeiro15, Fernando Luiz Cyrino Oliveira16, Shari De Baets17, Alexander Dokumentov, Joanne Ellison7, Piotr Fiszeder18, Philip Hans Franses19, David T. Frazier6, Michael Gilliland20, M. Sinan Gönül, Paul Goodwin21, Luigi Grossi22, Yael Grushka-Cockayne23, Mariangela Guidolin22, Massimo Guidolin24, Ulrich Gunter25, Xiaojia Guo26, Renato Guseo22, Nigel Harvey27, David F. Hendry11, Ross Hollyman21, Tim Januschowski28, Jooyoung Jeon29, Victor Richmond R. Jose30, Yanfei Kang31, Anne B. Koehler32, Stephan Kolassa8, Nikolaos Kourentzes33, Nikolaos Kourentzes8, Sonia Leva, Feng Li34, Konstantia Litsiou35, Spyros Makridakis36, Gael M. Martin6, Andrew B. Martinez37, Andrew B. Martinez38, Sheik Meeran, Theodore Modis, Konstantinos Nikolopoulos39, Dilek Önkal, Alessia Paccagnini40, Alessia Paccagnini41, Anastasios Panagiotelis42, Ioannis P. Panapakidis43, Jose M. Pavía44, Manuela Pedio24, Manuela Pedio45, Diego J. Pedregal46, Pierre Pinson47, Patrícia Ramos48, David E. Rapach49, J. James Reade13, Bahman Rostami-Tabar50, Michał Rubaszek51, Georgios Sermpinis9, Han Lin Shang52, Evangelos Spiliotis2, Aris A. Syntetos50, Priyanga Dilini Talagala53, Thiyanga S. Talagala54, Len Tashman55, Dimitrios D. Thomakos56, Thordis L. Thorarinsdottir57, Ezio Todini58, Juan Ramón Trapero Arenas46, Xiaoqian Wang31, Robert L. Winkler59, Alisa Yusupova8, Florian Ziel60 
Polytechnic University of Turin1, National Technical University of Athens2, KEDGE Business School3, University of Birmingham4, University of Mons5, Monash University6, University of Southampton7, Lancaster University8, University of Glasgow9, University of Brescia10, University of Oxford11, Zürcher Fachhochschule12, University of Reading13, University of the Algarve14, University of Lisbon15, Pontifical Catholic University of Rio de Janeiro16, Ghent University17, Nicolaus Copernicus University in Toruń18, Erasmus University Rotterdam19, SAS Institute20, University of Bath21, University of Padua22, University of Virginia23, Bocconi University24, MODUL University Vienna25, University of Maryland, College Park26, University College London27, Amazon.com28, KAIST29, Georgetown University30, Beihang University31, Miami University32, University of Skövde33, Central University of Finance and Economics34, Manchester Metropolitan University35, University of Nicosia36, George Washington University37, United States Department of the Treasury38, Durham University39, Australian National University40, University College Dublin41, University of Sydney42, University of Thessaly43, University of Valencia44, University of Bristol45, University of Castilla–La Mancha46, Technical University of Denmark47, Polytechnic Institute of Porto48, Saint Louis University49, Cardiff University50, Warsaw School of Economics51, Macquarie University52, University of Moratuwa53, University of Sri Jayewardenepura54, International Institute of Minnesota55, National and Kapodistrian University of Athens56, Norwegian Computing Center57, University of Bologna58, Duke University59, University of Duisburg-Essen60
TL;DR: A non-systematic review of the theory and the practice of forecasting, offering a wide range of theoretical, state-of-the-art models, methods, principles, and approaches to prepare, produce, organise, and evaluate forecasts.
Abstract: Forecasting has always been at the forefront of decision making and planning. The uncertainty that surrounds the future is both exciting and challenging, with individuals and organisations seeking to minimise risks and maximise utilities. The large number of forecasting applications calls for a diverse set of forecasting methods to tackle real-life challenges. This article provides a non-systematic review of the theory and the practice of forecasting. We provide an overview of a wide range of theoretical, state-of-the-art models, methods, principles, and approaches to prepare, produce, organise, and evaluate forecasts. We then demonstrate how such theoretical concepts are applied in a variety of real-life contexts. We do not claim that this review is an exhaustive list of methods and applications. However, we wish that our encyclopedic presentation will offer a point of reference for the rich work that has been undertaken over the last decades, with some key insights for the future of forecasting theory and practice. Given its encyclopedic nature, the intended mode of reading is non-linear. We offer cross-references to allow the readers to navigate through the various topics. We complement the theoretical concepts and applications covered by large lists of free or open-source software implementations and publicly-available databases.

163 citations


Posted Content
TL;DR: A semi-mechanistic Bayesian hierarchical model is extended to infer the impact of non-pharmaceutical interventions and to estimate the number of infections over time in Europe, following the emergence of a novel coronavirus and its spread outside of China.
Abstract: Following the emergence of a novel coronavirus (SARS-CoV-2) and its spread outside of China, Europe has experienced large epidemics. In response, many European countries have implemented unprecedented non-pharmaceutical interventions including case isolation, the closure of schools and universities, banning of mass gatherings and/or public events, and most recently, wide-scale social distancing including local and national lockdowns. In this technical update, we extend a semi-mechanistic Bayesian hierarchical model that infers the impact of these interventions and estimates the number of infections over time. Our methods assume that changes in the reproductive number - a measure of transmission - are an immediate response to these interventions being implemented rather than broader gradual changes in behaviour. Our model estimates these changes by calculating backwards from temporal data on observed deaths to estimate the number of infections and rate of transmission that occurred several weeks prior, allowing for a probabilistic time lag between infection and death. In this update we extend our original model [Flaxman, Mishra, Gandy et al. 2020, Report #13, Imperial College London] to include (a) population saturation effects, (b) prior uncertainty on the infection fatality ratio, (c) a more balanced prior on intervention effects and (d) partial pooling of the lockdown intervention covariate. We also (e) included another 3 countries (Greece, the Netherlands and Portugal). The model code is available at this https URL. We are now reporting the results of our updated model online at this https URL. We estimated parameters jointly for all M=14 countries in a single hierarchical model. Inference is performed in the probabilistic programming language Stan using an adaptive Hamiltonian Monte Carlo (HMC) sampler.

118 citations


Journal Article
TL;DR: A methodology is proposed that combines three approaches to data mining from a small dataset: augmenting the scarce data, selecting the best forecasting model from a panel of models, and fine-tuning the parameters of an individual forecasting model for the highest possible accuracy.
Abstract: An epidemic is the rapid and wide spread of an infectious disease, threatening many lives and causing economic damage. It is important to foretell the lifetime of an epidemic so as to decide on timely remedial actions. These measures include closing borders and schools and suspending community services and commuting. Lifting such restrictions depends on the momentum of the outbreak and its rate of decay. Being able to accurately forecast the fate of an epidemic is an extremely important but difficult task. Due to limited knowledge of the novel disease, the high uncertainty involved, and the complex societal and political factors that influence the spread of the new virus, any forecast is anything but reliable. Another factor is the insufficient amount of available data. Data samples are often scarce when an epidemic has just started. With only a few training samples on hand, finding a forecasting model that offers the best possible forecast is a big challenge in machine learning. In the past, three popular methods have been proposed: 1) augmenting the existing scarce data, 2) using a panel selection to pick the best forecasting model from several models, and 3) fine-tuning the parameters of an individual forecasting model for the highest possible accuracy. In this paper, a methodology that embraces these three virtues of data mining from a small dataset is proposed...

102 citations


Posted Content
TL;DR: In this article, the authors estimate the effect of stay-at-home orders using a difference-in-differences design that accounts for local variation in factors like health systems and demographics and for unmeasured temporal variation in national mitigation actions and access to tests.
Abstract: Governments issue "stay at home" orders to reduce the spread of contagious diseases, but the magnitude of such orders' effectiveness is uncertain. In the United States these orders were not coordinated at the national level during the coronavirus disease 2019 (COVID-19) pandemic, which creates an opportunity to use spatial and temporal variation to measure the policies' effect with greater accuracy. Here, we combine data on the timing of stay-at-home orders with daily confirmed COVID-19 cases and fatalities at the county level in the United States. We estimate the effect of stay-at-home orders using a difference-in-differences design that accounts for unmeasured local variation in factors like health systems and demographics and for unmeasured temporal variation in factors like national mitigation actions and access to tests. Compared to counties that did not implement stay-at-home orders, the results show that the orders are associated with a 30.2 percent (11.0 to 45.2) reduction in weekly cases after one week, a 40.0 percent (23.4 to 53.0) reduction after two weeks, and a 48.6 percent (31.1 to 61.7) reduction after three weeks. Stay-at-home orders are also associated with a 59.8 percent (18.3 to 80.2) reduction in weekly fatalities after three weeks. These results suggest that stay-at-home orders reduced confirmed cases by 390,000 (170,000 to 680,000) and fatalities by 41,000 (27,000 to 59,000) within the first three weeks in localities where they were implemented.

87 citations
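
The difference-in-differences idea in the abstract above can be illustrated with a minimal two-group, two-period sketch. The function name `did_estimate` and all numbers are hypothetical and are not taken from the paper, whose actual design uses county-level panel data and accounts for local and temporal variation.

```python
import numpy as np

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-by-two difference-in-differences computed on group means."""
    # Change over time in the group that received the intervention.
    treated_change = np.mean(treated_post) - np.mean(treated_pre)
    # Change over time in the comparison group (the shared time trend).
    control_change = np.mean(control_post) - np.mean(control_pre)
    # The estimate is the treated change net of the shared trend.
    return float(treated_change - control_change)

# Hypothetical weekly case counts, for illustration only.
effect = did_estimate(
    treated_pre=[100, 120],
    treated_post=[90, 110],
    control_pre=[80, 100],
    control_post=[100, 120],
)
print(effect)
```

The estimate is only meaningful under the parallel-trends assumption, that is, that both groups would have changed by the same amount without the intervention.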


Journal Article
TL;DR: In this paper, the causal effects of the confounding factors on COVID-19 counts in the contiguous US were explored using various relevant approaches, including local and global spatial regression models and machine learning.
Abstract: Since December 2019, the world has been witnessing the enormous effect of an unprecedented global pandemic, COVID-19, caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). So far, 38,619,674 confirmed cases and 1,093,522 confirmed deaths due to COVID-19 have been reported. In the United States (US), 7,833,851 cases and 215,199 deaths have been recorded. Several timely studies have discussed the local and global effects of the confounding factors on COVID-19 casualties in the US. However, most of these studies paid little attention to the time-varying associations between and among these factors, which are crucial for understanding the outbreak of the present pandemic. Therefore, this study adopts various relevant approaches, including local and global spatial regression models and machine learning, to explore the causal effects of the confounding factors on COVID-19 counts in the contiguous US. In total, five spatial regression models, namely the spatial lag model (SLM), ordinary least squares (OLS), the spatial error model (SEM), geographically weighted regression (GWR) and multiscale geographically weighted regression (MGWR), are fitted at the county scale to take into account the scale effects on modelling. For COVID-19 cases, ethnicity, crime, and income factors are found to be the strongest covariates and explain the most model variance. For COVID-19 deaths, both (domestic and international) migration and income factors play a crucial role in explaining spatial differences of COVID-19 death counts across counties. The local coefficient of determination (R²) values derived from the GWR and MGWR models are found to be very high over the Wisconsin-Indiana-Michigan (Great Lakes) region, as well as several parts of Texas, California, Mississippi and Arkansas.

72 citations


Journal Article
TL;DR: This paper describes the process of data collection, cleaning, and convergence of time-series meter data, the meta-data about the buildings, and complementary weather data that can be used for further prediction benchmarking and prototyping as well as anomaly detection, energy analysis, and building type classification.
Abstract: This paper describes an open data set of 3,053 energy meters from 1,636 non-residential buildings with a range of two full years (2016 and 2017) at an hourly frequency (17,544 measurements per meter resulting in approximately 53.6 million measurements). These meters were collected from 19 sites across North America and Europe, with one or more meters per building measuring whole building electrical, heating and cooling water, steam, and solar energy as well as water and irrigation meters. Part of these data were used in the Great Energy Predictor III (GEPIII) competition hosted by the ASHRAE organization in October-December 2019. GEPIII was a machine learning competition for long-term prediction with an application to measurement and verification. This paper describes the process of data collection, cleaning, and convergence of time-series meter data, the meta-data about the buildings, and complementary weather data. This data set can be used for further prediction benchmarking and prototyping as well as anomaly detection, energy analysis, and building type classification.

58 citations
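
The counts quoted in the abstract above can be cross-checked with a few lines of arithmetic. This is only a sanity check; it assumes every meter has one reading for every hour of 2016 and 2017.

```python
from datetime import datetime

# Hours from the start of 2016 to the start of 2018 (2016 is a leap year).
span = datetime(2018, 1, 1) - datetime(2016, 1, 1)
hours_per_meter = span.days * 24

meters = 3053  # number of energy meters stated in the abstract
total_measurements = hours_per_meter * meters

print(hours_per_meter)                 # 17544
print(total_measurements)              # 53561832
print(round(total_measurements / 1e6, 1))  # 53.6
```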


Posted Content
TL;DR: A time series model is proposed to analyze the trend pattern of the incidence of COVID-19 outbreak and it is shown that a time-dependent quadratic trend successfully captures the incidence patterns of the disease.
Abstract: The ongoing pandemic of Coronavirus disease (COVID-19) emerged in Wuhan, China at the end of 2019. It has already affected more than 300,000 people, with the number of deaths nearing 13,000 across the world. As it has been posing a huge threat to global public health, it is of utmost importance to identify the rate at which the disease is spreading. In this study, we propose a time series model to analyze the trend pattern of the incidence of the COVID-19 outbreak. We also incorporate information on total or partial lockdown, wherever available, into the model. The model is concise in structure, and using appropriate diagnostic measures, we show that a time-dependent quadratic trend successfully captures the incidence pattern of the disease. We also estimate the basic reproduction number across different countries, and find that it is consistent except for the United States of America. The above statistical analysis sheds light on the trends of the outbreak and gives insight into what epidemiological stage a region is in. This has the potential to help in prompting policies to address the COVID-19 pandemic in different countries.

58 citations
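
To make the notion of a quadratic trend in the abstract above concrete, here is a minimal sketch that fits a second-degree polynomial in time by ordinary least squares. It is a simplification and not the authors' time series model (which also includes lockdown information); the function name `fit_quadratic_trend` and the data are hypothetical.

```python
import numpy as np

def fit_quadratic_trend(days, counts):
    """Least-squares fit of counts ~ a*t**2 + b*t + c; returns (a, b, c)."""
    a, b, c = np.polyfit(days, counts, deg=2)
    return float(a), float(b), float(c)

# Synthetic, noise-free incidence series used only to exercise the fit.
days = np.arange(10)
counts = 2.0 * days**2 + 3.0 * days + 5.0

a, b, c = fit_quadratic_trend(days, counts)
print(round(a, 6), round(b, 6), round(c, 6))
```

With real data the residuals should be inspected, since a good polynomial fit in-sample says little about how the trend extrapolates.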


Posted Content
TL;DR: The results for three models predicting such complications due to COVID-19 are presented, with each model having varying levels of predictive effectiveness at the expense of ease of implementation.
Abstract: COVID-19 is an acute respiratory disease that has been classified as a pandemic by the World Health Organization. Characterization of this disease is still in its early stages. However, it is known to have high mortality rates, particularly among individuals with preexisting medical conditions. Creating models to identify individuals who are at the greatest risk for severe complications due to COVID-19 will be useful for outreach campaigns to help mitigate the disease's worst effects. While information specific to COVID-19 is limited, a model using complications due to other upper respiratory infections can be used as a proxy to help identify those individuals who are at the greatest risk. We present the results for three models predicting such complications, with each model increasing predictive effectiveness at the expense of ease of implementation.

55 citations


Posted Content
TL;DR: The number of confirmed cases of severe acute respiratory syndrome coronavirus (COVID-19) in China has increased significantly in the past few months, and the number of cases is likely to continue to increase.
Abstract: Background: Severe acute respiratory syndrome Coronavirus 2019 (COVID-19) was first detected in China at the end of 2019 and it spread in a few months all

54 citations


Journal Article
TL;DR: A machine learning based framework to forecast corn yields in three US Corn Belt states considering complete and partial in-season weather knowledge is provided, and it is suggested that weather features corresponding to weather in weeks 18–24 (May 1st to June 1st) are the most important input features.
Abstract: The emergence of new technologies to synthesize and analyze big data with high-performance computing has increased our capacity to more accurately predict crop yields. Recent research has shown that machine learning (ML) can provide reasonable predictions faster and with higher flexibility compared to simulation crop modeling. The earlier the prediction during the growing season the better, but this has not been thoroughly investigated as previous studies considered all data available to predict yields. This paper provides a machine learning based framework to forecast corn yields in three US Corn Belt states (Illinois, Indiana, and Iowa) considering complete and partial in-season weather knowledge. Several ensemble models are designed using a blocked sequential procedure to generate out-of-bag predictions. The forecasts are made at the county level and aggregated to the agricultural district and state levels. Results show that ensemble models based on a weighted average of the base learners outperform individual models. Specifically, the proposed ensemble model could achieve the best prediction accuracy (RRMSE of 7.8%) and the least mean bias error (-6.06 bu/acre) compared to other developed models. Comparing our proposed model forecasts with the literature demonstrates the superiority of forecasts made by our proposed ensemble model. Results from the scenario of having partial in-season weather knowledge reveal that decent yield forecasts can be made as early as June 1st. To find the marginal effect of each input feature on the forecasts made by the proposed ensemble model, a methodology is suggested that is the basis for finding feature importance for the ensemble model. The findings suggest that weather features corresponding to weather in weeks 18-24 (May 1st to June 1st) are the most important input features.
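
A weighted average of base learners and a relative RMSE score, as mentioned in the abstract above, can be sketched as follows. The weights, the predictions, and the RRMSE definition used here (RMSE divided by the mean observed value, in percent) are assumptions for illustration; the paper may define and optimise them differently.

```python
import numpy as np

def weighted_ensemble(predictions, weights):
    """Combine base-learner predictions (n_models x n_samples) by weighted average."""
    predictions = np.asarray(predictions, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalise so the weights sum to 1
    return weights @ predictions

def rrmse(actual, predicted):
    """Relative RMSE in percent: RMSE divided by the mean of the observed values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rmse = np.sqrt(np.mean((actual - predicted) ** 2))
    return float(rmse / np.mean(actual) * 100.0)

# Hypothetical yields (bu/acre) from two base learners for two counties.
base_predictions = [[100.0, 200.0], [120.0, 180.0]]
combined = weighted_ensemble(base_predictions, weights=[1.0, 3.0])
print(combined)
print(rrmse([110.0, 190.0], combined))
```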

Posted Content
TL;DR: Evaluating digital data streams as early indicators of state-level COVID-19 activity from 1 March to 30 September 2020 suggests that combining disparate health and behavioral data may help identify disease activity changes weeks before observation using traditional epidemiological monitoring.
Abstract: Non-pharmaceutical interventions (NPIs) have been crucial in curbing COVID-19 in the United States (US). Consequently, relaxing NPIs through a phased re-opening of the US amid still-high levels of COVID-19 susceptibility could lead to new epidemic waves. This calls for a COVID-19 early warning system. Here we evaluate multiple digital data streams as early warning indicators of increasing or decreasing state-level US COVID-19 activity between January and June 2020. We estimate the timing of sharp changes in each data stream using a simple Bayesian model that calculates in near real-time the probability of exponential growth or decay. Analysis of COVID-19-related activity on social network microblogs, Internet searches, point-of-care medical software, and a metapopulation mechanistic model, as well as fever anomalies captured by smart thermometer networks, shows exponential growth roughly 2-3 weeks prior to comparable growth in confirmed COVID-19 cases and 3-4 weeks prior to comparable growth in COVID-19 deaths across the US over the last 6 months. We further observe exponential decay in confirmed cases and deaths 5-6 weeks after implementation of NPIs, as measured by anonymized and aggregated human mobility data from mobile phones. Finally, we propose a combined indicator for exponential growth in multiple data streams that may aid in developing an early warning system for future COVID-19 outbreaks. These efforts represent an initial exploratory framework, and both continued study of the predictive power of digital indicators as well as further development of the statistical approach are needed.

Journal Article
TL;DR: It emerges that internal mobility is more important than mobility across provinces and that the typical lagged positive effect of reduced human mobility on reducing excess deaths is around 14–20 days, meaning that mobility restrictions seem to have effectively contributed to saving lives.
Abstract: Mobility data at EU scale can help understand the dynamics of the pandemic and possibly limit the impact of future waves. Still, since a reliable and consistent method to measure the evolution of contagion at international level is missing, a systematic analysis of the relationship between human mobility and virus spread has never been conducted. Notable exceptions are France and Italy, for which data on excess deaths, an indirect indicator which is generally considered to be less affected by national and regional assumptions, are available at department and municipality level, respectively. Using this information together with anonymised and aggregated mobile data, this study shows that mobility alone can explain up to 92% of the initial spread in these two EU countries, while it has a slow decay effect after lockdown measures, meaning that mobility restrictions seem to have effectively contributed to saving lives. It also emerges that internal mobility is more important than mobility across provinces and that the typical lagged positive effect of reduced human mobility on reducing excess deaths is around 14-20 days. An analogous analysis relative to Spain, for which an IgG SARS-CoV-2 antibody screening study at province level is used instead of excess deaths statistics, confirms the findings. The approach adopted in this study can be easily extended to other European countries as soon as reliable data on the spread of the virus at a suitable level of granularity become available. Looking at past data relative to the initial phase of the outbreak in EU Member States, this study shows to what extent the spread of the virus and human mobility are connected.

Journal Article
Philip Nadler1, Shuo Wang1, Rossella Arcucci1, Xian Yang1, Yike Guo1 
TL;DR: In this article, an epidemiological model for forecasting and policy evaluation which incorporates new data in real-time through variational data assimilation is proposed, which is parsimonious and extendable, allowing for the incorporation of additional data and parameters of interest.
Abstract: The global pandemic of the 2019-nCoV requires the evaluation of policy interventions to mitigate future social and economic costs of quarantine measures worldwide. We propose an epidemiological model for forecasting and policy evaluation which incorporates new data in real-time through variational data assimilation. We analyze and discuss infection rates in China, the US and Italy. In particular, we develop a custom compartmental SIR model, named the SITR model, fit to variables related to the epidemic in Chinese cities. We compare and discuss results of the model, which is updated as new observations become available. A hybrid data assimilation approach is applied to make results robust to initial conditions. We use the model to do inference on infection numbers as well as parameters such as the disease transmissibility rate or the rate of recovery. The parameterisation of the model is parsimonious and extendable, allowing for the incorporation of additional data and parameters of interest. This allows for scalability and the extension of the model to other locations or the adaption of novel data sources.
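
For readers unfamiliar with compartmental models, the following is a minimal sketch of a plain SIR model integrated with forward Euler steps. It is not the authors' SITR model and includes no data assimilation; the function name `simulate_sir` and all parameter values are hypothetical.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Forward-Euler SIR simulation; returns one (S, I, R) tuple per day."""
    s, i, r = float(s0), float(i0), float(r0)
    n = s + i + r  # total population, constant in this model
    steps_per_day = int(round(1 / dt))
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt  # S -> I flow
            new_recoveries = gamma * i * dt         # I -> R flow
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history

# beta is the transmission rate and gamma the recovery rate (illustrative values).
history = simulate_sir(beta=0.3, gamma=0.1, s0=990, i0=10, r0=0, days=30)
print(history[-1])
```

In a data assimilation setting, parameters such as `beta` and `gamma` would be re-estimated as new observations arrive rather than fixed in advance.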

Posted Content
TL;DR: A novel nonparametric space-time disease transmission model is developed for the epidemic data to study the spatial-temporal pattern in the spread of COVID-19 at the county level and to forecast how this outbreak may unfold through time and space in the future.
Abstract: Epidemic modeling is an essential tool to understand the spread of the novel coronavirus and ultimately assist in disease prevention, policymaking, and resource allocation. In this article, we establish a state of the art interface between classic mathematical and statistical models and propose a novel space-time epidemic modeling framework to study the spatial-temporal pattern in the spread of infectious disease. We propose a quasi-likelihood approach via the penalized spline approximation and alternatively reweighted least-squares technique to estimate the model. Furthermore, we provide a short-term and long-term county-level prediction of the infected/death count for the U.S. by accounting for the control measures, health service resources, and other local features. Utilizing spatiotemporal analysis, our proposed model enhances the dynamics of the epidemiological mechanism and dissects the spatiotemporal structure of the spreading disease. To assess the uncertainty associated with the prediction, we develop a projection band based on the envelope of the bootstrap forecast paths. The performance of the proposed method is evaluated by a simulation study. We apply the proposed method to model and forecast the spread of COVID-19 at both county and state levels in the United States.

Journal Article
TL;DR: This paper develops a simulation-based statistical test for the local indicator of colocation quotient (LCLQ) and applies the indicator to examine the association of land use facilities with crime patterns.
Abstract: Most existing point-based colocation methods are global measures (e.g., join count statistic, cross K function, and global colocation quotient). Most recently, a local indicator, the local colocation quotient, has been proposed to capture the variability of colocation across areas. Our research advances this line of work by developing a simulation-based statistical test for the local indicator of colocation quotient (LCLQ). The study applies the indicator to examine the association of land use facilities with crime patterns. Moreover, we use the street network distance in addition to the traditional Euclidean distance in defining neighbors, since human activities (including facilities and crimes) usually occur along a street network. The method is applied to analyze the colocation of three types of crimes and three categories of facilities in a city in Jiangsu Province, China. The findings demonstrate the value of the proposed method in colocation analysis of crime and facilities, and in general colocation analysis of point data.

Journal Article
TL;DR: In this article, two time series decomposition methods are developed for short-term forecasting of the CO2 emissions of electricity, which are in turn benchmarked against a set of state-of-the-art models.
Abstract: The world is facing major challenges related to global warming, and emissions of greenhouse gases are a major contributing factor. In 2017, energy industries accounted for 46% of all CO2 emissions globally, which shows a large potential for reduction. This paper proposes a novel short-term CO2 emissions forecast to enable intelligent scheduling of flexible electricity consumption to minimize the resulting CO2 emissions. Two time series decomposition methods are developed for short-term forecasting of the CO2 emissions of electricity. These are in turn benchmarked against a set of state-of-the-art models. The result is a new forecasting method with a 48-hour horizon targeted at the day-ahead electricity market. Forecasting benchmarks for France show that the new method has a mean absolute percentage error that is 25% lower than the best performing state-of-the-art model. Further, application of the forecast for scheduling flexible electricity consumption is studied for five European countries. Scheduling a flexible block of 4 hours of electricity consumption in a 24 hour interval can on average reduce the resulting CO2 emissions by 25% in France, 17% in Germany, 69% in Norway, 20% in Denmark, and just 3% in Poland when compared to consuming at random intervals during the day.
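
The mean absolute percentage error (MAPE) used as the benchmark metric in the abstract above can be computed as in this short sketch. The data are hypothetical, and the function assumes that no observed value is zero.

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error in percent; assumes no zero in `actual`."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100.0)

# Hypothetical emission intensities, for illustration only.
observed = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
print(mape(observed, predicted))
```

A statement such as "25% lower MAPE" compares two such scores relative to each other, for example 7.5% against 10%.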

Journal Article
TL;DR: In this paper, a statistical framework is proposed to combine social media data with traditional survey data to produce timely 'nowcasts' of migrant stocks by state in the United States, and the model incorporates bias adjustment of the Facebook data, and a pooled principal component time series approach to account for correlations across age, time and space.
Abstract: Measuring and forecasting migration patterns, and how they change over time, has important implications for understanding broader population trends, for designing policy effectively and for allocating resources. However, data on migration and mobility are often lacking, and those that do exist are not available in a timely manner. Social media data offer new opportunities to provide more up-to-date demographic estimates and to complement more traditional data sources. Facebook, for example, can be thought of as a large digital census that is regularly updated. However, its users are not representative of the underlying population. This paper proposes a statistical framework to combine social media data with traditional survey data to produce timely 'nowcasts' of migrant stocks by state in the United States. The model incorporates bias adjustment of the Facebook data, and a pooled principal component time series approach, to account for correlations across age, time and space. We illustrate the results for migrants from Mexico, India and Germany, and show that the model outperforms alternatives that rely solely on either social media or survey data.

Journal Article
TL;DR: The results show that the hybrid models, except for ARIMA-ETS, are better at capturing the linear and non-linear epidemic patterns, outperforming the respective single models; and the number of COVID-19 patients hospitalized with mild symptoms and in ICU will rapidly increase in the next weeks.
Abstract: Coronavirus disease (COVID-19) is a severe ongoing novel pandemic that emerged in Wuhan, China, in December 2019. As of October 13, the outbreak has spread rapidly across the world, affecting over 38 million people and causing over 1 million deaths. In this article, I analysed several time series forecasting methods to predict the spread of the COVID-19 second wave in Italy over the period after October 13, 2020. I used an autoregressive model (ARIMA), an exponential smoothing state space model (ETS), a neural network autoregression model (NNAR), and the following hybrid combinations of them: ARIMA-ETS, ARIMA-NNAR, ETS-NNAR, and ARIMA-ETS-NNAR. Specifically, I forecast the number of patients hospitalized with mild symptoms and in intensive care units (ICU). The data refer to the period February 21, 2020-October 13, 2020 and are extracted from the website of the Italian Ministry of Health (this http URL). The results show that i) the hybrid models, except for ARIMA-ETS, are better at capturing the linear and non-linear epidemic patterns, outperforming the respective single models; and ii) the number of COVID-19 patients hospitalized with mild symptoms and in ICU will rapidly increase in the next weeks, reaching the peak in about 50-60 days, i.e. in mid-December 2020, at least. To tackle the upcoming COVID-19 second wave, on one hand, it is necessary to hire healthcare workers and provide sufficient hospital facilities, protective equipment, and ordinary and intensive care beds; and on the other hand, it may be useful to enhance social distancing by improving public transport and adopting the double-shifts schooling system, for example.

Posted Content
TL;DR: In this article, adaptive generalized additive models using Kalman filters and fine-tuning to adjust to new electricity consumption patterns were introduced to forecast the electricity demand during the French lockdown period, where they demonstrate their ability to significantly reduce prediction errors compared to traditional models.
Abstract: The coronavirus disease 2019 (COVID-19) pandemic has urged many governments in the world to enforce a strict lockdown where all nonessential businesses are closed and citizens are ordered to stay at home. One of the consequences of this policy is a significant change in electricity consumption patterns. Since load forecasting models rely on calendar or meteorological information and are trained on historical data, they fail to capture the significant break caused by the lockdown and have exhibited poor performance since the beginning of the pandemic. This makes the scheduling of electricity production challenging, and has a high cost for both electricity producers and grid operators. In this paper we introduce adaptive generalized additive models using Kalman filters and fine-tuning to adjust to new electricity consumption patterns. Additionally, knowledge from the lockdown in Italy is transferred to anticipate the change of behavior in France. The proposed methods are applied to forecast the electricity demand during the French lockdown period, where they demonstrate their ability to significantly reduce prediction errors compared to traditional models. Finally, expert aggregation is used to leverage the specificities of each prediction and enhance results even further.
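The abstract does not give the state-space form used, so the following is only a minimal sketch of the general idea of adapting a frozen forecast with a Kalman filter. It assumes a single scalar coefficient `theta` that rescales the output of a pre-lockdown model and follows a random walk; the function name `kalman_adapt` and the parameters `q`, `r`, `theta0`, `p0` are hypothetical and not taken from the paper.

```python
def kalman_adapt(forecasts, observations, q=1e-3, r=1.0, theta0=1.0, p0=1.0):
    """Scalar Kalman filter for y_t = theta_t * f_t + noise, theta_t a random walk.

    forecasts    : outputs f_t of a frozen (pre-break) model  [assumed input]
    observations : realised loads y_t
    q, r         : state and observation noise variances      [assumed values]
    Returns the adapted one-step-ahead forecasts and the final coefficient.
    """
    theta, p = theta0, p0
    adapted = []
    for f, y in zip(forecasts, observations):
        p = p + q                        # predict: random-walk state, variance grows
        adapted.append(theta * f)        # forecast issued before y is observed
        innovation = y - theta * f       # one-step-ahead forecast error
        gain = p * f / (f * f * p + r)   # Kalman gain for the scalar model
        theta = theta + gain * innovation
        p = (1.0 - gain * f) * p         # posterior variance of theta
    return adapted, theta


# Hypothetical scenario: the frozen model keeps predicting 100, demand drops to 80.
adapted, theta = kalman_adapt([100.0] * 50, [80.0] * 50)
```

In this toy scenario the coefficient moves towards 0.8 after the first few observations, so the adapted forecasts track the new consumption level while the underlying model stays unchanged.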

Proceedings ArticleDOI
TL;DR: It is illustrated how linking large-scale administrative data can enable auditing mobility data for bias in the absence of demographic information and ground truth labels, and it is shown that allocating public health resources based on such mobility data could disproportionately harm high-risk elderly and minority groups.
Abstract: Anonymized smartphone-based mobility data has been widely adopted in devising and evaluating COVID-19 response strategies such as the targeting of public health resources. Yet little attention has been paid to measurement validity and demographic bias, due in part to the lack of documentation about which users are represented as well as the challenge of obtaining ground truth data on unique visits and demographics. We illustrate how linking large-scale administrative data can enable auditing mobility data for bias in the absence of demographic information and ground truth labels. More precisely, we show that linking voter roll data -- containing individual-level voter turnout for specific voting locations along with race and age -- can facilitate the construction of rigorous bias and reliability tests. These tests illuminate a sampling bias that is particularly noteworthy in the pandemic context: older and non-white voters are less likely to be captured by mobility data. We show that allocating public health resources based on such mobility data could disproportionately harm high-risk elderly and minority groups.

Journal ArticleDOI
TL;DR: The COVID-19 Early Warning System (CovEWS), a risk scoring system for assessing COVID-19 related mortality risk that was developed using data amounting to a total of over 2863 years of observation time from a cohort of 66 430 patients seen at over 69 healthcare institutions, could enable earlier intervention, and may therefore help in preventing or mitigating COVID-19 related mortality.
Abstract: Coronavirus Disease 2019 (COVID-19) is an emerging respiratory disease caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) with rapid human-to-human transmission and a high case fatality rate particularly in older patients. Due to the exponential growth of infections, many healthcare systems across the world are under pressure to care for increasing numbers of at-risk patients. Given the high number of infected patients, identifying patients with the highest mortality risk early is critical to enable effective intervention and optimal prioritisation of care. Here, we present the COVID-19 Early Warning System (CovEWS), a clinical risk scoring system for assessing COVID-19 related mortality risk. CovEWS provides continuous real-time risk scores for individual patients with clinically meaningful predictive performance up to 192 hours (8 days) in advance, and is automatically derived from patients' electronic health records (EHRs) using machine learning. We trained and evaluated CovEWS using de-identified data from a cohort of 66430 COVID-19 positive patients seen at over 69 healthcare institutions in the United States (US), Australia, Malaysia and India amounting to an aggregated total of over 2863 years of patient observation time. On an external test cohort of 5005 patients, CovEWS predicts COVID-19 related mortality from $78.8\%$ ($95\%$ confidence interval [CI]: $76.0, 84.7\%$) to $69.4\%$ ($95\%$ CI: $57.6, 75.2\%$) specificity at a sensitivity greater than $95\%$ between respectively 1 and 192 hours prior to observed mortality events - significantly outperforming existing generic and COVID-19 specific clinical risk scores. CovEWS could enable clinicians to intervene at an earlier stage, and may therefore help in preventing or mitigating COVID-19 related mortality.
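The headline metric above is specificity at a fixed sensitivity level. As a rough illustration of how such a number can be computed from risk scores, the sketch below scans candidate thresholds and keeps the best specificity among those that meet the sensitivity target. The function name and the toy data are hypothetical, the target is treated as "at least" rather than strictly "greater than", and this is not the evaluation code of the paper.

```python
def specificity_at_sensitivity(scores, labels, min_sensitivity=0.95):
    """Best specificity over thresholds whose sensitivity is >= min_sensitivity.

    scores : risk scores, higher means higher predicted risk
    labels : 1 for an observed event (e.g. death), 0 otherwise
    A patient is flagged when score >= threshold.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    best = 0.0
    for threshold in sorted(set(scores)):
        # Sensitivity: share of true events that are flagged.
        sensitivity = sum(s >= threshold for s in positives) / len(positives)
        if sensitivity >= min_sensitivity:
            # Specificity: share of non-events that are not flagged.
            specificity = sum(s < threshold for s in negatives) / len(negatives)
            best = max(best, specificity)
    return best


# Toy data, invented for illustration only.
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]
toy_result = specificity_at_sensitivity(toy_scores, toy_labels)
```

On the toy data every event must be flagged to reach the target, which forces the threshold down to 0.4 and leaves one of the four non-events flagged.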

Journal ArticleDOI
TL;DR: The results suggest that the nonparametric Bayesian Additive Regression Trees within the framework of accelerated failure time model (AFT-BART-NP) consistently yields the best performance, in terms of bias, precision, and expected regret.
Abstract: Methods for estimating heterogeneous treatment effect in observational data have largely focused on continuous or binary outcomes, and have been relatively less vetted with survival outcomes. Using flexible machine learning methods in the counterfactual framework is a promising approach to address challenges due to complex individual characteristics, to which treatments need to be tailored. To evaluate the operating characteristics of recent survival machine learning methods for the estimation of treatment effect heterogeneity and inform better practice, we carry out a comprehensive simulation study presenting a wide range of settings describing confounded heterogeneous survival treatment effects and varying degrees of covariate overlap. Our results suggest that the nonparametric Bayesian Additive Regression Trees within the framework of accelerated failure time model (AFT-BART-NP) consistently yields the best performance, in terms of bias, precision and expected regret. Moreover, the credible interval estimators from AFT-BART-NP provide close to nominal frequentist coverage for the individual survival treatment effect when the covariate overlap is at least moderate. Including a non-parametrically estimated propensity score as an additional fixed covariate in the AFT-BART-NP model formulation can further improve its efficiency and frequentist coverage. Finally, we demonstrate the application of flexible causal machine learning estimators through a comprehensive case study examining the heterogeneous survival effects of two radiotherapy approaches for localized high-risk prostate cancer.

Posted Content
TL;DR: This work proposes a new ensemble method for probabilistic forecasting, which borrows strength across the households while accommodating their individual idiosyncrasies, and is an extension of regression stacking (Breiman, 1996) where the mixture weights are modelled using linear combinations of parametric, smooth or random effects.
Abstract: Future grid management systems will coordinate distributed production and storage resources to manage, in a cost effective fashion, the increased load and variability brought by the electrification of transportation and by a higher share of weather dependent production. Electricity demand forecasts at a low level of aggregation will be key inputs for such systems. We focus on forecasting demand at the individual household level, which is more challenging than forecasting aggregate demand, due to the lower signal-to-noise ratio and to the heterogeneity of consumption patterns across households. We propose a new ensemble method for probabilistic forecasting, which borrows strength across the households while accommodating their individual idiosyncrasies. In particular, we develop a set of models or 'experts' which capture different demand dynamics and we fit each of them to the data from each household. Then we construct an aggregation of experts where the ensemble weights are estimated on the whole data set, the main innovation being that we let the weights vary with the covariates by adopting an additive model structure. In particular, the proposed aggregation method is an extension of regression stacking (Breiman, 1996) where the mixture weights are modelled using linear combinations of parametric, smooth or random effects. The methods for building and fitting additive stacking models are implemented by the gamFactory R package, available at this https URL.

Journal ArticleDOI
TL;DR: Causal inference, in particular mediation analysis, can be used to resolve apparent statistical paradoxes; help educate the public and decision-makers alike; avoid unsound comparisons; and answer a range of causal questions pertaining to the pandemic, subject to transparently stated assumptions.
Abstract: We point out an instantiation of Simpson's paradox in Covid-19 case fatality rates (CFRs): comparing data of 44,672 cases from China with early reports from Italy (9th March), we find that CFRs are lower in Italy for every age group, but higher overall. This phenomenon is explained by a stark difference in case demographic between the two countries. Using this as a motivating example, we introduce basic concepts from mediation analysis and show how these can be used to quantify different direct and indirect effects when assuming a coarse-grained causal graph involving country, age, and mortality. As a case study, we then investigate total, direct, and indirect (age-mediated) causal effects between different countries and at different points in time. This allows us to separate age-related effects from others unrelated to age, and thus facilitates a more transparent comparison of CFRs across countries throughout the evolution of the Covid-19 pandemic.
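The reversal described above is easy to reproduce with small numbers. The sketch below uses invented counts for two hypothetical countries "A" and "B" (not the China and Italy data from the paper) and a simple standardization step in the spirit of separating an age-mediated effect from a direct one; it is not the estimator used by the authors.

```python
# Hypothetical (cases, deaths) per age group; invented for illustration only.
cases = {
    "A": {"young": (900, 9), "old": (100, 15)},   # mostly young cases
    "B": {"young": (300, 2), "old": (700, 98)},   # mostly old cases
}


def cfr_by_group(country):
    """Age-specific case fatality rates for one country."""
    return {g: deaths / n for g, (n, deaths) in cases[country].items()}


def cfr_overall(country):
    """Aggregate case fatality rate, ignoring age."""
    total_cases = sum(n for n, _ in cases[country].values())
    total_deaths = sum(d for _, d in cases[country].values())
    return total_deaths / total_cases


def standardized_cfr(rates_from, age_mix_from):
    """CFR obtained by applying one country's age-specific rates
    to the case age distribution of another country."""
    rates = cfr_by_group(rates_from)
    total = sum(n for n, _ in cases[age_mix_from].values())
    return sum(rates[g] * n / total for g, (n, _) in cases[age_mix_from].items())


# B is lower than A within every age group ...
lower_in_every_group = all(
    cfr_by_group("B")[g] < cfr_by_group("A")[g] for g in ("young", "old")
)
# ... yet higher overall, because B's cases are concentrated in the old group.
higher_overall = cfr_overall("B") > cfr_overall("A")

# Holding A's age mix fixed and swapping in B's rates isolates the
# part of the difference that is not mediated by age.
direct_part = standardized_cfr("B", "A") - cfr_overall("A")
```

Here `direct_part` is negative while the overall difference is positive, which is the pattern the abstract attributes to the difference in case demographics.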

Posted Content
TL;DR: In this article, an autoregressive integrated moving average (ARIMA) model is used to forecast the epidemic trend over the period after April 4, 2020, by using the Italian epidemiological data at national and regional level.
Abstract: Coronavirus disease (COVID-2019) is a severe ongoing novel pandemic that is spreading quickly across the world. Italy, which is widely considered one of the main epicenters of the pandemic, has registered the highest COVID-2019 death rates and death toll in the world, to the present day. In this article I estimate an autoregressive integrated moving average (ARIMA) model to forecast the epidemic trend over the period after April 4, 2020, by using the Italian epidemiological data at national and regional level. The data refer to the number of daily confirmed cases officially registered by the Italian Ministry of Health (this http URL) for the period February 20 to April 4, 2020. The main advantage of this model is that it is easy to manage and fit. Moreover, it may give a first understanding of the basic trends, by suggesting the hypothetical inflection point and final size of the epidemic.

Posted Content
TL;DR: A novel discrete grey seasonal model is put forward by incorporating seasonal dummy variables into the conventional model, and it significantly outperforms the other benchmark models in terms of several error criteria.
Abstract: In order to accurately describe real systems with seasonal disturbances, which normally appear as monthly or quarterly cycles, a novel discrete grey seasonal model is put forward by incorporating seasonal dummy variables into the conventional model. Moreover, the mechanism and properties of this proposed model are discussed in depth, revealing the inherent differences from the existing seasonal grey models. For validation and explanation purposes, the proposed model is implemented to describe three actual cases with monthly and quarterly seasonal fluctuations (quarterly wind power production, quarterly PM10, and monthly natural gas consumption), in comparison with five competing models involving grey prediction models, conventional econometric techniques, and artificial intelligence methods. Experimental results from the cases consistently demonstrate that the proposed model significantly outperforms the other benchmark models in terms of several error criteria. Moreover, further discussion of the influence of different sequence lengths on forecasting performance reveals that the proposed model still performs best, with strong robustness and high reliability in addressing seasonal sequences. In general, the new model is validated to be a powerful and promising methodology for handling sequences with seasonal fluctuations.

Posted Content
TL;DR: The results confirm that reduced social activity lowers the infection rate, accounting for regional and temporal patterns, and show spatial infection patterns based on geographical as well as social distances.
Abstract: Since the primary mode of respiratory virus transmission is person-to-person interaction, we are required to reconsider physical interaction patterns to mitigate the number of people infected with COVID-19. While research has shown that non-pharmaceutical interventions (NPI) had an evident impact on national mobility patterns, we investigate the relative regional mobility behaviour to assess the effect of human movement on the spread of COVID-19. In particular, we explore the impact of human mobility and social connectivity derived from Facebook activities on the weekly rate of new infections in Germany between March 3rd and June 22nd, 2020. Our results confirm that reduced social activity lowers the infection rate, accounting for regional and temporal patterns. The extent of social distancing, quantified by the percentage of people staying put within a federal administrative district, has an overall negative effect on the incidence of infections. Additionally, our results show spatial infection patterns based on geographic as well as social distances.

Posted Content
TL;DR: Topic modeling of automated vehicle crash narratives identifies safety concerns and gaps between crash types and current research focus, and suggests that future empirical work should focus on driver-initiated transitions, overtakes, silent failures, complex traffic situations, and adverse driving environments.
Abstract: Automated vehicle technology promises to reduce the societal impact of traffic crashes. Early investigations of this technology suggest that significant safety issues remain during control transfers between the automation and human drivers and automation interactions with the transportation system. In order to address these issues, it is critical to understand both the behavior of human drivers during these events and the environments where they occur. This article analyzes automated vehicle crash narratives from the California Department of Motor Vehicles automated vehicle crash database to identify safety concerns and gaps between crash types and the current areas of research focus. The database was analyzed using probabilistic topic modeling of open-ended crash narratives. Topic modeling analysis identified five themes in the database: driver-initiated transition crashes, sideswipe crashes during left-side overtakes, and rear-end collisions while the vehicle was stopped at an intersection, in a turn lane, and when the crash involved oncoming traffic. Many crashes represented by the driver-initiated transitions topic were also associated with the sideswipe collisions. A substantial portion of the sideswipe collisions also involved motorcycles. These findings highlight previously raised safety concerns with transitions of control and interactions between vehicles in automated mode and the transportation social network. In response to these findings, future empirical work should focus on driver-initiated transitions, overtakes, silent failures, complex traffic situations, and adverse driving environments. Beyond this future work, the topic modeling analysis method may be used as a tool to monitor emergent safety issues.

Posted Content
TL;DR: This work studies various nonadaptive test designs for the first stage of two-stage group testing, and derives a new lower bound for the total number of tests required, finding that a first-stage design with constant tests per item and constant items per test is extremely close to optimal.
Abstract: Inspired by applications in testing for COVID-19, we consider a variant of two-stage group testing we call "conservative" two-stage testing, where every item declared to be defective must be definitively confirmed by being tested by itself in the second stage. We study this in the linear regime where the prevalence is fixed while the number of items is large. We study various nonadaptive test designs for the first stage, and derive a new lower bound for the total number of tests required. We find that a first-stage design with constant tests per item and constant items per test due to Broder and Kumar (arXiv:2004.01684) is extremely close to optimal. Simulations back up the theoretical results.
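To make the "conservative" two-stage procedure concrete, here is a small simulation sketch. It assumes noiseless tests and builds the first stage from `r` independent random partitions of the items into pools of size `s`, which is one simple way to get constant tests per item and constant items per test; it is not claimed to be the exact construction of Broder and Kumar. The function name and parameter values are hypothetical.

```python
import random


def conservative_two_stage(defective, r, s, rng):
    """Simulate conservative two-stage group testing with noiseless tests.

    defective : list of booleans, True if the item is defective
    r         : first-stage tests per item (number of random partitions)
    s         : items per first-stage test (must divide the number of items)
    rng       : random.Random instance used to draw the partitions
    Returns (total number of tests, indices declared defective).
    """
    n = len(defective)
    if n % s != 0:
        raise ValueError("s must divide the number of items")

    possibly_defective = [True] * n
    stage1_tests = 0
    for _ in range(r):
        order = list(range(n))
        rng.shuffle(order)
        for start in range(0, n, s):
            pool = order[start:start + s]
            stage1_tests += 1
            # A negative pooled test clears every item in the pool.
            if not any(defective[i] for i in pool):
                for i in pool:
                    possibly_defective[i] = False

    # Stage 2: every item not cleared is tested individually, so each item
    # declared defective has been confirmed by a test on itself.
    stage2_items = [i for i in range(n) if possibly_defective[i]]
    declared = [i for i in stage2_items if defective[i]]
    return stage1_tests + len(stage2_items), declared


rng = random.Random(0)
population = [rng.random() < 0.05 for _ in range(1000)]  # assumed 5% prevalence
total_tests, declared = conservative_two_stage(population, r=2, s=10, rng=rng)
```

Comparing `total_tests` with the 1000 tests needed for individual testing, across different choices of `r` and `s`, gives a feel for the trade-off that the paper's lower bound makes precise.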