Showing papers in "arXiv: Applications in 2020"

Open Access

Journal Article•DOI•

The importance of transparency and reproducibility in artificial intelligence research

Benjamin Haibe-Kains, George Alexandru Adam, Ahmed Hosny, Farnoosh Khodakarami, Levi Waldron, Bo Wang, Chris McIntosh, Anshul Kundaje, Casey S. Greene, Michael M. Hoffman, Jeffrey T. Leek, Wolfgang Huber, Alvis Brazma, Joelle Pineau, Robert Tibshirani, Trevor Hastie, John P. A. Ioannidis, John Quackenbush, Hugo J.W.L. Aerts

28 Feb 2020-arXiv: Applications

TL;DR: The authors identify obstacles hindering transparent and reproducible AI research, as faced by McKinney et al. in their breast cancer screening study, and provide solutions with implications for the broader field.

Abstract: In their study, McKinney et al. showed the high potential of artificial intelligence for breast cancer screening. However, the lack of detailed methods and computer code undermines its scientific value. We identify obstacles hindering transparent and reproducible AI research as faced by McKinney et al. and provide solutions with implications for the broader field.

166 citations

Repository•

Forecasting: theory and practice

Fotios Petropoulos, Daniele Apiletti¹, Vassilios Assimakopoulos², Mohamed Zied Babai³, Devon K. Barrow⁴, Souhaib Ben Taieb⁵, Christoph Bergmeir⁶, Ricardo J. Bessa, Jakub Bijak⁷, John E. Boylan⁸, Jethro Browell⁹, Claudio Carnevale¹⁰, Jennifer L. Castle¹¹, Pasquale Cirillo¹², Michael P. Clements¹³, Clara Cordeiro¹⁴,¹⁵, Fernando Luiz Cyrino Oliveira¹⁶, Shari De Baets¹⁷, Alexander Dokumentov, Joanne Ellison⁷, Piotr Fiszeder¹⁸, Philip Hans Franses¹⁹, David T. Frazier⁶, Michael Gilliland²⁰, M. Sinan Gönül, Paul Goodwin²¹, Luigi Grossi²², Yael Grushka-Cockayne²³, Mariangela Guidolin²², Massimo Guidolin²⁴, Ulrich Gunter²⁵, Xiaojia Guo²⁶, Renato Guseo²², Nigel Harvey²⁷, David F. Hendry¹¹, Ross Hollyman²¹, Tim Januschowski²⁸, Jooyoung Jeon²⁹, Victor Richmond R. Jose³⁰, Yanfei Kang³¹, Anne B. Koehler³², Stephan Kolassa⁸, Nikolaos Kourentzes³³,⁸, Sonia Leva, Feng Li³⁴, Konstantia Litsiou³⁵, Spyros Makridakis³⁶, Gael M. Martin⁶, Andrew B. Martinez³⁷,³⁸, Sheik Meeran, Theodore Modis, Konstantinos Nikolopoulos³⁹, Dilek Önkal, Alessia Paccagnini⁴⁰,⁴¹, Anastasios Panagiotelis⁴², Ioannis P. Panapakidis⁴³, Jose M. Pavía⁴⁴, Manuela Pedio²⁴,⁴⁵, Diego J. Pedregal⁴⁶, Pierre Pinson⁴⁷, Patrícia Ramos⁴⁸, David E. Rapach⁴⁹, J. James Reade¹³, Bahman Rostami-Tabar⁵⁰, Michał Rubaszek⁵¹, Georgios Sermpinis⁹, Han Lin Shang⁵², Evangelos Spiliotis², Aris A. Syntetos⁵⁰, Priyanga Dilini Talagala⁵³, Thiyanga S. Talagala⁵⁴, Len Tashman⁵⁵, Dimitrios D. Thomakos⁵⁶, Thordis L. Thorarinsdottir⁵⁷, Ezio Todini⁵⁸, Juan Ramón Trapero Arenas⁴⁶, Xiaoqian Wang³¹, Robert L. Winkler⁵⁹, Alisa Yusupova⁸, Florian Ziel⁶⁰•Institutions (60)

Polytechnic University of Turin¹, National Technical University of Athens², KEDGE Business School³, University of Birmingham⁴, University of Mons⁵, Monash University⁶, University of Southampton⁷, Lancaster University⁸, University of Glasgow⁹, University of Brescia¹⁰, University of Oxford¹¹, Zürcher Fachhochschule¹², University of Reading¹³, University of the Algarve¹⁴, University of Lisbon¹⁵, Pontifical Catholic University of Rio de Janeiro¹⁶, Ghent University¹⁷, Nicolaus Copernicus University in Toruń¹⁸, Erasmus University Rotterdam¹⁹, SAS Institute²⁰, University of Bath²¹, University of Padua²², University of Virginia²³, Bocconi University²⁴, MODUL University Vienna²⁵, University of Maryland, College Park²⁶, University College London²⁷, Amazon.com²⁸, KAIST²⁹, Georgetown University³⁰, Beihang University³¹, Miami University³², University of Skövde³³, Central University of Finance and Economics³⁴, Manchester Metropolitan University³⁵, University of Nicosia³⁶, George Washington University³⁷, United States Department of the Treasury³⁸, Durham University³⁹, Australian National University⁴⁰, University College Dublin⁴¹, University of Sydney⁴², University of Thessaly⁴³, University of Valencia⁴⁴, University of Bristol⁴⁵, University of Castilla–La Mancha⁴⁶, Technical University of Denmark⁴⁷, Polytechnic Institute of Porto⁴⁸, Saint Louis University⁴⁹, Cardiff University⁵⁰, Warsaw School of Economics⁵¹, Macquarie University⁵², University of Moratuwa⁵³, University of Sri Jayewardenepura⁵⁴, International Institute of Minnesota⁵⁵, National and Kapodistrian University of Athens⁵⁶, Norwegian Computing Center⁵⁷, University of Bologna⁵⁸, Duke University⁵⁹, University of Duisburg-Essen⁶⁰

04 Dec 2020-arXiv: Applications

TL;DR: A non-systematic review of the theory and the practice of forecasting, offering a wide range of theoretical, state-of-the-art models, methods, principles, and approaches to prepare, produce, organise, and evaluate forecasts.

Abstract: Forecasting has always been at the forefront of decision making and planning. The uncertainty that surrounds the future is both exciting and challenging, with individuals and organisations seeking to minimise risks and maximise utilities. The large number of forecasting applications calls for a diverse set of forecasting methods to tackle real-life challenges. This article provides a non-systematic review of the theory and the practice of forecasting. We provide an overview of a wide range of theoretical, state-of-the-art models, methods, principles, and approaches to prepare, produce, organise, and evaluate forecasts. We then demonstrate how such theoretical concepts are applied in a variety of real-life contexts. We do not claim that this review is an exhaustive list of methods and applications. However, we wish that our encyclopedic presentation will offer a point of reference for the rich work that has been undertaken over the last decades, with some key insights for the future of forecasting theory and practice. Given its encyclopedic nature, the intended mode of reading is non-linear. We offer cross-references to allow the readers to navigate through the various topics. We complement the theoretical concepts and applications covered by large lists of free or open-source software implementations and publicly-available databases.

163 citations

Posted Content•

Estimating the number of infections and the impact of non-pharmaceutical interventions on COVID-19 in European countries: technical description update

Seth Flaxman, Swapnil Mishra, Axel Gandy, H. Juliette T. Unwin, Helen Coupland, Thomas A. Mellan, Harrison Zhu, Tresnia Berah, Jeffrey W. Eaton, Pablo N P Guzman, Nora Schmit, Lucia Callizo, C. R. Whittaker, Peter Winskill, Xiaoyue Xi, Azra C. Ghani, Christl A. Donnelly, Steven Riley, Lucy C Okell, Michaela A. C. Vollmer, Neil M. Ferguson, Samir Bhatt¹•Institutions (1)

Imperial College London¹

23 Apr 2020-arXiv: Applications

TL;DR: Extends a semi-mechanistic Bayesian hierarchical model that infers the impact of non-pharmaceutical interventions and estimates the number of SARS-CoV-2 infections over time in European countries.

Abstract: Following the emergence of a novel coronavirus (SARS-CoV-2) and its spread outside of China, Europe has experienced large epidemics. In response, many European countries have implemented unprecedented non-pharmaceutical interventions including case isolation, the closure of schools and universities, banning of mass gatherings and/or public events, and most recently, wide-scale social distancing including local and national lockdowns. In this technical update, we extend a semi-mechanistic Bayesian hierarchical model that infers the impact of these interventions and estimates the number of infections over time. Our methods assume that changes in the reproductive number - a measure of transmission - are an immediate response to these interventions being implemented rather than broader gradual changes in behaviour. Our model estimates these changes by calculating backwards from temporal data on observed deaths to estimate the number of infections and rate of transmission that occurred several weeks prior, allowing for a probabilistic time lag between infection and death. In this update we extend our original model [Flaxman, Mishra, Gandy et al 2020, Report #13, Imperial College London] to include (a) population saturation effects, (b) prior uncertainty on the infection fatality ratio, (c) a more balanced prior on intervention effects and (d) partial pooling of the lockdown intervention covariate. We also (e) included another 3 countries (Greece, the Netherlands and Portugal). The model code is available at this https URL. We are now reporting the results of our updated model online at this https URL. We estimated parameters jointly for all M=14 countries in a single hierarchical model. Inference is performed in the probabilistic programming language Stan using an adaptive Hamiltonian Monte Carlo (HMC) sampler.
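The link between the reproductive number, infections, and lagged deaths described in this abstract can be sketched as a forward simulation. This is only a minimal illustration under stated assumptions, not the authors' Stan model: the function names, the serial-interval weights, the infection-to-death lag distribution, and all numbers are hypothetical, and population saturation and Bayesian inference are left out.

```python
import numpy as np

def simulate_infections(r_t, serial_weights, seed_infections):
    """Forward-simulate daily infections with a discrete renewal equation.

    serial_weights[k] is the assumed weight given to infections that
    occurred k + 1 days earlier (hypothetical values, not from the paper).
    """
    r_t = np.asarray(r_t, dtype=float)
    w = np.asarray(serial_weights, dtype=float)
    infections = np.zeros(len(r_t))
    infections[0] = seed_infections
    for t in range(1, len(r_t)):
        lags = min(t, len(w))
        # Most recent day first, so it lines up with w[0], w[1], ...
        recent = infections[t - lags:t][::-1]
        infections[t] = r_t[t] * np.dot(recent, w[:lags])
    return infections

def expected_deaths(infections, ifr, infection_to_death):
    """Expected daily deaths: infections convolved with the assumed
    infection-to-death lag distribution, scaled by the fatality ratio."""
    infections = np.asarray(infections, dtype=float)
    lag = np.asarray(infection_to_death, dtype=float)
    return ifr * np.convolve(infections, lag)[: len(infections)]

# Toy inputs chosen only for illustration.
r_t = [1.0, 2.0, 2.0, 0.5, 0.5]  # R_t drops when an intervention starts
infections = simulate_infections(r_t, [1.0], seed_infections=10)
deaths = expected_deaths(infections, ifr=0.01, infection_to_death=[0.0, 0.0, 1.0])
```

Back-calculation, as described in the abstract, runs this relationship in reverse: given observed deaths, it infers the infections and reproductive number that plausibly produced them.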

118 citations

Journal Article•DOI•

Finding an Accurate Early Forecasting Model from Small Dataset: A Case of 2019-nCoV Novel Coronavirus Outbreak

Simon Fong, Gloria Li, Nilanjan Dey, Rubén González Crespo, Enrique Herrera-Viedma

24 Mar 2020-arXiv: Applications

TL;DR: Proposes a methodology for forecasting from a small dataset that combines three approaches: augmenting the limited data, selecting the best model from a panel of candidates, and fine-tuning the parameters of an individual forecasting model for the highest possible accuracy.

Abstract: An epidemic is a rapid and wide spread of infectious disease that threatens many lives and causes economic damage. It is important to foretell the lifetime of an epidemic in order to decide on timely remedial actions. These measures include closing borders and schools and suspending community services and commuting. Lifting such measures depends on the momentum of the outbreak and its rate of decay. Being able to accurately forecast the fate of an epidemic is an extremely important but difficult task. Due to limited knowledge of the novel disease, the high uncertainty involved, and the complex societal-political factors that influence the spread of the new virus, any forecast is anything but reliable. Another factor is the insufficient amount of available data. Data samples are often scarce when an epidemic has just started. With only a few training samples on hand, finding a forecasting model that offers the best possible forecast is a big challenge in machine learning. In the past, three popular methods have been proposed: 1) augmenting the existing limited data, 2) using a panel selection to pick the best forecasting model from several models, and 3) fine-tuning the parameters of an individual forecasting model for the highest possible accuracy. In this paper, a methodology that embraces these three virtues of data mining from a small dataset is proposed...

102 citations

Posted Content•DOI•

The effect of stay-at-home orders on COVID-19 cases and fatalities in the United States

James H. Fowler¹, Seth J. Hill, Nick Obradovich, Remy Levin•Institutions (1)

University of California, San Diego¹

17 Apr 2020-arXiv: Applications

TL;DR: In this article, the authors estimate the effect of stay-at-home orders using a difference-in-differences design that accounts for local variation in factors like health systems and demographics and for unmeasured temporal variation in national mitigation actions and access to tests.

Abstract: Governments issue "stay at home" orders to reduce the spread of contagious diseases, but the magnitude of such orders' effectiveness is uncertain. In the United States these orders were not coordinated at the national level during the coronavirus disease 2019 (COVID-19) pandemic, which creates an opportunity to use spatial and temporal variation to measure the policies' effect with greater accuracy. Here, we combine data on the timing of stay-at-home orders with daily confirmed COVID-19 cases and fatalities at the county level in the United States. We estimate the effect of stay-at-home orders using a difference-in-differences design that accounts for unmeasured local variation in factors like health systems and demographics and for unmeasured temporal variation in factors like national mitigation actions and access to tests. Compared to counties that did not implement stay-at-home orders, the results show that the orders are associated with a 30.2 percent (11.0 to 45.2) reduction in weekly cases after one week, a 40.0 percent (23.4 to 53.0) reduction after two weeks, and a 48.6 percent (31.1 to 61.7) reduction after three weeks. Stay-at-home orders are also associated with a 59.8 percent (18.3 to 80.2) reduction in weekly fatalities after three weeks. These results suggest that stay-at-home orders reduced confirmed cases by 390,000 (170,000 to 680,000) and fatalities by 41,000 (27,000 to 59,000) within the first three weeks in localities where they were implemented.
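The difference-in-differences idea in this abstract can be illustrated with the textbook two-group, two-period estimator. This is a minimal sketch on made-up numbers: the function name `did_estimate` and the data are hypothetical, and the authors' county-level design, which also accounts for unmeasured local and temporal variation, is considerably richer.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-by-two difference-in-differences estimate.

    Change in the treated group minus change in the control group.
    Under the parallel-trends assumption, the control group's change
    stands in for what the treated group would have done untreated.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Hypothetical weekly case counts, for illustration only.
effect = did_estimate(
    treated_pre=[100, 120],   # counties with an order, before it
    treated_post=[90, 110],   # counties with an order, after it
    control_pre=[80, 100],    # counties without an order, same weeks
    control_post=[100, 120],
)
print(effect)  # -30.0
```

A negative estimate here means cases in the treated group rose less (or fell more) than the control trend alone would predict.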

87 citations

Journal Article•DOI•

Spatiotemporal effects of the causal factors on COVID-19 incidences in the contiguous United States

Arabinda Maiti, Qi Zhang, Srikanta Sannigrahi, Suvamoy Pramanik¹, Suman Chakraborti, Francesco Pilla•Institutions (1)

Jawaharlal Nehru University¹

29 Oct 2020-arXiv: Applications

TL;DR: In this paper, the causal effects of the confounding factors on COVID-19 counts in the contiguous US were explored using various relevant approaches, including local and global spatial regression models and machine learning.

Abstract: Since December 2019, the world has been witnessing the gigantic effect of an unprecedented global pandemic caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2): COVID-19. So far, 38,619,674 confirmed cases and 1,093,522 confirmed deaths due to COVID-19 have been reported. In the United States (US), the cases and deaths are recorded as 7,833,851 and 215,199, respectively. Several timely studies have discussed the local and global effects of the confounding factors on COVID-19 casualties in the US. However, most of these studies paid little attention to the time-varying associations between and among these factors, which are crucial for understanding the outbreak of the present pandemic. Therefore, this study adopts various relevant approaches, including local and global spatial regression models and machine learning, to explore the causal effects of the confounding factors on COVID-19 counts in the contiguous US. In total, five regression models, spatial lag model (SLM), ordinary least squares (OLS), spatial error model (SEM), geographically weighted regression (GWR) and multiscale geographically weighted regression (MGWR), are fitted at the county scale to take into account the scale effects on modelling. For COVID-19 cases, ethnicity, crime, and income factors are found to be the strongest covariates and explain the maximum model variances. For COVID-19 deaths, both (domestic and international) migration and income factors play a crucial role in explaining spatial differences of COVID-19 death counts across counties. The local coefficient of determination (R²) values derived from the GWR and MGWR models are found to be very high over the Wisconsin-Indiana-Michigan (Great Lakes) region, as well as several parts of Texas, California, Mississippi and Arkansas.

72 citations

Journal Article•DOI•

The Building Data Genome Project 2, energy meter data from the ASHRAE Great Energy Predictor III competition

Clayton Miller¹, Anjukan Kathirgamanathan², Bianca Picchetti, Pandarasamy Arjunan, June Young Park³, Zoltan Nagy³, Paul Raftery⁴, Brodie W. Hobson⁵, Zixiao Shi⁵, Forrest Meggers⁶•Institutions (6)

National University of Singapore¹, University College Dublin², University of Texas at Austin³, University of California, Berkeley⁴, Carleton University⁵, Princeton University⁶

03 Jun 2020-arXiv: Applications

TL;DR: This paper describes the process of data collection, cleaning, and convergence of time-series meter data, the meta-data about the buildings, and complementary weather data that can be used for further prediction benchmarking and prototyping as well as anomaly detection, energy analysis, and building type classification.

Abstract: This paper describes an open data set of 3,053 energy meters from 1,636 non-residential buildings with a range of two full years (2016 and 2017) at an hourly frequency (17,544 measurements per meter resulting in approximately 53.6 million measurements). These meters were collected from 19 sites across North America and Europe, with one or more meters per building measuring whole building electrical, heating and cooling water, steam, and solar energy as well as water and irrigation meters. Part of these data were used in the Great Energy Predictor III (GEPIII) competition hosted by the ASHRAE organization in October-December 2019. GEPIII was a machine learning competition for long-term prediction with an application to measurement and verification. This paper describes the process of data collection, cleaning, and convergence of time-series meter data, the meta-data about the buildings, and complementary weather data. This data set can be used for further prediction benchmarking and prototyping as well as anomaly detection, energy analysis, and building type classification.
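The measurement counts quoted in this abstract can be checked with a few lines of arithmetic. The sketch below assumes every meter has a complete hourly record for both years, which is an idealisation; the variable names are illustrative only.

```python
import calendar

def hours_in_year(year: int) -> int:
    """Number of hourly readings in a full calendar year."""
    days = 366 if calendar.isleap(year) else 365
    return days * 24

METERS = 3_053  # meter count quoted in the abstract

# 2016 is a leap year, so it contributes one extra day of readings.
per_meter = hours_in_year(2016) + hours_in_year(2017)
total = per_meter * METERS

print(per_meter)  # 17544
print(total)      # 53561832
```

The total of 53,561,832 rounds to the "approximately 53.6 million" stated in the abstract.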

58 citations

Posted Content•

A time series method to analyze incidence pattern and estimate reproduction number of COVID-19

Soudeep Deb, Manidipa Majumdar

24 Mar 2020-arXiv: Applications

TL;DR: A time series model is proposed to analyze the trend pattern of the incidence of COVID-19 outbreak and it is shown that a time-dependent quadratic trend successfully captures the incidence patterns of the disease.

Abstract: The ongoing pandemic of Coronavirus disease (COVID-19) emerged in Wuhan, China at the end of 2019. It has already affected more than 300,000 people, with the number of deaths nearing 13,000 across the world. As it has been posing a huge threat to global public health, it is of utmost importance to identify the rate at which the disease is spreading. In this study, we propose a time series model to analyze the trend pattern of the incidence of the COVID-19 outbreak. We also incorporate information on total or partial lockdown, wherever available, into the model. The model is concise in structure, and using appropriate diagnostic measures, we show that a time-dependent quadratic trend successfully captures the incidence pattern of the disease. We also estimate the basic reproduction number across different countries, and find that it is consistent except for the United States of America. The above statistical analysis is able to shed light on the trends of the outbreak, and gives insight into what epidemiological stage a region is in. This has the potential to help in prompting policies to address the COVID-19 pandemic in different countries.

58 citations

Posted Content•

Building a COVID-19 Vulnerability Index

David DeCaprio, Joseph Gartner, Thadeus Burgess, Kristian Garcia, Sarthak Kothari, Shaayan Sayed, Carol J. McCall

16 Mar 2020-arXiv: Applications

TL;DR: Presents results for three models that predict severe complications due to COVID-19, using complications from other upper respiratory infections as a proxy, with each model increasing predictive effectiveness at the expense of ease of implementation.

Abstract: COVID-19 is an acute respiratory disease that has been classified as a pandemic by the World Health Organization. Characterization of this disease is still in its early stages. However, it is known to have high mortality rates, particularly among individuals with preexisting medical conditions. Creating models to identify individuals who are at the greatest risk for severe complications due to COVID-19 will be useful for outreach campaigns to help mitigate the disease's worst effects. While information specific to COVID-19 is limited, a model using complications due to other upper respiratory infections can be used as a proxy to help identify those individuals who are at the greatest risk. We present the results for three models predicting such complications, with each model increasing predictive effectiveness at the expense of ease of implementation.

55 citations

Posted Content•

Modelling and Predicting the Spatio-Temporal Spread of Coronavirus Disease 2019 (COVID-19) in Italy

Diego Giuliani, Maria Michela Dickson, Giuseppe Espa, Flavio Santi

14 Mar 2020-arXiv: Applications

TL;DR: Models and predicts the spatio-temporal spread of coronavirus disease 2019 (COVID-19) in Italy.

Abstract: Background: Severe acute respiratory syndrome Coronavirus 2019 (COVID-19) has been firstly detected in China at the end of 2019 and it spread in few months all

54 citations

Journal Article•DOI•

Forecasting Corn Yield with Machine Learning Ensembles

Mohsen Shahhosseini¹, Guiping Hu¹, Sotirios V. Archontoulis¹•Institutions (1)

Iowa State University¹

18 Jan 2020-arXiv: Applications

TL;DR: A machine learning based framework to forecast corn yields in three US Corn Belt states considering complete and partial in-season weather knowledge is provided, and it is suggested that weather features corresponding to weather in weeks 18–24 (May 1st to June 1st) are the most important input features.

Abstract: The emergence of new technologies to synthesize and analyze big data with high-performance computing has increased our capacity to more accurately predict crop yields. Recent research has shown that machine learning (ML) can provide reasonable predictions, faster, and with higher flexibility compared to simulation crop modeling. The earlier the prediction during the growing season the better, but this has not been thoroughly investigated, as previous studies considered all data available to predict yields. This paper provides a machine learning based framework to forecast corn yields in three US Corn Belt states (Illinois, Indiana, and Iowa) considering complete and partial in-season weather knowledge. Several ensemble models are designed using a blocked sequential procedure to generate out-of-bag predictions. The forecasts are made at the county level and aggregated to the agricultural district and state levels. Results show that ensemble models based on a weighted average of the base learners outperform individual models. Specifically, the proposed ensemble model could achieve the best prediction accuracy (RRMSE of 7.8%) and the least mean bias error (-6.06 bu/acre) compared to other developed models. Comparing our proposed model forecasts with the literature demonstrates the superiority of forecasts made by our proposed ensemble model. Results from the scenario of having partial in-season weather knowledge reveal that decent yield forecasts can be made as early as June 1st. To find the marginal effect of each input feature on the forecasts made by the proposed ensemble model, a methodology is suggested that is the basis for finding feature importance for the ensemble model. The findings suggest that weather features corresponding to weather in weeks 18-24 (May 1st to June 1st) are the most important input features.
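The weighted-average ensemble and the RRMSE metric mentioned in this abstract can be sketched as follows. This is an illustration under assumptions rather than the authors' procedure: RRMSE is taken here to be RMSE divided by the mean observed value, expressed as a percentage, the weights are fixed by hand instead of being learned from out-of-bag predictions, and all yields are made-up numbers.

```python
import numpy as np

def rrmse(y_true, y_pred):
    """Relative RMSE in percent: RMSE divided by the mean observation
    (an assumed definition, not quoted from the paper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return 100.0 * rmse / np.mean(y_true)

def weighted_ensemble(base_predictions, weights):
    """Combine base-learner predictions (one row per learner)
    using weights that are normalised to sum to one."""
    preds = np.asarray(base_predictions, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return w @ preds

# Hypothetical yields (bu/acre) and two base learners with opposite errors.
observed = [100.0, 200.0]
base = [[90.0, 210.0],
        [110.0, 190.0]]
combined = weighted_ensemble(base, weights=[1.0, 1.0])

print(rrmse(observed, base[0]))   # error of a single learner
print(rrmse(observed, combined))  # error of the ensemble
```

In this toy case the two learners err in opposite directions, so averaging cancels their errors; real base learners are correlated and the gain is smaller.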

Posted Content•

An Early Warning Approach to Monitor COVID-19 Activity with Multiple Digital Traces in Near Real-Time

Nicole E. Kogan¹,², Leonardo Clemente¹, P. Liautaud², Justin Kaashoek¹,², Nicholas B. Link¹,², Andre T. Nguyen¹,³,⁴, Fred Lu¹,⁵, Peter Huybers², Bernd Resch²,⁶, Clemens Havas⁶, Andreas Petutschnig⁶, Jessica T. Davis⁷, Matteo Chinazzi⁷, Backtosch Mustafa¹,⁸, William P. Hanage², Alessandro Vespignani⁷, Mauricio Santillana²•Institutions (8)

Boston Children's Hospital¹, Harvard University², University of Maryland, Baltimore County³, Booz Allen Hamilton⁴, Stanford University⁵, University of Salzburg⁶, Northeastern University⁷, University of Hamburg⁸

01 Jul 2020-arXiv: Applications

TL;DR: Evaluating digital data streams as early indicators of state-level COVID-19 activity from 1 March to 30 September 2020 suggests that combining disparate health and behavioral data may help identify disease activity changes weeks before observation using traditional epidemiological monitoring.

Abstract: Non-pharmaceutical interventions (NPIs) have been crucial in curbing COVID-19 in the United States (US). Consequently, relaxing NPIs through a phased re-opening of the US amid still-high levels of COVID-19 susceptibility could lead to new epidemic waves. This calls for a COVID-19 early warning system. Here we evaluate multiple digital data streams as early warning indicators of increasing or decreasing state-level US COVID-19 activity between January and June 2020. We estimate the timing of sharp changes in each data stream using a simple Bayesian model that calculates in near real-time the probability of exponential growth or decay. Analysis of COVID-19-related activity on social network microblogs, Internet searches, point-of-care medical software, and a metapopulation mechanistic model, as well as fever anomalies captured by smart thermometer networks, shows exponential growth roughly 2-3 weeks prior to comparable growth in confirmed COVID-19 cases and 3-4 weeks prior to comparable growth in COVID-19 deaths across the US over the last 6 months. We further observe exponential decay in confirmed cases and deaths 5-6 weeks after implementation of NPIs, as measured by anonymized and aggregated human mobility data from mobile phones. Finally, we propose a combined indicator for exponential growth in multiple data streams that may aid in developing an early warning system for future COVID-19 outbreaks. These efforts represent an initial exploratory framework, and both continued study of the predictive power of digital indicators as well as further development of the statistical approach are needed.

Journal Article•DOI•

Human mobility and COVID-19 initial dynamics

Stefano Maria Iacus, Carlos Santamaria, Francesco Sermi, Spyridon Spyratos, Dario Tarchi, Michele Vespe

05 Jun 2020-arXiv: Applications

TL;DR: It emerges that internal mobility is more important than mobility across provinces and that the typical lagged positive effect of reduced human mobility on reducing excess deaths is around 14–20 days, meaning that mobility restrictions seem to have effectively contributed to saving lives.

Abstract: Mobility data at EU scale can help understand the dynamics of the pandemic and possibly limit the impact of future waves. Still, since a reliable and consistent method to measure the evolution of contagion at international level is missing, a systematic analysis of the relationship between human mobility and virus spread has never been conducted. Notable exceptions are France and Italy, for which data on excess deaths, an indirect indicator which is generally considered to be less affected by national and regional assumptions, are available at department and municipality level, respectively. Using this information together with anonymised and aggregated mobile data, this study shows that mobility alone can explain up to 92% of the initial spread in these two EU countries, while it has a slow decay effect after lockdown measures, meaning that mobility restrictions seem to have effectively contributed to saving lives. It also emerges that internal mobility is more important than mobility across provinces and that the typical lagged positive effect of reduced human mobility on reducing excess deaths is around 14-20 days. An analogous analysis relative to Spain, for which an IgG SARS-CoV-2 antibody screening study at province level is used instead of excess deaths statistics, confirms the findings. The same approach adopted in this study can be easily extended to other European countries, as soon as reliable data on the spreading of the virus at a suitable level of granularity become available. Looking at past data, relative to the initial phase of the outbreak in EU Member States, this study shows to what extent the spreading of the virus and human mobility are connected.

Journal Article•DOI•

An Epidemiological Modelling Approach for Covid19 via Data Assimilation

Philip Nadler¹, Shuo Wang¹, Rossella Arcucci¹, Xian Yang¹, Yike Guo¹•Institutions (1)

Imperial College London¹

25 Apr 2020-arXiv: Applications

TL;DR: In this article, an epidemiological model for forecasting and policy evaluation which incorporates new data in real-time through variational data assimilation is proposed, which is parsimonious and extendable, allowing for the incorporation of additional data and parameters of interest.

Abstract: The global pandemic of the 2019-nCov requires the evaluation of policy interventions to mitigate future social and economic costs of quarantine measures worldwide. We propose an epidemiological model for forecasting and policy evaluation which incorporates new data in real-time through variational data assimilation. We analyze and discuss infection rates in China, the US and Italy. In particular, we develop a custom compartmental SIR model, named the SITR model, fit to variables related to the epidemic in Chinese cities. We compare and discuss results from the model, which is updated as new observations become available. A hybrid data assimilation approach is applied to make results robust to initial conditions. We use the model to do inference on infection numbers as well as parameters such as the disease transmissibility rate or the rate of recovery. The parameterisation of the model is parsimonious and extendable, allowing for the incorporation of additional data and parameters of interest. This allows for scalability and the extension of the model to other locations or the adoption of novel data sources.
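For orientation, the standard SIR compartmental model on which the abstract's custom variant is built can be sketched with a simple Euler step. This is the textbook model only, under assumed parameter values: it does not include the extra compartment of the authors' SITR model or any data assimilation, and the function name and numbers are hypothetical.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=1.0):
    """Euler integration of the classic SIR equations.

    beta  : transmission rate per day (assumed value)
    gamma : recovery rate per day (assumed value)
    Returns a list of (susceptible, infected, recovered) tuples.
    """
    s, i, r = float(s0), float(i0), float(r0)
    n = s + i + r  # total population stays constant
    trajectory = [(s, i, r)]
    for _ in range(int(days / dt)):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory

# Toy run: population of 1,000 with 10 initial infections.
trajectory = simulate_sir(beta=0.3, gamma=0.1, s0=990, i0=10, r0=0, days=100)
peak_infected = max(i for _, i, _ in trajectory)
```

In a data assimilation setting, parameters such as `beta` and `gamma` would be re-estimated as new observations arrive rather than held fixed as they are here.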

Posted Content•

Spatiotemporal Dynamics, Nowcasting and Forecasting of COVID-19 in the United States

Li Wang¹, Guannan Wang, Lei Gao, Xinyi Li, Shan Yu, Myungjin Kim, Yueying Wang, Zhiling Gu•Institutions (1)

Iowa State University¹

29 Apr 2020-arXiv: Applications

TL;DR: A novel nonparametric space-time disease transmission model is developed for the epidemic data to study the spatial-temporal pattern in the spread of COVID-19 at the county level and to forecast how this outbreak may unfold through time and space in the future.

Abstract: Epidemic modeling is an essential tool to understand the spread of the novel coronavirus and ultimately assist in disease prevention, policymaking, and resource allocation. In this article, we establish a state of the art interface between classic mathematical and statistical models and propose a novel space-time epidemic modeling framework to study the spatial-temporal pattern in the spread of infectious disease. We propose a quasi-likelihood approach via the penalized spline approximation and alternatively reweighted least-squares technique to estimate the model. Furthermore, we provide a short-term and long-term county-level prediction of the infected/death count for the U.S. by accounting for the control measures, health service resources, and other local features. Utilizing spatiotemporal analysis, our proposed model enhances the dynamics of the epidemiological mechanism and dissects the spatiotemporal structure of the spreading disease. To assess the uncertainty associated with the prediction, we develop a projection band based on the envelope of the bootstrap forecast paths. The performance of the proposed method is evaluated by a simulation study. We apply the proposed method to model and forecast the spread of COVID-19 at both county and state levels in the United States.

Journal Article•DOI•

Local Indicator of Colocation Quotient with a Statistical Significance Test: Examining Spatial Association of Crime and Facilities

Fahui Wang¹, Yujie Hu¹, Shuai Wang², Xiaojuan Li²•Institutions (2)

Louisiana State University¹, Capital Normal University²

30 May 2020-arXiv: Applications

TL;DR: This paper develops a simulation-based statistical test for the local indicator of colocation quotient (LCLQ) and applies the indicator to examine the association of land use facilities with crime patterns.

Abstract: Most existing point-based colocation methods are global measures (e.g., join count statistic, cross K function, and global colocation quotient). Most recently, a local indicator such as the local colocation quotient is proposed to capture the variability of colocation across areas. Our research advances this line of work by developing a simulation-based statistic test for the local indicator of colocation quotient (LCLQ). The study applies the indicator to examine the association of land use facilities with crime patterns. Moreover, we use the street network distance in addition to the traditional Euclidean distance in defining neighbors since human activities (including facilities and crimes) usually occur along a street network. The method is applied to analyze the colocation of three types of crimes and three categories of facilities in a city in Jiangsu Province, China. The findings demonstrate the value of the proposed method in colocation analysis of crime and facilities, and in general colocation analysis of point data.

Journal Article•DOI•

Short-term CO2 emissions forecasting based on decomposition approaches and its impact on electricity market scheduling.

Neeraj Dhanraj Bokde¹, Bo Tranberg¹, Gorm Bruun Andresen¹•Institutions (1)

Aarhus University¹

24 Mar 2020-arXiv: Applications

TL;DR: In this article, two time series decomposition methods are developed for short-term forecasting of the CO2 emissions of electricity, which are in turn benchmarked against a set of state-of-the-art models.

Abstract: The world is facing major challenges related to global warming, and emissions of greenhouse gases are a major contributing factor. In 2017, energy industries accounted for 46% of all CO2 emissions globally, which shows a large potential for reduction. This paper proposes a novel short-term CO2 emissions forecast to enable intelligent scheduling of flexible electricity consumption to minimize the resulting CO2 emissions. Two time series decomposition methods are developed for short-term forecasting of the CO2 emissions of electricity. These are in turn benchmarked against a set of state-of-the-art models. The result is a new forecasting method with a 48-hour horizon targeted at the day-ahead electricity market. Forecasting benchmarks for France show that the new method has a mean absolute percentage error that is 25% lower than the best performing state-of-the-art model. Further, application of the forecast for scheduling flexible electricity consumption is studied for five European countries. Scheduling a flexible block of 4 hours of electricity consumption in a 24 hour interval can on average reduce the resulting CO2 emissions by 25% in France, 17% in Germany, 69% in Norway, 20% in Denmark, and just 3% in Poland when compared to consuming at random intervals during the day.
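
The scheduling idea in the abstract (placing a flexible 4-hour block within a 24-hour interval where forecast emissions are lowest) can be sketched as a sliding-window minimum. This is only an illustration under stated assumptions: the hourly forecast values are invented, the function names are hypothetical, and the "random" baseline is assumed here to be a uniformly random contiguous start time, which may differ from the paper's definition.

```python
def best_block_start(forecast, block_len):
    """Return (start_index, total) of the contiguous block with the lowest sum."""
    if not 0 < block_len <= len(forecast):
        raise ValueError("block_len must be between 1 and len(forecast)")
    window = sum(forecast[:block_len])
    best_start, best_total = 0, window
    for start in range(1, len(forecast) - block_len + 1):
        # Slide the window one hour forward: add the new hour, drop the old one.
        window += forecast[start + block_len - 1] - forecast[start - 1]
        if window < best_total:
            best_start, best_total = start, window
    return best_start, best_total


def mean_block_total(forecast, block_len):
    """Average block total over all start times (assumed random-start baseline)."""
    n_starts = len(forecast) - block_len + 1
    totals = [sum(forecast[s:s + block_len]) for s in range(n_starts)]
    return sum(totals) / n_starts


# Invented hourly CO2 intensity forecast (gCO2/kWh) for one day; not real data.
forecast = [320, 310, 300, 290, 285, 295, 330, 380, 410, 400, 370, 340,
            310, 280, 260, 250, 270, 330, 400, 430, 420, 390, 360, 340]

start, total = best_block_start(forecast, block_len=4)
baseline = mean_block_total(forecast, block_len=4)
saving = 1 - total / baseline
print(f"start hour {start}, relative saving vs. baseline {saving:.1%}")
```

The achievable saving depends on how much the emission intensity varies within the day, which is consistent with the large differences between countries reported in the abstract.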

Journal Article•DOI•

Combining social media and survey data to nowcast migrant stocks in the United States

Monica Alexander¹, Kivan Polimis², Emilio Zagheni³•Institutions (3)

University of Toronto¹, University of Washington², Max Planck Society³

05 Mar 2020-arXiv: Applications

TL;DR: In this paper, a statistical framework is proposed to combine social media data with traditional survey data to produce timely "nowcasts" of migrant stocks by state in the United States. The model incorporates bias adjustment of the Facebook data and a pooled principal component time series approach to account for correlations across age, time and space.

Abstract: Measuring and forecasting migration patterns, and how they change over time, has important implications for understanding broader population trends, for designing policy effectively and for allocating resources. However, data on migration and mobility are often lacking, and those that do exist are not available in a timely manner. Social media data offer new opportunities to provide more up-to-date demographic estimates and to complement more traditional data sources. Facebook, for example, can be thought of as a large digital census that is regularly updated. However, its users are not representative of the underlying population. This paper proposes a statistical framework to combine social media data with traditional survey data to produce timely "nowcasts" of migrant stocks by state in the United States. The model incorporates bias adjustment of the Facebook data, and a pooled principal component time series approach, to account for correlations across age, time and space. We illustrate the results for migrants from Mexico, India and Germany, and show that the model outperforms alternatives that rely solely on either social media or survey data.

Journal Article•DOI•

Comparison of ARIMA, ETS, NNAR and hybrid models to forecast the second wave of COVID-19 hospitalizations in Italy

Gaetano Perone¹•Institutions (1)

University of Bergamo¹

22 Oct 2020-arXiv: Applications

TL;DR: The results show that the hybrid models, except for ARIMA-ETS, are better at capturing the linear and non-linear epidemic patterns, by outperforming the respective single models; and the number of COVID-19-related hospitalized with mild symptoms and in ICU will rapidly increase in the next weeks.

Abstract: Coronavirus disease (COVID-19) is a severe ongoing novel pandemic that emerged in Wuhan, China, in December 2019. As of October 13, the outbreak has spread rapidly across the world, affecting over 38 million people, and causing over 1 million deaths. In this article, I analysed several time series forecasting methods to predict the spread of the COVID-19 second wave in Italy, over the period after October 13, 2020. I used an autoregressive model (ARIMA), an exponential smoothing state space model (ETS), a neural network autoregression model (NNAR), and the following hybrid combinations of them: ARIMA-ETS, ARIMA-NNAR, ETS-NNAR, and ARIMA-ETS-NNAR. I forecast the number of patients hospitalized with mild symptoms and in intensive care units (ICU). The data refer to the period February 21, 2020-October 13, 2020 and are extracted from the website of the Italian Ministry of Health (this http URL). The results show that i) the hybrid models, except for ARIMA-ETS, are better at capturing the linear and non-linear epidemic patterns, by outperforming the respective single models; and ii) the number of COVID-19-related hospitalized with mild symptoms and in ICU will rapidly increase in the next weeks, by reaching the peak in about 50-60 days, i.e. in mid-December 2020, at least. To tackle the upcoming COVID-19 second wave, on one hand, it is necessary to hire healthcare workers and implement sufficient hospital facilities, protective equipment, and ordinary and intensive care beds; and on the other hand, it may be useful to enhance social distancing by improving public transport and adopting the double-shifts schooling system, for example.

Posted Content•

Adaptive Methods for Short-Term Electricity Load Forecasting During COVID-19 Lockdown in France

David Obst¹, Joseph de Vilmarest¹, Yannig Goude¹•Institutions (1)

Électricité de France¹

14 Sep 2020-arXiv: Applications

TL;DR: In this article, adaptive generalized additive models using Kalman filters and fine-tuning to adjust to new electricity consumption patterns were introduced to forecast the electricity demand during the French lockdown period, where they demonstrate their ability to significantly reduce prediction errors compared to traditional models.

Abstract: The coronavirus disease 2019 (COVID-19) pandemic has urged many governments in the world to enforce a strict lockdown where all nonessential businesses are closed and citizens are ordered to stay at home. One of the consequences of this policy is a significant change in electricity consumption patterns. Since load forecasting models rely on calendar or meteorological information and are trained on historical data, they fail to capture the significant break caused by the lockdown and have exhibited poor performances since the beginning of the pandemic. This makes the scheduling of the electricity production challenging, and has a high cost for both electricity producers and grid operators. In this paper we introduce adaptive generalized additive models using Kalman filters and fine-tuning to adjust to new electricity consumption patterns. Additionally, knowledge from the lockdown in Italy is transferred to anticipate the change of behavior in France. The proposed methods are applied to forecast the electricity demand during the French lockdown period, where they demonstrate their ability to significantly reduce prediction errors compared to traditional models. Finally expert aggregation is used to leverage the specificities of each predictions and enhance results even further.

Proceedings Article•DOI•

Leveraging Administrative Data for Bias Audits: Assessing Disparate Coverage with Mobility Data for COVID-19 Policy

Amanda Coston¹, Neel Guha², Derek Ouyang², Lisa Lu², Alexandra Chouldechova¹, Daniel E. Ho²•Institutions (2)

Carnegie Mellon University¹, Stanford University²

14 Nov 2020-arXiv: Applications

TL;DR: It is illustrated how linking large-scale administrative data can enable auditing mobility data for bias in the absence of demographic information and ground truth labels, and it is shown that allocating public health resources based on such mobility data could disproportionately harm high-risk elderly and minority groups.

Abstract: Anonymized smartphone-based mobility data has been widely adopted in devising and evaluating COVID-19 response strategies such as the targeting of public health resources. Yet little attention has been paid to measurement validity and demographic bias, due in part to the lack of documentation about which users are represented as well as the challenge of obtaining ground truth data on unique visits and demographics. We illustrate how linking large-scale administrative data can enable auditing mobility data for bias in the absence of demographic information and ground truth labels. More precisely, we show that linking voter roll data -- containing individual-level voter turnout for specific voting locations along with race and age -- can facilitate the construction of rigorous bias and reliability tests. These tests illuminate a sampling bias that is particularly noteworthy in the pandemic context: older and non-white voters are less likely to be captured by mobility data. We show that allocating public health resources based on such mobility data could disproportionately harm high-risk elderly and minority groups.

Journal Article•DOI•

Real-time Prediction of COVID-19 related Mortality using Electronic Health Records

Patrick Schwab¹, Arash Mehrjou², Sonali Parbhoo³, Leo Anthony Celi⁴, Jürgen Hetzel⁵, Markus Hofer⁶, Bernhard Schölkopf⁷, Bernhard Schölkopf², Stefan Bauer², Stefan Bauer⁸•Institutions (8)

Hoffmann-La Roche¹, Max Planck Society², Harvard University³, Beth Israel Deaconess Medical Center⁴, University of Tübingen⁵, Winterthur Museum, Garden and Library⁶, ETH Zurich⁷, Canadian Institute for Advanced Research⁸

31 Aug 2020-arXiv: Applications

TL;DR: The COVID-19 Early Warning System (CovEWS), a risk scoring system for assessing COVID-19 related mortality risk that was developed using data amounting to a total of over 2863 years of observation time from a cohort of 66 430 patients seen at over 69 healthcare institutions, could enable earlier intervention and may therefore help in preventing or mitigating COVID-19 related mortality.

Abstract: Coronavirus Disease 2019 (COVID-19) is an emerging respiratory disease caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) with rapid human-to-human transmission and a high case fatality rate, particularly in older patients. Due to the exponential growth of infections, many healthcare systems across the world are under pressure to care for increasing numbers of at-risk patients. Given the high number of infected patients, identifying patients with the highest mortality risk early is critical to enable effective intervention and optimal prioritisation of care. Here, we present the COVID-19 Early Warning System (CovEWS), a clinical risk scoring system for assessing COVID-19 related mortality risk. CovEWS provides continuous real-time risk scores for individual patients with clinically meaningful predictive performance up to 192 hours (8 days) in advance, and is automatically derived from patients' electronic health records (EHRs) using machine learning. We trained and evaluated CovEWS using de-identified data from a cohort of 66430 COVID-19 positive patients seen at over 69 healthcare institutions in the United States (US), Australia, Malaysia and India amounting to an aggregated total of over 2863 years of patient observation time. On an external test cohort of 5005 patients, CovEWS predicts COVID-19 related mortality from 78.8% (95% confidence interval [CI]: 76.0, 84.7%) to 69.4% (95% CI: 57.6, 75.2%) specificity at a sensitivity greater than 95% between respectively 1 and 192 hours prior to observed mortality events, significantly outperforming existing generic and COVID-19 specific clinical risk scores. CovEWS could enable clinicians to intervene at an earlier stage, and may therefore help in preventing or mitigating COVID-19 related mortality.

Journal Article•DOI•

Estimating heterogeneous survival treatment effect in observational data using machine learning

Liangyuan Hu¹, Jiayi Ji¹, Fan Li²•Institutions (2)

Icahn School of Medicine at Mount Sinai¹, Yale University²

17 Aug 2020-arXiv: Applications

TL;DR: The results suggest that the nonparametric Bayesian Additive Regression Trees within the framework of accelerated failure time model (AFT-BART-NP) consistently yields the best performance, in terms of bias, precision, and expected regret.

Abstract: Methods for estimating heterogeneous treatment effect in observational data have largely focused on continuous or binary outcomes, and have been relatively less vetted with survival outcomes. Using flexible machine learning methods in the counterfactual framework is a promising approach to address challenges due to complex individual characteristics, to which treatments need to be tailored. To evaluate the operating characteristics of recent survival machine learning methods for the estimation of treatment effect heterogeneity and inform better practice, we carry out a comprehensive simulation study presenting a wide range of settings describing confounded heterogeneous survival treatment effects and varying degrees of covariate overlap. Our results suggest that the nonparametric Bayesian Additive Regression Trees within the framework of accelerated failure time model (AFT-BART-NP) consistently yields the best performance, in terms of bias, precision and expected regret. Moreover, the credible interval estimators from AFT-BART-NP provide close to nominal frequentist coverage for the individual survival treatment effect when the covariate overlap is at least moderate. Including a non-parametrically estimated propensity score as an additional fixed covariate in the AFT-BART-NP model formulation can further improve its efficiency and frequentist coverage. Finally, we demonstrate the application of flexible causal machine learning estimators through a comprehensive case study examining the heterogeneous survival effects of two radiotherapy approaches for localized high-risk prostate cancer.

Posted Content•

Additive stacking for disaggregate electricity demand forecasting

Christian Capezza¹, Biagio Palumbo¹, Yannig Goude², Simon N. Wood³, Matteo Fasiolo⁴•Institutions (4)

University of Naples Federico II¹, Électricité de France², University of Edinburgh³, University of Bristol⁴

20 May 2020-arXiv: Applications

TL;DR: This work proposes a new ensemble method for probabilistic forecasting, which borrows strength across the households while accommodating their individual idiosyncrasies, and is an extension of regression stacking (Breiman, 1996) where the mixture weights are modelled using linear combinations of parametric, smooth or random effects.

Abstract: Future grid management systems will coordinate distributed production and storage resources to manage, in a cost effective fashion, the increased load and variability brought by the electrification of transportation and by a higher share of weather dependent production. Electricity demand forecasts at a low level of aggregation will be key inputs for such systems. We focus on forecasting demand at the individual household level, which is more challenging than forecasting aggregate demand, due to the lower signal-to-noise ratio and to the heterogeneity of consumption patterns across households. We propose a new ensemble method for probabilistic forecasting, which borrows strength across the households while accommodating their individual idiosyncrasies. In particular, we develop a set of models or 'experts' which capture different demand dynamics and we fit each of them to the data from each household. Then we construct an aggregation of experts where the ensemble weights are estimated on the whole data set, the main innovation being that we let the weights vary with the covariates by adopting an additive model structure. In particular, the proposed aggregation method is an extension of regression stacking (Breiman, 1996) where the mixture weights are modelled using linear combinations of parametric, smooth or random effects. The methods for building and fitting additive stacking models are implemented by the gamFactory R package, available at this https URL.

Journal Article•DOI•

Simpson's paradox in Covid-19 case fatality rates: a mediation analysis of age-related causal effects

Julius von Kügelgen¹, Luigi Gresele¹, Bernhard Schölkopf¹•Institutions (1)

Max Planck Society¹

14 May 2020-arXiv: Applications

TL;DR: Causal inference, in particular mediation analysis, can be used to resolve apparent statistical paradoxes; help educate the public and decision-makers alike; avoid unsound comparisons; and answer a range of causal questions pertaining to the pandemic, subject to transparently stated assumptions.

Abstract: We point out an instantiation of Simpson's paradox in Covid-19 case fatality rates (CFRs): comparing data of 44,672 cases from China with early reports from Italy (9th March), we find that CFRs are lower in Italy for every age group, but higher overall. This phenomenon is explained by a stark difference in case demographic between the two countries. Using this as a motivating example, we introduce basic concepts from mediation analysis and show how these can be used to quantify different direct and indirect effects when assuming a coarse-grained causal graph involving country, age, and mortality. As a case study, we then investigate total, direct, and indirect (age-mediated) causal effects between different countries and at different points in time. This allows us to separate age-related effects from others unrelated to age, and thus facilitates a more transparent comparison of CFRs across countries throughout the evolution of the Covid-19 pandemic.
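
The paradox described above, and the split of a total effect into a direct part and an age-mediated part, can be illustrated with a small worked example. The counts below are invented for two hypothetical countries "A" and "B" and are not the paper's data; the decomposition is a simple standardization-style sketch (hold A's age distribution fixed, swap in B's age-specific rates) and should not be read as the paper's exact estimator.

```python
# Hypothetical (cases, deaths) per country and age group; NOT the paper's data.
data = {
    "A": {"young": (9000, 90), "old": (1000, 100)},
    "B": {"young": (2000, 10), "old": (8000, 640)},
}


def group_cfr(country):
    """Case fatality rate within each age group."""
    return {g: deaths / cases for g, (cases, deaths) in data[country].items()}


def age_share(country):
    """Share of a country's cases that falls in each age group."""
    total = sum(cases for cases, _ in data[country].values())
    return {g: cases / total for g, (cases, _) in data[country].items()}


def standardized_cfr(age_from, rates_from):
    """CFR using one country's case age mix and another's age-specific rates."""
    shares, rates = age_share(age_from), group_cfr(rates_from)
    return sum(shares[g] * rates[g] for g in shares)


def overall_cfr(country):
    return standardized_cfr(country, country)


total_effect = overall_cfr("B") - overall_cfr("A")
# Direct effect: change the country-specific rates, keep A's age distribution.
direct_effect = standardized_cfr("A", "B") - overall_cfr("A")
# Indirect (age-mediated) effect: the remainder, driven by the shift in age mix.
indirect_effect = total_effect - direct_effect

print(group_cfr("A"), group_cfr("B"))
print(overall_cfr("A"), overall_cfr("B"))
print(total_effect, direct_effect, indirect_effect)
```

In this toy data, B has the lower CFR in both age groups yet the higher overall CFR, because most of B's cases are in the older group. The direct effect is negative while the age-mediated effect is positive and larger, which is the pattern the abstract describes qualitatively.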

Posted Content•

An ARIMA model to forecast the spread and the final size of COVID-2019 epidemic in Italy

Gaetano Perone

01 Apr 2020-arXiv: Applications

TL;DR: In this article, an autoregressive integrated moving average (ARIMA) model is used to forecast the epidemic trend over the period after April 4, 2020, by using the Italian epidemiological data at national and regional level.

Abstract: Coronavirus disease (COVID-2019) is a severe ongoing novel pandemic that is spreading quickly across the world. Italy, which is widely considered one of the main epicenters of the pandemic, has registered the highest COVID-2019 death rates and death toll in the world, to the present day. In this article I estimate an autoregressive integrated moving average (ARIMA) model to forecast the epidemic trend over the period after April 4, 2020, by using the Italian epidemiological data at national and regional level. The data refer to the number of daily confirmed cases officially registered by the Italian Ministry of Health (this http URL) for the period February 20 to April 4, 2020. The main advantage of this model is that it is easy to manage and fit. Moreover, it may give a first understanding of the basic trends, by suggesting the epidemic's hypothetical inflection point and final size.

Posted Content•

A novel discrete grey seasonal model and its applications

Weijie Zhou, Jiao Pan, Song Ding, Xiaoli Wu

25 Mar 2020-arXiv: Applications

TL;DR: The proposed novel discrete grey seasonal model, abbreviated as , is put forward by incorporating the seasonal dummy variables into the conventional model and significantly outperforms the other benchmark models in terms of several error criteria.

Abstract: In order to accurately describe real systems with seasonal disturbances, which normally appear as monthly or quarterly cycles, a novel discrete grey seasonal model, abbreviated as , is put forward by incorporating the seasonal dummy variables into the conventional model. Moreover, the mechanism and properties of this proposed model are discussed in depth, revealing the inherent differences from the existing seasonal grey models. For validation and explanation purposes, the proposed model is implemented to describe three actual cases with monthly and quarterly seasonal fluctuations (quarterly wind power production, quarterly PM10, and monthly natural gas consumption), in comparison with five competing models involving grey prediction models , conventional econometric technology , and artificial intelligences . Experimental results from the cases consistently demonstrate that the proposed model significantly outperforms the other benchmark models in terms of several error criteria. Moreover, further discussions about the influences of different sequence lengths on the forecasting performance reveal that the proposed model still performs the best with strong robustness and high reliability in addressing seasonal sequences. In general, the new model is validated to be a powerful and promising methodology for handling sequences with seasonal fluctuations.

Posted Content•

On the Interplay of Regional Mobility, Social Connectedness, and the Spread of COVID-19 in Germany

Cornelius Fritz¹, Göran Kauermann¹•Institutions (1)

Ludwig Maximilian University of Munich¹

07 Aug 2020-arXiv: Applications

TL;DR: The results confirm that reduced social activity lowers the infection rate, accounting for regional and temporal patterns, and show spatial infection patterns based on geographical as well as social distances.

Abstract: Since the primary mode of respiratory virus transmission is person-to-person interaction, we are required to reconsider physical interaction patterns to mitigate the number of people infected with COVID-19. While research has shown that non-pharmaceutical interventions (NPI) had an evident impact on national mobility patterns, we investigate the relative regional mobility behaviour to assess the effect of human movement on the spread of COVID-19. In particular, we explore the impact of human mobility and social connectivity derived from Facebook activities on the weekly rate of new infections in Germany between March 3rd and June 22nd, 2020. Our results confirm that reduced social activity lowers the infection rate, accounting for regional and temporal patterns. The extent of social distancing, quantified by the percentage of people staying put within a federal administrative district, has an overall negative effect on the incidence of infections. Additionally, our results show spatial infection patterns based on geographic as well as social distances.

Posted Content•

Crash Themes in Automated Vehicles: A Topic Modeling Analysis of the California Department of Motor Vehicles Automated Vehicle Crash Database

Hananeh Alambeigi¹, Anthony D. McDonald¹, Srinivas R. Tankasala¹•Institutions (1)

Texas A&M University¹

29 Jan 2020-arXiv: Applications

TL;DR: Topic modeling of crash narratives from the California Department of Motor Vehicles automated vehicle crash database identifies five crash themes; future empirical work should focus on driver-initiated transitions, overtakes, silent failures, complex traffic situations, and adverse driving environments.

Abstract: Automated vehicle technology promises to reduce the societal impact of traffic crashes. Early investigations of this technology suggest that significant safety issues remain during control transfers between the automation and human drivers and automation interactions with the transportation system. In order to address these issues, it is critical to understand both the behavior of human drivers during these events and the environments where they occur. This article analyzes automated vehicle crash narratives from the California Department of Motor Vehicles automated vehicle crash database to identify safety concerns and gaps between crash types and current areas of focus in the current research. The database was analyzed using probabilistic topic modeling of open-ended crash narratives. Topic modeling analysis identified five themes in the database: driver-initiated transition crashes, sideswipe crashes during left-side overtakes, and rear-end collisions while the vehicle was stopped at an intersection, in a turn lane, and when the crash involved oncoming traffic. Many crashes represented by the driver-initiated transitions topic were also associated with the side-swipe collisions. A substantial portion of the side-swipe collisions also involved motorcycles. These findings highlight previously raised safety concerns with transitions of control and interactions between vehicles in automated mode and the transportation social network. In response to these findings, future empirical work should focus on driver-initiated transitions, overtakes, silent failures, complex traffic situations, and adverse driving environments. Beyond this future work, the topic modeling analysis method may be used as a tool to monitor emergent safety issues.

Posted Content•

Conservative two-stage group testing.

Matthew Aldridge¹•Institutions (1)

University of Leeds¹

06 May 2020-arXiv: Applications

TL;DR: This work studies various nonadaptive test designs for the first stage of two-stage group testing, and derives a new lower bound for the total number of tests required, finding that a first-stage design with constant tests per item and constant items per test is extremely close to optimal.

Abstract: Inspired by applications in testing for COVID-19, we consider a variant of two-stage group testing we call "conservative" two-stage testing, where every item declared to be defective must be definitively confirmed by being tested by itself in the second stage. We study this in the linear regime where the prevalence is fixed while the number of items is large. We study various nonadaptive test designs for the first stage, and derive a new lower bound for the total number of tests required. We find that a first-stage design with constant tests per item and constant items per test due to Broder and Kumar (arXiv:2004.01684) is extremely close to optimal. Simulations back up the theoretical results.
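
To make the "conservative" two-stage idea concrete, the sketch below uses classical Dorfman pooling as a simple stand-in for the first stage: each item goes into exactly one pool, and every item in a positive pool is then tested individually, so any item declared defective has been confirmed by its own test. This is not the Broder and Kumar design the abstract refers to. Tests are assumed to be noiseless, and the function names and the values of the prevalence `p` and pool size `s` are illustrative assumptions.

```python
import random


def expected_tests_per_item(p, s):
    """Expected tests per item for Dorfman pooling: one pooled test shared by
    s items, plus an individual test whenever the pool is positive."""
    return 1 / s + 1 - (1 - p) ** s


def simulate_two_stage(n_items, p, s, rng):
    """Simulate conservative two-stage testing with disjoint pools of size s."""
    defective = [rng.random() < p for _ in range(n_items)]
    tests = 0
    confirmed = []
    for start in range(0, n_items, s):
        pool = range(start, min(start + s, n_items))
        tests += 1  # stage 1: one pooled test
        if any(defective[i] for i in pool):
            for i in pool:
                tests += 1  # stage 2: individual confirmation test
                if defective[i]:
                    confirmed.append(i)
    return tests, confirmed, defective


rng = random.Random(0)
n_items, p, s = 100_000, 0.05, 5
tests, confirmed, defective = simulate_two_stage(n_items, p, s, rng)
print("simulated tests per item:", tests / n_items)
print("expected tests per item: ", expected_tests_per_item(p, s))
```

With fixed prevalence, the number of tests grows linearly in the number of items, which is the linear regime the abstract studies; the designs analysed in the paper aim to reduce the constant below what this simple pooling scheme achieves.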
