Showing papers in "arXiv: Applications in 2017"

Posted Content

A large-scale analysis of racial disparities in police stops across the United States

Emma Pierson, Camelia Simoiu, Jan Overgoor, Sam Corbett-Davies, Vignesh Ramachandran, Cheryl Phillips, Sharad Goel

18 Jun 2017-arXiv: Applications

TL;DR: Black drivers were less likely to be stopped after sunset, when a ‘veil of darkness’ masks one’s race, which suggests bias in stop decisions; the study also finds evidence that the bar for searching black and Hispanic drivers was lower than that for searching white drivers.

Abstract: To assess racial disparities in police interactions with the public, we compiled and analyzed a dataset detailing over 60 million state patrol stops conducted in 20 U.S. states between 2011 and 2015. We find that black drivers are stopped more often than white drivers relative to their share of the driving-age population, but that Hispanic drivers are stopped less often than whites. Among stopped drivers -- and after controlling for age, gender, time, and location -- blacks and Hispanics are more likely to be ticketed, searched, and arrested than white drivers. These disparities may reflect differences in driving behavior, and are not necessarily the result of bias. In the case of search decisions, we explicitly test for discrimination by examining both the rate at which drivers are searched and the likelihood searches turn up contraband. We find evidence that the bar for searching black and Hispanic drivers is lower than for searching whites. Finally, we find that legalizing recreational marijuana in Washington and Colorado reduced the total number of searches and misdemeanors for all race groups, though a race gap still persists. We conclude by offering recommendations for improving data collection, analysis, and reporting by law enforcement agencies.
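The comparison of search rates and contraband hit rates described in the abstract can be sketched with a few lines of arithmetic. This is a minimal illustration only, not the authors' statistical test; the group names and counts are hypothetical placeholders, not figures from the dataset.

```python
def search_and_hit_rate(stops, searches, hits):
    """Return (search rate, hit rate) for one group of drivers.

    search rate = searches / stops
    hit rate    = searches that found contraband / searches
    """
    if stops <= 0 or searches <= 0:
        raise ValueError("stops and searches must be positive")
    return searches / stops, hits / searches


# Hypothetical counts for illustration only (not taken from the paper).
counts = {
    "group_a": {"stops": 1000, "searches": 50, "hits": 10},
    "group_b": {"stops": 1000, "searches": 20, "hits": 6},
}

# Map each group name to its (search rate, hit rate) pair.
rates = {name: search_and_hit_rate(**c) for name, c in counts.items()}
```

A group that is searched more often yet shows a lower hit rate is consistent with a lower bar for searching that group, although raw rates alone cannot establish this.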

180 citations

Journal Article

Trends in European flood risk over the past 150 years

Dominik Paprotny, Antonia Sebastian, Oswaldo Morales-Nápoles, Sebastiaan N. Jonkman

30 Oct 2017-arXiv: Applications

TL;DR: It is shown that since 1870 there has been an increase in the area inundated by floods in Europe but a reduction in fatalities and economic losses, although the authors caution that smaller floods remain underreported.

Abstract: Flood risk changes in time and is influenced by both natural and socio-economic trends and interactions. In Europe, previous studies of historical flood losses corrected for demographic and economic growth ("normalized") have been limited in temporal and spatial extent, leading to an incomplete representation in trends of losses over time. In this study we utilize a gridded reconstruction of flood exposure in 37 European countries and a new database of damaging floods since 1870. Our results indicate that since 1870 there has been an increase in annually inundated area and number of persons affected, contrasted by a substantial decrease in flood fatalities, after correcting for change in flood exposure. For more recent decades we also found a considerable decline in financial losses per year. We estimate, however, that there is large underreporting of smaller floods beyond most recent years, and show that underreporting has a substantial impact on observed trends.

154 citations

Posted Content

Why So Many Published Sensitivity Analyses Are False. A Systematic Review of Sensitivity Analysis Practices

Andrea Saltelli¹,², Ksenia Aleksankina³, William E. Becker, P Fennell⁴, Federico Ferretti, Niels Holst⁵, Sushan Li⁶, Qiongli Wu⁷

University of Bergen¹, Open University of Catalonia², University of Edinburgh³, University College London⁴, Aarhus University⁵, Technische Universität Darmstadt⁶, Chinese Academy of Sciences⁷

30 Nov 2017-arXiv: Applications

TL;DR: Sensitivity analysis has much to offer for a very large class of applications, such as model selection, calibration, optimization and quality assurance, but a bibliometric analysis shows that many published sensitivity analyses fail to properly explore the space of the input factors.

Abstract: Sensitivity analysis (SA) has much to offer for a very large class of applications, such as model selection, calibration, optimization, quality assurance and many others. Sensitivity analysis offers crucial contextual information regarding a prediction by answering the question "Which uncertain input factors are responsible for the uncertainty in the prediction?" SA is distinct from uncertainty analysis (UA), which instead addresses the question "How uncertain is the prediction?" As we discuss in the present paper, much confusion exists in the use of these terms. A proper uncertainty analysis of the output of a mathematical model needs to map what the model does when the input factors are left free to vary over their range of existence. A fortiori, this is true of a sensitivity analysis. Despite this, most UA and SA still explore the input space by moving along mono-dimensional corridors, which leave the space of variation of the input factors mostly unscathed. We use results from a bibliometric analysis to show that many published SA fail the elementary requirement to properly explore the space of the input factors. The results, while discipline-dependent, point to a worrying lack of standards and of recognized good practices. The misuse of sensitivity analysis in mathematical modelling is at least as serious as the misuse of the p-test in statistical modelling. Mature methods have existed for about two decades to produce a defensible sensitivity analysis. We end by offering a rough guide for proper use of the methods.
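One way to see why moving along mono-dimensional corridors leaves the input space largely unexplored is a simple geometric illustration. It is an assumption added here and is not taken from the abstract: points reached by varying one factor at a time from a central baseline in the cube [-1, 1]^k all lie inside the unit ball inscribed in that cube, and the ball's share of the cube's volume collapses as the number of factors k grows.

```python
import math


def inscribed_ball_fraction(k):
    """Fraction of the cube [-1, 1]^k occupied by the inscribed unit ball.

    Volume of the unit k-ball: pi^(k/2) / Gamma(k/2 + 1).
    Volume of the cube:        2^k.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    ball_volume = math.pi ** (k / 2) / math.gamma(k / 2 + 1)
    cube_volume = 2.0 ** k
    return ball_volume / cube_volume


# Fraction of the input space that the inscribed ball covers
# for a few illustrative numbers of input factors.
fractions = {k: inscribed_ball_fraction(k) for k in (2, 3, 5, 10)}
```

With two factors the ball still covers most of the square, but with ten factors it covers well under one percent of the cube, which is the sense in which a one-at-a-time design cannot map what a model does over the full range of its inputs.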

122 citations

Posted Content

An algorithm for removing sensitive information: application to race-independent recidivism prediction

James E. Johndrow, Kristian Lum

15 Mar 2017-arXiv: Applications

TL;DR: This article proposes a method to eliminate bias from predictive models by removing all information regarding protected variables from the data on which the models will ultimately be trained, and applies the method to a dataset on the criminal histories of individuals at the time of sentencing to produce "race-neutral" predictions of re-arrest.

Abstract: Predictive modeling is increasingly being employed to assist human decision-makers. One purported advantage of replacing or augmenting human judgment with computer models in high stakes settings -- such as sentencing, hiring, policing, college admissions, and parole decisions -- is the perceived "neutrality" of computers. It is argued that because computer models do not hold personal prejudice, the predictions they produce will be equally free from prejudice. There is growing recognition that employing algorithms does not remove the potential for bias, and can even amplify it if the training data were generated by a process that is itself biased. In this paper, we provide a probabilistic notion of algorithmic bias. We propose a method to eliminate bias from predictive models by removing all information regarding protected variables from the data on which the models will ultimately be trained. Unlike previous work in this area, our framework is general enough to accommodate data on any measurement scale. Motivated by models currently in use in the criminal justice system that inform decisions on pre-trial release and parole, we apply our proposed method to a dataset on the criminal histories of individuals at the time of sentencing to produce "race-neutral" predictions of re-arrest. In the process, we demonstrate that a common approach to creating "race-neutral" models -- omitting race as a covariate -- still results in racially disparate predictions. We then demonstrate that the application of our proposed method to these data removes racial disparities from predictions with minimal impact on predictive accuracy.

82 citations

Posted Content

Simple rules for complex decisions

Jongbin Jung, Connor Concannon, Ravi Shroff, Sharad Goel, Daniel G. Goldstein

15 Feb 2017-arXiv: Applications

TL;DR: A new method, select-regress-and-round, is presented for constructing simple rules that perform well for complex decisions; these rules significantly outperform judges and are on par with decisions derived from random forests trained on all available features.

Abstract: From doctors diagnosing patients to judges setting bail, experts often base their decisions on experience and intuition rather than on statistical models. While understandable, relying on intuition over models has often been found to result in inferior outcomes. Here we present a new method, select-regress-and-round, for constructing simple rules that perform well for complex decisions. These rules take the form of a weighted checklist, can be applied mentally, and nonetheless rival the performance of modern machine learning algorithms. Our method for creating these rules is itself simple, and can be carried out by practitioners with basic statistics knowledge. We demonstrate this technique with a detailed case study of judicial decisions to release or detain defendants while they await trial. In this application, as in many policy settings, the effects of proposed decision rules cannot be directly observed from historical data: if a rule recommends releasing a defendant that the judge in reality detained, we do not observe what would have happened under the proposed action. We address this key counterfactual estimation problem by drawing on tools from causal inference. We find that simple rules significantly outperform judges and are on par with decisions derived from random forests trained on all available features. Generalizing to 22 varied decision-making domains, we find this basic result replicates. We conclude with an analytical framework that helps explain why these simple decision rules perform as well as they do.
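The "round" step that turns regression coefficients into a weighted checklist can be sketched as follows. This is a minimal illustration under stated assumptions: the feature names and coefficient values are hypothetical, the coefficients are assumed to come from a regression already fitted on an already selected feature subset, and the rescaling to the integer range [-max_weight, max_weight] is one plausible reading of the method rather than the authors' exact procedure.

```python
def round_to_checklist(coefs, max_weight=3):
    """Rescale coefficients so the largest has magnitude max_weight,
    then round each one to the nearest integer."""
    largest = max(abs(c) for c in coefs.values())
    if largest == 0:
        return {name: 0 for name in coefs}
    scale = max_weight / largest
    return {name: round(c * scale) for name, c in coefs.items()}


def checklist_score(weights, features):
    """Sum of integer weights times feature values (missing features count as 0)."""
    return sum(w * features.get(name, 0) for name, w in weights.items())


# Hypothetical fitted coefficients (illustration only, not from the paper).
fitted = {
    "age_under_25": 0.8,
    "prior_missed_court_dates": 1.2,
    "employed": -0.35,
}

weights = round_to_checklist(fitted)
score = checklist_score(
    weights,
    {"age_under_25": 1, "prior_missed_court_dates": 2, "employed": 1},
)
```

Because the resulting weights are small integers, a score like this can be computed mentally, which is the property the abstract emphasizes.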

76 citations

Book Chapter

The Five Factor Model of personality and evaluation of drug consumption risk

Elaine Fehrman, Awaz K. Muhammad¹, Evgeny M. Mirkes¹, Vincent Egan², Alexander N. Gorban¹

University of Leicester¹, University of Nottingham²

05 Jul 2017-arXiv: Applications

TL;DR: In this paper, an online survey methodology was employed to collect data including personality traits (NEO-FFI-R), impulsivity (BIS-11), sensation seeking (ImpSS), and demographic information.

Abstract: The problem of evaluating an individual’s risk of drug consumption and misuse is highly important and novel. An online survey methodology was employed to collect data including personality traits (NEO-FFI-R), impulsivity (BIS-11), sensation seeking (ImpSS), and demographic information. The data set contained information on the consumption of 18 central nervous system psychoactive drugs. Correlation analysis using a relative information gain model demonstrates the existence of a group of drugs (amphetamines, cannabis, cocaine, ecstasy, legal highs, LSD, and magic mushrooms) with strongly correlated consumption. An exhaustive search was performed to select the most effective subset of input features and data mining methods to classify users and non-users for each drug. A number of classification methods were employed (decision tree, random forest, k-nearest neighbours, linear discriminant analysis, Gaussian mixture, probability density function estimation, logistic regression, and naive Bayes) and the most effective method selected for each drug. The quality of classification was surprisingly high. The best results with sensitivity and specificity being greater than 75% were achieved for cannabis, crack, ecstasy, legal highs, LSD, and volatile substance abuse. Sensitivity and specificity greater than 70% were achieved for amphetamines, amyl nitrite, benzodiazepines, chocolate, caffeine, heroin, ketamine, methadone, and nicotine. The poorest result was obtained for prediction of alcohol consumption.

71 citations

Posted Content

Point process-based modeling of multiple debris flow landslides using INLA: an application to the 2009 Messina disaster

Luigi Lombardo¹, Thomas Opitz², Raphaël Huser¹

King Abdullah University of Science and Technology¹, Institut national de la recherche agronomique²

10 Aug 2017-arXiv: Applications

TL;DR: A stochastic modeling approach based on spatial point processes of log-Gaussian Cox type for a collection of around 5000 landslide events provoked by a precipitation trigger in Sicily, Italy, featuring a spatial latent effect defined at the slope unit level, allowing us to assess the spatial influence that remains unexplained by the covariates in the model.

Abstract: We develop a stochastic modeling approach based on spatial point processes of log-Gaussian Cox type for a collection of around 5000 landslide events provoked by a precipitation trigger in Sicily, Italy. Through the embedding into a hierarchical Bayesian estimation framework, we can use the Integrated Nested Laplace Approximation methodology to make inference and obtain the posterior estimates. Several mapping units are useful to partition a given study area in landslide prediction studies. These units hierarchically subdivide the geographic space from the highest grid-based resolution to the stronger morphodynamic-oriented slope units. Here we integrate both mapping units into a single hierarchical model, by treating the landslide triggering locations as a random point pattern. This approach diverges fundamentally from the unanimously used presence-absence structure for areal units since we focus on modeling the expected landslide count jointly within the two mapping units. Predicting this landslide intensity provides more detailed and complete information as compared to the classically used susceptibility mapping approach based on relative probabilities. To illustrate the model's versatility, we compute absolute probability maps of landslide occurrences and check its predictive power over space. While the landslide community typically produces spatial predictive models for landslides only in the sense that covariates are spatially distributed, no actual spatial dependence has been explicitly integrated so far for landslide susceptibility. Our novel approach features a spatial latent effect defined at the slope unit level, allowing us to assess the spatial influence that remains unexplained by the covariates in the model.

61 citations

Posted Content

A practical guide and software for analysing pairwise comparison experiments

María Pérez-Ortiz, Rafal Mantiuk

11 Dec 2017-arXiv: Applications

TL;DR: This paper improves on existing scaling methods by introducing outlier analysis, providing methods for computing confidence intervals and statistical testing and introducing a prior, which reduces estimation error when the number of observers is low.

Abstract: Most popular strategies to capture subjective judgments from humans involve the construction of a unidimensional relative measurement scale, representing order preferences or judgments about a set of objects or conditions. This information is generally captured by means of direct scoring, either in the form of a Likert or cardinal scale, or by comparative judgments in pairs or sets. In this sense, the use of pairwise comparisons is becoming increasingly popular because of the simplicity of this experimental procedure. However, this strategy requires non-trivial data analysis to aggregate the comparison ranks into a quality scale and analyse the results, in order to take full advantage of the collected data. This paper explains the process of translating pairwise comparison data into a measurement scale, discusses the benefits and limitations of such scaling methods and introduces publicly available software in Matlab. We improve on existing scaling methods by introducing outlier analysis, providing methods for computing confidence intervals and statistical testing and introducing a prior, which reduces estimation error when the number of observers is low. Most of our examples focus on image quality assessment.

60 citations

Posted Content

Skill of global raw and postprocessed ensemble predictions of rainfall over northern tropical Africa

Peter Vogel¹, Peter Knippertz¹, Andreas H. Fink¹, Andreas Schlueter¹, Tilmann Gneiting²

Karlsruhe Institute of Technology¹, Heidelberg Institute for Theoretical Studies²

15 Aug 2017-arXiv: Applications

TL;DR: In this paper, the authors analyze the performance of nine operational global ensemble prediction systems (EPSs) relative to climatology-based forecasts for 1 to 5-day accumulated precipitation, based on the monsoon seasons 2007-2014, for three regions within northern tropical Africa.

Abstract: Accumulated precipitation forecasts are of high socioeconomic importance for agriculturally dominated societies in northern tropical Africa. In this study, we analyze the performance of nine operational global ensemble prediction systems (EPSs) relative to climatology-based forecasts for 1 to 5-day accumulated precipitation based on the monsoon seasons 2007-2014 for three regions within northern tropical Africa. To assess the full potential of raw ensemble forecasts across spatial scales, we apply state-of-the-art statistical postprocessing methods in the form of Bayesian Model Averaging (BMA) and Ensemble Model Output Statistics (EMOS), and verify against station and spatially aggregated, satellite-based gridded observations. Raw ensemble forecasts are uncalibrated, unreliable, and underperform relative to climatology, independently of region, accumulation time, monsoon season, and ensemble. Differences between raw ensemble and climatological forecasts are large, and partly stem from poor prediction for low precipitation amounts. BMA and EMOS postprocessed forecasts are calibrated, reliable, and strongly improve on the raw ensembles, but - somewhat disappointingly - typically do not outperform climatology. Most EPSs exhibit slight improvements over the period 2007-2014, but overall have little added value compared to climatology. We suspect that the parametrization of convection is a potential cause for the sobering lack of ensemble forecast skill in a region dominated by mesoscale convective systems.

54 citations

Journal Article

Evaluating Partisan Gerrymandering in Wisconsin

Robert Ravier, Jonathan C. Mattingly, Gregory Herschlag

07 Sep 2017-arXiv: Applications

TL;DR: This paper examined the extent of gerrymandering for the 2010 General Assembly district map of Wisconsin and found that there is substantial variability in the election outcome depending on what maps are used.

Abstract: We examine the extent of gerrymandering for the 2010 General Assembly district map of Wisconsin. We find that there is substantial variability in the election outcome depending on what maps are used. We also find robust evidence that the district maps are highly gerrymandered and that this gerrymandering likely altered the partisan makeup of the Wisconsin General Assembly in some elections. Compared to the distribution of possible redistricting plans for the General Assembly, Wisconsin's chosen plan is an outlier in that it yields results that are highly skewed to the Republicans when the statewide proportion of Democratic votes comprises more than 50-52% of the overall vote (with the precise threshold depending on the election considered). Wisconsin's plan acts to preserve the Republican majority by providing extra Republican seats even when the Democratic vote increases into the range where the balance of power would shift for the vast majority of redistricting plans.

54 citations

Journal Article

Predictive modelling of training loads and injury in Australian football

David Carey¹, Kok-Leong Ong¹, Rod Whiteley², Kay M. Crossley¹, Justin Crow¹, Meg E. Morris¹

La Trobe University¹, Qatar Airways²

14 Jun 2017-arXiv: Applications

TL;DR: Injury prediction models built using training load data from a single club showed poor ability to predict injuries when tested on previously unseen data, suggesting limited application as a daily decision tool for practitioners.

Abstract: To investigate whether training load monitoring data could be used to predict injuries in elite Australian football players, data were collected from elite athletes over 3 seasons at an Australian football club. Loads were quantified using GPS devices, accelerometers and player perceived exertion ratings. Absolute and relative training load metrics were calculated for each player each day (rolling average, exponentially weighted moving average, acute:chronic workload ratio, monotony and strain). Injury prediction models (regularised logistic regression, generalised estimating equations, random forests and support vector machines) were built for non-contact, non-contact time-loss and hamstring specific injuries using the first two seasons of data. Injury predictions were generated for the third season and evaluated using the area under the receiver operating characteristic curve (AUC). Predictive performance was only marginally better than chance for models of non-contact and non-contact time-loss injuries (AUC < 0.65). The best performing model was a multivariate logistic regression for hamstring injuries (best AUC = 0.76). Learning curves suggested logistic regression was underfitting the load-injury relationship and that using a more complex model or increasing the amount of model building data may lead to future improvements. Injury prediction models built using training load data from a single club showed poor ability to predict injuries when tested on previously unseen data, suggesting they are limited as a daily decision tool for practitioners. Focusing the modelling approach on specific injury types and increasing the amount of training data may lead to the development of improved predictive models for injury prevention.
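Three of the training load metrics named in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: the 7-day acute and 28-day chronic windows and the decay constant 2 / (span + 1) are common conventions assumed here, and the daily load values are hypothetical.

```python
def rolling_average(loads, window):
    """Mean of the most recent `window` daily loads."""
    if len(loads) < window:
        raise ValueError("not enough days of data")
    return sum(loads[-window:]) / window


def ewma(loads, span):
    """Exponentially weighted moving average, seeded with the first load.

    Assumes the common decay constant alpha = 2 / (span + 1).
    """
    if not loads:
        raise ValueError("loads must not be empty")
    alpha = 2.0 / (span + 1)
    value = loads[0]
    for load in loads[1:]:
        value = alpha * load + (1 - alpha) * value
    return value


def acute_chronic_ratio(loads, acute_days=7, chronic_days=28):
    """Acute (short window) rolling average divided by chronic (long window) one."""
    chronic = rolling_average(loads, chronic_days)
    if chronic == 0:
        raise ValueError("chronic load is zero")
    return rolling_average(loads, acute_days) / chronic


# Hypothetical daily loads in arbitrary units (illustration only).
steady = [400.0] * 28                  # constant load for four weeks
spike = [400.0] * 21 + [600.0] * 7     # final week 50% heavier

steady_ratio = acute_chronic_ratio(steady)
spike_ratio = acute_chronic_ratio(spike)
```

A steady load gives a ratio of 1, while a heavier final week pushes the ratio above 1 because the acute window reacts faster than the chronic one.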

Posted Content

Dynamic Bayesian Influenza Forecasting in the United States with Hierarchical Discrepancy

Dave Osthus, James R. Gattiker, Reid Priedhorsky, Sara Y. Del Valle

30 Aug 2017-arXiv: Applications

TL;DR: A dynamic Bayesian (DB) flu forecasting model that exploits model discrepancy through a hierarchical model is proposed; it allows forecasts of partially observed flu seasons to borrow discrepancy information from previously observed flu seasons, and it outperforms all models that competed in the CDC's 2015--2016 flu forecasting challenge.

Abstract: Timely and accurate forecasts of seasonal influenza would assist public health decision-makers in planning intervention strategies, efficiently allocating resources, and possibly saving lives. For these reasons, influenza forecasts are consequential. Producing timely and accurate influenza forecasts, however, has proven challenging due to noisy and limited data, an incomplete understanding of the disease transmission process, and the mismatch between the disease transmission process and the data-generating process. In this paper, we introduce a dynamic Bayesian (DB) flu forecasting model that exploits model discrepancy through a hierarchical model. The DB model allows forecasts of partially observed flu seasons to borrow discrepancy information from previously observed flu seasons. We compare the DB model to all models that competed in the CDC's 2015--2016 flu forecasting challenge. The DB model outperformed all models, indicating the DB model is a leading influenza forecasting model.

Posted Content

Likelihood Ratio as Weight of Forensic Evidence: A Closer Look

Steven P. Lund, Hari Iyer

26 Apr 2017-arXiv: Applications

TL;DR: In this article, the authors argue that decision theory does not exempt the presentation of a likelihood ratio from uncertainty characterization, which is required to assess the fitness for purpose of any transferred quantity.

Abstract: The forensic science community has increasingly sought quantitative methods for conveying the weight of evidence. Experts from many forensic laboratories summarize their findings in terms of a likelihood ratio. Several proponents of this approach have argued that Bayesian reasoning proves it to be normative. We find this likelihood ratio paradigm to be unsupported by arguments of Bayesian decision theory, which applies only to personal decision making and not to the transfer of information from an expert to a separate decision maker. We further argue that decision theory does not exempt the presentation of a likelihood ratio from uncertainty characterization, which is required to assess the fitness for purpose of any transferred quantity. We propose the concept of a lattice of assumptions leading to an uncertainty pyramid as a framework for assessing the uncertainty in an evaluation of a likelihood ratio. We demonstrate the use of these concepts with illustrative examples regarding the refractive index of glass and automated comparison scores for fingerprints.

Journal Article

Joint Smoothing, Tracking, and Forecasting Based on Continuous-Time Target Trajectory Fitting

Tiancheng Li, Huimin Chen, Shudong Sun, Juan M. Corchado

07 Aug 2017-arXiv: Applications

TL;DR: In this article, a continuous time state estimation framework is proposed for a class of targets subject to smooth motion processes, e.g., the target moves with nearly constant acceleration or is affected by insignificant noise.

Abstract: We present a continuous time state estimation framework that unifies traditionally individual tasks of smoothing, tracking, and forecasting (STF), for a class of targets subject to smooth motion processes, e.g., the target moves with nearly constant acceleration or is affected by insignificant noise. Fundamentally different from the conventional Markov transition formulation, the state process is modeled by a continuous trajectory function of time (FoT) and the STF problem is formulated as an online data fitting problem with the goal of finding the trajectory FoT that best fits the observations in a sliding time-window. Then, the state of the target, whether the past (namely, smoothing), the current (filtering) or the near-future (forecasting), can be inferred from the FoT. Our framework relaxes the need for stringent statistical modeling of the target motion in real time, and is applicable to a broad range of real world targets of significance such as passenger aircraft and ships, which move on scheduled, (segmented) smooth paths but for which little statistical knowledge is available about their real time movement or even about the sensors. In addition, the proposed STF framework inherits the advantages of data fitting for accommodating arbitrary sensor revisit time, target maneuvering and missed detection. The proposed method is compared with state of the art estimators in scenarios with either maneuvering or non-maneuvering targets.
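The idea of fitting a trajectory function of time to the observations in a sliding window, then evaluating it at past, current, or future times, can be sketched with an ordinary least squares line fit. This is a minimal one-dimensional illustration; the straight-line model, the window length, and all names are assumptions made here and do not reproduce the authors' estimator.

```python
def fit_line(times, positions):
    """Least squares fit of position = intercept + slope * time."""
    n = len(times)
    if n < 2 or n != len(positions):
        raise ValueError("need at least two matching observations")
    t_mean = sum(times) / n
    x_mean = sum(positions) / n
    s_tt = sum((t - t_mean) ** 2 for t in times)
    if s_tt == 0:
        raise ValueError("observation times must not all be equal")
    s_tx = sum((t - t_mean) * (x - x_mean) for t, x in zip(times, positions))
    slope = s_tx / s_tt
    intercept = x_mean - slope * t_mean
    return intercept, slope


def estimate(times, positions, query_time, window=5):
    """Fit the most recent `window` observations and evaluate at query_time.

    A query time inside the window gives a smoothed estimate, the latest
    observation time gives a filtered estimate, and a later time a forecast.
    """
    intercept, slope = fit_line(times[-window:], positions[-window:])
    return intercept + slope * query_time


# Hypothetical noise-free observations on the line x = 2 + 3t.
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
positions = [2.0 + 3.0 * t for t in times]

smoothed = estimate(times, positions, 3.0)   # past time inside the window
filtered = estimate(times, positions, 5.0)   # current time
forecast = estimate(times, positions, 7.0)   # future time
```

The same fitted function serves all three tasks, which is the unification the abstract describes; irregular observation times need no special handling because the fit only uses the time stamps.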

Posted Content

Fairer and more accurate, but for whom?

Alexandra Chouldechova, Max G'Sell

30 Jun 2017-arXiv: Applications

TL;DR: A model comparison framework is introduced for automatically identifying subgroups in which the differences between models are most pronounced, with a primary focus on subgroups where the models differ in terms of fairness-related quantities such as racial or gender disparities.

Abstract: Complex statistical machine learning models are increasingly being used or considered for use in high-stakes decision-making pipelines in domains such as financial services, health care, criminal justice and human services. These models are often investigated as possible improvements over more classical tools such as regression models or human judgement. While the modeling approach may be new, the practice of using some form of risk assessment to inform decisions is not. When determining whether a new model should be adopted, it is therefore essential to be able to compare the proposed model to the existing approach across a range of task-relevant accuracy and fairness metrics. Looking at overall performance metrics, however, may be misleading. Even when two models have comparable overall performance, they may nevertheless disagree in their classifications on a considerable fraction of cases. In this paper we introduce a model comparison framework for automatically identifying subgroups in which the differences between models are most pronounced. Our primary focus is on identifying subgroups where the models differ in terms of fairness-related quantities such as racial or gender disparities. We present experimental results from a recidivism prediction task and a hypothetical lending example.

Posted Content

Mixed Effects Models are Sometimes Terrible

Christopher Eager, Joseph Roy

05 Jan 2017-arXiv: Applications

TL;DR: Simulations in lme4 test the parsimonious convergence hypothesis (PCH), which treats non-convergence as a sign that a random effects structure is over-specified, and show moderate to high non-convergence rates for mixed-effects models even when a known maximal effect structure is used to generate the data.

Abstract: Mixed-effects models have emerged as the gold standard of statistical analysis in different sub-fields of linguistics (Baayen, Davidson & Bates, 2008; Johnson, 2009; Barr, et al, 2013; Gries, 2015). One problematic feature of these models is their failure to converge under maximal (or even near-maximal) random effects structures. The lack of convergence is relatively unaddressed in linguistics and when it is addressed has resulted in statistical practices (e.g. Jaeger, 2009; Gries, 2015; Bates, et al, 2015b) that are premised on the idea that non-convergence is an indication that a random effects structure is over-specified (or not parsimonious), the parsimonious convergence hypothesis (PCH). We test the PCH by running simulations in lme4 under two sets of assumptions for both a linear dependent variable and a binary dependent variable in order to assess the rate of non-convergence for both types of mixed effects models when a known maximal effect structure is used to generate the data (i.e. when non-convergence cannot be explained by random effects with zero variance). Under the PCH, lack of convergence is treated as evidence against a more maximal random effects structure, but that result is not upheld with our simulations. We provide an alternative model, fully specified Bayesian models implemented in rstan (Stan Development Team, 2016; Carpenter, et al, in press) that removed the convergence problems almost entirely in simulations of the same conditions. These results indicate that when there is known non-zero variance for all slopes and intercepts, under realistic distributions of data and with moderate to severe imbalance, mixed effects models in lme4 have moderate to high non-convergence rates which can cause linguistic researchers to wrongfully exclude random effect terms.

Posted Content

Generalised additive mixed models for dynamic analysis in linguistics: a practical introduction

Márton Sóskuthy

15 Mar 2017-arXiv: Applications

TL;DR: This article presented a hands-on introduction to generalized additive mixed models (GAMMs) in the context of linguistics with a particular focus on dynamic speech analysis (e.g. formant contours, pitch tracks, diachronic change).

Abstract: This is a hands-on introduction to Generalised Additive Mixed Models (GAMMs) in the context of linguistics with a particular focus on dynamic speech analysis (e.g. formant contours, pitch tracks, diachronic change, etc.). The main goal is to explain some of the main ideas underlying GAMMs, and to provide a practical guide to frequentist significance testing using these models. The introduction covers a range of topics including basis functions, the smoothing penalty, random smooths, difference smooths, smooth interactions, model comparison and autocorrelation. It is divided into two parts. The first part looks at what GAMMs are, how they work and why/when we should use them. Although the reader can replicate some of the example analyses in this section, this is not essential. The second part is a tutorial introduction that illustrates the process of fitting and evaluating GAMMs in the R statistical software environment, and the reader is strongly encouraged to work through the examples on their own machine.

Posted Content•

Robust Localization Using Range Measurements with Unknown and Bounded Errors

Xiufang Shi¹, Guoqiang Mao², Brian D. O. Anderson³, Zaiyue Yang¹, Jiming Chen¹•Institutions (3)

Zhejiang University¹, University of Technology, Sydney², Hangzhou Dianzi University³

04 Jan 2017-arXiv: Applications

TL;DR: In this paper, the authors investigated a localization problem assuming unknown measurement error distribution except for a bound on the error, and formulated the localization problem as an optimization problem to minimize the worst-case estimation error, which is shown to be a nonconvex optimization problem.

Abstract: Cooperative geolocation has attracted significant research interest in recent years. A large number of localization algorithms rely on the availability of statistical knowledge of measurement errors, which is often difficult to obtain in practice. A bound on the measurement error is often easier to obtain than such statistical knowledge. This work investigates a localization problem in which the measurement error distribution is unknown except for a bound on the error. We first formulate this localization problem as an optimization problem that minimizes the worst-case estimation error, which is shown to be non-convex. Relaxation is then applied to transform it into a convex problem. Furthermore, we propose a distributed algorithm to solve the problem, which converges in a few iterations. Simulation results show that the proposed algorithms are more robust to large measurement errors than existing algorithms in the literature. A geometrical analysis offering additional insights is also provided.

Journal Article•DOI•

Deep uncertainties in sea-level rise and storm surge projections: Implications for coastal flood risk management

P. Oddo¹, Ben Seiyon Lee¹, Gregory G. Garner², Vivek Srikrishnan¹, Patrick M. Reed³, Chris E. Forest¹, Klaus Keller¹,⁴•Institutions (4)

Pennsylvania State University¹, Princeton University², Cornell University³, Carnegie Mellon University⁴

24 May 2017-arXiv: Applications

TL;DR: In this paper, the authors implement and improve on a classic decision-analytical model (van Dantzig 1956) to capture trade-offs across conflicting stakeholder objectives, demonstrate the consequences of structural uncertainties in the sea-level rise and storm surge models, and identify the parametric uncertainties that most strongly influence each objective using global sensitivity analysis.

Abstract: Sea-levels are rising in many areas around the world, posing risks to coastal communities and infrastructures. Strategies for managing these flood risks present decision challenges that require a combination of geophysical, economic, and infrastructure models. Previous studies have broken important new ground on the considerable tensions between the costs of upgrading infrastructure and the damages that could result from extreme flood events. However, many risk-based adaptation strategies remain silent on certain potentially important uncertainties, as well as the trade-offs between competing objectives. Here, we implement and improve on a classic decision-analytical model (van Dantzig 1956) to: (i) capture trade-offs across conflicting stakeholder objectives, (ii) demonstrate the consequences of structural uncertainties in the sea-level rise and storm surge models, and (iii) identify the parametric uncertainties that most strongly influence each objective using global sensitivity analysis. We find that the flood adaptation model produces potentially myopic solutions when formulated using traditional mean-centric decision theory. Moving from a single-objective problem formulation to one with multi-objective trade-offs dramatically expands the decision space, and highlights the need for compromise solutions to address stakeholder preferences. We find deep structural uncertainties that have large effects on the model outcome, with the storm surge parameters accounting for the greatest impacts. Global sensitivity analysis effectively identifies important parameter interactions that local methods overlook, and which could have critical implications for flood adaptation strategies.

Posted Content•

Notes on Creating a Standardized Version of DVARS

Thomas E. Nichols

05 Apr 2017-arXiv: Applications

TL;DR: By constructing a sampling distribution for DVARS, this work can create a standardized version of DVARS that should be more similar across scanners and datasets.

Abstract: By constructing a sampling distribution for DVARS we can create a standardized version of DVARS that should be more similar across scanners and datasets.

Journal Article•

Redistricting: Drawing the Line

Sachet Bangia, Christy V. Graves, Gregory Herschlag, Han Sung Kang, Justin Luo, Jonathan C. Mattingly, Robert Ravier

12 Apr 2017-arXiv: Applications

TL;DR: It is found that the number of Democratic and Republican representatives elected varies drastically depending on how districts are drawn, and a plan produced by a bipartisan panel of retired judges is highly typical and representative.

Abstract: We develop methods to evaluate whether a political districting accurately represents the will of the people. To explore and showcase our ideas, we concentrate on the congressional districts for the U.S. House of Representatives and use the state of North Carolina and its redistrictings since the 2010 census. Using a Monte Carlo algorithm, we randomly generate over 24,000 redistrictings that are non-partisan and adhere to criteria from proposed legislation. Applying historical voting data to these random redistrictings, we find that the number of Democratic and Republican representatives elected varies drastically depending on how districts are drawn. Some results are more common, and we gain a clear range of expected election outcomes. Using the statistics of our generated redistrictings, we critique the particular congressional districtings used in the 2012 and 2016 NC elections as well as a districting proposed by a bipartisan redistricting commission. We find that the 2012 and 2016 districtings are highly atypical and not representative of the will of the people. On the other hand, our results indicate that a plan produced by a bipartisan panel of retired judges is highly typical and representative. Since our analyses are based on an ensemble of reasonable redistrictings of North Carolina, they provide a baseline for a given election which incorporates the geometry of the state's population distribution.

Journal Article•DOI•

Can Data Generated by Connected Vehicles Enhance Safety? A proactive approach to intersection safety management

Mohsen Kamrani, Behram Wali, Asad J. Khattak

03 Sep 2017-arXiv: Applications

TL;DR: In this paper, the authors developed a unique database that integrates intersection crash and inventory data with more than 65 million real-world Basic Safety Messages logged by 3,000 connected vehicles, providing a more complete picture of operations and safety performance of intersections.

Abstract: Traditionally, evaluation of intersection safety has been largely reactive, based on historical crash frequency data. However, the emerging data from Connected and Automated Vehicles (CAVs) can complement historical data and help proactively identify intersections that show high variability in instantaneous driving behaviors before crashes occur. Based on data from the Safety Pilot Model Deployment in Ann Arbor, Michigan, this study developed a unique database that integrates intersection crash and inventory data with more than 65 million real-world Basic Safety Messages logged by 3,000 connected vehicles, providing a more complete picture of operations and safety performance of intersections. As a proactive safety measure and a leading indicator of safety, this study introduces location-based volatility (LBV), which quantifies variability in instantaneous driving decisions at intersections. LBV represents the driving performance of connected vehicle drivers traveling through a specific intersection. Using the coefficient of variation as a standardized measure of relative dispersion, LBVs are calculated for 116 intersections in Ann Arbor. To quantify relationships between intersection-specific volatilities and crash frequencies, rigorous fixed- and random-parameter Poisson regression models are estimated. While controlling for exposure-related factors, the results provide evidence of a statistically significant (5% level) positive association between intersection-specific volatility and crash frequencies for signalized intersections. The implications of the findings for proactive intersection safety management are discussed in the paper.
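
The coefficient of variation named in this abstract can be illustrated with a small Python sketch. It is only a sketch under stated assumptions: the function name, the intersection labels, and the speed samples are hypothetical, and the abstract does not give the paper's exact LBV definition or the driving variables it uses.

```python
from statistics import mean, stdev


def coefficient_of_variation(samples):
    """Return the sample standard deviation divided by the mean, in percent."""
    m = mean(samples)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return 100.0 * stdev(samples) / m


# Hypothetical instantaneous speeds (m/s) logged at two intersections.
speeds_by_intersection = {
    "intersection_A": [12.0, 12.5, 11.8, 12.2, 12.1],
    "intersection_B": [4.0, 15.0, 9.5, 2.0, 13.5],
}

# Relative dispersion per intersection; a larger value means more volatile driving.
volatility = {
    name: coefficient_of_variation(values)
    for name, values in speeds_by_intersection.items()
}
```

Because the measure is scaled by the mean, intersections with different typical speeds can be compared on the same footing, which is presumably why a standardized dispersion measure was chosen.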

Posted Content•

Markov Models for Health Economic Evaluations: The R Package heemod

Antoine Filipovic-Pierucci, Kevin Zarca, Isabelle Durand-Zaleski

10 Feb 2017-arXiv: Applications

TL;DR: This paper developed an R package for Markov models implementing most of the modelling and reporting features described in reference textbooks and guidelines: deterministic and probabilistic sensitivity analysis, heterogeneity analysis, time dependency on state-time and model-time, etc.

Abstract: Health economic evaluation studies are widely used in public health to assess health strategies in terms of their cost-effectiveness and inform public policies. We developed an R package for Markov models implementing most of the modelling and reporting features described in reference textbooks and guidelines: deterministic and probabilistic sensitivity analysis, heterogeneity analysis, time dependency on state-time and model-time (semi-Markov and non-homogeneous Markov models), etc. In this paper we illustrate the features of heemod by building and analysing an example Markov model. We then explain the design and the underlying implementation of the package.
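
For readers unfamiliar with cohort Markov models, the basic mechanism can be sketched in a few lines of Python. This is not the heemod API (heemod is an R package) and it covers only the simplest time-homogeneous case; the state names, transition probabilities, and per-state costs are hypothetical values chosen for illustration.

```python
def run_cohort(transition, start, cycles):
    """Propagate a cohort distribution through a time-homogeneous Markov chain."""
    n = len(start)
    for row in transition:
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row of the transition matrix must sum to 1")
    trace = [list(start)]
    for _ in range(cycles):
        current = trace[-1]
        # New share in state j = sum over i of (share in i) * P(i -> j).
        nxt = [sum(current[i] * transition[i][j] for i in range(n)) for j in range(n)]
        trace.append(nxt)
    return trace


def expected_total(trace, per_state_value):
    """Sum a per-cycle state value over every distribution in the trace, undiscounted."""
    return sum(sum(p * v for p, v in zip(dist, per_state_value)) for dist in trace)


# Hypothetical three-state model: healthy, sick, dead (absorbing).
TRANSITION = [
    [0.85, 0.10, 0.05],
    [0.00, 0.70, 0.30],
    [0.00, 0.00, 1.00],
]
trace = run_cohort(TRANSITION, start=[1.0, 0.0, 0.0], cycles=10)
total_cost = expected_total(trace, per_state_value=[100.0, 2000.0, 0.0])
```

Comparing `expected_total` for costs and for health outcomes across two such models is the core of a cost-effectiveness comparison; the time-dependency and sensitivity-analysis features listed in the abstract go well beyond this sketch.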

Posted Content•

A Bayesian General Linear Modeling Approach to Cortical Surface fMRI Data Analysis

Amanda F. Mejia, Yu Ryan Yue, David Bolin, Finn Lindgren, Martin A. Lindquist

03 Jun 2017-arXiv: Applications

TL;DR: In this article, a Bayesian spatial model for cortical surface fMRI (cs-fMRI) is proposed, which employs a class of sophisticated spatial processes to flexibly model latent activation fields.

Abstract: Cortical surface fMRI (cs-fMRI) has recently grown in popularity versus traditional volumetric fMRI, as it allows for more meaningful spatial smoothing and is more compatible with the common assumptions of isotropy and stationarity in Bayesian spatial models. However, as no Bayesian spatial model has been proposed for cs-fMRI data, most analyses continue to employ the classical, voxel-wise general linear model (GLM) (Worsley and Friston 1995). Here, we propose a Bayesian GLM for cs-fMRI, which employs a class of sophisticated spatial processes to flexibly model latent activation fields. We use integrated nested Laplace approximation (INLA), a highly accurate and efficient Bayesian computation technique (Rue et al. 2009). To identify regions of activation, we propose an excursion set method based on the joint posterior distribution of the latent fields, which eliminates the need for multiple comparisons correction. Finally, we address a gap in the existing literature by proposing a novel Bayesian approach for multi-subject analysis. The methods are validated and compared to the classical GLM through simulation studies and a motor task fMRI study from the Human Connectome Project. The proposed Bayesian approach results in smoother activation estimates, more accurate false positive control, and increased power to detect truly active regions.

Journal Article•DOI•

Sequential Discrete Kalman Filter for Real-Time State Estimation in Power Distribution Systems: Theory and Implementation

Andreas Martin Kettner¹, Mario Paolone¹•Institutions (1)

École Polytechnique Fédérale de Lausanne¹

27 Feb 2017-arXiv: Applications

TL;DR: In this article, the authors demonstrate the feasibility of implementing real-time state estimators (RTSEs) for ADNs in Field-Programmable Gate Arrays (FPGAs) by presenting an operational prototype.

Abstract: This paper demonstrates the feasibility of implementing Real-Time State Estimators (RTSEs) for Active Distribution Networks (ADNs) in Field-Programmable Gate Arrays (FPGAs) by presenting an operational prototype. The prototype is based on a Linear State Estimator (LSE) that uses synchrophasor measurements from Phasor Measurement Units (PMUs). The underlying algorithm is the Sequential Discrete Kalman Filter (SDKF), an equivalent formulation of the Discrete Kalman Filter (DKF) for the case of uncorrelated measurement noise. In this regard, this work formally proves the equivalence of the SDKF and the DKF, and highlights the suitability of the SDKF for an FPGA implementation by means of a computational complexity analysis. The developed prototype is validated using a case study adapted from the IEEE 34-node distribution test feeder.
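
The idea of processing measurements one at a time can be sketched as follows. This is a generic textbook-style measurement update written in plain Python, not the paper's FPGA implementation; the function name, argument layout, and numbers are assumptions made for illustration. It relies on the measurement noise being uncorrelated, so that each measurement has its own scalar variance.

```python
def sequential_update(x, P, H, z, r):
    """Kalman measurement update that folds in one scalar measurement at a time.

    x: prior state (list of n floats); P: prior covariance (n x n list of lists);
    H: measurement rows (m lists of n floats); z: m measurements;
    r: m noise variances (the diagonal of an uncorrelated noise covariance).
    """
    n = len(x)
    x = list(x)
    P = [row[:] for row in P]
    for h_i, z_i, r_i in zip(H, z, r):
        Ph = [sum(P[a][b] * h_i[b] for b in range(n)) for a in range(n)]  # P h^T
        s = sum(h_i[a] * Ph[a] for a in range(n)) + r_i  # scalar innovation variance
        k = [v / s for v in Ph]  # gain: only a scalar division, no matrix inverse
        innovation = z_i - sum(h_i[a] * x[a] for a in range(n))
        x = [x[a] + k[a] * innovation for a in range(n)]
        P = [[P[a][b] - k[a] * Ph[b] for b in range(n)] for a in range(n)]
    return x, P


# Hypothetical example: one state observed twice with unit-variance noise.
x_post, P_post = sequential_update(
    x=[0.0], P=[[4.0]], H=[[1.0], [1.0]], z=[1.0, 3.0], r=[1.0, 1.0]
)
```

Each pass through the loop needs only a scalar division instead of a matrix inversion, which is consistent with the abstract's point that the sequential form suits hardware implementation.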

Posted Content•

Machine Learning Techniques for Mortality Modeling

Philippe Deprez, Pavel V. Shevchenko¹, Mario V. Wüthrich²•Institutions (2)

Macquarie University¹, ETH Zurich²

07 May 2017-arXiv: Applications

TL;DR: This paper illustrates how machine learning techniques allow us to analyze the quality of stochastic models and how these techniques can be used for differentiating the different causes of death in mortality modeling.

Abstract: Various stochastic models have been proposed to estimate mortality rates. In this paper we illustrate how machine learning techniques allow us to analyze the quality of such mortality models. In addition, we present how these techniques can be used for differentiating the different causes of death in mortality modeling.

Posted Content•

Plus-Minus Player Ratings for Soccer

Tarak Kharrat, Javier López Peña, Ian G. McHale

15 Jun 2017-arXiv: Applications

TL;DR: The ratings are used to examine who the best players in European football are and to demonstrate how players' ratings evolve over time, and light is shed on the debate regarding which is the strongest league.

Abstract: The paper presents a plus-minus rating for use in association football (soccer). We first describe the standard plus-minus methodology as used in basketball and ice-hockey and then adapt it for use in soccer. The usual goal-differential plus-minus is considered before two variations are proposed. For the first variation, we present a methodology to calculate an expected goals plus-minus rating. The second variation makes use of in-play probabilities of match outcome to evaluate an expected points plus-minus rating. We use the ratings to examine who are the best players in European football, and demonstrate how the players' ratings evolve over time. Finally, we shed light on the debate regarding which is the strongest league. The model suggests the English Premier League is the strongest, with the German Bundesliga a close runner-up.

Posted Content•

Sequential rerandomization.

Quan Zhou, Philip Ernst, Kari Lock Morgan, Donald B. Rubin, Anru Zhang

13 Jun 2017-arXiv: Applications

TL;DR: The key result proves that, given the same number of rerandomizations (in expected value) and under certain mild assumptions, sequential rerandomization achieves better covariate balance than rerandomization at one time.

Abstract: The seminal work of Morgan and Rubin (2012) considers rerandomization for all the units at one time. In practice, however, experimenters may have to rerandomize units sequentially. For example, a clinician studying a rare disease may be unable to wait to perform an experiment until all the experimental units are recruited. Our work offers a mathematical framework for sequential rerandomization designs, where the experimental units are enrolled in groups. We formulate an adaptive rerandomization procedure for balancing treatment/control assignments over some continuous or binary covariates, using Mahalanobis distance as the imbalance measure. We prove in our key result, Theorem 3, that given the same number of rerandomizations (in expected value), under certain mild assumptions, sequential rerandomization achieves better covariate balance than rerandomization at one time.
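
As background, the one-time rerandomization of Morgan and Rubin (2012) that this work extends can be sketched in Python for a single covariate. The sketch does not implement the sequential procedure or Theorem 3; the function names, the acceptance threshold, and the covariate values are hypothetical. With one covariate, the Mahalanobis imbalance reduces to the squared difference in group means, scaled by the sample variance and the group sizes.

```python
import random
from statistics import mean, variance


def mahalanobis_imbalance(x, assign):
    """Mahalanobis distance between treated and control means for one covariate."""
    treated = [v for v, a in zip(x, assign) if a == 1]
    control = [v for v, a in zip(x, assign) if a == 0]
    n_t, n_c = len(treated), len(control)
    diff = mean(treated) - mean(control)
    return (n_t * n_c / (n_t + n_c)) * diff * diff / variance(x)


def rerandomize(x, threshold, rng, max_tries=10_000):
    """Redraw a balanced assignment until the imbalance is at most `threshold`."""
    n = len(x)
    assign = [1] * (n // 2) + [0] * (n - n // 2)
    for _ in range(max_tries):
        rng.shuffle(assign)
        if mahalanobis_imbalance(x, assign) <= threshold:
            return list(assign)
    raise RuntimeError("no acceptable assignment found; loosen the threshold")


# Hypothetical covariate (e.g. age in years) for 20 experimental units.
ages = [23.0, 35.0, 41.0, 29.0, 52.0, 47.0, 31.0, 38.0, 60.0, 26.0,
        44.0, 33.0, 57.0, 49.0, 28.0, 36.0, 42.0, 55.0, 30.0, 39.0]
assignment = rerandomize(ages, threshold=0.5, rng=random.Random(0))
```

In the sequential setting described above, units arrive in groups, so a procedure like this would have to be applied group by group while accounting for units already assigned; that adaptive step is the paper's contribution and is not reproduced here.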

Journal Article•DOI•

Statistical Power in Longitudinal Network Studies

Christoph Stadtfeld¹, Tom A. B. Snijders², Christian Steglich³, Marijtje A. J. van Duijn•Institutions (3)

ETH Zurich¹, University of Oxford², Linköping University³

18 Jan 2017-arXiv: Applications

TL;DR: In this paper, the authors present a simulation-based procedure to evaluate statistical power of longitudinal social network studies in which stochastic actor-oriented models (SAOMs) are to be applied.

Abstract: Longitudinal social network studies can easily suffer from insufficient statistical power. Studies that simultaneously investigate change of network ties and change of nodal attributes (selection and influence studies) are particularly at risk because the number of nodal observations is typically much lower than the number of observed tie variables. This paper presents a simulation-based procedure to evaluate statistical power of longitudinal social network studies in which stochastic actor-oriented models (SAOMs) are to be applied. Two detailed case studies illustrate how statistical power is strongly affected by network size, number of data collection waves, effect sizes, missing data, and participant turnover. These issues should thus be explored in the design phase of longitudinal social network studies.

Posted Content•

Ranking soccer teams on basis of their current strength: a comparison of maximum likelihood approaches

Christophe Ley, Tom Van de Wiele, Hans Van Eetvelde

26 May 2017-arXiv: Applications

TL;DR: In this paper, the authors present ten different strength-based statistical models that they use to model soccer match outcomes with the aim of producing a new ranking, and compare the ten models on the basis of their predictive performance via the Rank Probability Score at the level of both domestic leagues and national teams.

Abstract: We present ten different strength-based statistical models that we use to model soccer match outcomes with the aim of producing a new ranking. The models are of four main types: Thurstone-Mosteller, Bradley-Terry, Independent Poisson and Bivariate Poisson. Their common aspect is that the parameters are estimated via weighted maximum likelihood, the weights being a match importance factor and a time depreciation factor that gives less weight to matches played long ago. Since our goal is to build a ranking reflecting the teams' current strengths, we compare the ten models on the basis of their predictive performance via the Rank Probability Score at the level of both domestic leagues and national teams. We find that the best models are the Bivariate and Independent Poisson models. We then illustrate the versatility and usefulness of our new rankings by means of three examples where the existing rankings fail to provide enough information or lead to peculiar results.
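
The Rank Probability Score used for the comparison can be sketched for a single match as follows. This follows the standard ranked probability score for ordered outcomes; the function name and the outcome ordering (home win, draw, away win) are assumptions, and the abstract does not say how the paper aggregates scores across matches.

```python
def rank_probability_score(probs, outcome_index):
    """Ranked probability score for one match; 0 is a perfect forecast, 1 the worst.

    probs: forecast probabilities over ordered outcomes, e.g. (home win, draw, away win).
    outcome_index: position of the outcome that actually occurred.
    """
    r = len(probs)
    if r < 2:
        raise ValueError("need at least two ordered outcomes")
    cum_forecast = 0.0
    cum_observed = 0.0
    total = 0.0
    # Compare cumulative forecast and cumulative outcome at each of the r - 1 cut points.
    for i in range(r - 1):
        cum_forecast += probs[i]
        cum_observed += 1.0 if i == outcome_index else 0.0
        total += (cum_forecast - cum_observed) ** 2
    return total / (r - 1)


# Hypothetical forecast: 50% home win, 30% draw, 20% away win; the home team won.
score = rank_probability_score([0.5, 0.3, 0.2], outcome_index=0)
```

Because the score works on cumulative probabilities, a forecast that puts weight on a draw is penalized less after a home win than one that puts the same weight on an away win, which suits ordered match outcomes.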
