Author

Phyllis Wan

Other affiliations: Columbia University
Bio: Phyllis Wan is an academic researcher at Erasmus University Rotterdam. The author has contributed to research on topics including preferential attachment and covariance. The author has an h-index of 8 and has co-authored 20 publications receiving 201 citations. Previous affiliations of Phyllis Wan include Columbia University.

Papers
Journal ArticleDOI
TL;DR: It is proved that the joint distribution of in-degree and out-degree has jointly regularly varying tails.
Abstract: The research of the authors was supported by MURI ARO Grant W911NF-12-10385 to Cornell University

52 citations

Posted Content
TL;DR: This paper establishes the asymptotic theory for the sample auto- and cross-distance correlation functions of strongly mixing stationary time series, and applies the auto-distance correlation function (ADCF) to the residuals of an autoregressive process as a test of goodness of fit.
Abstract: The use of empirical characteristic functions for inference problems, including estimation in some special parametric settings and testing for goodness of fit, has a long history dating back to the 1970s (see, for example, Feuerverger and Mureika (1977), Csorgo (1981a, 1981b, 1981c), Feuerverger (1993)). More recently, there has been renewed interest in using empirical characteristic functions in other inference settings. The distance covariance and correlation, developed by Szekely and Rizzo (2009) for measuring dependence and testing independence between two random vectors, are perhaps the best known illustrations of this. We apply these ideas to stationary univariate and multivariate time series to measure lagged auto- and cross-dependence in a time series. Assuming strong mixing, we establish the relevant asymptotic theory for the sample auto- and cross-distance correlation functions. We also apply the auto-distance correlation function (ADCF) to the residuals of an autoregressive process as a test of goodness of fit. Under the null that an autoregressive model is true, the limit distribution of the empirical ADCF can differ markedly from the corresponding one based on an i.i.d. sequence. We illustrate the use of the empirical auto- and cross-distance correlation functions for testing dependence and cross-dependence of time series in a variety of contexts.
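As a rough illustration of the sample auto-distance correlation function mentioned in the abstract, the Python sketch below computes the distance correlation between a series and its lagged copy. It assumes the Szekely and Rizzo distance form, in which the squared sample distance covariance is the mean of the product of two double-centered pairwise distance matrices, and it pairs X_t with X_{t+h} over the n - h available time points. The function names are hypothetical, and the paper's exact weight function and normalization may differ.

```python
import numpy as np


def _centered_distances(x):
    """Pairwise Euclidean distances, double-centered by row, column and grand means."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a univariate sample as n points in dimension 1
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form) between paired samples x and y."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = max((a * b).mean(), 0.0)          # squared distance covariance
    dvar2 = (a * a).mean() * (b * b).mean()   # product of squared distance variances
    if dvar2 <= 0.0:
        return 0.0
    return float(np.sqrt(dcov2 / np.sqrt(dvar2)))


def sample_adcf(series, max_lag):
    """Distance correlation between X_t and X_{t+h} for h = 0, ..., max_lag."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    return np.array(
        [distance_correlation(series[: n - h], series[h:]) for h in range(max_lag + 1)]
    )
```

For example, `sample_adcf(x, 10)` returns eleven values, the first of which (lag 0) is 1 up to rounding. Judging whether the remaining lags are significant requires the limit theory of the paper or a resampling scheme, neither of which is sketched here.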

42 citations

Journal ArticleDOI
TL;DR: This paper considers methods for fitting a 5-parameter linear preferential model to network data under two data scenarios and derives the maximum likelihood estimator of the parameters and shows that it is strongly consistent and asymptotically normal.
Abstract: Preferential attachment is an appealing mechanism for modeling power-law behavior of the degree distributions in directed social networks. In this paper, we consider methods for fitting a 5-parameter linear preferential model to network data under two data scenarios. In the case where the full history of the network formation is given, we derive the maximum likelihood estimator (MLE) of the parameters and show that it is strongly consistent and asymptotically normal. In the case where only a single-time snapshot of the network is available, we propose an estimation method which combines the method of moments with an approximation to the likelihood. The resulting estimator is also strongly consistent and performs quite well compared to the MLE. We illustrate both estimation procedures through simulated data and explore the usage of this model in a real data example.
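The abstract does not spell out the five parameters. The sketch below assumes the standard directed linear preferential attachment model with parameters (alpha, beta, gamma, delta_in, delta_out), where alpha + beta + gamma = 1; this parameterization, the single self-loop starting graph, and the function name are all assumptions rather than details taken from the paper. It simulates the full history of the network. Under this assumed model the event type at each step is a multinomial draw, so the relative frequencies of the three event types give a natural estimate of (alpha, beta, gamma) when the history is observed; estimating the delta parameters is more involved and is not shown.

```python
import numpy as np


def simulate_pa(n_steps, alpha, beta, delta_in, delta_out, seed=0):
    """Simulate a directed linear preferential attachment graph (assumed model).

    Each step adds one edge:
      event 0 (prob alpha): new node v -> existing w, w chosen prop. to in-degree + delta_in
      event 1 (prob beta):  existing v -> existing w, v chosen prop. to out-degree + delta_out
      event 2 (prob gamma): existing v -> new node w, v chosen prop. to out-degree + delta_out
    Requires delta_in > 0 and delta_out > 0 so that all selection weights are positive.
    """
    rng = np.random.default_rng(seed)
    gamma = 1.0 - alpha - beta
    in_deg = [1]   # assumed starting graph: a single node with a self-loop
    out_deg = [1]
    events = []
    for _ in range(n_steps):
        event = int(rng.choice(3, p=[alpha, beta, gamma]))
        w_in = np.array(in_deg, dtype=float) + delta_in
        w_out = np.array(out_deg, dtype=float) + delta_out
        if event == 0:
            w = rng.choice(len(in_deg), p=w_in / w_in.sum())
            in_deg[w] += 1
            in_deg.append(0)
            out_deg.append(1)
        elif event == 1:
            v = rng.choice(len(out_deg), p=w_out / w_out.sum())
            w = rng.choice(len(in_deg), p=w_in / w_in.sum())
            out_deg[v] += 1
            in_deg[w] += 1
        else:
            v = rng.choice(len(out_deg), p=w_out / w_out.sum())
            out_deg[v] += 1
            in_deg.append(1)
            out_deg.append(0)
        events.append(event)
    return np.array(in_deg), np.array(out_deg), np.array(events)


def estimate_event_probabilities(events):
    """Relative frequencies of the three event types in an observed history."""
    return np.bincount(events, minlength=3) / len(events)
```

With a few thousand steps, `estimate_event_probabilities` applied to the simulated history should land close to the (alpha, beta, gamma) used in the simulation.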

37 citations

Journal ArticleDOI
TL;DR: This paper establishes the asymptotic theory for the sample auto- and cross-distance correlation functions of strongly mixing stationary time series, and applies the auto-distance correlation function (ADCF) to the residuals of an autoregressive process as a test of goodness of fit.
Abstract: The use of empirical characteristic functions for inference problems, including estimation in some special parametric settings and testing for goodness of fit, has a long history dating back to the 1970s. More recently, there has been renewed interest in using empirical characteristic functions in other inference settings. The distance covariance and correlation, developed by Szekely et al. (Ann. Statist. 35 (2007) 2769–2794) and Szekely and Rizzo (Ann. Appl. Stat. 3 (2009) 1236–1265) for measuring dependence and testing independence between two random vectors, are perhaps the best known illustrations of this. We apply these ideas to stationary univariate and multivariate time series to measure lagged auto- and cross-dependence in a time series. Assuming strong mixing, we establish the relevant asymptotic theory for the sample auto- and cross-distance correlation functions. We also apply the auto-distance correlation function (ADCF) to the residuals of an autoregressive process as a test of goodness of fit. Under the null that an autoregressive model is true, the limit distribution of the empirical ADCF can differ markedly from the corresponding one based on an i.i.d. sequence. We illustrate the use of the empirical auto- and cross-distance correlation functions for testing dependence and cross-dependence of time series in a variety of contexts.

36 citations

Journal ArticleDOI
TL;DR: This paper explores how the spherical k-means algorithm can be applied to only the extremal observations from a data set, and shows how it can be adapted, using multivariate extreme value analysis, to find "prototypes" of extremal dependence.
Abstract: The k-means clustering algorithm and its variant, the spherical k-means clustering, are among the most important and popular methods in unsupervised learning and pattern detection. In this paper, we explore how the spherical k-means algorithm can be applied in the analysis of only the extremal observations from a data set. By making use of multivariate extreme value analysis we show how it can be adopted to find “prototypes” of extremal dependence and derive a consistency result for our suggested estimator. In the special case of max-linear models we show furthermore that our procedure provides an alternative way of statistical inference for this class of models. Finally, we provide data examples which show that our method is able to find relevant patterns in extremal observations and allows us to classify extremal events.
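The general idea of clustering only the extremal observations can be sketched as follows. This is a minimal illustration, not the paper's estimator: it assumes the Euclidean norm, keeps a fixed number of observations with the largest norms, skips any marginal standardization, and uses a plain Lloyd-style spherical k-means with cosine similarity. The function names and the `n_extremes` parameter are hypothetical.

```python
import numpy as np


def extremal_directions(data, n_extremes):
    """Keep the n_extremes observations with largest norm and project them onto the unit sphere."""
    data = np.asarray(data, dtype=float)
    radii = np.linalg.norm(data, axis=1)
    top = np.argsort(radii)[-n_extremes:]      # indices of the largest norms
    return data[top] / radii[top, None]        # assumes the selected norms are positive


def spherical_kmeans(directions, n_clusters, n_iter=50, seed=0):
    """Cluster unit vectors by cosine similarity; centers are normalized cluster means."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(directions), size=n_clusters, replace=False)
    centers = directions[start].copy()
    for _ in range(n_iter):
        # Assign each direction to the center with the largest inner product.
        labels = np.argmax(directions @ centers.T, axis=1)
        for k in range(n_clusters):
            members = directions[labels == k]
            if len(members) == 0:
                continue                        # keep the old center for an empty cluster
            total = members.sum(axis=0)
            norm = np.linalg.norm(total)
            if norm > 0.0:
                centers[k] = total / norm       # renormalize back onto the sphere
    labels = np.argmax(directions @ centers.T, axis=1)
    return centers, labels
```

The returned centers play the role of candidate "prototypes": unit vectors indicating which combinations of coordinates tend to be large together. The choice of norm, threshold and number of clusters is left open here.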

31 citations


Cited by
Journal Article
TL;DR: An independence criterion based on the eigen-spectrum of covariance operators in reproducing kernel Hilbert spaces (RKHSs), consisting of an empirical estimate of the Hilbert-Schmidt norm of the cross-covariance operator, or HSIC, is proposed.
Abstract: We propose an independence criterion based on the eigen-spectrum of covariance operators in reproducing kernel Hilbert spaces (RKHSs), consisting of an empirical estimate of the Hilbert-Schmidt norm of the cross-covariance operator (we term this a Hilbert-Schmidt Independence Criterion, or HSIC). This approach has several advantages, compared with previous kernel-based independence criteria. First, the empirical estimate is simpler than any other kernel dependence test, and requires no user-defined regularisation. Second, there is a clearly defined population quantity which the empirical estimate approaches in the large sample limit, with exponential convergence guaranteed between the two: this ensures that independence tests based on HSIC do not suffer from slow learning rates. Finally, we show in the context of independent component analysis (ICA) that the performance of HSIC is competitive with that of previously published kernel-based criteria, and of other recently published ICA methods.
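To make the empirical estimate concrete, here is a minimal Python sketch of the biased HSIC estimator, tr(KHLH) / (m - 1)^2, where K and L are Gram matrices of the two samples and H = I - (1/m) 11^T is the centering matrix. The Gaussian kernel and its bandwidth of 1 are assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np


def gaussian_gram(x, bandwidth=1.0):
    """Gram matrix of a Gaussian kernel with the given bandwidth (an assumed kernel choice)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def hsic(x, y, bandwidth=1.0):
    """Biased empirical HSIC: trace(K H L H) / (m - 1)^2 for paired samples x and y."""
    k = gaussian_gram(x, bandwidth)
    l = gaussian_gram(y, bandwidth)
    m = k.shape[0]
    h = np.eye(m) - np.ones((m, m)) / m   # centering matrix
    return float(np.trace(k @ h @ l @ h)) / (m - 1) ** 2
```

Values near zero are consistent with independence, while larger values indicate dependence. Turning the statistic into a test requires a null distribution, for instance from permuting one of the samples, which is not shown here.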

1,134 citations

01 May 2013
TL;DR: In this paper, the authors review work on extreme events, their causes and consequences, by a group of European and American researchers involved in a three-year project on these topics.
Abstract: We review work on extreme events, their causes and consequences, by a group of European and American researchers involved in a three-year project on these topics. The review covers theoretical aspects of time series analysis and of extreme value theory, as well as of the deterministic modeling of extreme events, via continuous and discrete dynamic models. The applications include climatic, seismic and socio-economic events, along with their prediction. Two important results refer to (i) the complementarity of spectral analysis of a time series in terms of the continuous and the discrete part of its power spectrum; and (ii) the need for coupled modeling of natural and socio-economic systems. Both these results have implications for the study and prediction of natural hazards and their human impacts.

166 citations

01 Dec 2017
TL;DR: Analysis of a large ensemble of high-resolution initialised climate simulations shows that this event could have been anticipated, and that in the current climate there remains a high chance of exceeding the observed record monthly rainfall totals in many regions of the UK.
Abstract: In winter 2013/14 a succession of storms hit the UK leading to record rainfall and flooding in many regions including south east England. In the Thames river valley there was widespread flooding, with clean-up costs of over £1 billion. There was no observational precedent for this level of rainfall. Here we present analysis of a large ensemble of high-resolution initialised climate simulations to show that this event could have been anticipated, and that in the current climate there remains a high chance of exceeding the observed record monthly rainfall totals in many regions of the UK. In south east England there is a 7% chance of exceeding the current rainfall record in at least one month in any given winter. Expanding our analysis to some other regions of England and Wales the risk increases to a 34% chance of breaking a regional record somewhere each winter. A succession of storms during the 2013–2014 winter led to record flooding in the UK. Here, the authors use high-resolution climate simulations to show that this event could have been anticipated and that there remains a high chance of exceeding observed record monthly rainfall totals in many parts of the UK.

48 citations

Posted Content
TL;DR: The different forms of extremal dependence that can arise between the largest observations of a multivariate random vector are described and identification of groups of variables which can be concomitantly extreme is addressed.
Abstract: Extreme value statistics provides accurate estimates for the small occurrence probabilities of rare events. While theory and statistical tools for univariate extremes are well-developed, methods for high-dimensional and complex data sets are still scarce. Appropriate notions of sparsity and connections to other fields such as machine learning, graphical models and high-dimensional statistics have only recently been established. This article reviews the new domain of research concerned with the detection and modeling of sparse patterns in rare events. We first describe the different forms of extremal dependence that can arise between the largest observations of a multivariate random vector. We then discuss the current research topics including clustering, principal component analysis and graphical modeling for extremes. Identification of groups of variables which can be concomitantly extreme is also addressed. The methods are illustrated with an application to flood risk assessment.

45 citations