Journal Article

Functional Data Analysis

01 Jun 2016-Annual Review of Statistics and Its Application (Annual Reviews)-Vol. 3, Iss: 1, pp 257-295
TL;DR: In this article, the authors provide an overview of FDA, starting with simple statistical notions such as mean and covariance functions, then covering some core techniques, the most popular of which is functional principal component analysis (FPCA).
Abstract: With the advance of modern technology, more and more data are being recorded continuously during a time interval or intermittently at several discrete time points. These are both examples of functional data, which has become a commonly encountered type of data. Functional data analysis (FDA) encompasses the statistical methodology for such data. Broadly interpreted, FDA deals with the analysis and theory of data that are in the form of functions. This paper provides an overview of FDA, starting with simple statistical notions such as mean and covariance functions, then covering some core techniques, the most popular of which is functional principal component analysis (FPCA). FPCA is an important dimension reduction tool, and in sparse data situations it can be used to impute functional data that are sparsely observed. Other dimension reduction approaches are also discussed. In addition, we review another core technique, functional linear regression, as well as clustering and classification of functional data. Beyond linear and single or multiple index methods we touch upon a few nonlinear approaches that are promising for certain applications. They include additive and other nonlinear functional regression models, such as time warping, manifold learning, and dynamic modeling with empirical differential equations. The paper concludes with a brief discussion of future directions.
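The FPCA described above reduces each curve to a few scores on data-driven eigenfunctions. As a concrete illustration, here is a minimal NumPy sketch of discretized FPCA on densely observed curves; the simulated data, grid, and variable names are our own assumptions, not code from the paper.

```python
# Minimal FPCA sketch on densely observed curves (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 101                      # n curves observed on a common grid of m points
t = np.linspace(0, 1, m)

# Simulate X_i(t) = mu(t) + xi_1 * phi_1(t) + xi_2 * phi_2(t) + noise
mu = np.sin(2 * np.pi * t)
phi1 = np.sqrt(2) * np.cos(2 * np.pi * t)
phi2 = np.sqrt(2) * np.sin(4 * np.pi * t)
xi = rng.normal(size=(n, 2)) * np.array([1.0, 0.5])
X = mu + xi @ np.vstack([phi1, phi2]) + 0.05 * rng.normal(size=(n, m))

# Step 1: mean and covariance functions estimated on the grid
mu_hat = X.mean(axis=0)
Xc = X - mu_hat
cov_hat = Xc.T @ Xc / n             # m x m discretized covariance surface

# Step 2: eigendecomposition approximates the covariance operator's eigenfunctions
dt = t[1] - t[0]
evals, evecs = np.linalg.eigh(cov_hat * dt)
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]
eigenfuns = evecs / np.sqrt(dt)     # rescale so each eigenfunction has unit L2 norm

# Step 3: FPC scores by numerical integration; K components summarize each curve
K = 2
scores = Xc @ eigenfuns[:, :K] * dt
print("estimated eigenvalues:", np.round(evals[:K], 3))
```

For sparsely observed curves, scores are usually obtained by conditional expectation given the few available measurements rather than by the direct numerical integration used in this dense-design sketch.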


Citations
Journal Article
TL;DR: Despite its clear benefits for analyzing time series data, full appreciation of the key features and value of FDA has been limited to date, though the applications show its relevance to many public health and biomedical problems.
Abstract: Functional data analysis (FDA) is increasingly being used to better analyze, model and predict time series data. Key aspects of FDA include the choice of smoothing technique, data reduction, adjustment for clustering, functional linear modeling and forecasting methods. A systematic review using 11 electronic databases was conducted to identify FDA application studies published in the peer-reviewed literature during 1995–2010. Papers reporting methodological considerations only were excluded, as were non-English articles. In total, 84 FDA application articles were identified; 75.0% of the reviewed articles have been published since 2005. Application of FDA has appeared in a large number of publications across various fields of science; the majority is related to biomedical applications (21.4%). Overall, 72 studies (85.7%) provided information about the type of smoothing technique used, with B-spline smoothing (29.8%) being the most popular. Functional principal component analysis (FPCA) for extracting information from functional data was reported in 51 (60.7%) studies. One-quarter (25.0%) of the published studies used functional linear models to describe relationships between explanatory and outcome variables, and only 8.3% used FDA for forecasting time series data. Despite its clear benefits for analyzing time series data, full appreciation of the key features and value of FDA has been limited to date, though the applications show its relevance to many public health and biomedical problems. Wider application of FDA to all studies involving correlated measurements should allow better modeling of, and predictions from, such data in the future, especially as FDA makes no a priori assumptions about age and time effects.
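Since the review finds B-spline smoothing to be the most popular pre-processing step, a brief illustration may be useful; the snippet below smooths one noisy curve with SciPy's smoothing-spline routines, and the simulated data, smoothing parameter, and choice of library are ours, not the review's.

```python
# Illustrative B-spline smoothing of a single noisy curve (not code from the review).
import numpy as np
from scipy.interpolate import splrep, splev

rng = np.random.default_rng(1)
t = np.linspace(0, 10, 200)
y = np.sin(t) + 0.2 * rng.normal(size=t.size)   # noisy functional observation

# splrep fits a smoothing spline in a B-spline basis; `s` trades off fidelity vs. smoothness
tck = splrep(t, y, k=3, s=len(t) * 0.04)
y_smooth = splev(t, tck)

print("max absolute residual:", round(float(np.abs(y - y_smooth).max()), 3))
```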

228 citations

Journal Article
TL;DR: In this paper, the performance of local linear smoothers for both mean and covariance functions is investigated under a general weighing scheme, which includes two commonly used schemes, equal weight per observation (OBS) and equal weight per subject (SUBJ), as special cases.
Abstract: Nonparametric estimation of mean and covariance functions is important in functional data analysis. We investigate the performance of local linear smoothers for both mean and covariance functions with a general weighing scheme, which includes two commonly used schemes, equal weight per observation (OBS), and equal weight per subject (SUBJ), as two special cases. We provide a comprehensive analysis of their asymptotic properties on a unified platform for all types of sampling plan, be it dense, sparse or neither. Three types of asymptotic properties are investigated in this paper: asymptotic normality, $L^{2}$ convergence and uniform convergence. The asymptotic theories are unified on two aspects: (1) the weighing scheme is very general; (2) the magnitude of the number $N_{i}$ of measurements for the $i$th subject relative to the sample size $n$ can vary freely. Based on the relative order of $N_{i}$ to $n$, functional data are partitioned into three types: non-dense, dense and ultra-dense functional data for the OBS and SUBJ schemes. These two weighing schemes are compared both theoretically and numerically. We also propose a new class of weighing schemes in terms of a mixture of the OBS and SUBJ weights, of which theoretical and numerical performances are examined and compared.
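To make the OBS/SUBJ distinction concrete, the sketch below implements a pooled local linear estimator of the mean function in which each observation's kernel weight is multiplied by a scheme weight: 1/(total number of observations) under OBS, or 1/(n N_i) under SUBJ. The Gaussian kernel, bandwidth, and simulated sparse design are illustrative assumptions, not the authors' implementation.

```python
# Pooled local linear estimator of the mean function with OBS or SUBJ weights (sketch).
import numpy as np

def local_linear_mean(t_obs, y_obs, subj_id, t_grid, h, scheme="OBS"):
    """t_obs, y_obs: pooled times/values; subj_id: integer subject index per observation."""
    t_obs, y_obs, subj_id = map(np.asarray, (t_obs, y_obs, subj_id))
    n_subj = np.unique(subj_id).size
    counts = np.bincount(subj_id)                       # N_i, measurements per subject
    if scheme == "OBS":
        w = np.full(t_obs.shape, 1.0 / t_obs.size)      # equal weight per observation
    else:
        w = 1.0 / (n_subj * counts[subj_id])            # equal total weight per subject
    mu_hat = np.empty(t_grid.size)
    for j, t0 in enumerate(t_grid):
        u = (t_obs - t0) / h
        k = w * np.exp(-0.5 * u ** 2)                   # scheme weight times Gaussian kernel
        A = np.column_stack([np.ones_like(u), t_obs - t0])
        sw = np.sqrt(k)
        beta, *_ = np.linalg.lstsq(A * sw[:, None], y_obs * sw, rcond=None)
        mu_hat[j] = beta[0]                             # local intercept estimates mu(t0)
    return mu_hat

# toy sparse design: 100 subjects, each observed at 2-5 random times
rng = np.random.default_rng(2)
subj = np.repeat(np.arange(100), rng.integers(2, 6, size=100))
t_obs = rng.uniform(0, 1, size=subj.size)
y_obs = np.sin(2 * np.pi * t_obs) + 0.3 * rng.normal(size=subj.size)
grid = np.linspace(0, 1, 11)
print(np.round(local_linear_mean(t_obs, y_obs, subj, grid, h=0.1, scheme="SUBJ"), 2))
```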

195 citations

Journal Article
TL;DR: This paper provides a structured overview of the contents of this Special Issue of the Journal of Multivariate Analysis devoted to Functional Data Analysis and Related Topics, along with a brief survey of the field.

142 citations

Posted Content
TL;DR: An overview of FDA is provided, starting with simple statistical notions such as mean and covariance functions, then covering some core techniques, the most popular of which is Functional Principal Component Analysis (FPCA), an important dimension reduction tool and in sparse data situations can be used to impute functional data that are sparsely observed.
Abstract: With the advance of modern technology, more and more data are being recorded continuously during a time interval or intermittently at several discrete time points. They are both examples of "functional data", which have become a prevailing type of data. Functional Data Analysis (FDA) encompasses the statistical methodology for such data. Broadly interpreted, FDA deals with the analysis and theory of data that are in the form of functions. This paper provides an overview of FDA, starting with simple statistical notions such as mean and covariance functions, then covering some core techniques, the most popular of which is Functional Principal Component Analysis (FPCA). FPCA is an important dimension reduction tool and in sparse data situations can be used to impute functional data that are sparsely observed. Other dimension reduction approaches are also discussed. In addition, we review another core technique, functional linear regression, as well as clustering and classification of functional data. Beyond linear and single or multiple index methods we touch upon a few nonlinear approaches that are promising for certain applications. They include additive and other nonlinear functional regression models, such as time warping, manifold learning, and dynamic modeling with empirical differential equations. The paper concludes with a brief discussion of future directions.
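One common way to operationalize the functional linear regression mentioned in the abstract is principal component regression on FPC scores. The sketch below simulates scalar responses generated from predictor curves, regresses them on the leading FPC scores, and maps the fitted coefficients back to a coefficient function; the simulated design, truncation level K, and variable names are illustrative assumptions, and this is only one of the estimators the paper surveys.

```python
# Scalar-on-function linear regression via FPC scores (illustrative sketch).
import numpy as np

rng = np.random.default_rng(3)
n, m = 200, 101
t = np.linspace(0, 1, m)
dt = t[1] - t[0]

# Simulated predictor curves and a true coefficient function beta(t)
X = np.cumsum(rng.normal(size=(n, m)), axis=1) * np.sqrt(dt)   # rough Brownian-like paths
beta_true = np.sin(np.pi * t)
y = X @ beta_true * dt + 0.1 * rng.normal(size=n)              # Y_i = \int X_i(t) beta(t) dt + noise

# FPCA of the centered predictor curves
Xc = X - X.mean(axis=0)
evals, evecs = np.linalg.eigh(Xc.T @ Xc / n * dt)
order = np.argsort(evals)[::-1]
phi = evecs[:, order] / np.sqrt(dt)         # orthonormal eigenfunctions on the grid
K = 4
scores = Xc @ phi[:, :K] * dt               # FPC scores serve as regressors

# Least squares on the scores, then map coefficients back to beta(t)
coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), scores]), y, rcond=None)
beta_hat = phi[:, :K] @ coef[1:]
print("L2 error of beta_hat:", round(float(np.sqrt(np.sum((beta_hat - beta_true) ** 2) * dt)), 3))
```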

129 citations

Journal Article
TL;DR: Overall, domain-general abilities were more important than domain-specific knowledge for mathematics learning in early grades, but general abilities and domain-specific knowledge were equally important in later grades.
Abstract: The contributions of domain-general abilities and domain-specific knowledge to subsequent mathematics achievement were longitudinally assessed (n = 167) through 8th grade. First-grade intelligence and working memory and prior-grade reading achievement indexed domain-general effects, and domain-specific effects were indexed by prior-grade mathematics achievement and mathematical cognition measures of prior-grade number knowledge, addition skills, and fraction knowledge. Use of functional data analysis enabled grade-by-grade estimation of overall domain-general and domain-specific effects on subsequent mathematics achievement, the relative importance of individual domain-general and domain-specific variables on this achievement, and linear and non-linear across-grade estimates of these effects. The overall importance of domain-general abilities for subsequent achievement was stable across grades, with working memory emerging as the most important domain-general ability in later grades. The importance of prior mathematical competencies on subsequent mathematics achievement increased across grades, with number knowledge and arithmetic skills critical in all grades and fraction knowledge in later grades. Overall, domain-general abilities were more important than domain-specific knowledge for mathematics learning in early grades, but general abilities and domain-specific knowledge were equally important in later grades.

106 citations

References
Journal Article
22 Dec 2000-Science
TL;DR: Locally linear embedding (LLE) is introduced, an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs that learns the global structure of nonlinear manifolds.
Abstract: Many areas of science depend on exploratory data analysis and visualization. The need to analyze large amounts of multivariate data raises the fundamental problem of dimensionality reduction: how to discover compact representations of high-dimensional data. Here, we introduce locally linear embedding (LLE), an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs. Unlike clustering methods for local dimensionality reduction, LLE maps its inputs into a single global coordinate system of lower dimensionality, and its optimizations do not involve local minima. By exploiting the local symmetries of linear reconstructions, LLE is able to learn the global structure of nonlinear manifolds, such as those generated by images of faces or documents of text.
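For readers who want to try LLE directly, scikit-learn ships an implementation; the example below embeds a synthetic Swiss-roll manifold into two dimensions. The dataset, neighbor count, and choice of library are ours for illustration and are not part of the original paper.

```python
# Locally linear embedding of a synthetic Swiss roll using scikit-learn.
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, color = make_swiss_roll(n_samples=1500, noise=0.05, random_state=0)

# LLE reconstructs each point from its neighbors, then finds 2-D coordinates
# that preserve those local reconstruction weights.
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, random_state=0)
Y = lle.fit_transform(X)
print(Y.shape, "reconstruction error:", lle.reconstruction_error_)
```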

15,106 citations

Journal Article
22 Dec 2000-Science
TL;DR: An approach to solving dimensionality reduction problems that uses easily measured local metric information to learn the underlying global geometry of a data set and efficiently computes a globally optimal solution, and is guaranteed to converge asymptotically to the true structure.
Abstract: Scientists working with large volumes of high-dimensional data, such as global climate patterns, stellar spectra, or human gene distributions, regularly confront the problem of dimensionality reduction: finding meaningful low-dimensional structures hidden in their high-dimensional observations. The human brain confronts the same problem in everyday perception, extracting from its high-dimensional sensory inputs (30,000 auditory nerve fibers or $10^6$ optic nerve fibers) a manageably small number of perceptually relevant features. Here we describe an approach to solving dimensionality reduction problems that uses easily measured local metric information to learn the underlying global geometry of a data set. Unlike classical techniques such as principal component analysis (PCA) and multidimensional scaling (MDS), our approach is capable of discovering the nonlinear degrees of freedom that underlie complex natural observations, such as human handwriting or images of a face under different viewing conditions. In contrast to previous algorithms for nonlinear dimensionality reduction, ours efficiently computes a globally optimal solution, and, for an important class of data manifolds, is guaranteed to converge asymptotically to the true structure.
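The algorithm described here is Isomap, which combines a nearest-neighbor graph, shortest-path (geodesic) distances, and classical MDS. A short scikit-learn example on a synthetic S-curve is given below; the dataset and parameter choices are illustrative assumptions, not the authors' experiments.

```python
# Isomap embedding of a synthetic S-curve using scikit-learn.
from sklearn.datasets import make_s_curve
from sklearn.manifold import Isomap

X, color = make_s_curve(n_samples=1500, random_state=0)

# Isomap: k-NN graph -> geodesic distances via shortest paths -> classical MDS.
iso = Isomap(n_neighbors=10, n_components=2)
Y = iso.fit_transform(X)
print(Y.shape)
```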

13,652 citations

Journal Article

9,941 citations

Journal Article
H. Sakoe, S. Chiba
TL;DR: This paper reports on an optimum dynamic programming (DP) based time-normalization algorithm for spoken word recognition, in which the warping function slope is restricted so as to improve discrimination between words in different categories.
Abstract: This paper reports on an optimum dynamic programming (DP) based time-normalization algorithm for spoken word recognition. First, a general principle of time-normalization is given using a time-warping function. Then, two time-normalized distance definitions, called symmetric and asymmetric forms, are derived from the principle. These two forms are compared with each other through theoretical discussions and experimental studies, and the superiority of the symmetric form algorithm is established. A new technique, called slope constraint, is successfully introduced, in which the warping function slope is restricted so as to improve discrimination between words in different categories. The characteristics of the slope constraint are qualitatively analyzed, and the optimum slope constraint condition is determined through experiments. The optimized algorithm is then extensively subjected to experimental comparison with various DP algorithms previously applied to spoken word recognition by different research groups. The experiments show that the present algorithm gives no more than about two-thirds of the errors of even the best conventional algorithm.
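At its core the method is a dynamic-programming recursion over an alignment grid. The sketch below computes a symmetric-form DTW distance without slope constraints, in plain Python/NumPy; the absolute-difference local cost, the path-length normalization, and the toy signals are our simplifications, not the paper's exact formulation.

```python
# Minimal dynamic-programming DTW sketch (symmetric form, no slope constraint).
import numpy as np

def dtw_distance(x, y):
    """Time-normalized DTW distance between two 1-D sequences."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n, m = x.size, y.size
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # symmetric weighting: a diagonal step counts the local cost twice
            D[i, j] = min(D[i - 1, j] + cost,
                          D[i - 1, j - 1] + 2 * cost,
                          D[i, j - 1] + cost)
    return D[n, m] / (n + m)        # normalize by the total path weight

t = np.linspace(0, 2 * np.pi, 80)
a = np.sin(t)
b = np.sin(0.9 * t + 0.3)           # a time-warped version of a
print(round(dtw_distance(a, b), 4))
```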

5,906 citations

Journal Article
TL;DR: The class of generalized additive models is introduced, which replaces the linear form $\sum \beta_j X_j$ by a sum of smooth functions $\sum s_j(X_j)$, and has the advantage of being completely automatic, i.e., no "detective work" is needed on the part of the statistician.
Abstract: Likelihood-based regression models such as the normal linear regression model and the linear logistic model, assume a linear (or some other parametric) form for the covariates $X_1, X_2, \cdots, X_p$. We introduce the class of generalized additive models which replaces the linear form $\sum \beta_jX_j$ by a sum of smooth functions $\sum s_j(X_j)$. The $s_j(\cdot)$'s are unspecified functions that are estimated using a scatterplot smoother, in an iterative procedure we call the local scoring algorithm. The technique is applicable to any likelihood-based regression model: the class of generalized linear models contains many of these. In this class the linear predictor $\eta = \Sigma \beta_jX_j$ is replaced by the additive predictor $\Sigma s_j(X_j)$; hence, the name generalized additive models. We illustrate the technique with binary response and survival data. In both cases, the method proves to be useful in uncovering nonlinear covariate effects. It has the advantage of being completely automatic, i.e., no "detective work" is needed on the part of the statistician. As a theoretical underpinning, the technique is viewed as an empirical method of maximizing the expected log likelihood, or equivalently, of minimizing the Kullback-Leibler distance to the true model.
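The fitting idea behind the local scoring algorithm can be sketched for the Gaussian, identity-link case as plain backfitting: cycle over covariates, smoothing partial residuals against each one in turn. The toy example below uses a simple Nadaraya-Watson smoother in place of a general scatterplot smoother, so it illustrates the idea rather than reproducing the authors' algorithm.

```python
# Toy backfitting for a Gaussian additive model y = alpha + s1(x1) + s2(x2) + noise.
import numpy as np

def kernel_smooth(x, r, h=0.3):
    """Nadaraya-Watson smooth of partial residuals r against covariate x."""
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    return (w @ r) / w.sum(axis=1)

rng = np.random.default_rng(4)
n = 300
X = rng.uniform(-2, 2, size=(n, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.2 * rng.normal(size=n)

alpha = y.mean()
f = np.zeros((n, 2))                        # current estimates of s_1, s_2 at the data points
for _ in range(20):                         # backfitting cycles
    for j in range(2):
        partial = y - alpha - f[:, 1 - j]   # remove the other smooth component
        f[:, j] = kernel_smooth(X[:, j], partial)
        f[:, j] -= f[:, j].mean()           # center each component for identifiability
fitted = alpha + f.sum(axis=1)
print("residual std:", round(float((y - fitted).std()), 3))
```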

2,708 citations