
Showing papers by "Klaus Nordhausen published in 2021"


Journal ArticleDOI
TL;DR: A very recent concept is downweighting single cells of the data matrix rather than complete observations, with the goal of making better use of the model-consistent information and thus achieving higher efficiency of the parameter estimates.

18 citations



Journal ArticleDOI
TL;DR: An overview of methods based on the joint diagonalization of scatter matrices, ranging from the unsupervised context, with invariant coordinate selection and blind source separation, to the supervised context, with discriminant analysis and sliced inverse regression.

8 citations


Journal ArticleDOI
TL;DR: This letter investigates the use of SBSS as a preprocessing tool for spatial prediction and compares it with predictions from Cokriging and neural networks in an extensive simulation study as well as a geochemical data set.
Abstract: Multivariate measurements taken at irregularly sampled locations are a common form of data, for example, in geochemical analysis of soil. In practice, predictions of these measurements at unobserved locations are of great interest. Standard multivariate spatial prediction methods must model not only spatial dependencies but also cross-dependencies between variables, which makes prediction a demanding task. Recently, a blind source separation (BSS) approach for spatial data was suggested. Using this spatial BSS (SBSS) method before the actual spatial prediction avoids modeling spatial cross-dependencies, which in turn simplifies the spatial prediction task significantly. In this letter, we investigate the use of SBSS as a preprocessing tool for spatial prediction and compare it with predictions from Cokriging and neural networks in an extensive simulation study as well as on a geochemical data set.
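A minimal sketch of the preprocessing idea, with synthetic data and a hypothetical mixing matrix (only the whitening step common to BSS methods is shown; actual SBSS estimators additionally jointly diagonalize spatially local covariance matrices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for multivariate spatial data: n locations, p variables
# formed as a linear mixture of uncorrelated latent fields.
n, p = 500, 3
A = rng.normal(size=(p, p))        # unknown mixing matrix (hypothetical)
Z = rng.normal(size=(n, p))        # uncorrelated latent values
X = Z @ A.T                        # observed, cross-correlated measurements

# Whitening: afterwards the recovered components are mutually uncorrelated,
# so each can be predicted (e.g. kriged) univariately, with no
# cross-covariance modeling required.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / n
evals, evecs = np.linalg.eigh(cov)
W = evecs @ np.diag(evals ** -0.5) @ evecs.T   # symmetric inverse square root
S = Xc @ W                                     # uncorrelated scores
print(np.round(S.T @ S / n, 6))                # identity up to rounding
```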

7 citations


Journal ArticleDOI
TL;DR: Blind source separation techniques are well-established latent factor models for time series, with many variants covering quite different time series models, and it is shown how they can be applied to high-dimensional compositional time series.
Abstract: Many geological phenomena are regularly measured over time to follow developments and changes. For many of these phenomena, the absolute values are not of interest, but rather the relative information, which means that the data are compositional time series. Thus, the serial nature and the compositional geometry should be considered when analyzing the data. Multivariate time series are already challenging, especially if they are higher dimensional, and latent variable models are a popular way to deal with this kind of data. Blind source separation techniques are well-established latent factor models for time series, with many variants covering quite different time series models. Here, several such methods and their assumptions are reviewed, and it is shown how they can be applied to high-dimensional compositional time series. Also, a novel blind source separation method is suggested which is quite flexible regarding the assumptions of the latent time series. The methodology is illustrated using simulations and in an application to light absorbance data from water samples taken from a small stream in Lower Austria.

7 citations


Journal ArticleDOI
TL;DR: Methods for estimating the signal dimension of second-order stationary time series, dimension reduction techniques for stochastic volatility models and supervised dimension reduction tools for time series regression are reviewed in the R package tsBSS.
Abstract: Multivariate time series observations are increasingly common in multiple fields of science but the complex dependencies of such data often translate into intractable models with a large number of parameters. An alternative is given by first reducing the dimension of the series and then modelling the resulting uncorrelated signals univariately, avoiding the need for any covariance parameters. A popular and effective framework for this is blind source separation. In this paper we review the dimension reduction tools for time series available in the R package tsBSS. These include methods for estimating the signal dimension of second-order stationary time series, dimension reduction techniques for stochastic volatility models and supervised dimension reduction tools for time series regression. Several examples are provided to illustrate the functionality of the package.
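To give a flavor of such tools, here is a minimal AMUSE-style sketch on synthetic data (the function name is ours; tsBSS itself provides richer estimators such as SOBI and gSOBI): whiten the series, then rotate with the eigenvectors of a symmetrized lagged autocovariance.

```python
import numpy as np

def amuse(X, lag=1):
    """AMUSE-style blind source separation sketch: whiten the series, then
    eigendecompose the symmetrized lag-`lag` autocovariance."""
    Xc = X - X.mean(axis=0)
    n = len(Xc)
    cov = Xc.T @ Xc / n
    evals, evecs = np.linalg.eigh(cov)
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Y = Xc @ W
    R = Y[:-lag].T @ Y[lag:] / (n - lag)
    R = (R + R.T) / 2                   # symmetrize the lagged covariance
    _, U = np.linalg.eigh(R)
    return Y @ U                        # estimated uncorrelated sources

# Two signals with different serial dependence, mixed linearly.
rng = np.random.default_rng(1)
n = 2000
s1 = np.convolve(rng.normal(size=n), [1.0, 0.9], mode="same")  # MA(1)-type
s2 = rng.normal(size=n)                                        # white noise
S = np.column_stack([s1, s2])
X = S @ np.array([[1.0, 0.5], [0.3, 1.0]])                     # mixed series
Shat = amuse(X)
```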

7 citations


Journal ArticleDOI
TL;DR: In this paper, three popular linear dimension reduction methods, namely principal component analysis (PCA), fourth order blind identification (FOBI), and sliced inverse regression (SIR), are considered in detail and the first two moments of subsets of the eigenvalues are used to test for the dimension of the signal space.
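The flavor of such eigenvalue-based dimension tests can be sketched with PCA alone, on synthetic data with a known two-dimensional signal (the gap heuristic below is a simplification of the formal tests built on eigenvalue moments):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, k = 1000, 6, 2

# k signal directions with variances above the noise level 1, then a
# random rotation so no coordinate is special.
scales = np.array([4.0, 3.0, 1.0, 1.0, 1.0, 1.0])
Q, _ = np.linalg.qr(rng.normal(size=(p, p)))
X = (rng.normal(size=(n, p)) * scales) @ Q

evals = np.linalg.eigvalsh(np.cov(X.T))[::-1]     # descending eigenvalues
# The p - k smallest eigenvalues all estimate the same noise variance, so
# they cluster tightly; the signal dimension shows up as the largest gap.
k_hat = int(np.argmax(evals[:-1] / evals[1:])) + 1
tail = evals[k_hat:]
print(k_hat, round(float(tail.std() / tail.mean()), 3))
```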

6 citations


Journal ArticleDOI
TL;DR: The R package REPPlab is an R interface for the Java program EPP-lab, which implements four projection indices and three biologically inspired optimization algorithms; the package adds new tools for plotting and combining the results as well as specific tools for outlier detection.
Abstract: The R-package REPPlab is designed to explore multivariate data sets using one-dimensional unsupervised projection pursuit. It is useful as a preprocessing step to find clusters or as an outlier detection tool.

3 citations


Book ChapterDOI
TL;DR: This paper examines an approach for applying independent component analysis to compositional data by respecting the nature of the latter and demonstrates the usefulness of this procedure on a metabolomics data set.
Abstract: Compositional data represent a specific family of multivariate data, where the information of interest is contained in the ratios between parts rather than in absolute values of single parts. The analysis of such specific data is challenging as the application of standard multivariate analysis tools on the raw observations can lead to spurious results. Hence, it is appropriate to apply certain transformations prior to further analysis. One popular multivariate data analysis tool is independent component analysis, which aims to find statistically independent components in the data and as such might be seen as an extension of principal component analysis. In this paper, we examine an approach for applying independent component analysis to compositional data by respecting the nature of the latter and demonstrate the usefulness of this procedure on a metabolomics dataset.
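For illustration, here is a minimal isometric log-ratio (ilr) transform, one standard way to move compositions into ordinary Euclidean coordinates before applying ICA (the basis below is one common Helmert-type choice; the paper may use a different contrast matrix, and the ICA step itself is omitted):

```python
import numpy as np

def ilr_basis(D):
    """Orthonormal basis of the zero-sum subspace (Helmert-type contrasts)."""
    V = np.zeros((D, D - 1))
    for i in range(1, D):
        V[:i, i - 1] = 1.0 / i
        V[i, i - 1] = -1.0
        V[:, i - 1] *= np.sqrt(i / (i + 1.0))
    return V

def ilr(x):
    """Isometric log-ratio transform of a composition (positive parts)."""
    logx = np.log(x)
    clr = logx - logx.mean(axis=-1, keepdims=True)   # centered log-ratio
    return clr @ ilr_basis(x.shape[-1])

comp = np.array([0.1, 0.2, 0.3, 0.4])   # a 4-part composition
y = ilr(comp)                            # 3 unconstrained real coordinates
print(y.shape, np.round(y, 4))
```

Note that the transform depends only on ratios: rescaling the composition by a constant leaves the ilr coordinates unchanged, which is exactly the invariance compositional analysis requires.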

3 citations


Proceedings ArticleDOI
13 Sep 2021
TL;DR: In this paper, the authors proposed an automated way of determining the optimal number of low-rank components in dimension reduction of image data based on the combination of two-dimensional principal component analysis and an augmentation estimator.
Abstract: We propose an automated way of determining the optimal number of low-rank components in dimension reduction of image data. The method is based on the combination of two-dimensional principal component analysis and an augmentation estimator proposed recently in the literature. Intuitively, the main idea is to combine a scree plot with information extracted from the eigenvectors of a variation matrix. Simulation studies show that the method provides accurate estimates and a demonstration with a finger data set showcases its performance in practice.
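A rough sketch of the ingredients on synthetic low-rank images (this shows only the scree-plot side via 2D PCA's column-side scatter matrix; the augmentation estimator from the paper, which also exploits eigenvector information, is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(5)
n, r, c, k = 200, 10, 8, 2               # n images of size r x c, rank k

# Each image is a rank-k pattern U diag(z) V^T plus pixel noise.
U = rng.normal(size=(r, k))
V = rng.normal(size=(c, k))
Z = rng.normal(size=(n, k)) * 3.0
imgs = np.einsum("ik,nk,jk->nij", U, Z, V) + rng.normal(size=(n, r, c)) * 0.1

# 2D PCA works on the column-side scatter matrix of the centered images.
A = imgs - imgs.mean(axis=0)
G = np.einsum("nij,nik->jk", A, A) / n   # c x c scatter matrix
evals = np.linalg.eigvalsh(G)[::-1]
share = evals / evals.sum()
print(np.round(share, 3))                # scree: k dominant eigenvalues
```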

2 citations


Posted Content
TL;DR: In this article, the estimation of the linear discriminant with projection pursuit was studied and it was shown that projection pursuit is able to achieve efficiency equal to LDA when groups are arbitrarily well-separated and their sizes are reasonably balanced.
Abstract: We study the estimation of the linear discriminant with projection pursuit, a method that is blind in the sense that it does not use the class labels in the estimation. Our viewpoint is asymptotic and, as our main contribution, we derive central limit theorems for estimators based on three different projection indices, skewness, kurtosis and their convex combination. The results show that in each case the limiting covariance matrix is proportional to that of linear discriminant analysis (LDA), an unblind estimator of the discriminant. An extensive comparative study between the asymptotic variances reveals that projection pursuit is able to achieve efficiency equal to LDA when the groups are arbitrarily well-separated and their sizes are reasonably balanced. We conclude with a real data example and a simulation study investigating the validity of the obtained asymptotic formulas for finite samples.
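The phenomenon can be illustrated in two dimensions with a brute-force scan over projection directions on a synthetic balanced mixture (the paper derives proper estimators and asymptotics rather than this grid search): for a balanced two-group Gaussian mixture, the projection minimizing kurtosis aligns with the discriminant direction, without using the labels.

```python
import numpy as np

rng = np.random.default_rng(3)

# Balanced two-group Gaussian mixture; the labels are never used below.
n = 2000
X = np.vstack([rng.normal(loc=[-3.0, 0.0], scale=1.0, size=(n, 2)),
               rng.normal(loc=[+3.0, 0.0], scale=1.0, size=(n, 2))])

# Whiten first, as projection pursuit is usually run on whitened data.
Xc = X - X.mean(axis=0)
evals, evecs = np.linalg.eigh(np.cov(Xc.T))
Y = Xc @ evecs @ np.diag(evals ** -0.5) @ evecs.T

def excess_kurtosis(z):
    return np.mean(z ** 4) / np.mean(z ** 2) ** 2 - 3.0

# Scan directions; the kurtosis minimizer recovers the separating axis.
angles = np.linspace(0.0, np.pi, 360, endpoint=False)
best = min(angles, key=lambda a: excess_kurtosis(
    Y @ np.array([np.cos(a), np.sin(a)])))
u = np.array([np.cos(best), np.sin(best)])
print(np.round(u, 3))   # close to (+/-1, 0), the group-separating direction
```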

Journal ArticleDOI
TL;DR: In this article, a tensorial independent component analysis method is proposed based on TJADE and k-JADE, which achieves the consistency and the limiting distribution of TJADE under mild assumptions and offers a notable improvement in computational speed.
Abstract: We propose a novel method for tensorial independent component analysis. Our approach is based on TJADE and k-JADE, two recently proposed generalizations of the classical JADE algorithm. Our novel method achieves the consistency and the limiting distribution of TJADE under mild assumptions and at the same time offers notable improvement in computational speed. Detailed mathematical proofs of the statistical properties of our method are given and, as a special case, a conjecture on the properties of k-JADE is resolved. Simulations and timing comparisons demonstrate remarkable gain in speed. Moreover, the desired efficiency is obtained approximately for finite samples. The method is applied successfully to large-scale video data, for which neither TJADE nor k-JADE is feasible. Finally, an experimental procedure is proposed to select the values of a set of tuning parameters. Supplementary material including the R-code for running the examples and the proofs of the theoretical results is available online.

Posted Content
TL;DR: An adaptation of SBSS that uses scatter matrices based on differences was recently suggested in the literature; in this paper, the authors formalize these ideas, suggest an adapted SBSS method, and show its usefulness on synthetic and real data.
Abstract: Multivariate measurements taken at different spatial locations occur frequently in practice. Proper analysis of such data needs to consider not only dependencies on-site but also dependencies within and between variables as a function of spatial separation. Spatial Blind Source Separation (SBSS) is a recently developed unsupervised statistical tool that deals with such data by assuming that the observable data are formed by a linear latent variable model. In SBSS the latent variable is assumed to be constituted by weakly stationary random fields which are uncorrelated. Such a model is appealing as further analysis can be carried out on the marginal distributions of the latent variables, interpretations are straightforward as the model is assumed to be linear, and not all components of the latent field might be of interest, which acts as a form of dimension reduction. The weak stationarity assumption of SBSS implies that the mean of the data is constant for all sample locations, which might be too restrictive in practical applications. Therefore, an adaptation of SBSS that uses scatter matrices based on differences was recently suggested in the literature. In our contribution we formalize these ideas, suggest an adapted SBSS method, and show its usefulness on synthetic and real data.

Posted Content
TL;DR: In this article, the authors propose projection pursuit for data that admit a natural representation in matrix form, which is shown to recover the optimally separating projection for two-group Gaussian mixtures in the absence of any label information.
Abstract: We develop projection pursuit for data that admit a natural representation in matrix form. For projection indices, we propose extensions of the classical kurtosis and Mardia's multivariate kurtosis. The first index estimates projections for both sides of the matrices simultaneously, while the second index finds the two projections separately. Both indices are shown to recover the optimally separating projection for two-group Gaussian mixtures in the full absence of any label information. We further establish the strong consistency of the corresponding sample estimators. Simulations and a real data example on hand-written postal code data are used to demonstrate the method.

Posted Content
TL;DR: In this article, the authors extend the SBSS model to adjust for these stationarity violations, present three novel estimators and establish the identifiability and affine equivariance property of the unmixing matrix functionals defining these estimators.
Abstract: Regional data analysis is concerned with the analysis and modeling of measurements that are spatially separated by specifically accounting for typical features of such data. Namely, measurements in close proximity tend to be more similar than ones further separated. This might also hold true for cross-dependencies when multivariate spatial data are considered. Often, scientists are interested in linear transformations of such data which are easy to interpret and might be used for dimension reduction. Recently, spatial blind source separation (SBSS) was introduced for that purpose; it assumes that the observed data are formed by a linear mixture of uncorrelated, weakly stationary random fields. However, in practical applications it is well known that when the spatial domain increases in size, the weak stationarity assumption can be violated in the sense that the second-order dependency varies over the domain, which calls for non-stationary analysis. In our work we extend the SBSS model to adjust for these stationarity violations, present three novel estimators and establish the identifiability and affine equivariance property of the unmixing matrix functionals defining these estimators. In an extensive simulation study, we investigate the performance of our estimators and also show their use in the analysis of a geochemical dataset which is derived from the GEMAS geochemical mapping project.

Posted Content
TL;DR: In this paper, the authors consider the stationary subspace analysis (SSA) in a more general multivariate time series setting and propose SSA methods which are able to detect nonstationarities in mean, variance and autocorrelation.
Abstract: In stationary subspace analysis (SSA) one assumes that the observable p-variate time series is a linear mixture of a k-variate nonstationary time series and a (p-k)-variate stationary time series. The aim is then to estimate the unmixing matrix which transforms the observed multivariate time series onto stationary and nonstationary components. In the classical approach multivariate data are projected onto stationary and nonstationary subspaces by minimizing a Kullback-Leibler divergence between Gaussian distributions, and the method only detects nonstationarities in the first two moments. In this paper we consider SSA in a more general multivariate time series setting and propose SSA methods which are able to detect nonstationarities in mean, variance and autocorrelation, or in all of them. Simulation studies illustrate the performance of the proposed methods, and it is shown that especially the method that detects all three types of nonstationarities performs well in various time series settings. The paper is concluded with an illustrative example.
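A toy illustration of the mean-nonstationarity ingredient on synthetic series (the function and thresholds are ours, and the actual methods estimate a full unmixing matrix rather than scoring single series): compare the variability of block means with what stationarity would predict.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1200
t = np.arange(n)

# One component with a drifting mean, one stationary component.
s_drift = 0.005 * t + rng.normal(size=n)
s_stat = rng.normal(size=n)

def mean_instability(z, blocks=6):
    """Variance of block means relative to its expectation under a
    stationary (uncorrelated) series; values near 1 indicate stability."""
    block_means = np.array([b.mean() for b in np.array_split(z, blocks)])
    expected = z.var() / (len(z) // blocks)
    return block_means.var() / expected

print(round(mean_instability(s_drift), 1), round(mean_instability(s_stat), 1))
```

A series whose mean drifts produces block means far more dispersed than sampling variability alone explains, so its score is much larger than the stationary component's.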