
Showing papers by "Klaus Nordhausen published in 2021"


Journal ArticleDOI
TL;DR: A very recent concept is downweighting single cells of the data matrix rather than complete observations, with the goal of making better use of the model-consistent information and thus achieving higher efficiency of the parameter estimates.

18 citations



Journal ArticleDOI
TL;DR: An overview of methods based on the joint diagonalization of scatter matrices, ranging from the unsupervised context, with invariant coordinate selection and blind source separation, to the supervised context, with discriminant analysis and sliced inverse regression.

8 citations


Journal ArticleDOI
TL;DR: This letter investigates the use of SBSS as a preprocessing tool for spatial prediction and compares it with predictions from Cokriging and neural networks in an extensive simulation study as well as a geochemical data set.
Abstract: Multivariate measurements taken at irregularly sampled locations are a common form of data, for example, in geochemical analysis of soil. In practice, predictions of these measurements at unobserved locations are of great interest. Standard multivariate spatial prediction methods must model not only spatial dependencies but also cross-dependencies between variables, which makes prediction a demanding task. Recently, a blind source separation (BSS) approach for spatial data was suggested. Using this spatial BSS (SBSS) method before the actual spatial prediction avoids modeling spatial cross-dependencies, which in turn simplifies the spatial prediction task significantly. In this letter, we investigate the use of SBSS as a preprocessing tool for spatial prediction and compare it with predictions from Cokriging and neural networks in an extensive simulation study as well as on a geochemical data set.
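A minimal sketch of the preprocessing idea, with synthetic data and a hypothetical mixing matrix (only the whitening step common to BSS methods is shown; actual SBSS estimators additionally jointly diagonalize spatially local covariance matrices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for multivariate spatial data: n locations, p variables
# formed as a linear mixture of uncorrelated latent fields.
n, p = 500, 3
A = rng.normal(size=(p, p))        # unknown mixing matrix (hypothetical)
Z = rng.normal(size=(n, p))        # uncorrelated latent values
X = Z @ A.T                        # observed, cross-correlated measurements

# Whitening: afterwards the recovered components are mutually uncorrelated,
# so each can be predicted (e.g. kriged) univariately, with no
# cross-covariance modeling required.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / n
evals, evecs = np.linalg.eigh(cov)
W = evecs @ np.diag(evals ** -0.5) @ evecs.T   # symmetric inverse square root
S = Xc @ W                                     # uncorrelated scores
print(np.round(S.T @ S / n, 6))                # identity up to rounding
```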

7 citations


Journal ArticleDOI
TL;DR: Blind source separation techniques are well-established latent factor models for time series, with many variants covering quite different time series models, and it is shown how they can be applied to high-dimensional compositional time series.
Abstract: Many geological phenomena are regularly measured over time to follow developments and changes. For many of these phenomena, the absolute values are not of interest, but rather the relative information, which means that the data are compositional time series. Thus, the serial nature and the compositional geometry should be considered when analyzing the data. Multivariate time series are already challenging, especially if they are higher dimensional, and latent variable models are a popular way to deal with this kind of data. Blind source separation techniques are well-established latent factor models for time series, with many variants covering quite different time series models. Here, several such methods and their assumptions are reviewed, and it is shown how they can be applied to high-dimensional compositional time series. Also, a novel blind source separation method is suggested which is quite flexible regarding the assumptions of the latent time series. The methodology is illustrated using simulations and in an application to light absorbance data from water samples taken from a small stream in Lower Austria.

7 citations


Journal ArticleDOI
TL;DR: Methods for estimating the signal dimension of second-order stationary time series, dimension reduction techniques for stochastic volatility models and supervised dimension reduction tools for time series regression are reviewed in the R package tsBSS.
Abstract: Multivariate time series observations are increasingly common in multiple fields of science but the complex dependencies of such data often translate into intractable models with a large number of parameters. An alternative is given by first reducing the dimension of the series and then modelling the resulting uncorrelated signals univariately, avoiding the need for any covariance parameters. A popular and effective framework for this is blind source separation. In this paper we review the dimension reduction tools for time series available in the R package tsBSS. These include methods for estimating the signal dimension of second-order stationary time series, dimension reduction techniques for stochastic volatility models and supervised dimension reduction tools for time series regression. Several examples are provided to illustrate the functionality of the package.
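To give a flavor of such tools, here is a minimal AMUSE-style sketch on synthetic data (the function name is ours; tsBSS itself provides richer estimators such as SOBI and gSOBI): whiten the series, then rotate with the eigenvectors of a symmetrized lagged autocovariance.

```python
import numpy as np

def amuse(X, lag=1):
    """AMUSE-style blind source separation sketch: whiten the series, then
    eigendecompose the symmetrized lag-`lag` autocovariance."""
    Xc = X - X.mean(axis=0)
    n = len(Xc)
    cov = Xc.T @ Xc / n
    evals, evecs = np.linalg.eigh(cov)
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Y = Xc @ W
    R = Y[:-lag].T @ Y[lag:] / (n - lag)
    R = (R + R.T) / 2                   # symmetrize the lagged covariance
    _, U = np.linalg.eigh(R)
    return Y @ U                        # estimated uncorrelated sources

# Two signals with different serial dependence, mixed linearly.
rng = np.random.default_rng(1)
n = 2000
s1 = np.convolve(rng.normal(size=n), [1.0, 0.9], mode="same")  # MA(1)-type
s2 = rng.normal(size=n)                                        # white noise
S = np.column_stack([s1, s2])
X = S @ np.array([[1.0, 0.5], [0.3, 1.0]])                     # mixed series
Shat = amuse(X)
```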

7 citations


Journal ArticleDOI
TL;DR: In this paper, three popular linear dimension reduction methods, namely principal component analysis (PCA), fourth order blind identification (FOBI), and sliced inverse regression (SIR), are considered in detail and the first two moments of subsets of the eigenvalues are used to test for the dimension of the signal space.
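The flavor of such eigenvalue-based dimension tests can be sketched with PCA alone, on synthetic data with a known two-dimensional signal (the gap heuristic below is a simplification of the formal tests built on eigenvalue moments):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, k = 1000, 6, 2

# k signal directions with variances above the noise level 1, then a
# random rotation so no coordinate is special.
scales = np.array([4.0, 3.0, 1.0, 1.0, 1.0, 1.0])
Q, _ = np.linalg.qr(rng.normal(size=(p, p)))
X = (rng.normal(size=(n, p)) * scales) @ Q

evals = np.linalg.eigvalsh(np.cov(X.T))[::-1]     # descending eigenvalues
# The p - k smallest eigenvalues all estimate the same noise variance, so
# they cluster tightly; the signal dimension shows up as the largest gap.
k_hat = int(np.argmax(evals[:-1] / evals[1:])) + 1
tail = evals[k_hat:]
print(k_hat, round(float(tail.std() / tail.mean()), 3))
```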

6 citations


Journal ArticleDOI
TL;DR: The R package REPPlab is an R interface for the Java program EPP-lab, which implements four projection indices and three biologically inspired optimization algorithms; the package adds new tools for plotting and combining the results as well as specific tools for outlier detection.
Abstract: The R-package REPPlab is designed to explore multivariate data sets using one-dimensional unsupervised projection pursuit. It is useful as a preprocessing step to find clusters or as an outlier detection tool.

3 citations


Book ChapterDOI
TL;DR: This paper examines an approach for applying independent component analysis to compositional data by respecting the nature of the latter and demonstrates the usefulness of this procedure on a metabolomics data set.
Abstract: Compositional data represent a specific family of multivariate data, where the information of interest is contained in the ratios between parts rather than in absolute values of single parts. The analysis of such specific data is challenging as the application of standard multivariate analysis tools on the raw observations can lead to spurious results. Hence, it is appropriate to apply certain transformations prior to further analysis. One popular multivariate data analysis tool is independent component analysis, which aims to find statistically independent components in the data and as such might be seen as an extension of principal component analysis. In this paper, we examine an approach for applying independent component analysis to compositional data by respecting the nature of the latter and demonstrate the usefulness of this procedure on a metabolomics dataset.
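For illustration, here is a minimal isometric log-ratio (ilr) transform, one standard way to move compositions into ordinary Euclidean coordinates before applying ICA (the basis below is one common Helmert-type choice; the paper may use a different contrast matrix, and the ICA step itself is omitted):

```python
import numpy as np

def ilr_basis(D):
    """Orthonormal basis of the zero-sum subspace (Helmert-type contrasts)."""
    V = np.zeros((D, D - 1))
    for i in range(1, D):
        V[:i, i - 1] = 1.0 / i
        V[i, i - 1] = -1.0
        V[:, i - 1] *= np.sqrt(i / (i + 1.0))
    return V

def ilr(x):
    """Isometric log-ratio transform of a composition (positive parts)."""
    logx = np.log(x)
    clr = logx - logx.mean(axis=-1, keepdims=True)   # centered log-ratio
    return clr @ ilr_basis(x.shape[-1])

comp = np.array([0.1, 0.2, 0.3, 0.4])   # a 4-part composition
y = ilr(comp)                            # 3 unconstrained real coordinates
print(y.shape, np.round(y, 4))
```

Note that the transform depends only on ratios: rescaling the composition by a constant leaves the ilr coordinates unchanged, which is exactly the invariance compositional analysis requires.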

3 citations


Proceedings ArticleDOI
13 Sep 2021
TL;DR: In this paper, the authors proposed an automated way of determining the optimal number of low-rank components in dimension reduction of image data based on the combination of two-dimensional principal component analysis and an augmentation estimator.
Abstract: We propose an automated way of determining the optimal number of low-rank components in dimension reduction of image data. The method is based on the combination of two-dimensional principal component analysis and an augmentation estimator proposed recently in the literature. Intuitively, the main idea is to combine a scree plot with information extracted from the eigenvectors of a variation matrix. Simulation studies show that the method provides accurate estimates and a demonstration with a finger data set showcases its performance in practice.
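A rough sketch of the ingredients on synthetic low-rank images (this shows only the scree-plot side via 2D PCA's column-side scatter matrix; the augmentation estimator from the paper, which also exploits eigenvector information, is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(5)
n, r, c, k = 200, 10, 8, 2               # n images of size r x c, rank k

# Each image is a rank-k pattern U diag(z) V^T plus pixel noise.
U = rng.normal(size=(r, k))
V = rng.normal(size=(c, k))
Z = rng.normal(size=(n, k)) * 3.0
imgs = np.einsum("ik,nk,jk->nij", U, Z, V) + rng.normal(size=(n, r, c)) * 0.1

# 2D PCA works on the column-side scatter matrix of the centered images.
A = imgs - imgs.mean(axis=0)
G = np.einsum("nij,nik->jk", A, A) / n   # c x c scatter matrix
evals = np.linalg.eigvalsh(G)[::-1]
share = evals / evals.sum()
print(np.round(share, 3))                # scree: k dominant eigenvalues
```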

2 citations


Posted Content
TL;DR: In this article, the estimation of the linear discriminant with projection pursuit was studied and it was shown that projection pursuit is able to achieve efficiency equal to LDA when groups are arbitrarily well-separated and their sizes are reasonably balanced.
Abstract: We study the estimation of the linear discriminant with projection pursuit, a method that is blind in the sense that it does not use the class labels in the estimation. Our viewpoint is asymptotic and, as our main contribution, we derive central limit theorems for estimators based on three different projection indices, skewness, kurtosis and their convex combination. The results show that in each case the limiting covariance matrix is proportional to that of linear discriminant analysis (LDA), an unblind estimator of the discriminant. An extensive comparative study between the asymptotic variances reveals that projection pursuit is able to achieve efficiency equal to LDA when the groups are arbitrarily well-separated and their sizes are reasonably balanced. We conclude with a real data example and a simulation study investigating the validity of the obtained asymptotic formulas for finite samples.
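The phenomenon can be illustrated in two dimensions with a brute-force scan over projection directions on a synthetic balanced mixture (the paper derives proper estimators and asymptotics rather than this grid search): for a balanced two-group Gaussian mixture, the projection minimizing kurtosis aligns with the discriminant direction, without using the labels.

```python
import numpy as np

rng = np.random.default_rng(3)

# Balanced two-group Gaussian mixture; the labels are never used below.
n = 2000
X = np.vstack([rng.normal(loc=[-3.0, 0.0], scale=1.0, size=(n, 2)),
               rng.normal(loc=[+3.0, 0.0], scale=1.0, size=(n, 2))])

# Whiten first, as projection pursuit is usually run on whitened data.
Xc = X - X.mean(axis=0)
evals, evecs = np.linalg.eigh(np.cov(Xc.T))
Y = Xc @ evecs @ np.diag(evals ** -0.5) @ evecs.T

def excess_kurtosis(z):
    return np.mean(z ** 4) / np.mean(z ** 2) ** 2 - 3.0

# Scan directions; the kurtosis minimizer recovers the separating axis.
angles = np.linspace(0.0, np.pi, 360, endpoint=False)
best = min(angles, key=lambda a: excess_kurtosis(
    Y @ np.array([np.cos(a), np.sin(a)])))
u = np.array([np.cos(best), np.sin(best)])
print(np.round(u, 3))   # close to (+/-1, 0), the group-separating direction
```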

Journal ArticleDOI
TL;DR: In this article, a tensorial independent component analysis method is proposed based on TJADE and k-JADE, which achieves the consistency and the limiting distribution of TJADE under mild assumptions and offers a notable improvement in computational speed.
Abstract: We propose a novel method for tensorial independent component analysis. Our approach is based on TJADE and k-JADE, two recently proposed generalizations of the classical JADE algorithm. Our novel method achieves the consistency and the limiting distribution of TJADE under mild assumptions and at the same time offers notable improvement in computational speed. Detailed mathematical proofs of the statistical properties of our method are given and, as a special case, a conjecture on the properties of k-JADE is resolved. Simulations and timing comparisons demonstrate remarkable gain in speed. Moreover, the desired efficiency is obtained approximately for finite samples. The method is applied successfully to large-scale video data, for which neither TJADE nor k-JADE is feasible. Finally, an experimental procedure is proposed to select the values of a set of tuning parameters. Supplementary material including the R-code for running the examples and the proofs of the theoretical results is available online.

Posted Content
TL;DR: An adaptation of SBSS that uses scatter matrices based on differences was recently suggested in the literature; in this paper, the authors formalize these ideas, suggest an adapted SBSS method, and show its usefulness on synthetic and real data.
Abstract: Multivariate measurements taken at different spatial locations occur frequently in practice. Proper analysis of such data needs to consider not only dependencies on-site but also dependencies within and between variables as a function of spatial separation. Spatial Blind Source Separation (SBSS) is a recently developed unsupervised statistical tool that deals with such data by assuming that the observable data are formed by a linear latent variable model. In SBSS the latent variable is assumed to be constituted by weakly stationary random fields which are uncorrelated. Such a model is appealing as further analysis can be carried out on the marginal distributions of the latent variables, interpretations are straightforward as the model is assumed to be linear, and not all components of the latent field might be of interest, which acts as a form of dimension reduction. The weak stationarity assumption of SBSS implies that the mean of the data is constant for all sample locations, which might be too restrictive in practical applications. Therefore, an adaptation of SBSS that uses scatter matrices based on differences was recently suggested in the literature. In our contribution we formalize these ideas, suggest an adapted SBSS method, and show its usefulness on synthetic and real data.

Posted Content
TL;DR: In this article, the authors propose projection pursuit for data that admit a natural representation in matrix form, which is shown to recover the optimally separating projection for two-group Gaussian mixtures in the absence of any label information.
Abstract: We develop projection pursuit for data that admit a natural representation in matrix form. For projection indices, we propose extensions of the classical kurtosis and Mardia's multivariate kurtosis. The first index estimates projections for both sides of the matrices simultaneously, while the second index finds the two projections separately. Both indices are shown to recover the optimally separating projection for two-group Gaussian mixtures in the full absence of any label information. We further establish the strong consistency of the corresponding sample estimators. Simulations and a real data example on hand-written postal code data are used to demonstrate the method.

Posted Content
TL;DR: In this article, the authors extend the SBSS model to adjust for these stationarity violations, present three novel estimators and establish the identifiability and affine equivariance property of the unmixing matrix functionals defining these estimators.
Abstract: Regional data analysis is concerned with the analysis and modeling of measurements that are spatially separated by specifically accounting for typical features of such data. Namely, measurements in close proximity tend to be more similar than ones further separated. This might also hold true for cross-dependencies when multivariate spatial data are considered. Often, scientists are interested in linear transformations of such data which are easy to interpret and might be used for dimension reduction. Recently, spatial blind source separation (SBSS) was introduced for that purpose; it assumes that the observed data are formed by a linear mixture of uncorrelated, weakly stationary random fields. However, in practical applications it is well known that when the spatial domain increases in size, the weak stationarity assumption can be violated in the sense that the second-order dependency varies over the domain, which calls for non-stationary analysis. In our work we extend the SBSS model to adjust for these stationarity violations, present three novel estimators and establish the identifiability and affine equivariance property of the unmixing matrix functionals defining these estimators. In an extensive simulation study, we investigate the performance of our estimators and also show their use in the analysis of a geochemical dataset which is derived from the GEMAS geochemical mapping project.

Posted Content
TL;DR: In this paper, the authors consider the stationary subspace analysis (SSA) in a more general multivariate time series setting and propose SSA methods which are able to detect nonstationarities in mean, variance and autocorrelation.
Abstract: In stationary subspace analysis (SSA) one assumes that the observable p-variate time series is a linear mixture of a k-variate nonstationary time series and a (p-k)-variate stationary time series. The aim is then to estimate the unmixing matrix which transforms the observed multivariate time series onto stationary and nonstationary components. In the classical approach multivariate data are projected onto stationary and nonstationary subspaces by minimizing a Kullback-Leibler divergence between Gaussian distributions, and the method only detects nonstationarities in the first two moments. In this paper we consider SSA in a more general multivariate time series setting and propose SSA methods which are able to detect nonstationarities in mean, variance and autocorrelation, or in all of them. Simulation studies illustrate the performance of the proposed methods, and it is shown that especially the method that detects all three types of nonstationarities performs well in various time series settings. The paper is concluded with an illustrative example.
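A toy illustration of the mean-nonstationarity ingredient on synthetic series (the function and thresholds are ours, and the actual methods estimate a full unmixing matrix rather than scoring single series): compare the variability of block means with what stationarity would predict.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1200
t = np.arange(n)

# One component with a drifting mean, one stationary component.
s_drift = 0.005 * t + rng.normal(size=n)
s_stat = rng.normal(size=n)

def mean_instability(z, blocks=6):
    """Variance of block means relative to its expectation under a
    stationary (uncorrelated) series; values near 1 indicate stability."""
    block_means = np.array([b.mean() for b in np.array_split(z, blocks)])
    expected = z.var() / (len(z) // blocks)
    return block_means.var() / expected

print(round(mean_instability(s_drift), 1), round(mean_instability(s_stat), 1))
```

A series whose mean drifts produces block means far more dispersed than sampling variability alone explains, so its score is much larger than the stationary component's.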