Classification of geochemical data based on multivariate statistical analyses: Complementary roles of cluster, principal component, and independent component analyses

doi:10.1002/2016GC006663

Open Access journal article

Hikaru Iwamori and six co-authors

Geochemistry, Geophysics, Geosystems, Vol. 18, Issue 3, pp. 994–1012 (1 March 2017)

Abstract:

Identifying the structure of geochemical data, including trends and groups (clusters), is essential for discussing the sources and processes that produce the observed variability. The growing volume and dimensionality of recent geochemical datasets require efficient and accurate multivariate statistical methods. In this paper, we show the relationship between k-means cluster analysis (KCA), principal component analysis (PCA), and independent component analysis (ICA), and their complementary roles in capturing the true data structure. When the data are preprocessed by primary standardization (i.e., shifted to zero mean and scaled by the standard deviation of each variable), KCA and PCA give essentially the same results, although KCA returns the solution in a discretized space. When the data are preprocessed by whitening (i.e., projected onto the principal components and scaled by the square root of the corresponding eigenvalues, so that each component has unit variance), KCA and ICA can identify a set of independent trends and groups irrespective of the amplitude (power) of their variance. As an example, KCA applied to whitened basalt isotopic compositions demonstrates a clear rock‒tectonic occurrence‒mantle end-member discrimination. The combination of these methods, particularly KCA on whitened data, is therefore useful for capturing and discussing the data structure of various geochemical systems. An Excel program is provided for this purpose.
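The two preprocessing routes described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' Excel program. The function names `standardize`, `whiten`, and `kmeans` are hypothetical. Whitening is assumed to be applied to already-standardized data. The clustering step is plain Lloyd's k-means with randomly chosen initial centroids. After whitening, every principal component has unit variance, so the distances k-means uses are no longer dominated by the highest-variance component.

```python
import numpy as np


def standardize(X):
    """Primary standardization: zero mean and unit (sample) standard deviation per column."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def whiten(X):
    """PCA whitening of standardized data (assumption: standardize first).

    Projects the data onto the principal components and divides each component
    by the square root of its eigenvalue, so every component has unit variance.
    """
    Z = standardize(X)
    C = np.cov(Z, rowvar=False)                 # correlation matrix of the variables
    eigvals, eigvecs = np.linalg.eigh(C)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # reorder so PC1 (largest variance) comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return (Z @ eigvecs) / np.sqrt(eigvals)     # assumes full rank (all eigenvalues > 0)


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; initial centroids are k distinct data points chosen at random."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every sample to every centroid, shape (n_samples, k)
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if the cluster is empty)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical correlated 3-variable data, e.g. three isotope ratios (not data from the paper)
    mixing = np.array([[2.0, 0.5, 0.0], [0.0, 1.0, 0.3], [0.0, 0.0, 0.5]])
    X = rng.normal(size=(100, 3)) @ mixing
    W = whiten(X)
    print(np.round(np.cov(W, rowvar=False), 6))  # identity matrix up to rounding
    labels, _ = kmeans(W, k=3)                   # KCA on whitened data
    print(np.bincount(labels, minlength=3))      # cluster sizes
```

Replacing `whiten(X)` with `standardize(X)` before the clustering step gives KCA on standardized data. For that setting, the abstract reports that KCA and PCA give essentially the same results.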

Citations

Geochemical Discrimination and Characteristics of Magmatic Tectonic Settings: A Machine-Learning-Based Approach

Hydrogeochemical characteristics of a multi-layered coastal aquifer system in the Mekong Delta, Vietnam

Geochemical differentiation processes for arc magma of the Sengan volcanic cluster, Northeastern Japan, constrained from principal component analysis

A process-oriented approach to mantle geochemistry

Related Papers (5)

Decoupled isotopic record of ridge and subduction zone processes in oceanic basalts by independent component analysis

The composition of the Earth

Trace element signature of subduction-zone fluids, melts and supercritical liquids at 120–180 km depth

Geochemical mapping of the Mariana arc‐basin system: Implications for the nature and distribution of subduction components

Differences between oceanic basalts by multitrace element ratio topology