Open Access, Journal Article

Classification of geochemical data based on multivariate statistical analyses: Complementary roles of cluster, principal component, and independent component analyses

TLDR
In this paper, the authors show the relationship and complementary roles of k-means cluster analysis (KCA), principal component analysis (PCA), and independent component analysis (ICA) in capturing the true data structure.
Abstract
Identifying the data structure, including trends and groups/clusters, is essential in geochemical problems for discussing the origin of sources and processes from the observed variability of data. The increasing number and high dimensionality of recent geochemical data require efficient and accurate multivariate statistical analysis methods. In this paper, we show the relationship and complementary roles of k-means cluster analysis (KCA), principal component analysis (PCA), and independent component analysis (ICA) in capturing the true data structure. When the data are preprocessed by primary standardization (i.e., centered to zero mean and normalized by the standard deviation), KCA and PCA provide essentially the same results, although the former returns the solution in a discretized space. When the data are preprocessed by whitening (i.e., normalized by eigenvalues along the principal components), KCA and ICA may identify a set of independent trends and groups, irrespective of the amplitude (power) of variance. As an example, basalt isotopic compositions have been analyzed with KCA on the whitened data, demonstrating clear rock‒tectonic occurrence‒mantle end-member discrimination. Therefore, the combination of these methods, particularly KCA on whitened data, is useful for capturing and discussing the data structure of various geochemical systems. An Excel program for this purpose is provided.
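The two preprocessing options described in the abstract, followed by k-means, can be sketched as below. This is a minimal illustration in Python with NumPy, not the authors' Excel program. The function names, the synthetic three-variable data set, and the choice of k = 2 are all assumptions made for the example. Whitening is implemented here by dividing each principal component score by the square root of its eigenvalue so that every component has unit variance, which is one common reading of the normalization the abstract describes.

```python
import numpy as np


def standardize(X):
    """Primary standardization: zero mean, unit standard deviation per variable."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def whiten(X):
    """Rotate standardized data onto its principal components and rescale
    each component to unit variance (assumed form of whitening)."""
    Z = standardize(X)
    cov = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs                        # principal component scores
    return scores / np.sqrt(eigvals)            # unit variance along every component


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means with random initial centers (illustrative only)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment so that labels always match the returned centers.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), centers


# Hypothetical data: two groups of 50 samples measured on 3 variables.
rng = np.random.default_rng(42)
group_a = rng.normal([0.0, 0.0, 0.0], [1.0, 0.1, 0.5], size=(50, 3))
group_b = rng.normal([5.0, 0.3, 1.0], [1.0, 0.1, 0.5], size=(50, 3))
X = np.vstack([group_a, group_b])

Z = standardize(X)                   # input for KCA/PCA on standardized data
W = whiten(X)                        # input for KCA/ICA on whitened data
labels, centers = kmeans(W, k=2)     # KCA on the whitened data
```

After whitening, the sample covariance of `W` is the identity matrix, so the k-means distance treats low-variance and high-variance directions equally. This corresponds to the abstract's remark that groups can be identified irrespective of the amplitude of variance.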


Citations
Journal Article

Geochemical Discrimination and Characteristics of Magmatic Tectonic Settings: A Machine-Learning-Based Approach

TL;DR: This article uses support vector machine, random forest, and sparse multinomial regression (SMR) approaches to identify the geochemical signatures that characterize the tectonic settings of interest, and expresses the characteristics of each sample as a probability of membership in each setting.
Journal Article

Hydrogeochemical characteristics of a multi-layered coastal aquifer system in the Mekong Delta, Vietnam

TL;DR: An investigation of the hydrogeochemical characteristics and their influencing factors in a coastal area of the Mekong Delta, Vietnam, shows that deep groundwater is dominated by the Ca–HCO₃, Ca–Mg–Cl, and Na–HCO₃ water types while shallow groundwater is dominated by the Na–Cl water type, implying that groundwater extraction may exacerbate groundwater quality-related problems.
Journal Article

Geochemical differentiation processes for arc magma of the Sengan volcanic cluster, Northeastern Japan, constrained from principal component analysis

TL;DR: In this paper, the authors employed principal component analysis (PCA) to evaluate the compositional variations of volcanic rocks from the Sengan volcanic cluster of the Northeastern Japan Arc.
Journal Article

A process-oriented approach to mantle geochemistry

TL;DR: The authors use oceanic basalts as indirect tracers of Earth's mantle composition and find that the incompatible element and isotopic compositions of these basalts are inherently biased towards the incompatible element-enriched source components.