Journal

International Federation of Classification Societies 

About: International Federation of Classification Societies is an academic journal. It publishes mainly in the areas of cluster analysis and fuzzy clustering. Over its lifetime, the journal has published 41 publications, which have received 1978 citations.

Papers
Journal Article
TL;DR: This paper presents modified Rand Indices for comparing two partitions. Unlike the Rand Index and the Adjusted Rand Index of Hubert and Arabie, the proposed indices do not assume one set of units for both partitions.
Abstract: Rand (1971) proposed the Rand Index to measure the stability of two partitions of one set of units. Hubert and Arabie (1985) corrected the Rand Index for chance (Adjusted Rand Index). In this paper, we present some alternative indices. The proposed indices do not assume one set of units for two partitions. Here, one set of units can be a subset of the other set of units. Depending on the purpose of the comparison of two partitions, the merging and splitting of clusters in two partitions can have a different impact on the value of the indices. Therefore, we propose different modified Rand Indices.

2,417 citations
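
The two baseline indices named in the abstract above, the Rand Index and the Adjusted Rand Index, can be sketched in a few lines. This is a minimal illustration of the standard pair-counting formulas only, not the modified indices proposed in the paper; the function names are hypothetical, and both partitions are assumed to label the same set of units.

```python
from collections import Counter
from math import comb


def pair_counts(labels_a, labels_b):
    """Count pairs of units placed together in both, the first, and the second partition."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same units")
    if len(labels_a) < 2:
        raise ValueError("at least two units are needed to form a pair")
    # Contingency table cells, and its row and column totals.
    cells = Counter(zip(labels_a, labels_b))
    rows = Counter(labels_a)
    cols = Counter(labels_b)
    together_both = sum(comb(c, 2) for c in cells.values())
    together_a = sum(comb(c, 2) for c in rows.values())
    together_b = sum(comb(c, 2) for c in cols.values())
    return together_both, together_a, together_b, comb(len(labels_a), 2)


def rand_index(labels_a, labels_b):
    """Share of pairs on which the two partitions agree (Rand, 1971)."""
    both, in_a, in_b, total = pair_counts(labels_a, labels_b)
    # Agreements = pairs together in both + pairs separated in both.
    agreements = total + 2 * both - in_a - in_b
    return agreements / total


def adjusted_rand_index(labels_a, labels_b):
    """Rand Index corrected for chance (Hubert and Arabie, 1985)."""
    both, in_a, in_b, total = pair_counts(labels_a, labels_b)
    expected = in_a * in_b / total      # expectation under fixed marginal totals
    maximum = (in_a + in_b) / 2         # value reached by identical partitions
    if maximum == expected:             # degenerate case, e.g. a single cluster in both
        return 1.0
    return (both - expected) / (maximum - expected)
```

For example, `rand_index([0, 0, 1, 1], [0, 1, 0, 1])` counts two agreeing pairs out of six, while the adjusted index for the same input is negative because the agreement is below what chance would give.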

Journal Article
TL;DR: This article proposes a fast heuristic variable importance test for random forests on high-dimensional molecular data where many variables do not carry any information. In the authors' studies it controlled the type I error and had higher power at a substantially smaller computation time than the permutation-based approach.
Abstract: Random forests are a commonly used tool for classification with high-dimensional data as well as for ranking candidate predictors based on the so-called variable importance measures. Several approaches have been developed for addressing the question of whether the variable importance is significantly greater than zero. The existing approaches are permutation-based and require the repeated computation of forests. While for low-dimensional settings those permutation-based approaches might be computationally tractable, for high-dimensional settings typically including thousands of genes, computing time is enormous. In this article we propose a computationally fast heuristic variable importance test which is appropriate for high-dimensional molecular data where many variables do not carry any information. In our studies on complex high-dimensional binary classification settings this new approach controlled the type I error and had higher power at a substantially smaller computation time than the permutation-based approach of Altmann and colleagues.

66 citations

Book Chapter
TL;DR: This analysis finds that missing data imputation using hot deck, iterative robust model-based imputation, factorial analysis for mixed data and Random Forest Imputation perform in a similar manner regardless of the amount of missing data and have the highest mean percentage of observations correctly classified.
Abstract: Multivariate data sets frequently have missing observations scattered throughout the data set. Many machine learning algorithms assume that there is no particular significance in the fact that a particular observation has an attribute value missing. A common approach in coping with these missing values is to replace the missing value using some plausible value, and the resulting completed data set is analysed using standard methods. We evaluate the effect that some commonly used imputation methods have on the accuracy of classifiers in supervised learning. The effect is assessed in simulations performed on several classical datasets where observations have been made missing at random in different proportions. Our analysis finds that missing data imputation using hot deck, iterative robust model-based imputation (IRMI), factorial analysis for mixed data (FAMD) and Random Forest Imputation (MissForest) perform in a similar manner regardless of the amount of missing data and have the highest mean percentage of observations correctly classified. Other methods investigated did not perform as well.

18 citations

Journal Article
TL;DR: In this article, similarity indices whose expectations under fixed marginal totals of a matching counts matrix cannot be easily found are corrected for chance approximately, either by expressing them as functions of other indices and expectations or by using a Taylor series expansion.
Abstract: Correcting a similarity index for chance agreement requires computing its expectation under fixed marginal totals of a matching counts matrix. For some indices, such as Jaccard, Rogers and Tanimoto, Sokal and Sneath, and Gower and Legendre, the expectations cannot be easily found. We show how such similarity indices can be expressed as functions of other indices and expectations found by approximations such that approximate correction is possible. A second approach is based on Taylor series expansion. A simulation study illustrates the effectiveness of the resulting correction of similarity indices using structured and unstructured data generated from bivariate normal distributions.

8 citations
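
As a rough illustration of the correction described in the abstract above, the usual chance-corrected form of a similarity index \(S\) and a first-order Taylor approximation for the Jaccard index \(J\) can be written as follows. The pair-counting notation is an assumption made here and is not taken from the paper: \(a\) is the number of pairs of units placed together in both partitions, \(P\) and \(Q\) are the numbers of pairs placed together in the first and in the second partition (both fixed by the marginal totals), and \(N\) is the total number of pairs.

\[
S_{\text{corrected}} = \frac{S - \mathrm{E}[S]}{S_{\max} - \mathrm{E}[S]}, \qquad
J = \frac{a}{P + Q - a}, \qquad
\mathrm{E}[J] \approx \frac{\mathrm{E}[a]}{P + Q - \mathrm{E}[a]}, \qquad
\mathrm{E}[a] = \frac{P\,Q}{N}.
\]

The approximation of \(\mathrm{E}[J]\) keeps only the leading term of the expansion of \(J\) around \(\mathrm{E}[a]\); higher-order terms involving the variance of \(a\) would refine it.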

Book Chapter
TL;DR: In this article, the authors show how to use two commonly known measures of correlation for two sets of variables, (1) the \(\mathrm{rV}\) coefficient and (2) the distance correlation coefficient, for multivariate functional data, and compare them with canonical correlations.
Abstract: The relationship between two sets of real variables defined for the same individuals can be evaluated by a few different correlation coefficients. For functional data we have only one important tool: the canonical correlations. It is not immediately straightforward to extend other similar measures to the context of functional data analysis. In this work we show how to use commonly known measures of correlation for two sets of variables, the \(\mathrm{rV}\) coefficient and the distance correlation coefficient, in the multivariate functional case. Finally, these three different coefficients are compared and their use is demonstrated on two real examples.

8 citations
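
For readers unfamiliar with the rV coefficient mentioned in the abstract above, the sketch below computes it for two ordinary (non-functional) sets of variables observed on the same individuals. It is only an illustration of the classical definition, rV = ||X'Y||² / (||X'X|| · ||Y'Y||) with column-centred data and Frobenius norms, and does not reproduce the functional-data extension developed in the chapter; all function names are hypothetical.

```python
from math import sqrt


def center_columns(rows):
    """Subtract each column's mean from a row-major data matrix."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]


def cross_product(a, b):
    """Return the matrix A^T B for two row-major matrices with the same number of rows."""
    return [
        [sum(ra[i] * rb[j] for ra, rb in zip(a, b)) for j in range(len(b[0]))]
        for i in range(len(a[0]))
    ]


def frobenius_sq(m):
    """Squared Frobenius norm: the sum of squared entries."""
    return sum(v * v for row in m for v in row)


def rv_coefficient(x, y):
    """rV coefficient between two sets of variables measured on the same individuals."""
    if len(x) != len(y):
        raise ValueError("both data sets must describe the same individuals")
    xc, yc = center_columns(x), center_columns(y)
    numerator = frobenius_sq(cross_product(xc, yc))
    denominator = sqrt(
        frobenius_sq(cross_product(xc, xc)) * frobenius_sq(cross_product(yc, yc))
    )
    if denominator == 0:
        raise ValueError("the rV coefficient is undefined when a set has zero variance")
    return numerator / denominator
```

The value lies between 0 and 1, equals 1 when one set is a rescaled copy of the other, and reduces to the squared Pearson correlation when each set contains a single variable.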

Network Information
Related Journals (5)
Computational Statistics & Data Analysis
6.1K papers, 193.5K citations
80% related
Journal of Statistical Software
1.4K papers, 352.7K citations
79% related
arXiv: Statistics Theory
11.8K papers, 156.5K citations
74% related
Communications in Statistics-theory and Methods
12.3K papers, 133.9K citations
74% related
Performance
Metrics
No. of papers from the Journal in previous years
Year    Papers
2017    12
2016    1
2015    26
2014    1
2013    1