Open Access
Posted Content

Gini diversity index, Hamming distance, and curse of dimensionality

Pranab Kumar Sen
01 Dec 2005, Vol. 63, Iss. 3, pp. 329-349
TLDR
This article appraises the role of Hamming-distance-based analysis for multidimensional categorical data affected by the curse of dimensionality, with particular attention to subgroup (MANOVA) decomposability. The Hamming distance incorporates the idea of the Gini-Simpson diversity index in a variety of multidimensional setups, without making very stringent structural regularity assumptions.
Abstract
The celebrated Gini(-Simpson) biodiversity index has found very useful applications in ecology, bio-environmetrics, econometry, psychometry, genetics, and lately in bioinformatics as well. Such applications mostly involve categorical data models, possibly without any ordering of the categories, which may preclude routine use of conventional measures of quantitative diversity analysis. Further, real-life problems mostly involve genuinely multidimensional data models. The Hamming distance incorporates the idea of the Gini-Simpson diversity index in a variety of multidimensional setups, without making very stringent structural regularity assumptions. In bioinformatics, as well as in many other studies of large biological systems, the curse of dimensionality (arising in multidimensional, purely qualitative categorical data models) is a genuine concern. The role of Hamming-distance-based analysis is appraised in this context. Subgroup or MANOVA decomposability aspects are specially appraised in this setup.
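The link between the Hamming distance and the Gini-Simpson index can be illustrated with a minimal sketch. It relies on a standard identity rather than on the article's own formulation: for a sample of categorical vectors, the mean pairwise Hamming distance equals the sum over positions of the sampling-without-replacement form of the Gini-Simpson index. The function names and the four toy sequences below are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def gini_simpson(labels):
    """Probability that two items drawn without replacement differ.

    This is the without-replacement form of the Gini-Simpson index,
    1 - sum_c n_c (n_c - 1) / (n (n - 1)); it needs at least two items.
    """
    n = len(labels)
    counts = Counter(labels)
    same = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same / (n * (n - 1))


def hamming(u, v):
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(u, v))


def mean_pairwise_hamming(rows):
    """Average Hamming distance over all unordered pairs of rows."""
    pairs = list(combinations(rows, 2))
    return sum(hamming(u, v) for u, v in pairs) / len(pairs)


# Toy data (hypothetical): four sequences observed at K = 4 positions.
rows = ["ACGT", "ACGA", "TCGA", "ACTA"]
K = len(rows[0])

# One Gini-Simpson value per position (per coordinate of the vector).
per_position = [gini_simpson([r[k] for r in rows]) for k in range(K)]
mean_hd = mean_pairwise_hamming(rows)

print(per_position)
print(mean_hd, sum(per_position))
```

Because the distance is a plain sum of per-position disagreements, no ordering of the categories and no joint model across positions is needed, which is consistent with the abstract's remark about avoiding stringent structural assumptions.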


Citations
Journal Article

Aspen, climate, and sudden decline in western USA

TL;DR: In this article, a bioclimate model predicting the presence or absence of aspen, Populus tremuloides, in western USA from climate variables was developed by using the Random Forests classification tree on Forest Inventory data from about 118,000 permanent sample plots.
Journal Article

Multicriteria diversity analysis: A novel heuristic framework for appraising energy portfolios

TL;DR: A novel general framework for analysing energy diversity is outlined and it is argued that the associated multicriteria diversity analysis method provides a more systematic, complete and transparent way to articulate disparate perspectives and approaches and so help to inform more robust and accountable policymaking.
Journal Article

Quantifying the abundance of co-occurring conifers along Inland Northwest (USA) climate gradients

TL;DR: The occurrence and abundance of conifers along climate gradients in the Inland Northwest (USA) were assessed using data from 5082 field plots, 81% of which were forested, and the results were in close agreement with the works of descriptive ecologists.
Journal Article

Cross-domain, soft-partition clustering with diversity measure and knowledge reference

TL;DR: The quadratic weights and Gini-Simpson diversity-based fuzzy clustering model (QWGSD-FC) is first proposed as a basis of this work and two types of cross-domain, soft-partition clustering frameworks and their corresponding algorithms, referred to as type-I/type-II knowledge-transfer-oriented c-means (TI-KT-CM and TII-KT
Journal Article

Soft subspace clustering of categorical data with probabilistic distance

TL;DR: A new algorithm is proposed for clustering categorical data with a novel soft feature-selection scheme, by which each categorical attribute is automatically assigned a weight that correlates with the smoothed dispersion of the categories in a cluster.
Trending Questions (1)
What is the use of the Gini index in alpha-diversity analyses in ecology?

The Gini index is used in alpha-diversity analyses in ecology to measure the diversity of species within a specific habitat or location.