Showing papers by "Boris Mirkin" published in 2005

Open Access

Book•DOI•

Rough sets, fuzzy sets, data mining and granular computing

Sergei O. Kuznetsov, Dominik Ślęzak, Daryl H. Hepting, Boris Mirkin

01 Jan 2005-Lecture Notes in Computer Science

TL;DR: RSFDGrC 2013 was the 14th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing, held in Halifax, NS, Canada, October 11-14, 2013.

Abstract: 14th International Conference, RSFDGrC 2013, Halifax, NS, Canada, October 11-14, 2013. Proceedings - Part of the Lecture Notes in Computer Science book series

535 citations

Book•

Clustering for Data Mining: A Data Recovery Approach

Boris Mirkin¹•Institutions (1)

Birkbeck, University of London¹

29 Apr 2005

TL;DR: This book presents a data recovery approach to clustering, covering K-Means and Ward hierarchical clustering, data recovery models, other clustering approaches, and general issues such as missing data, validity, and reliability.

Abstract:
INTRODUCTION: HISTORICAL REMARKS
WHAT IS CLUSTERING: Exemplary Problems; Bird's Eye View
WHAT IS DATA: Feature Characteristics; Bivariate Analysis; Feature Space and Data Scatter; Preprocessing and Standardizing Mixed Data
K-MEANS CLUSTERING: Conventional K-Means; Initialization of K-Means; Intelligent K-Means; Interpretation Aids; Overall Assessment
WARD HIERARCHICAL CLUSTERING: Agglomeration: Ward Algorithm; Divisive Clustering with Ward Criterion; Conceptual Clustering; Extensions of Ward Clustering; Overall Assessment
DATA RECOVERY MODELS: Statistics Modeling as Data Recovery; Data Recovery Model for K-Means; Data Recovery Models for Ward Criterion; Extensions to Other Data Types; One-by-One Clustering; Overall Assessment
DIFFERENT CLUSTERING APPROACHES: Extensions of K-Means Clustering; Graph-Theoretic Approaches; Conceptual Description of Clusters; Overall Assessment
GENERAL ISSUES: Feature Selection and Extraction; Data Pre-Processing and Standardization; Similarity on Subsets and Partitions; Dealing with Missing Data; Validity and Reliability; Overall Assessment
CONCLUSION: Data Recovery Approach in Clustering
BIBLIOGRAPHY
Each chapter also contains a section of Base Words.

429 citations

Journal Article•DOI•

Ancestral paralogs and pseudoparalogs and their role in the emergence of the eukaryotic cell

Kira S. Makarova¹, Yuri I. Wolf, Sergey L. Mekhedov, Boris Mirkin, Eugene V. Koonin•Institutions (1)

National Institutes of Health¹

01 Jan 2005-Nucleic Acids Research

TL;DR: The results of this study demonstrate a major increase in the level of gene paralogy as a hallmark of the early evolution of eukaryotes.

Abstract: Gene duplication is a crucial mechanism of evolutionary innovation. A substantial fraction of eukaryotic genomes consists of paralogous gene families. We assess the extent of ancestral paralogy, which dates back to the last common ancestor of all eukaryotes, and examine the origins of the ancestral paralogs and their potential roles in the emergence of the eukaryotic cell complexity. A parsimonious reconstruction of ancestral gene repertoires shows that 4137 orthologous gene sets in the last eukaryotic common ancestor (LECA) map back to 2150 orthologous sets in the hypothetical first eukaryotic common ancestor (FECA) [paralogy quotient (PQ) of 1.92]. Analogous reconstructions show significantly lower levels of paralogy in prokaryotes, 1.19 for archaea and 1.25 for bacteria. The only functional class of eukaryotic proteins with a significant excess of paralogous clusters over the mean includes molecular chaperones and proteins with related functions. Almost all genes in this category underwent multiple duplications during early eukaryotic evolution. In structural terms, the most prominent sets of paralogs are superstructure-forming proteins with repetitive domains, such as WD-40 and TPR. In addition to the true ancestral paralogs which evolved via duplication at the onset of eukaryotic evolution, numerous pseudoparalogs were detected, i.e. homologous genes that apparently were acquired by early eukaryotes via different routes, including horizontal gene transfer (HGT) from diverse bacteria. The results of this study demonstrate a major increase in the level of gene paralogy as a hallmark of the early evolution of eukaryotes.

176 citations

Journal Article•DOI•

Nearest neighbour approach in the least-squares data imputation algorithms

Ito Wasito¹, Boris Mirkin¹•Institutions (1)

Birkbeck, University of London¹

06 Jan 2005-Information Sciences

TL;DR: This paper presents an experimental analysis of a set of imputation methods developed within the so-called least-squares approximation approach, a non-parametric, computationally effective multidimensional technique, and proposes extensions of these algorithms based on the nearest-neighbour approach.

68 citations

Clustering for Data Mining

Boris Mirkin

01 Jan 2005

47 citations

Posted Content•

A Suffix Tree Approach to Email Filtering

Rajesh Pampapathi¹, Boris Mirkin¹, Mark Levene¹•Institutions (1)

Birkbeck, University of London¹

14 Mar 2005-arXiv: Artificial Intelligence

TL;DR: The results show that the character-level representation of emails and classes facilitated by the suffix tree can significantly improve classification accuracy when compared with the currently popular methods, such as naive Bayes.

Abstract: We present an approach to email filtering based on the suffix tree data structure. A method for the scoring of emails using the suffix tree is developed and a number of scoring and score normalisation functions are tested. Our results show that the character-level representation of emails and classes facilitated by the suffix tree can significantly improve classification accuracy when compared with the currently popular methods, such as naive Bayes. We believe the method can be extended to the classification of documents in other domains.

7 citations

Posted Content•

A Suffix Tree Approach to Text Categorisation Applied to Spam Filtering

Rajesh Pampapathi, Boris Mirkin, Mark Levene

14 Mar 2005

TL;DR: The results show that the character level representation of documents and classes represented by the suffix tree significantly improves classification accuracy when compared with the popular naive Bayesian filtering method.

Abstract: We present an approach to textual classification based on the suffix tree data structure and apply it to spam filtering. A method for scoring of documents using the suffix tree is developed and a number of scoring and score normalisation functions are tested. Our results show that the character-level representation of documents and classes facilitated by the suffix tree significantly improves classification accuracy when compared with the currently popular naive Bayesian filtering method.

4 citations

DOI•

What Is Clustering

Boris Mirkin

29 Apr 2005

3 citations

DOI•

Ward Hierarchical Clustering

Boris Mirkin

29 Apr 2005

1 citation