Showing papers by "Boris Mirkin" published in 2005


BookDOI
TL;DR: Proceedings of the 14th International Conference, RSFDGrC 2013, held in Halifax, NS, Canada, October 11-14, 2013, published in the Lecture Notes in Computer Science book series.
Abstract: 14th International Conference, RSFDGrC 2013, Halifax, NS, Canada, October 11-14, 2013. Proceedings - Part of the Lecture Notes in Computer Science book series

535 citations


Book
29 Apr 2005
TL;DR: This book presents the data recovery approach to clustering, centred on K-means and Ward hierarchical clustering, and also covers graph-theoretic approaches, dealing with missing data, and validity and reliability.
Abstract:
INTRODUCTION: HISTORICAL REMARKS
WHAT IS CLUSTERING: Exemplary Problems; Bird's Eye View
WHAT IS DATA: Feature Characteristics; Bivariate Analysis; Feature Space and Data Scatter; Preprocessing and Standardizing Mixed Data
K-MEANS CLUSTERING: Conventional K-Means; Initialization of K-Means; Intelligent K-Means; Interpretation Aids; Overall Assessment
WARD HIERARCHICAL CLUSTERING: Agglomeration: Ward Algorithm; Divisive Clustering with Ward Criterion; Conceptual Clustering; Extensions of Ward Clustering; Overall Assessment
DATA RECOVERY MODELS: Statistics Modeling as Data Recovery; Data Recovery Model for K-Means; Data Recovery Models for Ward Criterion; Extensions to Other Data Types; One-by-One Clustering; Overall Assessment
DIFFERENT CLUSTERING APPROACHES: Extensions of K-Means Clustering; Graph-Theoretic Approaches; Conceptual Description of Clusters; Overall Assessment
GENERAL ISSUES: Feature Selection and Extraction; Data Pre-Processing and Standardization; Similarity on Subsets and Partitions; Dealing with Missing Data; Validity and Reliability; Overall Assessment
CONCLUSION: Data Recovery Approach in Clustering
BIBLIOGRAPHY
Each chapter also contains a section of Base Words.

429 citations


Journal ArticleDOI
TL;DR: The results of this study demonstrate a major increase in the level of gene paralogy as a hallmark of the early evolution of eukaryotes.
Abstract: Gene duplication is a crucial mechanism of evolutionary innovation. A substantial fraction of eukaryotic genomes consists of paralogous gene families. We assess the extent of ancestral paralogy, which dates back to the last common ancestor of all eukaryotes, and examine the origins of the ancestral paralogs and their potential roles in the emergence of the eukaryotic cell complexity. A parsimonious reconstruction of ancestral gene repertoires shows that 4137 orthologous gene sets in the last eukaryotic common ancestor (LECA) map back to 2150 orthologous sets in the hypothetical first eukaryotic common ancestor (FECA) [paralogy quotient (PQ) of 1.92]. Analogous reconstructions show significantly lower levels of paralogy in prokaryotes, 1.19 for archaea and 1.25 for bacteria. The only functional class of eukaryotic proteins with a significant excess of paralogous clusters over the mean includes molecular chaperones and proteins with related functions. Almost all genes in this category underwent multiple duplications during early eukaryotic evolution. In structural terms, the most prominent sets of paralogs are superstructure-forming proteins with repetitive domains, such as WD-40 and TPR. In addition to the true ancestral paralogs which evolved via duplication at the onset of eukaryotic evolution, numerous pseudoparalogs were detected, i.e. homologous genes that apparently were acquired by early eukaryotes via different routes, including horizontal gene transfer (HGT) from diverse bacteria. The results of this study demonstrate a major increase in the level of gene paralogy as a hallmark of the early evolution of eukaryotes.

176 citations


Journal ArticleDOI
TL;DR: This paper experimentally analyses a set of imputation methods developed within the so-called least-squares approximation approach, a non-parametric, computationally effective multidimensional technique, and proposes extensions of these algorithms based on the nearest neighbours approach.

68 citations


01 Jan 2005

47 citations


Posted Content
TL;DR: The results show that the character level representation of emails and classes facilitated by the suffix tree can significantly improve classification accuracy when compared with the currently popular methods, such as naive Bayes.
Abstract: We present an approach to email filtering based on the suffix tree data structure. A method for the scoring of emails using the suffix tree is developed and a number of scoring and score normalisation functions are tested. Our results show that the character level representation of emails and classes facilitated by the suffix tree can significantly improve classification accuracy when compared with the currently popular methods, such as naive Bayes. We believe the method can be extended to the classification of documents in other domains.

7 citations


Posted Content
14 Mar 2005
TL;DR: The results show that the character level representation of documents and classes represented by the suffix tree significantly improves classification accuracy when compared with the popular naive Bayesian filtering method.
Abstract: We present an approach to textual classification based on the suffix tree data structure and apply it to spam filtering. A method for scoring of documents using the suffix tree is developed and a number of scoring and score normalisation functions are tested. Our results show that the character level representation of documents and classes facilitated by the suffix tree significantly improves classification accuracy when compared with the currently popular naive Bayesian filtering method.

4 citations


DOI
29 Apr 2005

3 citations


DOI
29 Apr 2005

1 citation