Journal Article

A K-Means Clustering Algorithm

1 March 1979; Journal of the Royal Statistical Society, Series C (Applied Statistics), John Wiley & Sons, Ltd; Vol. 28, Iss. 1, pp. 100-108
About: This article was published in the Journal of the Royal Statistical Society, Series C (Applied Statistics) on 1 March 1979. It has received 10,702 citations to date. Its listed topics are canopy clustering algorithm and correlation clustering.
Citations
Journal Article
TL;DR: This survey tries to provide a structured and comprehensive overview of the research on anomaly detection by grouping existing techniques into different categories based on the underlying approach adopted by each technique.
Abstract: Anomaly detection is an important problem that has been researched within diverse research areas and application domains. Many anomaly detection techniques have been specifically developed for certain application domains, while others are more generic. This survey tries to provide a structured and comprehensive overview of the research on anomaly detection. We have grouped existing techniques into different categories based on the underlying approach adopted by each technique. For each category we have identified key assumptions, which are used by the techniques to differentiate between normal and anomalous behavior. When applying a given technique to a particular domain, these assumptions can be used as guidelines to assess the effectiveness of the technique in that domain. For each category, we provide a basic anomaly detection technique, and then show how the different existing techniques in that category are variants of the basic technique. This template provides an easier and more succinct understanding of the techniques belonging to each category. Further, for each category, we identify the advantages and disadvantages of the techniques in that category. We also provide a discussion on the computational complexity of the techniques since it is an important issue in real application domains. We hope that this survey will provide a better understanding of the different directions in which research has been done on this topic, and how techniques developed in one area can be applied in domains for which they were not intended to begin with.

9,627 citations

Journal Article
TL;DR: This work reviews a general methodology for model-based clustering that provides a principled statistical approach to important practical questions that arise in cluster analysis, such as how many clusters are there, which clustering method should be used, and how should outliers be handled.
Abstract: Cluster analysis is the automated search for groups of related observations in a dataset. Most clustering done in practice is based largely on heuristic but intuitively reasonable procedures, and most clustering methods available in commercial software are also of this type. However, there is little systematic guidance associated with these methods for solving important practical questions that arise in cluster analysis, such as how many clusters are there, which clustering method should be used, and how should outliers be handled. We review a general methodology for model-based clustering that provides a principled statistical approach to these issues. We also show that this can be useful for other problems in multivariate analysis, such as discriminant analysis and multivariate density estimation. We give examples from medical diagnosis, minefield detection, cluster recovery from noisy data, and spatial density estimation. Finally, we mention limitations of the methodology and discuss recent development...

4,123 citations

Journal Article
TL;DR: It is demonstrated that the new models developed for the structure program allow structure to be detected at lower levels of divergence, or with less data, than the original structure models or principal components methods, and that they are not biased towards detecting structure when it is not present.
Abstract: Genetic clustering algorithms require a certain amount of data to produce informative results. In the common situation that individuals are sampled at several locations, we show how sample group information can be used to achieve better results when the amount of data is limited. New models are developed for the structure program, both for the cases of admixture and no admixture. These models work by modifying the prior distribution for each individual's population assignment. The new prior distributions allow the proportion of individuals assigned to a particular cluster to vary by location. The models are tested on simulated data, and illustrated using microsatellite data from the CEPH Human Genome Diversity Panel. We demonstrate that the new models allow structure to be detected at lower levels of divergence, or with less data, than the original structure models or principal components methods, and that they are not biased towards detecting structure when it is not present. These models are implemented in a new version of structure which is freely available online at http://pritch.bsd.uchicago.edu/structure.html.

3,105 citations

Journal Article
TL;DR: The authors investigate contextual organizational ambidexterity, defined as the capacity to simultaneously achieve alignment and adaptability at the business-unit level, and find that a context characterized by a combination of stretch, discipline, support, and trust facilitates contextual ambidexterity.
Abstract: We investigated contextual organizational ambidexterity, defined as the capacity to simultaneously achieve alignment and adaptability at a business-unit level. Building on the leadership and organization context literatures, we argue that a context characterized by a combination of stretch, discipline, support, and trust facilitates contextual ambidexterity. Further, ambidexterity mediates the relationship between these contextual features and performance. Data collected from 4,195 individuals in 41 business units supported our hypotheses. A recurring theme in a variety of organizational literatures is that successful organizations in a dynamic environment are ambidextrous—aligned and efficient in their management of today’s business demands, while also adaptive enough to changes in the environment that they will still be around tomorrow (Duncan, 1976; Tushman & O’Reilly, 1996). The simple idea behind the value of ambidexterity is that the demands on an organization in its task environment are always to some degree in conflict (for instance, investment in current versus future projects, differentiation versus low-cost production), so there are always trade-offs to be made. Although these trade-offs can never entirely be eliminated, the most successful organizations reconcile them to a large degree, and in so doing enhance their long-term competitiveness. Authors have typically viewed ambidexterity in structural terms. According to Duncan (1976), who first used the term, organizations manage trade-offs between conflicting demands by putting in place “dual structures,” so that certain business units—or groups within business units—focus on alignment, while others focus on adaptation (Duncan, 1976).

3,009 citations
