Cory Reina

Researcher at Microsoft

Publications -  13
Citations -  1732

Cory Reina is an academic researcher at Microsoft. The author has contributed to research on the topics of cluster analysis and memory buffer registers, has an h-index of 10, and has co-authored 13 publications receiving 1708 citations.

Papers
Proceedings Article

Scaling clustering algorithms to large databases

TL;DR: A scalable clustering framework, applicable to a wide class of iterative clustering algorithms, that requires at most one scan of the database; it is instantiated and numerically justified with the popular K-Means clustering algorithm.

Scaling EM (Expectation Maximization) Clustering to Large Databases

TL;DR: A scalable implementation of the Expectation-Maximization (EM) algorithm, which constructs proper statistical models of the underlying data source and naturally generalizes to cluster databases containing both discrete-valued and continuous-valued data.
Proceedings Article

Initialization of iterative refinement clustering algorithms

TL;DR: This work presents a procedure for computing a refined starting condition from a given initial one that is based on an efficient technique for estimating the modes of a distribution, and shows that refined initial points indeed lead to improved solutions.
Patent

Scalable system for clustering of large databases

TL;DR: A data mining system for finding clusters of data items in a database or any other data storage medium. A portion of the data in the database is read from a storage medium and brought into a rapid-access memory buffer whose size is determined by the user or operating system depending on available memory resources.
Patent

Scalable system for K-means clustering of large databases

TL;DR: A data mining system for evaluating data in a database. A portion of the data in the database is read from a storage medium and brought into a rapid-access memory, and the data contained in that portion is used to update the original guesses at the centroids of each of the K clusters.