Data weaving: scaling up the state-of-the-art in data clustering
Citations
86 citations
40 citations
Cites methods from "Data weaving: scaling up the state-..."
...For RCV1, we remove the words appearing fewer than 10 times and standard stopwords; pre-process the data according to [2] (3); and convert it into a 53-class classification task....
[...]
19 citations
Cites methods from "Data weaving: scaling up the state-..."
...2009) comprising the documents from the 30 most popular categories, and rcv20 is the subset of a large rcv1 dataset (Bekkerman and Scholz 2008) consisting of documents from the 20 most popular categories. Following Chechik et al. (2010), we reduce the dimensionality of these document datasets to 200 by principal component analysis (PCA)....
[...]
15 citations
10 citations
Cites background or methods from "Data weaving: scaling up the state-..."
...when the user u was assigned into a cluster ũ, and the movie m was assigned into a cluster m̃ (for a discussion, see [5])....
[...]
...Improving Clustering Stability with Combinatorial MRFs. Ron Bekkerman, Martin Scholz, Krishnamurthy Viswanathan (HP Labs, 1501 Page Mill Rd, Palo Alto, CA 94304; ron.bekkerman@hp.com, scholz@hp.com). Abstract: As clustering methods are often sensitive to parameter tuning, obtaining stability in clustering results is an important task....
[...]
...[5] R. Bekkerman and M. Scholz....
[...]
...Instead, we apply its parallelized version, called DataLoom [5]....
[...]
...For evaluating our collaborative filtering results based on the constructed clusterings of users and movies, we follow Bekkerman and Scholz [5] who compute the Area Under the ROC Curve (or AUC, in short) for a constructed ranking of the user/movie pairs (see Section 5....
[...]
References
30,570 citations
"Data weaving: scaling up the state-..." refers background in this paper
...In LDA, each document is represented as a distribution of topics, and parameters of those distributions are learned from the data....
[...]
...against the standard uni-modal k-means, as well as against Latent Dirichlet Allocation (LDA) [5]—a popular generative model for representing document collections....
[...]
...Dataset | k-means | LDA | IT-CC | SCC | 2way DataLoom (deterministic) | 2way DataLoom (stochastic) | 3way DataLoom (stochastic)
acheyer | 24.7 | 44.3 ± 0.4 | 39.0 ± 0.6 | 46.1 ± 0.3 | 43.7 ± 0.5 | 42.4 ± 0.5 | 46.7 ± 0.3
mgondek | 37.0 | 68.0 ± 0.8 | 61.3 ± 1.5 | 63.4 ± 1.1 | 63.3 ± 1.8 | 64.6 ± 1.2 | 73.8 ± 1.7
sanders-r | 45.5 | 63.8 ± 0.4 | 56.1 ± 0.7 | 60.2 ± 0.4 | 59.8 ± 0.9 | 61.3 ± 0.8 | 66.5 ± 0.2
20NG | 16.1 | 56.7 ± 0.6 | 54.2 ± 0.7 | 57.7 ± 0.2 | 55.1 ± 0.7 | 55.6 ± 0.7 | N/A....
[...]
...We used Xuerui Wang's LDA implementation [25] that applies Gibbs sampling with 10,000 sampling iterations....
[...]
25,546 citations
20,309 citations
"Data weaving: scaling up the state-..." refers background in this paper
...It has recently been discussed that the same kind of parallelization works very well in combination with the popular MapReduce paradigm [10]....
[...]
17,663 citations
4,806 citations