SciSpace (formerly Typeset)
Topic

Disjoint sets

About: Disjoint sets is a research topic. Over its lifetime, 12,145 publications have been published within this topic, receiving 183,313 citations. The topic is also known as: disjoint set and mutually exclusive sets.
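
To make the definition concrete: two sets are disjoint when they have no element in common, and a family of sets is mutually exclusive when every pair in it is disjoint. The Python sketch below checks this with the built-in `set.isdisjoint`; the helper names `pairwise_disjoint` and `is_partition` are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def pairwise_disjoint(family):
    """Return True if no two sets in `family` share an element."""
    return all(a.isdisjoint(b) for a, b in combinations(family, 2))


def is_partition(family, universe):
    """Return True if `family` is a set of non-empty, pairwise disjoint
    blocks whose union is exactly `universe`."""
    return (
        all(family)  # an empty set is falsy, so this rejects empty blocks
        and pairwise_disjoint(family)
        and set().union(*family) == set(universe)
    )


print(pairwise_disjoint([{1, 2}, {3}, {4, 5}]))  # True
print(pairwise_disjoint([{1, 2}, {2, 3}]))       # False (both contain 2)
```

A partition of a universe, as used in the union-find literature below, is simply a mutually exclusive family whose blocks are non-empty and together cover every element.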
Papers

Book
Robert E. Tarjan
01 Jan 1983
TL;DR: The book's chapters cover foundations, disjoint sets, heaps, search trees, linking and cutting trees, minimum spanning trees, shortest paths, network flows, and matchings.
Abstract: Foundations; Disjoint Sets; Heaps; Search Trees; Linking and Cutting Trees; Minimum Spanning Trees; Shortest Paths; Network Flows; Matchings

2,077 citations


Journal Article
16 May 2000
TL;DR: A novel formulation for distance-based outliers that is based on the distance of a point from its kth nearest neighbor is proposed and the top n points in this ranking are declared to be outliers.
Abstract: In this paper, we propose a novel formulation for distance-based outliers that is based on the distance of a point from its kth nearest neighbor. We rank each point on the basis of its distance to its kth nearest neighbor and declare the top n points in this ranking to be outliers. In addition to developing relatively straightforward solutions to finding such outliers based on the classical nested-loop join and index join algorithms, we develop a highly efficient partition-based algorithm for mining outliers. This algorithm first partitions the input data set into disjoint subsets, and then prunes entire partitions as soon as it is determined that they cannot contain outliers. This results in substantial savings in computation. We present the results of an extensive experimental study on real-life and synthetic data sets. The results from a real-life NBA database highlight and reveal several expected and unexpected aspects of the database. The results from a study on synthetic data sets demonstrate that the partition-based algorithm scales well with respect to both data set size and data set dimensionality.

1,631 citations
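
A minimal sketch of the ranking described above, assuming Euclidean distance and using the simple nested-loop approach rather than the paper's partition-based algorithm: each point is scored by the distance to its k-th nearest neighbour, and the n highest-scoring points are reported as outliers. The function name `knn_distance_outliers` is hypothetical, and the code assumes k is smaller than the number of points.

```python
import math


def knn_distance_outliers(points, k, n):
    """Return indices of the top-n outliers, ranked by the distance from
    each point to its k-th nearest neighbour (nested-loop baseline)."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append((dists[k - 1], i))  # k-th nearest-neighbour distance
    # Largest score first; ties broken by index so the result is deterministic.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scores[:n]]


data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(knn_distance_outliers(data, k=2, n=1))  # [4]
```

The isolated point (10, 10) has by far the largest 2nd-nearest-neighbour distance, so it is the single outlier returned. This baseline performs O(N²) distance computations for N points, which is the work that pruning entire partitions is meant to avoid.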


Journal Article
01 Apr 1975, Journal of the ACM
TL;DR: It is shown that, if t(m, n) is the maximum time required by a sequence of $m \ge n$ FINDs and $n - 1$ intermixed UNIONs, then t(m, n) is within constant factors of $m\,\alpha(m, n)$, where $\alpha(m, n)$ is related to a functional inverse of Ackermann's function and is very slow-growing.
Abstract: Two types of instructions for manipulating a family of disjoint sets which partition a universe of n elements are considered. FIND(x) computes the name of the (unique) set containing element x. UNION(A, B, C) combines sets A and B into a new set named C. A known algorithm for implementing sequences of these instructions is examined. It is shown that, if t(m, n) is the maximum time required by a sequence of $m \ge n$ FINDs and $n - 1$ intermixed UNIONs, then $k_1 m\,\alpha(m, n) \le t(m, n) \le k_2 m\,\alpha(m, n)$ for some positive constants $k_1$ and $k_2$, where $\alpha(m, n)$ is related to a functional inverse of Ackermann's function and is very slow-growing.

1,343 citations
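
The FIND/UNION interface above is commonly implemented as a disjoint-set forest. The Python class below is a minimal sketch of that standard structure, combining union by rank with path compression; it is not taken from the paper. The class and method names are hypothetical, and `union` keeps one of the two existing roots as the representative instead of creating a set with a new name C.

```python
class DisjointSets:
    """Disjoint-set forest over elements 0..n-1 with union by rank
    and path compression."""

    def __init__(self, n):
        self.parent = list(range(n))  # every element starts as its own root
        self.rank = [0] * n           # upper bound on each root's tree height

    def find(self, x):
        # First pass: walk up to the root of x's tree.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass (path compression): point every node on the path at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        """Merge the sets containing a and b; return the surviving root."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        # Union by rank: hang the lower-rank tree under the higher-rank one.
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return ra


ds = DisjointSets(5)
ds.union(0, 1)
ds.union(3, 4)
print(ds.find(1) == ds.find(0))  # True
print(ds.find(2) == ds.find(3))  # False
```

For this combination of heuristics, a sequence of $m \ge n$ finds and $n - 1$ unions runs in $O(m\,\alpha(m, n))$ time, so in practice each operation costs close to constant time.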


Book Chapter
01 Jan 2009
TL;DR: A large body of research in supervised learning deals with the analysis of single-label data, where training examples are associated with a single label λ from a set of disjoint labels L; however, training examples in several application domains are often associated with a set of labels Y ⊆ L.
Abstract: A large body of research in supervised learning deals with the analysis of single-label data, where training examples are associated with a single label λ from a set of disjoint labels L. However, training examples in several application domains are often associated with a set of labels Y ⊆ L. Such data are called multi-label.

1,343 citations
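
One common way to store multi-label data, sketched here as an assumption rather than something the chapter prescribes, is to encode each label set Y ⊆ L as a 0/1 indicator vector over L; single-label data is then the special case where every vector contains exactly one 1. The function name `to_indicator` and the example labels are hypothetical.

```python
def to_indicator(label_sets, labels):
    """Encode each example's label set Y (a subset of L) as a 0/1 vector over L."""
    index = {lab: i for i, lab in enumerate(labels)}  # position of each label in L
    rows = []
    for y in label_sets:
        unknown = set(y).difference(index)
        if unknown:
            raise ValueError(f"labels not in L: {sorted(unknown)}")
        row = [0] * len(labels)
        for lab in y:
            row[index[lab]] = 1
        rows.append(row)
    return rows


L = ["sports", "politics", "economy"]
Y = [{"sports"}, {"politics", "economy"}, set()]
print(to_indicator(Y, L))  # [[1, 0, 0], [0, 1, 1], [0, 0, 0]]
```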


Journal Article
01 Jan 2001, Machine Learning
TL;DR: The concept vectors produced by the spherical k-means algorithm constitute a powerful sparse and localized “basis” for text data sets and are localized in the word space, are sparse, and tend towards orthonormality.
Abstract: Unlabeled document collections are becoming increasingly common and available; mining such data sets represents a major contemporary challenge. Using words as features, text documents are often represented as high-dimensional and sparse vectors: a few thousand dimensions and a sparsity of 95 to 99% are typical. In this paper, we study a certain spherical k-means algorithm for clustering such document vectors. The algorithm outputs k disjoint clusters, each with a concept vector that is the centroid of the cluster normalized to have unit Euclidean norm. As our first contribution, we empirically demonstrate that, owing to the high dimensionality and sparsity of the text data, the clusters produced by the algorithm have a certain “fractal-like” and “self-similar” behavior. As our second contribution, we introduce concept decompositions to approximate the matrix of document vectors; these decompositions are obtained by taking the least-squares approximation onto the linear subspace spanned by all the concept vectors. We empirically establish that the approximation errors of the concept decompositions are close to the best possible, namely, to truncated singular value decompositions. As our third contribution, we show that the concept vectors are localized in the word space, are sparse, and tend towards orthonormality. In contrast, the singular vectors are global in the word space and are dense. Nonetheless, we observe the surprising fact that the linear subspaces spanned by the concept vectors and the leading singular vectors are quite close in the sense of small principal angles between them. In conclusion, the concept vectors produced by the spherical k-means algorithm constitute a powerful sparse and localized “basis” for text data sets.

1,322 citations
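
A compact sketch of spherical k-means as summarized above: documents are normalized to unit length, each one is assigned to the concept vector with the largest inner product (cosine similarity), and each concept vector is recomputed as its cluster's centroid rescaled to unit Euclidean norm. This pure-Python version uses dense lists, random initialization, and a fixed iteration count for readability; those choices, the toy data, and the function names are assumptions, not the paper's setup.

```python
import math
import random


def normalize(v):
    """Scale v to unit Euclidean norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def spherical_kmeans(docs, k, iters=20, seed=0):
    """Cluster vectors by cosine similarity; return (labels, concept_vectors)."""
    X = [normalize(d) for d in docs]  # project documents onto the unit sphere
    rng = random.Random(seed)
    concepts = [list(c) for c in rng.sample(X, k)]  # initial concept vectors
    labels = [0] * len(X)
    for _ in range(iters):
        # Assignment: each document joins the concept vector with the
        # largest inner product, so the k clusters are disjoint.
        labels = [
            max(range(k), key=lambda j: sum(a * b for a, b in zip(x, concepts[j])))
            for x in X
        ]
        # Update: concept vector = centroid of its cluster, renormalized.
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:  # keep the old concept vector if a cluster empties
                centroid = [sum(col) / len(members) for col in zip(*members)]
                concepts[j] = normalize(centroid)
    return labels, concepts


docs = [[1, 0.1, 0], [0.9, 0, 0.1], [0, 1, 0.1], [0.1, 0.8, 0]]
labels, concepts = spherical_kmeans(docs, k=2)
print(labels[0] == labels[1], labels[2] == labels[3], labels[0] != labels[2])
```

The first two toy documents lean on the first coordinate and the last two on the second, so the sketch separates them into two clusters, and every returned concept vector has unit norm.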


Network Information
Related Topics (5)
Cardinality: 6.2K papers, 104.8K citations, 93% related
Hyperplane: 4.8K papers, 83.5K citations, 93% related
Partially ordered set: 5.5K papers, 76.2K citations, 92% related
Open problem: 3.7K papers, 54.2K citations, 92% related
Integer: 9.3K papers, 112.7K citations, 92% related
Performance Metrics
No. of papers in the topic in previous years
Year  Papers
2022  20
2021  635
2020  592
2019  622
2018  606
2017  598

Top Attributes

Topic's top 5 most impactful authors

Saket Saurabh: 25 papers, 381 citations
Saharon Shelah: 23 papers, 235 citations
Micha Sharir: 21 papers, 654 citations
Benny Sudakov: 19 papers, 357 citations
Keiichi Kaneko: 17 papers, 70 citations