
Showing papers on "Disjoint sets" published in 2008


Posted Content
TL;DR: In this article, the authors considered a more realistic network model where a finite number of nodes are uniformly randomly distributed in a general d-dimensional ball of radius R and characterised the distribution of Euclidean distances in the system.
Abstract: In wireless networks, the knowledge of nodal distances is essential for several areas such as system configuration, performance analysis and protocol design. In order to evaluate distance distributions in random networks, the underlying nodal arrangement is almost universally taken to be an infinite Poisson point process. While this assumption is valid in some cases, there are also certain impracticalities to this model. For example, practical networks are non-stationary, and the number of nodes in disjoint areas are not independent. This paper considers a more realistic network model where a finite number of nodes are uniformly randomly distributed in a general d-dimensional ball of radius R and characterizes the distribution of Euclidean distances in the system. The key result is that the probability density function of the distance from the center of the network to its nth nearest neighbor follows a generalized beta distribution. This finding is applied to study network characteristics such as energy consumption, interference, outage and connectivity.
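The distance result can be sanity-checked with a short calculation. A node placed uniformly in a d-dimensional ball of radius R lies within distance r of the centre with probability (r/R)^d, so the distance to the nth nearest of N nodes is at most r exactly when at least n nodes fall inside that smaller ball. Equivalently, (r/R)^d for the nth nearest node is Beta(n, N - n + 1) distributed, which is consistent with the generalized beta form stated in the abstract. The sketch below is illustrative only; the function names and the parameter values are assumptions, not taken from the paper.

```python
import math
import random


def nth_nearest_cdf(r, n, num_nodes, dim, radius):
    """P(distance from the centre to its n-th nearest node <= r).

    Each node independently falls within distance r with probability
    p = (r / radius) ** dim, so at least n of num_nodes must do so.
    """
    if r <= 0:
        return 0.0
    if r >= radius:
        return 1.0
    p = (r / radius) ** dim
    return sum(
        math.comb(num_nodes, k) * p**k * (1 - p) ** (num_nodes - k)
        for k in range(n, num_nodes + 1)
    )


def sample_nth_nearest(n, num_nodes, dim, radius, rng):
    """Draw one n-th nearest distance by sampling node radii directly.

    For a uniform point in a dim-ball, the distance to the centre is
    radius * U ** (1 / dim) with U uniform on [0, 1).
    """
    dists = sorted(radius * rng.random() ** (1.0 / dim) for _ in range(num_nodes))
    return dists[n - 1]


if __name__ == "__main__":
    # Hypothetical parameters chosen only for the comparison.
    rng = random.Random(0)
    n, num_nodes, dim, radius, r = 2, 10, 3, 1.0, 0.6
    trials = 20000
    hits = sum(
        sample_nth_nearest(n, num_nodes, dim, radius, rng) <= r
        for _ in range(trials)
    )
    print("empirical:", hits / trials)
    print("analytic: ", nth_nearest_cdf(r, n, num_nodes, dim, radius))
```

The two printed values should agree up to Monte Carlo noise.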

205 citations


Journal ArticleDOI
TL;DR: In this article, a method for the explicit construction of limit state functions using Support Vector Machines (SVM) is presented to handle the difficulties associated with the reliability assessment of problems exhibiting discontinuous responses and disjoint failure domains.

142 citations


Book ChapterDOI
26 Oct 2008
TL;DR: It is shown that DL-safe rules based on the Description Logic Programming (DLP) fragment of OWL 2 can be admitted in ELP without losing tractability.
Abstract: We introduce $\text{\sf{ELP}}$ as a decidable fragment of the Semantic Web Rule Language (SWRL) that admits reasoning in polynomial time. $\text{\sf{ELP}}$ is based on the tractable description logic $\mathcal{EL}^{\mathord{+}\mathord{+}}$, and encompasses an extended notion of the recently proposed DL rules for that logic. Thus $\text{\sf{ELP}}$ extends $\mathcal{EL}^{\mathord{+}\mathord{+}}$ with a number of features introduced by the forthcoming OWL 2, such as disjoint roles, local reflexivity, certain range restrictions, and the universal role. We present a reasoning algorithm based on a translation of $\text{\sf{ELP}}$ to Datalog, and this translation also enables the seamless integration of DL-safe rules into $\text{\sf{ELP}}$. While reasoning with DL-safe rules as such is already highly intractable, we show that DL-safe rules based on the Description Logic Programming (DLP) fragment of OWL 2 can be admitted in $\text{\sf{ELP}}$ without losing tractability.

125 citations


Journal ArticleDOI
TL;DR: In this paper, conditions are given that ensure the existence of the outer Minkowski content for d-dimensional closed sets, together with a class of sets, stable under finite unions, for which it exists; the class includes finite unions of sets with Lipschitz boundary and a type of sets with positive reach.
Abstract: We find conditions ensuring the existence of the outer Minkowski content for d-dimensional closed sets in \({\mathbb{R}^d}\), in connection with regularity properties of their boundaries. Moreover, we provide a class of sets (including all sufficiently regular sets) stable under finite unions for which the outer Minkowski content exists. It follows, in particular, that finite unions of sets with Lipschitz boundary and a type of sets with positive reach belong to this class.

119 citations


Posted Content
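TL;DR: The authors develop an 8-competitive algorithm for a secretary problem on edge-weighted bipartite graphs, improving on the 16-competitive algorithm of Dimitrov and Plaxton for the transversal matroid special case, and give a 2e-competitive algorithm for the secretary problem on graphic matroids, where, with edges appearing online, the goal is to find a maximum-weight acyclic subgraph of a given graph.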
Abstract: We examine several online matching problems, with applications to Internet advertising reservation systems. Consider an edge-weighted bipartite graph G, with partite sets L, R. We develop an 8-competitive algorithm for the following secretary problem: Initially given R, and the size of L, the algorithm receives the vertices of L sequentially, in a random order. When a vertex l \in L is seen, all edges incident to l are revealed, together with their weights. The algorithm must immediately either match l to an available vertex of R, or decide that l will remain unmatched. Dimitrov and Plaxton show a 16-competitive algorithm for the transversal matroid secretary problem, which is the special case with weights on vertices, not edges. (Equivalently, one may assume that for each l \in L, the weights on all edges incident to l are identical.) We use a similar algorithm, but simplify and improve the analysis to obtain a better competitive ratio for the more general problem. Perhaps of more interest is the fact that our analysis is easily extended to obtain competitive algorithms for similar problems, such as to find disjoint sets of edges in hypergraphs where edges arrive online. We also introduce secretary problems with adversarially chosen groups. Finally, we give a 2e-competitive algorithm for the secretary problem on graphic matroids, where, with edges appearing online, the goal is to find a maximum-weight acyclic subgraph of a given graph.

116 citations


Proceedings ArticleDOI
20 Jul 2008
TL;DR: This study uncovers components such as divergence from randomness and pivoted document length to be inherent parts of a document-query independence (DQI) measure, and interestingly, an integral of the DQI over the term occurrence probability leads to TF-IDF.
Abstract: Interpretations of TF-IDF are based on binary independence retrieval, Poisson, information theory, and language modelling. This paper contributes a review of existing interpretations, and then, TF-IDF is systematically related to the probabilities P(q|d) and P(d|q). Two approaches are explored: a space of independent, and a space of disjoint terms. For independent terms, an "extreme" query/non-query term assumption uncovers TF-IDF, and an analogy of P(d|q) and the probabilistic odds O(r|d, q) mirrors relevance feedback. For disjoint terms, a relationship between probability theory and TF-IDF is established through the integral ∫ 1/x dx = log x. This study uncovers components such as divergence from randomness and pivoted document length to be inherent parts of a document-query independence (DQI) measure, and interestingly, an integral of the DQI over the term occurrence probability leads to TF-IDF.

108 citations


Journal ArticleDOI
TL;DR: In this paper, a simple derivation of the entanglement entropy for a region made up of a union of disjoint intervals in 1+1 dimensional quantum field theories using holographic techniques is presented.
Abstract: We present a simple derivation of the entanglement entropy for a region made up of a union of disjoint intervals in 1+1 dimensional quantum field theories using holographic techniques. This generalizes the results for 1+1 dimensional conformal field theories derived previously by exploiting the uniformization map. We further comment on the generalization of our result to higher dimensional field theories.

98 citations


Journal ArticleDOI
TL;DR: In this paper, a general limit curve theorem is formulated, which includes the case of converging curves with endpoints and the case in which the limit points assigned since the beginning are one, two, or at most denumerable.
Abstract: The subject of limit curve theorems in Lorentzian geometry is reviewed. A general limit curve theorem is formulated, which includes the case of converging curves with endpoints and the case in which the limit points assigned since the beginning are one, two, or at most denumerable. Some applications are considered. It is proved that in chronological spacetimes, strong causality is either everywhere verified or everywhere violated on maximizing lightlike segments with open domain. As a consequence, if in a chronological spacetime two distinct lightlike lines intersect each other then strong causality holds at their points. Finally, it is proved that two distinct components of the chronology violating set have disjoint closures or there is a lightlike line passing through each point of the intersection of the corresponding boundaries.

86 citations


Journal ArticleDOI
TL;DR: In this paper, a general formulation is proposed, which allows freedom in the form of kinetic interactions, and is suitable for establishing conditions on the existence of one or more disjoint forward-invariant sets for the given system.
Abstract: Many biological systems have the capacity to operate in two distinct modes, in a stable manner. Typically, the system can switch from one stable mode to the other in response to a specific external input. Mathematically, these bistable systems are usually described by models that exhibit (at least) two distinct stable steady states. On the other hand, to capture biological variability, it seems more natural to associate to each stable mode of operation an appropriate invariant set in the state space rather than a single fixed point. A general formulation is proposed in this paper, which allows freedom in the form of kinetic interactions, and is suitable for establishing conditions on the existence of one or more disjoint forward-invariant sets for the given system. Stability with respect to each set is studied in terms of a local notion of input-to-state stability with respect to compact sets. Two well known systems that exhibit bistability are analyzed in this framework: the lac operon and an apoptosis network. For the first example, the question of designing an input that drives the system to switch between modes is also considered.

84 citations


Journal ArticleDOI
TL;DR: An effective variable neighbourhood search heuristic is described for the capacitated p-median problem (CPMP), in which a set of n customers is to be partitioned into p disjoint clusters such that the total dissimilarity within each cluster is minimized subject to constraints on maximum cluster capacity.

77 citations


Journal ArticleDOI
TL;DR: Partial partitions and partial connections (where connected components of a set are mutually disjoint but do not necessarily cover the set) are studied and some methods for generating partial connections are described.
Abstract: In connective segmentation (Serra in J. Math. Imaging Vis. 24(1):83–130, [2006]), each image determines subsets of the space on which it is "homogeneous", in such a way that this family of subsets always constitutes a connection (connectivity class); then the segmentation of the image is the partition of space into its connected components according to that connection. Several concrete examples of connective segmentations or of connections on sets indicate that the space covering requirement of the partition should be relaxed. Furthermore, morphological operations on partitions require the consideration of a wider framework. We thus study partial partitions (families of mutually disjoint non-void subsets of the space) and partial connections (where connected components of a set are mutually disjoint but do not necessarily cover the set). We describe some methods for generating partial connections. We investigate the links between the two lattices of partial connections and of partial partitions. We generalize Serra's characterization of connective segmentation and discuss its relevance. Finally we give some ideas on how the theory of partial connections could lead to improved segmentation algorithms.

Journal ArticleDOI
TL;DR: In this paper, the authors investigated stability issues concerning the radial symmetry of solutions to Serrin's overdetermined problems, and showed that if u is a solution to Δu = n in a smooth domain Ω ⊂ ℝ^n, u = 0 on ∂Ω and |Du| is "close" to 1 on ∂Ω, then Ω is close to the union of disjoint unitary balls.

Journal ArticleDOI
TL;DR: This paper makes use of an alternative method of finding useful structure in a graph, leading to a proof of the same result with a much smaller value of n0, and gives a polynomial-time algorithm for finding the two cycles.
Abstract: In 1998 Łuczak, Rödl and Szemerédi [7] proved, by means of the Regularity Lemma, that there exists n_0 such that, for any n ≥ n_0 and two-edge-colouring of K_n, there exists a pair of vertex-disjoint monochromatic cycles of opposite colours covering the vertices of K_n. In this paper we make use of an alternative method of finding useful structure in a graph, leading to a proof of the same result with a much smaller value of n_0. The proof gives a polynomial-time algorithm for finding the two cycles.

Journal ArticleDOI
TL;DR: In this paper, a general definition of additive abstractions that can be applied to any state space is presented, and heuristics based on additive abstractions are proved to be consistent as well as admissible.
Abstract: Informally, a set of abstractions of a state space S is additive if the distance between any two states in S is always greater than or equal to the sum of the corresponding distances in the abstract spaces. The first known additive abstractions, called disjoint pattern databases, were experimentally demonstrated to produce state of the art performance on certain state spaces. However, previous applications were restricted to state spaces with special properties, which precludes disjoint pattern databases from being defined for several commonly used testbeds, such as Rubik's Cube, TopSpin and the Pancake puzzle. In this paper we give a general definition of additive abstractions that can be applied to any state space and prove that heuristics based on additive abstractions are consistent as well as admissible. We use this new definition to create additive abstractions for these testbeds and show experimentally that well chosen additive abstractions can reduce search time substantially for the (18,4)-TopSpin puzzle and by three orders of magnitude over state of the art methods for the 17-Pancake puzzle. We also derive a way of testing if the heuristic value returned by additive abstractions is provably too low and show that the use of this test can reduce search time for the 15-puzzle and TopSpin by roughly a factor of two.
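The additivity condition can be illustrated on a small sliding-tile puzzle. The sketch below is not the paper's construction: it uses the well-known special case in which each tile of the 8-puzzle forms its own abstraction and only that tile's moves are counted, so the summed abstract distances give the Manhattan distance. The goal layout, function names, and the breadth-first check are assumptions made for illustration.

```python
from collections import deque

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # 0 is the blank; assumed goal layout
WIDTH = 3


def tile_distance(state, tile):
    """Abstract distance for one tile: only that tile's moves are counted."""
    pos = state.index(tile)
    goal_pos = GOAL.index(tile)
    return abs(pos // WIDTH - goal_pos // WIDTH) + abs(pos % WIDTH - goal_pos % WIDTH)


def additive_heuristic(state):
    """Sum of the per-tile abstract distances (the Manhattan distance)."""
    return sum(tile_distance(state, tile) for tile in range(1, WIDTH * WIDTH))


def neighbours(state):
    """States reachable by sliding one tile into the blank."""
    blank = state.index(0)
    row, col = divmod(blank, WIDTH)
    for d_row, d_col in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        n_row, n_col = row + d_row, col + d_col
        if 0 <= n_row < WIDTH and 0 <= n_col < WIDTH:
            other = n_row * WIDTH + n_col
            nxt = list(state)
            nxt[blank], nxt[other] = nxt[other], nxt[blank]
            yield tuple(nxt)


def true_distance(start):
    """Exact number of moves to GOAL by breadth-first search."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == GOAL:
            return seen[state]
        for nxt in neighbours(state):
            if nxt not in seen:
                seen[nxt] = seen[state] + 1
                queue.append(nxt)
    return None  # goal not reachable from this state


if __name__ == "__main__":
    start = (1, 2, 3, 4, 0, 6, 7, 5, 8)
    # The summed abstract distances never exceed the true distance.
    print(additive_heuristic(start), "<=", true_distance(start))
```

Because every move in the real puzzle moves exactly one tile, each move is charged to at most one abstraction, which is why the sum stays a lower bound on the true distance.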

Proceedings ArticleDOI
14 Apr 2008
TL;DR: The main results are applied to recursive circulant G and a subclass of hypercube-like interconnection networks, called restricted HL-graphs, and all these networks of degree m with f or less faulty elements have a many-to-many k-DPC joining any k distinct source-sink pairs.
Abstract: A paired many-to-many k-disjoint path cover (k-DPC) of a graph G is a set of k disjoint paths joining k distinct source-sink pairs in which each vertex of G is covered by a path. This paper is concerned with paired many-to-many disjoint path coverability of hypercube-like interconnection networks, called restricted HL-graphs. The class includes twisted cubes, crossed cubes, multiply twisted cubes, Mobius cubes, Mcubes, and generalized twisted cubes. We show that every restricted HL-graph of degree m with f or fewer faulty elements has a paired many-to-many k-DPC for any f and k ≥ 2 with f + 2k ≤ m. The result improves the known bound of f + 2k ≤ m − 1 by one.
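The definition of a paired many-to-many k-DPC translates directly into a checker. The following is a minimal sketch that only verifies a candidate set of paths; it does not construct one and is not the paper's method. The function name and the fault-free 3-dimensional hypercube used as a toy graph are assumptions for illustration.

```python
def is_paired_k_dpc(adjacency, pairs, paths):
    """Check whether `paths` is a paired many-to-many k-DPC of the graph.

    adjacency: dict mapping each vertex to the set of its neighbours.
    pairs:     list of (source, sink) tuples.
    paths:     list of vertex lists, paths[i] joining pairs[i].
    """
    if len(paths) != len(pairs):
        return False
    covered = set()
    for (source, sink), path in zip(pairs, paths):
        # Each path must start at its source and end at its paired sink.
        if not path or path[0] != source or path[-1] != sink:
            return False
        # Consecutive vertices must be adjacent in the graph.
        if any(v not in adjacency.get(u, ()) for u, v in zip(path, path[1:])):
            return False
        # A path may not repeat a vertex or share one with an earlier path.
        if len(set(path)) != len(path) or covered & set(path):
            return False
        covered |= set(path)
    # Together the paths must cover every vertex of the graph.
    return covered == set(adjacency)


# Toy example (an assumption for illustration): the fault-free 3-cube.
q3 = {v: {v ^ (1 << b) for b in range(3)} for v in range(8)}
pairs = [(0, 4), (3, 7)]
paths = [[0, 1, 5, 4], [3, 2, 6, 7]]
print(is_paired_k_dpc(q3, pairs, paths))  # True
```

Faulty elements could be modelled by deleting the corresponding vertices or edges from `adjacency` before running the check.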

Journal ArticleDOI
TL;DR: A latin bitrade is a pair of partial latin squares which are disjoint, occupy the same set of non-empty cells, and whose corresponding rows and columns contain the same sets of symbols as mentioned in this paper.
Abstract: A latin bitrade is a pair of partial latin squares which are disjoint, occupy the same set of non-empty cells, and whose corresponding rows and columns contain the same sets of symbols. This survey paper summarizes the theory of latin bitrades, detailing their applications to critical sets, random latin squares and existence constructions for latin squares.
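The three defining conditions of a latin bitrade can be checked mechanically. The sketch below represents a partial latin square as a dictionary from (row, column) cells to symbols; this representation and the function names are assumptions made for illustration, not notation from the survey.

```python
from collections import defaultdict


def is_partial_latin_square(square):
    """No symbol repeats within a row or within a column."""
    seen = set()
    for (row, col), symbol in square.items():
        if ("r", row, symbol) in seen or ("c", col, symbol) in seen:
            return False
        seen.add(("r", row, symbol))
        seen.add(("c", col, symbol))
    return True


def line_symbols(square, axis):
    """Map each row (axis=0) or column (axis=1) to its set of symbols."""
    lines = defaultdict(set)
    for cell, symbol in square.items():
        lines[cell[axis]].add(symbol)
    return dict(lines)


def is_latin_bitrade(first, second):
    """Check the bitrade conditions for two partial latin squares."""
    if not (is_partial_latin_square(first) and is_partial_latin_square(second)):
        return False
    if first.keys() != second.keys():  # same set of non-empty cells
        return False
    if any(first[cell] == second[cell] for cell in first):  # disjoint
        return False
    # Corresponding rows and columns contain the same sets of symbols.
    return all(line_symbols(first, a) == line_symbols(second, a) for a in (0, 1))


# Smallest non-trivial example: a 2 x 2 subsquare and its symbol swap.
first = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
second = {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 1}
print(is_latin_bitrade(first, second))  # True
```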

Journal ArticleDOI
TL;DR: In this article, the authors proposed a simple scheme to compute the characteristics of the critical topology of the quantum ensemble control landscapes, showing that the set of disjoint critical submanifolds one-to-one corresponds to a finite number of contingency tables that solely depend on the degeneracy structure of the eigenvalues of the initial system density matrix and the observable whose expectation value is to be maximized.
Abstract: The quantum control landscape is defined as the functional that maps the control variables to the expectation values of an observable over the ensemble of quantum systems. Analyzing the topology of such landscapes is important for understanding the origins of the increasing number of laboratory successes in the optimal control of quantum processes. This paper proposes a simple scheme to compute the characteristics of the critical topology of the quantum ensemble control landscapes showing that the set of disjoint critical submanifolds one-to-one corresponds to a finite number of contingency tables that solely depend on the degeneracy structure of the eigenvalues of the initial system density matrix and the observable whose expectation value is to be maximized. The landscape characteristics can be calculated as functions of the table entries, including the dimensions and the numbers of positive and negative eigenvalues of the Hessian quadratic form of each of the connected components of the critical submanifolds. Typical examples are given to illustrate the effectiveness of this method.

Proceedings ArticleDOI
25 Oct 2008
TL;DR: It is shown that any solution can be decomposed into a disjoint collection of multiple-legged spiders, which are then used to re-route flow from terminals to the source via other terminals, achieving an O(k^7 log^2 n)-approximation of the survivable network design problem.
Abstract: In the survivable network design problem (SNDP) the goal is to find a minimum cost subset of edges that satisfies a given set of pairwise connectivity requirements among the vertices. This general network design framework has been studied extensively and is tied to the development of major algorithmic techniques. For the edge-connectivity version of the problem, a 2-approximation algorithm is known for arbitrary pairwise connectivity requirements. However, no non-trivial algorithms are known for its vertex connectivity counterpart. In fact, even highly restricted special cases of the vertex connectivity version remain poorly understood. We study the single-source k-vertex connectivity version of SNDP. We are given a graph G(V,E) with a subset T of terminals and a source vertex s, and the goal is to find a minimum cost subset of edges ensuring that every terminal is k-vertex connected to s. Our main result is an O(k log n)-approximation algorithm for this problem; this improves upon the recent 2^{O(k^2)} log^4 n-approximation. Our algorithm is based on an intuitive rerouting scheme. The analysis relies on a structural result that may be of independent interest: we show that any solution can be decomposed into a disjoint collection of multiple-legged spiders, which are then used to re-route flow from terminals to the source via other terminals. We also obtain the first non-trivial approximation algorithm for the vertex-cost version of the same problem, achieving an O(k^7 log^2 n)-approximation.

Journal ArticleDOI
20 Jan 2008
TL;DR: The necessary and sufficient condition that for a directed graph D=(V,A) with a specified vertex s∈V, there are k arc-disjoint in-trees rooted at s each of which spans V is presented.
Abstract: Given a directed graph D = (V, A) and a set of specified vertices S = {s_1,…,s_d} ⊆ V with |S| = d and a function f: S → N where N denotes the set of natural numbers, we present a necessary and sufficient condition that there exist $\sum_{s_i \in S} f(s_i)$ arc-disjoint in-trees denoted by $T_{i,1}, T_{i,2}, \ldots, T_{i,f(s_i)}$ for every i = 1,…,d such that $T_{i,1}, \ldots, T_{i,f(s_i)}$ are rooted at s_i and each $T_{i,j}$ spans vertices from which s_i is reachable. This generalizes the result of Edmonds [2], i.e., the necessary and sufficient condition that for a directed graph D = (V, A) with a specified vertex s ∈ V, there are k arc-disjoint in-trees rooted at s each of which spans V. Furthermore, we extend another characterization of packing in-trees of Edmonds [1] to the one in our case.

Journal ArticleDOI
TL;DR: It is proved that there is a bijection between $k$-noncrossing and $k$-nonnesting partitions, with a notion of crossing and nesting based on the canonical sequence, which yields new combinatorial interpretations of the Catalan numbers and the Stirling numbers.
Abstract: A set partition of size $n$ is a collection of disjoint blocks $B_1,B_2,\ldots,B_d$ whose union is the set $[n]=\{1,2,\ldots,n\}$. We choose the ordering of the blocks so that they satisfy $\min B_1<\min B_2<\cdots<\min B_d$. We represent such a set partition by a canonical sequence $\pi_1,\pi_2,\ldots,\pi_n$, with $\pi_i=j$ if $i\in B_j$. We say that a partition $\pi$ contains a partition $\sigma$ if the canonical sequence of $\pi$ contains a subsequence that is order-isomorphic to the canonical sequence of $\sigma$. Two partitions $\sigma$ and $\sigma'$ are equivalent if there is a size-preserving bijection between $\sigma$-avoiding and $\sigma'$-avoiding partitions. We determine all the equivalence classes of partitions of size at most $7$. This extends previous work of Sagan, who described the equivalence classes of partitions of size at most $3$. Our classification is largely based on several new infinite families of pairs of equivalent patterns. For instance, we prove that there is a bijection between $k$-noncrossing and $k$-nonnesting partitions, with a notion of crossing and nesting based on the canonical sequence. Our results also yield new combinatorial interpretations of the Catalan numbers and the Stirling numbers.
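The canonical sequence and the containment relation are easy to compute for small cases. The sketch below assumes the blocks are ordered by increasing minimum element, which is how the condition on $\min B_1$ is read here; the function names and the brute-force subsequence search are illustrative assumptions, not the authors' method.

```python
from itertools import combinations


def canonical_sequence(blocks):
    """Canonical sequence of a set partition of {1, ..., n}.

    Blocks are ordered by their minimum element (assumed ordering), and
    position i of the sequence holds the 1-based index of the block
    that contains i.
    """
    ordered = sorted((sorted(block) for block in blocks), key=lambda b: b[0])
    n = sum(len(block) for block in ordered)
    sequence = [0] * n
    for index, block in enumerate(ordered, start=1):
        for element in block:
            sequence[element - 1] = index
    return sequence


def order_isomorphic(a, b):
    """True if a and b have the same pattern of <, = and > comparisons."""
    if len(a) != len(b):
        return False
    return all(
        (a[i] < a[j]) == (b[i] < b[j]) and (a[i] == a[j]) == (b[i] == b[j])
        for i in range(len(a))
        for j in range(len(a))
    )


def contains(pi, sigma):
    """True if some subsequence of pi is order-isomorphic to sigma."""
    return any(order_isomorphic(sub, sigma) for sub in combinations(pi, len(sigma)))


pi = canonical_sequence([{2, 5}, {4}, {1, 3}])
print(pi)                        # [1, 2, 1, 3, 2]
print(contains(pi, [1, 2, 1]))   # True
```

The brute-force search is exponential in the pattern length, which is acceptable only for the small sizes discussed in the abstract.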

Proceedings ArticleDOI
20 Jan 2008
TL;DR: The first polylogarithmic approximation for generalized connectivity attaining a performance guarantee of O(log^2 n log^2 k) is presented; this result improves on the previously known ratio which can be Ω(n) in the worst case.
Abstract: In the generalized connectivity problem, we are given an edge-weighted graph G = (V, E) and a collection D = {(S_1,T_1),…,(S_k,T_k)} of distinct demands; each demand (S_i, T_i) is a pair of disjoint vertex subsets. We say that a subgraph F ⊆ G connects a demand (S_i, T_i) when it contains a path with one endpoint in S_i and the other in T_i. The goal is to identify a minimum weight subgraph that connects all demands in D. Alon et al. (SODA '04) introduced this problem to study online network formation settings and showed that it captures some well-studied problems such as Steiner forest, non-metric facility location, tree multicast, and group Steiner tree. Finding a non-trivial approximation ratio for generalized connectivity was left as an open problem. Our starting point is the first polylogarithmic approximation for generalized connectivity attaining a performance guarantee of O(log^2 n log^2 k). Here n is the number of vertices in G and k is the number of demands. We also prove that the cut-covering relaxation of this problem has an O(log^3 n log^2 k) integrality gap. Building upon the results for generalized connectivity we obtain improved approximation algorithms for two problems that contain generalized connectivity as a special case. For the directed Steiner network problem, we obtain an O(k^{1/2+ε}) approximation, which improves on the currently best performance guarantee of O(k^{2/3}) due to Charikar et al. (SODA '98). For the set connector problem, recently introduced by Fukunaga and Nagamochi (IPCO '07), we present a polylogarithmic approximation; this result improves on the previously known ratio which can be Ω(n) in the worst case.
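Whether a candidate subgraph connects every demand can be tested with a disjoint-set (union-find) structure. The sketch below only checks feasibility and sums edge weights; it is not the approximation algorithm from the paper, and the class, function names, and example graph are assumptions for illustration.

```python
class DisjointSets:
    """Union-find with path halving and union by size."""

    def __init__(self, elements):
        self.parent = {x: x for x in elements}
        self.size = {x: 1 for x in elements}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return
        if self.size[root_a] < self.size[root_b]:
            root_a, root_b = root_b, root_a
        self.parent[root_b] = root_a
        self.size[root_a] += self.size[root_b]


def connects_all(vertices, edges, demands):
    """True if the subgraph with these edges connects every demand (S, T)."""
    components = DisjointSets(vertices)
    for u, v in edges:
        components.union(u, v)
    for sources, targets in demands:
        source_roots = {components.find(s) for s in sources}
        # A demand is connected if some target shares a component with a source.
        if not any(components.find(t) in source_roots for t in targets):
            return False
    return True


def total_weight(edges, weights):
    """Weight of the subgraph formed by `edges`."""
    return sum(weights[edge] for edge in edges)


# Hypothetical instance with two demands.
vertices = range(1, 7)
weights = {(1, 2): 3, (3, 4): 1, (5, 6): 4}
demands = [({1}, {2, 5}), ({3, 5}, {4, 6})]
subgraph = [(1, 2), (3, 4)]
print(connects_all(vertices, subgraph, demands), total_weight(subgraph, weights))
```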

Proceedings Article
13 Jul 2008
TL;DR: A general data clustering algorithm which is based on the asymmetric pairwise measure of Markov random walk hitting time on directed graphs and is able to conquer some limitations of traditional pairwise similarity based methods is presented.
Abstract: In this paper, we present a general data clustering algorithm which is based on the asymmetric pairwise measure of Markov random walk hitting time on directed graphs. Unlike traditional graph based clustering methods, we do not explicitly calculate the pairwise similarities between points. Instead, we form a transition matrix of Markov random walk on a directed graph directly from the data. Our algorithm constructs the probabilistic relations of dependence between local sample pairs by studying the local distributions of the data. Such dependence relations are asymmetric, which is a more general measure of pairwise relations than the similarity measures in traditional undirected graph based methods in that it considers both the local density and geometry of the data. The probabilistic relations of the data naturally result in a transition matrix of Markov random walk. Based on the random walk viewpoint, we compute the expected hitting time for all sample pairs, which explores the global information of the structure of the underlying directed graph. An asymmetric measure based clustering algorithm, called K-destinations, is proposed for partitioning the nodes of the directed graph into disjoint sets. By utilizing the local distribution information of the data and the global structure information of the directed graph, our method is able to conquer some limitations of traditional pairwise similarity based methods. Experimental results are provided to validate the effectiveness of the proposed approach.

Journal ArticleDOI
TL;DR: Havlicek and Saniga as mentioned in this paper made an algebraic geometrical study of a single d-dimensional qudit, with d being any positive integer, based on an intricate relation between the symplectic module of the generalized Pauli group of the qudit and the fine structure of the projective line over the modular ring.
Abstract: As a continuation of our previous work (Havlicek and Saniga 2007 J. Phys. A: Math. Gen. 40 F943–52 (Preprint 0708.4333)) an algebraic geometrical study of a single d-dimensional qudit is made, with d being any positive integer. The study is based on an intricate relation between the symplectic module of the generalized Pauli group of the qudit and the fine structure of the projective line over the (modular) ring . Explicit formulae are given for both the number of generalized Pauli operators commuting with a given one and the number of points of the projective line containing the corresponding vector of . We find, remarkably, that a perp-set is not a set-theoretic union of the corresponding points of the associated projective line unless d is a product of distinct primes. The operators are also seen to be structured into disjoint 'layers' according to the degree of their representing vectors. A brief comparison with some multiple-qudit cases is made.

Journal ArticleDOI
TL;DR: This work presents a linear-time algorithm for computing a triangulation of n points in 2D whose positions are constrained to n disjoint disks of uniform size, after O(nlogn) preprocessing applied to these disks.

Journal ArticleDOI
TL;DR: It is proved that deciding whether there exist k pairwise vertex/edge disjoint properly edge-colored s-t paths/trails in a c-edge-colored graph G^c is NP-complete even for k = 2 and c = Ω(n^2), where n denotes the number of vertices in G^c.

Journal ArticleDOI
TL;DR: Both the NP-completeness of pic for planar cubic graphs and the Max SNP-hardness of pic for cubic graphs are established, and a deterministic polynomial time 5/4-approximation algorithm for finding clique partitions in maximum degree three graphs is presented.

Journal ArticleDOI
24 Jun 2008
TL;DR: In this article, it was shown that a continuous map f acting on a compact metric space (X, p) with a weaker form of specification property and with a pair of distal points is distributionally chaotic in a very strong sense.
Abstract: Our main result shows that a continuous map f acting on a compact metric space (X, p) with a weaker form of specification property and with a pair of distal points is distributionally chaotic in a very strong sense. Strictly speaking, there is a distributionally scrambled set S dense in X which is the union of disjoint sets homeomorphic to Cantor sets so that, for any two distinct points u, v ∈ S, the upper distribution function is identically 1 and the lower distribution function is zero at some ε > 0. As a consequence, we describe a class of maps with a scrambled set of full Lebesgue measure in the case when X is the k-dimensional cube I^k. If X = I, then we can even construct scrambled sets whose complements have zero Hausdorff dimension.

Journal ArticleDOI
TL;DR: The Matching Composition Network (MCN) is a family of networks in which two components are connected by a perfect matching, and the globally two-equal-disjoint path cover property of MCN is considered.

Proceedings ArticleDOI
08 Jun 2008
TL;DR: Preliminary experimental results show that the capacity of bi-decomposition can be scaled up substantially to handle large designs, and interpolation and incremental SAT solving are proposed.
Abstract: Boolean function bi-decomposition is a fundamental operation in logic synthesis. A function f(X) is bi-decomposable under a variable partition XA, XB, XC on X if it can be written as h(fA(XA, XC), fB(XB, XC)) for some functions h, fA, and fB. The quality of a bi-decomposition is mainly determined by its variable partition. A preferred decomposition is disjoint, i.e. XC = ∅, and balanced, i.e. |XA| ≈ |XB|. Finding such a good decomposition reduces communication and circuit complexity, and yields simple physical design solutions. Prior BDD-based methods may not be scalable to decompose large functions due to the memory explosion problem. Also, as decomposability is checked under a fixed variable partition, searching for a good or feasible partition may run through costly enumeration that requires separate and independent decomposability checks. This paper proposes a solution to these difficulties using interpolation and incremental SAT solving. Preliminary experimental results show that the capacity of bi-decomposition can be scaled up substantially to handle large designs.
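The definition of bi-decomposability can be made concrete with a truth-table check. The sketch below verifies a given decomposition by exhaustive enumeration, which is exactly the kind of brute force the paper aims to avoid; it does not use interpolation or SAT solving. The function name, argument layout, and the example function are assumptions for illustration.

```python
from itertools import product


def is_bi_decomposition(f, h, f_a, f_b, num_a, num_b, num_c):
    """Check f(XA, XB, XC) == h(f_a(XA, XC), f_b(XB, XC)) on every input.

    Each of f, f_a and f_b receives its variable groups as tuples of
    booleans; num_a, num_b and num_c are the sizes of XA, XB and XC.
    """
    for bits in product((False, True), repeat=num_a + num_b + num_c):
        x_a = bits[:num_a]
        x_b = bits[num_a:num_a + num_b]
        x_c = bits[num_a + num_b:]
        if f(x_a, x_b, x_c) != h(f_a(x_a, x_c), f_b(x_b, x_c)):
            return False
    return True


# Example: f = (a1 AND a2) OR (b1 AND b2) with XC empty, so the
# decomposition is disjoint, and |XA| = |XB| = 2, so it is balanced.
def f(x_a, x_b, x_c):
    return (x_a[0] and x_a[1]) or (x_b[0] and x_b[1])


def f_a(x_a, x_c):
    return x_a[0] and x_a[1]


def f_b(x_b, x_c):
    return x_b[0] and x_b[1]


def h_or(left, right):
    return left or right


def h_and(left, right):
    return left and right


print(is_bi_decomposition(f, h_or, f_a, f_b, 2, 2, 0))   # True
print(is_bi_decomposition(f, h_and, f_a, f_b, 2, 2, 0))  # False
```

The enumeration visits 2^(|XA| + |XB| + |XC|) assignments, so it is only practical for small functions.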

Patent
23 May 2008
TL;DR: A method for identifying emerging concepts in unstructured text streams comprises: selecting a subset V of documents from a set U of documents; generating at least one Boolean combination of terms that partitions the set U into a plurality of categories that represent a generalized, statistically based model of the selected subset V wherein the categories are disjoint inasmuch as each document of U is included in only one category of the partition.
Abstract: A method for identifying emerging concepts in unstructured text streams comprises: selecting a subset V of documents from a set U of documents; generating at least one Boolean combination of terms that partitions the set U into a plurality of categories that represent a generalized, statistically based model of the selected subset V wherein the categories are disjoint inasmuch as each document of U is included in only one category of the partition; and generating a descriptive label for each of the disjoint categories from the Boolean combination of terms for that category.