Conference

SIAM International Conference on Data Mining

About: The SIAM International Conference on Data Mining is an academic conference. Its publications are mainly in the areas of cluster analysis and feature selection. Over its lifetime, the conference has published 1,807 papers, which have received 77,280 citations.

Topics: Cluster analysis, Feature selection, Correlation clustering, Graph (abstract data type), Support vector machine

Papers

Proceedings Article

Clustering with Bregman Divergences

Arindam Banerjee¹, Srujana Merugu¹, Inderjit S. Dhillon¹, Joydeep Ghosh¹•Institutions (1)

University of Texas at Austin¹

01 Dec 2005

TL;DR: This paper proposes and analyzes parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergences, and shows that there is a bijection between regular exponential families and a large class of Bregman divergences, called regular Bregman divergences.

Abstract: A wide variety of distortion functions, such as squared Euclidean distance, Mahalanobis distance, Itakura-Saito distance and relative entropy, have been used for clustering. In this paper, we propose and analyze parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergences. The proposed algorithms unify centroid-based parametric clustering approaches, such as classical kmeans, the Linde-Buzo-Gray (LBG) algorithm and information-theoretic clustering, which arise by special choices of the Bregman divergence. The algorithms maintain the simplicity and scalability of the classical kmeans algorithm, while generalizing the method to a large class of clustering loss functions. This is achieved by first posing the hard clustering problem in terms of minimizing the loss in Bregman information, a quantity motivated by rate distortion theory, and then deriving an iterative algorithm that monotonically decreases this loss. In addition, we show that there is a bijection between regular exponential families and a large class of Bregman divergences, that we call regular Bregman divergences. This result enables the development of an alternative interpretation of an efficient EM scheme for learning mixtures of exponential family distributions, and leads to a simple soft clustering algorithm for regular Bregman divergences. Finally, we discuss the connection between rate distortion theory and Bregman clustering and present an information theoretic analysis of Bregman clustering algorithms in terms of a trade-off between compression and loss in Bregman information.

1,723 citations
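
The hard clustering scheme described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the explicit initial centroids, and the two example divergences are assumptions made for the sketch. It relies on the known property that, for any Bregman divergence, the best representative of a cluster is the arithmetic mean of its members, so only the assignment step depends on the chosen divergence.

```python
import math

def squared_euclidean(x, mu):
    """Bregman divergence generated by phi(x) = ||x||^2 (classical kmeans)."""
    return sum((a - b) ** 2 for a, b in zip(x, mu))

def generalized_i_divergence(x, mu):
    """Bregman divergence generated by phi(x) = sum x log x.
    Assumes all coordinates are strictly positive."""
    return sum(a * math.log(a / b) - a + b for a, b in zip(x, mu))

def bregman_hard_cluster(points, initial_centroids, divergence, max_iters=100):
    """Hypothetical helper: alternate assignment and mean-update steps.
    Initial centroids are passed in explicitly to keep the sketch deterministic."""
    centroids = [list(c) for c in initial_centroids]
    k = len(centroids)
    labels = None
    for _ in range(max_iters):
        # Assignment step: nearest centroid under the chosen divergence.
        new_labels = [
            min(range(k), key=lambda j: divergence(p, centroids[j]))
            for p in points
        ]
        # Update step: the arithmetic mean, whatever the divergence is.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                dim = len(members[0])
                centroids[j] = [
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids
```

Swapping `squared_euclidean` for `generalized_i_divergence` changes only how points are assigned; the centroid update stays the same.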

Proceedings Article

Learning from Time-Changing Data with Adaptive Windowing

Albert Bifet¹, Ricard Gavaldà¹•Institutions (1)

Polytechnic University of Catalonia¹

01 Jan 2007

TL;DR: A new approach for dealing with distribution change and concept drift when learning from data sequences that may vary with time is presented, using sliding windows whose size is recomputed online according to the rate of change observed from the data in the window itself.

Abstract: We present a new approach for dealing with distribution change and concept drift when learning from data sequences that may vary with time. We use sliding windows whose size, instead of being fixed a priori, is recomputed online according to the rate of change observed from the data in the window itself: The window will grow automatically when the data is stationary, for greater accuracy, and will shrink automatically when change is taking place, to discard stale data. This delivers the user or programmer from having to guess a time-scale for change. Contrary to many related works, we provide rigorous guarantees of performance, as bounds on the rates of false positives and false negatives. In fact, for some change structures, we can formally show that the algorithm automatically adjusts the window to a statistically optimal length. Using ideas from data stream algorithmics, we develop a time- and memory-efficient version of this algorithm, called ADWIN2. We show how to incorporate this strategy easily into

1,267 citations
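
The grow-and-shrink behaviour described in the abstract above can be sketched as follows. This is a simplified illustration only, not ADWIN2: it checks every split point of the window on each insertion, so it has none of the time and memory efficiency the paper is about. The class name, the `delta` parameter, and the Hoeffding-style threshold are assumptions of the sketch, and inputs are assumed to lie in [0, 1].

```python
import math
from collections import deque

class AdaptiveWindow:
    """Hypothetical, naive adaptive window over values assumed to be in [0, 1]."""

    def __init__(self, delta=0.01):
        self.delta = delta      # confidence parameter (assumed name)
        self.window = deque()   # oldest value on the left

    def add(self, x):
        """Append x; drop old values while a change is detected.
        Returns True if the window shrank."""
        self.window.append(x)
        shrunk = False
        while self._drop_oldest_if_changed():
            shrunk = True
        return shrunk

    def _drop_oldest_if_changed(self):
        n = len(self.window)
        if n < 2:
            return False
        total = sum(self.window)
        delta_prime = self.delta / n
        head_sum = 0.0
        for i, x in enumerate(self.window, start=1):
            if i == n:
                break  # the newer sub-window must be non-empty
            head_sum += x
            n0, n1 = i, n - i
            mean0 = head_sum / n0
            mean1 = (total - head_sum) / n1
            m = 1.0 / (1.0 / n0 + 1.0 / n1)  # harmonic-style combination of sizes
            eps = math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))
            if abs(mean0 - mean1) > eps:
                self.window.popleft()  # older data looks stale
                return True
        return False

    def mean(self):
        return sum(self.window) / len(self.window)
```

While the input is stationary the window only grows. Once the two halves of some split differ by more than the threshold, the oldest values are discarded until no split looks significantly different.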

Proceedings Article

R-MAT: A Recursive Model for Graph Mining

Deepayan Chakrabarti, Yiping Zhan, Christos Faloutsos¹•Institutions (1)

Carnegie Mellon University¹

01 Jan 2004

TL;DR: A simple, parsimonious model, the “recursive matrix” (R-MAT) model, which can quickly generate realistic graphs, capturing the essence of each graph in only a few parameters is proposed.

Abstract: What does a ‘normal’ computer (or social) network look like? How can we spot ‘abnormal’ sub-networks in the Internet, or web graph? The answer to such questions is vital for outlier detection (terrorist networks, or illegal money-laundering rings), forecasting, and simulations (“how will a computer virus spread?”). The heart of the problem is finding the properties of real graphs that seem to persist over multiple disciplines. We list such “laws” and, more importantly, we propose a simple, parsimonious model, the “recursive matrix” (R-MAT) model, which can quickly generate realistic graphs, capturing the essence of each graph in only a few parameters. Contrary to existing generators, our model can trivially generate weighted, directed and bipartite graphs; it subsumes the celebrated Erdős-Rényi model as a special case; it can match the power law behaviors, as well as the deviations from them (like the “winner does not take it all” model of Pennock et al. [20]). We present results on multiple, large real graphs, where we show that our parameter fitting algorithm (AutoMAT-fast) fits them very well.

1,248 citations
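
The recursive-matrix idea in the abstract above can be illustrated with a short generator. This is a sketch under stated assumptions, not the paper's code: the quadrant probabilities `a`, `b`, `c`, `d`, their default values, and the function names are chosen here for illustration, and the parameter fitting step (AutoMAT-fast) is not covered. Each edge is placed by repeatedly picking one quadrant of the adjacency matrix until a single cell remains.

```python
import random

def rmat_edge(scale, a, b, c, rng):
    """Pick one cell of a 2**scale x 2**scale adjacency matrix by
    descending `scale` levels of quadrants. The fourth quadrant has
    probability 1 - a - b - c."""
    row = col = 0
    for _ in range(scale):
        row <<= 1
        col <<= 1
        r = rng.random()
        if r < a:
            pass                 # top-left quadrant
        elif r < a + b:
            col |= 1             # top-right quadrant
        elif r < a + b + c:
            row |= 1             # bottom-left quadrant
        else:
            row |= 1             # bottom-right quadrant
            col |= 1
    return row, col

def rmat_edges(scale, n_edges, a=0.45, b=0.15, c=0.15, d=0.25, seed=0):
    """Return a list of directed edges (duplicates are kept in this sketch)."""
    if min(a, b, c, d) < 0 or abs(a + b + c + d - 1.0) > 1e-9:
        raise ValueError("a, b, c, d must be non-negative and sum to 1")
    rng = random.Random(seed)
    return [rmat_edge(scale, a, b, c, rng) for _ in range(n_edges)]
```

With `a = b = c = d = 0.25` every cell is equally likely, which corresponds to the uniform random-graph special case the abstract mentions. Skewed values concentrate edges on a few nodes.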

Proceedings Article

Derivative Dynamic Time Warping.

Eamonn Keogh¹, Michael J. Pazzani¹•Institutions (1)

University of California, Irvine¹

01 Jan 2001

TL;DR: Dynamic time warping (DTW) is a technique for efficiently warping the time axis of sequences that have approximately the same overall component shapes but whose shapes do not line up along the X-axis.

Abstract: Time series are a ubiquitous form of data occurring in virtually every scientific discipline. A common task with time series data is comparing one sequence with another. In some domains a very simple distance measure, such as Euclidean distance, will suffice. However, it is often the case that two sequences have approximately the same overall component shapes, but these shapes do not line up along the X-axis. Figure 1 shows this with a simple example. In order to find the similarity between such sequences, or as a preprocessing step before averaging them, we must "warp" the time axis of one (or both) sequences to achieve a better alignment. Dynamic time warping (DTW) is a technique for efficiently achieving this warping. In addition to data mining (Keogh & Pazzani 2000, Yi et al. 1998, Berndt & Clifford 1994), DTW has been used in gesture recognition (Gavrila & Davis 1995), robotics (Schmill et al. 1999), speech processing (Rabiner & Juang 1993), manufacturing (Gollmer & Posten 1995) and medicine (Caiani et al. 1998).

1,131 citations
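
The warping described in the abstract above is usually computed with dynamic programming. The sketch below shows classic DTW with a squared point-wise cost, plus a derivative transform in the spirit of the paper's title. The function names are assumptions, and the derivative estimate (the average of the slope to the left neighbour and the slope through both neighbours) is the form commonly cited for derivative DTW; treat it as an assumption rather than a confirmed detail of this paper.

```python
def dtw_distance(q, c):
    """Classic DTW cost between sequences q and c (squared point-wise cost,
    no warping-window constraint)."""
    n, m = len(q), len(c)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning q[:i] with c[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (q[i - 1] - c[j - 1]) ** 2
            cost[i][j] = d + min(
                cost[i - 1][j],      # q advances, c repeats
                cost[i][j - 1],      # c advances, q repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]

def derivative_estimate(q):
    """Assumed derivative estimate for interior points; endpoints are dropped."""
    return [
        ((q[i] - q[i - 1]) + (q[i + 1] - q[i - 1]) / 2) / 2
        for i in range(1, len(q) - 1)
    ]

def derivative_dtw_distance(q, c):
    """DTW applied to the estimated derivatives instead of the raw values."""
    return dtw_distance(derivative_estimate(q), derivative_estimate(c))
```

Comparing derivatives rather than raw values makes the alignment depend on local shape, so two sequences that differ only by a constant vertical offset are treated as identical.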

Proceedings Article

CHARM: An Efficient Algorithm for Closed Itemset Mining

Mohammed J. Zaki¹, Ching-Jiu Hsiao•Institutions (1)

Rensselaer Polytechnic Institute¹

01 Jan 2002

TL;DR: CHARM is an efficient algorithm for mining all frequent closed itemsets that enumerates closed sets using a dual itemset-tidset search tree, using an efficient hybrid search that skips many levels, and uses a technique called diffsets to reduce the memory footprint of intermediate computations.

Abstract: The set of frequent closed itemsets uniquely determines the exact frequency of all itemsets, yet it can be orders of magnitude smaller than the set of all frequent itemsets. In this paper we present CHARM, an efficient algorithm for mining all frequent closed itemsets. It enumerates closed sets using a dual itemset-tidset search tree, using an efficient hybrid search that skips many levels. It also uses a technique called diffsets to reduce the memory footprint of intermediate computations. Finally it uses a fast hash-based approach to remove any “non-closed” sets found during computation. An extensive experimental evaluation on a number of real and synthetic databases shows that CHARM significantly outperforms previous methods. It is also linearly scalable in the number of transactions.

1,068 citations
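
To make the notion of a frequent closed itemset in the abstract above concrete, here is a brute-force sketch. It is deliberately not CHARM: it enumerates every candidate itemset and is exponential in the number of items, with no search tree, diffsets, or hash-based pruning. It borrows only the vertical itemset-tidset view, and the function name and input format are assumptions. An itemset is closed when it equals the intersection of all transactions that contain it.

```python
from itertools import combinations

def closed_frequent_itemsets(transactions, min_support):
    """Return {closed itemset (frozenset): support} by exhaustive search.
    `transactions` is a list of iterables of hashable, sortable items."""
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    # Vertical layout: item -> set of ids of the transactions containing it.
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)
    items = sorted(tidsets)
    closed = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            # Tidset of an itemset = intersection of its items' tidsets.
            tids = set.intersection(*(tidsets[i] for i in itemset))
            if len(tids) < min_support:
                continue
            # Closure = items shared by every supporting transaction.
            closure = set.intersection(*(set(transactions[t]) for t in tids))
            if closure == set(itemset):
                closed[frozenset(itemset)] = len(tids)
    return closed
```

For example, on the six transactions `["ACTW", "CDW", "ACTW", "ACDW", "ACDTW", "CDT"]` with a minimum support of 3, the itemset `A` is frequent but not closed, because every transaction containing `A` also contains `C` and `W`.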

Performance

Metrics

Papers: 1,807

Citations: 77,280

No. of papers from the Conference in previous years
Year	Papers
2021	50
2020	63
2019	85
2018	87
2017	98
2016	103