
Pacific-Asia Conference on Knowledge Discovery and Data Mining 

About: The Pacific-Asia Conference on Knowledge Discovery and Data Mining is an academic conference. It publishes mainly in the areas of computer science and cluster analysis. Over its lifetime, the conference has published 1,889 papers, which have received 23,058 citations.


Papers
Book Chapter (DOI)
14 Apr 2013
TL;DR: This work proposes a theoretically and practically improved density-based, hierarchical clustering method that provides a clustering hierarchy from which a simplified tree of significant clusters can be constructed, together with a novel cluster stability measure.
Abstract: We propose a theoretically and practically improved density-based, hierarchical clustering method, providing a clustering hierarchy from which a simplified tree of significant clusters can be constructed. For obtaining a “flat” partition consisting of only the most significant clusters (possibly corresponding to different density thresholds), we propose a novel cluster stability measure, formalize the problem of maximizing the overall stability of selected clusters, and formulate an algorithm that computes an optimal solution to this problem. We demonstrate that our approach outperforms the current, state-of-the-art, density-based clustering methods on a wide variety of real world data.
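The core building block of the hierarchy described above can be sketched with standard NumPy/SciPy tools: single-linkage clustering over the mutual reachability distance, which is the maximum of two points' core distances and their actual distance. This is a simplified illustration under our own parameter choices, not the authors' full algorithm, which additionally condenses the hierarchy and selects clusters by maximizing stability.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist, squareform

def mutual_reachability(X, min_samples=3):
    """Mutual reachability distance: max(core_k(a), core_k(b), d(a, b))."""
    D = squareform(pdist(X))
    # core distance = distance to the min_samples-th nearest neighbour
    core = np.sort(D, axis=1)[:, min_samples]
    mr = np.maximum(D, np.maximum.outer(core, core))
    np.fill_diagonal(mr, 0.0)
    return mr

# two well-separated synthetic blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)),
               rng.normal(5, 0.3, (20, 2))])

# single linkage over mutual reachability yields the density hierarchy
Z = linkage(squareform(mutual_reachability(X), checks=False), method="single")
labels = fcluster(Z, t=2, criterion="maxclust")
```

Cutting the resulting dendrogram at different levels corresponds to clusterings at different density thresholds, which is exactly the hierarchy the paper's stability measure selects from.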

1,132 citations

Book Chapter (DOI)
26 May 2004
TL;DR: This work presents a new technique for combining text features with features indicating relationships between classes, usable with any discriminative algorithm; the new methods beat the accuracy of existing methods with statistically significant improvements.
Abstract: In this paper we present methods of enhancing existing discriminative classifiers for multi-labeled predictions. Discriminative methods like support vector machines perform very well for uni-labeled text classification tasks. Multi-labeled classification is a harder task that has received relatively less attention. In the multi-labeled setting, classes are often related to each other or part of an is-a hierarchy. We present a new technique for combining text features and features indicating relationships between classes, which can be used with any discriminative algorithm. We also present two enhancements to the margin of SVMs for building better models in the presence of overlapping classes. We present results of experiments on real world text benchmark datasets. Our new methods beat the accuracy of existing methods with statistically significant improvements.
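The baseline setting the paper builds on, discriminative one-vs-rest SVMs over text features, can be sketched with scikit-learn. This illustrates the multi-label setup only, not the authors' class-relationship features or margin enhancements; the toy documents and label sets are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# toy corpus: each document may carry several labels at once
docs = ["stocks fell sharply today",
        "the team won the championship game",
        "market rally lifts tech stocks",
        "player injured before the game"]
labels = [{"finance"}, {"sports"}, {"finance", "tech"}, {"sports"}]

# binarize the label sets into a documents x classes indicator matrix
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)
X = TfidfVectorizer().fit_transform(docs)

# one independent binary SVM per class (one-vs-rest decomposition)
clf = OneVsRestClassifier(LinearSVC()).fit(X, Y)
pred = mlb.inverse_transform(clf.predict(X))
```

The paper's contribution is to augment the per-class feature vectors with features derived from related classes, rather than treating each binary problem as fully independent as this sketch does.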

746 citations

Book Chapter (DOI)
18 Apr 2000
TL;DR: A novel data structure, called the Web access pattern tree, or WAP-tree for short, is developed for efficient mining of access patterns from pieces of logs.
Abstract: With the explosive growth of data available on the World Wide Web, discovery and analysis of useful information from the World Wide Web becomes a practical necessity. A Web access pattern, which is a sequence of accesses pursued by users frequently, is a kind of interesting and useful knowledge in practice. In this paper, we study the problem of mining access patterns from Web logs efficiently. A novel data structure, called the Web access pattern tree, or WAP-tree for short, is developed for efficient mining of access patterns from pieces of logs. The Web access pattern tree stores highly compressed, critical information for access pattern mining and facilitates the development of novel algorithms for mining access patterns in large sets of log pieces. Our algorithm can find access patterns from Web logs quite efficiently. The experimental and performance studies show that our method is in general an order of magnitude faster than conventional methods.
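The central compression idea, folding a set of access sequences into a count-annotated prefix tree, can be sketched as follows. This is a simplified illustration: the real WAP-tree additionally maintains header links between nodes carrying the same event label to support conditional pattern mining, which this sketch omits, and the class and function names are ours.

```python
from collections import defaultdict

class WAPNode:
    """Node in a simplified Web access pattern tree: a prefix tree with counts."""
    def __init__(self):
        self.count = 0
        self.children = defaultdict(WAPNode)

def build_wap_tree(sequences):
    """Insert each access sequence along a root path, incrementing counts."""
    root = WAPNode()
    for seq in sequences:
        node = root
        for event in seq:
            node = node.children[event]
            node.count += 1
    return root

# toy session logs: each string is one user's sequence of page accesses
logs = [list("abac"), list("abcac"), list("babac"), list("abacc")]
tree = build_wap_tree(logs)
# number of sessions beginning with the prefix a -> b
print(tree.children["a"].children["b"].count)  # prints 3
```

Shared prefixes are stored once with a count, which is what makes the structure "highly compressed" relative to scanning the raw log pieces.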

572 citations

Book Chapter (DOI)
16 Apr 2001
TL;DR: In this article, a Weight Adjusted k-Nearest Neighbor (WAKNN) classification method is proposed that learns feature weights using a greedy hill-climbing technique, along with two performance optimizations that improve computational performance by a few orders of magnitude without compromising classification quality.
Abstract: Text categorization presents unique challenges due to the large number of attributes present in the data set, the large number of training samples, attribute dependency, and multi-modality of categories. Existing classification techniques have limited applicability to data sets of this nature. In this paper, we present a Weight Adjusted k-Nearest Neighbor (WAKNN) classification method that learns feature weights based on a greedy hill climbing technique. We also present two performance optimizations of WAKNN that improve the computational performance by a few orders of magnitude without compromising classification quality. We experimentally evaluated WAKNN on 52 document data sets from a variety of domains and compared its performance against several classification algorithms, such as C4.5, RIPPER, Naive-Bayesian, PEBLS and VSM. Experimental results on these data sets confirm that WAKNN consistently outperforms the other existing classification algorithms.
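The core classification rule, k-nearest-neighbor voting under a feature-weighted similarity, can be sketched as follows. This omits the greedy hill-climbing loop that WAKNN uses to learn the weight vector; the function and parameter names are ours, and the toy data is invented.

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x, w, k=3):
    """Classify x by majority vote among the k most similar training points
    under a weighted cosine similarity (weights rescale each feature)."""
    Xw, xw = X_train * w, x * w
    sims = (Xw @ xw) / (np.linalg.norm(Xw, axis=1) * np.linalg.norm(xw) + 1e-12)
    nearest = np.argsort(sims)[-k:]
    vals, counts = np.unique(y_train[nearest], return_counts=True)
    return vals[np.argmax(counts)]

# toy 2-feature data: class 0 dominated by feature 0, class 1 by feature 1
X_train = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
y_train = np.array([0, 0, 1, 1])
pred = weighted_knn_predict(X_train, y_train, np.array([0.8, 0.2]), w=np.ones(2))
print(pred)  # predicts class 0
```

In WAKNN proper, the weight vector w is not fixed at ones: hill climbing perturbs one weight at a time and keeps the change if a classification-quality objective on the training set improves.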

446 citations

Book Chapter (DOI)
26 May 2004
TL;DR: It is argued that a test has low replicability if its outcome strongly depends on the particular random partitioning of the data that is used to perform it.
Abstract: Empirical research in learning algorithms for classification tasks generally requires the use of significance tests. The quality of a test is typically judged on Type I error (how often the test indicates a difference when it should not) and Type II error (how often it indicates no difference when it should). In this paper we argue that the replicability of a test is also of importance. We say that a test has low replicability if its outcome strongly depends on the particular random partitioning of the data that is used to perform it. We present empirical measures of replicability and use them to compare the performance of several popular tests in a realistic setting involving standard learning algorithms and benchmark datasets. Based on our results we give recommendations on which test to use.
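An empirical replicability measure of this kind can be sketched as follows: repeat a random train/test partitioning, record which of two classifiers wins on each repetition, and take the fraction of repetitions that agree with the majority outcome. The classifiers, dataset, and 20-repetition count here are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

outcomes = []
for seed in range(20):
    # each repetition uses a different random partitioning of the data
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.5, random_state=seed)
    acc_knn = KNeighborsClassifier().fit(Xtr, ytr).score(Xte, yte)
    acc_tree = DecisionTreeClassifier(random_state=0).fit(Xtr, ytr).score(Xte, yte)
    outcomes.append(acc_knn > acc_tree)

# replicability: fraction of repetitions agreeing with the majority outcome
rep = max(np.mean(outcomes), 1 - np.mean(outcomes))
```

A value near 1.0 means the comparison's outcome barely depends on the partitioning; a value near 0.5 is the low-replicability situation the paper warns about.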

345 citations

Performance Metrics
No. of papers from the Conference in previous years
Year    Papers
2023    131
2022    124
2021    174
2020    154
2019    167
2018    196