Conference

Knowledge Discovery and Data Mining

About: Knowledge Discovery and Data Mining is an academic conference. It publishes mainly in the areas of cluster analysis and knowledge extraction. Over its lifetime, the conference has published 6,703 papers, which have received 525,894 citations.

Papers

Open access | Proceedings Article
02 Aug 1996
Abstract: Clustering algorithms are attractive for the task of class identification in spatial databases. However, the application to large spatial databases raises the following requirements for clustering algorithms: minimal requirements of domain knowledge to determine the input parameters, discovery of clusters with arbitrary shape and good efficiency on large databases. The well-known clustering algorithms offer no solution to the combination of these requirements. In this paper, we present the new clustering algorithm DBSCAN relying on a density-based notion of clusters which is designed to discover clusters of arbitrary shape. DBSCAN requires only one input parameter and supports the user in determining an appropriate value for it. We performed an experimental evaluation of the effectiveness and efficiency of DBSCAN using synthetic data and real data of the SEQUOIA 2000 benchmark. The results of our experiments demonstrate that (1) DBSCAN is significantly more effective in discovering clusters of arbitrary shape than the well-known algorithm CLARANS, and that (2) DBSCAN outperforms CLARANS by a factor of more than 100 in terms of efficiency.

Topics: OPTICS algorithm (77%), SUBCLU (76%), DBSCAN (72%)

14,552 Citations
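
The density-based notion of clusters described in the DBSCAN abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a brute-force neighborhood search, and the names `dbscan`, `region_query`, `eps`, and `min_pts` are chosen here for illustration. The abstract mentions a single input parameter; this sketch exposes the neighborhood radius `eps` and gives the density threshold `min_pts` a default value, which is an assumption.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts=4):
    """Assign a cluster label (0, 1, ...) or NOISE to every point.

    A point is treated as a core point when its eps-neighborhood holds at
    least min_pts points; clusters grow outward from core points.
    """
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster_id
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is a core point, keep expanding
        cluster_id += 1
    return labels
```

Calling `dbscan(points, eps=1.5)` on two well-separated groups of four points plus one isolated point would label the groups as two clusters and the isolated point as noise. The brute-force `region_query` makes this sketch quadratic in the number of points, so it says nothing about the efficiency results reported in the abstract.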


Open access | Proceedings Article
01 Jan 1996
Abstract: Clustering algorithms are attractive for the task of class identification in spatial databases. However, the application to large spatial databases raises the following requirements for clustering algorithms: minimal requirements of domain knowledge to determine the input parameters, discovery of clusters with arbitrary shape and good efficiency on large databases. The well-known clustering algorithms offer no solution to the combination of these requirements. In this paper, we present the new clustering algorithm DBSCAN relying on a density-based notion of clusters which is designed to discover clusters of arbitrary shape. DBSCAN requires only one input parameter and supports the user in determining an appropriate value for it. We performed an experimental evaluation of the effectiveness and efficiency of DBSCAN using synthetic data and real data of the SEQUOIA 2000 benchmark. The results of our experiments demonstrate that (1) DBSCAN is significantly more effective in discovering clusters of arbitrary shape than the well-known algorithm CLARANS, and that (2) DBSCAN outperforms CLARANS by a factor of more than 100 in terms of efficiency.

Topics: OPTICS algorithm (76%), SUBCLU (75%), DBSCAN (72%)

14,280 Citations


Open access | Proceedings Article | DOI: 10.1145/2939672.2939785
Tianqi Chen, Carlos Guestrin
13 Aug 2016
Abstract: Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.

Topics: Incremental decision tree (64%), Gradient boosting (61%), ID3 algorithm (60%)

10,428 Citations


Open access | Proceedings Article | DOI: 10.1145/1014052.1014073
Minqing Hu, Bing Liu
22 Aug 2004
Abstract: Merchants selling products on the Web often ask their customers to review the products that they have purchased and the associated services. As e-commerce is becoming more and more popular, the number of customer reviews that a product receives grows rapidly. For a popular product, the number of reviews can be in hundreds or even thousands. This makes it difficult for a potential customer to read them to make an informed decision on whether to purchase the product. It also makes it difficult for the manufacturer of the product to keep track and to manage customer opinions. For the manufacturer, there are additional difficulties because many merchant sites may sell the same product and the manufacturer normally produces many kinds of products. In this research, we aim to mine and to summarize all the customer reviews of a product. This summarization task is different from traditional text summarization because we only mine the features of the product on which the customers have expressed their opinions and whether the opinions are positive or negative. We do not summarize the reviews by selecting a subset or rewrite some of the original sentences from the reviews to capture the main points as in the classic text summarization. Our task is performed in three steps: (1) mining product features that have been commented on by customers; (2) identifying opinion sentences in each review and deciding whether each opinion sentence is positive or negative; (3) summarizing the results. This paper proposes several novel techniques to perform these tasks. Our experimental results using reviews of a number of products sold online demonstrate the effectiveness of the techniques.

6,565 Citations


Proceedings Article | DOI: 10.1145/2939672.2939778
13 Aug 2016
Abstract: Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is, however, quite important in assessing trust, which is fundamental if one plans to take action based on a prediction, or when choosing whether to deploy a new model. Such understanding also provides insights into the model, which can be used to transform an untrustworthy model or prediction into a trustworthy one. In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. We also propose a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks). We show the utility of explanations via novel experiments, both simulated and with human subjects, on various scenarios that require trust: deciding if one should trust a prediction, choosing between models, improving an untrustworthy classifier, and identifying why a classifier should not be trusted.

6,284 Citations
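
The phrase "learning an interpretable model locally around the prediction" in the abstract above can be made concrete with a toy sketch. The following is not the LIME library or the authors' algorithm: it is a one-dimensional, locally weighted linear surrogate for an arbitrary black-box function, with deterministic perturbations instead of random sampling. The function name `local_linear_surrogate` and the parameters `width`, `n_samples`, and `kernel_width` are hypothetical and chosen only for illustration.

```python
from math import exp


def local_linear_surrogate(black_box, x0, width=1.0, n_samples=21, kernel_width=0.5):
    """Fit a weighted straight line that approximates black_box near x0.

    Returns (slope, intercept) of the surrogate. The slope can be read as a
    simple local explanation of how the output reacts to the input near x0.
    """
    # Evenly spaced perturbations of x0 in [x0 - width, x0 + width]
    # (an assumption made here to keep the sketch deterministic).
    xs = [x0 - width + 2 * width * k / (n_samples - 1) for k in range(n_samples)]
    ys = [black_box(x) for x in xs]

    # Proximity kernel: perturbations closer to x0 get a larger weight.
    ws = [exp(-((x - x0) ** 2) / kernel_width ** 2) for x in xs]

    # Closed-form weighted least squares for a line y = slope * x + intercept.
    w_sum = sum(ws)
    x_mean = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_mean = sum(w * y for w, y in zip(ws, ys)) / w_sum
    cov = sum(w * (x - x_mean) * (y - y_mean) for w, x, y in zip(ws, xs, ys))
    var = sum(w * (x - x_mean) ** 2 for w, x in zip(ws, xs))
    slope = cov / var
    intercept = y_mean - slope * x_mean
    return slope, intercept
```

For a black box such as `lambda x: x * x` queried at `x0 = 3.0`, the fitted slope comes out close to 6, the local rate of change at that point. The sketch covers only the local-surrogate idea; the submodular selection of representative explanations mentioned in the abstract is not shown.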


Performance
Metrics
No. of papers from the Conference in previous years
Year  Papers
2021  442
2020  433
2019  384
2018  321
2017  276
2016  313

Top Attributes

Conference's top 5 most impactful authors

Jiawei Han

116 papers, 13.9K citations

Christos Faloutsos

76 papers, 13.7K citations

Philip S. Yu

68 papers, 7.2K citations

Hui Xiong

55 papers, 3.1K citations

Jian Pei

34 papers, 2.8K citations

Network Information
Related Conferences (5)
International Conference on Data Mining

6.4K papers, 166.4K citations

96% related
Conference on Information and Knowledge Management

7K papers, 191.8K citations

94% related
Web Search and Data Mining

1.2K papers, 72.4K citations

93% related
European Conference on Machine Learning

2.7K papers, 94.6K citations

93% related