Journal•

Sigkdd Explorations

About: Sigkdd Explorations is an academic journal. The journal publishes majorly in the area(s): Knowledge extraction & Web mining. Over the lifetime, 485 publications have been published receiving 69089 citations.

...read moreread less

Topics: Knowledge extraction, Web mining, Data stream mining, Cluster analysis, Association rule learning ...read more

Papers published on a yearly basis

Papers

PDF

Open Access

More filters

Journal Article•DOI•

The WEKA data mining software: an update

[...]

Mark Hall, Eibe Frank¹, Geoffrey Holmes¹, Bernhard Pfahringer¹, Peter Reutemann¹, Ian H. Witten¹ - Show less +2 more•Institutions (1)

University of Waikato¹

16 Nov 2009-Sigkdd Explorations

TL;DR: This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.

...read moreread less

Abstract: More than twelve years have elapsed since the first public release of WEKA. In that time, the software has been rewritten entirely from scratch, evolved substantially and now accompanies a text on data mining [35]. These days, WEKA enjoys widespread acceptance in both academia and business, has an active community, and has been downloaded more than 1.4 million times since being placed on Source-Forge in April 2000. This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.

...read moreread less

19,603 citations

Journal Article•DOI•

A study of the behavior of several methods for balancing machine learning training data

[...]

Gustavo E. A. P. A. Batista¹, Ronaldo C. Prati¹, Maria Carolina Monard¹•Institutions (1)

Spanish National Research Council¹

01 Jun 2004-Sigkdd Explorations

TL;DR: This work performs a broad experimental evaluation involving ten methods, three of them proposed by the authors, to deal with the class imbalance problem in thirteen UCI data sets, and shows that, in general, over-sampling methods provide more accurate results than under-sampled methods considering the area under the ROC curve (AUC).

...read moreread less

Abstract: There are several aspects that might influence the performance achieved by existing learning systems. It has been reported that one of these aspects is related to class imbalance in which examples in training data belonging to one class heavily outnumber the examples in the other class. In this situation, which is found in real world data describing an infrequent but important event, the learning system may have difficulties to learn the concept related to the minority class. In this work we perform a broad experimental evaluation involving ten methods, three of them proposed by the authors, to deal with the class imbalance problem in thirteen UCI data sets. Our experiments provide evidence that class imbalance does not systematically hinder the performance of learning systems. In fact, the problem seems to be related to learning with too few minority class examples in the presence of other complicating factors, such as class overlapping. Two of our proposed methods deal with these conditions directly, allying a known over-sampling method with data cleaning methods in order to produce better-defined class clusters. Our comparative experiments show that, in general, over-sampling methods provide more accurate results than under-sampling methods considering the area under the ROC curve (AUC). This result seems to contradict results previously published in the literature. Two of our proposed methods, Smote + Tomek and Smote + ENN, presented very good results for data sets with a small number of positive examples. Moreover, Random over-sampling, a very simple over-sampling method, is very competitive to more complex over-sampling methods. Since the over-sampling methods provided very good performance results, we also measured the syntactic complexity of the decision trees induced from over-sampled data. Our results show that these trees are usually more complex then the ones induced from original data. Random over-sampling usually produced the smallest increase in the mean number of induced rules and Smote + ENN the smallest increase in the mean number of conditions per rule, when compared among the investigated over-sampling methods.

...read moreread less

2,914 citations

Journal Article•DOI•

Activity recognition using cell phone accelerometers

[...]

Jennifer R. Kwapisz¹, Gary M. Weiss¹, Samuel A. Moore¹•Institutions (1)

Fordham University¹

31 Mar 2011-Sigkdd Explorations

TL;DR: This work describes and evaluates a system that uses phone-based accelerometers to perform activity recognition, a task which involves identifying the physical activity a user is performing, and has a wide range of applications, including automatic customization of the mobile device's behavior based upon a user's activity.

...read moreread less

Abstract: Mobile devices are becoming increasingly sophisticated and the latest generation of smart cell phones now incorporates many diverse and powerful sensors These sensors include GPS sensors, vision sensors (ie, cameras), audio sensors (ie, microphones), light sensors, temperature sensors, direction sensors (ie, magnetic compasses), and acceleration sensors (ie, accelerometers) The availability of these sensors in mass-marketed communication devices creates exciting new opportunities for data mining and data mining applications In this paper we describe and evaluate a system that uses phone-based accelerometers to perform activity recognition, a task which involves identifying the physical activity a user is performing To implement our system we collected labeled accelerometer data from twenty-nine users as they performed daily activities such as walking, jogging, climbing stairs, sitting, and standing, and then aggregated this time series data into examples that summarize the user activity over 10- second intervals We then used the resulting training data to induce a predictive model for activity recognition This work is significant because the activity recognition model permits us to gain useful knowledge about the habits of millions of users passively---just by having them carry cell phones in their pockets Our work has a wide range of applications, including automatic customization of the mobile device's behavior based upon a user's activity (eg, sending calls directly to voicemail if a user is jogging) and generating a daily/weekly activity profile to determine if a user (perhaps an obese child) is performing a healthy amount of exercise

...read moreread less

2,417 citations

Journal Article•DOI•

Web usage mining: discovery and applications of usage patterns from Web data

[...]

Jaideep Srivastava¹, Robert Cooley¹, Mukund Deshpande¹, Pang-Ning Tan¹•Institutions (1)

University of Minnesota¹

01 Jan 2000-Sigkdd Explorations

TL;DR: Web usage mining is the application of data mining techniques to discover usage patterns from Web data, in order to understand and better serve the needs of Web-based applications as mentioned in this paper, where preprocessing, pattern discovery, and pattern analysis are described in detail.

...read moreread less

Abstract: Web usage mining is the application of data mining techniques to discover usage patterns from Web data, in order to understand and better serve the needs of Web-based applications. Web usage mining consists of three phases, namely preprocessing, pattern discovery, and pattern analysis. This paper describes each of these phases in detail. Given its application potential, Web usage mining has seen a rapid increase in interest, from both the research and practice communities. This paper provides a detailed taxonomy of the work in this area, including research efforts as well as commercial offerings. An up-to-date survey of the existing work is also provided. Finally, a brief overview of the WebSIFT system as an example of a prototypical Web usage mining system is given.

...read moreread less

2,227 citations

Journal Article•DOI•

Editorial: special issue on learning from imbalanced data sets

[...]

Nitesh V. Chawla, Nathalie Japkowicz¹, Aleksander Kotcz•Institutions (1)

University of Ottawa¹

01 Jun 2004-Sigkdd Explorations

1,897 citations