Journal•ISSN: 0219-3116

Knowledge and Information Systems

Springer Science+Business Media

About: Knowledge and Information Systems is an academic journal published by Springer Science+Business Media. The journal publishes majorly in the area(s): Computer science & Cluster analysis. It has an ISSN identifier of 0219-3116. Over the lifetime, 2090 publications have been published receiving 72555 citations. The journal is also known as: KAIS (Print) & KAIS (Online).

...read moreread less

Topics: Computer science, Cluster analysis, Data stream mining, Knowledge extraction, Feature selection ...read more

Papers published on a yearly basis

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Top 10 algorithms in data mining

[...]

Xindong Wu¹, Vipin Kumar², J. Ross Quinlan, Joydeep Ghosh³, Qiang Yang⁴, Hiroshi Motoda⁵, Geoffrey J. McLachlan⁶, Angus S. K. Ng⁷, Bing Liu⁸, Philip S. Yu⁹, Zhi-Hua Zhou¹⁰, Michael Steinbach², David J. Hand¹¹, Dan Steinberg¹² - Show less +10 more•Institutions (12)

University of Vermont¹, University of Minnesota², University of Texas at Austin³, Hong Kong University of Science and Technology⁴, Osaka University⁵, University of Queensland⁶, Griffith University⁷, University of Illinois at Chicago⁸, IBM⁹, Nanjing University¹⁰, Imperial College London¹¹, University of Salford¹²

19 Dec 2007-Knowledge and Information Systems

TL;DR: This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART.

...read moreread less

Abstract: This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART. These top 10 algorithms are among the most influential data mining algorithms in the research community. With each algorithm, we provide a description of the algorithm, discuss the impact of the algorithm, and review current and further research on the algorithm. These 10 algorithms cover classification, clustering, statistical learning, association analysis, and link mining, which are all among the most important topics in data mining research and development.

...read moreread less

4,944 citations

Journal Article•DOI•

Exact indexing of dynamic time warping

[...]

Eamonn Keogh¹, Chotirat Ann Ratanamahatana¹•Institutions (1)

University of California, Riverside¹

01 Mar 2005-Knowledge and Information Systems

TL;DR: This work introduces a novel technique for the exact indexing of Dynamic time warping and proves its vast superiority over all competing approaches in the largest and most comprehensive set of time series indexing experiments ever undertaken.

...read moreread less

Abstract: The problem of indexing time series has attracted much interest. Most algorithms used to index time series utilize the Euclidean distance or some variation thereof. However, it has been forcefully shown that the Euclidean distance is a very brittle distance measure. Dynamic time warping (DTW) is a much more robust distance measure for time series, allowing similar shapes to match even if they are out of phase in the time axis. Because of this flexibility, DTW is widely used in science, medicine, industry and finance. Unfortunately, however, DTW does not obey the triangular inequality and thus has resisted attempts at exact indexing. Instead, many researchers have introduced approximate indexing techniques or abandoned the idea of indexing and concentrated on speeding up sequential searches. In this work, we introduce a novel technique for the exact indexing of DTW. We prove that our method guarantees no false dismissals and we demonstrate its vast superiority over all competing approaches in the largest and most comprehensive set of time series indexing experiments ever undertaken.

...read moreread less

1,925 citations

Journal Article•DOI•

Data Preparation for Mining World Wide Web Browsing Patterns

[...]

Robert Cooley¹, Bamshad Mobasher¹, Jaideep Srivastava¹•Institutions (1)

University of Minnesota¹

01 Feb 1999-Knowledge and Information Systems

TL;DR: This paper presents several data preparation techniques in order to identify unique users and user sessions and Transactions identified by the proposed methods are used to discover association rules from real world data using the WEBMINER system.

...read moreread less

Abstract: The World Wide Web (WWW) continues to grow at an astounding rate in both the sheer volume of traffic and the size and complexity of Web sites. The complexity of tasks such as Web site design, Web server design, and of simply navigating through a Web site have increased along with this growth. An important input to these design tasks is the analysis of how a Web site is being used. Usage analysis includes straightforward statistics, such as page access frequency, as well as more sophisticated forms of analysis, such as finding the common traversal paths through a Web site. Web Usage Mining is the application of data mining techniques to usage logs of large Web data repositories in order to produce results that can be used in the design tasks mentioned above. However, there are several preprocessing tasks that must be performed prior to applying data mining algorithms to the data collected from server logs. This paper presents several data preparation techniques in order to identify unique users and user sessions. Also, a method to divide user sessions into semantically meaningful transactions is defined and successfully tested against two other methods. Transactions identified by the proposed methods are used to discover association rules from real world data using the WEBMINER system [15].

...read moreread less

1,616 citations

Journal Article•DOI•

Dimensionality reduction for fast similarity search in large time series databases

[...]

Eamonn Keogh¹, Kaushik Chakrabarti², Michael J. Pazzani¹, Sharad Mehrotra¹•Institutions (2)

University of California, Irvine¹, University of Illinois at Urbana–Champaign²

01 Aug 2001-Knowledge and Information Systems

TL;DR: This work introduces a new dimensionality reduction technique which it is called Piecewise Aggregate Approximation (PAA), and theoretically and empirically compare it to the other techniques and demonstrate its superiority.

...read moreread less

Abstract: The problem of similarity search in large time series databases has attracted much attention recently. It is a non-trivial problem because of the inherent high dimensionality of the data. The most promising solutions involve first performing dimensionality reduction on the data, and then indexing the reduced data with a spatial access method. Three major dimensionality reduction techniques have been proposed: Singular Value Decomposition (SVD), the Discrete Fourier transform (DFT), and more recently the Discrete Wavelet Transform (DWT). In this work we introduce a new dimensionality reduction technique which we call Piecewise Aggregate Approximation (PAA). We theoretically and empirically compare it to the other techniques and demonstrate its superiority. In addition to being competitive with or faster than the other methods, our approach has numerous other advantages. It is simple to understand and to implement, it allows more flexible distance measures, including weighted Euclidean queries, and the index can be built in linear time.

...read moreread less

1,550 citations

Journal Article•DOI•

Defining and evaluating network communities based on ground-truth

[...]

Jaewon Yang¹, Jure Leskovec¹•Institutions (1)

Stanford University¹

01 Jan 2015-Knowledge and Information Systems

TL;DR: In this article, the authors distinguish between structural and functional definitions of network communities and identify networks with explicitly labeled functional communities to which they refer as ground-truth communities, where nodes explicitly state their community memberships and use such social groups to define a reliable and robust notion of groundtruth communities.

...read moreread less

Abstract: Nodes in real-world networks organize into densely linked communities where edges appear with high concentration among the members of the community. Identifying such communities of nodes has proven to be a challenging task due to a plethora of definitions of network communities, intractability of methods for detecting them, and the issues with evaluation which stem from the lack of a reliable gold-standard ground-truth. In this paper, we distinguish between structural and functional definitions of network communities. Structural definitions of communities are based on connectivity patterns, like the density of connections between the community members, while functional definitions are based on (often unobserved) common function or role of the community members in the network. We argue that the goal of network community detection is to extract functional communities based on the connectivity structure of the nodes in the network. We then identify networks with explicitly labeled functional communities to which we refer as ground-truth communities. In particular, we study a set of 230 large real-world social, collaboration, and information networks where nodes explicitly state their community memberships. For example, in social networks, nodes explicitly join various interest-based social groups. We use such social groups to define a reliable and robust notion of ground-truth communities. We then propose a methodology, which allows us to compare and quantitatively evaluate how different structural definitions of communities correspond to ground-truth functional communities. We study 13 commonly used structural definitions of communities and examine their sensitivity, robustness and performance in identifying the ground-truth. We show that the 13 structural definitions are heavily correlated and naturally group into four classes. We find that two of these definitions, Conductance and Triad participation ratio, consistently give the best performance in identifying ground-truth communities. We also investigate a task of detecting communities given a single seed node. We extend the local spectral clustering algorithm into a heuristic parameter-free community detection method that easily scales to networks with more than 100 million nodes. The proposed method achieves 30 % relative improvement over current local clustering methods.

...read moreread less

1,518 citations

Collapse

Performance

Metrics

2,126

Papers

72,657

Citations

No. of papers from the Journal in previous years
Year	Papers
2023	114
2022	195
2021	119
2020	158
2019	176
2018	109