There is, I think, something ethereal about i, the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time: an intruder hovering on the edge of reality.

Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

I and i

The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, the field is still evolving, and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining streams, mining social networks, and mining spatial, multimedia, and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges.

* Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects.
* Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields.
* Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

/pdf/data-mining-concepts-and-techniques-4dtvdfkvmi.pdf

Data Mining: Concepts and Techniques

Data Mining: Practical Machine Learning Tools and Techniques offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. Thorough updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including new material on Data Transformations, Ensemble Learning, Massive Data Sets, and Multi-instance Learning, plus a new version of the popular Weka machine learning software developed by the authors. Witten, Frank, and Hall include both tried-and-true techniques of today as well as methods at the leading edge of contemporary research.

* Provides a thorough grounding in machine learning concepts as well as practical advice on applying the tools and techniques to your data mining projects.
* Offers concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methods.
* Includes the downloadable Weka software toolkit, a collection of machine learning algorithms for data mining tasks, in an updated, interactive interface. Algorithms in the toolkit cover data pre-processing, classification, regression, clustering, association rules, and visualization.

Data Mining: Practical Machine Learning Tools and Techniques

A major assumption in many machine learning and data mining algorithms is that the training and future data must be in the same feature space and have the same distribution. However, in many real-world applications, this assumption may not hold. For example, we sometimes have a classification task in one domain of interest, but we only have sufficient training data in another domain of interest, where the latter data may be in a different feature space or follow a different data distribution. In such cases, knowledge transfer, if done successfully, would greatly improve the performance of learning by avoiding much of the expensive data-labeling effort. In recent years, transfer learning has emerged as a new learning framework to address this problem. This survey focuses on categorizing and reviewing the current progress on transfer learning for classification, regression, and clustering problems. In this survey, we discuss the relationship between transfer learning and other related machine learning techniques such as domain adaptation, multitask learning, and sample selection bias, as well as covariate shift. We also explore some potential future issues in transfer learning research.

/pdf/a-survey-on-transfer-learning-1hjmu3otql.pdf

A Survey on Transfer Learning

Clustering algorithms are attractive for the task of class identification in spatial databases. However, the application to large spatial databases raises the following requirements for clustering algorithms: minimal requirements of domain knowledge to determine the input parameters, discovery of clusters with arbitrary shape, and good efficiency on large databases. The well-known clustering algorithms offer no solution to the combination of these requirements. In this paper, we present the new clustering algorithm DBSCAN, which relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape. DBSCAN requires only one input parameter and supports the user in determining an appropriate value for it. We performed an experimental evaluation of the effectiveness and efficiency of DBSCAN using synthetic data and real data of the SEQUOIA 2000 benchmark. The results of our experiments demonstrate that (1) DBSCAN is significantly more effective in discovering clusters of arbitrary shape than the well-known algorithm CLARANS, and that (2) DBSCAN outperforms CLARANS by a factor of more than 100 in terms of efficiency.

A density-based algorithm for discovering clusters in large spatial databases with noise
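The density-based notion of clusters described in the DBSCAN abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`region_query`, `dbscan`), the brute-force neighbor search, and the `min_pts` default of 4 are assumptions made here for illustration. The abstract mentions a single user-chosen parameter, which corresponds to `eps` in this sketch.

```python
from math import dist

NOISE = -1        # label for points that end up in no cluster
UNVISITED = None  # label for points not yet examined


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i] (i included)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts=4):
    """Label each point with a cluster id (0, 1, ...) or NOISE.

    Brute-force O(n^2) neighbor search; a spatial index would be needed
    for large databases. min_pts=4 is an assumed default.
    """
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also core: keep expanding
        cluster_id += 1
    return labels


# Two dense 2x2 squares and one isolated point (made-up data).
demo = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11),
        (5, 5)]
print(dbscan(demo, eps=1.5))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Clusters grow outward from core points, so their shape is not constrained in advance, and points in sparse regions are left labeled as noise.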

Following the intuition that naturally occurring face data may be generated by sampling a probability distribution that has support on or near a submanifold of ambient space, we propose an appearance-based face recognition method, called orthogonal Laplacianface. Our algorithm is based on the locality preserving projection (LPP) algorithm, which aims at finding a linear approximation to the eigenfunctions of the Laplace-Beltrami operator on the face manifold. However, LPP is nonorthogonal, and this makes it difficult to reconstruct the data. The orthogonal locality preserving projection (OLPP) method produces orthogonal basis functions and can have more locality preserving power than LPP. Since the locality preserving power is potentially related to the discriminating power, OLPP is expected to have more discriminating power than LPP. Experimental results on three face databases demonstrate the effectiveness of our proposed algorithm.

/pdf/orthogonal-laplacianfaces-for-face-recognition-s0gyp7nygv.pdf

Orthogonal Laplacianfaces for Face Recognition
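For readers unfamiliar with LPP, the objective that the Orthogonal Laplacianfaces abstract builds on can be written out as follows. This is a sketch in the standard LPP notation, none of which appears in the abstract itself: $X = [x_1, \dots, x_n]$ is the data matrix, $W$ an assumed nearest-neighbor affinity matrix, $D$ the diagonal degree matrix, and $L = D - W$ the graph Laplacian.

```latex
% LPP: find a projection vector a that keeps neighboring points close.
\min_{a}\; \frac{1}{2}\sum_{i,j} \left(a^{\top}x_i - a^{\top}x_j\right)^{2} W_{ij}
  \;=\; a^{\top} X L X^{\top} a
\quad \text{subject to} \quad a^{\top} X D X^{\top} a = 1,
\qquad D_{ii} = \sum_{j} W_{ij}, \quad L = D - W .

% The minimizers solve a generalized eigenproblem, whose eigenvectors
% are in general not mutually orthogonal:
X L X^{\top} a \;=\; \lambda\, X D X^{\top} a .

% OLPP, as commonly stated, adds orthogonality constraints when
% extracting the k-th basis vector:
a_k \;=\; \arg\min_{a}\; \frac{a^{\top} X L X^{\top} a}{a^{\top} X D X^{\top} a}
\quad \text{subject to} \quad a^{\top} a_1 = \dots = a^{\top} a_{k-1} = 0 .
```

The orthogonality constraints in the last line correspond to the abstract's remark that OLPP produces orthogonal basis functions, whereas the plain generalized eigenproblem does not.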

Many real-world datasets are comprised of different representations or views which often provide information complementary to each other. To integrate information from multiple views in the unsupervised setting, multi-view clustering algorithms have been developed to cluster multiple views simultaneously to derive a solution which uncovers the common latent structure shared by multiple views. In this paper, we propose a novel NMF-based multi-view clustering algorithm by searching for a factorization that gives compatible clustering solutions across multiple views. The key idea is to formulate a joint matrix factorization process with a constraint that pushes the clustering solution of each view towards a common consensus instead of fixing it directly. The main challenge is how to keep clustering solutions across different views meaningful and comparable. To tackle this challenge, we design a novel and effective normalization strategy inspired by the connection between NMF and PLSA. Experimental results on synthetic and several real datasets demonstrate the effectiveness of our approach.

/pdf/multi-view-clustering-via-joint-nonnegative-matrix-1bnlg945le.pdf

Multi-view clustering via joint nonnegative matrix factorization

Linear Discriminant Analysis (LDA) has been a popular method for extracting features which preserve class separability. The projection vectors are commonly obtained by maximizing the between-class covariance and simultaneously minimizing the within-class covariance. In practice, when there are not sufficient training samples, the covariance matrix of each class may not be accurately estimated. In this paper, we propose a novel method, called Semi-supervised Discriminant Analysis (SDA), which makes use of both labeled and unlabeled samples. The labeled data points are used to maximize the separability between different classes and the unlabeled data points are used to estimate the intrinsic geometric structure of the data. Specifically, we aim to learn a discriminant function which is as smooth as possible on the data manifold. Experimental results on single training image face recognition and relevance feedback image retrieval demonstrate the effectiveness of our algorithm.

/pdf/semi-supervised-discriminant-analysis-4hmzljnc1b.pdf

Semi-supervised Discriminant Analysis
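The LDA criterion mentioned in the Semi-supervised Discriminant Analysis abstract above (maximize between-class scatter, minimize within-class scatter) has a closed form for two classes: the projection direction is proportional to the inverse within-class scatter applied to the difference of class means. The sketch below shows only that plain two-class, two-dimensional case. It is not SDA: the small `ridge` term is a generic stand-in regularizer for the small-sample problem, not the manifold-based regularizer the paper proposes, and all function names and data are hypothetical.

```python
def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]


def scatter(vectors, mu):
    """2x2 scatter matrix: sum over samples of (x - mu)(x - mu)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for v in vectors:
        d = [v[0] - mu[0], v[1] - mu[1]]
        for r in range(2):
            for c in range(2):
                s[r][c] += d[r] * d[c]
    return s


def fisher_direction(class_a, class_b, ridge=1e-6):
    """Two-class Fisher direction w = Sw^{-1} (mu_a - mu_b) for 2-D points.

    ridge is added to the diagonal of Sw so it stays invertible when
    there are very few samples (an assumption of this sketch).
    """
    mu_a, mu_b = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, mu_a), scatter(class_b, mu_b)
    sw = [[sa[r][c] + sb[r][c] + (ridge if r == c else 0.0) for c in range(2)]
          for r in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    diff = [mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]]
    # Closed-form inverse of a 2x2 matrix applied to diff.
    return [(sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
            (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det]


def project(w, x):
    """Scalar projection of point x onto direction w."""
    return w[0] * x[0] + w[1] * x[1]


# Made-up labeled samples for two classes.
class_a = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]
class_b = [(6.0, 5.0), (7.0, 8.0), (8.0, 7.0), (7.0, 6.0)]
w = fisher_direction(class_a, class_b)
print([round(project(w, x), 3) for x in class_a])
print([round(project(w, x), 3) for x in class_b])
```

With only a handful of labeled samples the within-class scatter is poorly estimated, which is the weakness the abstract points to and the reason SDA brings in unlabeled data.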

PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth

Introduction (Harvey J. Miller and Jiawei Han)
Spatiotemporal Data Mining Paradigms and Methodologies (John F. Roddick and Brian G. Lees)
Fundamentals of Spatial Data Warehousing for Geographic Knowledge Discovery (Yvan Bedard and Jiawei Han)
Analysis of Spatial Data with Map Cubes: Highway Traffic Data (Chang-Tien Lu, Arnold P. Boedihardjo, and Shashi Shekhar)
NEW! Data Quality Issues and Geographic Knowledge Discovery (Marc Gervais, Yvan Bedard, Marie-Andree Levesque, Eveline Bernier, and Rodolphe Devillers)
Spatial Classification and Prediction Models for Geospatial Data Mining (Shashi Shekhar, Ranga Raju Vatsavai, and Sanjay Chawla)
An Overview of Clustering Methods in Geographic Data Analysis (Jiawei Han, Jae-Gil Lee, and Micheline Kamber)
NEW! Computing Medoids in Large Spatial Datasets (Kyriakos Mouratidis, Dimitris Papadias, and Spiros Papadimitriou)
NEW! Looking for a Relationship? Try GWR (A. Stewart Fotheringham, Martin Charlton, and Urska Demsar)
Leveraging the Power of Spatial Data Mining to Enhance the Applicability of GIS Technology (Donato Malerba, Antonietta Lanza, and Annalisa Appice)
Visual Exploration and Explanation in Geography: Analysis with Light (Mark Gahegan)
NEW! Multivariate Spatial Clustering and Geovisualization (Diansheng Guo)
NEW! Toward Knowledge Discovery about Geographic Dynamics in Spatiotemporal Databases (May Yuan)
NEW! The Role of a Multitier Ontological Framework in Reasoning to Discover Meaningful Patterns of Sustainable Mobility (Monica Wachowicz, Jose Macedo, Chiara Renso, and Arend Ligtenberg)
NEW! Periodic Pattern Discovery from Trajectories of Moving Objects (Huiping Cao, Nikos Mamoulis, and David W. Cheung)
NEW! Decentralized Spatial Data Mining for Geosensor Networks (Patrick Laube and Matt Duckham)
NEW! Beyond Exploratory Visualization of Space-Time Paths (Menno-Jan Kraak and Otto Huisman)

/pdf/geographic-data-mining-and-knowledge-discovery-3t9hotpmfs.pdf

Jiawei Han

Papers

Orthogonal Laplacianfaces for Face Recognition

Multi-view clustering via joint nonnegative matrix factorization

Semi-supervised Discriminant Analysis

PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth

Geographic Data Mining and Knowledge Discovery