Author

Jiawei Han

Bio: Jiawei Han is an academic researcher from the University of Illinois at Urbana–Champaign. The author has contributed to research on topics including cluster analysis and knowledge extraction. The author has an h-index of 168 and has co-authored 1,233 publications receiving 143,427 citations. Previous affiliations of Jiawei Han include the Georgia Institute of Technology and the United States Army Research Laboratory.


Papers
01 Jan 2011
TL;DR: This dissertation designs and implements techniques for directly mining discriminative patterns from a numeric-valued feature set of k-embedded edge subtrees given labeled training data, and for mining top correlated patterns from transactional databases with low minimum support.
Abstract: Pattern mining has been a hot topic since it was first proposed for market basket analysis. Even though pattern mining is one of the oldest topics in the data mining domain, there are still many ongoing challenges to overcome, since data keep growing in scale and structural complexity. This dissertation discusses several pattern mining tasks, the challenges associated with them, and algorithm designs that overcome these challenges. Specifically, we design and implement techniques for (1) directly mining discriminative patterns from a numeric-valued feature set of k-embedded edge subtrees given labeled training data, (2) mining top correlated patterns from transactional databases with low minimum support, and (3) mining flipping correlation patterns from transactional databases given an item hierarchy. We evaluate our solutions by conducting comprehensive experiments on large-scale synthetic and real-world datasets.
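The dissertation is summarized here only in prose. As a rough, hedged illustration of task (2), the sketch below scores item pairs in a transactional database by lift under a low minimum-support threshold; lift is merely a stand-in for whichever correlation measure the dissertation actually optimizes, and every name in the sketch is hypothetical.

```python
from collections import Counter
from itertools import combinations

def top_correlated_pairs(transactions, min_support=0.01, k=10):
    """Rank item pairs by lift, keeping only pairs above a (low) minimum support.

    `transactions` is a list of item sets; lift stands in for whatever
    correlation measure the actual algorithm uses.
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    scored = []
    for (a, b), c in pair_counts.items():
        support = c / n
        if support < min_support:
            continue
        lift = support / ((item_counts[a] / n) * (item_counts[b] / n))
        scored.append(((a, b), support, lift))
    return sorted(scored, key=lambda x: x[2], reverse=True)[:k]

if __name__ == "__main__":
    db = [{"milk", "bread"}, {"milk", "bread", "eggs"}, {"eggs"}, {"milk", "bread"}]
    for pair, sup, lift in top_correlated_pairs(db, min_support=0.25, k=3):
        print(pair, round(sup, 2), round(lift, 2))
```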
Journal ArticleDOI
TL;DR: This article outlines a methodology for automatically retrieving corresponding subdirectories of other Web sites in a similar domain when given a sub-directory (subWeb) of a Web site in a certain domain.
Abstract: There is large room for improving the usefulness of today's Internet search engines. One promising way is, given a sub-directory (subWeb) of a Web site in a certain domain, to automatically retrieve, or mine, the corresponding subdirectories of other Web sites in a similar domain. In this article, we outline a methodology for doing this.
Journal ArticleDOI
TL;DR: In this paper, the difference in immunomodulatory effects between collagen-derived dipeptides and amino acids was investigated; the authors concluded that there was no difference in cytokine secretion between the dipeptides and their constituent amino acids.
Abstract: A number of food components, such as polyphenols and phytonutrients, have immunomodulatory effects. Collagen has various bioactivities, such as antioxidative effects, the promotion of wound healing, and relieving symptoms of bone/joint disease. Collagen is digested into dipeptides and amino acids in the gastrointestinal tract and subsequently absorbed. However, the difference in immunomodulatory effects between collagen-derived dipeptides and amino acids is unknown. To investigate such differences, we incubated M1 macrophages or peripheral blood mononuclear cells (PBMC) with collagen-derived dipeptides (hydroxyproline-glycine (Hyp-Gly) and proline-hydroxyproline (Pro-Hyp)) and amino acids (proline (Pro), hydroxyproline (Hyp), and glycine (Gly)). We first investigated the dose dependency of Hyp-Gly on cytokine secretion. Hyp-Gly modulates cytokine secretion from M1 macrophages at 100 µM, but not at 10 µM and 1 µM. We then compared immunomodulatory effects between dipeptides and mixtures of amino acids on M1 macrophages and PBMC. There was, however, no difference in cytokine secretion between dipeptides and their respective amino acids. We conclude that collagen-derived dipeptides and amino acids have immunomodulatory effects on M1-differentiated RAW264.7 cells and PBMC and that there is no difference in the immunomodulatory effects between dipeptides and amino acids.
01 Jan 2017
TL;DR: A data-to-network-to-knowledge paradigm, that is, first turning data into relatively structured information networks and then mining such text-rich and structure-rich networks to generate useful knowledge, is shown to represent a promising direction for turning massive text data into structured networks and useful knowledge.
Abstract: Real-world big data are largely unstructured, interconnected, and in the form of natural language text. One of the grand challenges is to mine structures from such massive unstructured data and transform them into structured networks and actionable knowledge. We propose a text mining approach that requires only distant or minimal supervision but relies on massive data. We show that quality phrases can be mined from such massive text data, that types can be extracted with distant supervision, and that entity-attribute-value triples can be extracted from meta-patterns discovered in such data. Finally, we propose a data-to-network-to-knowledge paradigm: first turn data into relatively structured information networks, and then mine such text-rich and structure-rich networks to generate useful knowledge. We show that such a paradigm represents a promising direction for turning massive text data into structured networks and useful knowledge.
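The abstract states that quality phrases can be mined from massive text with minimal supervision, but gives no algorithmic detail here. The toy sketch below ranks frequent bigrams by pointwise mutual information; it is not the authors' phrase mining method, only a minimal illustration of scoring candidate phrases by frequency and cohesion, with hypothetical function names.

```python
import math
import re
from collections import Counter

def candidate_phrases(corpus, min_count=2, top_k=10):
    """Toy 'quality phrase' scorer: frequent bigrams ranked by PMI.

    Only an illustration of mining phrases from raw text; the systems the
    abstract refers to use far richer statistics and distant supervision.
    """
    tokens = re.findall(r"[a-z]+", corpus.lower())
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)

    scored = []
    for (w1, w2), c in bigrams.items():
        if c < min_count:
            continue
        pmi = math.log((c / n) / ((unigrams[w1] / n) * (unigrams[w2] / n)))
        scored.append((f"{w1} {w2}", c, pmi))
    return sorted(scored, key=lambda x: x[2], reverse=True)[:top_k]

if __name__ == "__main__":
    text = ("support vector machines and neural networks; "
            "support vector machines dominate; neural networks scale")
    print(candidate_phrases(text, min_count=2, top_k=3))
```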

Cited by
Journal ArticleDOI


08 Dec 2001-BMJ
TL;DR: There is, I think, something ethereal about i, the square root of minus one: it seemed an odd beast at first, an intruder hovering on the edge of reality.
Abstract: There is, I think, something ethereal about i, the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations

Book
08 Sep 2000
TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Abstract: The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, the field is still evolving and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining data streams, mining social networks, and mining spatial, multimedia and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges. * Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects. * Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields. * Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

23,600 citations
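The blurb notes that the book presents its algorithms in pseudo-code. As one self-contained example of a classic technique the book covers, here is a compact Apriori-style frequent itemset miner; it is a teaching sketch, not the book's own pseudo-code, and the function name and default threshold are assumptions.

```python
from itertools import combinations

def apriori(transactions, min_support=0.5):
    """Level-wise frequent itemset mining in the Apriori style.

    Returns {frozenset(itemset): support}. A compact sketch of one technique
    covered by the book, not its published pseudo-code.
    """
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Frequent 1-itemsets.
    items = {i for t in transactions for i in t}
    frequent = {frozenset([i]): support(frozenset([i])) for i in items}
    frequent = {s: v for s, v in frequent.items() if v >= min_support}

    result = dict(frequent)
    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune candidates with an infrequent (k-1)-subset, then count supports.
        frequent = {}
        for c in candidates:
            if all(frozenset(s) in result for s in combinations(c, k - 1)):
                sup = support(c)
                if sup >= min_support:
                    frequent[c] = sup
        result.update(frequent)
        k += 1
    return result

if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, sup in sorted(apriori(db, min_support=0.6).items(), key=lambda x: -x[1]):
        print(set(itemset), round(sup, 2))
```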

Book
25 Oct 1999
TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Abstract: Data Mining: Practical Machine Learning Tools and Techniques offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. Thorough updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including new material on Data Transformations, Ensemble Learning, Massive Data Sets, and Multi-instance Learning, plus a new version of the popular Weka machine learning software developed by the authors. Witten, Frank, and Hall include both the tried-and-true techniques of today and methods at the leading edge of contemporary research. * Provides a thorough grounding in machine learning concepts as well as practical advice on applying the tools and techniques to your data mining projects. * Offers concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methods. * Includes the downloadable Weka software toolkit, a collection of machine learning algorithms for data mining tasks, in an updated, interactive interface. Algorithms in the toolkit cover: data pre-processing, classification, regression, clustering, association rules, and visualization.

20,196 citations
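Since the blurb emphasizes preparing inputs and evaluating results, the following sketch shows a generic holdout evaluation of a from-scratch 1-nearest-neighbour classifier. It deliberately avoids Weka's Java API, which is not quoted here; everything in it, names included, is an assumed illustration of the evaluate-your-results workflow rather than anything from the book.

```python
import random

def evaluate_holdout(data, labels, classify, test_fraction=0.3, seed=0):
    """Holdout evaluation: shuffle, split into train/test, report test accuracy.

    `classify(train_x, train_y, query)` can be any classifier function.
    """
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_fraction))
    train, test = idx[:cut], idx[cut:]

    correct = sum(
        classify([data[i] for i in train], [labels[i] for i in train], data[j]) == labels[j]
        for j in test
    )
    return correct / len(test)

def one_nn(train_x, train_y, query):
    """1-nearest-neighbour classifier on numeric feature vectors."""
    dist = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    return min(zip(train_x, train_y), key=lambda p: dist(p[0], query))[1]

if __name__ == "__main__":
    xs = [(0.0,), (0.2,), (0.9,), (1.1,), (1.0,), (0.1,)]
    ys = ["low", "low", "high", "high", "high", "low"]
    print(evaluate_holdout(xs, ys, one_nn, test_fraction=0.34))
```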

Journal ArticleDOI
TL;DR: The relationship between transfer learning and other related machine learning techniques such as domain adaptation, multitask learning and sample selection bias, as well as covariate shift are discussed.
Abstract: A major assumption in many machine learning and data mining algorithms is that the training and future data must be in the same feature space and have the same distribution. However, in many real-world applications, this assumption may not hold. For example, we sometimes have a classification task in one domain of interest, but only have sufficient training data in another domain, where the data may be in a different feature space or follow a different distribution. In such cases, knowledge transfer, if done successfully, would greatly improve the performance of learning by avoiding expensive data-labeling effort. In recent years, transfer learning has emerged as a new learning framework to address this problem. This survey focuses on categorizing and reviewing the current progress on transfer learning for classification, regression, and clustering problems. In this survey, we discuss the relationship between transfer learning and other related machine learning techniques such as domain adaptation, multitask learning, and sample selection bias, as well as covariate shift. We also explore some potential future issues in transfer learning research.

18,616 citations
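The survey mentions covariate shift among related settings. One standard correction, not spelled out in this abstract, is to reweight source-domain examples by the density ratio p_target(x)/p_source(x). The sketch below assumes a crude one-dimensional Gaussian model for both densities purely for illustration; real density-ratio estimators are considerably more careful, and all names here are hypothetical.

```python
import math
import statistics

def gaussian_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def importance_weights(source_x, target_x):
    """Covariate-shift weights w(x) = p_target(x) / p_source(x).

    Illustrative only: both densities are modelled as 1-D Gaussians, a far
    cruder estimator than the density-ratio methods surveyed in the paper.
    """
    mu_s, sd_s = statistics.mean(source_x), statistics.stdev(source_x)
    mu_t, sd_t = statistics.mean(target_x), statistics.stdev(target_x)
    return [gaussian_pdf(x, mu_t, sd_t) / gaussian_pdf(x, mu_s, sd_s) for x in source_x]

def weighted_error(source_x, source_y, predict, weights):
    """Importance-weighted training error: source mistakes count most where
    the target density is high relative to the source density."""
    total = sum(weights)
    wrong = sum(w for x, y, w in zip(source_x, source_y, weights) if predict(x) != y)
    return wrong / total

if __name__ == "__main__":
    source_x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
    target_x = [1.5, 2.0, 2.5, 3.0]
    source_y = ["neg", "neg", "pos", "pos", "pos", "pos"]   # x = 1.0 will be misclassified
    predict = lambda x: "pos" if x > 1.2 else "neg"         # hypothetical fixed classifier
    w = importance_weights(source_x, target_x)
    print([round(v, 2) for v in w])
    print(round(weighted_error(source_x, source_y, predict, w), 3))
```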

Proceedings Article
02 Aug 1996
TL;DR: In this paper, a density-based notion of clusters is proposed to discover clusters of arbitrary shape; it can be used for class identification in large spatial databases and is shown to be more efficient than the well-known algorithm CLARANS.
Abstract: Clustering algorithms are attractive for the task of class identification in spatial databases. However, the application to large spatial databases raises the following requirements for clustering algorithms: minimal requirements of domain knowledge to determine the input parameters, discovery of clusters with arbitrary shape, and good efficiency on large databases. The well-known clustering algorithms offer no solution to the combination of these requirements. In this paper, we present the new clustering algorithm DBSCAN, which relies on a density-based notion of clusters designed to discover clusters of arbitrary shape. DBSCAN requires only one input parameter and supports the user in determining an appropriate value for it. We performed an experimental evaluation of the effectiveness and efficiency of DBSCAN using synthetic data and real data from the SEQUOIA 2000 benchmark. The results of our experiments demonstrate that (1) DBSCAN is significantly more effective in discovering clusters of arbitrary shape than the well-known algorithm CLARANS, and that (2) DBSCAN outperforms CLARANS by a factor of more than 100 in terms of efficiency.

17,056 citations
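The abstract describes DBSCAN's core idea: grow clusters from points whose eps-neighbourhood contains at least a minimum number of points, and mark the rest as noise. The sketch below is a compact re-implementation of that idea for small in-memory data, not the paper's spatially indexed version; parameter names and the brute-force neighbourhood query are choices made here for illustration.

```python
def dbscan(points, eps, min_pts):
    """Density-based clustering in the spirit of the abstract: grow clusters
    from core points whose eps-neighbourhood holds at least min_pts points.
    Returns one label per point (a cluster id, or -1 for noise). Brute-force
    neighbourhood queries only; the original algorithm uses spatial indexing.
    """
    UNVISITED, NOISE = None, -1

    def neighbours(i):
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE                  # may later become a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster            # noise reached from a core point: border
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:   # j is itself a core point: keep expanding
                queue.extend(j_neighbours)
        cluster += 1
    return labels

if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0), (5.0, 5.0), (5.1, 5.1), (9.0, 0.0)]
    print(dbscan(pts, eps=0.5, min_pts=2))     # two clusters plus one noise point
```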