Showing papers by Pang-Ning Tan published in 2009

01 Jan 2009

TL;DR: A polishing apparatus includes a turntable with an abrasive cloth mounted on an upper surface thereof, and a top ring disposed above the turntable for supporting a workpiece to be polished and pressing the workpiece against the abrasive cloth under a predetermined pressure.

Abstract: A polishing apparatus includes a turntable with an abrasive cloth mounted on an upper surface thereof, and a top ring disposed above the turntable for supporting a workpiece to be polished and pressing the workpiece against the abrasive cloth under a predetermined pressure. The turntable and the top ring are movable relative to each other to polish a surface of the workpiece supported by the top ring with the abrasive cloth. The abrasive cloth has a projecting region on its surface for more intensive contact with the workpiece than the rest of the cloth's surface. The projecting region has a smaller dimension in a radial direction of the turntable than a diameter of the workpiece when the projecting region is held in contact with the workpiece. A position of the projecting region is determined on the basis of an area in which the projecting region acts on the workpiece.

178 citations

Proceedings Article•

Detection and characterization of anomalies in multivariate time series

Haibin Cheng¹, Pang-Ning Tan, Christopher Potter², Steven Klooster³•Institutions (3)

Michigan State University¹, Ames Research Center², California State University³

01 Jan 2009

TL;DR: This paper presents a robust algorithm for detecting anomalies in noisy multivariate time series data by employing a kernel matrix alignment method to capture the dependence relationships among variables in the time series.

Abstract: Anomaly detection in multivariate time series is an important data mining task with applications to ecosystem modeling, network traffic monitoring, medical diagnosis, and other domains. This paper presents a robust algorithm for detecting anomalies in noisy multivariate time series data by employing a kernel matrix alignment method to capture the dependence relationships among variables in the time series. Anomalies are found by performing a random walk traversal on the graph induced by the aligned kernel matrix. We show that the algorithm is flexible enough to handle different types of time series anomalies including subsequence-based and local anomalies. Our framework can also be used to characterize the anomalies found in a target time series in terms of the anomalies present in other time series. We have performed extensive experiments to empirically demonstrate the effectiveness of our algorithm. A case study is also presented to illustrate the ability of the algorithm to detect ecosystem disturbances in Earth science data.

122 citations
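
The detection step described in this abstract (a random walk over a graph whose edge weights come from a kernel matrix) can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a plain RBF kernel over the time steps in place of the paper's aligned kernel matrix, and the function names and the `gamma` and `damping` parameters are hypothetical. Each time step is treated as a node, a damped random walk is iterated to its stationary distribution, and time steps that the walk rarely visits (low connectivity) are treated as the most anomalous.

```python
import numpy as np


def rbf_kernel(X, gamma=1.0):
    """RBF similarity between every pair of rows (time steps) of X."""
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances, clipped at 0 to absorb rounding error.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)


def random_walk_scores(K, damping=0.85, tol=1e-10, max_iter=1000):
    """Stationary visit probabilities of a damped random walk on graph K.

    Low scores mean the node is weakly connected to the rest of the
    graph, i.e. a likely anomaly (hypothetical scoring rule).
    """
    n = K.shape[0]
    S = K.astype(float).copy()
    np.fill_diagonal(S, 0.0)  # no self-loops
    row_sums = S.sum(axis=1, keepdims=True)
    # Row-normalize into a transition matrix; rows with no similarity
    # fall back to a uniform jump.
    P = np.divide(S, row_sums, out=np.full_like(S, 1.0 / n), where=row_sums > 0)
    c = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        # Power iteration with uniform teleportation (as in PageRank).
        c_new = (1.0 - damping) / n + damping * (P.T @ c)
        if np.abs(c_new - c).sum() < tol:
            return c_new
        c = c_new
    return c
```

For example, with 50 two-dimensional points drawn near the origin and one point moved to (10, 10), the moved point receives the lowest score. The damping term plays the same role as in PageRank: uniform teleportation makes the walk irreducible, so a unique stationary distribution exists even when the graph is poorly connected.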

Book Chapter•DOI•

kNN: k-Nearest Neighbors

Michael Steinbach, Pang-Ning Tan

09 Apr 2009

42 citations

Proceedings Article•DOI•

Measuring the effects of preprocessing decisions and network forces in dynamic network analysis

Jerry Scripps¹, Pang-Ning Tan¹, Abdol-Hossein Esfahanian¹•Institutions (1)

Michigan State University¹

28 Jun 2009

TL;DR: This paper investigates how different pre-processing decisions and different network forces such as selection and influence affect the modeling of dynamic networks, and demonstrates the effect of attribute drift.

Abstract: Social networks have become a major focus of research in recent years, initially directed towards static networks but increasingly towards dynamic ones. In this paper, we investigate how different pre-processing decisions and different network forces such as selection and influence affect the modeling of dynamic networks. We also present empirical justification for some of the modeling assumptions made in dynamic network analysis (e.g., the first-order Markovian assumption) and develop metrics to measure the alignment between links and attributes under different strategies of using the historical network data. We also demonstrate the effect of attribute drift, that is, the importance of individual attributes in forming links changes over time.

39 citations

Proceedings Article•DOI•

A co-classification framework for detecting web spam and spammers in social media web sites

Feilong Chen¹, Pang-Ning Tan¹, Anil K. Jain¹•Institutions (1)

Michigan State University¹

02 Nov 2009

TL;DR: The proposed co-classification framework, which detects Web spam and the spammers responsible for posting it on social media Web sites, significantly outperforms classifiers that learn each detection task independently.

Abstract: Social media are becoming increasingly popular and have attracted considerable attention from spammers. Using a sample of more than ninety thousand known spam Web sites, we found that between 7% and 18% of their URLs are posted on two popular social media Web sites, digg.com and delicious.com. In this paper, we present a co-classification framework to detect Web spam and the spammers who are responsible for posting them on the social media Web sites. The rationale for our approach is that since both detection tasks are related, it would be advantageous to train them simultaneously to make use of the labeled examples in the Web spam and spammer training data. We have evaluated the effectiveness of our algorithm on the delicious.com data set. Our experimental results showed that the proposed co-classification algorithm significantly outperforms classifiers that learn each detection task independently.

33 citations

Proceedings Article•DOI•

Combining statistics and semantics via ensemble model for document clustering

Samah Jamal Fodeh¹, William F. Punch¹, Pang-Ning Tan¹•Institutions (1)

Michigan State University¹

08 Mar 2009

TL;DR: An ensemble model is proposed that couples two sources of information: statistical information derived from the data set, and sense information retrieved from WordNet that is used to build a semantic binary model.

Abstract: Incorporating background knowledge into data mining algorithms is an important but challenging problem. Current approaches in semi-supervised learning require explicit knowledge provided by domain experts, knowledge specific to the particular data set. In this study, we propose an ensemble model that couples two sources of information: statistical information derived from the data set, and sense information retrieved from WordNet that is used to build a semantic binary model. We evaluated the efficacy of our combined ensemble model on the Reuters-21578 and 20newsgroups data sets.

19 citations

Proceedings Article•DOI•

A Matrix Alignment Approach for Collective Classification

Jerry Scripps¹, Pang-Ning Tan¹, Feilong Chen¹, Abdol-Hossein Esfahanian¹•Institutions (1)

Michigan State University¹

20 Jul 2009

TL;DR: This paper presents a matrix alignment approach to collective classification that weights the attributes and the links according to their predictive influence and provides accuracy comparable to other methods.

Abstract: Within networks there is often a pattern to the way nodes link to one another. It has been shown that the accuracy of node classification can be improved by using the link data. One of the challenges to integrating the attribute and link data, though, is balancing the influence that each has on the classification decision. In this paper we present a matrix alignment approach to the problem of collective classification which weights the attributes and the links according to their predictive influence. The experiments show that while our approach provides comparable accuracy in prediction to other methods, it is also very fast and descriptive.

8 citations
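
The idea of weighting attributes by how well they agree with the link structure can be illustrated with the empirical kernel alignment score, alignment(K1, K2) = <K1, K2>_F / (||K1||_F * ||K2||_F), where <., .>_F is the Frobenius inner product. This Python sketch is a simple heuristic, not the paper's optimization: it assumes binary (or categorical) node attributes, builds a same-value similarity matrix for each attribute, scores that matrix against the adjacency matrix, and normalizes the non-negative scores into weights. The function names are hypothetical.

```python
import numpy as np


def alignment(K1, K2):
    """Empirical alignment <K1, K2>_F / (||K1||_F * ||K2||_F)."""
    num = np.sum(K1 * K2)
    den = np.linalg.norm(K1) * np.linalg.norm(K2)  # Frobenius norms
    return num / den if den > 0 else 0.0


def attribute_weights(X, A):
    """Weight each attribute column of X by its alignment with adjacency A.

    Hypothetical heuristic: similarity is 1 when two distinct nodes share
    the same attribute value, 0 otherwise.
    """
    n, d = X.shape
    scores = np.empty(d)
    for k in range(d):
        col = X[:, k]
        S = (col[:, None] == col[None, :]).astype(float)
        np.fill_diagonal(S, 0.0)  # ignore self-similarity, like A's diagonal
        scores[k] = alignment(S, A)
    w = np.clip(scores, 0.0, None)  # negative alignment gets zero weight
    return w / w.sum() if w.sum() > 0 else np.full(d, 1.0 / d)
```

On a toy network made of two three-node cliques, an attribute that matches clique membership aligns perfectly (score 1), while an attribute that cuts across the cliques scores 1/3, so the resulting weights are 0.75 and 0.25.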

Proceedings Article•DOI•

History-Based Email Prioritization

Ronald Nussbaum¹, Abdol-Hossein Esfahanian¹, Pang-Ning Tan¹•Institutions (1)

Michigan State University¹

20 Jul 2009

TL;DR: Two new methods of performing email prioritization are proposed, both of which rank users' inboxes using models created from email history.

Abstract: The rise of email as a communication medium raises several issues. A majority of email messages sent are spam. Also, the amount of legitimate email received by many users is overwhelming. In this paper, we propose two new methods of performing email prioritization. Both techniques rank users' inboxes using models created from email history. With them, lower-priority email messages may be deferred so that the use of email remains a net productivity gain.

7 citations

Proceedings Article•DOI•

A Semi-supervised Framework for Simultaneous Classification and Regression of Zero-Inflated Time Series Data with Application to Precipitation Prediction

Zubin Abraham¹, Pang-Ning Tan¹•Institutions (1)

Michigan State University¹

06 Dec 2009

TL;DR: A hybrid framework is presented that simultaneously performs classification and regression to accurately predict future values of a zero-inflated time series; the framework is extended to a semi-supervised learning setting via graph regularization.

Abstract: Time series data with an abundance of zeros are common in many applications, including climate and ecological modeling, disease monitoring, manufacturing defect detection, and traffic accident monitoring. Classical regression models are inappropriate for data with such a skewed distribution because they tend to underestimate the frequency of zeros and the magnitude of non-zero values in the data. This paper presents a hybrid framework that simultaneously performs classification and regression to accurately predict future values of a zero-inflated time series. A classifier is first used to determine whether the value at a given time step is zero, and a regression model is invoked to estimate its magnitude only if the value has been classified as non-zero. The proposed framework is extended to a semi-supervised learning setting via graph regularization. The effectiveness of the framework is demonstrated via its application to the precipitation prediction problem for climate impact assessment studies.

7 citations
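
The two-stage prediction rule in this abstract (classify each time step as zero or non-zero first, then regress the magnitude only for steps predicted to be non-zero) can be sketched in Python with NumPy. This is a minimal, fully supervised illustration under stated assumptions: logistic regression trained by batch gradient descent stands in for the classifier, ordinary least squares fitted on the non-zero observations stands in for the regression model, and the graph-regularized semi-supervised extension is omitted. The class and function names and the 0.5 threshold are hypothetical.

```python
import numpy as np


def _add_bias(X):
    """Prepend a column of ones for the intercept."""
    return np.hstack([np.ones((X.shape[0], 1)), X])


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Logistic regression by batch gradient descent (stand-in classifier)."""
    Xb = _add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


def fit_linear(X, y):
    """Ordinary least squares (stand-in regression model)."""
    w, *_ = np.linalg.lstsq(_add_bias(X), y, rcond=None)
    return w


class HybridZeroInflated:
    """Hypothetical two-stage model: occurrence classifier + magnitude regressor."""

    def fit(self, X, y):
        nonzero = y != 0
        # Stage 1: learn whether a time step is zero or non-zero.
        self.w_clf = fit_logistic(X, nonzero.astype(float))
        # Stage 2: learn magnitudes from non-zero observations only.
        self.w_reg = fit_linear(X[nonzero], y[nonzero])
        return self

    def predict(self, X, threshold=0.5):
        Xb = _add_bias(X)
        p_nonzero = 1.0 / (1.0 + np.exp(-Xb @ self.w_clf))
        magnitude = Xb @ self.w_reg
        # The regressor's output is used only where the classifier says non-zero.
        return np.where(p_nonzero >= threshold, magnitude, 0.0)
```

Fitting the regressor only on non-zero targets keeps the many zeros from pulling the magnitude estimates toward zero, which is the underestimation problem the abstract attributes to classical regression models.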

Exploiting the link structure in mining network data

Pang-Ning Tan¹, Jerry Scripps¹•Institutions (1)

Michigan State University¹

01 Jan 2009

TL;DR: It will be shown that learning the alignment between links and attributes leads to improvements in link prediction and collective classification, and studying the changes in the relationship of attributes to links over time has revealed information helpful for decisions that are made in processing network data.

Abstract: The study of networks in general, and social networks in particular, has intensified in recent years due in part to the interest in on-line social networks and the availability of large data sets of related objects. An area called network mining has emerged from the larger area of data mining, whose purpose is to extract hidden knowledge from large, linked data sets. The purpose of this dissertation is to study the relationships that develop in networks involving links, specifically the relationships between links and communities and between links and attributes. Understanding the alignment between communities and links offers valuable insights into the roles that nodes play with respect to communities. It will also be shown that learning the alignment between links and attributes leads to improvements in link prediction and collective classification. Finally, studying the changes in the relationship of attributes to links over time has revealed information helpful for decisions that are made in processing network data. During the course of this investigation, a number of tangible new algorithms and metrics have been developed. First, a new metric is introduced that provides information about the number of communities to which a node belongs without requiring the actual community information. Combining this rawComm metric with the relative degree of a node allows community-based roles to be assigned to nodes. Next, a new framework is proposed that uses weights to align the attributes to the link structure. Two formulations of the framework are used for improving link prediction and collective classification techniques. The framework is also shown to be valuable in studying the dynamics of temporal networks.

2 citations

Book Chapter•DOI•

Pattern Preserving Clustering

Hui Xiong¹, Michael Steinbach², Pang-Ning Tan³, Vipin Kumar², Wenjun Zhou¹•Institutions (3)

Rutgers University¹, University of Minnesota², Michigan State University³

01 Jan 2009