
Showing papers by "Pang-Ning Tan" published in 2009


01 Jan 2009
TL;DR: A polishing apparatus includes a turntable with an abrasive cloth mounted on an upper surface thereof, and a top ring disposed above the turntable for supporting a workpiece to be polished and pressing the workpiece against the abrasive cloth under a predetermined pressure.
Abstract: A polishing apparatus includes a turntable with an abrasive cloth mounted on an upper surface thereof, and a top ring disposed above the turntable for supporting a workpiece to be polished and pressing the workpiece against the abrasive cloth under a predetermined pressure. The turntable and the top ring are movable relative to each other to polish a surface of the workpiece supported by the top ring with the abrasive cloth. The abrasive cloth has a projecting region on a surface thereof for more intensive contact with the workpiece than the other surfaces of the abrasive cloth. The projecting region has a smaller dimension in a radial direction of the turntable than a diameter of the workpiece when the projecting region is held in contact with the workpiece. A position of the projecting region is determined on the basis of an area in which the projecting region acts on the workpiece.

178 citations


Proceedings Article
01 Jan 2009
TL;DR: This paper presents a robust algorithm for detecting anomalies in noisy multivariate time series data by employing a kernel matrix alignment method to capture the dependence relationships among variables in the time series.
Abstract: Anomaly detection in multivariate time series is an important data mining task with applications to ecosystem modeling, network traffic monitoring, medical diagnosis, and other domains. This paper presents a robust algorithm for detecting anomalies in noisy multivariate time series data by employing a kernel matrix alignment method to capture the dependence relationships among variables in the time series. Anomalies are found by performing a random walk traversal on the graph induced by the aligned kernel matrix. We show that the algorithm is flexible enough to handle different types of time series anomalies including subsequence-based and local anomalies. Our framework can also be used to characterize the anomalies found in a target time series in terms of the anomalies present in other time series. We have performed extensive experiments to empirically demonstrate the effectiveness of our algorithm. A case study is also presented to illustrate the ability of the algorithm to detect ecosystem disturbances in Earth science data.

122 citations
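
The random-walk idea in the abstract above can be illustrated with a minimal sketch. This is not the paper's algorithm: the kernel matrix alignment step is omitted, a plain RBF kernel stands in for the aligned kernel, and the names `rbf_graph`, `random_walk_scores`, `gamma`, and `damping` are hypothetical. Points that the walk rarely visits are weakly connected to the rest of the data and are treated as anomaly candidates.

```python
import math


def rbf_graph(points, gamma=1.0):
    """Pairwise RBF similarities with a zero diagonal (no self-loops).

    Assumption: a plain RBF kernel replaces the aligned kernel matrix.
    """
    n = len(points)
    graph = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                dist2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                graph[i][j] = math.exp(-gamma * dist2)
    return graph


def random_walk_scores(graph, damping=0.1, iters=200):
    """Visiting probabilities of a random walk with uniform restart.

    A low score means the node is poorly connected, so it is a
    candidate anomaly.
    """
    n = len(graph)
    transition = []
    for row in graph:
        total = sum(row)
        # Fall back to a uniform jump if a node has no neighbours at all.
        transition.append([v / total for v in row] if total > 0 else [1.0 / n] * n)
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            damping / n
            + (1.0 - damping) * sum(scores[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
    return scores


# Four nearby observations and one far-away observation.
observations = [[0.0], [0.1], [0.2], [0.15], [5.0]]
scores = random_walk_scores(rbf_graph(observations))
most_anomalous = min(range(len(scores)), key=scores.__getitem__)
```

In this toy input the isolated observation at index 4 receives the lowest visiting probability, because almost no transition mass flows into it.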


Book ChapterDOI
09 Apr 2009

42 citations


Proceedings ArticleDOI
28 Jun 2009
TL;DR: This paper investigates how different pre-processing decisions and different network forces such as selection and influence affect the modeling of dynamic networks, and demonstrates the effect of attribute drift.
Abstract: Social networks have become a major focus of research in recent years, initially directed towards static networks but increasingly towards dynamic ones. In this paper, we investigate how different pre-processing decisions and different network forces such as selection and influence affect the modeling of dynamic networks. We also present empirical justification for some of the modeling assumptions made in dynamic network analysis (e.g., first-order Markovian assumption) and develop metrics to measure the alignment between links and attributes under different strategies of using the historical network data. We also demonstrate the effect of attribute drift, that is, the importance of individual attributes in forming links changes over time.

39 citations


Proceedings ArticleDOI
02 Nov 2009
TL;DR: The proposed co-classification framework to detect Web spam and the spammers who are responsible for posting them on the social media Web sites significantly outperforms classifiers that learn each detection task independently.
Abstract: Social media are becoming increasingly popular and have attracted considerable attention from spammers. Using a sample of more than ninety thousand known spam Web sites, we found that between 7% and 18% of their URLs are posted on two popular social media Web sites, digg.com and delicious.com. In this paper, we present a co-classification framework to detect Web spam and the spammers who are responsible for posting them on the social media Web sites. The rationale for our approach is that since both detection tasks are related, it would be advantageous to train them simultaneously to make use of the labeled examples in the Web spam and spammer training data. We have evaluated the effectiveness of our algorithm on the delicious.com data set. Our experimental results showed that the proposed co-classification algorithm significantly outperforms classifiers that learn each detection task independently.

33 citations


Proceedings ArticleDOI
08 Mar 2009
TL;DR: An ensemble model is proposed that couples two sources of information: statistical information that is derived from the data set, and sense information retrieved from WordNet that is used to build a semantic binary model.
Abstract: Incorporating background knowledge into data mining algorithms is an important but challenging problem. Current approaches in semi-supervised learning require explicit knowledge provided by domain experts, knowledge specific to the particular data set. In this study, we propose an ensemble model that couples two sources of information: statistical information that is derived from the data set, and sense information retrieved from WordNet that is used to build a semantic binary model. We evaluated the efficacy of using our combined ensemble model on the Reuters-21578 and 20newsgroups data sets.

19 citations


Proceedings ArticleDOI
20 Jul 2009
TL;DR: A matrix alignment approach to the problem of collective classification which weights the attributes and the links according to their predictive influence and provides comparable accuracy in prediction to other methods is presented.
Abstract: Within networks there is often a pattern to the way nodes link to one another. It has been shown that the accuracy of node classification can be improved by using the link data. One of the challenges to integrating the attribute and link data, though, is balancing the influence that each has on the classification decision. In this paper we present a matrix alignment approach to the problem of collective classification which weights the attributes and the links according to their predictive influence. The experiments show that while our approach provides comparable accuracy in prediction to other methods, it is also very fast and descriptive.

8 citations
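
One way to picture "weighting attributes according to their predictive influence" in the abstract above is the standard kernel alignment score, the normalized Frobenius inner product between two matrices. The sketch below is an illustration under assumptions, not the paper's formulation: the helper names `alignment` and `attribute_kernel`, the toy network, and the use of a same-value indicator kernel are all hypothetical choices.

```python
import math


def frobenius_inner(a, b):
    """Sum of element-wise products of two equally sized matrices."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def alignment(a, b):
    """Kernel alignment: <A, B>_F / (||A||_F * ||B||_F), in [0, 1] for non-negative matrices."""
    denom = math.sqrt(frobenius_inner(a, a) * frobenius_inner(b, b))
    return frobenius_inner(a, b) / denom if denom else 0.0


def attribute_kernel(values):
    """Indicator kernel: 1.0 where two nodes share the attribute value, else 0.0."""
    return [[1.0 if u == v else 0.0 for v in values] for u in values]


# Toy network: nodes 0-1 are linked and nodes 2-3 are linked
# (the diagonal is set to 1 so the matrix is comparable to a kernel).
links = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
group = ["a", "a", "b", "b"]   # attribute that matches the link pattern
noise = ["x", "y", "x", "y"]   # attribute unrelated to the link pattern

group_score = alignment(links, attribute_kernel(group))
noise_score = alignment(links, attribute_kernel(noise))
```

Here `group` aligns perfectly with the link structure while `noise` agrees only on the diagonal, so a weighting scheme built on these scores would favor `group`.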


Proceedings ArticleDOI
20 Jul 2009
TL;DR: Two new methods of performing email prioritization are proposed, both of which rank users' inboxes using models created from email history.
Abstract: The rise of email as a communication medium raises several issues. A majority of email messages sent are spam. Also, the amount of legitimate email received by many users is overwhelming. In this paper, we propose two new methods of performing email prioritization. Both techniques rank users' inboxes using models created from email history. With them, lower priority email messages may be dealt with so that the use of email remains a net productivity gain.

7 citations


Proceedings ArticleDOI
06 Dec 2009
TL;DR: A hybrid framework that simultaneously performs classification and regression to accurately predict future values of a zero-inflated time series, and is extended to a semi-supervised learning setting via graph regularization, is presented.
Abstract: Time series data with an abundant number of zeros are common in many applications, including climate and ecological modeling, disease monitoring, manufacturing defect detection, and traffic accident monitoring. Classical regression models are inappropriate to handle data with such skewed distribution because they tend to underestimate the frequency of zeros and the magnitude of non-zero values in the data. This paper presents a hybrid framework that simultaneously performs classification and regression to accurately predict future values of a zero-inflated time series. A classifier is initially used to determine whether the value at a given time step is zero while a regression model is invoked to estimate its magnitude only if the predicted value has been classified as nonzero. The proposed framework is extended to a semi-supervised learning setting via graph regularization. The effectiveness of the framework is demonstrated via its application to the precipitation prediction problem for climate impact assessment studies.

7 citations
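
The classify-then-regress step described in the abstract above can be sketched as follows. This is a simplified illustration, not the paper's framework: the two models are trained independently rather than jointly, graph regularization is left out, and the class `ZeroInflatedPredictor`, the k-nearest-neighbour classifier, and the least-squares line are assumptions chosen to keep the example short.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    )
    return slope, mean_y - slope * mean_x


class ZeroInflatedPredictor:
    """Hypothetical two-stage model: zero/nonzero classifier, then regressor."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, xs, ys):
        self.xs = list(xs)
        self.is_nonzero = [y != 0 for y in ys]
        # The regressor only sees the nonzero observations.
        nonzero = [(x, y) for x, y in zip(xs, ys) if y != 0]
        if nonzero:
            self.slope, self.intercept = fit_line(
                [x for x, _ in nonzero], [y for _, y in nonzero]
            )
        else:
            self.slope, self.intercept = 0.0, 0.0
        return self

    def predict(self, x):
        # Stage 1: k-nearest-neighbour vote on whether the value is nonzero.
        nearest = sorted(range(len(self.xs)), key=lambda i: abs(self.xs[i] - x))[: self.k]
        votes = sum(self.is_nonzero[i] for i in nearest)
        if 2 * votes <= self.k:
            return 0.0
        # Stage 2: estimate the magnitude only for predicted nonzero values.
        return self.slope * x + self.intercept


model = ZeroInflatedPredictor(k=3).fit(
    [0, 1, 2, 3, 4, 5, 6, 7],
    [0, 0, 0, 0, 2.0, 2.5, 3.0, 3.5],
)
dry = model.predict(1)      # neighbours are all zero
wet = model.predict(6.5)    # neighbours are all nonzero
```

Keeping the zeros out of the regression fit is the point of the design: a single regressor trained on all observations would be pulled towards zero and would underestimate the nonzero magnitudes.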


01 Jan 2009
TL;DR: It will be shown that learning the alignment between links and attributes leads to improvements in link prediction and collective classification, and studying the changes in the relationship of attributes to links over time has revealed information helpful for decisions that are made in processing network data.
Abstract: The study of networks in general, and social networks in particular, has intensified in recent years due in part to the interest in on-line social networks and the availability of large data sets of related objects. An area called network mining has emerged from the larger area of data mining, whose purpose is to extract hidden knowledge from large, linked data sets. It is the purpose of this dissertation to study the relationships that develop in networks involving links, specifically the relationships between links and communities and between links and attributes. Understanding the alignment between communities and the links offers valuable insights into the roles that nodes play with respect to communities. It will also be shown that learning the alignment between links and attributes leads to improvements in link prediction and collective classification. Finally, studying the changes in the relationship of attributes to links over time has revealed information helpful for decisions that are made in processing network data. During the course of this investigation, a number of tangible new algorithms and metrics have been discovered. First, a new metric is introduced that provides information about the number of communities to which a node belongs without having the actual community information. Combining this rawComm metric with the relative degree of a node allows community-based roles to be assigned to nodes. Next, a new framework is proposed that uses weights to align the attributes to the link structure. Two formulations of the framework are used for improving link prediction and collective classification techniques. It is also shown to be valuable in studying the dynamics of temporal networks.

2 citations