scispace - formally typeset
Author

Wentao Fan

Bio: Wentao Fan is an academic researcher at Huaqiao University. His research focuses on topics including mixture models and the Dirichlet distribution. He has an h-index of 14 and has co-authored 101 publications receiving 895 citations. Previous affiliations of Wentao Fan include Concordia University Wisconsin & Concordia University.


Papers
Journal ArticleDOI
TL;DR: A simple but fast DPeak, namely FastDPeak, is proposed, which runs in about O(n log n) expected time in the intrinsic dimensionality. It replaces density with kNN-density, computed by a fast kNN algorithm such as the cover tree, yielding a huge improvement for density computations.
Abstract: The Density Peak (DPeak) clustering algorithm is not applicable to large-scale data, because the two quantities it relies on, ρ and δ, are both obtained by a brute-force algorithm with complexity O(n²). Thus, a simple but fast DPeak, namely FastDPeak, is proposed, which runs in about O(n log n) expected time in the intrinsic dimensionality. It replaces density with kNN-density, which is computed by a fast kNN algorithm such as the cover tree, yielding a huge improvement for density computations. Based on kNN-density, local density peaks and non-local density peaks are identified, and a fast algorithm, which uses two different strategies to compute δ for each kind, is also proposed with complexity O(n). Experimental results show that FastDPeak is effective and outperforms other variants of DPeak.
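The core substitution the abstract describes, replacing brute-force density with kNN-density, can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses SciPy's `cKDTree` as a stand-in for the cover tree, and defines kNN-density as the inverse of the distance to the k-th nearest neighbor, so that tightly packed points get higher density.

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_density(points, k=5):
    """kNN-density sketch: larger when the k-th nearest neighbor is closer."""
    tree = cKDTree(points)  # stand-in for the cover tree used in the paper
    # query k+1 neighbors because each point is its own nearest neighbor
    dists, _ = tree.query(points, k=k + 1)
    return 1.0 / (dists[:, -1] + 1e-12)

rng = np.random.default_rng(0)
dense = rng.normal(0.0, 0.1, size=(100, 2))   # tight cluster
sparse = rng.normal(5.0, 2.0, size=(100, 2))  # diffuse cluster
rho = knn_density(np.vstack([dense, sparse]))
# points in the tight cluster should have higher kNN-density on average
print(rho[:100].mean() > rho[100:].mean())
```

Because the tree answers each k-nearest-neighbor query in roughly logarithmic time, all n densities cost about O(n log n) rather than the O(n²) of pairwise comparisons, which is the speed-up the abstract claims.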

144 citations

Journal ArticleDOI
TL;DR: This paper focuses on the variational learning of finite Dirichlet mixture models. This approach has several advantages: the problem of over-fitting is prevented, and the complexity of the mixture model can be determined automatically and simultaneously with parameter estimation as part of the Bayesian inference procedure.
Abstract: In this paper, we focus on the variational learning of finite Dirichlet mixture models. Compared to other algorithms that are commonly used for mixture models (such as expectation-maximization), our approach has several advantages: first, the problem of over-fitting is prevented; furthermore, the complexity of the mixture model (i.e., the number of components) can be determined automatically and simultaneously with parameter estimation as part of the Bayesian inference procedure; finally, since the whole inference process is analytically tractable with closed-form solutions, it may scale well to large applications. Both synthetic and real data, generated from challenging real-life applications, namely image database categorization and anomaly intrusion detection, are used to verify the effectiveness of the proposed approach.
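The advantage the abstract highlights, Bayesian inference pruning redundant components so model complexity is determined automatically, can be demonstrated with a small sketch. Note the assumptions: the paper's model is a Dirichlet mixture fit by variational inference, whereas this illustration uses scikit-learn's `BayesianGaussianMixture` (a variational Gaussian mixture with a Dirichlet process prior) purely to show the pruning behavior.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(1)
# two well-separated clusters, but we deliberately over-specify 10 components
X = np.vstack([rng.normal(-5, 1, (200, 2)), rng.normal(5, 1, (200, 2))])

model = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior=0.01,  # small prior favors fewer components
    max_iter=500,
    random_state=0,
).fit(X)

# components with non-negligible mixing weight survive; the rest are pruned
effective = int((model.weights_ > 0.05).sum())
print(effective)  # far fewer than the 10 we allowed
```

No model-selection loop (e.g., refitting with different component counts and comparing BIC) is needed: the variational posterior drives superfluous mixing weights toward zero during a single fit.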

110 citations

Journal ArticleDOI
TL;DR: The experimental results reported for both synthetic data and real-world challenging applications involving image categorization, automatic semantic annotation and retrieval show the ability of the approach to provide accurate models by distinguishing between relevant and irrelevant features without over- or under-fitting the data.

62 citations

Proceedings ArticleDOI
11 Dec 2011
TL;DR: This paper proposes a novel unsupervised statistical approach for detecting network-based attacks through finite generalized Dirichlet mixture models, in the context of Bayesian variational inference.
Abstract: In recent years, an increasing number of security threats have brought serious risk to the internet and computer networks. An Intrusion Detection System (IDS) plays a vital role in detecting various kinds of attacks. Developing adaptive and flexible IDSs remains a challenging and demanding task due to the incessant appearance of new types of attacks and sabotaging approaches. In this paper, we propose a novel unsupervised statistical approach for detecting network-based attacks. In our approach, patterns of normal and intrusive activities are learned through finite generalized Dirichlet mixture models, in the context of Bayesian variational inference. Under the proposed variational framework, the parameters, the complexity of the mixture model, and the feature saliency can be estimated simultaneously, in closed form. We evaluate the proposed approach using the popular KDD CUP 1999 data set. Experimental results show that this approach is able to detect many different types of intrusions accurately with a low false positive rate.

58 citations

Journal ArticleDOI
TL;DR: A principled approach is proposed for approximating the model's intractable posterior distribution by a tractable one, such that all of the mixture's parameters can be estimated simultaneously and effectively in closed form.
Abstract: A large class of problems can be formulated in terms of the clustering process. Mixture models are an increasingly important tool in statistical pattern recognition and for analyzing and clustering complex data. Two challenging aspects that should be addressed when considering mixture models are how to choose between a set of plausible models and how to estimate the model's parameters. In this paper, we address both problems simultaneously within a unified online nonparametric Bayesian framework that we develop to learn a Dirichlet process mixture of Beta-Liouville distributions (i.e., an infinite Beta-Liouville mixture model). The proposed infinite model is used for the online modeling and clustering of proportional data, for which the Beta-Liouville mixture has been shown to be effective. We propose a principled approach for approximating the model's intractable posterior distribution by a tractable one that we develop, such that all of the mixture's parameters can be estimated simultaneously and effectively in closed form. This is done through variational inference, which enjoys important advantages such as handling unobserved attributes and preventing under- or over-fitting; we explain this in detail. The effectiveness of the proposed work is evaluated on three challenging real applications, namely facial expression recognition, behavior modeling and recognition, and dynamic texture clustering.
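The "infinite" in the infinite Beta-Liouville mixture comes from the Dirichlet process prior over the mixing weights, which is commonly written via the stick-breaking construction: each weight takes a Beta-distributed fraction of whatever probability mass remains. A minimal truncated sketch (generic to any Dirichlet process mixture, not specific to the paper's Beta-Liouville components):

```python
import numpy as np

def stick_breaking(alpha, n_components, rng):
    """Truncated stick-breaking construction of Dirichlet process weights."""
    betas = rng.beta(1.0, alpha, size=n_components)          # stick fractions
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas)[:-1]])
    return betas * remaining                                  # w_k = beta_k * prod_{j<k}(1 - beta_j)

rng = np.random.default_rng(0)
w = stick_breaking(alpha=2.0, n_components=20, rng=rng)
# weights are positive and sum to just under 1 (truncation leaves a small remainder)
print(w.sum())
```

Because the weights decay as the stick is broken, only a data-driven finite subset of components ends up with appreciable mass, which is what lets the model grow or shrink its effective complexity online.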

49 citations


Cited by
Christopher M. Bishop
01 Jan 2006
TL;DR: Probability distributions and linear models for regression and classification are given in this book, along with a discussion of combining models in the context of machine learning and classification.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

01 Jan 2002

9,314 citations

Journal ArticleDOI
01 Nov 2018-Heliyon
TL;DR: The study found that neural-network models such as feedforward and feedback propagation artificial neural networks perform better in their application to human problems, and proposed feedforward and feedback propagation ANN models as a research focus based on data-analysis factors such as accuracy, processing speed, latency, fault tolerance, volume, scalability, convergence, and performance.

1,471 citations

Journal ArticleDOI
TL;DR: The purpose of this paper is to provide a complete survey of the traditional and recent approaches to background modeling for foreground detection, and categorize the different approaches in terms of the mathematical models used.

664 citations

Journal ArticleDOI
TL;DR: This paper provides a comprehensive survey of existing open set recognition techniques, covering related definitions, model representations, datasets, evaluation criteria, and algorithm comparisons, and it highlights the limitations of existing approaches and points out some promising subsequent research directions.
Abstract: In real-world recognition/classification tasks, limited by various objective factors, it is usually difficult to collect training samples to exhaust all classes when training a recognizer or classifier. A more realistic scenario is open set recognition (OSR), where incomplete knowledge of the world exists at training time, and unknown classes can be submitted to an algorithm during testing, requiring the classifiers to not only accurately classify the seen classes, but also effectively deal with unseen ones. This paper provides a comprehensive survey of existing open set recognition techniques covering various aspects ranging from related definitions, representations of models, datasets, evaluation criteria, and algorithm comparisons. Furthermore, we briefly analyze the relationships between OSR and its related tasks including zero-shot, one-shot (few-shot) recognition/learning techniques, classification with reject option, and so forth. Additionally, we also review the open world recognition which can be seen as a natural extension of OSR. Importantly, we highlight the limitations of existing approaches and point out some promising subsequent research directions in this field.

492 citations