Showing papers by "Michael K. Ng published in 2009"


Journal ArticleDOI
TL;DR: This method is an extension of a method of Collatz (1942) for calculating the spectral radius of an irreducible nonnegative matrix; the method is also applied to studying higher-order Markov chains.
Abstract: In this paper we propose an iterative method for calculating the largest eigenvalue of an irreducible nonnegative tensor. This method is an extension of a method of Collatz (1942) for calculating the spectral radius of an irreducible nonnegative matrix. Numerical results show that our proposed method is promising. We also apply the method to studying higher-order Markov chains.

300 citations
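
To make the iteration concrete, here is a minimal NumPy sketch of the Collatz-type bounds and power-type update for an order-3 nonnegative tensor; the update x ← (A x²)^{1/2} and the ratio bounds are standard, but the toy tensor, tolerance, and normalization are invented for illustration.

```python
import numpy as np

def tensor_apply(A, x):
    """Compute y = A x^{m-1} for an order-3 tensor: y_i = sum_{j,k} A[i,j,k] x_j x_k."""
    return np.einsum('ijk,j,k->i', A, x, x)

def largest_eigenvalue(A, tol=1e-10, max_iter=1000):
    """Collatz-type power iteration for the largest eigenvalue of an
    irreducible nonnegative order-3 tensor."""
    n = A.shape[0]
    x = np.ones(n)
    for _ in range(max_iter):
        y = tensor_apply(A, x)
        ratios = y / x**2          # Collatz ratios (A x^{m-1})_i / x_i^{m-1}, m = 3
        lo, hi = ratios.min(), ratios.max()
        if hi - lo < tol:          # lower and upper bounds have met
            break
        x = y**0.5                 # x <- (A x^{m-1})^{1/(m-1)}
        x /= x.sum()               # normalize to keep the iterate bounded
    return 0.5 * (lo + hi)

# Toy example: a strictly positive (hence irreducible) random tensor.
rng = np.random.default_rng(0)
A = rng.random((4, 4, 4)) + 0.1
print(largest_eigenvalue(A))
```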


Journal ArticleDOI
TL;DR: An alternating minimization algorithm is developed to find the minimizer of such an objective function efficiently and the convergence of the minimizing method is shown.
Abstract: Multiplicative noise removal problems have attracted much attention in recent years. Unlike additive noise removal problems, the noise is multiplied with the original image, so almost all information of the original image may disappear in the observed image. The main aim of this paper is to propose and study a strictly convex objective function for multiplicative noise removal problems. We also incorporate the modified total variation regularization in the objective function to recover image edges. We develop an alternating minimization algorithm to find the minimizer of such an objective function efficiently and also show the convergence of the minimization method. Our experimental results show that the quality of images denoised by the proposed method is quite good.

280 citations
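
The exact objective and alternating scheme are in the paper; the sketch below only illustrates the key log-domain idea, with the strictly convex fidelity term Σ(z + f·e^{-z}) in z = log u, plain gradient descent in place of alternating minimization, and a quadratic smoothness penalty standing in for the modified total variation term.

```python
import numpy as np

def denoise_multiplicative(f, lam=0.5, step=0.1, n_iter=200):
    """Toy log-domain scheme for multiplicative (e.g. Gamma) noise: work on
    z = log(u), where the data term sum(z + f*exp(-z)) is strictly convex;
    a quadratic finite-difference penalty stands in for the modified TV
    regularizer, and plain gradient descent for alternating minimization."""
    z = np.log(np.maximum(f, 1e-6))            # start from the noisy image
    for _ in range(n_iter):
        grad_fid = 1.0 - f * np.exp(-z)        # gradient of the fidelity term
        lap = (np.roll(z, 1, 0) + np.roll(z, -1, 0) +
               np.roll(z, 1, 1) + np.roll(z, -1, 1) - 4 * z)
        z -= step * (grad_fid - lam * lap)     # descend on fidelity + smoothness
    return np.exp(z)

# Toy usage: a flat image corrupted by unit-mean Gamma noise.
rng = np.random.default_rng(1)
u_true = np.full((64, 64), 2.0)
f = u_true * rng.gamma(shape=4.0, scale=0.25, size=u_true.shape)
print("mean abs error:", float(np.abs(denoise_multiplicative(f) - u_true).mean()))
```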


Journal ArticleDOI
TL;DR: Numerical results show that, for a suitably chosen $(1,1)$ block-matrix, this constraint preconditioner outperforms the block-diagonal and block-tridiagonal ones in iteration steps and computing time when they are used to accelerate the GMRES method for solving these block two-by-two symmetric indefinite linear systems.
Abstract: We study the eigenvalue bounds of block two-by-two nonsingular and symmetric indefinite matrices whose $(1,1)$ block is symmetric positive definite and whose Schur complement with respect to its $(2,2)$ block is symmetric indefinite. A constraint preconditioner for this matrix is constructed by simply replacing the $(1,1)$ block by a symmetric positive definite approximation, and the spectral properties of the preconditioned matrix are discussed. Numerical results show that, for a suitably chosen $(1,1)$ block-matrix, this constraint preconditioner outperforms the block-diagonal and block-tridiagonal ones in iteration steps and computing time when they are used to accelerate the GMRES method for solving these block two-by-two symmetric indefinite linear systems. The new results extend the existing ones for block two-by-two matrices with symmetric negative semidefinite $(2,2)$ blocks to those with general symmetric $(2,2)$ blocks.

150 citations
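
A small SciPy illustration of the construction (not the paper's test problems): the (1,1) block of a block two-by-two indefinite matrix is replaced by its diagonal to form a constraint preconditioner, and GMRES iteration counts with and without it are compared.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

rng = np.random.default_rng(2)
n, m = 80, 20
# Block two-by-two symmetric indefinite matrix [[B, E], [E^T, -C]] with a
# symmetric positive definite (1,1) block B.
R = rng.random((n, n))
B = np.eye(n) + 0.1 * (R + R.T)
E = rng.random((n, m))
C = 0.5 * np.eye(m)
A = np.block([[B, E], [E.T, -C]])
b = rng.random(n + m)

# Constraint preconditioner: replace B by a cheap SPD approximation (its
# diagonal here) while keeping the off-diagonal blocks exactly.
P = np.block([[np.diag(np.diag(B)), E], [E.T, -C]])
P_inv = np.linalg.inv(P)            # dense inverse, for illustration only
M = LinearOperator(A.shape, matvec=lambda v: P_inv @ v)

res_plain, res_pc = [], []
gmres(A, b, callback=res_plain.append, callback_type="pr_norm")
gmres(A, b, M=M, callback=res_pc.append, callback_type="pr_norm")
print(len(res_plain), "GMRES steps unpreconditioned vs", len(res_pc), "preconditioned")
```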


Journal ArticleDOI
TL;DR: A super-resolution image reconstruction algorithm for moderate-resolution imaging spectroradiometer (MODIS) remote sensing images, in which a Huber prior is used as regularization to preserve sharp edges in the reconstructed image.
Abstract: In this paper, we propose a super-resolution image reconstruction algorithm for moderate-resolution imaging spectroradiometer (MODIS) remote sensing images. This algorithm consists of two parts: registration and reconstruction. In the registration part, a truncated quadratic cost function is used to exclude outlier pixels, which deviate strongly from the registration model. Accurate photometric and geometric registration parameters can be obtained simultaneously. In the reconstruction part, the L1-norm data fidelity term is chosen to reduce the effects of inevitable registration error, and a Huber prior is used as regularization to preserve sharp edges in the reconstructed image. In this process, the outliers are excluded again to enhance the robustness of the algorithm. The proposed algorithm has been tested using real MODIS band-4 images captured on different dates. The experimental results and comparative analyses verify the effectiveness of this algorithm.

95 citations
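
For reference, the two robust penalties named above are easy to state; this sketch just defines a truncated quadratic registration cost and the Huber function (the thresholds tau and delta are illustrative values, not the paper's).

```python
import numpy as np

def truncated_quadratic(r, tau):
    """Registration cost: quadratic for small residuals, constant beyond
    tau, so outlier pixels cannot dominate the fit."""
    return np.minimum(r**2, tau**2)

def huber(r, delta):
    """Huber prior: quadratic near zero (smooth regions), linear in the
    tails, which penalizes edges less than a pure quadratic would."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))

r = np.linspace(-3, 3, 7)
print(truncated_quadratic(r, tau=1.0))
print(huber(r, delta=1.0))
```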


Journal ArticleDOI
TL;DR: A fast TV image restoration method with automatic selection of the regularization parameter to restore blurred and noisy images, using the generalized cross-validation (GCV) technique to determine inexpensively how much regularization to use in each restoration step.
Abstract: We consider and study total variation (TV) image restoration. In the literature there are several regularization parameter selection methods for Tikhonov regularization problems (e.g., the discrepancy principle and the generalized cross-validation method). However, to our knowledge, these selection methods have not been applied to TV regularization problems. The main aim of this paper is to develop a fast TV image restoration method with automatic selection of the regularization parameter to restore blurred and noisy images. The method exploits the generalized cross-validation (GCV) technique to determine inexpensively how much regularization to use in each restoration step. By updating the regularization parameter in each iteration, the restored image can be obtained. Our experimental results for different kinds of noise show that the visual quality and SNRs of images restored by the proposed method are promising. We also demonstrate that the method is efficient, as it can restore images of size 256 x 256 in approximately 20 s in the MATLAB computing environment.

87 citations
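
The paper applies GCV inside TV restoration steps; the hedged sketch below shows only the underlying GCV computation for a single FFT-diagonalizable Tikhonov deblurring step (an identity regularizer standing in for the TV term), where the residual and trace terms are cheap in the Fourier domain.

```python
import numpy as np

def gcv_tikhonov(blurred, psf_fft, lams):
    """Pick the Tikhonov regularization parameter by GCV for a periodic
    blur, where everything diagonalizes under the FFT."""
    n = blurred.size
    b_fft = np.fft.fft2(blurred)
    h2 = np.abs(psf_fft)**2
    best = None
    for lam in lams:
        filt = h2 / (h2 + lam)                              # Fourier filter factors
        resid2 = np.sum(np.abs((1 - filt) * b_fft)**2) / n  # ||(I - A_lam) b||^2
        denom = (n - filt.sum())**2                         # trace(I - A_lam)^2
        gcv = resid2 / denom
        if best is None or gcv < best[0]:
            best = (gcv, lam)
    return best[1]

# Toy usage: periodic 3x3 mean blur of a random image plus noise.
rng = np.random.default_rng(3)
x = rng.random((64, 64))
psf = np.zeros((64, 64)); psf[:3, :3] = 1 / 9
H = np.fft.fft2(psf)
b = np.real(np.fft.ifft2(H * np.fft.fft2(x))) + 0.01 * rng.standard_normal(x.shape)
print("GCV-selected lambda:", gcv_tikhonov(b, H, np.logspace(-6, 0, 25)))
```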


Journal ArticleDOI
TL;DR: The proposed method uses a modified total variation minimization scheme to regularize the deblurred image and fill in suitable values for noisy pixels detected by median-type filters.
Abstract: In this paper, we study the restoration of blurred images corrupted by impulse noise or mixed impulse plus Gaussian noise. In the proposed method, we use a modified total variation minimization scheme to regularize the deblurred image and fill in suitable values for the noisy pixels detected by median-type filters. An alternating minimization algorithm is employed to solve the proposed total variation minimization problem. Our experimental results show that the proposed algorithm is very efficient and that the quality of the restored images is competitive with that of images restored by existing variational image restoration methods.

68 citations
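
A toy sketch of the detection idea, assuming a plain 3x3 median filter as the "median-type" detector: flagged pixels are simply filled from the filtered image here, whereas the paper instead restores them inside a modified TV minimization.

```python
import numpy as np
from scipy.ndimage import median_filter

def detect_and_fill(img, threshold=0.2):
    """Two-phase sketch: a median filter flags likely impulse pixels, and
    flagged pixels are filled from the median of their neighborhood."""
    med = median_filter(img, size=3)
    noisy_mask = np.abs(img - med) > threshold   # detection step
    out = img.copy()
    out[noisy_mask] = med[noisy_mask]            # fill-in step
    return out, noisy_mask

# Toy usage: salt-and-pepper corruption of a smooth ramp image.
rng = np.random.default_rng(4)
img = np.tile(np.linspace(0, 1, 64), (64, 1))
corrupted = img.copy()
flip = rng.random(img.shape) < 0.1
corrupted[flip] = rng.choice([0.0, 1.0], size=int(flip.sum()))
restored, mask = detect_and_fill(corrupted)
print("flagged pixels:", int(mask.sum()))
```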


Journal ArticleDOI
TL;DR: A simple matching dissimilarity measure for categorical objects is modified, which allows the use of the fuzzy k-modes paradigm to obtain a cluster with strong intra-similarity, and to efficiently cluster large categorical data sets.
Abstract: This correspondence describes extensions to the fuzzy k-modes algorithm for clustering categorical data. We modify a simple matching dissimilarity measure for categorical objects, which allows the fuzzy k-modes paradigm to obtain clusters with strong intra-similarity and to cluster large categorical data sets efficiently. We rigorously derive the updating formula of the fuzzy k-modes clustering algorithm with the new dissimilarity measure and prove the convergence of the algorithm under the optimisation framework. Experimental results are presented to illustrate that the new fuzzy k-modes algorithm is more effective than the existing k-modes algorithms.

24 citations
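
A minimal sketch of the fuzzy k-modes machinery with the plain simple-matching dissimilarity (the paper's modified measure is not reproduced here): memberships follow the standard fuzzy update and modes maximize the per-attribute membership mass.

```python
import numpy as np

def fuzzy_k_modes(X, k, m=1.5, n_iter=30, rng=None):
    """Minimal fuzzy k-modes sketch with simple-matching dissimilarity."""
    rng = rng or np.random.default_rng(0)
    n, p = X.shape
    modes = X[rng.choice(n, k, replace=False)].copy()
    for _ in range(n_iter):
        # simple-matching dissimilarity: number of mismatched attributes
        d = np.array([[np.sum(x != z) for z in modes] for x in X], float)
        u = np.zeros((n, k))
        for i in range(n):
            zero = d[i] == 0
            if zero.any():
                u[i, zero] = 1.0 / zero.sum()   # object coincides with a mode
            else:                               # standard fuzzy membership update
                w = d[i] ** (-1 / (m - 1))
                u[i] = w / w.sum()
        # mode update: per attribute, the category with largest membership mass
        for l in range(k):
            for j in range(p):
                cats = np.unique(X[:, j])
                mass = [np.sum(u[X[:, j] == c, l] ** m) for c in cats]
                modes[l, j] = cats[np.argmax(mass)]
    return modes, u

# Toy categorical data: two obvious groups over three attributes.
X = np.array([[0, 0, 1], [0, 0, 0], [0, 1, 0],
              [2, 2, 2], [2, 2, 1], [1, 2, 2]])
modes, u = fuzzy_k_modes(X, k=2)
print(modes); print(np.round(u, 2))
```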


Journal ArticleDOI
TL;DR: It has been shown that a combination of the Newton/fixed-point iteration with the preconditioned GMRES method is efficient and robust for solving the systems of nonlinear equations arising from the sinc-Galerkin discretization of the time-dependent partial differential equations.
Abstract: When the Newton method or the fixed-point method is employed to solve the systems of nonlinear equations arising in the sinc-Galerkin discretization of certain time-dependent partial differential equations, in each iteration step we need to solve a structured subsystem of linear equations iteratively by, for example, a Krylov subspace method such as the preconditioned GMRES. In this paper, based on the tensor and the Toeplitz structures of the linear subsystems we construct structured preconditioners for their coefficient matrices and estimate the eigenvalue bounds of the preconditioned matrices under certain assumptions. Numerical examples are given to illustrate the effectiveness of the proposed preconditioning methods. It has been shown that a combination of the Newton/fixed-point iteration with the preconditioned GMRES method is efficient and robust for solving the systems of nonlinear equations arising from the sinc-Galerkin discretization of the time-dependent partial differential equations.

24 citations
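
As a generic illustration of the Toeplitz-structure idea (not the sinc-Galerkin matrices themselves), the sketch below preconditions a symmetric Toeplitz system with Strang's circulant, applied in O(n log n) operations via the FFT, and compares GMRES iteration counts.

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.sparse.linalg import LinearOperator, gmres

# Illustrative symmetric Toeplitz system standing in for one linear
# subsystem of a Newton/fixed-point step.
n = 200
col = 1.0 / (1.0 + np.arange(n))**1.5      # first column of the Toeplitz matrix
T = toeplitz(col)
b = np.ones(n)

# Strang preconditioner: copy the central diagonals of T into a circulant.
c = np.zeros(n)
c[:n // 2] = col[:n // 2]
c[n // 2 + 1:] = col[1:n - n // 2][::-1]   # wrap the diagonals around
c_eig = np.fft.fft(c)                      # eigenvalues of the circulant

def apply_Cinv(v):
    """Apply the circulant inverse with two FFTs: O(n log n)."""
    return np.real(np.fft.ifft(np.fft.fft(v) / c_eig))

M = LinearOperator((n, n), matvec=apply_Cinv)
res_plain, res_strang = [], []
gmres(T, b, callback=res_plain.append, callback_type="pr_norm")
gmres(T, b, M=M, callback=res_strang.append, callback_type="pr_norm")
print(len(res_plain), "GMRES steps unpreconditioned vs", len(res_strang), "with Strang")
```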


Journal ArticleDOI
TL;DR: A High-order Markov-Switching model for measuring the risk of a portfolio and adopts the Value-at-Risk (VaR) as a metric for market risk quantification and examines the high-order effect of the underlying Markov chain on the risk measures via backtesting.
Abstract: In this paper, we introduce a High-order Markov-Switching (HMS) model for measuring the risk of a portfolio. We suppose that the rate of return from a risky portfolio follows an HMS model with the drift and the volatility modulated by a discrete-time weak Markov chain. The states of the weak Markov chain are interpreted as observable states of an economy. We adopt the Value-at-Risk (VaR) as a metric for market risk quantification and examine the high-order effect of the underlying Markov chain on the risk measures via backtesting.

23 citations
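
A toy simulation of the setup, with invented drifts, volatilities, and second-order transition probabilities: the regime depends on the last two states, returns are modulated by the regime, and the 99% VaR is read off the simulated return distribution.

```python
import numpy as np

rng = np.random.default_rng(5)
mu = np.array([0.0008, -0.0005])      # drift per state (bull, bear) [assumed]
sigma = np.array([0.008, 0.020])      # volatility per state         [assumed]

def next_state(s1, s2):
    """Second-order (weak) Markov chain: the transition law depends on the
    last two states (s1 = most recent). Probabilities are invented."""
    p_stay = 0.95 if s1 == s2 else 0.70
    return s1 if rng.random() < p_stay else 1 - s1

# Simulate the regime path and the regime-modulated returns.
T = 100_000
states = [0, 0]
returns = np.empty(T)
for t in range(T):
    s = next_state(states[-1], states[-2])
    states.append(s)
    returns[t] = mu[s] + sigma[s] * rng.standard_normal()

# One-day 99% Value-at-Risk: the loss quantile of the simulated P&L.
var_99 = -np.quantile(returns, 0.01)
print(f"simulated 1-day 99% VaR: {var_99:.4f}")
```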


Journal ArticleDOI
TL;DR: A new categorical data clustering algorithm with automatic selection of k is proposed, which extends the k-modes clustering algorithm by introducing a penalty term into the objective function to make more clusters compete for objects.

18 citations


Journal ArticleDOI
01 Mar 2009
TL;DR: Experimental results on time-series gene expression data for the human cell cycle indicate that the novel method for mining, modeling, and evaluating a regulatory system executing cellular functions, represented as a biomolecular network, is promising for subnetwork mining and simulation from large biomolecular networks.
Abstract: In this paper, we present a novel method to mine, model, and evaluate a regulatory system executing cellular functions that can be represented as a biomolecular network. Our method consists of two steps. First, a novel scale-free network clustering approach is applied to such a biomolecular network to obtain various subnetworks. Second, computational models are generated for the subnetworks and simulated to predict their behavior in the cellular context. We discuss and evaluate some of the advanced computational modeling approaches, in particular, state-space modeling, probabilistic Boolean network modeling, and fuzzy logic modeling. The modeling and simulation results represent hypotheses that are tested against high-throughput biological datasets (microarrays and/or genetic screens) under normal and perturbation conditions. Experimental results on time-series gene expression data for the human cell cycle indicate that our approach is promising for subnetwork mining and simulation from large biomolecular networks.
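
Among the modeling approaches discussed, a probabilistic Boolean network is the easiest to sketch; the toy below uses invented genes, rules, and selection probabilities purely to show the simulation mechanics.

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy probabilistic Boolean network over three genes: each gene has
# candidate Boolean update rules chosen at random per step with the given
# selection probabilities. Rules and probabilities are invented.
rules = {
    0: [(lambda x: x[1] and not x[2], 0.7), (lambda x: x[1], 0.3)],
    1: [(lambda x: not x[0], 1.0)],
    2: [(lambda x: x[0] or x[1], 0.6), (lambda x: x[2], 0.4)],
}

def step(x):
    """One synchronous PBN update: sample a rule per gene, then apply it."""
    y = np.empty_like(x)
    for g, candidates in rules.items():
        fs, ps = zip(*candidates)
        f = fs[rng.choice(len(fs), p=list(ps))]
        y[g] = f(x)
    return y

# Simulate and summarize how often each gene is ON in the long run.
x = np.array([1, 0, 0], dtype=bool)
history = []
for _ in range(10_000):
    x = step(x)
    history.append(x.copy())
print("long-run ON frequency per gene:", np.mean(history, axis=0))
```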

Journal ArticleDOI
TL;DR: The development of a model-independent, delay-invariant deconvolution technique using least-absolute-deviation (LAD) regularization to improve CBF estimation accuracy; initial clinical implementation on six representative clinical cases confirms the advantages of the LAD method over the rSVD and sSVD methods.
Abstract: Cerebral blood flow (CBF) estimates derived from singular value decomposition (SVD) of time intensity curves from Gadolinium bolus perfusion-weighted imaging are known to underestimate CBF, especially at high flow rates. We report the development of a model-independent, delay-invariant deconvolution technique using least-absolute-deviation (LAD) regularization to improve the CBF estimation accuracy. Computer simulations were performed to compare the accuracy of CBF estimates derived from the LAD, reformulated SVD (rSVD), and standard SVD (sSVD) techniques. Simulations were performed at image signal-to-noise ratios ranging from 20 to 400, cerebral blood volumes from 1% to 10%, and CBF from 2.5 mL/100 g/min to 176.5 mL/100 g/min to estimate the effect of these parameters on the accuracy of CBF estimation. The LAD method improved the CBF estimation accuracy by up to 32% in gray matter and 23% in white matter compared with the rSVD and sSVD methods. The LAD method also reduces the systematic bias of the rSVD and sSVD methods with respect to baseline SNR while producing more accurate and reproducible residue function calculations than either the rSVD or sSVD method. Initial clinical implementation of the method on six representative clinical cases confirms the advantages of the LAD method over the rSVD and sSVD methods.
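
For context, here is a sketch of the standard SVD (sSVD) baseline that the LAD method is compared against: the tissue curve is modeled as a discrete convolution of the AIF with the flow-scaled residue function, and small singular values are truncated. The paper's LAD-regularized fit itself is not reproduced, and the toy curves and units below are assumptions.

```python
import numpy as np
from scipy.linalg import toeplitz

def ssvd_deconvolve(aif, tissue, dt, threshold=0.2):
    """Standard SVD (sSVD) deconvolution: C = dt * A r, with A the
    lower-triangular Toeplitz matrix built from the AIF; singular values
    below threshold * s_max are discarded to suppress noise."""
    n = len(aif)
    A = dt * toeplitz(aif, np.zeros(n))        # lower-triangular convolution
    U, s, Vt = np.linalg.svd(A)
    s_inv = np.where(s > threshold * s[0], 1.0 / s, 0.0)
    return Vt.T @ (s_inv * (U.T @ tissue))     # truncated pseudoinverse

# Toy usage: gamma-variate AIF, exponential residue function, known flow.
dt, t = 1.0, np.arange(60.0)
aif = (t**3) * np.exp(-t / 1.5); aif /= aif.max()
cbf, residue = 0.01, np.exp(-t / 4.0)          # toy values [assumed]
tissue = dt * toeplitz(aif, np.zeros(60)) @ (cbf * residue)
tissue += 0.0005 * np.random.default_rng(7).standard_normal(60)
r_est = ssvd_deconvolve(aif, tissue, dt)
print("estimated flow (max of residue):", r_est.max(), "vs true", cbf)
```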

Journal ArticleDOI
TL;DR: In this article, a novel generalized BFGS method is proposed for large-scale image restoration minimization problems; the complexity per step is O(n log n) operations with only O(n) memory required, where n is the number of image pixels.

Book ChapterDOI
24 May 2009
TL;DR: This paper proposes a wavelet inpainting model using the L0-norm and total variation (TV) minimization, and applies a graph cut algorithm to solve the subproblem related to TV minimization.
Abstract: In this paper, we suggest an algorithm to recover an image whose wavelet coefficients are partially lost. We propose a wavelet inpainting model using the L0-norm and total variation (TV) minimization. Traditionally, the L0-norm is replaced by the L1-norm or L2-norm due to numerical difficulties. We use an alternating minimization technique to overcome these difficulties. In order to improve the numerical efficiency, we also apply a graph cut algorithm to solve the subproblem related to TV minimization. Numerical results are given to demonstrate the advantages of the proposed algorithm.
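
A hedged sketch of the alternating structure, assuming PyWavelets and scikit-image are available: Chambolle's TV solver stands in for the paper's graph-cut solver, and the observed coefficients are simply re-imposed in place of the L0 formulation.

```python
import numpy as np
import pywt
from skimage.restoration import denoise_tv_chambolle

def wavelet_inpaint(img_obs, lost_mask, n_iter=25, weight=0.08):
    """Alternating sketch: TV-denoise the current estimate, then re-impose
    the wavelet coefficients that were not lost (True in lost_mask = lost)."""
    cA_obs, bands_obs = pywt.dwt2(img_obs, "haar")
    x = img_obs.copy()
    for _ in range(n_iter):
        x = denoise_tv_chambolle(x, weight=weight)       # TV subproblem
        _, (h, v, d) = pywt.dwt2(x, "haar")
        fixed = [np.where(m, c, k) for c, k, m in
                 zip((h, v, d), bands_obs, lost_mask)]   # keep observed coeffs
        x = pywt.idwt2((cA_obs, tuple(fixed)), "haar")
    return x

# Toy usage: lose 40% of each detail band of a smooth random image.
rng = np.random.default_rng(8)
img = np.cumsum(np.cumsum(rng.standard_normal((64, 64)), 0), 1)
img = (img - img.min()) / np.ptp(img)
cA, bands = pywt.dwt2(img, "haar")
lost_mask = [rng.random(b.shape) < 0.4 for b in bands]
damaged = [np.where(m, 0.0, b) for b, m in zip(bands, lost_mask)]
img_obs = pywt.idwt2((cA, tuple(damaged)), "haar")
print("mean abs error after inpainting:",
      float(np.abs(wavelet_inpaint(img_obs, lost_mask) - img).mean()))
```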

Book ChapterDOI
19 Apr 2009
TL;DR: Budget semi-supervised learning, i.e., semi-supervised learning with a resource budget, such as a limited memory insufficient to accommodate and/or process all available unlabeled data, is proposed, and it is shown that this is achievable by a simple yet effective method.
Abstract: In this paper we propose to study budget semi-supervised learning, i.e., semi-supervised learning with a resource budget, such as a limited memory insufficient to accommodate and/or process all available unlabeled data. This setting is of practical importance because in most real scenarios, although there may exist abundant unlabeled data, the computational resources that can be used are generally not unlimited. Effective budget semi-supervised learning algorithms should be able to adjust their behavior according to the given resource budget: roughly, the more resources available, the more the unlabeled data can be exploited. As an example, in this paper we show that this is achievable by a simple yet effective method.

Journal ArticleDOI
TL;DR: A novel automatic peak detection method for prOTOF MS data, which does not require a priori knowledge of protein masses, is developed and is highly effective at defining peaks that can be used for disease classification or to highlight potential biomarkers.
Abstract: Peak detection is a pivotal first step in biomarker discovery from MS data and can significantly influence the results of downstream data analysis steps. We developed a novel automatic peak detection method for prOTOF MS data, which does not require a priori knowledge of protein masses. Random noise is removed by an undecimated wavelet transform and chemical noise is attenuated by an adaptive short-time discrete Fourier transform. Isotopic peaks corresponding to a single protein are combined by extracting an envelope over them. Depending on the S/N, the desired peaks in each individual spectrum are detected, and those with the highest intensity among their peak clusters are recorded. The peaks common to all the spectra are identified by choosing an appropriate cut-off threshold in complete-linkage hierarchical clustering. To remove 1 Da shifts, the peak corresponding to a given protein is taken to be the detected peak occurring most frequently within its neighborhood. We validated this method using a data set of serial peptide and protein calibration standards. Compared with the MoverZ program, our new method detects more peaks and significantly enhances the S/N of the peaks after chemical noise removal. We then successfully applied this method to prOTOF MS spectra of albumin and albumin-bound proteins from serum samples of 59 patients with carotid artery disease compared to vascular disease-free patients to detect peaks with S/N ≥ 2. Our method is easily implemented and is highly effective at defining peaks that can be used for disease classification or to highlight potential biomarkers.
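
A much-simplified stand-in for the pipeline (no undecimated wavelet transform or adaptive STFT): smooth each spectrum, detect per-spectrum peaks above an S/N threshold of 2, and find common peaks by complete-linkage clustering of peak positions with a 1 Da cut-off. All data below is synthetic.

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(9)

# Synthetic spectra: three true peaks with small per-spectrum shifts.
mz = np.linspace(1000, 1100, 4000)
true_peaks = [1020.0, 1050.0, 1080.0]
spectra = []
for _ in range(10):
    s = sum(np.exp(-0.5 * ((mz - p - rng.normal(0, 0.05)) / 0.3)**2)
            for p in true_peaks)
    spectra.append(s + 0.05 * rng.standard_normal(mz.size))

locations = []
for s in spectra:
    smooth = np.convolve(s, np.ones(15) / 15, mode="same")   # crude denoising
    noise = np.std(s - smooth)
    idx, _ = find_peaks(smooth, height=2 * noise)            # S/N >= 2
    locations.extend(mz[idx])

# Complete linkage on 1-D peak positions; cut at 1 Da to merge shifted peaks.
Z = linkage(np.array(locations)[:, None], method="complete")
labels = fcluster(Z, t=1.0, criterion="distance")
common = sorted(np.mean([l for l, g in zip(locations, labels) if g == grp])
                for grp in set(labels)
                if (labels == grp).sum() >= 8)   # present in most spectra
print("common peaks near:", np.round(common, 1))
```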

Journal ArticleDOI
TL;DR: A new penalty term is introduced into the objective function of the fuzzy k-means clustering process to enable several clusters to compete for objects, which leads to the merging of some cluster centres and the identification of the 'true' number of clusters.
Abstract: This paper presents a subspace k-means clustering algorithm for high-dimensional data with automatic selection of k. A new penalty term is introduced into the objective function of the fuzzy k-means clustering process to enable several clusters to compete for objects, which leads to the merging of some cluster centres and the identification of the 'true' number of clusters. The algorithm determines the number of clusters in a dataset by adjusting the penalty term factor. A subspace cluster validation index is proposed and employed to verify the subspace clustering results generated by the algorithm. The experimental results on both synthetic and real data demonstrate that the algorithm is effective in producing consistent clustering results and identifying the correct number of clusters. Some real datasets are used to demonstrate how the proposed algorithm can determine interesting sub-clusters in the datasets.
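
The paper's penalty term drives cluster centres to merge; the sketch below mimics that effect crudely (it is not the paper's algorithm) by over-specifying k and explicitly merging nearly coincident centres during Lloyd iterations, so the surviving number of centres estimates the "true" k.

```python
import numpy as np

def kmeans_auto_k(X, k_max=10, merge_tol=0.5, n_iter=50, rng=None):
    """Simplified stand-in for penalty-driven cluster competition: start
    with k_max centres, run Lloyd updates, and merge centres that drift
    within merge_tol of each other."""
    rng = rng or np.random.default_rng(0)
    centres = X[rng.choice(len(X), k_max, replace=False)].astype(float)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centres[None, :, :])**2).sum(-1)
        assign = d.argmin(1)
        centres = np.array([X[assign == l].mean(0) if (assign == l).any()
                            else centres[l] for l in range(len(centres))])
        # merge nearly coincident centres (the "competition" step)
        keep = []
        for c in centres:
            if all(np.linalg.norm(c - k_) > merge_tol for k_ in keep):
                keep.append(c)
        centres = np.array(keep)
    return centres

# Toy usage: three well-separated blobs, k_max deliberately too large.
rng = np.random.default_rng(10)
X = np.vstack([rng.normal(m, 0.1, (50, 2)) for m in ([0, 0], [3, 0], [0, 3])])
print("estimated k:", len(kmeans_auto_k(X, k_max=10, rng=rng)))
```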

Journal ArticleDOI
01 Jun 2009
TL;DR: This work proposes a new algorithm that combines object clustering and dimension selection, and allows the input of domain knowledge in guiding the clustering process, and shows that this semi-supervised algorithm can perform knowledge-guided selective clustering when there are multiple meaningful object groupings.
Abstract: Recent studies have suggested that extremely low dimensional projected clusters exist in real datasets. Here, we propose a new algorithm for identifying them. It combines object clustering and dimension selection, and allows the input of domain knowledge in guiding the clustering process. Theoretical and experimental results show that even a small amount of input knowledge could already help detect clusters with only 1% of the relevant dimensions. We also show that this semi-supervised algorithm can perform knowledge-guided selective clustering when there are multiple meaningful object groupings. The algorithm is also shown effective in analysing a microarray dataset.

01 Jan 2009
TL;DR: Empirical studies on real-world multi-label learning tasks show that Tram can effectively make use of unlabeled data to achieve performance as good as existing state-of-the-art multi-label learning algorithms.
Abstract: Multi-label learning deals with problems in which each instance can be assigned to multiple classes simultaneously, which are ubiquitous in real-world learning tasks. In this paper, we propose a new multi-label learning method, which is able to exploit unlabeled data to obtain an effective model for assigning appropriate multiple labels to instances. The proposed method is called Tram (TRansductive multi-label learning via Alpha Matting), which formulates transductive multi-label learning as an optimization problem. We develop an efficient algorithm which has a closed-form solution to this optimization problem. Empirical studies on real-world multi-label learning tasks show that Tram can effectively make use of unlabeled data to achieve performance as good as existing state-of-the-art multi-label learning algorithms; moreover, Tram is much faster and can handle relatively large data sets.
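
The closed-form flavor can be illustrated with a generic graph-based transductive propagation, F = (I + αL)^{-1}Y on a kNN graph; this is a stand-in for intuition only, not the paper's exact Tram optimization.

```python
import numpy as np

def transductive_propagate(X, Y, alpha=1.0, k=5):
    """Closed-form transductive propagation on a kNN graph:
    F = (I + alpha * L)^(-1) Y, where rows of Y are all-zero for
    unlabeled instances."""
    n = len(X)
    d2 = ((X[:, None] - X[None, :])**2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        nn = np.argsort(d2[i])[1:k + 1]        # k nearest neighbours (skip self)
        W[i, nn] = np.exp(-d2[i, nn])
    W = np.maximum(W, W.T)                      # symmetrize the graph
    L = np.diag(W.sum(1)) - W                   # combinatorial graph Laplacian
    return np.linalg.solve(np.eye(n) + alpha * L, Y.astype(float))

# Toy usage: two blobs; only the first 10 points of each blob are labeled.
rng = np.random.default_rng(11)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
Y = np.zeros((40, 2)); Y[:10, 0] = 1; Y[20:30, 1] = 1
F = transductive_propagate(X, Y)
print("labels for unlabeled points:", F[10:20].argmax(1), F[30:].argmax(1))
```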

01 Jan 2009
TL;DR: This paper proposes to develop and use a shrunken dissimilarity measure to analyze and select relevant SNPs for classification problems, and finds some SNPs on chromosome 2, contained in genes relevant to Parkinson disease.
Abstract: Recent development of high-resolution single-nucleotide polymorphism (SNP) arrays allows detailed assessment of genome-wide human genome variations. However, SNP data typically has a large number of SNPs (e.g., 400 thousand SNPs in genome-wide Parkinson disease SNP data) and only a few hundred samples. Conventional classification methods may not be effective when applied to such genome-wide SNP data. In this paper, we propose to develop and use a shrunken dissimilarity measure to analyze and select relevant SNPs for classification problems. Examples on HapMap data and Parkinson disease data are given to demonstrate the effectiveness of the proposed method and to illustrate its potential to become a useful analysis tool for SNP data sets. In particular, we find some SNPs on chromosome 2, contained in genes relevant to Parkinson disease.
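
In the spirit of the shrunken measure (following the nearest-shrunken-centroid idea, not the paper's exact definition), this sketch soft-thresholds standardized class-centroid offsets so that uninformative SNPs drop out; the data and threshold are invented.

```python
import numpy as np

def shrunken_centroids(X, y, delta=0.4):
    """Minimal nearest-shrunken-centroid sketch: standardized per-class
    centroid offsets are soft-thresholded toward zero; features whose
    offsets vanish in every class are dropped as irrelevant."""
    overall, s = X.mean(0), X.std(0) + 1e-6
    shrunk = {}
    for c in np.unique(y):
        d = (X[y == c].mean(0) - overall) / s          # standardized offset
        shrunk[c] = np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)
    selected = np.where(np.any([v != 0 for v in shrunk.values()], axis=0))[0]
    return shrunk, selected

# Toy SNP-like data: 200 features coded 0/1/2; only features 0-4 informative.
rng = np.random.default_rng(12)
X = rng.integers(0, 3, (100, 200)).astype(float)
y = rng.integers(0, 2, 100)
X[y == 1, :5] += 1.0                                   # signal in five "SNPs"
shrunk, selected = shrunken_centroids(X, y)
print("selected features:", selected)
```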

Journal ArticleDOI
TL;DR: In this paper, a coordinate gradient descent approach for minimizing the sum of a smooth function and a nonseparable convex function is presented, where a search direction is found by solving a subproblem obtained by a second-order approximation of the smooth function.
Abstract: This paper presents a coordinate gradient descent approach for minimizing the sum of a smooth function and a nonseparable convex function. We find a search direction by solving a subproblem obtained by a second-order approximation of the smooth function and adding a separable convex function. Under a local Lipschitzian error bound assumption, we show that the algorithm possesses global and local linear convergence properties. We also give some numerical tests (including image recovery examples) to illustrate the efficiency of the proposed method. AMS subject classifications: 65F22, 65K05
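
In the separable special case (l1 regularization), the coordinate gradient descent step has a closed form: minimizing the one-coordinate quadratic model of the smooth term plus the l1 term gives a soft-thresholding update, sketched below on a toy sparse-recovery problem.

```python
import numpy as np

def cgd_lasso(A, b, lam, n_iter=200):
    """Coordinate gradient descent for min 0.5*||Ax - b||^2 + lam*||x||_1:
    each step minimizes a second-order model of the smooth term in one
    coordinate plus the convex l1 term (a soft-thresholding update)."""
    n = A.shape[1]
    x = np.zeros(n)
    col_sq = (A**2).sum(0)            # per-coordinate curvature a_j^T a_j
    r = A @ x - b                     # running residual
    for _ in range(n_iter):
        for j in range(n):
            g = A[:, j] @ r           # partial gradient of the smooth term
            z = x[j] - g / col_sq[j]  # minimizer of the quadratic model
            x_new = np.sign(z) * max(abs(z) - lam / col_sq[j], 0.0)
            r += A[:, j] * (x_new - x[j])   # update the residual incrementally
            x[j] = x_new
    return x

# Toy usage: sparse recovery with a random design.
rng = np.random.default_rng(13)
A = rng.standard_normal((50, 20))
x_true = np.zeros(20); x_true[[2, 7, 15]] = [1.5, -2.0, 0.8]
b = A @ x_true + 0.01 * rng.standard_normal(50)
print(np.round(cgd_lasso(A, b, lam=0.5), 2))
```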

Book ChapterDOI
01 Jan 2009
TL;DR: Clustering high-dimensional data requires special treatment; one type of clustering method for high-dimensional data is subspace clustering, which aims at finding clusters in subspaces instead of the entire data space.
Abstract: High-dimensional data is common in real-world data mining applications. Text data is a typical example. In text mining, a text document is viewed as a vector of terms whose dimension is equal to the total number of unique terms in a data set, which is usually in the thousands. High-dimensional data occurs in business as well. In retail, for example, to effectively manage supplier relationships, suppliers are often categorized according to their business behaviors (Zhang, Huang, Qian, Xu, & Jing, 2006). The supplier behavior data is high-dimensional: it contains thousands of attributes describing the suppliers' behaviors, including product items, ordered amounts, order frequencies, product quality and so forth. One more example is DNA microarray data. Clustering high-dimensional data requires special treatment (Swanson, 1990; Jain, Murty, & Flynn, 1999; Cai, He, & Han, 2005; Kontaki, Papadopoulos & Manolopoulos, 2007), although various methods for clustering are available (Jain & Dubes, 1988). One type of clustering method for high-dimensional data is referred to as subspace clustering, which aims at finding clusters in subspaces instead of the entire data space. In subspace clustering, each cluster is a set of objects identified by a subset of dimensions, and different clusters are represented by different subsets of dimensions. Soft subspace clustering considers that different dimensions make different contributions to the identification of the objects in a cluster. It represents the importance of a dimension as a weight that can be treated as the degree to which the dimension contributes to the cluster. Soft subspace clustering can find the cluster memberships of objects and identify the subspace of each cluster in the same clustering process.
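
A minimal soft subspace clustering sketch in the entropy-weighting spirit (the softmax weight update and its parameter gamma are assumed details, not taken from this chapter): each cluster keeps one weight per dimension, and dimensions along which the cluster is compact receive large weights.

```python
import numpy as np

def soft_subspace_kmeans(X, k, gamma=0.5, n_iter=30, rng=None):
    """Minimal soft subspace k-means sketch: per-cluster dimension weights
    are updated as a softmax of the negative within-cluster dispersion,
    so compact dimensions of a cluster get large weights."""
    rng = rng or np.random.default_rng(0)
    n, p = X.shape
    centres = X[rng.choice(n, k, replace=False)].copy()
    w = np.full((k, p), 1.0 / p)                 # per-cluster dimension weights
    for _ in range(n_iter):
        # assignment with per-cluster weighted distances
        d = np.stack([((X - centres[l])**2 * w[l]).sum(1) for l in range(k)], 1)
        assign = d.argmin(1)
        for l in range(k):
            pts = X[assign == l]
            if len(pts) == 0:
                continue
            centres[l] = pts.mean(0)
            disp = ((pts - centres[l])**2).sum(0)   # per-dimension dispersion
            e = np.exp(-disp / gamma)
            w[l] = e / e.sum()                       # entropy-style weight update
    return centres, w, assign

# Toy usage: two clusters that differ only in the first two of ten dimensions.
rng = np.random.default_rng(14)
X = np.vstack([rng.normal(0, 1, (50, 10)), rng.normal(0, 1, (50, 10))])
X[:50, :2] = rng.normal(0, 0.05, (50, 2))
X[50:, :2] = rng.normal(5, 0.05, (50, 2))
centres, w, assign = soft_subspace_kmeans(X, k=2, rng=rng)
print(np.round(w, 2))   # weights concentrate on the two informative dimensions
```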