Showing papers by "Michael K. Ng published in 2009"


Journal ArticleDOI
TL;DR: This method is an extension of a method of Collatz (1942) for calculating the spectral radius of an irreducible nonnegative matrix; the method is also applied to studying higher-order Markov chains.
Abstract: In this paper we propose an iterative method for calculating the largest eigenvalue of an irreducible nonnegative tensor. This method is an extension of a method of Collatz (1942) for calculating the spectral radius of an irreducible nonnegative matrix. Numerical results show that our proposed method is promising. We also apply the method to studying higher-order Markov chains.

300 citations
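
To make the iteration concrete, here is a minimal NumPy sketch of the Collatz-type bounds and power-type update for an order-3 nonnegative tensor; the update x ← (A x²)^{1/2} and the ratio bounds are standard, but the toy tensor, tolerance, and normalization are invented for illustration.

```python
import numpy as np

def tensor_apply(A, x):
    """Compute y = A x^{m-1} for an order-3 tensor: y_i = sum_{j,k} A[i,j,k] x_j x_k."""
    return np.einsum('ijk,j,k->i', A, x, x)

def largest_eigenvalue(A, tol=1e-10, max_iter=1000):
    """Collatz-type power iteration for the largest eigenvalue of an
    irreducible nonnegative order-3 tensor."""
    n = A.shape[0]
    x = np.ones(n)
    for _ in range(max_iter):
        y = tensor_apply(A, x)
        ratios = y / x**2          # Collatz ratios (A x^{m-1})_i / x_i^{m-1}, m = 3
        lo, hi = ratios.min(), ratios.max()
        if hi - lo < tol:          # lower and upper bounds have met
            break
        x = y**0.5                 # x <- (A x^{m-1})^{1/(m-1)}
        x /= x.sum()               # normalize to keep the iterate bounded
    return 0.5 * (lo + hi)

# Toy example: a strictly positive (hence irreducible) random tensor.
rng = np.random.default_rng(0)
A = rng.random((4, 4, 4)) + 0.1
print(largest_eigenvalue(A))
```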


Journal ArticleDOI
TL;DR: An alternating minimization algorithm is developed to find the minimizer of such an objective function efficiently and the convergence of the minimizing method is shown.
Abstract: Multiplicative noise removal problems have attracted much attention in recent years. Unlike additive noise removal problems, the noise is multiplied with the original image, so almost all information of the original image may disappear in the observed image. The main aim of this paper is to propose and study a strictly convex objective function for multiplicative noise removal problems. We also incorporate the modified total variation regularization in the objective function to recover image edges. We develop an alternating minimization algorithm to find the minimizer of such an objective function efficiently and also show the convergence of the minimization method. Our experimental results show that the quality of images denoised by the proposed method is quite good.

280 citations
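
The exact objective and alternating scheme are in the paper; the sketch below only illustrates the key log-domain idea, with the strictly convex fidelity term Σ(z + f·e^{-z}) in z = log u, plain gradient descent in place of alternating minimization, and a quadratic smoothness penalty standing in for the modified total variation term.

```python
import numpy as np

def denoise_multiplicative(f, lam=0.5, step=0.1, n_iter=200):
    """Toy log-domain scheme for multiplicative (e.g. Gamma) noise: work on
    z = log(u), where the data term sum(z + f*exp(-z)) is strictly convex;
    a quadratic finite-difference penalty stands in for the modified TV
    regularizer, and plain gradient descent for alternating minimization."""
    z = np.log(np.maximum(f, 1e-6))            # start from the noisy image
    for _ in range(n_iter):
        grad_fid = 1.0 - f * np.exp(-z)        # gradient of the fidelity term
        lap = (np.roll(z, 1, 0) + np.roll(z, -1, 0) +
               np.roll(z, 1, 1) + np.roll(z, -1, 1) - 4 * z)
        z -= step * (grad_fid - lam * lap)     # descend on fidelity + smoothness
    return np.exp(z)

# Toy usage: a flat image corrupted by unit-mean Gamma noise.
rng = np.random.default_rng(1)
u_true = np.full((64, 64), 2.0)
f = u_true * rng.gamma(shape=4.0, scale=0.25, size=u_true.shape)
print("mean abs error:", float(np.abs(denoise_multiplicative(f) - u_true).mean()))
```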


Journal ArticleDOI
TL;DR: Numerical results show that, for a suitably chosen $(1,1)$ block-matrix, this constraint preconditioner outperforms the block-diagonal and block-tridiagonal ones in iteration steps and computing time when they are used to accelerate the GMRES method for solving these block two-by-two symmetric indefinite linear systems.
Abstract: We study the eigenvalue bounds of block two-by-two nonsingular and symmetric indefinite matrices whose $(1,1)$ block is symmetric positive definite and whose Schur complement with respect to its $(2,2)$ block is symmetric indefinite. A constraint preconditioner for this matrix is constructed by simply replacing the $(1,1)$ block by a symmetric positive definite approximation, and the spectral properties of the preconditioned matrix are discussed. Numerical results show that, for a suitably chosen $(1,1)$ block-matrix, this constraint preconditioner outperforms the block-diagonal and block-tridiagonal ones in iteration steps and computing time when they are used to accelerate the GMRES method for solving these block two-by-two symmetric indefinite linear systems. The new results extend the existing ones for block two-by-two matrices with symmetric negative semidefinite $(2,2)$ blocks to those with general symmetric $(2,2)$ blocks.

150 citations
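
A small SciPy illustration of the construction (not the paper's test problems): the (1,1) block of a block two-by-two indefinite matrix is replaced by its diagonal to form a constraint preconditioner, and GMRES iteration counts with and without it are compared.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

rng = np.random.default_rng(2)
n, m = 80, 20
# Block two-by-two symmetric indefinite matrix [[B, E], [E^T, -C]] with a
# symmetric positive definite (1,1) block B.
R = rng.random((n, n))
B = np.eye(n) + 0.1 * (R + R.T)
E = rng.random((n, m))
C = 0.5 * np.eye(m)
A = np.block([[B, E], [E.T, -C]])
b = rng.random(n + m)

# Constraint preconditioner: replace B by a cheap SPD approximation (its
# diagonal here) while keeping the off-diagonal blocks exactly.
P = np.block([[np.diag(np.diag(B)), E], [E.T, -C]])
P_inv = np.linalg.inv(P)            # dense inverse, for illustration only
M = LinearOperator(A.shape, matvec=lambda v: P_inv @ v)

res_plain, res_pc = [], []
gmres(A, b, callback=res_plain.append, callback_type="pr_norm")
gmres(A, b, M=M, callback=res_pc.append, callback_type="pr_norm")
print(len(res_plain), "GMRES steps unpreconditioned vs", len(res_pc), "preconditioned")
```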


Journal ArticleDOI
TL;DR: A super-resolution image reconstruction algorithm for moderate-resolution imaging spectroradiometer (MODIS) remote sensing images, in which a Huber prior is used as regularization to preserve sharp edges in the reconstructed image.
Abstract: In this paper, we propose a super-resolution image reconstruction algorithm for moderate-resolution imaging spectroradiometer (MODIS) remote sensing images. This algorithm consists of two parts: registration and reconstruction. In the registration part, a truncated quadratic cost function is used to exclude outlier pixels, which deviate strongly from the registration model. Accurate photometric and geometric registration parameters can be obtained simultaneously. In the reconstruction part, the L1-norm data fidelity term is chosen to reduce the effects of inevitable registration error, and a Huber prior is used as regularization to preserve sharp edges in the reconstructed image. In this process, the outliers are excluded again to enhance the robustness of the algorithm. The proposed algorithm has been tested using real MODIS band-4 images captured on different dates. The experimental results and comparative analyses verify the effectiveness of this algorithm.

95 citations
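
For reference, the two robust penalties named above are easy to state; this sketch just defines a truncated quadratic registration cost and the Huber function (the thresholds tau and delta are illustrative values, not the paper's).

```python
import numpy as np

def truncated_quadratic(r, tau):
    """Registration cost: quadratic for small residuals, constant beyond
    tau, so outlier pixels cannot dominate the fit."""
    return np.minimum(r**2, tau**2)

def huber(r, delta):
    """Huber prior: quadratic near zero (smooth regions), linear in the
    tails, which penalizes edges less than a pure quadratic would."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))

r = np.linspace(-3, 3, 7)
print(truncated_quadratic(r, tau=1.0))
print(huber(r, delta=1.0))
```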


Journal ArticleDOI
TL;DR: A fast TV image restoration method with automatic selection of the regularization parameter to restore blurred and noisy images, using the generalized cross-validation (GCV) technique to determine inexpensively how much regularization to use in each restoration step.
Abstract: We consider and study total variation (TV) image restoration. In the literature there are several regularization parameter selection methods for Tikhonov regularization problems (e.g., the discrepancy principle and the generalized cross-validation method). However, to our knowledge, these selection methods have not been applied to TV regularization problems. The main aim of this paper is to develop a fast TV image restoration method with automatic selection of the regularization parameter to restore blurred and noisy images. The method exploits the generalized cross-validation (GCV) technique to determine inexpensively how much regularization to use in each restoration step. By updating the regularization parameter in each iteration, the restored image can be obtained. Our experimental results for different kinds of noise show that the visual quality and SNRs of images restored by the proposed method are promising. We also demonstrate that the method is efficient, as it can restore images of size 256 x 256 in approximately 20 s in the MATLAB computing environment.

87 citations
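
The paper applies GCV inside TV restoration steps; the hedged sketch below shows only the underlying GCV computation for a single FFT-diagonalizable Tikhonov deblurring step (an identity regularizer standing in for the TV term), where the residual and trace terms are cheap in the Fourier domain.

```python
import numpy as np

def gcv_tikhonov(blurred, psf_fft, lams):
    """Pick the Tikhonov regularization parameter by GCV for a periodic
    blur, where everything diagonalizes under the FFT."""
    n = blurred.size
    b_fft = np.fft.fft2(blurred)
    h2 = np.abs(psf_fft)**2
    best = None
    for lam in lams:
        filt = h2 / (h2 + lam)                              # Fourier filter factors
        resid2 = np.sum(np.abs((1 - filt) * b_fft)**2) / n  # ||(I - A_lam) b||^2
        denom = (n - filt.sum())**2                         # trace(I - A_lam)^2
        gcv = resid2 / denom
        if best is None or gcv < best[0]:
            best = (gcv, lam)
    return best[1]

# Toy usage: periodic 3x3 mean blur of a random image plus noise.
rng = np.random.default_rng(3)
x = rng.random((64, 64))
psf = np.zeros((64, 64)); psf[:3, :3] = 1 / 9
H = np.fft.fft2(psf)
b = np.real(np.fft.ifft2(H * np.fft.fft2(x))) + 0.01 * rng.standard_normal(x.shape)
print("GCV-selected lambda:", gcv_tikhonov(b, H, np.logspace(-6, 0, 25)))
```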


Journal ArticleDOI
TL;DR: The proposed method uses a modified total variation minimization scheme to regularize the deblurred image and fill in suitable values for noisy pixels detected by median-type filters.
Abstract: In this paper, we study the restoration of blurred images corrupted by impulse noise or mixed impulse plus Gaussian noise. In the proposed method, we use a modified total variation minimization scheme to regularize the deblurred image and fill in suitable values for the noisy pixels detected by median-type filters. An alternating minimization algorithm is employed to solve the proposed total variation minimization problem. Our experimental results show that the proposed algorithm is very efficient and that the quality of the restored images is competitive with that of images restored by existing variational image restoration methods.

68 citations
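
A toy sketch of the detection idea, assuming a plain 3x3 median filter as the "median-type" detector: flagged pixels are simply filled from the filtered image here, whereas the paper instead restores them inside a modified TV minimization.

```python
import numpy as np
from scipy.ndimage import median_filter

def detect_and_fill(img, threshold=0.2):
    """Two-phase sketch: a median filter flags likely impulse pixels, and
    flagged pixels are filled from the median of their neighborhood."""
    med = median_filter(img, size=3)
    noisy_mask = np.abs(img - med) > threshold   # detection step
    out = img.copy()
    out[noisy_mask] = med[noisy_mask]            # fill-in step
    return out, noisy_mask

# Toy usage: salt-and-pepper corruption of a smooth ramp image.
rng = np.random.default_rng(4)
img = np.tile(np.linspace(0, 1, 64), (64, 1))
corrupted = img.copy()
flip = rng.random(img.shape) < 0.1
corrupted[flip] = rng.choice([0.0, 1.0], size=int(flip.sum()))
restored, mask = detect_and_fill(corrupted)
print("flagged pixels:", int(mask.sum()))
```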


Journal ArticleDOI
TL;DR: A simple matching dissimilarity measure for categorical objects is modified, which allows the use of the fuzzy k-modes paradigm to obtain a cluster with strong intra-similarity, and to efficiently cluster large categorical data sets.
Abstract: This correspondence describes extensions to the fuzzy k-modes algorithm for clustering categorical data. We modify a simple matching dissimilarity measure for categorical objects, which allows the fuzzy k-modes paradigm to obtain clusters with strong intra-similarity and to cluster large categorical data sets efficiently. We rigorously derive the updating formula of the fuzzy k-modes clustering algorithm with the new dissimilarity measure and prove the convergence of the algorithm under the optimisation framework. Experimental results are presented to illustrate that the new fuzzy k-modes algorithm is more effective than the existing k-modes algorithms.

24 citations
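
A minimal sketch of the fuzzy k-modes machinery with the plain simple-matching dissimilarity (the paper's modified measure is not reproduced here): memberships follow the standard fuzzy update and modes maximize the per-attribute membership mass.

```python
import numpy as np

def fuzzy_k_modes(X, k, m=1.5, n_iter=30, rng=None):
    """Minimal fuzzy k-modes sketch with simple-matching dissimilarity."""
    rng = rng or np.random.default_rng(0)
    n, p = X.shape
    modes = X[rng.choice(n, k, replace=False)].copy()
    for _ in range(n_iter):
        # simple-matching dissimilarity: number of mismatched attributes
        d = np.array([[np.sum(x != z) for z in modes] for x in X], float)
        u = np.zeros((n, k))
        for i in range(n):
            zero = d[i] == 0
            if zero.any():
                u[i, zero] = 1.0 / zero.sum()   # object coincides with a mode
            else:                               # standard fuzzy membership update
                w = d[i] ** (-1 / (m - 1))
                u[i] = w / w.sum()
        # mode update: per attribute, the category with largest membership mass
        for l in range(k):
            for j in range(p):
                cats = np.unique(X[:, j])
                mass = [np.sum(u[X[:, j] == c, l] ** m) for c in cats]
                modes[l, j] = cats[np.argmax(mass)]
    return modes, u

# Toy categorical data: two obvious groups over three attributes.
X = np.array([[0, 0, 1], [0, 0, 0], [0, 1, 0],
              [2, 2, 2], [2, 2, 1], [1, 2, 2]])
modes, u = fuzzy_k_modes(X, k=2)
print(modes); print(np.round(u, 2))
```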


Journal ArticleDOI
TL;DR: It has been shown that a combination of the Newton/fixed-point iteration with the preconditioned GMRES method is efficient and robust for solving the systems of nonlinear equations arising from the sinc-Galerkin discretization of the time-dependent partial differential equations.
Abstract: When the Newton method or the fixed-point method is employed to solve the systems of nonlinear equations arising in the sinc-Galerkin discretization of certain time-dependent partial differential equations, in each iteration step we need to solve a structured subsystem of linear equations iteratively by, for example, a Krylov subspace method such as the preconditioned GMRES. In this paper, based on the tensor and the Toeplitz structures of the linear subsystems we construct structured preconditioners for their coefficient matrices and estimate the eigenvalue bounds of the preconditioned matrices under certain assumptions. Numerical examples are given to illustrate the effectiveness of the proposed preconditioning methods. It has been shown that a combination of the Newton/fixed-point iteration with the preconditioned GMRES method is efficient and robust for solving the systems of nonlinear equations arising from the sinc-Galerkin discretization of the time-dependent partial differential equations.

24 citations
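
As a generic illustration of the Toeplitz-structure idea (not the sinc-Galerkin matrices themselves), the sketch below preconditions a symmetric Toeplitz system with Strang's circulant, applied in O(n log n) operations via the FFT, and compares GMRES iteration counts.

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.sparse.linalg import LinearOperator, gmres

# Illustrative symmetric Toeplitz system standing in for one linear
# subsystem of a Newton/fixed-point step.
n = 200
col = 1.0 / (1.0 + np.arange(n))**1.5      # first column of the Toeplitz matrix
T = toeplitz(col)
b = np.ones(n)

# Strang preconditioner: copy the central diagonals of T into a circulant.
c = np.zeros(n)
c[:n // 2] = col[:n // 2]
c[n // 2 + 1:] = col[1:n - n // 2][::-1]   # wrap the diagonals around
c_eig = np.fft.fft(c)                      # eigenvalues of the circulant

def apply_Cinv(v):
    """Apply the circulant inverse with two FFTs: O(n log n)."""
    return np.real(np.fft.ifft(np.fft.fft(v) / c_eig))

M = LinearOperator((n, n), matvec=apply_Cinv)
res_plain, res_strang = [], []
gmres(T, b, callback=res_plain.append, callback_type="pr_norm")
gmres(T, b, M=M, callback=res_strang.append, callback_type="pr_norm")
print(len(res_plain), "GMRES steps unpreconditioned vs", len(res_strang), "with Strang")
```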


Journal ArticleDOI
TL;DR: A High-order Markov-Switching model for measuring the risk of a portfolio and adopts the Value-at-Risk (VaR) as a metric for market risk quantification and examines the high-order effect of the underlying Markov chain on the risk measures via backtesting.
Abstract: In this paper, we introduce a High-order Markov-Switching (HMS) model for measuring the risk of a portfolio. We suppose that the rate of return from a risky portfolio follows an HMS model with the drift and the volatility modulated by a discrete-time weak Markov chain. The states of the weak Markov chain are interpreted as observable states of an economy. We adopt the Value-at-Risk (VaR) as a metric for market risk quantification and examine the high-order effect of the underlying Markov chain on the risk measures via backtesting.

23 citations
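
A toy simulation of the setup, with invented drifts, volatilities, and second-order transition probabilities: the regime depends on the last two states, returns are modulated by the regime, and the 99% VaR is read off the simulated return distribution.

```python
import numpy as np

rng = np.random.default_rng(5)
mu = np.array([0.0008, -0.0005])      # drift per state (bull, bear) [assumed]
sigma = np.array([0.008, 0.020])      # volatility per state         [assumed]

def next_state(s1, s2):
    """Second-order (weak) Markov chain: the transition law depends on the
    last two states (s1 = most recent). Probabilities are invented."""
    p_stay = 0.95 if s1 == s2 else 0.70
    return s1 if rng.random() < p_stay else 1 - s1

# Simulate the regime path and the regime-modulated returns.
T = 100_000
states = [0, 0]
returns = np.empty(T)
for t in range(T):
    s = next_state(states[-1], states[-2])
    states.append(s)
    returns[t] = mu[s] + sigma[s] * rng.standard_normal()

# One-day 99% Value-at-Risk: the loss quantile of the simulated P&L.
var_99 = -np.quantile(returns, 0.01)
print(f"simulated 1-day 99% VaR: {var_99:.4f}")
```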


Journal ArticleDOI
TL;DR: A new categorical data clustering algorithm with automatic selection of k is proposed, which extends the k-modes clustering algorithm by introducing a penalty term into the objective function to make more clusters compete for objects.

18 citations


Journal ArticleDOI
01 Mar 2009
TL;DR: Experimental results on time-series gene expression data for the human cell cycle indicate that the novel method for mining, modeling, and evaluating a regulatory system executing cellular functions, represented as a biomolecular network, is promising for subnetwork mining and simulation from large biomolecular networks.
Abstract: In this paper, we present a novel method to mine, model, and evaluate a regulatory system executing cellular functions that can be represented as a biomolecular network. Our method consists of two steps. First, a novel scale-free network clustering approach is applied to such a biomolecular network to obtain various subnetworks. Second, computational models are generated for the subnetworks and simulated to predict their behavior in the cellular context. We discuss and evaluate some of the advanced computational modeling approaches, in particular, state-space modeling, probabilistic Boolean network modeling, and fuzzy logic modeling. The modeling and simulation results represent hypotheses that are tested against high-throughput biological datasets (microarrays and/or genetic screens) under normal and perturbation conditions. Experimental results on time-series gene expression data for the human cell cycle indicate that our approach is promising for subnetwork mining and simulation from large biomolecular networks.
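
Among the modeling approaches discussed, a probabilistic Boolean network is the easiest to sketch; the toy below uses invented genes, rules, and selection probabilities purely to show the simulation mechanics.

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy probabilistic Boolean network over three genes: each gene has
# candidate Boolean update rules chosen at random per step with the given
# selection probabilities. Rules and probabilities are invented.
rules = {
    0: [(lambda x: x[1] and not x[2], 0.7), (lambda x: x[1], 0.3)],
    1: [(lambda x: not x[0], 1.0)],
    2: [(lambda x: x[0] or x[1], 0.6), (lambda x: x[2], 0.4)],
}

def step(x):
    """One synchronous PBN update: sample a rule per gene, then apply it."""
    y = np.empty_like(x)
    for g, candidates in rules.items():
        fs, ps = zip(*candidates)
        f = fs[rng.choice(len(fs), p=list(ps))]
        y[g] = f(x)
    return y

# Simulate and summarize how often each gene is ON in the long run.
x = np.array([1, 0, 0], dtype=bool)
history = []
for _ in range(10_000):
    x = step(x)
    history.append(x.copy())
print("long-run ON frequency per gene:", np.mean(history, axis=0))
```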

Journal ArticleDOI
TL;DR: The development of a model-independent, delay-invariant deconvolution technique using least-absolute-deviation (LAD) regularization to improve CBF estimation accuracy; initial clinical implementation on six representative clinical cases confirms the advantages of the LAD method over the rSVD and sSVD methods.
Abstract: Cerebral blood flow (CBF) estimates derived from singular value decomposition (SVD) of time intensity curves from Gadolinium bolus perfusion-weighted imaging are known to underestimate CBF, especially at high flow rates. We report the development of a model-independent, delay-invariant deconvolution technique using least-absolute-deviation (LAD) regularization to improve the CBF estimation accuracy. Computer simulations were performed to compare the accuracy of CBF estimates derived from the LAD, reformulated SVD (rSVD), and standard SVD (sSVD) techniques. Simulations were performed at image signal-to-noise ratios ranging from 20 to 400, cerebral blood volumes from 1% to 10%, and CBF from 2.5 mL/100 g/min to 176.5 mL/100 g/min to estimate the effect of these parameters on the accuracy of CBF estimation. The LAD method improved the CBF estimation accuracy by up to 32% in gray matter and 23% in white matter compared with the rSVD and sSVD methods. The LAD method also reduces the systematic bias of the rSVD and sSVD methods with respect to baseline SNR while producing more accurate and reproducible residue function calculations than either the rSVD or sSVD method. Initial clinical implementation of the method on six representative clinical cases confirms the advantages of the LAD method over the rSVD and sSVD methods.
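
For context, here is a sketch of the standard SVD (sSVD) baseline that the LAD method is compared against: the tissue curve is modeled as a discrete convolution of the AIF with the flow-scaled residue function, and small singular values are truncated. The paper's LAD-regularized fit itself is not reproduced, and the toy curves and units below are assumptions.

```python
import numpy as np
from scipy.linalg import toeplitz

def ssvd_deconvolve(aif, tissue, dt, threshold=0.2):
    """Standard SVD (sSVD) deconvolution: C = dt * A r, with A the
    lower-triangular Toeplitz matrix built from the AIF; singular values
    below threshold * s_max are discarded to suppress noise."""
    n = len(aif)
    A = dt * toeplitz(aif, np.zeros(n))        # lower-triangular convolution
    U, s, Vt = np.linalg.svd(A)
    s_inv = np.where(s > threshold * s[0], 1.0 / s, 0.0)
    return Vt.T @ (s_inv * (U.T @ tissue))     # truncated pseudoinverse

# Toy usage: gamma-variate AIF, exponential residue function, known flow.
dt, t = 1.0, np.arange(60.0)
aif = (t**3) * np.exp(-t / 1.5); aif /= aif.max()
cbf, residue = 0.01, np.exp(-t / 4.0)          # toy values [assumed]
tissue = dt * toeplitz(aif, np.zeros(60)) @ (cbf * residue)
tissue += 0.0005 * np.random.default_rng(7).standard_normal(60)
r_est = ssvd_deconvolve(aif, tissue, dt)
print("estimated flow (max of residue):", r_est.max(), "vs true", cbf)
```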

Journal ArticleDOI
TL;DR: In this article, a novel generalized BFGS method is proposed for large-scale image restoration minimization problems; the complexity per step is O(n log n) operations with only O(n) memory required, where n is the number of image pixels.

Book ChapterDOI
24 May 2009
TL;DR: This paper proposes a wavelet inpainting model using the L0-norm and total variation (TV) minimization, and applies a graph cut algorithm to solve the subproblem related to TV minimization.
Abstract: In this paper, we suggest an algorithm to recover an image whose wavelet coefficients are partially lost. We propose a wavelet inpainting model using the L0-norm and total variation (TV) minimization. Traditionally, the L0-norm is replaced by the L1-norm or L2-norm due to numerical difficulties. We use an alternating minimization technique to overcome these difficulties. In order to improve the numerical efficiency, we also apply a graph cut algorithm to solve the subproblem related to TV minimization. Numerical results are given to demonstrate the advantages of the proposed algorithm.
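
A hedged sketch of the alternating structure, assuming PyWavelets and scikit-image are available: Chambolle's TV solver stands in for the paper's graph-cut solver, and the observed coefficients are simply re-imposed in place of the L0 formulation.

```python
import numpy as np
import pywt
from skimage.restoration import denoise_tv_chambolle

def wavelet_inpaint(img_obs, lost_mask, n_iter=25, weight=0.08):
    """Alternating sketch: TV-denoise the current estimate, then re-impose
    the wavelet coefficients that were not lost (True in lost_mask = lost)."""
    cA_obs, bands_obs = pywt.dwt2(img_obs, "haar")
    x = img_obs.copy()
    for _ in range(n_iter):
        x = denoise_tv_chambolle(x, weight=weight)       # TV subproblem
        _, (h, v, d) = pywt.dwt2(x, "haar")
        fixed = [np.where(m, c, k) for c, k, m in
                 zip((h, v, d), bands_obs, lost_mask)]   # keep observed coeffs
        x = pywt.idwt2((cA_obs, tuple(fixed)), "haar")
    return x

# Toy usage: lose 40% of each detail band of a smooth random image.
rng = np.random.default_rng(8)
img = np.cumsum(np.cumsum(rng.standard_normal((64, 64)), 0), 1)
img = (img - img.min()) / np.ptp(img)
cA, bands = pywt.dwt2(img, "haar")
lost_mask = [rng.random(b.shape) < 0.4 for b in bands]
damaged = [np.where(m, 0.0, b) for b, m in zip(bands, lost_mask)]
img_obs = pywt.idwt2((cA, tuple(damaged)), "haar")
print("mean abs error after inpainting:",
      float(np.abs(wavelet_inpaint(img_obs, lost_mask) - img).mean()))
```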

Book ChapterDOI
19 Apr 2009
TL;DR: Budget semi-supervised learning, i.e., semi-supervised learning with a resource budget, such as a limited memory insufficient to accommodate and/or process all available unlabeled data, is proposed, and it is shown that this is achievable by a simple yet effective method.
Abstract: In this paper we propose to study budget semi-supervised learning, i.e., semi-supervised learning with a resource budget, such as a limited memory insufficient to accommodate and/or process all available unlabeled data. This setting is of practical importance because in most real scenarios, although there may exist abundant unlabeled data, the computational resources that can be used are generally not unlimited. Effective budget semi-supervised learning algorithms should be able to adjust their behavior according to the given resource budget: roughly, the more resources available, the more the unlabeled data can be exploited. As an example, in this paper we show that this is achievable by a simple yet effective method.

Journal ArticleDOI
TL;DR: A novel automatic peak detection method for prOTOF MS data, which does not require a priori knowledge of protein masses, is developed and is highly effective at defining peaks that can be used for disease classification or to highlight potential biomarkers.
Abstract: Peak detection is a pivotal first step in biomarker discovery from MS data and can significantly influence the results of downstream data analysis steps. We developed a novel automatic peak detection method for prOTOF MS data, which does not require a priori knowledge of protein masses. Random noise is removed by an undecimated wavelet transform and chemical noise is attenuated by an adaptive short-time discrete Fourier transform. Isotopic peaks corresponding to a single protein are combined by extracting an envelope over them. Depending on the S/N, the desired peaks in each individual spectrum are detected, and those with the highest intensity among their peak clusters are recorded. The peaks common to all the spectra are identified by choosing an appropriate cut-off threshold in complete-linkage hierarchical clustering. To remove 1 Da shifts, the peak corresponding to a given protein is taken to be the detected peak occurring most frequently within its neighborhood. We validated this method using a data set of serial peptide and protein calibration standards. Compared with the MoverZ program, our new method detects more peaks and significantly enhances the S/N of the peaks after chemical noise removal. We then successfully applied this method to prOTOF MS spectra of albumin and albumin-bound proteins from serum samples of 59 patients with carotid artery disease compared to vascular disease-free patients to detect peaks with S/N ≥ 2. Our method is easily implemented and is highly effective at defining peaks that can be used for disease classification or to highlight potential biomarkers.
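
A much-simplified stand-in for the pipeline (no undecimated wavelet transform or adaptive STFT): smooth each spectrum, detect per-spectrum peaks above an S/N threshold of 2, and find common peaks by complete-linkage clustering of peak positions with a 1 Da cut-off. All data below is synthetic.

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(9)

# Synthetic spectra: three true peaks with small per-spectrum shifts.
mz = np.linspace(1000, 1100, 4000)
true_peaks = [1020.0, 1050.0, 1080.0]
spectra = []
for _ in range(10):
    s = sum(np.exp(-0.5 * ((mz - p - rng.normal(0, 0.05)) / 0.3)**2)
            for p in true_peaks)
    spectra.append(s + 0.05 * rng.standard_normal(mz.size))

locations = []
for s in spectra:
    smooth = np.convolve(s, np.ones(15) / 15, mode="same")   # crude denoising
    noise = np.std(s - smooth)
    idx, _ = find_peaks(smooth, height=2 * noise)            # S/N >= 2
    locations.extend(mz[idx])

# Complete linkage on 1-D peak positions; cut at 1 Da to merge shifted peaks.
Z = linkage(np.array(locations)[:, None], method="complete")
labels = fcluster(Z, t=1.0, criterion="distance")
common = sorted(np.mean([l for l, g in zip(locations, labels) if g == grp])
                for grp in set(labels)
                if (labels == grp).sum() >= 8)   # present in most spectra
print("common peaks near:", np.round(common, 1))
```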

Journal ArticleDOI
TL;DR: A new penalty term is introduced into the objective function of the fuzzy k-means clustering process to enable several clusters to compete for objects, which leads to the merging of some cluster centres and the identification of the 'true' number of clusters.
Abstract: This paper presents a subspace k-means clustering algorithm for high-dimensional data with automatic selection of k. A new penalty term is introduced into the objective function of the fuzzy k-means clustering process to enable several clusters to compete for objects, which leads to the merging of some cluster centres and the identification of the 'true' number of clusters. The algorithm determines the number of clusters in a dataset by adjusting the penalty term factor. A subspace cluster validation index is proposed and employed to verify the subspace clustering results generated by the algorithm. The experimental results on both synthetic and real data demonstrate that the algorithm is effective in producing consistent clustering results and identifying the correct number of clusters. Some real datasets are used to demonstrate how the proposed algorithm can determine interesting sub-clusters in the datasets.
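
The paper's penalty term drives cluster centres to merge; the sketch below mimics that effect crudely (it is not the paper's algorithm) by over-specifying k and explicitly merging nearly coincident centres during Lloyd iterations, so the surviving number of centres estimates the "true" k.

```python
import numpy as np

def kmeans_auto_k(X, k_max=10, merge_tol=0.5, n_iter=50, rng=None):
    """Simplified stand-in for penalty-driven cluster competition: start
    with k_max centres, run Lloyd updates, and merge centres that drift
    within merge_tol of each other."""
    rng = rng or np.random.default_rng(0)
    centres = X[rng.choice(len(X), k_max, replace=False)].astype(float)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centres[None, :, :])**2).sum(-1)
        assign = d.argmin(1)
        centres = np.array([X[assign == l].mean(0) if (assign == l).any()
                            else centres[l] for l in range(len(centres))])
        # merge nearly coincident centres (the "competition" step)
        keep = []
        for c in centres:
            if all(np.linalg.norm(c - k_) > merge_tol for k_ in keep):
                keep.append(c)
        centres = np.array(keep)
    return centres

# Toy usage: three well-separated blobs, k_max deliberately too large.
rng = np.random.default_rng(10)
X = np.vstack([rng.normal(m, 0.1, (50, 2)) for m in ([0, 0], [3, 0], [0, 3])])
print("estimated k:", len(kmeans_auto_k(X, k_max=10, rng=rng)))
```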

Journal ArticleDOI
01 Jun 2009
TL;DR: This work proposes a new algorithm that combines object clustering and dimension selection, and allows the input of domain knowledge in guiding the clustering process, and shows that this semi-supervised algorithm can perform knowledge-guided selective clustering when there are multiple meaningful object groupings.
Abstract: Recent studies have suggested that extremely low dimensional projected clusters exist in real datasets. Here, we propose a new algorithm for identifying them. It combines object clustering and dimension selection, and allows the input of domain knowledge in guiding the clustering process. Theoretical and experimental results show that even a small amount of input knowledge could already help detect clusters with only 1% of the relevant dimensions. We also show that this semi-supervised algorithm can perform knowledge-guided selective clustering when there are multiple meaningful object groupings. The algorithm is also shown effective in analysing a microarray dataset.

01 Jan 2009
TL;DR: Empirical studies on real-world multi-label learning tasks show that Tram can effectively make use of unlabeled data to achieve performance as good as existing state-of-the-art multi-label learning algorithms.
Abstract: Multi-label learning deals with problems in which each instance can be assigned to multiple classes simultaneously, which are ubiquitous in real-world learning tasks. In this paper, we propose a new multi-label learning method, which is able to exploit unlabeled data to obtain an effective model for assigning appropriate multiple labels to instances. The proposed method is called Tram (TRansductive multi-label learning via Alpha Matting), which formulates transductive multi-label learning as an optimization problem. We develop an efficient algorithm which has a closed-form solution to this optimization problem. Empirical studies on real-world multi-label learning tasks show that Tram can effectively make use of unlabeled data to achieve performance as good as existing state-of-the-art multi-label learning algorithms; moreover, Tram is much faster and can handle relatively large data sets.
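
The closed-form flavor can be illustrated with a generic graph-based transductive propagation, F = (I + αL)^{-1}Y on a kNN graph; this is a stand-in for intuition only, not the paper's exact Tram optimization.

```python
import numpy as np

def transductive_propagate(X, Y, alpha=1.0, k=5):
    """Closed-form transductive propagation on a kNN graph:
    F = (I + alpha * L)^(-1) Y, where rows of Y are all-zero for
    unlabeled instances."""
    n = len(X)
    d2 = ((X[:, None] - X[None, :])**2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        nn = np.argsort(d2[i])[1:k + 1]        # k nearest neighbours (skip self)
        W[i, nn] = np.exp(-d2[i, nn])
    W = np.maximum(W, W.T)                      # symmetrize the graph
    L = np.diag(W.sum(1)) - W                   # combinatorial graph Laplacian
    return np.linalg.solve(np.eye(n) + alpha * L, Y.astype(float))

# Toy usage: two blobs; only the first 10 points of each blob are labeled.
rng = np.random.default_rng(11)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
Y = np.zeros((40, 2)); Y[:10, 0] = 1; Y[20:30, 1] = 1
F = transductive_propagate(X, Y)
print("labels for unlabeled points:", F[10:20].argmax(1), F[30:].argmax(1))
```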

01 Jan 2009
TL;DR: This paper proposes to develop and use a shrunken dissimilarity measure to analyze and select relevant SNPs for classification problems, and finds some SNPs on chromosome 2, contained in genes relevant to Parkinson disease.
Abstract: Recent development of high-resolution single-nucleotide polymorphism (SNP) arrays allows detailed assessment of genome-wide human genome variations. However, SNP data typically has a large number of SNPs (e.g., 400 thousand SNPs in genome-wide Parkinson disease SNP data) and only a few hundred samples. Conventional classification methods may not be effective when applied to such genome-wide SNP data. In this paper, we propose to develop and use a shrunken dissimilarity measure to analyze and select relevant SNPs for classification problems. Examples on HapMap data and Parkinson disease data are given to demonstrate the effectiveness of the proposed method and to illustrate its potential to become a useful analysis tool for SNP data sets. In particular, we find some SNPs on chromosome 2, contained in genes relevant to Parkinson disease.
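
In the spirit of the shrunken measure (following the nearest-shrunken-centroid idea, not the paper's exact definition), this sketch soft-thresholds standardized class-centroid offsets so that uninformative SNPs drop out; the data and threshold are invented.

```python
import numpy as np

def shrunken_centroids(X, y, delta=0.4):
    """Minimal nearest-shrunken-centroid sketch: standardized per-class
    centroid offsets are soft-thresholded toward zero; features whose
    offsets vanish in every class are dropped as irrelevant."""
    overall, s = X.mean(0), X.std(0) + 1e-6
    shrunk = {}
    for c in np.unique(y):
        d = (X[y == c].mean(0) - overall) / s          # standardized offset
        shrunk[c] = np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)
    selected = np.where(np.any([v != 0 for v in shrunk.values()], axis=0))[0]
    return shrunk, selected

# Toy SNP-like data: 200 features coded 0/1/2; only features 0-4 informative.
rng = np.random.default_rng(12)
X = rng.integers(0, 3, (100, 200)).astype(float)
y = rng.integers(0, 2, 100)
X[y == 1, :5] += 1.0                                   # signal in five "SNPs"
shrunk, selected = shrunken_centroids(X, y)
print("selected features:", selected)
```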

Journal ArticleDOI
TL;DR: In this paper, a coordinate gradient descent approach for minimizing the sum of a smooth function and a nonseparable convex function is presented, where a search direction is found by solving a subproblem obtained by a second-order approximation of the smooth function.
Abstract: This paper presents a coordinate gradient descent approach for minimizing the sum of a smooth function and a nonseparable convex function. We find a search direction by solving a subproblem obtained by a second-order approximation of the smooth function and adding a separable convex function. Under a local Lipschitzian error bound assumption, we show that the algorithm possesses global and local linear convergence properties. We also give some numerical tests (including image recovery examples) to illustrate the efficiency of the proposed method. AMS subject classifications: 65F22, 65K05
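
In the separable special case (l1 regularization), the coordinate gradient descent step has a closed form: minimizing the one-coordinate quadratic model of the smooth term plus the l1 term gives a soft-thresholding update, sketched below on a toy sparse-recovery problem.

```python
import numpy as np

def cgd_lasso(A, b, lam, n_iter=200):
    """Coordinate gradient descent for min 0.5*||Ax - b||^2 + lam*||x||_1:
    each step minimizes a second-order model of the smooth term in one
    coordinate plus the convex l1 term (a soft-thresholding update)."""
    n = A.shape[1]
    x = np.zeros(n)
    col_sq = (A**2).sum(0)            # per-coordinate curvature a_j^T a_j
    r = A @ x - b                     # running residual
    for _ in range(n_iter):
        for j in range(n):
            g = A[:, j] @ r           # partial gradient of the smooth term
            z = x[j] - g / col_sq[j]  # minimizer of the quadratic model
            x_new = np.sign(z) * max(abs(z) - lam / col_sq[j], 0.0)
            r += A[:, j] * (x_new - x[j])   # update the residual incrementally
            x[j] = x_new
    return x

# Toy usage: sparse recovery with a random design.
rng = np.random.default_rng(13)
A = rng.standard_normal((50, 20))
x_true = np.zeros(20); x_true[[2, 7, 15]] = [1.5, -2.0, 0.8]
b = A @ x_true + 0.01 * rng.standard_normal(50)
print(np.round(cgd_lasso(A, b, lam=0.5), 2))
```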

Book ChapterDOI
01 Jan 2009
TL;DR: Clustering high-dimensional data requires special treatment; one type of clustering method for high-dimensional data is subspace clustering, which aims at finding clusters in subspaces instead of the entire data space.
Abstract: High-dimensional data is common in real-world data mining applications. Text data is a typical example. In text mining, a text document is viewed as a vector of terms whose dimension is equal to the total number of unique terms in a data set, which is usually in the thousands. High-dimensional data occurs in business as well. In retail, for example, to effectively manage supplier relationships, suppliers are often categorized according to their business behaviors (Zhang, Huang, Qian, Xu, & Jing, 2006). The supplier behavior data is high-dimensional: it contains thousands of attributes describing the suppliers' behaviors, including product items, ordered amounts, order frequencies, product quality and so forth. One more example is DNA microarray data. Clustering high-dimensional data requires special treatment (Swanson, 1990; Jain, Murty, & Flynn, 1999; Cai, He, & Han, 2005; Kontaki, Papadopoulos & Manolopoulos, 2007), although various methods for clustering are available (Jain & Dubes, 1988). One type of clustering method for high-dimensional data is referred to as subspace clustering, which aims at finding clusters in subspaces instead of the entire data space. In subspace clustering, each cluster is a set of objects identified by a subset of dimensions, and different clusters are represented by different subsets of dimensions. Soft subspace clustering considers that different dimensions make different contributions to the identification of the objects in a cluster. It represents the importance of a dimension as a weight that can be treated as the degree to which the dimension contributes to the cluster. Soft subspace clustering can find the cluster memberships of objects and identify the subspace of each cluster in the same clustering process.
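
A minimal soft subspace clustering sketch in the entropy-weighting spirit (the softmax weight update and its parameter gamma are assumed details, not taken from this chapter): each cluster keeps one weight per dimension, and dimensions along which the cluster is compact receive large weights.

```python
import numpy as np

def soft_subspace_kmeans(X, k, gamma=0.5, n_iter=30, rng=None):
    """Minimal soft subspace k-means sketch: per-cluster dimension weights
    are updated as a softmax of the negative within-cluster dispersion,
    so compact dimensions of a cluster get large weights."""
    rng = rng or np.random.default_rng(0)
    n, p = X.shape
    centres = X[rng.choice(n, k, replace=False)].copy()
    w = np.full((k, p), 1.0 / p)                 # per-cluster dimension weights
    for _ in range(n_iter):
        # assignment with per-cluster weighted distances
        d = np.stack([((X - centres[l])**2 * w[l]).sum(1) for l in range(k)], 1)
        assign = d.argmin(1)
        for l in range(k):
            pts = X[assign == l]
            if len(pts) == 0:
                continue
            centres[l] = pts.mean(0)
            disp = ((pts - centres[l])**2).sum(0)   # per-dimension dispersion
            e = np.exp(-disp / gamma)
            w[l] = e / e.sum()                       # entropy-style weight update
    return centres, w, assign

# Toy usage: two clusters that differ only in the first two of ten dimensions.
rng = np.random.default_rng(14)
X = np.vstack([rng.normal(0, 1, (50, 10)), rng.normal(0, 1, (50, 10))])
X[:50, :2] = rng.normal(0, 0.05, (50, 2))
X[50:, :2] = rng.normal(5, 0.05, (50, 2))
centres, w, assign = soft_subspace_kmeans(X, k=2, rng=rng)
print(np.round(w, 2))   # weights concentrate on the two informative dimensions
```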