
Showing papers by "Michael K. Ng" published in 2004


Journal ArticleDOI
TL;DR: A new approach is developed, which allows the use of the k-means-type paradigm to efficiently cluster large data sets by using weighted dissimilarity measures for objects.
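
To illustrate the weighted-dissimilarity idea, here is a minimal NumPy sketch of a W-k-means-style iteration in which per-feature weights are re-estimated from within-cluster dispersions; the update rule and the synthetic data are illustrative assumptions, not the published algorithm.

```python
import numpy as np

def weighted_kmeans(X, k, beta=2.0, n_iter=20, seed=0):
    """Toy k-means with per-feature weights in the dissimilarity measure.

    The weight update (w_j proportional to (1/D_j)^(1/(beta-1)), with D_j the
    within-cluster dispersion of feature j) mimics the W-k-means heuristic;
    details of the published algorithm may differ.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    centers = X[rng.choice(n, k, replace=False)]
    w = np.full(m, 1.0 / m)                          # feature weights, sum to 1
    for _ in range(n_iter):
        # assign each object to the nearest center under the weighted metric
        d = ((X[:, None, :] - centers[None, :, :]) ** 2 * w ** beta).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):                           # update cluster centers
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
        # re-estimate weights from per-feature within-cluster dispersions
        D = np.array([((X[:, f] - centers[labels, f]) ** 2).sum() for f in range(m)])
        w = (1.0 / np.maximum(D, 1e-12)) ** (1.0 / (beta - 1.0))
        w /= w.sum()
    return labels, centers, w

rng = np.random.default_rng(1)
# two clusters separated only in the first feature; the second is pure noise
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal([6, 0], 1, (100, 2))])
labels, centers, w = weighted_kmeans(X, k=2)
print("feature weights:", w)   # the informative feature should get most weight
```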

237 citations


Book
16 Dec 2004
TL;DR: This book presents the theory of iterative methods and preconditioners for solving Toeplitz systems, together with applications to ordinary and partial differential equations, queuing networks, signal and image processing, and integral equations.
Abstract: Contents: 1. Notations and definitions; 2. Iterative methods; Theory: 3. Toeplitz systems; 4. Circulant preconditioners; 5. Non-circulant type preconditioners; 6. Ill-conditioned Toeplitz systems; 7. Structured systems; Applications: 8. Applications to ordinary and partial differential equations; 9. Applications to queuing networks; 10. Applications to signal processing; 11. Applications to image processing; 12. Applications to integral equations.
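
As a flavor of the book's central theme, the sketch below builds a Strang-type circulant preconditioner for a symmetric positive definite Toeplitz system and solves it with preconditioned conjugate gradients using NumPy/SciPy. This is a standard textbook construction, not code from the book, and the Kac-Murdock-Szego test matrix is an arbitrary choice.

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.sparse.linalg import cg, LinearOperator

# Symmetric positive definite Toeplitz test matrix (Kac-Murdock-Szego, rho = 0.9)
n = 512
t = 0.9 ** np.arange(n)
T = toeplitz(t)

# Strang's circulant preconditioner: copy the central diagonals of T and wrap around
c = t.copy()
half = n // 2
c[half + 1:] = t[1:n - half][::-1]        # c_j = t_{n-j} for j > n/2
lam = np.fft.fft(c).real                  # eigenvalues of the circulant (real here)

def apply_circulant_inverse(r):
    # Solving C z = r costs one FFT/IFFT pair
    return np.fft.ifft(np.fft.fft(r) / lam).real

M = LinearOperator((n, n), matvec=apply_circulant_inverse)
b = np.ones(n)
x, info = cg(T, b, M=M)
print("CG converged:", info == 0, " residual:", np.linalg.norm(T @ x - b))
```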

236 citations


Journal ArticleDOI
TL;DR: A new algorithm is proposed that exploits the clustering status to adjust the internal thresholds dynamically without the assistance of user parameters and has excellent accuracy and usability.
Abstract: In high-dimensional data, clusters can exist in subspaces that hide themselves from traditional clustering methods. A number of algorithms have been proposed to identify such projected clusters, but most of them rely on some user parameters to guide the clustering process. The clustering accuracy can be seriously degraded if incorrect values are used. Unfortunately, in real situations, it is rarely possible for users to supply the parameter values accurately, which causes practical difficulties in applying these algorithms to real data. In this paper, we analyze the major challenges of projected clustering and suggest why these algorithms need to depend heavily on user parameters. Based on the analysis, we propose a new algorithm that exploits the clustering status to adjust the internal thresholds dynamically without the assistance of user parameters. According to the results of extensive experiments on real and synthetic data, the new method has excellent accuracy and usability. It outperformed the other algorithms even when correct parameter values were artificially supplied to them. The encouraging results suggest that projected clustering can be a practical tool for various kinds of real applications.
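
A rough illustration of data-driven (rather than user-supplied) thresholds for picking cluster-relevant dimensions is sketched below on synthetic data; it is only a toy heuristic, not the algorithm proposed in the paper.

```python
import numpy as np

def relevant_dimensions(cluster_pts, all_pts):
    """Dimensions along which a candidate cluster is unusually compact.

    The cutoff is derived from the data itself (ratios well below the typical
    ratio), mimicking an internally adjusted threshold instead of a user
    parameter. Purely illustrative; not the paper's algorithm.
    """
    ratio = cluster_pts.var(axis=0) / (all_pts.var(axis=0) + 1e-12)
    cutoff = 0.5 * np.median(ratio)
    return np.where(ratio < cutoff)[0]

rng = np.random.default_rng(0)
d = 20
background = rng.uniform(0, 10, size=(500, d))
cluster = rng.uniform(0, 10, size=(60, d))
cluster[:, [2, 7, 11]] = rng.normal(5.0, 0.2, size=(60, 3))   # compact in 3 dims only
dims = relevant_dimensions(cluster, np.vstack([background, cluster]))
print("detected relevant dimensions:", dims)                   # expect roughly [2 7 11]
```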

165 citations


Journal ArticleDOI
TL;DR: It is concluded that the proposed super‐resolution techniques can both improve the signal‐to‐noise ratio and augment the detectability of small activated areas in fMRI image sets acquired with thicker slices.
Abstract: The problem of increasing the slice resolution of functional MRI (fMRI) images without a loss in signal-to-noise ratio is considered. In standard fMRI experiments, increasing the slice resolution by a certain factor decreases the signal-to-noise ratio of the images by the same factor. For this purpose an adapted EPI MRI acquisition protocol is proposed, allowing one to acquire slice-shifted images from which one can generate interpolated super-resolution images, with an increased resolution in the slice direction. To solve the problem of correctness and robustness of the super-resolution images created from these slice-shifted datasets, the use of discontinuity-preserving regularization methods is proposed. Tests on real morphological, synthetic functional, and real functional MR datasets have been performed, by comparing the obtained super-resolution datasets with high-resolution reference datasets. In the morphological experiments the image spatial resolution of the different types of images is compared. In the synthetic and real fMRI experiments, on the other hand, the quality of the different datasets is studied as a function of their resulting activation maps. From the results obtained in this study, we conclude that the proposed super-resolution techniques can both improve the signal-to-noise ratio and augment the detectability of small activated areas in fMRI image sets acquired with thicker slices.
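
The structure of the reconstruction problem can be sketched in one dimension: several slice-shifted low-resolution acquisitions are stacked into a linear system and solved by regularized least squares. The sketch uses plain Tikhonov (first-difference) smoothing, whereas the paper advocates discontinuity-preserving regularization; all sizes and noise levels are made up for illustration.

```python
import numpy as np

def downsample_operator(n_hi, factor, shift):
    """Average `factor` consecutive high-resolution samples, starting at `shift`."""
    n_lo = (n_hi - shift) // factor
    A = np.zeros((n_lo, n_hi))
    for i in range(n_lo):
        A[i, i * factor + shift : i * factor + shift + factor] = 1.0 / factor
    return A

rng = np.random.default_rng(0)
n_hi, factor = 120, 3
x_true = np.zeros(n_hi)
x_true[40:80] = 1.0                       # a block of "activation"

# three slice-shifted, noisy low-resolution acquisitions
ops = [downsample_operator(n_hi, factor, s) for s in range(factor)]
obs = [A @ x_true + 0.02 * rng.standard_normal(A.shape[0]) for A in ops]
A = np.vstack(ops)
b = np.concatenate(obs)

# Tikhonov (first-difference) regularization; the paper argues for
# discontinuity-preserving terms instead of this quadratic smoother
L = np.diff(np.eye(n_hi), axis=0)
lam = 0.1
x_sr = np.linalg.solve(A.T @ A + lam * L.T @ L, A.T @ b)
print("relative reconstruction error:",
      np.linalg.norm(x_sr - x_true) / np.linalg.norm(x_true))
```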

78 citations


Journal ArticleDOI
TL;DR: This paper applies the developed higher-order Markov chain model for analyzing categorical data sequences to Web server log data, modelling users' behavior in accessing information and predicting their future behavior.
Abstract: In this paper we study higher-order Markov chain models for analyzing categorical data sequences. We propose an efficient estimation method for the model parameters. Data sequences such as DNA and sales demand are used to illustrate the predictive power of our proposed models. In particular, we apply the developed higher-order Markov chain model to Web server log data. The objective here is to model the users' behavior in accessing information and to predict their behavior in the future. Our tests are based on a realistic web log and our model shows an improvement in prediction. © 2004 Wiley Periodicals, Inc. Naval Research Logistics, 2004
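
A sketch of the kind of higher-order model considered, in which the next-state distribution is a convex combination of lag-specific transition matrices; the paper estimates the mixture weights with an efficient method, while the toy code below simply grid-searches them on a synthetic sequence.

```python
import numpy as np

def lag_transition_matrix(seq, n_states, lag):
    """Empirical matrix of P(X_t = j | X_{t-lag} = i), with rows normalized."""
    C = np.zeros((n_states, n_states))
    for a, b in zip(seq[:-lag], seq[lag:]):
        C[a, b] += 1
    C += 1e-9                                   # guard against empty rows
    return C / C.sum(axis=1, keepdims=True)

def next_state_distribution(history, Qs, lam):
    """Mixture model: P(next) = sum_i lam_i * Q_i[history[-i]]."""
    return sum(l * Q[history[-i - 1]] for i, (l, Q) in enumerate(zip(lam, Qs)))

# toy categorical sequence whose next symbol depends mainly on the value two steps back
rng = np.random.default_rng(0)
seq = [0, 1]
for _ in range(2000):
    seq.append((seq[-2] + int(rng.integers(0, 2))) % 3)

n_states, order = 3, 2
Qs = [lag_transition_matrix(seq, n_states, i + 1) for i in range(order)]

# crude grid search over the mixture weights (the paper estimates them efficiently)
best = None
for l1 in np.linspace(0, 1, 21):
    lam = (l1, 1 - l1)
    hits = sum(next_state_distribution(seq[:t], Qs, lam).argmax() == seq[t]
               for t in range(order, len(seq)))
    if best is None or hits > best[0]:
        best = (hits, lam)
print("weights (lag 1, lag 2):", best[1], " prediction accuracy:", best[0] / (len(seq) - order))
```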

77 citations


Journal ArticleDOI
TL;DR: This work modifies the algorithm of [1], based on Newton's iteration and on the concept of ε-displacement rank, for the computation of the Moore-Penrose inverse of a rank-deficient Toeplitz matrix.
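
For context, the classical Newton iteration for the Moore-Penrose inverse is shown below in dense form; the paper's contribution is to carry out such an iteration in compressed (ε-displacement rank) form for Toeplitz matrices, which this sketch ignores. The rank-2 Toeplitz test matrix is arbitrary.

```python
import numpy as np
from scipy.linalg import toeplitz

def newton_pinv(A, n_iter=20):
    """Newton's iteration X <- 2X - X A X for the Moore-Penrose inverse.

    Starting from X0 = A^T / ||A||_2^2, the iteration converges to pinv(A),
    also when A is rank deficient. The paper runs this kind of iteration in
    compressed displacement form for Toeplitz matrices; this dense version
    ignores that structure.
    """
    X = A.T / np.linalg.norm(A, 2) ** 2
    for _ in range(n_iter):
        X = 2 * X - X @ A @ X
    return X

# a 6x6 real Toeplitz matrix of exact rank 2: entries cos(0.7*(j - k) + 0.3)
d = np.arange(6)
T = toeplitz(np.cos(0.7 * d + 0.3), np.cos(-0.7 * d + 0.3))
X = newton_pinv(T)
print("rank of T:", np.linalg.matrix_rank(T))
print("max |X - pinv(T)|:", np.abs(X - np.linalg.pinv(T)).max())
```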

52 citations


Journal ArticleDOI
TL;DR: A stochastic dynamic programming model with a Markov chain for the optimization of CLV is proposed and then applied to practical data of a computer service company.
Abstract: Since the early 1980s, the concept of relationship marketing has become increasingly important in general marketing, especially in the area of direct and interactive marketing. The core of relationship marketing is the maintenance of long-term relationships with customers. However, relationship marketing is costly, and therefore the determination of the customer lifetime value (CLV) is an important element in making strategic decisions in both advertising and promotion. In this paper, we propose a stochastic dynamic programming model with a Markov chain for the optimization of CLV. Both the infinite-horizon and finite-horizon cases are discussed. The model is then applied to practical data of a computer service company.
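
A minimal sketch of the Markov-chain dynamic programming idea, with invented customer states, transition probabilities, revenues, and a promotion action; it runs value iteration for the infinite-horizon discounted case and is not calibrated to the company data used in the paper.

```python
import numpy as np

# Hypothetical 3-state customer model: 0 = active, 1 = at-risk, 2 = lost.
# Two actions per period: 0 = do nothing, 1 = send a promotion (costly but
# improves retention). All numbers are invented for illustration.
P = np.array([
    [[0.80, 0.15, 0.05],        # transition probabilities under action 0
     [0.30, 0.40, 0.30],
     [0.00, 0.00, 1.00]],
    [[0.90, 0.08, 0.02],        # transition probabilities under action 1
     [0.50, 0.35, 0.15],
     [0.05, 0.05, 0.90]],
])
r = np.array([
    [100.0, 20.0,   0.0],       # expected revenue per period under action 0
    [ 85.0, 10.0, -15.0],       # under action 1 (revenue minus promotion cost)
])
gamma = 0.95                    # per-period discount factor

# Infinite-horizon value iteration:
#   V(s) = max_a [ r(s, a) + gamma * sum_s' P(s' | s, a) V(s') ]
V = np.zeros(3)
for _ in range(2000):
    Q = r + gamma * (P @ V)     # action values, shape (actions, states)
    V_new = Q.max(axis=0)
    if np.abs(V_new - V).max() < 1e-9:
        break
    V = V_new
print("CLV per state:", np.round(V, 1))
print("optimal action per state:", Q.argmax(axis=0))   # 1 where the promotion pays off
```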

49 citations


Journal ArticleDOI
TL;DR: This paper presents an approximate inversion method for triangular Toeplitz matrices based on trigonometric polynomial interpolation and revises the approximate method proposed by Bini.
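
For reference, the inverse of a nonsingular lower triangular Toeplitz matrix is again lower triangular Toeplitz, and its first column obeys a simple convolution recurrence; the exact O(n^2) baseline below is what fast approximate schemes such as Bini's method and the trigonometric-interpolation variant in this paper aim to beat. It is not the paper's algorithm.

```python
import numpy as np
from scipy.linalg import toeplitz

def tri_toeplitz_inverse_column(a):
    """First column b of L^{-1}, where L is lower triangular Toeplitz with first column a.

    Exact O(n^2) recurrence: b_0 = 1/a_0, b_k = -(1/a_0) * sum_{j=1..k} a_j b_{k-j}.
    Fast approximate methods (Bini's, and the trigonometric-interpolation variant
    in the paper) trade this exactness for O(n log n) FFT work.
    """
    n = len(a)
    b = np.zeros(n)
    b[0] = 1.0 / a[0]
    for k in range(1, n):
        b[k] = -np.dot(a[1:k + 1], b[k - 1::-1]) / a[0]
    return b

a = np.array([1.0, 0.6, 0.3, 0.1, 0.05])
L = toeplitz(a, np.zeros_like(a))          # lower triangular Toeplitz matrix
b = tri_toeplitz_inverse_column(a)
L_inv = toeplitz(b, np.zeros_like(b))      # the inverse is again lower triangular Toeplitz
print("|| L @ L_inv - I || =", np.linalg.norm(L @ L_inv - np.eye(len(a))))
```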

37 citations



Journal ArticleDOI
TL;DR: An iterative deblurring algorithm is derived within a wavelet framework, together with a methodology for finding deblurring filters, and its convergence is proved.
Abstract: Blur removal is an important problem in signal and image processing. In this article, we formulate the deblurring problem within a wavelet framework and design a methodology to find deblurring filters. Using these deblurring filters, we derive an iterative deblurring algorithm and prove its convergence. Simulation results are reported to illustrate the proposed framework and methodology. © 2004 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 14, 113–121, 2004; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ima.20014
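
The flavor of such iterative schemes can be conveyed with a generic sketch: a Landweber (gradient) deblurring step alternated with soft-thresholding of one-level Haar wavelet coefficients. This is a standard wavelet-domain deblurring pattern on made-up 1-D data, not the specific filters or convergence framework derived in the article.

```python
import numpy as np

def haar(x):
    """One-level Haar transform (approximation, detail); len(x) must be even."""
    return (x[0::2] + x[1::2]) / np.sqrt(2), (x[0::2] - x[1::2]) / np.sqrt(2)

def ihaar(a, d):
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(0)
n = 128
x_true = np.zeros(n)
x_true[30:60] = 1.0
x_true[80:90] = -0.5                                   # piecewise-constant test signal

kernel = np.array([0.25, 0.5, 0.25])                   # simple blur
H = sum(np.eye(n, k=k - 1) * w for k, w in enumerate(kernel))
y = H @ x_true + 0.01 * rng.standard_normal(n)         # blurred, noisy observation

tau = 1.0 / np.linalg.norm(H, 2) ** 2                  # safe step size for Landweber
x = np.zeros(n)
for _ in range(200):
    x = x + tau * H.T @ (y - H @ x)                    # gradient (Landweber) step
    a, d = haar(x)
    x = ihaar(a, soft(d, 0.005))                       # denoise detail coefficients
print("relative error after deblurring:",
      np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```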

14 citations


Journal ArticleDOI
01 Oct 2004
TL;DR: Experimental results show that the algorithm is capable of identifying some interesting projected clusters from real microarray data and has a low dependency on user parameters while allowing users to input domain knowledge should it be available.
Abstract: In microarray gene expression data, clusters may hide in subspaces. Traditional clustering algorithms that make use of similarity measurements in the full input space may fail to detect the clusters. In recent years a number of algorithms have been proposed to identify this kind of projected clusters, but many of them rely on some critical parameters whose proper values are hard for users to determine. In this paper a new algorithm that dynamically adjusts its internal thresholds is proposed. It has a low dependency on user parameters while allowing users to input domain knowledge should it be available. Experimental results show that the algorithm is capable of identifying some interesting projected clusters from real microarray data.

Journal ArticleDOI
TL;DR: This note presents higher-order Markov chain models for modelling categorical data sequences with an efficient algorithm for solving the model parameters that can be implemented easily in a Microsoft EXCEL worksheet.
Abstract: Categorical data sequences occur in many applications such as forecasting, data mining and bioinformatics. In this note, we present higher-order Markov chain models for modelling categorical data sequences with an efficient algorithm for solving the model parameters. The algorithm can be implemented easily in a Microsoft EXCEL worksheet. We give a detailed description for the implementation which is accessible and useful to anyone who is interested in the applications of higher-order Markov chain models and has some knowledge of EXCEL.

Book ChapterDOI
26 May 2004
TL;DR: Efficient and effective algorithms for identifying dense regions as distinct and meaningful patterns in given data are presented, and extensions of the algorithms for handling data streams are discussed.
Abstract: We introduce the notion of dense regions as distinct and meaningful patterns in given data. Efficient and effective algorithms for identifying such regions are presented. Next, we discuss extensions of the algorithms for handling data streams. Finally, experiments on large-scale data streams such as clickstreams are given, which confirm the usefulness of our algorithms.

Journal ArticleDOI
TL;DR: A simple method for the reconstruction of a Jacobi matrix from eigenvalues is developed, and some necessary conditions for such an inverse eigenvalue problem to have solutions are given.
Abstract: In this paper, we study the inverse eigenvalue problem of a specially structured Jacobi matrix, which arises from the discretization of the differential equation governing the axial vibration of a rod with varying cross section (Ram and Elhay 1998 Commun. Numer. Methods Engng. 14 597-608). We give a sufficient condition and some necessary conditions for such an inverse eigenvalue problem to have solutions. Based on these results, a simple method for the reconstruction of a Jacobi matrix from eigenvalues is developed. Numerical examples are given to demonstrate our results.
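
One classical construction for Jacobi-type inverse eigenvalue problems (not necessarily the method developed in the paper) is the Lanczos process: running Lanczos on diag(λ) with a prescribed weight vector as the starting vector yields a symmetric tridiagonal matrix with exactly those eigenvalues and first eigenvector components. A small sketch with arbitrary data:

```python
import numpy as np

def jacobi_from_spectral_data(lams, w):
    """Jacobi matrix with eigenvalues `lams` and first eigenvector components
    proportional to `w`, built by the Lanczos process applied to diag(lams)."""
    n = len(lams)
    D = np.diag(lams)
    Q = np.zeros((n, n))
    Q[:, 0] = w / np.linalg.norm(w)
    alpha = np.zeros(n)
    beta = np.zeros(n - 1)
    for j in range(n):
        v = D @ Q[:, j]
        alpha[j] = Q[:, j] @ v
        v -= alpha[j] * Q[:, j]
        if j > 0:
            v -= beta[j - 1] * Q[:, j - 1]
        v -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ v)   # full reorthogonalization
        if j < n - 1:
            beta[j] = np.linalg.norm(v)
            Q[:, j + 1] = v / beta[j]
    return np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)

lams = np.array([1.0, 2.5, 4.0, 7.0])      # prescribed eigenvalues (arbitrary example)
w = np.array([0.4, 0.3, 0.2, 0.1])         # prescribed first eigenvector components
J = jacobi_from_spectral_data(lams, w)
print(np.round(J, 4))
print("recovered eigenvalues:", np.round(np.linalg.eigvalsh(J), 6))
```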

Book ChapterDOI
17 Mar 2004
TL;DR: In this article, an efficient algorithm is introduced to detect user access patterns from Website topology and Web-log stream data; with this method a Website topology can be modified online so that the new topology improves Website connectivity and adapts to current visitors' access patterns.
Abstract: When people visit Websites, they want to access the contents they are interested in efficiently, precisely, and without delay. However, due to the constant changes of site contents and user patterns, the access efficiency of Websites cannot be optimized, especially in peak hours. In this paper, we first address the problems of access efficiency in Websites during peak hours and then propose new measures to evaluate access efficiency. An efficient algorithm is introduced to detect user access patterns using Website topology and Web-log stream data. Adopting this method, we can modify a Website topology online so that the new topology improves the Website connectivity and adapts to current visitors' access patterns. A real sports Website is used to evaluate the effectiveness of our proposed method of accelerating user access to related contents. The results of the evaluation presented in this paper suggest that this method is feasible for intelligently improving the connectivity of a Website online.

Journal ArticleDOI
TL;DR: In this paper, a method that utilizes the relationship between the Perron root of a nonnegative matrix and the estimates of the row sums of its generalized Perron complement is presented.

Journal ArticleDOI
TL;DR: It is shown that if the Toeplitz matrix is nonsingular and well-conditioned, then the methods considered are numerically forward stable.

Book ChapterDOI
26 May 2004
TL;DR: A new prediction model, based on the Kolmogorov backward equations, is presented for predicting when an online customer will leave the current page and which Web page the customer will visit next.
Abstract: This paper presents a new prediction model for predicting when an online customer will leave the current page and which Web page the customer will visit next. The model can also forecast the total number of visits to a given Web page by all incoming users at the same time. The prediction technique can be used as a component for many Web-based applications. The prediction model regards a Web browsing session as a continuous-time Markov process whose transition probability matrix can be computed from Web log data using the Kolmogorov backward equations. The model is tested against real Web-log data, where the scalability and accuracy of our method are analyzed.
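
A sketch of the continuous-time Markov view of a browsing session, with a hypothetical 3-page generator matrix: the diagonal gives mean dwell times, and the transition probabilities over a horizon t follow from P(t) = exp(Qt), the solution of the Kolmogorov equations. In the paper the generator is estimated from Web-log data; the numbers here are invented.

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical generator for a 3-page browsing session (pages A, B, C).
# Off-diagonal Q[i, j] is the rate of jumping from page i to page j (per minute);
# diagonal entries make each row sum to zero.
Q = np.array([
    [-0.50,  0.30,  0.20],
    [ 0.10, -0.40,  0.30],
    [ 0.05,  0.15, -0.20],
])

# Expected time spent on a page before leaving it is 1 / (exit rate).
print("mean dwell time per page (min):", -1.0 / np.diag(Q))

# P(t) = expm(Q t) solves the Kolmogorov equations; row 0 gives the distribution
# over pages t minutes after arriving at page A.
for t in (1.0, 5.0, 30.0):
    print(f"t = {t:4.1f} min, distribution from page A:", np.round(expm(Q * t)[0], 3))

# Which page is the most likely next stop when the visitor leaves page A?
jump = Q[0].copy()
jump[0] = 0.0
print("next-page probabilities from A:", np.round(jump / jump.sum(), 3))
```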

Journal ArticleDOI
TL;DR: A novel integrated data warehousing and data mining framework for Website management and patterns discovery is introduced to analyze Web user behavior and some statistical indexes and practical solutions are proposed to intelligently discover interesting user access patterns.
Abstract: A new challenge in Web usage analysis is how to manage and discover informative patterns from various types of Web data stored in structured or unstructured databases for system monitoring and decision making. In this paper, a novel integrated data warehousing and data mining framework for Website management and pattern discovery is introduced to analyze Web user behavior. The merit of the framework is that it combines multidimensional Web databases to support online analytical processing for improving Web services. Based on the model, we propose some statistical indexes and practical solutions to intelligently discover interesting user access patterns for Website optimization, Web personalization, recommendation, etc. We use Web data from a sports Website as the data source to evaluate the effectiveness of the model. The results show that this integrated data warehousing and mining model is effective and efficient when applied to practical Web applications.

01 May 2004
TL;DR: In this paper, an extension model for MDP, the Higher-order Markov Decision Model (HMDP), is proposed to overcome the limitations of MDP in predicting the profitability of a customer.
Abstract: To predict the profitability of a customer, today's firms have to practice Customer Lifetime Value (CLV) computation. Different approaches have been proposed over the last ten years to analyze this complex customer phenomenon. One of them is the Markov Decision Process (MDP) model. The class of Markov models is an effective and flexible class of decision models, but the use of the MDP model is limited by its assumptions. In this paper, we introduce an extension of MDP: the Higher-order Markov Decision Model (HMDP). The HMDP performs well in CLV calculation and overcomes the limitations of MDP. Using a real application, we demonstrate how it can be used efficiently in a firm's daily operations.

Journal Article
TL;DR: A new clustering model for efficiently generating and maintaining clusters which represent the changing Web user patterns in Websites is proposed; the model can be employed in different Web applications such as personalization and recommendation systems.
Abstract: With the fast growth of the Internet and its Web users all over the world, how to manage and discover useful patterns from tremendous and evolving Web information sources has become a new challenge for data engineering researchers. Also, there is a great demand for designing scalable and flexible data mining algorithms for various time-critical and data-intensive Web applications. In this paper, we propose a new clustering model for efficiently generating and maintaining clusters which represent the changing Web user patterns in Websites. With an effective pruning process, the clusters can be quickly discovered and updated to reflect the current or changing user patterns for Website administrators. This model can also be employed in different Web applications such as personalization and recommendation systems.

Journal ArticleDOI
TL;DR: Some properties of the weighted Tikhonov filter matrices are given, together with their filtering and regularization effects, and perturbation identities for the weighted linear least squares problem and weighted pseudoinverses are presented.
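
For orientation, the standard (unweighted) Tikhonov filter factors can be written down via the SVD, as in the sketch below; the paper studies the weighted analogue, where the 2-norms are replaced by weighted norms, which this sketch does not attempt.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 30, 20
A = rng.standard_normal((m, n)) @ np.diag(0.5 ** np.arange(n))   # ill-conditioned
x_true = rng.standard_normal(n)
b = A @ x_true + 1e-3 * rng.standard_normal(m)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
mu = 1e-2                                   # regularization parameter

# Tikhonov filter factors f_i = s_i^2 / (s_i^2 + mu^2): close to 1 for large
# singular values and close to 0 for small ones (their "filtering effect").
f = s ** 2 / (s ** 2 + mu ** 2)
x_tik = Vt.T @ (f * (U.T @ b) / s)
x_ls = Vt.T @ ((U.T @ b) / s)               # unfiltered least squares for comparison

print("filter factors:", np.round(f, 3))
print("error, plain least squares:", np.linalg.norm(x_ls - x_true))
print("error, Tikhonov filtered  :", np.linalg.norm(x_tik - x_true))
```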

Journal ArticleDOI
01 Oct 2004-Calcolo
TL;DR: A hybrid algorithm combining an evolutionary algorithm with the successive over-relaxation (SOR) method is proposed for solving linear systems of equations, and its convergence is proved for strictly diagonally dominant linear systems.
Abstract: In this paper, we propose a hybrid algorithm based on [12] for solving linear systems of equations. The hybrid algorithm combines the evolutionary algorithm and the successive over-relaxation (SOR) method. The evolutionary algorithm allows the relaxation parameter w to be adaptive in the SOR method. We prove the convergence of the hybrid algorithm for strictly diagonally dominant linear systems. We then apply it to solve for the steady-state probability distributions of Markovian queueing systems. Numerical examples are given to demonstrate the fast convergence rate of the method.
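
A toy rendering of the hybrid idea, in which a small population of relaxation parameters evolves by selection and mutation while SOR sweeps are performed; the problem data are random, and the actual hybrid scheme and convergence proof in the paper are more careful than this sketch.

```python
import numpy as np

def sor_sweeps(A, b, x, omega, n_sweeps):
    """Run a few SOR sweeps starting from x and return the new iterate."""
    x = x.copy()
    for _ in range(n_sweeps):
        for i in range(len(b)):
            sigma = A[i] @ x - A[i, i] * x[i]
            x[i] = (1 - omega) * x[i] + omega * (b[i] - sigma) / A[i, i]
    return x

rng = np.random.default_rng(0)
n = 100
A = rng.uniform(0, 1, (n, n))
A += np.diag(A.sum(axis=1))                   # make A strictly diagonally dominant
b = rng.standard_normal(n)

x = np.zeros(n)
omegas = np.array([0.6, 1.0, 1.4, 1.8])       # initial "population" of relaxation parameters
for generation in range(10):
    trials = [sor_sweeps(A, b, x, w, n_sweeps=3) for w in omegas]
    residuals = [np.linalg.norm(b - A @ t) for t in trials]
    best = int(np.argmin(residuals))
    if residuals[best] < np.linalg.norm(b - A @ x):
        x = trials[best]                      # selection: keep the best iterate
    # mutation: the next population clusters around the winning omega
    omegas = np.clip(omegas[best] + 0.2 * rng.standard_normal(4), 0.05, 1.95)
print("final residual:", np.linalg.norm(b - A @ x))
```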

Journal Article
TL;DR: In this paper, the W-k-means algorithm is used to automatically calculate the feature weights from the training data, and the Fastmap technique is used to handle outliers, which increases the stability of the classifier.
Abstract: In using a classified data set to test clustering algorithms, the data points in a class are considered as one cluster (or more than one) in space. In this paper we adopt this principle to build classification models through interactively clustering a training data set to construct a tree of clusters. The leaf clusters of the tree are selected as decision clusters to classify new data based on a distance function. We consider the feature weights in calculating the distances between a new object and the center of a decision cluster. The new algorithm, W-k-means, is used to automatically calculate the feature weights from the training data. The Fastmap technique is used to handle outliers in selecting decision clusters. This step increases the stability of the classifier. Experimental results on public domain data sets have shown that the models built using this clustering approach outperformed some popular classification algorithms.

Book ChapterDOI
01 Aug 2004
TL;DR: This paper adopts the principle of treating the data points in a class as clusters to build classification models by interactively clustering a training data set into a tree of clusters, and considers feature weights in calculating the distances between a new object and the center of a decision cluster.
Abstract: In using a classified data set to test clustering algorithms, the data points in a class are considered as one cluster (or more than one) in space. In this paper we adopt this principle to build classification models through interactively clustering a training data set to construct a tree of clusters. The leaf clusters of the tree are selected as decision clusters to classify new data based on a distance function. We consider the feature weights in calculating the distances between a new object and the center of a decision cluster. The new algorithm, W-k-means, is used to automatically calculate the feature weights from the training data. The Fastmap technique is used to handle outliers in selecting decision clusters. This step increases the stability of the classifier. Experimental results on public domain data sets have shown that the models built using this clustering approach outperformed some popular classification algorithms.

Book ChapterDOI
16 Dec 2004
TL;DR: Several discretization methods for large matrices are discussed and proposed, and it is suggested that they can be employed in practical Web applications, such as user pattern discovery.
Abstract: Dense region discovery is an important knowledge discovery process for finding distinct and meaningful patterns in given data. The challenge in dense region discovery is how to find informative patterns from various types of data stored in structured or unstructured databases, such as mining user patterns from Web data. Therefore, novel approaches are needed to integrate and manage these multi-type data repositories to support new-generation information management systems. In this paper, we focus on discussing and proposing several discretization methods for large matrices. The experiments suggest that the discretization methods can be employed in practical Web applications, such as user pattern discovery.

Journal ArticleDOI
TL;DR: Two exact algorithms, based on a divide-and-conquer procedure, are proposed for computing the steady-state probability distributions of irreducible Markov chains whose generator matrices have tridiagonal structure, together with a parallel algorithm.
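
For tridiagonal generators of birth-death type, the steady-state distribution also follows from the classical detailed-balance recurrence, which makes a handy baseline for checking any divide-and-conquer or parallel solver on small cases; the queueing example below is illustrative only.

```python
import numpy as np

def birth_death_stationary(birth, death):
    """Stationary distribution via the detailed-balance recurrence
    pi_{k+1} = pi_k * birth[k] / death[k], then normalization."""
    pi = np.ones(len(birth) + 1)
    for k in range(len(birth)):
        pi[k + 1] = pi[k] * birth[k] / death[k]
    return pi / pi.sum()

# Example: an M/M/1/K-style queue with arrival rate 0.8, service rate 1.0, capacity 10
K = 10
birth = np.full(K, 0.8)
death = np.full(K, 1.0)
pi = birth_death_stationary(birth, death)

# Cross-check against the tridiagonal generator matrix: pi Q = 0
Q = np.diag(birth, 1) + np.diag(death, -1)
Q -= np.diag(Q.sum(axis=1))
print("max |pi Q| =", np.abs(pi @ Q).max())
print("probability the system is empty:", round(pi[0], 4))
```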

Journal ArticleDOI
TL;DR: The problem of reconstructing a high-resolution image from multiple undersampled, shifted, degraded frames with subpixel displacement errors from multisensors is studied, and it is found that cosine transform based preconditioners are effective when the number of shifted low-resolution frames is large, but are less effective when the number is small.

Journal ArticleDOI
TL;DR: This work extends the multisensor work by Bose and Boo (1998) and considers the perturbations of displacement error that are due to both translation and rotation, and introduces the warping process to obtain the ideal low‐resolution image.
Abstract: We extend the multisensor work by Bose and Boo (1998) and consider the perturbations of displacement error that are due to both translation and rotation. The warping process is introduced to obtain the ideal low-resolution image, which is located at exact horizontal and vertical shifts. In this approach, the problem of high-resolution image reconstruction is turned into the problem of image restoration, and the system becomes spatially invariant rather than spatially variant as in the original problem. An efficient algorithm is presented. Experimental results show that the proposed methods are quite effective, and they perform better than the bilinear image interpolation method. © 2004 Wiley Periodicals, Inc. Int J Imaging Syst Technol 14, 75-83, 2004; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ima.20010

Book ChapterDOI
25 Aug 2004
TL;DR: In this paper, a new clustering model is proposed for efficiently generating and maintaining clusters which represent the changing Web user patterns in Websites; with an effective pruning process, the clusters can be quickly discovered and updated to reflect the current or changing user patterns for Website administrators.
Abstract: With the fast growth of the Internet and its Web users all over the world, how to manage and discover useful patterns from tremendous and evolving Web information sources has become a new challenge for data engineering researchers. Also, there is a great demand for designing scalable and flexible data mining algorithms for various time-critical and data-intensive Web applications. In this paper, we propose a new clustering model for efficiently generating and maintaining clusters which represent the changing Web user patterns in Websites. With an effective pruning process, the clusters can be quickly discovered and updated to reflect the current or changing user patterns for Website administrators. This model can also be employed in different Web applications such as personalization and recommendation systems.