Showing papers in "IEEE Transactions on Pattern Analysis and Machine Intelligence in 2004"


Journal ArticleDOI
TL;DR: This paper experimentally compares the running times of several standard min-cut/max-flow algorithms and a new algorithm by the authors that, in many cases, runs several times faster than the other methods, making near real-time performance possible.
Abstract: Minimum cut/maximum flow algorithms on graphs have emerged as an increasingly useful tool for exact or approximate energy minimization in low-level vision. The combinatorial optimization literature provides many min-cut/max-flow algorithms with different polynomial time complexity. Their practical efficiency, however, has to date been studied mainly outside the scope of computer vision. The goal of this paper is to provide an experimental comparison of the efficiency of min-cut/max-flow algorithms for applications in vision. We compare the running times of several standard algorithms, as well as a new algorithm that we have recently developed. The algorithms we study include both Goldberg-Tarjan style "push-relabel" methods and algorithms based on Ford-Fulkerson style "augmenting paths." We benchmark these algorithms on a number of typical graphs in the contexts of image restoration, stereo, and segmentation. In many cases, our new algorithm works several times faster than any of the other methods, making near real-time performance possible. An implementation of our max-flow/min-cut algorithm is available upon request for research purposes.
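
To make the "augmenting paths" family mentioned in the abstract concrete, here is a minimal sketch of the textbook Edmonds-Karp variant (shortest augmenting paths found by breadth-first search). It is not the new algorithm the paper introduces; the function name `max_flow`, the dictionary-based graph format, and the assumption that `source` and `sink` differ are choices made only for this illustration.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Textbook Edmonds-Karp max-flow (illustrative; assumes source != sink).

    capacity: dict mapping directed edges (u, v) to non-negative capacities.
    Returns (flow_value, source_side), where source_side is the set of nodes
    still reachable from the source in the final residual graph, i.e. the
    source side of a minimum cut.
    """
    residual = {}
    adj = {}
    for (u, v), c in capacity.items():
        residual[(u, v)] = residual.get((u, v), 0) + c
        residual.setdefault((v, u), 0)  # reverse edge for undoing flow
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def bfs_parents():
        # Breadth-first search over edges with remaining residual capacity.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = bfs_parents()
        if sink not in parent:
            # No augmenting path left: the flow is maximal.
            return flow, set(parent)
        # Find the bottleneck capacity along the path found by BFS.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[(parent[v], v)])
            v = parent[v]
        # Push the bottleneck amount along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
            v = u
        flow += bottleneck
```

In a vision setting the terminals would typically stand for two labels and the remaining nodes for pixels, so the returned source-side set can be read as the pixels assigned to the first label.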

4,463 citations


Journal ArticleDOI
TL;DR: A new technique coined two-dimensional principal component analysis (2DPCA) is developed for image representation that is based on 2D image matrices rather than 1D vectors so the image matrix does not need to be transformed into a vector prior to feature extraction.
Abstract: In this paper, a new technique coined two-dimensional principal component analysis (2DPCA) is developed for image representation. As opposed to PCA, 2DPCA is based on 2D image matrices rather than 1D vectors so the image matrix does not need to be transformed into a vector prior to feature extraction. Instead, an image covariance matrix is constructed directly using the original image matrices, and its eigenvectors are derived for image feature extraction. To test 2DPCA and evaluate its performance, a series of experiments were performed on three face image databases: ORL, AR, and Yale face databases. The recognition rate across all trials was higher using 2DPCA than PCA. The experimental results also indicated that the extraction of image features is computationally more efficient using 2DPCA than PCA.
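
The construction described in the abstract can be written out as follows. This is a sketch in assumed notation (the symbols $A_j$, $\bar{A}$, $G_t$, $X_k$, $Y_k$, $M$, and $d$ are introduced here for illustration and do not appear in the listing): given $M$ training images $A_j \in \mathbb{R}^{m \times n}$, the image covariance matrix is built directly from the matrices, and features are projections onto its leading eigenvectors.

```latex
\bar{A} = \frac{1}{M} \sum_{j=1}^{M} A_j,
\qquad
G_t = \frac{1}{M} \sum_{j=1}^{M} (A_j - \bar{A})^{\top} (A_j - \bar{A})
\in \mathbb{R}^{n \times n},
\\[4pt]
G_t X_k = \lambda_k X_k \quad (\lambda_1 \ge \dots \ge \lambda_d),
\qquad
Y_k = A X_k, \quad k = 1, \dots, d .
```

Because $G_t$ is only $n \times n$ (the image width), its eigendecomposition is much smaller than that of the $mn \times mn$ covariance matrix that vectorized PCA would need, which is consistent with the efficiency claim in the abstract.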

3,439 citations


Journal ArticleDOI
TL;DR: The two main results are that cue combination can be performed adequately with a simple linear model and that a proper, explicit treatment of texture is required to detect boundaries in natural images.
Abstract: The goal of this work is to accurately detect and localize boundaries in natural scenes using local image measurements. We formulate features that respond to characteristic changes in brightness, color, and texture associated with natural boundaries. In order to combine the information from these features in an optimal way, we train a classifier using human labeled images as ground truth. The output of this classifier provides the posterior probability of a boundary at each image location and orientation. We present precision-recall curves showing that the resulting detector significantly outperforms existing approaches. Our two main results are 1) that cue combination can be performed adequately with a simple linear model and 2) that a proper, explicit treatment of texture is required to detect boundaries in natural images.

2,229 citations


Journal ArticleDOI
David Nister
TL;DR: The algorithm is used in a robust hypothesize-and-test framework to estimate structure and motion in real-time with low delay and is the first algorithm well-suited for numerical implementation that also corresponds to the inherent complexity of the problem.
Abstract: An efficient algorithmic solution to the classical five-point relative pose problem is presented. The problem is to find the possible solutions for relative camera pose between two calibrated views given five corresponding points. The algorithm consists of computing the coefficients of a tenth degree polynomial in closed form and, subsequently, finding its roots. It is the first algorithm well-suited for numerical implementation that also corresponds to the inherent complexity of the problem. We investigate the numerical precision of the algorithm. We also study its performance under noise in minimal as well as overdetermined cases. The performance is compared to that of the well-known 8 and 7-point methods and a 6-point scheme. The algorithm is used in a robust hypothesize-and-test framework to estimate structure and motion in real-time with low delay. The real-time system uses solely visual input and has been demonstrated at major conferences.

2,077 citations


Journal ArticleDOI
TL;DR: The contribution of this paper is a method that substantially reduces the computational requirements of grouping algorithms based on spectral partitioning making it feasible to apply them to very large grouping problems.
Abstract: Spectral graph theoretic methods have recently shown great promise for the problem of image segmentation. However, due to the computational demands of these approaches, applications to large problems such as spatiotemporal data and high-resolution imagery have been slow to appear. The contribution of this paper is a method that substantially reduces the computational requirements of grouping algorithms based on spectral partitioning, making it feasible to apply them to very large grouping problems. Our approach is based on a technique for the numerical solution of eigenfunction problems known as the Nyström method. This method allows one to extrapolate the complete grouping solution using only a small number of samples. In doing so, we leverage the fact that there are far fewer coherent groups in a scene than pixels.
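
The extrapolation step can be sketched with the standard Nyström approximation for a symmetric affinity matrix. The notation below ($W$, $A$, $B$, $C$, $U$, $\Lambda$, $n$, $N$) is assumed for illustration: $A$ holds affinities among the $n$ sampled points, $B$ the affinities between samples and the remaining $N - n$ points, and $C$ the (never computed) affinities among the remaining points.

```latex
W = \begin{bmatrix} A & B \\ B^{\top} & C \end{bmatrix},
\qquad
A = U \Lambda U^{\top},
\\[4pt]
\bar{U} = \begin{bmatrix} U \\ B^{\top} U \Lambda^{-1} \end{bmatrix},
\qquad
\hat{W} = \bar{U} \Lambda \bar{U}^{\top}
= \begin{bmatrix} A & B \\ B^{\top} & B^{\top} A^{-1} B \end{bmatrix}.
```

Only $A$ and $B$ are needed, so the cost scales with the number of samples rather than with the full $N \times N$ matrix. The columns of $\bar{U}$ are in general not orthogonal, so a further orthogonalization step is usually required before they are used as approximate eigenvectors.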

1,420 citations


Journal ArticleDOI
TL;DR: The algorithm is improved here to reduce its spatial complexity and to achieve a better performance on large graphs; its features are analyzed in detail with special reference to time and memory requirements.
Abstract: We present an algorithm for graph isomorphism and subgraph isomorphism suited for dealing with large graphs. A first version of the algorithm has been presented in a previous paper, where we examined its performance for the isomorphism of small and medium size graphs. The algorithm is improved here to reduce its spatial complexity and to achieve a better performance on large graphs; its features are analyzed in detail with special reference to time and memory requirements. Results of tests on a publicly available database of synthetically generated graphs, and on graphs from a real application dealing with technical drawings, are presented, confirming the effectiveness of the approach, especially when working with large graphs.

1,344 citations


Journal ArticleDOI
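TL;DR: Support Vector Tracking integrates the Support Vector Machine (SVM) classifier into an optic-flow-based tracker and maximizes the SVM classification score instead of minimizing an intensity difference; pyramids built from the support vectors and a coarse-to-fine approach account for large motions between successive frames.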
Abstract: Support Vector Tracking (SVT) integrates the Support Vector Machine (SVM) classifier into an optic-flow-based tracker. Instead of minimizing an intensity difference function between successive frames, SVT maximizes the SVM classification score. To account for large motions between successive frames, we build pyramids from the support vectors and use a coarse-to-fine approach in the classification stage. We show results of using SVT for vehicle tracking in image sequences.

1,131 citations


Journal ArticleDOI
TL;DR: A learning-based approach to the problem of detecting objects in still, gray-scale images that makes use of a sparse, part-based representation is developed and a critical evaluation of the approach under the proposed standards is presented.
Abstract: We study the problem of detecting objects in still, gray-scale images. Our primary focus is the development of a learning-based approach to the problem that makes use of a sparse, part-based representation. A vocabulary of distinctive object parts is automatically constructed from a set of sample images of the object class of interest; images are then represented using parts from this vocabulary, together with spatial relations observed among the parts. Based on this representation, a learning algorithm is used to automatically learn to detect instances of the object class in new images. The approach can be applied to any object with distinguishable parts in a relatively fixed spatial configuration; it is evaluated here on difficult sets of real-world images containing side views of cars, and is seen to successfully detect objects in varying conditions amidst background clutter and mild occlusion. In evaluating object detection approaches, several important methodological issues arise that have not been satisfactorily addressed in the previous work. A secondary focus of this paper is to highlight these issues, and to develop rigorous evaluation standards for the object detection problem. A critical evaluation of our approach under the proposed standards is presented.

970 citations


Journal ArticleDOI
TL;DR: Experiments revealed that the proposed hybrid GA is superior to both a simple GA and sequential search algorithms, and showed better convergence properties compared to the classical GAs.
Abstract: This paper proposes a novel hybrid genetic algorithm for feature selection. Local search operations are devised and embedded in hybrid GAs to fine-tune the search. The operations are parameterized in terms of their fine-tuning power, and their effectiveness and timing requirements are analyzed and compared. The hybridization technique produces two desirable effects: a significant improvement in the final performance and the acquisition of subset-size control. The hybrid GAs showed better convergence properties compared to the classical GAs. A method of performing rigorous timing analysis was developed, in order to compare the timing requirement of the conventional and the proposed algorithms. Experiments performed with various standard data sets revealed that the proposed hybrid GA is superior to both a simple GA and sequential search algorithms.

844 citations


Journal ArticleDOI
TL;DR: A statistical basis for a process often described in computer vision: image segmentation by region merging following a particular order in the choice of regions is explored, leading to a fast segmentation algorithm tailored to processing images described using most common numerical pixel attribute spaces.
Abstract: This paper explores a statistical basis for a process often described in computer vision: image segmentation by region merging following a particular order in the choice of regions. We exhibit a particular blend of algorithmics and statistics whose segmentation error is, as we show, limited from both the qualitative and quantitative standpoints. This approach can be efficiently approximated in linear time/space, leading to a fast segmentation algorithm tailored to processing images described using most common numerical pixel attribute spaces. The conceptual simplicity of the approach makes it simple to modify it to cope with hard noise corruption, handle occlusion, allow control of the segmentation scale, and process unconventional data such as spherical images. Experiments on gray-level and color images, obtained with a short, readily available C code, display the quality of the segmentations obtained.

843 citations


Journal ArticleDOI
TL;DR: A template update algorithm is proposed that avoids the "drifting" inherent in the naive algorithm and remains a good model of the tracked object.
Abstract: Template tracking dates back to the 1981 Lucas-Kanade algorithm. One question that has received very little attention, however, is how to update the template so that it remains a good model of the tracked object. We propose a template update algorithm that avoids the "drifting" inherent in the naive algorithm.

Journal ArticleDOI
TL;DR: An edit-distance algorithm for shock graphs that finds the optimal deformation path in polynomial time is employed and gives intuitive correspondences for a variety of shapes and is robust in the presence of a wide range of visual transformations.
Abstract: This paper presents a novel framework for the recognition of objects based on their silhouettes. The main idea is to measure the distance between two shapes as the minimum extent of deformation necessary for one shape to match the other. Since the space of deformations is very high-dimensional, three steps are taken to make the search practical: 1) define an equivalence class for shapes based on shock-graph topology, 2) define an equivalence class for deformation paths based on shock-graph transitions, and 3) avoid complexity-increasing deformation paths by moving toward shock-graph degeneracy. Despite these steps, which tremendously reduce the search requirement, there still remain numerous deformation paths to consider. To that end, we employ an edit-distance algorithm for shock graphs that finds the optimal deformation path in polynomial time. The proposed approach gives intuitive correspondences for a variety of shapes and is robust in the presence of a wide range of visual transformations. The recognition rates on two distinct databases of 99 and 216 shapes each indicate highly successful within category matches (100 percent in top three matches), which render the framework potentially usable in a range of shape-based recognition applications.

Journal ArticleDOI
TL;DR: This paper proposes the concept of feature saliency and introduces an expectation-maximization algorithm to estimate it, in the context of mixture-based clustering, and extends the criterion and algorithm to simultaneously estimate the feature saliencies and the number of clusters.
Abstract: Clustering is a common unsupervised learning technique used to discover group structure in a set of data. While there exist many algorithms for clustering, the important issue of feature selection, that is, what attributes of the data should be used by the clustering algorithms, is rarely touched upon. Feature selection for clustering is difficult because, unlike in supervised learning, there are no class labels for the data and, thus, no obvious criteria to guide the search. Another important problem in clustering is the determination of the number of clusters, which clearly impacts and is influenced by the feature selection issue. In this paper, we propose the concept of feature saliency and introduce an expectation-maximization (EM) algorithm to estimate it, in the context of mixture-based clustering. Due to the introduction of a minimum message length model selection criterion, the saliency of irrelevant features is driven toward zero, which corresponds to performing feature selection. The criterion and algorithm are then extended to simultaneously estimate the feature saliencies and the number of clusters.

Journal ArticleDOI
Zhengyou Zhang1
TL;DR: It is shown that camera calibration is not possible with free-moving 1D objects, but can be solved if one point is fixed, and a closed-form solution is developed if six or more observations of such a 1D object are made.
Abstract: Camera calibration has been studied extensively in computer vision and photogrammetry and the proposed techniques in the literature include those using 3D apparatus (two or three planes orthogonal to each other or a plane undergoing a pure translation, etc.), 2D objects (planar patterns undergoing unknown motions), and 0D features (self-calibration using unknown scene points). This paper proposes a new calibration technique using 1D objects (points aligned on a line), thus filling the missing dimension in calibration. In particular, we show that camera calibration is not possible with free-moving 1D objects, but can be solved if one point is fixed. A closed-form solution is developed if six or more observations of such a 1D object are made. For higher accuracy, a nonlinear technique based on the maximum likelihood criterion is then used to refine the estimate. Singularities have also been studied. Besides the theoretical aspect, the proposed technique is also important in practice, especially when calibrating multiple cameras mounted apart from each other, where the calibration objects are required to be visible simultaneously.

Journal ArticleDOI
TL;DR: A novel Gabor-based kernel Principal Component Analysis (PCA) method is presented by integrating the Gabor wavelet representation of face images and the kernel PCA method for face recognition, extended to include fractional power polynomial models for enhanced face recognition performance.
Abstract: This paper presents a novel Gabor-based kernel principal component analysis (PCA) method by integrating the Gabor wavelet representation of face images and the kernel PCA method for face recognition. Gabor wavelets first derive desirable facial features characterized by spatial frequency, spatial locality, and orientation selectivity to cope with the variations due to illumination and facial expression changes. The kernel PCA method is then extended to include fractional power polynomial models for enhanced face recognition performance. A fractional power polynomial, however, does not necessarily define a kernel function, as it might not define a positive semidefinite Gram matrix. Note that the sigmoid kernels, one of the three classes of widely used kernel functions (polynomial kernels, Gaussian kernels, and sigmoid kernels), do not actually define a positive semidefinite Gram matrix either. Nevertheless, the sigmoid kernels have been successfully used in practice, such as in building support vector machines. In order to derive real kernel PCA features, we apply only those kernel PCA eigenvectors that are associated with positive eigenvalues. The feasibility of the Gabor-based kernel PCA method with fractional power polynomial models has been successfully tested on both frontal and pose-angled face recognition, using two data sets from the FERET database and the CMU PIE database, respectively. The FERET data set contains 600 frontal face images of 200 subjects, while the PIE data set consists of 680 images across five poses (left and right profiles, left and right half profiles, and frontal view) with two different facial expressions (neutral and smiling) of 68 subjects. The effectiveness of the Gabor-based kernel PCA method with fractional power polynomial models is shown in terms of both absolute performance indices and comparative performance against the PCA method, the kernel PCA method with polynomial kernels, the kernel PCA method with fractional power polynomial models, the Gabor wavelet-based PCA method, and the Gabor wavelet-based kernel PCA method with polynomial kernels.

Journal ArticleDOI
TL;DR: It is shown that an efficient face detection system does not require any costly local preprocessing before classification of image areas; the proposed scheme provides a very high detection rate with a particularly low level of false positives, demonstrated on difficult test sets, without requiring multiple networks for handling difficult cases.
Abstract: In this paper, we present a novel face detection approach based on a convolutional neural architecture, designed to robustly detect highly variable face patterns, rotated up to ±20 degrees in the image plane and turned up to ±60 degrees, in complex real world images. The proposed system automatically synthesizes simple problem-specific feature extractors from a training set of face and nonface patterns, without making any assumptions or using any hand-made design concerning the features to extract or the areas of the face pattern to analyze. The face detection procedure acts like a pipeline of simple convolution and subsampling modules that treat the raw input image as a whole. We therefore show that an efficient face detection system does not require any costly local preprocessing before classification of image areas. The proposed scheme provides a very high detection rate with a particularly low level of false positives, demonstrated on difficult test sets, without requiring the use of multiple networks for handling difficult cases. We present extensive experimental results illustrating the efficiency of the proposed approach on difficult test sets and including an in-depth sensitivity analysis with respect to the degrees of variability of the face patterns.

Journal ArticleDOI
TL;DR: Applied to face detection, the FloatBoost learning method, together with a proposed detector pyramid architecture, leads to the first real-time multiview face detection system reported.
Abstract: A novel learning procedure, called FloatBoost, is proposed for learning a boosted classifier for achieving the minimum error rate. FloatBoost learning uses a backtrack mechanism after each iteration of AdaBoost learning to minimize the error rate directly, rather than minimizing an exponential function of the margin as in the traditional AdaBoost algorithms. A second contribution of the paper is a novel statistical model for learning best weak classifiers using a stagewise approximation of the posterior probability. These novel techniques lead to a classifier which requires fewer weak classifiers than AdaBoost yet achieves lower error rates in both training and testing, as demonstrated by extensive experiments. Applied to face detection, the FloatBoost learning method, together with a proposed detector pyramid architecture, leads to the first real-time multiview face detection system reported.

Journal ArticleDOI
TL;DR: This work shows how multiple human objects are segmented and their global motions are tracked in 3D using ellipsoid human shape models and estimates the modes (e.g., walking, running, standing) of the locomotion and 3D body postures by making inference in a prior locomotion model.
Abstract: Tracking multiple humans in complex situations is challenging. The difficulties are tackled with appropriate knowledge in the form of various models in our approach. Human motion is decomposed into its global motion and limb motion. In the first part, we show how multiple human objects are segmented and their global motions are tracked in 3D using ellipsoid human shape models. Experiments show that it successfully applies to the cases where a small number of people move together, have occlusion, and cast shadow or reflection. In the second part, we estimate the modes (e.g., walking, running, standing) of the locomotion and 3D body postures by making inference in a prior locomotion model. Camera model and ground plane assumptions provide geometric constraints in both parts. Robust results are shown on some difficult sequences.

Journal ArticleDOI
TL;DR: A contour-based tracking method is proposed that tracks complete object regions, adapts to changing visual features, and handles occlusions; it has two major components, one modeling the visual features and one modeling the object shape.
Abstract: We propose a tracking method which tracks the complete object regions, adapts to changing visual features, and handles occlusions. Tracking is achieved by evolving the contour from frame to frame by minimizing some energy functional evaluated in the contour vicinity defined by a band. Our approach has two major components related to the visual features and the object shape. Visual features (color, texture) are modeled by semiparametric models and are fused using independent opinion polling. Shape priors consist of shape level sets and are used to recover the missing object regions during occlusion. We demonstrate the performance of our method in real sequences with and without object occlusions.

Journal ArticleDOI
TL;DR: A precise definition of the image foresting transform is given, together with a procedure to compute it (a generalization of Dijkstra's algorithm) and a proof of correctness.
Abstract: The image foresting transform (IFT) is a graph-based approach to the design of image processing operators based on connectivity. It naturally leads to correct and efficient implementations and to a better understanding of how different operators relate to each other. We give here a precise definition of the IFT, and a procedure to compute it (a generalization of Dijkstra's algorithm), with a proof of correctness. We also discuss implementation issues and illustrate the use of the IFT in a few applications.
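
As an illustration of what "a generalization of Dijkstra's algorithm" can look like, the sketch below grows a minimum-cost path forest from labeled seed pixels on a 4-connected grid. The path cost used here (the largest pixel value met along the path, with seeds starting at cost 0) is only one possible choice, picked for this example; the function name `image_foresting_transform` and its argument format are likewise assumptions, not the paper's interface.

```python
import heapq


def image_foresting_transform(image, seeds):
    """Grow a min-cost path forest from seeds (illustrative sketch).

    image: list of rows of numbers (pixel values).
    seeds: dict mapping (row, col) to a label.
    Path cost: 0 at a seed, then the maximum pixel value entered so far.
    Returns (cost, label, pred) maps with the same shape as the image.
    """
    rows, cols = len(image), len(image[0])
    cost = [[float("inf")] * cols for _ in range(rows)]
    label = [[None] * cols for _ in range(rows)]
    pred = [[None] * cols for _ in range(rows)]
    heap = []
    for (r, c), lab in seeds.items():
        cost[r][c] = 0
        label[r][c] = lab
        heapq.heappush(heap, (0, r, c))

    done = set()
    while heap:
        k, r, c = heapq.heappop(heap)
        if (r, c) in done:
            continue  # stale queue entry; pixel already finalized
        done.add((r, c))
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in done:
                # Extend the path: the cost is the max value seen so far,
                # where plain Dijkstra would add an edge weight instead.
                new_cost = max(k, image[nr][nc])
                if new_cost < cost[nr][nc]:
                    cost[nr][nc] = new_cost
                    label[nr][nc] = label[r][c]
                    pred[nr][nc] = (r, c)
                    heapq.heappush(heap, (new_cost, nr, nc))
    return cost, label, pred
```

The only change from ordinary shortest paths is the line that extends a path cost, which is why one priority-queue procedure can serve a family of connectivity-based operators. With this particular cost, pixels on a ridge between two seeds can tie, and the label they receive then depends on queue order.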

Journal ArticleDOI
TL;DR: This work uses a Fourier basis to represent tangents to the shape spaces and a gradient-based shooting method to solve for the tangent that connects any two shapes via a geodesic.
Abstract: For analyzing shapes of planar, closed curves, we propose differential geometric representations of curves using their direction functions and curvature functions. Shapes are represented as elements of infinite-dimensional spaces and their pairwise differences are quantified using the lengths of geodesics connecting them on these spaces. We use a Fourier basis to represent tangents to the shape spaces and then use a gradient-based shooting method to solve for the tangent that connects any two shapes via a geodesic. Using the Surrey fish database, we demonstrate some applications of this approach: 1) interpolation and extrapolations of shape changes, 2) clustering of objects according to their shapes, 3) statistics on shape spaces, and 4) Bayesian extraction of shapes in low-quality images.

Journal ArticleDOI
TL;DR: The fundamental trade-off between spatial resolution and temporal resolution is exploited to construct a hybrid camera that can measure its own motion during image integration; the results show that, with minimal resources, hybrid imaging outperforms previous approaches to the motion blur problem.
Abstract: Motion blur due to camera motion can significantly degrade the quality of an image. Since the path of the camera motion can be arbitrary, deblurring of motion-blurred images is a hard problem. Previous methods to deal with this problem have included blind restoration of motion-blurred images, optical correction using stabilized lenses, and special CMOS sensors that limit the exposure time in the presence of motion. In this paper, we exploit the fundamental trade-off between spatial resolution and temporal resolution to construct a hybrid camera that can measure its own motion during image integration. The acquired motion information is used to compute a point spread function (PSF) that represents the path of the camera during integration. This PSF is then used to deblur the image. To verify the feasibility of hybrid imaging for motion deblurring, we have implemented a prototype hybrid camera. This prototype system was evaluated in different indoor and outdoor scenes using long exposures and complex camera motion paths. The results show that, with minimal resources, hybrid imaging outperforms previous approaches to the motion blur problem. We conclude with a brief discussion on how our ideas can be extended beyond the case of global camera motion to the case where individual objects in the scene move with different velocities.
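
The idea of turning a measured camera path into a PSF can be sketched very simply: each path sample deposits an equal share of the exposure into the kernel cell it falls in. This is a deliberately crude illustration (nearest-pixel binning of uniformly timed samples), not the estimation procedure used in the paper; the function name `psf_from_path` and its parameters are hypothetical.

```python
def psf_from_path(path, size):
    """Rasterize a camera path into a normalized blur kernel (sketch).

    path: list of (x, y) image-plane displacements in pixels, assumed to be
          sampled at uniform time intervals during the exposure.
    size: odd kernel width/height; displacement (0, 0) maps to the center.
    Returns a size x size list of lists that sums to 1.
    """
    half = size // 2
    kernel = [[0.0] * size for _ in range(size)]
    for x, y in path:
        col = int(round(x)) + half
        row = int(round(y)) + half
        if 0 <= row < size and 0 <= col < size:
            # Each sample represents an equal slice of exposure time.
            kernel[row][col] += 1.0
    total = sum(sum(row) for row in kernel)
    if total == 0:
        raise ValueError("path lies entirely outside the kernel")
    # Normalize so that blurring preserves overall image brightness.
    return [[value / total for value in row] for row in kernel]
```

Cells where the camera lingered receive more weight, which is how the kernel encodes both the shape and the speed of the motion. The resulting kernel could then be passed to a standard deconvolution routine.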

Journal ArticleDOI
TL;DR: This paper is the first attempt to determine the explicit limits of reconstruction-based superresolution algorithms, under both real and synthetic conditions; the limits are obtained from a conditioning analysis of the coefficient matrix, based on the perturbation theory of linear systems.
Abstract: Superresolution is a technique that can produce images of a higher resolution than that of the originally captured ones. Nevertheless, improvement in resolution using such a technique is very limited in practice. This makes it significant to study the problem: "Do fundamental limits exist for superresolution?" In this paper, we focus on a major class of superresolution algorithms, called the reconstruction-based algorithms, which compute high-resolution images by simulating the image formation process. Assuming local translation among low-resolution images, this paper is the first attempt to determine the explicit limits of reconstruction-based algorithms, under both real and synthetic conditions. Based on the perturbation theory of linear systems, we obtain the superresolution limits from the conditioning analysis of the coefficient matrix. Moreover, we determine the number of low-resolution images that are sufficient to achieve the limit. Both real and synthetic experiments are carried out to verify our analysis.

Journal ArticleDOI
TL;DR: It is proved that the imaginary part is a smoothed second derivative, scaled by time, when the complex diffusion coefficient approaches the real axis, and two examples of nonlinear complex processes useful in image processing are developed.
Abstract: The linear and nonlinear scale spaces, generated by the inherently real-valued diffusion equation, are generalized to complex diffusion processes, by incorporating the free Schrödinger equation. A fundamental solution for the linear case of the complex diffusion equation is developed. Analysis of its behavior shows that the generalized diffusion process combines properties of both forward and inverse diffusion. We prove that the imaginary part is a smoothed second derivative, scaled by time, when the complex diffusion coefficient approaches the real axis. Based on this observation, we develop two examples of nonlinear complex processes, useful in image processing: a regularized shock filter for image enhancement and a ramp preserving denoising process.
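
The claim about the imaginary part can be checked numerically in one dimension. The sketch below integrates the linear equation I_t = c I_xx with c = exp(i*theta) using explicit Euler steps on a unit-spaced grid; the discretization, the fixed end points, and the function name `complex_diffusion_1d` are assumptions made for this illustration and are not taken from the paper.

```python
import cmath


def complex_diffusion_1d(signal, theta, dt, steps):
    """Explicit Euler integration of I_t = c * I_xx, c = exp(i * theta).

    signal: list of real samples on a unit-spaced grid.
    theta:  phase of the complex diffusion coefficient (small for the
            near-real regime discussed in the abstract).
    The two end samples are held fixed. Returns a list of complex values.
    """
    c = cmath.exp(1j * theta)
    u = [complex(v) for v in signal]
    for _ in range(steps):
        nxt = u[:]
        for i in range(1, len(u) - 1):
            laplacian = u[i - 1] - 2 * u[i] + u[i + 1]
            nxt[i] = u[i] + dt * c * laplacian
        u = nxt
    return u
```

For a quadratic input such as x squared, whose second derivative is 2 everywhere, the imaginary part at the center divided by theta times the elapsed time should be close to 2 for small theta, matching the "smoothed second derivative, scaled by time" statement. The step size has to stay small for the explicit scheme to remain stable.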

Journal ArticleDOI
TL;DR: This work proposes a general approach for the design of 2D feature detectors from a class of steerable functions based on the optimization of a Canny-like criterion that yields operators that have a better orientation selectivity than the classical gradient or Hessian-based detectors.
Abstract: We propose a general approach for the design of 2D feature detectors from a class of steerable functions based on the optimization of a Canny-like criterion. In contrast with previous computational designs, our approach is truly 2D and provides filters that have closed-form expressions. It also yields operators that have a better orientation selectivity than the classical gradient or Hessian-based detectors. We illustrate the method with the design of operators for edge and ridge detection. We present some experimental results that demonstrate the performance improvement of these new feature detectors. We propose computationally efficient local optimization algorithms for the estimation of feature orientation. We also introduce the notion of shape-adaptable feature detection and use it for the detection of image corners.

Journal ArticleDOI
TL;DR: This work has formulated the tracking problem in terms of local bundle adjustment and developed a method for establishing image correspondences that can equally well handle short and wide-baseline matching and results in a real-time tracker that does not jitter or drift and can deal with significant aspect changes.
Abstract: We propose an efficient real-time solution for tracking rigid objects in 3D using a single camera that can handle large camera displacements, drastic aspect changes, and partial occlusions. While commercial products are already available for offline camera registration, robust online tracking remains an open issue because many real-time algorithms described in the literature still lack robustness and are prone to drift and jitter. To address these problems, we have formulated the tracking problem in terms of local bundle adjustment and have developed a method for establishing image correspondences that can equally well handle short and wide-baseline matching. We then can merge the information from preceding frames with that provided by a very limited number of keyframes created during a training stage, which results in a real-time tracker that does not jitter or drift and can deal with significant aspect changes.

Journal ArticleDOI
TL;DR: This paper first models face difference with three components (intrinsic difference, transformation difference, and noise) and then builds a unified framework from this face difference model and a detailed subspace analysis of the three components.
Abstract: PCA, LDA, and Bayesian analysis are the three most representative subspace face recognition approaches. In this paper, we show that they can be unified under the same framework. We first model face difference with three components: intrinsic difference, transformation difference, and noise. A unified framework is then constructed by using this face difference model and a detailed subspace analysis on the three components. We explain the inherent relationship among different subspace methods and their unique contributions to the extraction of discriminating information from the face difference. Based on the framework, a unified subspace analysis method is developed using PCA, Bayes, and LDA as three steps. A 3D parameter space is constructed using the three subspace dimensions as axes. Searching through this parameter space, we achieve better recognition performance than standard subspace methods.

Journal ArticleDOI
TL;DR: This paper reviews the advances in online Chinese character recognition (OLCCR), with emphasis on the research works from the 1990s, in terms of pattern representation, character classification, learning/adaptation, and contextual processing.
Abstract: Online handwriting recognition is gaining renewed interest owing to the increase of pen computing applications and new pen input devices. The recognition of Chinese characters is different from western handwriting recognition and poses a special challenge. To provide an overview of the technical status and inspire future research, this paper reviews the advances in online Chinese character recognition (OLCCR), with emphasis on the research works from the 1990s. Compared to the research in the 1980s, the research efforts in the 1990s aimed to further relax the constraints of handwriting, namely, the adherence to standard stroke orders and stroke numbers and the restriction of recognition to isolated characters only. The target of recognition has shifted from regular script to fluent script in order to better meet the requirements of practical applications. The research works are reviewed in terms of pattern representation, character classification, learning/adaptation, and contextual processing. We compare important results and discuss possible directions of future research.

Journal ArticleDOI
TL;DR: This work examines a number of optimization criteria, and extends their applicability by using the generalized singular value decomposition to circumvent the nonsingularity requirement.
Abstract: Discriminant analysis has been used for decades to extract features that preserve class separability. It is commonly defined as an optimization problem involving covariance matrices that represent the scatter within and between clusters. The requirement that one of these matrices be nonsingular limits its application to data sets with certain relative dimensions. We examine a number of optimization criteria, and extend their applicability by using the generalized singular value decomposition to circumvent the nonsingularity requirement. The result is a generalization of discriminant analysis that can be applied even when the sample size is smaller than the dimension of the sample data. We use classification results from the reduced representation to compare the effectiveness of this approach with some alternatives, and conclude with a discussion of their relative merits.
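The nonsingularity requirement mentioned above can be made explicit with the standard scatter-matrix definitions. The notation below is an assumption for illustration and is not taken from the paper: n samples a_j of dimension m, k clusters with index sets N_i, sizes n_i, centroids c_i, global centroid c, and a reducing transformation G.

```latex
\begin{align}
S_w &= \sum_{i=1}^{k} \sum_{j \in N_i} (a_j - c_i)(a_j - c_i)^{T}, \\
S_b &= \sum_{i=1}^{k} n_i \,(c_i - c)(c_i - c)^{T}, \\
J(G) &= \operatorname{trace}\!\left( (G^{T} S_w G)^{-1} \, G^{T} S_b G \right).
\end{align}
% S_w is an m-by-m matrix with rank(S_w) <= n - k, so it is singular
% whenever m > n - k. In that regime the classical solution, which
% requires S_w^{-1}, is undefined.
```

This rank bound is why the classical formulation breaks down when the sample size is smaller than the data dimension, the case the abstract addresses through the generalized singular value decomposition.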

Journal ArticleDOI
TL;DR: The use of language models is shown to improve the accuracy of the system and the approach is described in detail and compared with other methods presented in the literature to deal with the same problem.
Abstract: This paper presents a system for the offline recognition of large vocabulary unconstrained handwritten texts. The only assumption made about the data is that it is written in English. This allows the application of statistical language models in order to improve the performance of our system. Several experiments have been performed using both single and multiple writer data. Lexica of variable size (from 10,000 to 50,000 words) have been used. The use of language models is shown to improve the accuracy of the system (when the lexicon contains 50,000 words, the error rate is reduced by approximately 50 percent for single writer data and by approximately 25 percent for multiple writer data). Our approach is described in detail and compared with other methods presented in the literature to deal with the same problem. An experimental setup to correctly deal with unconstrained text recognition is proposed.
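As a toy illustration of how a statistical language model can correct a recognizer's output, the sketch below rescores candidate transcriptions with an add-one smoothed bigram model. It is not the paper's system: the function names, the tiny corpus, the candidate scores, and the `lm_weight` parameter are all hypothetical.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences, with a start marker."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def lm_log_prob(words, unigrams, bigrams, vocab_size):
    """Add-one smoothed bigram log-probability of a word sequence."""
    tokens = ["<s>"] + words
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total


def rescore(candidates, unigrams, bigrams, vocab_size, lm_weight=1.0):
    """Return the (words, score) pair maximizing recognizer score + weighted LM score."""
    return max(
        candidates,
        key=lambda c: c[1] + lm_weight * lm_log_prob(c[0], unigrams, bigrams, vocab_size),
    )


if __name__ == "__main__":
    corpus = [["the", "cat", "sat"], ["the", "cat", "ran"], ["the", "dog", "sat"]]
    unigrams, bigrams = train_bigram(corpus)
    # Hypothetical recognizer log-scores: the visually similar "car" scores slightly higher.
    candidates = [(["the", "car", "sat"], -2.1), (["the", "cat", "sat"], -2.3)]
    print(rescore(candidates, unigrams, bigrams, vocab_size=6))
```

The recognizer alone prefers the wrong word here; adding the language model score shifts the decision toward the word sequence that is more plausible in the training text.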