
Showing papers in "International Journal of Pattern Recognition and Artificial Intelligence in 2001"


Journal ArticleDOI
TL;DR: A tutorial on learning and inference in hidden Markov models in the context of the recent literature on Bayesian networks is provided, concluding with a discussion of Bayesian methods for model selection in generalized HMMs.
Abstract: We provide a tutorial on learning and inference in hidden Markov models in the context of the recent literature on Bayesian networks. This perspective makes it possible to consider novel generalizations of hidden Markov models with multiple hidden state variables, multiscale representations, and mixed discrete and continuous variables. Although exact inference in these generalizations is usually intractable, one can use approximate inference algorithms such as Markov chain sampling and variational methods. We describe how such methods are applied to these generalized hidden Markov models. We conclude this review with a discussion of Bayesian methods for model selection in generalized HMMs.

760 citations
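The forward algorithm at the core of exact HMM inference, which tutorials of this kind cover, can be sketched in a few lines. The two-state model and all probabilities below are invented for illustration, not taken from the paper:

```python
def forward(obs, init, trans, emit):
    """Compute P(observations) with the forward algorithm.

    init[i]     -- prior probability of hidden state i
    trans[i][j] -- probability of moving from state i to state j
    emit[i][o]  -- probability of emitting symbol o in state i
    """
    alpha = [init[i] * emit[i][obs[0]] for i in range(len(init))]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(len(alpha))) * emit[j][o]
            for j in range(len(init))
        ]
    return sum(alpha)

# Hypothetical two-state model with two output symbols.
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
p = forward([0, 1, 0], init, trans, emit)
```

Exact recursions like this scale linearly in sequence length but blow up combinatorially once multiple coupled hidden state variables are introduced, which is why the generalizations discussed above require sampling or variational approximations.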


Journal ArticleDOI
TL;DR: A novel feature of the system is that the HMM is applied in such a way that the difficult problem of segmenting a line of text into individual words is avoided and linguistic knowledge beyond the lexicon level is incorporated in the recognition process.
Abstract: In this paper, a system for the reading of totally unconstrained handwritten text is presented. The kernel of the system is a hidden Markov model (HMM) for handwriting recognition. The HMM is enhanced by a statistical language model. Thus linguistic knowledge beyond the lexicon level is incorporated in the recognition process. Another novel feature of the system is that the HMM is applied in such a way that the difficult problem of segmenting a line of text into individual words is avoided. A number of experiments with various language models and large vocabularies have been conducted. The language models used in the system were also analytically compared based on their perplexity.

463 citations


Journal ArticleDOI
TL;DR: The classical approach based on watershed and markers is reinterpreted in a multiscale framework and new results are presented, considerably enlarging the palette of available tools.
Abstract: Mathematical morphology offers a rich framework for segmenting images and video sequences. We propose a unified presentation of these tools, expressed as a small number of distance transforms on graphs. The classical approach based on watershed and markers is reinterpreted in a multiscale framework and new results are presented, considerably enlarging the palette of available tools.

131 citations
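The "distance transforms on graphs" formulation can be illustrated with a minimal BFS sketch on a pixel grid; the grid size and marker position below are invented:

```python
from collections import deque

def distance_transform(shape, markers):
    """City-block distance from the nearest marker, via BFS on the grid graph."""
    h, w = shape
    dist = [[None] * w for _ in range(h)]
    q = deque()
    for (r, c) in markers:
        dist[r][c] = 0
        q.append((r, c))
    while q:
        r, c = q.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                q.append((nr, nc))
    return dist

d = distance_transform((3, 4), [(0, 0)])
```

A marker-based watershed partition follows the same pattern: propagate from all markers at once and record, for each pixel, which marker's wavefront arrived first.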


Journal ArticleDOI
TL;DR: This paper describes an application for tracking human movement in an office-like spatial layout where the AHMM is used to track and predict the evolution of object trajectories at different levels of detail.
Abstract: In this paper, we consider the problem of tracking an object and predicting the object's future trajectory in a wide-area environment with a complex spatial layout and multiple sensors/cameras. Solving this problem requires representing the dynamic and noisy data in the tracking tasks and dealing with them at different levels of detail. We employ the Abstract Hidden Markov Model (AHMM), an extension of the well-known Hidden Markov Model (HMM) and a special type of Dynamic Probabilistic Network (DPN), as our underlying representation framework. The AHMM allows us to explicitly encode the hierarchy of connected spatial locations, making it scalable to the size of the environment being modeled. We describe an application for tracking human movement in an office-like spatial layout where the AHMM is used to track and predict the evolution of object trajectories at different levels of detail.

91 citations


Journal ArticleDOI
TL;DR: An efficient partitional clustering technique, called SAKM-clustering, that integrates the power of simulated annealing for obtaining minimum energy configuration, and the searching capability of K-means algorithm is proposed in this article.
Abstract: An efficient partitional clustering technique, called SAKM-clustering, that integrates the power of simulated annealing for obtaining minimum energy configuration, and the searching capability of the K-means algorithm is proposed in this article. The clustering methodology is used to search for appropriate clusters in multidimensional feature space such that a similarity metric of the resulting clusters is optimized. Data points are redistributed among the clusters probabilistically, so that points that are farther away from the cluster center have higher probabilities of migrating to other clusters than those which are closer to it. The superiority of the SAKM-clustering algorithm over the widely used K-means algorithm is extensively demonstrated for artificial and real-life data sets.

78 citations
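The annealed reassignment idea (points far from their center migrate with higher probability under a cooling temperature) can be sketched in one dimension. The data, cooling schedule and acceptance rule below are invented for illustration and are not the authors' exact algorithm:

```python
import math
import random

def sakm(points, k, temp=1.0, cooling=0.9, steps=30, seed=0):
    rng = random.Random(seed)
    assign = [rng.randrange(k) for _ in points]
    for _ in range(steps):
        # Update centers as cluster means (an empty cluster gets a random point).
        centers = []
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            centers.append(sum(members) / len(members) if members else rng.choice(points))
        # Probabilistic redistribution: accept a randomly chosen cluster with
        # probability exp(-delta / T), otherwise keep the nearest center.
        for i, p in enumerate(points):
            dists = [abs(p - c) for c in centers]
            best = min(range(k), key=lambda j: dists[j])
            j = rng.randrange(k)
            delta = dists[j] - dists[best]
            assign[i] = j if rng.random() < math.exp(-delta / max(temp, 1e-9)) else best
        temp *= cooling
    return assign, centers

pts = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
labels, centers = sakm(pts, 2)
```

As the temperature falls, the acceptance probability for worse clusters vanishes and the procedure degenerates to ordinary K-means reassignment.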


Journal ArticleDOI
TL;DR: Two frameworks based on hidden Markov models are presented, designed to model and recognize gestures that vary in systematic ways, and variation in the signal is overcome by relying on online learning rather than conventional offline, batch learning.
Abstract: Conventional application of hidden Markov models to the task of recognizing human gesture may suffer from multiple sources of systematic variation in the sensor outputs. We present two frameworks based on hidden Markov models which are designed to model and recognize gestures that vary in systematic ways. In the first, the systematic variation is assumed to be communicative in nature, and the input gesture is assumed to belong to a gesture family. The variation across the family is modeled explicitly by the parametric hidden Markov model (PHMM). In the second framework, variation in the signal is overcome by relying on online learning rather than conventional offline, batch learning.

73 citations


Journal ArticleDOI
Premkumar Natajan, Zhidong Lu, Richard Schwartz, Issam Bazzi, John Makhoul
TL;DR: The script independence of the system is demonstrated in three languages with different types of script: Arabic, English, and Chinese, and an unsupervised adaptation method is described to improve performance under degraded conditions.
Abstract: This paper presents a script-independent methodology for optical character recognition (OCR) based on the use of hidden Markov models (HMM). The feature extraction, training and recognition components of the system are all designed to be script independent. The training and recognition components were taken without modification from a continuous speech recognition system; the only component that is specific to OCR is the feature extraction component. To port the system to a new language, all that is needed is text image training data from the new language, along with ground truth which gives the identity of the sequences of characters along each line of each text image, without specifying the location of the characters on the image. The parameters of the character HMMs are estimated automatically from the training data, without the need for laborious handwritten rules. The system does not require presegmentation of the data, neither at the word level nor at the character level. Thus, the system is able to handle languages with connected characters in a straightforward manner. The script independence of the system is demonstrated in three languages with different types of script: Arabic, English, and Chinese. The robustness of the system is further demonstrated by testing the system on fax data. An unsupervised adaptation method is then described to improve performance under degraded conditions.

69 citations


Journal ArticleDOI
TL;DR: A circular coded target for automatic image point measurement and identification is presented; the applied image processing method is described in detail, and some application examples of the circular coded target in optical 3D-measurement techniques are shown.
Abstract: One of the primary, but tedious, tasks for the user and developer of an optical 3D-measurement system is to find the homologous image points in multiple images, a task that is frequently referred to as the correspondence problem. Error-free correspondence and accurate measurement of image points are of great importance, since the quality of the subsequent camera calibration and 3D measurements depends directly on them. In fact, the automation of measurement processes is becoming more important with progress in production, and hence is of increasing topical interest. In this paper, we present a circular coded target for automatic image point measurement and identification. The applied image processing method will be described in detail, and we will show some application examples of the circular coded target in optical 3D-measurement techniques.

66 citations


Journal ArticleDOI
Jay J. Lee, Jahwan Kim, Jin H. Kim
TL;DR: A data-driven systematic method to design HMM topology, where data samples in a single pattern class are structurally simplified into a sequence of straight-line segments to form an architecture of a multiple parallel-path HMM which behaves as a single HMM.
Abstract: Although HMM is widely used for online handwriting recognition, there is no simple and well-established method of designing the HMM topology. We propose a data-driven systematic method to design HMM topology. Data samples in a single pattern class are structurally simplified into a sequence of straight-line segments. Then the resulting multiple models of the class are combined to form an architecture of a multiple parallel-path HMM, which behaves as a single HMM. To avoid excessive growth in the number of states, parameter tying is applied so that structural similarity among patterns is reflected. Experiments on online Hangul recognition showed about 19% error reduction, compared to the intuitive design method.

52 citations


Journal ArticleDOI
TL;DR: This paper proposes an ontology to describe KDD agents, in the style of OOER (Object Oriented Entity Relationship) data model, and applies several AI planning techniques to solve the most difficult problem in a multiagent KDD system.
Abstract: How to increase both autonomy and versatility of a knowledge discovery system is a core problem and a crucial aspect of KDD (Knowledge Discovery and Data Mining). Within the framework of the KDD process and the GLS (Global Learning Scheme) system recently proposed by us, this paper describes a way of increasing both autonomy and versatility of a KDD system by dynamically organizing KDD processes. In our approach, the KDD process is modeled as an organized society of KDD agents with multiple levels. We propose an ontology to describe KDD agents, in the style of the OOER (Object Oriented Entity Relationship) data model. Based on this ontology of KDD agents, we apply several AI planning techniques, which are implemented as a meta-agent, so that we might (1) solve the most difficult problem in a multiagent KDD system: how to automatically choose appropriate KDD techniques (KDD agents) to achieve a particular discovery goal in a particular application domain; (2) tackle the complexity of the KDD process; and (3) support evolution of KDD data, knowledge and process. The GLS system, as a multistrategy and multiagent KDD system based on this methodology, increases both autonomy and versatility.

41 citations


Journal ArticleDOI
TL;DR: This article justifies the implementation of dynamic adaptive autonomy through a series of experiments showing that a multiagent system operating under dynamic adaptive autonomy performs better than a multiagent system operating under fixed autonomy for the same changing run-time conditions.
Abstract: Autonomy is an often cited but rarely agreed upon agent characteristic. Although no definition of agent autonomy is universally accepted, the concept of adaptive autonomy promises increasingly flexible and robust agent-based systems. In general, adaptive autonomy gives agents the ability to seek help for problems or take initiative when otherwise they would be constrained by their design to follow some fixed procedures or rules for interacting with other agents. In order to access these benefits, this article provides a core definition and representation of agent autonomy designed to support the implementation of adaptive agent autonomy. This definition identifies "decision-making control" governing the determination of agent goals and tasks as the key dimension of agent autonomy. In order to gain run-time flexibility and any associated performance improvements, agents must be able to dynamically adapt their autonomy during system operation. This article justifies the implementation of dynamic adaptive autonomy through a series of experiments showing that a multiagent system operating under dynamic adaptive autonomy performs better than a multiagent system operating under fixed autonomy for the same changing run-time conditions.

Journal ArticleDOI
TL;DR: It is observed that the cursive segments of forged signatures are generally less smooth and less natural than genuine ones, especially for signatures that consist of cursive graphic patterns.
Abstract: In this paper, a method is proposed for offline signature verification. It is based on a smoothness criterion. It is observed that the cursive segments of forged signatures are generally less smooth and less natural than genuine ones, especially for signatures that consist of cursive graphic patterns. Two approaches are proposed to extract a smoothness feature: a crossing method and a fractal dimension method. When the proposed smoothness feature is combined with other global shape features for signature verification, satisfactory results are obtained.
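One of the two proposed smoothness cues is a fractal dimension. A standard box-counting estimate (not necessarily the paper's exact formulation) can be sketched as follows; the sample stroke is invented:

```python
import math

def box_count(points, box):
    """Number of box*box grid cells touched by the point set."""
    return len({(int(x // box), int(y // box)) for x, y in points})

def fractal_dimension(points, sizes=(1, 2, 4, 8)):
    # Slope of log(count) versus log(1/size), by least-squares fit.
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(points, s)) for s in sizes]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

# A straight stroke should have dimension near 1; a jittery
# forged stroke fills space more densely and scores higher.
line = [(t, t) for t in range(64)]
d = fractal_dimension(line)
```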

Journal ArticleDOI
TL;DR: This method first performs orthonormal shell decomposition on the line moment that is obtained from the 2-D pattern, then applies Fourier transform on each scale of the shell coefficients.
Abstract: Invariance and low dimension of features are of crucial significance in pattern recognition. This paper proposes a novel orthonormal shell Fourier descriptor that satisfies all of these demands. This method first performs orthonormal shell decomposition on the line moment that is obtained from the 2-D pattern, then applies Fourier transform on each scale of the shell coefficients. Unlike other existing wavelet-based methods, our method allows applying common orthonormal wavelets, such as Daubechies, Symmlet and Coiflet, therefore it is simple to implement. We study the structure of the filter used and develop a fast algorithm to rapidly compute the spectra of orthonormal shell coefficients. The complexity of the proposed descriptor is O(n log n). We apply a coarse-to-fine strategy to search the image database; the matching is very quick because of the multiscale feature structure. The effectiveness of this new descriptor is demonstrated by a series of experiments as well as the comparison with other descriptors. The proposed descriptor is robust to white noise.

Journal ArticleDOI
TL;DR: The analysis shows that under the conditions typical for digital image processing the curvature can rarely be estimated with a precision higher than 50%.
Abstract: The paper presents an analysis of sources of errors when estimating derivatives of numerical or noisy functions. A method of minimizing the errors is suggested. When being applied to the estimation of the curvature of digital curves, the analysis shows that under the conditions typical for digital image processing the curvature can rarely be estimated with a precision higher than 50%. Ways of overcoming the difficulties are discussed and a new method for estimating the curvature is suggested and investigated as to its precision. The method is based on specifying boundaries of regions in gray value images with subpixel precision. The method has an essentially higher precision than the known methods.
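A common baseline to which this kind of error analysis applies is the three-point (osculating-circle) curvature estimate; the sample points below are invented:

```python
import math

def curvature(p0, p1, p2):
    """Signed curvature (1/R) of the circle through three consecutive samples."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    a = math.dist(p0, p1)
    b = math.dist(p1, p2)
    c = math.dist(p0, p2)
    # Twice the signed triangle area.
    area2 = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if a * b * c == 0:
        return 0.0
    return 2.0 * area2 / (a * b * c)   # R = abc / (4 * area)

# Samples on a unit circle should give curvature close to 1.
pts = [(math.cos(t), math.sin(t)) for t in (0.0, 0.3, 0.6)]
k = curvature(*pts)
```

On exact samples the estimate is perfect; the paper's point is that once the samples are quantized to a pixel grid with noise, the error in such estimates becomes very large.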

Journal ArticleDOI
TL;DR: A text categorization system, capable of analyzing HTML/text documents collected from the Web, based on a hybrid case- based architecture, where two multilayer perceptrons are integrated into a case-based reasoner.
Abstract: This paper presents a text categorization system, capable of analyzing HTML/text documents collected from the Web. The system is a component of a more extensive intelligent agent for adaptive information filtering on the Web. It is based on a hybrid case-based architecture, where two multilayer perceptrons are integrated into a case-based reasoner. An empirical evaluation of the system was performed by means of a confidence interval technique. The experimental results obtained are encouraging and support the choice of a hybrid case-based approach to text categorization.

Journal ArticleDOI
TL;DR: Q-convexity is a kind of convexity in the discrete plane with practically the same properties as usual convexity: an intersection of two Q-convex sets is Q-convex, and the salient points can be defined like the extremal points.
Abstract: Q-convexity is a kind of convexity in the discrete plane. This notion has practically the same properties as usual convexity: an intersection of two Q-convex sets is Q-convex, and the salient points can be defined like the extremal points. Moreover, a Q-convex set is characterized by its salient points. The salient points can be generalized to any finite subset of ℤ2.

Journal ArticleDOI
TL;DR: Results point to the use of this type of model for the depiction of shape boundaries when it is necessary to have accurate boundary annotations as, for example, occurs in cartography.
Abstract: This paper is concerned with an application of Hidden Markov Models (HMMs) to the generation of shape boundaries from image features. In the proposed model, shape classes are defined by sequences of "shape states", each of which has a probability distribution of expected image feature types (feature "symbols"). The tracking procedure uses a generalization of the well-known Viterbi method, replacing its search by a type of "beam-search" that allows the procedure, at any time, to consider less likely features (symbols) as well as to search for an instantiable optimal state sequence. We have evaluated the model's performance on a variety of image shape types and have also developed a new performance measure defined by an expected Hamming distance between predicted and observed symbol sequences. Results point to the use of this type of model for the depiction of shape boundaries when it is necessary to have accurate boundary annotations as, for example, occurs in cartography.

Journal ArticleDOI
TL;DR: A reject rule applicable to a Multi-Expert System (MES) that allows the achievement of the best trade-off between reject and error rates as a function of the costs attributed to errors and rejects in the considered application is proposed.
Abstract: In this paper we propose a reject rule applicable to a Multi-Expert System (MES). The rule is adaptive to the given domain and allows the achievement of the best trade-off between reject and error rates as a function of the costs attributed to errors and rejects in the considered application. The results of the method are particularly effective since the method does not rely on particular statistical assumptions, as other reject rules. An experimental analysis carried out on publicly available databases is reported together with a comparison with other methods present in the literature.
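A minimal cost-based reject rule in the spirit of Chow's classic threshold (the paper's multi-expert rule is more elaborate and distribution-free) can be sketched as:

```python
def classify_with_reject(posteriors, reject_cost, error_cost):
    """Return the arg-max class, or None (reject) when the expected cost of
    an error exceeds the cost of rejecting:
    reject iff (1 - max posterior) > reject_cost / error_cost."""
    best = max(range(len(posteriors)), key=lambda i: posteriors[i])
    if 1.0 - posteriors[best] > reject_cost / error_cost:
        return None
    return best

# Invented posteriors and costs: a confident case is accepted,
# an ambiguous one is rejected.
accept = classify_with_reject([0.95, 0.05], reject_cost=0.1, error_cost=1.0)
reject = classify_with_reject([0.55, 0.45], reject_cost=0.1, error_cost=1.0)
```

Raising the reject cost relative to the error cost lowers the threshold and accepts more borderline samples, which is exactly the reject/error trade-off the paper tunes per application.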

Journal ArticleDOI
Bruce Maxwell
TL;DR: The survey shows that, in addition to classic survey courses in CV/IP, there are many focused and multidisciplinary courses being taught that reportedly improve both student and faculty interest in the topic.
Abstract: This paper provides a survey of the variety of computer vision [CV] and image processing [IP] courses being taught at institutions around the world. The survey shows that, in addition to classic survey courses in CV/IP, there are many focused and multidisciplinary courses being taught that reportedly improve both student and faculty interest in the topic. It also demonstrates that students can successfully undertake a variety of complex lab assignments. In addition, this paper includes a comparative review of current textbooks and supplemental texts appropriate for CV/IP courses.

Journal ArticleDOI
TL;DR: A presentation is constructed for the fundamental group of an arbitrary graph, together with a finite presentation, computable by an efficient algorithm, of the fundamental group of any subset of Z^3.
Abstract: As its analogue in the continuous framework, the digital fundamental group represents major information on the topology of discrete objects. However, the fundamental group is an abstract object and cannot directly be encoded in a computer using its definition. A classical mathematical way to encode a discrete group is to find a presentation of this group. In this paper, we construct a presentation for the fundamental group of an arbitrary graph, and a finite presentation (hence encodable in the memory of a computer) of the fundamental group of any subset of Z^3. This presentation can be computed by an efficient algorithm.

Journal ArticleDOI
TL;DR: The proposed technique, based on distinct region features and fuzzy logic principles, is designed to cope with the problems inherent in the segmentation task that the traditional merging cost functions cannot overcome.
Abstract: In this paper a novel Fuzzy Rule Based Dissimilarity Function is presented, to determine the hierarchical merging sequence in a region based segmentation scheme. The proposed technique, based on distinct region features and fuzzy logic principles, is designed to cope with the problems inherent in the segmentation task that the traditional merging cost functions cannot overcome. It combines the global (color) and local (spatial) information of the image to compare two adjacent regions in the RGB space. The validity of the approach has been subjectively and objectively verified for several types of color images such as head and shoulders, natural and texture images.

Journal ArticleDOI
TL;DR: An efficient general purpose search algorithm for alignment and an applied procedure for IC print mark quality inspection that is robust with respect to linear change of image intensity and thus can be applied to general industrial visual inspection.
Abstract: This paper presents an efficient general purpose search algorithm for alignment and an applied procedure for IC print mark quality inspection. The search algorithm is based on normalized cross-correlation and enhances it with a hierarchical resolution pyramid, dynamic programming, and pixel over-sampling to achieve subpixel accuracy on one or more targets. The general purpose search procedure is robust with respect to linear change of image intensity and thus can be applied to general industrial visual inspection. Accuracy, speed, reliability, and repeatability are all critical for the industrial use. After proper optimization, the proposed procedure was tested on the IC inspection platforms in the Mechanical Industry Research Laboratories (MIRL), Industrial Technology Research Institute (ITRI), Taiwan. The proposed method meets all these criteria and has worked well in field tests on various IC products.
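The normalized cross-correlation score underlying such a search (with the pyramid, dynamic programming and subpixel over-sampling omitted) can be sketched as follows; the image and template values are invented:

```python
import math

def ncc(image, template, top, left):
    """Normalized cross-correlation of the template against one window."""
    th, tw = len(template), len(template[0])
    win = [image[top + r][left + c] for r in range(th) for c in range(tw)]
    tmp = [template[r][c] for r in range(th) for c in range(tw)]
    mw, mt = sum(win) / len(win), sum(tmp) / len(tmp)
    num = sum((w - mw) * (t - mt) for w, t in zip(win, tmp))
    den = math.sqrt(sum((w - mw) ** 2 for w in win)
                    * sum((t - mt) ** 2 for t in tmp))
    return num / den if den else 0.0

def best_match(image, template):
    """Exhaustive search for the highest-scoring window position."""
    h, w = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    return max(((r, c) for r in range(h - th + 1) for c in range(w - tw + 1)),
               key=lambda rc: ncc(image, template, *rc))

img = [[0, 0, 0, 0],
       [0, 9, 1, 0],
       [0, 3, 7, 0],
       [0, 0, 0, 0]]
tpl = [[9, 1],
       [3, 7]]
pos = best_match(img, tpl)
```

Because both window and template are mean-subtracted and variance-normalized, the score is invariant to a linear change of image intensity, the robustness property the abstract cites.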

Journal ArticleDOI
TL;DR: A method of automatic 3D–2D projective registration between the 3D (i.e. polygonal face surface derived from CT or MRI data) and the 2D faces of the same individual in the photographs is described.
Abstract: In this paper, we describe a method of automatic 3D–2D projective registration between the 3D face (i.e. a polygonal face surface derived from CT or MRI data) and 2D faces of the same individual in photographs. Our task is to make a realistic 3D model face for post-surgical simulation by pasting color textures accurately on the face surface. We utilize edge features such as the external edge (facial outline) and internal edges like eye, nose and mouth edges from both the 3D face and the photographs for matching. We define a 3D edge as the set of 3D surface points that project to a 2D edge. We choose 3D edges within specific regions alone by automatically categorizing the 3D face into eye, nose, mouth and ear regions using a knowledge-based technique. Experimentally we have shown that for human face matching the selected region-based edge yields better matching accuracy than the usual edge. Moreover, we average the root mean square (RMS) measures of the selected facial regions rather than computing a single RMS measure to obtain matching uniformity over the entire region.

Journal ArticleDOI
Jinwen Ma
TL;DR: A new neural network approach to real-time pattern recognition on a given set of binary (or bipolar) sample patterns is presented, constructed to recognize these patterns with minimum error probability in a noisy environment.
Abstract: This paper presents a new neural network approach to real-time pattern recognition on a given set of binary (or bipolar) sample patterns. The perceptive neuron of a binary pattern is defined and constructed as a binary neuron with a neighborhood perceptive field. Letting its hidden units be the respective perceptive neurons of the patterns, a three-layer feedforward neural network is constructed to recognize these patterns with minimum error probability in a noisy environment. The theoretical and simulation analyses show that the network is effective for pattern recognition and can operate under strict real-time constraints.

Journal ArticleDOI
TL;DR: This paper pioneers the use of artificial life for image segmentation, a challenging area in image processing, and associates each pixel in an image with a life and it evolves according to a system of rules.
Abstract: Artificial life has been successfully used for understanding biological systems and in many applications in robotics, computer graphics, etc. In this paper, we pioneer the use of artificial life for image segmentation, a challenging area in image processing. Our method associates each pixel in an image with a life that evolves according to a system of rules. The segmented partitions emerge when the state of the lives reaches an equilibrium. The artificial life approach is promising in image processing because it is inherently parallel and coincides with the self-governing biological process. In addition, it has the advantage of integrating both detail preservation and noise removal. The experiments demonstrate the feasibility of the artificial life approach on both intensity images and color images. We also compared the approach with four other commonly used methods on three different kinds of noise-corrupted images.
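The evolve-until-equilibrium idea can be illustrated with a much simpler rule system than the paper's: each pixel-life repeatedly adopts the majority label of its 3x3 neighbourhood until nothing changes. The grid below is invented:

```python
def majority_evolve(grid, max_steps=10):
    """Evolve pixel labels by 3x3 majority vote until equilibrium."""
    h, w = len(grid), len(grid[0])
    for _ in range(max_steps):
        nxt = [row[:] for row in grid]
        for r in range(h):
            for c in range(w):
                votes = {}
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < h and 0 <= cc < w:
                            votes[grid[rr][cc]] = votes.get(grid[rr][cc], 0) + 1
                nxt[r][c] = max(votes, key=votes.get)
        if nxt == grid:          # equilibrium reached
            break
        grid = nxt
    return grid

noisy = [[0, 0, 0, 1, 1],
         [0, 1, 0, 1, 1],      # the isolated 1 is "noise"
         [0, 0, 0, 1, 1]]
clean = majority_evolve(noisy)
```

The isolated pixel is absorbed by its neighbourhood while the region boundary survives, a crude version of the simultaneous noise removal and detail preservation the abstract claims.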

Journal ArticleDOI
TL;DR: The general scope of multiagent system software engineering is reviewed with focus on the analysis and evaluation of certain aspects of the current specification standards provided by FIPA, as well as the deployment of an audio–video entertainment broadcasting (AVEB) system.
Abstract: We draw upon various practical experiences of designing and implementing complex systems through a multiagent approach which supports engineering of dynamic open distributed services. The general scope of multiagent system software engineering is reviewed with focus on the analysis and evaluation of certain aspects of the current specification standards provided by FIPA (Foundation for Intelligent Physical Agents). The benefits and drawbacks of a multiagent approach, using the FIPA standards as a benchmark, are evaluated and further illustrated through the deployment of an audio–video entertainment broadcasting (AVEB) system. The development and testing of the AVEB application was part of an EU project called FACTS (ACTS AC317). A main result of using the agent engineering paradigm for complex distributed development, especially apparent in the FIPA standards, has been the identification of the usefulness and power of its protocols. The protocols are important in developing multiagent systems (MAS) because they provide a degree of expressing cooperation within the MAS architecture. As they currently stand, the protocols are not sufficient to capture a complete explicit model of the cooperative requirements in multiagent systems. However, they provide a basis from which to start. We examine this feature of FIPA further in order to evaluate its role as a bridge between the mental agency and social agency requirements in the development of cooperation in multiagent systems.

Journal ArticleDOI
TL;DR: This paper proposes a family of exponential functions, including the Gaussian when the exponent equals 2, for image smoothing, and demonstrates that optimal results are obtained when the value of the exponent lies within a certain range.
Abstract: Noise reduction in images, also known as image smoothing, is an essential first step before further processing of an image. The key to image smoothing is to preserve important features while removing noise from the image. The Gaussian function is widely used in image smoothing. Recently it has been reported that exponential functions with exponent values other than 2 perform substantially better than Gaussian functions in modeling and preserving image features. In this paper we propose a family of exponential functions, which includes the Gaussian when the exponent equals 2, for image smoothing. We experiment with a variety of images, artificial and real, and demonstrate that optimal results are obtained when the value of the exponent lies within a certain range.
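The kernel family exp(-|x/s|^p) recovers the Gaussian at p = 2 and heavier-tailed kernels at p = 1. A minimal 1-D smoothing sketch, with kernel width, scale and signal all invented for illustration:

```python
import math

def exp_kernel(radius, scale, p):
    """Discrete kernel proportional to exp(-|x/scale|**p), normalized to sum 1."""
    w = [math.exp(-abs(x / scale) ** p) for x in range(-radius, radius + 1)]
    s = sum(w)
    return [v / s for v in w]

def smooth(signal, kernel):
    """Convolve with border values clamped (replicated) at the edges."""
    r = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, wgt in enumerate(kernel):
            j = min(max(i + k - r, 0), len(signal) - 1)
            acc += wgt * signal[j]
        out.append(acc)
    return out

noisy = [0, 0, 10, 0, 0]                        # a single noise spike
gauss = smooth(noisy, exp_kernel(2, 1.0, 2))    # p = 2: Gaussian
heavy = smooth(noisy, exp_kernel(2, 1.0, 1))    # p = 1: Laplacian-like
```

Varying p trades off how aggressively the spike is flattened against how much a genuine edge would be blurred, which is the trade-off the paper explores empirically.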

Journal ArticleDOI
TL;DR: A planar curve descriptor which is invariant to translation, size, rotation and starting point in tracing the boundary is developed based on the periodized wavelet transform for recognition of two-dimensional closed boundary curves and is extended for the recognition of occluded objects.
Abstract: A planar curve descriptor which is invariant to translation, size, rotation and starting point in tracing the boundary is developed based on the periodized wavelet transform. Coefficients obtained from the transform are divided into different bands, and feature vectors are extracted for the recognition of two-dimensional closed boundary curves. Weight vectors, which include the width of the different bands, are also derived to differentiate spurious results arising from noisy samples. The technique is further extended to the recognition of occluded objects by incorporating local features into the feature vector to form a feature map. Matching the likeness of a part of the feature map with that of the reference feature maps indicates which class the occluded object belongs to. Experimental results were obtained to show the effectiveness of the proposed technique.
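The paper's descriptor is wavelet-based; a simpler Fourier-magnitude analogue (not the authors' method) shows how translation, scale and starting-point invariance are typically obtained for a closed boundary:

```python
import cmath
import math

def descriptor(boundary, n_coeffs=3):
    """Boundary points as complex numbers; invariances arise step by step."""
    pts = [complex(x, y) for x, y in boundary]
    centroid = sum(pts) / len(pts)
    pts = [p - centroid for p in pts]             # translation invariance
    n = len(pts)
    spec = [abs(sum(p * cmath.exp(-2j * math.pi * k * t / n)
                    for t, p in enumerate(pts)))  # |DFT| removes start point
            for k in range(1, n_coeffs + 1)]      # (and rotation) phase
    norm = spec[0] if spec[0] else 1.0
    return [s / norm for s in spec]               # scale invariance

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
shifted = [(x + 5, y + 7) for x, y in square]
scaled = [(3 * x, 3 * y) for x, y in square]
rolled = square[2:] + square[:2]                  # different starting point
```

All four variants of the square produce the same descriptor, mirroring the invariance properties claimed for the wavelet construction above.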

Journal ArticleDOI
TL;DR: This paper considers a hidden Markov mesh random field (HMMRF) for character recognition and employs a look-ahead scheme based on a maximum marginal a posteriori probability criterion for a third-order HMMRF to accelerate the computation in both phases.
Abstract: In this paper we consider a hidden Markov mesh random field (HMMRF) for character recognition. The model consists of a "hidden" Markov mesh random field (MMRF) and an overlying probabilistic observation function of the MMRF. Just like the 1-D HMM, the hidden layer is characterized by the initial and the transition probability distributions, and the observation layer is defined by distribution functions for vector-quantized (VQ) observations. The HMMRF-based method consists of two phases: decoding and training. The decoding and training algorithms are developed using dynamic programming and maximum likelihood estimation methods. To accelerate the computation in both phases, we employed a look-ahead scheme based on a maximum marginal a posteriori probability criterion for a third-order HMMRF. Tested on a large-set handwritten Korean Hangul character database, the model showed a promising result: up to 87.2% recognition rate with an 8-state HMMRF and 128 VQ levels.

Journal ArticleDOI
TL;DR: In this paper, a digital index theorem for digital (n - 1)-manifolds in a digital space (Rn, f), where f belongs to a large family of lighting functions on the standard cubical decomposition Rn of the n-dimensional Euclidean space, was proved.
Abstract: This paper states and proves a Digital Index Theorem for digital (n - 1)-manifolds in a digital space (Rn, f), where f belongs to a large family of lighting functions on the standard cubical decomposition Rn of n-dimensional Euclidean space. As an immediate consequence we obtain the corresponding theorems for all (α, β)-surfaces of Kong–Roscoe, with α, β ∈ {6, 18, 26} and (α, β) ≠ (6, 6), (18, 26), (26, 26), as well as for the strong 26-surfaces of Bertrand–Malgouyres.