
Showing papers in "International Journal of Pattern Recognition and Artificial Intelligence in 2004"


Journal ArticleDOI
TL;DR: This paper tries to characterize the role that graphs play within the Pattern Recognition field, presenting two taxonomies: the first includes almost all the graph matching algorithms proposed from the late seventies and describes the different classes of algorithms, while the second covers the common applications of graph-based techniques.
Abstract: A recent paper posed the question: "Graph Matching: What are we really talking about?". Far from providing a definite answer to that question, in this paper we will try to characterize the role that graphs play within the Pattern Recognition field. To this aim two taxonomies are presented and discussed. The first includes almost all the graph matching algorithms proposed from the late seventies, and describes the different classes of algorithms. The second taxonomy considers the types of common applications of graph-based techniques in the Pattern Recognition and Machine Vision field.

1,517 citations


Journal ArticleDOI
TL;DR: This paper describes a novel approach for signature verification and identification in an offline environment, based on a quasi-multiresolution technique using GSC (Gradient, Structural and Concavity) features for feature extraction, and depicts a mapping from the handwriting domain to the signature domain.
Abstract: This paper describes a novel approach for signature verification and identification in an offline environment based on a quasi-multiresolution technique using GSC (Gradient, Structural and Concavity) features for feature extraction. These features, when used at the word level instead of the character level, yield promising results with accuracies as high as 78% and 93% for verification and identification, respectively. This method was successfully employed in our previous theory of individuality of handwriting developed at CEDAR, which is based on obtaining within-writer and between-writer statistical distance distributions. In this paper, treating signature verification and identification as offline handwriting verification and identification tasks respectively, we depict a mapping from the handwriting domain to the signature domain.

343 citations


Journal ArticleDOI
TL;DR: Experimental results reported on a syntax-constrained interpretation task show the effectiveness of the proposed approaches and are comparatively better than those achieved with other conventional, N-gram-based techniques that do not take advantage of full integration.
Abstract: The interpretation of handwritten sentences is carried out using a holistic approach in which both text image recognition and the interpretation itself are tightly integrated. Conventional approaches follow a serial, first-recognition then-interpretation scheme which cannot adequately use semantic–pragmatic knowledge to recover from recognition errors. Stochastic finite-state transducers are shown to be suitable models for this integration, permitting a full exploitation of the final interpretation constraints. Continuous-density hidden Markov models are embedded in the edges of the transducer to account for lexical and morphological constraints. Robustness with respect to stroke vertical variability is achieved by integrating tangent vectors into the emission densities of these models. Experimental results are reported on a syntax-constrained interpretation task, showing the effectiveness of the proposed approaches. These results are also shown to be comparatively better than those achieved with other conventional, N-gram-based techniques which do not take advantage of full integration.

132 citations


Journal ArticleDOI
TL;DR: The results show the graph-based approach can outperform traditional vector-based methods in terms of accuracy, dimensionality and execution time.
Abstract: In this paper we describe a classification method that allows the use of graph-based representations of data instead of traditional vector-based representations. We compare the vector approach combined with the k-Nearest Neighbor (k-NN) algorithm to the graph-matching approach when classifying three different web document collections, using the leave-one-out approach for measuring classification accuracy. We also compare the performance of different graph distance measures as well as various document representations that utilize graphs. The results show the graph-based approach can outperform traditional vector-based methods in terms of accuracy, dimensionality and execution time.
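As a rough illustration of the evaluation protocol described above, the following sketch performs leave-one-out k-NN classification over graphs with a pluggable distance function; the simple edge-overlap distance used here is only a stand-in for the paper's graph distance measures, and all names are illustrative.

from collections import Counter

def graph_distance(g1, g2):
    # Illustrative stand-in: graphs given as sets of (labelled) edges,
    # distance = 1 - |shared edges| / max(|g1|, |g2|).
    return 1.0 - len(g1 & g2) / max(len(g1), len(g2))

def knn_leave_one_out(graphs, labels, k=3):
    correct = 0
    for i, g in enumerate(graphs):
        # Distances to every other graph (graph i is left out of the "training" set).
        dists = sorted((graph_distance(g, h), labels[j])
                       for j, h in enumerate(graphs) if j != i)
        votes = Counter(lab for _, lab in dists[:k])
        if votes.most_common(1)[0][0] == labels[i]:
            correct += 1
    return correct / len(graphs)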

86 citations


Journal ArticleDOI
TL;DR: By stating the optimal selection of genes as a search task, an automatic and robust choice of the genes finally selected is performed, in contrast to previous works addressing the same types of problems.
Abstract: Despite the fact that cancer classification has considerably improved, a general method that classifies known types of cancer has not yet been developed. In this work, we propose the use of supervised classification techniques, coupled with feature subset selection algorithms, to automatically perform this classification in gene expression datasets. Due to the large number of features of gene expression datasets, the search for a highly accurate combination of features is done by means of the new Estimation of Distribution Algorithms paradigm. In order to assess the accuracy level of the proposed approach, the naive-Bayes classification algorithm is employed in a wrapper form. Promising results are achieved, in addition to a considerable reduction in the number of genes. By stating the optimal selection of genes as a search task, an automatic and robust choice of the genes finally selected is performed, in contrast to previous works addressing the same types of problems.
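A compact sketch of the wrapper idea follows: a univariate EDA (UMDA) searches over binary gene-inclusion masks, and cross-validated naive-Bayes accuracy is the fitness. The population size, elite fraction and number of generations are illustrative, and scikit-learn's GaussianNB stands in for the paper's naive-Bayes classifier.

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

def umda_gene_selection(X, y, pop=50, elite=25, gens=30, seed=0):
    rng = np.random.default_rng(seed)
    p = np.full(X.shape[1], 0.5)                    # inclusion probability per gene
    best_mask, best_fit = None, -np.inf
    for _ in range(gens):
        masks = rng.random((pop, X.shape[1])) < p   # sample candidate gene subsets
        masks[:, 0] |= ~masks.any(axis=1)           # guard against empty subsets
        fits = np.array([cross_val_score(GaussianNB(), X[:, m], y, cv=5).mean()
                         for m in masks])
        p = masks[np.argsort(fits)[-elite:]].mean(axis=0)   # UMDA update: elite marginals
        if fits.max() > best_fit:
            best_fit, best_mask = fits.max(), masks[np.argmax(fits)]
    return best_mask, best_fit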

84 citations


Journal ArticleDOI
TL;DR: A color image segmentation scheme that performs the segmentation in the combined intensity-texture-position feature space in order to produce regions that correspond to the real-life objects shown in the image, and an approach to large-format image segmentation, both focused on breaking down images to semantic objects for object-based multimedia applications.
Abstract: In this paper, a color image segmentation algorithm and an approach to large-format image segmentation are presented, both focused on breaking down images to semantic objects for object-based multimedia applications. The proposed color image segmentation algorithm performs the segmentation in the combined intensity-texture-position feature space in order to produce connected regions that correspond to the real-life objects shown in the image. A preprocessing stage of conditional image filtering and a modified K-Means-with-connectivity-constraint pixel classification algorithm are used to allow for seamless integration of the different pixel features. Unsupervised operation of the segmentation algorithm is enabled by means of an initial clustering procedure. The large-format image segmentation scheme employs the aforementioned segmentation algorithm, providing an elegant framework for the fast segmentation of relatively large images. In this framework, the segmentation algorithm is applied to reduced versions of the original images, in order to speed up the completion of the segmentation, resulting in a coarse-grained segmentation mask. The final fine-grained segmentation mask is produced with partial reclassification of the pixels of the original image to the already formed regions, using a Bayes classifier. As shown by experimental evaluation, this novel scheme provides fast segmentation with high perceptual segmentation quality.
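A minimal sketch of the combined feature space is shown below: each pixel is described by its intensity, a crude local texture measure and its normalized position, and plain K-means is applied. The paper's conditional filtering, connectivity constraint and initial clustering procedure are omitted, and the feature scaling is illustrative.

import numpy as np
from scipy.ndimage import generic_filter
from sklearn.cluster import KMeans

def segment_intensity_texture_position(gray, k=6):
    # gray: 2D grayscale image array.
    h, w = gray.shape
    texture = generic_filter(gray.astype(float), np.std, size=5)   # crude local texture
    ys, xs = np.mgrid[0:h, 0:w]
    feats = np.stack([gray.ravel(), texture.ravel(),
                      xs.ravel() / w, ys.ravel() / h], axis=1).astype(float)
    feats /= feats.std(axis=0) + 1e-9            # put all features on a comparable scale
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(feats)
    return labels.reshape(h, w)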

78 citations


Journal ArticleDOI
TL;DR: A probabilistic formulation of SORGs is presented that includes as particular cases the two previously proposed approaches based on random graphs, namely the First-Order Random Graphs (FORGs) and the Function-Described Graphs (FDGs).
Abstract: The aim of this article is to present a random graph representation, that is based on second-order relations between graph elements, for modeling sets of attributed graphs (AGs). We refer to these models as Second-Order Random Graphs (SORGs). The basic feature of SORGs is that they include both marginal probability functions of graph elements and second-order joint probability functions. This allows a more precise description of both the structural and semantic information contents in a set of AGs and, consequently, an expected improvement in graph matching and object recognition. The article presents a probabilistic formulation of SORGs that includes as particular cases the two previously proposed approaches based on random graphs, namely the First-Order Random Graphs (FORGs) and the Function-Described Graphs (FDGs). We then propose a distance measure derived from the probability of instantiating a SORG into an AG and an incremental procedure to synthesize SORGs from sequences of AGs. Finally, SORGs are shown to improve the performance of FORGs, FDGs and direct AG-to-AG matching in three experimental recognition tasks: one in which AGs are randomly generated and the other two in which AGs represent multiple views of 3D objects (either synthetic or real) that have been extracted from color images. In the last case, object learning is achieved through the synthesis of SORG models.

55 citations


Journal ArticleDOI
TL;DR: This paper shows how the eigenstructure of the adjacency matrix can be used for the purposes of robust graph matching, by finding the sequence of string edit operations which minimize edit distance.
Abstract: This paper shows how the eigenstructure of the adjacency matrix can be used for the purposes of robust graph matching. We commence from the observation that the leading eigenvector of a transition probability matrix is the steady state of the associated Markov chain. When the transition matrix is the normalized adjacency matrix of a graph, then the leading eigenvector gives the sequence of nodes of the steady state random walk on the graph. We use this property to convert the nodes in a graph into a string where the node-order is given by the sequence of nodes visited in the random walk. We match graphs represented in this way, by finding the sequence of string edit operations which minimize edit distance.
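The core pipeline can be sketched as follows, assuming undirected graphs given as adjacency matrices with per-node labels; the row normalization and the standard Levenshtein routine are illustrative choices, not the authors' exact implementation.

import numpy as np

def steady_state_order(A):
    # Normalize the adjacency matrix to a transition matrix and take the
    # leading left eigenvector, i.e. the steady state of the random walk.
    P = A / A.sum(axis=1, keepdims=True)
    vals, vecs = np.linalg.eig(P.T)
    pi = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
    return np.argsort(-pi)                       # nodes ordered by steady-state mass

def edit_distance(s, t):
    # Standard Levenshtein distance between two label sequences.
    d = np.zeros((len(s) + 1, len(t) + 1), dtype=int)
    d[:, 0], d[0, :] = np.arange(len(s) + 1), np.arange(len(t) + 1)
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1, d[i, j - 1] + 1, d[i - 1, j - 1] + cost)
    return int(d[len(s), len(t)])

def graph_string_distance(A1, labels1, A2, labels2):
    s1 = [labels1[i] for i in steady_state_order(A1)]
    s2 = [labels2[i] for i in steady_state_order(A2)]
    return edit_distance(s1, s2)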

52 citations


Journal ArticleDOI
TL;DR: Segmentation of ovarian ultrasound images using cellular neural networks (CNNs) is studied in this paper and the recognition rate of follicles was around 60% and misidentification rate was around 30%.
Abstract: Segmentation of ovarian ultrasound images using cellular neural networks (CNNs) is studied in this paper. The segmentation method consists of five successive steps, where the first four use CNNs. In the first step, only the rough position of follicles is determined. In the second step, the results are improved by expansion of the detected follicles. In the third step, previously undetected, inexpressive follicles are determined, while the fourth step detects the position of the ovary. All results are joined in the fifth step. The templates for the CNNs were obtained by applying a genetic algorithm. The segmentation method has been tested on 50 ovarian ultrasound images. The recognition rate of follicles was around 60% and the misidentification rate was around 30%.

52 citations


Journal ArticleDOI
TL;DR: Statistical experiments show that the RBPNN optimized by the designed GA still has better generalization performance than the ones obtained by the ROLSA and the MKA, in spite of the network scale having been greatly reduced.
Abstract: This paper discusses using genetic algorithms (GA) to optimize the structure of radial basis probabilistic neural networks (RBPNN), including how to select hidden centers of the first hidden layer and how to determine the controlling parameter of the Gaussian kernel functions. In the process of constructing the genetic algorithm, a novel encoding method is proposed for optimizing the RBPNN structure. This encoding method can not only make the selected hidden centers sufficiently reflect the key distribution characteristics of the training sample space and reduce the number of hidden centers as much as possible, but also simultaneously determine the optimum controlling parameters of the Gaussian kernel functions matching the selected hidden centers. Additionally, we propose a new fitness function that keeps the designed RBPNN structure as simple as possible without losing network performance. Finally, we use two benchmark problems, two-spiral discrimination and iris data classification, to test and evaluate the designed GA. The experimental results illustrate that our designed GA can significantly reduce the required number of hidden centers, compared with the recursive orthogonal least square algorithm (ROLSA) and the modified K-means algorithm (MKA). In particular, statistical experiments show that the RBPNN optimized by our GA still has better generalization performance than the ones obtained by the ROLSA and the MKA, in spite of the network scale having been greatly reduced. Our experimental results also demonstrate that the designed GA is suitable for optimizing radial basis function neural networks (RBFNN).

41 citations


Journal ArticleDOI
TL;DR: It is shown here how low-level features can be related to semantic photo categories, such as indoor, outdoor and close-up, using decision forests consisting of trees constructed according to CART methodology.
Abstract: Annotating photographs with broad semantic labels can be useful in both image processing and content-based image retrieval. We show here how low-level features can be related to semantic photo categories, such as indoor, outdoor and close-up, using decision forests consisting of trees constructed according to CART methodology. We also show how the results can be improved by introducing a rejection option in the classification process. Experimental results on a test set of 4,500 photographs are reported and discussed.
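The rejection option amounts to withholding a decision when the forest's vote is not confident enough. A minimal sketch, using scikit-learn's random forest of CART-style trees as a stand-in for the decision forests in the paper and an illustrative confidence threshold:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def classify_with_rejection(X_train, y_train, X_test, reject_below=0.6):
    forest = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
    proba = forest.predict_proba(X_test)
    preds = forest.classes_[proba.argmax(axis=1)]
    rejected = proba.max(axis=1) < reject_below    # too little agreement among the trees
    return preds, rejected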

Journal ArticleDOI
TL;DR: This paper will use two EDAs to obtain not the best structure, but the optimal ordering of variables for the K2 algorithm: UMDA and MIMIC, both of them in discrete and continuous domains and check whether the individual representation and its relation to the corresponding ordering play important roles.
Abstract: The search for the optimal ordering of a set of variables in order to solve a computational problem is a difficulty that can appear in several circumstances. One of these situations is the automatic learning of a network structure, for example, a Bayesian Network (BN) structure, starting from a dataset. Searching in the space of structures is often unmanageable, especially if the number of variables is high. Popular heuristic approaches, like Cooper and Herskovits's K2 algorithm, depend on a given ordering of variables. Estimation of Distribution Algorithms (EDAs) are a new paradigm for Evolutionary Computation that have been used as a search engine in the BN structure learning problem. In this paper, we will use two different EDAs to obtain not the best structure, but the optimal ordering of variables for the K2 algorithm: UMDA and MIMIC, both of them in discrete and continuous domains. We will also check whether the individual representation and its relation to the corresponding ordering play important roles, and whether MIMIC outperforms the results of UMDA.

Journal ArticleDOI
TL;DR: This paper explores how graph eigenspaces have been used to encode many different properties of graphs and how such methods can be used for solving inexact graph matching.
Abstract: Graph eigenspaces have been used to encode many different properties of graphs. In this paper we explore how such methods can be used for solving inexact graph matching (the matching of sets of vertices in one graph to those in another) where the graphs have the same or different numbers of vertices. In this case we explore eigen-subspace projection and vertex clustering (EPC) methods. The correspondence algorithm enables the EPC method to discover a range of correspondence relationships, from one-to-one vertex matching to inexact (many-to-many) matching of structurally similar subgraphs, based on the similarities of their vertex connectivities defined by their positions in the common subspace. Examples in shape recognition and random graphs are used to illustrate this method.

Journal ArticleDOI
TL;DR: It is shown that combining the SVDD descriptions improves the retrieval performance with respect to ranking, contrary to the Mahalanobis case.
Abstract: A flexible description of images is offered by a cloud of points in a feature space. In the context of image retrieval such clouds can be represented in a number of ways. Two approaches are considered here. The first approach is based on the assumption of a normal distribution, hence homogeneous clouds, while the second one focuses on the boundary description, which is more suitable for multimodal clouds. The images are then compared either by using the Mahalanobis distance or by the support vector data description (SVDD), respectively. The paper investigates some possibilities of combining the image clouds based on the idea that responses of several cloud descriptions may convey a pattern specific to semantically similar images. A ranking of image dissimilarities is used as a comparison for two image databases targeting image classification and retrieval problems. We show that combining the SVDD descriptions improves the retrieval performance with respect to ranking, contrary to the Mahalanobis case. Surprisingly, it turns out that the ranking of the Mahalanobis distances also works well for inhomogeneous images.
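For the normal-distribution route, a cloud of feature points can be summarized by its mean and covariance and two images compared with a Mahalanobis-type distance. The symmetrized, regularized form below is one reasonable variant, not necessarily the exact one used in the paper.

import numpy as np

def mahalanobis_cloud_distance(cloud_a, cloud_b, reg=1e-6):
    # cloud_a, cloud_b: (n_points, n_features) arrays of image feature points.
    mu_a, mu_b = cloud_a.mean(axis=0), cloud_b.mean(axis=0)
    cov = 0.5 * (np.cov(cloud_a, rowvar=False) + np.cov(cloud_b, rowvar=False))
    cov += reg * np.eye(cov.shape[0])            # regularize for a stable inversion
    diff = mu_a - mu_b
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))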

Journal ArticleDOI
TL;DR: Two distance measures for attributed graphs are presented that are based on the maximal similarity common subgraph of two graphs that can deal not only with structural distortions, but also with perturbations of attributes.
Abstract: Two distance measures for attributed graphs are presented that are based on the maximal similarity common subgraph of two graphs. They are generalizations of two existing distance measures based on the maximal common subgraph. The new measures are superior to the well-known measures based on elementary edit transformations in that no particular edit operations (together with their costs) need to be defined. Moreover, they can deal not only with structural distortions, but also with perturbations of attributes. It is shown that the new distance measures are metrics.
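For reference, the best known of the maximal-common-subgraph distances that these measures generalize (usually attributed to Bunke and Shearer) is written as d(g1, g2) = 1 - |mcs(g1, g2)| / max(|g1|, |g2|), where |g| denotes the size of graph g; the new measures replace the maximal common subgraph with the maximal similarity common subgraph, so that perturbations of attributes are also taken into account.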

Journal ArticleDOI
TL;DR: The genetic algorithm (GA) is used to evolve the configuration and the training parameter set of the neural network to solve the online CCP recognition problem, and numerical results indicate that the proposed GA can evolve the neural network architecture while simultaneously determining training parameters to efficiently maximize the performance of the online CCP recognizers.
Abstract: Pattern recognition is an important issue in statistical process control (SPC) because unnatural patterns exhibited by control charts can be associated with specific assignable causes adversely affecting the process. Artificial neural networks have been widely investigated as an effective approach to control chart pattern (CCP) recognition in recent years. However, an overwhelming majority of these applications has used trial-and-error experiments to determine the network architecture and training parameters, which are crucial to the performance of the network. In this paper, the genetic algorithm (GA) is used to evolve the configuration and the training parameter set of the neural network to solve the online CCP recognition problem. Numerical results are provided that indicate that the proposed GA can evolve the neural network architecture while simultaneously determining training parameters to efficiently maximize the performance of the online CCP recognizers. Because the population size is a major parameter affecting GA processing speed, an investigation was also conducted to identify the effects of the population size on the performance of the proposed GA. This research further confirms the feasibility of using GA to evolve neural networks. Although a back-propagation-based CCP recognizer is the particular application presented here, the proposed GA methodology can be applied to neural networks in general.

Journal ArticleDOI
TL;DR: The performance of Bagging, Boosting and Error-Correcting Output Code (ECOC) is compared for five decision tree pruning methods and the influence of pruning on the performance of the ensembles is studied.
Abstract: Design of ensemble classifiers involves three factors: 1) a learning algorithm to produce a classifier (base classifier), 2) an ensemble method to generate diverse classifiers, and 3) a combining method to combine decisions made by base classifiers. With regard to the first factor, a good choice for constructing a classifier is a decision tree learning algorithm. However, a possible problem with this learning algorithm is its complexity, which has only been addressed previously in the context of pruning methods for individual trees. Furthermore, the ensemble method may require the learning algorithm to produce a complex classifier. Considering the fact that the performance of simplification methods as well as ensemble methods changes from one domain to another, our main contribution is to address a simplification method (post-pruning) in the context of ensemble methods including Bagging, Boosting and Error-Correcting Output Code (ECOC). Using a statistical test, the performance of ensembles made by Bagging, Boosting and ECOC, as well as of five pruning methods in the context of ensembles, is compared. In addition to the implementation, a supporting theory called Margin is discussed, and the relationship of pruning to bias and variance is explained. For ECOC, the effect of parameters such as code length and size of the training set on the performance of pruning methods is also studied. Decomposition methods such as ECOC are considered as a solution to reduce the complexity of multi-class problems in many real problems such as face recognition. Focusing on the decomposition methods, AdaBoost.OC, which is a combination of Boosting and ECOC, is compared with the pseudo-loss based version of Boosting, AdaBoost.M2. In addition, the influence of pruning on the performance of ensembles is studied. Motivated by the result that both pruned and unpruned ensembles made by AdaBoost.OC have similar accuracy, pruned ensembles are compared with ensembles of single-node decision trees. This results in the hypothesis that ensembles of simple classifiers may give better performance, as shown for AdaBoost.OC on the identification problem in face recognition. The implication is that in some problems, to achieve the best accuracy of an ensemble, it is necessary to select the base classifier complexity.
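For readers unfamiliar with ECOC, the following minimal sketch shows the idea the ensemble methods above build on: each class gets a binary codeword, one binary learner is trained per codeword bit, and prediction decodes by Hamming distance. The random code matrix and the scikit-learn tree learner are illustrative choices only.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def ecoc_train(X, y, code_len=15, seed=0):
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    codes = rng.integers(0, 2, size=(len(classes), code_len))   # one codeword per class
    learners = []
    for bit in range(code_len):
        relabel = codes[np.searchsorted(classes, y), bit]        # each sample gets its class's bit
        learners.append(DecisionTreeClassifier(max_depth=5).fit(X, relabel))
    return classes, codes, learners

def ecoc_predict(X, classes, codes, learners):
    bits = np.column_stack([clf.predict(X) for clf in learners])
    # Decode: pick the class whose codeword is closest in Hamming distance.
    dists = np.abs(bits[:, None, :] - codes[None, :, :]).sum(axis=2)
    return classes[dists.argmin(axis=1)]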

Journal ArticleDOI
TL;DR: This paper reviews the particularities of graph structures representing technical drawings and classify them in two categories, depending on whether the structure that they represent consists of prototype patterns or repetitive patterns, and proposes a combined strategy for recognition.
Abstract: Symbol recognition is a well-known challenge in the field of graphics recognition. Due to the representational power of graph structures, a number of graph-based approaches are used to answer whether a known symbol appears in a document and under which degree of confidence. In this paper, we review the particularities of graph structures representing technical drawings and we classify them in two categories, depending on whether the structure that they represent consists of prototype patterns or repetitive patterns. The recognition is then formulated in terms of graph matching or graph parsing, respectively. Since some symbols consist of two types of structures, the main contribution of this work is to propose a combined strategy. In addition, the combination of graph matching and graph parsing processes is based on a common graph structure that also involves a graph indexing mechanism. Graph nodes are classified in equivalence classes depending on their local configuration. Graph matching indexes in such equivalence classes using the information of model graph nodes as local descriptors, and then global consistency is checked using the graph edge attributes. On the other hand, representatives of equivalence classes are used as tokens of a graph grammar that guides a parsing process to recognize repetitive structures.

Journal ArticleDOI
TL;DR: A feature reduction scheme that adaptively adjusts to the amount of labeled data available and can be used in conjunction with ECOC and the BHC, as well as other approaches such as round-robin classification that decompose a multiclass problem into a number of two (meta)-class problems.
Abstract: Classification of land cover based on hyperspectral data is very challenging because typically tens of classes with uneven priors are involved, the inputs are high dimensional, and there is often scarcity of labeled data. Several researchers have observed that it is often preferable to decompose a multiclass problem into multiple two-class problems, solve each such subproblem using a suitable binary classifier, and then combine the outputs of this collection of classifiers in a suitable manner to obtain the answer to the original multiclass problem. This approach is taken by the popular error correcting output codes (ECOC) technique, as well by the binary hierarchical classifier (BHC). Classical techniques for dealing with small sample sizes include regularization of covariance matrices and feature reduction. In this paper we address the twin problems of small sample sizes and multiclass settings by proposing a feature reduction scheme that adaptively adjusts to the amount of labeled data available. This scheme can be used in conjunction with ECOC and the BHC, as well as other approaches such as round-robin classification that decompose a multiclass problem into a number of two (meta)-class problems. In particular, we develop the best-basis binary hierarchical classifier (BB-BHC) and best basis ECOC (BB-ECOC) families of models that are adapted to "small sample size" situations. Currently, there are few studies that compare the efficacy of different approaches to multiclass problems in general settings as well as in the specific context of small sample sizes. Our experiments on two sets of remote sensing data show that both BB-BHC and BB-ECOC methods are superior to their nonadaptive versions when faced with limited data, with the BB-BHC showing a slight edge in terms of classification accuracy as well as interpretability.

Journal ArticleDOI
TL;DR: Experimental results show that the present method can effectively search the user-specified Chinese words from the document images with the format of either horizontal or vertical text lines, or both appearing on the same image.
Abstract: An approach to searching for user-specified words in imaged Chinese documents, without the requirements of layout analysis and OCR processing of the entire documents, is proposed in this paper. A small number of Chinese characters that cannot be successfully bounded using connected component analysis due to larger gaps between elements within the characters are blacklisted. A suitable character that is not included in the blacklist is chosen from the user-specified word as the initial character to search for a matching candidate in the document. Once a matched candidate is found, the adjacent characters in the horizontal and vertical directions are examined for matching with other corresponding characters in the user-specified word, subject to the constraints of alignment (either horizontal or vertical direction) and size similarity. A weighted Hausdorff distance is proposed for the character matching. Experimental results show that the present method can effectively search the user-specified Chinese words from the document images with the format of either horizontal or vertical text lines, or both appearing on the same image.
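The weighted Hausdorff distance used for character matching can be sketched as follows; the paper's exact weighting scheme is not reproduced, so the per-point weights and the averaged, symmetrized form here are only illustrative.

import numpy as np

def directed_weighted_hausdorff(A, B, w):
    # A, B: (n, 2) arrays of black-pixel coordinates; w: weights for the points of A.
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)   # pairwise point distances
    return float((w * d.min(axis=1)).sum() / w.sum())

def weighted_hausdorff(A, B, wA, wB):
    # Symmetrize by taking the larger of the two directed distances.
    return max(directed_weighted_hausdorff(A, B, wA),
               directed_weighted_hausdorff(B, A, wB))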

Journal ArticleDOI
TL;DR: A model for the segmentation of cursive handwriting into strokes, derived in analogy with models proposed in the literature for early processing tasks in the primate visual system, is proposed; results suggest that the proposed measure of saliency has a direct relation with the dynamics of the handwriting.
Abstract: We propose a model for the segmentation of cursive handwriting into strokes that has been derived in analogy with those proposed in the literature for early processing tasks in the primate visual system. The model allows the problem of selecting, on the ink, the points corresponding to perceptually relevant changes of curvature to be reformulated as a preattentive, purely bottom-up visual task, where the conspicuity of curvature changes is measured in terms of their saliency. Modeling the segmentation as a saliency-driven visual task has led to a segmentation algorithm whose architecture is biologically plausible and that does not rely on any parameter other than those that can be directly obtained from the ink. Experimental results show that the performance is very stable and predictable, thus preventing the erratic behaviors of segmentation methods often reported in the literature. They also suggest that the proposed measure of saliency has a direct relation with the dynamics of the handwriting, so that it could be used to capture in a quantitative way some aspects of cursive handwriting intuitively related to the notion of style.

Journal ArticleDOI
TL;DR: The method implements a bias-variance control strategy in order to avoid overfitting in classification tasks on noisy data, based on a notion of easy and hard training patterns as emerging from analysis of the dynamical evolutions of AdaBoost weights.
Abstract: In this paper, we propose a regularization technique for AdaBoost. The method implements a bias-variance control strategy in order to avoid overfitting in classification tasks on noisy data. The method is based on a notion of easy and hard training patterns as emerging from analysis of the dynamical evolutions of AdaBoost weights. The procedure consists in sorting the training data points by a hardness measure, and in progressively eliminating the hardest, stopping at an automatically selected threshold. Effectiveness of the method is tested and discussed on synthetic as well as real data.
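A sketch of the underlying mechanism, under the assumption that a sample's "hardness" can be scored by the weight AdaBoost keeps assigning to it: a small AdaBoost loop with decision stumps records per-sample weights, and the hardest fraction of the training set is then removed. The paper selects the stopping threshold automatically; a fixed fraction is used here for illustration.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def hardness_scores(X, y, rounds=50):
    # X, y: numpy arrays, y in {-1, +1}. Track the weight AdaBoost places on each sample.
    n = len(y)
    w = np.full(n, 1.0 / n)
    total = np.zeros(n)
    for _ in range(rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        total += w
    return total / rounds              # persistently heavy samples are the "hard" ones

def prune_hardest(X, y, drop_fraction=0.05):
    scores = hardness_scores(X, y)
    keep = scores <= np.quantile(scores, 1 - drop_fraction)
    return X[keep], y[keep]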

Journal ArticleDOI
TL;DR: It is shown that significant improvement of the recognition performance is possible even when the original training set is large and the textlines are provided by a large number of different writers.
Abstract: A perturbation model for the generation of synthetic textlines from existing cursively handwritten lines of text produced by human writers is presented. The goal of synthetic textline generation is to improve the performance of an offline cursive handwriting recognition system by providing it with additional training data. It can be expected that by adding synthetic training data the variability of the training set improves, which leads to a higher recognition rate. On the other hand, synthetic training data may bias a recognizer towards unnatural handwriting styles, which could lead to a deterioration of the recognition rate. In this paper the proposed perturbation model is evaluated under several experimental conditions, and it is shown that significant improvement of the recognition performance is possible even when the original training set is large and the textlines are provided by a large number of different writers.
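As a rough illustration of the kind of geometric perturbation involved (the paper's actual model combines several specific distortion operations), the sketch below applies a random shear and anisotropic scaling to a grayscale textline image; all parameter ranges are illustrative and a white background is assumed.

import numpy as np
from scipy.ndimage import affine_transform

def perturb_textline(img, rng, max_shear=0.3, max_scale=0.1):
    # img: 2D grayscale textline image; rng: np.random.Generator.
    shear = rng.uniform(-max_shear, max_shear)
    sx = 1.0 + rng.uniform(-max_scale, max_scale)
    sy = 1.0 + rng.uniform(-max_scale, max_scale)
    # affine_transform maps output (row, col) coordinates back to input coordinates.
    M = np.array([[1.0 / sy, 0.0],
                  [shear,    1.0 / sx]])
    centre = np.array(img.shape) / 2.0
    offset = centre - M @ centre                 # keep the perturbation centred
    return affine_transform(img, M, offset=offset, order=1, mode='constant', cval=255)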

Journal ArticleDOI
TL;DR: Results show the benefit of adding spatio-temporal contextual information to the classification scheme, and suggest that the proposed approach represents an interesting alternative to the MRF-based approach, in particular, in terms of simplicity.
Abstract: A fuzzy-logic approach to the classification of multitemporal, multisensor remote-sensing images is proposed. The approach is based on a fuzzy fusion of three basic sources of information: spectral, spatial and temporal contextual information sources. It aims at improving the accuracy over that of single-time noncontextual classification. Single-time class posterior probabilities, which are used to represent spectral information, are estimated by Multilayer Perceptron neural networks trained for each single-time image, thus making the approach applicable to multisensor data. Both the spatial and temporal kinds of contextual information are derived from the single-time classification maps obtained by the neural networks. The expert's knowledge of possible transitions between classes at two different times is exploited to extract temporal contextual information. The three kinds of information are then fuzzified in order to apply a fuzzy reasoning rule for their fusion. Fuzzy reasoning is based on the "MAX" fuzzy operator and on information about class prior probabilities. Finally, the class with the largest fuzzy output value is selected for each pixel in order to provide the final classification map. Experimental results on a multitemporal data set consisting of two multisensor (Landsat TM and ERS-1 SAR) images are reported. The accuracy of the proposed fuzzy spatio-temporal contextual classifier is compared with those obtained by the Multilayer Perceptron neural networks and a reference classification approach based on Markov Random Fields (MRFs). Results show the benefit of adding spatio-temporal contextual information to the classification scheme, and suggest that the proposed approach represents an interesting alternative to the MRF-based approach, in particular, in terms of simplicity.

Journal ArticleDOI
TL;DR: This work has developed a hybrid coarse-to-fine algorithm for stereo feature matching, which is based on the 2D six-parameter affine transformation and local similarity evaluation, and results proving the performance of this algorithm are presented.
Abstract: Stereo vision-based bin picking systems require accurate 3D information to be recovered from 2D stereo images. To achieve this goal, we have developed a hybrid coarse-to-fine algorithm for stereo feature matching, which is based on the 2D six-parameter affine transformation and local similarity evaluation. With this algorithm, the coarse matching is performed by the 2D six-parameter affine transformation to get rough feature matches, imposing a strong constraint on the subsequent search instead of the traditional epipolar constraint. To obtain precise matches, the perspective effect is dealt with during fine stereo feature matching by performing local similarity evaluation on the attribute vectors of features. Experimental results demonstrating the performance of the stereo feature matching algorithm are also presented.
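The six parameters of the 2D affine transformation can be estimated from a handful of seed correspondences by ordinary least squares; the textbook fit below is a sketch of that step, not necessarily the authors' exact estimation procedure.

import numpy as np

def fit_affine_2d(src, dst):
    # src, dst: (n, 2) arrays of corresponding points, n >= 3.
    # Solve dst ~= A @ src + t for the six parameters (a11, a12, a21, a22, tx, ty).
    n = src.shape[0]
    G = np.zeros((2 * n, 6))
    G[0::2, 0:2] = src          # x' equations
    G[0::2, 4] = 1.0
    G[1::2, 2:4] = src          # y' equations
    G[1::2, 5] = 1.0
    params, *_ = np.linalg.lstsq(G, dst.reshape(-1), rcond=None)
    return params[:4].reshape(2, 2), params[4:]   # (A, t)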

Journal ArticleDOI
TL;DR: A probabilistic model employing stroke features, stroke crossings and stroke densities is applied and is compared with a neural-network, showing that the method is generally better but less effective in distinguishing figures from other components.
Abstract: This paper describes a method for separating online handwritten patterns into Japanese text, figures and mathematical formulas. Today, Tablet PCs and electronic whiteboards provide much larger writing area for pen interfaces unlike PDAs (Personal Digital Assistants), through which users can easily input text, write mathematical formulas and draw figures on the screen. The fact that these objects can be written by a single pen (marker) without switching the device, mode, software or whatever else, and without any writing restrictions such as grids or boxes is one of the most important benefits of the pen interfaces. However, the task of segmenting these objects is challenging. To address this issue, we have applied a probabilistic model employing stroke features, stroke crossings and stroke densities. Further, we partially apply the approach of segmentation by recognition. Although the current recognizer for formulas is not a true recognizer, we have achieved about 81% correct segmentation for all the strokes when applied to our newly prepared database of mixed patterns. This method has been compared with a neural-network. The results show that our method is generally better but less effective in distinguishing figures from other components.

Journal ArticleDOI
TL;DR: A new kind of automated model selection algorithm for Gaussian mixture modeling via an entropy penalized maximum-likelihood estimation that can make model selection automatically during the parameter estimation, with the mixing proportions of the extra Gaussians attenuating to zero.
Abstract: Gaussian mixture modeling is a powerful approach for data analysis, and the determination of the number of Gaussians, or clusters, is actually the problem of Gaussian mixture model selection, which has been investigated in several respects. This paper proposes a new kind of automated model selection algorithm for Gaussian mixture modeling via an entropy-penalized maximum-likelihood estimation. Experiments demonstrate that the proposed algorithm can perform model selection automatically during the parameter estimation, with the mixing proportions of the extra Gaussians attenuating to zero. Compared with the BYY automated model selection algorithms, it converges more stably and accurately as the number of samples becomes large.

Journal ArticleDOI
TL;DR: Results from experimentation show how a combination of sequence and performance features is able to generalize across a wide variety of input samples and obtain a diagnostic classification which can be used alongside other forms of conventional assessment.
Abstract: The reported work aims to objectively and accurately assess the post-stroke clinical condition of visuo-spatial neglect using a series of standardized geometric shape drawing tasks. We present a method implementing existing pencil-and-paper diagnostic methods and define a set of static and dynamic features that can be extracted from drawing responses captured online using a graphics tablet. We also present a method for automatically assessing the constructional sequence of the drawing using Hidden Markov Models. The method enables the automated extraction, position identification and drawing order of individual sides of a shape within a drawing. Discrimination between two populations (a neglect population and stroke subjects without neglect as determined by existing standard assessment methods) using a combination of performance features and constructional sequence is examined across three separate drawing tasks. Results from experimentation show how a combination of sequence and performance features is able to generalize across a wide variety of input samples and obtain a diagnostic classification which can be used alongside other forms of conventional assessment. Furthermore, the application of a multi-classifier combination strategy leads to a significant increase in recognition ability.

Journal ArticleDOI
TL;DR: This paper adopts a technique of Gaussian shadow modeling to remove all unwanted shadows and shows that the proposed method is much more robust and powerful than other traditional methods.
Abstract: This paper presents a novel approach to tracking multiple moving objects using the level-set method. The proposed method can track different objects whether they are rigid or nonrigid, merged or split, and with or without shadows. At the first stage, the paper proposes an edge-based camera compensation technique for dealing with the problem of object tracking when the background is not static. After camera compensation, different moving pixels can be easily extracted through a subtraction technique. A speed function with three ingredients, i.e. pixel motions, object variances and background variances, can then be defined for guiding the process of object boundary detection. According to the defined speed function, different object boundaries can be efficiently detected and tracked by a curve evolution technique, i.e. the level-set-based method. Once the desired objects have been extracted, in order to further understand the video content, this paper takes advantage of a relation table to identify and observe different behaviors of the tracked objects. However, the above analysis sometimes fails due to the existence of shadows. To avoid this problem, this paper adopts a technique of Gaussian shadow modeling to remove all unwanted shadows. Experimental results show that the proposed method is much more robust and powerful than other traditional methods.

Journal ArticleDOI
TL;DR: New ways of discovering groups of pupils sharing the same writing strategies during their primary education and methods for the temporal modeling of these pupils' writing strategies are proposed.
Abstract: The aim of this paper is to assess the evolution in writing performance amongst typical pupils in primary education. More precisely, we propose ways of discovering groups of pupils sharing the same writing strategies during their primary education and methods for the temporal modeling of these pupils' writing strategies. For this purpose, online acquisition of writing and drawing tests has been performed three times during a period of one year for the same pupils under the same experimental conditions. A first approach, based on clustering, is applied to highlight clusters on a set of dynamic primitives chosen by an expert in the field of child development psychology. Results are presented by means of a comparative study between the features of each group and the writing tests. An analysis of within- and between-strategy migration of pupils over time is also conducted to highlight pupils who change (or fail to change) their writing strategies during this period of one year. A second approach is used to model the problem by means of a probabilistic graphical model, i.e. a Bayesian network. Expert knowledge partially determines the Bayesian network structure, in which the writing strategy is represented by a hidden variable whose cardinality is estimated from the results of the clustering approach. By considering that each writing test is represented by its own (local) strategy and that there exists a global strategy which deals with each local strategy, we propose a Global Hierarchical Model. The results of our hierarchical model, structured using real data, highlight among others two global strategies that correspond to normo-writer pupils and more advanced normo-writers. A longitudinal and temporal study of the evolution of the pupils in these strategies shows that these two strategies are consistent.