
Showing papers by "Ching Y. Suen published in 2012"


Journal ArticleDOI
TL;DR: A hybrid model is presented that integrates two high-performing classifiers, the Convolutional Neural Network (CNN) and the Support Vector Machine (SVM), both of which have proven results in recognizing different types of patterns.
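Only the summary is given here, but the general CNN-plus-SVM hybrid pattern is easy to illustrate: a convolutional network acts as the feature extractor and an SVM is fitted on the extracted features as the final classifier. The sketch below is a minimal illustration of that pattern, not the authors' implementation; it assumes PyTorch and scikit-learn and uses a toy network with random stand-in data in place of a real handwriting dataset.

```python
# Minimal sketch of a CNN-feature + SVM hybrid (illustration only,
# not the authors' implementation). Assumes PyTorch and scikit-learn.
import torch
import torch.nn as nn
from sklearn.svm import SVC

class TinyCNN(nn.Module):
    """A toy convolutional feature extractor for 28x28 grayscale images."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
    def forward(self, x):
        return self.features(x).flatten(1)  # (N, 16*7*7) feature vectors

# Stand-in data: 200 random "images" with 10 classes.
images = torch.randn(200, 1, 28, 28)
labels = torch.randint(0, 10, (200,))

cnn = TinyCNN().eval()
with torch.no_grad():
    feats = cnn(images).numpy()          # CNN acts as the feature extractor

svm = SVC(kernel="rbf", C=10.0)          # SVM acts as the final classifier
svm.fit(feats[:150], labels[:150].numpy())
print("held-out accuracy:", svm.score(feats[150:], labels[150:].numpy()))
```

In practice the CNN would first be trained end-to-end on the recognition task and its softmax layer then replaced by the SVM; the toy network above skips that training step for brevity.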

585 citations


Journal ArticleDOI
TL;DR: The proposed LoGID framework adapts hidden Markov model-based pattern recognition systems during both the generalization and learning phases; the evaluation shows that it can effectively improve the performance of systems created with small training sets as more data are observed over time.

48 citations


Book
07 Jan 2012
TL;DR: In this article, the authors present an overview of basic algorithms for processing speech signals and an architecture for isolated and connected word recognition in real-time speech recognition systems, as well as knowledge-based and expert systems in automatic speech recognition.
Abstract: I. Review of Basic Algorithms.- An Overview of Digital Techniques for Processing Speech Signals.- Systems for Isolated and Connected Word Recognition.- II. System Architecture and VLSI for Speech Processing.- Systolic Architectures for Connected Speech Recognition.- Computer Systems for High-Performance Speech Recognition.- VLSI Architectures for Recognition of Context-Free Languages.- Implementation of an Acoustical Front-End for Speech Recognition.- Reconfigurable Modular Architecture for a Man-Machine Vocal Communication System in Real Time.- A Survey of Algorithms & Architecture for Connected Speech Recognition.- III. Software Systems for Automatic Speech Recognition.- Knowledge-Based and Expert Systems in Automatic Speech Recognition.- The Speech Understanding and Dialog System EVAR.- A New Rule-Based Expert System for Speech Recognition.- SAY - A PC Based Speech Analysis System.- Automatic Generation of Linguistic, Phonetic and Acoustic Knowledge for a Diphone-Based Continuous Speech Recognition System.- The Use of Dynamic Frequency Warping in a Speaker-Independent Vowel Classifier.- Dynamic Time Warping Algorithms for Isolated and Connected Word Recognition.- An Efficient Algorithm for Recognizing Isolated Turkish Words.- A General Fuzzy-Parsing Scheme for Speech Recognition.- IV Speech Synthesis and Phonetics.- Linguistics and Automatic Processing of Speech.- Synthesis of Speech by Computers and Chips.- Prosodic Knowledge in the Rule-Based Synthex Expert System for Speech Synthesis.- Syntex - Unrestricted Conversion of Text to Speech for German.- Concatenation Rules for Demisyllable Speech Synthesis.- On the Use of Phonetic Knowledge for Automatic Speech Recognition.- Demisyllables as Processing Units for Automatic Speech Recognition and Lexical Access.- Detection and Recognition of Nasal Consonants in Continuous Speech - Preliminary Results.- Author Index.

46 citations



Proceedings ArticleDOI
Muna Khayyat, Louisa Lam, Ching Y. Suen, Fei Yin, Cheng-Lin Liu
27 Mar 2012
TL;DR: This paper uses morphological dilation with a dynamic adaptive mask for text line extraction; evaluation on the CENPARMI Arabic handwritten documents database, which contains multi-skewed and touching lines, demonstrates the effectiveness of the approach.
Abstract: This paper presents a robust method for handwritten text line extraction. We use morphological dilation with a dynamic adaptive mask for line extraction. Line separation occurs because of the repulsion and attraction between connected components. The characteristics of the Arabic script are considered to ensure a high performance of the algorithm. Our method is evaluated on the CENPARMI Arabic handwritten documents database, which contains multi-skewed and touching lines. With a matching score of 0.95, our method achieved precision and recall rates of 96.3% and 96.7% respectively, which demonstrate the effectiveness of our approach.
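As a rough illustration of the dilation idea (not the dynamic adaptive mask described in the paper), the sketch below dilates a binarized page with a wide, short structuring element so that components on the same line merge, then treats each merged blob as a candidate text line. It assumes NumPy and scikit-image, and the mask size is a fixed guess rather than the paper's adaptively chosen one.

```python
# Rough sketch of dilation-based text line grouping (illustration only;
# the paper uses a dynamic adaptive mask, this uses a fixed one).
import numpy as np
from skimage.morphology import binary_dilation
from skimage.measure import label, regionprops

def extract_lines(binary_page, mask_width=51, mask_height=3):
    """binary_page: 2-D bool array, True where ink is present."""
    # Wide, short mask: dilation bridges gaps between words on one line
    # but (ideally) not between adjacent lines.
    # (Older scikit-image versions call the `footprint` parameter `selem`.)
    mask = np.ones((mask_height, mask_width), dtype=bool)
    merged = binary_dilation(binary_page, footprint=mask)
    labels = label(merged)
    # Each connected component of the dilated image is one candidate line;
    # keep the original ink pixels that fall inside it.
    lines = []
    for region in regionprops(labels):
        component = (labels == region.label) & binary_page
        lines.append(component)
    return lines

page = np.zeros((100, 400), dtype=bool)
page[20:25, 30:120] = True   # word 1, line 1
page[20:25, 150:260] = True  # word 2, line 1
page[60:65, 40:200] = True   # line 2
print(len(extract_lines(page)), "lines found")  # expected: 2
```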

30 citations


Journal ArticleDOI
TL;DR: This work proposes a distance-based local binary pattern (DLBP) descriptor, a part-based pedestrian representation, and a novel CI_DLBP descriptor, which unifies the color intensity and DLBP by learning the joint distributions of the DLBP and color intensity at each channel.
Abstract: Matching pedestrians across disjoint camera views is a challenging task, since their observations are separated in time and space and their appearances may vary considerably. Recently, several approaches to matching pedestrians have been proposed. However, these approaches either use overly complex representations or consider only the color information while discarding the spatial structural information of the pedestrian. In order to describe the spatial structural information in color space, we propose a distance-based local binary pattern (DLBP) descriptor. Besides the spatial structural information, the color intensity itself is also an important feature in matching pedestrians across disjoint camera views. In order to effectively combine these two kinds of information, we further propose a novel CI_DLBP descriptor, which unifies the color intensity and DLBP by learning the joint distributions (2-D histograms) of the DLBP and color intensity at each channel. In addition, unlike previous approaches in which pedestrians are matched on their whole bodies, we develop a part-based pedestrian representation, because the color density and spatial structural information of the upper outer garment and the lower garment worn by the pedestrian usually differ. Experimental results on challenging realistic scenarios and the VIPeR dataset validate the proposed DLBP operator, the CI_DLBP descriptor, and the part-based pedestrian representation for pedestrian matching across disjoint camera views. Compared with existing methods based on color information, the new CI_DLBP approach performs better.
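The exact DLBP formulation is defined in the paper; the sketch below only illustrates the general idea of the CI_DLBP combination, i.e., forming a joint 2-D histogram of a local binary pattern code and the pixel intensity for each color channel, with separate descriptors for the upper and lower body. It uses a standard 8-neighbour LBP rather than the authors' distance-based variant and assumes NumPy and scikit-image.

```python
# Sketch of a joint (LBP code, intensity) 2-D histogram per color channel.
# Uses a standard LBP, not the paper's distance-based DLBP (illustration only).
import numpy as np
from skimage.feature import local_binary_pattern

def joint_lbp_intensity_histogram(channel, lbp_bins=256, intensity_bins=16):
    """channel: 2-D uint8 array (one color channel of a pedestrian image)."""
    lbp = local_binary_pattern(channel, P=8, R=1, method="default")
    hist, _, _ = np.histogram2d(
        lbp.ravel(), channel.ravel(),
        bins=[lbp_bins, intensity_bins],
        range=[[0, 256], [0, 256]],
    )
    hist /= hist.sum() + 1e-12            # normalize to a joint distribution
    return hist

def describe_pedestrian(image):
    """image: H x W x 3 uint8. Upper and lower body are described separately,
    loosely following the part-based representation discussed above."""
    h = image.shape[0]
    parts = [image[: h // 2], image[h // 2 :]]          # upper / lower body
    descriptor = [
        joint_lbp_intensity_histogram(part[..., c])
        for part in parts for c in range(3)
    ]
    return np.concatenate([d.ravel() for d in descriptor])

img = (np.random.rand(128, 48, 3) * 255).astype(np.uint8)
print(describe_pedestrian(img).shape)
```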

18 citations


Proceedings ArticleDOI
18 Sep 2012
TL;DR: For the first time in Arabic word spotting, language models are incorporated into the process of reconstructing words from PAWs, and a hierarchical classifier is implemented.
Abstract: With the ever-increasing amounts of published materials being made available, developing efficient means of locating target items has become a subject of significant interest. Among the approaches adopted for this purpose is word spotting, which enables the identification of documents through the use of pertinent keywords. This paper reports on an effective method of word spotting for Arabic handwritten documents that takes into consideration the nature of Arabic handwriting. Parts of Arabic Words (PAWs) form the basic components of this search process, and a hierarchical classifier (consisting of a set of classifiers each trained on a different part of the input pattern) is implemented. For the first time in Arabic word spotting, language models are incorporated into the process of reconstructing words from PAWs. Details of the method and promising experimental results are also presented.
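No implementation details are given beyond the description above, but the role of a language model in reconstructing words from PAW hypotheses can be sketched as follows: each candidate word assembled from PAW recognition hypotheses is rescored by combining the recognizer's scores with a word-level unigram prior. All lexicon entries, scores, and names below are hypothetical placeholders, and the unigram model is a stand-in for whatever language model the authors actually used.

```python
# Hedged sketch: rescoring candidate words built from PAW hypotheses with a
# simple unigram language model. All data below are hypothetical placeholders.
import math

# Hypothetical recognizer output: for each PAW slot, candidate transliterated
# PAWs with their recognition scores (probabilities).
paw_candidates = [
    [("kt", 0.7), ("kb", 0.3)],     # first PAW of the word
    [("ab", 0.6), ("at", 0.4)],     # second PAW of the word
]

# Hypothetical unigram language model over whole words (relative frequencies).
word_lm = {"ktab": 0.8, "ktat": 0.05, "kbab": 0.1, "kbat": 0.05}

def best_word(paw_candidates, word_lm, lm_weight=1.0):
    """Combine PAW recognition scores with a word-level unigram LM."""
    best, best_score = None, float("-inf")
    def expand(i, prefix, log_score):
        nonlocal best, best_score
        if i == len(paw_candidates):
            lm_prob = word_lm.get(prefix, 1e-6)       # unseen-word floor
            total = log_score + lm_weight * math.log(lm_prob)
            if total > best_score:
                best, best_score = prefix, total
            return
        for paw, p in paw_candidates[i]:
            expand(i + 1, prefix + paw, log_score + math.log(p))
    expand(0, "", 0.0)
    return best

print(best_word(paw_candidates, word_lm))  # "ktab" under these toy numbers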

16 citations


Journal ArticleDOI
TL;DR: A new iris segmentation scheme using game theory to elicit iris/pupil boundaries from a nonideal iris image is described, which is robust to noise and poor localization, and less affected by weak iris/sclera boundaries.
Abstract: Robust segmentation of an iris image plays an important role in iris recognition. However, the nonlinear deformations, pupil dilations, head rotations, motion blurs, reflections, nonuniform intensities, low image contrast, camera angles and diffusions, and presence of eyelids and eyelashes often hamper the conventional iris/pupil localization methods, which utilize the region-based or the gradient-based boundary-finding information. The novelty of this research effort is that we describe a new iris segmentation scheme using game theory to elicit iris/pupil boundaries from a nonideal iris image. We apply a parallel game-theoretic decision making procedure by modifying Chakraborty and Duncan's algorithm, which integrates (1) the region-based segmentation and gradient-based boundary-finding methods and (2) fuses the complementary strengths of each of these individual methods. This integrated scheme forms a unified approach, which is robust to noise and poor localization, and less affected by weak iris/sclera boundaries. The verification and identification performance of the proposed method are validated using the ICE 2005, the UBIRIS Version 1, WVU Nonideal, and the CASIA Version 3 data sets.
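The game-theoretic fusion itself is not reproduced here; the toy sketch below only illustrates the two kinds of evidence being combined, scoring candidate pupil circles with a region term (the interior should be dark) and a gradient term (the boundary should sit on strong edges). The weighting and candidate generation are illustrative assumptions, not the paper's procedure.

```python
# Toy sketch: scoring candidate pupil circles with both region-based evidence
# (dark interior) and gradient-based evidence (strong edges on the boundary).
# This only illustrates combining the two cues; it is not the game-theoretic
# procedure described in the paper.
import numpy as np

def circle_score(image, cx, cy, r, w_region=0.5, w_gradient=0.5):
    h, w = image.shape
    yy, xx = np.mgrid[0:h, 0:w]
    dist = np.sqrt((xx - cx) ** 2 + (yy - cy) ** 2)
    inside = dist <= r
    ring = np.abs(dist - r) <= 1.5
    gy, gx = np.gradient(image.astype(float))
    grad_mag = np.hypot(gx, gy)
    region_term = 1.0 - image[inside].mean()        # pupil should be dark
    gradient_term = grad_mag[ring].mean()           # boundary should be edgy
    return w_region * region_term + w_gradient * gradient_term

# Synthetic eye image: dark pupil of radius 10 centred at (32, 32).
img = np.ones((64, 64))
yy, xx = np.mgrid[0:64, 0:64]
img[(xx - 32) ** 2 + (yy - 32) ** 2 <= 10 ** 2] = 0.1

candidates = [(32, 32, 10), (30, 30, 6), (40, 25, 12)]
best = max(candidates, key=lambda c: circle_score(img, *c))
print("best candidate circle:", best)   # expect (32, 32, 10)
```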

14 citations


Journal ArticleDOI
TL;DR: Fuzzy inference systems are proposed for the initialization step of the optimization underlying the joint noise removal and recognition problem, which can be solved by expectation maximization provided that the recognition engine is trained on clean images.

12 citations


Proceedings ArticleDOI
18 Sep 2012
TL;DR: The main idea behind the proposed approach is to learn the geometrical distribution of words within a sentence using a Markov chain or a Hidden Markov Model (HMM).
Abstract: We present a statistical hypothesis testing method for handwritten word segmentation algorithms. Our proposed method can be used along with any word segmentation algorithm in order to detect over-segmentation or under-segmentation errors, or to adapt the word segmentation algorithm to new data in an unsupervised manner. The main idea behind the proposed approach is to learn the geometrical distribution of words within a sentence using a Markov chain or a Hidden Markov Model (HMM). In the former, we assume all the necessary information is observable, whereas in the latter, we assume the minimum observable variables are the bounding boxes of the words and the hidden variables are the part-of-speech information. Our experimental results on a benchmark database show that not only can we achieve lower over-segmentation and under-segmentation error rates, but also a higher correct segmentation rate as a result of the proposed hypothesis testing.
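The abstract does not spell out the model, but the Markov-chain variant can be sketched as follows: the horizontal gaps between consecutive word bounding boxes are discretized into states, transition probabilities are estimated from correctly segmented training lines, and a test line whose gap sequence has unusually low likelihood is flagged as a possible over- or under-segmentation. The discretization thresholds and data below are illustrative assumptions, not the paper's values.

```python
# Sketch of a Markov-chain check on word-gap geometry (illustrative only;
# the discretization and training data are assumptions, not the paper's values).
import math
from collections import defaultdict

def discretize(gaps, small=10, large=40):
    """Map each inter-word gap (in pixels) to a coarse state."""
    return ["S" if g < small else "L" if g > large else "M" for g in gaps]

def train_chain(training_gap_sequences):
    counts = defaultdict(lambda: defaultdict(int))
    for gaps in training_gap_sequences:
        states = discretize(gaps)
        for a, b in zip(states, states[1:]):
            counts[a][b] += 1
    chain = {}
    for a, nxt in counts.items():
        total = sum(nxt.values())
        chain[a] = {b: c / total for b, c in nxt.items()}
    return chain

def log_likelihood(gaps, chain, floor=1e-4):
    states = discretize(gaps)
    return sum(math.log(chain.get(a, {}).get(b, floor))
               for a, b in zip(states, states[1:]))

# Hypothetical training lines (gap widths of correctly segmented sentences).
train = [[22, 25, 30, 28], [18, 24, 26], [27, 22, 25, 23]]
chain = train_chain(train)
good, suspicious = [24, 26, 25], [3, 5, 80]   # the latter mixes tiny/huge gaps
print(log_likelihood(good, chain), log_likelihood(suspicious, chain))
```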

7 citations


Proceedings ArticleDOI
18 Sep 2012
TL;DR: A novel approach based on Optimum Paths, derived from the degree information and continuation property, is introduced to solve the segmentation ambiguities at intersection points in off-line Chinese character recognition.
Abstract: In recognition of off-line handwritten characters and signatures, stroke extraction is often a crucial step. Given the large number of Chinese handwritten characters, pattern matching based on structural decomposition and analysis is useful and essential to off-line Chinese recognition to reduce ambiguity. Two challenging problems for stroke extraction are: 1) how to extract primary strokes and 2) how to resolve the segmentation ambiguities at intersection points. In this paper, we introduce a novel Approach based on Optimum Paths (AOP) to solve these problems. The optimum paths are derived from the degree information and the continuation property, and we use them to tackle both problems. Compared with other methods, the proposed approach extracts strokes from off-line Chinese handwritten characters with better performance.
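The paper's optimum-path formulation is not reproduced here; the sketch below only illustrates the continuation idea it builds on: at an intersection point, incoming and outgoing stroke segments are paired so that the total change of direction is minimized, which resolves a crossing into two continuous strokes. The segment directions are toy values.

```python
# Toy sketch of the continuation criterion at a stroke intersection:
# pair segments so that the total direction change is minimal
# (illustration of the idea only, not the paper's optimum-path algorithm).
import math
from itertools import permutations

def angle_diff(a, b):
    """Smallest absolute difference between two directions in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def pair_segments(incoming, outgoing):
    """incoming/outgoing: lists of segment directions (degrees) at a crossing.
    Returns the pairing with the smallest total deviation from straight
    continuation (an incoming direction should continue nearly unchanged)."""
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(outgoing))):
        cost = sum(angle_diff(incoming[i], outgoing[j])
                   for i, j in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return [(i, j) for i, j in enumerate(best)]

# Two strokes crossing: one nearly horizontal, one nearly vertical.
incoming = [5, 95]      # directions (deg) of segments entering the crossing
outgoing = [93, 2]      # directions (deg) of segments leaving it
print(pair_segments(incoming, outgoing))  # [(0, 1), (1, 0)]: straight continuations
```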

Proceedings Article
01 Nov 2012
TL;DR: With properly defined potential functions in the joint probability represented by the graphical model, the disparity in tree representations caused by different image capturing conditions can be tolerated as demonstrated in the encouraging experimental results.
Abstract: A document image matching approach making use of probabilistic graphical models is proposed. The document image is first represented by a tree, with the nodes corresponding to the regions in the image and the edges indicating the parent-child relationships between them, transforming the problem into tree matching. A graphical model, i.e., a pairwise Markov Random Field, is defined on the tree, in which the nodes are considered as random variables and the edges encode the relations among these variables in the probability domain. The tree matching problem is then formulated as Maximum a Posteriori (MAP) inference over the graphical model and solved by belief propagation. Since the underlying graphical model is tree-structured, exact inference can be obtained. With properly defined potential functions in the joint probability represented by the graphical model, the disparity in tree representations caused by different image capturing conditions can be tolerated, as demonstrated by the encouraging experimental results.
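Since the graphical model is tree-structured, exact MAP inference can be done with max-product belief propagation. The sketch below shows that inference step on a tiny hand-built tree with made-up unary and pairwise potentials; it is not the document-matching system itself, only the MAP-on-a-tree computation the abstract refers to.

```python
# Max-product belief propagation on a small tree-structured pairwise MRF
# (illustrates the exact MAP inference mentioned above; potentials are made up).
import numpy as np

# Tree: node 0 is the root, nodes 1 and 2 are its children. Each node takes
# one of K=3 states (e.g., candidate region correspondences).
K = 3
unary = {0: np.array([0.2, 0.5, 0.3]),
         1: np.array([0.6, 0.2, 0.2]),
         2: np.array([0.1, 0.1, 0.8])}
# Pairwise potential favouring identical states for parent and child.
pairwise = np.full((K, K), 0.1) + 0.9 * np.eye(K)
children = {0: [1, 2], 1: [], 2: []}

def upward_message(child):
    """Message from `child` to its parent: max over the child's states."""
    belief = unary[child].copy()
    for grandchild in children[child]:
        belief *= upward_message(grandchild)
    # msg[parent_state] = max over child_state of pairwise * belief
    return (pairwise * belief[None, :]).max(axis=1)

def map_states():
    root_belief = unary[0].copy()
    for c in children[0]:
        root_belief *= upward_message(c)
    states = {0: int(root_belief.argmax())}
    # Downward pass: each child picks the state that maximized its message.
    def assign(node):
        for c in children[node]:
            belief = unary[c].copy()
            for gc in children[c]:
                belief *= upward_message(gc)
            states[c] = int((pairwise[states[node]] * belief).argmax())
            assign(c)
    assign(0)
    return states

print(map_states())   # MAP assignment for all three nodes
```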

Proceedings Article
01 Nov 2012
TL;DR: The proposed CSMA method achieves both the fastest running time and the highest accuracy in the face recognition problem compared to MPCA and other multifactor-based methods on two challenging databases, i.e., CMU-MPIE and Extended YALE-B.
Abstract: This paper proposes a novel approach named Compressed Submanifold Multifactor Analysis (CSMA) to deal concisely and precisely with multifactor analysis. Compared to the state-of-the-art MPCA method, which loses the original local geometry structures of the input factors due to its averaging process, our proposed approach can preserve their original geometry. In addition, a fast low-rank approximation of a given dataset with multiple factors is provided using Random Projection to reduce space requirements and give a more transparent representation. Our proposed method achieves both the fastest running time and the highest accuracy in the face recognition problem compared to MPCA and other multifactor-based methods on two challenging databases, i.e., CMU-MPIE and Extended YALE-B.
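The CSMA algorithm itself is not reproduced here, but the random-projection step it relies on for fast low-rank approximation can be sketched with a basic randomized range finder: project the data onto a few Gaussian random directions, orthonormalize, and work in that small subspace. This is generic randomized linear algebra, not the authors' exact procedure.

```python
# Sketch of low-rank approximation via random projection (generic randomized
# range finder; illustrates the idea used for speed, not the exact CSMA step).
import numpy as np

def randomized_low_rank(X, rank, oversample=5, seed=0):
    """Return Q (orthonormal basis) and B = Q.T @ X such that X ~= Q @ B."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    omega = rng.standard_normal((n_features, rank + oversample))
    Y = X @ omega                      # project onto random directions
    Q, _ = np.linalg.qr(Y)             # orthonormal basis of the sampled range
    B = Q.T @ X                        # small (rank+oversample) x n_features
    return Q, B

# Data that is genuinely low rank plus a little noise.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 10)) @ rng.standard_normal((10, 200))
X += 0.01 * rng.standard_normal(X.shape)

Q, B = randomized_low_rank(X, rank=10)
err = np.linalg.norm(X - Q @ B) / np.linalg.norm(X)
print(f"relative approximation error: {err:.4f}")   # close to the noise level
```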

Book ChapterDOI
01 Jan 2012
TL;DR: A Hidden Markov Model based recognizer for Farsi handwritten word recognition systems is developed, and a first evaluation of the performance of this recognizer shows promising results.
Abstract: One of the most important script groups based on the Arabic alphabet is the Persian/Farsi script. This script is the basis of different languages used in the Middle East and Central Asian regions. For the development of Farsi handwritten word recognition systems, the CENPARMI group designed and collected a database. Based on statistical features, a Hidden Markov Model based recognizer is developed. A first evaluation of the performance of this recognizer shows promising results.
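The abstract gives only a high-level description; a common way to build such a word recognizer is to train one HMM per word class on feature-vector sequences and classify a test word by the highest model likelihood. The sketch below follows that standard recipe using hmmlearn with synthetic feature sequences; it is an assumption-laden illustration, not the CENPARMI system.

```python
# Generic one-HMM-per-word recognizer sketch (standard recipe, not the paper's
# system). Assumes hmmlearn and NumPy; features and data are synthetic.
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)

def synthetic_sequences(offset, n_seq=20, length=30, dim=4):
    """Stand-in for sliding-window statistical features of one word class."""
    return [offset + rng.standard_normal((length, dim)) for _ in range(n_seq)]

def fit_word_model(sequences, n_states=3):
    X = np.vstack(sequences)
    lengths = [len(s) for s in sequences]
    model = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=20)
    model.fit(X, lengths)
    return model

classes = {"word_a": synthetic_sequences(0.0), "word_b": synthetic_sequences(2.0)}
models = {w: fit_word_model(seqs) for w, seqs in classes.items()}

def recognize(sequence):
    """Pick the word whose HMM assigns the test sequence the highest likelihood."""
    return max(models, key=lambda w: models[w].score(sequence))

test = 2.0 + rng.standard_normal((30, 4))   # should look like "word_b"
print(recognize(test))
```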

Proceedings ArticleDOI
02 Jul 2012
TL;DR: A novel super-resolution approach based on the framework of the wavelet transform is presented that reconstructs a more reliable image without obvious visual artifacts.
Abstract: A novel super-resolution approach is presented. An image pyramid is built based on the framework of the wavelet transform, and the detail coefficients are used to train the neural networks. The initial high-resolution image is estimated by the trained networks and the inverse wavelet transform, and is then constrained with prior knowledge of the error function by iteration. For a magnification factor of 2^n, this process is repeated and the networks are updated. The experimental results show that our method reconstructs a more reliable image without obvious visual artifacts.
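The abstract describes the pipeline only at a high level. The sketch below follows its general shape under stated assumptions: a regressor (standing in for the trained neural networks) predicts wavelet detail coefficients from the low-resolution image, and the inverse wavelet transform produces the high-resolution estimate. It uses PyWavelets and scikit-learn with synthetic data, and it omits the iterative error-function constraint mentioned above.

```python
# Sketch of wavelet-domain super-resolution: predict detail coefficients from
# the low-resolution image and invert the transform. Simplified illustration
# (no iterative error-function constraint); assumes PyWavelets, scikit-learn.
import numpy as np
import pywt
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def make_pair(size=32):
    """Synthetic (low-res, high-res) pair: the low-res image is the
    approximation band of the high-res image's wavelet decomposition."""
    hi = rng.random((size, size))
    cA, (cH, cV, cD) = pywt.dwt2(hi, "haar")
    return cA, (cH, cV, cD), hi

# Train a regressor mapping the approximation band to the detail bands
# (this stands in for the trained neural networks in the paper); the random
# images here are placeholders, so the mapping is only a shape/pipeline demo.
X, Y = [], []
for _ in range(200):
    cA, details, _ = make_pair()
    X.append(cA.ravel())
    Y.append(np.concatenate([d.ravel() for d in details]))
net = MLPRegressor(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
net.fit(np.array(X), np.array(Y))

# "Super-resolve" a new low-resolution image (here: an approximation band).
cA_new, _, hi_true = make_pair()
pred = net.predict(cA_new.ravel()[None, :])[0].reshape(3, *cA_new.shape)
hi_est = pywt.idwt2((cA_new, (pred[0], pred[1], pred[2])), "haar")
print("reconstruction error:", np.abs(hi_est - hi_true).mean())
```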