Showing papers by "Ching Y. Suen" published in 2020


Journal ArticleDOI
28 May 2020
TL;DR: A comprehensive review of research toward robust pattern recognition from the perspective of breaking three basic and implicit assumptions: closed-world assumption, independent and identically distributed assumption, and clean and big data assumption, which form the foundation of most pattern recognition models.
Abstract: The accuracies for many pattern recognition tasks have increased rapidly year by year, achieving or even outperforming human performance. From the perspective of accuracy, pattern recognition seems to be a nearly solved problem. However, once launched in real applications, the high-accuracy pattern recognition systems may become unstable and unreliable due to the lack of robustness in open and changing environments. In this article, we present a comprehensive review of research toward robust pattern recognition from the perspective of breaking three basic and implicit assumptions: closed-world assumption, independent and identically distributed assumption, and clean and big data assumption, which form the foundation of most pattern recognition models. Actually, our brain is robust at learning concepts continually and incrementally, in complex, open, and changing environments, with different contexts, modalities, and tasks, by showing only a few examples, under weak or noisy supervision. These are the major differences between human intelligence and machine intelligence, which are closely related to the above three assumptions. After witnessing the significant progress in accuracy improvement nowadays, this review paper will enable us to analyze the shortcomings and limitations of current methods and identify future research directions for robust pattern recognition.

81 citations


Journal ArticleDOI
TL;DR: A method for vehicle License Plates Detection (LPD) and Character Recognition (CR) as a unified application that presents significant accuracy and real-time performance is proposed.
Abstract: The process of detecting vehicles’ license plates, along with recognizing the characters inside them, has always been a challenging issue due to various conditions. These conditions include different weather and illumination, inevitable data acquisition noises, and some other challenging scenarios like the demand for real-time performance in state-of-the-art Intelligent Transportation Systems (ITS) applications. This paper proposes a method for vehicle License Plates Detection (LPD) and Character Recognition (CR) as a unified application that presents significant accuracy and real-time performance. The mentioned system is designed for Iranian vehicle license plates, which have the characteristics of different resolution and layouts, scarce digits/characters, various background colors, and different font sizes. In this regard, the system uses a separate fine-tuned You Only Look Once (YOLO) version 3 platform for each of the mentioned phases and extracts Persian characters from input images in two automatic steps. For training and testing stages, a wide range of vehicle images in different challenging and straightforward conditions have been collected from practical systems installed as surveillance applications. Experimental results show an end-to-end accuracy of 95.05% on 5719 images. The test data included both color and grayscale images containing the vehicles with different distances and shooting angles with various brightness and resolution. Additionally, the system can perform the LPD and CR tasks in an average of 119.73 milliseconds for real life data, which illustrates a real-time performance for the system and usable applicability. The system is fully automated, and no pre-processing, calibration or configuration procedures are needed.

41 citations


Journal ArticleDOI
TL;DR: To improve the classification ability for multi-class problems, a generalized model is proposed to extract multiple discriminative signals and an algorithm is also presented to compute the multiple discriminative signals simultaneously.
Abstract: Classification finds wide applications in artificial intelligence and expert systems. Feature extraction is a key step for classifier learning. However, the relation among samples is usually ignored in classical feature extraction models. Recently, feature extraction based on graph signal processing that makes use of the relation among samples has attracted great attention. It is a common assumption that the classification information is smooth and of low frequency in these studies. We point out that it is the discrimination ability that essentially makes a good classification feature instead of smoothness. This new perspective prompts us to introduce the concept of discriminative graph signal, and then, based on this concept, we propose a novel feature extraction model for supervised classification. To improve the classification ability for multi-class problems, a generalized model is proposed to extract multiple discriminative signals and an algorithm is also presented to compute the multiple discriminative signals simultaneously. On five publicly available UCI datasets, our proposed method outperforms the existing methods in terms of performance. Finally some drawbacks are discussed and future research directions are also provided.
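
The smoothness assumption mentioned in this abstract is commonly quantified with the graph Laplacian quadratic form. The identity below is a standard result from graph signal processing, not the paper's own model; the symbols $W$, $D$, $L$, and $x$ are notation assumed here for illustration.

```latex
% Assumed notation: W = symmetric edge-weight matrix over the samples,
% D = diagonal degree matrix with d_i = \sum_j w_{ij}, L = D - W,
% x = a graph signal with one value per sample.
\[
  x^{\top} L x \;=\; \frac{1}{2} \sum_{i,j} w_{ij}\,(x_i - x_j)^2,
  \qquad L = D - W .
\]
```

A small value means the signal changes little across strongly connected samples, which is what "smooth and of low frequency" refers to. The abstract argues that discrimination ability, rather than this quantity alone, is what makes a good classification feature.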

12 citations


Book ChapterDOI
19 Oct 2020
TL;DR: In this paper, the authors used pre-trained knowledge from two convolutional neural networks (CNNs), GoogleNet and ResNet, as fixed feature extractors and classified the extracted features with a support vector machine (SVM) to detect writer gender from Arabic handwritten documents.
Abstract: Offline gender detection from Arabic handwritten documents is a very challenging task because of the high similarity between an individual’s writings and the complexity of the Arabic language as well. In this paper, we propose a new way to detect the writer's gender from scanned handwritten documents that is mainly based on the concept of transfer learning. We used pre-trained knowledge from two convolutional neural networks (CNNs), GoogleNet and ResNet, and applied it to our dataset. We use these two CNN architectures as fixed feature extractors. For the analysis and classification stage, we used a support vector machine (SVM). The accuracy of the two CNN architectures is 80.05% for GoogleNet and 83.32% for ResNet.

11 citations


Journal ArticleDOI
TL;DR: The results proved that deep models could potentially change the design structure of the Computer Aided Design systems while excluding the rigorous task of development and selection of problem-oriented features.
Abstract: This research proposes a pre-trained mobile application for medical image diagnosis; it examined the benefit of deep learning approaches for white blood cell and chest radiography analysis. The fea...

8 citations


Book ChapterDOI
19 Oct 2020
TL;DR: A robust approach for license-plate detection based on YOLO v.3 is proposed which takes advantage of high detection accuracy and real-time performance and can detect the license-plates location of vehicles as a general representation of vehicle presence in images.
Abstract: In vision-driven Intelligent Transportation Systems (ITS) where cameras play a vital role, accurate detection and re-identification of vehicles are fundamental demands. Hence, recent approaches have employed a wide range of algorithms to provide the best possible accuracy. These methods commonly generate a vehicle detection model based on its visual appearance features such as license-plate, headlights or some other distinguishable specifications. Among different object detection approaches, Deep Neural Networks (DNNs) have the advantage of magnificent detection accuracy in case a huge amount of training data is provided. In this paper, a robust approach for license-plate detection based on YOLO v.3 is proposed which takes advantage of high detection accuracy and real-time performance. The mentioned approach can detect the license-plate location of vehicles as a general representation of vehicle presence in images. To train the model, a dataset of vehicle images with Iranian license-plates has been generated by the authors and augmented to provide a wider range of data for test and train purposes. It should be mentioned that the proposed method can detect the license-plate area as an indicator of vehicle presence with no Optical Character Recognition (OCR) algorithm to distinguish characters inside the license-plate. Experimental results have shown the high performance of the system with precision 0.979 and recall 0.972.
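
The precision and recall figures reported in this abstract can be combined into a single F1 score. The sketch below shows the usual definitions; the function names and the example counts in the test are hypothetical, and the abstract itself does not report an F1 value or the matching rule used to count true positives.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from detection counts.

    tp = correctly detected plates, fp = spurious detections,
    fn = missed plates. Zero denominators yield 0.0.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from_rates(precision, recall)


def f1_from_rates(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Applying the formula to the rates quoted in the abstract.
print(round(f1_from_rates(0.979, 0.972), 4))  # 0.9755
```

Because F1 is a harmonic mean, it always lies between the two rates and sits closer to the lower one.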

8 citations


Book ChapterDOI
19 Oct 2020
TL;DR: In this paper, a multi-task learning scheme is employed by the model to learn optimal shared features for facial attractiveness assessment, gender recognition, and ethnicity identification in an end-to-end manner, and a deep residual network originally trained on massive face datasets is utilized, which is capable of learning high-level and robust features from face images.
Abstract: The objective of facial beauty prediction, which is a significant yet challenging problem in the domains of computer vision and machine learning, is to develop a human-like model that automatically evaluates facial attractiveness. Using deep learning methods to enhance facial beauty prediction is a promising and important area. This study provides a new framework for simultaneous facial attractiveness assessment, gender recognition as well as ethnicity identification using deep Convolutional Neural Networks (CNNs). Specifically, a deep residual network originally trained on massive face datasets is utilized which is capable of learning high-level and robust features from face images. Furthermore, a multi-task learning algorithm that operates on the effective features, exploits the synergy among the tasks. Said differently, a multi-task learning scheme is employed by our model to learn optimal shared features for these correlated tasks in an end-to-end manner. Interestingly, prediction correlation of 0.94 is achieved by our method for the SCUT-FBP5500 benchmark dataset (spanning 5500 facial images), which would certainly support the efficacy of our proposed model. This would also indicate significant improvement in accuracy over the other state-of-the-art methods.
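
The abstract reports a "prediction correlation" of 0.94 without naming the coefficient. Assuming it refers to the Pearson correlation between predicted and human-rated scores, a minimal sketch of that computation could look as follows; the function name is hypothetical.

```python
import math


def pearson_correlation(predicted, actual):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(actual) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    # Covariance and variances, all left unnormalized (the 1/n factors cancel).
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_p == 0 or var_a == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_a)
```

A value near 1 means the predicted scores rank and space the faces almost exactly as the reference ratings do.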

7 citations


Journal ArticleDOI
TL;DR: This work proposes a methodology that utilizes Locality Sensitive Hashing (LSH) to create a novel balanced dataset of 2500 synthetic blood smears, which was automatically annotated during the generation phase and will be made public for research purposes.
Abstract: Peripheral Blood Smear (PBS) analysis is a vital routine test carried out by hematologists to assess some aspects of humans' health status. PBS analysis is prone to human errors and utilizing computer-based analysis can greatly enhance this process in terms of accuracy and cost. Recent approaches in learning algorithms, such as deep learning, are data hungry, but due to the scarcity of labeled medical images, researchers had to find viable alternative solutions to increase the size of available datasets. Synthetic datasets provide a promising solution to data scarcity; however, the complexity of blood smears' natural structure adds an extra layer of challenge to the synthesizing process. In this work, we propose a methodology that utilizes Locality Sensitive Hashing (LSH) to create a novel balanced dataset of 2500 synthetic blood smears. This dataset, which was automatically annotated during the generation phase, will be made public for research purposes and covers 17 essential categories of blood cells. We proved the effectiveness of the proposed dataset by utilizing it for training a deep neural network; this model achieved a very high accuracy score of 98.72% when tested with the well-known ALL-IDB dataset. Five experienced hematologists also confirmed that the dataset meets the general standards for making thin blood smears.
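
The abstract does not say which LSH family is used or what exactly is hashed. As a generic illustration only, the sketch below uses random-hyperplane (sign) hashing, one common LSH scheme, to group similar feature vectors into buckets. All names and parameters are hypothetical.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplane normals in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    bits = []
    for plane in hyperplanes:
        dot = sum(v * w for v, w in zip(vector, plane))
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)


def bucketize(vectors, hyperplanes):
    """Group vector indices by signature; similar directions tend to collide."""
    buckets = {}
    for index, vector in enumerate(vectors):
        buckets.setdefault(lsh_signature(vector, hyperplanes), []).append(index)
    return buckets
```

Vectors separated by a small angle agree on most bits, so candidates for "similar" items can be found by looking in the same bucket instead of comparing every pair.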

6 citations


Journal ArticleDOI
TL;DR: An intelligent system to automatically detect fake coins based on their images is presented and a new spatially enhanced bag-of-visual-words model, called SEBOVW model, is proposed, which is trained to discriminate between genuine and fake coins.
Abstract: Fake coins are harmful to society, and their detection is of paramount importance. Due to the large quantities of fake coins in the real world, it is impossible to examine them manually. To address this issue, we present an intelligent system to automatically detect fake coins based on their images. The intelligent system consists of two components: coin image representation and classifier learning. To represent the coin image, a new spatially enhanced bag-of-visual-words model, called SEBOVW model, is proposed. Afterwards, we improve the representation by building a genuine difference subspace. The coin is finally represented based on its projection onto this subspace. In order to discriminate between genuine and fake coins, we train a classifier using the subspace representations. A thorough evaluation of the proposed intelligent system has been conducted on four coin datasets, consisting of thousands of coins of different denominations and from two countries. Promising experimental results in excess of 98% accuracy demonstrate its effectiveness and validity.
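
For readers unfamiliar with the underlying representation, the sketch below shows a plain bag-of-visual-words histogram: each local descriptor is assigned to its nearest codeword and the assignments are counted. The spatial enhancement that defines SEBOVW is not described in the abstract and is not reproduced here; the function names and the toy codebook in the test are hypothetical.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codeword closest to the descriptor (squared Euclidean)."""
    best_index, best_distance = 0, float("inf")
    for index, word in enumerate(codebook):
        distance = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index


def bovw_histogram(descriptors, codebook):
    """Normalized count of descriptors assigned to each codeword."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_codeword(descriptor, codebook)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [count / total for count in counts]
```

The histogram discards where in the image each descriptor came from, which is the limitation a spatially enhanced variant is meant to address.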

3 citations


Book ChapterDOI
19 Oct 2020
TL;DR: Social media provides a powerful platform for individuals to communicate globally, but it can also be used by malevolent individuals; the motivation of this work is to help protect children from this potentially hostile environment without excluding them from its benefits.
Abstract: Social media provides a powerful platform for individuals to communicate globally. This capability has many benefits, but it can also be used by malevolent individuals, i.e. predators. Anonymity exacerbates the problem. The motivation of our work is to help protect our children from this potentially hostile environment, without excluding them from utilizing its benefits.

2 citations


Book ChapterDOI
19 Oct 2020
TL;DR: An end-to-end Deep Convolutional Neural Network system for license plate recognition that is not limited to a specific region or country, and applies a modified version of YOLO v2 to first recognize the vehicle and then locate the license plate.
Abstract: The current advancements in machine intelligence have expedited the process of recognizing vehicles and other objects on the roads. Several methods including Deep Learning techniques have been proposed recently for LPR, yet those methods are limited to specific regions or privately collected datasets. In this paper, we propose an end-to-end Deep Convolutional Neural Network system for license plate recognition that is not limited to a specific region or country. We apply a modified version of YOLO v2 to first recognize the vehicle and then locate the license plate. Moreover, through the convolutional procedures, we improve an Optical Character Recognition network (OCR-Net) to recognize the license plate numbers and letters. Our method performs well for different vehicle types. Our system overcomes tilted and distorted license plate images and performs adequately under various illumination conditions, and noisy backgrounds. Our experimental results on 4,837 images of stationary and moving vehicles (cars, buses, motorbikes, and trucks) from different countries show that our proposed system achieved recognition rates between 88.5% and 98.04%, outperforming the state-of-the-art commercial and academic methods for challenging images.

Book ChapterDOI
19 Oct 2020
TL;DR: Zhang et al. propose a deep learning method for single image super-resolution that learns an end-to-end mapping between low- and high-resolution images, using a VGG19 network for feature extraction, setting the discriminator network's working space as feature space, and adding a loss function based on the mean square error of pixel space.
Abstract: In medical imaging, high-resolution images are expected to have the ability to deliver a more precise diagnosis with the practical application of high-resolution displays. This research proposes a deep learning method for single image super-resolution that learns an end-to-end mapping between the low and high-resolution images. It redesigns the SRGAN, using VGG19 network for feature extraction, setting discriminator network’s working space as feature space, and adding the loss function based on the mean square error of pixel space, gaining more details by incorporating SRCNN layers to increase the PSNR in the reconstruction at the same time. To thoroughly investigate the system, we compared the performance with other architectures on MNIST and CIFAR-10 dataset with a further evaluation conducted on Chest x-ray.
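
The abstract refers to a pixel-space mean square error loss and to PSNR without defining them. The standard definitions are given below; the symbols $I$ (reference image), $\hat{I}$ (reconstruction), $m \times n$ (image size), and $\mathrm{MAX}_I$ (largest possible pixel value, for example 255 for 8-bit images) are notation assumed here.

```latex
\[
  \mathrm{MSE} = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n}
      \bigl( I(i,j) - \hat{I}(i,j) \bigr)^{2},
  \qquad
  \mathrm{PSNR} = 10 \, \log_{10}\!\left( \frac{\mathrm{MAX}_I^{2}}{\mathrm{MSE}} \right) \ \text{dB}.
\]
```

Since PSNR decreases monotonically as MSE grows, adding a pixel-space MSE term to the loss pushes the reconstruction toward a higher PSNR.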

Book ChapterDOI
19 Oct 2020
TL;DR: The Wartegg Drawing Completion Test (WDCT) is one of the most commonly used personality analysis tools of graphology; this paper proposes a computer-aided WDCT (CA-WDCT) system, a fully automatic system based on Digital Image Processing (DIP) and machine learning techniques.
Abstract: Wartegg Drawing Completion Test (WDCT) is one of the most commonly used personality analysis tools of graphology, which is the mapping from the inside world to personal qualities. It helps institutes or individuals to have a better knowledge of intrinsic personality characters. However, the WDCT evaluation of the applicants was manually performed by human experts, thus the accessibility and outcome of WDCT are heavily restricted by the availability and experience of the expert. To overcome such issues, this paper proposes the computer-aided WDCT (CA-WDCT) system, a fully-automatic WDCT system based on Digital Image Processing (DIP) and Machine Learning techniques. The CA-WDCT system extracts multimodal features and analyzes them under the Big-Five traits automatically. This CA-WDCT system can mitigate the heavy manual labour of psychologists and provide clients with flexible access.

Posted Content
TL;DR: The authors presented a comprehensive review of research towards robust pattern recognition from the perspective of breaking three basic and implicit assumptions: closed-world assumption, independent and identically distributed assumption, and clean and big data assumption, which form the foundation of most pattern recognition models.
Abstract: The accuracies for many pattern recognition tasks have increased rapidly year by year, achieving or even outperforming human performance. From the perspective of accuracy, pattern recognition seems to be a nearly-solved problem. However, once launched in real applications, the high-accuracy pattern recognition systems may become unstable and unreliable, due to the lack of robustness in open and changing environments. In this paper, we present a comprehensive review of research towards robust pattern recognition from the perspective of breaking three basic and implicit assumptions: closed-world assumption, independent and identically distributed assumption, and clean and big data assumption, which form the foundation of most pattern recognition models. Actually, our brain is robust at learning concepts continually and incrementally, in complex, open and changing environments, with different contexts, modalities and tasks, by showing only a few examples, under weak or noisy supervision. These are the major differences between human intelligence and machine intelligence, which are closely related to the above three assumptions. After witnessing the significant progress in accuracy improvement nowadays, this review paper will enable us to analyze the shortcomings and limitations of current methods and identify future research directions for robust pattern recognition.

Book ChapterDOI
19 Oct 2020
TL;DR: This study summarizes recent contributions of deep learning neural networks to Peripheral Blood Smear (PBS) analysis, where they have shown impressive performance, along with the main challenges and future directions.
Abstract: Peripheral Blood Smear (PBS) analysis is a routine test carried out in specialized medical laboratories by specialists to assess some aspects of health status that are measured and assessed through blood. PBS analysis is prone to human errors and the usage of computer-based analysis can greatly enhance this process in terms of accuracy and cost. Despite the challenges, Deep Learning neural networks have shown impressive performance in this context. In this study the recent contributions are summarized along with the main challenges and future directions in this context.

Book ChapterDOI
05 Jan 2020
TL;DR: The proposed method uses CNN object detectors to propose coarse guide panels, then uses heuristics to propose panel candidates, and finally optimizes an energy function to select the most plausible candidates; the CNN ensures roughly localized detection of almost all kinds of panels.
Abstract: Panels are the fundamental elements of manga pages, and hence their detection serves as the basis of high-level manga content understanding. Existing panel detection methods could be categorized into heuristic-based methods and CNN-based (Convolutional Neural Network-based) ones. Although the former can accurately localize panels, they cannot handle elaborate panels well and require considerable effort to hand-craft rules for every new hard case. In contrast, detection results of CNN-based methods could be rough and inaccurate. We utilize CNN object detectors to propose coarse guide panels, then use heuristics to propose panel candidates and finally optimize an energy function to select the most plausible candidates. The CNN assures roughly localized detection of almost all kinds of panels, while the follow-up procedure refines the detection results and minimizes the margin between detected panels and ground truth with the help of heuristics and energy minimization. Experimental results show the proposed method surpasses previous methods regarding panel detection F1-score and page accuracy.

Book ChapterDOI
19 Oct 2020
TL;DR: In this paper, the authors used the YOLO (You Only Look Once) model to detect the hand-drawn square boxes from the Wartegg Zeichen Test (WZT) form.
Abstract: The Wartegg Zeichen Test (WZT) is a method of personality evaluation developed by the psychologist Ehrig Wartegg. Three new scoring categories for the WZT consist of Evocative Character, Form Quality, and Affective Quality. In this paper, we present an object detection model for scoring the Affective Quality of WZT. Our work consists of two main parts: 1) using the YOLO (You Only Look Once) model to detect the hand-drawn square boxes from the WZT form, and 2) using YOLO to detect the object in the hand-drawn square box. In the experiments, YOLOv3 achieved an mAP of 88.94% for hand-drawn square box detection and 46.90% for hand-drawn object detection.

Journal ArticleDOI
TL;DR: The proposed method achieved precision and recall rates as high as 99.6% and 99.3% respectively, demonstrating the effectiveness and robustness of the selected edge features in authenticating coins.
Abstract: The number of counterfeit coins released into circulation is persistently increasing. According to official reports, the vast majority of these coins are circulated in the European Union member countries. This paper presents a robust method for counterfeit coin detection based on coin stamp differences between genuine and counterfeit coins. A set of measures based on edge differences are proposed in this paper. The proposed method compares the edge width, edge thickness, number of horizontal and vertical edges, and total number of edges between a test coin and a set of genuine reference coins. The method extends the measures to generate a defect map by subtracting the test coin image from the reference coins to count the number of pixels in small regions of the coin. Additionally, the Signal-to-Noise Ratio (SNR), Mean Square Error (MSE), and Structural Similarity (SSIM), which are well-known measures to track the differences between two images, are also applied to the coin image. The sets of features are then placed into index space where each vector represents the features of one test coin and a reference coin. The final feature vector represents the feature set of one test coin and is computed by averaging the feature values of the vectors in the index space. This feature vector is used to train a classifier to learn the edge feature differences between the two classes. The proposed method achieved precision and recall rates as high as 99.6% and 99.3% respectively, demonstrating the effectiveness and robustness of the selected edge features in authenticating coins. The method was evaluated on a real-life dataset of Danish coins as part of a collaborative effort.
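
The averaging step described in this abstract (one feature vector per test/reference pair, then a mean over all reference coins) can be sketched as follows. Only two stand-in features are used, MSE and a difference in edge-pixel counts; the paper's actual edge measures, SNR, and SSIM are not reproduced, and all function names and the threshold are hypothetical.

```python
def mean_squared_error(image_a, image_b):
    """MSE between two equally sized grayscale images (lists of rows)."""
    pixels_a = [p for row in image_a for p in row]
    pixels_b = [p for row in image_b for p in row]
    if len(pixels_a) != len(pixels_b) or not pixels_a:
        raise ValueError("images must be non-empty and the same size")
    return sum((a - b) ** 2 for a, b in zip(pixels_a, pixels_b)) / len(pixels_a)


def edge_pixel_count(edge_map, threshold=128):
    """Number of pixels at or above the threshold in an edge map (assumed input)."""
    return sum(1 for row in edge_map for p in row if p >= threshold)


def pair_features(test_image, reference_image):
    """Feature vector for one (test coin, reference coin) pair."""
    edge_difference = abs(edge_pixel_count(test_image) - edge_pixel_count(reference_image))
    return [mean_squared_error(test_image, reference_image), edge_difference]


def averaged_feature_vector(test_image, reference_images):
    """Average the pairwise feature vectors over all reference coins."""
    if not reference_images:
        raise ValueError("at least one reference coin is required")
    rows = [pair_features(test_image, ref) for ref in reference_images]
    return [sum(column) / len(rows) for column in zip(*rows)]
```

Averaging over several genuine references makes the final vector less sensitive to wear or lighting differences in any single reference coin.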

Book ChapterDOI
19 Oct 2020
TL;DR: Zhang et al. propose a multi-modal commodity classification algorithm based on image and text information (MMIT), training the MMIT model on a dataset and then employing the trained MMIT classifier to classify different commodities.
Abstract: Image and text information exist on almost every commodity web page. Although these two kinds of information belong to different modalities, both describe the same commodity, so there must be a certain relationship between them. We name this relationship “symbiosis and complementary”, and propose a multi-modal commodity classification algorithm based on image and text information (MMIT). Firstly, we use the \(\ell _{2,0}\) mixed norm to optimize a sparse representation method for image classification, and then employ Bayesian posterior probability to optimize a k-nearest neighbor method for text classification. Secondly, we fuse the classification results of the two modalities and build the MMIT mathematical model. Finally, we utilize a dataset to train the MMIT model, and then employ the trained MMIT classifier to classify different commodities. Experimental results show that our method can achieve better classification performance than other state-of-the-art methods, which only exploit image information.

Book ChapterDOI
19 Oct 2020
TL;DR: In this article, the authors use Convolutional Neural Networks (CNNs) to investigate whether the color modality of handwritten digits affects their recognition rate, comparing three color modalities of Persian handwritten digits.
Abstract: Most of the methods on handwritten digit recognition in the literature are focused and evaluated on black and white image databases. In this paper we try to answer a fundamental question in document recognition. Using Convolutional Neural Networks (CNNs), we investigate whether color modalities of handwritten digits affect their recognition rate. To the best of our knowledge, so far this question has not been answered due to the lack of handwritten digit databases that have all three color modalities of handwritten digits. To answer this question, we select 13,330 isolated digits from a novel Persian handwritten database, which have three different color modalities and are unique in terms of size and variety. Our selected dataset is divided into training, validation, and testing sets. Afterward, similar conventional CNN models are trained with the samples of our training set. While the experimental results on the testing set show that CNN on the black and white digit images has a better performance compared to the other two color modalities, in general there are no significant differences in network accuracy across the color modalities. Also, comparisons of training times in the three color modalities show that recognition of handwritten digits in black and white using CNN is much more efficient.

Book ChapterDOI
19 Oct 2020
TL;DR: In this article, a Hybrid Multiple Classifier System, combining three classifiers with rejection options (XGBoost, Random Forest, and SVM), was designed and deployed in production.
Abstract: In an insurance company, manual underwriting is costly, time consuming, and complex. Simulating underwriters with AI is a time-saving and cost-effective solution. As a result, a Hybrid Multiple Classifier System, combining three classifiers with rejection options (XGBoost, Random Forest, and SVM), was designed and deployed in production. An optimal rejection criterion on classification, so-called Linear Discriminant Analysis Measurement (LDAM), is applied for the first time in industry. This system is the first AI driven underwriting system in Canadian life insurance, and it helps Manulife expand digital capabilities, reorient customer experience focus and grow its business.
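
To illustrate what "classifiers with rejection options" can mean in practice, the sketch below lets each classifier abstain when its top probability is too low and refers the case to a human when too few classifiers agree. This is a generic confidence-threshold scheme, not the LDAM criterion, which the abstract does not define. The majority-vote combination rule, the thresholds, and all names are assumptions.

```python
from collections import Counter

# Hypothetical label for cases handed back to a human underwriter.
REJECT = "refer_to_underwriter"


def classify_with_rejection(probabilities, threshold):
    """Return the most probable label, or None if its probability is below threshold."""
    label, confidence = max(probabilities.items(), key=lambda item: item[1])
    return label if confidence >= threshold else None


def ensemble_decision(per_classifier_probabilities, threshold=0.8, min_votes=2):
    """Majority vote over the classifiers that did not abstain."""
    votes = Counter()
    for probabilities in per_classifier_probabilities:
        label = classify_with_rejection(probabilities, threshold)
        if label is not None:
            votes[label] += 1
    if not votes:
        return REJECT
    ranked = votes.most_common(2)
    top_label, top_count = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == top_count
    # Reject when support is too thin or the vote is split evenly.
    if top_count < min_votes or tied:
        return REJECT
    return top_label
```

The design trades coverage for reliability: raising `threshold` or `min_votes` sends more cases to manual review but lowers the risk of an automated wrong decision.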

Book ChapterDOI
19 Oct 2020
TL;DR: In this article, the authors focus on using the subscriber location field in the profile of each candidate to estimate support in each state and find a correlation between popularity on Twitter, and eventual popular vote in the election.
Abstract: Increasingly, politicians and political parties are engaging their electors using social media. In the US Federal Election of 2016, candidates from both parties made heavy use of social media, particularly Twitter. It is then reasonable to attempt to find a correlation between popularity on Twitter, and eventual popular vote in the election. In this paper, we will focus on using the subscriber ‘location’ field in the profile of each candidate to estimate support in each state.
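
A minimal sketch of the per-state tally described in this abstract, assuming the input is a list of free-text profile locations collected from a candidate's followers. The lookup table is deliberately incomplete, and the matching rules and all names are hypothetical; the abstract does not describe how locations are parsed.

```python
import re
from collections import Counter

# Hypothetical, incomplete lookup table: state code -> full state name.
STATES = {"CA": "California", "NY": "New York", "TX": "Texas"}


def state_from_location(location):
    """Map a free-text profile location to a state code, or None if unknown."""
    text = location.strip()
    lowered = text.lower()
    for code, name in STATES.items():
        # Codes must appear as whole, upper-case words ("Austin, TX");
        # full names are matched case-insensitively ("texas").
        if re.search(r"\b" + code + r"\b", text) or name.lower() in lowered:
            return code
    return None


def followers_by_state(locations):
    """Count followers per state, skipping locations that cannot be resolved."""
    counts = Counter()
    for location in locations:
        code = state_from_location(location)
        if code is not None:
            counts[code] += 1
    return counts
```

Free-text locations are noisy, so a real pipeline would need a full state table and a policy for ambiguous or joke entries; unresolved entries are simply dropped here.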

Book ChapterDOI
19 Oct 2020
TL;DR: In this paper, a classifier ensemble using nonlinear manifolds is presented, where Grassmann manifolds as some particular case of Riemannian manifolds are constructed using decision profiles.
Abstract: In this paper, we briefly present classifier ensembles making use of nonlinear manifolds. Riemannian manifolds have been created using classifier interactions which are presented as symmetric and positive-definite (SPD) matrices. Grassmann manifolds as some particular case of Riemannian manifolds are constructed using decision profiles. Experimental routine shows advantages of Riemannian geometry and nonlinear manifolds for classifier ensemble learning.

Book ChapterDOI
19 Oct 2020
TL;DR: The authors fine-tuned pre-trained BERT models for toxic language detection, performing on a par with the state of the art on the Twitter dataset and outperforming it on the Wikipedia dataset.
Abstract: The rapid growth in social communication increases the importance of detecting toxic language. However, detecting toxic language is difficult because of deliberately noisy words and a lack of labeled data. These issues cause a low recall in toxic language detection. To address these, we utilized pre-trained BERT models for toxic language detection. We hypothesize that pre-trained sub-word embeddings allow BERT models to quickly learn the meaning of obfuscated words and hence improve the recall of the models on toxic language detection. Our results confirm this hypothesis and show that fine-tuned BERT models perform on a par with the state-of-the-art on the Twitter dataset and outperform the state-of-the-art on the Wikipedia dataset.
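
The hypothesis in this abstract rests on sub-word tokenization: an obfuscated word that is absent from the vocabulary can still be split into known pieces. The sketch below implements the greedy longest-match-first WordPiece procedure used by BERT tokenizers, with a tiny toy vocabulary that is purely hypothetical.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one word into sub-word pieces.

    Pieces that continue a word carry the "##" prefix. If some position
    cannot be matched at all, the whole word maps to the unknown token.
    """
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1  # Shorten the candidate and try again.
        if piece is None:
            return [unk_token]
        tokens.append(piece)
        start = end
    return tokens


# Toy vocabulary for illustration only; a real BERT vocabulary has tens of
# thousands of pieces.
TOY_VOCAB = {"stu", "##pid", "##p", "##1", "##d"}
```

With this vocabulary, an obfuscated spelling still shares its leading piece with the clean spelling, which is the kind of overlap the abstract suggests lets the model recover meaning.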

Book ChapterDOI
19 Oct 2020
TL;DR: In this article, a blob-detector-based image method using fuzzy association rule mining is proposed to detect counterfeit coins; it surpasses the compared methods in classification accuracy and can be utilized for other applications based on image content.
Abstract: Image processing techniques using the knowledge obtained from known historical data have recently become one of the most intensively studied topics in decision science and computer science. This paper presents an automatic system for fake coin detection based on image content. In this study, a blob detector image-based method by fuzzy association rules mining is proposed to detect counterfeit coins. This method consists of two stages. In the first stage, the original image dataset is preprocessed by a blob detector. This provides all frequent features that must be mined in the next stage. In the second stage, fuzzy association rules mining extracts the effective fuzzy rules and automatically classifies the coin image data. The performance of the proposed method has been compared with some other methods, and we demonstrate that our framework surpasses them in terms of classification accuracy, reaching a desirable level when compared with recent studies in this field. This research demonstrates that the proposed framework is a reliable intelligent detection system and can be utilized for other applications based on image content.