Showing papers by "Ching Y. Suen" published in 2019


Journal ArticleDOI
TL;DR: This paper finds that for two distributions in hierarchically organized data space, WD has a closed-form solution, which is called “hierarchical WD (HWD),” and uses this theory to construct novel loss functions that overcome the shortcomings of CE loss.
Abstract: In multiclass classification, convolutional neural network (CNN) is generally coupled with the cross-entropy (CE) loss, which only penalizes the predicted probability corresponding to a ground truth class and ignores the interclass relationship. We argue that CNN can be improved by using a better loss function. On the other hand, the Wasserstein distance (WD) is a well-known metric used to measure the distance between two distributions. Directly solving the WD problem requires a prohibitively large amount of computation time, whereas the cheaper iterative algorithms have a variety of shortcomings such as computational instability and difficulty in selecting parameters. In this paper, we address these issues by giving an analytical solution to the WD problem—for the first time, we find that for two distributions in hierarchically organized data space, WD has a closed-form solution, which we call “hierarchical WD (HWD).” We use this theory to construct novel loss functions that overcome the shortcomings of CE loss. To this end, multi-CNN information fusion that provides the basis for building category hierarchies is carried out first. Then, the semantic relationship among classes is modeled as a binary tree. Then, CNN coupled with an HWD-based loss, i.e., hierarchical Wasserstein CNN (HW-CNN), is trained to learn deep features. In this way, prior knowledge about the interclass relationship is embedded into HW-CNN, and information from several CNNs provides guidance in the process of training individual HW-CNNs. We conducted extensive experiments over two publicly available remote sensing data sets and achieved a state-of-the-art performance in scene classification tasks.
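The abstract does not give the HWD formula itself, so the following is only a minimal sketch of the general idea, assuming the standard closed form for the 1-Wasserstein distance on a tree metric: each edge contributes its length times the absolute difference in probability mass lying below it. The function name `tree_wasserstein`, the unit edge lengths, and the toy seven-node binary tree are illustrative assumptions, not the paper's exact loss.

```python
from typing import List, Optional


def tree_wasserstein(parent: List[Optional[int]],
                     edge_len: List[float],
                     p: List[float],
                     q: List[float]) -> float:
    """1-Wasserstein distance between p and q on a tree metric.

    parent[i] is the parent of node i (None for the root) and
    edge_len[i] is the length of the edge from i to parent[i].
    Assumes p and q carry the same total mass.
    """
    n = len(parent)
    # Net mass that must leave (or enter) each node.
    diff = [p[i] - q[i] for i in range(n)]

    # Depth of every node, so children are processed before parents.
    def depth(i: int) -> int:
        d = 0
        while parent[i] is not None:
            i = parent[i]
            d += 1
        return d

    order = sorted(range(n), key=depth, reverse=True)

    total = 0.0
    for i in order:
        if parent[i] is None:
            continue
        # All surplus mass in the subtree of i has to cross edge (i, parent).
        total += edge_len[i] * abs(diff[i])
        diff[parent[i]] += diff[i]
    return total


# Hypothetical class hierarchy: root 0, internal nodes 1 and 2,
# leaves 3 and 4 under node 1, leaves 5 and 6 under node 2.
parent = [None, 0, 0, 1, 1, 2, 2]
edge_len = [0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

truth = [0, 0, 0, 1.0, 0, 0, 0]    # ground truth is leaf 3
sibling = [0, 0, 0, 0, 1.0, 0, 0]  # prediction on the sibling class
cousin = [0, 0, 0, 0, 0, 1.0, 0]   # prediction in the other branch

d_sibling = tree_wasserstein(parent, edge_len, truth, sibling)
d_cousin = tree_wasserstein(parent, edge_len, truth, cousin)
print(d_sibling, d_cousin)
```

In this toy tree a prediction on the sibling class is penalized less than a prediction in the other branch, which is the kind of interclass structure that a plain CE loss ignores.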

59 citations


Journal ArticleDOI
TL;DR: A new flexible approach to predict the gender of the writers from their handwriting samples, named kernel mutual information (KMI), that focuses on feature selection, which can decrease redundancies and conflicts.

27 citations


Journal ArticleDOI
TL;DR: This paper presents a new vehicle speed measurement approach based on motion detection that estimates a vehicle’s speed by analyzing its motion parameters inside a pre-defined Region of Interest (ROI) with specified dimensions.
Abstract: Video-based vehicle speed measurement systems are known as effective applications for Intelligent Transportation Systems (ITS) due to their great development capabilities and low costs. These systems utilize camera outputs to apply video processing techniques and extract the desired information. This paper presents a new vehicle speed measurement approach based on motion detection. Contrary to feature-based methods that need visual features of the vehicles, such as the license plate or windshield, the proposed method is able to estimate a vehicle’s speed by analyzing its motion parameters inside a pre-defined Region of Interest (ROI) with specified dimensions. This capability provides real-time computing and performs better than feature-based approaches. The proposed method consists of three primary modules: vehicle detection, tracking, and speed measurement. Each moving object is detected as it enters the ROI by means of the Mixture-of-Gaussian background subtraction method. Then, by applying morphology transforms, the distinct parts of these objects are turned into unified filled shapes, and some defined filtration functions leave behind only the objects with the highest possibility of being a vehicle. Detected vehicles are then tracked using a blob tracking algorithm, and their displacement among sequential frames is calculated for the final speed measurement module. The outputs of the system include the vehicle’s image, its corresponding speed, and detection time. Experimental results show that the proposed approach has an acceptable accuracy in comparison with current speed measurement systems.
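The final speed measurement step can be illustrated with a small sketch. It assumes the tracker yields one blob centroid per frame and that the known ROI dimensions give a single uniform scale in metres per pixel (no perspective correction); the function name `estimate_speed_kmh` and both parameters are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Tuple


def estimate_speed_kmh(centroids_px: List[Tuple[float, float]],
                       fps: float,
                       metres_per_pixel: float) -> float:
    """Average speed of one tracked blob inside the ROI, in km/h.

    centroids_px holds the blob centroid in consecutive frames.
    metres_per_pixel is an assumed uniform scale derived from the
    known ROI dimensions.
    """
    if len(centroids_px) < 2:
        raise ValueError("need at least two frames to measure displacement")

    # Sum the frame-to-frame displacement of the centroid in pixels.
    distance_px = sum(
        math.hypot(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(centroids_px, centroids_px[1:])
    )
    elapsed_s = (len(centroids_px) - 1) / fps
    metres_per_second = distance_px * metres_per_pixel / elapsed_s
    return metres_per_second * 3.6  # convert m/s to km/h


# Hypothetical track: 10 px per frame at 30 fps with 0.05 m per pixel,
# i.e. 1.5 m in 0.1 s, which is about 54 km/h.
track = [(0.0, 0.0), (0.0, 10.0), (0.0, 20.0), (0.0, 30.0)]
speed = estimate_speed_kmh(track, fps=30.0, metres_per_pixel=0.05)
print(round(speed, 2))
```

A real deployment would need a calibrated mapping from image to road coordinates, since a single scale factor is only reasonable when the ROI is small and viewed close to top-down.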

17 citations


Book ChapterDOI
27 Aug 2019
TL;DR: A pre-trained Convolutional Neural Network, originally trained on relatively similar datasets for the face recognition task, namely Ms-Celeb-1M and VGGFace2, is utilized to acquire high-level and robust features of female face images, followed by a stacking ensemble model that combines the predictions of several base models to predict the attractiveness of a face.
Abstract: Automatic analysis of facial beauty has become an emerging research topic in recent years and has fascinated many researchers. One of the key challenges of facial attractiveness prediction is to obtain an accurate and discriminative face representation. This study provides a new framework to analyze the attractiveness of female faces using a transfer learning methodology as well as a stacking ensemble model. Specifically, a pre-trained Convolutional Neural Network (CNN), originally trained on relatively similar datasets for the face recognition task, namely Ms-Celeb-1M and VGGFace2, is utilized to acquire high-level and robust features of female face images. This is followed by a stacking ensemble model that combines the predictions of several base models to predict the attractiveness of a face. Extensive experiments conducted on the SCUT-FBP and SCUT-FBP5500 benchmark datasets confirm the strong robustness of the proposed approach. Prediction correlations of 0.89 and 0.91 are achieved by our new method for the SCUT-FBP and SCUT-FBP5500 datasets, respectively, which indicates significant advantages over other state-of-the-art work. Moreover, these results support the efficacy of transfer learning when applying deep learning techniques to compute facial attractiveness.
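The stacking step can be sketched as follows. This is an illustrative example only: the abstract does not name the base models or the meta-learner, so three ridge regressors and a ridge meta-learner are assumptions, and random vectors stand in for the CNN features. The helper names `ridge_fit`, `ridge_predict`, and `stack_fit_predict` are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge weights; a ones column adds a (regularized) bias."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + alpha * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)


def ridge_predict(X, w):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w


def stack_fit_predict(X_train, y_train, X_test,
                      alphas=(0.1, 1.0, 10.0), n_folds=5):
    """Train base models, then a meta-learner on their out-of-fold outputs."""
    n = X_train.shape[0]
    oof = np.zeros((n, len(alphas)))
    for val_idx in np.array_split(np.arange(n), n_folds):
        train_idx = np.setdiff1d(np.arange(n), val_idx)
        for j, alpha in enumerate(alphas):
            w = ridge_fit(X_train[train_idx], y_train[train_idx], alpha)
            # Each base model predicts only on data it was not trained on.
            oof[val_idx, j] = ridge_predict(X_train[val_idx], w)

    # The meta-learner maps base-model predictions to the final score.
    meta_w = ridge_fit(oof, y_train, 1e-6)

    # Refit every base model on all training data for test-time predictions.
    test_base = np.column_stack([
        ridge_predict(X_test, ridge_fit(X_train, y_train, alpha))
        for alpha in alphas
    ])
    return ridge_predict(test_base, meta_w)


# Synthetic stand-in for CNN features and attractiveness ratings.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=200)

pred = stack_fit_predict(X[:150], y[:150], X[150:])
corr = np.corrcoef(pred, y[150:])[0, 1]  # Pearson correlation on held-out data
print(corr)
```

Training the meta-learner on out-of-fold predictions, rather than on predictions for data the base models have already seen, keeps it from simply trusting whichever base model overfits most.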

6 citations



Proceedings ArticleDOI
26 Aug 2019
TL;DR: Two license plate fonts, Mandatory and Driver Gothic, are evaluated in context on two datasets using two commercial products, OpenALPR and Plate Recognizer; the results show that the Driver font has less severe confusion cases than the Mandatory font.
Abstract: The chosen font type in the license plate (LP) plays a vital role in the recognition phase in computer-based operations. Some fonts are challenging for humans to read; however, other fonts are challenging for computer systems to recognize. Here, we present two sets of results for font evaluation: font anatomy results, and recognition results for commercial products. For anatomy results, two typical LP fonts are considered: Mandatory, and Driver Gothic. Moreover, we evaluate the effect of these fonts in context for two datasets using two commercial products: OpenALPR and Plate Recognizer. The font anatomy results revealed some important confusion cases and some quality features of both fonts. The obtained results show that the Driver font has less severe confusion cases than the Mandatory font.

2 citations


Journal ArticleDOI
TL;DR: This research presents a client and server-based mobile application for recognition and authentication of banknotes; the system extracts features including the shape context and the Scale Invariant Feature Transform.
Abstract: This research presents a client and server-based mobile application for recognition and authentication of banknotes; the system extracted the shape context (SC), Scale Invariant Feature Transform (...