Author

Mahdi Pourmirzaei

Bio: Mahdi Pourmirzaei is an academic researcher. The author has contributed to research in topics: Supervised learning & Phosphorylation. The author has an h-index of 2 and has co-authored 5 publications receiving 8 citations.

Papers
Posted Content
13 May 2021
TL;DR: In this article, a Hybrid Learning (HL) framework was proposed for standard Supervised Learning (SL), which used self-supervised co-training with SL in a Multi-Task Learning (MTL) manner.
Abstract: In this paper, the impact of ImageNet pre-training on Facial Expression Recognition (FER) was first tested under different augmentation levels. The results showed that training from scratch could reach better performance than ImageNet fine-tuning at stronger augmentation levels. A framework was then proposed for standard Supervised Learning (SL), called Hybrid Learning (HL), which used self-supervised co-training with SL in a Multi-Task Learning (MTL) manner. Leveraging Self-Supervised Learning (SSL) could gain additional information from the input data, such as spatial information from faces, which helped the main SL task. It was investigated how this method could be applied to FER with self-supervised pretext tasks such as Jigsaw puzzling and in-painting. These two pretext tasks helped the supervised head (SH) lower the error rate under different augmentations and in low-data regimes with the same training settings. The state of the art was reached on AffectNet via two completely different HL methods, without utilizing additional datasets. Moreover, HL's effect was shown on two other facial-related problems, head pose estimation and gender recognition, where it reduced the error rate by up to 9% and 1%, respectively. The HL methods also helped prevent the model from overfitting.
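A minimal PyTorch sketch of the HL idea described in this abstract: one shared encoder feeds both a supervised head for expression logits and a self-supervised head (here guessing a jigsaw permutation id), and the two cross-entropy losses are summed. The tiny encoder, dimensions, loss weighting, and the use of a single batch for both tasks are illustrative assumptions, not the paper's exact setup.

```python
# Hedged sketch of hybrid (SL + SSL) multi-task training, not the authors' code.
import torch
import torch.nn as nn

class HybridModel(nn.Module):
    def __init__(self, n_emotions=8, n_permutations=24):
        super().__init__()
        # Shared backbone (a tiny conv encoder as a stand-in for the real one).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.supervised_head = nn.Linear(64, n_emotions)   # SH: expression logits
        self.ssl_head = nn.Linear(64, n_permutations)      # e.g. jigsaw permutation id

    def forward(self, x):
        z = self.encoder(x)
        return self.supervised_head(z), self.ssl_head(z)

model = HybridModel()
ce = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# One hybrid training step: the same batch stands in for both the labeled
# faces and the transformed (e.g. tile-shuffled) inputs, for brevity.
images = torch.randn(16, 3, 96, 96)
emotion_labels = torch.randint(0, 8, (16,))
perm_labels = torch.randint(0, 24, (16,))

emo_logits, perm_logits = model(images)
loss = ce(emo_logits, emotion_labels) + 1.0 * ce(perm_logits, perm_labels)
opt.zero_grad(); loss.backward(); opt.step()
```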

6 citations

Posted Content
TL;DR: In this paper, the impact of ImageNet pre-training on fine-grained facial expression recognition (FER) was tested, and the results showed that training from scratch outperforms ImageNet fine-tuning at stronger augmentation levels.
Abstract: Over the past few years, the best SSL methods have gradually moved from pretext-task learning to contrastive learning. However, contrastive methods have drawbacks that have not been fully solved, such as performing poorly on fine-grained visual tasks compared to supervised learning methods. In this study, the impact of ImageNet pre-training on fine-grained Facial Expression Recognition (FER) was first tested. The results showed that training from scratch is better than ImageNet fine-tuning at stronger augmentation levels. A framework was then proposed for standard Supervised Learning (SL), called Hybrid Multi-Task Learning (HMTL), which merged self-supervision as an auxiliary task into the SL training setting. Leveraging Self-Supervised Learning (SSL) can extract additional information from the input data beyond the labels, which can help the main fine-grained SL task. It was investigated how this method could be used for FER by designing two customized versions of common pretext techniques, Jigsaw puzzling and in-painting. The state of the art was reached on AffectNet via two types of HMTL, without pre-training on additional datasets. Moreover, the difference between SSL pre-training and HMTL was shown to demonstrate the superiority of the proposed method. Furthermore, the impact of the proposed method was shown on two other fine-grained facial tasks, head pose estimation and gender recognition, where it reduced the error rate by 11% and 1%, respectively.
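As a concrete illustration of a jigsaw pretext task of the kind customized in this paper, the sketch below shuffles an image's 2x2 tiles and returns the permutation index as a self-supervised label. The 2x2 grid and the four-permutation set are simplifying assumptions; published jigsaw methods typically use a 3x3 grid with a larger permutation subset.

```python
# Hedged sketch of a jigsaw pretext transform (2x2 grid for brevity).
import random
import torch

PERMUTATIONS = [(0, 1, 2, 3), (1, 0, 3, 2), (2, 3, 0, 1), (3, 2, 1, 0)]

def jigsaw(image: torch.Tensor):
    """Shuffle a CxHxW image's 2x2 tiles; return (shuffled image, perm index)."""
    c, h, w = image.shape
    hh, hw = h // 2, w // 2
    tiles = [image[:, :hh, :hw], image[:, :hh, hw:],
             image[:, hh:, :hw], image[:, hh:, hw:]]
    idx = random.randrange(len(PERMUTATIONS))
    p = PERMUTATIONS[idx]
    top = torch.cat([tiles[p[0]], tiles[p[1]]], dim=2)     # left|right, top row
    bottom = torch.cat([tiles[p[2]], tiles[p[3]]], dim=2)  # left|right, bottom row
    return torch.cat([top, bottom], dim=1), idx

shuffled, perm_label = jigsaw(torch.randn(3, 96, 96))
```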

3 citations

Posted Content
TL;DR: In this article, modified versions of jigsaw puzzling and rotation are used as SSL pretext tasks, and the best architecture for Hybrid Multi-Task Learning (HMTL) is found.
Abstract: Recent progress in Self-Supervised Learning (SSL) demonstrates the capability of these methods in the computer vision field. However, this progress has not shown the same promise for fine-grained tasks such as head pose estimation. In this article, we try to answer a question: how can SSL be used for head pose estimation? In general, there are two main approaches to using SSL: (1) using pre-trained weights obtained via SSL tasks, and (2) leveraging SSL as an auxiliary co-training task alongside Supervised Learning (SL) tasks. In this study, modified versions of jigsaw puzzling and rotation are used as SSL pretext tasks, and the best architecture for our proposed Hybrid Multi-Task Learning (HMTL) is found. Finally, the HopeNet method is selected as the SL baseline, and the impact of SSL pre-training and ImageNet pre-training on both HMTL and SL is compared. The HMTL method reduced the error rate by up to 13% compared to SL. Moreover, the HMTL method performed well with all kinds of initial weights: random, ImageNet, and SSL pre-trained. It was also observed that when puzzled images were used for SL alone, the average error rate fell between those of SL and HMTL, showing the importance of local spatial features compared to global spatial features.
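The rotation pretext task mentioned above can be sketched in a few lines: each image is rotated by a random multiple of 90 degrees, and the rotation class becomes the SSL label. This is the generic rotation-prediction formulation, not necessarily the paper's modified version.

```python
# Hedged sketch of the rotation pretext task (generic formulation).
import torch

def rotate_batch(images: torch.Tensor):
    """images: NxCxHxW -> (rotated images, rotation class in {0,1,2,3})."""
    labels = torch.randint(0, 4, (images.size(0),))
    rotated = torch.stack(
        [torch.rot90(img, k=int(k), dims=(1, 2)) for img, k in zip(images, labels)]
    )
    return rotated, labels

rotated, rot_labels = rotate_batch(torch.randn(8, 3, 64, 64))
```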
Posted Content
TL;DR: In this article, the authors designed, implemented, and evaluated a system to personalize the learning environment based on the facial emotions recognition, head pose estimation, and cognitive style of learners.
Abstract: In recent years, the main problem in e-learning has shifted from analyzing content to personalizing the learning environment through Intelligent Tutoring Systems (ITSs). By designing personalized teaching models, learners can have a successful and satisfying experience in achieving their learning goals. Affective Tutoring Systems (ATSs) are a kind of ITS that can recognize and respond to the affective states of learners. In this study, we designed, implemented, and evaluated a system that personalizes the learning environment based on facial emotion recognition, head pose estimation, and the cognitive style of learners. First, a unit called the Intelligent Analyzer (IA) was created, responsible for recognizing the facial expressions and head angles of learners. Next, the ATS was built, composed mainly of two units: the ITS and the IA. Results indicated that with the ATS, participants needed less effort to pass the tests. In other words, when the IA unit was activated, learners passed the final tests in fewer attempts than those for whom the IA unit was deactivated. They also showed an improvement in mean passing score and academic satisfaction.
Posted Content
TL;DR: In this paper, the authors comprehensively reviewed all reversible post-translational modification (PTM) datasets that include phosphorylation sites and showed that there are basically two main approaches to phosphorylation prediction by machine learning: end-to-end and conventional.
Abstract: Reversible Post-Translational Modifications (PTMs) play vital roles in extending the functional diversity of proteins and meaningfully affect the regulation of protein functions in prokaryotic and eukaryotic organisms. PTMs serve as crucial molecular regulatory mechanisms used to regulate diverse cellular processes. Phosphorylation is among the most well-studied PTMs and plays significant roles in many biological processes. Disorders in this modification are linked to multiple diseases, including neurological disorders and cancers. It is therefore necessary to predict the phosphorylation of target residues in an uncharacterized amino acid sequence. Most experimental techniques for determining phosphorylation are time-consuming, costly, and error-prone, so computational methods have increasingly replaced them. A vast amount of phosphorylation data is now publicly accessible through many online databases. In this study, all PTM datasets that include phosphorylation sites (p-sites) were first comprehensively reviewed. Furthermore, we showed that there are basically two main approaches to phosphorylation prediction by machine learning: end-to-end and conventional, and we gave an overview of both. We also introduced 15 important feature extraction techniques that have mostly been used in conventional machine learning methods.
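As a toy illustration of the "conventional" route described above, the sketch below one-hot encodes a fixed-length peptide window around a candidate S/T/Y residue and feeds it to a classical classifier. The window length, the one-hot encoding (standing in for the 15 surveyed feature-extraction techniques), and the synthetic data are all assumptions for illustration, not from any reviewed dataset.

```python
# Hedged sketch of a conventional ML pipeline for p-site prediction.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def encode_window(window: str) -> np.ndarray:
    """One-hot encode a peptide window centered on the candidate residue."""
    vec = np.zeros((len(window), len(AMINO_ACIDS)))
    for pos, aa in enumerate(window):
        if aa in AA_INDEX:
            vec[pos, AA_INDEX[aa]] = 1.0
    return vec.ravel()

# Synthetic stand-in data: 9-residue windows with binary p-site labels.
windows = ["AAAASAAAA", "KRKRSKRKR", "GGGGTGGGG", "PPLPYLPPP"]
labels = [0, 1, 0, 1]

X = np.stack([encode_window(w) for w in windows])
clf = RandomForestClassifier(n_estimators=50).fit(X, labels)
print(clf.predict([encode_window("KRKKSKKRK")]))
```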

Cited by
Proceedings ArticleDOI
16 Sep 2021
TL;DR: In this article, the multi-task learning of lightweight convolutional neural networks is studied for face identification and classification of facial attributes (age, gender, ethnicity) trained on cropped faces without margins.
Abstract: In this paper, the multi-task learning of lightweight convolutional neural networks is studied for face identification and classification of facial attributes (age, gender, ethnicity) trained on cropped faces without margins. The necessity to fine-tune these networks to predict facial expressions is highlighted. Several models are presented based on MobileNet, EfficientNet and RexNet architectures. It was experimentally demonstrated that they lead to near state-of-the-art results in age, gender and race recognition on the UTKFace dataset and emotion classification on the AffectNet dataset. Moreover, it is shown that the usage of the trained models as feature extractors of facial regions in video frames leads to 4.5% higher accuracy than the previously known state-of-the-art single models for the AFEW and the VGAF datasets from the EmotiW challenges. The models and source code are publicly available at this https URL.
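A hedged sketch of the multi-task design described in this abstract: one lightweight backbone (torchvision's MobileNetV2 here) with separate linear heads for age, gender, and ethnicity. The head sizes and wiring are illustrative assumptions, not the authors' released models.

```python
# Hedged sketch of a multi-head lightweight CNN for facial attributes.
import torch
import torch.nn as nn
from torchvision import models

class MultiTaskFaceNet(nn.Module):
    def __init__(self, n_ages=100, n_genders=2, n_ethnicities=5):
        super().__init__()
        backbone = models.mobilenet_v2(weights=None)  # lightweight backbone
        self.features = backbone.features
        self.pool = nn.AdaptiveAvgPool2d(1)
        dim = backbone.last_channel  # 1280 for MobileNetV2
        self.age_head = nn.Linear(dim, n_ages)
        self.gender_head = nn.Linear(dim, n_genders)
        self.ethnicity_head = nn.Linear(dim, n_ethnicities)

    def forward(self, x):
        z = self.pool(self.features(x)).flatten(1)  # shared face embedding
        return self.age_head(z), self.gender_head(z), self.ethnicity_head(z)

age_logits, gender_logits, eth_logits = MultiTaskFaceNet()(torch.randn(2, 3, 224, 224))
```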

105 citations

Proceedings ArticleDOI
18 Oct 2021
TL;DR: This paper proposed Contrastive Learning of Multi-view facial Expressions (CL-MEx) to exploit facial images captured simultaneously from different angles towards FER, which achieved state-of-the-art performance on two multi-view FER datasets.
Abstract: Facial expression recognition (FER) has emerged as an important component of human-computer interaction systems. Despite recent advancements in FER, performance often drops significantly for non-frontal facial images. We propose Contrastive Learning of Multi-view facial Expressions (CL-MEx) to exploit facial images captured simultaneously from different angles towards FER. CL-MEx is a two-step training framework. In the first step, an encoder network is pre-trained with the proposed self-supervised contrastive loss, where it learns to generate view-invariant embeddings for different views of a subject. The model is then fine-tuned with labeled data in a supervised setting. We demonstrate the performance of the proposed method on two multi-view FER datasets, KDEF and DDCF, where state-of-the-art performances are achieved. Further experiments show the robustness of our method in dealing with challenging angles and reduced amounts of labeled data.
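The first CL-MEx stage can be illustrated with a generic NT-Xent-style contrastive loss in which two camera views of the same subject form the positive pair, pushing the encoder toward view-invariant embeddings. This is a standard formulation offered as a sketch, not necessarily the paper's exact loss.

```python
# Hedged sketch of a view-contrastive (NT-Xent-style) loss.
import torch
import torch.nn.functional as F

def view_contrastive_loss(z1, z2, temperature=0.1):
    """z1, z2: NxD embeddings of the same N subjects from two viewpoints."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)            # 2N x D
    sim = z @ z.t() / temperature             # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))         # exclude self-similarity
    n = z1.size(0)
    # The positive for row i is the other view of the same subject.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

loss = view_contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
```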

26 citations

Posted Content
TL;DR: In this article, the authors compared the performance and attention patterns of humans and machines during a two-alternative forced-choice FER task, and found that humans outperformed machines quite significantly.
Abstract: Facial expression recognition (FER) is a topic attracting significant research in both psychology and machine learning, with a wide range of applications. Despite a wealth of research on human FER and considerable progress in computational FER made possible by deep neural networks (DNNs), comparatively little work has been done on assessing the degree to which DNN performance is comparable to that of humans. In this work, we compared the recognition performance and attention patterns of humans and machines during a two-alternative forced-choice FER task. Human attention was gathered through click data that progressively uncovered a face, whereas model attention was obtained using three popular explainable-AI techniques: CAM, GradCAM and Extremal Perturbation. In both cases, performance was measured as percent correct. On this task, we found that humans outperformed machines quite significantly. In terms of attention patterns, we found that Extremal Perturbation had the best overall fit with the human attention map during the task.
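Of the three attention techniques compared, GradCAM is the most compact to sketch: gradients of the top-class score are global-average-pooled into channel weights for the last conv block's activations. The model and layer choice below are illustrative assumptions, not the study's setup.

```python
# Hedged, minimal GradCAM sketch using forward/backward hooks.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=None).eval()
feats, grads = {}, {}
layer = model.layer4  # last conv block; an assumed choice of target layer

layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

x = torch.randn(1, 3, 224, 224)
score = model(x)[0].max()  # score of the top predicted class
score.backward()

weights = grads["a"].mean(dim=(2, 3), keepdim=True)  # pooled channel gradients
cam = F.relu((weights * feats["a"]).sum(dim=1))      # weighted activation sum
cam = F.interpolate(cam[None], size=x.shape[2:], mode="bilinear")[0]
```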

1 citation