
Showing papers in "Multimedia Tools and Applications in 2022"


Journal ArticleDOI
TL;DR: The paper proposes a two-phase framework based on the EfficientNet convolutional neural network for identifying real or spoofed user samples; the system is trained on different datasets of spoofed and genuine iris biometric samples to discriminate between the two.

114 citations



Journal ArticleDOI
TL;DR: In this paper, the authors distinguish four phases by discussing different levels of NLP and components of Natural Language Generation, followed by presenting the history and evolution of natural language processing (NLP).
Abstract: Natural language processing (NLP) has recently gained much attention for representing and analyzing human language computationally. It has spread its applications in various fields such as machine translation, email spam detection, information extraction, summarization, medical, and question answering etc. In this paper, we first distinguish four phases by discussing different levels of NLP and components of Natural Language Generation followed by presenting the history and evolution of NLP. We then discuss in detail the state of the art presenting the various applications of NLP, current trends, and challenges. Finally, we present a discussion on some available datasets, models, and evaluation metrics in NLP.

65 citations



Journal ArticleDOI
TL;DR: In this article, a comprehensive review of single-stage object detectors, especially YOLOs, covering their regression formulation, architecture advancements, and performance statistics is presented, along with comparisons between two-stage and single-stage detectors and among different versions of YOLO.
Abstract: Object detection is one of the predominant and challenging problems in computer vision. Over the past decade, with the rapid evolution of deep learning, researchers have extensively experimented with and contributed to the performance enhancement of object detection and related tasks such as object classification, localization, and segmentation using underlying deep models. Broadly, object detectors are classified into two categories, viz. two-stage and single-stage object detectors. Two-stage detectors mainly focus on a selective region proposal strategy via a complex architecture; single-stage detectors, however, consider all the spatial region proposals for the possible detection of objects via a relatively simpler architecture in one shot. The performance of any object detector is evaluated through detection accuracy and inference time. Generally, the detection accuracy of two-stage detectors exceeds that of single-stage object detectors. However, the inference time of single-stage detectors is better than that of their counterparts. Moreover, with the advent of YOLO (You Only Look Once) and its architectural successors, the detection accuracy is improving significantly and is sometimes better than that of two-stage detectors. YOLOs are adopted in various applications mainly because of their faster inference rather than their detection accuracy. As an example, detection accuracies are 63.4 and 70 for YOLO and Fast-RCNN respectively; however, inference is around 300 times faster in the case of YOLO. In this paper, we present a comprehensive review of single-stage object detectors, especially YOLOs, their regression formulation, architecture advancements, and performance statistics. Moreover, we summarize the comparative illustration between two-stage and single-stage object detectors, among different versions of YOLOs, applications based on two-stage detectors and different versions of YOLOs, along with future research directions.

57 citations



Journal ArticleDOI
TL;DR: In this paper, the authors used deep convolutional neural networks on a large dataset of chest X-ray images to detect COVID-19 pneumonia using the transfer learning paradigm.
Abstract: One of the primary clinical observations for screening the novel coronavirus is capturing a chest x-ray image. In most patients, a chest x-ray contains abnormalities, such as consolidation, resulting from COVID-19 viral pneumonia. In this study, research is conducted on efficiently detecting imaging features of this type of pneumonia using deep convolutional neural networks in a large dataset. It is demonstrated that simple models, alongside the majority of pretrained networks in the literature, focus on irrelevant features for decision-making. In this paper, numerous chest x-ray images from several sources are collected, and one of the largest publicly accessible datasets is prepared. Finally, using the transfer learning paradigm, the well-known CheXNet model is utilized to develop COVID-CXNet. This powerful model is capable of detecting the novel coronavirus pneumonia based on relevant and meaningful features with precise localization. COVID-CXNet is a step towards a fully automated and robust COVID-19 detection system.

52 citations



Journal ArticleDOI
TL;DR: Experiments show that Yolo V4_1 (with SPP) outperforms the state-of-the-art schemes, achieving 99.4% accuracy in the authors' experiments, along with the best total BFLOPS and mAP (99.32%) in their experiment, and SPP can enhance the achievement of all models in the experiment.

43 citations


Journal ArticleDOI
TL;DR: In this article, a face mask detection model for static images and real-time videos is presented, which classifies the images as "with mask" and "without mask". The model is trained and evaluated using the Kaggle data-set.
Abstract: In current times, after the rapid expansion and spread of the COVID-19 outbreak globally, people have experienced severe disruption to their daily lives. One idea to manage the outbreak is to require people to wear a face mask in public places. Therefore, automated and efficient face detection methods are essential for such enforcement. In this paper, a face mask detection model for static and real time videos has been presented which classifies the images as "with mask" and "without mask". The model is trained and evaluated using the Kaggle data-set. The gathered data-set comprises approximately 4,000 pictures, and the model attained an accuracy of 98%. The proposed model is computationally efficient and precise as compared to DenseNet-121, MobileNet-V2, VGG-19, and Inception-V3. This work can be utilized as a digitized scanning tool in schools, hospitals, banks, airports, and many other public or commercial locations.

37 citations


Journal ArticleDOI
TL;DR: In this paper, the authors implemented Deep Reinforcement Learning (DRL) algorithms such as Deep Q-Network (DQN) and Deep Deterministic Policy Gradient (DDPG) in order to compare their results.
Abstract: Nowadays, Artificial Intelligence (AI) is growing by leaps and bounds in almost all fields of technology, and Autonomous Vehicles (AV) research is one of them. This paper proposes the use of algorithms based on Deep Learning (DL) in the control layer of an autonomous vehicle. More specifically, Deep Reinforcement Learning (DRL) algorithms such as Deep Q-Network (DQN) and Deep Deterministic Policy Gradient (DDPG) are implemented in order to compare their results. The aim of this work is to obtain a trained model, applying a DRL algorithm, capable of sending control commands to the vehicle so that it navigates properly and efficiently along a given route. In addition, for each of the algorithms, several agents are presented as a solution, so that each of these agents uses different data sources to produce the vehicle control commands. For this purpose, an open-source simulator, CARLA, is used, providing the system with the ability to perform a multitude of tests without any risk in a hyper-realistic urban simulation environment, something that is unthinkable in the real world. The results obtained show that both DQN and DDPG reach the goal, but DDPG obtains better performance. DDPG performs trajectories very similar to those of a classic controller such as LQR. In both cases the RMSE is lower than 0.1 m when following trajectories with a range of 180-700 m. Finally, some conclusions and future work are discussed.

Journal ArticleDOI
TL;DR: In this article, the authors implemented an experiment to evaluate the performance of the latest version of YOLOv5 on their own dataset for traffic sign recognition (TSR), showing how a deep learning model for visual object recognition suits TSR through a comprehensive comparison with SSD (i.e., single shot multibox detector).
Abstract: Intelligent Transportation System (ITS), including unmanned vehicles, has been gradually matured despite on road. How to eliminate the interference due to various environmental factors and carry out accurate and efficient traffic sign detection and recognition is a key technical problem. However, traditional visual object recognition mainly relies on visual feature extraction, e.g., color and edge, which has limitations. Convolutional neural network (CNN) was designed for visual object recognition based on deep learning, which has successfully overcome the shortcomings of conventional object recognition. In this paper, we implement an experiment to evaluate the performance of the latest version of YOLOv5 based on our dataset for Traffic Sign Recognition (TSR), which shows how the model for visual object recognition in deep learning is suitable for TSR through a comprehensive comparison with SSD (i.e., single shot multibox detector). The experiments in this project utilize our own dataset. In the experimental results, YOLOv5 achieves 97.70% in terms of mAP@0.5 for all classes, while SSD obtains 90.14% mAP on the same metric. Meanwhile, regarding recognition speed, YOLOv5 also outperforms SSD.
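
The mAP@0.5 figure quoted above counts a predicted box as a correct detection only when its intersection over union (IoU) with a ground-truth box of the same class is at least 0.5. The following is a minimal sketch of that matching criterion only, not the authors' evaluation code; the function names and the (x1, y1, x2, y2) box layout are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero when the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """Hypothetical helper: a prediction matches when IoU reaches the threshold."""
    return iou(pred_box, gt_box) >= threshold
```

Average precision is then computed per class from the ranked list of matched and unmatched predictions, and mAP is the mean over classes; that step is omitted here.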

Journal ArticleDOI
Qi Feng
TL;DR: In this paper, a channel enhancement feature pyramid network (CE-FPN), inspired by sub-pixel convolution, is proposed to alleviate the channel reduction problem in FPN-based methods.
Abstract: Feature pyramid network (FPN) has been an efficient framework to extract multi-scale features in object detection. However, current FPN-based methods mostly suffer from the intrinsic flaw of channel reduction, which brings about the loss of semantical information. And the miscellaneous feature maps may cause serious aliasing effects. In this paper, we present a novel channel enhancement feature pyramid network (CE-FPN) to alleviate these problems. Specifically, inspired by sub-pixel convolution, we propose sub-pixel skip fusion (SSF) to perform both channel enhancement and upsampling. Instead of the original 1 × 1 convolution and linear upsampling, it mitigates the information loss due to channel reduction. Then we propose sub-pixel context enhancement (SCE) for extracting stronger feature representations, which is superior to other context methods due to the utilization of rich channel information by sub-pixel convolution. Furthermore, we introduce a channel attention guided module (CAG) to optimize the final integrated features on each level. It alleviates the aliasing effect only with a few computational burdens. We evaluate our approaches on Pascal VOC and MS COCO benchmark. Extensive experiments show that CE-FPN achieves competitive performance and is more lightweight compared to state-of-the-art FPN-based detectors.





Journal ArticleDOI
TL;DR: In this article, the application status of Digital Twins (DT) technology is reviewed across several fields, core DT-related technologies are introduced alongside the applications, and the future development trend of DT is predicted from its current development status.
Abstract: With the development of science and technology, the high-tech industry is developing rapidly, and various new-age technologies continue to appear; Digital Twins (DT) is one of them. As a brand-new interactive technology, DT can handle the interaction between the real world and the virtual world well. It has become a hot spot in academic circles worldwide. DT has developed rapidly in recent years as a result of its centrality, integrity and dynamics. It is integrated with other technologies and has been applied in many fields, such as smart factories in industrial production, digital models of life in the medical field, construction of smart cities, security guarantees in the aerospace field, immersive shopping in the commercial field and so on. Existing introductions to DT are mostly summaries of concepts, and few practical applications of Digital Twins are introduced. The purpose of this paper is to enable people to understand the application status of DT technology. At the same time, the introduction of core technologies related to DT is interspersed in the application introduction. Finally, based on the current development status of DT, we predict its future development trend and give a summary.



Journal ArticleDOI
TL;DR: A survey of state-of-the-art deepfake generation methods, detection methods, and existing datasets is presented; future trends in deepfake detection may include efficient, robust and systematic detection methods and high-quality datasets.

Journal ArticleDOI
TL;DR: In this paper, the authors present an adaptive framework aimed at preserving the security and confidentiality of images transmitted through an e-healthcare system, which utilizes a 3D chaotic system to generate a keystream that is used to perform 8-bit and 2-bit permutations of the image.
Abstract: In recent years, there has been an enormous demand for the security of image multimedia in healthcare organizations. Many schemes have been developed for the security preservation of data in e-health systems however the schemes are not adaptive and cannot resist chosen and known-plaintext attacks. In this contribution, we present an adaptive framework aimed at preserving the security and confidentiality of images transmitted through an e-healthcare system. Our scheme utilizes the 3D-chaotic system to generate a keystream which is used to perform 8-bit and 2-bit permutations of the image. We perform pixel diffusion by a key-image generated using the Piecewise Linear Chaotic Map (PWLCM). We calculate an image parameter using the pixels of the image and perform criss-cross diffusion to enhance security. We evaluate the scheme's performance in terms of histogram analysis, information entropy analysis, statistical analysis, and differential analysis. Using the scheme, we obtain the average Number of Pixels Change Rate (NPCR) and Unified Average Changing Intensity (UACI) values for an image of size 256 × 256 equal to 99.5996 and 33.499 respectively. Furthermore, the average entropy is 7.9971 and the average Peak Signal to Noise Ratio (PSNR) is 7.4756. We further test the scheme on 50 chest X-Ray images of patients having COVID-19 and viral pneumonia and found the average values of variance, PSNR, entropy, and Structural Similarity Index (SSIM) to be 257.6268, 7.7389, 7.9971, and 0.0089 respectively. Furthermore, the scheme generates completely uniform histograms for medical images which reveals that the scheme can resist statistical attacks and can be applied as a security framework in AI-based healthcare.
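
The NPCR and UACI figures in the abstract above are differential-analysis measures computed between two cipher images whose plain images differ slightly. The following is a minimal sketch using the standard definitions of both measures, assuming 8-bit grayscale images stored as nested lists; the function name `npcr_uaci` is hypothetical and this is not the paper's code.

```python
def npcr_uaci(cipher_a, cipher_b, max_value=255):
    """Return (NPCR, UACI) in percent for two equally sized grayscale images.

    NPCR: percentage of pixel positions whose values differ.
    UACI: mean absolute intensity difference, normalized by max_value.
    """
    flat_a = [pixel for row in cipher_a for pixel in row]
    flat_b = [pixel for row in cipher_b for pixel in row]
    if not flat_a or len(flat_a) != len(flat_b):
        raise ValueError("images must be non-empty and have the same size")
    n = len(flat_a)
    changed = sum(1 for a, b in zip(flat_a, flat_b) if a != b)
    intensity = sum(abs(a - b) for a, b in zip(flat_a, flat_b))
    npcr = 100.0 * changed / n
    uaci = 100.0 * intensity / (max_value * n)
    return npcr, uaci
```

For two independent, uniformly random 8-bit images the expected values are roughly 99.61% (NPCR) and 33.46% (UACI), which is the usual yardstick against which values such as those reported above are read.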

Journal ArticleDOI
TL;DR: In this paper, a hybrid of the Marine Predators Algorithm (MPA) and the Salp Swarm Algorithm (SSA), called MPASSA, is proposed to determine the optimal multilevel image segmentation threshold.
Abstract: Pixel rating is considered one of the commonly used critical factors in digital image processing that depends on intensity. It is used to determine the optimal image segmentation threshold. In recent years, optimum threshold selection has attracted great interest due to its many applications. Several methods have been used to find the optimum threshold, including the Otsu and Kapur methods. These methods are appropriate and easy to implement for defining a single or bi-level threshold. However, when they are extended to multiple levels, they cause problems such as long running times, high computational cost, and insufficient accuracy. To avoid these problems and determine the optimal multilevel image segmentation threshold, we propose a hybrid of the Marine Predators Algorithm (MPA) and the Salp Swarm Algorithm (SSA), called MPASSA. The solutions obtained by the proposed method are represented using the image histogram. Several standard evaluation measures, such as the fitness function, time consumption, Peak Signal-to-Noise Ratio, Structural Similarity Index, and others, are employed to evaluate the proposed segmentation method's effectiveness. Several benchmark images are used to validate the performance of the proposed algorithm (MPASSA). The results show that the proposed MPASSA obtains better results than other well-known optimization algorithms published in the literature.
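
For context on the single-threshold case the abstract mentions, the following is a minimal sketch of Otsu's method, which picks the gray level that maximizes the between-class variance of the image histogram. It is not the MPASSA algorithm; the function name `otsu_threshold` and the plain-list histogram input are assumptions for illustration.

```python
def otsu_threshold(histogram):
    """Return the level t maximizing between-class variance.

    Pixels with level <= t form one class, the rest form the other.
    `histogram[i]` is the number of pixels with gray level i.
    """
    total = sum(histogram)
    if total == 0:
        raise ValueError("histogram is empty")
    sum_all = sum(level * count for level, count in enumerate(histogram))
    best_t, best_var = 0, -1.0
    weight_bg = 0  # number of pixels at or below the candidate threshold
    sum_bg = 0     # sum of their gray levels
    for t, count in enumerate(histogram):
        weight_bg += count
        sum_bg += t * count
        weight_fg = total - weight_bg
        if weight_bg == 0 or weight_fg == 0:
            continue  # one class is empty, so the variance is undefined
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance, up to a constant factor of 1 / total**2.
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t
```

With k thresholds the same objective must be searched over all k-tuples of levels, so exhaustive search grows combinatorially with k; this is the cost that motivates metaheuristic search in the multilevel setting.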


Journal ArticleDOI
TL;DR: In this paper, a hybrid genetic algorithm (GA) and particle swarm optimization (PSO) approach based on random forest (RF), called GAPSO-RF, is developed and used to select the optimal features that can increase the accuracy of heart disease prediction.
Abstract: Nowadays, heart diseases are significantly contributing to deaths all over the world. Thus, heart-disease prediction has garnered considerable attention in the medical domain globally. Accordingly, machine-learning algorithms for the early prediction of heart diseases were developed in several studies to help physicians design medical procedures. In this study, a hybrid genetic algorithm (GA) and particle swarm optimization (PSO) optimized approach based on random forest (RF), called GAPSO-RF, is developed and used to select the optimal features that can increase the accuracy of heart-disease prediction. The proposed GAPSO-RF implements multivariate statistical analysis in the first step to select the most significant features used in the initial population. After that, a discriminate mutation strategy is implemented in GA. GAPSO-RF combines a modified GA for global search and a PSO for local search. Moreover, PSO is used to rehabilitate individuals that were rejected in the selection process. The performance of the proposed GAPSO-RF approach is validated via evaluation metrics, namely, accuracy, specificity, sensitivity, and area under the receiver operating characteristic (ROC) curve, by using two datasets from the University of California, namely, Cleveland and Statlog. The experimental results confirm that the GAPSO-RF approach attained high heart-disease-prediction accuracies of 95.6% and 91.4% on the Cleveland and Statlog datasets, respectively. Furthermore, the proposed approach outperformed other state-of-the-art prediction methods.
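
Three of the evaluation metrics named in the abstract above (accuracy, sensitivity, specificity) follow directly from the binary confusion matrix. The following is a minimal sketch assuming labels are encoded as 1 (disease) and 0 (no disease); the function name `binary_metrics` is hypothetical and the example is unrelated to the GAPSO-RF implementation.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, sensitivity and specificity for 0/1 labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity: share of actual positives that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Specificity: share of actual negatives that were correctly rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }
```

The area under the ROC curve additionally needs predicted scores rather than hard labels, so it is left out of this sketch.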




Journal ArticleDOI
TL;DR: In this paper, a detailed literature review of object detection and its techniques is provided; a systematic review is followed to summarize the findings of current research and discuss seven research questions related to object detection.
Abstract: Object detection is one of the most fundamental and challenging tasks to locate objects in images and videos. In the past, it has gained much attention, prompting more research on computer vision tasks such as object classification, counting of objects, and object monitoring. This study provides a detailed literature review focusing on object detection and discusses the object detection techniques. A systematic review has been followed to summarize the current research work’s findings and discuss seven research questions related to object detection. Our contribution to the current research work is (i) analysis of traditional, two-stage, and one-stage object detection techniques, (ii) dataset preparation and available standard datasets, (iii) annotation tools, and (iv) performance evaluation metrics. In addition, a comparative analysis has been performed, showing that the proposed techniques differ in their architecture, optimization function, and training strategies. With the remarkable success of deep neural networks in object detection, the performance of the detectors has improved. Various research challenges and future directions for object detection have also been discussed in this research paper.

Journal ArticleDOI
TL;DR: A blind-watermark backdoor method whose results are imperceptible to humans is proposed, which avoids the human detectability of the backdoor sample attack by making the trigger invisible.