Showing papers in "Signal, Image and Video Processing in 2022"


Journal ArticleDOI
TL;DR: Wang et al. present a robust deep learning model based on a novel multi-scale contextual information fusion strategy, called Multi-Level Context Attentional Feature Fusion (MLCA2F), which consists of Multi-Scale Context-Attention Network (MSCA-Net) blocks for segmenting COVID-19 lesions from CT images.
Abstract: In the field of diagnosis and treatment planning of Coronavirus disease 2019 (COVID-19), accurate infected area segmentation is challenging due to the significant variations in the COVID-19 lesion size, shape, and position, boundary ambiguity, as well as complex structure. To bridge these gaps, this study presents a robust deep learning model based on a novel multi-scale contextual information fusion strategy, called Multi-Level Context Attentional Feature Fusion (MLCA2F), which consists of the Multi-Scale Context-Attention Network (MSCA-Net) blocks for segmenting COVID-19 lesions from Computed Tomography (CT) images. Unlike the previous classical deep learning models, the MSCA-Net integrates Multi-Scale Contextual Feature Fusion (MC2F) and Multi-Context Attentional Feature (MCAF) to learn more lesion details and guide the model to estimate the position of the boundary of infected regions, respectively. Practically, extensive experiments are performed on the Kaggle CT dataset to explore the optimal structure of MLCA2F. In comparison with the current state-of-the-art methods, the experiments show that the proposed methodology provides efficient results. Therefore, we can conclude that the MLCA2F framework has the potential to dramatically improve the conventional segmentation methods for assisting clinical decision-making.

20 citations


Journal ArticleDOI
TL;DR: A novel method for detecting potholes with a thermal imaging system, using a convolutional neural network (CNN)-based modified Aquila optimization (AO) algorithm that improves classification accuracy, precision, recall, and F1-score while reducing classification error and detection time.

17 citations



Journal ArticleDOI
TL;DR: A lightweight detector called Light-YOLOv4 is proposed that meets the accuracy and real-time requirements of flame and smoke detection tasks and offers good detection performance and speed in embedded scenarios.

15 citations




Journal ArticleDOI
TL;DR: In this paper, a potential crack region method is proposed to detect road pavement cracks using an adaptive threshold, which combines a global threshold and a local threshold to segment the image according to the grayscale distribution characteristics of the crack image.
Abstract: In this paper, a potential crack region method is proposed to detect road pavement cracks by using an adaptive threshold. To reduce image noise, a preprocessing algorithm is applied in the following steps: grayscale conversion, histogram equalization, and traffic lane filtering. For image segmentation, the algorithm combines a global threshold and a local threshold. According to the grayscale distribution characteristics of the crack image, a sliding window is used to obtain the window deviation, and the deviation image is then segmented based on the maximum inter-class deviation. This yields a potential crack region, on which a local threshold-based segmentation algorithm is then performed. Real images of pavement surface taken at the Su Tong Li road in Suzhou, China, were used. It was found that the proposed approach could give a more explicit description of pavement cracks in images. The method was tested on 509 images of the German asphalt pavement distress (Gap) dataset, and the test results were found to be promising (precision = 0.82, recall = 0.81, F1 score = 0.83).
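The global step described in the abstract above (window deviation followed by a maximum inter-class split) can be sketched roughly as follows. This is only an illustration, not the authors' implementation: it assumes the "window deviation" is the local standard deviation and that the "maximum inter-class deviation" criterion behaves like Otsu's between-class variance. The function names and the window size `win=5` are hypothetical.

```python
import numpy as np

def window_std(gray, win=5):
    """Standard deviation of each win x win neighbourhood (edge-padded)."""
    pad = win // 2
    padded = np.pad(gray.astype(np.float64), pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (win, win))
    return windows.std(axis=(-2, -1))

def otsu_threshold(values, bins=256):
    """Threshold that maximises the between-class variance of a histogram."""
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                 # samples at or below each bin
    w1 = w0[-1] - w0                     # samples above each bin
    s0 = np.cumsum(hist * centers)
    mu0 = s0 / np.maximum(w0, 1)         # mean of the lower class
    mu1 = (s0[-1] - s0) / np.maximum(w1, 1)  # mean of the upper class
    between = w0 * w1 * (mu0 - mu1) ** 2
    return centers[np.argmax(between)]

def potential_crack_mask(gray, win=5):
    """Assumed reading of the abstract: threshold the deviation image."""
    dev = window_std(gray, win)
    return dev > otsu_threshold(dev)

# Synthetic example: bright pavement with one dark vertical line.
image = np.full((32, 32), 200, dtype=np.uint8)
image[:, 16] = 50
mask = potential_crack_mask(image)
```

The resulting mask marks high-deviation neighbourhoods around the dark line; the local threshold-based segmentation mentioned in the abstract would then run only inside that region.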

12 citations


Journal ArticleDOI
TL;DR: Zhang et al. propose an absolute value class activation mapping-based (Abs-CAM) method, which optimizes the gradients derived from the backpropagation and turns all of them into positive gradients to enhance the visual features of output neurons' activation and improve the localization ability of the saliency map.
Abstract: The black-box nature of deep neural networks severely hinders their performance improvement and application in specific scenes. In recent years, class activation mapping-based methods have been widely used to interpret the internal decisions of models in computer vision tasks. However, when these methods use backpropagation to obtain gradients, they introduce noise into the saliency map and may even locate features that are irrelevant to decisions. In this paper, we propose an absolute value class activation mapping-based (Abs-CAM) method, which optimizes the gradients derived from the backpropagation and turns all of them into positive gradients to enhance the visual features of output neurons’ activation and improve the localization ability of the saliency map. The framework of Abs-CAM is divided into two phases: generating an initial saliency map and generating a final saliency map. The first phase improves the localization ability of the saliency map by optimizing the gradient, and the second phase linearly combines the initial saliency map with the original image to enhance the semantic information of the saliency map. We conduct qualitative and quantitative evaluations of the proposed method, including Deletion, Insertion, and Pointing Game. The experimental results show that Abs-CAM can clearly eliminate the noise in the saliency map, can better locate the features related to decisions, and is superior to the previous methods in recognition and localization tasks.
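One possible reading of the first phase is sketched below. It assumes that "turning all gradients into positive gradients" means taking their absolute value before spatial averaging, as in a Grad-CAM-style weighting; the abstract does not spell out the exact formula, so the function name and the normalisation step are hypothetical.

```python
import numpy as np

def abs_cam_initial_map(activations, gradients):
    """Sketch of an initial saliency map from absolute gradients.

    activations, gradients: arrays of shape (K, H, W) taken from one
    convolutional layer (K feature maps). Assumed, not from the paper.
    """
    # Channel weights: spatial mean of the absolute gradients.
    weights = np.abs(gradients).mean(axis=(1, 2))          # shape (K,)
    # Weighted sum of the feature maps.
    cam = np.tensordot(weights, activations, axes=1)       # shape (H, W)
    cam = np.maximum(cam, 0.0)                             # keep positive evidence
    peak = cam.max()
    return cam / peak if peak > 0 else cam                 # scale to [0, 1]

# Toy example: two 3x3 feature maps, one with negative gradients.
acts = np.ones((2, 3, 3))
acts[1, 1, 1] = 5.0
grads = np.stack([-np.ones((3, 3)), np.ones((3, 3))])
saliency = abs_cam_initial_map(acts, grads)
```

With signed gradients the two channels in this toy example would cancel; with absolute values both contribute, which is the effect the abstract attributes to the method.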

12 citations





Journal ArticleDOI
TL;DR: An end-to-end pest identification network that combines deep learning and hyperspectral imaging technology is proposed and results prove that this method has higher pest identification accuracy and is more suitable for pest identification tasks than other methods.


Journal ArticleDOI
TL;DR: The method proposed in this paper meets the real-time detection requirements and does well in swimmer behavior recognition and provides technical support for reducing drowning accidents in public swimming pools.


Journal ArticleDOI
TL;DR: In this article, a lightweight UNet using depthwise separable convolutions (DSUNet) was proposed for end-to-end learning of lane detection and path prediction in autonomous driving.
Abstract: Inspired by the UNet architecture of semantic image segmentation, we propose a lightweight UNet using depthwise separable convolutions (DSUNet) for end-to-end learning of lane detection and path prediction (PP) in autonomous driving. We also design and integrate a PP algorithm with convolutional neural network (CNN) to form a simulation model (CNN-PP) that can be used to assess CNN’s performance qualitatively, quantitatively, and dynamically in a host agent car driving along with other agents all in a real-time autonomous manner. DSUNet is 5.12× lighter in model size and 1.61× faster in inference than UNet. DSUNet-PP outperforms UNet-PP in mean average errors of predicted curvature and lateral offset for path planning in dynamic simulation. DSUNet-PP outperforms a modified UNet in lateral error, which is tested in a real car on real road. These results show that DSUNet is efficient and effective for lane detection and path prediction in autonomous driving.
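To see why depthwise separable convolutions shrink a model, the weight counts of a single layer can be compared. This is a generic illustration, not the DSUNet layer configuration: the kernel size and channel counts below are hypothetical, and biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k plus pointwise 1 x 1 convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

# Hypothetical layer: 3x3 kernel, 64 input channels, 128 output channels.
standard = standard_conv_params(3, 64, 128)          # 73728
separable = depthwise_separable_params(3, 64, 128)   # 8768
```

The per-layer saving depends on kernel size and channel counts, so the whole-model ratio reported in the abstract need not match any single layer.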

Journal ArticleDOI
TL;DR: In this article, a multi-stream convolutional neural network (CNN) was used for feature extraction and classification of X-ray images in diagnosing COVID-19 patients.
Abstract: The year 2020 will certainly be remembered in human history as the year in which humans faced a global pandemic that drastically affected every living soul on planet earth. The COVID-19 pandemic certainly had a massive impact on humans' social and daily lives. The economy and relations of all countries were also radically impacted. Due to such unexpected situations, healthcare systems either collapsed or failed under colossal pressure to cope with the overwhelming numbers of patients arriving at emergency rooms and intensive care units. The COVID-19 tests used for diagnosis were expensive, slow, and gave indecisive results. Unfortunately, such hindered diagnosis of the infection prevented prompt isolation of the infected people, which, in turn, caused the rapid spread of the virus. In this paper, we proposed the use of cost-effective X-ray images in diagnosing COVID-19 patients. Compared to other imaging modalities, X-ray imaging is available in most healthcare units. Deep learning was used for feature extraction and classification by implementing a multi-stream convolutional neural network model. The model extracts and concatenates features from its three inputs, namely grayscale, local binary patterns, and histograms of oriented gradients images. Extensive experiments using fivefold cross-validation were carried out on a publicly available X-ray database with 3886 images of three classes. Obtained results outperform the results of other algorithms with an accuracy of 97.76%. The results also show that the proposed model can make a significant contribution to the rapidly increasing workload in health systems with an artificial intelligence-based automatic diagnosis tool.

Journal ArticleDOI
TL;DR: A two-stage residual CNN (2RCNN) architecture is proposed for learning features from color hand gesture images, which removes the need for a specific preprocessing step, together with a novel residual block intensity (RBI) feature that extracts global and local information from the hand gesture images.



Journal ArticleDOI
TL;DR: Wang et al. propose a dilated convolutional neural network (DCNN) for low-dose CT image denoising, in which preprocessing and post-processing techniques are integrated into the DCNN to extend receptive fields.
Abstract: How to reduce radiation dose while preserving the image quality as when using standard dose is an important topic in the computed tomography (CT) imaging domain because the quality of low-dose CT (LDCT) images is often strongly affected by noise and artifacts. Recently, there has been considerable interest in using deep learning as a post-processing step to improve the quality of reconstructed LDCT images. This paper provides, first, an overview of learning-based LDCT image denoising methods from patch-based early learning methods to state-of-the-art CNN-based ones and, then, a novel CNN-based method is presented. In the proposed method, preprocessing and post-processing techniques are integrated into a dilated convolutional neural network to extend receptive fields. Hence, large distance pixels in input images will participate in enriching feature maps of the learned model, leading to effective denoising. Experimental results showed that the proposed method is light, while its denoising effectiveness is competitive with well-known CNN-based models.
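The way dilation extends receptive fields can be checked with a short calculation. The sketch assumes stride-1 convolutions stacked without pooling; the kernel sizes and dilation rates are hypothetical and are not taken from the paper.

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field (in pixels, per axis) of stacked stride-1 convolutions."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        # A k-tap kernel with dilation d spans (k - 1) * d + 1 input pixels.
        rf += (k - 1) * d
    return rf

# Three 3x3 layers, without and with increasing dilation.
plain = receptive_field([3, 3, 3], [1, 1, 1])     # 7
dilated = receptive_field([3, 3, 3], [1, 2, 4])   # 15
```

Both stacks have the same number of weights, but the dilated one lets pixels roughly twice as far apart contribute to each output value, which matches the abstract's point about distant pixels enriching the feature maps.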



Journal ArticleDOI
TL;DR: In this article, a computer vision system was developed to help prevent the transmission of COVID-19 by detecting face mask usage and face-hand interaction and by measuring social distance between people.
Abstract: Health organizations advise social distancing, wearing face mask, and avoiding touching face to prevent the spread of coronavirus. Based on these protective measures, we developed a computer vision system to help prevent the transmission of COVID-19. Specifically, the developed system performs face mask detection, face-hand interaction detection, and measures social distance. To train and evaluate the developed system, we collected and annotated images that represent face mask usage and face-hand interaction in the real world. Besides assessing the performance of the developed system on our own datasets, we also tested it on existing datasets in the literature without performing any adaptation on them. In addition, we proposed a module to track social distance between people. Experimental results indicate that our datasets represent the real-world's diversity well. The proposed system achieved very high performance and generalization capacity for face mask usage detection, face-hand interaction detection, and measuring social distance in a real-world scenario on unseen data. The datasets are available at https://github.com/iremeyiokur/COVID-19-Preventions-Control-System.



Journal ArticleDOI
TL;DR: A new imbalance domain adaptation network with adversarial learning (IDAL) is proposed that applies adversarial learning to data augmentation of the target domain and uses the domain adaptation based on a neural network to narrow the feature distribution discrepancy between the source and target domains.

Journal ArticleDOI
TL;DR: In this article, a cycle-generative adversarial network (cycle-GAN) was proposed for data augmentation in speech emotion recognition (SER) systems, which is trained in an adversarial way to produce feature vectors similar to those in the training set.
Abstract: One of the obstacles in developing speech emotion recognition (SER) systems is the data scarcity problem, i.e., the lack of labeled data for training these systems. Data augmentation is an effective method for increasing the amount of training data. In this paper, we propose a cycle-generative adversarial network (cycle-GAN) for data augmentation in the SER systems. For each of the five emotions considered, an adversarial network is designed to generate data that have a similar distribution to the main data in that class but have a different distribution to those of other classes. These networks are trained in an adversarial way to produce feature vectors similar to those in the training set, which are then added to the original training sets. Instead of using the common cross-entropy loss to train cycle-GANs, we use the Wasserstein divergence to mitigate the gradient vanishing problem and to generate high-quality samples. The proposed network has been applied to SER using the EMO-DB dataset. The quality of the generated data is evaluated using two classifiers based on support vector machine and deep neural network. The results showed that the recognition accuracy in unweighted average recall was about 83.33%, which is better than the baseline methods compared.
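The metric reported above, unweighted average recall (UAR), is the mean of the per-class recalls, so every emotion class counts equally regardless of how many samples it has. A minimal sketch with made-up labels (the function name and data are illustrative only):

```python
import numpy as np

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls over the classes present in y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

# Imbalanced toy example: class 0 has four samples, class 1 has two.
labels = [0, 0, 0, 0, 1, 1]
predictions = [0, 0, 0, 0, 1, 0]
uar = unweighted_average_recall(labels, predictions)  # (1.0 + 0.5) / 2 = 0.75
```

Plain accuracy on the same toy data would be 5/6, which shows how UAR penalises errors on the smaller class more heavily.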

Journal ArticleDOI
TL;DR: In this article, a multi-modal system is developed that integrates information from audio and video modalities; features are extracted, and neural network models trained with backpropagation are used to build the models.
Abstract: The objective of the work is to develop an automated emotion recognition system specifically targeted to elderly people. A multi-modal system is developed which has integrated information from audio and video modalities. The database selected for experiments is ElderReact, which contains 1323 video clips of 3 to 8 s duration of people above the age of 50. Here, all six available emotions (Disgust, Anger, Fear, Happiness, Sadness, and Surprise) are considered. In order to develop an automated emotion recognition system for aged adults, we attempted different modeling techniques. Features are extracted, and neural network models with backpropagation are attempted for developing the models. Further, for the raw video model, transfer learning from pretrained networks is attempted. Convolutional neural network and long short-term memory-based models were used to maintain the continuity in time between the frames while capturing the emotions. For the audio model, cross-model transfer learning is applied. Both models are combined by fusion of intermediate layers. The layers are selected through a grid-based search algorithm. The accuracy and F1-score show that the proposed approach outperforms the state-of-the-art results. Classification of all the images shows a relative improvement over the baseline results ranging from a minimum of 6.5% for happiness to a maximum of 46% for sadness.

Journal ArticleDOI
TL;DR: In this article, a semi-automatic threshold-based segmentation method was proposed to generate region of interest (ROI) segmentations of infection visible on lung computed tomography (CT) scans.
Abstract: Since December 2019, the novel coronavirus disease 2019 (COVID-19) has claimed the lives of more than 3.75 million people worldwide. Consequently, methods for accurate COVID-19 diagnosis and classification are necessary to facilitate rapid patient care and terminate viral spread. Lung infection segmentations are useful to identify unique infection patterns that may support rapid diagnosis, severity assessment, and patient prognosis prediction, but manual segmentations are time-consuming and depend on radiologic expertise. Deep learning-based methods have been explored to reduce the burdens of segmentation; however, their accuracies are limited due to the lack of large, publicly available annotated datasets that are required to establish ground truths. For these reasons, we propose a semi-automatic, threshold-based segmentation method to generate region of interest (ROI) segmentations of infection visible on lung computed tomography (CT) scans. Infection masks are then used to calculate the percentage of lung abnormality (PLA) to determine COVID-19 severity and to analyze the disease progression in follow-up CTs. Compared with other COVID-19 ROI segmentation methods, on average, the proposed method achieved improved precision (47.49%) and specificity (98.40%) scores. Furthermore, the proposed method generated PLAs with a difference of ±3.89% from the ground-truth PLAs. The improved ROI segmentation results suggest that the proposed method has potential to assist radiologists in assessing infection severity and analyzing disease progression in follow-up CTs.
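Given an infection mask and a lung mask, a PLA value could be computed along the following lines. The abstract does not give the formula, so this sketch assumes PLA is the share of lung voxels that fall inside the infection mask; the function name and the toy volume are hypothetical.

```python
import numpy as np

def percentage_lung_abnormality(infection_mask, lung_mask):
    """Assumed definition: infected lung voxels / all lung voxels * 100."""
    lung = lung_mask.astype(bool)
    # Count infection only where it overlaps the lung region.
    infected = infection_mask.astype(bool) & lung
    lung_voxels = lung.sum()
    if lung_voxels == 0:
        raise ValueError("lung mask is empty")
    return 100.0 * infected.sum() / lung_voxels

# Toy volume: a 4x4x4 lung region with one infected 4x4 slice.
lung = np.ones((4, 4, 4), dtype=np.uint8)
infection = np.zeros_like(lung)
infection[0] = 1
pla = percentage_lung_abnormality(infection, lung)  # 16 / 64 voxels = 25.0
```

Computing the same quantity on baseline and follow-up scans would give the kind of progression comparison the abstract describes.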