
Showing papers in "Multimedia Tools and Applications in 2021"


Journal ArticleDOI
TL;DR: The analysis of recent advances in genetic algorithms is discussed and the well-known algorithms and their implementation are presented with their pros and cons with the aim of facilitating new researchers.
Abstract: In this paper, the analysis of recent advances in genetic algorithms is discussed. The genetic algorithms of greatest interest to the research community are selected for analysis. This review offers new researchers a broad view of genetic algorithms. The well-known algorithms and their implementations are presented with their pros and cons. The genetic operators and their usage are discussed with the aim of facilitating new researchers. The different research domains involving genetic algorithms are covered. Future research directions in the areas of genetic operators, fitness functions and hybrid algorithms are discussed. This structured review will be helpful for research and graduate teaching.

1,271 citations
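As a companion to the review's coverage of genetic operators, here is a minimal sketch of the canonical GA loop (tournament selection, one-point crossover, bit-flip mutation). The OneMax fitness, population size, and rates are illustrative assumptions, not taken from any surveyed algorithm.

```python
import random

def genetic_algorithm(fitness, n_bits=20, pop_size=50, generations=100,
                      cx_rate=0.9, mut_rate=0.01):
    """Canonical GA: tournament selection, one-point crossover, bit-flip mutation."""
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        def select():   # binary tournament: the fitter of two random individuals
            a, b = random.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        children = []
        while len(children) < pop_size:
            p1, p2 = select(), select()
            if random.random() < cx_rate:                 # one-point crossover
                cut = random.randint(1, n_bits - 1)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (p1, p2):                        # bit-flip mutation
                children.append([1 - g if random.random() < mut_rate else g
                                 for g in child])
        pop = children[:pop_size]
        best = max(pop + [best], key=fitness)             # keep the best-so-far
    return best

print(genetic_algorithm(fitness=sum))   # OneMax toy: maximise the number of 1s
```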


Journal ArticleDOI
TL;DR: This research concludes that SSIM is a better measure of imperceptibility in all respects, and recommends that future steganographic research use at least SSIM.
Abstract: Peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) are two measuring tools that are widely used in image quality assessment. Especially for steganographic images, these two instruments are used to measure the quality of imperceptibility. PSNR predates SSIM, is simple, has been widely used in various digital image measurements, and is considered tested and valid. SSIM is a newer measurement tool, designed around three factors, i.e. luminance, contrast, and structure, to better match the workings of the human visual system. Some research has discussed the correlation and comparison of these two measuring tools, but no research explicitly discusses and suggests which measurement tool is more suitable for steganography. This study aims to review, prove, and analyze the results of PSNR and SSIM measurements on three spatial-domain image steganography methods, i.e. LSB, PVD, and CRT. Color images were chosen as container images because human vision is more sensitive to color changes than to grayscale changes. The test results yielded several conflicting findings: LSB scores highest on PSNR, while PVD scores highest on SSIM. Additionally, the changes visible in the histogram are more noticeable for LSB and CRT than for PVD. Other analyses, such as the RS attack, also show results more in line with SSIM measurements than with PSNR. Based on the results of testing and analysis, this research concludes that SSIM is a better measure of imperceptibility in all respects, and it is preferable that future steganographic research at least use SSIM.

204 citations
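Both metrics discussed in the paper are available in scikit-image, so the comparison is easy to reproduce in outline. A minimal sketch, with a random cover image and an LSB-flip stand-in for the paper's LSB/PVD/CRT embeddings:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
stego = cover ^ 1     # stand-in embedding: flip every pixel's least significant bit

# PSNR scores pixel-wise error; SSIM scores luminance, contrast and structure.
print("PSNR:", peak_signal_noise_ratio(cover, stego, data_range=255))
print("SSIM:", structural_similarity(cover, stego, channel_axis=-1, data_range=255))
```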


Journal ArticleDOI
TL;DR: FakeBERT as discussed by the authors combines different parallel blocks of the single-layer deep Convolutional Neural Network (CNN) having different kernel sizes and filters with the BERT, which is useful to handle ambiguity.
Abstract: In the modern era of computing, the news ecosystem has transformed from old traditional print media to social media outlets. Social media platforms allow us to consume news much faster and with less restrictive editing, which results in the spread of fake news at an incredible pace and scale. In recent research, many useful methods for fake news detection employ sequential neural networks to encode news content and social context-level information, where the text sequence is analyzed in a unidirectional way. Therefore, a bidirectional training approach is a priority for modelling the relevant information of fake news, since it is capable of improving classification performance with the ability to capture semantic and long-distance dependencies in sentences. In this paper, we propose a BERT-based (Bidirectional Encoder Representations from Transformers) deep learning approach (FakeBERT) that combines different parallel blocks of a single-layer deep Convolutional Neural Network (CNN) having different kernel sizes and filters with BERT. Such a combination is useful for handling ambiguity, which is the greatest challenge to natural language understanding. Classification results demonstrate that our proposed model (FakeBERT) outperforms the existing models with an accuracy of 98.90%.

166 citations
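A schematic sketch of the parallel-CNN-over-BERT idea in PyTorch: several 1-D convolution blocks with different kernel sizes pool over the token embeddings. The kernel widths, filter counts, and the random tensor standing in for BERT's output are assumptions for illustration, not the authors' exact FakeBERT configuration.

```python
import torch
import torch.nn as nn

class ParallelCNNHead(nn.Module):
    """Parallel 1-D conv blocks with different kernel sizes over BERT embeddings."""
    def __init__(self, hidden=768, n_filters=128, kernel_sizes=(3, 4, 5), n_classes=2):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(hidden, n_filters, k) for k in kernel_sizes)
        self.fc = nn.Linear(n_filters * len(kernel_sizes), n_classes)

    def forward(self, bert_out):              # (batch, seq_len, hidden)
        x = bert_out.transpose(1, 2)          # Conv1d expects (batch, hidden, seq_len)
        pooled = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.fc(torch.cat(pooled, dim=1))

# Random tensor standing in for BERT's last hidden state (batch=4, seq=128).
fake_bert_output = torch.randn(4, 128, 768)
print(ParallelCNNHead()(fake_bert_output).shape)   # torch.Size([4, 2])
```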


Journal ArticleDOI
TL;DR: This paper presents an efficient deep features-based intelligent anomaly detection framework that can operate in surveillance networks with reduced time complexity and reports a 3.41% and 8.09% increase in accuracy on UCF-Crime and UCFCrime2Local datasets compared to state-of-the-art methods.
Abstract: In the current technological era, surveillance systems generate an enormous volume of video data on a daily basis, making its analysis a difficult task for computer vision experts. Manually searching for unusual events in these massive video streams is challenging, since they occur inconsistently and with low probability in real-world surveillance. In contrast, deep learning-based anomaly detection reduces human labour and its decision making is comparatively reliable, thereby helping to ensure public safety. In this paper, we present an efficient deep features-based intelligent anomaly detection framework that can operate in surveillance networks with reduced time complexity. In the proposed framework, we first extract spatiotemporal features from a series of frames by passing each one to a pre-trained Convolutional Neural Network (CNN) model. The features extracted from the sequence of frames are valuable for capturing anomalous events. We then pass the extracted deep features to a multi-layer Bi-directional Long Short-term Memory (BD-LSTM) model, which can accurately classify ongoing anomalous/normal events in complex surveillance scenes of smart cities. We performed extensive experiments on various anomaly detection benchmark datasets to validate the functionality of the proposed framework within complex surveillance scenarios, and report a 3.41% and 8.09% increase in accuracy on the UCF-Crime and UCFCrime2Local datasets compared to state-of-the-art methods.

129 citations
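A minimal PyTorch sketch of the described two-stage pipeline: frame-level features from a pre-trained CNN backbone, then a multi-layer bidirectional LSTM over the frame sequence. The ResNet-18 backbone, feature size, and layer widths are illustrative assumptions rather than the paper's exact components.

```python
import torch
import torch.nn as nn
from torchvision import models

class AnomalyBiLSTM(nn.Module):
    def __init__(self, feat_dim=512, hidden=256, n_classes=2):
        super().__init__()
        backbone = models.resnet18(weights=None)   # pre-trained weights in practice
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop classifier
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, clips):                      # (batch, frames, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).flatten(1)   # (b*t, feat_dim)
        seq_out, _ = self.lstm(feats.view(b, t, -1))
        return self.fc(seq_out[:, -1])             # classify from the last step

print(AnomalyBiLSTM()(torch.randn(2, 16, 3, 224, 224)).shape)  # torch.Size([2, 2])
```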


Journal ArticleDOI
TL;DR: A dynamic deep hybrid spatio-temporal neural network, DHSTNet, is proposed to predict traffic flows in every region of a city with high accuracy, and an aggregation approach based on an attention mechanism is applied to predict citywide short-term traffic crowd flows.
Abstract: Accurately and promptly predicting citywide traffic crowd flows is crucial for public safety and traffic management in smart cities. The crucial challenge lies in how to model the many complicated spatial dependencies between different regions and the dynamic temporal laws among different time intervals, together with external factors such as holidays, events, and weather. Some existing works leverage the long short-term memory (LSTM) and convolutional neural network (CNN) to explore temporal and spatial relations, respectively, and have outperformed the classical statistical methods. However, it is difficult for these approaches to jointly model spatial and temporal correlations. To address this problem, we propose a dynamic deep hybrid spatio-temporal neural network, namely DHSTNet, to predict traffic flows in every region of a city with high accuracy. In particular, our DHSTNet model comprises four branches, i.e., closeness volume, daily volume, trend volume, and an external branch. The model dynamically assigns different weights to the various branches and then integrates the outputs of the four branches to produce the final prediction. The model has been evaluated, both for offline and online predictions, using an edge/fog infrastructure where training happens on the remote cloud and prediction occurs at the edge, i.e. in the proximity of users. Extensive experiments and evaluation on two real-world datasets demonstrate the advantage of the proposed model, in terms of high accuracy, over prevailing state-of-the-art baseline methods. Moreover, we apply an aggregation approach based on an attention mechanism to the above model, called AAtt-DHSTNet, to predict citywide short-term traffic crowd flows, and show its notable performance in traffic flow prediction. The aggregation method collects information from the related time series, removes redundancy and thus increases prediction speed and accuracy. Our empirical evaluation suggests that the AAtt-DHSTNet model is approximately 20.8% and 8.8% more accurate than the DHSTNet technique on two different real-world traffic datasets.

125 citations
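A sketch of the dynamic branch weighting the abstract describes: the outputs of the closeness, daily, trend, and external branches are fused by learnable, softmax-normalised weights. The branch output shapes and the scalar-per-branch weighting are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """Fuse closeness / daily / trend / external branch outputs with learnable weights."""
    def __init__(self, n_branches=4):
        super().__init__()
        # One learnable scalar weight per branch, applied element-wise.
        self.w = nn.Parameter(torch.ones(n_branches))

    def forward(self, branch_outputs):             # list of (batch, 2, H, W) flow maps
        stacked = torch.stack(branch_outputs)      # (n_branches, batch, 2, H, W)
        weights = torch.softmax(self.w, dim=0)     # normalised branch importances
        return (weights.view(-1, 1, 1, 1, 1) * stacked).sum(dim=0)

branches = [torch.randn(8, 2, 32, 32) for _ in range(4)]
print(WeightedFusion()(branches).shape)            # torch.Size([8, 2, 32, 32])
```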


Journal ArticleDOI
TL;DR: The theoretical analysis, including universal approximation theory and generalization, is presented, and the various improvements are listed, which help ELM work better in terms of stability, efficiency, and accuracy.
Abstract: Extreme learning machine (ELM) is a training algorithm for single hidden layer feedforward neural networks (SLFN), which converges much faster than traditional methods and yields promising performance. In this paper, we present a comprehensive review on ELM. Firstly, we focus on the theoretical analysis, including universal approximation theory and generalization. Then, the various improvements are listed, which help ELM work better in terms of stability, efficiency, and accuracy. Because of its outstanding performance, ELM has been successfully applied in many real-time learning tasks for classification, clustering, and regression. Besides, we report the applications of ELM in medical imaging: MRI, CT, and mammogram. The controversies around ELM are also discussed in this paper. We aim to report these advances and find some future perspectives.

121 citations
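The basic ELM training step the review builds on is compact enough to show inline: the hidden layer is random and fixed, and only the output weights are solved for with one least-squares fit. A minimal NumPy sketch (hidden size and activation are arbitrary choices):

```python
import numpy as np

def elm_train(X, Y, n_hidden=100, rng=np.random.default_rng(0)):
    """ELM for an SLFN: the hidden layer is random and fixed; only beta is solved."""
    W = rng.standard_normal((X.shape[1], n_hidden))   # random input weights
    b = rng.standard_normal(n_hidden)                 # random biases
    H = np.tanh(X @ W + b)                            # hidden-layer activations
    beta = np.linalg.pinv(H) @ Y                      # Moore-Penrose least squares
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy regression: learn y = sin(x).
X = np.linspace(-3, 3, 200).reshape(-1, 1)
Y = np.sin(X)
W, b, beta = elm_train(X, Y)
print(np.abs(elm_predict(X, W, b, beta) - Y).mean())  # small training error
```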


Journal ArticleDOI
TL;DR: In this paper, two state-of-the-art object detection models, namely, YOLOv3 and faster R-CNN, are used to detect face masks.
Abstract: There are many solutions to prevent the spread of the COVID-19 virus, and one of the most effective is wearing a face mask. Almost everyone wears a face mask at all times in public places during the coronavirus pandemic. This encourages us to explore face mask detection technology to monitor people wearing masks in public places. Most recent and advanced face mask detection approaches are designed using deep learning. In this article, two state-of-the-art object detection models, namely YOLOv3 and Faster R-CNN, are used to achieve this task. The authors have trained both models on a dataset that consists of images of people in two categories: with and without face masks. This work proposes a technique that draws bounding boxes (red or green) around the faces of people, based on whether a person is wearing a mask or not, and keeps a record of the ratio of people wearing face masks on a daily basis. The authors have also compared the performance of both models, i.e., their precision rate and inference time.

118 citations
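The box-drawing and daily-ratio bookkeeping the article describes is straightforward with OpenCV. A sketch assuming detections already come back from either detector as (x, y, w, h, label) tuples; the blank frame and the detection tuples are placeholders:

```python
import cv2
import numpy as np

def annotate(frame, detections, counts):
    """Draw green boxes for masked faces, red for unmasked; update running counts."""
    for (x, y, w, h, label) in detections:            # label: "mask" or "no_mask"
        color = (0, 255, 0) if label == "mask" else (0, 0, 255)   # BGR
        cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
        counts[label] = counts.get(label, 0) + 1
    return frame

counts = {}
frame = np.zeros((480, 640, 3), dtype=np.uint8)       # stand-in for a video frame
detections = [(40, 30, 80, 80, "mask"), (200, 50, 90, 90, "no_mask")]  # placeholder
annotate(frame, detections, counts)
total = sum(counts.values())
print("daily mask ratio:", counts.get("mask", 0) / total if total else 0.0)
```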


Journal ArticleDOI
TL;DR: A comparative analysis among various feature descriptor algorithms and classification models for 2D object recognition reveals that a hybridization of the SIFT, SURF and ORB methods with a Random Forest classification model accomplishes the best results compared to other state-of-the-art work.
Abstract: Object recognition is a key research area in the field of image processing and computer vision, which recognizes the object in an image and provides a proper label. In this paper, three popular feature descriptor algorithms, Scale Invariant Feature Transform (SIFT), Speeded Up Robust Features (SURF) and Oriented FAST and Rotated BRIEF (ORB), are used for the experimental work of an object recognition system. A comparison among these three descriptors is exhibited in the paper by evaluating them individually and in different combinations of the three methodologies. The number of features extracted using these feature extraction methods is further reduced using a feature selection method (k-means clustering) and a dimensionality reduction method (Locality Preserving Projection). Various classifiers, i.e. K-Nearest Neighbor, Naive Bayes, Decision Tree, and Random Forest, are used to classify objects based on their similarity. The focus of this article is to present a performance comparison among these three feature extraction methods, particularly whether their combination recognizes objects more efficiently. The authors present a comparative analysis of various feature descriptor algorithms and classification models for 2D object recognition. The Caltech-101 public dataset is considered in this article for the experimental work. The experiments reveal that a hybridization of the SIFT, SURF and ORB methods with a Random Forest classification model accomplishes the best results compared to other state-of-the-art work. The comparative analysis is presented in terms of recognition accuracy, True Positive Rate (TPR), False Positive Rate (FPR), and Area Under Curve (AUC).

100 citations
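A condensed sketch of the hybrid descriptor-plus-Random-Forest pipeline using OpenCV and scikit-learn. SIFT and ORB ship with opencv-python; SURF lives in opencv-contrib and is omitted here. Mean-pooling the descriptors into a fixed-length vector is a simplifying assumption, not the paper's exact encoding:

```python
import cv2
import numpy as np
from sklearn.ensemble import RandomForestClassifier

sift, orb = cv2.SIFT_create(), cv2.ORB_create()

def image_vector(gray):
    """Concatenate mean-pooled SIFT (128-D) and ORB (32-D) descriptors."""
    parts = []
    for det, dim in ((sift, 128), (orb, 32)):
        _, desc = det.detectAndCompute(gray, None)
        parts.append(desc.mean(axis=0) if desc is not None else np.zeros(dim))
    return np.concatenate(parts)                   # 160-D hybrid feature

# Synthetic stand-ins for Caltech-101 images (random textures, two classes).
rng = np.random.default_rng(0)
X = np.array([image_vector(rng.integers(0, 256, (128, 128), dtype=np.uint8))
              for _ in range(40)])
y = np.repeat([0, 1], 20)
clf = RandomForestClassifier(n_estimators=100).fit(X, y)
print(clf.score(X, y))
```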


Journal ArticleDOI
TL;DR: A fine-grained VTC method using a lightweight convolutional neural network with feature optimization and a joint learning strategy combining softmax loss and contrastive-center loss to classify vehicle types is proposed, thereby improving the model's fine-grained classification ability.
Abstract: Vehicle type classification (VTC) plays an important role in today's intelligent transportation. Previous VTC systems usually run on a monitoring center's host machine due to the models' complexity, which consumes a lot of computing resources and yields poor real-time performance. If these systems are deployed to embedded terminals by making the model lightweight while ensuring accuracy, this problem can be addressed. To this end, we propose a fine-grained VTC method using a lightweight convolutional neural network with feature optimization and a joint learning strategy. Firstly, a lightweight convolutional network with feature optimization (LWCNN-FO) is designed. We use depthwise separable convolution to reduce network parameters. Besides, the SENet module is added to obtain the importance of each feature channel automatically through sample-based self-learning, which can improve recognition accuracy with little growth in network parameters. In addition, considering both between-class similarity and intra-class variance, this paper adopts a joint learning strategy combining softmax loss and contrastive-center loss to classify vehicle types, thereby improving the model's fine-grained classification ability. We also build a dataset, called Car-159, consisting of 7998 pictures of 159 vehicle types, to evaluate our method. Compared with state-of-the-art methods, experimental results show that our method can effectively decrease model complexity while maintaining accuracy.

85 citations
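A sketch of the joint learning objective: cross-entropy (softmax loss) plus a contrastive-center-style term that pulls features toward their own class center while pushing them away from the other centers, following the published contrastive-center loss formulation. The weighting lambda, feature size, and batch here are assumptions:

```python
import torch
import torch.nn as nn

class ContrastiveCenterLoss(nn.Module):
    """||x - c_y||^2 / (sum_{j != y} ||x - c_j||^2 + delta): compact within a class,
    separated between classes; used jointly with the usual softmax loss."""
    def __init__(self, n_classes, feat_dim, delta=1.0):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(n_classes, feat_dim))
        self.delta = delta

    def forward(self, feats, labels):
        d = ((feats.unsqueeze(1) - self.centers) ** 2).sum(dim=2)  # (batch, classes)
        intra = d.gather(1, labels.unsqueeze(1)).squeeze(1)        # own-center dist
        inter = d.sum(dim=1) - intra                               # all other centers
        return (intra / (inter + self.delta)).mean()

ce = nn.CrossEntropyLoss()
cc = ContrastiveCenterLoss(n_classes=159, feat_dim=128)            # 159 Car-159 types
feats, logits = torch.randn(16, 128), torch.randn(16, 159)
labels = torch.randint(0, 159, (16,))
loss = ce(logits, labels) + 0.1 * cc(feats, labels)                # lambda=0.1 assumed
print(loss.item())
```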


Journal ArticleDOI
TL;DR: In this article, a new multimedia cloud content distribution method is proposed based on integrated user utility and interest discovery: the interest features of users are extracted by applying an importance-based feature extraction method, and service and non-service users are separated by grouping users according to shared service interests and adjacent regions.
Abstract: Multimedia services are offered to demanding users by the multimedia cloud. However, the sudden increase in network users has lengthened service response times, making service satisfaction difficult to achieve. User-experience characteristics therefore play a main role in multimedia content acquisition, under conditions such as server distribution imbalance, huge numbers of user visits and limited bandwidth. To withstand these problems, a new multimedia cloud content distribution method is proposed based on integrated user utility and interest discovery. Initially, the interest features of users are extracted by applying an importance-based feature extraction method. Subsequently, service and non-service users are separated by forming groups according to shared service interests and adjacent regions. Following this, integrated utility values are adopted to introduce user evaluation strategies; these values are computed by combining different user-experience characteristics such as user reputation, user selfish behaviour and user physical performance. Evaluating the number of service users by employing the Opposition Grasshopper Optimizer (OGHA) minimizes the content distribution time and user cost. Furthermore, the convergence profile and computational speed of the standard GHA are enhanced by introducing opposition-based population initialization in the proposed approach. Simulation outcomes clearly demonstrate improvements for multimedia cloud users, minimizing their total cost and improving multimedia content utilization.

79 citations
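The opposition-based population initialization mentioned above has a standard form: for every random candidate x in [lb, ub], also evaluate its opposite lb + ub - x and keep the fitter half. A minimal sketch with a toy sphere objective standing in for the paper's utility model:

```python
import numpy as np

def opposition_init(fitness, lb, ub, pop_size=20, rng=np.random.default_rng(0)):
    """Generate a population and its opposites; keep the best pop_size candidates."""
    dim = len(lb)
    pop = lb + rng.random((pop_size, dim)) * (ub - lb)
    opposite = lb + ub - pop                       # opposition-based candidates
    union = np.vstack([pop, opposite])
    scores = np.apply_along_axis(fitness, 1, union)
    return union[np.argsort(scores)[:pop_size]]    # minimisation: keep the lowest

lb, ub = np.array([-5.0, -5.0]), np.array([5.0, 5.0])
best = opposition_init(lambda x: (x ** 2).sum(), lb, ub)  # toy sphere objective
print(best[0])
```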


Journal ArticleDOI
TL;DR: General concepts of watermarking, major characteristics, recent applications, concepts of embedding and recovery process of watermarks, and the summary of various techniques are highlighted in brief.
Abstract: With the widespread growth of medical images and improved communication and computer technologies in recent years, the authenticity of images has been a serious issue for e-health applications. To address this, various notable watermarking techniques have been developed by researchers. However, those techniques leave open many issues that need to be considered in future investigations. This paper surveys various watermarking techniques in the medical domain. Along with the survey, general concepts of watermarking, major characteristics, recent applications, concepts of the embedding and recovery process of watermarks, and a summary of various techniques (in tabular form) are highlighted in brief. Further, major issues associated with medical image watermarking are also discussed to point out research directions for fledgling researchers and developers.
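As a concrete instance of the embed/recover cycle the survey outlines, here is a minimal spatial-domain LSB scheme in NumPy. The surveyed medical-image techniques are far more elaborate (ROI handling, encryption, transform domains), so this shows only the skeleton of the idea:

```python
import numpy as np

def embed_lsb(cover, bits):
    """Hide a bit array in the least significant bits of the cover pixels."""
    flat = cover.flatten()                        # flatten() copies, cover is untouched
    flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits
    return flat.reshape(cover.shape)

def recover_lsb(marked, n_bits):
    return marked.flatten()[:n_bits] & 1

rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
watermark = rng.integers(0, 2, size=256, dtype=np.uint8)
marked = embed_lsb(cover, watermark)
assert np.array_equal(recover_lsb(marked, 256), watermark)  # exact recovery
```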

Journal ArticleDOI
TL;DR: Experimental results reveal that the proposed algorithm attains high robustness and improved security to the watermarked image against various kinds of attacks.
Abstract: Nowadays, secure medical image watermarking has become a demanding task in telemedicine. This paper presents a novel medical image watermarking method using fuzzy-based Region of Interest (ROI) selection and a wavelet transformation approach to embed an encrypted watermark. First, the source image undergoes fuzzification to determine the critical points through central and final intensity along the radial line for selecting the ROI. Second, the watermark image is converted to the time-frequency domain through wavelet decomposition, where the sub-bands are swapped based on the magnitude values obtained through logistic mapping. In each sub-band all the pixels are swapped, resulting in a fully encrypted image, which guarantees the watermark a secure, reliable and unbreakable form. To provide more robustness to the watermark image, singular values are obtained for the encrypted watermark image and a key component is calculated to avoid false positive errors. The singular values of the source and watermark images are modified through the key component. Experimental results reveal that the proposed algorithm attains high robustness and improved security for the watermarked image against various kinds of attacks.

Journal ArticleDOI
TL;DR: Human high-level emotions are automatically well classified in the proposed CNN-based multimodal networks, even though a small amount of labeled data samples is available for training.
Abstract: Affective computing is an emerging area of research that aims to enable intelligent systems to recognize, feel, infer and interpret human emotions. Widely spread online and offline music videos are one of the rich sources for human emotion analysis, because they integrate the composer's internal feeling through song lyrics, musical instrument performance and visual expression. In general, the metadata which music video customers use to choose a product includes high-level semantics like emotion, so automatic emotion analysis may be necessary. In this research area, however, the lack of a labeled dataset is a major problem. Therefore, we first construct a balanced music video emotion dataset covering a diversity of territories, languages, cultures and musical instruments. We test this dataset over four unimodal and four multimodal convolutional neural networks (CNNs) for music and video. First, we separately fine-tune each pre-trained unimodal CNN and test the performance on unseen data. In addition, we train a 1-dimensional CNN-based music emotion classifier with raw waveform input. A comparative analysis of each unimodal classifier over various optimizers is made to find the best model that can be integrated into a multimodal structure. The best unimodal modality is integrated with the corresponding music and video network features for the multimodal classifier. The multimodal structure integrates whole music video features and makes the final classification with a SoftMax classifier using a late feature fusion strategy. All possible multimodal structures are also combined into one predictive model to get the overall prediction. All the proposed multimodal structures use cross-validation to overcome the data scarcity problem (overfitting) at the decision level. The evaluation results using various metrics show a boost in the performance of the multimodal architectures compared to each unimodal emotion classifier. The predictive model integrating all multimodal structures achieves 88.56% accuracy, an f1-score of 0.88, and an area under the curve (AUC) score of 0.987. The results suggest that human high-level emotions are automatically well classified in the proposed CNN-based multimodal networks, even though only a small amount of labeled data is available for training.
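A sketch of the late feature-fusion decision step described above: per-modality networks produce feature vectors, which are concatenated and passed to a single softmax head. The feature dimensions and six-class output are illustrative assumptions:

```python
import torch
import torch.nn as nn

class LateFusion(nn.Module):
    """Concatenate audio and video features, classify with a single softmax head."""
    def __init__(self, audio_dim=256, video_dim=512, n_emotions=6):
        super().__init__()
        self.head = nn.Linear(audio_dim + video_dim, n_emotions)

    def forward(self, audio_feat, video_feat):
        fused = torch.cat([audio_feat, video_feat], dim=1)   # late feature fusion
        return torch.softmax(self.head(fused), dim=1)

audio, video = torch.randn(4, 256), torch.randn(4, 512)
probs = LateFusion()(audio, video)
print(probs.sum(dim=1))   # each row of class probabilities sums to 1
```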

Journal ArticleDOI
TL;DR: The proposed CNN architecture takes long contextual information into consideration, which transfers more suitable information to the decision-making layer, and constitutes a novel CNN architecture for music genre classification.
Abstract: Music genre classification based on visual representation has been successfully explored over recent years. Recently, there has been increasing interest in applying convolutional neural networks (CNNs) to this task. However, most of the existing methods employ mature CNN structures proposed for image recognition without any modification, which results in learned features that are not adequate for music genre classification. Faced with this challenge, we fully exploit the low-level information from spectrograms of audio and develop a novel CNN architecture in this paper. The proposed CNN architecture takes multi-scale time-frequency information into consideration, which transfers more suitable semantic features to the decision-making layer to discriminate the genre of an unknown music clip. The experiments are evaluated on the benchmark datasets GTZAN, Ballroom, and Extended Ballroom. The experimental results show that the proposed method achieves classification accuracies of 93.9%, 96.7%, and 97.2% respectively, which, to the best of our knowledge, are the best results on these public datasets so far. Notably, the model trained by our proposed network is tiny, only 0.18M, and can be applied on mobile phones or other devices with limited computational resources. Codes and model will be available at https://github.com/CaifengLiu/music-genre-classification .

Journal ArticleDOI
TL;DR: This study explores different feature extraction methods and state-of-the-art classification models, and their respective impact on an ASR system.
Abstract: Recently, great strides have been made in the field of automatic speech recognition (ASR) by using various deep learning techniques. In this study, we present a thorough comparison between cutting-edge techniques currently being used in this area, with a special focus on the various deep learning methods. This study explores different feature extraction methods, state-of-the-art classification models, and their impact on an ASR system. As deep learning techniques are very data-dependent, the different speech datasets that are available online are also discussed in detail. In the end, the various online toolkits, resources, and language models that can be helpful in the formulation of an ASR system are also presented. This study covers every aspect that can impact the performance of an ASR system; hence, we believe this work is a good starting point for academics interested in ASR research.

Journal ArticleDOI
TL;DR: This paper applies deep learning-based convolutional neural networks (CNNs) for robust modeling of static signs in the context of sign language recognition and highlights the recognition accuracy of each character, and their similarities with identical gestures.
Abstract: Hand gestures have been one of the most prominent means of communication since the beginning of the human era. Hand gesture recognition makes human-computer interaction (HCI) more convenient and flexible. Therefore, it is important to identify each character correctly for smooth and error-free HCI. A literature survey reveals that most of the existing hand gesture recognition (HGR) systems have considered only a few simple discriminating gestures for recognition performance. This paper applies deep learning-based convolutional neural networks (CNNs) for robust modeling of static signs in the context of sign language recognition. In this work, CNNs are employed for HGR where both the alphabets and numerals of ASL are considered simultaneously. The pros and cons of CNNs used for HGR are also highlighted. The CNN architectures are based on modified AlexNet and modified VGG16 models for classification. The modified pre-trained AlexNet and VGG16 based architectures are used for feature extraction, followed by a multiclass support vector machine (SVM) classifier. The results are evaluated on features from different layers to find the best recognition performance. To examine the accuracy of the HGR schemes, both leave-one-subject-out and a random 70-30 form of cross-validation were adopted. This work also highlights the recognition accuracy of each character and its similarity to identical gestures. The experiments are performed on a simple CPU system instead of high-end GPU systems to demonstrate the cost-effectiveness of this work. The proposed system has achieved a recognition accuracy of 99.82%, which is better than some of the state-of-the-art methods.
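A compact sketch of the "pre-trained CNN as feature extractor, SVM as classifier" recipe using torchvision and scikit-learn. The choice of VGG16 layer, the linear kernel, and the random stand-in images are assumptions, not the paper's tuned configuration:

```python
import numpy as np
import torch
from torchvision import models
from sklearn.svm import SVC

vgg = models.vgg16(weights=None).eval()        # load pre-trained weights in practice

@torch.no_grad()
def fc_features(images):                       # images: (N, 3, 224, 224)
    x = vgg.avgpool(vgg.features(images)).flatten(1)
    return vgg.classifier[:4](x).numpy()       # 4096-D features from the 2nd FC layer

# Random stand-ins for ASL gesture crops, two classes of four images each.
X = fc_features(torch.randn(8, 3, 224, 224))
y = np.repeat([0, 1], 4)
svm = SVC(kernel="linear").fit(X, y)           # multiclass SVM over deep features
print(svm.score(X, y))
```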

Journal ArticleDOI
TL;DR: A new 26-layered Convolutional Neural Network (CNN) architecture for accurate complex action recognition is designed, and a feature selection method named Poisson distribution along with Univariate Measures (PDaUM) is proposed.
Abstract: Vision-based human action recognition (HAR) has been a hot research topic for the past decade due to popular applications such as visual surveillance and robotics. For correct action recognition, various local and global points, known as features, are required. These features change as human movement varies, and because several human actions differ only slightly, their features become mixed, which degrades recognition performance. In this article, we design a new 26-layered Convolutional Neural Network (CNN) architecture for accurate complex action recognition. Features are extracted from the global average pooling layer and the fully connected (FC) layer, and fused by a proposed high-entropy-based approach. Further, we propose a feature selection method named Poisson distribution along with Univariate Measures (PDaUM). Some of the fused CNN features are irrelevant and some are redundant, leading to incorrect predictions among complex human actions. Therefore, the proposed PDaUM-based approach selects only the strongest features, which are then passed to an Extreme Learning Machine (ELM) and Softmax for final recognition. Four datasets are used for experimental analysis: HMDB51 (51 classes), UCF Sports (10 classes), KTH (6 classes), and Weizmann (10 classes). On these datasets, the ELM classifier gives improved performance compared to a Softmax classifier. The achieved accuracy on each dataset is 81.4%, 99.2%, 98.3%, and 98.7%, respectively. Compared with existing techniques, the proposed architecture gives better performance in terms of accuracy and testing time.

Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors proposed a triple-attention guided residual dense and BiLSTM networks (TARDB-Net) to reduce redundant features while increasing feature fusion capabilities, which ultimately improves the ability to classify hyperspectral images.
Abstract: Each sample in a hyperspectral remote sensing image has high-dimensional features and contains rich spatial and spectral information, which greatly increases the difficulty of feature selection and mining. In view of these difficulties, we propose a novel Triple-attention Guided Residual Dense and BiLSTM network (TARDB-Net) to reduce redundant features while increasing feature fusion capabilities, which ultimately improves the ability to classify hyperspectral images. First, a novel triple-attention mechanism is proposed to assign different weights to each feature. Then, a residual network performs the residual operation on the features, and the initial features of the multiple residual blocks and the generated deep residual features are intensively fused, retaining a large number of prior features. A bidirectional long short-term memory network is then used to integrate the contextual semantics of the deeply fused features. Finally, the classification task is completed by a Softmax classifier. Experiments on three hyperspectral datasets, Indian Pines, University of Pavia, and Salinas, show that with 10% of the samples used for training, the overall accuracy of our method is 87%, 96% and 96% respectively, which is superior to several well-known methods.

Journal ArticleDOI
TL;DR: Deep Learning Approach (DLA) has been widely used in medical imaging to detect the presence or absence of the disease as discussed by the authors, and most of the implementations concentrate on the X-ray images, computerized tomography, mammography images, and digital histopathology images.
Abstract: Medical imaging plays a significant role in different clinical applications, such as the medical procedures used for early detection, monitoring, diagnosis, and treatment evaluation of various medical conditions. The basics of the principles and implementations of artificial neural networks and deep learning are essential for understanding medical image analysis in computer vision. The Deep Learning Approach (DLA) in medical image analysis is emerging as a fast-growing research field. DLA has been widely used in medical imaging to detect the presence or absence of disease. This paper presents the development of artificial neural networks and a comprehensive analysis of DLA, which delivers promising medical imaging applications. Most DLA implementations concentrate on X-ray images, computerized tomography, mammography images, and digital histopathology images. The paper provides a systematic review of articles on the classification, detection, and segmentation of medical images based on DLA. This review guides researchers to think of appropriate changes in medical image analysis based on DLA.

Journal ArticleDOI
TL;DR: A novel A-HPE method that intelligently identifies human behaviours by utilizing saliency silhouette detection, robust body parts model and multidimensional cues from full-body silhouettes followed by an entropy Markov model is proposed.
Abstract: Automated human posture estimation (A-HPE) systems need delicate methods for detecting body parts and selecting cues based on marker-less sensors to effectively recognize complex activity motions. Recognition of human activities using vision sensors is a challenging issue due to variations in illumination conditions and complex movements during the monitoring of sports and fitness exercises. In this paper, we propose a novel A-HPE method that intelligently identifies human behaviours by utilizing saliency silhouette detection, a robust body parts model and multidimensional cues from full-body silhouettes, followed by an entropy Markov model. Initially, images are pre-processed and noise is removed to obtain a robust silhouette. Body parts models are then used to extract twelve key body parts. These key body parts are further optimized to assist the generation of multidimensional cues. These cues include energy, optical flow and distinctive values, which are fed into quadratic discriminant analysis to discriminate the cues that help in the recognition of actions. Finally, these optimized patterns are processed by a maximum entropy Markov model as a recognizer engine, based on transition and emission probability values, for activity recognition. For evaluation, we used a leave-one-out cross-validation scheme, and the results outperformed existing well-known statistical state-of-the-art methods, achieving better body part detection and higher recognition accuracy over four benchmark datasets. The proposed method will be useful for man-machine interaction such as 3D interactive games, virtual reality, service robots, e-health fitness, and security surveillance.

Journal ArticleDOI
TL;DR: In this article, an active deep learning (ADL-CNN) model is proposed for diabetic retinopathy (DR) screening that automatically extracts features, in contrast to handcrafted features.
Abstract: Retinal fundus image analysis (RFIA) for diabetic retinopathy (DR) screening can be used to reduce the risk of blindness among diabetic patients. RFIA screening programs help ophthalmologists cope with this paramount visual impairment problem. In this article, automatic recognition of the DR stage is proposed based on a new multi-layer architecture of active deep learning (ADL). To develop the ADL system, we used a convolutional neural network (CNN) model to extract features automatically, as opposed to using handcrafted features. However, the CNN training procedure requires an immense amount of labeled data, which makes the classification phase difficult. As a result, a label-efficient CNN architecture known as ADL-CNN is presented, using an active learning method known as expected gradient length (EGL). This ADL-CNN model can be seen as a two-stage process. First, the proposed ADL-CNN system selects both the most informative patches and images using some ground truth labels of training samples to learn simple to complex retinal features. Next, it provides useful masks for prognostication, assisting clinical specialists in annotating important eye samples and segmenting regions of interest within the retinograph image, in order to grade five severity levels of diabetic retinopathy. To test and evaluate the performance of the ADL-CNN model, the EyePACS benchmark is utilized and compared with state-of-the-art methods. Statistical metrics such as sensitivity (SE), specificity (SP), F-measure and classification accuracy (ACC) are used to measure the effectiveness of the ADL-CNN system. On 54,000 retinograph images, the ADL-CNN model achieved an average SE of 92.20%, SP of 95.10%, F-measure of 93% and ACC of 98%. Hence, the new ADL-CNN architecture outperforms previous methods for detecting DR-related lesions and recognizing the five severity levels of DR on a wide range of fundus images.
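A schematic of the active-learning selection loop on which ADL-CNN is built. For brevity this scores candidates by least confidence, a common stand-in for the paper's expected-gradient-length (EGL) criterion, and the model and data pool are toy placeholders:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pool = rng.standard_normal((500, 10))
y_pool = (X_pool[:, 0] > 0).astype(int)           # oracle labels, hidden in practice

# Seed set chosen so it contains both classes.
labeled = list(np.argsort(X_pool[:, 0])[:5]) + list(np.argsort(X_pool[:, 0])[-5:])
for _ in range(5):                                # five acquisition rounds
    model = LogisticRegression().fit(X_pool[labeled], y_pool[labeled])
    uncertainty = 1 - model.predict_proba(X_pool).max(axis=1)  # least confidence
    uncertainty[labeled] = -1                     # never re-query labeled samples
    picks = np.argsort(uncertainty)[-20:]         # query the 20 most uncertain
    labeled.extend(int(i) for i in picks)         # oracle provides their labels

print(len(labeled), model.score(X_pool, y_pool))
```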

Journal ArticleDOI
TL;DR: The main objective is to improve the quality of service over a heterogeneous network by reinforcement learning-based multimedia data segregation (RLMDS) algorithm and Computing QoS in Medical Information system using Fuzzy (CQMISF) algorithm in fog computing.
Abstract: Fog computing is an emerging trend in the healthcare sector for the care of patients in emergencies. Fog computing provides better results in healthcare by improving the quality of services in heterogeneous networks. Critical multimedia healthcare data must be transferred in real time over high-quality networks to save patients' lives. The main objective is to improve the quality of service over a heterogeneous network via a reinforcement learning-based multimedia data segregation (RLMDS) algorithm and a Computing QoS in Medical Information system using Fuzzy (CQMISF) algorithm in fog computing. The proposed algorithms work in three phases: classification of healthcare data, selection of optimal gateways for data transmission, and improvement of transmission quality with consideration of parameters such as throughput, end-to-end delay and jitter. The proposed algorithms classify the healthcare data and transfer the classified high-risk data to end users by selecting the optimal gateway. For performance validation, extensive simulations were conducted in MATLAB R2018b on different parameters like throughput, end-to-end delay, and jitter. The performance of the proposed work is compared with the FLQoS and AQCA algorithms. The proposed CQMISF algorithm achieves 81.7% overall accuracy and, in comparison to the FLQoS and AQCA algorithms, achieves significant improvements of 6.195% and 2.01%, respectively.

Journal ArticleDOI
TL;DR: In this paper, a deep network model that uses ResNet-50 and global average pooling to resolve the vanishing gradient and overfitting problems was proposed, which achieved a mean accuracy of 97.08% and 97.48% respectively.
Abstract: A rapid increase in brain tumor cases mandates researchers to automate brain tumor detection and diagnosis. Multi-tumor brain image classification has become a contemporary research task due to the diverse characteristics of tumors. Recently, deep neural networks have commonly been used for medical image classification to assist neurologists. The vanishing gradient problem and overfitting are the demerits of deep networks. In this paper, we propose a deep network model that uses ResNet-50 and global average pooling to resolve the vanishing gradient and overfitting problems. To evaluate the efficiency of the proposed model, simulations were carried out using a three-tumor brain magnetic resonance image dataset consisting of 3064 images. Key performance metrics were used to analyze the performance of the proposed model and its competitive models. We achieved mean accuracies of 97.08% and 97.48% with data augmentation and without data augmentation, respectively. Our proposed model outperforms existing models in classification accuracy.
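The ResNet-50-plus-global-average-pooling design maps directly onto torchvision, since the stock ResNet-50 already ends in a GAP layer; only the classifier head needs swapping for the three tumor classes. A sketch (weights and input are placeholders):

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=None)      # load ImageNet weights in practice
# resnet50 already applies global average pooling (model.avgpool) before the
# final fully connected layer; replace that layer for the 3 tumor classes.
model.fc = nn.Linear(model.fc.in_features, 3)

logits = model(torch.randn(2, 3, 224, 224))
print(logits.shape)                        # torch.Size([2, 3])
```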

Journal ArticleDOI
TL;DR: A more advanced successor of Convolutional Neural Networks (CNNs), called 3-D CNNs, is employed, which can recognize patterns in volumetric data like videos, and which outperforms the existing state-of-the-art models in terms of precision, recall, and f-measure.
Abstract: Communication between a person from the impaired community and a person who does not understand sign language can be a tedious task. Sign language is the art of conveying messages using hand gestures. Recognition of dynamic hand gestures in American Sign Language (ASL) has become a very important challenge that is still unresolved. To resolve the challenges of dynamic ASL recognition, a more advanced successor of Convolutional Neural Networks (CNNs) called 3-D CNNs is employed, which can recognize patterns in volumetric data like videos. The CNN is trained for classification of 100 words on the Boston ASL Lexicon Video Dataset (LVD), with more than 3300 English words signed by 6 different signers. 70% of the dataset is used for training, while the remaining 30% is used for testing the model. The proposed work outperforms the existing state-of-the-art models in terms of precision (3.7%), recall (4.3%), and f-measure (3.9%). The computing time (0.19 seconds per frame) of the proposed work shows that the proposal may be used in real-time applications.
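A minimal 3-D CNN in PyTorch showing how Conv3d consumes video volumes of shape (channels, frames, height, width). The depth, channel counts, and pooling are arbitrary illustrative choices; only the 100-way output echoes the paper's 100-word vocabulary:

```python
import torch
import torch.nn as nn

class Tiny3DCNN(nn.Module):
    """Two Conv3d blocks over (C, T, H, W) video clips, then a 100-way classifier."""
    def __init__(self, n_classes=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),                       # halve frames and spatial dims
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),               # global pooling over T, H, W
        )
        self.fc = nn.Linear(32, n_classes)

    def forward(self, clips):                      # (batch, 3, frames, H, W)
        return self.fc(self.net(clips).flatten(1))

print(Tiny3DCNN()(torch.randn(2, 3, 16, 112, 112)).shape)  # torch.Size([2, 100])
```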

Journal ArticleDOI
TL;DR: From the results, the proposed WPART method proved to be most accurate in providing drought prediction as well as the productivity of crops like Bajra, Soybean, Jowar, and Sugarcane.
Abstract: This paper suggests an IoT-based smart farming system along with an efficient prediction method, called WPART, based on machine learning techniques to predict crop productivity and drought for proficient decision support in IoT-based smart farming systems. Crop productivity and drought predictions are very important to farmers and agricultural executives, greatly helping agriculture-dependent countries around the world. Drought prediction plays a significant role in drought early warning, to mitigate its impacts on crop productivity; drought prediction research aims to enhance our understanding of the physical mechanism of drought and to improve prediction skill by taking full advantage of sources of predictability. In this work, an intelligent method based on a blend of a wrapper feature selection approach and the PART classification technique is proposed for predicting crop productivity and drought. Five datasets are used for evaluating the proposed method. The results indicate that the proposed method is robust, accurate, and precise in classifying and predicting crop productivity and drought in comparison with existing techniques. The proposed method proved to be accurate in predicting drought as well as the productivity of crops like Bajra, Soybean, Jowar, and Sugarcane. The WPART method attains the maximum accuracy compared to existing state-of-the-art algorithms, reaching 92.51%, 96.77%, 98.04%, 96.12%, and 98.15% on the five datasets for drought classification and crop productivity, respectively. Likewise, the proposed method outperforms existing algorithms on precision, sensitivity, and F-score metrics.

Journal ArticleDOI
TL;DR: This study presents a novel method for designing 8 × 8 S-boxes with selected cryptographic characteristics, based on a cuckoo search (CS) algorithm and a discrete-space chaotic map; the resulting S-boxes exhibit good cryptographic features and can resist various cryptanalysis attacks.
Abstract: Substitution boxes (S-boxes) are unique nonlinear elements, which are used to achieve the property of confusion in modern symmetric ciphers and offer resistance to cryptanalysis. The construction of strong S-boxes has gained considerable attention in the area of cryptography. In fact, the security of transmitted data is highly dependent on the strength of the S-boxes for the prevention of unauthorised access. Therefore, the creation of a strong S-box with a high nonlinearity score has been considered a significant challenge. This study presents a novel method for designing 8 × 8 S-boxes with selected cryptographic characteristics based on a cuckoo search (CS) algorithm and a discrete-space chaotic map. Notably, the advantage of the proposed approach lies in the efficient randomisation and fewer adjustable parameters of CS compared to GA and PSO. This approach also utilises a 1D discrete-space chaotic map with virtually unlimited key space to design the initial S-boxes, which is another advantage over methods based on continuous-space chaotic maps with their limited key space. Moreover, chaotic maps have the potential to overcome the trapping problem of a standard CS in local optima, and they were used to generate initial S-boxes of the desired quality to facilitate the metaheuristic search. Accordingly, the metaheuristic CS was used to find a notable S-box configuration which fulfilled the established criteria, by searching for the optimal or near-optimal features which maximised the given fitness function. The performance of the proposed method was evaluated against established criteria, including bijectivity, nonlinearity, strict avalanche criteria, bit independence criteria, differential uniformity, and linear probability. When the proposed method was benchmarked against recently developed S-boxes, the results indicated that the generated S-boxes exhibit good cryptographic features and can resist various cryptanalysis attacks.
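A sketch of the chaotic-map half of the pipeline: derive a bijective 8 × 8 S-box by ranking a chaotic trajectory, then verify bijectivity. The logistic map is a stand-in for the paper's discrete-space map, and the cuckoo-search refinement step is omitted:

```python
import numpy as np

def chaotic_sbox(x0=0.7, r=3.99, n=256):
    """Rank 256 logistic-map samples to get a permutation of 0..255 (an S-box)."""
    xs = np.empty(n)
    x = x0
    for i in range(n):
        x = r * x * (1 - x)                  # logistic map iteration
        xs[i] = x
    return np.argsort(xs).astype(np.uint8)   # ranking yields a bijection

sbox = chaotic_sbox()
assert sorted(sbox) == list(range(256))      # bijectivity check
print(sbox[:8])
```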

Journal ArticleDOI
TL;DR: The challenges faced by applying the deep learning method to reconstruct 3D objects from a single image are discussed and the common datasets and evaluation metrics of single image 3D object reconstruction in recent years are introduced.
Abstract: The reconstruction of 3D objects from a single image is an important task in the field of computer vision. In recent years, 3D reconstruction from a single image using deep learning technology has achieved remarkable results. Traditional methods for reconstructing 3D objects from a single image require prior knowledge and assumptions, and the reconstructed objects are limited to certain categories, or it is difficult to accomplish a good reconstruction from a real image. Although deep learning can solve these problems well with its powerful learning ability, it also faces many challenges. In this paper, we first discuss the challenges faced when applying deep learning methods to reconstruct 3D objects from a single image. Second, we comprehensively review the encoders, decoders and training details used in 3D reconstruction from a single image. Then, the common datasets and evaluation metrics of single-image 3D object reconstruction in recent years are introduced. To analyze the advantages and disadvantages of different 3D reconstruction methods, a series of experiments are used for comparison. In addition, we briefly give some related application examples involving 3D reconstruction from a single image. Finally, we summarize the paper and discuss future directions.

Journal ArticleDOI
TL;DR: The proposed scheme is found to be more useful when compared with recently proposed schemes in terms of features and usefulness.
Abstract: Image watermarking can provide ownership identification as well as tamper protection. Transform-domain image watermarking has proven to be more robust than spatial-domain watermarking against different signal processing attacks. On the other hand, tamper detection is found to work well in the spatial domain. The proposed work focuses on improving medical image watermarking by incorporating multiple watermarking of the host image. The principal components (PC) based insertion makes the scheme secure against ownership attacks. In addition, LZW (Lempel-Ziv-Welch) based fragile watermarking is used to hide the compressed image's ROI (region of interest) to tackle intentional tampering attacks. The ROI-based watermark generation provides complete reversibility of the ROI. In this way, the proposed scheme provides perfect reversibility of the ROI and good imperceptibility, in addition to satisfactory robustness. The tamper handling ability of the proposed scheme is also tested against various attacks and turns out to be quite good. The proposed scheme is found to be more useful when compared with recently proposed schemes in terms of features and usefulness.

Journal ArticleDOI
TL;DR: In this article, the authors present a new implementation that dramatically improves the computation speed of the Image Source Method (ISM) by using Graphic Processing Units (GPUs) to parallelize both the simulation of multiple RIRs and the computation of the images inside each RIR.
Abstract: The Image Source Method (ISM) is one of the most employed techniques to calculate acoustic Room Impulse Responses (RIRs); however, its computational complexity grows quickly with the reverberation time of the room, and its computation time can be prohibitive for applications where a huge number of RIRs are needed. In this paper, we present a new implementation that dramatically improves the computation speed of the ISM by using Graphics Processing Units (GPUs) to parallelize both the simulation of multiple RIRs and the computation of the images inside each RIR. Additional speedups were achieved by exploiting the mixed precision capabilities of newer GPUs and by using lookup tables. We provide a Python library under GNU license that can be easily used without any knowledge of GPU programming, and we show that it is about 100 times faster than other state-of-the-art CPU libraries. It may become a powerful tool for applications that need to perform a large number of acoustic simulations, such as training machine learning systems for audio signal processing, or real-time room acoustics simulations for immersive multimedia systems such as augmented or virtual reality.
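A bare-bones NumPy sketch of the image source method the paper accelerates: mirror the source across the walls of a shoebox room, attenuate each image by its reflection count and distance, and add one impulse per image at the nearest sample. Real implementations, including the GPU library described, add fractional-delay filters, per-wall reflection coefficients, and far higher image orders:

```python
import numpy as np

def ism_rir(room, src, mic, beta=0.9, order=8, fs=16000, c=343.0, length=4096):
    """Shoebox-room RIR via the image source method (integer delays, uniform beta)."""
    axes = []
    for d in range(3):        # image coordinates and reflection counts per axis
        L, s = room[d], src[d]
        axes.append([(2 * m * L + (s if p == 0 else -s), abs(2 * m - p))
                     for m in range(-order, order + 1) for p in (0, 1)])
    rir = np.zeros(length)
    for x, rx in axes[0]:
        for y, ry in axes[1]:
            for z, rz in axes[2]:
                dist = np.linalg.norm(np.array([x, y, z]) - mic)
                n = int(round(dist / c * fs))     # nearest-sample arrival time
                if n < length:
                    rir[n] += beta ** (rx + ry + rz) / (4 * np.pi * dist)
    return rir

rir = ism_rir([5.0, 4.0, 3.0], [1.0, 1.0, 1.5], np.array([4.0, 3.0, 1.5]))
print(rir.nonzero()[0][:5])                       # first arrival sample indices
```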

Journal ArticleDOI
TL;DR: This paper proposes an innovative method in which the visual features of the image are presented by the intermediate layer features of deep learning, while semantic concepts are represented by mean vectors of positive samples.
Abstract: Automatic image annotation is an effective computer operation that predicts the annotation of an unknown image by automatically learning potential relationships between the semantic concept space and the visual feature space in an annotated image dataset. Usually, automatic image labeling comprises two stages: learning and labeling. Existing image annotation methods that employ convolutional features from deep learning have a number of limitations, including complex training and high space/time costs associated with the annotation procedure. Accordingly, this paper proposes an innovative method in which the visual features of the image are represented by intermediate-layer features of a deep network, while semantic concepts are represented by mean vectors of positive samples. Firstly, the convolutional result is directly output in the form of low-level visual features through the mid-level of the pre-trained deep learning model, with the image represented by sparse coding. Secondly, the positive mean vector method is used to construct a visual feature vector for each text vocabulary item, so that a visual feature vector database is created. Finally, the visual feature vector similarity between the testing image and all vocabulary items is calculated, and the vocabulary items with the largest similarity are used for annotation. Experiments on standard datasets demonstrate the effectiveness of the proposed method; in terms of F1 score, its performance on the Corel5k and IAPR TC-12 datasets is superior to that of MBRM, JEC-AF, JEC-DF, and 2PKNN with end-to-end deep features.
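A sketch of the positive-mean-vector annotation rule: each keyword is represented by the mean feature of the images tagged with it, and a test image takes the labels of the most similar mean vectors. The random features and cosine similarity are assumptions standing in for the paper's deep features and similarity measure:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["sky", "water", "tree", "car"]
# Deep features of training images (random stand-ins) and their keyword sets.
feats = rng.standard_normal((100, 64))
labels = [set(rng.choice(vocab, size=2, replace=False)) for _ in range(100)]

# Positive mean vector: average the features of images tagged with each word.
mean_vecs = {w: feats[[i for i, ls in enumerate(labels) if w in ls]].mean(axis=0)
             for w in vocab}

def annotate(x, k=2):
    """Return the k keywords whose mean vectors are most cosine-similar to x."""
    sims = {w: v @ x / (np.linalg.norm(v) * np.linalg.norm(x))
            for w, v in mean_vecs.items()}
    return sorted(sims, key=sims.get, reverse=True)[:k]

print(annotate(rng.standard_normal(64)))
```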