Showing papers in "IET Image Processing in 2020"
TL;DR: The proposed CapsNet-based technique can extract the desired features from image data sets and provides automatic tumor classification with 92.65% accuracy.
Abstract: Visual evaluation of many magnetic resonance images is a difficult task. Therefore, computer-assisted brain tumor classification techniques have been proposed, but these techniques have several drawbacks or limitations. Capsule-based neural networks are a new approach that can preserve the spatial relationships of learned features using a dynamic routing algorithm. In this way, not only does tumor recognition performance increase, but sampling efficiency and generalisation capability also improve. Therefore, in this work, a Capsule Network (CapsNet) is used to achieve fully automated classification of tumors from brain magnetic resonance images. Three prevalent types of tumors (pituitary, glioma and meningioma) are handled. The main contributions of this paper are as follows: 1) A comprehensive review of CapsNet-based methods is presented. 2) A new CapsNet topology is designed using Sobolev gradient-based optimisation, expectation-maximisation-based dynamic routing and tumor boundary information. 3) The network topology is applied to categorise the three types of brain tumors. 4) Comparative evaluations against the results obtained by other methods are performed. According to the experimental results, the proposed CapsNet-based technique can extract the desired features from image data sets and provides automatic tumor classification with 92.65% accuracy.
76 citations
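Since routing-by-agreement is the mechanism the entry above builds on, a minimal sketch may help. The NumPy code below implements the squash non-linearity and the original dynamic routing loop of Sabour et al.; the paper itself uses an expectation-maximisation-based routing variant and Sobolev gradient optimisation, neither of which is reproduced here, and the capsule counts are illustrative.

```python
# Minimal sketch of squash + routing-by-agreement (Sabour et al., 2017).
# The paper above replaces this with EM-based routing; not reproduced here.
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Shrink vector length into [0, 1) while preserving direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, n_iters=3):
    """u_hat: capsule predictions of shape (n_in, n_out, dim_out)."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                               # routing logits
    for _ in range(n_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax over output capsules
        s = (c[..., None] * u_hat).sum(axis=0)                # weighted sum -> (n_out, dim_out)
        v = squash(s)                                         # output capsules
        b += (u_hat * v[None]).sum(axis=-1)                   # agreement update
    return v

v = dynamic_routing(np.random.randn(1152, 10, 16) * 0.01)     # illustrative capsule counts
print(v.shape)  # (10, 16)
```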
TL;DR: A cascade network configuration with three IoU thresholds is utilised during model training, and the proposed algorithm demonstrates superior performance compared to that of the Mask R-CNN.
Abstract: Object detection is a crucial topic in computer vision. Mask Region-Convolution Neural Network (R-CNN) based methods, wherein a large intersection over union (IoU) threshold is chosen for high-quality samples, have often been employed for object detection. However, the detection performance of such methods deteriorates when samples are reduced. To address this, the authors propose an improved Mask R-CNN-based method: the ResNet Group Cascade (RGC) Mask R-CNN. First, they compared ResNet backbones with different depths, finding that ResNeXt-101-64 × 4d is superior to the other backbone networks. Secondly, during training, the performance of Mask R-CNN suffered from a small batch size, resulting in inaccurately calculated means and variances; thus, group normalisation was added to the backbone, the feature pyramid network neck and the bounding box head of the network. Finally, the higher the IoU threshold in Mask R-CNN, the higher the quality of the selected samples; however, blindly selecting a high threshold leads to sample reduction and overfitting. Thus, a cascade network configuration with three IoU thresholds was utilised during model training. The model was trained and tested on the COCO and PASCAL VOC07 datasets. The proposed algorithm demonstrated superior performance compared to that of the Mask R-CNN.
65 citations
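For readers unfamiliar with the IoU trade-off driving the cascade design above, here is a minimal sketch. The thresholds (0.5, 0.6, 0.7) follow the common Cascade R-CNN convention and are an assumption; the abstract does not state the paper's exact values.

```python
# Sketch of IoU computation and cascade-style positive-sample selection.
def iou(a, b):
    """Boxes as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def cascade_positives(proposals, gt, thresholds=(0.5, 0.6, 0.7)):
    """Each stage keeps proposals whose IoU with ground truth clears its threshold."""
    return [[p for p in proposals if iou(p, gt) >= t] for t in thresholds]
```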
TL;DR: Experiments show that the proposed network model can robustly track and recognise gestures under complex backgrounds (such as similar complexion, illumination changes, and occlusion); compared with the single-channel model, the average detection accuracy is improved by 1.08% and the mean average precision by 3.56%.
Abstract: With the rapid development of sensor technology and artificial intelligence, video gesture recognition in the context of big data makes human–computer interaction more natural and flexible, bringing a richer interactive experience to teaching, on-board control, electronic games etc. To perform robust recognition under illumination change, background clutter, rapid movement, and partial occlusion, an algorithm based on multi-level feature fusion of a two-stream convolutional neural network is proposed, which includes three main steps. Firstly, the Kinect sensor obtains red–green–blue-depth (RGB-D) images to establish a gesture database, and data augmentation is performed on the training and test sets. Then, a multi-level feature fusion model of a two-stream convolutional neural network is established and trained. Experiments show that the proposed network model can robustly track and recognise gestures under complex backgrounds (such as similar complexion, illumination changes, and occlusion); compared with the single-channel model, the average detection accuracy is improved by 1.08% and the mean average precision by 3.56%.
61 citations
TL;DR: A novel image encryption scheme based on dynamic DNA sequence encryption and an improved 2D logistic sine chaotic map (2D-LSMM) is presented, which not only achieves proper encryption but can also resist different attacks.
Abstract: The one-dimensional (1D) chaotic encryption algorithm has good encryption performance owing to properties such as the excellent complexity, pseudo-randomness, and sensitivity to the initial value of the chaotic sequence. However, compared with other methods, its biggest drawback is that the key space is too small. To address these problems, in this study, the authors introduce an improved 2D logistic sine chaotic map (2D-LSMM) and present a novel image encryption scheme based on dynamic DNA sequence encryption and the improved 2D-LSMM. The logistic map is used to control the input of the sine map, and the encoding and operation rules of the DNA sequences are determined by 2D-LSMM chaotic sequences. By implementing dynamic DNA sequence encryption, the encryption process becomes more complicated and harder to attack. Simulation results and security analysis show that the authors' encryption scheme not only achieves proper encryption but can also resist different attacks.
57 citations
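A hedged sketch of the kind of logistic-sine coupling described above (the logistic map driving the input of the sine map). The exact 2D-LSMM definition is not given in the abstract, so this follows the widely used 2D logistic-sine coupling form; treat the coupling and parameters as assumptions rather than the authors' map.

```python
# Hedged sketch of a 2D logistic-sine coupled map (assumed form, not the
# authors' 2D-LSMM definition).
import numpy as np

def lsmm_sequence(x0, y0, theta=0.9, n=1000):
    xs, ys = np.empty(n), np.empty(n)
    x, y = x0, y0
    for i in range(n):
        # logistic term 4*theta*x*(1-x) feeds the sine map
        x = np.sin(np.pi * (4 * theta * x * (1 - x) + (1 - theta) * np.sin(np.pi * y)))
        y = np.sin(np.pi * (4 * theta * y * (1 - y) + (1 - theta) * np.sin(np.pi * x)))
        xs[i], ys[i] = x, y
    return xs, ys

xs, ys = lsmm_sequence(0.37, 0.68)
key_bytes = (np.abs(xs) * 255).astype(np.uint8)  # chaotic keystream, e.g. to pick DNA rules
```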
TL;DR: A strength Pareto evolutionary algorithm-II based meta-heuristic approach is proposed to tune the hyper-parameters of the four-dimensional chaotic map and is implemented in a parallel fashion to enhance the computational speed.
Abstract: In recent years, many image encryption approaches have been proposed on the basis of chaotic maps. Various types of chaotic maps, both one-dimensional and multi-dimensional, have been used to generate the secret keys. Chaotic maps require some parameters, and the value assignment of these parameters is crucial, because poor value assignments may render the chaotic map non-chaotic. Therefore, hyper-parameter tuning of chaotic maps is required. Recently, meta-heuristic-based image encryption approaches have been designed by researchers to resolve this issue. However, the majority of these techniques suffer from poor computational speed and get stuck in local optima. Therefore, in this study, a strength Pareto evolutionary algorithm-II based meta-heuristic approach is proposed to tune the hyper-parameters of a four-dimensional chaotic map. The proposed approach is also implemented in a parallel fashion to enhance the computational speed. The effectiveness of the proposed approach is evaluated through extensive experiments. Comparative analyses show that the proposed approach outperforms the competitive approaches in terms of entropy, NPCR, UACI, and PSNR by 0.9834%, 1.0728%, 0.9134%, and 0.8971%, respectively.
55 citations
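NPCR and UACI, two of the metrics quoted in the comparison above, have standard definitions that are easy to state in code:

```python
# NPCR and UACI, the standard differential-attack metrics, for two cipher
# images of the same shape (uint8).
import numpy as np

def npcr(c1, c2):
    """Number of Pixel Change Rate: fraction of differing pixels, in %."""
    return 100.0 * np.mean(c1 != c2)

def uaci(c1, c2):
    """Unified Average Changed Intensity, in %."""
    return 100.0 * np.mean(np.abs(c1.astype(np.int16) - c2.astype(np.int16)) / 255.0)
```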
TL;DR: The authors propose a bilinear fusion mechanism over different types of squeeze operation, such as global pooling and max pooling; the experiments confirm the superiority of the proposed FuSENet method with respect to the state-of-the-art methods.
Abstract: Deep learning-based approaches have become very prominent in recent years due to their outstanding performance compared to hand-extracted feature-based methods. The convolutional neural network (CNN) is a type of deep learning architecture for image/video data. The residual network and the squeeze and excitation network (SENet) are among recent developments in CNNs for image classification. However, the performance of SENet depends on the squeeze operation done by global pooling, which sometimes may lead to poor performance. In this study, the authors propose a bilinear fusion mechanism over different types of squeeze operation, such as global pooling and max pooling. The excitation operation is performed on the fused output of the squeeze operation. They model the proposed fused SENet with the residual unit and name it FuSENet. The classification experiments are performed on benchmark hyperspectral image datasets. The experimental results confirm the superiority of the proposed FuSENet method with respect to the state-of-the-art methods. The source code of the complete system is made publicly available at https://github.com/swalpa/FuSENet.
54 citations
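A minimal sketch of an SE-style block whose squeeze step fuses global average and max pooling, as described above. The paper's bilinear fusion operator is approximated here by an element-wise product of the two channel descriptors, which is an assumption; the authors' exact operator may differ.

```python
# SE-style block with a fused (avg + max) squeeze; fusion operator assumed.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fused_se_block(x, w1, w2):
    """x: feature map (H, W, C); w1: (C, C//r); w2: (C//r, C)."""
    avg = x.mean(axis=(0, 1))                  # global average pooling squeeze
    mx = x.max(axis=(0, 1))                    # max pooling squeeze
    z = avg * mx                               # fused descriptor (assumed operator)
    s = sigmoid(np.maximum(z @ w1, 0) @ w2)    # excitation MLP with bottleneck ratio r
    return x * s                               # channel-wise recalibration

x = np.random.rand(8, 8, 32)
r = 4
out = fused_se_block(x, np.random.randn(32, 32 // r) * 0.1, np.random.randn(32 // r, 32) * 0.1)
```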
TL;DR: Aiming at a complex indoor environment, this study designs an image semantic segmentation network framework with joint target detection and innovatively implements a multi-task vision system combining object classification, detection and semantic segmentation.
Abstract: Image semantic segmentation has always been a research hotspot in the field of robotics. Its purpose is to assign different semantic category labels to objects by segmenting them. However, in practical applications, in addition to knowing the semantic category information of objects, robots also need to know their position information to complete more complex visual tasks. Aiming at a complex indoor environment, this study designs an image semantic segmentation network framework with joint target detection. By adding a semantic segmentation branch that operates in parallel with the target detection network, it innovatively implements a multi-task vision system combining object classification, detection and semantic segmentation. A new loss function is designed, training is adjusted using the idea of transfer learning, and the approach is finally verified on a self-built indoor scene data set; the experiments prove that the method is feasible, effective, and robust.
53 citations
TL;DR: An adaptive frequency median filter (AFMF) is proposed to remove salt-and-pepper noise; it denoises more effectively than other state-of-the-art denoising methods.
Abstract: In this article, the authors propose an adaptive frequency median filter (AFMF) to remove salt-and-pepper noise. AFMF uses the same adaptive condition as the adaptive median filter (AMF); however, it employs the frequency median to restore the grey values of corrupted pixels instead of the median used by AMF. The frequency median can exclude noisy pixels from the evaluation of the grey value of the centre pixel of the considered window, and it focuses on the uniqueness of grey values. Hence, the frequency median produces a grey value closer to the original one than the median of AMF, and AFMF therefore outperforms AMF. In experiments, the authors tested the proposed method on a variety of natural images from the MATLAB library, as well as the TESTIMAGES data set. Additionally, they compared the denoising results of AFMF with those of other state-of-the-art denoising methods. The results showed that AFMF denoises more effectively than the other methods.
50 citations
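A hedged sketch of the filter described above: the adaptive window of AMF combined with a "frequency median", read here as the median over the unique non-noise grey values in the window. This is one plausible reading of the abstract, not the authors' exact operator.

```python
# AMF-style adaptive window with a "frequency median" (interpretation of
# the abstract: median over unique grey values, salt/pepper excluded).
import numpy as np

def afmf_pixel(img, i, j, w_max=7):
    for w in range(1, w_max + 1):                      # grow the window adaptively
        patch = img[max(0, i - w):i + w + 1, max(0, j - w):j + w + 1]
        vals = np.unique(patch[(patch > 0) & (patch < 255)])  # unique clean values
        if vals.size:                                  # enough clean evidence
            return np.median(vals)
    return img[i, j]                                   # fall back to the original pixel

def afmf(img):
    out = img.astype(np.float64).copy()
    noisy = (img == 0) | (img == 255)                  # simplified AMF noise condition
    for i, j in zip(*np.nonzero(noisy)):
        out[i, j] = afmf_pixel(img, i, j)
    return out.astype(np.uint8)
```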
TL;DR: The proposed colour image encryption scheme innately uses Deoxyribonucleic Acid (DNA) coding, which blends well with the chaotic cryptosystem for an efficient statistical shift, and yields near-zero correlation and good entropy.
Abstract: The ever-growing virtualised information technology infrastructure is powered by cloud-centric technology around the world. Cloud-based multimedia storage has become an essential aspect for users and business behemoths. However, as per a survey by Norton, around 3,800 breaches were publicly disclosed, with 4.1 billion records exposed, in 2019, a 54% rise compared to 2018. Data security is therefore a widely quoted barrier for cloud storage. Ciphering confidential images before transmission and subsequent storage in a cloud database needs critical attention for techno-specific applications. In image encryption, chaos-based keys can provide good confusion, but a diffusion process using XOR is vulnerable to chosen-plaintext attack. The proposed colour image encryption scheme innately uses Deoxyribonucleic Acid (DNA) coding, which blends well with the chaotic cryptosystem for an efficient statistical shift. The ciphered images are stored in authenticated and authorised cloud storage facilities. The experimentation is carried out with the help of Amazon Web Services storage instances. The proposed image encryption scheme offers strong resistance to brute-force, occlusion, statistical and differential attacks, and yields near-zero correlation and good entropy.
43 citations
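A small sketch of the DNA-coding step that schemes like the one above rely on: each byte maps to four bases under one of the eight Watson-Crick-consistent rules, and in a dynamic scheme the rule index per pixel would come from the chaotic keystream.

```python
# DNA coding of pixel bytes: the eight rules map the bit pairs 00,01,10,11
# to bases such that complementary bases sit at complementary bit pairs.
RULES = ["ACGT", "AGCT", "TCGA", "TGCA", "CATG", "CTAG", "GATC", "GTAC"]

def dna_encode(byte, rule):
    bases = RULES[rule]
    return "".join(bases[(byte >> shift) & 0b11] for shift in (6, 4, 2, 0))

def dna_decode(s, rule):
    bases = RULES[rule]
    out = 0
    for ch in s:
        out = (out << 2) | bases.index(ch)
    return out

assert dna_decode(dna_encode(0xB4, 3), 3) == 0xB4   # round-trip check
```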
TL;DR: Two ring oscillator (RO) based TRNG structures adopting identical and non-identical rings of inverters alone have been employed for the confusion (scrambling) and diffusion (intensity variation) processes in encrypting greyscale and RGB images.
Abstract: The utility of true random number generators (TRNGs) is not restricted to session key generation, nonce generation, OTP generation etc. in cryptography. In the proposed work, two ring oscillator (RO) based TRNG structures adopting identical and non-identical rings of inverters alone have been employed for the confusion (scrambling) and diffusion (intensity variation) processes in encrypting greyscale and RGB images. A Cyclone IVE EP4CE115F29C7 FPGA was utilised to generate a couple of random synthetic images using the two RO architectures, which took a maximum of 520 combinational units and 543 logic registers. The suggested image encryption scheme was tested on 100 greyscale test images of size 256 × 256. This non-chaos-influenced image ciphering resulted in an approximate average entropy of 7.99 and near-zero correlation figures for the greyscale and RGB cipher images. The attack resistance capability was checked by performing various occlusion and noise attacks on the encrypted images.
41 citations
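The entropy figure quoted above is the Shannon entropy of the cipher-image histogram; for an 8-bit image the ideal value is 8 bits. A minimal implementation:

```python
# Information entropy of an 8-bit image; an ideal cipher image approaches 8.
import numpy as np

def image_entropy(img):
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    p = p[p > 0]                     # ignore empty bins (log2(0) undefined)
    return -np.sum(p * np.log2(p))
```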
TL;DR: The authors present a novel highly parameter efficient method called DenseUNet, which is inspired by the approach that takes particular advantage of recent advances in both UNet and DenseNet and can achieve state-of-the-art results on EM image segmentation without any further post-processing module or pre-training.
Abstract: Electron microscopy (EM) image segmentation plays an important role in the computer-aided diagnosis of specific pathogens or diseases. However, EM image segmentation is a laborious task that requires expert knowledge, which can take up valuable research time. Convolutional neural network (CNN)-based methods have been proposed for EM image segmentation and have achieved considerable progress. Among those CNN-based methods, UNet is regarded as the state-of-the-art method. However, UNet usually has millions of parameters, which increases training difficulty, and it is limited by the issue of vanishing gradients. To address these problems, the authors present a novel, highly parameter-efficient method called DenseUNet, which takes particular advantage of recent advances in both UNet and DenseNet. In addition, they successfully apply a weighted loss, which boosts the segmentation performance. They conduct several comparative experiments on the ISBI 2012 EM dataset. The experimental results show that their method can achieve state-of-the-art results on EM image segmentation without any further post-processing module or pre-training. Moreover, due to the smart design of the model, their approach has far fewer parameters than currently published encoder–decoder architecture variants for this dataset.
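A minimal sketch of the DenseNet-style connectivity that a DenseUNet-like encoder block uses: each layer's output is concatenated onto all previous feature maps, which shortens gradient paths and keeps the parameter count low. The layer count and growth rate here are illustrative, not the authors' configuration.

```python
# Dense block sketch in Keras; illustrative sizes, not the paper's topology.
import tensorflow as tf
from tensorflow.keras import layers

def dense_block(x, n_layers=4, growth=12):
    for _ in range(n_layers):
        h = layers.BatchNormalization()(x)
        h = layers.Activation("relu")(h)
        h = layers.Conv2D(growth, 3, padding="same")(h)
        x = layers.Concatenate()([x, h])      # dense connectivity
    return x

inp = layers.Input((128, 128, 32))
out = dense_block(inp)
model = tf.keras.Model(inp, out)              # output channels: 32 + 4*12 = 80
```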
TL;DR: This study proposes a novel pavement crack detection method based on an end-to-end trainable deep convolution neural network and introduces a spatial-channel combinational attention module into the encoder–decoder network for refining crack features.
Abstract: Automatic detection of pavement cracks is an important task for road maintenance. However, as an important part of the intelligent transportation system, automatic pavement crack detection is challenging due to the poor continuity of cracks, their varying widths, and the low contrast between cracks and the surrounding pavement. This study proposes a novel pavement crack detection method based on an end-to-end trainable deep convolutional neural network. The authors build the network using the encoder–decoder architecture and adopt a pyramid module to exploit global context information for the complex topological structures of cracks. Moreover, they introduce a spatial-channel combinational attention module into the encoder–decoder network for refining crack features. Further, dilated convolution is used to reduce the loss of crack details caused by the pooling operation in the encoder network. In addition, they introduce the Lovász hinge loss function, which is suitable for small objects. They train the network on the CRACK500 dataset and evaluate it on three pavement crack datasets, achieving the best experimental results among the compared methods.
TL;DR: In this paper, deep-learning approaches to image super-resolution reconstruction, built by constructing deep networks for end-to-end training, are reviewed and divided into four types: interpolation-based pre-processing models, original-image-processing models, hierarchical-feature-based models, and high-frequency-detail-based models.
Abstract: Image super-resolution reconstruction refers to techniques for recovering a high-resolution (HR) image (or multiple images) from a low-resolution (LR) degraded image (or multiple images). Following the breakthrough progress of deep learning in other computer vision tasks, researchers have introduced deep neural networks to solve the image super-resolution problem by constructing deep networks for end-to-end training. The deep learning models currently in use divide SISR models into four types: interpolation-based pre-processing models, original-image-processing models, hierarchical-feature-based models, and high-frequency-detail-based models. The current challenges for super-resolution reconstruction arise mainly in practical application, such as encountering an unknown scaling factor, the lack of paired LR–HR images, and so on.
TL;DR: This study designs a new two-dimensional (2D) chaotic system derived from two existing one-dimensional chaotic maps, and a novel image encryption algorithm based on it; simulation results validate the effectiveness and reliability of the proposed algorithm.
Abstract: A common and effective way to protect digital image security is to encrypt images into white-noise-like images. In this study, the authors have designed a new two-dimensional (2D) chaotic system derived from two existing one-dimensional (1D) chaotic maps. The simulation results show that the new 2D chaotic system is able to produce many 2D chaotic maps by selecting different 1D chaotic maps, and these have wider chaotic ranges and more complex chaotic behaviours than the existing 1D chaotic maps. To investigate its applications, the authors design a novel image encryption algorithm using the proposed 2D chaotic maps. First of all, the original image is scrambled using the chaotic sequences generated by the new 2D chaotic maps, the Arnold transform and the Hilbert curve. Then the scrambled image is confused and diffused by chaotic sequences. Finally, the performance of the proposed encryption algorithm is simulated, and the experimental results validate its effectiveness and reliability.
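Of the scrambling steps named above, the Arnold transform is the simplest to show. A sketch for an N × N image (the map is invertible mod N, so the scrambling is reversible):

```python
# Arnold transform (cat map) scrambling for a square N x N image.
import numpy as np

def arnold(img, iterations=1):
    n = img.shape[0]                       # assumes a square image
    out = img.copy()
    for _ in range(iterations):
        nxt = np.empty_like(out)
        for x in range(n):
            for y in range(n):
                # classic cat map: (x, y) -> ((x + y) mod N, (x + 2y) mod N)
                nxt[(x + y) % n, (x + 2 * y) % n] = out[x, y]
        out = nxt
    return out
```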
TL;DR: A summary of prevalent techniques and methodologies used for the detection, quantification and classification of plant diseases is presented to identify the scope for improvement, with attention to critical gaps in available approaches and to enhancing them for the early prediction of diseases.
Abstract: There is intense pressure on agricultural productivity due to the ever-growing population. Several diseases affect crop yield, and effective control of these can significantly improve the production of food for all. In this regard, detection of diseases at an early stage and quantification of their severity have acquired the urgent attention of researchers. In this study, a summary of prevalent techniques and methodologies used for the detection, quantification and classification of diseases is presented to identify the scope for improvement. The study pays attention to critical gaps in available approaches and to enhancing them for the early prediction of diseases. Diseases affect almost all parts of plants, e.g. root, stem, flower and leaf; the same disease can manifest in different ways in different parts of the plant, which presents a challenge for researchers. This study extends the review work published by JGA Barbedo in 2013, as there have been significant advances and numerous new techniques introduced since then. A novel approach of classifying and categorising the existing techniques based on pathogen types is a significant contribution of the authors in this study.
TL;DR: This work presents Deep3DSCan, an ensemble framework for lung cancer segmentation and classification, which achieves a significant improvement over the template matching technique, which had achieved an accuracy of 0.927.
Abstract: With the increasing incidence rate of lung cancer, early diagnosis could help in reducing the mortality rate. However, accurate recognition of cancerous lesions is immensely challenging owing to factors such as low contrast variation, heterogeneity and the visual similarity between benign and malignant nodules. Deep learning techniques have been very effective in natural image segmentation, with robustness to previously unseen situations, reasonable scale invariance and the ability to detect even minute differences. However, they usually fail to learn domain-specific features due to the limited amount of available data and the domain-agnostic nature of these techniques. This work presents an ensemble framework, Deep3DSCan, for lung cancer segmentation and classification. The deep 3D segmentation network generates the 3D volume of interest from computed tomography scans of patients. The deep features and handcrafted descriptors are extracted using a fine-tuned residual network and morphological techniques, respectively. Finally, the fused features are used for cancer classification. The experiments were conducted on the publicly available LUNA16 dataset. For segmentation, the authors achieved an accuracy of 0.927, a significant improvement over the template matching technique, which had achieved an accuracy of 0.927. For detection, the previous state-of-the-art accuracy is 0.866, while that of the proposed method is 0.883.
TL;DR: The simulation results indicate that the proposed scheme based on an enhanced quadratic map has the properties of a large key space, weak correlation between neighbouring pixels, high key sensitivity, high randomness of pixels and the capacity to withstand statistical analysis, plaintext/chosen-plaintext attacks and differential attacks, and thus offers higher security appropriate for image encryption.
Abstract: In this study, an enhanced quadratic map (EQM) is proposed and applied in a new colour image encryption scheme. The performance evaluations show that the EQM has excellent properties, such as a better Lyapunov exponent and larger chaotic ranges, when compared with the classical quadratic map. The sequences generated from this EQM are successfully used in a newly proposed colour image encryption scheme with excellent confusion and diffusion properties. The encryption structure is based on the permutation–diffusion process; building on the classical permutation, it is characterised by a high speed of diffusion that enables the three components of the plaintext image to be encrypted at the same time, with the encrypted components simultaneously related to each other. The proposed scheme is tested on the USC-SIPI image dataset and on a real-life image dataset; its effectiveness is also compared with five recently proposed image encryption schemes. The simulation results indicate that the proposed scheme has the properties of a large key space, weak correlation between neighbouring pixels, high key sensitivity, high randomness of pixels and the capacity to withstand statistical analysis, plaintext/chosen-plaintext attacks and differential attacks; it thus offers higher security and is appropriate for image encryption.
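The Lyapunov-exponent comparison above can be illustrated numerically on the baseline map. The sketch below estimates the exponent of the classical quadratic map x_{n+1} = r − x_n²; the enhanced map itself is not public in the abstract and is not reproduced. A positive exponent indicates chaos.

```python
# Lyapunov exponent of the classical quadratic map x -> r - x^2,
# the baseline the EQM is compared against.
import numpy as np

def lyapunov_quadratic(r, x0=0.1, n=20000, burn=1000):
    x, acc = x0, 0.0
    for i in range(n):
        if i >= burn:                                  # skip the transient
            acc += np.log(abs(-2.0 * x) + 1e-12)       # ln |f'(x)| with f'(x) = -2x
        x = r - x * x
    return acc / (n - burn)

print(lyapunov_quadratic(2.0))   # ~0.693 (= ln 2): fully chaotic regime
```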
TL;DR: A simple and computationally efficient low light image enhancement framework is presented; an adaptive sigmoid transfer function derived from the sigmoid activation function of neural networks is combined with a Laplacian operator to obtain colour- and contrast-enhanced images.
Abstract: Low light image enhancement algorithms intend to produce visually pleasant images and to extract valuable information for computer vision applications. The task of improving the quality of low light images is a challenging one. The existing methods for quality improvement often degrade the visual aesthetics and suffer from the major drawbacks of high computational complexity and low efficiency. To improve the visual quality and lower the distortions, a simple and computationally efficient low light image enhancement framework is presented in this study. To achieve this, an adaptive sigmoid transfer function (ASTF) derived from the sigmoid activation function of neural networks is used. By combining the ASTF with a Laplacian operator, colour- and contrast-enhanced images are obtained. Experiments show the effectiveness of the proposed method against state-of-the-art methods.
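A hedged sketch in the spirit of the ASTF pipeline above: a sigmoid stretches mid-tones of the intensity channel and a Laplacian restores local contrast. The gain and midpoint adaptation here are illustrative choices, not the authors' adaptive estimates.

```python
# Sigmoid-based enhancement plus Laplacian sharpening (illustrative params).
import numpy as np
from scipy.ndimage import laplace

def sigmoid_enhance(img, gain=8.0, midpoint=None):
    x = img.astype(np.float64) / 255.0
    m = x.mean() if midpoint is None else midpoint     # adapt midpoint to image brightness
    y = 1.0 / (1.0 + np.exp(-gain * (x - m)))          # sigmoid tone curve
    y = (y - y.min()) / (y.max() - y.min() + 1e-12)    # rescale to [0, 1]
    y = y - 0.2 * laplace(y)                           # Laplacian sharpening
    return (np.clip(y, 0, 1) * 255).astype(np.uint8)
```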
TL;DR: In this article, a novel approach based on deep learning for diagnosis of Parkinson's disease through medical imaging is presented, which includes analysis and use of the knowledge extracted by deep convolutional and recurrent neural networks when trained with medical images, such as magnetic resonance images and dopamine transporters scans.
Abstract: The study presents a novel approach, based on deep learning, for diagnosis of Parkinson's disease through medical imaging. The approach includes analysis and use of the knowledge extracted by deep convolutional and recurrent neural networks when trained with medical images, such as magnetic resonance images and dopamine transporters scans. Internal representations of the trained DNNs constitute the extracted knowledge which is used in a transfer learning and domain adaptation manner, so as to create a unified framework for prediction of Parkinson's across different medical environments. A large experimental study is presented illustrating the ability of the proposed approach to effectively predict Parkinson's, using different medical image sets from real environments.
TL;DR: A normalised gamma transformation-based contrast-limited adaptive histogram equalisation (CLAHE) with colour correction in Lab colour space for sand-dust image enhancement is proposed in this study and can obtain the highest percentage of new visible edges for all testing images.
Abstract: Images captured in sand-dust weather often suffer from serious colour cast and poor contrast, which has serious implications for outdoor computer vision systems. To address these problems, a normalised gamma transformation-based contrast-limited adaptive histogram equalisation (CLAHE) with colour correction in Lab colour space is proposed in this study for sand-dust image enhancement. The method consists of image contrast enhancement and image colour correction. To avoid producing new colour deviations, the input sand-dust images are first transformed from the red, green, and blue colour space into the Lab colour space. Then, the contrast of the lightness component (L channel) of the sand-dust image is enhanced using CLAHE. To avoid unbalanced contrast, as well as to reduce the over-increased brightness caused by CLAHE, a normalised gamma correction function is introduced into CLAHE. After that, the a and b chromatic components are recovered by a grey-world-based colour correction method. Experiments on real sand-dust images demonstrate that the proposed method obtains the highest percentage of new visible edges for all testing images, and the contrast restoration exhibits good colour fidelity and proper brightness.
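The pipeline above maps naturally onto OpenCV primitives. In this sketch the clip limit, tile grid and the normalising gamma (chosen to pull the mean lightness toward 0.5) are illustrative assumptions:

```python
# CLAHE + normalising gamma on L, grey-world shift on a/b (illustrative).
import cv2
import numpy as np

def enhance_sand_dust(bgr):
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    l = clahe.apply(l)
    ln = l.astype(np.float64) / 255.0
    gamma = np.log(0.5) / np.log(ln.mean() + 1e-6)     # pull mean lightness toward 0.5
    l = (np.clip(ln ** gamma, 0, 1) * 255).astype(np.uint8)
    a = np.clip(a.astype(np.float64) - (a.mean() - 128), 0, 255).astype(np.uint8)  # grey-world
    b = np.clip(b.astype(np.float64) - (b.mean() - 128), 0, 255).astype(np.uint8)
    return cv2.cvtColor(cv2.merge([l, a, b]), cv2.COLOR_LAB2BGR)
```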
TL;DR: A novel deep learning framework which combines a convolutional neural network (CNN) with long short-term memory (LSTM) cells for real-time FER achieves results comparable to or better than the state-of-the-art.
Abstract: Deep learning has received increasing attention in all fields and has made considerable progress in facial expression recognition (FER). Conventional FER methods are mainly trained on constrained datasets, which may not work well for real-time images; such real-time image sequences limit the accuracy and efficacy of traditional systems. In this work, the authors present a novel deep learning framework which combines a convolutional neural network (CNN) with a long short-term memory (LSTM) cell for real-time FER. The novel framework has three main aspects: (i) two different pre-processing techniques are employed to handle illumination variances and to preserve the subtle edge information of each image; (ii) the pre-processed images are input to two individual CNN architectures, which extract the spatial features very effectively; (iii) the spatial feature maps from the two individual CNNs are fused and integrated with an LSTM layer, which extracts the temporal relations between successive frames. They evaluated the proposed method on three publicly available FER databases and on a self-created database. With pre-processing, the proposed model achieves results comparable to or better than the state-of-the-art.
TL;DR: This study reviews different techniques and applications used around the world for vehicle detection under various environmental conditions based on video processing systems and highlights the problems encountered during surveillance under extreme weather conditions.
Abstract: Developing an intelligent transportation system has attracted a lot of attention in the recent past. With the growing number of vehicles on the road, most nations are adopting an intelligent transport system (ITS) to handle issues such as traffic flow density, queue length, average traffic speed, and the total number of vehicles passing through a point in a specific time interval. By capturing traffic images and videos through cameras, an ITS helps traffic control centres monitor and manage traffic. Efficient and reliable vehicle detection is a crucial step for an ITS. This study reviews different techniques and applications used around the world for vehicle detection under various environmental conditions based on video processing systems. It also discusses the types of cameras used for vehicle detection and the classification of vehicles for traffic monitoring and control. Finally, it highlights the problems encountered during surveillance under extreme weather conditions.
TL;DR: A Computer-Aided Detection system based on convolutional neural network (CNN) that uses the concept of deep learning to classify the mammogram images into benign, malignant and normal and demonstrates the feasibility of using CNNs on medical image processing techniques for the classification of breast masses.
Abstract: A mammogram is an image of a breast used to detect and diagnose breast cancer. This paper presents a computer-aided detection system based on a convolutional neural network (CNN) that uses deep learning to classify mammogram images as benign, malignant or normal. The proposed CNN model consists of eight convolutional, four max-pooling and two fully connected layers, and achieved better results than the pre-trained nets AlexNet and VGG16. The proposed model demonstrates the feasibility of using CNNs in medical image processing for the classification of breast masses. The results are also compared with a state-of-the-art machine learning algorithm, the kNN classifier. Experimentation is done with three datasets: two publicly available ones, the Mammographic Image Analysis Society (MIAS) dataset and the Digital Database for Screening Mammography (DDSM), and an internally collected dataset. The proposed model achieved accuracies of 92.54%, 96.47% and 95% and Area under the ROC Curve (AUC) scores of 0.85, 0.96 and 0.94 for MIAS, DDSM and the internally collected dataset, respectively. Furthermore, the images of the three datasets were merged into one large set used to fine-tune the proposed CNN model, producing an accuracy of 98.32% and an AUC of 0.98.
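The stated topology (eight convolutional, four max-pooling, two fully connected layers) can be sketched directly in Keras; the filter counts and input size below are illustrative, not the authors' exact configuration.

```python
# 8 conv + 4 max-pool + 2 FC layers for a 3-class mammogram classifier
# (benign / malignant / normal); sizes are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential()
model.add(tf.keras.Input(shape=(224, 224, 1)))
for filters in (32, 64, 128, 256):             # 4 blocks x 2 conv = 8 conv layers
    model.add(layers.Conv2D(filters, 3, activation="relu", padding="same"))
    model.add(layers.Conv2D(filters, 3, activation="relu", padding="same"))
    model.add(layers.MaxPooling2D(2))           # 4 max-pooling layers
model.add(layers.Flatten())
model.add(layers.Dense(256, activation="relu"))         # FC layer 1
model.add(layers.Dense(3, activation="softmax"))        # FC layer 2: 3 classes
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```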
TL;DR: Visual Sentiment Analysis aims to understand how images affect people in terms of evoked emotions; this paper reviews the field, considering different levels of granularity as well as the components that can affect the sentiment toward an image in different ways.
Abstract: Visual Sentiment Analysis aims to understand how images affect people in terms of evoked emotions. Although this field is rather new, a broad range of techniques has been developed for various data sources and problems, resulting in a large body of research. This paper reviews pertinent publications and tries to present an exhaustive overview of the field. After a description of the task and the related applications, the subject is tackled under its main headings. The paper also describes the principles of designing general Visual Sentiment Analysis systems from three main points of view: emotional models, dataset definition, and feature design. A formalisation of the problem is discussed, considering different levels of granularity as well as the components that can affect the sentiment toward an image in different ways. To this end, the paper considers a structured formalisation of the problem, usually used for the analysis of text, and discusses its suitability in the context of Visual Sentiment Analysis. The paper also includes a description of new challenges, an evaluation of progress toward more sophisticated systems and related practical applications, and a summary of the insights resulting from this study.
TL;DR: It is concluded that the Faster R-CNN Inception V2 model is highly suitable for real-time hand gesture recognition system under unconstrained environments.
Abstract: The most effective and accurate deep convolutional neural network based architectures (the faster region-based convolutional neural network (Faster R-CNN) Inception V2 model and the single shot detector (SSD) Inception V2 model) for real-time hand gesture recognition are proposed. The proposed models are tested on standard data sets (NUS hand posture data set-II, Senz-3D) and a custom-developed data set (MITI hand data set, MITI-HD). The performance metrics are analysed for intersection over union (IoU) ranges between 0.5 and 0.95. An IoU value of 0.5 resulted in higher precision than the other IoU settings considered (0.5:0.95, 0.75). It is observed that the Faster R-CNN Inception V2 model resulted in higher precision (0.990 for APall, IoU = 0.5) than the SSD Inception V2 model (0.984 for APall) on MITI-HD 160. The computation time of Faster R-CNN Inception V2 is higher than that of the SSD Inception V2 model, but it also produced fewer mispredictions. Increasing the number of samples (MITI-HD 300) improved APall to 0.991. Improvements in large (APlarge) and medium (APmedium) size detections are not significant when compared to small (APsmall) detections. It is concluded that the Faster R-CNN Inception V2 model is highly suitable for real-time hand gesture recognition systems under unconstrained environments.
TL;DR: A technique for classifying the severity levels of glioma tumours using a novel segmentation algorithm, named DeepJoint segmentation, and a multi-classifier; the results show that the proposed method produces a maximum accuracy of 96%, which indicates its superiority.
Abstract: Brain tumour segmentation is the process of separating the tumour from normal brain tissues. A glioma is a kind of tumour that arises in the glial cells of the spine or the brain. This study introduces a technique for classifying the severity levels of glioma tumours using a novel segmentation algorithm, named DeepJoint segmentation, and a multi-classifier. Initially, the brain images are subjected to pre-processing and the region of interest is extracted. Then, the pre-processed image is segmented using the proposed DeepJoint segmentation, which is developed through an iterative procedure of joining grid segments. After segmentation, features are extracted from core and oedema tumours using information-theoretic measures. Finally, classification is done by a deep convolutional neural network (DCNN), which is trained by an optimisation algorithm named the fractional Jaya whale optimiser (FJWO), developed by integrating the whale optimisation algorithm into the fractional Jaya optimiser. The performance of the proposed FJWO–DCNN with DeepJoint segmentation is analysed using accuracy, true positive rate, specificity, and sensitivity. The results show that the proposed method produces a maximum accuracy of 96%, which indicates its superiority.
TL;DR: Wang et al. propose a new gesture recognition architecture that combines a feature fusion network with a variant convolutional long short-term memory (ConvLSTM); it extracts spatiotemporal feature information from local, global and deep aspects and uses feature fusion to alleviate the loss of feature information.
Abstract: Gesture is a natural form of human communication, and it is of great significance in human–computer interaction. In dynamic gesture recognition methods based on deep learning, the key is to obtain comprehensive gesture feature information. Aiming at the problems of inadequate extraction of spatiotemporal features and loss of feature information in current dynamic gesture recognition, a new gesture recognition architecture is proposed, which combines a feature fusion network with a variant convolutional long short-term memory (ConvLSTM). The architecture extracts spatiotemporal feature information from local, global and deep aspects, and applies feature fusion to alleviate the loss of feature information. Firstly, local spatiotemporal feature information is extracted from the video sequence by a 3D residual network based on channel feature fusion. Then the authors use the variant ConvLSTM to learn the global spatiotemporal information of the dynamic gesture, introducing an attention mechanism that changes the gate structure of the ConvLSTM. Finally, a multi-feature fusion depthwise separable network is used to learn higher-level features, including depth feature information. The proposed approach obtains very competitive performance on the Jester dataset, with a classification accuracy of 95.59%, and achieves state-of-the-art performance with 99.65% accuracy on the SKIG (Sheffield Kinect Gesture) dataset.
TL;DR: The proposed approach has adopted Chua diode circuit for the generation of 256 × 256 random synthetic image which has been utilised to scramble the DICOM image for the second level of diffusion as the cellular automata provides the first level of scrambling and diffusion.
Abstract: Medical image security relies upon the arrival of improved techniques for enhancing the confusion and diffusion processes in pixel-level manipulations. Non-linear circuits constructed with hardware components can be employed for hardware–software co-design in medical image encryption. The proposed approach adopts a Chua diode circuit to generate a 256 × 256 random synthetic image with an entropy of 7.9972. This synthetic image is utilised to scramble the DICOM image for the second level of diffusion, while cellular automata provide the first level of scrambling and diffusion. The average correlation coefficients of the encrypted pixels are 0.00204, −0.00298 and −0.00054 in the three directions. The encrypted pixels pass the NIST SP 800-22 test suite and offer resistance to statistical, differential and chosen-plaintext attacks.
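A software sketch of the idea above: integrating the dimensionless Chua equations and quantising the chaotic state into a 256 × 256 synthetic image. The authors use a hardware circuit; the parameters below are the standard textbook double-scroll values, not their measurements.

```python
# Euler-integrated Chua circuit producing a synthetic "random" image.
import numpy as np

def chua_stream(n, dt=0.002, alpha=15.6, beta=28.0, m0=-8/7, m1=-5/7):
    x, y, z = 0.7, 0.0, 0.0
    out = np.empty(n)
    for i in range(n):
        fx = m1 * x + 0.5 * (m0 - m1) * (abs(x + 1) - abs(x - 1))  # piecewise-linear diode
        x += dt * alpha * (y - x - fx)
        y += dt * (x - y + z)
        z += dt * (-beta * y)
        out[i] = x
    return out

s = chua_stream(256 * 256 + 5000)[5000:]                  # discard the transient
synthetic = np.floor((s - s.min()) / (np.ptp(s) + 1e-12) * 256).clip(0, 255)
synthetic = synthetic.astype(np.uint8).reshape(256, 256)  # 256 x 256 synthetic image
```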
TL;DR: The authors propose a novel framework that ensembles three state-of-the-art deep convolutional neural networks (DCNNs) with multi-modality images for AD classification; the results show that the proposed algorithm performs better than existing methods.
Abstract: Alzheimer's disease (AD) is one of the most common progressive neurodegenerative diseases. Structural magnetic resonance imaging (MRI) provides abundant information on the anatomical structure of human organs, while fluorodeoxyglucose positron emission tomography (PET) captures the metabolic activity of the brain. Previous studies have demonstrated that multi-modality images can contribute to improved diagnosis of AD. However, these methods need to extract handcrafted features that demand domain-specific knowledge, and the image-processing stage is time consuming. To tackle these problems, in this study, the authors propose a novel framework that ensembles three state-of-the-art deep convolutional neural networks (DCNNs) with multi-modality images for AD classification. In detail, they extract slices from each subject of each modality, and every DCNN generates a probabilistic score for the input slices. Furthermore, a 'dropout' mechanism is introduced to discard slices with low discrimination in the category probabilities, and the average over the reserved slices of each subject is taken as a new feature. Finally, they train an AdaBoost ensemble classifier based on single decision tree classifiers with the MRI and PET probabilistic scores of each DCNN. Evaluations on the Alzheimer's Disease Neuroimaging Initiative database show that the proposed algorithm performs better than the existing method and significantly improves the classification accuracy.
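The final ensemble step above, AdaBoost over shallow decision trees trained on per-modality DCNN scores, can be sketched with scikit-learn; the feature layout and hyper-parameters here are illustrative placeholders.

```python
# AdaBoost over decision stumps on DCNN probability scores (illustrative).
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Rows: subjects; columns: e.g. [MRI scores from 3 DCNNs, PET scores from 3 DCNNs]
X = np.random.rand(200, 6)                  # placeholder probabilistic scores
y = np.random.randint(0, 2, 200)            # AD vs normal control (placeholder labels)

clf = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # 'base_estimator' in older scikit-learn
    n_estimators=100,
)
clf.fit(X, y)
print(clf.score(X, y))
```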
TL;DR: The proposed deep detector classifier employs and validates DeepSphere, which aims mainly to identify anomalous cases in the spatial and temporal context in order to perform foreground object segmentation, and takes advantage of the power of generative models to classify the segmented objects.
Abstract: In this study, the authors present a new approach to segmenting and classifying moving objects in video sequences by combining an unsupervised anomaly discovery framework called DeepSphere with generative adversarial networks. The proposed deep detector classifier employs and validates DeepSphere, which aims mainly to identify anomalous cases in the spatial and temporal context in order to perform foreground object segmentation. For post-processing, some morphological operations are applied to better segment and extract the desired objects. Finally, they take advantage of the power of generative models, which treat semi-supervised learning as a specific missing-data imputation task, to classify the segmented objects. They evaluate the method on multiple datasets, and the results confirm the effectiveness of the proposed approach, which achieves superior performance over state-of-the-art methods in segmenting and classifying moving objects from surveillance videos.
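The morphological post-processing mentioned above typically amounts to an opening (to drop isolated false positives) followed by a closing (to fill holes). A minimal OpenCV sketch, with an illustrative kernel size:

```python
# Morphological cleanup of a binary foreground mask.
import cv2

def clean_mask(mask):
    """mask: binary uint8 foreground mask (0 or 255)."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)    # drop isolated noise
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)   # fill small holes
    return mask
```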