Showing papers in "Pattern Recognition in 2018"
TL;DR: A broad survey of the recent advances in convolutional neural networks can be found in this article, where the authors discuss the improvements of CNN on different aspects, namely, layer design, activation function, loss function, regularization, optimization and fast computation.
Abstract: We give an overview of the basic components of CNNs. We discuss the improvements of CNNs on different aspects, namely, layer design, activation function, loss function, regularization, optimization and fast computation. We introduce the applications of CNNs to various tasks, including image classification, object detection, object tracking, pose estimation, text detection, visual saliency detection, action recognition, scene labeling, speech and natural language processing. We discuss the challenges of CNNs and give several future research directions. In the last few years, deep learning has led to very good performance on a variety of problems, such as visual recognition, speech recognition and natural language processing. Among different types of deep neural networks, convolutional neural networks have been most extensively studied. Leveraging the rapid growth in the amount of annotated data and the great improvements in the strength of graphics processing units, research on convolutional neural networks has advanced swiftly and achieved state-of-the-art results on various tasks. In this paper, we provide a broad survey of the recent advances in convolutional neural networks. We detail the improvements of CNNs on different aspects, including layer design, activation function, loss function, regularization, optimization and fast computation. Besides, we also introduce various applications of convolutional neural networks in computer vision, speech and natural language processing.
3,125 citations
TL;DR: The background of deep visual tracking is introduced, including the fundamental concepts of visual tracking and related deep learning algorithms, and the existing deep-learning-based trackers are categorized into three classes according to network structure, network function and network training.
Abstract: Recently, deep learning has achieved great success in visual tracking. The goal of this paper is to review the state-of-the-art tracking methods based on deep learning. First, we introduce the background of deep visual tracking, including the fundamental concepts of visual tracking and related deep learning algorithms. Second, we categorize the existing deep-learning-based trackers into three classes according to network structure, network function and network training. For each category, we explain its rationale from the network perspective and analyze the papers belonging to it. Then, we conduct extensive experiments to compare the representative methods on the popular OTB-100, TC-128 and VOT2015 benchmarks. Based on our observations, we conclude that: (1) The usage of a convolutional neural network (CNN) model can significantly improve the tracking performance. (2) Trackers using a CNN model to distinguish the tracked object from its surrounding background can obtain more accurate results, while using the CNN model for template matching is usually faster. (3) Trackers with deep features perform much better than those with low-level hand-crafted features. (4) Deep features from different convolutional layers have different characteristics, and an effective combination of them usually results in a more robust tracker. (5) Deep visual trackers using end-to-end networks usually perform better than trackers merely using feature extraction networks. (6) For visual tracking, the most suitable network training method is to pre-train networks with video information and fine-tune them online with subsequent observations. Finally, we summarize our findings, highlight our insights, and point out further trends for deep visual tracking.
473 citations
TL;DR: This paper proposes a simple yet powerful remedy, called Adaptive Batch Normalization (AdaBN) to increase the generalization ability of a DNN, and demonstrates that the method is complementary with other existing methods and may further improve model performance.
Abstract: Deep neural networks (DNN) have shown unprecedented success in various computer vision applications such as image classification and object detection. However, it remains a common annoyance during the training phase that one has to prepare at least thousands of labeled images to fine-tune a network to a specific domain. A recent study (Tommasi et al., 2015) shows that a DNN has a strong dependency on the training dataset, and the learned features cannot be easily transferred to a different but relevant task without fine-tuning. In this paper, we propose a simple yet powerful remedy, called Adaptive Batch Normalization (AdaBN), to increase the generalization ability of a DNN. By modulating the statistics from the source domain to the target domain in all Batch Normalization layers across the network, our approach achieves a deep adaptation effect for domain adaptation tasks. In contrast to other deep learning domain adaptation methods, our method does not require additional components and is parameter-free. It achieves state-of-the-art performance despite its surprising simplicity. Furthermore, we demonstrate that our method is complementary with other existing methods. Combining AdaBN with existing domain adaptation treatments may further improve model performance.
453 citations
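The core of AdaBN is simple to prototype: after training on the source domain, the Batch Normalization statistics are re-estimated on unlabeled target-domain data while all learned weights stay fixed. Below is a minimal PyTorch-style sketch of this idea; the helper name, the cumulative-average choice, and the loader format are our own illustration, not code from the paper.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def adapt_bn_statistics(model, target_loader, device="cpu"):
    """Re-estimate BatchNorm statistics on the target domain (AdaBN-style sketch).

    Network weights are untouched; only each BN layer's running mean and
    variance are recomputed from unlabeled target-domain batches.
    """
    model.to(device)
    for m in model.modules():
        if isinstance(m, nn.modules.batchnorm._BatchNorm):
            m.reset_running_stats()
            m.momentum = None  # None => cumulative moving average over batches
    model.train()  # BN updates its running stats only in train mode
    for x, *_ in target_loader:  # assumes (inputs, ...) batches; labels unused
        model(x.to(device))      # forward pass only; no loss, no gradients
    model.eval()
    return model
```

After this pass, evaluating the unchanged classifier on the target domain uses target statistics in every BN layer, which is the entirety of the adaptation step described above.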
TL;DR: The survey provides an overview of deep learning and the popular architectures used for cancer detection and diagnosis, and presents four popular deep learning architectures, including convolutional neural networks, fully convolutional networks, auto-encoders, and deep belief networks.
Abstract: In this paper, we aim to provide a survey on the applications of deep learning for cancer detection and diagnosis, and hope to provide an overview of the progress in this field. In the survey, we first provide an overview of deep learning and the popular architectures used for cancer detection and diagnosis. In particular, we present four popular deep learning architectures, including convolutional neural networks, fully convolutional networks, auto-encoders, and deep belief networks. Second, we provide a survey of the studies exploiting deep learning for cancer detection and diagnosis. This part of the survey is organized by the types of cancers. Third, we provide a summary of and comments on the recent work on the applications of deep learning to cancer detection and diagnosis, and propose some future research directions.
356 citations
TL;DR: This article revisits Multiple Instance Neural Networks (MINNs), neural networks that aim at solving MIL problems, and proposes a new type of MINN to learn bag representations, in contrast to existing MINNs that focus on estimating instance labels.
Abstract: We revisit the problem of solving MIL using neural networks (MINNs), which have been largely ignored by the current MIL research community. Our experiments show that MINNs are very effective and efficient. We propose a novel MI-Net that is centered on learning bag representations in the neural network in an end-to-end way. Recent deep learning tricks, including dropout, deep supervision and residual connections, are studied in MINNs. We find deep supervision and residual connections are effective for MIL. In the experiments, the proposed MINNs achieve state-of-the-art or competitive performance on several MIL benchmarks. Moreover, they are extremely fast for both testing and training; for example, it takes only 0.0003 s to predict a bag and a few seconds to train on MIL datasets on a moderate CPU. Of late, neural networks and Multiple Instance Learning (MIL) are both attractive topics in research areas related to Artificial Intelligence. Deep neural networks have achieved great successes in supervised learning problems, and MIL as a typical weakly-supervised learning method is effective for many applications in computer vision, biometrics, natural language processing, and so on. In this article, we revisit Multiple Instance Neural Networks (MINNs), neural networks that aim at solving MIL problems. MINNs perform MIL in an end-to-end manner: they take bags with varying numbers of instances as input and directly output the labels of bags. All of the parameters in a MINN can be optimized via back-propagation. Besides revisiting the old MINNs, we propose a new type of MINN to learn bag representations, which is different from the existing MINNs that focus on estimating instance labels. In addition, recent tricks developed in deep learning have been studied in MINNs; we find deep supervision is effective for learning better bag representations. In the experiments, the proposed MINNs achieve state-of-the-art or competitive performance on several MIL benchmarks. Moreover, they are extremely fast for both testing and training; for example, it takes only 0.0003 s to predict a bag and a few seconds to train on MIL datasets on a moderate CPU.
343 citations
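The bag-representation idea is compact enough to sketch. The following PyTorch-style snippet shows a hypothetical MI-Net-like model, not the authors' code: a shared MLP embeds each instance, a symmetric pooling operator (max pooling here, one of several reasonable choices) collapses the bag into a single vector, and a bag-level classifier predicts the bag label end-to-end. Layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class MINetSketch(nn.Module):
    """Minimal bag-representation MIL network (illustrative sketch)."""

    def __init__(self, in_dim, hidden=128):
        super().__init__()
        self.embed = nn.Sequential(             # shared per-instance embedding
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.classifier = nn.Linear(hidden, 1)  # bag-level decision

    def forward(self, bag):                     # bag: (n_instances, in_dim)
        h = self.embed(bag)                     # per-instance embeddings
        z = h.max(dim=0).values                 # pool instances -> bag vector
        return torch.sigmoid(self.classifier(z))  # bag probability

# A bag with 7 instances of 32-dim features; bags may differ in size.
bag = torch.randn(7, 32)
prob = MINetSketch(in_dim=32)(bag)
```

Because pooling is symmetric and size-agnostic, bags with varying numbers of instances are handled naturally, and all parameters are trainable by back-propagation, as the abstract describes.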
TL;DR: A survey and benchmark of unsupervised outlier detection methods, a task that challenges state-of-the-art methods from a variety of research fields, with applications including fraud detection, intrusion detection, medical diagnoses and data cleaning.
Abstract: We survey unsupervised machine learning algorithms in the context of outlier detection. This task challenges state-of-the-art methods from a variety of research fields and has applications including fraud detection, intrusion detection, medical diagnoses and data cleaning. The selected methods are benchmarked on publicly available datasets and novel industrial datasets. Each method is then submitted to extensive scalability, memory consumption and robustness tests in order to build a full overview of the algorithms’ characteristics.
341 citations
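To give a flavor of how such a benchmark is run in practice, here is a small, self-contained sketch using two standard unsupervised detectors from scikit-learn on synthetic data; the detectors, data, and metric are our illustration, not the paper's exact protocol.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.metrics import roc_auc_score

# Toy data: one dense inlier cluster plus a few scattered outliers.
X_in, _ = make_blobs(n_samples=300, centers=1, cluster_std=1.0, random_state=0)
X_out = np.random.RandomState(0).uniform(-8, 8, size=(15, 2))
X = np.vstack([X_in, X_out])
y = np.r_[np.zeros(300), np.ones(15)]  # 1 = outlier

# Higher score = more outlying (both detectors are negated accordingly).
candidates = {
    "IsolationForest": -IsolationForest(random_state=0).fit(X).score_samples(X),
    "LOF": -LocalOutlierFactor(n_neighbors=20).fit(X).negative_outlier_factor_,
}
for name, scores in candidates.items():
    print(f"{name}: ROC AUC = {roc_auc_score(y, scores):.3f}")
```

A full benchmark in the spirit of the survey would additionally sweep datasets and hyperparameters, and record runtimes and memory footprints in the same loop.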
TL;DR: This work presents a novel algorithm for background subtraction from video sequences that uses a deep Convolutional Neural Network (CNN) to perform the segmentation, and it outperforms the existing algorithms with respect to the average ranking over the different evaluation metrics announced in CDnet 2014.
Abstract: We propose a novel approach based on deep learning for background subtraction from video sequences. A new algorithm to generate the background model is proposed. Input image patches and their corresponding background images are fed into a CNN to perform background subtraction. We utilize a median filter to enhance the segmentation results. Change detection experiments confirm the performance of the proposed approach. In this work, we present a novel algorithm for background subtraction from video sequences that uses a deep Convolutional Neural Network (CNN) to perform the segmentation. With this approach, feature engineering and parameter tuning become unnecessary, since the network parameters can be learned from data by training a single CNN that can handle various video scenes. Additionally, we propose a new approach to estimate the background model from video sequences. For training the CNN, we randomly selected 5% of the video frames and their ground truth segmentations from the Change Detection challenge 2014 (CDnet 2014). We also utilized spatial median filtering as post-processing of the network outputs. Our method is evaluated on different datasets, and it (so-called DeepBS) outperforms the existing algorithms with respect to the average ranking over the different evaluation metrics announced in CDnet 2014. Furthermore, due to the network architecture, our CNN is capable of real-time processing.
331 citations
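The post-processing step is straightforward to reproduce: the CNN's per-pixel foreground probabilities are median-filtered and then thresholded into a binary mask. A minimal sketch follows; the filter size and threshold are illustrative assumptions, not the values used in the paper.

```python
import numpy as np
from scipy.ndimage import median_filter

def postprocess_foreground(prob_map, size=9, threshold=0.5):
    """Spatial-median post-processing of a CNN foreground-probability map.

    Median smoothing suppresses isolated misclassified pixels before
    thresholding; size/threshold here are illustrative placeholders.
    """
    smoothed = median_filter(prob_map, size=size)
    return (smoothed > threshold).astype(np.uint8)  # binary segmentation mask

# Example: a random probability map standing in for the network output.
mask = postprocess_foreground(np.random.rand(240, 320))
```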
TL;DR: An approach to multi-view subspace clustering that learns a joint subspace representation by constructing an affinity matrix shared among all views is presented, relying on the importance of both low-rank and sparsity constraints in the construction of the affinity matrix.
Abstract: Most existing approaches address the multi-view subspace clustering problem by constructing the affinity matrix on each view separately and afterwards proposing how to extend the spectral clustering algorithm to handle multi-view data. This paper presents an approach to multi-view subspace clustering that learns a joint subspace representation by constructing an affinity matrix shared among all views. Relying on the importance of both low-rank and sparsity constraints in the construction of the affinity matrix, we introduce an objective that balances the agreement across different views while at the same time encouraging sparsity and low-rankness of the solution. The related low-rank and sparsity-constrained optimization problem is solved for each view using the alternating direction method of multipliers. Furthermore, we extend our approach to cluster data drawn from nonlinear subspaces by solving the corresponding problem in a reproducing kernel Hilbert space. The proposed algorithm outperforms state-of-the-art multi-view subspace clustering algorithms on one synthetic and four real-world datasets.
297 citations
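Schematically, the abstract describes an objective of the following form, in which a single shared representation $C$ must reconstruct every view while staying both low-rank and sparse. This is our illustrative reading of the abstract, with weights and constraints chosen for concreteness rather than taken from the paper:

```latex
\min_{C}\ \sum_{v=1}^{V} \big\| X^{(v)} - X^{(v)} C \big\|_F^{2}
\;+\; \lambda_{1} \, \| C \|_{*}
\;+\; \lambda_{2} \, \| C \|_{1}
\quad \text{s.t.} \quad \operatorname{diag}(C) = 0
```

Here $X^{(v)}$ is the data matrix of view $v$, the nuclear norm $\|\cdot\|_*$ encourages low-rankness, and $\|\cdot\|_1$ encourages sparsity; a shared affinity matrix is then typically formed as $|C| + |C|^{\top}$ and passed to spectral clustering, with ADMM handling the optimization as the abstract states.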
TL;DR: A deep learning-based approach to temporal 3D pose recognition, based on a combination of a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) recurrent network, together with a data augmentation method that has also been validated experimentally.
Abstract: Combination of a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) recurrent network for skeleton-based human activity and hand gesture recognition. Two-stage training strategy which firstly focuses on the CNN training and, secondly, adjusts the full method CNN+LSTM. A method for data augmentation in the context of spatiotemporal 3D data sequences. An exhaustive experimental study on publicly available data benchmarks with respect to the state-of-the-art most representative methods. Comparison among different CPU and GPU platforms. In this work, we address human activity and hand gesture recognition problems using 3D data sequences obtained from full-body and hand skeletons, respectively. To this aim, we propose a deep learning-based approach for temporal 3D pose recognition problems based on a combination of a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) recurrent network. We also present a two-stage training strategy which firstly focuses on CNN training and, secondly, adjusts the full method (CNN+LSTM). Experimental testing demonstrated that our training method obtains better results than a single-stage training strategy. Additionally, we propose a data augmentation method that has also been validated experimentally. Finally, we perform an extensive experimental study on publicly available data benchmarks. The results obtained show how the proposed approach reaches state-of-the-art performance when compared to the methods identified in the literature. The best results were obtained for small datasets, where the proposed data augmentation strategy has greater impact.
294 citations
TL;DR: It is proved that selecting useful deep descriptors contributes well to fine-grained image recognition, and a novel Mask-CNN model without the fully connected layers is proposed, which has a small feature dimensionality and efficient inference speed compared with other fine-grained approaches.
Abstract: Fine-grained image recognition is a challenging computer vision problem, due to the small inter-class variations caused by highly similar subordinate categories, and the large intra-class variations in poses, scales and rotations. In this paper, we prove that selecting useful deep descriptors contributes well to fine-grained image recognition. Specifically, a novel Mask-CNN model without the fully connected layers is proposed. Based on the part annotations, the proposed model consists of a fully convolutional network to both locate the discriminative parts (e.g., head and torso), and, more importantly, generate weighted object/part masks for selecting useful and meaningful convolutional descriptors. After that, a three-stream Mask-CNN model is built for aggregating the selected object- and part-level descriptors simultaneously. Thanks to discarding the parameter-redundant fully connected layers, our Mask-CNN has a small feature dimensionality and efficient inference speed compared with other fine-grained approaches. Furthermore, we obtain a new state-of-the-art accuracy on two challenging fine-grained bird species categorization datasets, which validates the effectiveness of both the descriptor selection scheme and the proposed Mask-CNN model.
261 citations
TL;DR: This paper first introduces fully convolutional auto-encoders for image feature learning and then proposes a unified clustering framework to learn image representations and cluster centers jointly based on a fully convolutional auto-encoder and soft k-means scores.
Abstract: Traditional image clustering methods take a two-step approach, feature learning and clustering, sequentially. However, recent research results have demonstrated that combining the separate phases in a unified framework and training them jointly can achieve better performance. In this paper, we first introduce fully convolutional auto-encoders for image feature learning and then propose a unified clustering framework to learn image representations and cluster centers jointly based on a fully convolutional auto-encoder and soft k-means scores. At the initial stages of the learning procedure, the representations extracted from the auto-encoder may not be very discriminative for the later clustering. We address this issue by adopting a boosted discriminative distribution, in which high-score assignments are highlighted and low-score ones are de-emphasized. With the gradually boosted discrimination, clustering assignment scores are discriminated and cluster purities are enlarged. Experiments on several vision benchmark datasets show that our methods can achieve state-of-the-art performance.
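One common way to realize such a boosted discriminative distribution, given here as our assumed concrete form in the spirit of deep embedded clustering rather than the paper's exact equations: soft assignment scores $q_{ij}$ between embedding $z_i$ and cluster center $\mu_j$ are sharpened into targets $p_{ij}$, and the network is trained to match them.

```latex
q_{ij} = \frac{\exp\!\left(-\lVert z_i - \mu_j \rVert^{2}\right)}
              {\sum_{k} \exp\!\left(-\lVert z_i - \mu_k \rVert^{2}\right)},
\qquad
p_{ij} = \frac{q_{ij}^{2} / f_j}{\sum_{k} q_{ik}^{2} / f_k},
\quad f_j = \sum_{i} q_{ij},
\qquad
\mathcal{L} = \mathrm{KL}\!\left(P \,\Vert\, Q\right)
```

Squaring $q_{ij}$ highlights high-score assignments and de-emphasizes low-score ones, so minimizing the KL divergence gradually sharpens the clusters, matching the behavior described in the abstract.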
TL;DR: IDNet is the first system that exploits a deep learning approach as a universal feature extractor for gait recognition, and that combines classification results from subsequent walking cycles into a multi-stage decision-making framework.
Abstract: Here, we present IDNet, a user authentication framework based on smartphone-acquired motion signals. Its goal is to recognize a target user from their way of walking, using the accelerometer and gyroscope (inertial) signals provided by a commercial smartphone worn in the front pocket of the user’s trousers. IDNet features several innovations, including: (i) a robust and smartphone-orientation-independent walking cycle extraction block, (ii) a novel feature extractor based on convolutional neural networks, (iii) a one-class support vector machine to classify walking cycles, and the coherent integration of these into (iv) a multi-stage authentication technique. IDNet is the first system that exploits a deep learning approach as a universal feature extractor for gait recognition, and that combines classification results from subsequent walking cycles into a multi-stage decision-making framework. Experimental results show the superiority of our approach against state-of-the-art techniques, leading to misclassification rates (either false negatives or positives) smaller than 0.15% with fewer than five walking cycles. Design choices are discussed and motivated throughout, assessing their impact on the user authentication performance.
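To make stages (iii) and (iv) concrete, the sketch below trains a one-class SVM on the target user's (CNN-extracted) cycle features and accumulates scores over a few consecutive cycles before deciding. The feature dimensions, random placeholder data, and decision threshold are all hypothetical, not the paper's settings.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Placeholder feature matrices: rows are per-cycle feature vectors that would
# come from the CNN feature extractor in the real pipeline.
target_feats = np.random.randn(200, 64)  # enrollment cycles of the target user
probe_feats = np.random.randn(5, 64)     # cycles observed at authentication time

# One-class SVM trained only on the target user's walking cycles.
ocsvm = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(target_feats)

# Multi-stage decision sketch: average per-cycle scores over consecutive
# cycles and accept once the mean clears a (hypothetical) threshold.
scores = ocsvm.decision_function(probe_feats)
accept = scores.mean() > 0.0
print(f"mean score {scores.mean():.3f} -> {'accept' if accept else 'reject'}")
```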
TL;DR: A survey of recent methods that localize a visual acquisition system with respect to a known environment, categorizing VBL methods into two distinct families: indirect and direct localization systems.
Abstract: We are surrounded by plenty of information about our environment. From these multiple sources, numerous kinds of data can be extracted: sets of images, 3D models, coloured point clouds... When classical localization devices fail (e.g. a GPS sensor in cluttered environments), the aforementioned data can be used within a localization framework. This is called Visual Based Localization (VBL). Due to the numerous data types that can be collected from a scene, VBL encompasses a large number of different methods. This paper presents a survey of recent methods that localize a visual acquisition system with respect to a known environment. We start by categorizing VBL methods into two distinct families: indirect and direct localization systems. As the localization environment is almost always dynamic, we pay special attention to methods designed to handle appearance changes occurring in a scene. Thereafter, we highlight methods exploiting heterogeneous types of data. Finally, we conclude the paper with a discussion of promising trends that could allow a localization system to achieve high-precision pose estimation within as large an area as possible.
TL;DR: An overview of current LDW systems is provided, describing in particular pre-processing, lane models, lane detection techniques and departure warning systems.
Abstract: Statistics show that worldwide motor vehicle collisions lead to significant deaths and disabilities as well as substantial financial costs to both society and the individuals involved. Unintended lane departure is a leading cause of collision-related road fatalities. To reduce the number of traffic accidents and to improve driver safety, lane departure warning (LDW) systems have emerged as a promising tool. Vision-based lane detection and departure warning systems have been investigated for over two decades. During this period, many different problems related to lane detection and departure warning have been addressed. This paper provides an overview of current LDW systems, describing in particular pre-processing, lane models, lane detection techniques and departure warning systems.
TL;DR: A novel objective function is proposed to jointly optimize similarity metric learning, local positive mining and robust deep feature embedding for person re-id, based on a novel sampling scheme that mines suitable positives within a local range to improve the deep embedding in the context of large intra-class variations.
Abstract: Person re-identification (re-id) aims to match pedestrians observed by disjoint camera views. It attracts increasing attention in computer vision due to its importance to surveillance systems. To combat the major challenge of cross-view visual variations, deep embedding approaches are proposed by learning a compact feature space from images such that the Euclidean distances correspond to their cross-view similarity metric. However, the global Euclidean distance cannot faithfully characterize the ideal similarity in a complex visual feature space because features of pedestrian images exhibit unknown distributions due to large variations in poses, illumination and occlusion. Moreover, intra-personal training samples within a local range which are robust to guide deep embedding against uncontrolled variations cannot be captured by a global Euclidean distance. In this paper, we study the problem of person re-id by proposing a novel sampling to mine suitable positives (i.e., intra-class) within a local range to improve the deep embedding in the context of large intra-class variations. Our method is capable of learning a deep similarity metric adaptive to local sample structure by minimizing each sample's local distances while propagating through the relationship between samples to attain the whole intra-class minimization. To this end, a novel objective function is proposed to jointly optimize similarity metric learning, local positive mining and robust deep feature embedding. This attains local discriminations by selecting local-ranged positive samples, and the learned features are robust to dramatic intra-class variations. Experiments on benchmarks show state-of-the-art results achieved by our method.
TL;DR: A commentary on the proposed revision of the 1979 IASP definition of pain, summarizing why the proposal is useful for guiding the assessment of pain, but not its definition.
Abstract: Milton Cohen, John Quintner, and Simon van Rysewyk proposed a revision of the 1979 IASP definition of pain. This commentary summarizes why this proposal is useful for guiding the assessment of pain, but not its definition.
TL;DR: A human-related multi-stream CNN (HR-MSCNN) architecture that encodes appearance, motion, and the captured tubes of the human-related regions is introduced, and it achieves state-of-the-art results on four benchmark datasets (JHMDB, HMDB51, UCF Sports and UCF101).
Abstract: The most successful video-based human action recognition methods rely on feature representations extracted using Convolutional Neural Networks (CNNs). Inspired by the two-stream network (TS-Net), we propose a multi-stream Convolutional Neural Network (CNN) architecture to recognize human actions. We additionally consider human-related regions that contain the most informative features. First, by improving foreground detection, the region of interest corresponding to the appearance and the motion of an actor can be detected robustly under realistic circumstances. Based on the entire detected human body, we construct one appearance and one motion stream. In addition, we select a secondary region that contains the major moving part of an actor based on motion saliency. By combining the traditional streams with the novel human-related streams, we introduce a human-related multi-stream CNN (HR-MSCNN) architecture that encodes appearance, motion, and the captured tubes of the human-related regions. Comparative evaluation on the JHMDB, HMDB51, UCF Sports and UCF101 datasets demonstrates that the streams contain features that complement each other. The proposed multi-stream architecture achieves state-of-the-art results on these four datasets.
TL;DR: Quantization-based Hashing (QBH) is a generic framework which incorporates the advantages of quantization error reduction methods into conventional property preserving hashing methods and can be applied to both unsupervised and supervised hashing methods.
Abstract: As far as we know, we are the first to propose a general framework to incorporate quantization-based methods into conventional similarity-preserving hashing, in order to improve the effectiveness of hashing methods. In theory, any quantization method can be adopted to reduce the quantization error of any similarity-preserving hashing method to improve its performance. This framework can be applied to both unsupervised and supervised hashing. We experimentally obtained the best performance compared to state-of-the-art supervised and unsupervised hashing methods on six popular datasets. We successfully show it to work on a huge dataset, SIFT1B (1 billion data points), by utilizing graph approximation and out-of-sample extension. Nowadays, due to the exponential growth of user-generated images and videos, there is an increasing interest in learning-based hashing methods. In computer vision, the hash functions are learned in such a way that the hash codes can preserve essential properties of the original space (or label information). Then the Hamming distance of the hash codes can approximate the data similarity. On the other hand, vector quantization methods quantize the data into different clusters based on the criterion of minimal quantization error, and then perform the search using look-up tables. While hashing methods using the Hamming distance can achieve faster search speed, their accuracy is often outperformed by quantization methods with the same code length, due to the lower quantization error and more flexible distance lookups. To improve the effectiveness of hashing methods, in this work we propose Quantization-based Hashing (QBH), a general framework which incorporates the advantages of quantization-error-reduction methods into conventional property-preserving hashing methods. The learned hash codes simultaneously preserve the properties in the original space and reduce the quantization error, and thus can achieve better performance. Furthermore, the hash functions and a quantizer can be jointly learned and iteratively updated in a unified framework, which can be readily used to generate hash codes or quantize new data points. Importantly, QBH is a generic framework that can be integrated with different property-preserving hashing methods and quantization strategies, and we apply QBH to both unsupervised and supervised hashing models as showcases in this paper. Experimental results on three large-scale unlabeled datasets (i.e., SIFT1M, GIST1M, and SIFT1B), three labeled datasets (i.e., ESPGAME, IAPRTC and MIRFLICKR) and one video dataset (UQ_VIDEO) demonstrate the superior performance of our QBH over existing unsupervised and supervised hashing methods.
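Schematically, a QBH-style objective couples a similarity-preserving hashing loss with a quantization-error term over the same codes; the formulation below is our illustrative sketch of that coupling, not the paper's exact objective.

```latex
\min_{H,\;\mathcal{C}}\ \mathcal{L}_{\text{sim}}(H;\, X)
\;+\; \beta \,\big\| H - Q_{\mathcal{C}}(H) \big\|_F^{2}
```

Here $H$ denotes the (relaxed) hash codes produced by the learned hash functions from data $X$, $\mathcal{L}_{\text{sim}}$ is any property-preserving hashing loss, $Q_{\mathcal{C}}(\cdot)$ quantizes codes onto a learned codebook $\mathcal{C}$, and $\beta$ balances the two goals. Alternating between updating the hash functions and the quantizer mirrors the iterative joint learning described in the abstract.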
TL;DR: This paper develops a novel medical image fusion, denoising, and enhancement method based on low-rank sparse component decomposition and dictionary learning that consistently outperforms existing state-of-the-art methods in terms of both visual and quantitative evaluations.
Abstract: Medical image fusion is important in image-guided medical diagnostics, treatment, and other computer vision tasks. However, most current approaches assume that the source images are noise-free, which is not usually the case in practice. The performance of traditional fusion methods decreases significantly when images are corrupted with noise. It is therefore necessary to develop a fusion method that accurately preserves detailed information even when images are corrupted. However, suppressing noise and enhancing textural details are difficult to achieve simultaneously. In this paper, we develop a novel medical image fusion, denoising, and enhancement method based on low-rank sparse component decomposition and dictionary learning. Specifically, to improve the discriminative ability of the learned dictionaries, we incorporate low-rank and sparse regularization terms into the dictionary learning model. Furthermore, in the image decomposition model, we impose a weighted nuclear norm and sparse constraint on the sparse component to remove noise and preserve textural details. Finally, the fused result is constructed by combining the fused low-rank and sparse components of the source images. Experimental results demonstrate that the proposed method consistently outperforms existing state-of-the-art methods in terms of both visual and quantitative evaluations.
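One way to read the decomposition model in the abstract, stated as our schematic reconstruction rather than the paper's exact formulation: each source image $X$ is split into a low-rank component $L$ and a sparse component $S$, with a weighted nuclear norm and a sparsity penalty imposed on $S$ to remove noise while preserving textural detail.

```latex
\min_{L,\,S}\ \| L \|_{*}
\;+\; \lambda_{1}\, \| S \|_{w,*}
\;+\; \lambda_{2}\, \| S \|_{1}
\quad \text{s.t.} \quad X = L + S
```

Here $\|S\|_{w,*} = \sum_i w_i\,\sigma_i(S)$ is the weighted nuclear norm over the singular values of $S$; after denoising, the fused result is assembled by combining the fused low-rank and sparse components of the source images, as the abstract describes.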
TL;DR: A novel distance-metric-optimization-driven learning approach that integrates the traditional steps via a deep convolutional neural network, which learns feature representations and the decision function in an end-to-end way so that both can be optimized simultaneously by backward propagation.
Abstract: We propose a new distance-metric-optimization-driven deep-learning framework for age-invariant face recognition. We learn feature representations and the similarity measure simultaneously in an end-to-end way, and train the joint network using a novel optimization method and carefully designed training strategies. The experimental results demonstrate the effectiveness of our approach. Despite the great advances in face-related works in recent years, face recognition across age remains a challenging problem. The traditional approaches to this problem usually include two basic steps: feature extraction and the application of a distance metric; sometimes common space projection is also involved. On the one hand, handling these steps separately ignores the interactions of these components, and on the other hand, the fixed distance threshold of the measurement affects the model's robustness. In this paper, we present a novel distance metric optimization driven learning approach that integrates these traditional steps via a deep convolutional neural network, which learns feature representations and the decision function in an end-to-end way. Given the labelled training images, we first generate a large number of pairs with a certain proportion of matched and unmatched pairs. For matched pairs, we try to select as many different age instances as possible for each person to learn the identification information that is not affected by age. Then, taking these pairs as input, we aim to enlarge the differences between the unmatched pairs while reducing the variations between the matched pairs, and we update the model parameters by using the mini-batch stochastic gradient descent (SGD) algorithm. Specifically, the distance matrix is used as the top fully connected layer, and the bottom layers representing the image features are integrated with it seamlessly. Thus, the image features and the distance metric can be optimized simultaneously by backward propagation. In particular, we introduce several training strategies to reduce the computational cost and overcome insufficient memory capacity. We evaluate our method on three tasks: age-invariant face identification on the MORPH database, age-invariant face retrieval on the CACD database and age-invariant face verification on the CACD-VS database. The experimental results demonstrate the effectiveness of our approach.
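A pairwise objective of the kind described above is typically instantiated as a contrastive-style loss; the following form, with the learned distance matrix $M$ acting as the top layer, is our hedged reconstruction rather than the paper's exact loss.

```latex
d_{ij}^{2} = (f_i - f_j)^{\top} M \, (f_i - f_j),
\qquad
\mathcal{L}_{ij} = y_{ij}\, d_{ij}^{2}
\;+\; (1 - y_{ij}) \max\!\big(0,\; m - d_{ij}\big)^{2}
```

Here $f_i$ is the CNN feature of image $i$, $y_{ij} = 1$ for matched (same-person) pairs and $0$ otherwise, and $m$ is a margin. Minimizing $\sum_{ij} \mathcal{L}_{ij}$ by mini-batch SGD shrinks matched-pair distances while pushing unmatched pairs beyond the margin, and backward propagation updates $M$ and the feature layers jointly, consistent with the end-to-end design described in the abstract.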
TL;DR: Inspired by the recent breakthrough of deep recurrent convolutional neural networks (CNNs) in classifying mental load, improved CNN methods are proposed for this task; they contain fewer parameters than state-of-the-art models, making them more competitive for practical applications.
Abstract: The electroencephalograph (EEG), the representation of the brain's electrical activity, is a widely used measure of brain activities such as working memory during cognitive tasks. As the complexity of cognitive tasks varies, mental load results in different EEG recordings. Classification of mental load is one of the core issues in studies of working memory. Various machine learning methods have been introduced into this area, achieving competitive performance. Inspired by the recent breakthrough of deep recurrent convolutional neural networks (CNNs) in classifying mental load, we propose improved CNN methods for this task. Specifically, our frameworks contain both single-model and double-model methods. With the help of our models, spatial, spectral, and temporal information of EEG data is taken into consideration. Meanwhile, a novel fusion strategy for utilizing different networks is introduced in this work. The proposed methods have been compared with state-of-the-art ones on the same EEG database. The comparison results show that both our single-model method and double-model method can achieve comparable or even better performance than the well-performing deep recurrent CNNs. Furthermore, our proposed CNN models contain fewer parameters than state-of-the-art ones, making them more competitive for practical applications.
TL;DR: A comprehensive review of breast ultrasound image segmentation approaches that studies their basic ideas, theories, pros and cons, groups them into categories, and extensively reviews each category in depth by discussing the principles, application issues, and advantages/disadvantages.
Abstract: Breast cancer is one of the leading causes of cancer death among women worldwide. In clinical routine, automatic breast ultrasound (BUS) image segmentation is very challenging and essential for cancer diagnosis and treatment planning. Many BUS segmentation approaches have been studied in the last two decades and have been proved to be effective on private datasets. Currently, the advancement of BUS image segmentation seems to have met its bottleneck. Improving the performance is increasingly challenging, and only a few new approaches have been published in the last several years. It is time to take stock of the field by reviewing previous approaches comprehensively and to investigate future directions. In this paper, we study the basic ideas, theories, pros and cons of the approaches, group them into categories, and extensively review each category in depth by discussing the principles, application issues, and advantages/disadvantages.
TL;DR: A novel deep multiplicative integration gating function is proposed, which answers the question of what-and-where to match for effective person re-id and is designed to be end-to-end trainable to characterize local pairwise feature interactions in a spatially aligned manner.
Abstract: Matching pedestrians across disjoint camera views, known as person re-identification (re-id), is a challenging problem that is of importance to visual recognition and surveillance. Most existing methods exploit local regions with spatial manipulation to perform matching in local correspondences. However, they essentially extract fixed representations from pre-divided regions for each image and then perform matching based on these extracted representations. For models in this pipeline, local finer patterns that are crucial to distinguish positive pairs from negative ones cannot be captured, thus causing them to underperform. In this paper, we propose a novel deep multiplicative integration gating function, which answers the question of what-and-where to match for effective person re-id. To address what to match, our deep network emphasizes common local patterns by learning joint representations in a multiplicative way. The network comprises two Convolutional Neural Networks (CNNs) to extract convolutional activations, and generates relevant descriptors for pedestrian matching. This leads to flexible representations for pair-wise images. To address where to match, we combat the spatial misalignment by performing spatially recurrent pooling via a four-directional recurrent neural network to impose spatial dependency over all positions with respect to the entire image. The proposed network is designed to be end-to-end trainable to characterize local pairwise feature interactions in a spatially aligned manner. To demonstrate the superiority of our method, extensive experiments are conducted over three benchmark data sets: VIPeR, CUHK03 and Market-1501.
TL;DR: A fingerprint and finger-vein based cancelable multi-biometric system, which provides template protection and revocability; its security is strengthened thanks to the enhanced partial discrete Fourier transform based non-invertible transformation.
Abstract: Fingerprint and finger-vein based cancelable multi-biometric template design. Flexible feature-level fusion strategy with three fusion options. Enhanced partial discrete Fourier transform based non-invertible transformation. High-performing cancelable multi-biometric templates with strong security. Compared to uni-biometric systems, multi-biometric systems, which fuse multiple biometric features, can improve recognition accuracy and security. However, due to challenging issues such as feature fusion and biometric template security, there is little research on cancelable multi-biometric systems. In this paper, we propose a fingerprint and finger-vein based cancelable multi-biometric system, which provides template protection and revocability. The proposed multi-biometric system combines the minutia-based fingerprint feature set and the image-based finger-vein feature set. We develop a feature-level fusion strategy with three fusion options. Matching performance and security strength using these different fusion options are thoroughly evaluated and analyzed. Moreover, compared with the original partial discrete Fourier transform (P-DFT), the security of the proposed multi-biometric system is strengthened, thanks to the enhanced partial discrete Fourier transform (EP-DFT) based non-invertible transformation.
TL;DR: This article provides a bird's eye view of data irregularities, beginning with a taxonomy and characterization of various distribution-based and feature-based irregularities, and discusses the notable and recent approaches that have been taken to make the existing stand-alone as well as ensemble classifiers robust against such irregularities.
Abstract: Most of the traditional pattern classifiers assume their input data to be well-behaved in terms of similar underlying class distributions, balanced size of classes, the presence of a full set of observed features in all data instances, etc. Practical datasets, however, show up with various forms of irregularities that are, very often, sufficient to confuse a classifier, thus degrading its ability to learn from the data. In this article, we provide a bird’s eye view of such data irregularities, beginning with a taxonomy and characterization of various distribution-based and feature-based irregularities. Subsequently, we discuss the notable and recent approaches that have been taken to make the existing stand-alone as well as ensemble classifiers robust against such irregularities. We also discuss the interrelation and co-occurrences of the data irregularities including class imbalance, small disjuncts, class skew, missing features, and absent (non-existing or undefined) features. Finally, we uncover a number of interesting future research avenues that are equally contextual with respect to the regular as well as deep machine learning paradigms.
TL;DR: This article introduces a regularized ensemble framework of deep learning to address the imbalanced, multi-class learning problems in medical diagnosis and demonstrates the superior performance of the method compared to several state-of-the-art algorithms.
Abstract: In medical diagnosis, e.g. bowel cancer detection, a large number of examples of normal cases exists alongside a much smaller number of positive cases. Such data imbalance usually complicates the learning process, especially for the classes with fewer representative examples, and results in missed detections. In this article, we introduce a regularized ensemble framework of deep learning to address imbalanced, multi-class learning problems. Our method employs regularization that accommodates multi-class data sets and automatically determines the error bound. The regularization penalizes the classifier when it misclassifies examples that were correctly classified in the previous learning phase. Experiments are conducted using capsule endoscopy videos of bowel cancer symptoms and synthetic data sets with moderate to high imbalance ratios. The results demonstrate the superior performance of our method compared to several state-of-the-art algorithms for imbalanced, multi-class classification problems. More importantly, the sensitivity gain of the minority classes is accompanied by an improvement in the overall accuracy for all classes. With regularization, a diverse group of classifiers is created, and the maximum accuracy improvement is 24.7%. The reduction in computational cost is also noticeable, and as the volume of training data increases, the efficiency gain of our method becomes more significant.
TL;DR: This work treats the small, dim targets as a special sparse noise component of the complex background noise and adopts a Mixture of Gaussians (MoG) with a Markov random field (MRF) to model the small target detection problem.
Abstract: Small target detection is one of the key techniques in infrared search and tracking applications. When small targets are very dim and of low signal-to-noise ratio, they are very similar to background noise, which usually causes high false alarm rates for conventional methods. To address this problem, we propose to treat the small, dim targets as a special sparse noise component of the complex background noise and adopt a Mixture of Gaussians (MoG) with a Markov random field (MRF) to model this problem. First, the spatio-temporal patch image is constructed using several consecutive frames to utilize the temporal information of the image sequence. Then, the MRF-guided MoG noise model under the Bayesian framework is proposed to model the small target detection problem. After that, via variational Bayes, the small target component can be effectively separated from the complex background noise. Finally, a simple adaptive segmentation method is used to extract small targets. Several series of experiments are conducted to evaluate the proposed method, and the results show that the proposed method is robust for real infrared images with complex backgrounds.
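Schematically (our reading of the abstract, not the paper's exact notation), the spatio-temporal patch image is decomposed as

```latex
D = B + E,
\qquad
p(E_{ij}) = \sum_{k=1}^{K} \pi_{k}\,
\mathcal{N}\!\left(E_{ij};\, 0,\ \sigma_{k}^{2}\right)
```

where $D$ is the patch image built from consecutive frames, $B$ is the background component (commonly modeled as low-rank in patch-image methods), and the residual $E$ follows a Mixture of Gaussians in which one component with sparse support is interpreted as the small-target signal. The MRF prior couples the component assignments of neighboring pixels to enforce spatial coherence, and variational Bayes infers the assignments from which the target map is segmented.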
TL;DR: An integrated framework consisting of bottom-up and top-down attention mechanisms that enable attention to be computed at the level of salient objects and/or regions is proposed.
Abstract: Visual attention is a fundamental cognitive capability that allows human beings to focus on regions of interest (ROIs) in complex natural environments. Which ROIs we pay attention to mainly depends on two distinct types of attentional mechanisms. The bottom-up mechanism can guide our detection of salient objects and regions through externally driven factors, i.e. color and location, whilst the top-down mechanism controls our biasing attention based on prior knowledge and cognitive strategies provided by the visual cortex. However, how to practically use and fuse both attentional mechanisms for salient object detection has not been sufficiently explored. To this end, we propose in this paper an integrated framework consisting of bottom-up and top-down attention mechanisms that enable attention to be computed at the level of salient objects and/or regions. Within our framework, the model of the bottom-up mechanism is guided by the gestalt laws of perception. We interpret the gestalt laws of homogeneity, similarity, proximity, and figure and ground in connection with color and spatial contrast at the level of regions and objects to produce a feature contrast map. The model of the top-down mechanism aims to use a formal computational model to describe the background connectivity of the attention and produce the priority map. Integrating both mechanisms and applying them to salient object detection, our results demonstrate that the proposed method consistently outperforms a number of existing unsupervised approaches on five challenging and complicated datasets in terms of higher precision and recall rates, AP (average precision) and AUC (area under curve) values.
TL;DR: A novel regression-based Convolutional Neural Network (CNN) pipeline is presented for polyp detection during colonoscopy, and it has great potential to be used to assist endoscopists in tracking polyps during colonoscopy.
Abstract: A computer-aided detection (CAD) tool for locating and detecting polyps can help reduce the chance of missing polyps during colonoscopy. Nevertheless, state-of-the-art algorithms were either computationally complex or suffered from low sensitivity, and were therefore unsuitable for use in a real clinical setting. In this paper, a novel regression-based Convolutional Neural Network (CNN) pipeline is presented for polyp detection during colonoscopy. The proposed pipeline was constructed in two parts: 1) to learn the spatial features of colorectal polyps, a fast object detection algorithm named ResYOLO was pre-trained with a large non-medical image database and further fine-tuned with colonoscopic images extracted from videos; and 2) temporal information was incorporated via a tracker named Efficient Convolution Operators (ECO) for refining the detection results given by ResYOLO. Evaluated on 17,574 frames extracted from 18 endoscopic videos of the AsuMayoDB, the proposed method was able to detect frames with polyps with a precision of 88.6%, a recall of 71.6% and a processing speed of 6.5 frames per second, i.e. the method can accurately locate polyps in more frames and at a faster speed compared to existing methods. In conclusion, the proposed method has great potential to be used to assist endoscopists in tracking polyps during colonoscopy.
TL;DR: The International Association for the Study of Pain (IASP) definition of pain has been widely accepted as a pragmatic characterisation of that human experience, but, as this paper argues, it fails to sufficiently integrate phenomenological aspects of pain.
Abstract: Introduction: The definition of pain promulgated by the International Association for the Study of Pain (IASP) is widely accepted as a pragmatic characterisation of that human experience. Although the Notes that accompany it characterise pain as "always subjective," the IASP definition itself fails to sufficiently integrate phenomenological aspects of pain.
Methods: This essay reviews the historical development of the IASP definition, and the commentaries and suggested modifications to it over almost 40 years. Common factors of pain experience identified in phenomenological studies are described, together with theoretical insights from philosophy and biology.
Results: A fuller understanding of the pain experience and of the clinical care of those experiencing pain is achievable through greater attention to the phenomenology of pain, the social "intersubjective space" in which pain occurs, and the limitations of language.
Conclusion: Based on these results, a revised definition of pain is offered: Pain is a mutually recognizable somatic experience that reflects a person’s apprehension of threat to their bodily or existential integrity.