Showing papers on "Feature (computer vision) published in 2016"

PDF

Open Access

Posted Content•

Feature Pyramid Networks for Object Detection

[...]

Tsung-Yi Lin¹, Piotr Dollár², Ross Girshick², Kaiming He², Bharath Hariharan², Serge Belongie¹ - Show less +2 more•Institutions (2)

Cornell University¹, Facebook²

09 Dec 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: Feature pyramid networks (FPNets) as mentioned in this paper exploit the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost.

...read moreread less

Abstract: Feature pyramids are a basic component in recognition systems for detecting objects at different scales. But recent deep learning object detectors have avoided pyramid representations, in part because they are compute and memory intensive. In this paper, we exploit the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost. A top-down architecture with lateral connections is developed for building high-level semantic feature maps at all scales. This architecture, called a Feature Pyramid Network (FPN), shows significant improvement as a generic feature extractor in several applications. Using FPN in a basic Faster R-CNN system, our method achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles, surpassing all existing single-model entries including those from the COCO 2016 challenge winners. In addition, our method can run at 5 FPS on a GPU and thus is a practical and accurate solution to multi-scale object detection. Code will be made publicly available.

...read moreread less

5,438 citations

Proceedings Article•DOI•

Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network

[...]

Wenzhe Shi, Jose Caballero, Ferenc Huszar, Johannes Totz, Andrew Peter Aitken, Rob Bishop, Daniel Rueckert, Zehan Wang - Show less +4 more

27 Jun 2016

TL;DR: This paper presents the first convolutional neural network capable of real-time SR of 1080p videos on a single K2 GPU and introduces an efficient sub-pixel convolution layer which learns an array of upscaling filters to upscale the final LR feature maps into the HR output.

...read moreread less

Abstract: Recently, several models based on deep neural networks have achieved great success in terms of both reconstruction accuracy and computational performance for single image super-resolution. In these methods, the low resolution (LR) input image is upscaled to the high resolution (HR) space using a single filter, commonly bicubic interpolation, before reconstruction. This means that the super-resolution (SR) operation is performed in HR space. We demonstrate that this is sub-optimal and adds computational complexity. In this paper, we present the first convolutional neural network (CNN) capable of real-time SR of 1080p videos on a single K2 GPU. To achieve this, we propose a novel CNN architecture where the feature maps are extracted in the LR space. In addition, we introduce an efficient sub-pixel convolution layer which learns an array of upscaling filters to upscale the final LR feature maps into the HR output. By doing so, we effectively replace the handcrafted bicubic filter in the SR pipeline with more complex upscaling filters specifically trained for each feature map, whilst also reducing the computational complexity of the overall SR operation. We evaluate the proposed approach using images and videos from publicly available datasets and show that it performs significantly better (+0.15dB on Images and +0.39dB on Videos) and is an order of magnitude faster than previous CNN-based methods.

...read moreread less

4,770 citations

Journal Article•DOI•

Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning

[...]

Hoo-Chang Shin, Holger R. Roth, Mingchen Gao, Le Lu, Ziyue Xu, Isabella Nogues, Jianhua Yao, Daniel J. Mollura, Ronald M. Summers - Show less +5 more

11 Feb 2016-IEEE Transactions on Medical Imaging

TL;DR: Two specific computer-aided detection problems, namely thoraco-abdominal lymph node (LN) detection and interstitial lung disease (ILD) classification are studied, achieving the state-of-the-art performance on the mediastinal LN detection, and the first five-fold cross-validation classification results are reported.

...read moreread less

Abstract: Remarkable progress has been made in image recognition, primarily due to the availability of large-scale annotated datasets and deep convolutional neural networks (CNNs). CNNs enable learning data-driven, highly representative, hierarchical image features from sufficient training data. However, obtaining datasets as comprehensively annotated as ImageNet in the medical imaging domain remains a challenge. There are currently three major techniques that successfully employ CNNs to medical image classification: training the CNN from scratch, using off-the-shelf pre-trained CNN features, and conducting unsupervised CNN pre-training with supervised fine-tuning. Another effective method is transfer learning, i.e., fine-tuning CNN models pre-trained from natural image dataset to medical image tasks. In this paper, we exploit three important, but previously understudied factors of employing deep convolutional neural networks to computer-aided detection problems. We first explore and evaluate different CNN architectures. The studied models contain 5 thousand to 160 million parameters, and vary in numbers of layers. We then evaluate the influence of dataset scale and spatial image context on performance. Finally, we examine when and why transfer learning from pre-trained ImageNet (via fine-tuning) can be useful. We study two specific computer-aided detection (CADe) problems, namely thoraco-abdominal lymph node (LN) detection and interstitial lung disease (ILD) classification. We achieve the state-of-the-art performance on the mediastinal LN detection, and report the first five-fold cross-validation classification results on predicting axial CT slices with ILD categories. Our extensive empirical evaluation, CNN model analysis and valuable insights can be extended to the design of high performance CAD systems for other medical imaging tasks.

...read moreread less

4,249 citations

Proceedings Article•

Unsupervised deep embedding for clustering analysis

[...]

Junyuan Xie¹, Ross Girshick², Ali Farhadi¹•Institutions (2)

University of Washington¹, Facebook²

19 Jun 2016

TL;DR: Deep Embedded Clustering (DEC) as discussed by the authors learns a mapping from the data space to a lower-dimensional feature space in which it iteratively optimizes a clustering objective.

...read moreread less

Abstract: Clustering is central to many data-driven application domains and has been studied extensively in terms of distance functions and grouping algorithms. Relatively little work has focused on learning representations for clustering. In this paper, we propose Deep Embedded Clustering (DEC), a method that simultaneously learns feature representations and cluster assignments using deep neural networks. DEC learns a mapping from the data space to a lower-dimensional feature space in which it iteratively optimizes a clustering objective. Our experimental evaluations on image and text corpora show significant improvement over state-of-the-art methods.

...read moreread less

1,776 citations

Proceedings Article•DOI•

Deeper Depth Prediction with Fully Convolutional Residual Networks

[...]

Iro Laina, Christian Rupprecht, Vasileios Belagiannis¹, Federico Tombari, Nassir Navab - Show less +1 more•Institutions (1)

University of Oxford¹

01 Oct 2016

TL;DR: A fully convolutional architecture, encompassing residual learning, to model the ambiguous mapping between monocular images and depth maps is proposed and a novel way to efficiently learn feature map up-sampling within the network is presented.

...read moreread less

Abstract: This paper addresses the problem of estimating the depth map of a scene given a single RGB image. We propose a fully convolutional architecture, encompassing residual learning, to model the ambiguous mapping between monocular images and depth maps. In order to improve the output resolution, we present a novel way to efficiently learn feature map up-sampling within the network. For optimization, we introduce the reverse Huber loss that is particularly suited for the task at hand and driven by the value distributions commonly present in depth maps. Our model is composed of a single architecture that is trained end-to-end and does not rely on post-processing techniques, such as CRFs or other additional refinement steps. As a result, it runs in real-time on images or videos. In the evaluation, we show that the proposed model contains fewer parameters and requires fewer training data than the current state of the art, while outperforming all approaches on depth estimation. Code and models are publicly available.

...read moreread less

1,677 citations

Journal Article•DOI•

Learning Rotation-Invariant Convolutional Neural Networks for Object Detection in VHR Optical Remote Sensing Images

[...]

Gong Cheng¹, Peicheng Zhou¹, Junwei Han¹•Institutions (1)

Northwestern Polytechnical University¹

05 Sep 2016-IEEE Transactions on Geoscience and Remote Sensing

TL;DR: This paper proposes a novel and effective approach to learn a rotation-invariant CNN (RICNN) model for advancing the performance of object detection, which is achieved by introducing and learning a new rotation- Invariant layer on the basis of the existing CNN architectures.

...read moreread less

Abstract: Object detection in very high resolution optical remote sensing images is a fundamental problem faced for remote sensing image analysis. Due to the advances of powerful feature representations, machine-learning-based object detection is receiving increasing attention. Although numerous feature representations exist, most of them are handcrafted or shallow-learning-based features. As the object detection task becomes more challenging, their description capability becomes limited or even impoverished. More recently, deep learning algorithms, especially convolutional neural networks (CNNs), have shown their much stronger feature representation power in computer vision. Despite the progress made in nature scene images, it is problematic to directly use the CNN feature for object detection in optical remote sensing images because it is difficult to effectively deal with the problem of object rotation variations. To address this problem, this paper proposes a novel and effective approach to learn a rotation-invariant CNN (RICNN) model for advancing the performance of object detection, which is achieved by introducing and learning a new rotation-invariant layer on the basis of the existing CNN architectures. However, different from the training of traditional CNN models that only optimizes the multinomial logistic regression objective, our RICNN model is trained by optimizing a new objective function via imposing a regularization constraint, which explicitly enforces the feature representations of the training samples before and after rotating to be mapped close to each other, hence achieving rotation invariance. To facilitate training, we first train the rotation-invariant layer and then domain-specifically fine-tune the whole RICNN network to further boost the performance. Comprehensive evaluations on a publicly available ten-class object detection data set demonstrate the effectiveness of the proposed method.

...read moreread less

1,370 citations

Book Chapter•DOI•

A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection

[...]

Zhaowei Cai¹, Quanfu Fan², Rogerio Feris², Nuno Vasconcelos¹•Institutions (2)

University of California, San Diego¹, IBM²

08 Oct 2016

TL;DR: A unified deep neural network, denoted the multi-scale CNN (MS-CNN), is proposed for fast multi- scale object detection, which is learned end-to-end, by optimizing a multi-task loss.

...read moreread less

Abstract: A unified deep neural network, denoted the multi-scale CNN (MS-CNN), is proposed for fast multi-scale object detection. The MS-CNN consists of a proposal sub-network and a detection sub-network. In the proposal sub-network, detection is performed at multiple output layers, so that receptive fields match objects of different scales. These complementary scale-specific detectors are combined to produce a strong multi-scale object detector. The unified network is learned end-to-end, by optimizing a multi-task loss. Feature upsampling by deconvolution is also explored, as an alternative to input upsampling, to reduce the memory and computation costs. State-of-the-art object detection performance, at up to 15 fps, is reported on datasets, such as KITTI and Caltech, containing a substantial number of small objects.

...read moreread less

1,342 citations

Book Chapter•DOI•

Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking

[...]

Martin Danelljan¹, Andreas Robinson¹, Fahad Shahbaz Khan¹, Michael Felsberg¹•Institutions (1)

Linköping University¹

12 Aug 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: Discriminative Correlation Filters have demonstrated excellent performance for visual object tracking and the key to their success is the ability to efficiently exploit available negative data.

...read moreread less

Abstract: Discriminative Correlation Filters (DCF) have demonstrated excellent performance for visual object tracking. The key to their success is the ability to efficiently exploit available negative data by including all shifted versions of a training sample. However, the underlying DCF formulation is restricted to single-resolution feature maps, significantly limiting its potential. In this paper, we go beyond the conventional DCF framework and introduce a novel formulation for training continuous convolution filters. We employ an implicit interpolation model to pose the learning problem in the continuous spatial domain. Our proposed formulation enables efficient integration of multi-resolution deep feature maps, leading to superior results on three object tracking benchmarks: OTB-2015 (+5.1% in mean OP), Temple-Color (+4.6% in mean OP), and VOT2015 (20% relative reduction in failure rate). Additionally, our approach is capable of sub-pixel localization, crucial for the task of accurate feature point tracking. We also demonstrate the effectiveness of our learning formulation in extensive feature point tracking experiments. Code and supplementary material are available at this http URL.

...read moreread less

1,324 citations

Journal Article•DOI•

Lung Pattern Classification for Interstitial Lung Diseases Using a Deep Convolutional Neural Network

[...]

Marios Anthimopoulos¹, Stergios Christodoulidis¹, Lukas Ebner¹, Andreas Christe¹, Stavroula Mougiakakou¹ - Show less +1 more•Institutions (1)

University of Bern¹

29 Feb 2016-IEEE Transactions on Medical Imaging

TL;DR: A comparative analysis proved the effectiveness of the proposed CNN against previous methods in a challenging dataset, and demonstrated the potential of CNNs in analyzing lung patterns.

...read moreread less

Abstract: Automated tissue characterization is one of the most crucial components of a computer aided diagnosis (CAD) system for interstitial lung diseases (ILDs). Although much research has been conducted in this field, the problem remains challenging. Deep learning techniques have recently achieved impressive results in a variety of computer vision problems, raising expectations that they might be applied in other domains, such as medical image analysis. In this paper, we propose and evaluate a convolutional neural network (CNN), designed for the classification of ILD patterns. The proposed network consists of 5 convolutional layers with 2 $\,\times\,$ 2 kernels and LeakyReLU activations, followed by average pooling with size equal to the size of the final feature maps and three dense layers. The last dense layer has 7 outputs, equivalent to the classes considered: healthy, ground glass opacity (GGO), micronodules, consolidation, reticulation, honeycombing and a combination of GGO/reticulation. To train and evaluate the CNN, we used a dataset of 14696 image patches, derived by 120 CT scans from different scanners and hospitals. To the best of our knowledge, this is the first deep CNN designed for the specific problem. A comparative analysis proved the effectiveness of the proposed CNN against previous methods in a challenging dataset. The classification performance ( $\sim 85.5\%$ ) demonstrated the potential of CNNs in analyzing lung patterns. Future work includes, extending the CNN to three-dimensional data provided by CT volume scans and integrating the proposed method into a CAD system that aims to provide differential diagnosis for ILDs as a supportive tool for radiologists.

...read moreread less

1,053 citations

Journal Article•DOI•

A survey on object detection in optical remote sensing images

[...]

Gong Cheng¹, Junwei Han¹•Institutions (1)

Northwestern Polytechnical University¹

01 Jul 2016-Isprs Journal of Photogrammetry and Remote Sensing

TL;DR: This survey focuses on more generic object categories including, but not limited to, road, building, tree, vehicle, ship, airport, urban-area, and proposes two promising research directions, namely deep learning- based feature representation and weakly supervised learning-based geospatial object detection.

...read moreread less

Abstract: Object detection in optical remote sensing images, being a fundamental but challenging problem in the field of aerial and satellite image analysis, plays an important role for a wide range of applications and is receiving significant attention in recent years. While enormous methods exist, a deep review of the literature concerning generic object detection is still lacking. This paper aims to provide a review of the recent progress in this field. Different from several previously published surveys that focus on a specific object class such as building and road, we concentrate on more generic object categories including, but are not limited to, road, building, tree, vehicle, ship, airport, urban-area. Covering about 270 publications we survey (1) template matching-based object detection methods, (2) knowledge-based object detection methods, (3) object-based image analysis (OBIA)-based object detection methods, (4) machine learning-based object detection methods, and (5) five publicly available datasets and three standard evaluation metrics. We also discuss the challenges of current studies and propose two promising research directions, namely deep learning-based feature representation and weakly supervised learning-based geospatial object detection. It is our hope that this survey will be beneficial for the researchers to have better understanding of this research field.

...read moreread less

994 citations

Journal Article•DOI•

Binary grey wolf optimization approaches for feature selection

[...]

Eid Emary¹, Hossam M. Zawbaa², Aboul Ella Hassanien³•Institutions (3)

Cairo University¹, Babeș-Bolyai University², Beni-Suef University³

08 Jan 2016-Neurocomputing

TL;DR: Results prove the capability of the proposed binary version of grey wolf optimization (bGWO) to search the feature space for optimal feature combinations regardless of the initialization and the used stochastic operators.

...read moreread less

Journal Article•DOI•

Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases.

[...]

Andrew Janowczyk¹, Anant Madabhushi¹•Institutions (1)

Case Western Reserve University¹

01 Jan 2016-Journal of Pathology Informatics

TL;DR: This paper investigates concepts through seven unique DP tasks as use cases to elucidate techniques needed to produce comparable, and in many cases, superior to results from the state-of-the-art hand-crafted feature-based classification approaches.

...read moreread less

Proceedings Article•

Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections

[...]

Xiao-Jiao Mao¹, Chunhua Shen², Yu-Bin Yang¹•Institutions (2)

Nanjing University¹, University of Adelaide²

05 Dec 2016

TL;DR: This paper proposes to symmetrically link convolutional and de-convolutional layers with skip-layer connections, with which the training converges much faster and attains a higher-quality local optimum, making training deep networks easier and achieving restoration performance gains consequently.

...read moreread less

Abstract: In this paper, we propose a very deep fully convolutional encoding-decoding framework for image restoration such as denoising and super-resolution. The network is composed of multiple layers of convolution and deconvolution operators, learning end-to-end mappings from corrupted images to the original ones. The convolutional layers act as the feature extractor, which capture the abstraction of image contents while eliminating noises/corruptions. Deconvolutional layers are then used to recover the image details. We propose to symmetrically link convolutional and deconvolutional layers with skip-layer connections, with which the training converges much faster and attains a higher-quality local optimum. First, the skip connections allow the signal to be back-propagated to bottom layers directly, and thus tackles the problem of gradient vanishing, making training deep networks easier and achieving restoration performance gains consequently. Second, these skip connections pass image details from convolutional layers to deconvolutional layers, which is beneficial in recovering the original image. Significantly, with the large capacity, we can handle different levels of noises using a single model. Experimental results show that our network achieves better performance than recent state-of-the-art methods.

...read moreread less

Book Chapter•DOI•

LIFT: Learned Invariant Feature Transform

[...]

Kwang Moo Yi¹, Eduard Trulls¹, Vincent Lepetit², Pascal Fua¹•Institutions (2)

École Polytechnique Fédérale de Lausanne¹, Graz University of Technology²

08 Oct 2016

TL;DR: This work introduces a novel Deep Network architecture that implements the full feature point handling pipeline, that is, detection, orientation estimation, and feature description, and shows how to learn to do all three in a unified manner while preserving end-to-end differentiability.

...read moreread less

Abstract: We introduce a novel Deep Network architecture that implements the full feature point handling pipeline, that is, detection, orientation estimation, and feature description. While previous works have successfully tackled each one of these problems individually, we show how to learn to do all three in a unified manner while preserving end-to-end differentiability. We then demonstrate that our Deep pipeline outperforms state-of-the-art methods on a number of benchmark datasets, without the need of retraining.

...read moreread less

Journal Article•DOI•

High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning

[...]

Sarah M. Erfani¹, Sutharshan Rajasegarar¹, Shanika Karunasekera¹, Christopher Leckie¹•Institutions (1)

University of Melbourne¹

01 Oct 2016-Pattern Recognition

TL;DR: A hybrid model where an unsupervised DBN is trained to extract generic underlying features, and a one-class SVM is trained from the features learned by the DBN, which delivers a comparable accuracy with a deep autoencoder and is scalable and computationally efficient.

...read moreread less

Journal Article•DOI•

Spectral–Spatial Feature Extraction for Hyperspectral Image Classification: A Dimension Reduction and Deep Learning Approach

[...]

Wenzhi Zhao¹, Shihong Du¹•Institutions (1)

Peking University¹

08 Apr 2016-IEEE Transactions on Geoscience and Remote Sensing

TL;DR: A spectral-spatial feature based classification (SSFC) framework that jointly uses dimension reduction and deep learning techniques for spectral and spatial feature extraction, respectively is proposed.

...read moreread less

Abstract: In this paper, we propose a spectral–spatial feature based classification (SSFC) framework that jointly uses dimension reduction and deep learning techniques for spectral and spatial feature extraction, respectively. In this framework, a balanced local discriminant embedding algorithm is proposed for spectral feature extraction from high-dimensional hyperspectral data sets. In the meantime, convolutional neural network is utilized to automatically find spatial-related features at high levels. Then, the fusion feature is extracted by stacking spectral and spatial features together. Finally, the multiple-feature-based classifier is trained for image classification. Experimental results on well-known hyperspectral data sets show that the proposed SSFC method outperforms other commonly used methods for hyperspectral image classification.

...read moreread less

Posted Content•

Deeper Depth Prediction with Fully Convolutional Residual Networks

[...]

Iro Laina, Christian Rupprecht, Vasileios Belagiannis¹, Federico Tombari, Nassir Navab - Show less +1 more•Institutions (1)

University of Oxford¹

01 Jun 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this article, a fully convolutional architecture, encompassing residual learning, is proposed to model the ambiguous mapping between monocular images and depth maps, which can be trained end-to-end and does not rely on post-processing techniques such as CRFs or other additional refinement steps.

...read moreread less

Journal Article•DOI•

Let there be color!: joint end-to-end learning of global and local image priors for automatic image colorization with simultaneous classification

[...]

Satoshi Iizuka¹, Edgar Simo-Serra¹, Hiroshi Ishikawa¹•Institutions (1)

Waseda University¹

11 Jul 2016

TL;DR: A novel technique to automatically colorize grayscale images that combines both global priors and local image features and can process images of any resolution, unlike most existing approaches based on CNN.

...read moreread less

Abstract: We present a novel technique to automatically colorize grayscale images that combines both global priors and local image features. Based on Convolutional Neural Networks, our deep network features a fusion layer that allows us to elegantly merge local information dependent on small image patches with global priors computed using the entire image. The entire framework, including the global and local priors as well as the colorization model, is trained in an end-to-end fashion. Furthermore, our architecture can process images of any resolution, unlike most existing approaches based on CNN. We leverage an existing large-scale scene classification database to train our model, exploiting the class labels of the dataset to more efficiently and discriminatively learn the global priors. We validate our approach with a user study and compare against the state of the art, where we show significant improvements. Furthermore, we demonstrate our method extensively on many different types of images, including black-and-white photography from over a hundred years ago, and show realistic colorizations.

...read moreread less

Posted Content•

Learning Deep Feature Representations with Domain Guided Dropout for Person Re-identification

[...]

Tong Xiao¹, Hongsheng Li¹, Wanli Ouyang¹, Xiaogang Wang¹•Institutions (1)

The Chinese University of Hong Kong¹

26 Apr 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: Zhang et al. as mentioned in this paper proposed a Domain Guided Dropout (DGD) algorithm to improve the feature learning procedure for person re-ID, which outperformed state-of-the-art methods on multiple datasets by large margins.

...read moreread less

Abstract: Learning generic and robust feature representations with data from multiple domains for the same problem is of great value, especially for the problems that have multiple datasets but none of them are large enough to provide abundant data variations. In this work, we present a pipeline for learning deep feature representations from multiple domains with Convolutional Neural Networks (CNNs). When training a CNN with data from all the domains, some neurons learn representations shared across several domains, while some others are effective only for a specific one. Based on this important observation, we propose a Domain Guided Dropout algorithm to improve the feature learning procedure. Experiments show the effectiveness of our pipeline and the proposed algorithm. Our methods on the person re-identification problem outperform state-of-the-art methods on multiple datasets by large margins.

...read moreread less

Posted Content•

SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning

[...]

Long Chen¹, Hanwang Zhang², Jun Xiao¹, Liqiang Nie³, Jian Shao¹, Wei Liu⁴, Tat-Seng Chua⁵ - Show less +3 more•Institutions (5)

Zhejiang University¹, Columbia University², Shandong University³, Tencent⁴, National University of Singapore⁵

17 Nov 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: SCA-CNN as mentioned in this paper incorporates spatial and channel-wise attentions in a CNN to dynamically modulate the sentence generation context in multi-layer feature maps, encoding where attentive spatial locations at multiple layers and what (i.e., attentive channels) the visual attention is.

...read moreread less

Abstract: Visual attention has been successfully applied in structural prediction tasks such as visual captioning and question answering. Existing visual attention models are generally spatial, i.e., the attention is modeled as spatial probabilities that re-weight the last conv-layer feature map of a CNN encoding an input image. However, we argue that such spatial attention does not necessarily conform to the attention mechanism --- a dynamic feature extractor that combines contextual fixations over time, as CNN features are naturally spatial, channel-wise and multi-layer. In this paper, we introduce a novel convolutional neural network dubbed SCA-CNN that incorporates Spatial and Channel-wise Attentions in a CNN. In the task of image captioning, SCA-CNN dynamically modulates the sentence generation context in multi-layer feature maps, encoding where (i.e., attentive spatial locations at multiple layers) and what (i.e., attentive channels) the visual attention is. We evaluate the proposed SCA-CNN architecture on three benchmark image captioning datasets: Flickr8K, Flickr30K, and MSCOCO. It is consistently observed that SCA-CNN significantly outperforms state-of-the-art visual attention-based image captioning methods.

...read moreread less

Journal Article•DOI•

Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations

[...]

Eliyahu Kiperwasser¹, Yoav Goldberg¹•Institutions (1)

Bar-Ilan University¹

20 Jul 2016-Transactions of the Association for Computational Linguistics

TL;DR: This paper proposed a simple and effective scheme for dependency parsing which is based on bidirectional-LSTMs (BiLSTM) and feature vectors are constructed by concatenating a few BiLSTMM vectors.

...read moreread less

Abstract: We present a simple and effective scheme for dependency parsing which is based on bidirectional-LSTMs (BiLSTMs). Each sentence token is associated with a BiLSTM vector representing the token in its sentential context, and feature vectors are constructed by concatenating a few BiLSTM vectors. The BiLSTM is trained jointly with the parser objective, resulting in very effective feature extractors for parsing. We demonstrate the effectiveness of the approach by applying it to a greedy transition-based parser as well as to a globally optimized graph-based parser. The resulting parsers have very simple architectures, and match or surpass the state-of-the-art accuracies on English and Chinese.

...read moreread less

Journal Article•DOI•

Image analysis and machine learning in digital pathology: Challenges and opportunities

[...]

Anant Madabhushi¹, George Lee¹•Institutions (1)

Case Western Reserve University¹

01 Oct 2016-Medical Image Analysis

TL;DR: This review discusses developments in computational image analysis tools for predictive modeling of digital pathology images from a detection, segmentation, feature extraction, and tissue classification perspective, and reflects on future opportunities for the quantitation of histopathology.

...read moreread less

Posted Content•

Generating Images with Perceptual Similarity Metrics based on Deep Networks

[...]

Alexey Dosovitskiy¹, Thomas Brox¹•Institutions (1)

University of Freiburg¹

08 Feb 2016-arXiv: Learning

TL;DR: Deep perceptual similarity metrics (DeePSiM) as mentioned in this paper is proposed to mitigate the over-smoothed results of image-generating machine learning models by computing distances between image features extracted by deep neural networks.

...read moreread less

Abstract: Image-generating machine learning models are typically trained with loss functions based on distance in the image space. This often leads to over-smoothed results. We propose a class of loss functions, which we call deep perceptual similarity metrics (DeePSiM), that mitigate this problem. Instead of computing distances in the image space, we compute distances between image features extracted by deep neural networks. This metric better reflects perceptually similarity of images and thus leads to better results. We show three applications: autoencoder training, a modification of a variational autoencoder, and inversion of deep convolutional networks. In all cases, the generated images look sharp and resemble natural images.

...read moreread less

Journal Article•DOI•

Computer-Aided Diagnosis with Deep Learning Architecture: Applications to Breast Lesions in US Images and Pulmonary Nodules in CT Scans

[...]

Jie-Zhi Cheng¹, Dong Ni¹, Yi-Hong Chou², Jing Qin¹, Chui Mei Tiu³, Yeun-Chung Chang⁴, Chiun-Sheng Huang⁴, Dinggang Shen⁵, Chung-Ming Chen⁴ - Show less +5 more•Institutions (5)

Shenzhen University¹, Taipei Veterans General Hospital², National Yang-Ming University³, National Taiwan University⁴, University of North Carolina at Chapel Hill⁵

15 Apr 2016-Scientific Reports

TL;DR: The experimental results show the significant performance boost by the SDAE-based CADx algorithm over the two conventional methods, suggesting that deep learning techniques can potentially change the design paradigm of the CADx systems without the need of explicit design and selection of problem-oriented features.

...read moreread less

Abstract: This paper performs a comprehensive study on the deep-learning-based computer-aided diagnosis (CADx) for the differential diagnosis of benign and malignant nodules/lesions by avoiding the potential errors caused by inaccurate image processing results (e.g., boundary segmentation), as well as the classification bias resulting from a less robust feature set, as involved in most conventional CADx algorithms. Specifically, the stacked denoising auto-encoder (SDAE) is exploited on the two CADx applications for the differentiation of breast ultrasound lesions and lung CT nodules. The SDAE architecture is well equipped with the automatic feature exploration mechanism and noise tolerance advantage, and hence may be suitable to deal with the intrinsically noisy property of medical image data from various imaging modalities. To show the outperformance of SDAE-based CADx over the conventional scheme, two latest conventional CADx algorithms are implemented for comparison. 10 times of 10-fold cross-validations are conducted to illustrate the efficacy of the SDAE-based CADx algorithm. The experimental results show the significant performance boost by the SDAE-based CADx algorithm over the two conventional methods, suggesting that deep learning techniques can potentially change the design paradigm of the CADx systems without the need of explicit design and selection of problem-oriented features.

...read moreread less

Proceedings Article•DOI•

Human Pose Estimation with Iterative Error Feedback

[...]

Joao Carreira¹, Pulkit Agrawal¹, Katerina Fragkiadaki¹, Jitendra Malik¹•Institutions (1)

University of California, Berkeley¹

01 Jun 2016

TL;DR: This article propose a self-correcting model that progressively changes an initial solution by feeding back error predictions, in a process called Iterative Error Feedback (IEF), which shows excellent performance on the task of articulated human pose estimation in the challenging MPII and LSP benchmarks.

...read moreread less

Abstract: Hierarchical feature extractors such as Convolutional Networks (ConvNets) have achieved impressive performance on a variety of classification tasks using purely feedforward processing. Feedforward architectures can learn rich representations of the input space but do not explicitly model dependencies in the output spaces, that are quite structured for tasks such as articulated human pose estimation or object segmentation. Here we propose a framework that expands the expressive power of hierarchical feature extractors to encompass both input and output spaces, by introducing top-down feedback. Instead of directly predicting the outputs in one go, we use a self-correcting model that progressively changes an initial solution by feeding back error predictions, in a process we call Iterative Error Feedback (IEF). IEF shows excellent performance on the task of articulated pose estimation in the challenging MPII and LSP benchmarks, matching the state-of-the-art without requiring ground truth scale annotation.

...read moreread less

Proceedings Article•DOI•

Recurrent Convolutional Network for Video-Based Person Re-identification

[...]

Niall McLaughlin¹, Jesus Martinez del Rincon¹, Paul Miller¹•Institutions (1)

Queen's University Belfast¹

27 Jun 2016

TL;DR: A novel recurrent neural network architecture for video-based person re-identification that makes use of colour and optical flow information in order to capture appearance and motion information which is useful for video re- identification.

...read moreread less

Abstract: In this paper we propose a novel recurrent neural network architecture for video-based person re-identification. Given the video sequence of a person, features are extracted from each frame using a convolutional neural network that incorporates a recurrent final layer, which allows information to flow between time-steps. The features from all timesteps are then combined using temporal pooling to give an overall appearance feature for the complete sequence. The convolutional network, recurrent layer, and temporal pooling layer, are jointly trained to act as a feature extractor for video-based re-identification using a Siamese network architecture. Our approach makes use of colour and optical flow information in order to capture appearance and motion information which is useful for video re-identification. Experiments are conduced on the iLIDS-VID and PRID-2011 datasets to show that this approach outperforms existing methods of video-based re-identification.

...read moreread less

Posted Content•

Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose

[...]

Georgios Pavlakos¹, Xiaowei Zhou¹, Konstantinos G. Derpanis², Kostas Daniilidis¹•Institutions (2)

University of Pennsylvania¹, Ryerson University²

23 Nov 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper proposes a fine discretization of the 3D space around the subject and trains a ConvNet to predict per voxel likelihoods for each joint, which creates a natural representation for 3D pose and greatly improves performance over the direct regression of joint coordinates.

...read moreread less

Abstract: This paper addresses the challenge of 3D human pose estimation from a single color image. Despite the general success of the end-to-end learning paradigm, top performing approaches employ a two-step solution consisting of a Convolutional Network (ConvNet) for 2D joint localization and a subsequent optimization step to recover 3D pose. In this paper, we identify the representation of 3D pose as a critical issue with current ConvNet approaches and make two important contributions towards validating the value of end-to-end learning for this task. First, we propose a fine discretization of the 3D space around the subject and train a ConvNet to predict per voxel likelihoods for each joint. This creates a natural representation for 3D pose and greatly improves performance over the direct regression of joint coordinates. Second, to further improve upon initial estimates, we employ a coarse-to-fine prediction scheme. This step addresses the large dimensionality increase and enables iterative refinement and repeated processing of the image features. The proposed approach outperforms all state-of-the-art methods on standard benchmarks achieving a relative error reduction greater than 30% on average. Additionally, we investigate using our volumetric representation in a related architecture which is suboptimal compared to our end-to-end approach, but is of practical interest, since it enables training when no image with corresponding 3D groundtruth is available, and allows us to present compelling results for in-the-wild images.

...read moreread less

Proceedings Article•DOI•

Inverting Visual Representations with Convolutional Networks

[...]

Alexey Dosovitskiy¹, Thomas Brox¹•Institutions (1)

University of Freiburg¹

27 Jun 2016

TL;DR: This work proposes a new approach to study image representations by inverting them with an up-convolutional neural network, and applies this method to shallow representations (HOG, SIFT, LBP), as well as to deep networks.

...read moreread less

Abstract: Feature representations, both hand-designed and learned ones, are often hard to analyze and interpret, even when they are extracted from visual data. We propose a new approach to study image representations by inverting them with an up-convolutional neural network. We apply the method to shallow representations (HOG, SIFT, LBP), as well as to deep networks. For shallow representations our approach provides significantly better reconstructions than existing methods, revealing that there is surprisingly rich information contained in these features. Inverting a deep network trained on ImageNet provides several insights into the properties of the feature representation learned by the network. Most strikingly, the colors and the rough contours of an image can be reconstructed from activations in higher network layers and even from the predicted class probabilities.

...read moreread less

Journal Article•DOI•

A Comprehensive Performance Evaluation of 3D Local Feature Descriptors

[...]

Yulan Guo¹, Mohammed Bennamoun², Ferdous Sohel², Min Lu¹, Jianwei Wan¹, Ngaiming Kwok³ - Show less +2 more•Institutions (3)

National University of Defense Technology¹, University of Western Australia², University of New South Wales³

01 Jan 2016-International Journal of Computer Vision

TL;DR: This paper compares ten popular local feature descriptors in the contexts of 3D object recognition, 3D shape retrieval, and 3D modeling and presents the performance results of these descriptors when combined with different 3D keypoint detection methods.

...read moreread less

Abstract: A number of 3D local feature descriptors have been proposed in the literature. It is however, unclear which descriptors are more appropriate for a particular application. A good descriptor should be descriptive, compact, and robust to a set of nuisances. This paper compares ten popular local feature descriptors in the contexts of 3D object recognition, 3D shape retrieval, and 3D modeling. We first evaluate the descriptiveness of these descriptors on eight popular datasets which were acquired using different techniques. We then analyze their compactness using the recall of feature matching per each float value in the descriptor. We also test the robustness of the selected descriptors with respect to support radius variations, Gaussian noise, shot noise, varying mesh resolution, distance to the mesh boundary, keypoint localization error, occlusion, clutter, and dataset size. Moreover, we present the performance results of these descriptors when combined with different 3D keypoint detection methods. We finally analyze the computational efficiency for generating each descriptor.

...read moreread less

Journal Article•DOI•

DeepSaliency: Multi-Task Deep Neural Network Model for Salient Object Detection

[...]

Xi Li¹, Liming Zhao¹, Lina Wei¹, Ming-Hsuan Yang², Fei Wu¹, Yueting Zhuang¹, Haibin Ling³, Jingdong Wang⁴ - Show less +4 more•Institutions (4)

Zhejiang University¹, University of California, Merced², Temple University³, Microsoft⁴

09 Jun 2016-IEEE Transactions on Image Processing

TL;DR: This paper proposes a multi-task deep saliency model based on a fully convolutional neural network with global input (whole raw images) and global output (Whole saliency maps) and presents a graph Laplacian regularized nonlinear regression model for saliency refinement.

...read moreread less

Abstract: A key problem in salient object detection is how to effectively model the semantic properties of salient objects in a data-driven manner. In this paper, we propose a multi-task deep saliency model based on a fully convolutional neural network with global input (whole raw images) and global output (whole saliency maps). In principle, the proposed saliency model takes a data-driven strategy for encoding the underlying saliency prior information, and then sets up a multi-task learning scheme for exploring the intrinsic correlations between saliency detection and semantic image segmentation. Through collaborative feature learning from such two correlated tasks, the shared fully convolutional layers produce effective features for object perception. Moreover, it is capable of capturing the semantic information on salient objects across different levels using the fully convolutional layers, which investigate the feature-sharing properties of salient object detection with a great reduction of feature redundancy. Finally, we present a graph Laplacian regularized nonlinear regression model for saliency refinement. Experimental results demonstrate the effectiveness of our approach in comparison with the state-of-the-art approaches.

...read moreread less

Collapse