Showing papers in "Computer Vision and Image Understanding in 2017"

Open Access

Journal Article•DOI•

The THUMOS challenge on action recognition for videos “in the wild”

Haroon Idrees¹, Amir Roshan Zamir², Yu-Gang Jiang³, Alexander Gorban⁴, Ivan Laptev⁵, Rahul Sukthankar⁴, Mubarak Shah¹•Institutions (5)

University of Central Florida¹, Stanford University², Fudan University³, Google⁴, French Institute for Research in Computer Science and Automation⁵

01 Feb 2017-Computer Vision and Image Understanding

TL;DR: The THUMOS benchmark is described in detail and an overview of the data collection and annotation procedures is given, including a comprehensive empirical study evaluating the differences in action recognition between trimmed and untrimmed videos, and how well methods trained on trimmed videos generalize to untrimmed videos.

415 citations

Journal Article•DOI•

Detecting anomalous events in videos by learning deep representations of appearance and motion

Dan Xu¹, Yan Yan², Elisa Ricci³,⁴, Nicu Sebe¹•Institutions (4)

University of Trento¹, University of Michigan², Fondazione Bruno Kessler³, University of Perugia⁴

01 Mar 2017-Computer Vision and Image Understanding

TL;DR: A novel double fusion framework is introduced, combining the benefits of traditional early fusion and late fusion strategies, which is extensively evaluated on publicly available video surveillance datasets including UCSD Pedestrian, Subway, and Train, showing competitive performance with respect to state-of-the-art approaches.

385 citations

Journal Article•DOI•

Learning a no-reference quality metric for single-image super-resolution

Chao Ma¹,², Chih-Yuan Yang³, Xiaokang Yang¹, Ming-Hsuan Yang³•Institutions (3)

Shanghai Jiao Tong University¹, University of Adelaide², University of California, Merced³

01 May 2017-Computer Vision and Image Understanding

TL;DR: The authors design three types of low-level statistical features in both spatial and frequency domains to quantify super-resolved artifacts, and learn a two-stage regression model to predict the quality scores of super-resolution images without referring to ground-truth images.

338 citations

Journal Article•DOI•

Space-time representation of people based on 3D skeletal data

Fei Han¹, Brian Reily¹, William Hoff¹, Hao Zhang¹•Institutions (1)

Colorado School of Mines¹

01 May 2017-Computer Vision and Image Understanding

TL;DR: Skeleton-based human representations have been intensively studied and are attracting increasing attention, due to their robustness to variations in viewpoint, human body scale and motion speed, as well as their real-time, online performance.

279 citations

Journal Article•DOI•

Systematic evaluation of convolution neural network advances on the Imagenet

Dmytro Mishkin¹, Nikolay Sergievskiy, Jiri Matas¹•Institutions (1)

Czech Technical University in Prague¹

01 Aug 2017-Computer Vision and Image Understanding

TL;DR: It is shown that the use of 128 × 128 pixel images is sufficient to make qualitative conclusions about optimal network structure that hold for the full size Caffe and VGG nets, and that the results are obtained an order of magnitude faster than with the standard 224 pixel images.

266 citations

Journal Article•DOI•

Hough-CNN: Deep learning for segmentation of deep brain regions in MRI and ultrasound

Fausto Milletari¹, Seyed-Ahmad Ahmadi², Christine Kroll¹, Annika Plate², Verena E. Rozanski², Juliana Maiostre², Johannes Levin², Olaf Dietrich², Birgit Ertl-Wagner², Kai Bötzel², Nassir Navab¹•Institutions (2)

Technische Universität München¹, Ludwig Maximilian University of Munich²

01 Nov 2017-Computer Vision and Image Understanding

TL;DR: A novel approach based on Hough voting is proposed that performs segmentation by leveraging the abstraction capabilities of convolutional neural networks (CNNs); it is robust, multi-region, flexible, and can be easily adapted to different modalities.

263 citations

Journal Article•DOI•

Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?

Abhishek Das¹, Harsh Agrawal², C. Lawrence Zitnick³, Devi Parikh¹,³, Dhruv Batra¹,³•Institutions (3)

Georgia Institute of Technology¹, Virginia Tech², Facebook³

01 Oct 2017-Computer Vision and Image Understanding

TL;DR: This article introduces the VQA-HAT (Human ATtention) dataset to evaluate the attention maps generated by state-of-the-art visual question answering models against human attention, and shows that current attention models do not seem to be looking at the same regions as humans.

256 citations

Journal Article•DOI•

Visual question answering: A survey of methods and datasets

Qi Wu¹, Damien Teney¹, Peng Wang¹, Chunhua Shen¹, Anthony Dick¹, Anton van den Hengel¹•Institutions (1)

University of Adelaide¹

01 Oct 2017-Computer Vision and Image Understanding

TL;DR: Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities; it requires reasoning over visual elements of the image and general knowledge to infer the correct answer.

255 citations

Journal Article•DOI•

Visual question answering: Datasets, algorithms, and future challenges

Kushal Kafle¹, Christopher Kanan¹•Institutions (1)

Rochester Institute of Technology¹

01 Oct 2017-Computer Vision and Image Understanding

TL;DR: This review critically examines the current state of VQA in terms of problem formulation, existing datasets, evaluation metrics, and algorithms, and exhaustively reviews existing algorithms for VQA.

203 citations

Journal Article•DOI•

Looking beyond appearances: Synthetic training data for deep CNNs in re-identification

Igor Barros Barbosa¹, Marco Cristani², Barbara Caputo³, Aleksander Rognhaugen¹, Theoharis Theoharis¹•Institutions (3)

Norwegian University of Science and Technology¹, University of Verona², Sapienza University of Rome³

11 Jan 2017-Computer Vision and Image Understanding

TL;DR: SOMAnet, a framework based on a deep convolutional neural network that additionally models other discriminative aspects of the human figure, departing from the usual siamese framework, matches subjects even with different apparel.

196 citations

Journal Article•DOI•

Computer vision for assistive technologies

Marco Leo¹, Gerard Medioni², Mohan M. Trivedi³, Takeo Kanade⁴, Giovanni Maria Farinella⁵•Institutions (5)

National Research Council¹, University of Southern California², University of California, San Diego³, Carnegie Mellon University⁴, University of Catania⁵

01 Jan 2017-Computer Vision and Image Understanding

TL;DR: An original "task oriented" way to categorize the state of the art in assistive technology (AT) works is introduced; it splits the final assistive goals into tasks, which are then used as pointers to the works in the literature in which each of them has been used as a component.

Journal Article•DOI•

Computer vision for sports: Current applications and research topics

Graham Thomas, Rikke Gade, Thomas B. Moeslund, Peter W. Carr¹, Adrian Hilton²•Institutions (2)

Disney Research¹, University of Surrey²

01 Jun 2017-Computer Vision and Image Understanding

TL;DR: This paper discusses a selection of current commercial applications that use computer vision for sports analysis, and highlights some of the topics that are currently being addressed in the research community.

Journal Article•DOI•

Haze visibility enhancement: A Survey and quantitative benchmarking

Yu Li¹, Shaodi You², Michael S. Brown³, Robby T. Tan⁴•Institutions (4)

Agency for Science, Technology and Research¹, Australian National University², York University³, National University of Singapore⁴

01 Dec 2017-Computer Vision and Image Understanding

TL;DR: A comprehensive survey of visibility enhancement of images taken in hazy or foggy scenes can be found in this paper, where optical models of atmospheric scattering media and image formation are discussed.

Journal Article•DOI•

Improved gait recognition based on specialized deep convolutional neural network

Munif Alotaibi¹, Ausif Mahmood¹•Institutions (1)

University of Bridgeport¹

01 Nov 2017-Computer Vision and Image Understanding

TL;DR: A specialized deep convolutional neural network architecture for gait recognition that is less sensitive to several cases of the common variations and occlusions that affect and degrade gait recognition performance.

Journal Article•DOI•

A survey on player tracking in soccer videos

M. Manafifard¹, Hamid Ebadi¹, H. Abrishami Moghaddam¹•Institutions (1)

K. N. Toosi University of Technology¹

01 Jun 2017-Computer Vision and Image Understanding

TL;DR: This paper presents the state of the art in preprocessing and processing methods for soccer player tracking, categorizes different approaches, analyzes their strengths and weaknesses, reviews evaluation criteria, and concludes with future research directions.

Journal Article•DOI•

Efficient 3D scene abstraction using line segments

Manuel Hofer¹, Michael Maurer¹, Horst Bischof¹•Institutions (1)

Graz University of Technology¹

01 Apr 2017-Computer Vision and Image Understanding

TL;DR: A robust and efficient line-based Multi-View Stereo algorithm is introduced that uses geometric line-matching, which makes it invariant to illumination changes, and generates accurate 3D models with low computational costs, which is especially useful for large-scale urban datasets.

Journal Article•DOI•

Face alignment in-the-wild: A Survey

Xin Jin¹, Xiaoyang Tan¹•Institutions (1)

Nanjing University of Aeronautics and Astronautics¹

01 Sep 2017-Computer Vision and Image Understanding

TL;DR: This survey presents an up-to-date critical review of the existing literature on face alignment, focusing on those methods addressing the overall difficulties and challenges of this topic under uncontrolled conditions.

Journal Article•DOI•

Efficient single image dehazing and denoising

Xin Liu¹, He Zhang¹, Yiu-ming Cheung², Xinge You³, Yuan Yan Tang⁴•Institutions (4)

Huaqiao University¹, Hong Kong Baptist University², Huazhong University of Science and Technology³, University of Macau⁴

01 Sep 2017-Computer Vision and Image Understanding

TL;DR: This paper presents an efficient multi-scale correlated wavelet approach to solve the image dehazing and denoising problem in the frequency domain, and finds a generic regularity in natural images: the haze is typically distributed in the low frequency spectrum of the multi-scale wavelet decomposition.

Journal Article•DOI•

Weakly supervised learning of actions from transcripts

Hilde Kuehne¹, Alexander Richard¹, Juergen Gall¹•Institutions (1)

University of Bonn¹

01 Oct 2017-Computer Vision and Image Understanding

TL;DR: In this article, an approach is proposed for weakly supervised learning of human actions from video transcriptions. It is based on the idea that, given a sequence of input data and a transcript, i.e., a list of the actions in the order they occur in the video, it is possible to infer the actions within the video stream and to learn the related action models without the need for any frame-based annotation.

Journal Article•DOI•

Structured deep hashing with convolutional neural networks for fast person re-identification

Lin Wu¹, Yang Wang²,³, Zongyuan Ge⁴, Qichang Hu⁵, Xue Li¹•Institutions (5)

University of Queensland¹, Dalian University of Technology², University of New South Wales³, IBM⁴, University of Adelaide⁵

01 Dec 2017-Computer Vision and Image Understanding

TL;DR: A novel deep hashing framework with Convolutional Neural Networks (CNNs) for fast person re-identification that simultaneously learns both CNN features and hash functions to get robust yet discriminative features and similarity-preserving hash codes.

Journal Article•DOI•

Underwater image and video dehazing with pure haze region segmentation

Simon Emberton¹, Lars Chittka¹, Andrea Cavallaro¹•Institutions (1)

Queen Mary University of London¹

24 Aug 2017-Computer Vision and Image Understanding

TL;DR: A novel dehazing method is presented that improves visibility in images and videos by detecting and segmenting image regions that contain only water, together with a semantic white balancing approach for illuminant estimation that uses the dominant colour of the water to address the spectral distortion present in underwater scenes.

Journal Article•DOI•

Compact Descriptors for Sketch-based Image Retrieval using a Triplet loss Convolutional Neural Network

Tu Bui¹, Leonardo Sampaio Ferraz Ribeiro², Moacir Antonelli Ponti², John Collomosse¹•Institutions (2)

University of Surrey¹, University of São Paulo²

22 Jun 2017-Computer Vision and Image Understanding

TL;DR: The ability of the learned image descriptor to generalise beyond the categories of object present in the authors' training data is demonstrated, forming a basis for general cross-category sketch-based image retrieval (SBIR).

Journal Article•DOI•

Traffic surveillance camera calibration by 3D model bounding box alignment for accurate vehicle speed measurement

Jakub Sochor¹, Roman Juránek¹, Adam Herout¹•Institutions (1)

Brno University of Technology¹

01 Aug 2017-Computer Vision and Image Understanding

TL;DR: This paper improves over a recent state-of-the-art camera calibration method for traffic surveillance based on two detected vanishing points, and proposes a novel automatic scene scale inference method based on matching bounding boxes of rendered 3D models of vehicles with detected bounding boxes in the image.

Journal Article•DOI•

Image Understanding using vision and reasoning through Scene Description Graph

Somak Aditya¹, Yezhou Yang¹, Chitta Baral¹, Yiannis Aloimonos², Cornelia Fermüller²•Institutions (2)

Arizona State University¹, University of Maryland, College Park²

01 Dec 2017-Computer Vision and Image Understanding

TL;DR: A general architecture is proposed in which a system can represent both the content and underlying concepts of an image using a Scene Description Graph (SDG), and it is proposed that the extracted graphs capture the syntactic and semantic content of images with reasonable accuracy.

Journal Article•DOI•

Simple to complex cross-modal learning to rank

Minnan Luo¹, Xiaojun Chang², Zhihui Li³, Liqiang Nie⁴, Alexander G. Hauptmann², Qinghua Zheng¹•Institutions (4)

Xi'an Jiaotong University¹, Carnegie Mellon University², University of Technology, Sydney³, National University of Singapore⁴

01 Oct 2017-Computer Vision and Image Understanding

TL;DR: In this paper, a self-paced learning theory with diversity was proposed to learn an optimal multi-modal embedding space based on non-linear mapping functions, which enhances the model robustness to outliers and achieves better generalization via training the model gradually from easy rankings by diverse queries to more complex ones.

Journal Article•DOI•

3D-2D face recognition with pose and illumination normalization

Ioannis A. Kakadiaris¹, George Toderici¹, Georgios Evangelopoulos¹, G. Passalis², Dat Chu¹, Xi Zhao¹, Shishir K. Shah¹, Theoharis Theoharis²•Institutions (2)

University of Houston¹, National and Kapodistrian University of Athens²

01 Jan 2017-Computer Vision and Image Understanding

TL;DR: Results for 3D-2D face recognition on the UHDB11 3D/2D database with 2D images under large illumination and pose variations support the hypothesis that, in challenging datasets, 3D+2D outperforms 2D-2D and decreases the performance gap against 3D.

Journal Article•DOI•

Large-scale outdoor 3D reconstruction on a mobile device

Thomas Schöps, Torsten Sattler, Christian Häne¹, Marc Pollefeys²•Institutions (2)

University of California, Berkeley¹, Microsoft²

01 Apr 2017-Computer Vision and Image Understanding

TL;DR: This paper presents an approach for reconstructing large-scale outdoor scenes through monocular motion stereo at interactive frame rates on a modern mobile device, and is the first method to enable live reconstruction of large outdoor scenes on a mobile device.

Journal Article•DOI•

SR-clustering: Semantic regularized clustering for egocentric photo streams segmentation

Mariella Dimiccoli¹, Marc Bolaños¹, Estefania Talavera¹,², Maedeh Aghaei¹, Stavri G. Nikolov, Petia Radeva¹•Institutions (2)

University of Barcelona¹, University of Groningen²

01 Feb 2017-Computer Vision and Image Understanding

TL;DR: This paper addresses the problem of organizing egocentric photo streams acquired by a wearable camera into semantically meaningful segments, hence making an important step towards the goal of automatically annotating these photos for browsing and retrieval.

Journal Article•DOI•

Accurate vessel segmentation using maximum entropy incorporating line detection and phase-preserving denoising

Dinesh Pandey¹, Xiaoxia Yin¹, Hua Wang¹, Yanchun Zhang¹•Institutions (1)

Victoria University, Australia¹

01 Feb 2017-Computer Vision and Image Understanding

TL;DR: A novel algorithm is proposed that separates out background images to minimize the influence of noise, non-uniform illumination and lesions, and two different strategies are developed to segment thin and thick blood vessels.

Journal Article•DOI•

Social profiling through image understanding

Cristina Segalin¹, Dong Seon Cheng², Marco Cristani³•Institutions (3)

California Institute of Technology¹, Hankuk University of Foreign Studies², University of Verona³

01 Mar 2017-Computer Vision and Image Understanding

TL;DR: The experimental results show that the proposed method outperforms state-of-the-art results and captures what visually characterizes a certain trait: using a deconvolution strategy, a clear distinction of features, patterns and content between low and high values in a given trait is found.
