Author

Jack Valmadre

Bio: Jack Valmadre is an academic researcher from the University of Oxford. The author has contributed to research in topics including video tracking and structure from motion. The author has an h-index of 21 and has co-authored 32 publications receiving 8,323 citations. Previous affiliations of Jack Valmadre include the Commonwealth Scientific and Industrial Research Organisation and Queensland University of Technology.

Papers
Book ChapterDOI
08 Oct 2016
TL;DR: A basic tracking algorithm is equipped with a novel fully-convolutional Siamese network trained end-to-end on the ILSVRC15 dataset for object detection in video and achieves state-of-the-art performance in multiple benchmarks.
Abstract: The problem of arbitrary object tracking has traditionally been tackled by learning a model of the object’s appearance exclusively online, using as sole training data the video itself. Despite the success of these methods, their online-only approach inherently limits the richness of the model they can learn. Recently, several attempts have been made to exploit the expressive power of deep convolutional networks. However, when the object to track is not known beforehand, it is necessary to perform Stochastic Gradient Descent online to adapt the weights of the network, severely compromising the speed of the system. In this paper we equip a basic tracking algorithm with a novel fully-convolutional Siamese network trained end-to-end on the ILSVRC15 dataset for object detection in video. Our tracker operates at frame-rates beyond real-time and, despite its extreme simplicity, achieves state-of-the-art performance in multiple benchmarks.
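
To make the idea above concrete, here is a minimal PyTorch sketch of fully-convolutional Siamese scoring: a shared embedding network maps both the exemplar crop and a larger search crop to feature maps, and the exemplar embedding is slid over the search embedding as a correlation kernel to produce a score map. The toy backbone, crop sizes and the helper name cross_corr are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Shared embedding network applied to both inputs (toy stand-in for the
# paper's convolutional backbone).
embed = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
    nn.Conv2d(64, 128, kernel_size=3),
)

def cross_corr(z, x):
    """Batched cross-correlation: slide each exemplar embedding z[i] over its
    own search embedding x[i], summing over channels, via grouped conv."""
    b = z.size(0)
    out = F.conv2d(x.reshape(1, -1, x.size(2), x.size(3)), z, groups=b)
    return out.reshape(b, 1, out.size(2), out.size(3))

exemplar = torch.randn(2, 3, 127, 127)   # target crop from the first frame
search = torch.randn(2, 3, 255, 255)     # larger crop from the current frame
score_map = cross_corr(embed(exemplar), embed(search))
# The peak of score_map gives the target's displacement in the search region.
```

At tracking time the exemplar embedding is computed once from the first frame and reused for every subsequent frame, which is part of what allows frame rates beyond real time.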

2,936 citations

Posted Content
TL;DR: In this paper, a fully-convolutional Siamese network is trained end-to-end on the ILSVRC15 dataset for object detection in video, which achieves state-of-the-art performance.
Abstract: The problem of arbitrary object tracking has traditionally been tackled by learning a model of the object's appearance exclusively online, using as sole training data the video itself. Despite the success of these methods, their online-only approach inherently limits the richness of the model they can learn. Recently, several attempts have been made to exploit the expressive power of deep convolutional networks. However, when the object to track is not known beforehand, it is necessary to perform Stochastic Gradient Descent online to adapt the weights of the network, severely compromising the speed of the system. In this paper we equip a basic tracking algorithm with a novel fully-convolutional Siamese network trained end-to-end on the ILSVRC15 dataset for object detection in video. Our tracker operates at frame-rates beyond real-time and, despite its extreme simplicity, achieves state-of-the-art performance in multiple benchmarks.

1,613 citations

Proceedings ArticleDOI
21 Jul 2017
TL;DR: In this paper, the Correlation Filter learner is interpreted as a differentiable layer in a deep neural network, which enables learning deep features that are tightly coupled to the correlation filter.
Abstract: The Correlation Filter is an algorithm that trains a linear template to discriminate between images and their translations. It is well suited to object tracking because its formulation in the Fourier domain provides a fast solution, enabling the detector to be re-trained once per frame. Previous works that use the Correlation Filter, however, have adopted features that were either manually designed or trained for a different task. This work is the first to overcome this limitation by interpreting the Correlation Filter learner, which has a closed-form solution, as a differentiable layer in a deep neural network. This enables learning deep features that are tightly coupled to the Correlation Filter. Experiments illustrate that our method has the important practical benefit of allowing lightweight architectures to achieve state-of-the-art performance at high framerates.
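
To make the closed-form step concrete, the sketch below solves a single-channel correlation filter in the Fourier domain. Because every operation is differentiable, autograd can backpropagate through the solver into the features, which is the interpretation-as-a-layer idea; the single-channel form, the function names and the fixed regulariser lam are simplifying assumptions (the paper uses multi-channel features and windowing).

```python
import torch

def train_cf(x, y, lam=1e-2):
    """Closed-form ridge regression over all circular shifts of x, solved
    independently per Fourier frequency (MOSSE-style, single channel).
    x: (H, W) template features; y: (H, W) desired (e.g. Gaussian) response.
    Returns the conjugate filter in the Fourier domain."""
    X = torch.fft.rfft2(x)
    Y = torch.fft.rfft2(y)
    return (Y * X.conj()) / (X * X.conj() + lam)

def apply_cf(H_conj, z):
    """Correlate a new patch z with the learned template: an element-wise
    product in the Fourier domain."""
    return torch.fft.irfft2(torch.fft.rfft2(z) * H_conj, s=z.shape)

x = torch.randn(64, 64, requires_grad=True)  # features (in CFNet, CNN outputs)
y = torch.zeros(64, 64); y[0, 0] = 1.0       # desired response peak
response = apply_cf(train_cf(x, y), x)       # ~y when lam is small
response.sum().backward()                    # gradients flow through the solver
```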

1,329 citations

Proceedings ArticleDOI
27 Jun 2016
TL;DR: It is shown that a simple tracker combining complementary cues in a ridge regression framework can operate faster than 80 FPS and outperform not only all entries in the popular VOT14 competition, but also recent and far more sophisticated trackers according to multiple benchmarks.
Abstract: Correlation Filter-based trackers have recently achieved excellent performance, showing great robustness to challenging situations exhibiting motion blur and illumination changes. However, since the model that they learn depends strongly on the spatial layout of the tracked object, they are notoriously sensitive to deformation. Models based on colour statistics have complementary traits: they cope well with variation in shape, but suffer when illumination is not consistent throughout a sequence. Moreover, colour distributions alone can be insufficiently discriminative. In this paper, we show that a simple tracker combining complementary cues in a ridge regression framework can operate faster than 80 FPS and outperform not only all entries in the popular VOT14 competition, but also recent and far more sophisticated trackers according to multiple benchmarks.
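
The complementary-cue fusion described above can be sketched in a few lines: a template (correlation filter) response that encodes spatial layout is blended with a per-pixel colour-histogram score that tolerates deformation. The fixed weight alpha, the histogram inputs fg_hist/bg_hist and the assumption that both response maps share one grid are illustrative choices, not the paper's tuned formulation.

```python
import numpy as np

def colour_score(image, fg_hist, bg_hist, bins=32):
    """Per-pixel foreground probability from quantised RGB histograms of the
    target (fg) and its surroundings (bg). image: (H, W, 3) uint8."""
    q = (image.astype(int) * bins) // 256            # quantise each channel
    fg = fg_hist[q[..., 0], q[..., 1], q[..., 2]]
    bg = bg_hist[q[..., 0], q[..., 1], q[..., 2]]
    return fg / (fg + bg + 1e-8)

def fuse(cf_response, colour_response, alpha=0.3):
    """Convex combination of the two cues; the new target centre is the
    argmax of the fused map."""
    fused = (1 - alpha) * cf_response + alpha * colour_response
    return np.unravel_index(np.argmax(fused), fused.shape)
```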

1,285 citations

Book ChapterDOI
Matej Kristan, Ales Leonardis, Jiří Matas, Michael Felsberg, Roman Pflugfelder, Luka Cehovin, Tomas Vojir, Gustav Häger, Alan Lukežič, Gustavo Fernandez, Abhinav Gupta, Alfredo Petrosino, Alireza Memarmoghadam, Alvaro Garcia-Martin, Andres Solis Montero, Andrea Vedaldi, Andreas Robinson, Andy J. Ma, Anton Varfolomieiev, A. Aydin Alatan, Aykut Erdem, Bernard Ghanem, Bin Liu, Bohyung Han, Brais Martinez, Chang-Ming Chang, Changsheng Xu, Chong Sun, Daijin Kim, Dapeng Chen, Dawei Du, Deepak Mishra, Dit-Yan Yeung, Erhan Gundogdu, Erkut Erdem, Fahad Shahbaz Khan, Fatih Porikli, Fei Zhao, Filiz Bunyak, Francesco Battistone, Gao Zhu, Giorgio Roffo, Gorthi R. K. Sai Subrahmanyam, Guilherme Sousa Bastos, Guna Seetharaman, Henry Medeiros, Hongdong Li, Honggang Qi, Horst Bischof, Horst Possegger, Huchuan Lu, Hyemin Lee, Hyeonseob Nam, Hyung Jin Chang, Isabela Drummond, Jack Valmadre, Jae-chan Jeong, Jaeil Cho, Jae-Yeong Lee, Jianke Zhu, Jiayi Feng, Jin Gao, Jin-Young Choi, Jingjing Xiao, Ji-Wan Kim, Jiyeoup Jeong, João F. Henriques, Jochen Lang, Jongwon Choi, José M. Martínez, Junliang Xing, Junyu Gao, Kannappan Palaniappan, Karel Lebeda, Ke Gao, Krystian Mikolajczyk, Lei Qin, Lijun Wang, Longyin Wen, Luca Bertinetto, Madan Kumar Rapuru, Mahdieh Poostchi, Mario Edoardo Maresca, Martin Danelljan, Matthias Mueller, Mengdan Zhang, Michael Arens, Michel Valstar, Ming Tang, Mooyeol Baek, Muhammad Haris Khan, Naiyan Wang, Nana Fan, Noor M. Al-Shakarji, Ondrej Miksik, Osman Akin, Payman Moallem, Pedro Senna, Philip H. S. Torr, Pong C. Yuen, Qingming Huang, Rafael Martin-Nieto, Rengarajan Pelapur, Richard Bowden, Robert Laganiere, Rustam Stolkin, Ryan Walsh, Sebastian B. Krah, Shengkun Li, Shengping Zhang, Shizeng Yao, Simon Hadfield, Simone Melzi, Siwei Lyu, Siyi Li, Stefan Becker, Stuart Golodetz, Sumithra Kakanuru, Sunglok Choi, Tao Hu, Thomas Mauthner, Tianzhu Zhang, Tony P. Pridmore, Vincenzo Santopietro, Weiming Hu, Wenbo Li, Wolfgang Hübner, Xiangyuan Lan, Xiaomeng Wang, Xin Li, Yang Li, Yiannis Demiris, Yifan Wang, Yuankai Qi, Zejian Yuan, Zexiong Cai, Zhan Xu, Zhenyu He, Zhizhen Chi
08 Oct 2016
TL;DR: The Visual Object Tracking challenge VOT2016 goes beyond its predecessors by introducing a new semi-automatic ground truth bounding box annotation methodology and extending the evaluation system with the no-reset experiment.
Abstract: The Visual Object Tracking challenge VOT2016 aims at comparing short-term single-object visual trackers that do not apply pre-learned models of object appearance. Results of 70 trackers are presented, many of which have been published at major computer vision conferences and in journals in recent years. The number of tested state-of-the-art trackers makes VOT2016 the largest and most challenging benchmark on short-term tracking to date. For each participating tracker, a short description is provided in the Appendix. VOT2016 goes beyond its predecessors by (i) introducing a new semi-automatic ground-truth bounding-box annotation methodology and (ii) extending the evaluation system with the no-reset experiment. The dataset, the evaluation kit and the results are publicly available at the challenge website (http://votchallenge.net).
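
For reference, the per-frame measure underlying the challenge's accuracy score is the overlap (intersection-over-union) between the predicted and ground-truth boxes. A minimal axis-aligned version is sketched below; VOT annotations are in general rotated rectangles, so this is a simplification.

```python
def overlap(a, b):
    """Intersection-over-union of two axis-aligned boxes given as (x, y, w, h)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0
```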

744 citations


Cited by
Proceedings Article
24 Apr 2017
TL;DR: In this paper, an LSTM-based meta-learner model is proposed to learn the exact optimization algorithm used to train another learner neural network in the few-shot regime.
Abstract: Though deep neural networks have shown great success in the large data domain, they generally perform poorly on few-shot learning tasks, where a model has to quickly generalize after seeing very few examples from each class. The general belief is that gradient-based optimization in high capacity models requires many iterative steps over many examples to perform well. Here, we propose an LSTM-based meta-learner model to learn the exact optimization algorithm used to train another learner neural network in the few-shot regime. The parametrization of our model allows it to learn appropriate parameter updates specifically for the scenario where a set amount of updates will be made, while also learning a general initialization of the learner network that allows for quick convergence of training. We demonstrate that this meta-learning model is competitive with deep metric-learning techniques for few-shot learning.
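
The learned update rule can be made concrete in one line. In the paper's formulation the LSTM cell state plays the role of the learner's parameters, so its forget and input gates generalise gradient descent. The sketch below shows that parameter update with the gates supplied as plain tensors; in the paper they are emitted by a meta-learner LSTM fed with the current loss and gradient.

```python
import torch

def meta_update(theta, grad, f_gate, i_gate):
    # theta_t = f_t * theta_{t-1} + i_t * (-grad_t); plain SGD is recovered
    # with f_t = 1 and i_t = learning rate.
    return f_gate * theta - i_gate * grad

theta = torch.randn(10)
grad = torch.randn(10)
# SGD as a special case of the learned rule:
theta = meta_update(theta, grad,
                    f_gate=torch.ones(10), i_gate=torch.full((10,), 0.01))
```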

2,981 citations

Book ChapterDOI
08 Oct 2016
TL;DR: A basic tracking algorithm is equipped with a novel fully-convolutional Siamese network trained end-to-end on the ILSVRC15 dataset for object detection in video and achieves state-of-the-art performance in multiple benchmarks.
Abstract: The problem of arbitrary object tracking has traditionally been tackled by learning a model of the object’s appearance exclusively online, using as sole training data the video itself. Despite the success of these methods, their online-only approach inherently limits the richness of the model they can learn. Recently, several attempts have been made to exploit the expressive power of deep convolutional networks. However, when the object to track is not known beforehand, it is necessary to perform Stochastic Gradient Descent online to adapt the weights of the network, severely compromising the speed of the system. In this paper we equip a basic tracking algorithm with a novel fully-convolutional Siamese network trained end-to-end on the ILSVRC15 dataset for object detection in video. Our tracker operates at frame-rates beyond real-time and, despite its extreme simplicity, achieves state-of-the-art performance in multiple benchmarks.

2,936 citations

Proceedings ArticleDOI
18 Jun 2018
TL;DR: A conceptually simple, flexible, and general framework for few-shot learning, where a classifier must learn to recognise new classes given only a few examples from each; it is easily extended to zero-shot learning.
Abstract: We present a conceptually simple, flexible, and general framework for few-shot learning, where a classifier must learn to recognise new classes given only a few examples from each. Our method, called the Relation Network (RN), is trained end-to-end from scratch. During meta-learning, it learns to learn a deep distance metric to compare a small number of images within episodes, each of which is designed to simulate the few-shot setting. Once trained, an RN is able to classify images of new classes by computing relation scores between query images and the few examples of each new class without further updating the network. Besides providing improved performance on few-shot learning, our framework is easily extended to zero-shot learning. Extensive experiments on five benchmarks demonstrate that our simple approach provides a unified and effective solution for both tasks.
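
A minimal sketch of the relation scoring described above: embeddings of a query image and a class example are concatenated and passed through a small learned comparator that outputs a relation score in [0, 1]. The embedding dimension and layer sizes are placeholder assumptions; the paper's relation module operates on concatenated convolutional feature maps rather than flat vectors.

```python
import torch
import torch.nn as nn

class RelationHead(nn.Module):
    """Learned comparator g: concatenated (query, support) embeddings -> score."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.g = nn.Sequential(
            nn.Linear(2 * feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),
        )

    def forward(self, query_feat, support_feat):
        return self.g(torch.cat([query_feat, support_feat], dim=-1))

head = RelationHead()
query = torch.randn(1, 64).expand(5, -1)  # one query repeated for 5 classes
support = torch.randn(5, 64)              # one embedded example per class
scores = head(query, support)             # (5, 1); predict the argmax class
```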

2,496 citations

Posted Content
TL;DR: Relation Network (RN) as mentioned in this paper learns to learn a deep distance metric to compare a small number of images within episodes, each of which is designed to simulate the few-shot setting.
Abstract: We present a conceptually simple, flexible, and general framework for few-shot learning, where a classifier must learn to recognise new classes given only a few examples from each. Our method, called the Relation Network (RN), is trained end-to-end from scratch. During meta-learning, it learns to learn a deep distance metric to compare a small number of images within episodes, each of which is designed to simulate the few-shot setting. Once trained, an RN is able to classify images of new classes by computing relation scores between query images and the few examples of each new class without further updating the network. Besides providing improved performance on few-shot learning, our framework is easily extended to zero-shot learning. Extensive experiments on five benchmarks demonstrate that our simple approach provides a unified and effective solution for both tasks.

2,077 citations

Proceedings ArticleDOI
18 Jun 2018
TL;DR: The Siamese region proposal network (Siamese-RPN) is proposed, which is end-to-end trained offline with large-scale image pairs for visual object tracking and consists of a Siamese subnetwork for feature extraction and a region proposal subnetwork comprising a classification branch and a regression branch.
Abstract: Visual object tracking has been a fundamental topic in recent years and many deep learning based trackers have achieved state-of-the-art performance on multiple benchmarks. However, most of these trackers can hardly achieve top performance at real-time speed. In this paper, we propose the Siamese region proposal network (Siamese-RPN), which is end-to-end trained offline with large-scale image pairs. Specifically, it consists of a Siamese subnetwork for feature extraction and a region proposal subnetwork comprising a classification branch and a regression branch. In the inference phase, the proposed framework is formulated as a local one-shot detection task. We can pre-compute the template branch of the Siamese subnetwork and formulate the correlation layers as trivial convolution layers to perform online tracking. Benefiting from the proposal refinement, traditional multi-scale testing and online fine-tuning can be discarded. The Siamese-RPN runs at 160 FPS while achieving leading performance in the VOT2015, VOT2016 and VOT2017 real-time challenges.
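
A sketch of the head described above: template features are mapped to correlation kernels for a classification branch (2 scores per anchor) and a regression branch (4 offsets per anchor), and correlation with the search features is expressed as ordinary convolution. The channel width, the number of anchors k and all layer names are assumptions; as the abstract notes, the template-derived kernels can be pre-computed on the first frame so that online tracking reduces to a one-shot detection pass.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiamRPNHead(nn.Module):
    def __init__(self, c=256, k=5):
        super().__init__()
        self.k = k
        # Template features -> correlation kernels for each branch.
        self.cls_kernel = nn.Conv2d(c, 2 * k * c, kernel_size=3)
        self.reg_kernel = nn.Conv2d(c, 4 * k * c, kernel_size=3)
        # Search-feature adjustment layers.
        self.cls_search = nn.Conv2d(c, c, kernel_size=3)
        self.reg_search = nn.Conv2d(c, c, kernel_size=3)

    def forward(self, z_feat, x_feat):
        # z_feat: (1, C, hz, wz) template; x_feat: (1, C, hx, wx) search.
        cls_k = self.cls_kernel(z_feat)   # precomputable from the first frame
        reg_k = self.reg_kernel(z_feat)
        h, w = cls_k.shape[-2:]
        # Correlation as plain convolution with the template-derived kernels.
        cls = F.conv2d(self.cls_search(x_feat),
                       cls_k.reshape(2 * self.k, -1, h, w))
        reg = F.conv2d(self.reg_search(x_feat),
                       reg_k.reshape(4 * self.k, -1, h, w))
        return cls, reg  # (1, 2k, H, W) anchor scores, (1, 4k, H, W) offsets
```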

2,016 citations