
Showing papers by "Jiri Matas published in 2019"


Proceedings ArticleDOI
Matej Kristan1, Amanda Berg2, Linyu Zheng3, Litu Rout4  +176 moreInstitutions (43)
01 Oct 2019
TL;DR: The Visual Object Tracking challenge VOT2019 is the seventh annual tracker benchmarking activity organized by the VOT initiative; results of 81 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years.

Abstract: The Visual Object Tracking challenge VOT2019 is the seventh annual tracker benchmarking activity organized by the VOT initiative. Results of 81 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The evaluation included the standard VOT and other popular methodologies for short-term tracking analysis as well as the standard VOT methodology for long-term tracking analysis. The VOT2019 challenge was composed of five challenges focusing on different tracking domains: (i) the VOT-ST2019 challenge focused on short-term tracking in RGB, (ii) the VOT-RT2019 challenge focused on "real-time" short-term tracking in RGB, and (iii) VOT-LT2019 focused on long-term tracking, namely coping with target disappearance and reappearance. Two new challenges were introduced: (iv) the VOT-RGBT2019 challenge focused on short-term tracking in RGB and thermal imagery and (v) the VOT-RGBD2019 challenge focused on long-term tracking in RGB and depth imagery. The VOT-ST2019, VOT-RT2019 and VOT-LT2019 datasets were refreshed, while new datasets were introduced for VOT-RGBT2019 and VOT-RGBD2019. The VOT toolkit has been updated to support standard short-term tracking, long-term tracking, and tracking with multi-channel imagery. Performance of the tested trackers typically far exceeds standard baselines. The source code for most of the trackers is publicly available from the VOT page. The dataset, the evaluation kit and the results are publicly available at the challenge website.

393 citations


Proceedings ArticleDOI
15 Jun 2019
TL;DR: In this article, a method called sigma-consensus is proposed to eliminate the need for a user-defined inlier-outlier threshold in RANSAC; instead of estimating the noise sigma, it is marginalized over a range of noise scales.

Abstract: A method called sigma-consensus is proposed to eliminate the need for a user-defined inlier-outlier threshold in RANSAC. Instead of estimating the noise sigma, it is marginalized over a range of noise scales. The optimized model is obtained by weighted least-squares fitting, where the weights come from the marginalization over sigma of the point likelihoods of being inliers. A new quality function is proposed that does not require sigma and, thus, a set of inliers to determine the model quality. Also, a new termination criterion for RANSAC is built on the proposed marginalization approach. Applying sigma-consensus, MAGSAC is proposed, which needs no user-defined sigma and improves the accuracy of robust estimation significantly. It is superior to the state-of-the-art in terms of geometric accuracy on publicly available real-world datasets for epipolar geometry (F and E) and homography estimation. In addition, applying sigma-consensus only once as a post-processing step to the RANSAC output always improved the model quality on a wide range of vision problems without noticeable deterioration in processing time, adding a few milliseconds.
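The marginalization idea can be illustrated in a few lines: instead of a hard inlier threshold, each point receives a weight obtained by averaging its inlier likelihood over a grid of noise scales, and the model is refined by weighted least squares. The sketch below is a toy line-fitting example; the uniform sigma grid, the Gaussian inlier model, and all names are illustrative assumptions, not the paper's exact analytic derivation.

```python
import numpy as np

def sigma_marginalized_weights(residuals, sigma_max=2.0, n_sigmas=10):
    """Per-point weight: inlier likelihood marginalized over noise scales.

    Simplified sketch of the sigma-consensus idea (uniform prior over a
    sigma grid, Gaussian inlier model); the paper performs the
    marginalization analytically.
    """
    sigmas = np.linspace(sigma_max / n_sigmas, sigma_max, n_sigmas)
    r2 = residuals[:, None] ** 2
    # Gaussian likelihood of each residual under each noise scale.
    lik = np.exp(-r2 / (2.0 * sigmas**2)) / sigmas
    return lik.mean(axis=1)  # average over the sigma grid

def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = a*x + b."""
    A = np.stack([x, np.ones_like(x)], axis=1)
    W = np.diag(w)
    a, b = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
    return a, b

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(0, 0.1, 100)
y[:20] += rng.uniform(5, 10, 20)  # gross outliers
# Residuals w.r.t. a model hypothesis (here the true line stands in
# for a good RANSAC-proposed model).
w = sigma_marginalized_weights(y - (2.0 * x + 1.0))
a, b = weighted_line_fit(x, y, w)
```

No threshold is ever chosen: outliers simply receive vanishing marginalized likelihood, so the refit recovers the line despite 20% contamination.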

187 citations


Proceedings ArticleDOI
01 Sep 2019
TL;DR: The RRC-MLT-2019 challenge, as discussed by the authors, builds on RRC-MLT-2017 and aims to systematically benchmark and push forward the state-of-the-art in multi-lingual scene text detection and recognition.

Abstract: With the growing cosmopolitan culture of modern cities, the need for robust Multi-Lingual scene Text (MLT) detection and recognition systems has never been more pressing. With the goal of systematically benchmarking and pushing the state-of-the-art forward, the proposed competition builds on top of RRC-MLT-2017 with an additional end-to-end task, an additional language in the real images dataset, a large-scale multi-lingual synthetic dataset to assist the training, and a baseline end-to-end recognition method. The real dataset consists of 20,000 images containing text from 10 languages. The challenge has 4 tasks covering various aspects of multi-lingual scene text: (a) text detection, (b) cropped word script classification, (c) joint text detection and script classification and (d) end-to-end detection and recognition. In total, the competition received 60 submissions from the research and industrial communities. This paper presents the dataset, the tasks and the findings of the RRC-MLT-2019 challenge.

175 citations


Posted Content
TL;DR: MAGSAC++ and the Progressive NAPSAC sampler are proposed, and it is shown that the progressive spatial sampling in P-NAPSAC can be integrated with PROSAC sampling, which is applied to the first, location-defining, point.
Abstract: A new method for robust estimation, MAGSAC++, is proposed. It introduces a new model quality (scoring) function that does not require the inlier-outlier decision, and a novel marginalization procedure formulated as an iteratively re-weighted least-squares approach. We also propose a new sampler, Progressive NAPSAC, for RANSAC-like robust estimators. Exploiting the fact that nearby points often originate from the same model in real-world data, it finds local structures earlier than global samplers. The progressive transition from local to global sampling does not suffer from the weaknesses of purely localized samplers. On six publicly available real-world datasets for homography and fundamental matrix fitting, MAGSAC++ produces results superior to state-of-the-art robust methods. It is faster, more geometrically accurate and fails less often.

68 citations


Proceedings ArticleDOI
09 Jan 2019
TL;DR: The grayness index, GI in short, is derived using the Dichromatic Reflection Model and is learning-free; it outperforms state-of-the-art statistical methods and many recent deep methods.

Abstract: We propose a novel grayness index for finding gray pixels and demonstrate its effectiveness and efficiency in illumination estimation. The grayness index, GI in short, is derived using the Dichromatic Reflection Model and is learning-free. GI makes it possible to estimate one or multiple illumination sources in color-biased images. On standard single-illumination and multiple-illumination estimation benchmarks, GI outperforms state-of-the-art statistical methods and many recent deep methods. GI is simple and fast, written in a few dozen lines of code, processing a 1080p image in ~0.4 seconds with non-optimized Matlab code.
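The underlying intuition admits a compact sketch: under the dichromatic reflection model, pixels that are gray under the scene illuminant have spatially constant log-channel differences, so small local gradients of log(R)−log(G) and log(B)−log(G) signal grayness, and averaging the most-gray pixels estimates the illuminant. The code below is a simplified stand-in for the paper's exact GI derivation; the score definition, the top-percentage selection, and all names are illustrative assumptions.

```python
import numpy as np

def grayness_score(img, eps=1e-6):
    """Per-pixel grayness: local contrast of log-channel differences.

    Simplified stand-in for the paper's grayness index (GI); lower
    scores indicate more likely gray pixels.
    """
    logI = np.log(img + eps)
    d_rg = logI[..., 0] - logI[..., 1]
    d_bg = logI[..., 2] - logI[..., 1]
    def grad_mag(c):
        gy, gx = np.gradient(c)
        return np.hypot(gy, gx)
    return grad_mag(d_rg) + grad_mag(d_bg)

def estimate_illuminant(img, top_percent=1.0):
    """Unit-norm average color of the most-gray pixels."""
    score = grayness_score(img).ravel()
    n = max(1, int(score.size * top_percent / 100))
    idx = np.argsort(score)[:n]
    ill = img.reshape(-1, 3)[idx].mean(axis=0)
    return ill / np.linalg.norm(ill)

# Synthetic check: gray surfaces under a reddish illuminant, with a
# patch of random colored clutter in one corner.
rng = np.random.default_rng(1)
reflectance = rng.uniform(0.2, 1.0, (64, 64, 1))
ill_true = np.array([0.8, 0.5, 0.33])
img = reflectance * ill_true
img[:16, :16] = rng.uniform(0.2, 1.0, (16, 16, 3))
est = estimate_illuminant(img)
```

On this toy scene the recovered illuminant direction matches the true one, since the gray (textured but achromatic) pixels dominate the low end of the score.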

65 citations


Proceedings ArticleDOI
15 Jun 2019
TL;DR: The proposed long-term RGB-D tracker called OTR – Object Tracking by Reconstruction performs online 3D target reconstruction to facilitate robust learning of a set of view-specific discriminative correlation filters (DCFs).
Abstract: Standard RGB-D trackers treat the target as a 2D structure, which makes modelling appearance changes related even to out-of-plane rotation challenging. This limitation is addressed by the proposed long-term RGB-D tracker called OTR – Object Tracking by Reconstruction. OTR performs online 3D target reconstruction to facilitate robust learning of a set of view-specific discriminative correlation filters (DCFs). The 3D reconstruction supports two performance-enhancing features: (i) generation of an accurate spatial support for constrained DCF learning from its 2D projection and (ii) point-cloud based estimation of 3D pose change for selection and storage of view-specific DCFs which robustly localize the target after out-of-view rotation or heavy occlusion. Extensive evaluation on the Princeton RGB-D tracking and STC Benchmarks shows OTR outperforms the state-of-the-art by a large margin.

63 citations


Proceedings ArticleDOI
01 Oct 2019
TL;DR: The CDTB dataset is the largest and most diverse dataset in RGB-D tracking, with an order of magnitude more frames than related datasets; experiments reveal a large gap between RGB and RGB-D trackers that had not been detected by prior benchmarks.

Abstract: We propose a new color-and-depth general visual object tracking benchmark (CDTB). CDTB is recorded by several passive and active RGB-D setups and contains indoor as well as outdoor sequences acquired in direct sunlight. The CDTB dataset is the largest and most diverse dataset in RGB-D tracking, with an order of magnitude more frames than related datasets. The sequences have been carefully recorded to contain significant object pose change, clutter, occlusion, and periods of long-term target absence to enable tracker evaluation under realistic conditions. Sequences are per-frame annotated with 13 visual attributes for detailed analysis. Experiments with RGB and RGB-D trackers show that CDTB is more challenging than previous datasets. State-of-the-art RGB trackers outperform the recent RGB-D trackers, indicating a large gap between the two fields which had not been detected by prior benchmarks. Based on the results of the analysis we point out opportunities for future research in RGB-D tracker design.

52 citations


Proceedings ArticleDOI
04 Mar 2019
TL;DR: A deblurring method that incorporates gyroscope measurements into a convolutional neural network (CNN) can handle extremely strong and spatially-variant motion blur and is shown to improve the performance of existing feature detectors and descriptors against the motion blur.
Abstract: We propose a deblurring method that incorporates gyroscope measurements into a convolutional neural network (CNN). With the help of such measurements, it can handle extremely strong and spatially-variant motion blur. At the same time, the image data is used to overcome the limitations of gyro-based blur estimation. To train our network, we also introduce a novel way of generating realistic training data using the gyroscope. The evaluation shows a clear improvement in visual quality over the state-of-the-art while achieving real-time performance. Furthermore, the method is shown to improve the performance of existing feature detectors and descriptors against the motion blur.
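How gyroscope readings translate into a spatially-variant blur estimate can be sketched under strong simplifying assumptions: pure camera rotation, a pinhole camera, and Euler integration of the angular-velocity samples recorded during the exposure. This is only an illustration of the geometry; the paper's actual blur model and the way the gyro signal is fed to the CNN differ.

```python
import numpy as np

def blur_trajectory(omega_samples, dt, focal_px, cx, cy, x, y):
    """Trace the blur streak of pixel (x, y) from gyroscope readings.

    Sketch under simplifying assumptions (pure rotation, pinhole camera
    with focal length focal_px and principal point (cx, cy), Euler
    integration). omega_samples: (N, 3) angular velocities [rad/s]
    recorded during the exposure. Returns (N + 1, 2) pixel positions.
    """
    # Back-project the pixel to a ray, rotate it, re-project.
    ray = np.array([(x - cx) / focal_px, (y - cy) / focal_px, 1.0])
    R = np.eye(3)
    pts = [(x, y)]
    for w in omega_samples:
        theta = np.linalg.norm(w) * dt
        if theta > 0:
            k = w / np.linalg.norm(w)
            K = np.array([[0.0, -k[2], k[1]],
                          [k[2], 0.0, -k[0]],
                          [-k[1], k[0], 0.0]])
            # Rodrigues' formula for the incremental rotation.
            R = (np.eye(3) + np.sin(theta) * K
                 + (1.0 - np.cos(theta)) * K @ K) @ R
        p = R @ ray
        pts.append((focal_px * p[0] / p[2] + cx,
                    focal_px * p[1] / p[2] + cy))
    return np.array(pts)

# Rotation about the y-axis moves the principal point horizontally by
# roughly focal_px * angle: 10 samples of 0.5 rad/s, 1 ms apart.
traj = blur_trajectory(np.tile([0.0, 0.5, 0.0], (10, 1)), 1e-3,
                       1000.0, 640.0, 360.0, 640.0, 360.0)
```

A trajectory like this varies across the image plane, which is why gyro-informed deblurring can handle spatially-variant blur that uniform kernels cannot.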

32 citations


Proceedings ArticleDOI
01 Oct 2019
TL;DR: Prog-X as discussed by the authors is a multi-model fitting algorithm that interleaves sampling and consolidation of the current data interpretation via repetitive hypothesis proposal, fast rejection, and integration of the new hypothesis into the kept instance set by labeling energy minimization.
Abstract: The Progressive-X algorithm, Prog-X in short, is proposed for geometric multi-model fitting. The method interleaves sampling and consolidation of the current data interpretation via repetitive hypothesis proposal, fast rejection, and integration of the new hypothesis into the kept instance set by labeling energy minimization. Due to exploring the data progressively, the method has several beneficial properties compared with the state-of-the-art. First, a clear criterion, adopted from RANSAC, controls the termination and stops the algorithm when the probability of finding a new model with a reasonable number of inliers falls below a threshold. Second, Prog-X is an any-time algorithm. Thus, whenever it is interrupted, e.g. due to a time limit, the returned instances cover real and, likely, the most dominant model instances. The method is superior to the state-of-the-art in terms of accuracy in both synthetic experiments and on publicly available real-world datasets for homography, two-view motion, and motion segmentation.
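The termination criterion adopted from RANSAC can be stated concretely: sampling may stop once enough hypotheses have been drawn that, with a chosen confidence, at least one all-inlier minimal sample would have been found for any model with the given inlier ratio. A minimal sketch of that standard formula (variable names are illustrative):

```python
import math

def ransac_iterations(inlier_ratio, sample_size, confidence=0.99):
    """Number of minimal samples k so that, with probability
    `confidence`, at least one is all-inlier:
        k = log(1 - confidence) / log(1 - inlier_ratio ** sample_size)
    This is the standard RANSAC criterion that Prog-X adapts for
    termination."""
    good = inlier_ratio ** sample_size
    if good >= 1.0:
        return 1
    return math.ceil(math.log(1 - confidence) / math.log(1 - good))

# Homography fitting (4-point minimal sample) with 50% inliers:
k = ransac_iterations(0.5, 4)
```

When the inlier ratio of any yet-undiscovered model drops, the required iteration count explodes, which is exactly the signal to stop proposing new instances.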

25 citations


Posted Content
TL;DR: The Progressive-X algorithm, Prog-X in short, is proposed for geometric multi-model fitting and is superior to the state-of-the-art in terms of accuracy in both synthetic experiments and on publicly available real-world datasets for homography, two-view motion, and motion segmentation.
Abstract: The Progressive-X algorithm, Prog-X in short, is proposed for geometric multi-model fitting. The method interleaves sampling and consolidation of the current data interpretation via repetitive hypothesis proposal, fast rejection, and integration of the new hypothesis into the kept instance set by labeling energy minimization. Due to exploring the data progressively, the method has several beneficial properties compared with the state-of-the-art. First, a clear criterion, adopted from RANSAC, controls the termination and stops the algorithm when the probability of finding a new model with a reasonable number of inliers falls below a threshold. Second, Prog-X is an any-time algorithm. Thus, whenever it is interrupted, e.g. due to a time limit, the returned instances cover real and, likely, the most dominant model instances. The method is superior to the state-of-the-art in terms of accuracy in both synthetic experiments and on publicly available real-world datasets for homography, two-view motion, and motion segmentation.

21 citations


Proceedings ArticleDOI
01 Jan 2019
TL;DR: Experiments on two real-world benchmarks show that the proposed approach outperforms state-of-the-art methods in the camera-agnostic scenario and in the setting where the camera is known, MSGP outperforms all statistical methods.
Abstract: We present a statistical color constancy method that relies on novel gray pixel detection and mean shift clustering. The method, called Mean Shifted Grey Pixel -- MSGP, is based on the observation that true-gray pixels are aligned towards one single direction. Our solution is compact, easy to compute and requires no training. Experiments on two real-world benchmarks show that the proposed approach outperforms state-of-the-art methods in the camera-agnostic scenario. In the setting where the camera is known, MSGP outperforms all statistical methods.

Proceedings ArticleDOI
01 Oct 2019
TL;DR: In this paper, the problem of different training and test set class priors is addressed in the context of CNN classifiers, and two approaches to the estimation of the unknown test priors are compared: an existing Maximum Likelihood Estimation (MLE) method and a proposed Maximum a Posteriori (MAP) approach introducing a Dirichlet hyper-prior on the class prior probabilities.
Abstract: The problem of different training and test set class priors is addressed in the context of CNN classifiers. We compare two approaches to the estimation of the unknown test priors: an existing Maximum Likelihood Estimation (MLE) method and a proposed Maximum a Posteriori (MAP) approach introducing a Dirichlet hyper-prior on the class prior probabilities. Experimental results show a significant improvement in the fine-grained classification tasks using known evaluation-time priors, increasing top-1 accuracy by 4.0% on the FGVC iNaturalist 2018 validation set and by 3.9% on the FGVCx Fungi 2018 validation set. Estimation of the unknown test set priors noticeably increases the accuracy on the PlantCLEF dataset, allowing a single CNN model to achieve state-of-the-art results and to outperform the competition-winning ensemble of 12 CNNs. The proposed MAP estimation increases the prediction accuracy by 2.8% on PlantCLEF 2017 and by 1.8% on FGVCx Fungi, where the MLE method decreases accuracy.
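The two ingredients, re-weighting classifier posteriors for a changed prior and EM estimation of unknown test-set priors from unlabeled predictions, can be sketched as follows. This is a toy two-class Gaussian example following the classic EM-style MLE update the paper compares against; the proposed MAP variant with a Dirichlet hyper-prior is not reproduced here, and all names are illustrative.

```python
import numpy as np

def adjust_predictions(probs, train_priors, test_priors):
    """Re-weight posteriors for a changed class prior:
    p'(y|x) proportional to p(y|x) * test_prior(y) / train_prior(y)."""
    adj = probs * (test_priors / train_priors)
    return adj / adj.sum(axis=1, keepdims=True)

def estimate_test_priors(probs, train_priors, n_iters=100):
    """EM (maximum-likelihood) estimate of unknown test priors from
    unlabeled test-set posteriors: alternately adjust the posteriors
    with the current prior estimate and average them."""
    priors = train_priors.copy()
    for _ in range(n_iters):
        post = adjust_predictions(probs, train_priors, priors)
        priors = post.mean(axis=0)
    return priors

# Toy check: calibrated 2-class posteriors (Gaussian classes at -1/+1,
# trained with balanced priors), imbalanced 10%/90% test set.
rng = np.random.default_rng(2)
labels = (rng.uniform(size=2000) < 0.1).astype(int)
x = rng.normal(size=2000) + np.where(labels == 1, 1.0, -1.0)
p1 = 1.0 / (1.0 + np.exp(-2.0 * x))        # exact posterior, prior 0.5
probs = np.stack([1 - p1, p1], axis=1)
est = estimate_test_priors(probs, np.array([0.5, 0.5]))
```

The estimate converges close to the true 0.1 prior of the minority class, and feeding it back through `adjust_predictions` is what yields the accuracy gains the abstract reports.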

Posted Content
TL;DR: A deep depth-aware long-term tracker is proposed that achieves state-of-the-art RGBD tracking performance and is fast to run and reformulate deep discriminative correlation filter (DCF) to embed the depth information into deep features.
Abstract: The best RGBD trackers provide high accuracy but are slow to run. On the other hand, the best RGB trackers are fast but clearly inferior on the RGBD datasets. In this work, we propose a deep depth-aware long-term tracker that achieves state-of-the-art RGBD tracking performance and is fast to run. We reformulate the deep discriminative correlation filter (DCF) to embed the depth information into deep features. Moreover, the same depth-aware correlation filter is used for target re-detection. Comprehensive evaluations show that the proposed tracker achieves state-of-the-art performance on the Princeton RGBD, STC, and the newly-released CDTB benchmarks and runs at 20 fps.

Posted Content
TL;DR: It is shown that the progressive spatial sampling in P-NAPSAC can be integrated with PROSAC sampling, which is applied to the first, location-defining, point.
Abstract: We propose Progressive NAPSAC, P-NAPSAC in short, which merges the advantages of local and global sampling by drawing samples from gradually growing neighborhoods. Exploiting the fact that nearby points are more likely to originate from the same geometric model, P-NAPSAC finds local structures earlier than global samplers. We show that the progressive spatial sampling in P-NAPSAC can be integrated with PROSAC sampling, which is applied to the first, location-defining, point. P-NAPSAC is embedded in USAC, a state-of-the-art robust estimation pipeline, which we further improve by implementing its local optimization as in Graph-Cut RANSAC. We call the resulting estimator USAC*. The method is tested on homography and fundamental matrix fitting on a total of 10,691 models from seven publicly available datasets. USAC* with P-NAPSAC outperforms reference methods in terms of speed on all problems.
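The gradually growing neighborhood can be sketched compactly: a location-defining point is drawn first, the remaining sample points come from its k nearest neighbors, and k grows with the iteration count so sampling moves from local to global. The sketch below draws the first point uniformly at random for simplicity (the paper draws it via PROSAC ordering), and the linear growth schedule and names are illustrative assumptions.

```python
import numpy as np

def p_napsac_sample(points, sample_size, iteration, growth=0.5, rng=None):
    """One Progressive-NAPSAC-style minimal sample (simplified sketch).

    The first, location-defining point is drawn at random; the rest come
    from its k nearest neighbors, with k growing with the iteration
    count so sampling transitions from local to global.
    """
    if rng is None:
        rng = np.random.default_rng()
    n = len(points)
    k = min(n - 1, sample_size - 1 + int(growth * iteration))
    i = rng.integers(n)
    d = np.linalg.norm(points - points[i], axis=1)
    neighbors = np.argsort(d)[1:k + 1]      # k nearest, excluding i itself
    rest = rng.choice(neighbors, size=sample_size - 1, replace=False)
    return np.concatenate(([i], rest))

rng = np.random.default_rng(3)
pts = rng.uniform(0, 100, (200, 2))
early = p_napsac_sample(pts, 4, iteration=0, rng=rng)    # highly local
late = p_napsac_sample(pts, 4, iteration=300, rng=rng)   # much wider
```

Early samples are tight spatial clusters, which is what lets local structures be found before a purely global sampler would hit them.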

Proceedings ArticleDOI
01 Oct 2019
TL;DR: The Tracking by Deblatting (TbD) tracker as discussed by the authors is based on the observation that motion blur is directly related to the intra-frame trajectory of an object.
Abstract: Objects moving at high speed along complex trajectories often appear in videos, especially videos of sports. Such objects cover a non-negligible distance during the exposure time of a single frame, and therefore their position in the frame is not well defined. They appear as semi-transparent streaks due to motion blur and cannot be reliably tracked by standard trackers. We propose a novel approach called Tracking by Deblatting, based on the observation that motion blur is directly related to the intra-frame trajectory of an object. Blur is estimated by solving two intertwined inverse problems, blind deblurring and image matting, which we call deblatting. The trajectory is then estimated by fitting a piecewise quadratic curve, which models physically justifiable trajectories. As a result, tracked objects are precisely localized with higher temporal resolution than by conventional trackers. The proposed TbD tracker was evaluated on a newly created dataset of videos with ground truth obtained by a high-speed camera, using a novel Trajectory-IoU metric that generalizes the traditional Intersection over Union and measures the accuracy of the intra-frame trajectory. The proposed method outperforms the baseline both in recall and trajectory accuracy.
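The Trajectory-IoU idea generalizes per-frame IoU by evaluating overlap continuously along the intra-frame trajectory: place the object at sampled time instants on both the estimated and ground-truth curves and average the IoUs. The sketch below uses axis-aligned boxes of fixed size and uniform time sampling as illustrative assumptions; the paper's metric is defined w.r.t. the actual object model.

```python
import numpy as np

def iou_boxes(a, b):
    """IoU of two axis-aligned boxes given as (x, y, w, h)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def trajectory_iou(traj_est, traj_gt, size, n_samples=50):
    """Average IoU of an object of fixed `size` placed along the
    estimated vs. ground-truth intra-frame trajectory.
    traj_*: callables mapping t in [0, 1] to an (x, y) center."""
    w, h = size
    ious = []
    for t in np.linspace(0.0, 1.0, n_samples):
        xe, ye = traj_est(t)
        xg, yg = traj_gt(t)
        ious.append(iou_boxes((xe - w / 2, ye - h / 2, w, h),
                              (xg - w / 2, yg - h / 2, w, h)))
    return float(np.mean(ious))

# A straight-line estimate vs. a parabolic (physically plausible) arc:
gt = lambda t: (100 * t, 50 * t * (1 - t))
est = lambda t: (100 * t, 0.0)
tiou = trajectory_iou(est, gt, size=(20, 20))
```

A tracker that only matches the endpoints of the frame still loses Trajectory-IoU wherever the intra-frame path deviates, which is the sub-frame accuracy the metric is designed to capture.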

Posted Content
TL;DR: AMOS Patches, a large set of image cut-outs intended primarily for the robustification of trainable local feature descriptors to illumination and appearance changes, achieves state-of-the-art in matching under illumination changes on standard benchmarks.
Abstract: We present AMOS Patches, a large set of image cut-outs, intended primarily for the robustification of trainable local feature descriptors to illumination and appearance changes. Images contributing to AMOS Patches originate from the AMOS dataset of recordings from a large set of outdoor webcams. The semiautomatic method used to generate AMOS Patches is described. It includes camera selection, viewpoint clustering and patch selection. For training, we provide both the registered full source images as well as the patches. A new descriptor, trained on the AMOS Patches and 6Brown datasets, is introduced. It achieves state-of-the-art results in matching under illumination changes on standard benchmarks.

Posted Content
TL;DR: The dataset, tasks and findings of the RRC-MLT-2019 challenge are presented; the challenge has 4 tasks covering various aspects of multi-lingual scene text.

Abstract: With the growing cosmopolitan culture of modern cities, the need for robust Multi-Lingual scene Text (MLT) detection and recognition systems has never been more pressing. With the goal of systematically benchmarking and pushing the state-of-the-art forward, the proposed competition builds on top of RRC-MLT-2017 with an additional end-to-end task, an additional language in the real images dataset, a large-scale multi-lingual synthetic dataset to assist the training, and a baseline end-to-end recognition method. The real dataset consists of 20,000 images containing text from 10 languages. The challenge has 4 tasks covering various aspects of multi-lingual scene text: (a) text detection, (b) cropped word script classification, (c) joint text detection and script classification and (d) end-to-end detection and recognition. In total, the competition received 60 submissions from the research and industrial communities. This paper presents the dataset, the tasks and the findings of the RRC-MLT-2019 challenge.

Journal ArticleDOI
TL;DR: It is shown how the original CA space can be generalized to multiple outputs by the Cartesian product (CartCA); however, for target spaces with more than two outputs CartCA becomes computationally infeasible, so an approximate solution, multi-view CA (MvCA), which applies CartCA to output pairs, is proposed.

01 Jan 2019
TL;DR: Performance improvements were achieved by adjusting the CNN predictions according to the estimated change of the class prior probabilities, replacing network parameters with their running averages, test-time data augmentation, filtering the provided training set and adding additional training images from GBIF.

Abstract: The paper describes an automatic system for recognition of 10,000 plant species, with focus on species from the Guiana shield and the Amazon rain forest. The proposed system achieves the best results on the PlantCLEF 2019 test set with 31.9% accuracy. Compared against human experts in plant recognition, the system performed better than 3 of the 5 participating human experts and achieved 41.0% accuracy on the subset for expert evaluation. The proposed system is based on the Inception-v4 and Inception-ResNet-v2 Convolutional Neural Network (CNN) architectures. Performance improvements were achieved by: adjusting the CNN predictions according to the estimated change of the class prior probabilities, replacing network parameters with their running averages, test-time data augmentation, filtering the provided training set and adding additional training images from GBIF.

Proceedings ArticleDOI
01 Sep 2019
TL;DR: It is shown that flash photography significantly improves the performance of gray pixel detection without illuminant prior, training data, or calibration of the flash, and a novel flash photography dataset generated from the MIT intrinsic dataset is introduced.
Abstract: In the real world, a scene is usually cast by multiple illuminants and herein we address the problem of spatial illumination estimation. Our solution is based on detecting gray pixels with the help of flash photography. We show that flash photography significantly improves the performance of gray pixel detection without illuminant prior, training data or calibration of the flash. We also introduce a novel flash photography dataset generated from the MIT intrinsic dataset.


Journal ArticleDOI
TL;DR: A hybrid approach is evaluated that combines feature-based and mutual-information-based pose estimation methods to benefit from their complementary properties.

Proceedings ArticleDOI
TL;DR: A novel method that tracks fast moving objects, mainly non-uniform spherical, in full 6 degrees of freedom, estimating simultaneously their 3D motion trajectory, 3D pose and object appearance changes with a time step that is a fraction of the video frame exposure time is proposed.
Abstract: We propose a novel method that tracks fast moving objects, mainly non-uniform spherical, in full 6 degrees of freedom, estimating simultaneously their 3D motion trajectory, 3D pose and object appearance changes with a time step that is a fraction of the video frame exposure time. The sub-frame object localization and appearance estimation allows realistic temporal super-resolution and precise shape estimation. The method, called TbD-3D (Tracking by Deblatting in 3D) relies on a novel reconstruction algorithm which solves a piece-wise deblurring and matting problem. The 3D rotation is estimated by minimizing the reprojection error. As a second contribution, we present a new challenging dataset with fast moving objects that change their appearance and distance to the camera. High speed camera recordings with zero lag between frame exposures were used to generate videos with different frame rates annotated with ground-truth trajectory and pose.

Proceedings ArticleDOI
01 Sep 2019
TL;DR: Experiments conducted on a newly-created dataset of 63 care label images show that even when exploiting problem-specific constraints, a state-of-the-art scene text detection and recognition method achieves precision and recall slightly above 0.6, confirming the challenging nature of the problem.

Abstract: The paper introduces the problem of care label recognition and presents a method addressing it. A care label, also called a care tag, is a small piece of cloth or paper attached to a garment providing instructions for its maintenance and information about e.g. the material and size. The information and instructions are written as symbols or plain text. Care label recognition is a challenging text and pictogram recognition problem - the often sewn text is small, looking as if printed using a non-standard font; the contrast of the text gradually fades, making OCR progressively more difficult. On the other hand, the information provided is typically redundant and thus it facilitates semi-supervised learning. The presented care label recognition method is based on the recently published End-to-End Method for Multi-Language Scene Text, E2E-MLT, Busta et al. 2018, exploiting specific constraints, e.g. a care label vocabulary with multi-language equivalences. Experiments conducted on a newly-created dataset of 63 care label images show that even when exploiting problem-specific constraints, a state-of-the-art scene text detection and recognition method achieves precision and recall slightly above 0.6, confirming the challenging nature of the problem.

Posted Content
TL;DR: This article proposes a grayness index for finding gray pixels and demonstrates its effectiveness and efficiency in illumination estimation; the index is derived using the Dichromatic Reflection Model and is learning-free.

Abstract: We propose a novel grayness index for finding gray pixels and demonstrate its effectiveness and efficiency in illumination estimation. The grayness index, GI in short, is derived using the Dichromatic Reflection Model and is learning-free. GI makes it possible to estimate one or multiple illumination sources in color-biased images. On standard single-illumination and multiple-illumination estimation benchmarks, GI outperforms state-of-the-art statistical methods and many recent deep methods. GI is simple and fast, written in a few dozen lines of code, processing a 1080p image in ~0.4 seconds with non-optimized Matlab code.

Posted Content
TL;DR: It is shown that flash photography significantly improves the performance of gray pixel detection without illuminant prior, training data, or calibration of the flash, and a novel flash photography dataset generated from the MIT intrinsic dataset is introduced.
Abstract: In the real world, a scene is usually cast by multiple illuminants and herein we address the problem of spatial illumination estimation. Our solution is based on detecting gray pixels with the help of flash photography. We show that flash photography significantly improves the performance of gray pixel detection without illuminant prior, training data or calibration of the flash. We also introduce a novel flash photography dataset generated from the MIT intrinsic dataset.

Posted Content
TL;DR: In this paper, a simple method for synchronization of video streams with a precision better than one millisecond is proposed, which is applicable to any number of rolling shutter cameras and when a few photographic flashes or other abrupt lighting changes are present in the video.
Abstract: A simple method for synchronization of video streams with a precision better than one millisecond is proposed. The method is applicable to any number of rolling shutter cameras and when a few photographic flashes or other abrupt lighting changes are present in the video. The approach exploits the rolling shutter sensor property that every sensor row starts its exposure with a small delay after the onset of the previous row. The cameras may have different frame rates and resolutions, and need not have overlapping fields of view. The method was validated on five minutes of four streams from an ice hockey match. The found transformation maps events visible in all cameras to a reference time with a standard deviation of the temporal error in the range of 0.3 to 0.5 milliseconds. The quality of the synchronization is demonstrated on temporally and spatially overlapping images of a fast moving puck observed in two cameras.
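The rolling shutter property the abstract exploits has a simple algebraic core: the capture time of a sensor row is the frame time plus the row index times the per-row delay, so an abrupt lighting change observed at known (frame, row) coordinates in two cameras pins down their relative time offset with sub-millisecond resolution. A sketch with illustrative numbers (not the paper's calibration procedure, which also estimates the per-camera parameters):

```python
def row_time(frame_idx, row, fps, row_delay, offset=0.0):
    """Capture time of one sensor row in a rolling-shutter stream: each
    row starts its exposure `row_delay` seconds after the previous one."""
    return offset + frame_idx / fps + row * row_delay

def offset_from_flash(obs_a, obs_b, cams):
    """Given the (frame, first-exposed-row) at which an abrupt flash
    appears in two cameras, return camera B's time offset relative to
    camera A. `cams` holds per-camera fps and row delay."""
    ta = row_time(*obs_a, **cams["a"])
    tb = row_time(*obs_b, **cams["b"])
    return ta - tb

# Hypothetical cameras with different frame rates and row delays.
cams = {"a": {"fps": 30.0, "row_delay": 30e-6},
        "b": {"fps": 25.0, "row_delay": 28e-6}}
# Flash first visible at frame 10, row 400 in A; frame 8, row 120 in B.
dt = offset_from_flash((10, 400), (8, 120), cams)
```

Because the row term resolves time in tens of microseconds, a handful of flash observations suffices to map all cameras onto one reference clock, matching the 0.3-0.5 ms errors reported in the abstract.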

Proceedings ArticleDOI
TL;DR: ALFA as discussed by the authors is based on agglomerative clustering of object detector predictions taking into consideration both the bounding box locations and the class scores, and achieves state-of-the-art results on PASCAL VOC 2007 and PASCAL VOC 2012, outperforming the individual detectors as well as baseline combination strategies.

Abstract: We propose ALFA - a novel late fusion algorithm for object detection. ALFA is based on agglomerative clustering of object detector predictions taking into consideration both the bounding box locations and the class scores. Each cluster represents a single object hypothesis whose location is a weighted combination of the clustered bounding boxes. ALFA was evaluated using combinations of a pair (SSD and DeNet) and a triplet (SSD, DeNet and Faster R-CNN) of recent object detectors that are close to the state-of-the-art. ALFA achieves state-of-the-art results on PASCAL VOC 2007 and PASCAL VOC 2012, outperforming the individual detectors as well as baseline combination strategies, achieving up to 32% lower error than the best individual detectors and up to 6% lower error than the reference fusion algorithm DBF - Dynamic Belief Fusion.
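The clustering-then-weighted-combination skeleton can be sketched compactly: link detections whose boxes overlap above a threshold, then emit one hypothesis per cluster located at the score-weighted average of its member boxes. ALFA itself also clusters on class-score similarity and tunes the fusion; the single-link IoU clustering, the fixed threshold, and the names below are simplifying assumptions.

```python
import numpy as np

def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def fuse_detections(boxes, scores, iou_thr=0.5):
    """Late-fusion sketch: single-link agglomerative clustering of boxes
    by IoU (via union-find), then one hypothesis per cluster whose
    location is the score-weighted average of the member boxes."""
    n = len(boxes)
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if iou(boxes[i], boxes[j]) >= iou_thr:
                parent[find(i)] = find(j)
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    fused = []
    for members in clusters.values():
        w = np.array([scores[k] for k in members])
        b = np.array([boxes[k] for k in members])
        fused.append(((w[:, None] * b).sum(0) / w.sum(), float(w.max())))
    return fused

# Two detectors fire on the same object; one spurious box elsewhere.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (200, 200, 240, 240)]
fused = fuse_detections(boxes, [0.9, 0.6, 0.3])
```

Unlike non-maximum suppression, which keeps only the highest-scoring box, the weighted combination lets agreeing detectors refine each other's localization, which is the effect the fusion exploits.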