Radial Loss for Learning Fine-grained Video Similarity Metric

doi:10.1109/ICASSP.2019.8683003

Citations

Proceedings Article•DOI•

Simultaneous Optimisation of Image Quality Improvement and Text Content Extraction from Scanned Documents

Shashank Mujumdar¹, Nitin Gupta¹, Abhinav Jain¹, Douglas Burdick¹•Institutions (1)

IBM¹

01 Sep 2019

TL;DR: This paper proposes to combine the OCR performance into the loss function during network training, which results in high-resolution text images whose OCR performance is comparable to that of the ground-truth high-resolution text images and surpasses that of the SOA baselines.

Abstract: Convolutional neural networks are shown to achieve breakthrough performance for the task of single image super resolution (SISR) for natural images. These state-of-the-art (SOA) networks have been adapted to the task of single text image super resolution and have been shown to boost the optical character recognition (OCR) performance. However, these approaches depend on variations of the standard mean squared error (MSE) loss in order to train the SR network for improving the text image quality which does not guarantee optimal OCR performance. In this paper, we propose to combine the OCR performance into the loss function during network training. This results in the generation of high resolution text images that achieve high OCR performance that is comparable to the ground truth high-resolution text images and surpassing those of the SOA baseline results. We define novel intuitive metrics to capture the improvement in the OCR performance and provide extensive experiments to qualitatively and quantitatively assess improvement in the results of our proposed approach against the SOA baselines on the standard UNLV dataset.

4 citations

References

Proceedings Article•

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky¹, Ilya Sutskever¹, Geoffrey E. Hinton¹•Institutions (1)

University of Toronto¹

03 Dec 2012

TL;DR: A large, deep convolutional neural network, consisting of five convolutional layers (some followed by max-pooling layers) and three fully-connected layers with a final 1000-way softmax, achieved state-of-the-art performance on ImageNet classification.

Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

73,978 citations

Proceedings Article•DOI•

FaceNet: A unified embedding for face recognition and clustering

Florian Schroff¹, Dmitry Kalenichenko¹, James Philbin¹•Institutions (1)

Google¹

07 Jun 2015

TL;DR: A system that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity, and achieves state-of-the-art face recognition performance using only 128 bytes per face.

Abstract: Despite significant recent advances in the field of face recognition [10, 14, 15, 17], implementing face verification and recognition efficiently at scale presents serious challenges to current approaches. In this paper we present a system, called FaceNet, that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity. Once this space has been produced, tasks such as face recognition, verification and clustering can be easily implemented using standard techniques with FaceNet embeddings as feature vectors.
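The idea that distances in the embedding space stand in for face similarity can be illustrated with a minimal sketch. This is not the FaceNet implementation: the function names, the toy 2-dimensional embeddings, and the threshold value are all assumptions chosen for illustration, and the embeddings themselves would come from a trained network that is not shown here.

```python
import math


def euclidean_distance(a, b):
    """Euclidean (L2) distance between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimensionality")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def same_identity(emb_a, emb_b, threshold=1.0):
    """Verification: accept the pair if the embeddings are close enough.

    The threshold is a hypothetical value; in practice it would be
    tuned on held-out data.
    """
    return euclidean_distance(emb_a, emb_b) <= threshold


def nearest_identity(query, gallery):
    """Recognition: return the gallery label whose embedding is nearest.

    `gallery` is a dict mapping a label to one reference embedding.
    """
    return min(gallery, key=lambda label: euclidean_distance(query, gallery[label]))


# Toy 2-D embeddings (real embeddings would have many more dimensions).
gallery = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
print(same_identity([0.0, 0.0], [0.1, 0.1]))   # close pair
print(nearest_identity([0.9, 0.1], gallery))    # nearest gallery entry
```

Once embeddings are available, verification reduces to a threshold test and recognition to a nearest-neighbour lookup, which is the sense in which the abstract says these tasks use standard techniques.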

8,289 citations

Proceedings Article•DOI•

Learning Spatiotemporal Features with 3D Convolutional Networks

Du Tran¹,², Lubomir Bourdev², Rob Fergus², Lorenzo Torresani¹, Manohar Paluri²•Institutions (2)

Dartmouth College¹, Facebook²

07 Dec 2015

TL;DR: The learned features, namely C3D (Convolutional 3D), with a simple linear classifier outperform state-of-the-art methods on 4 different benchmarks and are comparable with current best methods on the other 2 benchmarks.

Abstract: We propose a simple, yet effective approach for spatiotemporal feature learning using deep 3-dimensional convolutional networks (3D ConvNets) trained on a large scale supervised video dataset. Our findings are three-fold: 1) 3D ConvNets are more suitable for spatiotemporal feature learning compared to 2D ConvNets, 2) A homogeneous architecture with small 3x3x3 convolution kernels in all layers is among the best performing architectures for 3D ConvNets, and 3) Our learned features, namely C3D (Convolutional 3D), with a simple linear classifier outperform state-of-the-art methods on 4 different benchmarks and are comparable with current best methods on the other 2 benchmarks. In addition, the features are compact: achieving 52.8% accuracy on UCF101 dataset with only 10 dimensions and also very efficient to compute due to the fast inference of ConvNets. Finally, they are conceptually very simple and easy to train and use.
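To make the "small 3x3x3 convolution kernels" concrete, the sketch below works out the output size and parameter count of a single 3D convolution layer using the standard convolution arithmetic. The stride of 1, padding of 1, the 16x112x112 clip size, and the 3-to-64 channel widths are assumptions for illustration and are not stated in the abstract; the helper names are hypothetical.

```python
def conv3d_output_shape(input_shape, kernel=3, stride=1, padding=1):
    """Output (depth, height, width) of a 3D convolution.

    Applies out = floor((in + 2*padding - kernel) / stride) + 1
    independently to each of the three dimensions.
    """
    return tuple((size + 2 * padding - kernel) // stride + 1 for size in input_shape)


def conv3d_param_count(in_channels, out_channels, kernel=3):
    """Number of weights plus biases in one 3D convolution layer."""
    weights = kernel ** 3 * in_channels * out_channels
    biases = out_channels
    return weights + biases


# Assumed clip of 16 frames at 112x112 pixels, RGB input, 64 filters.
print(conv3d_output_shape((16, 112, 112)))  # (16, 112, 112)
print(conv3d_param_count(3, 64))            # 5248
```

Under these assumptions a 3x3x3 kernel preserves the temporal and spatial size of its input, so layers can be stacked uniformly, which is one reading of the "homogeneous architecture" in the abstract.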

7,091 citations

"Radial Loss for Learning Fine-grain..." refers methods in this paper

...Fine-grained Video Retrieval: (i) 3D-CNN [18], (ii) Triplet [19], (iii) Quadruplet-1 [4] (use loss function and quadruplet sampling strategy), (iv) Quadruplet-2(a,b) [20] (use loss function and two sampling strategies - where negative samples come from (a) both negative & intermediate categories, and (b) only from the negative class)....
[...]

Posted Content•

Learning Spatiotemporal Features with 3D Convolutional Networks

Du Tran¹,², Lubomir Bourdev², Rob Fergus², Lorenzo Torresani¹, Manohar Paluri²•Institutions (2)

Dartmouth College¹, Facebook²

02 Dec 2014-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this article, the authors proposed a simple and effective approach for spatio-temporal feature learning using deep 3D convolutional networks (3D ConvNets) trained on a large scale supervised video dataset.

Abstract: We propose a simple, yet effective approach for spatiotemporal feature learning using deep 3-dimensional convolutional networks (3D ConvNets) trained on a large scale supervised video dataset. Our findings are three-fold: 1) 3D ConvNets are more suitable for spatiotemporal feature learning compared to 2D ConvNets; 2) A homogeneous architecture with small 3x3x3 convolution kernels in all layers is among the best performing architectures for 3D ConvNets; and 3) Our learned features, namely C3D (Convolutional 3D), with a simple linear classifier outperform state-of-the-art methods on 4 different benchmarks and are comparable with current best methods on the other 2 benchmarks. In addition, the features are compact: achieving 52.8% accuracy on UCF101 dataset with only 10 dimensions and also very efficient to compute due to the fast inference of ConvNets. Finally, they are conceptually very simple and easy to train and use.

3,786 citations

Posted Content•

Learning Fine-grained Image Similarity with Deep Ranking

Jiang Wang¹, Yang Song, Thomas Leung, Charles J. Rosenberg, Jinbin Wang, James Philbin, Bo Chen², Ying Wu¹•Institutions (2)

Northwestern University¹, California Institute of Technology²

17 Apr 2014-arXiv: Computer Vision and Pattern Recognition

TL;DR: A deep ranking model that employs deep learning techniques to learn similarity metric directly from images has higher learning capability than models based on hand-crafted features and deep classification models.

Abstract: Learning fine-grained image similarity is a challenging task. It needs to capture between-class and within-class image differences. This paper proposes a deep ranking model that employs deep learning techniques to learn similarity metric directly from images. It has higher learning capability than models based on hand-crafted features. A novel multiscale network structure has been developed to describe the images effectively. An efficient triplet sampling algorithm is proposed to learn the model with distributed asynchronized stochastic gradient. Extensive experiments show that the proposed algorithm outperforms models based on hand-crafted visual features and deep classification models.
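Ranking models trained on triplets are commonly optimised with a hinge loss over a (query, positive, negative) triple. The sketch below shows that general form only; the squared Euclidean distance, the margin value, and the function names are assumptions for illustration, and the abstract does not give the exact loss used in the paper.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_hinge_loss(query, positive, negative, margin=1.0):
    """Hinge loss for one triplet.

    The loss is zero when the negative is farther from the query than
    the positive by at least `margin`, and grows linearly otherwise.
    """
    gap = squared_distance(query, positive) - squared_distance(query, negative)
    return max(0.0, margin + gap)


query = [0.0, 0.0]
# Correctly ordered triplet: positive is near, negative is far.
print(triplet_hinge_loss(query, [1.0, 0.0], [3.0, 0.0]))  # 0.0
# Violating triplet: the negative is closer than the positive.
print(triplet_hinge_loss(query, [2.0, 0.0], [1.0, 0.0]))  # 4.0
```

Only triplets that violate the margin contribute a non-zero loss, which is why the choice of which triplets to sample matters for training.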

967 citations

"Radial Loss for Learning Fine-grain..." refers methods in this paper

...Fine-grained Video Retrieval: (i) 3D-CNN [18], (ii) Triplet [19], (iii) Quadruplet-1 [4] (use loss function and quadruplet sampling strategy), (iv) Quadruplet-2(a,b) [20] (use loss function and two sampling strategies - where negative samples come from (a) both negative & intermediate categories, and (b) only from the negative class)....
[...]
...Coarse grained video/clip similarity: (a) Triplet Precision (TP) relaxes the hard constraint of order preserving across positive and intermediate samples.... [Footnote 1: Dataset Link: https://gofile.io/?c=8aphlD]
[...]
