Face Sketch Matching via Coupled Deep Transform Learning

doi:10.1109/ICCV.2017.579

Home
/
Papers
/
Face Sketch Matching via Coupled Deep Transform Learning

Proceedings Article•DOI•

Face Sketch Matching via Coupled Deep Transform Learning

Shruti Nagpal¹, Maneet Singh², Richa Singh³, Mayank Vatsa¹, Afzel Noore³, Angshul Majumdar² - Show less +2 more•Institutions (3)

Indian Institute of Technology, Jodhpur¹, Indraprastha Institute of Information Technology², West Virginia University³

01 Oct 2017-pp 5429-5438

TL;DR: DeepTransformer as mentioned in this paper learns a transformation and mapping function between the features of two domains, which can be applied with any existing learned or hand-crafted feature and can be used for sketch-to-sketch matching.

read less

Abstract: Face sketch to digital image matching is an important challenge of face recognition that involves matching across different domains. Current research efforts have primarily focused on extracting domain invariant representations or learning a mapping from one domain to the other. In this research, we propose a novel transform learning based approach termed as DeepTransformer, which learns a transformation and mapping function between the features of two domains. The proposed formulation is independent of the input information and can be applied with any existing learned or hand-crafted feature. Since the mapping function is directional in nature, we propose two variants of DeepTransformer: (i) semi-coupled and (ii) symmetrically-coupled deep transform learning. This research also uses a novel IIIT-D Composite Sketch with Age (CSA) variations database which contains sketch images of 150 subjects along with age-separated digital photos. The performance of the proposed models is evaluated on a novel application of sketch-to-sketch matching, along with sketch-to-digital photo matching. Experimental results demonstrate the robustness of the proposed models in comparison to existing state-of-the-art sketch matching algorithms and a commercial face recognition system.

...read moreread less

Citations

PDF

Open Access

More filters

Posted Content•

Deep Learning for Free-Hand Sketch: A Survey

[...]

Peng Xu

08 Jan 2020

TL;DR: A comprehensive survey of the deep learning techniques oriented at free-hand sketch data, and the applications that they enable can be found in this article, where the authors highlight the essential differences between sketch data and other data modalities, e.g., natural photos.

...read moreread less

Abstract: Free-hand sketches are highly illustrative, and have been widely used by humans to depict objects or stories from ancient times to the present. The recent prevalence of touchscreen devices has made sketch creation a much easier task than ever and consequently made sketch-oriented applications increasingly popular. The progress of deep learning has immensely benefited free-hand sketch research and applications. This paper presents a comprehensive survey of the deep learning techniques oriented at free-hand sketch data, and the applications that they enable. The main contents of this survey include: (i) A discussion of the intrinsic traits and unique challenges of free-hand sketch, to highlight the essential differences between sketch data and other data modalities, e.g., natural photos. (ii) A review of the developments of free-hand sketch research in the deep learning era, by surveying existing datasets, research topics, and the state-of-the-art methods through a detailed taxonomy and experimental evaluation. (iii) Promotion of future work via a discussion of bottlenecks, open problems, and potential research directions for the community. Finally, to support future sketch research and applications, we contribute TorchSketch -- the first sketch-oriented open-source deep learning library, which is built on PyTorch and available at this https URL.

...read moreread less

49 citations

Posted Content•

Learning Structure and Strength of CNN Filters for Small Sample Size Training

[...]

Rohit Keshari¹, Mayank Vatsa¹, Richa Singh¹, Afzel Noore²•Institutions (2)

Indraprastha Institute of Information Technology¹, Texas A&M University–Kingsville²

30 Mar 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: SSF-CNN is proposed which focuses on learning the "structure" and "strength" of filters and significantly reduces the number of parameters required for training while providing high accuracies on the test databases.

...read moreread less

Abstract: Convolutional Neural Networks have provided state-of-the-art results in several computer vision problems. However, due to a large number of parameters in CNNs, they require a large number of training samples which is a limiting factor for small sample size problems. To address this limitation, we propose SSF-CNN which focuses on learning the structure and strength of filters. The structure of the filter is initialized using a dictionary-based filter learning algorithm and the strength of the filter is learned using the small sample training data. The architecture provides the flexibility of training with both small and large training databases and yields good accuracies even with small size training data. The effectiveness of the algorithm is first demonstrated on MNIST, CIFAR10, and NORB databases, with a varying number of training samples. The results show that SSF-CNN significantly reduces the number of parameters required for training while providing high accuracies the test databases. On small sample size problems such as newborn face recognition and Omniglot, it yields state-of-the-art results. Specifically, on the IIITD Newborn Face Database, the results demonstrate improvement in rank-1 identification accuracy by at least 10%.

...read moreread less

48 citations

Proceedings Article•DOI•

Learning Structure and Strength of CNN Filters for Small Sample Size Training

[...]

Rohit Keshari¹, Mayank Vatsa¹, Richa Singh¹, Afzel Noore²•Institutions (2)

Indraprastha Institute of Information Technology¹, Texas A&M University–Kingsville²

18 Jun 2018

TL;DR: SSF-CNN as discussed by the authors uses dictionary-based filter learning to learn the structure and strength of the filter for small sample size problems such as newborn face recognition and Omniglot.

...read moreread less

Abstract: Convolutional Neural Networks have provided state-of-the-art results in several computer vision problems. However, due to a large number of parameters in CNNs, they require a large number of training samples which is a limiting factor for small sample size problems. To address this limitation, we propose SSF-CNN which focuses on learning the "structure" and "strength" of filters. The structure of the filter is initialized using a dictionary based filter learning algorithm and the strength of the filter is learned using the small sample training data. The architecture provides the flexibility of training with both small and large training databases, and yields good accuracies even with small size training data. The effectiveness of the algorithm is first demonstrated on MNIST, CIFAR10, and NORB databases, with varying number of training samples. The results show that SSF-CNN significantly reduces the number of parameters required for training while providing high accuracies on the test databases. On small sample size problems such as newborn face recognition and Omniglot, it yields state-of-the-art results. Specifically, on the IIITD Newborn Face Database, the results demonstrate improvement in rank-1 identification accuracy by at least 10%.

...read moreread less

40 citations

Posted Content•

Deep Learning for Free-Hand Sketch: A Survey and A Toolbox

[...]

Peng Xu, Timothy M. Hospedales, Qiyue Yin, Yi-Zhe Song, Tao Xiang, Liang Wang - Show less +2 more

08 Jan 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: A comprehensive survey of the deep learning techniques oriented at free-hand sketch data, and the applications that they enable.

...read moreread less

28 citations

Journal Article•DOI•

Learning Structural Representations via Dynamic Object Landmarks Discovery for Sketch Recognition and Retrieval

[...]

Hua Zhang¹, Peng She¹, Yong Liu¹, Jianhou Gan², Xiaochun Cao¹, Hassan Foroosh³ - Show less +2 more•Institutions (3)

Chinese Academy of Sciences¹, Yunnan Normal University², University of Central Florida³

19 Apr 2019-IEEE Transactions on Image Processing

TL;DR: A novel architecture to dynamically discover the object landmarks and learn the discriminative structural representations is proposed and compared with several state-of-the-art methods on two challenging datasets, TU-Berlin and Sketchy.

...read moreread less

Abstract: State-of-the-art methods on sketch classification and retrieval are based on deep convolutional neural network to learn representations. Although deep neural networks have the ability to model images with hierarchical representations by convolution kernels, they cannot automatically extract the structural representations of object categories in a human-perceptible way. Furthermore, sketch images usually have large-scale visual variations caused by the styles of drawing or viewpoints, which make it difficult to develop generalized representations using the fixed computational mode of convolutional kernel. In this paper, our aim is to address the problem of fixed computational mode in feature extraction process without extra supervision. We propose a novel architecture to dynamically discover the object landmarks and learn the discriminative structural representations. Our model is composed of two components: a representative landmark discovering module that localizes the key points on the object and a category-aware representation learning module that develops the category-specific features. Specifically, we develop a structure-aware offset layer to dynamically localize the representative landmarks, which is optimized based on the category labels without extra supervision. After that, a diversity branch is introduced to extract the global discriminative features for each category. Finally, we employ a multi-task loss function to develop an end-to-end trainable architecture. At testing time, we fuse all the predictions with different number of landmarks to achieve the final results. Through extensive experiments, we compare our model with several state-of-the-art methods on two challenging datasets, TU-Berlin and Sketchy, for sketch classification and retrieval, and the experimental results demonstrate the effectiveness of our proposed model.

...read moreread less

25 citations

1
2
3
4
…
5

References

PDF

Open Access

More filters

Proceedings Article•DOI•

Deep Residual Learning for Image Recognition

[...]

Kaiming He¹, Xiangyu Zhang¹, Shaoqing Ren¹, Jian Sun¹•Institutions (1)

Microsoft¹

27 Jun 2016

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.

...read moreread less

Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

...read moreread less

123,388 citations

Proceedings Article•

ImageNet Classification with Deep Convolutional Neural Networks

[...]

Alex Krizhevsky¹, Ilya Sutskever¹, Geoffrey E. Hinton¹•Institutions (1)

University of Toronto¹

03 Dec 2012

TL;DR: The state-of-the-art performance of CNNs was achieved by Deep Convolutional Neural Networks (DCNNs) as discussed by the authors, which consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.

...read moreread less

Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overriding in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

...read moreread less

73,978 citations

Proceedings Article•

Very Deep Convolutional Networks for Large-Scale Image Recognition

[...]

Karen Simonyan¹, Andrew Zisserman¹•Institutions (1)

University of Oxford¹

01 Jan 2015

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.

...read moreread less

Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

...read moreread less

49,914 citations

Journal Article•DOI•

Learning the parts of objects by non-negative matrix factorization

[...]

Daniel D. Lee¹, H. Sebastian Seung¹, H. Sebastian Seung²•Institutions (2)

Alcatel-Lucent¹, Massachusetts Institute of Technology²

21 Oct 1999-Nature

TL;DR: An algorithm for non-negative matrix factorization is demonstrated that is able to learn parts of faces and semantic features of text and is in contrast to other methods that learn holistic, not parts-based, representations.

...read moreread less

Abstract: Is perception of the whole based on perception of its parts? There is psychological and physiological evidence for parts-based representations in the brain, and certain computational theories of object recognition rely on such representations. But little is known about how brains or computers might learn the parts of objects. Here we demonstrate an algorithm for non-negative matrix factorization that is able to learn parts of faces and semantic features of text. This is in contrast to other methods, such as principal components analysis and vector quantization, that learn holistic, not parts-based, representations. Non-negative matrix factorization is distinguished from the other methods by its use of non-negativity constraints. These constraints lead to a parts-based representation because they allow only additive, not subtractive, combinations. When non-negative matrix factorization is implemented as a neural network, parts-based representations emerge by virtue of two properties: the firing rates of neurons are never negative and synaptic strengths do not change sign.

...read moreread less

11,500 citations

Proceedings Article•DOI•

Deep face recognition

[...]

Omkar M. Parkhi¹, Andrea Vedaldi¹, Andrew Zisserman¹•Institutions (1)

University of Oxford¹

01 Jan 2015

TL;DR: It is shown how a very large scale dataset can be assembled by a combination of automation and human in the loop, and the trade off between data purity and time is discussed.

...read moreread less

Abstract: The goal of this paper is face recognition – from either a single photograph or from a set of faces tracked in a video. Recent progress in this area has been due to two factors: (i) end to end learning for the task using a convolutional neural network (CNN), and (ii) the availability of very large scale training datasets. We make two contributions: first, we show how a very large scale dataset (2.6M images, over 2.6K people) can be assembled by a combination of automation and human in the loop, and discuss the trade off between data purity and time; second, we traverse through the complexities of deep network training and face recognition to present methods and procedures to achieve comparable state of the art results on the standard LFW and YTF face benchmarks.

...read moreread less

5,308 citations