Papers by Anurag Mittal published in 2020


Posted Content
TL;DR: This work proposes a novel model-agnostic question encoder, the Visually-Grounded Question Encoder (VGQE), for VQA that reduces the model's dependency on language priors and achieves state-of-the-art results on the bias-sensitive split of the VQAv2 dataset.
Abstract: Recent studies have shown that current VQA models rely heavily on the language priors in the training set to answer the question, irrespective of the image; e.g., they overwhelmingly answer "what sport is" with "tennis" or "what color banana" with "yellow." This behavior restricts their use in real-world application scenarios. In this work, we propose a novel model-agnostic question encoder, the Visually-Grounded Question Encoder (VGQE), for VQA that reduces this effect. VGQE utilizes both the visual and language modalities equally while encoding the question. Hence the question representation itself receives sufficient visual grounding, which reduces the model's dependency on the language priors. We demonstrate the effect of VGQE on three recent VQA models and achieve state-of-the-art results on the bias-sensitive split of the VQAv2 dataset, VQA-CPv2. Further, unlike existing bias-reduction techniques, our approach does not drop accuracy on the standard VQAv2 benchmark; instead, it improves performance.
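
To make the idea concrete, here is a minimal, illustrative sketch (in PyTorch) of a visually grounded question encoder: at every encoding step, an attended visual feature is fused with the current word embedding before the recurrent update, so the two modalities contribute equally. This is not the authors' VGQE implementation; all module names and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class VisuallyGroundedQuestionEncoder(nn.Module):
    """Illustrative sketch only: fuse visual context into every question-encoding
    step, so the question representation is visually grounded from the start.
    Not the authors' VGQE; sizes and modules are assumed for illustration."""

    def __init__(self, vocab_size=10000, emb_dim=300, vis_dim=2048, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.vis_proj = nn.Linear(vis_dim, emb_dim)   # project image region features
        self.attn = nn.Linear(emb_dim, 1)             # word-conditioned attention (simplified)
        self.gru = nn.GRUCell(2 * emb_dim, hid_dim)   # consumes [word; visual] each step

    def forward(self, question_tokens, image_feats):
        # question_tokens: (B, T) word indices; image_feats: (B, K, vis_dim) region features
        B, T = question_tokens.shape
        v = self.vis_proj(image_feats)                # (B, K, emb_dim)
        h = question_tokens.new_zeros(B, self.gru.hidden_size, dtype=torch.float)
        for t in range(T):
            w = self.embed(question_tokens[:, t])     # (B, emb_dim) current word
            # attend over image regions conditioned on the current word
            scores = self.attn(torch.tanh(v + w.unsqueeze(1)))  # (B, K, 1)
            alpha = torch.softmax(scores, dim=1)
            v_t = (alpha * v).sum(dim=1)              # (B, emb_dim) attended visual feature
            h = self.gru(torch.cat([w, v_t], dim=-1), h)  # visually grounded update
        return h                                      # visually grounded question encoding
```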

26 citations


Proceedings ArticleDOI
01 Mar 2020
TL;DR: A generative approach for ZS-SBIR based on a Stacked Adversarial Network (SAN) combined with the advantages of a Siamese Network (SN) is proposed, yielding a significant improvement in both the standard ZSL setting and the more challenging generalized ZSL (GZSL) setting for SBIR.
Abstract: Conventional approaches to Sketch-Based Image Retrieval (SBIR) assume that the data of all classes are available during training. This assumption may not always be practical, since the data of a few classes may be unavailable, or the classes may not appear at training time. Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) relaxes this constraint and allows the algorithm to handle previously unseen classes at test time. This paper proposes a generative approach for ZS-SBIR based on a Stacked Adversarial Network (SAN) combined with the advantages of a Siamese Network (SN). While the SAN generates high-quality samples, the SN learns a better distance metric than that of a nearest-neighbor search. The capability of the generative model to synthesize image features from a sketch reduces the SBIR problem to an image-to-image retrieval problem. We evaluate the efficacy of our approach on the TU-Berlin and Sketchy databases in both the standard ZSL and generalized ZSL (GZSL) settings, and the proposed method yields a significant improvement in both.
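
As a rough illustration of the pipeline (not the paper's SAN/SN architectures), the sketch below combines the two ingredients the abstract describes: a conditional generator that synthesizes image-like features from sketch features, and a siamese branch whose learned embedding distance replaces plain nearest-neighbor ranking. All dimensions and layer choices are assumptions.

```python
import torch
import torch.nn as nn

class SketchToImageGenerator(nn.Module):
    """Toy conditional generator: sketch feature + noise -> synthetic image feature."""
    def __init__(self, sketch_dim=512, noise_dim=128, img_dim=512):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(sketch_dim + noise_dim, 1024), nn.LeakyReLU(0.2),
            nn.Linear(1024, img_dim),
        )

    def forward(self, sketch_feat):
        z = torch.randn(sketch_feat.size(0), self.noise_dim, device=sketch_feat.device)
        return self.net(torch.cat([sketch_feat, z], dim=-1))

class SiameseEmbedder(nn.Module):
    """Toy siamese branch: a learned metric instead of raw nearest-neighbor search."""
    def __init__(self, img_dim=512, emb_dim=256):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(img_dim, emb_dim), nn.ReLU(),
                                    nn.Linear(emb_dim, emb_dim))

    def distance(self, a, b):
        # Euclidean distance in the shared embedding space
        return (self.branch(a) - self.branch(b)).pow(2).sum(dim=-1).sqrt()

# Retrieval: synthesize an image-like feature from the query sketch, then rank
# gallery images by the siamese distance (an image-to-image retrieval problem).
gen, sn = SketchToImageGenerator(), SiameseEmbedder()
query = gen(torch.randn(1, 512))          # placeholder sketch feature for illustration
gallery = torch.randn(100, 512)           # placeholder gallery image features
ranks = sn.distance(query.expand(100, -1), gallery).argsort()
```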

23 citations


Book ChapterDOI
23 Aug 2020
TL;DR: The Visually-Grounded Question Encoder (VGQE) utilizes both the visual and language modalities equally while encoding the question, thereby reducing the model's dependency on language priors.
Abstract: Recent studies have shown that current VQA models rely heavily on the language priors in the training set to answer the question, irrespective of the image; e.g., they overwhelmingly answer “what sport is” with “tennis” or “what color banana” with “yellow.” This behavior restricts their use in real-world application scenarios. In this work, we propose a novel model-agnostic question encoder, the Visually-Grounded Question Encoder (VGQE), for VQA that reduces this effect. VGQE utilizes both the visual and language modalities equally while encoding the question. Hence the question representation itself receives sufficient visual grounding, which reduces the model's dependency on the language priors. We demonstrate the effect of VGQE on three recent VQA models and achieve state-of-the-art results on the bias-sensitive split of the VQAv2 dataset, VQA-CPv2. Further, unlike existing bias-reduction techniques, our approach does not drop accuracy on the standard VQAv2 benchmark; instead, it improves performance.

12 citations


Posted Content
TL;DR: This paper presents a novel approach to learning domain-adaptive knowledge in models with limited memory, endowing the model with the ability to handle both memory constraints and domain gaps, and introduces a novel cross-entropy loss that leverages pseudo labels from the teacher.
Abstract: Practical autonomous driving systems face two crucial challenges: memory constraints and domain gap issues. In this paper, we present a novel approach to learning domain-adaptive knowledge in models with limited memory, thus endowing the model with the ability to deal with both issues in a comprehensive manner. We term this "Domain Adaptive Knowledge Distillation" and address it in the context of unsupervised domain-adaptive semantic segmentation by proposing a multi-level distillation strategy to effectively distil knowledge at different levels. Further, we introduce a novel cross-entropy loss that leverages pseudo labels from the teacher. These pseudo teacher labels play a multifaceted role: (i) distilling knowledge from the teacher network to the student network, and (ii) serving as a proxy for the ground truth on target-domain images, where the problem is completely unsupervised. We introduce four paradigms for distilling domain-adaptive knowledge and carry out extensive experiments and ablation studies on real-to-real as well as synthetic-to-real scenarios. Our experiments demonstrate the profound success of the proposed method.
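
A minimal sketch of the pseudo-label idea, assuming a standard segmentation setup: the teacher's per-pixel argmax on unlabeled target images serves as a proxy ground truth for a cross-entropy loss on the student, with low-confidence pixels masked out. This is not the paper's exact loss; the confidence threshold and shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def pseudo_label_ce_loss(student_logits, teacher_logits, conf_thresh=0.9):
    """Cross-entropy driven by pseudo teacher labels (illustrative, not the
    paper's formulation). Inputs: (B, C, H, W) segmentation logits."""
    with torch.no_grad():
        probs = teacher_logits.softmax(dim=1)   # teacher's per-pixel beliefs
        conf, pseudo = probs.max(dim=1)         # (B, H, W) confidence and label
        pseudo[conf < conf_thresh] = -1         # mask out unreliable pixels
    return F.cross_entropy(student_logits, pseudo, ignore_index=-1)

# Usage on target-domain images (threshold relaxed for this random-data demo):
student_logits = torch.randn(2, 19, 64, 64, requires_grad=True)  # e.g. 19 Cityscapes classes
teacher_logits = torch.randn(2, 19, 64, 64)
loss = pseudo_label_ce_loss(student_logits, teacher_logits, conf_thresh=0.0)
loss.backward()
```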

10 citations


Proceedings ArticleDOI
01 Mar 2020
TL;DR: This work proposes a wide and deep network (WDN) based on a divide-and-conquer approach that divides the 4× upsampling problem into 32 disjoint subproblems, each of which can be solved simultaneously and independently of the others.
Abstract: Divide and conquer is a well-established approach in the literature that has efficiently solved a variety of problems. However, it is yet to be fully explored in solving image super-resolution. To predict a sharp upsampled image, this work proposes a wide and deep network (WDN) based on a divide-and-conquer approach that divides the 4× upsampling problem into 32 disjoint subproblems that can be solved simultaneously and independently of each other. Half of these subproblems deal with predicting the overall features of the high-resolution image, while the remaining ones exclusively predict the finer details. Additionally, a more effective technique for calibrating pixel intensities is proposed. Results obtained on multiple datasets demonstrate the improved performance of the proposed wide and deep network over state-of-the-art methods.
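
One plausible reading of the 32 subproblems (an assumption, not confirmed by the abstract): 4× upsampling means each low-resolution pixel must yield a 4×4 block, i.e. 16 sub-pixel positions; one group of 16 predictors for overall features plus another 16 for finer details gives 32 independent subproblems. The sketch below, which is not the actual WDN, illustrates this with two grouped convolutions whose summed outputs are rearranged by a pixel shuffle; all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class DivideAndConquerUpsampler(nn.Module):
    """Illustrative divide-and-conquer 4x upsampler, not the paper's WDN.
    Two parallel groups of 16 independent predictors (grouped convolutions)
    produce the 16 sub-pixel planes, which a pixel shuffle rearranges."""

    def __init__(self, channels=64):
        super().__init__()
        self.stem = nn.Conv2d(1, 16 * channels, kernel_size=3, padding=1)
        # groups=16 makes each of the 16 output planes an independent branch
        self.coarse = nn.Conv2d(16 * channels, 16, kernel_size=3, padding=1, groups=16)
        self.fine = nn.Conv2d(16 * channels, 16, kernel_size=3, padding=1, groups=16)
        self.shuffle = nn.PixelShuffle(4)   # (B, 16, H, W) -> (B, 1, 4H, 4W)

    def forward(self, lr):                  # lr: (B, 1, H, W), grayscale for simplicity
        feats = torch.relu(self.stem(lr))
        hr_planes = self.coarse(feats) + self.fine(feats)  # fuse coarse + fine planes
        return self.shuffle(hr_planes)                     # (B, 1, 4H, 4W)

sr = DivideAndConquerUpsampler()
print(sr(torch.randn(2, 1, 24, 24)).shape)  # torch.Size([2, 1, 96, 96])
```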

9 citations


Journal ArticleDOI
TL;DR: A multi-stage neural network architecture, ‘HFR-Net’, is proposed that works on the principle of ‘explicit refinement and fusion of high-frequency details’ and gives better results than current state-of-the-art techniques.

9 citations


Posted Content
TL;DR: This work divides the problem of image super-resolution into multiple subproblems and then solves (conquers) them with a neural network, designing an alternative architecture that is much wider (as well as deeper) than existing networks and specially suited to the divide-and-conquer design paradigm.
Abstract: Divide and conquer is an established algorithm-design paradigm that has proven itself in solving a variety of problems efficiently. However, it is yet to be fully explored in solving problems with a neural network, particularly the problem of image super-resolution. In this work, we propose an approach that divides the problem of image super-resolution into multiple subproblems and then solves (conquers) them with the help of a neural network. Unlike a typical deep neural network, we design an alternative network architecture that is much wider (as well as deeper) than existing networks and is specially designed to implement the divide-and-conquer design paradigm with a neural network. Additionally, a technique to calibrate the intensities of feature-map pixels is introduced. Extensive experimentation on five datasets reveals that our approach and the proposed architecture generate better and sharper results than current state-of-the-art methods.

4 citations


Proceedings ArticleDOI
01 Mar 2020
TL;DR: An upsampling network architecture, ‘HFR-Net’, that works on the principle of ‘explicit refinement and fusion of high-frequency details’ is proposed, along with a novel 2-phase progressive-retrogressive training technique to implement this principle.
Abstract: A video super-resolution technique is expected to generate a ‘sharp’ upsampled video. The sharpness in the generated video comes from the precise prediction of high-frequency details (e.g., object edges). Thus, high-frequency prediction becomes a vital subproblem of the super-resolution task. To generate a sharp upsampled video, this paper proposes an upsampling network architecture, ‘HFR-Net’, that works on the principle of ‘explicit refinement and fusion of high-frequency details’. To implement this principle and to train HFR-Net, a novel technique named 2-phase progressive-retrogressive training is proposed. Additionally, a method called dual motion warping is introduced to preprocess videos that have varying motion intensities (slow and fast). Results on multiple video datasets demonstrate the improved performance of our approach over the current state-of-the-art.
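
A hedged sketch of ‘explicit refinement and fusion of high-frequency details’ on a single frame (not HFR-Net itself, which also handles temporal information via dual motion warping): a crude low-pass filter splits the input into a base and a high-frequency residual, the residual gets its own refinement branch, and the two are fused after upsampling. Layer sizes and the 4× scale are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HighFreqRefineFuse(nn.Module):
    """Illustrative split-refine-fuse of high-frequency detail for one frame."""
    def __init__(self, scale=4):
        super().__init__()
        self.scale = scale
        self.refine_hf = nn.Sequential(          # dedicated high-frequency branch
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, frame):                    # frame: (B, 1, H, W)
        base = F.avg_pool2d(frame, 3, stride=1, padding=1)  # crude low-pass base
        high = frame - base                                 # high-frequency residual
        up = lambda x: F.interpolate(x, scale_factor=self.scale,
                                     mode='bilinear', align_corners=False)
        return up(base) + self.refine_hf(up(high))          # explicit fusion

net = HighFreqRefineFuse()
print(net(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 1, 128, 128])
```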

4 citations


Posted Content
Abstract: Conventional approaches to Sketch-Based Image Retrieval (SBIR) assume that the data of all classes are available during training. This assumption may not always be practical, since the data of a few classes may be unavailable, or the classes may not appear at training time. Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) relaxes this constraint and allows the algorithm to handle previously unseen classes at test time. This paper proposes a generative approach for ZS-SBIR based on a Stacked Adversarial Network (SAN) combined with the advantages of a Siamese Network (SN). While the SAN generates high-quality samples, the SN learns a better distance metric than that of a nearest-neighbor search. The capability of the generative model to synthesize image features from a sketch reduces the SBIR problem to an image-to-image retrieval problem. We evaluate the efficacy of our approach on the TU-Berlin and Sketchy databases in both the standard ZSL and generalized ZSL (GZSL) settings, and the proposed method yields a significant improvement in both.

2 citations


Posted Content
TL;DR: This work presents an approach to learning domain-adaptive knowledge in models with limited memory, endowing the model with the ability to handle both memory constraints and domain gaps, and introduces a cross-entropy loss that leverages pseudo labels from the teacher.
Abstract: Practical autonomous driving systems face two crucial challenges: memory constraints and domain gap issues. We present an approach to learning domain-adaptive knowledge in models with limited memory, thus endowing the model with the ability to deal with both issues in a comprehensive manner. We delve into this in the context of unsupervised domain-adaptive semantic segmentation and propose a multi-level distillation strategy to effectively distil knowledge at different levels. Further, we introduce a cross-entropy loss that leverages pseudo labels from the teacher. These pseudo teacher labels play a multifaceted role: (i) distilling knowledge from the teacher network to the student network, and (ii) serving as a proxy for the ground truth on target-domain images, where the problem is completely unsupervised. We introduce four paradigms for distilling domain-adaptive knowledge and carry out extensive experiments and ablation studies on real-to-real and synthetic-to-real scenarios. Our experiments demonstrate the profound success of the proposed method.

2 citations


Posted Content
TL;DR: The proposed Multi-Abstraction Refinement Network (MARNet) ensures an effective exchange of information between multi-level features to gain local and global contextual cues while preserving them until the final layer.
Abstract: Representation learning from 3D point clouds is challenging due to their inherent permutation invariance and irregular distribution in space. Existing deep learning methods follow a hierarchical feature-extraction paradigm in which high-level abstract features are derived from low-level features. However, they fail to exploit different granularities of information due to the limited interaction between these features. To this end, we propose the Multi-Abstraction Refinement Network (MARNet), which ensures an effective exchange of information between multi-level features to gain local and global contextual cues while preserving them until the final layer. We empirically show the effectiveness of MARNet through state-of-the-art results on two challenging tasks: shape classification and coarse-to-fine-grained semantic segmentation. MARNet improves classification performance by 2% over the baseline and outperforms state-of-the-art methods on the semantic segmentation task.
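
As a toy illustration of the multi-abstraction exchange (not MARNet itself): a low-level per-point feature and a high-level global feature refine each other through cross projections, and both granularities are concatenated at the final layer so neither is lost. Dimensions and the 40-class head are assumptions; max pooling keeps the sketch permutation invariant, matching the point-cloud setting.

```python
import torch
import torch.nn as nn

class AbstractionExchange(nn.Module):
    """Illustrative two-way exchange between a local (per-point) feature and a
    global feature, with both preserved until the final layer. Not MARNet."""

    def __init__(self, low_dim=64, high_dim=256, num_classes=40):
        super().__init__()
        self.low_to_high = nn.Linear(low_dim, high_dim)   # inject local cues upward
        self.high_to_low = nn.Linear(high_dim, low_dim)   # inject global context downward
        self.head = nn.Linear(low_dim + high_dim, num_classes)

    def forward(self, f_low, f_high):
        # f_low: (B, N, 64) per-point features; f_high: (B, 256) global feature
        f_high = f_high + self.low_to_high(f_low).max(dim=1).values  # local -> global
        f_low = f_low + self.high_to_low(f_high).unsqueeze(1)        # global -> local
        fused = torch.cat([f_low.max(dim=1).values, f_high], dim=-1) # keep both granularities
        return self.head(fused)

net = AbstractionExchange()
logits = net(torch.randn(8, 1024, 64), torch.randn(8, 256))
print(logits.shape)  # torch.Size([8, 40])
```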