Papers by Anurag Mittal published in 2020


Posted Content
TL;DR: This work proposes a novel model-agnostic question encoder, the Visually-Grounded Question Encoder (VGQE), for VQA that reduces the model's dependency on language priors and achieves state-of-the-art results on the bias-sensitive split of the VQAv2 dataset.
Abstract: Recent studies have shown that current VQA models rely heavily on the language priors in the training set to answer the question, irrespective of the image; e.g., they overwhelmingly answer "what sport is" with "tennis" or "what color banana" with "yellow." This behavior restricts their use in real-world application scenarios. In this work, we propose a novel model-agnostic question encoder, the Visually-Grounded Question Encoder (VGQE), for VQA that reduces this effect. VGQE utilizes both the visual and language modalities equally while encoding the question. Hence the question representation itself receives sufficient visual grounding, which reduces the model's dependency on the language priors. We demonstrate the effect of VGQE on three recent VQA models and achieve state-of-the-art results on the bias-sensitive split of the VQAv2 dataset, VQA-CPv2. Further, unlike existing bias-reduction techniques, our approach does not drop accuracy on the standard VQAv2 benchmark; instead, it improves performance.
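
To make the idea concrete, here is a minimal, illustrative sketch (in PyTorch) of a visually grounded question encoder: at every encoding step, an attended visual feature is fused with the current word embedding before the recurrent update, so the two modalities contribute equally. This is not the authors' VGQE implementation; all module names and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class VisuallyGroundedQuestionEncoder(nn.Module):
    """Illustrative sketch only: fuse visual context into every question-encoding
    step, so the question representation is visually grounded from the start.
    Not the authors' VGQE; sizes and modules are assumed for illustration."""

    def __init__(self, vocab_size=10000, emb_dim=300, vis_dim=2048, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.vis_proj = nn.Linear(vis_dim, emb_dim)   # project image region features
        self.attn = nn.Linear(emb_dim, 1)             # word-conditioned attention (simplified)
        self.gru = nn.GRUCell(2 * emb_dim, hid_dim)   # consumes [word; visual] each step

    def forward(self, question_tokens, image_feats):
        # question_tokens: (B, T) word indices; image_feats: (B, K, vis_dim) region features
        B, T = question_tokens.shape
        v = self.vis_proj(image_feats)                # (B, K, emb_dim)
        h = question_tokens.new_zeros(B, self.gru.hidden_size, dtype=torch.float)
        for t in range(T):
            w = self.embed(question_tokens[:, t])     # (B, emb_dim) current word
            # attend over image regions conditioned on the current word
            scores = self.attn(torch.tanh(v + w.unsqueeze(1)))  # (B, K, 1)
            alpha = torch.softmax(scores, dim=1)
            v_t = (alpha * v).sum(dim=1)              # (B, emb_dim) attended visual feature
            h = self.gru(torch.cat([w, v_t], dim=-1), h)  # visually grounded update
        return h                                      # visually grounded question encoding
```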

26 citations


Proceedings ArticleDOI
01 Mar 2020
TL;DR: A generative approach for ZS-SBIR based on a Stacked Adversarial Network (SAN) combined with the advantages of a Siamese Network (SN) is proposed, yielding a significant improvement in both the standard ZSL setting and the more challenging generalized ZSL (GZSL) setting for SBIR.
Abstract: Conventional approaches to Sketch-Based Image Retrieval (SBIR) assume that the data of all classes are available during training. This assumption may not always be practical, since the data of a few classes may be unavailable, or the classes may not appear at training time. Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) relaxes this constraint and allows the algorithm to handle previously unseen classes at test time. This paper proposes a generative approach for ZS-SBIR based on a Stacked Adversarial Network (SAN) combined with the advantages of a Siamese Network (SN). While the SAN generates high-quality samples, the SN learns a better distance metric than that of a nearest-neighbor search. The capability of the generative model to synthesize image features from a sketch reduces the SBIR problem to an image-to-image retrieval problem. We evaluate the efficacy of our approach on the TU-Berlin and Sketchy databases in both the standard ZSL and generalized ZSL (GZSL) settings, and the proposed method yields a significant improvement in both.
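
As a rough illustration of the pipeline (not the paper's SAN/SN architectures), the sketch below combines the two ingredients the abstract describes: a conditional generator that synthesizes image-like features from sketch features, and a siamese branch whose learned embedding distance replaces plain nearest-neighbor ranking. All dimensions and layer choices are assumptions.

```python
import torch
import torch.nn as nn

class SketchToImageGenerator(nn.Module):
    """Toy conditional generator: sketch feature + noise -> synthetic image feature."""
    def __init__(self, sketch_dim=512, noise_dim=128, img_dim=512):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(sketch_dim + noise_dim, 1024), nn.LeakyReLU(0.2),
            nn.Linear(1024, img_dim),
        )

    def forward(self, sketch_feat):
        z = torch.randn(sketch_feat.size(0), self.noise_dim, device=sketch_feat.device)
        return self.net(torch.cat([sketch_feat, z], dim=-1))

class SiameseEmbedder(nn.Module):
    """Toy siamese branch: a learned metric instead of raw nearest-neighbor search."""
    def __init__(self, img_dim=512, emb_dim=256):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(img_dim, emb_dim), nn.ReLU(),
                                    nn.Linear(emb_dim, emb_dim))

    def distance(self, a, b):
        # Euclidean distance in the shared embedding space
        return (self.branch(a) - self.branch(b)).pow(2).sum(dim=-1).sqrt()

# Retrieval: synthesize an image-like feature from the query sketch, then rank
# gallery images by the siamese distance (an image-to-image retrieval problem).
gen, sn = SketchToImageGenerator(), SiameseEmbedder()
query = gen(torch.randn(1, 512))          # placeholder sketch feature for illustration
gallery = torch.randn(100, 512)           # placeholder gallery image features
ranks = sn.distance(query.expand(100, -1), gallery).argsort()
```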

23 citations


Book ChapterDOI
23 Aug 2020
TL;DR: The Visually-Grounded Question Encoder (VGQE) utilizes both the visual and language modalities equally while encoding the question, thereby reducing the model's dependency on language priors.
Abstract: Recent studies have shown that current VQA models rely heavily on the language priors in the training set to answer the question, irrespective of the image; e.g., they overwhelmingly answer “what sport is” with “tennis” or “what color banana” with “yellow.” This behavior restricts their use in real-world application scenarios. In this work, we propose a novel model-agnostic question encoder, the Visually-Grounded Question Encoder (VGQE), for VQA that reduces this effect. VGQE utilizes both the visual and language modalities equally while encoding the question. Hence the question representation itself receives sufficient visual grounding, which reduces the model's dependency on the language priors. We demonstrate the effect of VGQE on three recent VQA models and achieve state-of-the-art results on the bias-sensitive split of the VQAv2 dataset, VQA-CPv2. Further, unlike existing bias-reduction techniques, our approach does not drop accuracy on the standard VQAv2 benchmark; instead, it improves performance.

12 citations


Posted Content
TL;DR: This paper presents a novel approach to learning domain-adaptive knowledge in models with limited memory, endowing the model with the ability to handle both memory constraints and domain gaps, and introduces a novel cross-entropy loss that leverages pseudo labels from the teacher.
Abstract: Practical autonomous driving systems face two crucial challenges: memory constraints and domain gap issues. In this paper, we present a novel approach to learning domain-adaptive knowledge in models with limited memory, thus endowing the model with the ability to deal with both issues in a comprehensive manner. We term this "Domain Adaptive Knowledge Distillation" and address it in the context of unsupervised domain-adaptive semantic segmentation by proposing a multi-level distillation strategy to effectively distil knowledge at different levels. Further, we introduce a novel cross-entropy loss that leverages pseudo labels from the teacher. These pseudo teacher labels play a multifaceted role: (i) distilling knowledge from the teacher network to the student network, and (ii) serving as a proxy for the ground truth on target-domain images, where the problem is completely unsupervised. We introduce four paradigms for distilling domain-adaptive knowledge and carry out extensive experiments and ablation studies on real-to-real as well as synthetic-to-real scenarios. Our experiments demonstrate the profound success of the proposed method.
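
A minimal sketch of the pseudo-label idea, assuming a standard segmentation setup: the teacher's per-pixel argmax on unlabeled target images serves as a proxy ground truth for a cross-entropy loss on the student, with low-confidence pixels masked out. This is not the paper's exact loss; the confidence threshold and shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def pseudo_label_ce_loss(student_logits, teacher_logits, conf_thresh=0.9):
    """Cross-entropy driven by pseudo teacher labels (illustrative, not the
    paper's formulation). Inputs: (B, C, H, W) segmentation logits."""
    with torch.no_grad():
        probs = teacher_logits.softmax(dim=1)   # teacher's per-pixel beliefs
        conf, pseudo = probs.max(dim=1)         # (B, H, W) confidence and label
        pseudo[conf < conf_thresh] = -1         # mask out unreliable pixels
    return F.cross_entropy(student_logits, pseudo, ignore_index=-1)

# Usage on target-domain images (threshold relaxed for this random-data demo):
student_logits = torch.randn(2, 19, 64, 64, requires_grad=True)  # e.g. 19 Cityscapes classes
teacher_logits = torch.randn(2, 19, 64, 64)
loss = pseudo_label_ce_loss(student_logits, teacher_logits, conf_thresh=0.0)
loss.backward()
```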

10 citations


Proceedings ArticleDOI
01 Mar 2020
TL;DR: This work proposes a wide and deep network (WDN) based on a divide-and-conquer approach that divides the 4× upsampling problem into 32 disjoint subproblems, each of which can be solved simultaneously and independently of the others.
Abstract: Divide and conquer is a well-established approach in the literature that has efficiently solved a variety of problems. However, it is yet to be fully explored in solving image super-resolution. To predict a sharp upsampled image, this work proposes a wide and deep network (WDN) based on a divide-and-conquer approach that divides the 4× upsampling problem into 32 disjoint subproblems that can be solved simultaneously and independently of each other. Half of these subproblems deal with predicting the overall features of the high-resolution image, while the remaining ones exclusively predict the finer details. Additionally, a more effective technique for calibrating pixel intensities is proposed. Results obtained on multiple datasets demonstrate the improved performance of the proposed wide and deep network over state-of-the-art methods.
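
One plausible reading of the 32 subproblems (an assumption, not confirmed by the abstract): 4× upsampling means each low-resolution pixel must yield a 4×4 block, i.e. 16 sub-pixel positions; one group of 16 predictors for overall features plus another 16 for finer details gives 32 independent subproblems. The sketch below, which is not the actual WDN, illustrates this with two grouped convolutions whose summed outputs are rearranged by a pixel shuffle; all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class DivideAndConquerUpsampler(nn.Module):
    """Illustrative divide-and-conquer 4x upsampler, not the paper's WDN.
    Two parallel groups of 16 independent predictors (grouped convolutions)
    produce the 16 sub-pixel planes, which a pixel shuffle rearranges."""

    def __init__(self, channels=64):
        super().__init__()
        self.stem = nn.Conv2d(1, 16 * channels, kernel_size=3, padding=1)
        # groups=16 makes each of the 16 output planes an independent branch
        self.coarse = nn.Conv2d(16 * channels, 16, kernel_size=3, padding=1, groups=16)
        self.fine = nn.Conv2d(16 * channels, 16, kernel_size=3, padding=1, groups=16)
        self.shuffle = nn.PixelShuffle(4)   # (B, 16, H, W) -> (B, 1, 4H, 4W)

    def forward(self, lr):                  # lr: (B, 1, H, W), grayscale for simplicity
        feats = torch.relu(self.stem(lr))
        hr_planes = self.coarse(feats) + self.fine(feats)  # fuse coarse + fine planes
        return self.shuffle(hr_planes)                     # (B, 1, 4H, 4W)

sr = DivideAndConquerUpsampler()
print(sr(torch.randn(2, 1, 24, 24)).shape)  # torch.Size([2, 1, 96, 96])
```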

9 citations


Journal ArticleDOI
TL;DR: A multi-stage neural network architecture, ‘HFR-Net’, is proposed that works on the principle of ‘explicit refinement and fusion of high-frequency details’ and gives better results than current state-of-the-art techniques.

9 citations


Posted Content
TL;DR: This work divides the problem of image super-resolution into multiple subproblems and then solves (conquers) them with a neural network, designing an alternative architecture that is much wider (as well as deeper) than existing networks and specially suited to the divide-and-conquer design paradigm.
Abstract: Divide and conquer is an established algorithm-design paradigm that has proven itself in solving a variety of problems efficiently. However, it is yet to be fully explored in solving problems with a neural network, particularly the problem of image super-resolution. In this work, we propose an approach that divides the problem of image super-resolution into multiple subproblems and then solves (conquers) them with the help of a neural network. Unlike a typical deep neural network, we design an alternative network architecture that is much wider (as well as deeper) than existing networks and is specially designed to implement the divide-and-conquer design paradigm with a neural network. Additionally, a technique to calibrate the intensities of feature-map pixels is introduced. Extensive experimentation on five datasets reveals that our approach and the proposed architecture generate better and sharper results than current state-of-the-art methods.

4 citations


Proceedings ArticleDOI
01 Mar 2020
TL;DR: An upsampling network architecture, ‘HFR-Net’, that works on the principle of ‘explicit refinement and fusion of high-frequency details’ is proposed, along with a novel 2-phase progressive-retrogressive training technique to implement this principle.
Abstract: A video super-resolution technique is expected to generate a ‘sharp’ upsampled video. The sharpness in the generated video comes from the precise prediction of high-frequency details (e.g., object edges). Thus, high-frequency prediction becomes a vital subproblem of the super-resolution task. To generate a sharp upsampled video, this paper proposes an upsampling network architecture, ‘HFR-Net’, that works on the principle of ‘explicit refinement and fusion of high-frequency details’. To implement this principle and to train HFR-Net, a novel technique named 2-phase progressive-retrogressive training is proposed. Additionally, a method called dual motion warping is introduced to preprocess videos that have varying motion intensities (slow and fast). Results on multiple video datasets demonstrate the improved performance of our approach over the current state-of-the-art.
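
A hedged sketch of ‘explicit refinement and fusion of high-frequency details’ on a single frame (not HFR-Net itself, which also handles temporal information via dual motion warping): a crude low-pass filter splits the input into a base and a high-frequency residual, the residual gets its own refinement branch, and the two are fused after upsampling. Layer sizes and the 4× scale are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HighFreqRefineFuse(nn.Module):
    """Illustrative split-refine-fuse of high-frequency detail for one frame."""
    def __init__(self, scale=4):
        super().__init__()
        self.scale = scale
        self.refine_hf = nn.Sequential(          # dedicated high-frequency branch
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, frame):                    # frame: (B, 1, H, W)
        base = F.avg_pool2d(frame, 3, stride=1, padding=1)  # crude low-pass base
        high = frame - base                                 # high-frequency residual
        up = lambda x: F.interpolate(x, scale_factor=self.scale,
                                     mode='bilinear', align_corners=False)
        return up(base) + self.refine_hf(up(high))          # explicit fusion

net = HighFreqRefineFuse()
print(net(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 1, 128, 128])
```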

4 citations


Posted Content
Abstract: Conventional approaches to Sketch-Based Image Retrieval (SBIR) assume that the data of all classes are available during training. This assumption may not always be practical, since the data of a few classes may be unavailable, or the classes may not appear at training time. Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) relaxes this constraint and allows the algorithm to handle previously unseen classes at test time. This paper proposes a generative approach for ZS-SBIR based on a Stacked Adversarial Network (SAN) combined with the advantages of a Siamese Network (SN). While the SAN generates high-quality samples, the SN learns a better distance metric than that of a nearest-neighbor search. The capability of the generative model to synthesize image features from a sketch reduces the SBIR problem to an image-to-image retrieval problem. We evaluate the efficacy of our approach on the TU-Berlin and Sketchy databases in both the standard ZSL and generalized ZSL (GZSL) settings, and the proposed method yields a significant improvement in both.

2 citations


Posted Content
TL;DR: This work presents an approach to learning domain-adaptive knowledge in models with limited memory, endowing the model with the ability to handle both memory constraints and domain gaps, and introduces a cross-entropy loss that leverages pseudo labels from the teacher.
Abstract: Practical autonomous driving systems face two crucial challenges: memory constraints and domain gap issues. We present an approach to learning domain-adaptive knowledge in models with limited memory, thus endowing the model with the ability to deal with both issues in a comprehensive manner. We delve into this in the context of unsupervised domain-adaptive semantic segmentation and propose a multi-level distillation strategy to effectively distil knowledge at different levels. Further, we introduce a cross-entropy loss that leverages pseudo labels from the teacher. These pseudo teacher labels play a multifaceted role: (i) distilling knowledge from the teacher network to the student network, and (ii) serving as a proxy for the ground truth on target-domain images, where the problem is completely unsupervised. We introduce four paradigms for distilling domain-adaptive knowledge and carry out extensive experiments and ablation studies on real-to-real and synthetic-to-real scenarios. Our experiments demonstrate the profound success of the proposed method.

2 citations


Posted Content
TL;DR: The proposed Multi-Abstraction Refinement Network (MARNet) ensures an effective exchange of information between multi-level features to gain local and global contextual cues while preserving them until the final layer.
Abstract: Representation learning from 3D point clouds is challenging due to their inherent permutation invariance and irregular distribution in space. Existing deep learning methods follow a hierarchical feature-extraction paradigm in which high-level abstract features are derived from low-level features. However, they fail to exploit different granularities of information due to the limited interaction between these features. To this end, we propose the Multi-Abstraction Refinement Network (MARNet), which ensures an effective exchange of information between multi-level features to gain local and global contextual cues while preserving them until the final layer. We empirically show the effectiveness of MARNet through state-of-the-art results on two challenging tasks: shape classification and coarse-to-fine-grained semantic segmentation. MARNet improves classification performance by 2% over the baseline and outperforms state-of-the-art methods on the semantic segmentation task.
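
As a toy illustration of the multi-abstraction exchange (not MARNet itself): a low-level per-point feature and a high-level global feature refine each other through cross projections, and both granularities are concatenated at the final layer so neither is lost. Dimensions and the 40-class head are assumptions; max pooling keeps the sketch permutation invariant, matching the point-cloud setting.

```python
import torch
import torch.nn as nn

class AbstractionExchange(nn.Module):
    """Illustrative two-way exchange between a local (per-point) feature and a
    global feature, with both preserved until the final layer. Not MARNet."""

    def __init__(self, low_dim=64, high_dim=256, num_classes=40):
        super().__init__()
        self.low_to_high = nn.Linear(low_dim, high_dim)   # inject local cues upward
        self.high_to_low = nn.Linear(high_dim, low_dim)   # inject global context downward
        self.head = nn.Linear(low_dim + high_dim, num_classes)

    def forward(self, f_low, f_high):
        # f_low: (B, N, 64) per-point features; f_high: (B, 256) global feature
        f_high = f_high + self.low_to_high(f_low).max(dim=1).values  # local -> global
        f_low = f_low + self.high_to_low(f_high).unsqueeze(1)        # global -> local
        fused = torch.cat([f_low.max(dim=1).values, f_high], dim=-1) # keep both granularities
        return self.head(fused)

net = AbstractionExchange()
logits = net(torch.randn(8, 1024, 64), torch.randn(8, 256))
print(logits.shape)  # torch.Size([8, 40])
```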