scispace - formally typeset
Author

Shlok Kumar Mishra

Bio: Shlok Kumar Mishra is an academic researcher from the University of Maryland, College Park. The author has contributed to research in topics: Similarity (geometry) & Support vector machine. The author has an h-index of 3 and has co-authored 8 publications receiving 20 citations. Previous affiliations of Shlok Kumar Mishra include Birla Institute of Technology, Mesra, and Johns Hopkins University.

Papers
Posted Content
TL;DR: This work proposes a novel evaluation benchmark to assess the performance of existing AQG systems for long-text answers and leverages the large-scale open-source Google Natural Questions dataset to create the aforementioned long-answer AQG benchmark.
Abstract: Automatic question generation (AQG) has broad applicability in domains such as tutoring systems, conversational agents, healthcare literacy, and information retrieval. Existing efforts at AQG have been limited to short answer lengths of up to two or three sentences. However, several real-world applications require question generation from answers that span several sentences. Therefore, we propose a novel evaluation benchmark to assess the performance of existing AQG systems for long-text answers. We leverage the large-scale open-source Google Natural Questions dataset to create the aforementioned long-answer AQG benchmark. We empirically demonstrate that the performance of existing AQG methods significantly degrades as the length of the answer increases. Transformer-based methods outperform other existing AQG methods on long answers in terms of automatic as well as human evaluation. However, we still observe degradation in the performance of our best performing models with increasing sentence length, suggesting that long answer QA is a challenging benchmark task for future research.

11 citations

Posted Content
TL;DR: A new model for joint-based action recognition is presented, which first extracts motion features from each joint separately through a shared motion encoder before performing collective reasoning, and which outperforms the existing baseline on Mimetics, a dataset with out-of-context actions.
Abstract: Most human action recognition systems typically consider static appearances and motion as independent streams of information. In this paper, we consider the evolution of human pose and propose a method to better capture interdependence among skeleton joints. Our model extracts motion information from each joint independently, reweighs the information, and finally performs inter-joint reasoning. The effectiveness of pose and joint-based representations is strengthened using a geometry-aware data augmentation technique which jitters pose heatmaps while retaining the dynamics of the action. Our best model gives an absolute improvement of 8.19% on JHMDB, 4.31% on HMDB, and 1.55 mAP on Charades over state-of-the-art methods using pose heatmaps alone. Fusing with RGB and flow streams leads to further improvement over the state of the art. Our model also outperforms the baseline by 1.14% on Mimetics, a dataset with out-of-context videos, while using only pose heatmaps. Further, to filter out clips irrelevant for action recognition, we re-purpose our model for clip selection guided by pose information and show improved performance using fewer clips.
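The geometry-aware augmentation described in the abstract can be sketched minimally. The helper below (`jitter_pose_heatmaps` is a hypothetical name, not the paper's code) applies one shared spatial offset to every frame of a clip, so appearance is perturbed while the joint dynamics of the action are preserved:

```python
import numpy as np

def jitter_pose_heatmaps(heatmaps, max_shift=4, rng=None):
    """Shift all frames' heatmaps by one shared random offset.

    heatmaps: array of shape (T, J, H, W) -- T frames, J joints.
    Using the same (dy, dx) for every frame perturbs spatial
    appearance while leaving the motion of the action intact.
    """
    rng = np.random.default_rng(rng)
    T, J, H, W = heatmaps.shape
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.zeros_like(heatmaps)
    # Destination and source windows for the shifted copy.
    ys = slice(max(dy, 0), H + min(dy, 0))
    xs = slice(max(dx, 0), W + min(dx, 0))
    ys_src = slice(max(-dy, 0), H + min(-dy, 0))
    xs_src = slice(max(-dx, 0), W + min(-dx, 0))
    out[:, :, ys, xs] = heatmaps[:, :, ys_src, xs_src]
    return out
```

Because the offset is drawn once per clip rather than per frame, frame-to-frame joint trajectories are unchanged, which is the property the augmentation relies on.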

10 citations

Journal Article
TL;DR: This paper proposes to use classic methods based on anisotropic diffusion to augment training with texture-suppressed images, addressing shortcut learning in self-supervised training, and suggests that this approach helps in learning better representations that transfer better.
Abstract: Recent works have shown that features obtained from supervised training of CNNs may over-emphasize texture rather than encoding high-level information. In self-supervised learning, in particular, texture as a low-level cue may provide shortcuts that prevent the network from learning higher-level representations. To address these problems we propose to use classic methods based on anisotropic diffusion to augment training using images with suppressed texture. This simple method helps retain important edge information and suppress texture at the same time. We report our observations for fully supervised and self-supervised learning tasks like MoCoV2 and Jigsaw and achieve state-of-the-art results on object detection and image classification with eight diverse datasets. Our method is particularly effective for transfer learning tasks and we observed improved performance on five standard transfer learning datasets. The large improvements on the Sketch-ImageNet dataset, DTD dataset and additional visual analyses of saliency maps suggest that our approach helps in learning better representations that transfer well.
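The classic anisotropic diffusion the method builds on is the Perona-Malik scheme: smoothing is suppressed across strong edges, so texture is washed out while edge structure survives. A minimal sketch (an illustrative stand-in, not the paper's implementation; boundary handling via wraparound is a simplification):

```python
import numpy as np

def anisotropic_diffusion(img, n_iter=10, kappa=30.0, step=0.2):
    """Perona-Malik diffusion: suppress texture, keep strong edges.

    img: 2-D float array. kappa controls edge sensitivity; gradients
    much larger than kappa are treated as edges and preserved.
    """
    u = img.astype(float).copy()
    for _ in range(n_iter):
        # Finite differences to the four neighbours (wraparound borders).
        dn = np.roll(u, -1, axis=0) - u
        ds = np.roll(u, 1, axis=0) - u
        de = np.roll(u, -1, axis=1) - u
        dw = np.roll(u, 1, axis=1) - u
        # Edge-stopping conductance g(|grad u|) = exp(-(|grad u|/kappa)^2).
        u += step * sum(np.exp(-(d / kappa) ** 2) * d
                        for d in (dn, ds, de, dw))
    return u
```

Fine texture produces small gradients (high conductance, strongly smoothed) while object boundaries produce large gradients (conductance near zero, preserved), which is exactly the property the augmentation exploits.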

6 citations

Book ChapterDOI
09 Dec 2015
TL;DR: This paper defined a technical domain question taxonomy containing six classes and identified a set of features suitable for the technical domain and employed the tree kernel and a level-wise matching approach to capture the parse tree similarity.
Abstract: This paper presents our attempt at developing a question classification system for the technical domain. A question classification system classifies a question into the type of answer it requires and therefore plays an important role in question answering. Although the task is quite popular in the general domain, we were unable to find any question classification system that classifies the questions of a technical subject. We defined a technical domain question taxonomy containing six classes and manually created a dataset containing 1086 questions. We then identified a set of features suitable for the technical domain. We observed that parse structure similarity plays an important role in this classification. To capture the parse tree similarity we employed the tree kernel and proposed a level-wise matching approach. Using these features and this dataset in a support vector machine classifier, we achieve 93.22% accuracy.
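The level-wise matching idea can be illustrated with a toy sketch. The representation below (nested-tuple trees, Jaccard overlap of node labels per depth level) is my simplification for illustration, not the paper's exact kernel:

```python
def levels(tree):
    """Breadth-first lists of node labels, one list per depth level.
    A tree is (label, [children...])."""
    out, frontier = [], [tree]
    while frontier:
        out.append([label for label, _ in frontier])
        frontier = [c for _, kids in frontier for c in kids]
    return out

def levelwise_similarity(t1, t2):
    """Average per-level overlap of node labels between two parse
    trees -- a simplified stand-in for level-wise tree matching."""
    l1, l2 = levels(t1), levels(t2)
    depth = max(len(l1), len(l2))
    score = 0.0
    for i in range(depth):
        a = set(l1[i]) if i < len(l1) else set()
        b = set(l2[i]) if i < len(l2) else set()
        if a or b:
            score += len(a & b) / len(a | b)  # Jaccard overlap
    return score / depth
```

Comparing trees level by level rewards questions whose parses agree near the root (overall question shape) even when the leaves differ, which is the intuition behind using parse structure as a classification feature.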

4 citations

Posted Content
TL;DR: In this article, the authors used an image decomposition network to extract albedo and normal features from real and spoof face images to distinguish face spoofs generated by a photograph of a subject from live images.
Abstract: Presentation attack detection (PAD) is a critical component in secure face authentication. We present a PAD algorithm to distinguish face spoofs generated by a photograph of a subject from live images. Our method uses an image decomposition network to extract albedo and normal maps. The domain gap between the real and spoof face images leads to easily identifiable differences, especially between the recovered albedo maps. We enhance this domain gap by retraining existing methods using a supervised contrastive loss. We present empirical and theoretical analysis demonstrating that contrast and lighting effects can play a significant role in PAD; these show up particularly in the recovered albedo. Finally, we demonstrate that by combining all of these methods we achieve state-of-the-art results on datasets such as CelebA-Spoof, OULU and CASIA-SURF.
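The retraining step uses the supervised contrastive loss of Khosla et al.: embeddings with the same label (live or spoof) are pulled together and others pushed apart, widening the domain gap. A dependency-free numpy sketch (illustrative; the exact normalisation here is mine, not the authors' code):

```python
import numpy as np

def sup_con_loss(z, labels, tau=0.1):
    """Supervised contrastive loss over a batch of embeddings.

    z: (N, d) embeddings (L2-normalised inside), labels: (N,) ints.
    Each anchor's log-probability mass is averaged over all
    same-label positives, then negated and averaged over anchors.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)        # exclude self-pairs
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) \
        & ~np.eye(len(z), dtype=bool)
    per_anchor = (np.where(pos, log_prob, 0.0).sum(axis=1)
                  / np.maximum(pos.sum(axis=1), 1))
    return -per_anchor.mean()
```

The loss is small when same-class embeddings cluster tightly and large when they are mixed with the other class, which is the separation the PAD retraining aims for.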

Cited by
Posted Content
TL;DR: In this paper, the authors propose to represent videos as space-time region graphs which capture temporal shape dynamics and functional relationships between humans and objects, and perform reasoning on this graph representation via Graph Convolutional Networks.
Abstract: How do humans recognize the action "opening a book" ? We argue that there are two important cues: modeling temporal shape dynamics and modeling functional relationships between humans and objects. In this paper, we propose to represent videos as space-time region graphs which capture these two important cues. Our graph nodes are defined by the object region proposals from different frames in a long range video. These nodes are connected by two types of relations: (i) similarity relations capturing the long range dependencies between correlated objects and (ii) spatial-temporal relations capturing the interactions between nearby objects. We perform reasoning on this graph representation via Graph Convolutional Networks. We achieve state-of-the-art results on both Charades and Something-Something datasets. Especially for Charades, we obtain a huge 4.4% gain when our model is applied in complex environments.
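The reasoning over the region graph follows the standard graph-convolution propagation rule. A numpy sketch of one such layer (hypothetical helper for illustration, not the paper's implementation):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One reasoning step over a space-time region graph.

    A: (N, N) relation weights between object-region nodes
       (e.g. similarity or spatial-temporal edges),
    H: (N, d) node features, W: (d, d_out) learned projection.
    Row-normalising A averages each node's incoming messages
    before the feature update.
    """
    A_hat = A + np.eye(A.shape[0])                    # add self-loops
    A_norm = A_hat / A_hat.sum(axis=1, keepdims=True)
    return np.maximum(A_norm @ H @ W, 0.0)            # ReLU
```

Stacking such layers lets information about one object region (say, a hand) propagate to related regions (the book it touches) across frames, which is how the graph captures human-object interaction over time.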

278 citations

Journal ArticleDOI
TL;DR: This study presents a hybrid approach combining semantic feature extraction and lexical feature extraction, which achieves a coarse accuracy of 96% and a fine accuracy of 90.4%, superior to existing methods.
Abstract: The purpose of question classification (QC) is to assign a question to an appropriate category from the set of predefined categories that constitute a question taxonomy. Selected question features are able to significantly improve the performance of QC. However, feature extraction, particularly syntax feature extraction, has a high computational cost. To maintain or enhance performance without syntax features, this study presents a hybrid approach to semantic feature extraction and lexical feature extraction. These features are generated by improved information gain and sequential pattern mining methods, respectively. Selected features are then fed into classifiers for questions classification. Benchmark testing is performed using the public UIUC data set. The results reveal that the proposed approach achieves a coarse accuracy of 96% and fine accuracy of 90.4%, which is superior to existing methods.

40 citations

Journal Article
TL;DR: The multiple‐choice question format has come to dominate large‐scale testing, and there are good reasons for its dominance, including the large number of questions, which makes it possible to test a broad range of content and provides a good sample of the test taker’s knowledge.
Abstract: The multiple‐choice question format has come to dominate large‐scale testing, and there are good reasons for its dominance. A test taker can answer a large number of multiple‐choice questions in a limited amount of testing time. The large number of questions makes it possible to test a broad range of content and provides a good sample of the test taker’s knowledge, reducing the effect of “the luck of the draw” (in the selection of questions) on the test taker’s score. The responses can be scored by machine, making the scoring process fast and inexpensive, with no room for differences of opinion.

25 citations

Journal ArticleDOI
TL;DR: A systematic review of the literature on automatic question classifiers and the technology directly involved revealed that SVM is the main algorithm of the Machine Learning used, while BOW and TF-IDF are the main techniques for feature extraction and selection, respectively.
Abstract: Question classification is a key point in many applications, such as Question Answering (QA, e.g., Yahoo! Answers), Information Retrieval (IR, e.g., the Google search engine), and E-learning systems (e.g., Bloom's taxonomy classifiers). This paper carries out a systematic review of the literature on automatic question classifiers and the technology directly involved. Automatic classifiers are responsible for labeling a given evaluation item using a type of categorization as the selection criterion. The analysis of 80 previously selected primary studies revealed that SVM is the main Machine Learning algorithm used, while BOW and TF-IDF are the main techniques for feature extraction and selection, respectively. According to the analysis, the taxonomies proposed by Li and Roth and by Bloom were the most used classification criteria, and Accuracy/Precision/Recall/F1-score were the most used metrics. In the future, the objective is to perform a meta-analysis with the studies that authorize the availability of their data.
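The BOW/TF-IDF feature scheme the review identifies as most common can be sketched with the standard library alone. The toy `tfidf_vectors` below is illustrative (a real pipeline would feed these features to an SVM, the review's most common classifier):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each document to a sparse dict of TF-IDF weights.

    TF is the term's frequency within its document; IDF
    down-weights words appearing in many documents, so a word
    occurring in every document gets weight 0 under this variant.
    """
    tokenised = [doc.lower().split() for doc in docs]
    n = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(w for toks in tokenised for w in set(toks))
    return [{w: (c / len(toks)) * math.log(n / df[w])
             for w, c in Counter(toks).items()}
            for toks in tokenised]
```

Common function words that appear in every question vanish from the representation, leaving the class-discriminative content words the classifier needs.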

18 citations