Author

Ashutosh Modi

Other affiliations: Siemens, Saarland University, The Walt Disney Company
Bio: Ashutosh Modi is an academic researcher from the Indian Institute of Technology Kanpur. He has contributed to research topics including SemEval and Task (project management), has an h-index of 14, and has co-authored 72 publications receiving 721 citations. Previous affiliations of Ashutosh Modi include Siemens and Saarland University.

Papers published on a yearly basis

Papers
Proceedings ArticleDOI
01 Jun 2018
TL;DR: This report summarizes the results of the SemEval 2018 shared task on machine comprehension using commonsense knowledge; the best-performing system achieves an accuracy of 83.95%, outperforming the baselines by a large margin but still falling well short of the human upper bound of 98%.
Abstract: This report summarizes the results of the SemEval 2018 task on machine comprehension using commonsense knowledge. For this machine comprehension task, we created a new corpus, MCScript. It contains a large number of questions that require commonsense knowledge to find the correct answer. 11 teams from 4 different countries participated in this shared task, most of them using neural approaches. The best-performing system achieves an accuracy of 83.95%, outperforming the baselines by a large margin but still far from the human upper bound, which was found to be 98%.

137 citations

Proceedings ArticleDOI
04 Apr 2019
TL;DR: This paper presents an affect-driven dialog system that generates emotional responses in a controlled manner, modeling emotions at the word and sequence level through a continuous vector representation of the desired emotion.
Abstract: According to one implementation, an affect-driven dialog generation system includes a computing platform having a hardware processor and a system memory storing a software code including a sequence-to-sequence (seq2seq) architecture trained using a loss function having an affective regularizer term based on a difference in emotional content between a target dialog response and a dialog sequence determined by the seq2seq architecture during training. The hardware processor executes the software code to receive an input dialog sequence, and to use the seq2seq architecture to generate emotionally diverse dialog responses based on the input dialog sequence and a predetermined target emotion. The hardware processor further executes the software code to determine, using the seq2seq architecture, a final dialog sequence responsive to the input dialog sequence based on an emotional relevance of each of the emotionally diverse dialog responses, and to provide the final dialog sequence as an output.
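As a rough illustration of the loss described above, here is a minimal PyTorch-style sketch of a seq2seq cross-entropy objective with an affective regularizer term; the emotion vectors, their dimensionality, and the weight `lam` are illustrative assumptions, not details taken from the source.

```python
import torch
import torch.nn.functional as F

def affective_loss(logits, target_ids, pred_emotion, target_emotion, lam=0.1):
    """Seq2seq cross-entropy plus an affective regularizer that penalizes
    the difference in emotional content between the generated response
    and the target response (all names and shapes are illustrative).

    logits:         (batch, seq_len, vocab) decoder outputs
    target_ids:     (batch, seq_len) gold response token ids
    pred_emotion:   (batch, emo_dim) emotion vector of the generated response
    target_emotion: (batch, emo_dim) emotion vector of the target response
    """
    ce = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                         target_ids.reshape(-1))
    # Affective regularizer: distance between the emotional content of
    # the model's response and that of the target response.
    affect = F.mse_loss(pred_emotion, target_emotion)
    return ce + lam * affect
```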

86 citations

Proceedings ArticleDOI
01 Jun 2014
TL;DR: Rather than inducing commonsense knowledge about prototypical sequences of events in the form of graphs, this work computes distributed representations of event realizations from distributed representations of predicates and their arguments, and uses these representations to predict prototypical event orderings.
Abstract: Induction of common sense knowledge about prototypical sequences of events has recently received much attention (e.g., Chambers and Jurafsky (2008); Regneri et al. (2010)). Instead of inducing this knowledge in the form of graphs, as in much of the previous work, in our method, distributed representations of event realizations are computed based on distributed representations of predicates and their arguments, and then these representations are used to predict prototypical event orderings. The parameters of the compositional process for computing the event representations and the ranking component of the model are jointly estimated. We show that this approach results in a substantial boost in performance on the event ordering task with respect to previous approaches, both on natural and crowdsourced texts.
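The sketch below illustrates the two components the abstract describes, composing an event representation from predicate and argument embeddings and ranking event pairs by temporal order; the specific projection layers and hinge loss are stand-ins assumed for illustration, not the paper's exact parameterization.

```python
import torch
import torch.nn as nn

class EventComposer(nn.Module):
    """Compose an event embedding from predicate and argument embeddings,
    then score it for its position in a prototypical event ordering
    (earlier events should receive lower scores)."""

    def __init__(self, emb_dim, hidden_dim):
        super().__init__()
        self.pred_proj = nn.Linear(emb_dim, hidden_dim)
        self.arg_proj = nn.Linear(emb_dim, hidden_dim)
        self.scorer = nn.Linear(hidden_dim, 1)

    def forward(self, pred_emb, arg_embs):
        # pred_emb: (emb_dim,); arg_embs: (n_args, emb_dim).
        # Sum projected arguments with the projected predicate, then squash:
        # a simple compositional event representation.
        h = torch.tanh(self.pred_proj(pred_emb) + self.arg_proj(arg_embs).sum(0))
        return self.scorer(h)

def ranking_loss(score_earlier, score_later, margin=1.0):
    # Hinge loss: the earlier event should score lower than the later one,
    # so composition and ranking can be estimated jointly.
    return torch.clamp(margin + score_earlier - score_later, min=0.0).mean()
```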

82 citations

Proceedings ArticleDOI
01 Aug 2016
TL;DR: A neural network model that relies on distributed compositional representations of events; it captures statistical dependencies between events in a scenario, overcomes some shortcomings of previous approaches, and outperforms count-based counterparts on the narrative cloze task.
Abstract: Semantic scripts are a conceptual representation that defines how events are organized into higher-level activities. Practically all previous approaches to inducing script knowledge from text relied on count-based techniques (e.g., generative models) and have not attempted to compositionally model events. In this work, we introduce a neural network model which relies on distributed compositional representations of events. The model captures statistical dependencies between events in a scenario, overcomes some of the shortcomings of previous approaches (e.g., by more effectively dealing with data sparsity) and outperforms count-based counterparts on the narrative cloze task.
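To make the narrative cloze setup concrete, the sketch below scores a candidate event against the aggregated context events of a scenario; the bilinear scorer and mean pooling are assumptions chosen for brevity, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ClozeScorer(nn.Module):
    """Score how well a candidate event fits a scenario context, in the
    spirit of a distributed compositional script model: each event is a
    dense vector, and the candidate is scored against the aggregated
    context events."""

    def __init__(self, event_dim):
        super().__init__()
        self.bilinear = nn.Bilinear(event_dim, event_dim, 1)

    def forward(self, context_events, candidate_event):
        # context_events: (n_events, event_dim); candidate_event: (event_dim,)
        context = context_events.mean(0)  # aggregate the scenario context
        return self.bilinear(context, candidate_event)

def rank_candidates(scorer, context_events, candidates):
    # Narrative cloze: rank candidate events for the held-out slot.
    scores = torch.stack([scorer(context_events, c).squeeze() for c in candidates])
    return scores.argsort(descending=True)
```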

68 citations

Proceedings Article
14 Mar 2018
TL;DR: A large dataset of narrative texts and questions about these texts, intended for a machine comprehension task that requires reasoning with commonsense knowledge; the crowdsourced mode of data collection is shown to yield a substantial number of inference questions.
Abstract: We introduce a large dataset of narrative texts and questions about these texts, intended to be used in a machine comprehension task that requires reasoning using commonsense knowledge. Our dataset complements similar datasets in that we focus on stories about everyday activities, such as going to the movies or working in the garden, and in that the questions require commonsense knowledge, or more specifically, script knowledge, to be answered. We show that our mode of data collection via crowdsourcing results in a substantial number of such inference questions. The dataset forms the basis of a shared task on commonsense and script knowledge organized at SemEval 2018 and provides challenging test cases for the broader natural language understanding community.

66 citations


Cited by
Posted Content
TL;DR: This work proposes BERTScore, an automatic evaluation metric for text generation that correlates better with human judgments and provides stronger model selection performance than existing metrics.
Abstract: We propose BERTScore, an automatic evaluation metric for text generation. Analogously to common metrics, BERTScore computes a similarity score for each token in the candidate sentence with each token in the reference sentence. However, instead of exact matches, we compute token similarity using contextual embeddings. We evaluate using the outputs of 363 machine translation and image captioning systems. BERTScore correlates better with human judgments and provides stronger model selection performance than existing metrics. Finally, we use an adversarial paraphrase detection task to show that BERTScore is more robust to challenging examples when compared to existing metrics.
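The token-matching idea is straightforward to sketch with Hugging Face `transformers` contextual embeddings; the sketch below implements greedy cosine matching and the F1 combination, but omits the paper's inverse-document-frequency weighting and baseline rescaling.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(sentence):
    # Contextual token embeddings from the last layer, special tokens dropped.
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]
    return hidden[1:-1]  # strip [CLS] and [SEP]

def bert_score(candidate, reference):
    cand, ref = embed(candidate), embed(reference)
    # Pairwise cosine similarity between candidate and reference tokens.
    cand = cand / cand.norm(dim=-1, keepdim=True)
    ref = ref / ref.norm(dim=-1, keepdim=True)
    sim = cand @ ref.T
    # Greedy matching: each token is matched to its most similar counterpart.
    precision = sim.max(dim=1).values.mean()
    recall = sim.max(dim=0).values.mean()
    return 2 * precision * recall / (precision + recall)

print(bert_score("the cat sat on the mat", "a cat was sitting on the mat"))
```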

1,456 citations

Proceedings Article
30 Apr 2020
TL;DR: This article proposes BERTScore, an automatic evaluation metric for text generation that computes a similarity score for each token in the candidate sentence against each token in the reference sentence; instead of exact matches, token similarity is computed using contextual embeddings.
Abstract: We propose BERTScore, an automatic evaluation metric for text generation. Analogously to common metrics, BERTScore computes a similarity score for each token in the candidate sentence with each token in the reference sentence. However, instead of exact matches, we compute token similarity using contextual embeddings. We evaluate using the outputs of 363 machine translation and image captioning systems. BERTScore correlates better with human judgments and provides stronger model selection performance than existing metrics. Finally, we use an adversarial paraphrase detection task and show that BERTScore is more robust to challenging examples compared to existing metrics.

819 citations

Journal ArticleDOI
TL;DR: CoQA is a dataset for building Conversational Question Answering systems, containing 127k questions with answers obtained from 8k conversations about text passages from seven diverse domains; the questions are conversational, and the answers are free-form text with their corresponding evidence highlighted in the passage.
Abstract: Humans gather information through conversations involving a series of interconnected questions and answers. For machines to assist in information gathering, it is therefore essential to enable them to answer conversational questions. We introduce CoQA, a novel dataset for building Conversational Question Answering systems. Our dataset contains 127k questions with answers, obtained from 8k conversations about text passages from seven diverse domains. The questions are conversational, and the answers are free-form text with their corresponding evidence highlighted in the passage. We analyze CoQA in depth and show that conversational questions have challenging phenomena not present in existing reading comprehension datasets, e.g., coreference and pragmatic reasoning. We evaluate strong dialogue and reading comprehension models on CoQA. The best system obtains an F1 score of 65.4%, which is 23.4 points behind human performance (88.8%), indicating there is ample room for improvement. We present CoQA as a challenge to the community at https://stanfordnlp.github.io/coqa

720 citations

Proceedings ArticleDOI
01 Jun 2019
TL;DR: A new reading comprehension benchmark, DROP, which requires Discrete Reasoning Over the content of Paragraphs, together with a new model that combines reading comprehension methods with simple numerical reasoning to achieve 51% F1.
Abstract: Reading comprehension has recently seen rapid progress, with systems matching humans on the most popular datasets for the task. However, a large body of work has highlighted the brittleness of these systems, showing that there is much work left to be done. We introduce a new reading comprehension benchmark, DROP, which requires Discrete Reasoning Over the content of Paragraphs. In this crowdsourced, adversarially-created, 55k-question benchmark, a system must resolve references in a question, perhaps to multiple input positions, and perform discrete operations over them (such as addition, counting, or sorting). These operations require a much more comprehensive understanding of the content of paragraphs, as they remove the paraphrase-and-entity-typing shortcuts available in prior datasets. We apply state-of-the-art methods from both the reading comprehension and semantic parsing literatures on this dataset and show that the best systems only achieve 38.4% F1 on our generalized accuracy metric, while expert human performance is 96%. We additionally present a new model that combines reading comprehension methods with simple numerical reasoning to achieve 51% F1.
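The discrete operations DROP requires (counting, addition, comparison over numbers a system must first resolve in the paragraph) can be illustrated with a toy symbolic step; the passage and the regex-based number extraction below are invented for illustration and sidestep the hard reference-resolution part of the task.

```python
import re

# Invented example passage in the style of DROP's football-game paragraphs.
passage = ("The Broncos scored on a 21-yard field goal, then added "
           "touchdowns of 7 and 12 yards in the second quarter.")

def numbers_in(text):
    # Toy stand-in for reference resolution: a real DROP system must first
    # decide WHICH numbers in the paragraph a question refers to.
    return [int(n) for n in re.findall(r"\d+", text)]

nums = numbers_in(passage)
print("count:", len(nums))   # counting    -> 3
print("sum:", sum(nums))     # addition    -> 40
print("longest:", max(nums)) # comparison  -> 21
```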

364 citations