Author

Saroj Kaushik

Bio: Saroj Kaushik is an academic researcher from Indian Institute of Technology Delhi. The author has contributed to research in topics: Location-based service & Context (language use). The author has an h-index of 13 and has co-authored 64 publications receiving 402 citations. Previous affiliations of Saroj Kaushik include Indian Institutes of Technology & Shiv Nadar University.


Papers
Proceedings Article
06 Jan 2007
TL;DR: This paper presents a technique to automatically select the answer templates corresponding to a customer query email. Given a set of query-response email pairs, it finds the associations between the actual questions and answers within them and uses this information to map future questions to their answer templates.
Abstract: Contact center agents typically respond to email queries from customers by selecting predefined answer templates that relate to the questions present in the customer query. In this paper we present a technique to automatically select the answer templates corresponding to a customer query email. Given a set of query-response email pairs, we find the associations between the actual questions and answers within them and use this information to map future questions to their answer templates. We evaluate the system on a small subset of the publicly available Pine-Info discussion list email archive and also on actual contact center data comprising customer queries, agent responses and templates.

48 citations

Book Chapter
26 Mar 2018
TL;DR: T-PAN as discussed by the authors uses a two-phase approach to detect the stance of the text content with respect to a given topic: in the first phase, the subjectivity of a given tweet is classified, and in the second phase, sentiment of the subjective tweets is classified by whether a given subjective tweet has a favor or against stance towards the topic.
Abstract: The topical stance detection problem addresses detecting the stance of the text content with respect to a given topic: whether the sentiment of the given text content is in favor of (positive), is against (negative), or is none (neutral) towards the given topic. Using the concept of attention, we develop a two-phase solution. In the first phase, we classify subjectivity: whether a given tweet is neutral or subjective with respect to the given topic. In the second phase, we classify the sentiment of the subjective tweets (ignoring the neutral tweets): whether a given subjective tweet has a favor or against stance towards the topic. We propose a Long Short-Term Memory (LSTM) based deep neural network for each phase, and embed attention at each of the phases. On the SemEval 2016 stance detection Twitter task dataset [7], we obtain a best-case macro F-score of 68.84% and a best-case accuracy of 60.2%, outperforming the existing deep learning based solutions. Our framework, T-PAN, is the first in the topical stance detection literature that uses deep learning within a two-phase architecture.

48 citations

Proceedings Article
01 Nov 2017
TL;DR: This paper addresses the problem of detecting the stance of given tweets, with respect to given topics, from user-generated text (tweets), using the SemEval 2016 stance detection task dataset and develops a two-phase feature-driven model.
Abstract: The problem of stance detection from Twitter tweets has recently gained significant research attention. This paper addresses the problem of detecting the stance of given tweets, with respect to given topics, from user-generated text (tweets). We use the SemEval 2016 stance detection task dataset. The labels comprise positive, negative and neutral stances, with respect to given topics. We develop a two-phase feature-driven model. First, the tweets are classified as neutral vs. non-neutral. Next, non-neutral tweets are classified as positive vs. negative. The first phase of our work draws inspiration from the subjectivity classification literature and the second phase from the sentiment classification literature. We propose the use of two novel features, which, along with our streamlined approach, play a key role in deriving the strong results that we obtain. We use traditional support vector machine (SVM) based machine learning. Our system (F-score: 74.44 for SemEval 2016 Task A and 61.57 for Task B) significantly outperforms the state of the art (F-score: 68.98 for Task A and 56.28 for Task B). While the performance of the system on Task A shows the effectiveness of our model for targets on which the model was trained, the performance of the system on Task B shows the generalization that our model achieves. The stance detection problem in Twitter is applicable to user opinion mining applications and to other real-life social influence and information flow modeling applications.

32 citations
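
The two-phase scheme described in the abstract above (neutral vs. non-neutral, then positive vs. negative) can be sketched as a simple cascade. This is a minimal illustration only: the keyword-based `is_subjective` and `polarity` functions and their word lists are hypothetical stand-ins for the paper's trained SVM classifiers and features, which the abstract does not specify.

```python
from typing import Callable

# Hypothetical cue words; the paper's actual features are not given in the abstract.
POSITIVE_CUES = {"love", "support", "great"}
NEGATIVE_CUES = {"hate", "oppose", "terrible"}


def is_subjective(tweet: str) -> bool:
    """Phase 1 stand-in: decide neutral vs. non-neutral."""
    words = tweet.lower().split()
    return any(w in POSITIVE_CUES or w in NEGATIVE_CUES for w in words)


def polarity(tweet: str) -> str:
    """Phase 2 stand-in: decide positive vs. negative for non-neutral tweets."""
    words = tweet.lower().split()
    pos = sum(w in POSITIVE_CUES for w in words)
    neg = sum(w in NEGATIVE_CUES for w in words)
    return "positive" if pos >= neg else "negative"


def two_phase_stance(
    tweet: str,
    phase1: Callable[[str], bool] = is_subjective,
    phase2: Callable[[str], str] = polarity,
) -> str:
    """Run phase 2 only on tweets that phase 1 marks as non-neutral."""
    if not phase1(tweet):
        return "neutral"
    return phase2(tweet)


if __name__ == "__main__":
    for text in ["I support this policy", "The meeting is on Monday"]:
        print(text, "->", two_phase_stance(text))
```

In a real system, `phase1` and `phase2` would be replaced by trained classifiers; the cascade structure itself stays the same.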

Proceedings Article
01 Dec 2016
TL;DR: This paper proposes a set of features that, although well-known in the NLP literature for solving other problems, have not been explored for detecting paraphrase or semantic similarity, on noisy user-generated short-text data such as Twitter, and applies support vector machine (SVM) based learning.
Abstract: Existing systems deliver high accuracy and F1-scores for detecting paraphrase and semantic similarity on traditional clean-text corpora. For instance, on the clean-text Microsoft Paraphrase benchmark database, the existing systems attain an accuracy as high as 0.8596. However, detecting paraphrases and semantic similarity on user-generated short-text content on microblogs such as Twitter, comprising noisy and ad hoc short text, needs significant research attention. In this paper, we propose a machine learning based approach towards this. We propose a set of features that, although well-known in the NLP literature for solving other problems, have not been explored for detecting paraphrase or semantic similarity on noisy user-generated short-text data such as Twitter. We apply support vector machine (SVM) based learning. We use the benchmark Twitter paraphrase data, released as a part of SemEval 2015, for experiments. Our system delivers a paraphrase detection F1-score of 0.717 and a semantic similarity detection F1-score of 0.741, thereby significantly outperforming the existing systems, which deliver F1-scores of 0.696 and 0.724 for the two problems respectively. Our features also allow us to obtain a rank among the top-10 when trained on the Microsoft Paraphrase corpus and tested on the corresponding test data, thereby empirically establishing our approach as ubiquitous across the different paraphrase detection databases.

29 citations

Journal Article
TL;DR: This paper presents a machine learning approach for the classification of demonstrative pronouns for indirect anaphora in a Hindi corpus and suggests looking for certain patterns following the indirect anaphor and marking the demonstrative pronoun as directly or indirectly anaphoric accordingly.
Abstract: In this paper, we present a machine learning approach for the classification of indirect anaphora in a Hindi corpus. A direct anaphor has a noun phrase antecedent that can be found within the same sentence or across a few sentences. An indirect anaphor, on the other hand, does not have an explicit referent in the discourse. We suggest looking for certain patterns following the indirect anaphor and marking the demonstrative pronoun as directly or indirectly anaphoric accordingly. Our focus of study is pronouns without a noun phrase antecedent. We analyzed 177 news items comprising 1334 sentences and 780 demonstrative pronouns, of which 97 (12.44%) were indirectly anaphoric. We also carry out experiments with machine learning approaches for classifying these pronouns based on the semantic cues provided by the collocation patterns following the pronoun.

24 citations


Cited by
01 Jan 2016
TL;DR: This mathematical epidemiology of infectious diseases model building analysis and interpretation shows how people cope with some malicious virus inside their desktop computer, instead of enjoying a good book with a cup of tea in the afternoon.
Abstract: Thank you for reading mathematical epidemiology of infectious diseases model building analysis and interpretation. As you may know, people have searched hundreds of times for their favorite novels like this mathematical epidemiology of infectious diseases model building analysis and interpretation, but end up with infectious downloads. Rather than enjoying a good book with a cup of tea in the afternoon, they instead cope with some malicious virus inside their desktop computer.

466 citations

Book Chapter
02 Sep 2008
TL;DR: This work evaluated fourteen existing text similarity measures which have been used to calculate similarity score between sentences in many text applications, and found three of them to be inadequate.
Abstract: The ability to accurately judge the similarity between natural language sentences is critical to the performance of several applications such as text mining, question answering, and text summarization. Given two sentences, an effective similarity measure should be able to determine whether the sentences are semantically equivalent or not, taking into account the variability of natural language expression. That is, the correct similarity judgment should be made even if the sentences do not share similar surface form. In this work, we evaluate fourteen existing text similarity measures which have been used to calculate similarity score between sentences in many text applications. The evaluation is conducted on three different data sets: TREC9 question variants, the Microsoft Research paraphrase corpus, and the third recognizing textual entailment data set.

223 citations

Journal Article
TL;DR: This paper divides the diffusion models into two categories—explanatory models and predictive models—in which the former includes epidemics and influence models and the latter includes independent cascade, linear threshold, and game theory models.
Abstract: Online social networks (OSNs) are now a pervasive part of personal life, and more and more offline activity is moving onto them. Therefore, online social networks can reflect the structure of offline human society. A piece of information can be exchanged or diffused between individuals in social networks. From this diffusion process, a great deal of latent information can be mined. It can be used for market prediction, rumor control, and opinion monitoring, among other things. However, research on these applications depends on the diffusion models and methods. For this reason, we survey various information diffusion models from recent decades. From a research process view, we divide the diffusion models into two categories (explanatory models and predictive models), in which the former includes epidemics and influence models and the latter includes independent cascade, linear threshold, and game theory models. The purpose of this paper is to investigate the research methods and techniques, and compare them according to the above categories. The whole research structure of the information diffusion models based on our view is given. There is a discussion at the end of each section, detailing related models that are mentioned in the literature. We conclude that these two categories of models are not independent; they always complement each other. Finally, the issues of social networks research are discussed and summarized, and directions for future study are proposed.

163 citations

Journal Article
TL;DR: A survey of stance detection in social media posts and (online) regular texts is presented and it is hoped that this newly emerging topic will act as a significant resource for interested researchers and practitioners.
Abstract: Automatic elicitation of semantic information from natural language texts is an important research problem with many practical application areas. Especially after the recent proliferation of online content through channels such as social media sites, news portals, and forums, solutions to problems such as sentiment analysis, sarcasm/controversy/veracity/rumour/fake news detection, and argument mining have gained increasing impact and significance, as shown by the large volume of related scientific publications. In this article, we tackle an important problem from the same family and present a survey of stance detection in social media posts and (online) regular texts. Although stance detection is defined in different ways in different application settings, the most common definition is “automatic classification of the stance of the producer of a piece of text, towards a target, into one of these three classes: {Favor, Against, Neither}.” Our survey includes definitions of related problems and concepts, classifications of the proposed approaches so far, descriptions of the relevant datasets and tools, and related outstanding issues. Stance detection is a recent natural language processing topic with diverse application areas, and our survey article on this newly emerging topic will act as a significant resource for interested researchers and practitioners.

131 citations

Journal Article
TL;DR: This paper analyzes papers on the use of fuzzy tools in recommender systems indexed in the Thomson Reuters Web of Science database, in terms of their key features, evaluation strategies, datasets employed, and application areas, with the aim of boosting current developments in fuzzy-based recommender systems.
Abstract: Recommender systems are currently successful solutions for facilitating access for online users to the information that fits their preferences and needs in overloaded search spaces. In recent years, several methodologies have been developed to improve their performance. This paper focuses on developing a review of the use of fuzzy tools in recommender systems, to detect the most common research topics and the research gaps, in order to suggest future research lines for boosting the current developments in fuzzy-based recommender systems. Specifically, an analysis is developed of the papers focused on this aim and indexed in the Thomson Reuters Web of Science database, in terms of their key features, evaluation strategies, datasets employed, and application areas.

127 citations