Author

Manjira Sinha

Bio: Manjira Sinha is an academic researcher from Indian Institute of Technology Kharagpur. The author has contributed to research in topics: Mental lexicon & Sentence. The author has an h-index of 8 and has co-authored 51 publications receiving 198 citations. Previous affiliations of Manjira Sinha include Indian Institutes of Technology & Accenture.

Papers published on a yearly basis

Papers
More filters
Proceedings Article
01 Dec 2012
TL;DR: This paper presents the first definitive readability models for Bangla and Hindi, incorporating the salient structural features of each language.
Abstract: In this paper we present computational models to compute the readability of Indian language text documents. We first demonstrate the inadequacy, and the consequent inapplicability, of some of the popular readability metrics for English when applied to Hindi and Bangla. Next, we present user experiments to identify important structural parameters of Bangla and Hindi that affect the readability of texts in these two languages. Accordingly, we propose two different readability models, one for each of Bangla and Hindi. The models are tested against a second round of user studies with a completely new set of data. The results validate the proposed models. Compared to the handful of existing works on Hindi and Bangla text readability, this paper presents the first definitive readability models for these languages incorporating their salient structural features.
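The modeling approach the abstract describes, relating readability to structural parameters of the text, can be illustrated with a minimal sketch. The features and weights below are hypothetical placeholders for illustration only, not the fitted coefficients from the paper:

```python
# Hypothetical sketch of a linear readability model over structural text
# features, in the spirit of the Bangla/Hindi models described above.
# The feature set and the weights are illustrative, NOT the paper's values.

def structural_features(text):
    """Extract two simple structural parameters from a text."""
    sentences = [s for s in text.split(".") if s.strip()]
    words = text.split()
    avg_word_len = sum(len(w) for w in words) / len(words)
    avg_sent_len = len(words) / len(sentences)
    return avg_word_len, avg_sent_len

def readability_score(text, w0=10.0, w1=-0.5, w2=-0.3):
    """Linear combination of structural features (illustrative weights)."""
    awl, asl = structural_features(text)
    return w0 + w1 * awl + w2 * asl

print(round(readability_score("Short words here. Easy to read."), 2))
```

A real model of this shape would be fitted by regressing user-study readability judgments on language-specific structural parameters.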

31 citations

Proceedings ArticleDOI
11 Jan 2018
TL;DR: This paper proposes a rich, non-traditional set of features such as medical semantic relations, stance vectors, sentiment polarity, and textual entailment, and studies their impact on MPCHI stance classification using an SVM and a neural network classifier, finding that the novel non-traditional features improve MPCHI stance classification performance over the traditional BoW model.
Abstract: While search engines are effective in answering direct factual questions such as, 'What are the symptoms of a disease X?', they are not so effective in addressing complex consumer health queries, which do not have a single definitive answer, such as, 'Is treatment X effective for disease Y?'. Instead, the users are presented with a vast number of search results with often contradictory perspectives and no definitive answer. We denote such queries as Multi-Perspective Consumer Health Information (MPCHI) queries, for which there is no single 'Yes or No' answer. While ascertaining the credibility of the claims requires domain expertise, an efficient categorization of the search results according to their stance (support or oppose) toward the query will help the searcher in decision making. Hence, this paper focuses on the problem of stance classification for MPCHI data at the sentence level, presenting a new data set for MPCHI queries. Unlike typical debate or argumentative text, the linguistic characteristics of MPCHI are quite different, with extensive use of scientific formal language and an absence of opinion-bearing words. Such inherently different characteristics of MPCHI text require going beyond traditional Bag of Words (BoW) features for stance classification. We therefore propose using a rich non-traditional set of features such as medical semantic relations, stance vectors, sentiment polarity, and textual entailment, and study their impact on MPCHI stance classification using an SVM and a neural network classifier. We find that using the novel non-traditional features improves MPCHI stance classification performance over the traditional BoW model by 24% for the SVM classifier and 44% for the neural network classifier, for the best feature combination.
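The feature-augmentation idea, appending non-traditional feature groups to a Bag-of-Words vector before classification, can be sketched as follows. The vocabulary, feature values, and group sizes are invented for the example:

```python
import numpy as np

# Hedged sketch: augmenting a Bag-of-Words vector with non-traditional
# feature groups (sentiment polarity, a stance vector, an entailment score)
# before feeding a classifier. All names and values here are illustrative.

def bow_vector(sentence, vocab):
    """Simple term-count Bag-of-Words representation."""
    counts = np.zeros(len(vocab))
    for tok in sentence.lower().split():
        if tok in vocab:
            counts[vocab[tok]] += 1
    return counts

def augmented_features(sentence, vocab, sentiment, stance_vec, entailment):
    """Concatenate BoW with the extra feature groups."""
    return np.concatenate([bow_vector(sentence, vocab),
                           [sentiment], stance_vec, [entailment]])

vocab = {"vaccine": 0, "effective": 1, "risk": 2}
x = augmented_features("The vaccine is effective", vocab,
                       sentiment=0.8, stance_vec=[0.1, 0.9], entailment=0.7)
print(x.shape)  # 3 BoW + 1 sentiment + 2 stance + 1 entailment = (7,)
```

The resulting vector would then be passed to any standard classifier (the paper compares an SVM and a neural network).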

26 citations

Posted Content
TL;DR: This paper represents a news article as a 3-mode tensor and proposes a tensor factorization based method to encode the news article in a latent embedding space preserving the community structure.
Abstract: Detecting whether a news article is fake or genuine is a crucial task in today's digital world, where it is easy to create and spread a misleading news article. This is especially true of news stories shared on social media, since they do not undergo the stringent journalistic checking associated with mainstream media. Given the inherent human tendency to share information with their social connections at a mouse-click, fake news articles masquerading as real ones tend to spread widely and virally. The presence of echo chambers (people sharing the same beliefs) in social networks only adds to this problem of the widespread existence of fake news on social media. In this paper, we tackle the problem of fake news detection on social media by exploiting the very presence of echo chambers within the social network of users to obtain an efficient and informative latent representation of the news article. By modeling the echo chambers as closely-connected communities within the social network, we represent a news article as a 3-mode tensor and propose a tensor factorization based method to encode the news article in a latent embedding space preserving the community structure. We also propose an extension of the above method, which jointly models the community and content information of the news article through a coupled matrix-tensor factorization framework. We empirically demonstrate the efficacy of our method for the task of Fake News Detection over two real-world datasets. Further, we validate the generalization of the resulting embeddings over two other auxiliary tasks, namely: 1) News Cohort Analysis and 2) Collaborative News Recommendation. Our proposed method outperforms appropriate baselines for both tasks, establishing its generalizability.
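The tensor side of the method can be illustrated with a generic CP (CANDECOMP/PARAFAC) factorization of a 3-mode tensor via alternating least squares. This is a textbook CP-ALS sketch in NumPy, not the authors' coupled matrix-tensor formulation, and the dimensions are toy values:

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding of a 3-mode tensor into a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(A, B):
    """Column-wise Khatri-Rao product of two factor matrices."""
    return np.einsum("ir,jr->ijr", A, B).reshape(-1, A.shape[1])

def cp_als(T, rank, iters=200, seed=0):
    """Rank-`rank` CP decomposition of a 3-mode tensor via ALS."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in T.shape]
    for _ in range(iters):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            gram = (others[0].T @ others[0]) * (others[1].T @ others[1])
            factors[mode] = unfold(T, mode) @ kr @ np.linalg.pinv(gram)
    return factors

# Demo: recover a synthetic rank-2 tensor. In the fake-news setting, the
# latent article embeddings would be read off one of the factor matrices.
rng = np.random.default_rng(1)
A, B, C = [rng.standard_normal((d, 2)) for d in (4, 5, 6)]
T = np.einsum("ir,jr,kr->ijk", A, B, C)
T_hat = np.einsum("ir,jr,kr->ijk", *cp_als(T, rank=2))
print(np.linalg.norm(T - T_hat) / np.linalg.norm(T))
```

The coupled matrix-tensor extension additionally shares one factor matrix with a factorization of a content matrix, so community and content information constrain the same embedding.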

26 citations

Book ChapterDOI
23 May 2017
TL;DR: This work proposes two novel weakly supervised approaches for detecting fine-grained emotions in contact center chat utterances in real time, the second of which is a neural net based method that does not require extensive feature engineering.
Abstract: Contact center chats are textual conversations between customers and agents about queries, issues, grievances, etc. concerning products and services. Contact centers conduct periodic analysis of these chats to measure customer satisfaction, of which the chat emotion forms one crucial component. Typically, these measures are performed at the chat level. However, retrospective chat-level analysis is not sufficiently actionable for agents, as it does not capture the variation in the emotion distribution across the chat. Toward that end, we propose two novel weakly supervised approaches for detecting fine-grained emotions in contact center chat utterances in real time. In our first approach, we identify novel contextual and meta features and treat the task of emotion prediction as a sequence labeling problem. In the second approach, we propose a neural net based method for emotion prediction in call center chats that does not require extensive feature engineering. We establish the effectiveness of the proposed methods by empirically evaluating them on a real-life contact center chat dataset. We achieve an average accuracy of 72.6% with our first approach and 74.38% with our second approach.
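The sequence-labeling framing of the first approach can be illustrated with Viterbi decoding over per-utterance label scores. The emotion label set, emission scores, and transition scores below are invented for the example; a real system would learn them from the contextual and meta features:

```python
import numpy as np

EMOTIONS = ["neutral", "frustrated", "satisfied"]  # hypothetical label set

def viterbi(emissions, transitions):
    """Most likely label sequence for a chat.

    emissions: (T, L) score of each label at each utterance.
    transitions: (L, L) score of moving from label i to label j.
    """
    T, L = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, L), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + transitions + emissions[t]  # (L, L)
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return [EMOTIONS[i] for i in reversed(path)]

# Three utterances whose local scores each favor a different label.
emissions = np.array([[2.0, 0, 0], [0, 2.0, 0], [0, 0, 2.0]])
transitions = np.zeros((3, 3))
print(viterbi(emissions, transitions))
```

Decoding jointly over the whole chat is what lets the labeler exploit the variation in emotion across utterances, rather than scoring each utterance in isolation.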

24 citations

Proceedings ArticleDOI
28 Aug 2018
TL;DR: In this article, a tensor factorization based method was proposed to encode the news article in a latent embedding space preserving the community structure of echo-chambers in social networks.
Abstract: In this paper, we tackle the problem of fake news detection on social media by exploiting the presence of echo chamber communities (communities sharing the same beliefs) that exist within the social network of the users. By modeling the echo chambers as closely-connected communities within the social network, we represent a news article as a 3-mode tensor and propose a tensor factorization based method to encode the news article in a latent embedding space preserving the community structure. We also propose an extension of the above method, which jointly models the community and content information of the news article through a coupled matrix-tensor factorization framework. We empirically demonstrate the efficacy of our method for the task of Fake News Detection over two real-world datasets.

23 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. 
Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

Journal ArticleDOI
12 Jan 2018
TL;DR: It is recommended that a properly powered reaction time experiment with repeated measures have at least 1,600 word observations per condition, considerably more than current practice; it is also shown that researchers must include the number of observations in meta-analyses.
Abstract: In psychology, attempts to replicate published findings are less successful than expected. For properly powered studies replication rate should be around 80%, whereas in practice less than 40% of the studies selected from different areas of psychology can be replicated. Researchers in cognitive psychology are hindered in estimating the power of their studies, because the designs they use present a sample of stimulus materials to a sample of participants, a situation not covered by most power formulas. To remedy the situation, we review the literature related to the topic and introduce recent software packages, which we apply to the data of two masked priming studies with high power. We checked how we could estimate the power of each study and how much they could be reduced to remain powerful enough. On the basis of this analysis, we recommend that a properly powered reaction time experiment with repeated measures has at least 1,600 word observations per condition (e.g., 40 participants, 40 stimuli). This is considerably more than current practice. We also show that researchers must include the number of observations in meta-analyses because the effect sizes currently reported depend on the number of stimuli presented to the participants. Our analyses can easily be applied to new datasets gathered.
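The power argument can be made concrete with a Monte-Carlo sketch for a simple two-condition reaction-time comparison. Note that this simplification drops the crossed participant-by-stimulus random effects the article is actually about, and the effect size and standard deviation are invented values:

```python
import numpy as np

# Illustrative Monte-Carlo power estimate for a two-condition RT
# comparison. Effect size (20 ms) and SD (150 ms) are made-up values,
# and crossed participant/stimulus random effects are ignored.

def simulate_power(n_obs, effect_ms=20, sd_ms=150, sims=2000,
                   alpha=0.05, seed=0):
    """Fraction of simulated experiments detecting the effect."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(sims):
        a = rng.normal(500, sd_ms, n_obs)
        b = rng.normal(500 + effect_ms, sd_ms, n_obs)
        diff = b.mean() - a.mean()
        se = np.sqrt(a.var(ddof=1) / n_obs + b.var(ddof=1) / n_obs)
        if abs(diff / se) > 1.96:  # z-approximation, fine for large n
            hits += 1
    return hits / sims

print(simulate_power(1600))  # large per-condition samples -> high power
print(simulate_power(100))   # small samples -> underpowered
```

Under these toy assumptions, 1,600 observations per condition yield power well above the 80% benchmark, while 100 observations fall far short, which mirrors the article's recommendation.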

597 citations

01 Jan 1985
TL;DR: This paper discusses one aspect of the lexicon, namely its morphological organization, and characterizes the two-level lexicon system as most suited for describing suffixing languages with a comparatively high degree of agglutination.
Abstract: 1. Introduction. In this paper, I will discuss one aspect of the lexicon, namely its morphological organization. For about two years I have been working with Koskenniemi's two-level model (Koskenniemi 1983) on a Polish two-level description. In this work, I have become more and more interested in the formalism itself, something that has tended to push work on the language description into the background. It seems to be the case that in most concrete two-level descriptions, the rule component is forced to carry too heavy a burden in comparison with the lexicon, perhaps because the lexicon is very simple as to its implementation. Like many other lexical systems in computer applications it is implemented as a tree, with a root node and leaves, from which the lexical entries are retrieved when the analysis routine has traversed the tree. The two-level lexicon is a bit more sophisticated than this, however, in that there is not only one, but several lexicon trees, the so-called minilexicons. The user links the minilexicons into a whole, most often into a root lexicon and a number of suffix lexicons. In Hockett's terminology, we could speak of an Item-and-Arrangement (IA) model (Hockett 1958, pp 386ff). Elsewhere I have characterized this kind of lexicon system as most suited for describing suffixing languages with a comparatively high degree of agglutination (Borin 1985, p 35). This is mainly due to the fact that the system works according to what Blåberg (1984, p 61) aptly has termed the "forget-where-you-came-from"
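The minilexicon organization the excerpt describes (a root lexicon whose entries name continuation suffix lexicons) can be sketched as follows. For brevity each minilexicon is a flat dictionary here rather than a letter trie, and the toy Finnish-style entries are illustrative only:

```python
# Hedged sketch of linked "minilexicons": each lexicon maps surface
# strings to lists of continuation lexicon names, and a word is accepted
# if some chain of lexicons consumes it completely. "#" marks end-of-word.

class MiniLexicon:
    def __init__(self, name):
        self.name = name
        self.entries = {}  # surface string -> continuation lexicon names

    def add(self, form, continuations):
        self.entries[form] = continuations

def analyze(word, lexicons, start="ROOT"):
    """True if the word decomposes as root + suffixes via lexicon links."""
    def walk(rest, lex_name):
        if lex_name == "#":            # end-of-word continuation
            return rest == ""
        for form, conts in lexicons[lex_name].entries.items():
            if rest.startswith(form):
                if any(walk(rest[len(form):], c) for c in conts):
                    return True
        return False
    return walk(word, start)

root = MiniLexicon("ROOT")
root.add("talo", ["CASE", "#"])        # Finnish 'house'
case = MiniLexicon("CASE")
case.add("ssa", ["#"])                 # inessive case, 'in'
lexicons = {"ROOT": root, "CASE": case}
print(analyze("talossa", lexicons))    # True: talo + ssa
```

This layout is what makes the system a natural fit for suffixing, agglutinative languages: each suffix class lives in its own lexicon, and words are licensed purely by the arrangement of lexicon links.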

261 citations

Journal ArticleDOI
TL;DR: A novel Deep Learning based approach to detect the emotions Happy, Sad, and Angry in textual dialogues, using semi-automated techniques to gather large-scale training data with diverse ways of expressing emotions to train the model.

244 citations