scispace - formally typeset
Author

Sushovan De

Bio: Sushovan De is an academic researcher from Arizona State University. The author has contributed to research on topics including Tuple and Database design. The author has an h-index of 6 and has co-authored 10 publications receiving 424 citations.

Papers
Proceedings Article
16 May 2014
TL;DR: These findings reveal, for the first time, the kind of unique information needs that a social media platform like reddit might be fulfilling when it comes to a stigmatized illness, and expand the understanding of the role of the social web in behavioral therapy.
Abstract: Social media is continually emerging as a platform of information exchange around health challenges. We study mental health discourse on the popular social media platform reddit. Building on findings about health information seeking and sharing practices in online forums, and social media like Twitter, we address three research challenges. First, we present a characterization of self-disclosure in mental illness communities on reddit. We observe individuals discussing a variety of concerns ranging from the daily grind to specific queries about diagnosis and treatment. Second, we build a statistical model to examine the factors that drive social support on mental health reddit communities. We also develop language models to characterize mental health social support, which are observed to bear emotional, informational, instrumental, and prescriptive information. Finally, we study disinhibition in the light of the dissociative anonymity that reddit's throwaway accounts provide. Apart from promoting open conversations, such anonymity is surprisingly found to gather feedback that is more involving and emotionally engaging. Our findings reveal, for the first time, the kind of unique information needs that a social media platform like reddit might be fulfilling when it comes to a stigmatized illness. They also expand our understanding of the role of the social web in behavioral therapy.

492 citations

Proceedings Article
27 Jul 2014
TL;DR: Describes the implementation of AI-MIX, a tour plan generation system that uses automated checks and alerts to improve the quality of plans created by human workers, and presents a preliminary evaluation of the effectiveness of steering provided by automated planning.
Abstract: One subclass of human computation applications is directed at tasks that involve planning (e.g., tour planning) and scheduling (e.g., conference scheduling). Interestingly, work on these systems shows that even primitive forms of automated oversight on the human contributors help significantly improve the effectiveness of the humans/crowd. In this paper, we argue that the automated oversight used in these systems can be viewed as a primitive automated planner, and that there are several opportunities for more sophisticated automated planning in effectively steering the crowd. Straightforward adaptation of current planning technology is, however, hampered by the mismatch between the capabilities of human workers and automated planners. We identify and partially address two important challenges that need to be overcome before such adaptation of planning technology can occur: (i) interpreting inputs of the human workers (and the requester) and (ii) steering or critiquing plans produced by the human workers, armed only with incomplete domain and preference models. To these ends, we describe the implementation of AI-MIX, a tour plan generation system that uses automated checks and alerts to improve the quality of plans created by human workers, and present a preliminary evaluation of the effectiveness of steering provided by automated planning.

25 citations

Journal Article
TL;DR: This article provides a method for correcting individual attribute values in a structured database using a Bayesian generative model and a statistical error model learned from the noisy database directly, to avoid the necessity for a domain expert or clean master data.
Abstract: Recent efforts in data cleaning of structured data have focused exclusively on problems like data deduplication, record matching, and data standardization; none of the approaches addressing these problems focus on fixing incorrect attribute values in tuples. Correcting values in tuples is typically performed by a minimum cost repair of tuples that violate static constraints like Conditional Functional Dependencies (which have to be provided by domain experts or learned from a clean sample of the database). In this article, we provide a method for correcting individual attribute values in a structured database using a Bayesian generative model and a statistical error model learned from the noisy database directly. We thus avoid the necessity for a domain expert or clean master data. We also show how to efficiently perform consistent query answering using this model over a dirty database, in case write permissions to the database are unavailable. We evaluate our methods over both synthetic and real data.

13 citations

Proceedings Article
01 Oct 2014
TL;DR: This paper provides a method for correcting individual attribute values in a structured database using a Bayesian generative model and a statistical error model learned from the noisy database directly, to avoid the necessity for a domain expert or clean master data.
Abstract: Recent efforts in data cleaning of structured data have focused exclusively on problems like data deduplication, record matching, and data standardization; none of these focus on fixing incorrect attribute values in tuples. Correcting values in tuples is typically performed by a minimum cost repair of tuples that violate static constraints like CFDs (which have to be provided by domain experts, or learned from a clean sample of the database). In this paper, we provide a method for correcting individual attribute values in a structured database using a Bayesian generative model and a statistical error model learned from the noisy database directly. We thus avoid the necessity for a domain expert or clean master data. We also show how to efficiently perform consistent query answering using this model over a dirty database, in case write permissions to the database are unavailable. We evaluate our methods over both synthetic and real data.

10 citations
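The correction idea in the two data-cleaning abstracts above (pick the clean value that best explains an observed, possibly erroneous value, using a model learned from the noisy data itself) can be illustrated with a toy noisy-channel sketch in Python. This is not the authors' model. It assumes, hypothetically, that the prior over clean values is simply each value's relative frequency in the noisy column (a stand-in for the Bayesian generative model), and that the error model P(observed | clean) decays exponentially with Levenshtein edit distance (a stand-in for the learned statistical error model). The names edit_distance, correct_value and alpha are illustrative only.

```python
import math
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]


def correct_value(observed: str, column: list, alpha: float = 2.0) -> str:
    """Return the candidate maximizing log P(clean) + log P(observed | clean).

    Hypothetical assumptions for this sketch:
    - P(clean) is the candidate's relative frequency in the (noisy) column;
    - P(observed | clean) is proportional to exp(-alpha * edit_distance),
      so the log error term is just -alpha * distance.
    """
    counts = Counter(column)
    total = sum(counts.values())
    best, best_score = observed, -math.inf
    for candidate, n in counts.items():
        log_prior = math.log(n / total)
        log_error = -alpha * edit_distance(observed, candidate)
        score = log_prior + log_error
        if score > best_score:
            best, best_score = candidate, score
    return best
```

For example, with a column holding "Honda" 20 times, "Hondda" once and "Toyota" 5 times, correct_value("Hondda", column) returns "Honda": one edit costs less under this error model than the much lower prior of the rare spelling. correct_value("Toyota", column) leaves the value unchanged because every alternative is several edits away.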

Posted Content
TL;DR: Empirical studies are presented to demonstrate that at higher levels of incompleteness, when multiple attribute values are missing, Bayesian networks do provide a significantly higher classification accuracy and the relevant possible answers retrieved by the queries reformulated using Bayesian Networks provide higher precision and recall than AFDs while keeping query processing costs manageable.
Abstract: As the information available to lay users through autonomous data sources continues to increase, mediators become important to ensure that the wealth of information available is tapped effectively. A key challenge that these information mediators need to handle is the varying levels of incompleteness in the underlying databases in terms of missing attribute values. Existing approaches such as QPIAD aim to mine and use Approximate Functional Dependencies (AFDs) to predict and retrieve relevant incomplete tuples. These approaches make independence assumptions about missing values, which critically hobbles their performance when there are tuples containing missing values for multiple correlated attributes. In this paper, we present a principled probabilistic alternative that views an incomplete tuple as defining a distribution over the complete tuples that it stands for. We learn this distribution in terms of Bayes networks. Our approach involves mining/"learning" Bayes networks from a sample of the database, and using them to do both imputation (predict a missing value) and query rewriting (retrieve relevant results with incompleteness on the query-constrained attributes, when the data sources are autonomous). We present empirical studies to demonstrate that (i) at higher levels of incompleteness, when multiple attribute values are missing, Bayes networks do provide a significantly higher classification accuracy and (ii) the relevant possible answers retrieved by the queries reformulated using Bayes networks provide higher precision and recall than AFDs while keeping query processing costs manageable.

8 citations
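The abstract above treats an incomplete tuple as a distribution over the complete tuples it could stand for. A minimal Python sketch of that imputation step follows, under loudly stated assumptions: the three-attribute network Make → Model → Body and every probability in it are invented for illustration (the paper learns its Bayes networks from a database sample), inference is brute-force enumeration, and the query-rewriting side is not shown. Because the missing attributes are scored jointly through the network's factorization, correlated missing values (here Model and Body) are predicted together rather than under an independence assumption.

```python
from itertools import product

# Hypothetical toy Bayes network over car listings: Make -> Model -> Body.
# Structure and all probabilities are made up for illustration only.
DOMAINS = {
    "make": ["Honda", "Toyota"],
    "model": ["Civic", "Accord", "Camry", "RAV4"],
    "body": ["Sedan", "SUV"],
}
P_MAKE = {"Honda": 0.5, "Toyota": 0.5}
P_MODEL_GIVEN_MAKE = {
    "Honda": {"Civic": 0.6, "Accord": 0.4, "Camry": 0.0, "RAV4": 0.0},
    "Toyota": {"Civic": 0.0, "Accord": 0.0, "Camry": 0.3, "RAV4": 0.7},
}
P_BODY_GIVEN_MODEL = {
    "Civic": {"Sedan": 0.9, "SUV": 0.1},
    "Accord": {"Sedan": 0.95, "SUV": 0.05},
    "Camry": {"Sedan": 0.95, "SUV": 0.05},
    "RAV4": {"Sedan": 0.02, "SUV": 0.98},
}


def joint(t: dict) -> float:
    """Bayes-network factorization: P(make) * P(model | make) * P(body | model)."""
    return (P_MAKE[t["make"]]
            * P_MODEL_GIVEN_MAKE[t["make"]][t["model"]]
            * P_BODY_GIVEN_MODEL[t["model"]][t["body"]])


def completions(partial: dict) -> dict:
    """Posterior distribution over complete tuples consistent with `partial`.

    Attributes that are None (or absent) are treated as missing. Keys of the
    result are value tuples in DOMAINS order; zero-probability ones are dropped.
    """
    attrs = list(DOMAINS)
    missing = [a for a in attrs if partial.get(a) is None]
    weights = {}
    for values in product(*(DOMAINS[a] for a in missing)):
        t = dict(partial, **dict(zip(missing, values)))  # fill in the gaps
        p = joint(t)
        if p > 0:
            weights[tuple(t[a] for a in attrs)] = p
    z = sum(weights.values())  # P(observed attributes), used to normalize
    return {k: w / z for k, w in weights.items()}
```

Given {"make": "Toyota", "model": None, "body": None}, the most probable completion is ("Toyota", "RAV4", "SUV") with probability 0.686 (that is, 0.343 / 0.5). Enumeration is exponential in the number of missing attributes; it is used here only because the toy domains are tiny.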


Cited by
Christopher M. Bishop
01 Jan 2006
TL;DR: Covers probability distributions and linear models for regression and classification, along with a discussion of combining models in the context of machine learning.
Abstract: Probability Distributions; Linear Models for Regression; Linear Models for Classification; Neural Networks; Kernel Methods; Sparse Kernel Machines; Graphical Models; Mixture Models and EM; Approximate Inference; Sampling Methods; Continuous Latent Variables; Sequential Data; Combining Models.

10,141 citations

Journal Article

3,152 citations

Proceedings Article
07 May 2016
TL;DR: This paper develops a statistical methodology to infer which individuals could undergo transitions from mental health discourse to suicidal ideation, and utilizes semi-anonymous support communities on Reddit as unobtrusive data sources to infer the likelihood of these shifts.
Abstract: History of mental illness is a major factor behind suicide risk and ideation. However, research efforts toward characterizing and forecasting this risk are limited due to the paucity of information regarding suicidal ideation, exacerbated by the stigma of mental illness. This paper fills gaps in the literature by developing a statistical methodology to infer which individuals could undergo transitions from mental health discourse to suicidal ideation. We utilize semi-anonymous support communities on Reddit as unobtrusive data sources to infer the likelihood of these shifts. We develop language and interactional measures for this purpose, as well as a statistical approach based on propensity score matching. Our approach allows us to derive distinct markers of shifts to suicidal ideation. These markers can be modeled in a prediction framework to identify individuals likely to engage in suicidal ideation in the future. We discuss societal and ethical implications of this research.

513 citations

Proceedings Article
07 May 2016
TL;DR: This paper uses mixed methods to understand abuse-related posts on reddit and uses quantitative methods to investigate the use of "throwaway" accounts, which provide greater anonymity, and reports on factors associated with support seeking and first-time disclosures.
Abstract: Support seeking in stigmatized contexts is useful when the discloser receives the desired response, but it also entails social risks. Thus, people do not always disclose or seek support when they need it. One such stigmatized context for support seeking is sexual abuse. In this paper, we use mixed methods to understand abuse-related posts on reddit. First, we take a qualitative approach to understand post content. Then we use quantitative methods to investigate the use of "throwaway" accounts, which provide greater anonymity, and report on factors associated with support seeking and first-time disclosures. In addition to significant linguistic differences between throwaway and identified accounts, we find that those using throwaway accounts are significantly more likely to engage in seeking support. We also find that men are significantly more likely to use throwaway accounts when posting about sexual abuse. Results suggest that subreddit moderators and members who wish to provide support pay attention to throwaway accounts, and we discuss the importance of context-specific anonymity in support seeking.

271 citations