Author

Sushovan De

Bio: Sushovan De is an academic researcher from Arizona State University. The author has contributed to research on topics including tuples and database design. The author has an h-index of 6 and has co-authored 10 publications receiving 424 citations.

Topics: Tuple, Database design, Data deduplication, Probabilistic logic, Master data

Papers

Proceedings Article

Mental Health Discourse on reddit: Self-Disclosure, Social Support, and Anonymity

Munmun De Choudhury¹, Sushovan De²

Georgia Institute of Technology¹, Arizona State University²

16 May 2014

TL;DR: These findings reveal, for the first time, the kind of unique information needs that a social media platform like reddit might be fulfilling when it comes to a stigmatized illness, and expand the understanding of the role of the social web in behavioral therapy.

Abstract: Social media is continually emerging as a platform of information exchange around health challenges. We study mental health discourse on the popular social media platform reddit. Building on findings about health information seeking and sharing practices in online forums and on social media like Twitter, we address three research challenges. First, we present a characterization of self-disclosure in mental illness communities on reddit. We observe individuals discussing a variety of concerns ranging from the daily grind to specific queries about diagnosis and treatment. Second, we build a statistical model to examine the factors that drive social support on mental health reddit communities. We also develop language models to characterize mental health social support, which are observed to bear emotional, informational, instrumental, and prescriptive information. Finally, we study disinhibition in the light of the dissociative anonymity that reddit’s throwaway accounts provide. Apart from promoting open conversations, such anonymity is surprisingly found to gather feedback that is more involving and emotionally engaging. Our findings reveal, for the first time, the kind of unique information needs that a social media platform like reddit might be fulfilling when it comes to a stigmatized illness. They also expand our understanding of the role of the social web in behavioral therapy.

492 citations

Proceedings Article

AI-MIX: using automated planning to steer human workers towards better crowdsourced plans

Lydia Manikonda¹, Tathagata Chakraborti¹, Sushovan De¹, Kartik Talamadupula¹, Subbarao Kambhampati¹

Arizona State University¹

27 Jul 2014

TL;DR: This paper describes the implementation of AI-MIX, a tour plan generation system that uses automated checks and alerts to improve the quality of plans created by human workers, and presents a preliminary evaluation of the effectiveness of the steering provided by automated planning.

Abstract: One subclass of human computation applications comprises those directed at tasks that involve planning (e.g., tour planning) and scheduling (e.g., conference scheduling). Interestingly, work on these systems shows that even primitive forms of automated oversight of the human contributors help significantly improve the effectiveness of the humans/crowd. In this paper, we argue that the automated oversight used in these systems can be viewed as a primitive automated planner, and that there are several opportunities for more sophisticated automated planning in effectively steering the crowd. Straightforward adaptation of current planning technology is, however, hampered by the mismatch between the capabilities of human workers and automated planners. We identify and partially address two important challenges that need to be overcome before such adaptation of planning technology can occur: (i) interpreting inputs of the human workers (and the requester) and (ii) steering or critiquing plans produced by the human workers, armed only with incomplete domain and preference models. To these ends, we describe the implementation of AI-MIX, a tour plan generation system that uses automated checks and alerts to improve the quality of plans created by human workers, and present a preliminary evaluation of the effectiveness of steering provided by automated planning.

25 citations

Journal Article

BayesWipe: A Scalable Probabilistic Framework for Improving Data Quality

Sushovan De¹, Yuheng Hu², Venkata Vamsikrishna Meduri¹, Yi Chen³, Subbarao Kambhampati¹

Arizona State University¹, University of Illinois at Chicago², New Jersey Institute of Technology³

25 Oct 2016, Journal of Data and Information Quality

TL;DR: This article provides a method for correcting individual attribute values in a structured database using a Bayesian generative model and a statistical error model learned directly from the noisy database, avoiding the need for a domain expert or clean master data.

Abstract: Recent efforts in data cleaning of structured data have focused exclusively on problems like data deduplication, record matching, and data standardization; none of the approaches addressing these problems focuses on fixing incorrect attribute values in tuples. Correcting values in tuples is typically performed by a minimum-cost repair of tuples that violate static constraints like Conditional Functional Dependencies (which have to be provided by domain experts or learned from a clean sample of the database). In this article, we provide a method for correcting individual attribute values in a structured database using a Bayesian generative model and a statistical error model learned directly from the noisy database. We thus avoid the need for a domain expert or clean master data. We also show how to efficiently perform consistent query answering using this model over a dirty database when write permissions to the database are unavailable. We evaluate our methods on both synthetic and real data.

13 citations

Proceedings Article

BayesWipe: A multimodal system for data cleaning and consistent query answering on structured bigdata

Sushovan De¹, Yuheng Hu¹, Yi Chen², Subbarao Kambhampati¹

Arizona State University¹, New Jersey Institute of Technology²

01 Oct 2014

TL;DR: This paper provides a method for correcting individual attribute values in a structured database using a Bayesian generative model and a statistical error model learned directly from the noisy database, avoiding the need for a domain expert or clean master data.

Abstract: Recent efforts in data cleaning of structured data have focused exclusively on problems like data deduplication, record matching, and data standardization; none of these focuses on fixing incorrect attribute values in tuples. Correcting values in tuples is typically performed by a minimum-cost repair of tuples that violate static constraints like Conditional Functional Dependencies (CFDs), which have to be provided by domain experts or learned from a clean sample of the database. In this paper, we provide a method for correcting individual attribute values in a structured database using a Bayesian generative model and a statistical error model learned directly from the noisy database. We thus avoid the need for a domain expert or clean master data. We also show how to efficiently perform consistent query answering using this model over a dirty database when write permissions to the database are unavailable. We evaluate our methods on both synthetic and real data.

10 citations

Posted Content

Bayes Networks for Supporting Query Processing Over Incomplete Autonomous Databases

Rohit Raghunathan, Sushovan De, Subbarao Kambhampati

28 Aug 2012, arXiv: Databases

TL;DR: Empirical studies demonstrate that (i) at higher levels of incompleteness, when multiple attribute values are missing, Bayesian networks provide significantly higher classification accuracy, and (ii) the relevant possible answers retrieved by queries reformulated using Bayesian networks have higher precision and recall than those retrieved using AFDs, while keeping query processing costs manageable.

Abstract: As the information available to lay users through autonomous data sources continues to increase, mediators become important to ensure that this wealth of information is tapped effectively. A key challenge that these information mediators need to handle is the varying levels of incompleteness in the underlying databases in terms of missing attribute values. Existing approaches such as QPIAD mine and use Approximate Functional Dependencies (AFDs) to predict and retrieve relevant incomplete tuples. These approaches make independence assumptions about missing values, which critically hobbles their performance when tuples contain missing values for multiple correlated attributes. In this paper, we present a principled probabilistic alternative that views an incomplete tuple as defining a distribution over the complete tuples that it stands for. We learn this distribution in terms of Bayes networks. Our approach involves mining ("learning") Bayes networks from a sample of the database and using them for both imputation (predicting a missing value) and query rewriting (retrieving relevant results with incompleteness on the query-constrained attributes when the data sources are autonomous). We present empirical studies demonstrating that (i) at higher levels of incompleteness, when multiple attribute values are missing, Bayes networks provide significantly higher classification accuracy, and (ii) the relevant possible answers retrieved by the queries reformulated using Bayes networks have higher precision and recall than those retrieved using AFDs, while keeping query processing costs manageable.

8 citations

Cited by

Pattern Recognition and Machine Learning

Christopher M. Bishop¹

Microsoft¹

01 Jan 2006

TL;DR: This book covers probability distributions, linear models for regression and classification, neural networks, kernel methods, graphical models, mixture models, approximate inference, and sampling methods, among other topics, and concludes with a discussion of combining models in machine learning.

Abstract: Probability Distributions; Linear Models for Regression; Linear Models for Classification; Neural Networks; Kernel Methods; Sparse Kernel Machines; Graphical Models; Mixture Models and EM; Approximate Inference; Sampling Methods; Continuous Latent Variables; Sequential Data; Combining Models.

10,141 citations

Journal Article

Statistical Analysis with Missing Data

Martin G. Gibson

01 Mar 1989, The Statistician

3,152 citations

Proceedings Article

Discovering Shifts to Suicidal Ideation from Mental Health Content in Social Media

Munmun De Choudhury¹, Emre Kiciman², Mark Dredze³, Glen Coppersmith, Mrinal Kumar¹

Georgia Institute of Technology¹, Microsoft², Johns Hopkins University³

07 May 2016

TL;DR: This paper develops a statistical methodology to infer which individuals could undergo transitions from mental health discourse to suicidal ideation, and utilizes semi-anonymous support communities on Reddit as unobtrusive data sources to infer the likelihood of these shifts.

Abstract: A history of mental illness is a major factor behind suicide risk and ideation. However, research efforts toward characterizing and forecasting this risk are limited due to the paucity of information regarding suicidal ideation, exacerbated by the stigma of mental illness. This paper fills gaps in the literature by developing a statistical methodology to infer which individuals could undergo transitions from mental health discourse to suicidal ideation. We utilize semi-anonymous support communities on Reddit as unobtrusive data sources to infer the likelihood of these shifts. We develop language and interactional measures for this purpose, as well as a statistical approach based on propensity score matching. Our approach allows us to derive distinct markers of shifts to suicidal ideation. These markers can be modeled in a prediction framework to identify individuals likely to engage in suicidal ideation in the future. We discuss the societal and ethical implications of this research.

513 citations

Shifts to Suicidal Ideation from Mental Health Content in Social Media

Munmun De Choudhury, Emre Kiciman, Mark Dredze, Glen Coppersmith, Mrinal Kumar

01 May 2016

312 citations

Proceedings Article

Understanding Social Media Disclosures of Sexual Abuse Through the Lenses of Support Seeking and Anonymity

Nazanin Andalibi¹, Oliver L. Haimson², Munmun De Choudhury³, Andrea Forte¹

Drexel University¹, University of California, Irvine², Georgia Institute of Technology³

07 May 2016

TL;DR: This paper uses mixed methods to understand abuse-related posts on reddit, applying quantitative methods to investigate the use of "throwaway" accounts, which provide greater anonymity, and reporting on factors associated with support seeking and first-time disclosures.

Abstract: Support seeking in stigmatized contexts is useful when the discloser receives the desired response, but it also entails social risks. Thus, people do not always disclose or seek support when they need it. One such stigmatized context for support seeking is sexual abuse. In this paper, we use mixed methods to understand abuse-related posts on reddit. First, we take a qualitative approach to understand post content. Then we use quantitative methods to investigate the use of "throwaway" accounts, which provide greater anonymity, and report on factors associated with support seeking and first-time disclosures. In addition to significant linguistic differences between throwaway and identified accounts, we find that those using throwaway accounts are significantly more likely to engage in seeking support. We also find that men are significantly more likely to use throwaway accounts when posting about sexual abuse. Results suggest that subreddit moderators and members who wish to provide support pay attention to throwaway accounts, and we discuss the importance of context-specific anonymity in support seeking.

271 citations