scispace - formally typeset
Search or ask a question
Author

Tanvi Banerjee

Bio: Tanvi Banerjee is an academic researcher from Wright State University. The author has contributed to research in topics: Computer science & Medicine. The author has an hindex of 18, co-authored 82 publications receiving 926 citations. Previous affiliations of Tanvi Banerjee include B. P. Poddar Institute of Management & Technology & University of Missouri.


Papers
More filters
Proceedings ArticleDOI
31 Jul 2017
TL;DR: This paper used a semi-supervised statistical model to evaluate how the duration of these symptoms and their expression on Twitter align with the medical findings reported via the PHQ-9 questionnaire clinicians use today.
Abstract: With the rise of social media, millions of people are routinely expressing their moods, feelings, and daily struggles with mental health issues on social media platforms like Twitter. Unlike traditional observational cohort studies conducted through questionnaires and self-reported surveys, we explore the reliable detection of clinical depression from tweets obtained unobtrusively. Based on the analysis of tweets crawled from users with self-reported depressive symptoms in their Twitter profiles, we demonstrate the potential for detecting clinical depression symptoms which emulate the PHQ-9 questionnaire clinicians use today. Our study uses a semi-supervised statistical model to evaluate how the duration of these symptoms and their expression on Twitter (in terms of word usage patterns and topical preferences) align with the medical findings reported via the PHQ-9. Our proactive and automatic screening tool is able to identify clinical depressive symptoms with an accuracy of 68% and precision of 72%.

112 citations

Journal ArticleDOI
TL;DR: It is demonstrated how categories of discussion on Twitter about an epidemic can be discovered so that public health officials can understand specific societal concerns within the disease-specific categories.
Abstract: Background: In order to harness what people are tweeting about Zika, there needs to be a computational framework that leverages machine learning techniques to recognize relevant Zika tweets and, further, categorize these into disease-specific categories to address specific societal concerns related to the prevention, transmission, symptoms, and treatment of Zika virus. Objective: The purpose of this study was to determine the relevancy of the tweets and what people were tweeting about the 4 disease characteristics of Zika: symptoms, transmission, prevention, and treatment. Methods: A combination of natural language processing and machine learning techniques was used to determine what people were tweeting about Zika. Specifically, a two-stage classifier system was built to find relevant tweets about Zika, and then the tweets were categorized into 4 disease categories. Tweets in each disease category were then examined using latent Dirichlet allocation (LDA) to determine the 5 main tweet topics for each disease characteristic. Results: Over 4 months, 1,234,605 tweets were collected. The number of tweets by males and females was similar (28.47% [351,453/1,234,605] and 23.02% [284,207/1,234,605], respectively). The classifier performed well on the training and test data for relevancy (F1 score=0.87 and 0.99, respectively) and disease characteristics (F1 score=0.79 and 0.90, respectively). Five topics for each category were found and discussed, with a focus on the symptoms category. Conclusions: We demonstrate how categories of discussion on Twitter about an epidemic can be discovered so that public health officials can understand specific societal concerns within the disease-specific categories. Our two-stage classifier was able to identify relevant tweets to enable more specific analysis, including the specific aspects of Zika that were being discussed as well as misinformation being expressed. Future studies can capture sentiments and opinions on epidemic outbreaks like Zika virus in real time, which will likely inform efforts to educate the public at large. [JMIR Public Health Surveill 2017;3(2):e38]

82 citations

Posted Content
TL;DR: In this article, a combination of natural language processing and machine learning techniques were used to determine what people are tweeting about Zika, and a two-stage classifier system was built to find relevant tweets on Zika and then categorize these into the four disease categories.
Abstract: The purpose of this study was to do a dataset distribution analysis, a classification performance analysis, and a topical analysis concerning what people are tweeting about four disease characteristics: symptoms, transmission, prevention, and treatment. A combination of natural language processing and machine learning techniques were used to determine what people are tweeting about Zika. Specifically, a two-stage classifier system was built to find relevant tweets on Zika, and then categorize these into the four disease categories. Tweets in each disease category were then examined using latent dirichlet allocation (LDA) to determine the five main tweet topics for each disease characteristic. Results 1,234,605 tweets were collected. Tweets by males and females were similar (28% and 23% respectively). The classifier performed well on the training and test data for relevancy (F=0.87 and 0.99 respectively) and disease characteristics (F=0.79 and 0.90 respectively). Five topics for each category were found and discussed with a focus on the symptoms category. Through this process, we demonstrate how misinformation can be discovered so that public health officials can respond to the tweets with misinformation.

66 citations

Journal ArticleDOI
TL;DR: The authors investigate the role of semantics in measuring the data quality of the system, as well as integrating multimodal data for clinical decision support and the extension of IoT to the Internet of Everything by including human-in-the-loop to enhance the system accuracy.
Abstract: The amount of Internet of Things (IoT) data is growing rapidly. Although there is a growing understanding of the quality of such data at the device and network level, important challenges in interpreting and evaluating the quality at informational and application levels remain to be explored. This article discusses some of these challenges and solutions of IoT systems at the different OSI layers to understand the factors affecting the quality of the overall system. With the help of two IoT-enabled digital health applications, the authors investigate the role of semantics in measuring the data quality of the system, as well as integrating multimodal data for clinical decision support. They also discuss the extension of IoT to the Internet of Everything by including human-in-the-loop to enhance the system accuracy. This paradigm shift through the confluence of sensors and data analytics can lead to accelerated innovation in applications by overcoming the limitations of the current systems, leading to unprecedented opportunities in healthcare.

52 citations

Journal ArticleDOI
TL;DR: The approach described herein is capable of accurately detecting several different activity states related to fall detection and fall risk assessment including sitting, being upright, and being on the floor to ensure that elderly residents get the help they need quickly in case of emergencies and ultimately to help prevent such emergencies.
Abstract: We present an approach for activity state recognition implemented on data collected from various sensors—standard web cameras under normal illumination, web cameras using infrared lighting, and the inexpensive Microsoft Kinect camera system Sensors such as the Kinect ensure that activity segmentation is possible during the daytime as well as night This is especially useful for activity monitoring of older adults since falls are more prevalent at night than during the day This paper is an application of fuzzy set techniques to a new domain The approach described herein is capable of accurately detecting several different activity states related to fall detection and fall risk assessment including sitting, being upright, and being on the floor to ensure that elderly residents get the help they need quickly in case of emergencies and ultimately to help prevent such emergencies

51 citations


Cited by
More filters
Journal ArticleDOI

[...]

08 Dec 2001-BMJ
TL;DR: There is, I think, something ethereal about i —the square root of minus one, which seems an odd beast at that time—an intruder hovering on the edge of reality.
Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations

Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

Book
16 Nov 1998

766 citations

Journal ArticleDOI
13 Mar 2020-Cureus
TL;DR: An early quantification of the magnitude of misinformation spread is provided and the importance of early interventions in order to curb this phenomenon that endangers public safety at a time when awareness and appropriate preventive actions are paramount is highlighted.
Abstract: Background Since the beginning of the coronavirus disease 2019 (COVID-19) epidemic, misinformation has been spreading uninhibited over traditional and social media at a rapid pace. We sought to analyze the magnitude of misinformation that is being spread on Twitter (Twitter, Inc., San Francisco, CA) regarding the coronavirus epidemic. Materials and methods We conducted a search on Twitter using 14 different trending hashtags and keywords related to the COVID-19 epidemic. We then summarized and assessed individual tweets for misinformation in comparison to verified and peer-reviewed resources. Descriptive statistics were used to compare terms and hashtags, and to identify individual tweets and account characteristics. Results The study included 673 tweets. Most tweets were posted by informal individuals/groups (66%), and 129 (19.2%) belonged to verified Twitter accounts. The majority of included tweets contained serious content (91.2%); 548 tweets (81.4%) included genuine information pertaining to the COVID-19 epidemic. Around 70% of the tweets tackled medical/public health information, while the others were pertaining to sociopolitical and financial factors. In total, 153 tweets (24.8%) included misinformation, and 107 (17.4%) included unverifiable information regarding the COVID-19 epidemic. The rate of misinformation was higher among informal individual/group accounts (33.8%, p: <0.001). Tweets from unverified Twitter accounts contained more misinformation (31.0% vs 12.6% for verified accounts, p: <0.001). Tweets from healthcare/public health accounts had the lowest rate of unverifiable information (12.3%, p: 0.04). The number of likes and retweets per tweet was not associated with a difference in either false or unverifiable content. The keyword “COVID-19” had the lowest rate of misinformation and unverifiable information, while the keywords “#2019_ncov” and “Corona” were associated with the highest amount of misinformation and unverifiable content respectively. Conclusions Medical misinformation and unverifiable content pertaining to the global COVID-19 epidemic are being propagated at an alarming rate on social media. We provide an early quantification of the magnitude of misinformation spread and highlight the importance of early interventions in order to curb this phenomenon that endangers public safety at a time when awareness and appropriate preventive actions are paramount.

580 citations