Author

Bi Chen

Other affiliations: Pennsylvania State University
Bio: Bi Chen is an academic researcher from Penn State College of Information Sciences and Technology. The author has contributed to research on the topics Social network and Information flow (information theory). The author has an h-index of 4 and has co-authored 4 publications receiving 345 citations. Previous affiliations of Bi Chen include Pennsylvania State University.

Papers
Proceedings ArticleDOI
02 Nov 2009
TL;DR: This paper proposes an iterative topic evolution learning framework that adapts the Latent Dirichlet Allocation model to the citation network, and develops a novel inheritance topic model; the results clearly show that citations can help to understand topic evolution better.
Abstract: Understanding how topics in scientific literature evolve is an interesting and important problem. Previous work simply models each paper as a bag of words and also considers the impact of authors. However, the impact of one document on another as captured by citations, one important inherent element in scientific literature, has not been considered. In this paper, we address the problem of understanding topic evolution by leveraging citations, and develop citation-aware approaches. We propose an iterative topic evolution learning framework by adapting the Latent Dirichlet Allocation model to the citation network and develop a novel inheritance topic model. We evaluate the effectiveness and efficiency of our approaches and compare with the state of the art approaches on a large collection of more than 650,000 research papers in the last 16 years and the citation network enabled by CiteSeerX. The results clearly show that citations can help to understand topic evolution better.

187 citations

Proceedings Article
22 Jul 2007
TL;DR: This paper proposes to detect events from social text streams by exploring the content as well as the temporal and social dimensions, combining text-based clustering, temporal segmentation, and information flow-based graph cuts of the dual graph of the social networks.
Abstract: Recently, social text streams (e.g., blogs, web forums, and emails) have become ubiquitous with the evolution of the web. In some sense, social text streams are sensors of the real world. Often, it is desirable to extract real world events from the social text streams. However, existing event detection research mainly focused only on the stream properties of social text streams but ignored the contextual, temporal, and social information embedded in the streams. In this paper, we propose to detect events from social text streams by exploring the content as well as the temporal and social dimensions. We define the term event as the information flow between a group of social actors on a specific topic over a certain time period. We represent social text streams as multi-graphs, where each node represents a social actor and each edge represents the information flow between two actors. The content and temporal associations within the flow of information are embedded in the corresponding edge. Events are detected by combining text-based clustering, temporal segmentation, and information flow-based graph cuts of the dual graph of the social networks. Experiments conducted with the Enron email dataset and the political blog dataset from DailyKos show the proposed event detection approach outperforms the other alternatives.

117 citations

Proceedings Article
11 Jul 2010
TL;DR: It is found that sentences extracted by the opinion scoring models can effectively express opinionists' standpoints.
Abstract: In this paper, we propose a generative model to automatically discover the hidden associations between topic words and opinion words. By applying those discovered hidden associations, we construct the opinion scoring models to extract statements which best express opinionists' standpoints on certain topics. For experiments, we apply our model to the political area. First, we visualize the similarities and dissimilarities between Republican and Democratic senators with respect to various topics. Second, we compare the performance of the opinion scoring models with 14 kinds of methods to find the best ones. We find that sentences extracted by our opinion scoring models can effectively express opinionists' standpoints.

58 citations

Proceedings ArticleDOI
28 Oct 2007
TL;DR: Experiments show that the social-network and profile-based blogging-behavior model with ELM regression techniques produces good results for the most active bloggers and can be used to predict blogging behavior.
Abstract: Modeling the behavior of bloggers is an important problem with various applications in recommender systems, targeted advertising, and event detection. In this paper, we propose three models by combining content, temporal, and social dimensions: the general blogging-behavior model, the profile-based blogging-behavior model, and the social-network and profile-based blogging-behavior model. The models are based on two regression techniques: Extreme Learning Machine (ELM) and Modified General Regression Neural Network (MGRNN). We choose one of the largest blogs, a political blog, DailyKos, for our empirical evaluation. Experiments show that the social-network and profile-based blogging-behavior model with ELM regression techniques produces good results for the most active bloggers and can be used to predict blogging behavior.

14 citations


Cited by
Book
01 May 2012
TL;DR: Sentiment analysis and opinion mining is the field of study that analyzes people's opinions, sentiments, evaluations, attitudes, and emotions from written language. It is one of the most active research areas in natural language processing and is also widely studied in data mining, Web mining, and text mining.
Abstract: Sentiment analysis and opinion mining is the field of study that analyzes people's opinions, sentiments, evaluations, attitudes, and emotions from written language. It is one of the most active research areas in natural language processing and is also widely studied in data mining, Web mining, and text mining. In fact, this research has spread outside of computer science to the management sciences and social sciences due to its importance to business and society as a whole. The growing importance of sentiment analysis coincides with the growth of social media such as reviews, forum discussions, blogs, micro-blogs, Twitter, and social networks. For the first time in human history, we now have a huge volume of opinionated data recorded in digital form for analysis. Sentiment analysis systems are being applied in almost every business and social domain because opinions are central to almost all human activities and are key influencers of our behaviors. Our beliefs and perceptions of reality, and the choices we make, are largely conditioned on how others see and evaluate the world. For this reason, when we need to make a decision we often seek out the opinions of others. This is true not only for individuals but also for organizations. This book is a comprehensive introductory and survey text. It covers all important topics and the latest developments in the field with over 400 references. It is suitable for students, researchers and practitioners who are interested in social media analysis in general and sentiment analysis in particular. Lecturers can readily use it in class for courses on natural language processing, social media analysis, text mining, and data mining. Lecture slides are also available online.

4,515 citations

Proceedings ArticleDOI
05 Jul 2011
TL;DR: This paper explores approaches for analyzing the stream of Twitter messages to distinguish between messages about real-world events and non-event messages, and relies on a rich family of aggregate statistics of topically similar message clusters.
Abstract: User-contributed messages on social media sites such as Twitter have emerged as powerful, real-time means of information sharing on the Web. These short messages tend to reflect a variety of events in real time, making Twitter particularly well suited as a source of real-time event content. In this paper, we explore approaches for analyzing the stream of Twitter messages to distinguish between messages about real-world events and non-event messages. Our approach relies on a rich family of aggregate statistics of topically similar message clusters. Large-scale experiments over millions of Twitter messages show the effectiveness of our approach for surfacing real-world event content on Twitter.

761 citations

Journal ArticleDOI
TL;DR: This survey reviews the state of the art regarding computational methods to process social media messages, highlights both their contributions and shortcomings, and methodically examines a series of key subproblems ranging from the detection of events to the creation of actionable and useful summaries.
Abstract: Social media platforms provide active communication channels during mass convergence and emergency events such as disasters caused by natural hazards. As a result, first responders, decision makers, and the public can use this information to gain insight into the situation as it unfolds. In particular, many social media messages communicated during emergencies convey timely, actionable information. Processing social media messages to obtain such information, however, involves solving multiple challenges including: parsing brief and informal messages, handling information overload, and prioritizing different types of information found in messages. These challenges can be mapped to classical information processing operations such as filtering, classifying, ranking, aggregating, extracting, and summarizing. We survey the state of the art regarding computational methods to process social media messages and highlight both their contributions and shortcomings. In addition, we examine their particularities, and methodically examine a series of key subproblems ranging from the detection of events to the creation of actionable and useful summaries. Research thus far has, to a large extent, produced methods to extract situational awareness information from social media. In this survey, we cover these various approaches, and highlight their benefits and shortcomings. We conclude with research challenges that go beyond situational awareness, and begin to look at supporting decision making and coordinating emergency-response actions.

710 citations

Journal ArticleDOI
TL;DR: In this article, the authors investigated highly scholarly articles (between 2003 and 2016) related to topic modeling based on LDA to discover the research development, current trends and intellectual structure of topic modeling.
Abstract: Topic modeling is one of the most powerful techniques in text mining for data mining, latent data discovery, and finding relationships among data and text documents. Researchers have published many articles in the field of topic modeling and applied it in various fields such as software engineering, political science, medical and linguistic science, etc. There are various methods for topic modeling; Latent Dirichlet Allocation (LDA) is one of the most popular in this field. Researchers have proposed various models based on the LDA in topic modeling. According to previous work, this paper will be very useful and valuable for introducing LDA approaches in topic modeling. In this paper, we investigated highly scholarly articles (between 2003 and 2016) related to topic modeling based on LDA to discover the research development, current trends and intellectual structure of topic modeling. In addition, we summarize challenges and introduce famous tools and datasets in topic modeling based on LDA.

608 citations

Journal ArticleDOI
TL;DR: This survey first frames the concept of community and the problem of community detection in the context of Social Media, and provides a compact classification of existing algorithms based on their methodological principles, placing special emphasis on the performance of existing methods in terms of computational complexity and memory requirements.
Abstract: The proposed survey discusses the topic of community detection in the context of Social Media. Community detection constitutes a significant tool for the analysis of complex networks by enabling the study of mesoscopic structures that are often associated with organizational and functional characteristics of the underlying networks. Community detection has proven to be valuable in a series of domains, e.g. biology, social sciences, bibliometrics. However, despite the unprecedented scale, complexity and the dynamic nature of the networks derived from Social Media data, there has only been limited discussion of community detection in this context. More specifically, there is hardly any discussion on the performance characteristics of community detection methods as well as the exploitation of their results in the context of real-world web mining and information retrieval scenarios. To this end, this survey first frames the concept of community and the problem of community detection in the context of Social Media, and provides a compact classification of existing algorithms based on their methodological principles. The survey places special emphasis on the performance of existing methods in terms of computational complexity and memory requirements. It presents both a theoretical and an experimental comparative discussion of several popular methods. In addition, it discusses the possibility for incremental application of the methods and proposes five strategies for scaling community detection to real-world networks of huge scales. Finally, the survey deals with the interpretation and exploitation of community detection results in the context of intelligent web applications and services.

607 citations