Twitter data analysis: temporal and term frequency analysis with real-time event
01 Nov 2017-Vol. 263, Iss: 4, pp 042081
TL;DR: This paper performs time-series analysis and term frequency analysis, using techniques such as filtering and information extraction for text mining, to find interesting moments in the temporal data of an event and to rank players or teams by popularity.
Abstract: Over the past few years, the World Wide Web (WWW) has become a prominent and huge source of user-generated content and opinionated data. Among the various social media, Twitter has gained popularity because it offers a fast and effective way for users to share their perspectives on critical and other issues in different domains. Because this data is generated in huge volumes in the cloud, it has opened doors for researchers in the field of data science and analysis. There are various domains, such as the 'Political', 'Entertainment', and 'Business' domains. Twitter also provides various APIs for developers: 1) the Search API, which focuses on past tweets; 2) the REST API, which focuses on user details and allows collecting user profiles, friends, and followers; and 3) the Streaming API, which collects details such as tweets, hashtags, and geolocations. In our work, we access the Streaming API to fetch real-time tweets about a dynamic, ongoing event. For this, we focus on the 'Entertainment' domain, especially 'Sports', as IPL-T20 is currently a trending ongoing event. We collect a large number of these tweets and store them in a MongoDB database, where the tweets are stored as JSON documents. On these documents, we perform time-series analysis and term frequency analysis using different techniques, such as filtering and information extraction for text mining, which fulfils our objective of finding interesting moments in the temporal data of the event and ranking the players or teams by popularity. This helps people understand the key influencers on the social media platform.
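The term frequency and time-series steps described above can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: it assumes tweets have already been read back from MongoDB as Python dicts with a Twitter v1.1-style `created_at` string and a `text` field, and the stop-word list and helper names (`tokenize`, `term_frequencies`, `tweets_per_minute`) are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed input: tweets as dicts read back from MongoDB, e.g.
# {"created_at": "Sun Apr 09 14:30:05 +0000 2017", "text": "..."}
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"
STOP_WORDS = {"a", "an", "the", "is", "in", "of", "and", "to", "rt"}  # hypothetical list


def tokenize(text):
    """Lowercase, strip URLs, keep words/#hashtags/@mentions, drop stop words."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return [tok for tok in re.findall(r"[#@]?\w+", text) if tok not in STOP_WORDS]


def term_frequencies(tweets, watchlist=None):
    """Count terms over all tweets; optionally keep only terms in `watchlist`."""
    counts = Counter()
    for tweet in tweets:
        for tok in tokenize(tweet["text"]):
            if watchlist is None or tok in watchlist:
                counts[tok] += 1
    return counts


def tweets_per_minute(tweets):
    """Return [(minute, tweet_count), ...] sorted by time."""
    series = Counter()
    for tweet in tweets:
        ts = datetime.strptime(tweet["created_at"], TWITTER_TIME_FORMAT)
        series[ts.replace(second=0, microsecond=0)] += 1
    return sorted(series.items())
```

Passing a lowercase watchlist of team hashtags to `term_frequencies` and calling `most_common()` on the result gives a simple popularity ranking; peaks in the per-minute series are candidates for the "interesting moments" the abstract mentions.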
01 Jan 2016
TL;DR: In this paper, a scheme is proposed that can detect what happens in the real world in real time solely by analyzing tweets as Big Data, and inform users of the event, by accurately quantifying the importance of words and dynamically evaluating the quantified values.
Abstract: Big Data has been one of the main topics in the field of computer science. In addition, demand for real-time observation of the real world has increased, so that services or information can be provided to people accordingly. For example, when a disaster occurs, the government can respond appropriately if the situation in the disaster-stricken areas is grasped in real time. Although many kinds of blog services function as Big Data sources, Twitter is considered the most active one. Users can freely post tweets anywhere in real time, since Twitter limits a tweet to 140 characters. In this paper, a scheme is proposed that can detect what happens in the real world in real time solely by analyzing tweets as Big Data, and inform users of the event. To this end, two problems have to be solved: (a) quantifying the importance of words accurately and (b) evaluating the quantified values dynamically. As solutions to these problems, two new methods are proposed, the Extended Hybrid TF-IDF and the Remarkable Word Detecting Method, and they are used in the proposed scheme. Finally, an experiment is conducted to evaluate the proposed methods and scheme.
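The abstract does not spell out the Extended Hybrid TF-IDF. For orientation only, the sketch below implements a plain hybrid-style TF-IDF under an assumption that is ours, not the paper's: TF is computed over all tweets pooled into one document, and IDF treats each tweet as a separate document. The function name `hybrid_tfidf` is hypothetical, and neither the paper's extensions nor its Remarkable Word Detecting Method are reproduced here.

```python
import math
from collections import Counter


def hybrid_tfidf(tweets):
    """Score words in a batch of tokenized tweets (each tweet is a list of words).

    TF: count of the word across the whole batch (all tweets pooled into
    one document), divided by the total number of words in the batch.
    IDF: log2(number of tweets / number of tweets containing the word),
    i.e. each tweet is treated as a separate document.
    """
    pooled = Counter(word for tweet in tweets for word in tweet)
    total_words = sum(pooled.values())
    doc_freq = Counter(word for tweet in tweets for word in set(tweet))
    n_tweets = len(tweets)
    return {
        word: (count / total_words) * math.log2(n_tweets / doc_freq[word])
        for word, count in pooled.items()
    }
```

Under this definition a word that appears in every tweet of the batch scores zero. Running the function on successive time windows and comparing the scores is one way to evaluate the values dynamically, though the paper's exact procedure may differ.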
01 Jan 2020
TL;DR: This study categorizes a large number of recent studies and articles in the relevant area to give a summarized view of the state of the art in this field and provides a quick baseline for further research.
Abstract: In this new era of Web 2.0, people around the world are expressing their feelings, sentiments, thoughts, and daily activities, as well as local and global events happening around them, on social networking sites like Twitter and Facebook. Registered users thereby generate vast amounts of social media data that carry geographic and temporal information. This rich data is potentially useful and is now used extensively in applications such as user sentiment analysis, product or service reviews, real-time information extraction (for example, traffic and disaster reporting), personalized message or user recommendation, and other areas. Extracting the topic distribution from social media in the spatial and temporal dimensions is an important research area. Hence, this study focuses on various topic modeling techniques and their use in recent research works. This chapter gives a brief overview of recent developments in spatiotemporal topic analysis using Twitter data. It categorizes a large number of recent studies and articles in the relevant area to give a summarized view of the state of the art in this field. This survey will help researchers who are new to the domain and provide a quick baseline for further research.
TL;DR: In this paper, contextual keywords are extracted using thematic events with the help of data association, and the thematic context of events is identified using the uncertainty principle in the proposed system.
Abstract: Keyword extraction is a crucial process in text mining. Extracting keywords together with their contextual events from Twitter data is a big challenge, mainly because of the informality of the language used. Misspelled words, acronyms, and ambiguous terms cause this informality. In current systems, keyword extraction from informal language is pattern-based or event-based. In this paper, contextual keywords are extracted using thematic events with the help of data association. The thematic context of events is identified using the uncertainty principle in the proposed system. The thematic contexts are weighted with vectors called thematic context vectors, which signify whether an event is certain or uncertain. The system is tested on the Twitter COVID-19 dataset and proves to be effective. The system extracts event-specific thematic context vectors from the test dataset and ranks them. The extracted thematic context vectors are then clustered, which improves the silhouette coefficient by 0.5% over the state-of-the-art methods, namely TF and TF-IDF. Thematic context vectors can also be used in other applications such as cyberbullying detection, sarcasm detection, and figurative language detection.
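The silhouette coefficient used for evaluation in this abstract can be computed directly from its standard definition. The sketch below does so for toy 2-D points with Euclidean distance; it is not the thematic-context-vector pipeline, and the function name and the convention of scoring points in singleton clusters as 0 are assumptions.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient using Euclidean distance.

    For each point i:
      a = mean distance to the other points in its own cluster
      b = smallest mean distance to the points of any other cluster
      s(i) = (b - a) / max(a, b)
    Points in singleton clusters get s(i) = 0 (an assumed convention).
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # The distance from the point to itself is 0, so divide by len(own) - 1.
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

Values near 1 mean tight, well-separated clusters; values below 0 mean points sit closer to another cluster than to their own, which is why a higher silhouette coefficient indicates better clustering.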
01 Nov 2015
TL;DR: This research developed a text mining application that detects the emotions of Twitter users, classified into six emotions (happiness, sadness, anger, disgust, fear, and surprise); the application achieves 83% accuracy on 105 tweets.
Abstract: Twitter is a social media platform with more than 500 million users and 400 million tweets per day. Tweets written by Twitter users contain various emotions. Most research on the use of social media classifies sentiments into three categories: positive, negative, and neutral. However, none of these studies has developed an application that can detect user emotions in social media, particularly on Twitter. Hence, this research developed a text mining application to detect the emotions of Twitter users, classified into six emotions: happiness, sadness, anger, disgust, fear, and surprise. The application uses three main text mining phases: preprocessing, processing, and validation. In the preprocessing phase, case folding, cleansing, stop-word removal, emoticon conversion, negation conversion, and tokenization were applied to the training data and the test data, based on a sentiment analysis that performed morphological analysis to build several models. In the processing phase, the application performed weighting and classification on the validated model using the Naive Bayes algorithm. In the validation phase, the accuracy of the application was measured using 10-fold cross-validation. The findings showed that the application achieves 83% accuracy on 105 tweets. Higher accuracy would require a better model built from the training data.
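The preprocessing and Naive Bayes steps described in this abstract can be sketched as follows. This is a minimal multinomial Naive Bayes with add-one (Laplace) smoothing, not the authors' application: the stop-word list, emoticon map, `not_` negation prefix, and all function and class names are hypothetical, and the 10-fold cross-validation step is omitted.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical resources; the paper's actual lists are not given.
STOP_WORDS = {"i", "am", "the", "a", "is", "so", "this"}
EMOTICONS = {":)": " emo_smile ", ":(": " emo_sad "}
NEGATIONS = {"not", "no", "never"}


def preprocess(text):
    text = text.lower()                                  # case folding
    for emoticon, token in EMOTICONS.items():            # emoticon conversion
        text = text.replace(emoticon, token)
    text = re.sub(r"https?://\S+|@\w+", " ", text)      # cleansing (URLs, mentions)
    tokens = re.findall(r"[a-z_]+", text)                # tokenization
    out, negate = [], False
    for tok in tokens:
        if tok in NEGATIONS:                             # negation conversion:
            negate = True                                # prefix the next kept word
            continue
        if tok in STOP_WORDS:                            # stop-word removal
            continue
        out.append("not_" + tok if negate else tok)
        negate = False
    return out


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / n_docs)                 # log prior
            for tok in tokens:
                if tok in self.vocab:                    # skip words unseen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and add-one smoothing keeps a single word unseen for one class from zeroing out that class.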
01 Jan 2017
TL;DR: The emotion analysis methodology is shown to be a fast and robust way of analyzing online streams of tweets, and an online approach is proposed to predict emotion-intensive moments during real-life events.
Abstract: The rapid growth of social media such as Twitter provides a great opportunity for identifying and analyzing people's emotions in response to various public events, such as epidemics, terrorist attacks, and political elections. Detecting people's emotions about different events is crucial in many applications. However, the high volume and fast pace of social media make it challenging to analyze public emotions from social media data in real time. In this paper, we propose a method to measure public emotion and predict important moments during particular public events. Given a stream of tweets, we analyze the impact of major public events, both tragic and enthusiastic ones, on public emotion. We develop a full-stack architecture that performs real-time emotion analysis on Twitter streams. We design a supervised learning approach for classifying tweets based on the type of emotion they elicit. We then aggregate each emotion class to discover emotion-evolving patterns over time. We also propose an online approach to predict emotion-intensive moments during real-life events. Our emotion analysis methodology is shown to be a fast and robust way of analyzing online streams of tweets.
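The aggregation step (counting classified tweets per emotion over time) can be sketched as an online structure that is updated one tweet at a time. This assumes the tweets are already labelled by a classifier such as the supervised one the abstract mentions; the class name, the 60-second window, and the use of per-window proportions are illustrative choices, and the paper's prediction of emotion-intensive moments is not reproduced.

```python
from collections import Counter, defaultdict


class EmotionTimeline:
    """Online aggregation of already-classified tweets into fixed time windows."""

    def __init__(self, window_seconds=60):
        self.window_seconds = window_seconds
        self.windows = defaultdict(Counter)  # window start (seconds) -> emotion counts

    def add(self, timestamp, emotion):
        """Record one classified tweet; `timestamp` is in Unix seconds."""
        start = int(timestamp // self.window_seconds) * self.window_seconds
        self.windows[start][emotion] += 1

    def series(self, emotion):
        """Share of `emotion` among all tweets in each window, in time order."""
        return [
            (start, counts[emotion] / sum(counts.values()))
            for start, counts in sorted(self.windows.items())
        ]
```

Because each call to `add` touches only one window, the structure can keep pace with a live stream; plotting `series("fear")` or `series("joy")` shows how an emotion evolves as the event unfolds.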
03 Dec 2014
TL;DR: This paper analyzes tweets relating to the 2012 English FA Cup final by applying a novel method named TRCM to extract association rules present in the hashtag keywords of tweets in different time slots, and maps the identified hashtags to event highlights of the game as reported in the mainstream media, which serves as the ground truth.
Abstract: Twitter has become a dependable microblogging tool for real-time information dissemination and the broadcasting of newsworthy events. Its users sometimes break news on the network faster than traditional news agencies because they are often present at ongoing real-life events. Different topic detection methods are currently used to match Twitter posts to real-life news in the mainstream media. In this paper, we analyse tweets relating to the 2012 English FA Cup final by applying our novel method named TRCM to extract association rules present in the hashtag keywords of tweets in different time slots. Our system identifies evolving hashtag keywords with strong association rules in each time slot. We then map the identified hashtag keywords to event highlights of the game as reported in the mainstream media, which serves as the ground truth. The effectiveness measures from our experiments show that our method performs well as a Topic Detection and Tracking approach.
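The idea of mining hashtag association rules per time slot and comparing slots can be illustrated with a simplified pairwise version. This is not TRCM itself: it only computes support and confidence for rules between pairs of hashtags within each slot and reports rules that hold in the current slot but not in the previous one. The thresholds and function names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def hashtag_rules(slot, min_support=0.2, min_confidence=0.6):
    """Pairwise association rules X -> Y among hashtags in one time slot.

    slot: list of hashtag sets, one set per tweet (a "transaction").
    support(X -> Y)    = fraction of tweets containing both X and Y
    confidence(X -> Y) = count(X and Y) / count(X)
    """
    n = len(slot)
    single = Counter(tag for tags in slot for tag in tags)
    # Ordered pairs, so both X -> Y and Y -> X are considered.
    pairs = Counter(pair for tags in slot for pair in permutations(sorted(tags), 2))
    rules = {}
    for (x, y), count in pairs.items():
        support = count / n
        confidence = count / single[x]
        if support >= min_support and confidence >= min_confidence:
            rules[(x, y)] = (support, confidence)
    return rules


def new_rules(previous_slot, current_slot, **thresholds):
    """Rules that hold in the current slot but did not hold in the previous one."""
    before = hashtag_rules(previous_slot, **thresholds)
    after = hashtag_rules(current_slot, **thresholds)
    return {rule: stats for rule, stats in after.items() if rule not in before}
```

A rule that suddenly emerges in one slot (for example, two hashtags starting to co-occur strongly) is the kind of signal that could then be matched against a highlight reported in the mainstream media.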
25 Aug 2015
TL;DR: The model is used to analyze 24 million geotagged tweets sent in the US from April 9 to April 22, 2013, the time period of the Boston Marathon bombing, and the approach is shown to create multi-word events that efficiently summarize real-world events.
Abstract: Twitter is a pervasive technology, with hundreds of millions of users serving as sensors that provide eyewitness accounts of events on the ground. In the case of popular events, these sensors start to broadcast news by tweeting to their followers and to the world. Within minutes, these tweets can attract attention and serve as a primary information source for traditional media. Given a huge set of tweets, the key questions are: (1) How can we detect informative events in general? (2) How can we distinguish relevant events from others? In this paper, we tackle these challenges with a statistical model that detects events by spotting significant deviations in word frequencies over time. Besides single-word events, our model also accounts for events composed of multiple co-occurring words, thus providing much richer information. Our statistical process is complemented by an optimization algorithm that extracts only non-redundant events, providing the user with a succinct summary of current events. We used our model to analyze 24 million geotagged tweets sent in the US from April 9 to April 22, 2013 (the time period of the Boston Marathon bombing), and we show that our approach can create multi-word events that efficiently summarize real-world events.
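The core idea of spotting significant deviations in word frequencies over time can be illustrated with a simple per-word z-score test. This swaps in a plain z-score against previous windows in place of the paper's statistical model, and it handles single words only, not multi-word events or the non-redundancy optimization; the thresholds and function name are hypothetical.

```python
import statistics


def bursty_words(history, current, min_count=5, threshold=3.0):
    """Return {word: z_score} for words whose current count is anomalously high.

    history: list of {word: count} dicts, one per past time window
    current: {word: count} for the newest window
    """
    if not history:
        raise ValueError("need at least one past window")
    flagged = {}
    for word, count in current.items():
        if count < min_count:
            continue  # ignore rare words, whose z-scores are noisy
        past = [window.get(word, 0) for window in history]
        mean = statistics.fmean(past)
        # A flat history has zero spread; fall back to 1.0 so z = count - mean.
        stdev = statistics.pstdev(past) or 1.0
        z = (count - mean) / stdev
        if z > threshold:
            flagged[word] = z
    # Strongest deviations first.
    return dict(sorted(flagged.items(), key=lambda item: item[1], reverse=True))
```

A word that is always frequent (high mean, low spread) is not flagged, while a word that jumps from near zero to dozens of mentions in one window is, which is the distinction between background chatter and an emerging event.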
27 Jun 2015
TL;DR: The aim is to study the diffusion dynamics of specific real-world events discussed on Twitter with respect to location and time, including the diffusion of the events through the user interaction graph formed by retweet/mention links.
Abstract: In the era of traditional print media, information dissemination was one-way and restricted to geographical boundaries, with limited span and reach. With the advent of online social media, the process of information diffusion has changed significantly. Social media has become the fastest means of communication and has gained wide popularity. Online social networks such as Facebook and Twitter have revolutionized interpersonal communication by providing a platform for individuals to express themselves at a global level, beyond their immediate geography. Most research in this area has focused on analyzing the general phenomenon of information diffusion. Our aim is to study the diffusion dynamics of specific real-world events discussed on Twitter with respect to location and time. We categorize the events into broad categories based on the following features: temporal (short or long), geospatial distribution (local or global), information diffusion mechanism (viral or gradual), influence (popular or unpopular), and cause (natural or planned). Temporal analysis shows that the pre-event, during-event, and post-event frequency distributions of tweets differ according to the nature of the event. For example, a planned event like "Delhi Elections" is discussed more after its actual occurrence, whereas another planned event, "Obama's visit to India", is discussed mainly during the visit itself. Through geospatial analysis, we find that some events that are expected to remain local cross regional boundaries and become a matter of global discussion. We also study the diffusion of the events using the user interaction graph formed by retweet/mention links. We conclude with a three-dimensional analysis of the spatio-temporal diffusion dynamics of real-world events by exploring the relationships among them.
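The pre-event, during-event, and post-event comparison in the temporal analysis can be sketched as a simple partition of tweet timestamps around an event window. The function name and the choice to count boundary timestamps as "during" are assumptions; the dates in the usage example are made up for illustration, and the paper's categorization of events is not reproduced.

```python
from datetime import datetime


def phase_distribution(timestamps, event_start, event_end):
    """Fraction of tweets posted before, during, and after an event window.

    `timestamps` is a list of comparable values (e.g. datetimes).
    Timestamps equal to event_start or event_end count as "during".
    """
    counts = {"pre": 0, "during": 0, "post": 0}
    for ts in timestamps:
        if ts < event_start:
            counts["pre"] += 1
        elif ts <= event_end:
            counts["during"] += 1
        else:
            counts["post"] += 1
    total = len(timestamps)
    if total == 0:
        return {phase: 0.0 for phase in counts}
    return {phase: n / total for phase, n in counts.items()}


if __name__ == "__main__":
    start, end = datetime(2015, 1, 10), datetime(2015, 1, 12)
    posts = [datetime(2015, 1, 9), datetime(2015, 1, 11),
             datetime(2015, 1, 11, 18), datetime(2015, 1, 20)]
    print(phase_distribution(posts, start, end))
    # prints {'pre': 0.25, 'during': 0.5, 'post': 0.25}
```

Comparing these three fractions across events is one way to express the contrast the abstract draws between an event discussed mostly afterwards and one discussed mostly while it happens.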
Related Papers (5)
31 Oct 2017