
Showing papers by "Haewoon Kwak published in 2014"


Proceedings ArticleDOI
26 Apr 2014
TL;DR: It is identified that simple features extracted from the phone, such as the user's interaction with the notification center, the screen activity, the proximity sensor, and the ringer mode, are strong predictors of how quickly the user will attend to the messages.
Abstract: Mobile instant messaging (e.g., via SMS or WhatsApp) often goes along with an expectation of high attentiveness, i.e., that the receiver will notice and read the message within a few minutes. Hence, existing instant messaging services for mobile phones share indicators of availability, such as the last time the user was online. However, in this paper we provide evidence not only that these cues create social pressure, but also that they are weak predictors of attentiveness. As a remedy, we propose to share a machine-computed prediction of whether the user will view a message within the next few minutes or not. For two weeks, we collected behavioral data from 24 users of mobile instant messaging services. By means of machine-learning techniques, we identified that simple features extracted from the phone, such as the user's interaction with the notification center, the screen activity, the proximity sensor, and the ringer mode, are strong predictors of how quickly the user will attend to the messages. With seven automatically selected features, our model predicts whether a phone user will view a message within a few minutes with 70.6% accuracy and a precision for fast attendance of 81.2%.

183 citations
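
The abstract above reports two different figures, overall accuracy and precision for the "fast" class. A minimal sketch of how the two are usually defined follows; the labels and predictions are hypothetical illustration values, not data from the paper, and the function name `evaluate` is an assumption.

```python
def evaluate(y_true, y_pred, positive="fast"):
    """Return (accuracy, precision for the `positive` class)."""
    pairs = list(zip(y_true, y_pred))
    # Accuracy: share of all messages whose class was predicted correctly.
    correct = sum(t == p for t, p in pairs)
    # Precision: of the messages predicted `positive`, the share that really were.
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return accuracy, precision


# Hypothetical ground truth and predictions (illustration only).
y_true = ["fast", "fast", "slow", "slow", "fast"]
y_pred = ["fast", "slow", "slow", "fast", "fast"]
acc, prec = evaluate(y_true, y_pred)
```

Precision can exceed accuracy, as in the reported figures, when the model is more reliable on the messages it labels "fast" than on the data set as a whole.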


Posted Content
TL;DR: The authors propose a supervised learning approach for predicting crowdsourced decisions on toxic behavior, using a large labeled data collection: over 10 million user reports involving 1.46 million toxic players and the corresponding crowdsourced decisions.
Abstract: One problem facing players of competitive games is negative, or toxic, behavior. League of Legends, the largest eSport game, uses a crowdsourcing platform called the Tribunal to judge whether a reported toxic player should be punished or not. The Tribunal is a two-stage system requiring reports from those players who directly observe toxic behavior, and human experts who review aggregated reports. While this system has successfully dealt with the vague nature of toxic behavior by majority rule based on many votes, it naturally requires tremendous cost, time, and human effort. In this paper, we propose a supervised learning approach for predicting crowdsourced decisions on toxic behavior with large-scale labeled data collections: over 10 million user reports involving 1.46 million toxic players and the corresponding crowdsourced decisions. Our results show good performance in detecting overwhelming-majority cases and predicting crowdsourced decisions on them. We demonstrate good portability of our classifier across regions. Finally, we estimate the practical implications of our approach: potential cost savings and victim protection.

97 citations


Proceedings ArticleDOI
07 Apr 2014
TL;DR: This paper proposes a supervised learning approach for predicting crowdsourced decisions on toxic behavior, using a large labeled data collection: over 10 million user reports involving 1.46 million toxic players and the corresponding crowdsourced decisions.
Abstract: One problem facing players of competitive games is negative, or toxic, behavior. League of Legends, the largest eSport game, uses a crowdsourcing platform called the Tribunal to judge whether a reported toxic player should be punished or not. The Tribunal is a two-stage system requiring reports from those players who directly observe toxic behavior, and human experts who review aggregated reports. While this system has successfully dealt with the vague nature of toxic behavior by majority rule based on many votes, it naturally requires tremendous cost, time, and human effort. In this paper, we propose a supervised learning approach for predicting crowdsourced decisions on toxic behavior with large-scale labeled data collections: over 10 million user reports involving 1.46 million toxic players and the corresponding crowdsourced decisions. Our results show good performance in detecting overwhelming-majority cases and predicting crowdsourced decisions on them. We demonstrate good portability of our classifier across regions. Finally, we estimate the practical implications of our approach: potential cost savings and victim protection.

96 citations


Proceedings ArticleDOI
02 Dec 2014
TL;DR: The technique combines path segmentation, de-noising, and inference procedures to estimate a device's stationary location as well as its mobility path between stationary positions. Mobility path accuracy is shown to improve with path length and speed and, counter to intuition, appears to improve in suburban areas.
Abstract: Through their normal operation, cellular networks are a repository of continuous location information from their subscribed devices. Such information, however, comes at a coarse granularity in terms of both space and time. For otherwise inactive devices, location information can be obtained at the granularity of the associated cellular sector, and at infrequent points in time that are sensitive to the structure of the network itself and the level of mobility of the device. In this paper, we ask whether such sparse information can help to identify the paths followed by mobile connected devices throughout the day. If such a task is possible, then we would enable continuous mobility path estimation not only for smartphones, but also for the millions of future connected "things". The challenge we face is that cellular data has one to two orders of magnitude less spatial and temporal resolution than typical GPS traces. Our contribution is to devise path segmentation, de-noising, and inference procedures to estimate the device's stationary location, as well as its mobility path between stationary positions. We call our technique Cell*. We compensate for the lack of spatio-temporal granularity with information on the cellular network topology and GIS (Geographic Information System) data. We collect more than 3,000 mobility trajectories over 8 months and show that Cell* achieves a median error of 230m for the stationary location estimation, while mobility paths are estimated with a median accuracy of 70m. We show that mobility path accuracy improves with path length and speed and that, counter to our intuition, accuracy appears to improve in suburban areas. To our knowledge, Cell* is the first technology that enables location services for the new generation of connected mobile devices that may feature no GPS due to cost, size, or battery constraints.

64 citations


Posted Content
TL;DR: The concept of "flow motifs" is introduced to characterize the statistically significant pass sequence patterns of soccer teams based on their pass networks. It extends the idea of network motifs, highly significant subgraphs that usually consist of three or four nodes.
Abstract: Is it possible to have a unique, recognizable style in soccer nowadays? We address this question by proposing a method to quantify the motif characteristics of soccer teams based on their pass networks. We introduce the concept of "flow motifs" to characterize the statistically significant pass sequence patterns. It extends the idea of network motifs, highly significant subgraphs that usually consist of three or four nodes. The analysis of the motifs in the pass networks allows us to compare and differentiate the styles of different teams. Although most teams tend to apply a homogeneous style, surprisingly, a unique strategy of soccer exists. Specifically, FC Barcelona's famous tiki-taka does not consist of uncountable random passes but rather has a precise, finely constructed structure.

59 citations
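
The pass sequence patterns described in the abstract above can be illustrated with a short sketch. It assumes, beyond what the abstract states, that a motif covers three consecutive passes (four players) and is labeled by the order in which distinct players first appear, e.g. ABAC. The function names and the player identifiers are hypothetical, and the statistical significance test mentioned in the abstract is not implemented here.

```python
from collections import Counter


def motif_label(players):
    """Map a player sequence to a pattern such as 'ABAC' by order of first appearance."""
    letters = {}
    label = []
    for player in players:
        if player not in letters:
            # Assign the next unused letter to a player seen for the first time.
            letters[player] = chr(ord("A") + len(letters))
        label.append(letters[player])
    return "".join(label)


def count_motifs(possession, passes=3):
    """Count the patterns over every window of `passes` consecutive passes."""
    window = passes + 1  # n passes involve n + 1 ball holders
    return Counter(
        motif_label(possession[i:i + window])
        for i in range(len(possession) - window + 1)
    )


# Hypothetical possession: the ball moves P6 -> P8 -> P6 -> P10 -> P8.
counts = count_motifs(["P6", "P8", "P6", "P10", "P8"])
```

Comparing such counts against those obtained from randomized pass sequences would be one way to judge which patterns are over-represented for a team.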


Book ChapterDOI
11 Nov 2014
TL;DR: The structure of global news coverage of disasters and its determinants are revealed by using a large-scale news coverage dataset collected by the GDELT (Global Data on Events, Location, and Tone) project that monitors news media in over 100 languages from the whole world.
Abstract: In this work, we reveal the structure of global news coverage of disasters and its determinants by using a large-scale news coverage dataset collected by the GDELT (Global Data on Events, Location, and Tone) project that monitors news media in over 100 languages from the whole world. Significant variables in our hierarchical (mixed-effect) regression model, such as population, political stability, damage, and more, are well aligned with a series of previous research. However, we find strong regionalism in news geography, highlighting the necessity of comprehensive datasets for the study of global news coverage.

54 citations


Book ChapterDOI
10 Nov 2014
TL;DR: This paper explored the linguistic components of toxic behavior by using crowdsourced data from over 590 thousand cases of accused toxic players in a popular match-based competition game, League of Legends.
Abstract: In this paper we explore the linguistic components of toxic behavior by using crowdsourced data from over 590 thousand cases of accused toxic players in a popular match-based competition game, League of Legends. We perform a series of linguistic analyses to gain a deeper understanding of the role communication plays in the expression of toxic behavior. We characterize linguistic behavior of toxic players and compare it with that of typical players in an online competition game. We also find empirical support describing how a player transitions from typical to toxic behavior. Our findings can be helpful to automatically detect and warn players who may become toxic and thus insulate potential victims from toxic playing in advance.

29 citations


Patent
19 Dec 2014
TL;DR: In this article, a method for predicting reactiveness of MMI users is proposed, which consists of collecting ground truth data for a machine-learning classifier, extracting from the collected ground-truth data a list of features which determines a current or past context of the user, and each feature having a feature's prediction strength calculated as fraction of classes misclassified when removing the feature.
Abstract: A method for predicting reactiveness of MMI users comprises: reacting to a message with a mobile user device which is a receiver of the message, collecting ground-truth data (11) for a machine-learning classifier, extracting from the collected ground-truth data (11) a list of features (12) which determines a current or past context of the user, and each feature having a feature's prediction strength calculated as fraction of classes misclassified when removing the feature; selecting the list of features (12) based on each feature's prediction strength; defining a plurality of reactiveness classes (101); both the extracted list of features (12) and the reactiveness classes (101) being input to the machine-learning classifier; classifying (102) the user according to the defined reactiveness classes (101); predicting the user's reactiveness for the given current or past context of the user by determining the most likely reactiveness class via the machine-learning classifier.

23 citations


Posted Content
TL;DR: Using crowdsourced data from over 590 thousand cases of accused toxic players in a popular match-based competition game, League of Legends, a series of linguistic analyses are performed to gain a deeper understanding of the role communication plays in the expression of toxic behavior.
Abstract: In this paper we explore the linguistic components of toxic behavior by using crowdsourced data from over 590 thousand cases of accused toxic players in a popular match-based competition game, League of Legends. We perform a series of linguistic analyses to gain a deeper understanding of the role communication plays in the expression of toxic behavior. We characterize linguistic behavior of toxic players and compare it with that of typical players in an online competition game. We also find empirical support describing how a player transitions from typical to toxic behavior. Our findings can be helpful to automatically detect and warn players who may become toxic and thus insulate potential victims from toxic playing in advance.

19 citations


Posted Content
Haewoon Kwak, Jisun An
TL;DR: The structure of global news coverage of disasters and its determinants are revealed by using a large-scale news coverage dataset collected by the GDELT (Global Data on Events, Location, and Tone) project that monitors news media from the whole world.
Abstract: In this work, we reveal the structure of global news coverage of disasters and its determinants by using a large-scale news coverage dataset collected by the GDELT (Global Data on Events, Location, and Tone) project that monitors news media in over 100 languages from the whole world. Significant variables in our hierarchical (mixed-effect) regression model, such as population, political stability, damage, and more, are well aligned with a series of previous research. Yet the strong regionalism we find in news geography highlights the necessity of a comprehensive dataset for the study of global news coverage.

16 citations


Proceedings ArticleDOI
07 Apr 2014
TL;DR: The findings show the great potential of Twitter as a platform for paper sharing, but at the same time indicate the limitations of measuring scientific impact through the lens of social media, mainly because attention is highly skewed and limited to a small number of top journals.
Abstract: We explore how research papers are shared on Twitter to understand the potential and limitations of the current practice of measuring or predicting the scientific impact of research papers from the web. We track 54 second-level domains offering the top 100 journals listed in Google Scholar and collect 403,165 tweets sharing 75,677 unique research papers by 142,743 users over the course of 135 days. Our findings show the great potential of Twitter as a platform for paper sharing, but at the same time indicate the limitations of measuring scientific impact through the lens of social media, mainly because attention is highly skewed and limited to a small number of top journals.

Proceedings ArticleDOI
Haewoon Kwak
07 Apr 2014
TL;DR: Regional differences are found to affect both the likelihood that toxic players are reported and the proportion of them that are punished in the Tribunal, and a supervised learning approach is proposed for predicting crowdsourced decisions on toxic behavior with large-scale labeled data collections.
Abstract: With the remarkable advances from isolated console games to massively multi-player online role-playing games, the online gaming world provides yet another place where people interact with each other. Online games have attracted attention from researchers, because i) the purpose of actions is relatively clear, and ii) actions are quantifiable. A wide range of predefined actions for supporting social interaction (e.g., friendship, communication, trade, enmity, aggression, and punishment) reflects either positive or negative connotations among game players, and is unobtrusively recorded by the game servers. These rich electronic footprints have become invaluable assets for the research of social dynamics. In particular, exploring negative behavior in online games is a key research direction because it directly influences gaming experience and user satisfaction. Even a few negative players can impact many others because of the design of multi-player games. For this reason these players are called toxic. The definition of toxic play is not cut and dried. Even if someone follows the game rules, they could be considered toxic. For example, killing one player repeatedly is often deemed toxic behavior, although it does not break game rules at all. The vagueness of toxicity makes it hard to understand, detect, and prevent. League of Legends (LoL), created by Riot Games and with 70 million users as of 2012, offers a new way to understand toxic behavior. Riot Games developed a crowdsourcing framework, the Tribunal, to judge whether reported toxic behavior should be punished or not. Volunteer players review user reports and vote for either pardon or punishment. As of March 2013, 105 million votes had been collected in North America and Europe. We explore toxic playing and reaction based on large-scale data from the Tribunal[1].
We collect and investigate over 10 million user reports on 1.46 million toxic players and the corresponding crowdsourced decisions made in the Tribunal. We crawl data from three different regions, North America, Western Europe, and Korea, to take regional differences in user behavior into account. To obtain a comprehensive view of toxic playing and reaction based on this huge data collection, we answer the following research questions in a bottom-up approach: how individuals react to toxic players, how teams interact with toxic players, how toxic and non-toxic players in general behave across the match, and how crowds make a decision on toxic players. We find large-scale empirical support for some theories that are notoriously difficult to test in the wild, namely the bystander effect, ingroup favoritism, the black sheep effect, cohesion-performance relationships, and attribution theory. We also discover that regional differences affect both the likelihood that toxic players are reported and the proportion of them that are punished in the Tribunal. We then propose a supervised learning approach for predicting crowdsourced decisions on toxic behavior with large-scale labeled data collections[2]. Using the same sparse information available to the reviewers, we trained classifiers to detect the presence and severity of toxicity. We built several models oriented around in-game performance, reports by victims of toxic behavior, and linguistic features of chat messages. We found that training with high-agreement decisions resulted in higher accuracy on low-agreement decisions and that our classifier was adept at detecting clear-cut innocence. Finally, we showed that our classifier is relatively robust across cultural regions; our classifier built from a North American dataset performed adequately on a European dataset. Ultimately, our work can be used as a foundation for the further study of toxic behavior.

Book ChapterDOI
10 Nov 2014
TL;DR: The huge volume of behavioral data collected from online games helps researchers study human nature at an unprecedented scale and demonstrates the competitive advantage of user behavior data collected in online games.
Abstract: With the remarkable advances from isolated console games to massively multi-player online role-playing games, the online gaming world has become an invaluable asset for the research of social dynamics [7]. Online game players interact with each other in various ways, as they do in the real world. More importantly, interactions can be easily quantified and logged in detail. The huge volume of behavioral data collected from online games helps researchers study human nature at an unprecedented scale. For instance, Szell et al. observe six different types of in-game interaction (friendship, communication, trade, enmity, aggression, and punishment) and analyze the inter-dependence of social networks based on each type [12]. Their rich modeling of human society demonstrates the competitive advantage of user behavior data collected in online games.