Is the Sample Good Enough? Comparing Data from Twitter's Streaming API with Twitter's Firehose

Citations

Posted Content

A Survey on Bias and Fairness in Machine Learning

Ninareh Mehrabi¹, Fred Morstatter¹, Nripsuta Saxena¹, Kristina Lerman¹, Aram Galstyan¹

Information Sciences Institute¹

23 Aug 2019-arXiv: Learning

TL;DR: This survey investigated different real-world applications that have shown biases in various ways, and created a taxonomy for fairness definitions that machine learning researchers have defined to avoid the existing bias in AI systems.

Abstract: With the widespread use of AI systems and applications in our everyday lives, it is important to take fairness issues into consideration while designing and engineering these types of systems. Such systems can be used in many sensitive environments to make important and life-changing decisions; thus, it is crucial to ensure that the decisions do not reflect discriminatory behavior toward certain groups or populations. We have recently seen work in machine learning, natural language processing, and deep learning that addresses such challenges in different subdomains. With the commercialization of these systems, researchers are becoming aware of the biases that these applications can contain and have attempted to address them. In this survey we investigated different real-world applications that have shown biases in various ways, and we listed different sources of biases that can affect AI applications. We then created a taxonomy for fairness definitions that machine learning researchers have defined in order to avoid the existing bias in AI systems. In addition to that, we examined different domains and subdomains in AI showing what researchers have observed with regard to unfair outcomes in the state-of-the-art methods and how they have tried to address them. There are still many future directions and solutions that can be taken to mitigate the problem of bias in AI systems. We are hoping that this survey will motivate researchers to tackle these issues in the near future by observing existing work in their respective fields.

1,571 citations

Cites background from "Is the Sample Good Enough? Comparin..."

...The differences and biases in the networks can be a result of many factors, such as network sampling, as shown in [51, 91], which can change the network measures and cause different types of problems....

Journal Article

Social bots distort the 2016 U.S. Presidential election online discussion

Alessandro Bessi¹, Emilio Ferrara¹

University of Southern California¹

03 Nov 2016-First Monday

TL;DR: The findings suggest that the presence of social media bots can indeed negatively affect democratic political discussion rather than improving it, which in turn can potentially alter public opinion and endanger the integrity of the Presidential election.

Abstract: Social media have been extensively praised for increasing democratic discussion on social issues related to policy and politics. However, what happens when these powerful communication tools are exploited to manipulate online discussion, to change the public perception of political entities, or even to try to affect the outcome of political elections? In this study we investigated how the presence of social media bots, algorithmically driven entities that on the surface appear as legitimate users, affects political discussion around the 2016 U.S. Presidential election. By leveraging state-of-the-art social bot detection algorithms, we uncovered a large fraction of the user population that may not be human, accounting for a significant portion of generated content (about one-fifth of the entire conversation). We inferred political partisanships from hashtag adoption, for both humans and bots, and studied spatio-temporal communication, political support dynamics, and influence mechanisms by discovering the level of network embeddedness of the bots. Our findings suggest that the presence of social media bots can indeed negatively affect democratic political discussion rather than improving it, which in turn can potentially alter public opinion and endanger the integrity of the Presidential election.

767 citations

Journal Article

Geo-located Twitter as proxy for global mobility patterns

Bartosz Hawelka¹, Izabela Sitko¹, Euro Beinat², Stanislav Sobolevsky¹, Pavlos Kazakopoulos², Carlo Ratti¹

Massachusetts Institute of Technology¹, University of Salzburg²

23 Apr 2014-Cartography and Geographic Information Science

TL;DR: This article analyses geo-located Twitter messages in order to uncover global patterns of human mobility and reveals spatially cohesive regions that follow the regional division of the world.

Abstract: Pervasive presence of location-sharing services made it possible for researchers to gain an unprecedented access to the direct records of human activity in space and time. This article analyses geo...

634 citations

Cites background or methods from "Is the Sample Good Enough? Comparin..."

...The stream was gathered through the Twitter Streaming API. Although the service sets a limit on how much data can be accessed to less than 1% of the total Twitter stream, the total geo-located content was found not to exceed this restriction (Morstatter et al. 2013)....
...These geo-located tweets account for around 1% of the total feed (Morstatter et al. 2013)....

Monograph

Social Media Mining: An Introduction

Reza Zafarani¹, Mohammad Ali Abbasi¹, Huan Liu¹

Arizona State University¹

28 Apr 2014

TL;DR: Social Media Mining introduces the unique problems arising from social media data and presents fundamental concepts, emerging issues, and effective algorithms for network analysis and data mining.

Abstract: The growth of social media over the last decade has revolutionized the way individuals interact and industries conduct business. Individuals produce data at an unprecedented rate by interacting, sharing, and consuming content through social media. Understanding and processing this new type of data to glean actionable patterns presents challenges and opportunities for interdisciplinary research, novel algorithms, and tool development. Social Media Mining integrates social media, social network analysis, and data mining to provide a convenient and coherent platform for students, practitioners, researchers, and project managers to understand the basics and potentials of social media mining. It introduces the unique problems arising from social media data and presents fundamental concepts, emerging issues, and effective algorithms for network analysis and data mining. Suitable for use in advanced undergraduate and beginning graduate courses as well as professional short courses, the text contains exercises of different degrees of difficulty that improve understanding and help apply concepts, principles, and methods in various scenarios of social media mining.

550 citations

Cites background from "Is the Sample Good Enough? Comparin..."

...[203] studied whether Twitter’s heavily sampled Streaming API, a free service for social media data, accurately portrays the true activity on Twitter....

Journal Article

Twitter use in election campaigns: A systematic literature review

Andreas Jungherr

07 Mar 2016-Journal of Information Technology & Politics

TL;DR: A systematic literature review of 127 studies addressing the use of Twitter in election campaigns is presented in this paper, where the authors discuss the available research with regard to findings on the use by parties, candidates, and publics during election campaigns and during mediated campaign events.

Abstract: Twitter has become a pervasive tool in election campaigns. Candidates, parties, journalists, and a steadily increasing share of the public are using Twitter to comment on, interact around, and research public reactions to politics. These uses have met with growing scholarly attention. As of now, this research is fragmented, lacks a common body of evidence, and shared approaches to data collection and selection. This article presents the results of a systematic literature review of 127 studies addressing the use of Twitter in election campaigns. In this systematic review, I will discuss the available research with regard to findings on the use of Twitter by parties, candidates, and publics during election campaigns and during mediated campaign events. Also, I will address prominent research designs and approaches to data collection and selection.

495 citations

Cites result from "Is the Sample Good Enough? Comparin..."

...Although all of these approaches might lead to comparable and stable results, there are only a few studies systematically testing whether these approaches indeed produce identical data sets (e.g., Morstatter et al., 2013)....
