Book Chapter

On the Volume of Geo-referenced Tweets and Their Relationship to Events Relevant for Migration Tracking

TL;DR: The results are a good basis for using communication patterns as a future key indicator for migration analysis; the natural disasters identified in Japan do not show a clear relationship with changes in the degree of use of the social media tool Twitter.
Abstract: Migration is a major challenge for the European Union, making early preparedness imperative for target states and their stakeholders, such as border police forces. This preparedness is necessary for multiple reasons, including the provision of adequate search and rescue measures. To support it, early indicators are needed for detecting developing migratory push factors related to imminent migration flows. To address this need, we investigated the daily number of geo-referenced Tweets in three regions of Ukraine and in the whole of Japan from August 2014 until October 2014, using the data handling tool Ubicity. Additionally, we identified days on which relevant natural, civil or political events took place in order to detect possible event-triggered changes in the daily number of Tweets. In all examined Ukrainian regions, a considerable increase in the number of daily Tweets was observed on the day of the election of a new parliament. Furthermore, we identified a significant decrease in the number of daily Tweets for Crimea over the whole examined period, which could be related to the political changes that took place there. The natural disasters identified in Japan do not show a clear relationship with changes in the degree of use of the social media tool Twitter. The results are a good basis for using communication patterns as a future key indicator for migration analysis.
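The abstract does not say how daily counts were compared against event days. As an illustration only, the sketch below counts geo-referenced tweets per region and day, then flags days whose count lies more than a chosen number of standard deviations from the period mean. The `(region, date)` input format, the function names, and the z-score rule with `threshold=2.0` are assumptions, not Ubicity's interface or the authors' method.

```python
from collections import Counter
from datetime import date
from statistics import mean, stdev


def daily_counts(tweets):
    """Count tweets per (region, day).

    `tweets` is an iterable of (region, date) pairs; this input format is a
    hypothetical, already geocoded representation, not Ubicity's output.
    """
    return Counter((region, day) for region, day in tweets)


def flag_unusual_days(counts_by_day, threshold=2.0):
    """Return days whose count lies more than `threshold` sample standard
    deviations from the mean over the whole period (assumed z-score rule)."""
    values = list(counts_by_day.values())
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # a perfectly flat series has no unusual days
    return sorted(day for day, n in counts_by_day.items()
                  if abs(n - mu) / sigma > threshold)


# Synthetic series (not the study's data): 100 tweets per day, one spike of 300.
series = {date(2014, 10, d): 100 for d in range(1, 15)}
series[date(2014, 10, 10)] = 300
print(flag_unusual_days(series))  # → [datetime.date(2014, 10, 10)]
```

A single z-score over the whole period is the simplest baseline; a series with a sustained drift, such as the decrease reported for Crimea, would call for a rolling baseline instead.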


Citations
Journal Article
TL;DR: The authors develop a new algorithm for selecting graphical loglinear models that is suitable for analyzing hyper-sparse contingency tables, and show how multi-way contingency tables can be used to represent patterns of human mobility.
Abstract: Methods for selecting loglinear models were among Steve Fienberg’s research interests since the start of his long and fruitful career. After we dwell upon the string of papers focusing on loglinear models that can be partly attributed to Steve’s contributions and influential ideas, we develop a new algorithm for selecting graphical loglinear models that is suitable for analyzing hyper-sparse contingency tables. We show how multi-way contingency tables can be used to represent patterns of human mobility. We analyze a dataset of geolocated tweets from South Africa that comprises 46 million latitude/longitude locations of 476,601 Twitter users that is summarized as a contingency table with 214 variables.

19 citations
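The abstract states that the 46 million locations were summarized as a contingency table with 214 variables, but not how. One plausible encoding, assumed here purely for illustration, treats each region as a binary variable (did the user tweet from it?) and counts users per 0/1 pattern. Because a table over 214 binary variables has 2^214 cells, nearly all of them empty, the sketch stores only the non-empty cells. The function name and the encoding are hypothetical.

```python
from collections import Counter


def mobility_contingency_table(user_regions, regions):
    """Sparse multi-way contingency table of user presence patterns.

    Assumed encoding (illustrative only): each region is a binary variable
    that is 1 if the user tweeted from that region at least once, and each
    user adds one count to the cell matching their 0/1 pattern. Only
    non-empty cells are stored as keys of the Counter.
    """
    index = {region: i for i, region in enumerate(regions)}
    table = Counter()
    for visited in user_regions.values():
        cell = [0] * len(regions)
        for region in visited:
            cell[index[region]] = 1
        table[tuple(cell)] += 1
    return table


users = {"u1": {"A"}, "u2": {"A", "B"}, "u3": {"A"}, "u4": {"C"}}
table = mobility_contingency_table(users, ["A", "B", "C"])
print(table[(1, 0, 0)], table[(1, 1, 0)], table[(0, 0, 1)])  # → 2 1 1
```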

12 Dec 2019
TL;DR: This work explores the potential of using social-media data to measure EU mobility flows and stocks of EU movers.
Abstract: Exploring the Potential of Social-Media Data for Measuring EU Mobility Flows and Stocks of EU Movers

5 citations

Posted Content
TL;DR: The authors discuss the importance of leveraging mobile phone data as an alternative data source for gathering valuable and previously unavailable insights into various aspects of migration, and highlight pending challenges that must be addressed before mobile phone data can effectively support better decisions that would ultimately improve millions of people's lives.
Abstract: Statistics on migration flows are often derived from census data, which suffer from intrinsic limitations, including costs and infrequent sampling. When censuses are used, there is typically a time gap of up to a few years between data collection and the computation and publication of the relevant statistics. This gap is a significant drawback for the analysis of a phenomenon that changes continuously and rapidly. Alternative data sources, such as surveys and field observations, also suffer from limitations in reliability, cost, and scale. The ubiquity of mobile phones enables accurate and efficient collection of up-to-date data related to migration. Indeed, data passively collected by the mobile network infrastructure, in the form of aggregated, pseudonymized Call Detail Records (CDRs), are of great value for understanding human migrations. Through the analysis of mobile phone data, we can shed light on the mobility patterns of migrants, detect spontaneous settlements, and understand the daily habits, levels of integration, and human connections of such vulnerable social groups. This chapter discusses the importance of leveraging mobile phone data as an alternative data source to gather valuable and previously unavailable insights into various aspects of migration. We also highlight pending challenges that need to be addressed before we can effectively benefit from the availability of mobile phone data to make better decisions that would ultimately improve millions of people's lives.

1 citation
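The abstract mentions using CDRs to study the mobility patterns of migrants without giving a procedure. A common heuristic in CDR studies, used here as an assumption rather than as the chapter's method, takes a user's home as the region of their most frequently used cell tower at night and flags a move when that home region changes between two periods. The night-hour window, the `(timestamp, region)` record format, and the function names are all hypothetical.

```python
from collections import Counter
from datetime import datetime

# Assumed definition of "night": 20:00 to 06:59.
NIGHT_HOURS = set(range(20, 24)) | set(range(0, 7))


def home_region(records):
    """Most frequent region among night-time records, or None if there are none.

    `records` are (timestamp, region) pairs, a hypothetical format for
    aggregated, pseudonymized CDRs.
    """
    night = Counter(region for ts, region in records if ts.hour in NIGHT_HOURS)
    if not night:
        return None
    return night.most_common(1)[0][0]


def detect_move(before, after):
    """Return (old_home, new_home) if the inferred home region changed, else None."""
    old, new = home_region(before), home_region(after)
    if old is not None and new is not None and old != new:
        return old, new
    return None


before = [(datetime(2020, 1, 1, 22), "X"), (datetime(2020, 1, 2, 23), "X"),
          (datetime(2020, 1, 2, 12), "Y")]
after = [(datetime(2020, 3, 1, 21), "Y"), (datetime(2020, 3, 2, 2), "Y"),
         (datetime(2020, 3, 2, 13), "X")]
print(detect_move(before, after))  # → ('X', 'Y')
```

Daytime records are ignored when inferring home, so the midday "Y" call in `before` does not count as evidence of residence.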

Posted Content
TL;DR: The authors develop a new algorithm for selecting graphical loglinear models that is suitable for analyzing hyper-sparse contingency tables, and show how multi-way contingency tables can be used to represent patterns of human mobility.
Abstract: Methods for selecting loglinear models were among Steve Fienberg's research interests since the start of his long and fruitful career. After we dwell upon the string of papers focusing on loglinear models that can be partly attributed to Steve's contributions and influential ideas, we develop a new algorithm for selecting graphical loglinear models that is suitable for analyzing hyper-sparse contingency tables. We show how multi-way contingency tables can be used to represent patterns of human mobility. We analyze a dataset of geolocated tweets from South Africa that comprises 46 million latitude/longitude locations of 476,601 Twitter users that is summarized as a contingency table with 214 variables. KEYWORDS: contingency tables, model selection, human mobility, graphical models, Bayesian structural learning, birth-death processes, pseudo-likelihood

References
Posted Content
TL;DR: Data collected using Twitter's sampled API service is compared with data collected using the full, albeit costly, Firehose stream that includes every single published tweet to help researchers and practitioners understand the implications of using the Streaming API.
Abstract: Twitter is a social media giant famous for the exchange of short, 140-character messages called "tweets". In the scientific community, the microblogging site is known for openness in sharing its data. It offers a glimpse into its millions of users and billions of tweets through a "Streaming API", which provides a sample of all tweets matching parameters preset by the API user. The API service has been used by many researchers, companies, and governmental institutions that want to extract knowledge to answer a diverse array of questions pertaining to social media. The essential drawback of the Twitter API is the lack of documentation concerning what and how much data users get. This leads researchers to question whether the sampled data is a valid representation of the overall activity on Twitter. In this work we set out to answer this question by comparing data collected using Twitter's sampled API service with data collected using the full, albeit costly, Firehose stream that includes every single published tweet. We compare both datasets using common statistical metrics as well as metrics that allow us to compare topics, networks, and locations of tweets. The results of our work will help researchers and practitioners understand the implications of using the Streaming API.

848 citations
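The abstract does not list the exact metrics used to compare the two streams. As a minimal sketch of the idea, assuming tweet IDs and hashtags are available from both the sampled stream and the Firehose, the code below computes two simple comparisons: the fraction of Firehose tweets present in the sample, and the Jaccard overlap of the top-k hashtags. Both metrics and the function names are illustrative choices, not the paper's measures.

```python
from collections import Counter


def coverage(sample_ids, firehose_ids):
    """Fraction of Firehose tweets that also appear in the sampled stream."""
    firehose = set(firehose_ids)
    if not firehose:
        return 0.0
    return len(firehose & set(sample_ids)) / len(firehose)


def top_k_overlap(sample_tags, firehose_tags, k=10):
    """Jaccard overlap between the k most frequent hashtags of each stream."""
    top_sample = {tag for tag, _ in Counter(sample_tags).most_common(k)}
    top_firehose = {tag for tag, _ in Counter(firehose_tags).most_common(k)}
    union = top_sample | top_firehose
    return len(top_sample & top_firehose) / len(union) if union else 1.0


print(coverage([1, 2], [1, 2, 3, 4]))  # → 0.5
```

A Jaccard score over top-k sets ignores rank order; a rank correlation between the two hashtag rankings would be a natural refinement.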

Journal Article
TL;DR: This paper presents a framework for harvesting ambient geospatial information, and the resulting hybrid capabilities for analyzing it to support situational awareness as it relates to human activities.
Abstract: Social media generated by many individuals is playing a greater role in our daily lives and provides a unique opportunity to gain valuable insight into information flow and social networking within a society. Through data collection and analysis of its content, it supports a greater mapping and understanding of the evolving human landscape. The information disseminated through such media represents a deviation from volunteered geography, in the sense that it is not geographic information per se. Nevertheless, the messages often have geographic footprints, for example in the form of the locations from which tweets originate, or references in their content to geographic entities. We argue that such data conveys ambient geospatial information, capturing, for example, people’s references to locations that represent momentary social hotspots. In this paper we present a framework to harvest such ambient geospatial information, and the resulting hybrid capabilities to analyze it to support situational awareness as it relates to human activities. We argue that the emergence of ambient geospatial analysis represents a second step in the evolution of geospatial data availability, following on the heels of volunteered geographical information.

522 citations

Proceedings Article
21 Jun 2013
TL;DR: In this paper, the authors compare data collected using Twitter's sampled API service with data collected from the full, albeit costly, Firehose stream that includes every single published tweet, using common statistical metrics as well as metrics that allow them to compare topics, networks, and locations of tweets.
Abstract: Twitter is a social media giant famous for the exchange of short, 140-character messages called "tweets". In the scientific community, the microblogging site is known for openness in sharing its data. It offers a glimpse into its millions of users and billions of tweets through a "Streaming API", which provides a sample of all tweets matching parameters preset by the API user. The API service has been used by many researchers, companies, and governmental institutions that want to extract knowledge to answer a diverse array of questions pertaining to social media. The essential drawback of the Twitter API is the lack of documentation concerning what and how much data users get. This leads researchers to question whether the sampled data is a valid representation of the overall activity on Twitter. In this work we set out to answer this question by comparing data collected using Twitter's sampled API service with data collected using the full, albeit costly, Firehose stream that includes every single published tweet. We compare both datasets using common statistical metrics as well as metrics that allow us to compare topics, networks, and locations of tweets. The results of our work will help researchers and practitioners understand the implications of using the Streaming API.

469 citations

Journal ArticleDOI
01 Aug 2013
TL;DR: This work presents a novel framework to detect localized events in real-time from a Twitter stream and to track the evolution of such events over time, using a stream of tweets from Europe during the 2012 UEFA European Football Championship.
Abstract: Microblogging services such as Twitter, Facebook, and Foursquare have become major sources of information about real-world events. Most approaches that aim at extracting event information from such sources use the temporal context of messages. However, exploiting the location information of georeferenced messages is also important for detecting localized events, such as public events or emergency situations. Users posting messages close to the location of an event serve as human sensors that describe the event. In this demonstration, we present a novel framework to detect localized events in real time from a Twitter stream and to track the evolution of such events over time. For this, spatio-temporal characteristics of keywords are continuously extracted to identify meaningful candidates for event descriptions. Then, localized event information is extracted by clustering keywords according to their spatial similarity. To determine the most important events in a (recent) time frame, we introduce a scoring scheme for events. We demonstrate the functionality of our system, called EvenTweet, using a stream of tweets from Europe during the 2012 UEFA European Football Championship.

268 citations
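The abstract describes clustering keywords according to their spatial similarity. The sketch below is a simplified stand-in, not EvenTweet's algorithm: it snaps tweet coordinates to a grid, represents each keyword by its counts per grid cell, and greedily groups keywords whose profiles have a cosine similarity at or above a threshold. The grid size, the cosine measure, the threshold, and all function names are assumptions.

```python
from collections import Counter
from math import sqrt


def grid_cell(lat, lon, size=0.1):
    """Snap a coordinate to a square grid cell of `size` degrees (assumed)."""
    return (int(lat // size), int(lon // size))


def keyword_profiles(tweets, size=0.1):
    """Map each keyword to a Counter of the grid cells where it was used."""
    profiles = {}
    for lat, lon, words in tweets:
        cell = grid_cell(lat, lon, size)
        for word in set(words):
            profiles.setdefault(word, Counter())[cell] += 1
    return profiles


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[c] * b[c] for c in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_keywords(profiles, threshold=0.8):
    """Greedily add each keyword to the first cluster containing a spatially
    similar keyword; otherwise start a new cluster."""
    clusters = []
    for word in sorted(profiles):
        for cluster in clusters:
            if any(cosine(profiles[word], profiles[o]) >= threshold for o in cluster):
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters


tweets = [(52.52, 13.40, ["goal", "berlin"]), (52.52, 13.40, ["goal"]),
          (48.85, 2.35, ["rain", "paris"]), (48.85, 2.35, ["rain"])]
print(cluster_keywords(keyword_profiles(tweets)))
# → [['berlin', 'goal'], ['paris', 'rain']]
```

Greedy assignment depends on processing order and never merges clusters afterwards; a proper hierarchical or density-based clustering would avoid that at higher cost.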

Proceedings Article
07 Apr 2014
TL;DR: Methods applied to geolocated Twitter data can be used to predict turning points in migration trends, which are particularly relevant for migration forecasting, and such data can substantially improve the understanding of the relationships between internal and international migration.
Abstract: Data about migration flows are largely inconsistent across countries, typically outdated, and often nonexistent. Despite the importance of migration as a driver of demographic change, there is limited availability of migration statistics. Generally, researchers rely on census data to indirectly estimate flows. However, little can be inferred for specific years between censuses and for recent trends. The increasing availability of geolocated data from online sources has opened up new opportunities to track recent trends in migration patterns and to improve our understanding of the relationships between internal and international migration. In this paper, we use geolocated data for about 500,000 users of the social network website "Twitter". The data are for users in OECD countries during the period May 2011 to April 2013. For the subsample of users who posted geolocated tweets regularly, we evaluated geographic movements within and between countries over independent four-month periods. Since Twitter users are not representative of the OECD population, we cannot infer migration rates at a single point in time. However, we propose a difference-in-differences approach to reduce selection bias when inferring trends in out-migration rates for single countries. Our results indicate that our approach is relevant to two longstanding questions in the migration literature. First, our methods can be used to predict turning points in migration trends, which are particularly relevant for migration forecasting. Second, geolocated Twitter data can substantially improve our understanding of the relationships between internal and international migration. Our analysis relies exclusively on publicly available data that could potentially be available in real time and used to monitor migration trends. The Web Science community is well-positioned to address, in future work, a number of methodological and substantive questions that we discuss in this article.

191 citations
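The abstract names a difference-in-differences approach for reducing selection bias but does not give its specification. The sketch below shows only the generic two-period estimator: the change in a target series minus the change in a reference series. A bias that is constant over time within each series cancels in the first differences, and a trend shared by both series cancels in the second difference. Which reference series the authors used, the function names, and all rates below are hypothetical.

```python
import math


def out_migration_rate(movers, users):
    """Share of tracked users whose inferred country of residence changed."""
    return movers / users if users else 0.0


def diff_in_diff(target_before, target_after, reference_before, reference_after):
    """Generic two-period difference-in-differences estimate."""
    return (target_after - target_before) - (reference_after - reference_before)


# Hypothetical "true" rates for a target and a reference series.
true_effect = diff_in_diff(0.020, 0.030, 0.010, 0.012)

# Observed rates: a per-series selection bias (0.05 and 0.02) that is constant
# over time, plus a common shift of 0.004 in the second period.
trend = 0.004
observed = diff_in_diff(0.020 + 0.05, 0.030 + 0.05 + trend,
                        0.010 + 0.02, 0.012 + 0.02 + trend)
print(math.isclose(observed, true_effect), round(observed, 3))  # → True 0.008
```

The cancellation holds only under these additive, time-constant assumptions; a bias that changes between periods would survive in the estimate.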