Papers published in EPJ Data Science in 2015

Journal Article

A survey of results on mobile phone datasets analysis

Vincent D. Blondel¹, Adeline Decuyper¹, Gautier Krings¹

05 Aug 2015-EPJ Data Science

TL;DR: In this article, the authors survey the contributions made so far on the social networks that can be constructed with such data, the study of personal mobility, geographical partitioning, urban planning, and help towards development as well as security and privacy issues.

Abstract: In this paper, we review some advances made recently in the study of mobile phone datasets. This area of research emerged a decade ago, with the increasing availability of large-scale anonymized datasets, and has grown into a stand-alone topic. We survey the contributions made so far on the social networks that can be constructed with such data, the study of personal mobility, geographical partitioning, urban planning, and help towards development as well as security and privacy issues.

556 citations

Journal Article

Enhancing disease surveillance with novel data streams: challenges and opportunities

Benjamin M. Althouse¹, Samuel V. Scarpino¹, Lauren Ancel Meyers¹,², John W. Ayers³, Marisa Bargsten⁴, Joan Baumbach⁴, John S. Brownstein⁵,⁶,⁷, Lauren Castro⁸, Hannah E. Clapham⁹, Derek A. T. Cummings⁹, Sara Y. Del Valle⁸, Stephen Eubank¹⁰, Geoffrey Fairchild⁸, Lyn Finelli¹¹, Nicholas Generous⁸, Dylan B. George¹², David R. Harper¹³, Laurent Hébert-Dufresne¹, Michael A. Johansson¹¹, Kevin J. Konty¹⁴, Marc Lipsitch⁵, Gabriel J. Milinovich¹⁵, Joseph D. Miller¹¹, Elaine O. Nsoesie⁵,⁶, Donald R. Olson¹⁴, Michael J. Paul⁹, Philip M. Polgreen¹⁶, Reid Priedhorsky⁸, Jonathan M. Read¹⁷, Isabel Rodriguez-Barraquer⁹, Derek J. Smith¹⁸, Christian Stefansen¹⁹, David L. Swerdlow²⁰, Deborah L. Thompson⁴, Alessandro Vespignani²¹, Amy Wesolowski⁵

16 Oct 2015-EPJ Data Science

TL;DR: This paper outlines a conceptual framework for integrating NDS into current public health surveillance and presents the case that clearly articulating surveillance objectives and systematically evaluating NDS and comparing the performance of NDS to existing surveillance data and alternative NDS data is critical.

Abstract: Novel data streams (NDS), such as web search data or social media updates, hold promise for enhancing the capabilities of public health surveillance. In this paper, we outline a conceptual framework for integrating NDS into current public health surveillance. Our approach focuses on two key questions: What are the opportunities for using NDS and what are the minimal tests of validity and utility that must be applied when using NDS? Identifying these opportunities will necessitate the involvement of public health authorities and an appreciation of the diversity of objectives and scales across agencies at different levels (local, state, national, international). We present the case that clearly articulating surveillance objectives and systematically evaluating NDS and comparing the performance of NDS to existing surveillance data and alternative NDS data is critical and has not sufficiently been addressed in many applications of NDS currently in the literature.

220 citations

Journal Article

Personalized routing for multitudes in smart cities

Manlio De Domenico¹, Antonio Lima², Marta C. González³, Alex Arenas¹

Rovira i Virgili University¹, University of Birmingham², Massachusetts Institute of Technology³

28 Jan 2015-EPJ Data Science

TL;DR: This paper proposes an adaptive routing strategy which accounts for individual constraints to recommend personalized routes and, at the same time, for constraints imposed by the collectivity as a whole, to reduce the overall traffic in a smart city.

Abstract: Human mobility in a city represents a fascinating complex system that combines social interactions, daily constraints and random explorations. New collections of data that capture human mobility not only help us to understand its underlying patterns but also to design intelligent systems, bringing the opportunity to reduce traffic and to develop other applications that make cities more adaptable to human needs. In this paper, we propose an adaptive routing strategy which accounts for individual constraints to recommend personalized routes and, at the same time, for constraints imposed by the collectivity as a whole. Using big data sets recently released during the Telecom Italia Big Data Challenge, we show that our algorithm allows us to reduce the overall traffic in a smart city thanks to synergetic effects, with the participation of individuals in the system playing a crucial role.

147 citations

Journal Article

Online social networks and offline protest

Zachary C. Steinert-Threlkeld¹, Delia Mocanu², Alessandro Vespignani², James H. Fowler¹

University of California, San Diego¹, Northeastern University²

09 Nov 2015-EPJ Data Science

TL;DR: It is shown that increased coordination of messages on Twitter using specific hashtags is associated with increased protests the following day, and that traditional actors like the media and elites are not driving the results.

Abstract: Large-scale protests occur frequently and sometimes overthrow entire political systems. Meanwhile, online social networks have become an increasingly common component of people’s lives. We present a large-scale longitudinal study that connects online social media behaviors to offline protest. Using almost 14 million geolocated tweets and data on protests from 16 countries during the Arab Spring, we show that increased coordination of messages on Twitter using specific hashtags is associated with increased protests the following day. The results also show that traditional actors like the media and elites are not driving the results. These results indicate social media activity correlates with subsequent large-scale decentralized coordination of protests, with important implications for the future balance of power between citizens and their states.
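
The abstract reports an association between hashtag coordination on one day and protests on the following day. As a purely illustrative sketch, and not the authors' statistical model (which the abstract does not specify), a one-day lagged Pearson correlation between two daily series could be computed as below; `lagged_correlation` and the toy series are hypothetical.

```python
from statistics import mean


def lagged_correlation(x, y, lag=1):
    """Pearson correlation between x[t] and y[t + lag] (hypothetical helper)."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    if lag < 0 or len(x) - lag < 2:
        raise ValueError("need lag >= 0 and at least two overlapping points")
    xs = x[: len(x) - lag]  # e.g. daily hashtag-coordination scores
    ys = y[lag:]            # e.g. protest counts, shifted `lag` days later
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var_x = sum((a - mx) ** 2 for a in xs)
    var_y = sum((b - my) ** 2 for b in ys)
    # A constant series has zero variance and raises ZeroDivisionError here.
    return cov / (var_x * var_y) ** 0.5


coordination = [1, 2, 3, 4, 5]
protests = [0, 3, 5, 7, 9]  # toy data: protests[t + 1] = 2 * coordination[t] + 1
print(lagged_correlation(coordination, protests, lag=1))  # prints 1.0
```

A lagged correlation alone says nothing about confounding; the abstract notes that the authors also examined whether traditional actors such as the media and elites drive the results.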

115 citations

Journal Article

High resolution population estimates from telecommunications data

Rex W. Douglass¹, David A. Meyer¹, Megha Ram, David Rideout¹, Dongjin Song¹

University of California, San Diego¹

16 May 2015-EPJ Data Science

TL;DR: This paper develops an explicit connection between telecommunications data and the underlying population distribution of Milan, Italy, tests the scale invariance of this connection, and uses telecommunications data in conjunction with high-resolution census data to create easily updated and potentially real-time population estimates in time and space.

Abstract: Spatial variations in the distribution and composition of populations inform urban development, health-risk analyses, disaster relief, and more. Despite the broad relevance and importance of such data, acquiring local census estimates in a timely and accurate manner is challenging because population counts can change rapidly, are often politically charged, and suffer from logistical and administrative challenges. These limitations necessitate the development of alternative or complementary approaches to population mapping. In this paper we develop an explicit connection between telecommunications data and the underlying population distribution of Milan, Italy. We go on to test the scale invariance of this connection and use telecommunications data in conjunction with high-resolution census data to create easily updated and potentially real-time population estimates in time and space.

100 citations

Journal Article

Urban magnetism through the lens of geo-tagged photography

Silvia Paldino¹, Iva Bojic², Stanislav Sobolevsky², Carlo Ratti², Marta C. González²

University of Calabria¹, Massachusetts Institute of Technology²

29 May 2015-EPJ Data Science

TL;DR: This paper proposes an unconventional way to study how people experience the city, using information from geotagged photographs that people take at different locations, and compares the spatial behavior of residents and tourists in the 10 most photographed cities around the world.

Abstract: There is an increasing trend of people leaving digital traces through social media. This reality opens new horizons for urban studies. With this kind of data, researchers and urban planners can detect many aspects of how people live in cities and can also suggest how to transform cities into more efficient and smarter places to live in. In particular, these digital trails can be used to investigate individuals' tastes, and what attracts them to live in a particular city or to spend their vacation there. In this paper we propose an unconventional way to study how people experience the city, using information from geotagged photographs that people take at different locations. We compare the spatial behavior of residents and tourists in the 10 most photographed cities around the world. The study was conducted at both a global and a local level. On the global scale we analyze the 10 most photographed cities and measure how attractive each city is for people visiting it from other cities within the same country or from abroad. For the purpose of our analysis we construct the users’ mobility network and measure the strength of the links between each pair of cities as a level of attraction of people living in one city (i.e., origin) to the other city (i.e., destination). On the local level we study the spatial distribution of user activity and identify the photographed hotspots inside each city. The proposed methodology and the results of our study are a low-cost means to characterize tourist activity within a certain location and can help cities strengthen their tourism potential.

96 citations

Journal Article

Sentiment cascades in the 15M movement

Raquel Álvarez¹, David Garcia², Yamir Moreno¹, Frank Schweitzer²

University of Zaragoza¹, ETH Zurich²

30 May 2015-EPJ Data Science

TL;DR: This work performs the first large scale test of theories on collective emotions and social interaction in collective actions of the 15M movement in Spain, and shows that non-rational factors play a role in the formation and activity of social movements through online media, having important consequences for viral spreading.

Abstract: Recent grassroots movements have suggested that online social networks might play a key role in their organization, as adherents have a fast, many-to-many communication channel to help coordinate their mobilization. The structure and dynamics of the networks constructed from the digital traces of protesters have been analyzed to some extent recently. However, less effort has been devoted to the analysis of the semantic content of messages exchanged during the protest. Using the data obtained from a microblogging service during the brewing and active phases of the 15M movement in Spain, we perform the first large scale test of theories on collective emotions and social interaction in collective actions. Our findings show that activity and information cascades in the movement are larger in the presence of negative collective emotions and when users express themselves in terms related to social content. At the level of individual participants, our results show that their social integration in the movement, as measured through social network metrics, increases with their level of engagement and of expression of negativity. Our findings show that non-rational factors play a role in the formation and activity of social movements through online media, having important consequences for viral spreading.

59 citations

Journal Article

The effect of recency to human mobility

Hugo Barbosa¹, Fernando B. de Lima-Neto², Alexandre G. Evsukoff³, Ronaldo Menezes¹

Florida Institute of Technology¹, Universidade de Pernambuco², Federal University of Rio de Janeiro³

17 Dec 2015-EPJ Data Science

TL;DR: This work proposes a model in which exploitation in human movement also considers recently-visited locations and not solely frequently-visited locations, tests the hypothesis against different empirical data of human mobility, and shows that the proposed model replicates the characteristic patterns of the recency bias.

Abstract: In recent years, we have seen scientists attempt to model and explain human dynamics and in particular human movement. Many aspects of our complex life are affected by human movement, such as disease spread and epidemic modeling, city planning, wireless network development, and disaster relief, to name a few. Given the myriad of applications, it is clear that a complete understanding of how people move in space can lead to considerable benefits to our society. In most of the recent works, scientists have focused on the idea that people's movements are biased towards frequently-visited locations. According to them, human movement is based on an exploration/exploitation dichotomy in which individuals choose new locations (exploration) or return to frequently-visited locations (exploitation). In this work we focus on the concept of recency. We propose a model in which exploitation in human movement also considers recently-visited locations and not solely frequently-visited locations. We test our hypothesis against different empirical data of human mobility and show that our proposed model replicates the characteristic patterns of the recency bias.

56 citations

Journal Article

Quantifying socio-economic indicators in developing countries from mobile phone communication data: applications to Côte d’Ivoire

Huina Mao¹, Xin Shuai², Yong-Yeol Ahn³, Johan Bollen³

Oak Ridge National Laboratory¹, Thomson Reuters², Indiana University³

13 Oct 2015-EPJ Data Science

TL;DR: The CallRank indicator is introduced to quantify the relative importance of an area on the basis of call records, and it is shown that a region’s ratio of in- and out-going calls can predict its income level.

Abstract: The widespread adoption of mobile devices that record the communications, social relations, and movements of billions of individuals in great detail presents unique opportunities for the study of social structures and human dynamics at very large scales. This is particularly the case for developing countries, where social and economic data can be hard to obtain and are often too sparse for real-time analytics. Here we leverage mobile call log data from Côte d’Ivoire to analyze the relations between its nation-wide communications network and the socio-economic dynamics of its regional economies. We introduce the CallRank indicator to quantify the relative importance of an area on the basis of call records, and show that a region’s ratio of in- and out-going calls can predict its income level. We detect a communication divide between rich and poor regions of Côte d’Ivoire, which corresponds to existing socio-economic data. Our results demonstrate the potential of mobile communication data to monitor the economic development and social dynamics of low-income developing countries in the absence of extensive econometric and social data. Our work may support efforts to stimulate sustainable economic development and to reduce poverty and inequality.

44 citations

Journal Article

Spatio-temporal techniques for user identification by means of GPS mobility data

Luca Rossi¹, James Alfred Walker¹, Mirco Musolesi¹,²

University of Birmingham¹, University College London²

05 Aug 2015-EPJ Data Science

TL;DR: In this paper, the authors present a series of techniques for identifying individuals from their GPS movements, and study the uniqueness of GPS information for three popular datasets, and provide a detailed analysis of the discriminatory power of speed, direction and distance of travel.

Abstract: One of the greatest concerns related to the popularity of GPS-enabled devices and applications is the increasing availability of the personal location information generated by them and shared with application and service providers. Moreover, people tend to have regular routines and to be characterized by a set of “significant places”, thus making it possible to identify a user from his/her mobility data. In this paper we present a series of techniques for identifying individuals from their GPS movements. More specifically, we study the uniqueness of GPS information for three popular datasets, and we provide a detailed analysis of the discriminatory power of speed, direction and distance of travel. Most importantly, we present a simple yet effective technique for the identification of users from location information not included in the original dataset used for training, thus raising important privacy concerns for the management of location datasets.

44 citations

Journal Article

Unveiling patterns of international communities in a global city using mobile phone data

Paolo Bajardi, Matteo Delfino¹, André Panisson¹, Giovanni Petri¹, Michele Tizzoni¹

Institute for Scientific Interchange¹

29 Apr 2015-EPJ Data Science

TL;DR: It is found that international mobile phone users exhibit some robust clustering patterns that correlate with basic socio-economic variables, suggesting that mobile phone records can be used in conjunction with topological data analysis tools to study the geography of migrant communities in a global city.

Abstract: We analyse a large mobile phone activity dataset provided by Telecom Italia for the Telecom Big Data Challenge contest. The dataset reports the international country codes of every call/SMS made and received by mobile phone users in Milan, Italy, between November and December 2013, with a spatial resolution of about 200 meters. We first show that the observed spatial distribution of international codes closely matches the distribution of international communities reported by official statistics, confirming the value of mobile phone data for demographic research. Next, we define an entropy function to measure the heterogeneity of the international phone activity in space and time. By comparing the entropy function to empirical data, we show that it can be used to identify the city’s hotspots, defined by the presence of points of interest. Finally, we use the entropy function to characterize the spatial distribution of international communities in the city. Adopting a topological data analysis approach, we find that international mobile phone users exhibit some robust clustering patterns that correlate with basic socio-economic variables. Our results suggest that mobile phone records can be used in conjunction with topological data analysis tools to study the geography of migrant communities in a global city.

Journal Article

Investigating Causality in Human Behavior from Smartphone Sensor Data: A Quasi-Experimental Approach

Fani Tsapeli¹, Mirco Musolesi¹,²

University of Birmingham¹, University College London²

18 Dec 2015-EPJ Data Science

TL;DR: The design, implementation and evaluation of a generic quasi-experimental framework for conducting causation studies on human behavior from smartphone data are discussed and the effectiveness of the approach is demonstrated by investigating the causal impact of several factors such as exercise, social interactions and work on stress level.

Abstract: Smartphones and wearables have become an indispensable part of our daily life. Their improved sensing and computing capabilities bring new opportunities for human behavior monitoring and analysis. Most work so far has focused on detecting correlation rather than causation among features extracted from smartphone data. However, pure correlation analysis does not offer sufficient understanding of human behavior. Moreover, causation analysis could allow scientists to identify factors that have a causal effect on health and well-being issues, such as obesity, stress, depression and so on, and suggest actions to deal with them. Finally, detecting causal relationships in this kind of observational data is challenging since, in general, subjects cannot be randomly exposed to an event. In this article, we discuss the design, implementation and evaluation of a generic quasi-experimental framework for conducting causation studies on human behavior from smartphone data. We demonstrate the effectiveness of our approach by investigating the causal impact of several factors such as exercise, social interactions and work on stress level. Our results indicate that exercising and spending time outside the home and working environment have a positive effect on participants' stress levels, while reduced working hours only slightly impact stress.

Journal Article

Topology and evolution of the network of western classical music composers

Doheum Park¹, Arram Bae¹, Maximilian Schich², Juyong Park¹

KAIST¹, University of Texas at Dallas²

22 Apr 2015-EPJ Data Science

TL;DR: In this article, the authors use modern data analysis and modeling techniques to explore the complex network of western classical composers, constructed from comprehensive data on CD (compact disc) recordings that represent the centuries-old musical tradition.

Abstract: The expanding availability of high-quality, large-scale data from the realm of culture and the arts promises novel opportunities for understanding and harnessing the dynamics of the creation, collaboration, and dissemination processes (fundamentally network phenomena) of artistic works and styles. To this end, in this paper we use modern data analysis and modeling techniques to explore the complex network of western classical composers, constructed from comprehensive data on CD (compact disc) recordings that represent the centuries-old musical tradition. We start with the fundamental properties of the network such as the degree distribution and various centralities, and find how they correlate with composer attributes such as artistic styles and active periods, indicating their significance in the formation and evolution of the network. We also investigate the growth dynamics of the network, identifying superlinear preferential attachment as a major growth mechanism that implies a future of the musical landscape where an increasing concentration of recordings onto highly-recorded composers coexists with the diversity represented by the growth in the sheer number of recorded composers. Our work shows how the network framework married with data can be utilized to advance our understanding of the underlying principles of complexities in cultural systems.

Journal Article

Understanding the variability of daily travel-time expenditures using GPS trajectory data

Riccardo Gallotti, Armando Bazzani¹, Sandro Rambaldi¹

University of Bologna¹

04 Nov 2015-EPJ Data Science

TL;DR: In this paper, the authors study the differences in daily travel-time expenditures among 24 Italian cities, extracted from a large set of GPS data on vehicle mobility, and introduce a trip duration model to understand variations at the level of individual behavior.

Abstract: Transportation planning is strongly influenced by the assumption that every individual has a constant daily budget of ≈1 hour for their daily mobility. However, recent experimental results are proving this assumption wrong. Here, we study the differences in daily travel-time expenditures among 24 Italian cities, extracted from a large set of GPS data on vehicle mobility. To understand these variations at the level of individual behaviour, we introduce a trip duration model that allows for a description of the distribution of travel-time expenditures in a given city using two parameters. The first parameter reflects the accessibility of desired destinations, whereas the second one can be associated with a travel-time budget and represents physiological limits due to stress and fatigue. Within the same city, we observe variations in the distributions according to home position, number of mobility days and a driver’s average number of daily trips. These results can be interpreted by a stochastic time-consumption model, where the generalised cost of travel times is given by a logarithmic-like function, in agreement with the Weber-Fechner law. Our experimental results show a significant variability in the travel-time budgets in different cities, and for different categories of drivers within the same city. This explicitly clashes with the idea of the existence of a constant travel-time budget and opens new perspectives for the modelling and governance of urban mobility.

Journal Article

Win-stay lose-shift strategy in formation changes in football

Kohei Tamura¹, Naoki Masuda²

University of Tokyo¹, University of Bristol²

17 Jul 2015-EPJ Data Science

TL;DR: Japanese and German football data are used to investigate the correlation between temporal patterns of formation changes across matches and match results, and it is found that individual teams and managers both showed win-stay lose-shift behavior, a type of reinforcement learning.

Abstract: Managerial decision making is likely to be a dominant determinant of the performance of teams in team sports. Here we use Japanese and German football data to investigate the correlation between temporal patterns of formation changes across matches and match results. We found that individual teams and managers both showed win-stay lose-shift behavior, a type of reinforcement learning. In other words, they tended to stick to the current formation after a win and switch to a different formation after a loss. In addition, formation changes did not statistically improve the results of succeeding matches. The results indicate that a swift implementation of a new formation in the win-stay lose-shift manner may not be a successful managerial rule of thumb.

Journal Article

Whom should we sense in “social sensing” - analyzing which users work best for social media now-casting

Jisun An¹, Ingmar Weber¹

Qatar Computing Research Institute¹

30 Nov 2015-EPJ Data Science

TL;DR: In this paper, the authors investigate how different sampling strategies affect the performance of now-casting of two common offline indices, flu activity and unemployment rate, and they conclude that babblers are better than non-babblers.

Abstract: Given the ever increasing amount of publicly available social media data, there is growing interest in using online data to study and quantify phenomena in the offline “real” world. As social media data can be obtained in near real-time and at low cost, it is often used for “now-casting” indices such as levels of flu activity or unemployment. The term “social sensing” is often used in this context to describe the idea that users act as “sensors”, publicly reporting their health status or job losses. Sensor activity during a time period is then typically aggregated in a “one tweet, one vote” fashion by simply counting. At the same time, researchers readily admit that social media users are not a perfect representation of the actual population. Additionally, users differ in the amount of detail about their personal lives that they reveal. Intuitively, it should be possible to improve now-casting by assigning different weights to different user groups. In this paper, we ask “How does social sensing actually work?” or, more precisely, “Whom should we sense, and whom not, for optimal results?”. We investigate how different sampling strategies affect the performance of now-casting of two common offline indices: flu activity and unemployment rate. We show that now-casting can be improved by (1) applying user filtering techniques and (2) selecting users with complete profiles. We also find that, using the right type of user groups, now-casting performance does not degrade, even when drastically reducing the size of the dataset. More fundamentally, we describe which type of users contribute most to the accuracy by asking if “babblers are better”. We conclude the paper by providing guidance on how to select better user groups for more accurate now-casting.

Journal Article

Making big data work: smart, sustainable, and safe cities

Bruno Lepri, Fabrizio Antonelli¹, Fabio Pianesi, Alex Pentland²

Telecom Italia¹, Massachusetts Institute of Technology²

16 Oct 2015-EPJ Data Science

TL;DR: The goal of the present thematic series is to showcase some of the most relevant contributions submitted to the ‘Telecom Italia Big Data Challenge 2014’ and to provide a discussion venue about recent advances in the application of mobile phone and social media data to the study of individual and collective behaviors.

Abstract: The goal of the present thematic series is to showcase some of the most relevant contributions submitted to the ‘Telecom Italia Big Data Challenge 2014’ and to provide a discussion venue about recent advances in the application of mobile phone and social media data to the study of individual and collective behaviors. Particular attention is devoted to data-driven studies aimed at understanding city dynamics. These studies include: modeling individual and collective traffic patterns and automatically identifying areas with traffic congestion, creating high-resolution population estimates for Milan inhabitants, clustering urban dynamics of migrants and visitors traveling to a city for business or tourism, and investigating the relationship between urban communication and urban happiness.

Journal Article

Misery loves company: happiness and communication in the city

Aamena Alshamsi¹, Edmond Awad¹, Maryam Almehrezi¹, Vahan Babushkin¹, Pai-Ju Chang¹, Zakariyah Shoroye¹, Attila-Péter Tóth¹, Iyad Rahwan¹,²

Masdar Institute of Science and Technology¹, Massachusetts Institute of Technology²

02 Jul 2015-EPJ Data Science

TL;DR: It is revealed that happy (respectively unhappy) areas preferentially communicate with other areas of their type, which constitutes evidence of homophilous communities at the scale of an entire city (Milan), and has implications on interventions that aim to improve urban well-being.

...read moreread less

Abstract: The high population density in cities confers many advantages, including improved social interaction and information exchange. However, it is often argued that urban living comes at the expense of reducing happiness. The goal of this research is to shed light on the relationship between urban communication and urban happiness. We analyze geo-located social media posts (tweets) within a major urban center (Milan) to produce a detailed spatial map of urban sentiments. We combine this data with high-resolution mobile communication intensity data among different urban areas. Our results reveal that happy (respectively unhappy) areas preferentially communicate with other areas of their type. This observation constitutes evidence of homophilous communities at the scale of an entire city (Milan), and has implications on interventions that aim to improve urban well-being.

Journal Article•DOI•

Mining open datasets for transparency in taxi transport in metropolitan environments

Anastasios Noulas¹,², Vsevolod Salnikov³, Renaud Lambiotte³, Cecilia Mascolo¹

University of Cambridge¹, Lancaster University², Université de Namur³

10 Dec 2015-EPJ Data Science

TL;DR: In this article, the authors explore the power of the new generation of open datasets for understanding the impact of the new disruptive technologies emerging in public transport, and provide a direct price comparison between Uber and the Yellow Cab company in New York.

Abstract: Uber has recently been introducing novel practices in urban taxi transport. Journey prices can change dynamically in almost real time and also vary geographically from one area of a city to another, a strategy known as surge pricing. In this paper, we explore the power of the new generation of open datasets for understanding the impact of the new disruptive technologies emerging in public transport. With our primary goal being a more transparent economic landscape for urban commuters, we provide a direct price comparison between Uber and the Yellow Cab company in New York. We discover that Uber, despite its lower standard pricing rates, effectively charges higher fares on average, especially on short but frequent taxi journeys. Building on this insight, we develop a smartphone application, OpenStreetCab, that offers mobile users personalized advice on which taxi provider is cheaper for their journey. Almost five months after its launch, the app has attracted more than three thousand users in a single city. Their journey queries have provided additional insights into the potential savings similar technologies can offer urban commuters; in particular, a user in New York saves on average 6 U.S. dollars per taxi journey by picking the cheapest taxi provider. We run extensive experiments to show that Uber’s surge pricing is the driving factor behind higher journey prices and therefore higher potential savings for our application’s users. Finally, motivated by the observation that Uber’s surge pricing occurs more frequently than intuitively expected, we formulate a prediction task in which the aim is to predict a geographic area’s tendency to surge. Using data exogenous to Uber, in particular Yellow Cab and Foursquare data, we show how it is possible to estimate customer demand within an area, and by extension surge pricing, with high accuracy.
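
The abstract reports an average saving per journey but does not give its exact definition. A minimal sketch, assuming each query returns one fare estimate per provider and that the saving is the gap between the dearer and the cheaper estimate, is shown below; the `Quote` class, its field names and the sample fares are hypothetical and are not OpenStreetCab's code.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Quote:
    """Hypothetical fare estimates (USD) returned for one journey query."""
    uber: float
    yellow_cab: float


def cheapest_provider(q: Quote) -> str:
    # Ties are reported as "Yellow Cab" (arbitrary choice for this sketch).
    return "Uber" if q.uber < q.yellow_cab else "Yellow Cab"


def average_saving(quotes: list[Quote]) -> float:
    # Assumed definition: per query, the user avoids paying the difference
    # between the more expensive and the cheaper estimate.
    return mean(abs(q.uber - q.yellow_cab) for q in quotes)
```

For two hypothetical queries with fare gaps of 3 and 5 dollars, `average_saving` returns 4.0.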

Journal Article•DOI•

Testing the hypothesis of preferential attachment in social network formation

Thomas House¹,², Jonathan M. Read³, Leon Danon⁴, Matthew James Keeling¹

University of Warwick¹, University of Manchester², Lancaster University³, University of Bristol⁴

09 Oct 2015-EPJ Data Science

TL;DR: This work presents a general statistical method to test directly for evidence of preferential attachment (PA) in count data and applies it to data on contacts relevant to the spread of respiratory diseases.

Abstract: The hypothesis of preferential attachment (PA), whereby better-connected individuals make more connections, is hotly debated, particularly in the context of epidemiological networks. The simplest models of PA, for example, are incompatible with the eradication of any disease through population-level control measures such as random vaccination. Typically, evidence for the presence or absence of preferential attachment has been sought via asymptotic power-law behaviour. Here, we present a general statistical method to test directly for evidence of PA in count data and apply it to data on contacts relevant to the spread of respiratory diseases. We find that while standard methods for model selection prefer a form of PA, careful analysis of the best-fitting PA models shows that they admit a level of contact heterogeneity that is in fact compatible with the control of respiratory diseases. Our approach is based on a flexible but numerically cheap likelihood-based model that could in principle be applied to other integer data where the hypothesis of PA is of interest.

Journal Article•DOI•

Unbiased metrics of friends’ influence in multi-level networks

Alexandre Vidmer¹, Matúš Medo¹, Yi-Cheng Zhang¹

University of Fribourg¹

14 Nov 2015-EPJ Data Science

TL;DR: The authors demonstrate that existing metrics of friends’ influence are biased by the presence of highly popular items in the data and can therefore create an illusion of friends’ influence where there is none; they develop three metrics that distinguish the influence of friends from the effects of item popularity.

Abstract: The spreading of information is of crucial importance for the modern information society. While we still receive information from mass media and other non-personalized sources, online social networks and the influence of friends have become important personalized sources of information. This calls for metrics that measure the influence of users on the behavior of their friends. We demonstrate that the currently existing metrics of friends’ influence are biased by the presence of highly popular items in the data and, as a result, can create an illusion of friends’ influence where there is none. We correct for this bias, develop three metrics that distinguish the influence of friends from the effects of item popularity, and apply the metrics to real datasets. We use a simple network model based on the influence of friends and preferential attachment to illustrate the performance of our metrics at different levels of friends’ influence.

Journal Article•DOI•

Complex networks and public funding: the case of the 2007-2013 Italian program

Stefano Nicotri¹, Eufemia Tinelli², Nicola Amoroso¹,², Elena Garuccio³, Roberto Bellotti¹,²

Istituto Nazionale di Fisica Nucleare¹, University of Bari², University of Siena³

09 Jul 2015-EPJ Data Science

TL;DR: In this paper, the authors apply techniques of complex network analysis to data sources representing public funding programs and discuss the importance of the considered indicators for program evaluation. Starting from the Open Data repository of the 2007-2013 Italian Program Programma Operativo Nazionale ‘Ricerca e Competitivita’ (PON R&C), they build a set of data models and perform network analysis over them.

Abstract: In this paper we apply techniques of complex network analysis to data sources representing public funding programs and discuss the importance of the considered indicators for program evaluation. Starting from the Open Data repository of the 2007-2013 Italian Program Programma Operativo Nazionale ‘Ricerca e Competitivita’ (PON R&C), we build a set of data models and perform network analysis over them. We discuss the experimental results, outlining new perspectives that emerge from applying the proposed methods to the socio-economic evaluation of funded programs.

Journal Article•DOI•

Product assortment and customer mobility

Michele Coscia¹, Diego Pennacchioli²,³, Fosca Giannotti³

Harvard University¹, Istituto di Scienza e Tecnologie dell'Informazione², IMT Institute for Advanced Studies Lucca³

09 Oct 2015-EPJ Data Science

TL;DR: The authors show that larger shops are able to retain most of their closest customers and to catch large portions of customers from smaller shops around them, providing an empirical validation of the Central Place Theory.

Abstract: Customer mobility depends on the sophistication of customers’ needs: sophisticated customers need to travel farther to fulfill their needs. In this paper, we provide more detailed evidence of this phenomenon and thereby an empirical validation of the Central Place Theory. For each customer, we detect her favorite shop, i.e. the shop where she purchases the most products. We then study the relationship between the favorite shop and the closest one by recording the influence of the shop’s size and the customer’s sophistication in the discordance cases, i.e. the cases in which the favorite shop is not the closest one. We show that larger shops are able to retain most of their closest customers and to catch large portions of customers from smaller shops around them. We connect this observation with the shop’s greater sophistication, and not with its other characteristics, as the phenomenon is especially noticeable when customers want to satisfy their sophisticated needs. This confirms recent extensions of the Central Place Theory, in which the original assumptions of homogeneous customer purchasing power and needs are challenged. Different types of shops also follow different survival logics: the largest shops close if they are unable to catch customers from smaller shops, while medium-sized shops close if they cannot retain their closest customers. All analyses are performed on a large real-world dataset recording all purchases from millions of customers across the west coast of Italy.
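
The favorite-versus-closest comparison described in this abstract can be sketched as follows, assuming purchases are available as (customer, shop) records and that customer homes and shops have planar coordinates. The function names, the Euclidean distance and the toy data are illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter
from math import dist


def favorite_shop(purchases, customer):
    """Shop where the customer bought the most products (ties: first seen).

    Assumes the customer has at least one purchase.
    """
    counts = Counter(shop for c, shop in purchases if c == customer)
    return counts.most_common(1)[0][0]


def closest_shop(home, shops):
    """Shop with the smallest Euclidean distance to the customer's home."""
    return min(shops, key=lambda s: dist(home, shops[s]))


def discordant_customers(purchases, homes, shops):
    """Customers whose favorite shop is not their closest shop."""
    return {
        c for c in homes
        if favorite_shop(purchases, c) != closest_shop(homes[c], shops)
    }
```

With shops "small" at (0, 0) and "large" at (5, 0), a customer living at (1, 0) who buys mostly at "large" is a discordance case, while one living at (4, 0) who also shops at "large" is not.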

Journal Article•DOI•

Router-level community structure of the Internet Autonomous Systems

Mariano G. Beiró¹,², Sebastian P. Grynberg², J. Ignacio Alvarez-Hamelin²,³

Institute for Scientific Interchange¹, University of Buenos Aires², National Scientific and Technical Research Council³

15 Aug 2015-EPJ Data Science

TL;DR: This work develops a low-complexity multiresolution modularity optimization algorithm that finds communities at different resolution levels on a continuous scale in a single run. It shows that, with scarce knowledge of the node affiliations, multiresolution methods can be adjusted to retrieve the Autonomous Systems, significantly improving on the results of classical single-resolution methods.

Abstract: The Internet is composed of routing devices connected to one another and organized into independent administrative entities: the Autonomous Systems. The existence of different types of Autonomous Systems (such as large connectivity providers, Internet Service Providers or universities), together with geographical and economic constraints, turns the Internet into a complex modular and hierarchical network. This organization is reflected in many properties of the Internet topology, such as its high degree of clustering and its robustness. In this work we study the modular structure of the Internet router-level graph in order to assess to what extent the Autonomous Systems satisfy some of the known notions of community structure. We observe that most of the classical community detection methods fail to detect the Autonomous Systems as communities, mainly because the modular structure of the Internet (like that of many complex networks) is much richer than what can be captured by optimizing a global functional: Autonomous Systems vary widely in size, structure and function. Classical methods are severely affected by resolution limits and by the heterogeneity of the communities; even when using multiresolution methods, there is no single resolution at which most of the communities can be captured. However, we show that multiresolution methods do find the community structure of the Autonomous Systems, but each Autonomous System has to be observed at its correct resolution level. We then develop a low-complexity multiresolution modularity optimization algorithm that finds communities at different resolution levels on a continuous scale in a single run. Using this method, we show that with scarce knowledge of the node affiliations, multiresolution methods can be adjusted to retrieve the Autonomous Systems, significantly improving on the results of classical single-resolution methods.
Finally, in the light of our results, we discuss recent work concerning the use of a priori information to find community structure in complex networks.
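
The abstract does not state which functional the algorithm optimizes. A common way to make modularity multiresolution (following Reichardt and Bornholdt) is to scale the null-model term by a resolution parameter gamma: Q_gamma = (1/2m) sum_ij [A_ij - gamma k_i k_j / (2m)] delta(c_i, c_j), which for an unweighted graph equals sum over communities c of [L_c / m - gamma (d_c / (2m))^2], with L_c the internal edges and d_c the total degree of community c. The sketch below only evaluates Q_gamma for a given partition under that assumed standard form; it is not the authors' optimization algorithm, and the toy graph is hypothetical.

```python
from collections import defaultdict


def modularity(edges, community, gamma=1.0):
    """Resolution-scaled modularity Q_gamma of a partition.

    edges: list of (u, v) pairs of an undirected, unweighted graph,
           each edge listed once
    community: dict node -> community label
    gamma: resolution; larger values favor smaller communities
    """
    m = len(edges)
    internal = defaultdict(int)  # L_c: edges with both ends in community c
    degree = defaultdict(int)    # d_c: summed degree of nodes in community c
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - gamma * (degree[c] / (2 * m)) ** 2 for c in degree)
```

On two triangles joined by a single edge, splitting the graph into the two triangles scores 5/14 at gamma = 1, whereas at gamma = 0.2 the single all-node community scores higher (0.8 versus about 0.76), which illustrates how the preferred partition shifts with resolution.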
