
A Tale of Ten Cities: Characterizing Signatures of Mobile Traffic in Urban Areas

TL;DR: An original technique to identify classes of mobile traffic signatures that are distinctive of different urban fabrics is introduced, which outperforms previous approaches when confronted to ground-truth information, and allows characterizing the mobile demand in greater detail than that attained in the literature to date.
Abstract: Urban landscapes present a variety of socio-topological environments that are associated to diverse human activities. As the latter affect the way individuals connect with each other, a bound exists between the urban tissue and the mobile communication demand. In this paper, we investigate the heterogeneous patterns emerging in the mobile communication activity recorded within metropolitan regions. To that end, we introduce an original technique to identify classes of mobile traffic signatures that are distinctive of different urban fabrics. Our proposed technique outperforms previous approaches when confronted to ground-truth information, and allows characterizing the mobile demand in greater detail than that attained in the literature to date. We apply our technique to extensive real-world data collected by major mobile operators in 10 cities. Results unveil the diversity of baseline communication activities across countries, but also provide evidence of the existence of a number of mobile traffic signatures that are common to all studied areas and specific to particular land uses.

Summary

1 INTRODUCTION

  • It is commonly accepted that most individuals exhibit mobility and activity patterns –driven by their family life, work obligations, hobbies, occupations and personal habits, as well as by the presence of infrastructures, services and amenities– that are highly repetitive and yet very distinctive.
  • These considerations apply to mobile network subscribers and their communication habits as well, as highlighted in a recent, extensive survey of mobile traffic data analyses [2].
  • Specifically, it has been shown that there exist strong relationships between the mobile communication activity and what the authors refer to as urban fabrics, i.e., the combination of infrastructure (e.g., roads, transportation systems, and sports, education, or healthcare facilities) and land use (e.g., residential, industrial, or commercial) that characterizes different zones within a same metropolitan area.
  • Motivated by these results, the authors delve deeper into the characterization of the spatial heterogeneity of mobile communication activities.

3 MOBILE TRAFFIC SIGNATURES

  • For each day, the mobile demand is stored as the aggregate of the traffic generated by all users in a same area during a given time interval; the size of the area and duration of the interval determine the spatial and temporal granularity of the dataset, respectively.
  • The signature construction phases aim at: (i) summarizing the mobile traffic activity in each unit area into a meaningful profile, i.e., the unit area signature (first three phases); (ii) grouping similar unit area signatures into a limited set of classes, each exhibiting a unique behavior (last three phases).
  • The metric controls the actual information in each dataset entry $v_a(d, t)$.
  • The signature pairwise distance measure determines the degree of similarity of two signatures.
  • Unit areas can map to, e.g., cell sector boundaries, coverage zones of base stations, Voronoi cells, or elements of a grid.

3.1 Weekday-Weekend Signature (WWS)

  • In the definition by Soto et al. [22], mobile traffic signatures correspond to the average voice and text volume observed during (i) a typical working day, and (ii) a typical weekend day.
  • The authors will thus refer to this approach as Weekday-Weekend Signature (WWS).
  • Finally, the clustering of signatures is performed by running a k-means algorithm over the set of all signatures.
  • All surveyed definitions leverage voice and text activity and discard data traffic: data traffic is often generated autonomously by applications running or updating in background, and it is thus less representative of the actual occupations of the user.
  • In all original case studies, the best results are always obtained with k = 5.
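
The k-means step mentioned above can be sketched as follows. This is a minimal illustration of Lloyd's algorithm in plain Python, not the implementation used in [22]: the function name `kmeans`, the explicit `initial_centroids` argument and the stopping rule are assumptions, since the source does not state how the clustering is initialized or terminated.

```python
import math

def kmeans(signatures, initial_centroids, max_iter=100):
    """Plain Lloyd's algorithm with Euclidean distance (illustrative sketch).

    signatures: list of equal-length lists of floats (one per unit area).
    initial_centroids: hypothetical starting points, one per class.
    Returns (labels, centroids).
    """
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each signature goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(s, centroids[j]))
            for s in signatures
        ]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(len(centroids)):
            members = [s for s, l in zip(signatures, labels) if l == j]
            if members:  # an empty class keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy example with two obvious groups (k = 2 here; the source reports k = 5).
toy = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels, centroids = kmeans(toy, initial_centroids=[[0.0, 0.0], [10.0, 10.0]])
```

In practice the result of k-means depends on the initialization, so one would typically repeat the run from several starting points; that choice is left open here.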

3.2 Typical Week Signature (TWS)

  • Grauwin et al. [23] propose a variation of WWS, named Typical Week Signature (TWS).
  • Also in this case, the signature metric adds up voice and text volumes.
  • The authors recall that µ(·) represents the mean of the set within parentheses.
  • The only difference from the WWS approach is that the choice of k is guided by the local maxima of the Silhouette Index [26].
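
To illustrate how a Silhouette Index can guide the choice of k, the sketch below scores one given clustering; one would evaluate it for several values of k and look for local maxima. It is a plain-Python illustration under assumptions: the function name `silhouette_index` is hypothetical, Euclidean distance is assumed (as in WWS), and singleton clusters are given a score of 0 by convention.

```python
import math

def silhouette_index(points, labels):
    """Mean silhouette over all points for one clustering (illustrative sketch)."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance of p to itself is 0, hence the len(own) - 1).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for m, other in clusters.items() if m != l
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

toy = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
good = silhouette_index(toy, [0, 0, 1, 1])  # compact, well separated classes
bad = silhouette_index(toy, [0, 1, 0, 1])   # classes mixing distant points
```

A clustering that matches the structure of the data scores close to 1, while one that mixes distant points scores below 0.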

3.3 Seasonal Communication Series (SCS)

  • The solution by Cici et al. [24], named Seasonal Communication Series (SCS), considers the whole time series in each unit area.
  • In such a definition, the number of elements that compose a signature is not fixed, but depends on the timespan of the dataset D. Also, (8) does not involve any compression, which calls for denoising: to that end, SCS applies a Fast Fourier Transform (FFT) to the signature, so as to clean it from irregular patterns.
  • More precisely, once converted to the frequency domain with FFT, only the highest power frequencies are kept, and the time signal is reconstructed with an Inverse FFT (IFFT) from the retained frequencies.
  • To that end, the skewness of the cluster sizes is evaluated at the different levels of the dendrogram built by the hierarchical clustering: selecting the level with minimum skewness allows grouping unit area signatures into classes of relatively comparable sizes.
  • Since this makes the analysis cumbersome, SCS limits the analysis to the 10 largest classes, which they consider to represent the most relevant urban fabrics in the considered region.
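
The FFT-based denoising described above (transform, keep the highest-power frequencies, invert) can be sketched as follows. This is an illustration only: it uses a naive discrete Fourier transform from the standard library rather than a fast implementation, and the function name `denoise` and the parameter `keep` are hypothetical, since the source does not state how many frequencies SCS retains.

```python
import cmath
import math

def denoise(signal, keep):
    """Keep the `keep` highest-power frequency bins of a real signal, then invert."""
    n = len(signal)
    half = n // 2
    # Forward DFT on the non-negative frequencies only (the signal is real,
    # so the remaining bins are complex conjugates of these).
    spectrum = [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal))
        for k in range(half + 1)
    ]
    # Rank the bins by power and retain the strongest ones.
    kept = sorted(range(half + 1), key=lambda k: abs(spectrum[k]) ** 2, reverse=True)[:keep]
    out = []
    for i in range(n):
        total = 0.0
        for k in kept:
            # Every bin except DC (and Nyquist, for even n) stands for a conjugate pair.
            weight = 1.0 if k == 0 or (n % 2 == 0 and k == half) else 2.0
            total += weight * (spectrum[k] * cmath.exp(2j * math.pi * k * i / n)).real
        out.append(total / n)
    return out

# Toy signal: constant level plus a daily cycle plus a weak irregular component.
signal = [
    5 + 3 * math.cos(2 * math.pi * i / 24) + 0.1 * math.cos(2 * math.pi * 7 * i / 24)
    for i in range(24)
]
clean = denoise(signal, keep=2)
```

With `keep=2`, the toy example retains the constant level and the dominant cycle and discards the weak component.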

3.4 Median Week Signature (MWS)

  • The current approaches presented above are based on a variety of signature definitions, pairwise distance measures and clustering approaches.
  • First, it has been repeatedly shown that there exists a strong weekly periodicity in human occupations [27], [28], which implies that most of the diversity in mobile traffic activity occurs within a one-week period.
  • Third, understanding (i) whether denoising is beneficial to the signature definition, (ii) which normalization works the best, and (iii) how signature distance measures affect the results is not trivial.
  • A sensible choice needs substantial empirical tests on representative data.
  • The MWS is computed according to the guidelines above, as follows.

4.1 Comparative evaluation datasets

  • The mobile traffic data is provided in both scenarios by Telecom Italia Mobile (TIM), as part of their Big Data Challenge initiative [30].
  • The ground-truth information consists instead in land use data retrieved from open databases of local authorities.
  • As a matter of fact, the authors assess the quality of signature classes identified via each technique by verifying their congruence with the nature of the underlying urban fabrics.
  • This methodology is consistent with those employed in the literature [24], and stems from the expectation that human activities, including mobile communications, are strongly related to the type of city facilities around them.
  • A detailed description of the data follows.

4.1.1 Milan

  • The first urban scenario is that of Milan, Italy.
  • The mobile traffic dataset is referred to as Mi-13, and describes the communication activity of TIM subscribers in the conurbation of the city for a two-month period in November and December 2013.
  • In their study, the authors consider a region of approximately 150 km² containing 2726 cells.
  • The Mi-13 dataset is the same used in [24] for the evaluation of the SCS approach with respect to WWS.
  • Specifically, it reports, for each unit area in Milan, the number of local inhabitants, business activities, sport centers, universities, schools and bus stops, as well as the percentage of unit area that is covered by green spaces, such as parks or woods.

4.1.2 Turin

  • Mobile traffic information in the region is provided by a dataset, referred to as Tu-15, which describes the mobile traffic activity of TIM customers in the region during March and April 2015.
  • The spatial tessellation of the geographical surface provided by the network operator is different from that in Mi-13; cells, i.e., unit areas, are not regular, but feature heterogeneous sizes that mimic the non-uniform coverage provided by each base station in the region.
  • For the sake of consistency with the Milan case study, the authors limit their analysis to four weeks of data, from March 1 to March 28, 2015.
  • The authors collected ground-truth data for each unit area in Tu-15 by leveraging open data published by the local municipality [32].
  • The authors selected information related to the latitude-longitude coordinates of schools, universities and business activities, and they associated them to individual unit areas, exactly as done for Milan in [24].

4.2 Metrics

  • In order to evaluate the consistency of the signature classes with respect to the ground-truth data, the authors introduce a set of suitable metrics, presented next.
  • The authors remark that, for the sake of comparability of their study with previous research, they include in their list the metrics proposed in [24].

4.2.4 F-score

  • The F-score provides a single value that summarizes the quality of the signature classes.
  • The F-score index ranges in [0, 1], with 1 indicating the best performance, i.e., minimum entropy and maximum coverage.

4.3 Results

  • Fig. 2 and Fig. 3 show the entropy, coverage and F-score for the state-of-the-art approaches of WWS, TWS and SCS as well as for their proposed MWS-stdscr-euclidean and MWS-stdscr-pearson solutions.
  • As shown in Fig. 2a, regardless of the signature pairwise distance measure used (Euclidean or based on Pearson correlation coefficient), the solutions based on the MWS model attain a significantly lower entropy than that granted by previous approaches.
  • The conclusions above are not specific to the Milan case study, as Fig. 3 shows that they hold also for the Turin scenario.
  • For the sake of brevity, the authors limit results to the F-score, which is a more comprehensive metric according to their previous analysis.
  • Slightly better performance is achieved by the MWS-daily-euclidean approach, and thus Euclidean distance appears to work better in combination with a daily normalization.

5 SIGNATURE ANALYSIS

  • The authors leverage the MWS-stdscr-pearson technique to extract meaningful classes of mobile traffic signatures in a substantial set of urban scenarios in Italy and France.
  • Such a study allows characterizing mobile communication dynamics and their intertwining with the urban landscape with high accuracy, across diverse cities and countries.
  • To that end, the authors first introduce in Sec. 5.1 the mobile traffic datasets they employ in their study.
  • Then, Sec. 5.3–Sec. 5.6 focus on a subset of such classes, especially interesting through their repeated occurrence or, on the contrary, their peculiarity.

5.1 Signature analysis datasets

  • The authors' datasets describe the mobile communication activity recorded in four major cities in Italy, i.e., Milan, Turin, Rome and Trento, as well as in six major cities in France, i.e., Paris, Lyon, Marseille, Toulouse, Lille and Bordeaux.
  • Tab. 2 labels the datasets and summarizes their main features.
  • In all case studies, the authors consider a geographic region of 150 km² around the city center.
  • There, the authors collect time series of the mobile traffic demand generated by subscribers of major operators, i.e., Orange in France and TIM in Italy.
  • Time slots span one hour, as this is the maximum precision granted by the data.

5.2 Overview of signature classes

  • The authors use the MWS-stdscr-pearson methodology to determine mobile traffic signature classes for the 6,581 unit areas that cover the urban regions in the reference datasets of Tab.
  • These numbers underscore how the scale of their study, encompassing ten cities and hundreds of signature classes, is significantly larger than that of previous works, focusing on one to three cities and five to ten signatures.
  • The plot shows how classes (rows) are distributed across cities.
  • In the remainder of this section, the authors will explore the causes behind the classification features outlined above, and more.
  • The consistency of this behavior lets us speculate that the preprocessing enforced on the TIM datasets may have induced important information loss, flattening the diversity of mobile traffic activity.

5.3 Residential urban fabrics

  • The authors start their analysis by studying the signature classes that appear the most frequently in the reference urban regions.
  • This class characterizes all unit areas of the analyzed Italian cities that do not present any noticeable infrastructure and that do not draw any particular activity of inhabitants.
  • 3 shapes, in Fig. 7e and Fig. 7f, respectively, reveals significant similarities in the semantics of the two signatures.
  • This diversity is imputable to different routines in the two countries, and entails interesting sociological questions.

5.4 Office urban fabrics

  • Unit areas with signatures matching class c2 are extensively present in both Italy and France, as shown in Fig.
  • Thus, the authors do not treat OSM data as ground truth; rather, it provides hints towards a correct interpretation of the mobile traffic signatures.
  • The signature is characterized by a fairly constant activity during office hours; more importantly, mobile activities tend to disappear during the weekend, when a very small fraction of offices is open.
  • This suggests that unit areas characterized by c5 still contain mostly offices, but have a minor presence of residential fabrics.

5.5 Transportation urban fabrics

  • A very distinctive class emerging from Fig. 6 is c4, which only appears in Paris, and includes more than 10% of the unit areas in the city.
  • During the weekend, all peaks basically vanish.
  • 10, whose classes capture mobile traffic dynamics that are found in both Italian and French cities.
  • In these cases, very specific signatures are associated to the few (possibly one) unit areas covering the railway station.

5.6 Touristic and leisure urban fabrics

  • Fig. 11b and Fig. 11c show examples for the unit areas in Milan and Turin, respectively: large parts of the cities' historical centers, where famous monuments, museums and squares are located, have mobile traffic signatures that belong to this class.
  • Details of the labeled locations in the maps are provided in Tab.
  • Still, the clear indicator of visitor activity, which allows separating areas in c16 from office-only zones is the persistent mobile communication load during weekends.

5.7 Unique urban fabrics

  • All signature classes presented above describe mobile traffic dynamics that are common to a significant number of unit areas across cities or countries.
  • The traffic peaks match exactly the weekly blessing ceremonies of the Pope in that place, which regularly gather a large, diverse audience.
  • In fact, outlying mobile traffic signatures are often associated to large-scale social events.
  • A representative case are sports events that attract supporters to stadiums.
  • C150, whose signature and unit area are in Fig. 12b and Fig. 12f, refers to the Vélodrome stadium in Marseille: the local football team plays both national (during weekends) and international (on Tuesday) matches, which reflects in unique peaks.

6 DISCUSSION

  • The authors' signature analysis provides a number of interesting cues that stimulate discussion on the interplay between urban fabrics and mobile traffic dynamics.
  • Below, the authors summarize the main takeaway messages, separating observations that they find intuitive from insights they deem surprising.
  • Also, one can anticipate that residential areas occupy most of the urban surface, and, thus, that residential mobile traffic signatures are the most common temporal profiles that operators must assume their networks to accommodate: the results in Sec. 5 confirm all these speculations.
  • First, residential mobile traffic in Italy and France shows striking differences, which one would hardly expect from countries that are in geographical and cultural proximity.
  • Fourth, considering ten different cities at once allows us to comment on the diversity observed across them.

7 CONCLUSIONS

  • Today, mobile communications permeate social life.
  • An interesting side effect of mobile device pervasiveness is the possibility of analyzing datasets collected by network operators for fine-grained analyses of subscribers’ endeavors.
  • The authors unveiled the strong intertwining between the mobile traffic activity and the urban fabrics that characterize the areas where such activity takes place.
  • The authors did so by (i) devising a novel signature classification technique that outperforms current state-of-the-art solutions, and (ii) using their technique on a dataset of unprecedented scale and heterogeneity.
  • The proposed methodology has applications in automatic land use detection and network management.


IEEE TRANSACTIONS ON MOBILE COMPUTING 1
A Tale of Ten Cities: Characterizing
Signatures of Mobile Traffic in Urban Areas
A. Furno, Member, IEEE, M. Fiore, Member, IEEE, R. Stanica, C. Ziemlicki, and Z. Smoreda
Abstract—Urban landscapes present a variety of socio-topological environments that are associated to diverse human activities. As
the latter affect the way individuals connect with each other, a bound exists between the urban tissue and the mobile communication
demand. In this paper, we investigate the heterogeneous patterns emerging in the mobile communication activity recorded within
metropolitan regions. To that end, we introduce an original technique to identify classes of mobile traffic signatures that are distinctive
of different urban fabrics. Our proposed technique outperforms previous approaches when confronted to ground-truth information, and
allows characterizing the mobile demand in greater detail than that attained in the literature to date. We apply our technique to
extensive real-world data collected by major mobile operators in ten cities. Results unveil the diversity of baseline communication
activities across countries, but also evidence the existence of a number of mobile traffic signatures that are common to all studied
areas and specific to particular land uses.
Index Terms—Mobile Networks, Mobile Traffic Data Analysis, Communication Activity Profiles, Mobile Traffic Signatures, Land Use.
1 INTRODUCTION
It is commonly accepted that most individuals exhibit mobility and activity patterns –driven by their family life, work obligations, hobbies, occupations and personal habits, as well as by the presence of infrastructures, services and amenities– that are highly repetitive and yet very distinctive. These considerations apply to mobile network subscribers and their communication habits as well, as highlighted in a recent, extensive survey of mobile traffic data analyses [2]. The regular but variegated behavior of mobile users leads to heterogeneity in subscribers’ profiles [3], temporal periodicity of the aggregated demand [4], load fluctuations in presence of large-scale social events [5], or to geographic diversity of mobile communications [6].
In this paper, we focus on this latter aspect. Specifically, it has been shown that there exist strong relationships between the mobile communication activity and what we refer to as urban fabrics, i.e., the combination of infrastructure (e.g., roads, transportation systems, and sports, education, or healthcare facilities) and land use (e.g., residential, industrial, or commercial) that characterizes different zones within a same metropolitan area. Important correlations were found between the mobile demand and the underlying city cartography: notable examples include the spatial diversity of mobile activity within a conurbation [7], the similarity of temporal dynamics of traffic in residential areas [8], or the fact that load peaks undergo geographic shifts between precise urban areas throughout the day [9] and during weekday-to-weekend transitions [10]. Recently, mobile phone data was even leveraged to validate urban planning theories on conditions that promote life in a city [11].
This work is an extended version of an IEEE/ACM ASONAM 2015 paper [1].
A. Furno is with IFSTTAR-ENTPE, Université de Lyon, F-69675 Bron, France. E-mail: angelo.furno@ifsttar.fr
M. Fiore is with CNR, 10129 Torino, Italy. E-mail: marco.fiore@ieiit.cnr.it
R. Stanica is with Univ Lyon, INSA Lyon, Inria, CITI, F-69621 Villeurbanne, France. E-mail: razvan.stanica@insa-lyon.fr
C. Ziemlicki and Z. Smoreda are with SENSE, Orange Labs, F-92794 Issy-les-Moulineaux, France. E-mail: name.surname@orange.com
Motivated by these results, we delve deeper into the characterization of the spatial heterogeneity of mobile communication activities. More precisely, we inherit the notion of mobile traffic signature to denote the typical activity pattern of the mobile demand at one specific geographic zone [12]. We find that such signatures can provide an evident association of prototypical mobile communication dynamics to precise urban fabrics. Moreover, many of these signatures appear to be general in nature, since they emerge in different cities and countries. The work leading to these conclusions yields a number of original contributions, summarized as follows.
i) We propose a novel methodology to construct mobile traffic signatures and classify them, which builds on the understanding and refinement of previous approaches. We demonstrate the superiority of our approach over state-of-the-art solutions, by showing that it creates mobile demand profiles that better agree with land use ground-truth data.
ii) We apply our proposed methodology to real-world mobile traffic data collected by major network operators in ten different cities during several months. This is a much larger dataset than those employed in previous analyses: it allows generalizing our results, and investigating similarities and diversity across a substantial set of different cities.
iii) The reference signatures we obtain characterize the geographic diversity of mobile communications with a significantly higher granularity than previously achieved in the literature. Related works only investigate five to ten major profiles of mobile traffic activity in the cities they studied; instead, we identify tens of signatures in each city. We discuss a relevant subset of signatures that characterize common dynamics typical of many city neighborhoods, as well as peculiar behaviors pinpointing specific city locations.
iv) We identify the baseline signatures of mobile traffic that can be associated to residential urban fabrics in two different countries. Our results highlight the significant dissimilarity emerging between countries in this regard.
v) We identify a number of distinctive signatures that are linked to particular urban fabrics, such as offices, universities, industrial areas, transportation hubs or leisure centers. Interestingly, we find many of such mobile demand patterns to be consistent across countries, proving that the inter-country diversity observed in residential behaviors tends to disappear when focusing on precise human activities.
2 RELATED WORK
The dynamics of human presence in an urban area are inherent to the notion of urban fabric: a city is composed of functionally diverse regions, and inhabitants move among them according to regular patterns, so as to carry out activities related to their destination region [13]. Using information on human presence and activity allows then telling apart the functional regions in a city, as shown by Yuan et al. [14], who exploited to that end geo-localized taxi trips.
Similarly, there exists an intertwining between urban fabrics and mobile network usage. This is an intuitive phenomenon that has been known for a long time, and which mobile operators are keen to take into account during access network planning [15], [16]. Yet, the phenomenon is far from being completely understood, since the exact influences of urban fabrics on the behavior of mobile network customers are not easily characterized. This has led to the emergence of a significant literature specifically addressing this problem.
In a seminal work, Girardin et al. [12] introduce the notion of mobile traffic signatures, i.e., condensed representations of the typical mobile demand dynamics observed in a given geographical region. They demonstrate that different urban fabrics can indeed generate diverse mobile traffic signatures: specifically, the work focuses on the mobile phone activity of roaming users, used to detect touristic areas in Rome, Italy. The approach is then elaborated by other studies. Calabrese et al. [21] use Wi-Fi associations of staff and students to map activity areas within the MIT campus. Becker et al. [17] study mobile traffic signatures in the city of Morristown, NJ, USA, and identify differences between the demand in downtown and that in high school areas. In a larger-scale study, Toole et al. [18] analyze the entire conurbation of Boston, MA, USA. The authors use information on five land use types (residential, commercial, industrial, parks, and others) as ground truth, and associate each cellular base station with one of these areas. They then build signatures for the five surfaces through a random forest classifier: this allows predicting land use from mobile traffic signatures with 57% accuracy. These studies all focus on specific scenarios, whereas we aim at associating accurate signatures to an exhaustive set of urban fabrics. Moreover, most of the works above make use of accurate ground truth information to train mobile traffic signatures for specific land use zones. Our proposed solution is instead unsupervised, and does not require a-priori ground truth.
Recently, an original spectral analysis approach is taken by Secchi et al. [19], who apply wavelet transforms and principal component analysis to mobile traffic signatures in the urban area of Milan, Italy. This allows drawing heatmaps of the most significant human activities in the city. This study, strongly focused on activity characterization, is complementary to ours, which provides instead a fine-grained representation of the prototypical demand observed over space. Another recent work that is orthogonal to ours is that by Lenormand et al. [20], who aim at understanding the scaling laws of (possibly mixed) land uses and at modelling them through theoretical characterization. They find interesting properties for four land uses in Spain; instead, our approach unveils a much higher variety (i.e., tens) of land uses without any inclination towards theoretical modelling.
Three works are the closest to ours. Soto et al. [22], Grauwin et al. [23] and Cici et al. [24] all propose traffic signature clustering techniques, and apply them to urban-scale scenarios. We detail their proposed methods in Sec. 3.1, Sec. 3.2, and Sec. 3.3, respectively, and show that our approach outperforms them in Sec. 4. Moreover, our signature analysis in Sec. 5 builds on a much larger dataset than those considered in previous works. This allows for unprecedented detail and generality of the characterization.
3 MOBILE TRAFFIC SIGNATURES
Let us consider a generic dataset $\mathcal{D}$, describing the communication activity of a mobile subscriber population during a set of days $\mathbf{d} = \{d\}$. For each day, the mobile demand is stored as the aggregate of the traffic generated by all users in a same area during a given time interval; the size of the area and duration of the interval determine the spatial and temporal granularity of the dataset, respectively. We name unit area the spatial aggregation level: the whole geographic region under consideration $\mathbf{a} = \{a\}$ is thus divided¹ into unit areas $a$. The time granularity is instead characterized by the duration of a time slot, i.e., the interval during which user activity is aggregated in each unit area. Each day $d \in \mathbf{d}$ is thus split into a set $\mathbf{t} = \{t\}$ of time slots $t$. Overall, $\mathcal{D} = \{v_a(d, t)\}$, where every element $v_a(d, t)$ describes the total mobile communication activity within each unit area $a$ at time slot $t$ of day $d$.
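As an illustration of the dataset definition above, the entries v_a(d, t) could be assembled from raw per-event records as sketched below. The record layout `(area, day, slot, volume)` and the function name `aggregate_demand` are assumptions made for the example; the source does not describe how operators store the raw data.

```python
from collections import defaultdict

def aggregate_demand(records):
    """Sum per-record traffic into totals v[a][(d, t)] (illustrative sketch).

    records: iterable of (area, day, slot, volume) tuples, a hypothetical layout.
    """
    v = defaultdict(lambda: defaultdict(float))
    for area, day, slot, volume in records:
        # All traffic seen in the same unit area and time slot is aggregated.
        v[area][(day, slot)] += volume
    return v

records = [
    ("a1", "2013-11-04", 9, 12.0),
    ("a1", "2013-11-04", 9, 3.0),
    ("a1", "2013-11-04", 10, 7.0),
    ("a2", "2013-11-04", 9, 5.0),
]
v = aggregate_demand(records)
```

Here the unit area identifiers and hourly slot indices are arbitrary; any tessellation and slot duration would fit the same structure.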
The techniques for the construction of a representative set of mobile traffic signatures process the dataset $\mathcal{D}$ through six phases. These phases aim at: (i) summarizing the mobile traffic activity in each unit area into a meaningful profile, i.e., the unit area signature (first three phases); (ii) grouping similar unit area signatures into a limited set of classes, each exhibiting a unique behavior (last three phases). Next, we discuss each phase in detail.
1. The signature metric indicates the nature of subscriber activity to be represented. Examples of metrics are the number or duration of voice calls, the number of short text messages (SMS), the volume of Internet data traffic, or the kind of mobile services consumed by the users. The metric controls the actual information in each dataset entry $v_a(d, t)$.
2. The signature support is the time interval over which the signature is defined. Denoted as a set of days $\boldsymbol{\delta} = \{\delta\}$, the support entails the level of compression of the data into the signature. It can range from a couple of days (implying a high level of compression, since datasets typically span weeks or months) to the entire observation period, i.e., $\boldsymbol{\delta} = \mathbf{d}$ (no compression).
3. The data denoising component extracts information deemed to be representative of the typical mobile traffic activity in a unit area, isolating it from the inherent noise in the data. In cases where the signature support is smaller than the observation period, implicit denoising is realized through compression, which increases data robustness by merging multiple $v_a(d, t)$ samples into a single value.
4. The signature normalization makes signatures independent from the absolute volume of mobile traffic recorded at a unit area. This allows comparing the mobile communication activity at different unit areas on the sole basis of the mobile demand variations.
5. The signature pairwise distance measure determines the degree of similarity of two signatures.
6. The signature clustering algorithm groups together signatures that are alike, leveraging the distance measure above. Ultimately, this last phase returns a set of classes of archetypal signatures, denoted as $\mathbf{c}$. Each class $c \in \mathbf{c}$ maps to a distinct type of human activity.
In Sec. 3.1–Sec. 3.3, we will survey the current state-of-the-art definitions for mobile traffic signatures provided by Soto et al. [22], Grauwin et al. [23], and Cici et al. [24]. We will then introduce our own definition in Sec. 3.4.
¹ The definition of unit area is general, and can accommodate any tessellation of space. Unit areas can map to, e.g., cell sector boundaries, coverage zones of base stations, Voronoi cells, or elements of a grid.
3.1 Weekday-Weekend Signature (WWS)
In the definition by Soto et al. [22], mobile traffic signatures correspond to the average voice and text volume observed during (i) a typical working day, and (ii) a typical weekend day. Thus, in this case, the signature metric is the sum of voice and text volumes², and the signature support is two days, i.e., $\boldsymbol{\delta} = \{\mathrm{WD}, \mathrm{WE}\}$. We will thus refer to this approach as Weekday-Weekend Signature (WWS).
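Formally, the set of days $\mathbf{d}$ is split into two sets
$\mathbf{d}_{\mathrm{WD}}$ and $\mathbf{d}_{\mathrm{WE}}$, which contain all
Mondays-to-Fridays, and all Saturdays and Sundays, respectively.
Then, the generic element in the signature of a unit area $a$ is
$$s_a(\mathrm{WD}, t) = \frac{1}{|\mathbf{d}_{\mathrm{WD}}|} \sum_{d \in \mathbf{d}_{\mathrm{WD}}} v_a(d, t), \quad a \in \mathbf{a}, \tag{1}$$
for time slots $t$ during working days, and
$$s_a(\mathrm{WE}, t) = \frac{1}{|\mathbf{d}_{\mathrm{WE}}|} \sum_{d \in \mathbf{d}_{\mathrm{WE}}} v_a(d, t), \quad a \in \mathbf{a}, \tag{2}$$
for time slots $t$ during weekends.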
We remark that this approach induces a significant level
of compression, squeezing months of data into a two-day
support. Thus, further data denoising is unnecessary. The
signature of $a$ is built from the elements in (1) and (2) as
$$\mathbf{s}_a = \big\Vert_{\delta \in \boldsymbol{\delta}} \Big( \big\Vert_{t \in \mathbf{t}} \, s_a(\delta, t) \Big), \quad a \in \mathbf{a}. \tag{3}$$
In (3), $\Vert$ indicates the time-ordered concatenation of all
elements in a set: $\mathbf{s}_a$ is thus the concatenation of all elements
computed at every time slot during the average working
day and the average weekend day.
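The averaging in (1)-(2) and the concatenation in (3) can be illustrated with a minimal Python sketch. The input layout (a list of `(weekday, samples)` pairs) and the function name `wws_signature` are assumptions made here for illustration and are not part of the original WWS proposal.

```python
def wws_signature(days):
    """Average weekday and weekend profiles (Eqs. 1-2), concatenated as in Eq. 3.

    `days` is a hypothetical input format: a list of (weekday, samples) pairs,
    where weekday is 0 (Monday) to 6 (Sunday) and samples holds the volumes
    v_a(d, t) of one day d, one value per time slot t.
    """
    groups = {"WD": [], "WE": []}
    for weekday, samples in days:
        groups["WD" if weekday in range(5) else "WE"].append(samples)

    signature = []
    for label in ("WD", "WE"):  # time-ordered concatenation over the support
        profiles = groups[label]
        # zip(*profiles) yields, for each time slot, the values across days
        signature.extend(sum(slot) / len(slot) for slot in zip(*profiles))
    return signature


# Two working days and two weekend days, two time slots per day
example = [(0, [1, 2]), (1, [3, 4]), (5, [10, 20]), (6, [30, 40])]
print(wws_signature(example))
```

Whatever the length of the observation period, the resulting signature always has twice as many elements as there are time slots in a day, which is the compression effect remarked above.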
Signatures then undergo a standard score normalization.
To that end, each element obtained in (1) and (2) is normalized
with respect to the mean and standard deviation of
all elements referring to the same unit area. Formally, for a
generic element of unit area $a$,
$$\hat{s}_a(\delta, t) = \frac{s_a(\delta, t) - \mu(\mathbf{s}_a)}{\sigma(\mathbf{s}_a)}, \quad \delta \in \boldsymbol{\delta},\ t \in \mathbf{t},\ a \in \mathbf{a}, \tag{4}$$
where $\mu(\mathbf{s}_a)$ and $\sigma(\mathbf{s}_a)$ denote the mean and standard
deviation of the set of elements concatenated in the signature
$\mathbf{s}_a$. Then, the normalized signature $\hat{\mathbf{s}}_a$ is simply obtained by
concatenation of $\hat{s}_a(\delta, t)$ for all $\delta \in \boldsymbol{\delta}$ and $t \in \mathbf{t}$, as in (3).
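As far as the similarity between signatures is concerned,
WWS considers a simple Euclidean distance. Given the
signatures of unit areas $a$ and $a'$, their distance is
$$\mathcal{D}_{a,a'} = \sqrt{\sum_{\delta \in \boldsymbol{\delta}} \sum_{t \in \mathbf{t}} \big( \hat{s}_a(\delta, t) - \hat{s}_{a'}(\delta, t) \big)^2}, \quad a, a' \in \mathbf{a}. \tag{5}$$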
Footnote 2. As shown throughout this Section, all proposed definitions of
mobile traffic signatures leverage voice and text activity, while they
discard data traffic. The reason is that the former are an excellent proxy
of human endeavors and thus of the urban fabrics that affect them.
Instead, data traffic is often generated autonomously by applications
running or updating in background, and it is thus less representative
of the actual occupations of the user.
Finally, the clustering of signatures is performed by
running a k-means algorithm over the set of all signatures
$\hat{\mathbf{s}}_a$, $a \in \mathbf{a}$, using (5) as the k-means distance measure. The
algorithm requires the parametrization of k, i.e., the desired
number of signature classes: WWS selects k according to the
validity index proposed in [25]. In all original case studies,
the best results are always obtained with k = 5.
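The clustering step can be sketched as a plain Lloyd-style k-means that uses the Euclidean distance in (5). This is only an illustrative sketch: the deterministic initialization from the first k signatures is an assumption made for reproducibility, the function names are hypothetical, and the validity index of [25] used to select k is not implemented.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two normalized signatures, as in Eq. 5."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def kmeans(signatures, k, max_iterations=100):
    """Basic k-means over a list of equal-length signatures.

    Assumption: centroids start from the first k signatures; practical
    implementations typically use randomized or k-means++ seeding.
    """
    centroids = [list(s) for s in signatures[:k]]
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda j: euclidean(s, centroids[j]))
            for s in signatures
        ]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [s for s, lab in zip(signatures, labels) if lab == j]
            if members:  # keep the old centroid if a class becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


labels, centroids = kmeans([[0, 0], [10, 10], [0, 1], [10, 11]], k=2)
print(labels, centroids)
```

Note that k has to be supplied by the caller, which is the parametrization requirement mentioned in the text.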
3.2 Typical Week Signature (TWS)
Grauwin et al. [23] propose a variation of WWS, named
Typical Week Signature (TWS). Also in this case, the sig-
nature metric adds up voice and text volumes. However,
the support is one week, from Monday to Sunday, i.e.,
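$\boldsymbol{\delta} = \{\mathrm{MON}, \mathrm{TUE}, \mathrm{WED}, \mathrm{THU}, \mathrm{FRI}, \mathrm{SAT}, \mathrm{SUN}\}$. Let us denote as
$\mathbf{d}_\delta \subseteq \mathbf{d}$ the set of days in the dataset D that correspond
to the day of the week $\delta$, with
$\bigcup_{\delta \in \boldsymbol{\delta}} \mathbf{d}_\delta = \mathbf{d}$. For instance,
$\mathbf{d}_{\mathrm{MON}}$ groups all Mondays in the dataset. Then, the generic
element in the signature of unit area $a$ is
$$s_a(\delta, t) = \mu\big( \{ v_a(d, t) \mid d \in \mathbf{d}_\delta \} \big), \quad a \in \mathbf{a}, \tag{6}$$
for time slots $t$ during day $\delta$. We recall that $\mu(\cdot)$ represents
the mean of the set within parentheses. Also in this case,
$\boldsymbol{\delta}$ is small with respect to the overall set of days $\mathbf{d}$, which
implies high compression and makes denoising pointless.
Signatures are then obtained by concatenation of time-
ordered elements, through (3). The normalization procedure
is different from WWS, as each element is normalized with
respect to a signature average, as
$$\hat{s}_a(\delta, t) = \frac{s_a(\delta, t)}{\mu\big( \{ s_a(\delta, t) \mid \delta \in \boldsymbol{\delta},\ t \in \mathbf{t} \} \big)}, \quad \delta \in \boldsymbol{\delta},\ t \in \mathbf{t},\ a \in \mathbf{a}. \tag{7}$$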
Normalized signatures are again clustered with a k-
means algorithm using (5) as the distance measure. In this
regard, the only difference from the WWS approach is
that the choice of k is guided by the local maxima of the
Silhouette Index [26]. This leads to a value k = 6 as the best
choice in all considered scenarios.
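The typical-week averaging in (6) and the average rescaling in (7) can be sketched in Python as follows. The input layout (a list of `(weekday, samples)` pairs, with 0 for Monday and 6 for Sunday) and both function names are assumptions introduced for this illustration.

```python
from statistics import fmean


def tws_signature(days):
    """Mean per day of the week and time slot (Eq. 6), concatenated via Eq. 3.

    `days` is a hypothetical list of (weekday, samples) pairs, weekday being
    0 (Monday) to 6 (Sunday) and samples one volume per time slot.
    """
    by_weekday = [[] for _ in range(7)]
    for weekday, samples in days:
        by_weekday[weekday].append(samples)

    signature = []
    for profiles in by_weekday:  # Monday first, Sunday last
        signature.extend(fmean(slot) for slot in zip(*profiles))
    return signature


def average_rescaling(signature):
    """Divide every element by the mean of the whole signature (Eq. 7)."""
    mu = fmean(signature)
    if mu == 0:
        raise ValueError("signature has zero mean volume")
    return [x / mu for x in signature]


week = tws_signature([(0, [1, 3]), (0, [3, 5]), (6, [4, 4])])
print(week, average_rescaling(week))
```

Unlike the standard score in (4), this rescaling keeps all elements non-negative and gives the normalized signature a mean of one.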
3.3 Seasonal Communication Series (SCS)
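The solution by Cici et al. [24], named Seasonal Communication
Series (SCS), considers the whole time series in each unit
area. In other words, $\boldsymbol{\delta} = \mathbf{d}$, and the signature of area $a$ is
$$\mathbf{s}_a = \big\Vert_{d \in \mathbf{d}} \Big( \big\Vert_{t \in \mathbf{t}} \, s_a(d, t) \Big), \quad a \in \mathbf{a}, \tag{8}$$
where $s_a(d, t) = v_a(d, t)$, $a \in \mathbf{a}$, $d \in \mathbf{d}$, $t \in \mathbf{t}$, and $v_a(d, t)$
corresponds to the volume of voice calls and text messages.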
In such a definition, the number of elements that com-
pose a signature is not fixed, but depends on the timespan of
the dataset D. Also, (8) does not involve any compression,
which calls for denoising: to that end, SCS applies a Fast
Fourier Transform (FFT) to the signature, so as to clean it
from irregular patterns. More precisely, once converted to
the frequency domain with FFT, only the highest power
frequencies are kept, and the time signal is reconstructed
with an Inverse FFT (IFFT) from the retained frequencies.
The filtering returns a so-called seasonal (i.e., typical) com-
ponent of the original signature.
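The filtering procedure can be illustrated with the sketch below. To stay self-contained it uses a naive discrete Fourier transform from the standard library instead of an actual FFT (same result, quadratic cost), and the number of retained frequencies, `keep`, is a hypothetical parameter: the text does not state how many frequencies SCS retains.

```python
import cmath
import math


def dft(values, sign):
    """Naive DFT (sign=-1) or unscaled inverse DFT (sign=+1), O(n^2)."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(sign * 2j * cmath.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]


def fft_denoise(signal, keep):
    """Keep the `keep` highest-power frequencies and rebuild the time signal."""
    n = len(signal)
    spectrum = dft(signal, -1)
    # A real signal has a mirrored spectrum: rank only bins 0..n/2
    half = range(n // 2 + 1)
    strongest = sorted(half, key=lambda f: abs(spectrum[f]) ** 2, reverse=True)[:keep]
    kept = set()
    for f in strongest:
        kept.add(f)
        kept.add((n - f) % n)  # mirrored bin, so the output stays real
    filtered = [spectrum[f] if f in kept else 0 for f in range(n)]
    return [value.real / n for value in dft(filtered, +1)]


# Constant level plus one slow oscillation, perturbed by a fast alternating term
noisy = [5 + 3 * math.cos(2 * math.pi * t / 8) + 0.1 * (-1) ** t for t in range(8)]
print(fft_denoise(noisy, keep=2))
```

In the example, retaining two frequencies preserves the constant level and the slow oscillation, while the low-power alternating term is discarded.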
Normalization of whole-time series filtered signatures is
then performed using the standard-score approach in (4),
where, clearly, δ = d.

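The pairwise signature similarity is based on the Pearson
correlation coefficient, which, for two unit areas $a$ and $a'$, is
$$C_{a,a'} = \frac{\sum_{\delta \in \boldsymbol{\delta}} \sum_{t \in \mathbf{t}} \big( \hat{s}_a(\delta, t) - \mu(\hat{\mathbf{s}}_a) \big) \big( \hat{s}_{a'}(\delta, t) - \mu(\hat{\mathbf{s}}_{a'}) \big)}{\sqrt{\sum_{\delta \in \boldsymbol{\delta}} \sum_{t \in \mathbf{t}} \big( \hat{s}_a(\delta, t) - \mu(\hat{\mathbf{s}}_a) \big)^2} \cdot \sqrt{\sum_{\delta \in \boldsymbol{\delta}} \sum_{t \in \mathbf{t}} \big( \hat{s}_{a'}(\delta, t) - \mu(\hat{\mathbf{s}}_{a'}) \big)^2}}. \tag{9}$$
We recall that $\boldsymbol{\delta} = \mathbf{d}$ in (9). The distance measure is then
$$\mathcal{D}_{a,a'} = 1 - C_{a,a'}, \quad a, a' \in \mathbf{a}. \tag{10}$$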
Concerning signature clustering, SCS adopts an agglomerative
hierarchical clustering, namely, the linkage clustering
algorithm with average distance criterion. This hierarchical
clustering outputs a whole family of solutions that
can be represented as a dendrogram: it thus returns
richer information than the single partition produced
by, e.g., k-means. However, this also implies that
some criterion must be adopted to select the best clustering
among all those in the family. To that end, the skewness of
the cluster sizes is evaluated at the different levels of the
dendrogram built by the hierarchical clustering: selecting
the level with minimum skewness allows grouping unit
area signatures into classes of relatively comparable sizes.
It is important to note that, by using the lowest-skewness
criterion, the number of generated signature classes can be
high, in the order of hundreds. Since this makes the analysis
cumbersome, SCS limits the analysis to the 10 largest classes,
which its authors consider to represent the most relevant urban
fabrics in the considered region.
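The procedure can be sketched with a naive average-linkage implementation that records every level of the dendrogram and then picks one by skewness. Several details are assumptions made for this illustration, since the text does not specify them: skewness is computed as the third standardized moment of the cluster sizes, the level is chosen by smallest absolute skewness, and the two trivial levels (all singletons, one single class) are excluded. Function names are hypothetical.

```python
from statistics import fmean


def skewness(values):
    """Third standardized moment (assumed definition of skewness)."""
    mu = fmean(values)
    m2 = fmean((x - mu) ** 2 for x in values)
    m3 = fmean((x - mu) ** 3 for x in values)
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def average_linkage_levels(dist):
    """Agglomerative clustering with average linkage over a distance matrix.

    Returns the partition found at every level of the dendrogram, from all
    singletons down to one class. Naive implementation for small inputs.
    """
    clusters = [[i] for i in range(len(dist))]
    levels = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = fmean(dist[p][q] for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        levels.append([list(c) for c in clusters])
    return levels


def min_skewness_partition(levels):
    """Pick the non-trivial level whose class sizes are least skewed."""
    candidates = [p for p in levels if 1 < len(p) < len(levels[0])]
    return min(candidates, key=lambda p: abs(skewness([len(c) for c in p])))


points = [0, 1, 10, 11]
dist = [[abs(p - q) for q in points] for p in points]
print(min_skewness_partition(average_linkage_levels(dist)))
```

The selection requires at least three signatures, as otherwise no non-trivial level exists. In practice the pairwise distances would come from (10) rather than from the toy one-dimensional points used here.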
3.4 Median Week Signature (MWS)
The current approaches presented above are based on a vari-
ety of signature definitions, pairwise distance measures and
clustering approaches. Here, we introduce a novel signature
model that aims at combining the advantages of previous
proposals, while overcoming their limitations. Specifically,
our definition of a Median Week Signature (MWS) is based on
the following considerations.
First, it has been repeatedly shown that there exists a
strong weekly periodicity in human occupations [27],
[28], which implies that most of the diversity in mobile
traffic activity occurs within a one-week period. We
thus speculate that a signature describing the typical
weekly behavior of the mobile demand at one unit area
contains the vast majority of the significant information
about the nature of that area. This lets us consider
a week-long signature, avoiding dimensionality prob-
lems in presence of long time series (which can instead
affect SCS [29]), and not discarding any important
knowledge (an issue in highly-compressed WWS [24]).
Second, we deem the median to be a more reliable
statistical measure than, e.g., the average or the absolute
values, when it comes to assessing the typical activity in
mobile traffic. As a matter of fact, the median is much
more robust to outliers, which are frequent in mobile
traffic due to special events of social, political, sports,
or cultural nature [4], [7].
Third, understanding (i) whether denoising is beneficial
to the signature definition, (ii) which normalization
works the best, and (iii) how signature distance mea-
sures affect the results is not trivial. A sensible choice
needs substantial empirical tests on representative data.
The MWS is computed according to the guidelines
above, as follows. The metric is the sum of voice and
text activity volumes, as assumed by all techniques in the
literature. The support is the same considered in TWS, i.e.,
$\boldsymbol{\delta} = \{\mathrm{MON}, \mathrm{TUE}, \mathrm{WED}, \mathrm{THU}, \mathrm{FRI}, \mathrm{SAT}, \mathrm{SUN}\}$. By using the same
notation, the element associated to time slot $t$ of day $\delta \in \boldsymbol{\delta}$
in the signature of unit area $a$ is
$$s_a(\delta, t) = \mu_{1/2}\big( \{ v_a(d, t) \mid d \in \mathbf{d}_\delta \} \big), \quad a \in \mathbf{a}, \tag{11}$$
where $\mu_{1/2}(\cdot)$ represents the median of the set within
parentheses. The MWS is then defined as the concatenation
of time-ordered samples according to (3). We remark that
this definition realizes our first two considerations above.
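The definition in (11) can be sketched in Python as follows; the input layout (a list of `(weekday, samples)` pairs, with 0 for Monday and 6 for Sunday) and the function name are assumptions made for illustration.

```python
from statistics import median


def median_week_signature(days):
    """Median per day of the week and time slot (Eq. 11), concatenated via Eq. 3.

    `days` is a hypothetical list of (weekday, samples) pairs, weekday being
    0 (Monday) to 6 (Sunday) and samples one volume per time slot.
    """
    by_weekday = [[] for _ in range(7)]
    for weekday, samples in days:
        by_weekday[weekday].append(samples)

    signature = []
    for profiles in by_weekday:  # Monday first, Sunday last
        signature.extend(median(slot) for slot in zip(*profiles))
    return signature


# Three Mondays with a single time slot; the third one hosts a special event
print(median_week_signature([(0, [10]), (0, [12]), (0, [500])]))
```

In the example the median is 12, whereas the mean of the same three samples is 174: a single exceptional day dominates the average but leaves the median essentially untouched.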
Taking as a pivot the MWS model just defined, we
explore the design space of a complete solution for urban
fabric detection. This implements the last consideration
listed before, and results in the MWS variants below.
1. We assess the impact of denoising, considering both the
case where the signature is filtered via the FFT/IFFT
procedure proposed in [24] and described in Sec. 3.3,
and the case where it is used as is.
2. We evaluate two different techniques to normalize
MWS. One option is the standard score normalization
introduced above; in this case, signatures
are normalized according to (4), where
$\boldsymbol{\delta} = \{\mathrm{MON}, \mathrm{TUE}, \mathrm{WED}, \mathrm{THU}, \mathrm{FRI}, \mathrm{SAT}, \mathrm{SUN}\}$. The other option is
daily normalization, where the signature element of
unit area $a$ at time slot $t$ of day $\delta$ is
$$\hat{s}_a(\delta, t) = \frac{s_a(\delta, t)}{\sum_{t \in \mathbf{t}} s_a(\delta, t)}, \quad a \in \mathbf{a}. \tag{12}$$
The expression in (12) normalizes each element with
respect to the total activity during the day of the week
the element belongs to.
3. We test the distance measures used in WWS, TWS and
SCS, i.e., the Euclidean distance in (5) and the distance
based on the Pearson correlation coefficient in (10); in
both cases δ = {MON,TUE,WED,THU,FRI,SAT,SUN}.
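As an illustration of the daily normalization in (12), the sketch below rescales a concatenated signature one day at a time. The flat-list layout and the `slots_per_day` parameter are assumptions of this example.

```python
def daily_normalization(signature, slots_per_day):
    """Divide each element by the total activity of its own day (Eq. 12).

    `signature` is a flat list in which every consecutive run of
    `slots_per_day` elements belongs to the same day of the week.
    """
    normalized = []
    for start in range(0, len(signature), slots_per_day):
        day = signature[start:start + slots_per_day]
        total = sum(day)
        if total == 0:
            raise ValueError("a day with no activity cannot be normalized")
        normalized.extend(x / total for x in day)
    return normalized


# Two days with two time slots each
print(daily_normalization([1, 3, 2, 2], slots_per_day=2))
```

After this step the elements of each day sum to one, so differences in total volume between, e.g., working days and weekends are removed and only the intra-day profile is retained.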
Finally, signature clustering is performed as in SCS,
using the agglomerative hierarchical algorithm described in
Sec. 3.3, and considering minimum skewness as the stop-
ping rule. We favor this approach over simpler ones based
on k-means, since it is fully unsupervised. Also, unlike SCS,
our MWS approach does not limit relevant classes to an
arbitrary number, but considers that all signatures convey
some unique behavior of mobile network subscribers and
thus deserve to be studied and understood.
4 COMPARATIVE EVALUATION
In this section, we provide a complete comparative evalua-
tion of the techniques for signature classification described
in Sec. 3. A summary of the different solutions we test is
provided in Tab. 1. In order to carry out our study, we gather
mobile traffic as well as ground-truth data in two major
cities in Italy, as detailed in Sec. 4.1, and we define a set
of relevant metrics, presented in Sec. 4.2. We then discuss
the results of our study in Sec. 4.3.
4.1 Comparative evaluation datasets
We consider two citywide case studies. The mobile traffic
data is provided in both scenarios by Telecom Italia Mobile
(TIM), as part of their Big Data Challenge initiative [30].
The ground-truth information consists instead in land use
data retrieved from open databases of local authorities.
As a matter of fact, we assess the quality of signature
classes identified via each technique by verifying their
congruence with the nature of the underlying urban fabrics.
This methodology is consistent with those employed in the
literature [24], and stems from the expectation that human
activities, including mobile communications, are strongly
related to the type of city facilities around them. A detailed
description of the data follows.

TABLE 1
Summary of the considered techniques for the detection of classes of mobile traffic signatures.

| Name | Signature | Filtering | Normalization | Distance | Clustering |
|---|---|---|---|---|---|
| WWS | average weekday-weekend | - | standard score | Euclidean | k-means, k=5 |
| TWS | average week | - | average rescaling | Euclidean | k-means, k=6 |
| SCS | whole time series | FFT/IFFT | standard score | Pearson correlation | linkage, minimum skewness |
| MWS-stdscr-pearson | median week | - | standard score | Pearson correlation | linkage, minimum skewness |
| MWS-stdscr-euclidean | median week | - | standard score | Euclidean | linkage, minimum skewness |
| MWS-daily-pearson | median week | - | daily | Pearson correlation | linkage, minimum skewness |
| MWS-daily-euclidean | median week | - | daily | Euclidean | linkage, minimum skewness |
| MWS-fft-stdscr-pearson | median week | FFT/IFFT | standard score | Pearson correlation | linkage, minimum skewness |
| MWS-fft-daily-euclidean | median week | FFT/IFFT | daily | Euclidean | linkage, minimum skewness |

Fig. 1. Spatial tessellation into unit areas (cells in the legend), and partial
ground-truth data for the (a) Milan and (b) Turin citywide scenarios.
Figure best viewed in colors.
4.1.1 Milan
The first urban scenario is that of Milan, Italy. The mobile
traffic dataset is referred to as Mi-13, and describes the
communication activity of TIM subscribers in the conurbation of
the city for a two-month period in November and December
2013. The dataset differentiates between incoming and
outgoing calls, providing information about their number and
duration. The dataset also contains the number of received
and sent SMS and the amount of Internet data traffic generated
by TIM mobile devices.
Mobile traffic information is aggregated over 10-minute
time intervals, according to a regular-cell spatial tessellation
of the surface of the city of Milan. Each cell has a 235 × 235
m² size, i.e., an area of 0.055 km², and maps to a unit
area in our analysis. In our study, we consider a region of
approximately 150 km² containing 2726 cells. The Mi-13
dataset is the same used in [24] for the evaluation of the
SCS approach with respect to WWS. For the sake of fairness
in the comparison, here we focus on the same subset of
cell-phone activity as in [24]: a 4-week period ranging from
November 4, 2013 to December 1, 2013.
As far as ground-truth information is concerned, we
leverage the same data used in [24], which was retrieved
from publicly available databases [31]. The data conveys in-
formation on urban infrastructures and land use that can be
associated to different kinds of human activities. Specifically,
it reports, for each unit area in Milan, the number of local
inhabitants, business activities, sport centers, universities,
schools and bus stops, as well as the percentage of unit area
that is covered by green spaces, such as parks or woods.
Fig. 1a displays a map of Milan unit areas with markers
pointing out business, universities, and green spaces.
4.1.2 Turin
The second scenario we consider is that of Turin, Italy.
Mobile traffic information in the region is provided by a
dataset, referred to as Tu-15, which describes the mobile
traffic activity of TIM customers in the region during March
and April 2015. The spatial tessellation of the geographical
surface provided by the network operator is different from
that in Mi-13; cells, i.e., unit areas, are not regular, but
feature heterogeneous sizes that mimic the non-uniform
coverage provided by each base station in the region.
Overall, the data refers to an area of approximately 150 km²
containing 261 unit areas whose size ranges from 255 × 325
m² to 2 × 2.5 km². For the sake of consistency with the Milan
case study, we limit our analysis to four weeks of data, from
March 1 to March 28, 2015.
We collected ground-truth data for each unit area in Tu-
15 by leveraging open data published by the local munici-
pality [32]. We selected information related to the latitude-
longitude coordinates of schools, universities and business
activities, and we associated them to individual unit areas,
exactly as done for Milan in [24]. We also leveraged open
data on green zones and population distribution in order to
determine their presence in each unit area. Fig. 1b shows a
map of the resulting unit areas for the city of Turin, together
with a representation of ground-truth data of universities,
business activities and green zones.
4.2 Metrics
In order to evaluate the consistency of the signature classes
with respect to the ground-truth data, we introduce a set
of suitable metrics, presented next. We remark that, for the
sake of comparability of our study with previous research,
we include in our list the metrics proposed in [24].
4.2.1 Density
The density $D_G(c, \mathbf{c})$ is a measure of the frequency of
ground-truth elements of type $G$ within a signature class
$c \in \mathbf{c}$. Let us define as $\mathbf{k}_G$ the set of elements of type $G$
(e.g., the set of universities) in the ground-truth data; also,
$\mathbf{1}_c(k)$ is an indicator function, equal to one if a ground-truth
element $k \in \mathbf{k}_G$ is located in a unit area whose signature
matches class $c$, and zero otherwise. The density is then
$$D_G(c, \mathbf{c}) = \frac{1}{|c|} \sum_{k \in \mathbf{k}_G} \mathbf{1}_c(k), \tag{13}$$
where $|c|$ denotes the cardinality of the class $c \in \mathbf{c}$, i.e.,
the number of signatures of individual unit areas that it
includes. The density allows quantifying the prevalence of
some land use type $G$ within each signature class $c \in \mathbf{c}$.
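The density in (13) can be computed with a few lines of Python. The data layout is an assumption made for illustration: a dictionary mapping each unit area to its signature class, and a list with the unit area hosting each ground-truth element of type G.

```python
def density(class_of_area, element_areas, target_class):
    """Density of ground-truth elements within one signature class (Eq. 13).

    class_of_area: hypothetical dict, unit area identifier -> class label.
    element_areas: unit area hosting each ground-truth element of type G.
    """
    class_size = sum(1 for label in class_of_area.values() if label == target_class)
    if class_size == 0:
        raise ValueError("the class contains no unit area")
    # Indicator function: one for each element located in an area of the class
    hits = sum(1 for area in element_areas if class_of_area.get(area) == target_class)
    return hits / class_size


classes = {"a1": 0, "a2": 0, "a3": 1, "a4": 0}
universities = ["a1", "a1", "a3"]  # two universities in a1, one in a3
print(density(classes, universities, 0), density(classes, universities, 1))
```

Since the sum runs over ground-truth elements rather than unit areas, the density can exceed one when a class hosts, on average, more than one element of type G per unit area.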
