Proceedings ArticleDOI

Distributed Maintenance of Cache Freshness in Opportunistic Mobile Networks

18 Jun 2012, pp. 132-141
TL;DR: The basic idea is to let each caching node be only responsible for refreshing a specific set of caching nodes, so as to maintain cache freshness in a distributed and hierarchical manner.
Abstract: Opportunistic mobile networks consist of personal mobile devices which are intermittently connected with each other. Data access can be provided to these devices via cooperative caching without support from the cellular network infrastructure, but only limited research has been done on maintaining the freshness of cached data which may be refreshed periodically and is subject to expiration. In this paper, we propose a scheme to efficiently maintain cache freshness. Our basic idea is to let each caching node be only responsible for refreshing a specific set of caching nodes, so as to maintain cache freshness in a distributed and hierarchical manner. Probabilistic replication methods are also proposed to analytically ensure that the freshness requirements of cached data are satisfied. Extensive trace driven simulations show that our scheme significantly improves cache freshness, and hence ensures the validity of data access provided to mobile users.

Summary (5 min read)

Introduction

  • In recent years, personal hand-held mobile devices such as smartphones have become capable of storing, processing, and displaying various types of digital media content, including news, music, pictures, and video clips.
  • In these networks, it is generally difficult to maintain end-to-end communication links among mobile users.
  • There is only limited research on maintaining the freshness of cached data in the network, despite the fact that media contents may be refreshed periodically.
  • The authors' basic idea is to organize the caching nodes as a tree structure during data access, and let each caching node be responsible for refreshing the data cached at its children in a distributed and hierarchical manner.
  • The rest of this paper is organized as follows.

A. Models

  • 1) Network Model: Opportunistic contacts among nodes are described by a network contact graph 𝐺(𝑉,𝐸), where the contact process between a node pair 𝑖, 𝑗 ∈ 𝑉 is modeled as an edge 𝑒𝑖𝑗 ∈ 𝐸. Similar to previous work [1], [34], the authors consider the pairwise node inter-contact time to be exponentially distributed.
  • There are cases where an application might have specific requirements on Δ and 𝑝 to achieve sufficient levels of data freshness.
  • Letting 𝑢𝑖𝑗 denote the update of data from version 𝑖 to version 𝑗, the authors assume that any caching node is able to refresh the cached data as 𝑑𝑖⊗𝑢𝑖𝑗 → 𝑑𝑗 , where 𝑑𝑖 and 𝑑𝑗 denote the data with version 𝑖 and 𝑗, respectively.
  • 𝑑𝑗 cannot be refreshed to 𝑑𝑘 by 𝑢𝑖𝑘 even if 𝑗 > 𝑖.

B. Caching Scenario

  • Mobile nodes share data generated by themselves or obtained from the Internet.
  • Each cached data item is associated with a finite lifetime and is automatically removed from cache when it expires.
  • In practice, when multiple data items with varied popularity compete for the limited buffer of caching nodes, more popular data is prioritized to ensure that the cumulative data access delay is minimized.
  • After having its query satisfied by 𝑆, 𝐴 may lose its connection with 𝑆 due to mobility, and hence 𝐴 is unaware of the data cached at nodes 𝐵, 𝐷 and 𝐸.

C. Basic Idea

  • The authors' basic idea for maintaining cache freshness is to refresh the cached data in a distributed and hierarchical manner.
  • Particularly, the topology of DAT may change due to the expiration of cached data.
  • When node 𝐴 contacts node 𝐷 at time 𝑡6, 𝐴 updates the data cached at 𝐷 from 𝑑1 to 𝑑3.
  • Instead, 𝐴 has to transmit the complete data 𝑑3 to 𝐷, incurring higher transmission overhead; the update 𝑢13 can only be calculated using 𝑑1 and 𝑑3.

IV. REFRESHING PATTERNS OF WEB CONTENTS

  • The authors investigate the refreshing patterns of realistic web contents, as well as their temporal variations during different time periods in a day.
  • These patterns highlight the homogeneity of data refreshing behaviors among different data sources and categories, and suggest appropriate calculation of utilities of data updates for refreshing cached data.

B. Distribution of Inter-Refreshing Time

  • The authors provide both empirical and analytical evidence of a dichotomy in the Complementary Cumulative Distribution Function (CCDF) of the inter-refreshing time, defined as the time interval between two consecutive updates from the same RSS feed. Figure 4 shows the aggregate CCDF of inter-refreshing time for all the RSS feeds, in log-log scale.
  • For the remaining 10% of inter-refreshing time with values larger than the boundary, the CCDF values exhibit linear decay which suggests a power-law tail.
  • A similar test is performed on the inter-refreshing times with values larger than the boundary, against the generalized Pareto distribution.
  • The significance levels (𝛼) for these null hypotheses being accepted are listed in Table II.

C. Temporal Variations

  • Section IV-C shows that the refreshing patterns of web RSS data are temporally skewed, such that the majority of data updates are generated during specific time periods of a day.
  • The authors evaluate such temporal variation on the DieselNet trace.
  • In general, the temporal skewness can be found in all three evaluation metrics, and is determined by the temporal distributions of both node contacts and data updates available during different hours in a day.
  • As shown in Figure 14(a), the refreshing ratio during the time period between 8AM and 4PM is generally higher than the average refreshing ratio, because the majority of node contacts are generated during this time period, according to [15].
  • In summary, the authors conclude that the transient performance of maintaining cache freshness differs substantially from the cumulative maintenance performance, and that cache freshness can be further improved by appropriately exploiting the temporal variations of the data refreshing pattern and the node contact process.

A. Utility of Data Updates

  • In practice, the requirement of cache freshness may not be satisfied due to nodes' limited contact capability.
  • When a node 𝐵 in the DAT maintains the data update for its child 𝐷, it calculates the utility of this update, which is equal to the probability that this update carried by 𝐵 satisfies the freshness requirement for data cached at 𝐷.
  • According to Eq. (3), the utility should be calculated following Eq. (4) when the value of 𝑡−𝑡0−Δ is small.

B. Opportunistic Replication of Data Updates

  • If a node in the DAT finds out that the utility of the data update it carries is lower than the required probability 𝑝 for maintaining cache freshness, it opportunistically replicates the data update to other nodes outside of the DAT.
  • Such a replication process is illustrated in Figure 8.
  • When the node carrying an update for 𝐵 contacts a relay 𝑅𝑘 outside of the DAT, it determines whether to replicate the data update for refreshing 𝐵 to 𝑅𝑘.
  • Replication stops when the utilities of the data update at the 𝑘 selected relays satisfy 1 − ∏_{𝑖=0}^{𝑘} (1 − 𝑈𝑅𝑖) ≥ 𝑝 (Eq. 7), i.e., the probability that the requirement of cache freshness at 𝐵 is satisfied by at least one relay is equal to or larger than 𝑝.
  • Note that the selected relays are only able to refresh the specific data cached in the DAT, but are unable to provide data access to other nodes outside of the DAT.
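The stopping rule of Eq. (7) can be sketched as follows. This is a minimal illustration of the condition only; the function name and the example utility values are ours, not from the paper:

```python
def enough_relays(utilities, p):
    """Stopping rule of Eq. (7): replication stops once the probability
    that at least one relay refreshes B on time reaches p.
    `utilities` holds U_Ri for the relays selected so far (with U_R0
    conventionally the node currently carrying the update)."""
    miss = 1.0
    for u in utilities:
        miss *= 1.0 - u  # probability that this relay misses the deadline
    return 1.0 - miss >= p

# Two relays with on-time delivery probabilities 0.5 and 0.3 jointly
# succeed with probability 1 - 0.5 * 0.7 = 0.65, which meets p = 0.6.
```

A single relay with utility 0.5 would not meet the same requirement, so replication would continue to further relays.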

VI. OPPORTUNISTIC REFRESHING

  • In addition to intentionally refreshing data cached at its children in the DAT, a node also refreshes other cached data with older versions whenever possible upon opportunistic contacts.
  • The authors propose a probabilistic approach to efficiently make cache refreshing decisions and optimize the tradeoff between cache freshness and network transmission overhead.

A. Probabilistic Decision

  • Opportunistic refreshing is generally more expensive because the complete data usually needs to be transmitted, and its size is much larger than that of a data update.
  • As a result, it is important to make appropriate decisions on opportunistic refreshing, so as to optimize the tradeoff between cache freshness and network transmission overhead, and to avoid inefficient consumption of network resources.
  • The authors propose a probabilistic approach to efficiently refresh the cached data; data is only refreshed opportunistically when its required freshness cannot be satisfied by intentional refreshing.
  • Hence, 𝑈𝐵𝐷(𝑡𝐶) can be calculated by 𝐷 and is available to 𝐴 when 𝐴 contacts 𝐷. Since additional relays may be used for delivering data updates in intentional refreshing as described in Section V-B, the utility 𝑈𝐵𝐷(𝑡𝐶) calculated by 𝐷 essentially provides a lower bound on the actual effectiveness of intentional refreshing.
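The decision described above can be sketched as follows. This is our reading of the bullets, not the paper's exact rule: the contacting node pushes the (larger) complete data only when it holds a newer version and the utility lower bound 𝑈𝐵𝐷(𝑡𝐶) reported by the contacted node indicates that intentional refreshing alone cannot meet the requirement 𝑝. The function name and signature are hypothetical:

```python
def decide_opportunistic_refresh(v_local, v_remote, u_intentional, p):
    """Hypothetical sketch of the opportunistic refreshing decision.
    v_local / v_remote: version numbers held by the two contacting nodes.
    u_intentional: lower bound on the utility of intentional refreshing
    (U_BD(t_C) in the paper), i.e., the chance the freshness requirement
    is already met without an opportunistic transmission."""
    # No action unless we actually hold a newer version.
    if v_local <= v_remote:
        return False
    # Transmit the complete data only when intentional refreshing
    # cannot by itself satisfy the freshness requirement p.
    return u_intentional < p
```

This captures the stated tradeoff: the expensive full-data transmission is skipped whenever the cheaper update-based path is already likely enough to succeed.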

B. Side-Effect of Opportunistic Refreshing

  • Due to possible version inconsistency among different data copies cached in the DAT, opportunistic refreshing may have some side-effects on cache freshness.
  • Such a side-effect is illustrated in Figure 9.
  • When 𝐴 opportunistically contacts node 𝐷 and refreshes 𝐷’s cached data from 𝑑1 to 𝑑3, it is unaware of the data cached at 𝐵 with a newer version 𝑑4.

VII. PERFORMANCE EVALUATIONS

  • The authors compare the performance of their proposed cache refreshing scheme with the following schemes: ∙ Passive Refreshing: a caching node only refreshes data cached at another node upon contact.
  • It is different from their opportunistic refreshing scheme in Section VI in that it does not consider the tradeoff between cache freshness and network transmission overhead.
  • ∙ Active Refreshing: every time the source updates the data, it actively disseminates the data update to the whole network.
  • The following metrics are used for evaluations.
  • Each simulation is repeated multiple times with random data sources and user queries for statistical convergence.

A. Simulation Setup

  • The authors' evaluations are conducted on two realistic opportunistic mobile network traces, which record contacts among users carrying Bluetooth-enabled mobile devices.
  • These devices periodically detect their peers nearby, and a contact is recorded when two devices move close to each other.
  • The datasets described in Section IV are exploited to simulate the data being cached in the network, as well as the inter-refreshing time of data.
  • Since the pairwise node contact frequency is generally lower than the data refreshing frequency, the authors pick the 4 RSS feeds listed in Table I with average inter-refreshing time longer than 0.5 hours for their evaluations.
  • In every time period 𝑇, each node determines whether to request data 𝑗 with probability 𝑃𝑗.
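The query workload described above can be sketched as follows; the function is our illustration (the paper does not give code), and the popularity values in the example are chosen so the outcome is deterministic:

```python
import random

def generate_queries(num_nodes, popularity, rng):
    """One round of the simulated workload: each node independently
    decides, for each data item j, whether to request it with
    probability P_j (the item's popularity)."""
    queries = []
    for node in range(num_nodes):
        for item, p_j in enumerate(popularity):
            if rng.random() < p_j:
                queries.append((node, item))
    return queries

rng = random.Random(42)
qs = generate_queries(num_nodes=3, popularity=[1.0, 0.0], rng=rng)
# With P_0 = 1 and P_1 = 0, every node requests item 0 and nothing else.
```

In a full simulation this round would be repeated once per period 𝑇, with popularities drawn from the trace-derived query distribution.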

B. Performance of Maintaining Cache Freshness

  • The authors first compare the performance of their proposed hierarchical refreshing scheme with other schemes by varying the lifetime (𝐿) of the cached data.
  • The evaluation results are shown in Figure 11.
  • Active Refreshing outperforms their scheme by 10%-15%, but Figure 11(c) shows that such performance is achieved at the cost of much higher refreshing overhead.
  • The parameter values are set by default as Δ = 1.5 hours and 𝑝 = 60%, and are varied during different simulations.
  • As described in Section V-B, increasing 𝑝 stimulates the caching nodes to replicate data updates, and hence increases the refreshing overhead as shown in Figure 13(b).

VIII. CONCLUSION

  • The authors focus on maintaining the freshness of cached data in opportunistic mobile networks.
  • The authors' basic idea is to let each caching node be only responsible for refreshing a specific set of caching nodes, so as to maintain cache freshness in a distributed and hierarchical manner.
  • Based on the experimental investigation results on the refreshing patterns of real websites, the authors probabilistically replicate data updates, and analytically ensure that the freshness requirements of cached data are satisfied.
  • The performance of their proposed scheme on maintaining cache freshness is evaluated by extensive trace-driven simulations on realistic mobile traces.


Distributed Maintenance of Cache Freshness in
Opportunistic Mobile Networks
Wei Gao and Guohong Cao
Department of Computer Science and Engineering
The Pennsylvania State University
University Park, PA 16802
{weigao,gcao}@cse.psu.edu
Mudhakar Srivatsa and Arun Iyengar
IBM T. J. Watson Research Center
Hawthorne, NY 10532
{msrivats, aruni}@us.ibm.com
Abstract—Opportunistic mobile networks consist of personal mobile devices which are intermittently connected with each other. Data access can be provided to these devices via cooperative caching without support from the cellular network infrastructure, but only limited research has been done on maintaining the freshness of cached data, which may be refreshed periodically and is subject to expiration. In this paper, we propose a scheme to efficiently maintain cache freshness. Our basic idea is to let each caching node be only responsible for refreshing a specific set of caching nodes, so as to maintain cache freshness in a distributed and hierarchical manner. Probabilistic replication methods are also proposed to analytically ensure that the freshness requirements of cached data are satisfied. Extensive trace-driven simulations show that our scheme significantly improves cache freshness, and hence ensures the validity of data access provided to mobile users.
I. INTRODUCTION

In recent years, personal hand-held mobile devices such as smartphones are capable of storing, processing and displaying various types of digital media contents including news, music, pictures or video clips. It is hence important to provide efficient data access to mobile users with such devices. Opportunistic mobile networks, which are also known as Delay Tolerant Networks (DTNs) [13] or Pocket Switched Networks (PSNs) [20], are exploited for providing such data access without support of cellular network infrastructure. In these networks, it is generally difficult to maintain end-to-end communication links among mobile users. Mobile users are only intermittently connected when they opportunistically contact, i.e., move into the communication range of the short-range radio (e.g., Bluetooth, WiFi) of their smartphones.

Data access can be provided to mobile users via cooperative caching. More specifically, data is cached at mobile devices based on the query history, so that queries for the data in the future can be satisfied with less delay. Currently, research efforts have been focusing on determining the appropriate caching locations [27], [19], [17] or the optimal caching policies for minimizing the data access delay [28], [22]. However, there is only limited research on maintaining the freshness of cached data in the network, despite the fact that media contents may be refreshed periodically. In practice, the refreshing frequency varies according to the specific content characteristics. For example, the local weather report is usually refreshed daily, but the media news at websites of CNN or New York Times may be refreshed hourly. In such cases, the versions of cached data in the network may be out-of-date, or even be completely useless due to expiration.

The maintenance of cache freshness in opportunistic mobile networks is challenging due to the intermittent network connectivity and subsequent lack of information about cached data. First, there may be multiple data copies being cached in the network, so as to ensure timely response to user queries. Without persistent network connectivity, it is generally difficult for the data source to obtain information about the caching locations or current versions of the cached data. It is therefore challenging for the data source to determine "where to" and "how to" refresh the cached data. Second, the opportunistic network connectivity increases the uncertainty of data transmission and complicates the estimation of data transmission delay. It is therefore difficult to determine whether the cached data can be refreshed on time.

In this paper, we propose a scheme to address these challenges and to efficiently maintain freshness of the cached data. Our basic idea is to organize the caching nodes¹ as a tree structure during data access, and let each caching node be responsible for refreshing the data cached at its children in a distributed and hierarchical manner. The cache freshness is also improved when the caching nodes opportunistically contact each other. To the best of our knowledge, our work is the first which specifically focuses on cache freshness in opportunistic mobile networks.

(This work was supported in part by the US National Science Foundation (NSF) under grant number CNS-0721479, and by Network Science CTA under grant W911NF-09-2-0053.)
Our detailed contributions are as follows:

∙ We investigate the refreshing patterns of realistic web contents. We observe that the distributions of inter-refreshing time of the RSS feeds from major news websites exhibit hybrid characteristics of exponential and power-law, which have been validated by both empirical and analytical evidence.

∙ Based on the experimental investigation results, we analytically measure the utility of data updates for refreshing the cached data via opportunistic node contacts. These utilities are calculated based on a probabilistic model to measure cache freshness. They are then used to opportunistically replicate data updates and analytically ensure that the freshness requirements of cached data can be satisfied.

¹ In the rest of this paper, the terms "devices" and "nodes" are used interchangeably.
The rest of this paper is organized as follows. In Section II we briefly review the existing work. Section III provides an overview of the models and caching scenario we use, and also highlights our basic idea. Section IV presents our experimental investigation results on the refreshing patterns of real websites. Sections V and VI describe the details of our proposed cache refreshing schemes. The results of trace-driven performance evaluations are shown in Section VII, and Section VIII concludes the paper.
II. RELATED WORK

Due to the intermittent network connectivity in opportunistic mobile networks, data is forwarded in a "carry-and-forward" manner. Node mobility is exploited to let nodes physically carry data as relays, and forward data opportunistically when contacting others. The key problem is hence how to select the most appropriate nodes as relays, based on the prediction of node contacts in the future. Some forwarding schemes do such prediction based on node mobility patterns [9], [33], [14]. In some other schemes [4], [1], the stochastic node contact process is exploited for better prediction accuracy. Social contact patterns of mobile users, such as centrality and community structures, have also been exploited for relay selection [10], [21], [18].

Based on this opportunistic communication paradigm, data access can be provided to mobile users in various ways. In some schemes [23], [16], data is actively disseminated to specific users based on their interest profiles. Publish/subscribe systems [32], [24] are also used for data dissemination by exploiting social community structures to determine the brokers. Caching is another way to provide data access. Determining appropriate caching policies in opportunistic mobile networks is complicated by the lack of global network information. Some research efforts focus on improving data accessibility from infrastructure networks such as WiFi [19] or the Internet [27], and some others study peer-to-peer data sharing among mobile nodes. In [17], data is cached at specific nodes which can be easily accessed by others. In [28], [22], caching policies are dynamically determined based on data importance, so that the aggregate utility of mobile nodes can be maximized.

When the versions of cached data in the network are heterogeneous and different from that of the source data, research efforts have been focusing on maintaining the consistency of these cache versions [7], [11], [5], [6]. Being different from existing work, in this paper we focus on ensuring the freshness of cached data, i.e., the version of any cached data should be as close to that of the source data as possible. [22] discussed the practical scenario in which data is periodically refreshed, but did not provide specific solutions for maintaining cache freshness. We propose methods to maintain cache freshness in a distributed and hierarchical manner, and analytically ensure that the freshness requirement of cached data can be satisfied.
Fig. 1. Data Access Tree (DAT). Each node in the DAT accesses data when it contacts its parent node in the DAT.
III. OVERVIEW

A. Models

1) Network Model: Opportunistic contacts among nodes are described by a network contact graph 𝐺(𝑉,𝐸), where the contact process between a node pair 𝑖, 𝑗 ∈ 𝑉 is modeled as an edge 𝑒𝑖𝑗 ∈ 𝐸. The characteristics of an edge 𝑒𝑖𝑗 ∈ 𝐸 are determined by the properties of inter-contact time among nodes. Similar to previous work [1], [34], we consider the pairwise node inter-contact time as exponentially distributed. Contacts between nodes 𝑖 and 𝑗 then form a Poisson process with contact rate 𝜆𝑖𝑗, which is calculated in real time from the cumulative contacts between nodes 𝑖 and 𝑗.
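The network model above can be sketched as follows. The estimator (contact count divided by observation time, i.e., the Poisson maximum-likelihood estimate) is our assumption of what "calculated in real time from the cumulative contacts" means; the paper does not spell out the formula:

```python
import math

def contact_rate(contact_times, now):
    """Estimate the Poisson contact rate lambda_ij over the observation
    window [0, now]: cumulative number of recorded contacts divided by
    the elapsed time (the Poisson-process MLE)."""
    return len(contact_times) / now

def prob_contact_within(rate, delta):
    """With exponentially distributed inter-contact times, the
    probability that nodes i and j contact at least once within the
    next `delta` time units is 1 - exp(-lambda_ij * delta)."""
    return 1.0 - math.exp(-rate * delta)

# 6 recorded contacts over a 12-hour window -> 0.5 contacts per hour.
rate = contact_rate([0.9, 2.1, 4.0, 7.3, 9.8, 11.5], now=12.0)
```

The memorylessness of the exponential distribution is what makes `prob_contact_within` independent of when the last contact happened, which is the property the later utility calculations rely on.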
2) Cache Freshness Model: We focus on ensuring the freshness of cached data, i.e., the version of any cached data should be as close to that of the source data as possible. Letting 𝑣^𝑡_𝑆 denote the version number of source data at time 𝑡 and 𝑣^𝑡_𝑗 denote that of data cached at node 𝑗, our requirement on cache freshness is probabilistically described as

    Prob(𝑣^𝑡_𝑗 ≥ 𝑣^{𝑡−Δ}_𝑆) ≥ 𝑝,  (1)

for any time 𝑡 and any node 𝑗. The version number is initialized as 0 when data is first generated and monotonically increased by 1 every time the data is refreshed.

Higher network storage and transmission overhead is generally required for decreasing Δ or increasing 𝑝. Hence, our proposed model provides the flexibility to trade off between cache freshness and network maintenance overhead according to the specific data characteristics and applications. For example, news from CNN or the New York Times may be refreshed frequently, and a smaller Δ (e.g., 1 hour) should be applied accordingly. In contrast, the local weather report may be updated daily, and the requirement on Δ can hence be relaxed to avoid unnecessary network cost. The value of 𝑝 may be flexible based on user interests in the data. However, there are cases where an application might have specific requirements on Δ and 𝑝 to achieve sufficient levels of data freshness.

3) Data Update Model: Whenever data is refreshed, the data source computes the difference between the current and previous versions and generates a data update. Cached data is refreshed by such an update instead of the complete data for better storage and transmission efficiency. This technique is called delta encoding, which has been applied in web caching for reducing Internet traffic [26].
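Requirement (1) can be checked empirically against a trace of version numbers, as in the following sketch; the discrete-time representation (one version number per time step) is our simplification for illustration:

```python
def freshness_ratio(v_source, v_cached, delta):
    """Empirical check of requirement (1): the fraction of sampled
    times t at which the cached version is at least the source
    version of time t - delta. `v_source` and `v_cached` are lists
    mapping each discrete time step t to a version number."""
    hits = 0
    times = range(delta, len(v_cached))
    for t in times:
        if v_cached[t] >= v_source[t - delta]:
            hits += 1
    return hits / len(times)

# Source refreshes every step; the cache always lags by one version,
# so with delta = 1 the requirement holds at every sampled step.
ratio = freshness_ratio(v_source=[0, 1, 2, 3], v_cached=[0, 0, 1, 2], delta=1)
```

The freshness requirement is then simply `ratio >= p`; a larger Δ or smaller 𝑝 makes the check easier to pass, matching the tradeoff discussed above.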

(a) Intentional and opportunistic refreshing
(b) Temporal sequence of data access and refreshing operations
Fig. 2. Distributed and hierarchical maintenance of cache freshness
Letting 𝑢𝑖𝑗 denote the update of data from version 𝑖 to version 𝑗, we assume that any caching node is able to refresh the cached data as 𝑑𝑖 ⊗ 𝑢𝑖𝑗 → 𝑑𝑗, where 𝑑𝑖 and 𝑑𝑗 denote the data with version 𝑖 and 𝑗, respectively. We also assume that any node is able to compute 𝑢𝑖𝑗 from 𝑑𝑖 and 𝑑𝑗.

When data has been refreshed multiple times, various updates for the same data may co-exist in the network. We assume that any node is able to merge consecutive data updates, i.e., 𝑢𝑖𝑗 ⊕ 𝑢𝑗𝑘 → 𝑢𝑖𝑘. However, 𝑑𝑗 cannot be refreshed to 𝑑𝑘 by 𝑢𝑖𝑘 even if 𝑗 > 𝑖. For example, 𝑢14, which is produced by merging 𝑢13 and 𝑢34, cannot be used to refresh 𝑑3 to 𝑑4.
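The update algebra above can be sketched as follows. Data is modeled by its version number only (payloads omitted); class and function names are ours, but the rules they encode (𝑑𝑖 ⊗ 𝑢𝑖𝑗 → 𝑑𝑗, merging of consecutive updates, and the rejection of 𝑢𝑖𝑘 applied to 𝑑𝑗 even when 𝑗 > 𝑖) are exactly those stated in the model:

```python
class Update:
    """A delta u_ij: applicable only to data whose version is exactly i."""
    def __init__(self, src, dst):
        self.src, self.dst = src, dst

def apply_update(data_version, u):
    # d_i (x) u_ij -> d_j; any other starting version is rejected,
    # e.g. u_14 cannot refresh d_3 even though 3 > 1.
    if data_version != u.src:
        raise ValueError("update %d->%d does not apply to version %d"
                         % (u.src, u.dst, data_version))
    return u.dst

def merge(u1, u2):
    # u_ij (+) u_jk -> u_ik: only consecutive updates can be merged.
    if u1.dst != u2.src:
        raise ValueError("updates are not consecutive")
    return Update(u1.src, u2.dst)

u14 = merge(Update(1, 3), Update(3, 4))   # u_13 merged with u_34 gives u_14
new_version = apply_update(1, u14)        # d_1 refreshed to d_4
# apply_update(3, u14) would raise: u_14 cannot be used to refresh d_3.
```

This is why, in opportunistic refreshing, a node that does not know the remote version cannot prepare the right delta in advance and must fall back to transmitting the complete data.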
B. Caching Scenario

Mobile nodes share data generated by themselves or obtained from the Internet. In this paper, we consider a generic caching scenario which is also used in [22]. The query generated by a node is satisfied as soon as this node contacts some other node caching the data. In the meantime, the query is stored at the requesting node. After the query is satisfied, the requesting node caches the data locally for answering possible queries in the future. Each cached data item is associated with a finite lifetime and is automatically removed from cache when it expires. The data lifetime may change each time the cached data is refreshed.

In practice, when multiple data items with varied popularity compete for the limited buffer of caching nodes, more popular data is prioritized to ensure that the cumulative data access delay is minimized. Such prioritization is generally formulated as a knapsack problem [17] and can be solved in pseudo-polynomial time using a dynamic programming approach [25]. Hence, the rest of this paper will focus on ensuring the freshness of cached copies of a specific data item. The consideration of multiple data items and limited node buffer is orthogonal to the major focus of this paper.

In the above scenario, data is essentially disseminated among nodes interested in the data when they contact each other, and these nodes form a "Data Access Tree (DAT)" as shown in Figure 1. Queries of nodes 𝐴 and 𝐵 are satisfied when they contact the data source 𝑆. Data cached at 𝐴 and 𝐵 are then used for satisfying queries from nodes 𝐶, 𝐷 and 𝐸.

Due to intermittent network connectivity, each node in the DAT only has knowledge about data cached at its children. For example, after having its query satisfied by 𝑆, 𝐴 may lose its connection with 𝑆 due to mobility, and hence 𝐴 is unaware of the data cached at nodes 𝐵, 𝐷 and 𝐸. Similarly, 𝑆 may only be aware of data cached at nodes 𝐴 and 𝐵. Such limitation makes it challenging to maintain cache freshness, because it is difficult for the data source to determine "where to" and "how to" refresh the cached data.
C. Basic Idea

Our basic idea for maintaining cache freshness is to refresh the cached data in a distributed and hierarchical manner. As illustrated in Figure 2, this refreshing process is split into two parts, i.e., intentional refreshing and opportunistic refreshing, according to whether the refreshing node has knowledge about the cached data to be refreshed.

In intentional refreshing, each node is only responsible for refreshing data cached at its children in the DAT. For example, in Figure 2(a) node 𝑆 is only responsible for refreshing data cached at 𝐴 and 𝐵. Since 𝐴 and 𝐵 obtain their cached data from 𝑆, 𝑆 has knowledge about the versions of their cached data and is able to prepare the appropriate data updates accordingly. In the example shown in Figure 2(b), 𝑆 refreshes data cached at 𝐴 and 𝐵 using updates 𝑢23 and 𝑢13, when 𝑆 contacts 𝐴 and 𝐵 at times 𝑡3 and 𝑡4 respectively. In Section V, these updates are also opportunistically replicated to ensure that they can be delivered to 𝐴 and 𝐵 on time. Particularly, the topology of the DAT may change due to the expiration of cached data. When 𝐴 is removed from the DAT due to cache expiration, its child 𝐶 only re-connects to the DAT and gets updated when 𝐶 contacts another node in the DAT.

In opportunistic refreshing, a node refreshes any cached data with older versions whenever possible upon opportunistic contact. For example, in Figure 2(a), when node 𝐴 contacts node 𝐷 at time 𝑡6, 𝐴 updates the data cached at 𝐷 from 𝑑1 to 𝑑3. Since 𝐴 does not know the version of the data cached at 𝐷, it cannot prepare 𝑢13 for 𝐷 in advance². Instead, 𝐴 has to transmit the complete data 𝑑3 to 𝐷 with higher transmission overhead. In Section VI, we propose to probabilistically determine whether to transmit the complete data according to the chance of satisfying the requirement of cache freshness, so as to optimize the tradeoff between cache freshness and network transmission overhead.

² The update 𝑢13 can only be calculated using 𝑑1 and 𝑑3.

(a) CNN Top Stories (b) BBC Politics (c) NYTimes Sports (d) Business Week Daily
Fig. 3. CCDF of inter-refreshing time of individual RSS feeds

TABLE I. News updates retrieved from web RSS feeds

No.  RSS feed              Number of updates   Avg. inter-refreshing time (hours)
1    CNN Top Stories       2051                0.2159
2    NYTimes US            4545                0.0954
3    CNN Politics          623                 0.7166
4    BBC Politics          827                 0.5429
5    ESPN Sports           2379                0.1856
6    NYTimes Sports        3344                0.1355
7    Business Week Daily   4783                0.0948
8    Google News Business  7266                0.061
9    Weather.com NYC       555                 0.8247
10   Google News ShowBiz   5483                0.0808
11   BBC ShowBiz           531                 0.8506
IV. REFRESHING PATTERNS OF WEB CONTENTS

In this section, we investigate the refreshing patterns of realistic web contents, as well as their temporal variations during different time periods in a day. These patterns highlight the homogeneity of data refreshing behaviors among different data sources and categories, and suggest appropriate calculation of the utilities of data updates for refreshing cached data.

A. Datasets

We investigate the refreshing patterns of categorized web news. We dynamically retrieved news updates from news websites including CNN, New York Times, BBC, Google News, etc., by subscribing to their public RSS feeds. During the 3-week experiment period between 10/3/2011 and 10/21/2011, we retrieved a total of 32,787 RSS updates from 11 RSS feeds in 7 news categories. The information about these RSS feeds and the retrieved news updates is summarized in Table I, which shows that the RSS feeds differ in their numbers of updates and update frequencies.
B. Distribution of Inter-Refreshing Time

We provide both empirical and analytical evidence of a dichotomy in the Complementary Cumulative Distribution Function (CCDF) of the inter-refreshing time, which is defined as the time interval between two consecutive news updates from the same RSS feed. Our results show that up to a boundary on the order of several minutes, the decay of the CCDF is well approximated as exponential. In contrast, the decay exhibits power-law characteristics beyond this boundary.

Fig. 4. Aggregate CCDF of the inter-refreshing time in log-log scale

1) Aggregate distribution: Figure 4 shows the aggregate CCDF of inter-refreshing time for all the RSS feeds, in log-log scale. The CCDF values exhibit slow decay over the range spanning from a few seconds to 0.3047 hour. This suggests that around 90% of inter-refreshing times fall into this range and follow an exponential distribution. Figure 4 also shows that the CCDF values of inter-refreshing time within this range are accurately approximated by random samples drawn from an exponential distribution with the average inter-refreshing time (0.1517 hours) as parameter.

For the remaining 10% of inter-refreshing times with values larger than the boundary, the CCDF values exhibit linear decay, which suggests a power-law tail. To better examine such tail characteristics, we also plot the CCDF of a generalized Pareto distribution with shape parameter 𝜉 = 0.5, location parameter 𝜇 = 0.1517, and scale parameter 𝜎 = 𝜇𝜉 = 0.0759. As shown in Figure 4, the Pareto CCDF closely approximates that of the inter-refreshing time beyond the boundary. Especially when the inter-refreshing time is longer than 1 hour, the two curves almost overlap with each other.
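The observed dichotomy can be mimicked with a simple two-regime sampler, as sketched below. The parameters are taken from the figure discussed above (exponential mean 0.1517 h; generalized Pareto with 𝜉 = 0.5, 𝜇 = 0.1517, 𝜎 = 0.0759; roughly 90% of mass in the exponential body); the mixture construction itself is our simplification, not the paper's model:

```python
import random

def sample_inter_refresh(rng, mean=0.1517, xi=0.5, mu=0.1517,
                         sigma=0.0759, body_frac=0.90):
    """Two-regime sketch of the inter-refreshing time distribution:
    with probability body_frac draw from the exponential body,
    otherwise from a generalized Pareto tail via its inverse CDF
    x = mu + (sigma/xi) * ((1 - u)^(-xi) - 1)."""
    if rng.random() < body_frac:
        return rng.expovariate(1.0 / mean)                      # exponential body
    u = rng.random()
    return mu + (sigma / xi) * ((1.0 - u) ** (-xi) - 1.0)       # Pareto tail

rng = random.Random(7)
samples = [sample_inter_refresh(rng) for _ in range(10000)]
# Most samples fall below the ~0.3-hour boundary, while the tail regime
# occasionally produces inter-refreshing times of several hours.
```

Such a sampler is useful for trace-driven simulation when the raw RSS logs are unavailable, reproducing both the short exponential gaps and the heavy tail.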
2) Distributions of individual RSS feeds: Surprisingly, we found that the distributions of inter-refreshing time of individual RSS feeds exhibit similar characteristics with that of the aggregate distribution. For example, for the two RSS feeds in Figure 3 with different news categories, the CCDF decay of each RSS feed is analogous to that of the aggregate CCDF in Figure 4. Figure 3 shows that the boundaries for different RSS feeds are heterogeneous and mainly determined by the average inter-refreshing time. These boundaries are summarized in Table II.

(a) NYTimes US (b) CNN Politics (c) ESPN Sports (d) Google News Business
Fig. 5. Temporal distribution of news updates during different hours in a day

TABLE II. Numerical results for distributions of inter-refreshing time of individual RSS feeds

                               Exponential                      Generalized Pareto
RSS feed  Boundary (hours)     percent. of updates (%)  𝛼 (%)   percent. of updates (%)  𝛼 (%)
1         0.2178               91.07                    4.33    9.93                     5.37
2         0.3245               84.24                    6.71    15.76                    3.28
3         1.9483               88.12                    7.24    11.88                    3.65
4         1.6237               86.75                    5.69    13.25                    4.45
5         0.2382               93.37                    6.54    6.63                     4.87
6         0.2754               92.28                    6.73    7.72                     2.12
7         0.3112               87.63                    5.26    12.37                    3.13
8         0.2466               89.37                    8.45    10.63                    2.64
9         1.7928               90.22                    11.62   9.78                     8.25
10        0.1928               88.57                    6.75    11.43                    3.58
11        2.0983               83.32                    7.44    16.68                    3.23
To quantitatively justify the characteristics of exponential
and power-law decay in the CCDF of individual RSS feeds, we
perform a Kolmogorov-Smirnov goodness-of-fit test [30] on
each of the 11 RSS feeds listed in Table I. For each RSS feed,
we collect the inter-refreshing times smaller than its boundary
and test whether the null hypothesis "these inter-refreshing times
are exponentially distributed" can be accepted. A similar test
is performed on the inter-refreshing times with larger values for
the generalized Pareto distribution.
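A minimal version of the per-feed test can be sketched as follows. This is not the authors' code: it computes a one-sample Kolmogorov-Smirnov statistic by hand on synthetic samples below an assumed boundary, comparing them against an exponential CDF truncated at that boundary (a modeling choice on our part, since the sub-boundary samples are conditioned on being smaller than it).

```python
import math
import random

def ks_statistic(samples, cdf):
    """One-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDF of the samples and the hypothesized CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x.
        d = max(d, abs(f - i / n), abs((i + 1) / n - f))
    return d

random.seed(7)
boundary = 0.2178  # boundary of RSS feed 1 in Table II (hours)
mean = 0.1517      # assumed exponential mean, for illustration only
times = [random.expovariate(1.0 / mean) for _ in range(5000)]

# Test only the samples below the boundary against a truncated exponential.
below = [t for t in times if t < boundary]
trunc = 1.0 - math.exp(-boundary / mean)  # normalizer for the truncation
d = ks_statistic(below, lambda x: (1.0 - math.exp(-x / mean)) / trunc)
print(f"KS statistic below boundary: {d:.4f}")
```

For truly exponential samples the statistic stays near 1/sqrt(n), so the hypothesis would not be rejected at common significance levels; in practice a library routine such as `scipy.stats.kstest` would replace the hand-rolled statistic.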
The significance levels (𝛼) for these null hypotheses being
accepted are listed in Table II. The lower the significance
level, the more confident we are that the corresponding
hypothesis is statistically true. As shown in Table II, for all
the RSS feeds, the probability of erroneously accepting the
null hypotheses is lower than 10%, which is the significance
level usually used for statistical hypothesis testing [8].
In particular, the significance levels for accepting a generalized
Pareto distribution are generally better than those for accepting
an exponential distribution.
C. Temporal Variations
We are also interested in the temporal variations of the
RSS feeds' updating patterns. Figure 5 shows the temporal
distribution of news updates from RSS feeds over different
hours in a day. We observe that the characteristics of such
temporal variation are heterogeneous across different RSS feeds.

Fig. 6. Standard deviation of the numbers of news updates during different hours in a day
For example, the majority of news updates from NYTimes and
ESPN are generated during the time period from the afternoon
to the evening. Comparatively, the news updates from Google
News are evenly distributed among different hours in a day.
To better quantify the skewness of such temporal variation,
we calculate the standard deviation of the numbers of news
updates during different hours in a day for each of the 11 RSS
feeds listed in Table I; the results are shown in
Figure 6. By comparing Figure 6 with Figure 5, we conclude
that the temporal distributions of news updates from most RSS
feeds are highly skewed. The transient distribution of inter-
refreshing time of an RSS feed during specific time periods
may hence differ considerably from its cumulative distribution. Such
temporal variation may affect the performance of maintaining
cache freshness, and will be evaluated in detail via trace-driven
simulations in Section VII.
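The skewness measure behind Figure 6 can be sketched as follows, with hypothetical timestamps standing in for the trace: updates are bucketed by hour of day, and the standard deviation of the 24 hourly counts is reported. A feed concentrated in a few afternoon hours scores far higher than an evenly spread one.

```python
import math
from collections import Counter
from datetime import datetime, timedelta

def hourly_update_std(timestamps):
    """Standard deviation of the number of updates per hour of day (0-23)."""
    counts = Counter(ts.hour for ts in timestamps)
    per_hour = [counts.get(h, 0) for h in range(24)]
    mean = sum(per_hour) / 24
    return math.sqrt(sum((c - mean) ** 2 for c in per_hour) / 24)

# Hypothetical feeds: one evenly spread (Google News-like), one
# concentrated in the afternoon (NYTimes/ESPN-like).
base = datetime(2012, 3, 1)
even_feed = [base + timedelta(minutes=30 * k) for k in range(48 * 7)]
skewed_feed = [base.replace(hour=14 + (k % 4)) + timedelta(days=k // 4)
               for k in range(48 * 7)]  # updates only between 14:00 and 17:59

print("even feed std:  ", hourly_update_std(even_feed))
print("skewed feed std:", hourly_update_std(skewed_feed))
```

The evenly spread feed yields a standard deviation of zero, while the afternoon-only feed scores above 30 for the same number of updates, matching the intuition that Figure 6 captures how concentrated a feed's updates are.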
V. INTENTIONAL REFRESHING
In this section, we explain how to ensure that data updates
are delivered to the caching nodes on time, so that the
freshness requirements of cached data are satisfied. Based on
the investigation of the distribution of inter-refreshing time
in Section IV, we calculate the utility of each update, which
estimates the chance of the requirement being satisfied by this
update. This utility is then used for opportunistic replication
of data updates.
