Proceedings ArticleDOI

Distributed Maintenance of Cache Freshness in Opportunistic Mobile Networks

18 Jun 2012-pp 132-141
TL;DR: The basic idea is to let each caching node be only responsible for refreshing a specific set of caching nodes, so as to maintain cache freshness in a distributed and hierarchical manner.
Abstract: Opportunistic mobile networks consist of personal mobile devices which are intermittently connected with each other. Data access can be provided to these devices via cooperative caching without support from the cellular network infrastructure, but only limited research has been done on maintaining the freshness of cached data which may be refreshed periodically and is subject to expiration. In this paper, we propose a scheme to efficiently maintain cache freshness. Our basic idea is to let each caching node be only responsible for refreshing a specific set of caching nodes, so as to maintain cache freshness in a distributed and hierarchical manner. Probabilistic replication methods are also proposed to analytically ensure that the freshness requirements of cached data are satisfied. Extensive trace driven simulations show that our scheme significantly improves cache freshness, and hence ensures the validity of data access provided to mobile users.

Summary (5 min read)

Introduction

  • In recent years, personal hand-held mobile devices such as smartphones are capable of storing, processing and displaying various types of digital media contents including news, music, pictures or video clips.
  • In these networks, it is generally difficult to maintain end-to-end communication links among mobile users.
  • There is only limited research on maintaining the freshness of cached data in the network, despite the fact that media contents may be refreshed periodically.
  • The authors' basic idea is to organize the caching nodes as a tree structure during data access, and let each caching node be responsible for refreshing the data cached at its children in a distributed and hierarchical manner.
  • The rest of this paper is organized as follows.

A. Models

  • Opportunistic contacts among nodes are described by a network contact graph 𝐺(𝑉,𝐸), where the contact process between a node pair 𝑖, 𝑗 ∈ 𝑉 is modeled as an edge 𝑒𝑖𝑗 ∈ 𝐸. Similar to previous work [1], [34], the authors consider the pairwise node inter-contact time as exponentially distributed.
  • There are cases where an application might have specific requirements on Δ and 𝑝 to achieve sufficient levels of data freshness.
  • Letting 𝑢𝑖𝑗 denote the update of data from version 𝑖 to version 𝑗, the authors assume that any caching node is able to refresh the cached data as 𝑑𝑖⊗𝑢𝑖𝑗 → 𝑑𝑗 , where 𝑑𝑖 and 𝑑𝑗 denote the data with version 𝑖 and 𝑗, respectively.
  • 𝑑𝑗 cannot be refreshed to 𝑑𝑘 by 𝑢𝑖𝑘 even if 𝑗 > 𝑖.

B. Caching Scenario

  • Mobile nodes share data generated by themselves or obtained from the Internet.
  • Each cached data item is associated with a finite lifetime and is automatically removed from cache when it expires.
  • In practice, when multiple data items with varied popularity compete for the limited buffer of caching nodes, more popular data is prioritized to ensure that the cumulative data access delay is minimized.
  • After having its query satisfied by 𝑆, 𝐴 may lose its connection with 𝑆 due to mobility, and hence 𝐴 is unaware of the data cached at nodes 𝐵, 𝐷 and 𝐸.

C. Basic Idea

  • The authors' basic idea for maintaining cache freshness is to refresh the cached data in a distributed and hierarchical manner.
  • Particularly, the topology of DAT may change due to the expiration of cached data.
  • When node 𝐴 contacts node 𝐷 at time 𝑡6, 𝐴 updates the data cached at 𝐷 from 𝑑1 to 𝑑3.
  • Instead, 𝐴 has to transmit the complete data 𝑑3 to 𝐷 with higher transmission overhead, since the update 𝑢13 can only be calculated using 𝑑1 and 𝑑3.

IV. REFRESHING PATTERNS OF WEB CONTENTS

  • The authors investigate the refreshing patterns of realistic web contents, as well as their temporal variations during different time periods in a day.
  • These patterns highlight the homogeneity of data refreshing behaviors among different data sources and categories, and suggest appropriate calculation of utilities of data updates for refreshing cached data.

B. Distribution of Inter-Refreshing Time

  • The authors provide both empirical and analytical evidence of a dichotomy in the Complementary Cumulative Distribution Function (CCDF) of the inter-refreshing time, which is defined as the time interval between two consecutive news updates from the same RSS feed. Figure 4 shows the aggregate CCDF of inter-refreshing time for all the RSS feeds, in log-log scale.
  • For the remaining 10% of inter-refreshing time with values larger than the boundary, the CCDF values exhibit linear decay which suggests a power-law tail.
  • A similar test is performed on the inter-refreshing times with larger values for the generalized Pareto distribution.
  • The significance levels (𝛼) for these null hypotheses being accepted are listed in Table II.

C. Temporal Variations

  • Section IV-C shows that the refreshing patterns of web RSS data are temporally skewed, such that the majority of data updates are generated during specific time periods of a day.
  • The authors evaluate such temporal variation on the DieselNet trace.
  • In general, the temporal skewness can be found in all three evaluation metrics, and is determined by the temporal distributions of both node contacts and data updates available during different hours in a day.
  • As shown in Figure 14(a), the refreshing ratio during the time period between 8AM and 4PM is generally higher than the average refreshing ratio, because the majority of node contacts have been generated during this time period according to [15].
  • In summary, the authors conclude that the transient performance of maintaining cache freshness differs a lot from the cumulative maintenance performance, and cache freshness can be further improved by appropriately exploiting the temporal variations of data refreshing pattern and node contact process.

A. Utility of Data Updates

  • In practice, the requirement of cache freshness may not be satisfied due to the nodes' limited contact capability.
  • When a node 𝐵 in the DAT maintains the data update for its child 𝐷, it calculates the utility of this update, which is equal to the probability that this update carried by 𝐵 satisfies the freshness requirement for the data cached at 𝐷.
  • According to Eq. (3), the utility should be calculated following Eq. (4) when the value of 𝑡−𝑡0−Δ is small (a sketch of this utility computation is given after this list).
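The exact forms of Eqs. (3) and (4) are not reproduced in this summary. As a rough illustration only, the sketch below assumes the exponential inter-contact model from Section III-A, so that the utility of an update carried by 𝐵 for its child 𝐷 is the probability that 𝐵 contacts 𝐷 before the freshness deadline; the function name, parameters and the numeric example are hypothetical.

```python
import math

def update_utility(lambda_bd: float, t: float, t0: float, delta: float) -> float:
    """Probability that node B, carrying an update for data refreshed at the
    source at time t0, contacts its child D before the freshness deadline
    t0 + delta, assuming exponentially distributed inter-contact times with
    rate lambda_bd (contacts per hour)."""
    remaining = t0 + delta - t      # time left before the deadline
    if remaining <= 0:
        return 0.0                  # deadline already passed; the update is useless
    return 1.0 - math.exp(-lambda_bd * remaining)

# Hypothetical example: 0.8 contacts/hour and 1.5 hours left before the deadline.
print(update_utility(lambda_bd=0.8, t=2.0, t0=2.0, delta=1.5))  # about 0.70
```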

B. Opportunistic Replication of Data Updates

  • If a node in the DAT finds out that the utility of the data update it carries is lower than the required probability 𝑝 for maintaining cache freshness, it opportunistically replicates the data update to other nodes outside of the DAT.
  • Such a replication process is illustrated in Figure 8.
  • When the refreshing node contacts a relay 𝑅𝑘 outside of the DAT, it determines whether to replicate the data update for refreshing 𝐵 to 𝑅𝑘.
  • The replication stops when the utilities of the data update at the 𝑘 selected relays satisfy $1 - \prod_{i=0}^{k} (1 - U_{R_i}) \ge p$ (Eq. (7)), i.e., the probability that the requirement of cache freshness at 𝐵 is satisfied by at least one relay is equal to or larger than 𝑝 (see the sketch after this list).
  • Note that the selected relays are only able to refresh the specific data cached in the DAT, but are unable to provide data access to other nodes outside of the DAT.
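A minimal sketch of the stopping rule in Eq. (7): replication to further relays continues until the combined probability that at least one carrier delivers the update on time reaches 𝑝. The utilities here are assumed to be computed as in the previous sketch; names are hypothetical.

```python
def should_stop_replicating(utilities: list[float], p: float) -> bool:
    """Eq. (7): stop selecting relays once the probability that at least one
    carrier (the refreshing node plus the k relays chosen so far) delivers
    the data update on time is at least p."""
    miss_prob = 1.0
    for u in utilities:
        miss_prob *= (1.0 - u)
    return 1.0 - miss_prob >= p

# Carrier utilities 0.4, 0.3 and 0.25 give 1 - 0.6*0.7*0.75 = 0.685 >= 0.6.
print(should_stop_replicating([0.4, 0.3, 0.25], p=0.6))  # True
```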

VI. OPPORTUNISTIC REFRESHING

  • In addition to intentionally refreshing data cached at its children in the DAT, a node also refreshes other cached data with older versions whenever possible upon opportunistic contacts.
  • The authors propose a probabilistic approach to efficiently make cache refreshing decisions and optimize the tradeoff between cache freshness and network transmission overhead.

A. Probabilistic Decision

  • Opportunistic refreshing is generally more expensive because the complete data usually needs to be transmitted, and its size is much larger than that of a data update.
  • As a result, it is important to make appropriate decisions on opportunistic refreshing, so as to optimize the tradeoff between cache freshness and network transmission overhead, and to avoid inefficient consumption of network resources.
  • The authors propose a probabilistic approach to efficiently refresh the cached data, and the data is only refreshed if its required freshness cannot be satisfied by intentional refreshing.
  • Hence, 𝑈𝐵𝐷(𝑡𝐶) can be calculated by 𝐷 and is available to 𝐴 when 𝐴 contacts 𝐷. Since additional relays may be used for delivering data updates in intentional refreshing as described in Section V-B, the utility 𝑈𝐵𝐷(𝑡𝐶) calculated by 𝐷 essentially provides a lower bound on the actual effectiveness of intentional refreshing (a sketch of the resulting decision rule is given after this list).
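The summary above suggests a simple reading of this decision: node 𝐴 transmits the complete data to 𝐷 only when the lower bound 𝑈𝐵𝐷(𝑡𝐶) reported by 𝐷 indicates that intentional refreshing alone cannot meet the freshness requirement. The sketch below encodes this simplified rule; the paper's full probabilistic decision may differ in detail.

```python
def should_refresh_opportunistically(u_bd_lower_bound: float, p: float) -> bool:
    """Node A holds a newer version than D and must decide whether to send the
    complete data item.  Send it only if intentional refreshing, whose
    effectiveness is bounded below by U_BD(t_C) as reported by D, cannot
    already satisfy the freshness requirement p."""
    return u_bd_lower_bound < p
```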

B. Side-Effect of Opportunistic Refreshing

  • Due to possible version inconsistency among different data copies cached in the DAT, opportunistic refreshing may have some side-effects on cache freshness.
  • Such side-effect is illustrated in Figure 9.
  • When 𝐴 opportunistically contacts node 𝐷 and refreshes 𝐷’s cached data from 𝑑1 to 𝑑3, it is unaware of the data cached at 𝐵 with a newer version 𝑑4.

VII. PERFORMANCE EVALUATIONS

  • The authors compare the performance of their proposed cache refreshing scheme with the following schemes: ∙ Passive Refreshing: a caching node only refreshes data cached at another node upon contact.
  • It is different from their opportunistic refreshing scheme in Section VI in that it does not consider the tradeoff between cache freshness and network transmission overhead.
  • ∙ Active Refreshing: every time the source updates the data, it actively disseminates the data update to the whole network.
  • The following metrics are used for evaluations.
  • Each simulation is repeated multiple times with random data sources and user queries for statistical convergence.

A. Simulation Setup

  • The authors' evaluations are conducted on two realistic opportunistic mobile network traces, which record contacts among users carrying Bluetooth-enabled mobile devices.
  • These devices periodically detect their peers nearby, and a contact is recorded when two devices move close to each other.
  • The datasets described in Section IV are exploited to simulate the data being cached in the network, as well as the inter-refreshing time of data.
  • Since the pairwise node contact frequency is generally lower than the data refreshing frequency, the authors pick the 4 RSS feeds listed in Table I with average inter-refreshing time longer than 0.5 hours for their evaluations.
  • Every time interval 𝑇, each node determines whether to request data item 𝑗 with probability 𝑃𝑗, which follows a Zipf distribution (see the sketch after this list).
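The query model can be sketched as follows, assuming the Zipf-like popularity distribution mentioned in the paper's evaluation setup; the exponent value and helper names are hypothetical.

```python
import random

def zipf_popularity(n_items: int, s: float = 1.0) -> list[float]:
    """Zipf-like request probabilities P_j proportional to 1 / j^s for items ranked 1..n."""
    weights = [1.0 / (rank ** s) for rank in range(1, n_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]

# Every time interval T, a node requests item j with probability P_j.
popularity = zipf_popularity(n_items=4)
requested = [j for j, p_j in enumerate(popularity, start=1) if random.random() < p_j]
```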

B. Performance of Maintaining Cache Freshness

  • The authors first compare the performance of their proposed hierarchical refreshing scheme with other schemes by varying the lifetime (𝐿) of the cached data.
  • The evaluation results are shown in Figure 11.
  • Active Refreshing outperforms their scheme by 10%-15%, but Figure 11(c) shows that such performance is achieved at the cost of much higher refreshing overhead.
  • The parameter values are set by default as Δ = 1.5 hours and 𝑝 = 60%, and are varied during different simulations.
  • As described in Section V-B, increasing 𝑝 stimulates the caching nodes to replicate data updates, and hence increases the refreshing overhead as shown in Figure 13(b).

VIII. CONCLUSION

  • The authors focus on maintaining the freshness of cached data in opportunistic mobile networks.
  • The authors' basic idea is to let each caching node be only responsible for refreshing a specific set of caching nodes, so as to maintain cache freshness in a distributed and hierarchical manner.
  • Based on the experimental investigation results on the refreshing patterns of real websites, the authors probabilistically replicate data updates, and analytically ensure that the freshness requirements of cached data are satisfied.
  • The performance of their proposed scheme on maintaining cache freshness is evaluated by extensive trace-driven simulations on realistic mobile traces.


Distributed Maintenance of Cache Freshness in
Opportunistic Mobile Networks
Wei Gao and Guohong Cao
Department of Computer Science and Engineering
The Pennsylvania State University
University Park, PA 16802
{weigao,gcao}@cse.psu.edu
Mudhakar Srivatsa and Arun Iyengar
IBM T. J. Watson Research Center
Hawthorne, NY 10532
{msrivats, aruni}@us.ibm.com
Abstract—Opportunistic mobile networks consist of personal mobile devices which are intermittently connected with each other. Data access can be provided to these devices via cooperative caching without support from the cellular network infrastructure, but only limited research has been done on maintaining the freshness of cached data which may be refreshed periodically and is subject to expiration. In this paper, we propose a scheme to efficiently maintain cache freshness. Our basic idea is to let each caching node be only responsible for refreshing a specific set of caching nodes, so as to maintain cache freshness in a distributed and hierarchical manner. Probabilistic replication methods are also proposed to analytically ensure that the freshness requirements of cached data are satisfied. Extensive trace-driven simulations show that our scheme significantly improves cache freshness, and hence ensures the validity of data access provided to mobile users.
I. INTRODUCTION
In recent years, personal hand-held mobile devices such as smartphones are capable of storing, processing and displaying various types of digital media contents including news, music, pictures or video clips. It is hence important to provide efficient data access to mobile users with such devices. Opportunistic mobile networks, which are also known as Delay Tolerant Networks (DTNs) [13] or Pocket Switched Networks (PSNs) [20], are exploited for providing such data access without support of cellular network infrastructure. In these networks, it is generally difficult to maintain end-to-end communication links among mobile users. Mobile users are only intermittently connected when they opportunistically contact, i.e., move into the communication range of the short-range radio (e.g., Bluetooth, WiFi) of their smartphones.

Data access can be provided to mobile users via cooperative caching. More specifically, data is cached at mobile devices based on the query history, so that queries for the data in the future can be satisfied with less delay. Currently, research efforts have been focusing on determining the appropriate caching locations [27], [19], [17] or the optimal caching policies for minimizing the data access delay [28], [22]. However, there is only limited research on maintaining the freshness of cached data in the network, despite the fact that media contents may be refreshed periodically. In practice, the refreshing frequency varies according to the specific content characteristics. For example, the local weather report is usually refreshed daily, but the media news at websites of CNN or New York Times may be refreshed hourly. In such cases, the versions of cached data in the network may be out-of-date, or even be completely useless due to expiration.

(This work was supported in part by the US National Science Foundation (NSF) under grant number CNS-0721479, and by Network Science CTA under grant W911NF-09-2-0053.)
The maintenance of cache freshness in opportunistic mobile networks is challenging due to the intermittent network connectivity and the subsequent lack of information about cached data. First, there may be multiple data copies being cached in the network, so as to ensure timely response to user queries. Without persistent network connectivity, it is generally difficult for the data source to obtain information about the caching locations or current versions of the cached data. It is therefore challenging for the data source to determine "where to" and "how to" refresh the cached data. Second, the opportunistic network connectivity increases the uncertainty of data transmission and complicates the estimation of data transmission delay. It is therefore difficult to determine whether the cached data can be refreshed on time.

In this paper, we propose a scheme to address these challenges and to efficiently maintain freshness of the cached data. Our basic idea is to organize the caching nodes (in the rest of this paper, the terms "devices" and "nodes" are used interchangeably) as a tree structure during data access, and let each caching node be responsible for refreshing the data cached at its children in a distributed and hierarchical manner. The cache freshness is also improved when the caching nodes opportunistically contact each other. To the best of our knowledge, our work is the first which specifically focuses on cache freshness in opportunistic mobile networks.
Our detailed contributions are as follows:
∙ We investigate the refreshing patterns of realistic web contents. We observe that the distributions of inter-refreshing time of the RSS feeds from major news websites exhibit hybrid characteristics of exponential and power-law, which have been validated by both empirical and analytical evidence.

∙ Based on the experimental investigation results, we analytically measure the utility of data updates for refreshing the cached data via opportunistic node contacts. These utilities are calculated based on a probabilistic model to measure cache freshness. They are then used to opportunistically replicate data updates and analytically ensure that the freshness requirements of cached data can be satisfied.
The rest of this paper is organized as follows. In Section II we briefly review the existing work. Section III provides an overview of the models and caching scenario we use, and also highlights our basic idea. Section IV presents our experimental investigation results on the refreshing patterns of real web sites. Sections V and VI describe the details of our proposed cache refreshing schemes. The results of trace-driven performance evaluations are shown in Section VII, and Section VIII concludes the paper.
II. RELATED WORK
Due to the intermittent network connectivity in opportunistic mobile networks, data is forwarded in a "carry-and-forward" manner. Node mobility is exploited to let nodes physically carry data as relays, and forward data opportunistically when contacting others. The key problem is hence how to select the most appropriate nodes as relays, based on the prediction of node contacts in the future. Some forwarding schemes do such prediction based on node mobility patterns [9], [33], [14]. In some other schemes [4], [1], the stochastic node contact process is exploited for better prediction accuracy. Social contact patterns of mobile users, such as centrality and community structures, have also been exploited for relay selection [10], [21], [18].

Based on this opportunistic communication paradigm, data access can be provided to mobile users in various ways. In some schemes [23], [16], data is actively disseminated to specific users based on their interest profiles. Publish/subscribe systems [32], [24] are also used for data dissemination by exploiting social community structures to determine the brokers.

Caching is another way to provide data access. Determining appropriate caching policies in opportunistic mobile networks is complicated by the lack of global network information. Some research efforts focus on improving data accessibility from infrastructure networks such as WiFi [19] or the Internet [27], and some others study peer-to-peer data sharing among mobile nodes. In [17], data is cached at specific nodes which can be easily accessed by others. In [28], [22], caching policies are dynamically determined based on data importance, so that the aggregate utility of mobile nodes can be maximized.

When the versions of cached data in the network are heterogeneous and different from that of the source data, research efforts have been focusing on maintaining the consistency of these cache versions [7], [11], [5], [6]. Being different from existing work, in this paper we focus on ensuring the freshness of cached data, i.e., the version of any cached data should be as close to that of the source data as possible. [22] discussed the practical scenario in which data is periodically refreshed, but did not provide specific solutions for maintaining cache freshness. We propose methods to maintain cache freshness in a distributed and hierarchical manner, and analytically ensure that the freshness requirement of cached data can be satisfied.
Fig. 1. Data Access Tree (DAT). Each node in the DAT accesses data when it contacts its parent node in the DAT.
III. OVERVIEW

A. Models

1) Network Model: Opportunistic contacts among nodes are described by a network contact graph 𝐺(𝑉,𝐸), where the contact process between a node pair 𝑖, 𝑗 ∈ 𝑉 is modeled as an edge 𝑒𝑖𝑗 ∈ 𝐸. The characteristics of an edge 𝑒𝑖𝑗 ∈ 𝐸 are determined by the properties of the inter-contact time among nodes. Similar to previous work [1], [34], we consider the pairwise node inter-contact time as exponentially distributed. Contacts between nodes 𝑖 and 𝑗 then form a Poisson process with contact rate 𝜆𝑖𝑗, which is calculated in real time from the cumulative contacts between nodes 𝑖 and 𝑗.
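As an illustration of this model, the sketch below estimates the pairwise contact rate from the cumulative contact history and evaluates the probability of a contact within a given horizon; the function names and units are hypothetical.

```python
import math

def contact_rate(num_contacts: int, observation_hours: float) -> float:
    """Estimate of the pairwise contact rate lambda_ij (contacts per hour)
    from the cumulative number of contacts recorded so far."""
    return num_contacts / observation_hours

def prob_contact_within(rate: float, horizon_hours: float) -> float:
    """For a Poisson contact process with the given rate, the probability that
    the next contact occurs within `horizon_hours`: 1 - exp(-lambda * t)."""
    return 1.0 - math.exp(-rate * horizon_hours)
```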
2) Cache Freshness Model: We focus on ensuring the freshness of cached data, i.e., the version of any cached data should be as close to that of the source data as possible. Letting $v_S^t$ denote the version number of the source data at time 𝑡 and $v_j^t$ denote that of the data cached at node 𝑗, our requirement on cache freshness is probabilistically described as

$\Pr\big(v_j^t \ge v_S^{t-\Delta}\big) \ge p$,    (1)

for any time 𝑡 and any node 𝑗. The version number is initialized as 0 when data is first generated and monotonically increased by 1 every time the data is refreshed.
Higher network storage and transmission overhead is generally required for decreasing Δ or increasing 𝑝. Hence, our proposed model provides the flexibility to trade off between cache freshness and network maintenance overhead according to the specific data characteristics and applications. For example, news from CNN or the New York Times may be refreshed frequently, and a smaller Δ (e.g., 1 hour) should be applied accordingly. In contrast, the local weather report may be updated daily, and the requirement on Δ can hence be relaxed to avoid unnecessary network cost. The value of 𝑝 may be flexible based on user interest in the data. However, there are cases where an application might have specific requirements on Δ and 𝑝 to achieve sufficient levels of data freshness.
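As an illustration only, the requirement in Eq. (1) can be checked empirically over a simulated trace by comparing the cached version at node 𝑗 with the source version Δ time units earlier; the helper below and its argument names are hypothetical.

```python
def freshness_satisfied(v_cache, v_source, delta: float, p: float, sample_times) -> bool:
    """Empirical check of Eq. (1): at each sampled time t, the version cached at
    node j, v_cache(t), should be at least the source version delta time units
    ago, v_source(t - delta), with probability at least p.

    v_cache and v_source are callables mapping a time to a version number.
    """
    hits = sum(1 for t in sample_times if v_cache(t) >= v_source(t - delta))
    return hits / len(sample_times) >= p
```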
3) Data Update Model: Whenever data is refreshed, the data source computes the difference between the current and previous versions and generates a data update. Cached data is refreshed by such an update instead of the complete data for better storage and transmission efficiency. This technique is called Delta encoding, which has been applied in web caching for reducing Internet traffic [26].

Fig. 2. Distributed and hierarchical maintenance of cache freshness. (a) Intentional and opportunistic refreshing. (b) Temporal sequence of data access and refreshing operations.
Letting 𝑢𝑖𝑗 denote the update of data from version 𝑖 to version 𝑗, we assume that any caching node is able to refresh the cached data as 𝑑𝑖 ⊗ 𝑢𝑖𝑗 → 𝑑𝑗, where 𝑑𝑖 and 𝑑𝑗 denote the data with version 𝑖 and 𝑗, respectively. We also assume that any node is able to compute 𝑢𝑖𝑗 from 𝑑𝑖 and 𝑑𝑗.

When data has been refreshed multiple times, various updates for the same data may co-exist in the network. We assume that any node is able to merge consecutive data updates, i.e., 𝑢𝑖𝑗 ⊗ 𝑢𝑗𝑘 → 𝑢𝑖𝑘. However, 𝑑𝑗 cannot be refreshed to 𝑑𝑘 by 𝑢𝑖𝑘 even if 𝑗 > 𝑖. For example, 𝑢14, which is produced by merging 𝑢13 and 𝑢34, cannot be used to refresh 𝑑3 to 𝑑4.
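The version bookkeeping implied by this model can be sketched as follows. The delta contents and the merge of encoded deltas are placeholders (real delta encoding would combine the diffs properly); only the version-matching rules follow the text above.

```python
from dataclasses import dataclass

@dataclass
class Update:
    """Data update u_ij: the delta taking the data from version i to version j."""
    from_version: int
    to_version: int
    delta: bytes

def apply_update(cached_version: int, u: Update) -> int:
    """d_i (x) u_ij -> d_j: an update applies only to the exact version it was
    computed from, so u_ik cannot refresh d_j even if j > i."""
    if cached_version != u.from_version:
        raise ValueError("update does not match the cached version")
    return u.to_version

def merge(u1: Update, u2: Update) -> Update:
    """u_ij (x) u_jk -> u_ik for consecutive updates."""
    if u1.to_version != u2.from_version:
        raise ValueError("updates are not consecutive")
    return Update(u1.from_version, u2.to_version, u1.delta + u2.delta)  # placeholder merge
```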
B. Caching Scenario

Mobile nodes share data generated by themselves or obtained from the Internet. In this paper, we consider a generic caching scenario which is also used in [22]. The query generated by a node is satisfied as soon as this node contacts some other node caching the data. In the meantime, the query is stored at the requesting node. After the query is satisfied, the requesting node caches the data locally for answering possible queries in the future. Each cached data item is associated with a finite lifetime and is automatically removed from cache when it expires. The data lifetime may change each time the cached data is refreshed.

In practice, when multiple data items with varied popularity compete for the limited buffer of caching nodes, more popular data is prioritized to ensure that the cumulative data access delay is minimized. Such prioritization is generally formulated as a knapsack problem [17] and can be solved in pseudo-polynomial time using a dynamic programming approach [25]. Hence, the rest of this paper will focus on ensuring the freshness of cached copies of a specific data item. The consideration of multiple data items and a limited node buffer is orthogonal to the major focus of this paper.
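For illustration, the prioritization mentioned above can be sketched as a standard 0/1 knapsack dynamic program, with data sizes as weights and a popularity-based benefit as values; the inputs are hypothetical and the exact formulation in [17] may differ.

```python
def prioritize_cache(sizes: list[int], benefits: list[float], buffer_size: int) -> float:
    """Standard 0/1 knapsack DP: choose which data items to cache so that the
    total benefit (e.g., expected reduction of access delay) is maximized
    subject to the node's buffer size.  Runs in O(n * buffer_size) time."""
    best = [0.0] * (buffer_size + 1)
    for size, benefit in zip(sizes, benefits):
        for capacity in range(buffer_size, size - 1, -1):
            best[capacity] = max(best[capacity], best[capacity - size] + benefit)
    return best[buffer_size]
```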
In the above scenario, data is essentially disseminated among nodes interested in the data when they contact each other, and these nodes form a "Data Access Tree (DAT)" as shown in Figure 1. Queries of nodes 𝐴 and 𝐵 are satisfied when they contact the data source 𝑆. Data cached at 𝐴 and 𝐵 is then used for satisfying queries from nodes 𝐶, 𝐷 and 𝐸. Due to intermittent network connectivity, each node in the DAT only has knowledge about data cached at its children. For example, after having its query satisfied by 𝑆, 𝐴 may lose its connection with 𝑆 due to mobility, and hence 𝐴 is unaware of the data cached at nodes 𝐵, 𝐷 and 𝐸. Similarly, 𝑆 may only be aware of data cached at nodes 𝐴 and 𝐵. Such a limitation makes it challenging to maintain cache freshness, because it is difficult for the data source to determine "where to" and "how to" refresh the cached data.
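A minimal sketch of the bookkeeping this implies: each DAT node records the version it handed to each child, which is exactly the information needed to prepare the right delta at the next contact. The class and method names are hypothetical.

```python
class DATNode:
    """A node in the Data Access Tree: it knows only the versions of the data
    it has given to its own children, not those in the rest of the tree."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.version = 0                            # version cached locally
        self.child_versions: dict[str, int] = {}    # child id -> version handed out

    def serve_query(self, child: "DATNode") -> None:
        # Satisfying a query attaches the requester as a child of this node.
        child.version = self.version
        self.child_versions[child.node_id] = self.version

    def pending_updates(self) -> dict[str, tuple[int, int]]:
        # For each stale child: (its version, our version), i.e. the delta to prepare.
        return {cid: (v, self.version)
                for cid, v in self.child_versions.items() if v < self.version}
```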
C. Basic Idea

Our basic idea for maintaining cache freshness is to refresh the cached data in a distributed and hierarchical manner. As illustrated in Figure 2, this refreshing process is split into two parts, i.e., intentional refreshing and opportunistic refreshing, according to whether the refreshing node has knowledge about the cached data to be refreshed.

In intentional refreshing, each node is only responsible for refreshing data cached at its children in the DAT. For example, in Figure 2(a) node 𝑆 is only responsible for refreshing data cached at 𝐴 and 𝐵. Since 𝐴 and 𝐵 obtain their cached data from 𝑆, 𝑆 has knowledge about the versions of their cached data and is able to prepare the appropriate data updates accordingly. In the example shown in Figure 2(b), 𝑆 refreshes the data cached at 𝐴 and 𝐵 using updates 𝑢23 and 𝑢13, when 𝑆 contacts 𝐴 and 𝐵 at times 𝑡3 and 𝑡4 respectively. In Section V, these updates are also opportunistically replicated to ensure that they can be delivered to 𝐴 and 𝐵 on time. Particularly, the topology of the DAT may change due to the expiration of cached data. When 𝐴 is removed from the DAT due to cache expiration, its child 𝐶 only re-connects to the DAT and gets updated when 𝐶 contacts another node in the DAT.

In opportunistic refreshing, a node refreshes any cached data with older versions whenever possible upon opportunistic contact. For example, in Figure 2(a), when node 𝐴 contacts node 𝐷 at time 𝑡6, 𝐴 updates the data cached at 𝐷 from 𝑑1 to 𝑑3. Since 𝐴 does not know the version of the data cached at 𝐷, it cannot prepare 𝑢13 for 𝐷 in advance (the update 𝑢13 can only be calculated using 𝑑1 and 𝑑3). Instead, 𝐴 has to transmit the complete data 𝑑3 to 𝐷 with higher transmission overhead.

Fig. 3. CCDF of inter-refreshing time of individual RSS feeds: (a) CNN Top Stories, (b) BBC Politics, (c) NYTimes Sports, (d) Business Week Daily.
TABLE I. NEWS UPDATES RETRIEVED FROM WEB RSS FEEDS

No.  RSS feed               Number of updates   Avg. inter-refreshing time (hours)
1    CNN Top Stories        2051                0.2159
2    NYTimes US             4545                0.0954
3    CNN Politics           623                 0.7166
4    BBC Politics           827                 0.5429
5    ESPN Sports            2379                0.1856
6    NYTimes Sports         3344                0.1355
7    Business Week Daily    4783                0.0948
8    Google News Business   7266                0.061
9    Weather.com NYC        555                 0.8247
10   Google News ShowBiz    5483                0.0808
11   BBC ShowBiz            531                 0.8506
In Section VI, we propose to probabilistically determine whether to transmit the complete data according to the chance of satisfying the requirement of cache freshness, so as to optimize the tradeoff between cache freshness and network transmission overhead.
IV. REFRESHING PATTERNS OF WEB CONTENTS

In this section, we investigate the refreshing patterns of realistic web contents, as well as their temporal variations during different time periods in a day. These patterns highlight the homogeneity of data refreshing behaviors among different data sources and categories, and suggest appropriate calculation of utilities of data updates for refreshing cached data.
A. Datasets

We investigate the refreshing patterns of categorized web news. We dynamically retrieved news updates from news websites including CNN, New York Times, BBC, Google News, etc., by subscribing to their public RSS feeds. During the 3-week experiment period between 10/3/2011 and 10/21/2011, we retrieved a total of 32787 RSS updates from 11 RSS feeds in 7 news categories. The information about these RSS feeds and the retrieved news updates is summarized in Table I, which shows that the RSS feeds differ in their numbers of updates and update frequencies.
B. Distribution of Inter-Refreshing Time

We provide both empirical and analytical evidence of a dichotomy in the Complementary Cumulative Distribution Function (CCDF) of the inter-refreshing time, which is defined as the time interval between two consecutive news updates from the same RSS feed. Our results show that up to a boundary on the order of several minutes, the decay of the CCDF is well approximated as exponential. In contrast, the decay exhibits power-law characteristics beyond this boundary.

Fig. 4. Aggregate CCDF of the inter-refreshing time in log-log scale

1) Aggregate distribution: Figure 4 shows the aggregate CCDF of inter-refreshing time for all the RSS feeds, in log-log scale. The CCDF values exhibit slow decay over the range spanning from a few seconds to 0.3047 hours. This suggests that around 90% of inter-refreshing times fall into this range and follow an exponential distribution. Figure 4 also shows that the CCDF values of inter-refreshing time within this range are accurately approximated by random samples drawn from an exponential distribution with the average inter-refreshing time (0.1517 hours) as its parameter.

For the remaining 10% of inter-refreshing times with values larger than the boundary, the CCDF values exhibit linear decay, which suggests a power-law tail. To better examine such tail characteristics, we also plot the CCDF of a generalized Pareto distribution with shape parameter 𝜉 = 0.5, location parameter 𝜇 = 0.1517 and scale parameter 𝜎 = 𝜇 ⋅ 𝜉 = 0.0759. As shown in Figure 4, the Pareto CCDF closely approximates that of the inter-refreshing time beyond the boundary. Especially when the inter-refreshing time is longer than 1 hour, the two curves almost overlap with each other.
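The two comparison curves in Figure 4 can be reproduced in outline with SciPy, using the parameters reported above; the plotting grid is arbitrary and the snippet is only a sketch of the comparison, not the paper's fitting procedure.

```python
import numpy as np
from scipy import stats

mean_irt = 0.1517                 # average inter-refreshing time in hours (Section IV-B)
xi, mu = 0.5, mean_irt            # generalized Pareto shape and location from the text
sigma = mu * xi                   # scale parameter, 0.0759

x = np.logspace(-3, 1, 200)       # hours, log-spaced as in the log-log plot of Fig. 4
ccdf_exponential = stats.expon(scale=mean_irt).sf(x)            # body of the distribution
ccdf_pareto = stats.genpareto(c=xi, loc=mu, scale=sigma).sf(x)  # power-law tail
```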
2) Distributions of individual RSS feeds: Surprisingly, we found that the distributions of inter-refreshing time of individual RSS feeds exhibit characteristics similar to those of the aggregate distribution.

Fig. 5. Temporal distribution of news updates during different hours in a day: (a) NYTimes US, (b) CNN Politics, (c) ESPN Sports, (d) Google News Business
TABLE II. NUMERICAL RESULTS FOR DISTRIBUTIONS OF INTER-REFRESHING TIME OF INDIVIDUAL RSS FEEDS
(Exp. = exponential part below the boundary; Pareto = generalized Pareto part above the boundary)

RSS feed No.   Boundary (hours)   Exp. updates (%)   Exp. 𝛼 (%)   Pareto updates (%)   Pareto 𝛼 (%)
1              0.2178             91.07              4.33         9.93                 5.37
2              0.3245             84.24              6.71         15.76                3.28
3              1.9483             88.12              7.24         11.88                3.65
4              1.6237             86.75              5.69         13.25                4.45
5              0.2382             93.37              6.54         6.63                 4.87
6              0.2754             92.28              6.73         7.72                 2.12
7              0.3112             87.63              5.26         12.37                3.13
8              0.2466             89.37              8.45         10.63                2.64
9              1.7928             90.22              11.62        9.78                 8.25
10             0.1928             88.57              6.75         11.43                3.58
11             2.0983             83.32              7.44         16.68                3.23
For example, for the two RSS feeds in Figure 3 with different news categories, the CCDF decay of each RSS feed is analogous to that of the aggregate CCDF in Figure 4. Figure 3 shows that the boundaries for different RSS feeds are heterogeneous and mainly determined by the average inter-refreshing time. These boundaries are summarized in Table II.

To quantitatively justify the characteristics of exponential and power-law decay in the CCDF of individual RSS feeds, we perform a Kolmogorov-Smirnov goodness-of-fit test [30] on each of the 11 RSS feeds listed in Table I. For each RSS feed, we collect the inter-refreshing times smaller than its boundary and test whether the null hypothesis "these inter-refreshing times are exponentially distributed" can be accepted. A similar test is performed on the inter-refreshing times with larger values for the generalized Pareto distribution.

The significance levels (𝛼) for these null hypotheses being accepted are listed in Table II. The lower the significance level is, the more confident we are that the corresponding hypothesis is statistically true. As shown in Table II, for all the RSS feeds, the probability of erroneously accepting the null hypotheses is lower than 10%, which is the significance level usually used for statistical hypothesis testing [8]. Particularly, the significance levels for accepting a generalized Pareto distribution are generally better than those for accepting an exponential distribution.
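The two-part goodness-of-fit test can be sketched as follows, splitting each feed's inter-refreshing times at its boundary and testing the lower part against an exponential and the upper part against a generalized Pareto distribution; the parameter estimation shown here is a plain SciPy fit, not necessarily the procedure used in the paper.

```python
import numpy as np
from scipy import stats

def dichotomy_ks_test(inter_refresh: np.ndarray, boundary: float):
    """Kolmogorov-Smirnov tests for the two-part distribution of
    inter-refreshing times described in Section IV-B."""
    body = inter_refresh[inter_refresh <= boundary]
    tail = inter_refresh[inter_refresh > boundary]

    # Null hypothesis 1: values below the boundary are exponentially distributed.
    exp_stat, exp_p = stats.kstest(body, "expon", args=(0.0, body.mean()))

    # Null hypothesis 2: values above the boundary follow a generalized Pareto.
    c, loc, scale = stats.genpareto.fit(tail)
    gp_stat, gp_p = stats.kstest(tail, "genpareto", args=(c, loc, scale))

    return (exp_stat, exp_p), (gp_stat, gp_p)
```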
C. Temporal Variations

We are also interested in the temporal variations of the RSS feeds' updating patterns. Figure 5 shows the temporal distribution of news updates from RSS feeds over different hours in a day. We observe that the characteristics of such temporal variation are heterogeneous across different RSS feeds. For example, the majority of news updates from NYTimes and ESPN are generated during the time period from the afternoon to the evening. Comparatively, the news updates from Google News are evenly distributed among different hours in a day.

Fig. 6. Standard deviation of the numbers of news updates during different hours in a day

To better quantify the skewness of such temporal variation, we calculate the standard deviation of the numbers of news updates during different hours in a day for each of the 11 RSS feeds listed in Table I, and the calculation results are shown in Figure 6. By comparing Figure 6 with Figure 5, we conclude that the temporal distributions of news updates from most RSS feeds are highly skewed. The transient distribution of inter-refreshing time of an RSS feed during specific time periods may hence differ a lot from its cumulative distribution. Such temporal variation may affect the performance of maintaining cache freshness, and will be evaluated in detail via trace-driven simulations in Section VII.
V. INTENTIONAL REFRESHING

In this section, we explain how to ensure that data updates are delivered to the caching nodes on time, so that the freshness requirements of cached data are satisfied. Based on the investigation results on the distribution of inter-refreshing time in Section IV, we calculate the utility of each update, which estimates the chance of the requirement being satisfied by this update. Such utility is then used for opportunistic replication of data updates.

Citations
Journal ArticleDOI
TL;DR: A survey of this emerging field of Mobile Social Networks from a temporal perspective is presented with focus on four aspects: social property, time-varying graph, temporal social property, and temporal social properties-based applications.
Abstract: With the popularity of smart mobile devices, information exchange between users has become more and more frequent, and Mobile Social Networks (MSNs) have attracted significant attention in many research areas. Nowadays, discovering social relationships among people, as well as detecting the evolution of community have become hotly discussed topics in MSNs. One of the major features of MSNs is that the network topology changes over time. Therefore, it is not accurate to depict the social relationships of people based on a static network. In this paper, we present a survey of this emerging field from a temporal perspective. The state-of-the-art research of MSNs is reviewed with focus on four aspects: social property, time-varying graph, temporal social property, and temporal social properties-based applications. Some important open issues with respect to MSNs are discussed.

2 citations


Cites methods from "Distributed Maintenance of Cache Fr..."

  • ...Method Methods for Comparison Social Properties PodNet [73] No caching, Most solicited, Least solicited, Uniform, Inverse proportional Community Cooperative [77] Selfish Community, Friendship ContentPlace [74] [75] [76] MFV, MLN, F, P, US Community, Friendship Mixcommunity [78] Withincommunity Community, Similarity SocialCast [79] No Prediction Community, Friendship MOPS [80] Push, Pull, Neighbors Community, Closeness centrality Habit [81] Epidemic, Wait-For-Destination, Oracle Community, Friendship MF-RRWP [82] MF-ORWP, SODA, GADA Community Passive Refreshing [83] Active Refreshing, Publish/Subscribe, Hierarchical Rereshing Community...


Posted Content
TL;DR: The trade-off between storing the files at the cache and directly obtaining the files from the source at the expense of additional transmission times is studied to maximize the overall freshness at the user.
Abstract: We consider a cache updating system with a source, a cache with limited storage capacity and a user. There are $n$ files. The source keeps the freshest versions of the files which are updated with known rates. The cache gets fresh files from the source, but it can only store the latest downloaded versions of $K$ files where $K\leq n$. The user gets the files either from the cache or from the source. If the user gets the files from the cache, the received files might be outdated depending on the file status at the source. If the user gets the files directly from the source, then the received files are always fresh, but the extra transmission times between the source and the user decreases the freshness at the user. Thus, we study the trade-off between storing the files at the cache and directly obtaining the files from the source at the expense of additional transmission times. We find analytical expressions for the average freshness of the files at the user for both of these scenarios. Then, we find the optimal caching status for each file (i.e., whether to store the file at the cache or not) and the corresponding file update rates at the cache to maximize the overall freshness at the user. We observe that when the total update rate of the cache is high, caching files improves the freshness at the user. However, when the total update rate of the cache is low, the optimal policy for the user is to obtain the frequently changing files and the files that have relatively small transmission times directly from the source.

2 citations

Kang Chen1
01 Jan 2014
TL;DR: This dissertation proposes two methods to enhance file sharing efficiency in MONs by creating replicas and by leveraging social network properties, respectively and introduces a new concept of resource for file replication, which considers both node storage and meeting frequency with other nodes.
Abstract: With the increasing popularity of portable digital devices (e.g., smartphones, laptops, and tablets), mobile opportunistic networks (MONs) [40,90] consisting of portable devices have attracted much attention recently. MONs are also known as pocket switched networks (PSNs) [52]. MONs can be regarded as a special form of mobile ad hoc networks (MANETs) [7] or delay tolerant networks (DTNs) [35, 56]. In such networks, mobile nodes (devices) move continuously and meet opportunistically. Two mobile nodes can communicate with each other only when they are within the communication range of each other in a peer-to-peer (P2P) manner (i.e., without the need of infrastructures). Therefore, such a network structure can potentially provide file sharing or packet routing services among portable devices without the support of network infrastructures. On the other hand, mobile opportunistic networks often experience frequent network partition, and no endto-end contemporaneous path can be ensured in the network. These distinctive properties make traditional file sharing or packet routing algorithms in Internet or mobile networks a formidable challenge in MONs. In summary, it is essential and important to achieve efficient file sharing and packet routing algorithms in MONs, which are the key for providing practical and novel services and applications over such networks. In this dissertation, we develop several methods to resolve the aforementioned challenges. Firstly, we propose two methods to enhance file sharing efficiency in MONs by creating replicas and by leveraging social network properties, respectively. In the first method, we investigate how to create file replicas to optimize file availability for file sharing in MONs. We introduce a new concept of resource for file replication, which considers both node storage and meeting frequency with other nodes. We theoretically study the influence of resource allocation on the average file access delay and derive a resource allocation rule to minimize the average file access delay. We also propose a distributed file replication protocol to realize the deduced optimal file replication rule. In

1 citation


Cites background from "Distributed Maintenance of Cache Fr..."

  • ...The incredibly rapid growth of mobile users is leading to a promising future, in which they can form a mobile opportunistic network (MON) [40,90], which is also known as pocket switched network (PSNs) [52], to freely share files or forward packets between each other without the support of cellular infrastructures....


  • ..., smartphones, laptops, and tablets), mobile opportunistic networks (MONs) [40,90] consisting of portable devices have attracted much attention recently....


01 Jan 2016
TL;DR: The typical content dissemination scenarios in MSNs are investigated, and a Bayesian framework is formulated to model the factors that influence users behavior on streaming video dissemination, and an effective dissemination path detection algorithm is derived to detect the reliable and efficient video transmission paths.
Abstract: Mobile social networking(MSN) has emerged as an effective platform for social network users to pervasively disseminate the contents such as news, tips, book information, music, video and so on. In content dissemination, mobile social network users receive content or information from their friends, acquaintances or neighbors, and selectively forward the content or information to others. The content generators and receivers have different motivation and requirements to disseminate the contents according to the properties of the contents, which makes it a challenging and meaningful problem to effectively disseminate the content to the appropriate users. In this dissertation, the typical content dissemination scenarios in MSNs are investigated. According to the content properties, the corresponding user requirements are analyzed. First, a Bayesian framework is formulated to model the factors that influence users behavior on streaming video dissemination. An effective dissemination path detection algorithm is derived to detect the reliable and efficient video transmission paths. Second, the authorized content is investigated. We analyze the characteristics of the authorized content, and model the dissemination problem as a new graph problem, namely, Maximum Weighted Connected subgraph with node Quota (MWCQ), and propose two effective algorithms to solve it. Third, the authorized content dissemination problem in Opportunistic Social Networks(OSNs) is studied, based on the prediction of social connection pattern. We then analyze the influence of social connections on the content acquirement, and propose a novel approach, User Set Selection(USS) algorithm, to help social users to achieve fast and accurate content acquirement through social connections. INDEX WORDS: Content Dissemination, Mobile Social Networks, Opportunistic Social Networks CONTENT DISSEMINATION IN MOBILE SOCIAL NETWORKS

1 citation


Cites background from "Distributed Maintenance of Cache Fr..."

  • ...al.[40] propose to efficiently maintain the cache freshness by organizing the caching users as a tree structure during content access....


Posted ContentDOI
15 Jun 2021
TL;DR: A Hybrid and Adaptive Caching (HAC) approach to cache the data item based on the varying size, and, Time-to-Live (TTL) based invalidation of the data Item in a mobile computing environment and an adaptive cache replacement and cache invalidation technique are introduced.
Abstract: Caching is a well established technique to improve the efficiency of data access. This research paper introduces a Hybrid and Adaptive Caching (HAC) approach to cache the data item based on the varying size, and, Time-to-Live (TTL) based invalidation of the data item in a mobile computing environment. Mobile nodes establish single-hop communication with the base station and ad-hoc peer to peer communication with other neighbor nodes in the network to access data items. The proposed work adjusts the caching functionality level based on the size of the data item and stores the cached data item in two different storage systems. The cache of each node is separated into Temporary Buffer (TB) and Permanent Buffer (PB) to improve the data access efficiency. This approach is based on the fact that the smaller size data (e.g. stocks) are updated for shorter Time-to-Live (TTL) whereas the larger size data (e.g. video) are updated only for longer TTL. This proposed work also suggests an adaptive cache replacement and cache invalidation technique to resolve the issues regarding bandwidth utilization and data availability. In cache replacement technique, the cached data item is effectively replaced based on the size of the data item and TTL value. A timestamp-based cache invalidation strategy where the cached data is validated according to the update history of the data items has also been introduced in this paper. The threshold values have greater impact on the system performance. Therefore, the threshold values are fine tuned such that they do not affect the system performance. The proposed approaches significantly improve the query latency, cache hit ratio and efficiently utilize the broadcast bandwidth. The simulation result proves that the proposed work outperforms the existing caching techniques.

Cites background from "Distributed Maintenance of Cache Fr..."

  • ...A special approach is suggested to maintain the freshness of the cache [26]....


  • ...[26] Wei Gao, Guohong Cao, Mudhakar Srivatsa, Arun Iyengar....


References
Book
24 Apr 1990

6,235 citations

Journal ArticleDOI
TL;DR: There is a comprehensive introduction to the applied models of probability that stresses intuition, and both professionals, researchers, and the interested reader will agree that this is the most solid and widely used book for probability theory.
Abstract: The Seventh Edition of the successful Introduction to Probability Models introduces elementary probability theory and stochastic processes. This book is particularly well-suited to those applying probability theory to the study of phenomena in engineering, management science, the physical and social sciences, and operations research. Skillfully organized, Introduction to Probability Models covers all essential topics. Sheldon Ross, a talented and prolific textbook author, distinguishes this book by his effort to develop in students an intuitive, and therefore lasting, grasp of probability theory. Ross' classic and best-selling text has been carefully and substantially revised. The Seventh Edition includes many new examples and exercises, with the majority of the new exercises being of the easier type. Also, the book introduces stochastic processes, stressing applications, in an easily understood manner. There is a comprehensive introduction to the applied models of probability that stresses intuition. Both professionals, researchers, and the interested reader will agree that this is the most solid and widely used book for probability theory. Features: * Provides a detailed coverage of the Markov Chain Monte Carlo methods and Markov Chain covertimes * Gives a thorough presentation of k-record values and the surprising Ignatov's * theorem * Includes examples relating to: "Random walks to circles," "The matching rounds problem," "The best prize problem," and many more * Contains a comprehensive appendix with the answers to approximately 100 exercises from throughout the text * Accompanied by a complete instructor's solutions manual with step-by-step solutions to all exercises New to this edition: * Includes many new and easier examples and exercises * Offers new material on utilizing probabilistic method in combinatorial optimization problems * Includes new material on suspended animation reliability models * Contains new material on random algorithms and cycles of random permutations

4,945 citations

Amin Vahdat1
01 Jan 2000
TL;DR: This work introduces Epidemic Routing, where random pair-wise exchanges of messages among mobile hosts ensure eventual message delivery and achieves eventual delivery of 100% of messages with reasonable aggregate resource consumption in a number of interesting scenarios.
Abstract: Mobile ad hoc routing protocols allow nodes with wireless adaptors to communicate with one another without any pre-existing network infrastructure. Existing ad hoc routing protocols, while robust to rapidly changing network topology, assume the presence of a connected path from source to destination. Given power limitations, the advent of short-range wireless networks, and the wide physical conditions over which ad hoc networks must be deployed, in some scenarios it is likely that this assumption is invalid. In this work, we develop techniques to deliver messages in the case where there is never a connected path from source to destination or when a network partition exists at the time a message is originated. To this end, we introduce Epidemic Routing, where random pair-wise exchanges of messages among mobile hosts ensure eventual message delivery. The goals of Epidemic Routing are to: i) maximize message delivery rate, ii) minimize message latency, and iii) minimize the total resources consumed in message delivery. Through an implementation in the Monarch simulator, we show that Epidemic Routing achieves eventual delivery of 100% of messages with reasonable aggregate resource consumption in a number of interesting scenarios.

4,355 citations


"Distributed Maintenance of Cache Fr..." refers background in this paper

  • ...Since the data source does not maintain any information regarding the caching nodes, such dissemination is generally realized via Epidemic routing [31]....


Book
01 Nov 1990
TL;DR: This paper focuses on the part of the knapsack problem where the problem of bin packing is concerned and investigates the role of computer codes in the solution of this problem.
Abstract: Introduction knapsack problem bounded knapsack problem subset-sum problem change-making problem multiple knapsack problem generalized assignment problem bin packing problem. Appendix: computer codes.

3,694 citations


"Distributed Maintenance of Cache Fr..." refers methods in this paper

  • ...Such prioritization is generally formulated as a knapsack problem [17] and can be solved in pseudopolynomial time using a dynamic programming approach [25]....


Proceedings ArticleDOI
21 Mar 1999
TL;DR: This paper investigates the page request distribution seen by Web proxy caches using traces from a variety of sources and considers a simple model where the Web accesses are independent and the reference probability of the documents follows a Zipf-like distribution, suggesting that the various observed properties of hit-ratios and temporal locality are indeed inherent to Web accesse observed by proxies.
Abstract: This paper addresses two unresolved issues about Web caching. The first issue is whether Web requests from a fixed user community are distributed according to Zipf's (1929) law. The second issue relates to a number of studies on the characteristics of Web proxy traces, which have shown that the hit-ratios and temporal locality of the traces exhibit certain asymptotic properties that are uniform across the different sets of the traces. In particular, the question is whether these properties are inherent to Web accesses or whether they are simply an artifact of the traces. An answer to these unresolved issues will facilitate both Web cache resource planning and cache hierarchy design. We show that the answers to the two questions are related. We first investigate the page request distribution seen by Web proxy caches using traces from a variety of sources. We find that the distribution does not follow Zipf's law precisely, but instead follows a Zipf-like distribution with the exponent varying from trace to trace. Furthermore, we find that there is only (i) a weak correlation between the access frequency of a Web page and its size and (ii) a weak correlation between access frequency and its rate of change. We then consider a simple model where the Web accesses are independent and the reference probability of the documents follows a Zipf-like distribution. We find that the model yields asymptotic behaviour that are consistent with the experimental observations, suggesting that the various observed properties of hit-ratios and temporal locality are indeed inherent to Web accesses observed by proxies. Finally, we revisit Web cache replacement algorithms and show that the algorithm that is suggested by this simple model performs best on real trace data. The results indicate that while page requests do indeed reveal short-term correlations and other structures, a simple model for an independent request stream following a Zipf-like distribution is sufficient to capture certain asymptotic properties observed at Web proxies.

3,582 citations


"Distributed Maintenance of Cache Fr..." refers methods in this paper

  • ...We assume that the query pattern follows a Zipf distribution which has been widely used for modelling web data access [3]....


Frequently Asked Questions (7)
Q1. What are the contributions mentioned in the paper "Distributed maintenance of cache freshness in opportunistic mobile networks" ?

In this paper, the authors propose a scheme to efficiently maintain cache freshness. Extensive trace-driven simulations show that their scheme significantly improves cache freshness, and hence ensures the validity of data access provided to mobile users.

Due to the intermittent network connectivity in opportunistic mobile networks, data is forwarded in a “carry-and-forward” manner. 

Due to possible version inconsistency among different data copies cached in the DAT, opportunistic refreshing may have some side-effects on cache freshness. 

Their results show that up to a boundary on the order of several minutes, the decay of the CCDF is well approximated as exponential. 

since different values of 𝑝 do not affect the calculation of utilities of data updates, such increase of refreshing overhead is relatively smaller than that of decreasing Δ. Section IV-C shows that the refreshing patterns of web RSS data are temporally skewed, such that the majority of data updates are generated during specific time periods of a day.

The performance of their proposed scheme on maintaining cache freshness is evaluated by extensive tracedriven simulations on realistic mobile traces. 

From Figure 12 the authors observe that, when the value of Δ is small, the cache freshness is mainly constrained by the network contact capability, and the actual refreshing delay is much higher than the required Δ. Such inability to satisfy the cache freshness requirements leads to more replications of data updates as described in Section V-B, and makes caching nodes more prone to perform opportunistic refreshing. 
