Proceedings ArticleDOI

Distributed Maintenance of Cache Freshness in Opportunistic Mobile Networks

18 Jun 2012, pp. 132-141
TL;DR: The basic idea is to let each caching node be only responsible for refreshing a specific set of caching nodes, so as to maintain cache freshness in a distributed and hierarchical manner.
Abstract: Opportunistic mobile networks consist of personal mobile devices which are intermittently connected with each other. Data access can be provided to these devices via cooperative caching without support from the cellular network infrastructure, but only limited research has been done on maintaining the freshness of cached data which may be refreshed periodically and is subject to expiration. In this paper, we propose a scheme to efficiently maintain cache freshness. Our basic idea is to let each caching node be only responsible for refreshing a specific set of caching nodes, so as to maintain cache freshness in a distributed and hierarchical manner. Probabilistic replication methods are also proposed to analytically ensure that the freshness requirements of cached data are satisfied. Extensive trace driven simulations show that our scheme significantly improves cache freshness, and hence ensures the validity of data access provided to mobile users.

Summary (5 min read)

Introduction

  • In recent years, personal hand-held mobile devices such as smartphones have become capable of storing, processing, and displaying various types of digital media content, including news, music, pictures, and video clips.
  • In these networks, it is generally difficult to maintain end-to-end communication links among mobile users.
  • There is only limited research on maintaining the freshness of cached data in the network, despite the fact that media contents may be refreshed periodically.
  • The authors' basic idea is to organize the caching nodes as a tree structure during data access, and let each caching node be responsible for refreshing the data cached at its children in a distributed and hierarchical manner.
  • The rest of this paper is organized as follows.

A. Models

  • 1) Network Model: Opportunistic contacts among nodes are described by a network contact graph 𝐺(𝑉,𝐸), where the contact process between a node pair 𝑖, 𝑗 ∈ 𝑉 is modeled as an edge 𝑒𝑖𝑗 ∈ 𝐸. Similar to previous work [1], [34], the authors consider the pairwise node inter-contact time to be exponentially distributed.
  • There are cases where an application might have specific requirements on Δ and 𝑝 to achieve sufficient levels of data freshness.
  • Letting 𝑢𝑖𝑗 denote the update of data from version 𝑖 to version 𝑗, the authors assume that any caching node is able to refresh the cached data as 𝑑𝑖⊗𝑢𝑖𝑗 → 𝑑𝑗 , where 𝑑𝑖 and 𝑑𝑗 denote the data with version 𝑖 and 𝑗, respectively.
  • 𝑑𝑗 cannot be refreshed to 𝑑𝑘 by 𝑢𝑖𝑘 even if 𝑗 > 𝑖.

B. Caching Scenario

  • Mobile nodes share data generated by themselves or obtained from the Internet.
  • Each cached data item is associated with a finite lifetime and is automatically removed from cache when it expires.
  • In practice, when multiple data items with varied popularity compete for the limited buffer of caching nodes, more popular data is prioritized to ensure that the cumulative data access delay is minimized.
  • After having its query satisfied by 𝑆, 𝐴 may lose its connection with 𝑆 due to mobility, and hence 𝐴 is unaware of the data cached at nodes 𝐵, 𝐷 and 𝐸.

C. Basic Idea

  • The authors' basic idea for maintaining cache freshness is to refresh the cached data in a distributed and hierarchical manner.
  • Particularly, the topology of DAT may change due to the expiration of cached data.
  • When node 𝐴 contacts node 𝐷 at time 𝑡6, 𝐴 updates the data cached at 𝐷 from 𝑑1 to 𝑑3.
  • Instead, 𝐴 has to transmit the complete data 𝑑3 to 𝐷, incurring higher transmission overhead; the update 𝑢13 can only be calculated using 𝑑1 and 𝑑3.

IV. REFRESHING PATTERNS OF WEB CONTENTS

  • The authors investigate the refreshing patterns of realistic web contents, as well as their temporal variations during different time periods in a day.
  • These patterns highlight the homogeneity of data refreshing behaviors among different data sources and categories, and suggest appropriate calculation of utilities of data updates for refreshing cached data.

B. Distribution of Inter-Refreshing Time

  • The authors provide both empirical and analytical evidence of a dichotomy in the Complementary Cumulative Distribution Function (CCDF) of the inter-refreshing time, defined as the time interval between two consecutive updates from the same RSS feed. Figure 4 shows the aggregate CCDF of inter-refreshing time for all the RSS feeds, in log-log scale.
  • For the remaining 10% of inter-refreshing time with values larger than the boundary, the CCDF values exhibit linear decay which suggests a power-law tail.
  • A similar test is performed on the inter-refreshing times with values larger than the boundary, against the generalized Pareto distribution.
  • The significance levels (𝛼) for these null hypotheses being accepted are listed in Table II.

C. Temporal Variations

  • Section IV-C shows that the refreshing patterns of web RSS data are temporally skewed, such that the majority of data updates are generated during specific time periods of a day.
  • The authors evaluate such temporal variation on the DieselNet trace.
  • In general, the temporal skewness can be found in all three evaluation metrics, and is determined by the temporal distributions of both node contacts and data updates available during different hours in a day.
  • As shown in Figure 14(a), the refreshing ratio during the time period between 8AM and 4PM is generally higher than the average refreshing ratio, because the majority of node contacts are generated during this time period, according to [15].
  • In summary, the authors conclude that the transient performance of maintaining cache freshness differs substantially from the cumulative maintenance performance, and that cache freshness can be further improved by appropriately exploiting the temporal variations of the data refreshing pattern and the node contact process.

A. Utility of Data Updates

  • In practice, the requirement of cache freshness may not be satisfied due to nodes' limited contact capability.
  • When a node 𝐵 in the DAT maintains the data update for its child 𝐷, it calculates the utility of this update, which is equal to the probability that this update carried by 𝐵 satisfies the freshness requirement for data cached at 𝐷.
  • According to Eq. (3), the utility should be calculated following Eq. (4) when the value of 𝑡−𝑡0−Δ is small.

B. Opportunistic Replication of Data Updates

  • If a node in the DAT finds out that the utility of the data update it carries is lower than the required probability 𝑝 for maintaining cache freshness, it opportunistically replicates the data update to other nodes outside of the DAT.
  • Such a replication process is illustrated in Figure 8.
  • When the node carrying an update for 𝐵 contacts a relay 𝑅𝑘 outside of the DAT, it determines whether to replicate the data update for refreshing 𝐵 to 𝑅𝑘.
  • Replication stops when the utilities of the data update at the 𝑘 selected relays satisfy 1 − ∏_{𝑖=0}^{𝑘} (1 − 𝑈𝑅𝑖) ≥ 𝑝 (Eq. 7), i.e., the probability that the requirement of cache freshness at 𝐵 is satisfied by at least one relay is equal to or larger than 𝑝.
  • Note that the selected relays are only able to refresh the specific data cached in the DAT, but are unable to provide data access to other nodes outside of the DAT.
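The stopping rule of Eq. (7) can be sketched as follows. This is a minimal illustration of the condition only; the function name and the example utility values are ours, not from the paper:

```python
def enough_relays(utilities, p):
    """Stopping rule of Eq. (7): replication stops once the probability
    that at least one relay refreshes B on time reaches p.
    `utilities` holds U_Ri for the relays selected so far (with U_R0
    conventionally the node currently carrying the update)."""
    miss = 1.0
    for u in utilities:
        miss *= 1.0 - u  # probability that this relay misses the deadline
    return 1.0 - miss >= p

# Two relays with on-time delivery probabilities 0.5 and 0.3 jointly
# succeed with probability 1 - 0.5 * 0.7 = 0.65, which meets p = 0.6.
```

A single relay with utility 0.5 would not meet the same requirement, so replication would continue to further relays.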

VI. OPPORTUNISTIC REFRESHING

  • In addition to intentionally refreshing data cached at its children in the DAT, a node also refreshes other cached data with older versions whenever possible upon opportunistic contacts.
  • The authors propose a probabilistic approach to efficiently make cache refreshing decisions and optimize the tradeoff between cache freshness and network transmission overhead.

A. Probabilistic Decision

  • Opportunistic refreshing is generally more expensive because the complete data usually needs to be transmitted, and its size is much larger than that of a data update.
  • As a result, it is important to make appropriate decisions on opportunistic refreshing, so as to optimize the tradeoff between cache freshness and network transmission overhead, and to avoid inefficient consumption of network resources.
  • The authors propose a probabilistic approach to efficiently refresh the cached data; data is only refreshed opportunistically when its required freshness cannot be satisfied by intentional refreshing.
  • Hence, 𝑈𝐵𝐷(𝑡𝐶) can be calculated by 𝐷 and is available to 𝐴 when 𝐴 contacts 𝐷. Since additional relays may be used for delivering data updates in intentional refreshing as described in Section V-B, the utility 𝑈𝐵𝐷(𝑡𝐶) calculated by 𝐷 essentially provides a lower bound on the actual effectiveness of intentional refreshing.
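The decision described above can be sketched as follows. This is our reading of the bullets, not the paper's exact rule: the contacting node pushes the (larger) complete data only when it holds a newer version and the utility lower bound 𝑈𝐵𝐷(𝑡𝐶) reported by the contacted node indicates that intentional refreshing alone cannot meet the requirement 𝑝. The function name and signature are hypothetical:

```python
def decide_opportunistic_refresh(v_local, v_remote, u_intentional, p):
    """Hypothetical sketch of the opportunistic refreshing decision.
    v_local / v_remote: version numbers held by the two contacting nodes.
    u_intentional: lower bound on the utility of intentional refreshing
    (U_BD(t_C) in the paper), i.e., the chance the freshness requirement
    is already met without an opportunistic transmission."""
    # No action unless we actually hold a newer version.
    if v_local <= v_remote:
        return False
    # Transmit the complete data only when intentional refreshing
    # cannot by itself satisfy the freshness requirement p.
    return u_intentional < p
```

This captures the stated tradeoff: the expensive full-data transmission is skipped whenever the cheaper update-based path is already likely enough to succeed.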

B. Side-Effect of Opportunistic Refreshing

  • Due to possible version inconsistency among different data copies cached in the DAT, opportunistic refreshing may have some side-effects on cache freshness.
  • Such a side-effect is illustrated in Figure 9.
  • When 𝐴 opportunistically contacts node 𝐷 and refreshes 𝐷’s cached data from 𝑑1 to 𝑑3, it is unaware of the data cached at 𝐵 with a newer version 𝑑4.

VII. PERFORMANCE EVALUATIONS

  • The authors compare the performance of their proposed cache refreshing scheme with the following schemes: ∙ Passive Refreshing: a caching node only refreshes data cached at another node upon contact.
  • It is different from their opportunistic refreshing scheme in Section VI in that it does not consider the tradeoff between cache freshness and network transmission overhead.
  • ∙ Active Refreshing: every time the source updates the data, it actively disseminates the data update to the whole network.
  • The following metrics are used for evaluations.
  • Each simulation is repeated multiple times with random data sources and user queries for statistical convergence.

A. Simulation Setup

  • The authors' evaluations are conducted on two realistic opportunistic mobile network traces, which record contacts among users carrying Bluetooth-enabled mobile devices.
  • These devices periodically detect their peers nearby, and a contact is recorded when two devices move close to each other.
  • The datasets described in Section IV are exploited to simulate the data being cached in the network, as well as the inter-refreshing time of data.
  • Since the pairwise node contact frequency is generally lower than the data refreshing frequency, the authors pick the 4 RSS feeds listed in Table I with average inter-refreshing time longer than 0.5 hours for their evaluations.
  • In every time period 𝑇, each node determines whether to request data 𝑗 with probability 𝑃𝑗.
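The query workload described above can be sketched as follows; the function is our illustration (the paper does not give code), and the popularity values in the example are chosen so the outcome is deterministic:

```python
import random

def generate_queries(num_nodes, popularity, rng):
    """One round of the simulated workload: each node independently
    decides, for each data item j, whether to request it with
    probability P_j (the item's popularity)."""
    queries = []
    for node in range(num_nodes):
        for item, p_j in enumerate(popularity):
            if rng.random() < p_j:
                queries.append((node, item))
    return queries

rng = random.Random(42)
qs = generate_queries(num_nodes=3, popularity=[1.0, 0.0], rng=rng)
# With P_0 = 1 and P_1 = 0, every node requests item 0 and nothing else.
```

In a full simulation this round would be repeated once per period 𝑇, with popularities drawn from the trace-derived query distribution.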

B. Performance of Maintaining Cache Freshness

  • The authors first compare the performance of their proposed hierarchical refreshing scheme with other schemes by varying the lifetime (𝐿) of the cached data.
  • The evaluation results are shown in Figure 11.
  • Active Refreshing outperforms their scheme by 10%-15%, but Figure 11(c) shows that such performance is achieved at the cost of much higher refreshing overhead.
  • The parameter values are set by default as Δ = 1.5 hours and 𝑝 = 60%, and are varied during different simulations.
  • As described in Section V-B, increasing 𝑝 stimulates the caching nodes to replicate data updates, and hence increases the refreshing overhead as shown in Figure 13(b).

VIII. CONCLUSION

  • The authors focus on maintaining the freshness of cached data in opportunistic mobile networks.
  • The authors' basic idea is to let each caching node be only responsible for refreshing a specific set of caching nodes, so as to maintain cache freshness in a distributed and hierarchical manner.
  • Based on the experimental investigation results on the refreshing patterns of real websites, the authors probabilistically replicate data updates, and analytically ensure that the freshness requirements of cached data are satisfied.
  • The performance of their proposed scheme on maintaining cache freshness is evaluated by extensive trace-driven simulations on realistic mobile traces.


Distributed Maintenance of Cache Freshness in
Opportunistic Mobile Networks
Wei Gao and Guohong Cao
Department of Computer Science and Engineering
The Pennsylvania State University
University Park, PA 16802
{weigao,gcao}@cse.psu.edu
Mudhakar Srivatsa and Arun Iyengar
IBM T. J. Watson Research Center
Hawthorne, NY 10532
{msrivats, aruni}@us.ibm.com
Abstract—Opportunistic mobile networks consist of personal mobile devices which are intermittently connected with each other. Data access can be provided to these devices via cooperative caching without support from the cellular network infrastructure, but only limited research has been done on maintaining the freshness of cached data, which may be refreshed periodically and is subject to expiration. In this paper, we propose a scheme to efficiently maintain cache freshness. Our basic idea is to let each caching node be only responsible for refreshing a specific set of caching nodes, so as to maintain cache freshness in a distributed and hierarchical manner. Probabilistic replication methods are also proposed to analytically ensure that the freshness requirements of cached data are satisfied. Extensive trace-driven simulations show that our scheme significantly improves cache freshness, and hence ensures the validity of data access provided to mobile users.
I. INTRODUCTION

In recent years, personal hand-held mobile devices such as smartphones are capable of storing, processing and displaying various types of digital media contents including news, music, pictures or video clips. It is hence important to provide efficient data access to mobile users with such devices. Opportunistic mobile networks, which are also known as Delay Tolerant Networks (DTNs) [13] or Pocket Switched Networks (PSNs) [20], are exploited for providing such data access without support of cellular network infrastructure. In these networks, it is generally difficult to maintain end-to-end communication links among mobile users. Mobile users are only intermittently connected when they opportunistically contact, i.e., move into the communication range of the short-range radio (e.g., Bluetooth, WiFi) of their smartphones.

Data access can be provided to mobile users via cooperative caching. More specifically, data is cached at mobile devices based on the query history, so that queries for the data in the future can be satisfied with less delay. Currently, research efforts have been focusing on determining the appropriate caching locations [27], [19], [17] or the optimal caching policies for minimizing the data access delay [28], [22]. However, there is only limited research on maintaining the freshness of cached data in the network, despite the fact that media contents may be refreshed periodically. In practice, the refreshing frequency varies according to the specific content characteristics. For example, the local weather report is usually refreshed daily, but the media news at websites of CNN or New York Times may be refreshed hourly. In such cases, the versions of cached data in the network may be out-of-date, or even be completely useless due to expiration.

The maintenance of cache freshness in opportunistic mobile networks is challenging due to the intermittent network connectivity and subsequent lack of information about cached data. First, there may be multiple data copies being cached in the network, so as to ensure timely response to user queries. Without persistent network connectivity, it is generally difficult for the data source to obtain information about the caching locations or current versions of the cached data. It is therefore challenging for the data source to determine "where to" and "how to" refresh the cached data. Second, the opportunistic network connectivity increases the uncertainty of data transmission and complicates the estimation of data transmission delay. It is therefore difficult to determine whether the cached data can be refreshed on time.

In this paper, we propose a scheme to address these challenges and to efficiently maintain freshness of the cached data. Our basic idea is to organize the caching nodes¹ as a tree structure during data access, and let each caching node be responsible for refreshing the data cached at its children in a distributed and hierarchical manner. The cache freshness is also improved when the caching nodes opportunistically contact each other. To the best of our knowledge, our work is the first which specifically focuses on cache freshness in opportunistic mobile networks.

(This work was supported in part by the US National Science Foundation (NSF) under grant number CNS-0721479, and by Network Science CTA under grant W911NF-09-2-0053.)
Our detailed contributions are as follows:

∙ We investigate the refreshing patterns of realistic web contents. We observe that the distributions of inter-refreshing time of the RSS feeds from major news websites exhibit hybrid characteristics of exponential and power-law, which have been validated by both empirical and analytical evidence.

∙ Based on the experimental investigation results, we analytically measure the utility of data updates for refreshing the cached data via opportunistic node contacts. These utilities are calculated based on a probabilistic model to measure cache freshness. They are then used to opportunistically replicate data updates and analytically ensure that the freshness requirements of cached data can be satisfied.

¹ In the rest of this paper, the terms "devices" and "nodes" are used interchangeably.
The rest of this paper is organized as follows. In Section II we briefly review the existing work. Section III provides an overview of the models and caching scenario we use, and also highlights our basic idea. Section IV presents our experimental investigation results on the refreshing patterns of real websites. Sections V and VI describe the details of our proposed cache refreshing schemes. The results of trace-driven performance evaluations are shown in Section VII, and Section VIII concludes the paper.
II. RELATED WORK

Due to the intermittent network connectivity in opportunistic mobile networks, data is forwarded in a "carry-and-forward" manner. Node mobility is exploited to let nodes physically carry data as relays, and forward data opportunistically when contacting others. The key problem is hence how to select the most appropriate nodes as relays, based on the prediction of node contacts in the future. Some forwarding schemes do such prediction based on node mobility patterns [9], [33], [14]. In some other schemes [4], [1], the stochastic node contact process is exploited for better prediction accuracy. Social contact patterns of mobile users, such as centrality and community structures, have also been exploited for relay selection [10], [21], [18].

Based on this opportunistic communication paradigm, data access can be provided to mobile users in various ways. In some schemes [23], [16], data is actively disseminated to specific users based on their interest profiles. Publish/subscribe systems [32], [24] are also used for data dissemination by exploiting social community structures to determine the brokers. Caching is another way to provide data access. Determining appropriate caching policies in opportunistic mobile networks is complicated by the lack of global network information. Some research efforts focus on improving data accessibility from infrastructure networks such as WiFi [19] or the Internet [27], and some others study peer-to-peer data sharing among mobile nodes. In [17], data is cached at specific nodes which can be easily accessed by others. In [28], [22], caching policies are dynamically determined based on data importance, so that the aggregate utility of mobile nodes can be maximized.

When the versions of cached data in the network are heterogeneous and different from that of the source data, research efforts have been focusing on maintaining the consistency of these cache versions [7], [11], [5], [6]. Being different from existing work, in this paper we focus on ensuring the freshness of cached data, i.e., the version of any cached data should be as close to that of the source data as possible. [22] discussed the practical scenario in which data is periodically refreshed, but did not provide specific solutions for maintaining cache freshness. We propose methods to maintain cache freshness in a distributed and hierarchical manner, and analytically ensure that the freshness requirement of cached data can be satisfied.
Fig. 1. Data Access Tree (DAT). Each node in the DAT accesses data when it contacts its parent node in the DAT.
III. OVERVIEW

A. Models

1) Network Model: Opportunistic contacts among nodes are described by a network contact graph 𝐺(𝑉,𝐸), where the contact process between a node pair 𝑖, 𝑗 ∈ 𝑉 is modeled as an edge 𝑒𝑖𝑗 ∈ 𝐸. The characteristics of an edge 𝑒𝑖𝑗 ∈ 𝐸 are determined by the properties of inter-contact time among nodes. Similar to previous work [1], [34], we consider the pairwise node inter-contact time as exponentially distributed. Contacts between nodes 𝑖 and 𝑗 then form a Poisson process with contact rate 𝜆𝑖𝑗, which is calculated in real time from the cumulative contacts between nodes 𝑖 and 𝑗.
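The network model above can be sketched as follows. The estimator (contact count divided by observation time, i.e., the Poisson maximum-likelihood estimate) is our assumption of what "calculated in real time from the cumulative contacts" means; the paper does not spell out the formula:

```python
import math

def contact_rate(contact_times, now):
    """Estimate the Poisson contact rate lambda_ij over the observation
    window [0, now]: cumulative number of recorded contacts divided by
    the elapsed time (the Poisson-process MLE)."""
    return len(contact_times) / now

def prob_contact_within(rate, delta):
    """With exponentially distributed inter-contact times, the
    probability that nodes i and j contact at least once within the
    next `delta` time units is 1 - exp(-lambda_ij * delta)."""
    return 1.0 - math.exp(-rate * delta)

# 6 recorded contacts over a 12-hour window -> 0.5 contacts per hour.
rate = contact_rate([0.9, 2.1, 4.0, 7.3, 9.8, 11.5], now=12.0)
```

The memorylessness of the exponential distribution is what makes `prob_contact_within` independent of when the last contact happened, which is the property the later utility calculations rely on.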
2) Cache Freshness Model: We focus on ensuring the freshness of cached data, i.e., the version of any cached data should be as close to that of the source data as possible. Letting 𝑣^𝑡_𝑆 denote the version number of source data at time 𝑡 and 𝑣^𝑡_𝑗 denote that of data cached at node 𝑗, our requirement on cache freshness is probabilistically described as

    Prob(𝑣^𝑡_𝑗 ≥ 𝑣^{𝑡−Δ}_𝑆) ≥ 𝑝,  (1)

for any time 𝑡 and any node 𝑗. The version number is initialized as 0 when data is first generated and monotonically increased by 1 every time the data is refreshed.

Higher network storage and transmission overhead is generally required for decreasing Δ or increasing 𝑝. Hence, our proposed model provides the flexibility to trade off between cache freshness and network maintenance overhead according to the specific data characteristics and applications. For example, news from CNN or the New York Times may be refreshed frequently, and a smaller Δ (e.g., 1 hour) should be applied accordingly. In contrast, the local weather report may be updated daily, and the requirement on Δ can hence be relaxed to avoid unnecessary network cost. The value of 𝑝 may be flexible based on user interests in the data. However, there are cases where an application might have specific requirements on Δ and 𝑝 to achieve sufficient levels of data freshness.

3) Data Update Model: Whenever data is refreshed, the data source computes the difference between the current and previous versions and generates a data update. Cached data is refreshed by such an update instead of the complete data for better storage and transmission efficiency. This technique is called delta encoding, which has been applied in web caching for reducing Internet traffic [26].
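Requirement (1) can be checked empirically against a trace of version numbers, as in the following sketch; the discrete-time representation (one version number per time step) is our simplification for illustration:

```python
def freshness_ratio(v_source, v_cached, delta):
    """Empirical check of requirement (1): the fraction of sampled
    times t at which the cached version is at least the source
    version of time t - delta. `v_source` and `v_cached` are lists
    mapping each discrete time step t to a version number."""
    hits = 0
    times = range(delta, len(v_cached))
    for t in times:
        if v_cached[t] >= v_source[t - delta]:
            hits += 1
    return hits / len(times)

# Source refreshes every step; the cache always lags by one version,
# so with delta = 1 the requirement holds at every sampled step.
ratio = freshness_ratio(v_source=[0, 1, 2, 3], v_cached=[0, 0, 1, 2], delta=1)
```

The freshness requirement is then simply `ratio >= p`; a larger Δ or smaller 𝑝 makes the check easier to pass, matching the tradeoff discussed above.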

(a) Intentional and opportunistic refreshing
(b) Temporal sequence of data access and refreshing operations
Fig. 2. Distributed and hierarchical maintenance of cache freshness
Letting 𝑢𝑖𝑗 denote the update of data from version 𝑖 to version 𝑗, we assume that any caching node is able to refresh the cached data as 𝑑𝑖 ⊗ 𝑢𝑖𝑗 → 𝑑𝑗, where 𝑑𝑖 and 𝑑𝑗 denote the data with version 𝑖 and 𝑗, respectively. We also assume that any node is able to compute 𝑢𝑖𝑗 from 𝑑𝑖 and 𝑑𝑗.

When data has been refreshed multiple times, various updates for the same data may co-exist in the network. We assume that any node is able to merge consecutive data updates, i.e., 𝑢𝑖𝑗 ⊕ 𝑢𝑗𝑘 → 𝑢𝑖𝑘. However, 𝑑𝑗 cannot be refreshed to 𝑑𝑘 by 𝑢𝑖𝑘 even if 𝑗 > 𝑖. For example, 𝑢14, which is produced by merging 𝑢13 and 𝑢34, cannot be used to refresh 𝑑3 to 𝑑4.
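The update algebra above can be sketched as follows. Data is modeled by its version number only (payloads omitted); class and function names are ours, but the rules they encode (𝑑𝑖 ⊗ 𝑢𝑖𝑗 → 𝑑𝑗, merging of consecutive updates, and the rejection of 𝑢𝑖𝑘 applied to 𝑑𝑗 even when 𝑗 > 𝑖) are exactly those stated in the model:

```python
class Update:
    """A delta u_ij: applicable only to data whose version is exactly i."""
    def __init__(self, src, dst):
        self.src, self.dst = src, dst

def apply_update(data_version, u):
    # d_i (x) u_ij -> d_j; any other starting version is rejected,
    # e.g. u_14 cannot refresh d_3 even though 3 > 1.
    if data_version != u.src:
        raise ValueError("update %d->%d does not apply to version %d"
                         % (u.src, u.dst, data_version))
    return u.dst

def merge(u1, u2):
    # u_ij (+) u_jk -> u_ik: only consecutive updates can be merged.
    if u1.dst != u2.src:
        raise ValueError("updates are not consecutive")
    return Update(u1.src, u2.dst)

u14 = merge(Update(1, 3), Update(3, 4))   # u_13 merged with u_34 gives u_14
new_version = apply_update(1, u14)        # d_1 refreshed to d_4
# apply_update(3, u14) would raise: u_14 cannot be used to refresh d_3.
```

This is why, in opportunistic refreshing, a node that does not know the remote version cannot prepare the right delta in advance and must fall back to transmitting the complete data.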
B. Caching Scenario

Mobile nodes share data generated by themselves or obtained from the Internet. In this paper, we consider a generic caching scenario which is also used in [22]. The query generated by a node is satisfied as soon as this node contacts some other node caching the data. In the meantime, the query is stored at the requesting node. After the query is satisfied, the requesting node caches the data locally for answering possible queries in the future. Each cached data item is associated with a finite lifetime and is automatically removed from cache when it expires. The data lifetime may change each time the cached data is refreshed.

In practice, when multiple data items with varied popularity compete for the limited buffer of caching nodes, more popular data is prioritized to ensure that the cumulative data access delay is minimized. Such prioritization is generally formulated as a knapsack problem [17] and can be solved in pseudo-polynomial time using a dynamic programming approach [25]. Hence, the rest of this paper will focus on ensuring the freshness of cached copies of a specific data item. The consideration of multiple data items and limited node buffer is orthogonal to the major focus of this paper.

In the above scenario, data is essentially disseminated among nodes interested in the data when they contact each other, and these nodes form a "Data Access Tree (DAT)" as shown in Figure 1. Queries of nodes 𝐴 and 𝐵 are satisfied when they contact the data source 𝑆. Data cached at 𝐴 and 𝐵 are then used for satisfying queries from nodes 𝐶, 𝐷 and 𝐸.

Due to intermittent network connectivity, each node in the DAT only has knowledge about data cached at its children. For example, after having its query satisfied by 𝑆, 𝐴 may lose its connection with 𝑆 due to mobility, and hence 𝐴 is unaware of the data cached at nodes 𝐵, 𝐷 and 𝐸. Similarly, 𝑆 may only be aware of data cached at nodes 𝐴 and 𝐵. Such limitation makes it challenging to maintain cache freshness, because it is difficult for the data source to determine "where to" and "how to" refresh the cached data.
C. Basic Idea

Our basic idea for maintaining cache freshness is to refresh the cached data in a distributed and hierarchical manner. As illustrated in Figure 2, this refreshing process is split into two parts, i.e., intentional refreshing and opportunistic refreshing, according to whether the refreshing node has knowledge about the cached data to be refreshed.

In intentional refreshing, each node is only responsible for refreshing data cached at its children in the DAT. For example, in Figure 2(a) node 𝑆 is only responsible for refreshing data cached at 𝐴 and 𝐵. Since 𝐴 and 𝐵 obtain their cached data from 𝑆, 𝑆 has knowledge about the versions of their cached data and is able to prepare the appropriate data updates accordingly. In the example shown in Figure 2(b), 𝑆 refreshes data cached at 𝐴 and 𝐵 using updates 𝑢23 and 𝑢13, when 𝑆 contacts 𝐴 and 𝐵 at times 𝑡3 and 𝑡4 respectively. In Section V, these updates are also opportunistically replicated to ensure that they can be delivered to 𝐴 and 𝐵 on time. Particularly, the topology of the DAT may change due to the expiration of cached data. When 𝐴 is removed from the DAT due to cache expiration, its child 𝐶 only re-connects to the DAT and gets updated when 𝐶 contacts another node in the DAT.

In opportunistic refreshing, a node refreshes any cached data with older versions whenever possible upon opportunistic contact. For example, in Figure 2(a), when node 𝐴 contacts node 𝐷 at time 𝑡6, 𝐴 updates the data cached at 𝐷 from 𝑑1 to 𝑑3. Since 𝐴 does not know the version of the data cached at 𝐷, it cannot prepare 𝑢13 for 𝐷 in advance². Instead, 𝐴 has to transmit the complete data 𝑑3 to 𝐷 with higher transmission overhead. In Section VI, we propose to probabilistically determine whether to transmit the complete data according to the chance of satisfying the requirement of cache freshness, so as to optimize the tradeoff between cache freshness and network transmission overhead.

² The update 𝑢13 can only be calculated using 𝑑1 and 𝑑3.

(a) CNN Top Stories (b) BBC Politics (c) NYTimes Sports (d) Business Week Daily
Fig. 3. CCDF of inter-refreshing time of individual RSS feeds

TABLE I. News updates retrieved from web RSS feeds

No.  RSS feed              Number of updates   Avg. inter-refreshing time (hours)
1    CNN Top Stories       2051                0.2159
2    NYTimes US            4545                0.0954
3    CNN Politics          623                 0.7166
4    BBC Politics          827                 0.5429
5    ESPN Sports           2379                0.1856
6    NYTimes Sports        3344                0.1355
7    Business Week Daily   4783                0.0948
8    Google News Business  7266                0.061
9    Weather.com NYC       555                 0.8247
10   Google News ShowBiz   5483                0.0808
11   BBC ShowBiz           531                 0.8506
IV. REFRESHING PATTERNS OF WEB CONTENTS

In this section, we investigate the refreshing patterns of realistic web contents, as well as their temporal variations during different time periods in a day. These patterns highlight the homogeneity of data refreshing behaviors among different data sources and categories, and suggest appropriate calculation of the utilities of data updates for refreshing cached data.

A. Datasets

We investigate the refreshing patterns of categorized web news. We dynamically retrieved news updates from news websites including CNN, New York Times, BBC, Google News, etc., by subscribing to their public RSS feeds. During the 3-week experiment period between 10/3/2011 and 10/21/2011, we retrieved a total of 32,787 RSS updates from 11 RSS feeds in 7 news categories. The information about these RSS feeds and the retrieved news updates is summarized in Table I, which shows that the RSS feeds differ in their numbers of updates and update frequencies.
B. Distribution of Inter-Refreshing Time

We provide both empirical and analytical evidence of a dichotomy in the Complementary Cumulative Distribution Function (CCDF) of the inter-refreshing time, which is defined as the time interval between two consecutive news updates from the same RSS feed. Our results show that up to a boundary on the order of several minutes, the decay of the CCDF is well approximated as exponential. In contrast, the decay exhibits power-law characteristics beyond this boundary.

Fig. 4. Aggregate CCDF of the inter-refreshing time in log-log scale

1) Aggregate distribution: Figure 4 shows the aggregate CCDF of inter-refreshing time for all the RSS feeds, in log-log scale. The CCDF values exhibit slow decay over the range spanning from a few seconds to 0.3047 hour. This suggests that around 90% of inter-refreshing times fall into this range and follow an exponential distribution. Figure 4 also shows that the CCDF values of inter-refreshing time within this range are accurately approximated by random samples drawn from an exponential distribution with the average inter-refreshing time (0.1517 hours) as parameter.

For the remaining 10% of inter-refreshing times with values larger than the boundary, the CCDF values exhibit linear decay, which suggests a power-law tail. To better examine such tail characteristics, we also plot the CCDF of a generalized Pareto distribution with shape parameter 𝜉 = 0.5, location parameter 𝜇 = 0.1517, and scale parameter 𝜎 = 𝜇𝜉 = 0.0759. As shown in Figure 4, the Pareto CCDF closely approximates that of the inter-refreshing time beyond the boundary. Especially when the inter-refreshing time is longer than 1 hour, the two curves almost overlap with each other.
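The observed dichotomy can be mimicked with a simple two-regime sampler, as sketched below. The parameters are taken from the figure discussed above (exponential mean 0.1517 h; generalized Pareto with 𝜉 = 0.5, 𝜇 = 0.1517, 𝜎 = 0.0759; roughly 90% of mass in the exponential body); the mixture construction itself is our simplification, not the paper's model:

```python
import random

def sample_inter_refresh(rng, mean=0.1517, xi=0.5, mu=0.1517,
                         sigma=0.0759, body_frac=0.90):
    """Two-regime sketch of the inter-refreshing time distribution:
    with probability body_frac draw from the exponential body,
    otherwise from a generalized Pareto tail via its inverse CDF
    x = mu + (sigma/xi) * ((1 - u)^(-xi) - 1)."""
    if rng.random() < body_frac:
        return rng.expovariate(1.0 / mean)                      # exponential body
    u = rng.random()
    return mu + (sigma / xi) * ((1.0 - u) ** (-xi) - 1.0)       # Pareto tail

rng = random.Random(7)
samples = [sample_inter_refresh(rng) for _ in range(10000)]
# Most samples fall below the ~0.3-hour boundary, while the tail regime
# occasionally produces inter-refreshing times of several hours.
```

Such a sampler is useful for trace-driven simulation when the raw RSS logs are unavailable, reproducing both the short exponential gaps and the heavy tail.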
2) Distributions of individual RSS feeds: Surprisingly, we found that the distributions of inter-refreshing time of individual RSS feeds exhibit similar characteristics with that of the aggregate distribution. For example, for the two RSS feeds in Figure 3 with different news categories, the CCDF decay of each RSS feed is analogous to that of the aggregate CCDF in Figure 4. Figure 3 shows that the boundaries for different RSS feeds are heterogeneous and mainly determined by the average inter-refreshing time. These boundaries are summarized in Table II.

(a) NYTimes US (b) CNN Politics (c) ESPN Sports (d) Google News Business
Fig. 5. Temporal distribution of news updates during different hours in a day

TABLE II. Numerical results for distributions of inter-refreshing time of individual RSS feeds

                               Exponential                      Generalized Pareto
RSS feed  Boundary (hours)     percent. of updates (%)  𝛼 (%)   percent. of updates (%)  𝛼 (%)
1         0.2178               91.07                    4.33    9.93                     5.37
2         0.3245               84.24                    6.71    15.76                    3.28
3         1.9483               88.12                    7.24    11.88                    3.65
4         1.6237               86.75                    5.69    13.25                    4.45
5         0.2382               93.37                    6.54    6.63                     4.87
6         0.2754               92.28                    6.73    7.72                     2.12
7         0.3112               87.63                    5.26    12.37                    3.13
8         0.2466               89.37                    8.45    10.63                    2.64
9         1.7928               90.22                    11.62   9.78                     8.25
10        0.1928               88.57                    6.75    11.43                    3.58
11        2.0983               83.32                    7.44    16.68                    3.23
To quantitatively justify the characteristics of exponential
and power-law decay in the CCDF of individual RSS feeds, we
perform a Kolmogorov-Smirnov goodness-of-fit test [30] on
each of the 11 RSS feeds listed in Table I. For each RSS feed,
we collect the inter-refreshing times smaller than its boundary
and test whether the null hypothesis "these inter-refreshing times
are exponentially distributed" can be accepted. A similar test
is performed on the inter-refreshing times with larger values for
the generalized Pareto distribution.
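A minimal version of the per-feed test can be sketched as follows. This is not the authors' code: it computes a one-sample Kolmogorov-Smirnov statistic by hand on synthetic samples below an assumed boundary, comparing them against an exponential CDF truncated at that boundary (a modeling choice on our part, since the sub-boundary samples are conditioned on being smaller than it).

```python
import math
import random

def ks_statistic(samples, cdf):
    """One-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDF of the samples and the hypothesized CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x.
        d = max(d, abs(f - i / n), abs((i + 1) / n - f))
    return d

random.seed(7)
boundary = 0.2178  # boundary of RSS feed 1 in Table II (hours)
mean = 0.1517      # assumed exponential mean, for illustration only
times = [random.expovariate(1.0 / mean) for _ in range(5000)]

# Test only the samples below the boundary against a truncated exponential.
below = [t for t in times if t < boundary]
trunc = 1.0 - math.exp(-boundary / mean)  # normalizer for the truncation
d = ks_statistic(below, lambda x: (1.0 - math.exp(-x / mean)) / trunc)
print(f"KS statistic below boundary: {d:.4f}")
```

For truly exponential samples the statistic stays near 1/sqrt(n), so the hypothesis would not be rejected at common significance levels; in practice a library routine such as `scipy.stats.kstest` would replace the hand-rolled statistic.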
The significance levels (𝛼) for these null hypotheses being
accepted are listed in Table II. The lower the significance
level, the more confident we are that the corresponding
hypothesis is statistically true. As shown in Table II, for all
the RSS feeds, the probability of erroneously accepting the
null hypotheses is lower than 10%, which is the significance
level usually used for statistical hypothesis testing [8].
In particular, the significance levels for accepting a generalized
Pareto distribution are generally better than those for accepting
an exponential distribution.
C. Temporal Variations
We are also interested in the temporal variations of the
RSS feeds' updating patterns. Figure 5 shows the temporal
distribution of news updates from RSS feeds over different
hours in a day. We observe that the characteristics of such
temporal variation are heterogeneous across different RSS feeds.

Fig. 6. Standard deviation of the numbers of news updates during different hours in a day
For example, the majority of news updates from NYTimes and
ESPN are generated during the time period from the afternoon
to the evening. Comparatively, the news updates from Google
News are evenly distributed among different hours in a day.
To better quantify the skewness of such temporal variation,
we calculate the standard deviation of the numbers of news
updates during different hours in a day for each of the 11 RSS
feeds listed in Table I; the results are shown in
Figure 6. By comparing Figure 6 with Figure 5, we conclude
that the temporal distributions of news updates from most RSS
feeds are highly skewed. The transient distribution of inter-
refreshing time of an RSS feed during specific time periods
may hence differ considerably from its cumulative distribution. Such
temporal variation may affect the performance of maintaining
cache freshness, and will be evaluated in detail via trace-driven
simulations in Section VII.
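The skewness measure behind Figure 6 can be sketched as follows, with hypothetical timestamps standing in for the trace: updates are bucketed by hour of day, and the standard deviation of the 24 hourly counts is reported. A feed concentrated in a few afternoon hours scores far higher than an evenly spread one.

```python
import math
from collections import Counter
from datetime import datetime, timedelta

def hourly_update_std(timestamps):
    """Standard deviation of the number of updates per hour of day (0-23)."""
    counts = Counter(ts.hour for ts in timestamps)
    per_hour = [counts.get(h, 0) for h in range(24)]
    mean = sum(per_hour) / 24
    return math.sqrt(sum((c - mean) ** 2 for c in per_hour) / 24)

# Hypothetical feeds: one evenly spread (Google News-like), one
# concentrated in the afternoon (NYTimes/ESPN-like).
base = datetime(2012, 3, 1)
even_feed = [base + timedelta(minutes=30 * k) for k in range(48 * 7)]
skewed_feed = [base.replace(hour=14 + (k % 4)) + timedelta(days=k // 4)
               for k in range(48 * 7)]  # updates only between 14:00 and 17:59

print("even feed std:  ", hourly_update_std(even_feed))
print("skewed feed std:", hourly_update_std(skewed_feed))
```

The evenly spread feed yields a standard deviation of zero, while the afternoon-only feed scores above 30 for the same number of updates, matching the intuition that Figure 6 captures how concentrated a feed's updates are.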
V. INTENTIONAL REFRESHING
In this section, we explain how to ensure that data updates
are delivered to the caching nodes on time, so that the
freshness requirements of cached data are satisfied. Based on
the investigation of the distribution of inter-refreshing time
in Section IV, we calculate the utility of each update, which
estimates the chance of the requirement being satisfied by this
update. This utility is then used for opportunistic replication
of data updates.
