Author

Carlos Cunha

Bio: Carlos Cunha is an academic researcher from Boston University. The author has contributed to research in topics: Cache & Inline caching. The author has an h-index of 8 and has co-authored 8 publications receiving 1,036 citations.

Papers
01 Apr 1995
TL;DR: This paper presents a descriptive statistical summary of traces of actual executions of NCSA Mosaic, and shows that many characteristics of WWW use can be modelled using power-law distributions, including the distribution of document sizes, the popularity of documents as a function of size, and the distribution of user requests for documents.
Abstract: The explosion of WWW traffic necessitates an accurate picture of WWW use, and in particular requires a good understanding of client requests for WWW documents. To address this need, we have collected traces of actual executions of NCSA Mosaic, reflecting over half a million user requests for WWW documents. In this paper we present a descriptive statistical summary of the traces we collected, which identifies a number of trends and reference patterns in WWW use. In particular, we show that many characteristics of WWW use can be modelled using power-law distributions, including the distribution of document sizes, the popularity of documents as a function of size, the distribution of user requests for documents, and the number of references to documents as a function of their overall rank in popularity (Zipf's law). In addition, we show how the power-law distributions derived from our traces can be used to guide system designers interested in caching WWW documents. --- Our client-based traces are available via FTP from http://www.cs.bu.edu/techreports/1995-010-www-client-traces.tar.gz and http://www.cs.bu.edu/techreports/1995-010-www-client-traces.a.tar.gz
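
The closing point, that power-law request distributions can guide cache design, is easy to make concrete. Below is a minimal sketch, assuming a Zipf-like popularity law with an illustrative exponent and catalogue size (neither taken from the paper), of the fraction of requests covered by pinning the k most popular documents in a cache.

```python
# Sketch: fraction of requests served by caching the k most popular documents
# when popularity follows a Zipf-like law p_i ~ 1 / i^alpha.
# Catalogue size and alpha below are illustrative, not values from the paper.

def zipf_coverage(num_docs: int, alpha: float, cache_slots: int) -> float:
    """Fraction of requests hitting a cache that pins the top `cache_slots` documents."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_docs + 1)]
    return sum(weights[:cache_slots]) / sum(weights)

if __name__ == "__main__":
    # With alpha near 1, a small fraction of documents covers most requests.
    for k in (10, 100, 1000):
        print(k, round(zipf_coverage(num_docs=100_000, alpha=1.0, cache_slots=k), 3))
```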

624 citations

Proceedings ArticleDOI
05 Jun 1995
TL;DR: The results suggest that distinguishing between documents produced locally and those produced remotely can provide useful leverage in designing caching policies, because of differences in the potential for sharing these two document types among multiple users.
Abstract: With the increasing demand for document transfer services such as the World Wide Web comes a need for better resource management to reduce the latency of documents in these systems. To address this need, we analyze the potential for document caching at the application level in document transfer services. We have collected traces of actual executions of Mosaic, reflecting over half a million user requests for WWW documents. Using those traces, we study the tradeoffs between caching at three levels in the system, and the potential for use of application-level information in the caching system. Our traces show that while a high hit rate in terms of URLs is achievable, a much lower hit rate is possible in terms of bytes, because most profitably-cached documents are small. We consider the performance of caching when applied at the level of individual user sessions, at the level of individual hosts, and at the level of a collection of hosts on a single LAN. We show that the performance gain achievable by caching at the session level (which is straightforward to implement) is nearly all of that achievable at the LAN level (where caching is more difficult to implement). However, when resource requirements are considered, LAN level caching becomes much more desirable, since it can achieve a given level of caching performance using a much smaller amount of cache space. Finally, we consider the use of organizational boundary information as an example of the potential for use of application-level information in caching. Our results suggest that distinguishing between documents produced locally and those produced remotely can provide useful leverage in designing caching policies, because of differences in the potential for sharing these two document types among multiple users.
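
The contrast between hit rate in URLs and hit rate in bytes is straightforward to reproduce with a trace-driven simulation. The sketch below assumes a trace of (url, size) pairs and a byte-capacity LRU cache; the trace and capacity are hypothetical, not drawn from the Mosaic logs.

```python
from collections import OrderedDict

def simulate_lru(trace, capacity_bytes):
    """Trace-driven LRU over (url, size) requests; returns (URL hit rate, byte hit rate)."""
    cache = OrderedDict()        # url -> size, most recently used kept at the end
    used = 0
    hits = requests = hit_bytes = total_bytes = 0
    for url, size in trace:
        requests += 1
        total_bytes += size
        if url in cache:
            hits += 1
            hit_bytes += size
            cache.move_to_end(url)
            continue
        if size > capacity_bytes:
            continue             # document can never fit in this cache; do not store it
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)   # evict least recently used
            used -= evicted_size
        cache[url] = size
        used += size
    return hits / requests, hit_bytes / total_bytes

# Toy trace: many requests for small documents and a few for one large document,
# which is why the URL hit rate ends up far above the byte hit rate.
trace = [("a.html", 2_000)] * 50 + [("big.iso", 500_000)] * 2 + [("b.html", 3_000)] * 30
print(simulate_lru(trace, capacity_bytes=100_000))
```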

177 citations

Journal Article
TL;DR: Results of log analysis and trace-driven simulations are presented that quantify the performance gains achievable through the use of a data dissemination mechanism that allows information to propagate from its producers to servers that are closer to its consumers.
Abstract: In this paper we overview the merits of a data dissemination mechanism that allows information to propagate from its producers to servers that are closer to its consumers. This dissemination reduces network traffic and balances load amongst servers by exploiting geographic and temporal locality of reference properties exhibited in client access patterns. The level of dissemination depends on the relative popularity of documents and on the expected reduction in traffic that results from such dissemination. We present results of log analysis and trace-driven simulations that quantify the performance gains achievable through the use of such a protocol.
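
As a rough illustration of the kind of push decision such a protocol must make, the sketch below disseminates a document toward consumer-side servers when its observed request rate makes replication cheaper than repeated fetches. The threshold rule, horizon, and numbers are illustrative assumptions, not the mechanism evaluated in the paper.

```python
def should_disseminate(requests_per_day: float, doc_size_bytes: int,
                       push_cost_bytes: int, horizon_days: float = 7.0) -> bool:
    """Push a copy closer to consumers if expected transfer savings exceed the push cost."""
    expected_pull_traffic = requests_per_day * horizon_days * doc_size_bytes
    return expected_pull_traffic > push_cost_bytes

# A popular document is worth replicating; a rarely requested one is not.
print(should_disseminate(requests_per_day=40, doc_size_bytes=10_000, push_cost_bytes=10_000))
print(should_disseminate(requests_per_day=0.1, doc_size_bytes=10_000, push_cost_bytes=10_000))
```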

114 citations

Book ChapterDOI
17 Jun 1996
TL;DR: It is shown that ⌈m/n⌉ + 2n log n + 2n + 2 queries are sufficient to find any hidden code if m ≥ n.
Abstract: We study the problem of finding a hidden code k in the domain {1, ..., m}^n in the presence of an oracle which, for any x in the domain, answers a pair of numbers a(x,k) and b(x,k), such that a(x,k) is the number of components coinciding in x and k, and b(x,k) is the sum of a(x,k) and the number of components occurring in both x and k but not at the same position. We show that ⌈m/n⌉ + 2n log n + 2n + 2 queries are sufficient to find any hidden code if m ≥ n.
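
The oracle's two answers follow directly from the definitions above. The sketch below treats codes as tuples of symbols (an encoding chosen here purely for illustration): a(x,k) counts exact position matches, and b(x,k) adds the symbols shared by x and k at differing positions, Mastermind-style.

```python
from collections import Counter

def oracle(x, k):
    """Return (a, b): a counts positions where x and k agree; b additionally
    counts symbols present in both codes but at different positions."""
    a = sum(xi == ki for xi, ki in zip(x, k))
    x_rest = Counter(xi for xi, ki in zip(x, k) if xi != ki)
    k_rest = Counter(ki for xi, ki in zip(x, k) if xi != ki)
    misplaced = sum(min(x_rest[s], k_rest[s]) for s in x_rest)
    return a, a + misplaced

# Example query with m = 6 symbols and n = 4 positions.
print(oracle((1, 2, 3, 4), (1, 3, 2, 6)))   # -> (1, 3)
```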

41 citations


Cited by
Proceedings ArticleDOI
21 Mar 1999
TL;DR: This paper investigates the page request distribution seen by Web proxy caches using traces from a variety of sources, and considers a simple model where the Web accesses are independent and the reference probability of the documents follows a Zipf-like distribution, suggesting that the various observed properties of hit-ratios and temporal locality are indeed inherent to Web accesses observed by proxies.
Abstract: This paper addresses two unresolved issues about Web caching. The first issue is whether Web requests from a fixed user community are distributed according to Zipf's (1929) law. The second issue relates to a number of studies on the characteristics of Web proxy traces, which have shown that the hit-ratios and temporal locality of the traces exhibit certain asymptotic properties that are uniform across the different sets of the traces. In particular, the question is whether these properties are inherent to Web accesses or whether they are simply an artifact of the traces. An answer to these unresolved issues will facilitate both Web cache resource planning and cache hierarchy design. We show that the answers to the two questions are related. We first investigate the page request distribution seen by Web proxy caches using traces from a variety of sources. We find that the distribution does not follow Zipf's law precisely, but instead follows a Zipf-like distribution with the exponent varying from trace to trace. Furthermore, we find that there is only (i) a weak correlation between the access frequency of a Web page and its size and (ii) a weak correlation between access frequency and its rate of change. We then consider a simple model where the Web accesses are independent and the reference probability of the documents follows a Zipf-like distribution. We find that the model yields asymptotic behaviour that is consistent with the experimental observations, suggesting that the various observed properties of hit-ratios and temporal locality are indeed inherent to Web accesses observed by proxies. Finally, we revisit Web cache replacement algorithms and show that the algorithm that is suggested by this simple model performs best on real trace data. The results indicate that while page requests do indeed reveal short-term correlations and other structures, a simple model for an independent request stream following a Zipf-like distribution is sufficient to capture certain asymptotic properties observed at Web proxies.
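
The trace-to-trace variation in the Zipf-like exponent is typically measured from a rank-frequency plot. The sketch below assumes a list of per-page request counts taken from a proxy log (the counts shown are made up) and fits the exponent by least squares in log-log space.

```python
import math

def zipf_exponent(request_counts):
    """Fit frequency ~ C / rank^alpha by least squares in log-log space; return alpha."""
    freqs = sorted(request_counts, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / \
            sum((x - x_mean) ** 2 for x in xs)
    return -slope   # log f = log C - alpha * log rank, so alpha is minus the slope

# Hypothetical per-page request counts from a proxy log.
counts = [1000, 480, 300, 240, 190, 160, 130, 120, 105, 95]
print(round(zipf_exponent(counts), 2))
```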

3,582 citations

Journal ArticleDOI
TL;DR: It is shown that the self-similarity in WWW traffic can be explained based on the underlying distributions of WWW document sizes, the effects of caching and user preference in file transfer, the effect of user "think time", and the superimposition of many such transfers in a local-area network.
Abstract: The notion of self-similarity has been shown to apply to wide-area and local-area network traffic. We show evidence that the subset of network traffic that is due to World Wide Web (WWW) transfers can show characteristics that are consistent with self-similarity, and we present a hypothesized explanation for that self-similarity. Using a set of traces of actual user executions of NCSA Mosaic, we examine the dependence structure of WWW traffic. First, we show evidence that WWW traffic exhibits behavior that is consistent with self-similar traffic models. Then we show that the self-similarity in such traffic can be explained based on the underlying distributions of WWW document sizes, the effects of caching and user preference in file transfer, the effect of user "think time", and the superimposition of many such transfers in a local-area network. To do this, we rely on empirically measured distributions both from client traces and from data independently collected at WWW servers.

2,608 citations

Journal ArticleDOI
15 May 1996
TL;DR: It is shown that the self-similarity in WWW traffic can be explained based on the underlying distributions of WWW document sizes, the effects of caching and user preference in file transfer, the effect of user "think time", and the superimposition of many such transfers in a local area network.
Abstract: Recently the notion of self-similarity has been shown to apply to wide-area and local-area network traffic. In this paper we examine the mechanisms that give rise to the self-similarity of network traffic. We present a hypothesized explanation for the possible self-similarity of traffic by using a particular subset of wide area traffic: traffic due to the World Wide Web (WWW). Using an extensive set of traces of actual user executions of NCSA Mosaic, reflecting over half a million requests for WWW documents, we examine the dependence structure of WWW traffic. While our measurements are not conclusive, we show evidence that WWW traffic exhibits behavior that is consistent with self-similar traffic models. Then we show that the self-similarity in such traffic can be explained based on the underlying distributions of WWW document sizes, the effects of caching and user preference in file transfer, the effect of user "think time", and the superimposition of many such transfers in a local area network. To do this we rely on empirically measured distributions both from our traces and from data independently collected at over thirty WWW sites.
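
A common way to check the self-similarity discussed in these two papers is the variance-time method: for a self-similar series, the variance of the m-aggregated series decays roughly as m^(2H - 2). The sketch below estimates the Hurst parameter H from that slope; the input series is a synthetic stand-in, since the real per-interval byte counts come from the Mosaic traces.

```python
import math, random

def hurst_variance_time(series, scales=(1, 2, 4, 8, 16, 32)):
    """Estimate the Hurst parameter from the slope of log Var(X^(m)) versus log m."""
    xs, ys = [], []
    for m in scales:
        # X^(m): the series averaged over non-overlapping blocks of length m.
        blocks = [sum(series[i:i + m]) / m for i in range(0, len(series) - m + 1, m)]
        mean = sum(blocks) / len(blocks)
        var = sum((b - mean) ** 2 for b in blocks) / len(blocks)
        xs.append(math.log(m))
        ys.append(math.log(var))
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / \
            sum((x - x_mean) ** 2 for x in xs)
    return 1 + slope / 2          # slope = 2H - 2 for a self-similar series

# Synthetic placeholder for per-interval byte counts (heavy-tailed draws).
random.seed(0)
traffic = [random.paretovariate(1.5) for _ in range(4096)]
print(round(hurst_variance_time(traffic), 2))
```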

2,332 citations

Journal ArticleDOI
TL;DR: This paper demonstrates the benefits of cache sharing, measures the overhead of the existing protocols, and proposes a new protocol called "summary cache", which reduces the number of intercache protocol messages, reduces the bandwidth consumption, and eliminates 30% to 95% of the protocol CPU overhead, all while maintaining almost the same cache hit ratios as ICP.
Abstract: The sharing of caches among Web proxies is an important technique to reduce Web traffic and alleviate network bottlenecks. Nevertheless, it is not widely deployed due to the overhead of existing protocols. In this paper we demonstrate the benefits of cache sharing, measure the overhead of the existing protocols, and propose a new protocol called "summary cache". In this new protocol, each proxy keeps a summary of the cache directory of each participating proxy, and checks these summaries for potential hits before sending any queries. Two factors contribute to our protocol's low overhead: the summaries are updated only periodically, and the directory representations are very economical, as low as 8 bits per entry. Using trace-driven simulations and a prototype implementation, we show that, compared to existing protocols such as the Internet cache protocol (ICP), summary cache reduces the number of intercache protocol messages by a factor of 25 to 60, reduces the bandwidth consumption by over 50%, and eliminates 30% to 95% of the protocol CPU overhead, all while maintaining almost the same cache hit ratios as ICP. Hence summary cache scales to a large number of proxies. (This paper is a revision of Fan et al. 1998; we add more data and analysis in this version.)
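
The "very economical" directory summaries described here are Bloom-filter-style bit vectors: a peer tests a summary for a potential hit before issuing a query, accepting occasional false positives (a wasted query) in exchange for tiny summaries. Below is a minimal sketch of that idea; the bit-vector size, hash count, and URLs are illustrative assumptions, not the paper's tuned parameters.

```python
import hashlib

class BloomSummary:
    """Compact cache-directory summary: lookups may give false positives, never false negatives."""

    def __init__(self, num_bits: int, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, url: str):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{url}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, url: str):
        for pos in self._positions(url):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, url: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(url))

# Each proxy periodically publishes its summary; peers consult it before querying.
summary = BloomSummary(num_bits=8 * 1000)   # roughly 8 bits per entry for ~1000 cached URLs
summary.add("http://www.cs.bu.edu/index.html")
print(summary.might_contain("http://www.cs.bu.edu/index.html"))   # True
print(summary.might_contain("http://example.org/missing.html"))   # almost certainly False
```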

2,174 citations

Proceedings ArticleDOI
01 Jun 1998
TL;DR: This paper applies a number of observations of Web server usage to create a realistic Web workload generation tool which mimics a set of real users accessing a server and addresses the technical challenges to satisfying this large set of simultaneous constraints on the properties of the reference stream.
Abstract: One role for workload generation is as a means for understanding how servers and networks respond to variation in load. This enables management and capacity planning based on current and projected usage. This paper applies a number of observations of Web server usage to create a realistic Web workload generation tool which mimics a set of real users accessing a server. The tool, called Surge (Scalable URL Reference Generator), generates references matching empirical measurements of 1) server file size distribution; 2) request size distribution; 3) relative file popularity; 4) embedded file references; 5) temporal locality of reference; and 6) idle periods of individual users. This paper reviews the essential elements required in the generation of a representative Web workload. It also addresses the technical challenges of satisfying this large set of simultaneous constraints on the properties of the reference stream, the solutions we adopted, and their associated accuracy. Finally, we present evidence that Surge exercises servers in a manner significantly different from other Web server benchmarks.
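
The sketch below imitates the constraint-driven style of generation Surge performs: document popularity drawn from a Zipf-like law, heavy-tailed file sizes, and exponential idle periods between requests. The distributions and parameters are illustrative assumptions, not Surge's empirically fitted models.

```python
import random

random.seed(1)

NUM_DOCS = 1000
ZIPF_ALPHA = 1.0

# Zipf-like popularity over the document catalogue.
weights = [1.0 / (rank ** ZIPF_ALPHA) for rank in range(1, NUM_DOCS + 1)]
# Heavy-tailed (Pareto) file sizes, one size per document.
sizes = [int(1000 * random.paretovariate(1.2)) for _ in range(NUM_DOCS)]

def generate_session(num_requests: int):
    """Yield (doc_id, size_bytes, think_time_s) tuples for one simulated user."""
    for doc_id in random.choices(range(NUM_DOCS), weights=weights, k=num_requests):
        think_time = random.expovariate(1 / 5.0)    # mean 5 s idle period, an assumption
        yield doc_id, sizes[doc_id], round(think_time, 2)

for request in generate_session(5):
    print(request)
```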

1,549 citations