File popularity characterisation
Citations
A workload characterization study of the 1998 World Cup Web site
A Workload Characterization Study of the 1998 World Cup Web Site
Monitoring the application-layer DDoS attacks for popular websites
Workload Modeling for Computer Systems Performance Evaluation
Traffic analysis of a Web proxy caching hierarchy
References
Web caching and Zipf-like distributions: evidence and implications
Generating representative Web workloads for network and server performance evaluation
Strong Regularities in World Wide Web Surfing
Characteristics of WWW Client-based Traces
Frequently Asked Questions (11)
Q2. How many requests have been obtained for 5 different caches?
The authors have been able to obtain samples in excess of 500,000 file requests for 5 very different caches.
Q3. How can data from different caches be compared reliably?
To compare data from different caches reliably, it is necessary to ensure that differences are real and not due to insufficiently large samples.
Q4. How can the authors fit an inverse power law curve to cache popularity curves?
With appropriate care it is possible to fit an inverse power law curve to cache popularity curves, with an exponent of between -0.9 and -0.5, and with a high degree of confidence.
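As a rough illustration of what such a fit involves, the sketch below ranks files by request count and fits a straight line in log-log space, so the slope is the exponent and the intercept gives the K factor. This is an ordinary least-squares fit chosen for simplicity, not necessarily the procedure the authors used; the function name `fit_power_law` and the synthetic request list are assumptions made for the example.

```python
from collections import Counter

import numpy as np


def fit_power_law(requests):
    """Fit count ~ K * rank**exponent by least squares in log-log space.

    `requests` is an iterable of file identifiers, one per request.
    Returns (exponent, K). Hypothetical helper, not taken from the paper.
    """
    # Request counts per file, most popular first (rank 1 = most requested).
    counts = np.array(sorted(Counter(requests).values(), reverse=True), dtype=float)
    ranks = np.arange(1, len(counts) + 1, dtype=float)
    # A power law is a straight line on log-log axes: log(count) = log(K) + exponent * log(rank).
    exponent, log_k = np.polyfit(np.log10(ranks), np.log10(counts), 1)
    return exponent, 10 ** log_k


# Synthetic demo data (illustrative only): 200 files whose request counts
# follow roughly 1000 * rank**-0.7.
requests = []
for rank in range(1, 201):
    requests.extend([f"file{rank}"] * max(1, round(1000 * rank ** -0.7)))

exponent, k = fit_power_law(requests)
print(f"fitted exponent = {exponent:.3f}, K = {k:.1f}")
```

On real cache logs the tail of files requested only once or twice can distort a naive fit like this one, which is part of why care is needed.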
Q5. What is the likely explanation for the differences in the popularity curve?
From consideration of the work of Zipf on word use in different cultures, it seems likely that cultural differences will often be expressed through differences in the K factor in the power curve rather than the exponent.
Q6. What factors affect the exponent of the locality curve?
While filtering is one possible factor affecting the exponent of the locality curve, other factors may also influence it.
Q7. What is the importance of the analysis of cache popularity curves?
The analysis of cache popularity curves requires careful definition of what is to be analysed and, since the data displays significant long range dependency, very large sample sizes.
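One simple way to probe whether a sample is large enough is to refit the exponent on progressively longer prefixes of the request stream and watch whether the estimate settles. The sketch below does this with a plain log-log least-squares fit. The helper names, the prefix sizes, and the synthetic stream are assumptions for the example, and the synthetic stream is drawn independently, so it does not reproduce the long range dependency found in real logs.

```python
from collections import Counter
import random

import numpy as np


def fitted_exponent(requests):
    """Slope of log10(count) against log10(rank) for one sample of requests."""
    counts = np.array(sorted(Counter(requests).values(), reverse=True), dtype=float)
    ranks = np.arange(1, len(counts) + 1, dtype=float)
    slope, _intercept = np.polyfit(np.log10(ranks), np.log10(counts), 1)
    return slope


def exponent_by_sample_size(requests, sizes):
    """Refit the exponent on the first n requests for each n in `sizes`."""
    return {n: fitted_exponent(requests[:n]) for n in sizes if n <= len(requests)}


# Synthetic request stream (illustrative only): 2000 files drawn independently
# with probability proportional to rank**-0.7.
rng = random.Random(0)
files = [f"file{r}" for r in range(1, 2001)]
weights = [r ** -0.7 for r in range(1, 2001)]
stream = rng.choices(files, weights=weights, k=200_000)

estimates = exponent_by_sample_size(stream, [1_000, 10_000, 100_000, 200_000])
for n, slope in estimates.items():
    print(f"{n:>7} requests: exponent = {slope:.3f}")
```

If the estimate is still drifting at the largest prefix, differences between caches measured at that sample size should be treated with caution.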
Q8. What is the significance of the power law curves?
The authors demonstrate for the first time in this paper that, with appropriate care in the analysis, the power law curves, whilst not strictly Zipf curves, are still culture independent.
Q9. Over how many months did the fitted exponent range from -0.23 to -1.34?
Over these six months the fitted exponent ranged from -0.23 to -1.34 with a mean of -0.5958 and a variance of 0.03 (figure 4), using the 'averaged' ranking method mentioned above.
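A summary of this kind can be sketched by fitting one exponent per period and then taking the mean and variance of the fitted values. The example below uses a plain log-log least-squares fit on ordinary rank order; it does not implement the 'averaged' ranking method, which is not described in this excerpt. The function names, the choice of sample variance (`ddof=1`), and the synthetic periods are all assumptions.

```python
from collections import Counter

import numpy as np


def fitted_exponent(requests):
    """Slope of log10(count) against log10(rank) for one period's requests."""
    counts = np.array(sorted(Counter(requests).values(), reverse=True), dtype=float)
    ranks = np.arange(1, len(counts) + 1, dtype=float)
    slope, _intercept = np.polyfit(np.log10(ranks), np.log10(counts), 1)
    return slope


def summarise_exponents(periods):
    """Fit each period separately; return the exponents, their mean and sample variance."""
    exponents = np.array([fitted_exponent(p) for p in periods])
    return exponents, exponents.mean(), exponents.var(ddof=1)


def synthetic_period(alpha, n_files=100, scale=500):
    """Illustrative data only: counts follow roughly scale * rank**-alpha."""
    requests = []
    for rank in range(1, n_files + 1):
        requests.extend([f"file{rank}"] * max(1, round(scale * rank ** -alpha)))
    return requests


# Three made-up periods with different underlying exponents.
periods = [synthetic_period(a) for a in (0.5, 0.6, 0.7)]
exponents, mean, variance = summarise_exponents(periods)
print(exponents, mean, variance)
```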
Q10. What does the exponent of the cache popularity curve depend on?
The exponent does not appear to depend on cache size, on time, or on the culture of the cache users, but only depends on the topological position of the cache in the network.
Q11. What are the useful metrics in cache logs?
Cache logs can be extremely comprehensive, detailing time of request, bytes transferred, file name and other useful metrics [e.g. ftp://ircache.nlanr.net/Traces/].
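To show how such logs feed a popularity analysis, the sketch below parses whitespace-separated log lines and counts requests per URL. It assumes a Squid-style native access log layout (timestamp, elapsed time, client, result code, bytes, method, URL, and so on); that field order is an assumption and should be checked against the actual trace files. The sample lines are made up for the example.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Request(NamedTuple):
    timestamp: float  # time of request, seconds since the epoch
    size_bytes: int   # bytes transferred
    url: str          # requested file


def parse_line(line: str) -> Optional[Request]:
    """Parse one log line; return None if it does not match the assumed layout."""
    fields = line.split()
    if len(fields) < 7:
        return None
    try:
        # Assumed positions: 0 = timestamp, 4 = bytes, 6 = URL.
        return Request(float(fields[0]), int(fields[4]), fields[6])
    except ValueError:
        return None


# Made-up sample lines in the assumed layout, plus one malformed line.
sample_log = """\
899637753.270 120 10.0.0.1 TCP_MISS/200 4512 GET http://example.com/a.html - DIRECT/example.com text/html
899637754.101 15 10.0.0.2 TCP_HIT/200 4512 GET http://example.com/a.html - NONE/- text/html
899637755.900 340 10.0.0.1 TCP_MISS/200 20480 GET http://example.com/b.gif - DIRECT/example.com image/gif
malformed line
"""

requests = [r for r in map(parse_line, sample_log.splitlines()) if r is not None]
# Files ordered from most to least requested, ready for ranking.
popularity = Counter(r.url for r in requests).most_common()
print(popularity)
```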