Author

Carey Williamson

Bio: Carey Williamson is an academic researcher from the University of Calgary. The author has contributed to research in topics including the Internet and network packets. The author has an h-index of 39 and has co-authored 225 publications receiving 7,277 citations. Previous affiliations of Carey Williamson include Stanford University and the University of Saskatchewan.


Papers
Journal ArticleDOI
15 May 1996
TL;DR: This paper concludes with a discussion of caching and performance issues, using the invariants to suggest performance enhancements that seem most promising for Internet Web servers.
Abstract: The phenomenal growth in popularity of the World Wide Web (WWW, or the Web) has made WWW traffic the largest contributor to packet and byte traffic on the NSFNET backbone. This growth has triggered recent research aimed at reducing the volume of network traffic produced by Web clients and servers, by using caching, and reducing the latency for WWW users, by using improved protocols for Web interaction. Fundamental to the goal of improving WWW performance is an understanding of WWW workloads. This paper presents a workload characterization study for Internet Web servers. Six different data sets are used in this study: three from academic (i.e., university) environments, two from scientific research organizations, and one from a commercial Internet provider. These data sets represent three different orders of magnitude in server activity, and two different orders of magnitude in time duration, ranging from one week of activity to one year of activity. Throughout the study, emphasis is placed on finding workload invariants: observations that apply across all the data sets studied. Ten invariants are identified. These invariants are deemed important since they (potentially) represent universal truths for all Internet Web servers. The paper concludes with a discussion of caching and performance issues, using the invariants to suggest performance enhancements that seem most promising for Internet Web servers.

858 citations

Journal ArticleDOI
TL;DR: The paper concludes with a discussion of caching and performance issues, using the observed workload characteristics to suggest performance enhancements that seem promising for Internet Web servers.
Abstract: This paper presents a workload characterization study for Internet Web servers. Six different data sets are used in the study: three from academic environments, two from scientific research organizations, and one from a commercial Internet provider. These data sets represent three different orders of magnitude in server activity, and two different orders of magnitude in time duration, ranging from one week of activity to one year. The workload characterization focuses on the document type distribution, the document size distribution, the document referencing behavior, and the geographic distribution of server requests. Throughout the study, emphasis is placed on finding workload characteristics that are common to all the data sets studied. Ten such characteristics are identified. The paper concludes with a discussion of caching and performance issues, using the observed workload characteristics to suggest performance enhancements that seem promising for Internet Web servers.

771 citations
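As an illustration of the kind of analysis the two workload characterization papers above describe, the sketch below parses a Common Log Format access log and summarizes the document size distribution and how concentrated references are across distinct documents. It is a minimal, hypothetical example (the log path, field positions, and 10% popularity cut-off are assumptions), not the authors' analysis code.

```python
# Minimal workload-characterization sketch over an assumed Common Log Format log.
# Summarizes transfer sizes and reference concentration, two of the
# characteristics studied in the papers above.
import re
from collections import Counter

LOG_PATH = "access.log"  # hypothetical server log

# CLF: host ident user [date] "request" status bytes
CLF = re.compile(r'\S+ \S+ \S+ \[[^\]]+\] "(?:GET|HEAD|POST) (\S+)[^"]*" (\d{3}) (\d+|-)')

sizes = []
refs = Counter()
with open(LOG_PATH) as log:
    for line in log:
        m = CLF.match(line)
        if not m:
            continue
        url, status, nbytes = m.groups()
        if status == "200" and nbytes != "-":
            sizes.append(int(nbytes))
            refs[url] += 1

if not sizes:
    raise SystemExit("no successful requests found in the log")

sizes.sort()
total_refs = sum(refs.values())
top10 = sum(c for _, c in refs.most_common(max(1, len(refs) // 10)))

print(f"distinct documents: {len(refs)}, successful requests: {total_refs}")
print(f"median transfer size: {sizes[len(sizes) // 2]} bytes")
print(f"mean transfer size:   {sum(sizes) / len(sizes):.0f} bytes")
print(f"share of requests to the most popular 10% of documents: {top10 / total_refs:.1%}")
```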

Journal ArticleDOI
TL;DR: This is the first work to use semi-supervised learning techniques for the traffic classification problem; the approach allows classifiers to be built from training data consisting of only a few labeled flows and many unlabeled flows.

288 citations
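To make the "few labeled, many unlabeled flows" idea concrete, the sketch below runs a generic semi-supervised learner (scikit-learn's LabelPropagation) on synthetic per-flow features. The paper's own method maps clusters of flows to applications using a handful of labeled examples, so this code is an assumption-laden stand-in for illustration, not the published algorithm; the feature choices and class shapes are made up.

```python
# Illustrative semi-supervised flow classification: many unlabeled flows
# (label -1) plus a handful of labeled ones. LabelPropagation is used as a
# generic stand-in, not the paper's cluster-mapping method.
import numpy as np
from sklearn.semi_supervised import LabelPropagation
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic per-flow features: [packets, mean packet size (bytes), duration (s)]
web = rng.normal([12, 700, 2.0], [4, 150, 0.8], size=(200, 3))
p2p = rng.normal([400, 1100, 120.0], [120, 200, 40.0], size=(200, 3))
X = StandardScaler().fit_transform(np.vstack([web, p2p]))

y = np.full(400, -1)          # -1 marks unlabeled flows
y[:5] = 0                     # five labeled "web" flows
y[200:205] = 1                # five labeled "p2p" flows

model = LabelPropagation(kernel="knn", n_neighbors=7).fit(X, y)
truth = np.array([0] * 200 + [1] * 200)
print("accuracy with 10 labels:", (model.transduction_ == truth).mean())
```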

Proceedings ArticleDOI
11 Sep 2006
TL;DR: The results show that port-based analysis is ineffective, being unable to identify 30%-70% of today's Internet traffic, and the transport-layer method seems promising, providing a robust means to assess aggregate P2P traffic.
Abstract: This paper focuses on network traffic measurement of Peer-to-Peer (P2P) applications on the Internet. P2P applications supposedly constitute a substantial proportion of today's Internet traffic. However, current P2P applications use several obfuscation techniques, including dynamic port numbers, port hopping, HTTP masquerading, chunked file transfers, and encrypted payloads. As P2P applications continue to evolve, robust and effective methods are needed for P2P traffic identification. The paper compares three methods to classify P2P applications: port-based classification, application-layer signatures, and transport-layer analysis. The study uses empirical network traces collected from the University of Calgary Internet connection for the past 2 years. The results show that port-based analysis is ineffective, being unable to identify 30%-70% of today's Internet traffic. Application signatures are accurate, but may not be possible for legal or technical reasons. The transport-layer method seems promising, providing a robust means to assess aggregate P2P traffic. The latter method suggests that 30%-70% of the campus Internet traffic for the past year was P2P.

233 citations
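To make the comparison in the abstract above concrete, here is the simplest of the three methods, port-based classification, applied to hypothetical flow records. The port list and example flows are illustrative assumptions, and, as the paper reports, this approach misses much of today's P2P traffic because applications hop ports or masquerade as HTTP.

```python
# Naive port-based P2P classification over hypothetical flow records.
# The paper above finds this approach unable to identify 30%-70% of traffic.
from typing import NamedTuple

class Flow(NamedTuple):
    src_port: int
    dst_port: int
    byte_count: int

# Illustrative well-known P2P ports (BitTorrent, eDonkey, Gnutella, etc.);
# not an authoritative or complete list.
P2P_PORTS = {1214, 4662, 6346, 6347, 6699} | set(range(6881, 6890))

def classify(flow: Flow) -> str:
    if flow.src_port in P2P_PORTS or flow.dst_port in P2P_PORTS:
        return "p2p"
    if 80 in (flow.src_port, flow.dst_port) or 443 in (flow.src_port, flow.dst_port):
        return "web"        # may actually be P2P traffic masquerading as HTTP
    return "unknown"

flows = [Flow(51234, 6881, 4_200_000),   # BitTorrent on a default port
         Flow(40000, 80, 3_900_000),     # P2P masquerading as HTTP: misclassified
         Flow(52000, 443, 120_000)]
for f in flows:
    print(f, "->", classify(f))
```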

Journal ArticleDOI
TL;DR: The Visual Backchannel design provides an evolving, interactive, and multi-faceted visual overview of large-scale ongoing conversations on Twitter, and includes visual saliency for what is happening now and what has just happened in the context of the evolving conversation.
Abstract: We introduce the concept of a Visual Backchannel as a novel way of following and exploring online conversations about large-scale events. Microblogging communities, such as Twitter, are increasingly used as digital backchannels for timely exchange of brief comments and impressions during political speeches, sport competitions, natural disasters, and other large events. Currently, shared updates are typically displayed in the form of a simple list, making it difficult to get an overview of the fast-paced discussions as they happen in the moment and of how they evolve over time. In contrast, our Visual Backchannel design provides an evolving, interactive, and multi-faceted visual overview of large-scale ongoing conversations on Twitter. To visualize a continuously updating information stream, we include visual saliency for what is happening now and what has just happened, set in the context of the evolving conversation. As part of a fully web-based coordinated-view system we introduce Topic Streams, a temporally adjustable stacked graph visualizing topics over time, a People Spiral representing participants and their activity, and an Image Cloud encoding the popularity of event photos by size. Together with a post listing, these mutually linked views support cross-filtering along topics, participants, and time ranges. We discuss our design considerations, in particular with respect to evolving visualizations of dynamically changing data. Initial feedback indicates significant interest and suggests several unanticipated uses.

229 citations
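The Topic Streams view described above is, at its core, a stacked graph of topic frequencies over time. The matplotlib sketch below is a rough, static analogue on made-up post counts; the real Visual Backchannel is an interactive, web-based coordinated-view system, and the topic names and counts here are assumptions for illustration only.

```python
# Rough static analogue of the "Topic Streams" view: a stacked graph of
# topic frequencies over time, drawn with matplotlib on made-up data.
import numpy as np
import matplotlib.pyplot as plt

minutes = np.arange(60)                      # one hour of a hypothetical event
topics = {
    "keynote": np.clip(30 - np.abs(minutes - 10), 0, None),
    "outage":  np.clip(25 - np.abs(minutes - 35), 0, None),
    "closing": np.clip(20 - np.abs(minutes - 55), 0, None),
}

fig, ax = plt.subplots(figsize=(8, 3))
ax.stackplot(minutes, list(topics.values()), labels=list(topics.keys()))
ax.set_xlabel("minutes into event")
ax.set_ylabel("posts per minute")
ax.legend(loc="upper left")
plt.tight_layout()
plt.show()
```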


Cited by
Proceedings ArticleDOI
22 Jan 2006
TL;DR: Some of the major results in random graphs and some of the more challenging open problems are reviewed, including those related to the WWW.
Abstract: We will review some of the major results in random graphs and some of the more challenging open problems. We will cover algorithmic and structural questions. We will touch on newer models, including those related to the WWW.

7,116 citations

Book ChapterDOI
30 May 2018
TL;DR: Tata Africa Services (Nigeria) Limited is the nodal point for Tata businesses in West Africa and operates as the hub of Tata operations in Nigeria and the rest of West Africa.
Abstract: Established in 2006, TATA Africa Services (Nigeria) Limited operates as the nodal point for Tata businesses in West Africa. TATA Africa Services (Nigeria) Limited has a strong presence in Nigeria with investments exceeding USD 10 million. The company was established in Lagos, Nigeria as a subsidiary of TATA Africa Holdings (SA) (Pty) Limited, South Africa and serves as the hub of Tata’s operations in Nigeria and the rest of West Africa.

3,658 citations

Proceedings Article
01 Jan 1991
TL;DR: It is concluded that properly augmented and power-controlled multiple-cell CDMA (code division multiple access) promises a quantum increase in current cellular capacity.
Abstract: It is shown that, particularly for terrestrial cellular telephony, the interference-suppression feature of CDMA (code division multiple access) can result in a many-fold increase in capacity over analog and even over competing digital techniques. A single-cell system, such as a hubbed satellite network, is addressed, and the basic expression for capacity is developed. The corresponding expressions for a multiple-cell system are derived, and the distribution on the number of users supportable per cell is determined. It is concluded that properly augmented and power-controlled multiple-cell CDMA promises a quantum increase in current cellular capacity.

2,951 citations
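For context, the capacity expression the abstract refers to is conventionally written in the textbook form below. This is a standard approximation consistent with the abstract rather than a quotation of the paper, so the symbols (voice-activity factor, other-cell interference fraction) should be read as assumptions about the paper's notation.

```latex
% Standard textbook form of the CDMA capacity relation (not quoted from the paper).
% N: users per cell, W: spread bandwidth, R: information rate,
% E_b/N_0: required bit energy to interference density,
% \alpha: voice-activity factor, f: other-cell interference fraction.
\[
  N \;\approx\; 1 + \frac{W/R}{E_b/N_0}\cdot\frac{1}{\alpha}\cdot\frac{1}{1+f}
\]
```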

Journal ArticleDOI
15 May 1996
TL;DR: It is shown that the self-similarity in WWW traffic can be explained based on the underlying distributions of WWW document sizes, the effects of caching and user preference in file transfer, the effect of user "think time", and the superimposition of many such transfers in a local area network.
Abstract: Recently the notion of self-similarity has been shown to apply to wide-area and local-area network traffic. In this paper we examine the mechanisms that give rise to the self-similarity of network traffic. We present a hypothesized explanation for the possible self-similarity of traffic by using a particular subset of wide area traffic: traffic due to the World Wide Web (WWW). Using an extensive set of traces of actual user executions of NCSA Mosaic, reflecting over half a million requests for WWW documents, we examine the dependence structure of WWW traffic. While our measurements are not conclusive, we show evidence that WWW traffic exhibits behavior that is consistent with self-similar traffic models. Then we show that the self-similarity in such traffic can be explained based on the underlying distributions of WWW document sizes, the effects of caching and user preference in file transfer, the effect of user "think time", and the superimposition of many such transfers in a local area network. To do this we rely on empirically measured distributions both from our traces and from data independently collected at over thirty WWW sites.

2,332 citations
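The mechanism the abstract above points to, heavy-tailed transfer sizes superimposed across many sources, has a compact quantitative form worth recording here. The relation below is the standard heavy-tailed ON/OFF result from the self-similarity literature, stated as context rather than quoted from the paper.

```latex
% Standard heavy-tailed ON/OFF result (context, not a quotation from the paper):
% if transfer (ON-period) durations X are heavy-tailed with tail index \alpha,
% the superposition of many such sources is self-similar with Hurst parameter H.
\[
  P[X > x] \sim x^{-\alpha}, \quad 1 < \alpha < 2
  \qquad\Longrightarrow\qquad
  H = \frac{3 - \alpha}{2}, \quad \tfrac{1}{2} < H < 1 .
\]
```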

Journal ArticleDOI
TL;DR: Web usage mining is the application of data mining techniques to discover usage patterns from Web data, in order to understand and better serve the needs of Web-based applications; the paper describes its three phases, preprocessing, pattern discovery, and pattern analysis, in detail.
Abstract: Web usage mining is the application of data mining techniques to discover usage patterns from Web data, in order to understand and better serve the needs of Web-based applications. Web usage mining consists of three phases, namely preprocessing, pattern discovery, and pattern analysis. This paper describes each of these phases in detail. Given its application potential, Web usage mining has seen a rapid increase in interest, from both the research and practice communities. This paper provides a detailed taxonomy of the work in this area, including research efforts as well as commercial offerings. An up-to-date survey of the existing work is also provided. Finally, a brief overview of the WebSIFT system as an example of a prototypical Web usage mining system is given.

2,227 citations
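The preprocessing phase mentioned above typically begins by grouping raw log requests into user sessions. The sketch below shows time-gap sessionization on hypothetical (user, timestamp, URL) records; the 30-minute inactivity timeout is a common convention assumed here, not something prescribed by the survey.

```python
# Sketch of the preprocessing phase of Web usage mining: sessionization of
# request records per user with a 30-minute inactivity timeout (assumed).
from collections import defaultdict

TIMEOUT = 30 * 60  # seconds of inactivity that ends a session

# Hypothetical (user, unix_timestamp, url) records, already cleaned and
# sorted in time order for each user.
records = [
    ("alice", 1_000, "/index.html"),
    ("alice", 1_120, "/papers.html"),
    ("alice", 4_000, "/index.html"),   # gap > 30 min: starts a new session
    ("bob",   1_050, "/index.html"),
]

sessions = defaultdict(list)           # user -> list of sessions (lists of URLs)
last_seen = {}
for user, ts, url in records:
    if user not in last_seen or ts - last_seen[user] > TIMEOUT:
        sessions[user].append([])      # open a new session for this user
    sessions[user][-1].append(url)
    last_seen[user] = ts

for user, user_sessions in sessions.items():
    print(user, user_sessions)
```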