Vivisecting YouTube: An active measurement study

TLDR
The design of the YouTube video delivery system consists of a “flat” video id space, multiple DNS namespaces reflecting a multi-layered logical organization of video servers, and a 3-tier physical cache hierarchy.
Abstract
We deduce key design features behind the YouTube video delivery system by building a distributed active measurement infrastructure and by collecting and analyzing a large volume of video playback logs, DNS mappings and latency data. We find that the design of the YouTube video delivery system consists of three major components: a “flat” video id space, multiple DNS namespaces reflecting a multi-layered logical organization of video servers, and a 3-tier physical cache hierarchy. We also find that YouTube employs a set of sophisticated mechanisms to handle video delivery dynamics such as cache misses and load sharing among its distributed cache locations and data centers.

Frequently Asked Questions (14)
Q1. What are the contributions mentioned in the paper "Vivisecting YouTube: An active measurement study"?

The authors deduce the key design features behind the YouTube video delivery system by collecting and analyzing a large amount of video playback logs, DNS mappings and latency data and by performing additional measurements to verify the findings. Further, YouTube employs a set of sophisticated mechanisms to handle video delivery dynamics such as cache misses and load sharing among its globally distributed cache locations and datacenters. 

Given the traffic volume, geographical span and scale of operations, the design of YouTube’s content delivery infrastructure is perhaps one of the most challenging engineering tasks in recent Internet development.

Their platform uses 471 PlanetLab nodes distributed across 271 geographically dispersed sites (university campuses, organizations, or companies), and 843 open recursive DNS servers located at various ISPs and organizations.

The last two namespaces, cache and altcache, contain 64 DNS names representing 64 logical video servers; they are mapped to the tertiary cache locations in the YouTube physical cache hierarchy. 

About 10 of the primary cache locations are co-located within ISP networks (e.g., Comcast and Bell-Canada), which the authors refer to as non-Google cache locations. 

YouTube defines five (anycast) DNS namespaces, which are organized in multiple layers, each layer representing a collection of logical video servers with certain roles. 

This paper attempts to “reverse-engineer” the YouTube video delivery system through large-scale active measurement, data collection and analysis. 

In particular, when there is a failure on the server side, for example because the file is temporarily unavailable, or because the server is overloaded and fails to retrieve the video from an upstream or back-end server, the emulated player records the HTTP error code sent by the server.

The first two namespaces, lscache and nonxt, contain 192 DNS names representing 192 logical video servers; and as will be shown later, they are mapped to the primary cache locations in the YouTube physical cache hierarchy. 
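To make the relationship between the “flat” video id space and a namespace of logical servers concrete, here is a minimal Python sketch. The namespace sizes (192 and 64) come from the text above; everything else is an assumption made for illustration. In particular, the SHA-1-based mapping and the function name `logical_server_index` are hypothetical and are not claimed to be what YouTube actually uses.

```python
import hashlib

# Namespace sizes as stated in the text. The mapping below is a
# hypothetical illustration, not YouTube's actual scheme.
NAMESPACE_SIZES = {
    "lscache": 192,
    "nonxt": 192,
    "cache": 64,
    "altcache": 64,
}


def logical_server_index(video_id: str, namespace: str) -> int:
    """Map a video id deterministically to one logical server slot.

    Assumption: a uniform hash over the flat id space, reduced modulo
    the number of DNS names in the namespace.
    """
    size = NAMESPACE_SIZES[namespace]
    digest = hashlib.sha1(video_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % size
```

Because the mapping depends only on the video id and the namespace size, any client computes the same logical server for a given video, and the choice of which physical cache location serves it can be left to DNS resolution.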

In the second stage, after resolving the DNS name contained in the URL, their emulated video player connects to the resolved YouTube Flash video server, downloads the video object over HTTP, and records a detailed log of the process.
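The second stage described here (resolve, connect, download, log, and record any HTTP error code) can be sketched in Python as follows. This is an illustrative sketch built on the standard library, not the authors' emulated player; the function names, the log fields, and the 30-second timeout are all assumptions.

```python
import socket
import time
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def resolve(hostname):
    """Return the sorted set of IP addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, 80, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def play_back(url, resolver=resolve, opener=urllib.request.urlopen, chunk=65536):
    """Download a video URL and return a log record (hypothetical field names)."""
    log = {
        "url": url,
        "host": urlsplit(url).hostname,
        "ips": [],
        "status": None,
        "bytes": 0,
        "error": None,
        "elapsed_s": None,
    }
    start = time.monotonic()
    try:
        log["ips"] = resolver(log["host"])
        with opener(url, timeout=30) as resp:
            log["status"] = resp.status
            while True:
                data = resp.read(chunk)
                if not data:
                    break
                log["bytes"] += len(data)
    except urllib.error.HTTPError as exc:
        # Server-side failure: keep the HTTP error code sent by the server.
        log["status"] = exc.code
        log["error"] = str(exc.reason)
    except OSError as exc:
        # DNS failures, refused connections, timeouts, and similar errors.
        log["error"] = str(exc)
    log["elapsed_s"] = time.monotonic() - start
    return log
```

The `resolver` and `opener` parameters are injectable so that the same logging logic can be exercised without network access, or pointed at a specific recursive DNS server in a real measurement setup.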

Through in-depth analysis of their datasets, the authors deduce that YouTube employs a 3-tier physical cache hierarchy with (at least) 38 primary cache locations, 8 secondary and 5 tertiary cache locations. 

The authors selected a set of 85 geographically dispersed PlanetLab nodes as proxy servers and a list of several hundred YouTube videos of varying popularity crawled from the YouTube website.

As described in Section 2.2, their globally distributed measurement platform consists of three key components: i) PlanetLab nodes that are used for crawling the YouTube website, performing DNS resolutions, and YouTube video playback; ii) open recursive DNS servers that provide additional vantage points for DNS resolutions; and iii) emulated YouTube Flash video players running on PlanetLab nodes and two 24-node compute clusters in their lab for downloading and “playing back” YouTube videos and for large-scale data analysis.

Due to the resource constraints on the PlanetLab nodes, the authors cannot run standard browsers directly on them to play those YouTube videos.