Q2. Why is designing YouTube's content delivery infrastructure a challenging engineering task?
Given its traffic volume, geographical span and scale of operations, the design of YouTube’s content delivery infrastructure is perhaps one of the most challenging engineering tasks in recent Internet development.
Q3. How many PlanetLab nodes does the measurement platform use, and where are they located?
Their platform uses 471 PlanetLab nodes distributed across 271 geographically dispersed sites (university campuses, organizations, or companies), plus 843 open recursive DNS servers located at various ISPs and organizations.
Q4. How many logical video servers are mapped to the tertiary cache locations?
The last two namespaces, cache and altcache, contain 64 DNS names representing 64 logical video servers; they are mapped to the tertiary cache locations in the YouTube physical cache hierarchy.
Q5. How many primary cache locations are co-located within ISP networks rather than within Google?
About 10 of the primary cache locations are co-located within ISP networks (e.g., Comcast and Bell Canada); the authors refer to these as non-Google cache locations.
Q6. How does YouTube organize its logical video servers?
YouTube defines five (anycast) DNS namespaces, organized in multiple layers; each layer represents a collection of logical video servers with a particular role.
Q7. What is the purpose of this paper?
This paper attempts to “reverse-engineer” the YouTube video delivery system through large-scale active measurement, data collection and analysis.
Q8. How does the emulated player handle server-side failures?
When a failure occurs on the server side, for example because the file is temporarily unavailable, or because the server is overloaded and fails to retrieve the video from an upstream or back-end server, the emulated player records the HTTP error code sent by the server.
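As an illustration only, the Python sketch below shows how a player might record a server-side HTTP error code in its log instead of aborting. The function name `fetch_video`, the log fields, and the `opener` parameter (added so the sketch can run without network access) are hypothetical; this is not the authors' emulated player.

```python
import urllib.error
import urllib.request
from typing import Callable


def fetch_video(url: str, opener: Callable = urllib.request.urlopen) -> dict:
    """Hypothetical emulated-player step: download a video URL and log the outcome.

    If the server answers with an error status (e.g. the file is temporarily
    unavailable or the server is overloaded), the HTTP error code is stored
    in the log entry rather than raised to the caller.
    """
    entry: dict = {"url": url, "status": None, "bytes": 0, "error": None}
    try:
        with opener(url) as response:
            body = response.read()
            entry["status"] = response.status
            entry["bytes"] = len(body)
    except urllib.error.HTTPError as err:
        # The server replied with a 4xx/5xx status: keep its code and reason.
        # (Connection or DNS failures raise URLError and are not handled here.)
        entry["status"] = err.code
        entry["error"] = err.reason
    return entry
```

With a real URL, `fetch_video(url)` returns a log entry whose `status` is either the success code or the server's error code.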
Q9. How many logical video servers are mapped to the primary cache locations?
The first two namespaces, lscache and nonxt, contain 192 DNS names representing 192 logical video servers; and as will be shown later, they are mapped to the primary cache locations in the YouTube physical cache hierarchy.
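The namespace-to-tier mapping stated in Q4 and Q9 can be captured in a small lookup table. This is a minimal sketch: `NamespaceGroup`, `GROUPS`, and `tier_of` are hypothetical names, only the two groups named in this section are modeled (the remaining namespace is left out), and each count is recorded for the pair of namespaces exactly as the text states it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamespaceGroup:
    namespaces: tuple[str, ...]
    logical_servers: int  # DNS names (logical video servers) as stated for the group
    tier: str             # physical cache tier the group is mapped to


# Only the groups described in this section; the fifth namespace is not modeled.
GROUPS = (
    NamespaceGroup(("lscache", "nonxt"), 192, "primary"),
    NamespaceGroup(("cache", "altcache"), 64, "tertiary"),
)


def tier_of(namespace: str) -> str:
    """Return the cache tier that a namespace's logical servers map to."""
    for group in GROUPS:
        if namespace in group.namespaces:
            return group.tier
    raise KeyError(f"namespace not modeled in this sketch: {namespace}")
```

For example, `tier_of("nonxt")` returns `"primary"` and `tier_of("altcache")` returns `"tertiary"`.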
Q10. What does the emulated player do after resolving the DNS name in the URL?
In the second stage, after resolving the DNS name contained in the URL, the emulated video player connects to the resolved YouTube Flash video server, downloads the video object over HTTP, and records a detailed log of the process.
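The resolve-then-download sequence can be sketched as follows, assuming a hypothetical `download(address, url)` hook that performs the HTTP transfer and returns the number of bytes received. The function name `play_back` and the event labels are illustrative, not taken from the paper; the resolver defaults to the standard `socket.gethostbyname` but can be replaced for offline testing.

```python
import socket
import time
from typing import Callable
from urllib.parse import urlsplit


def play_back(
    url: str,
    download: Callable[[str, str], int],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> list[tuple[float, str, str]]:
    """Hypothetical sketch of the second measurement stage.

    1. Resolve the DNS name contained in the video URL to a server address.
    2. Download the video from that address, logging timestamped events.
    """
    log: list[tuple[float, str, str]] = []
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"URL has no host name: {url}")
    log.append((time.time(), "resolve-start", host))
    address = resolve(host)
    log.append((time.time(), "resolved", address))
    received = download(address, url)
    log.append((time.time(), "download-done", str(received)))
    return log
```

Separating resolution from the transfer keeps the log able to attribute delays to the DNS step or to the video server.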
Q11. How many cache locations does each tier of YouTube's physical cache hierarchy have?
Through in-depth analysis of their datasets, the authors deduce that YouTube employs a 3-tier physical cache hierarchy with (at least) 38 primary, 8 secondary, and 5 tertiary cache locations.
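The tier counts add up to a lower bound of 38 + 8 + 5 = 51 physical cache locations. The sketch below records them in order from the edge inward; the `serving_tier` helper is a generic illustration of hierarchical cache lookup (the first tier that holds the video serves it) and is an assumption, not YouTube's documented redirection policy.

```python
from typing import Optional

# Tiers ordered from the edge (primary) to the back end (tertiary), with the
# minimum number of locations reported for each tier.
TIERS: list[tuple[str, int]] = [("primary", 38), ("secondary", 8), ("tertiary", 5)]


def total_locations() -> int:
    """Lower bound on physical cache locations across all three tiers."""
    return sum(count for _, count in TIERS)


def serving_tier(holds_video: dict[str, bool]) -> Optional[str]:
    """Generic hierarchical lookup (assumed, for illustration): walk the tiers
    from the edge inward and return the first one that holds the video."""
    for tier, _ in TIERS:
        if holds_video.get(tier, False):
            return tier
    return None
```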
Q12. How many PlanetLab proxy nodes and YouTube videos did the authors use?
The authors selected 85 geographically dispersed PlanetLab nodes as proxy servers and a list of several hundred YouTube videos of varying popularity crawled from the YouTube website.
Q13. What are the key components of the authors' measurement platform?
As described in Section 2.2, their globally distributed measurement platform consists of three key components: i) PlanetLab nodes used for crawling the YouTube website, performing DNS resolutions, and YouTube video playback; ii) open recursive DNS servers that provide additional vantage points for DNS resolutions; and iii) emulated YouTube Flash video players running on PlanetLab nodes and on two 24-node compute clusters in their lab, used for downloading and “playing back” YouTube videos and for large-scale data analysis.
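Resolving the same DNS name from many vantage points is what lets such a platform observe different server mappings. The sketch below aggregates those answers; `collect_mappings` and the `resolve_via(vantage_point, name)` hook are hypothetical (in practice the hook might run a query on a PlanetLab node or send it to an open recursive DNS server), and the aggregation itself is an illustration rather than the authors' analysis code.

```python
from collections import defaultdict
from typing import Callable, Iterable


def collect_mappings(
    name: str,
    vantage_points: Iterable[str],
    resolve_via: Callable[[str, str], list[str]],
) -> dict[str, set[str]]:
    """Resolve one DNS name from many vantage points.

    Returns a map from each returned server address to the set of vantage
    points whose resolution produced it.
    """
    seen: dict[str, set[str]] = defaultdict(set)
    for vp in vantage_points:
        for address in resolve_via(vp, name):
            seen[address].add(vp)
    return dict(seen)
```

Addresses seen only from a few vantage points hint at location-dependent mapping, which is the kind of signal a multi-vantage-point design is meant to expose.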
Q14. Why can't the authors run standard browsers on PlanetLab nodes to play YouTube videos?
Due to resource constraints on the PlanetLab nodes, the authors cannot run standard browsers directly on them to play YouTube videos; instead, they use emulated YouTube Flash video players.