The Internet architecture was not designed for delivering large-scale content. Therefore,
with the increase in the popularity of and demand for video on the Internet, video content
providers (CPs) have had to devise a number of solutions to meet the high scalability,
resilience, and performance requirements of video content distribution. In this
thesis we aim to answer the following research questions: (a) how do large-scale content
distribution systems currently work and what problems do they encounter, and (b) how
can we solve those problems both in the short term and in the long term? Towards
this end, this thesis makes the following contributions:
First, we study the original YouTube architecture to understand how a video delivery
system with a small number of large data centers handles scalability and performance challenges.
Specifically, we uncover its use of a location-agnostic, proportional load-balancing
strategy and show how that strategy affects its ISPs (Internet service providers).
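Such a location-agnostic proportional strategy can be sketched as follows; the data-center names and capacities are hypothetical, chosen only to illustrate the mechanism of splitting load by capacity while ignoring client location:

```python
import random

# Hypothetical data-center capacities (arbitrary units); the names are
# illustrative, not actual YouTube sites.
CAPACITY = {"dc-a": 50, "dc-b": 30, "dc-c": 20}

def pick_data_center(rng):
    """Location-agnostic proportional load balancing: each request is sent to
    a data center with probability proportional to its capacity, regardless
    of where the client is located."""
    centers = list(CAPACITY)
    weights = [CAPACITY[c] for c in centers]
    return rng.choices(centers, weights=weights, k=1)[0]

# Over many requests, traffic splits roughly 50/30/20 across the centers,
# so a client may well be served from a distant data center.
rng = random.Random(0)
counts = {c: 0 for c in CAPACITY}
for _ in range(10_000):
    counts[pick_data_center(rng)] += 1
```

Because the client's location never enters the decision, traffic from one region routinely crosses ISP boundaries to reach a far-away data center, which is precisely the ISP-facing effect the study examines.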
Second, we investigate how the more distributed approach employed by the current
YouTube architecture improves the resilience of its delivery system. Using an active measurement study,
we uncover the use of multiple namespaces, a tiered cache hierarchy, and dynamic,
location-aware DNS (Domain Name System) resolution. Although this approach improves resilience and
performance compared to the location-agnostic approach, since YouTube uses its own content
delivery infrastructure, it is likely to encounter scalability challenges as its content size
and popularity increase.
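As a rough sketch of this mechanism (the hostnames and region mappings below are hypothetical, not YouTube's actual ones), a location-aware DNS with a tiered cache hierarchy might resolve a client's request like this:

```python
# Hypothetical mapping from client region to its nearest edge cache;
# the hostnames are illustrative placeholders.
PRIMARY_CACHE = {
    "eu": "edge-eu.example.net",
    "us": "edge-us.example.net",
}
# A higher cache tier, under a separate namespace, used as a fallback.
SECONDARY_CACHE = "tier2.example.net"

def resolve(client_region, edge_overloaded=False):
    """Return the cache hostname a location-aware DNS might hand back:
    the nearest edge cache normally, and a higher cache tier when the
    edge is overloaded or the client's region is unknown."""
    edge = PRIMARY_CACHE.get(client_region)
    if edge is None or edge_overloaded:
        return SECONDARY_CACHE
    return edge
```

The dynamic aspect is that the answer depends on both the client's location and the current load, so failures or overload at an edge cache redirect clients to a deeper tier rather than failing outright.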
Third, to complement the two in-house content distribution architectures, we study
Netflix and Hulu. These services use multiple third-party content delivery networks
(CDNs) to deliver their content. We find that their CDN selection and adaptation
strategies lead to a suboptimal user experience. We then propose inexpensive
measurement-based CDN selection strategies that significantly improve video
streaming quality. Additionally, we find that although the CDNs themselves
might be well designed, the CDN selection mechanism and the “intelligence” of the client
software can be improved upon to provide users with better quality of service.
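A minimal sketch of such a measurement-based selection strategy, assuming a caller-supplied probe function (e.g. one that downloads a small chunk and measures throughput) and hypothetical CDN names:

```python
def select_cdn(measure, cdns):
    """Measurement-based CDN selection: probe each candidate CDN and pick
    the one with the highest measured throughput, instead of sticking to a
    statically assigned CDN regardless of observed performance."""
    throughput = {cdn: measure(cdn) for cdn in cdns}
    return max(throughput, key=throughput.get)

# Hypothetical probe results, in Mbit/s, standing in for real measurements.
probes = {"cdn-a": 3.2, "cdn-b": 7.8, "cdn-c": 5.1}
best = select_cdn(probes.get, ["cdn-a", "cdn-b", "cdn-c"])
```

The probing is cheap relative to a full video session, which is why even a simple strategy like this can noticeably improve streaming quality over a static assignment.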
Finally, building upon the results of these and other recent works on understanding
large-scale content distribution systems, we propose a first step in the direction of an open CDN architecture that allows for better scalability and performance. The two
key ingredients of this proposal are to let any willing ISP participate as a CDN and
to instrument the client software to make decisions based upon measurements. This proposal
is incrementally deployable. It is also economically more sustainable, as it opens up new
sources of revenue for the ISPs. We also provide a proof-of-concept implementation of
this architecture using the PlanetLab infrastructure.