Understanding and improving large-scale content distribution

Adhikari, Vijay Kumar
2012-12-17
2012-09
https://hdl.handle.net/11299/141081
University of Minnesota Ph.D. dissertation. September 2012. Major: Computer science. Advisor: Prof. Zhi-Li Zhang. 1 computer file (PDF); x, 106 pages.

The Internet architecture was not designed for delivering large-scale content. Therefore, as the popularity of and demand for video on the Internet have grown, video content providers (CPs) have had to devise a number of solutions to meet the scalability, resilience, and performance requirements of video content distribution. In this thesis we aim to answer the following research questions: (a) how do large-scale content distribution systems currently work and what problems do they encounter, and (b) how can we solve those problems in both the short term and the long term. To this end, this thesis makes the following contributions: First, we study the original YouTube architecture to understand how a video delivery system with a small number of large data centers handles scalability and performance challenges. Specifically, we uncover the use of a location-agnostic proportional load-balancing strategy and how that strategy affects YouTube's ISPs (Internet service providers). Second, we investigate how the more distributed approach employed by the current YouTube improves the resilience of its delivery system. Using an active measurement study, we uncover the use of multiple namespaces, a tiered cache hierarchy, and dynamic, location-aware DNS (Domain Name System). Although this approach improves resilience and performance compared to the location-agnostic approach, YouTube uses its own content delivery infrastructure and is therefore likely to encounter scalability challenges as its content size and popularity increase. Third, to complement the two in-house content distribution architectures, we study Netflix and Hulu. These services make use of multiple third-party content delivery networks (CDNs) to deliver their content.
We find that their CDN selection and adaptation strategies lead to a suboptimal user experience. We then propose inexpensive measurement-based CDN selection strategies that significantly improve the quality of video streaming. Additionally, we find that although the CDNs themselves might be well designed, the CDN selection mechanism and the “intelligence” of the client software can be improved to provide users with better quality of service. Finally, building upon the results of these and other recent works on understanding large-scale content distribution systems, we propose a first step in the direction of an open CDN architecture that allows for better scalability and performance. The two key ingredients of this proposal are to let any willing ISP participate as a CDN and to instrument client software to make decisions based upon measurements. This proposal is incrementally deployable. It is also economically more sustainable, as it opens up new sources of revenue for the ISPs. We also provide a proof-of-concept implementation of this architecture using the PlanetLab infrastructure.

en-US
Computer science
Thesis or Dissertation