With the rapid growth of content published and shared in the cloud, and increasingly high user demands for quick access to that content over the Internet, existing content distribution systems face a variety of challenges in delivering a fast, scalable, and robust experience to users. Through extensive measurement studies of existing large-scale content distribution systems, we aim to understand the limitations of the current design framework and seek solutions toward an improved end-to-end user experience. Our contributions are three-fold. First, we study one of the largest online services provided by Yahoo!. Using network traces collected at five major Yahoo! data centers, we make the first effort of its kind to understand the traffic dynamics among different back-end data centers within a content provider. Our results reveal the tiered structure of Yahoo!'s back-end data centers. Using inference-based techniques, we separate the different types of traffic in the dataset, including inter-data center traffic and data center-to-client traffic. A deep investigation of the traffic patterns and the correlations among the different traffic types yields important insights for the distribution and replication strategies at the back-end. Second, using two of the largest search services, Bing and Google, as case studies, we conduct extensive active measurement analysis to characterize the role of front-end edge servers in the end-to-end latency of dynamic content distribution. This study highlights the trade-off between user-to-edge last-mile latency and edge-to-data center fetch time when designing placement strategies for front-end edge servers.
Third, to complement the first two studies, one on back-end data centers and the other on front-end edge servers, we study how user characteristics affect overall performance, and how that factor shapes design decisions at the service providers. For this purpose, we collect detailed measurement data from one of the largest search service providers in the US and perform an extensive passive measurement study on it. This study culminates in the design and deployment of an anomaly detection and diagnosis algorithm that proves essential in helping content providers improve the robustness of their systems. In summary, this thesis provides an extensive end-to-end study of existing content distribution systems, from the back-end data centers to the front-end edge servers to user-side characteristics, and of how these entities interplay in driving the overall user experience. Although our study focuses mainly on Yahoo! and search services, we believe its findings and methodologies have important implications for other online services as well, as they share a similar content distribution framework.
University of Minnesota Ph.D. dissertation. April 2013. Major: Computer science. Advisor: Zhi-Li Zhang. 1 computer file (PDF); ix, 98 pages.
Towards a fast, scalable, and robust end-to-end design of content distribution system.
Retrieved from the University of Minnesota Digital Conservancy,
Content distributed via the University of Minnesota's Digital Conservancy may be subject to additional license and use restrictions applied by the depositor.