Kumar, Kushagra
2024-01-05
2024-01-05
2018-08
https://hdl.handle.net/11299/259555
University of Minnesota M.S. thesis. August 2018. Major: Computer Science. Advisor: Haiyang Wang. 1 computer file (PDF); vii, 58 pages + 1 supplementary .zip file.
File synchronization plays an important role in distributing files across internet users. Commercial products, such as Dropbox and Google Drive, have rich data-center components, providing reliable data synchronization across different types of devices. It is, however, known that the synchronization latency of these systems is still far from satisfactory. Peer-to-peer (P2P) file synchronization systems, most notably Resilio Sync, are therefore widely suggested as a way to provide high-performance synchronization in large-scale systems. However, the protocol details of Resilio Sync remain largely unknown to the general public, as do its performance and potential bottlenecks. In this thesis, we aim to understand the framework design and performance bottlenecks of P2P-based file synchronization systems via an initial measurement setup. We present a distributed active measurement framework for deploying the commercial system on the highly distributed PlanetLab test bed and investigating its protocol details and real-world performance. This framework automatically deploys the P2P synchronization system on a distributed test bed and then collects the related trace data, by triggering bash scripts across PlanetLab nodes, to characterize its performance in detail. Our measurement results indicate that Resilio Sync has certain fairness issues. For example, in a typical swarm, 79% of the peers obtain 500 MB of content within 3 minutes, while the remaining 21% of the peers suffer a synchronization latency of over 100 minutes. This problem is mainly due to the lack of a tit-for-tat protocol in the peer selection stage, and it is one of the issues that we try to address in this thesis.
We also summarize the problems observed in our trace data and pinpoint their underlying causes through detailed system analysis.
en
Analyzing Commercial Peer-to-Peer File Synchronization via Distributed Active Measurement
Thesis or Dissertation