As successful applications drive a tremendous increase in the volume of data transfer, efficient and reliable content distribution has become a key issue. Peer-to-peer (P2P) technology has gained popularity as a promising approach to large-scale content distribution because of its self-organization, load balancing, and fault tolerance. Despite these strengths, P2P systems also face challenges in providing performance guarantees, reliability, efficiency, and security. In P2P systems deployed on a large scale, these challenges are harder to address because of the large number of participants, unreliable user behavior, and unexpected situations. This thesis explores solutions to improve the efficiency, robustness, and security of large-scale P2P content distribution systems, focusing on three particular issues: lookup, practical network coding, and secure network coding.
A distributed hash table (DHT) is a structured overlay network service that provides decentralized lookup for mapping objects to locations. This thesis focuses on improving the lookup performance of the Kademlia DHT protocol. Although many studies have proposed DHTs as a means of organizing and locating peers in distributed systems, to the best of my knowledge, Kademlia is the only DHT deployed at Internet scale in the real world. This study evaluates the lookup performance of Kad (a variation of Kademlia), which is deployed in one of the largest P2P file-sharing networks. The measurement study shows that lookup results are not consistent: only 18% of the nodes located by storing lookups and by searching lookups are the same. This lookup inconsistency leads to poor performance and inefficient use of resources during lookups. This study identifies the underlying reasons for the inconsistency and for the poor lookup performance, and proposes solutions that guarantee reliable lookup results while using resources efficiently.
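As a minimal sketch of what lookup consistency means, the following Python fragment ranks nodes by the XOR distance that Kademlia uses and compares the node sets returned by a storing lookup and a searching lookup. The function names, the 4-bit identifiers, and the overlap metric are illustrative assumptions, not the measurement methodology of this study.

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance metric: bitwise XOR of two identifiers."""
    return a ^ b


def closest_nodes(target: int, known_nodes: list[int], k: int) -> list[int]:
    """Return the k known nodes closest to the target in XOR distance."""
    return sorted(known_nodes, key=lambda n: xor_distance(n, target))[:k]


def lookup_overlap(store_result: list[int], search_result: list[int]) -> float:
    """Fraction of nodes found by the storing lookup that the searching
    lookup also found (an assumed, simplified consistency metric)."""
    store_set = set(store_result)
    if not store_set:
        return 0.0
    return len(store_set & set(search_result)) / len(store_set)


# Hypothetical example: two peers with different partial views of the
# network look up the same target key.
target = 0b1010
publisher_view = [0b1000, 0b1011, 0b0010, 0b1110, 0b0111]
searcher_view = [0b1011, 0b1001, 0b0010, 0b1111, 0b0110]

stored_at = closest_nodes(target, publisher_view, k=3)
searched_at = closest_nodes(target, searcher_view, k=3)
overlap = lookup_overlap(stored_at, searched_at)
```

In this toy example the two lookups agree on only one of three nodes, so a search would reach only a third of the nodes that hold the published object.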
This thesis also studies the practicality of network coding for cooperative content distribution. Network coding is a data transmission technique that allows any node in a network to encode and distribute data. In principle it offers reliable and efficient content distribution, but its usefulness is still in dispute because its performance gains and coding overhead in practice remain uncertain. By implementing network coding in a real-world application, this thesis measures the performance and overhead of network coding for content distribution in practice. This study also provides a lightweight yet efficient encoding scheme that allows network coding to deliver improved performance and robustness with negligible overhead.
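To illustrate how any node can encode data, the sketch below implements generic linear network coding over a small prime field: coded blocks are linear combinations of the original blocks, an intermediate node can re-code blocks it holds, and a receiver decodes by Gaussian elimination. It is a textbook illustration, not the lightweight encoding scheme proposed in this thesis; the field size `P`, the function names, and the fixed coefficients (chosen randomly in practice) are assumptions.

```python
P = 257  # small prime field size, assumed for illustration


def encode(blocks, coeffs):
    """Return one coded block: sum_i coeffs[i] * blocks[i] (mod P)."""
    length = len(blocks[0])
    return [sum(c * b[j] for c, b in zip(coeffs, blocks)) % P
            for j in range(length)]


def decode(coeff_rows, coded_blocks):
    """Recover the original blocks by Gauss-Jordan elimination mod P."""
    n = len(coeff_rows)
    rows = [list(c) + list(b) for c, b in zip(coeff_rows, coded_blocks)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col] % P), None)
        if pivot is None:
            raise ValueError("coefficient vectors are linearly dependent")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, P)  # modular inverse (Python 3.8+)
        rows[col] = [(x * inv) % P for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(x - f * y) % P for x, y in zip(rows[r], rows[col])]
    return [row[n:] for row in rows]


# The source splits the content into three blocks and emits coded blocks.
originals = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
coeffs = [[1, 2, 3], [0, 1, 4], [5, 6, 0]]  # fixed here for reproducibility
coded = [encode(originals, c) for c in coeffs]

# An intermediate node re-codes the coded blocks it holds; the matching
# coefficient vector is the same combination of the received vectors.
mix = [2, 5, 1]
recoded_block = encode(coded, mix)
recoded_coeffs = encode(coeffs, mix)

# A receiver decodes from any three linearly independent coded blocks.
recovered = decode([coeffs[0], coeffs[1], recoded_coeffs],
                   [coded[0], coded[1], recoded_block])
```

The elimination step in `decode` and the per-block multiplications in `encode` are where coding overhead arises in this sketch.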
Although network coding is a promising data transmission technique, it also introduces security vulnerabilities by allowing untrusted nodes to produce new encoded data. Network coding is seriously vulnerable to pollution attacks, in which malicious nodes inject corrupted data into a network. Because of the nature of network coding, even a single unfiltered corrupted block can be mixed with correct blocks, propagate widely in the network, and disrupt correct decoding on many nodes. Since blocks are re-coded in transit, traditional hash or signature schemes do not work with network coding. Thus, this thesis introduces a new homomorphic signature scheme that efficiently verifies encoded data on the fly and offers features well suited to P2P content distribution. This scheme can protect network coding from pollution attacks without delaying the download process.
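The homomorphic property that makes on-the-fly verification possible can be sketched with a discrete-logarithm-based homomorphic hash. This is a generic, well-known construction used only for illustration; it is not the signature scheme introduced in this thesis, and the parameters `P`, `Q`, and `G` below are toy values that provide no security.

```python
# Toy parameters (assumptions, insecure): Q is a prime dividing P - 1,
# and every element of G has multiplicative order Q modulo P.
P = 23
Q = 11
G = [2, 3, 4, 6]


def homomorphic_hash(block):
    """H(block) = prod_j G[j] ** block[j] mod P, with symbols in Z_Q."""
    h = 1
    for g, v in zip(G, block):
        h = (h * pow(g, v % Q, P)) % P
    return h


def combine_blocks(blocks, coeffs):
    """Network-coded block: sum_i coeffs[i] * blocks[i] (mod Q)."""
    length = len(blocks[0])
    return [sum(c * b[j] for c, b in zip(coeffs, blocks)) % Q
            for j in range(length)]


def combine_hashes(hashes, coeffs):
    """Expected hash of a coded block, computed from the hashes of the
    original blocks alone: prod_i hashes[i] ** coeffs[i] mod P."""
    h = 1
    for hv, c in zip(hashes, coeffs):
        h = (h * pow(hv, c, P)) % P
    return h


# The source publishes the hashes of the original blocks.
originals = [[1, 2, 3, 4], [5, 6, 7, 8]]
published = [homomorphic_hash(b) for b in originals]

# Any node can check a coded block against its coefficient vector
# without decoding it first.
coeffs = [3, 7]
coded = combine_blocks(originals, coeffs)
valid = homomorphic_hash(coded) == combine_hashes(published, coeffs)

# A polluted block (one symbol altered) fails the same check.
polluted = [(coded[0] + 1) % Q] + coded[1:]
polluted_ok = homomorphic_hash(polluted) == combine_hashes(published, coeffs)
```

Because the check uses only the published hashes and the coefficient vector carried with the block, a node can drop a polluted block before it is mixed into further coded blocks.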