Browsing by Author "Jain, Sourabh"
Creating scalable, efficient and namespace independent routing framework for future networks. (2011-06)
Jain, Sourabh
In this thesis we propose VIRO, a novel, paradigm-shifting approach to network routing and forwarding that is not only highly scalable and robust but also namespace-independent. VIRO provides several advantages over existing network routing architectures: (i) it directly and simultaneously addresses the challenges faced by IP networks and those associated with traditional layer-2 technologies such as Ethernet, while retaining Ethernet's "plug-&-play" feature; (ii) it provides a uniform convergence layer that integrates and unifies the routing and forwarding performed by traditional layer-2 (data link layer) and layer-3 (network layer), as prescribed by the conventional local-area/wide-area network dichotomy and layered architecture; and (iii), perhaps most importantly, it decouples routing from addressing and is therefore namespace-independent. Hence VIRO allows new (global or local) addressing and naming schemes (e.g., HIP or a flat-id namespace) to be introduced into networks without modifying core router/switch functions, and it can easily and flexibly support interoperability between existing and new addressing schemes and namespaces.
In the second part of this thesis, we present the Virtual Ethernet Id Layer (VEIL), a practical realization of the VIRO routing protocol for building large-scale Ethernet networks. VEIL aims to simplify the management of large-scale enterprise networks by keeping manual configuration overhead to a minimum: a new routing node or host device can be plugged into the network without any manual configuration.
VEIL builds on the highly scalable and robust routing substrate provided by VIRO and supports advanced features such as seamless mobility, built-in multipath routing, and fast re-routing around link/node failures, all without requiring specialized topologies. To demonstrate its feasibility, we have built a prototype of VEIL, called veil-click, using the Click Modular Router framework; it can be co-deployed with existing Ethernet switches and requires no changes to the host devices connecting to the network.

Extracting the Textual and Temporal Structure of Supercomputing Logs (2009-06-01)
Jain, Sourabh; Singh, Inderpreet; Chandra, Abhishek; Zhang, Zhi-Li; Bronevetsky, Greg
Supercomputers are prone to frequent faults that adversely affect their performance, reliability and functionality. System logs collected on these systems are a valuable source of information about their operational status and health. However, their massive size, their complexity, and the lack of a standard format make it difficult to automatically extract information that can be used to improve system management. In this work we propose a novel method to succinctly represent the contents of supercomputing logs by using textual clustering to automatically find the syntactic structures of log messages. This information is used to automatically classify messages into semantic groups via an online clustering algorithm. Further, we describe a methodology that uses the temporal proximity between groups of log messages to identify correlated events in the system.
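The template-extraction, online-grouping and temporal-proximity steps described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it swaps in a deliberately simple stand-in that masks tokens looking like numbers or hex values (the `VARIABLE_TOKEN` rule is an assumption), groups messages online by exact template match instead of a real clustering algorithm, and then counts how often two groups occur within a time window. All names (`template_of`, `OnlineTemplateClassifier`, `cooccurrence_counts`) and the sample messages are hypothetical.

```python
import re
from collections import defaultdict

# Assumed masking rule (hypothetical): decimal numbers, dotted numbers such as
# IP addresses, 0x-prefixed hex, and long bare hex strings count as variable fields.
VARIABLE_TOKEN = re.compile(r"0x[0-9a-fA-F]+|\d+(?:\.\d+)*|[0-9a-fA-F]{8,}")


def template_of(message: str) -> str:
    """Replace variable-looking tokens with '*', keeping the message's fixed syntax."""
    return " ".join("*" if VARIABLE_TOKEN.fullmatch(tok) else tok for tok in message.split())


class OnlineTemplateClassifier:
    """Assign each incoming message to a group keyed by its template.

    Simplified stand-in for online clustering: messages with identical
    templates share a group, and an unseen template opens a new group.
    """

    def __init__(self) -> None:
        self.group_ids: dict[str, int] = {}

    def classify(self, message: str) -> int:
        tmpl = template_of(message)
        if tmpl not in self.group_ids:
            self.group_ids[tmpl] = len(self.group_ids)
        return self.group_ids[tmpl]


def cooccurrence_counts(events, window: float) -> dict:
    """Count pairs of distinct groups whose messages occur within `window` seconds.

    `events` is a list of (timestamp, group_id) tuples sorted by timestamp.
    """
    counts = defaultdict(int)
    start = 0
    for i, (t, g) in enumerate(events):
        # Slide the left edge so that events[start:i] lie within `window` before t.
        while events[start][0] < t - window:
            start += 1
        for _, h in events[start:i]:
            if h != g:
                counts[tuple(sorted((g, h)))] += 1
    return dict(counts)


if __name__ == "__main__":
    clf = OnlineTemplateClassifier()
    log = [
        (0.0, "node 17 link error on port 3"),
        (1.5, "node 42 link error on port 1"),
        (2.0, "job 9001 aborted"),
        (100.0, "job 9002 aborted"),
    ]
    events = [(t, clf.classify(msg)) for t, msg in log]
    print(clf.group_ids)
    print(cooccurrence_counts(events, window=5.0))
```

With a 5-second window, the two link-error messages fall into group 0, the two job aborts into group 1, and the pair (0, 1) is counted twice; the abort at t = 100 is too far from the others to contribute.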
We apply the proposed methods to two large, publicly available supercomputing logs and show that our technique achieves nearly perfect accuracy for online log classification and extracts meaningful structural and temporal message patterns that can be used to improve the accuracy of other log analysis techniques.

Failure Classification and Inference in Large-Scale Systems: A Systematic Study of Failures in PlanetLab (2008-04-24)
Jain, Sourabh; Prinja, Rohini; Chandra, Abhishek; Zhang, Zhi-Li
Large-scale distributed systems are prone to frequent failures, which can be caused by a variety of network, hardware, and software problems. Any downtime due to failures, whatever the cause, can lead to large disruptions and huge losses. Identifying the location and cause of a failure is therefore critical for the reliability and availability of such systems, but it is challenging because of their large scale and the variety of possible causes. In this work, we try to understand failures in a large-scale system through a two-step methodology: (i) classifying failures based on their statistical properties, and (ii) using additional monitoring data to explain these failures. We illustrate our methodology through a systematic study of failures in PlanetLab over a 3-month period. Our results show that most of the failures that required restarting a node were of small size and lasted for long durations. We also found that incorporating geographic information into our analysis enabled us to find site-wise correlated failures.
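Step (i) of this methodology, classifying failures by their statistical properties, together with the site-wise correlation finding, can be illustrated with a minimal Python sketch. It is an assumption-laden illustration, not the study's method: it treats one "failure" as a set of node downtime intervals that overlap in time, measures its size (distinct nodes) and duration, labels it against made-up thresholds, and flags sites where two or more nodes went down together. The record format (`NodeFailure`), the function names and the thresholds are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeFailure:
    """One node's downtime interval, in seconds (hypothetical record format)."""
    node: str
    site: str
    start: float
    end: float


@dataclass
class FailureGroup:
    """Node failures whose downtime intervals overlap in time."""
    failures: list

    @property
    def size(self) -> int:
        # Number of distinct nodes that were down in this group.
        return len({f.node for f in self.failures})

    @property
    def duration(self) -> float:
        return max(f.end for f in self.failures) - min(f.start for f in self.failures)

    def correlated_sites(self) -> list:
        # Sites where two or more distinct nodes failed within this group.
        nodes_by_site = defaultdict(set)
        for f in self.failures:
            nodes_by_site[f.site].add(f.node)
        return sorted(site for site, nodes in nodes_by_site.items() if len(nodes) >= 2)


def group_overlapping(failures) -> list:
    """Merge failures with overlapping intervals into groups (sweep by start time)."""
    groups, current, current_end = [], [], float("-inf")
    for f in sorted(failures, key=lambda x: x.start):
        if current and f.start <= current_end:
            current.append(f)
            current_end = max(current_end, f.end)
        else:
            if current:
                groups.append(FailureGroup(current))
            current, current_end = [f], f.end
    if current:
        groups.append(FailureGroup(current))
    return groups


def classify(group, max_small_size=2, min_long_seconds=3600.0):
    """Label a group by size and duration; both thresholds are hypothetical."""
    size_label = "small" if group.size <= max_small_size else "large"
    duration_label = "long" if group.duration >= min_long_seconds else "short"
    return size_label, duration_label


if __name__ == "__main__":
    failures = [
        NodeFailure("n1", "siteA", 0, 7200),
        NodeFailure("n2", "siteA", 100, 7300),
        NodeFailure("n3", "siteB", 20000, 20060),
    ]
    for g in group_overlapping(failures):
        print(g.size, g.duration, classify(g), g.correlated_sites())
```

In this toy input, n1 and n2 overlap and form one small, long group flagged as correlated at siteA, while n3 forms a separate small, short group. A real analysis would choose thresholds from the observed distributions rather than fixing them up front.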
We were also able to explain some failures using error-message information collected by the monitoring nodes, and some of the short-lived failures by transient CPU overloads on machines.

Vivisecting YouTube: An Active Measurement Study (2011-07-11)
Adhikari, Vijay Kumar; Jain, Sourabh; Chen, Yingying; Zhang, Zhi-Li
We build a distributed active measurement infrastructure to uncover the internals of the YouTube video delivery system. We deduce the key design features behind the system by collecting and analyzing a large amount of video playback logs, DNS mappings, and latency data, and by performing additional measurements to verify the findings. We find that the design of the YouTube video delivery system consists of three major components: a "flat" video id space, multiple DNS namespaces reflecting a multi-layered logical organization of video servers, and a 3-tier physical cache hierarchy. Further, YouTube employs a set of sophisticated mechanisms to handle video delivery dynamics such as cache misses and load sharing among its globally distributed cache locations and datacenters.
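The three components named in this abstract can be pictured with a toy Python model. It is not YouTube's actual scheme, which the abstract does not spell out: it assumes a flat video id is hashed (SHA-256, an arbitrary choice) onto a fixed number of logical host names per DNS namespace, one namespace per cache tier, and that a cache miss redirects the request to the next tier. The tier labels, the count of 64 names, the `example.net` host pattern, and the functions `logical_name` and `locate` are all hypothetical.

```python
import hashlib

# Hypothetical tier labels; each tier gets its own (made-up) DNS namespace.
TIERS = ("primary", "secondary", "tertiary")


def logical_name(video_id: str, tier: str, names_per_tier: int = 64) -> str:
    """Map a flat video id onto one of `names_per_tier` logical host names in a tier."""
    digest = hashlib.sha256(video_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % names_per_tier
    return f"v{index}.{tier}.example.net"


def locate(video_id: str, cached: dict) -> tuple:
    """Walk the tiers in order, redirecting to the next tier on each cache miss.

    `cached` maps a tier label to the set of video ids that tier holds.
    Returns (serving host name, list of host names that missed).
    """
    missed = []
    for tier in TIERS:
        host = logical_name(video_id, tier)
        if video_id in cached.get(tier, set()):
            return host, missed
        missed.append(host)
    raise LookupError(f"{video_id!r} is not cached in any tier")


if __name__ == "__main__":
    cached = {"primary": set(), "secondary": {"abc123"}, "tertiary": {"abc123"}}
    host, missed = locate("abc123", cached)
    print(host, missed)
```

Hashing keeps the id space flat: any id maps to a logical name without a lookup table. Reusing the same index in every tier is purely a simplification of this sketch.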