Browsing by Subject "Distributed Systems"

Now showing 1 - 5 of 5

Balancing information mixing and optimality: a framework for robust and efficient distributed decision-making
(2025-01) Khatana, Vivek
Driven by the need for resiliency, scalability, and plug-and-play operation, distributed decision-making is becoming increasingly vital. This thesis develops a comprehensive framework to address the challenges in distributed decision-making, focusing on distributed optimization and control of multi-agent networks. It combines theoretical insights, algorithmic tools, and experimental validation to enhance decision-making in networked dynamical systems (NDS) with applications aimed at advancing clean and renewable energy systems. The proposed solutions emphasize distributed consensus, distributed optimization under practical constraints, and resilience against malicious agents and natural disasters in modern power systems. Achieving consensus in multi-agent systems is a cornerstone of distributed decision-making. The first part of this thesis makes a significant contribution in the development of distributed average consensus algorithms in multi-agent networks. By analyzing the geometry of the ratio consensus algorithm, this thesis introduces a finite-time distributed stopping criterion that guarantees convergence within any specified tolerance, regardless of the dimensionality of the state variables. The approach leverages the monotonicity of network state polytopes indexed by time. Additionally, the thesis presents a finite-time stopping criterion for networks with dynamic interconnection topologies, demonstrating that global maximum and minimum values remain strictly monotonic, even with dynamic links. The practicality of these algorithms is showcased through MATLAB simulations and experiments with Node.js-based agents. To address communication noise in the cyber-physical components of NDS, a resilient average consensus algorithm is proposed. Each agent updates its estimates using local information while incorporating weighted, noise-free initial values during iterations. The resilient average consensus algorithm has a geometric rate of convergence under noiseless conditions and almost-sure consensus under noisy communication. Numerical experiments confirm its effectiveness under varying noise scenarios and parameters. Part two of this thesis introduces a framework for distributed multi-agent optimization problems involving a common decision variable. A novel optimize then agree approach decouples optimization and consensus steps, ensuring disagreement between agents’ estimates remains below a predefined tolerance; existing algorithms do not provide such a guarantee which is required in many engineering scenarios. For the class of distributed optimization problems with local linear equality, inequality, and set constraints, we develop an algorithm that works over directed communication topologies and accrues all the benefits of the Alternating Direction Method of Multipliers approach. The algorithms synthesize distributively, communication overhead remains within a log factor of the lower bound, and guarantees strong convergence properties, achieving global geometric convergence rates for strongly convex and Lipschitz differentiable functions and global sublinear convergence rates for general convex functions. The efficacy of this framework is demonstrated through comparisons with state-of-the-art algorithms on distributed optimization and learning problems. The last part of the thesis focuses on developing methods for clean and renewable energy adoption in power systems. We develop a distributed controller for secondary control in microgrids with grid-forming inverter-based resources (GFM IBRs). The controller uses distributed optimization, enabling decentralized measurements and neighborhood information exchange to achieve voltage regulation and reactive power sharing. Additionally, a framework for distributed detection and isolation of maliciously behaving agents is proposed for resilient power apportioning between distributed energy resources (DERs). To address challenges posed by the absence of power grids in catastrophic events, this thesis introduces a net-load management engine (horizon of viability (HoV) engine) that ensures reliable power supply to critical infrastructure over a given time-horizon by generating cost-optimal portfolios of local generation sources and loads using mixed-integer convex programming. Controller-hardware-in-the-loop (CHIL) platforms validate the proposed secondary controller, the resilient power apportioning protocol, and the HoV engine across diverse DERs and loads, demonstrating the robustness of the developed methods.
Data-aware optimizations for efficient analytics over geo-distributed data
(2024-12) Wolfrath, Joel
Modern applications process large volumes of data, which are analyzed and used to improve user experience, guide business decisions, and drive innovation. This data is increasingly generated and persisted across multiple geographical locations, which presents several challenges for traditional analytics systems designed to operate in a centralized fashion. First, the wide-area network links that connect these locations can be exceedingly slow and less reliable than high-speed networks in the cloud. Second, the ubiquity of smart devices has driven an increase in the volume of data available for processing. Third, several application domains process real-time data and require low-latency responses for queries. Finally, data sovereignty laws can further constrain the ability to transfer data for analysis. New approaches are required to deliver low-latency analytics over large data volumes distributed across the country or the globe. This thesis proposes utilizing data-awareness--the ability to observe properties of the data being processed--to improve the efficiency of geo-distributed analytics systems. If the system has some knowledge of the underlying data distributions, queries can be optimized to reduce latency and improve resource efficiency. For example, we show that identifying devices with similar data distributions can accelerate model training in federated learning. Knowledge of data similarity across geo-distributed data sources can also be exploited to improve wide-area network efficiency and reduce query latency. We also show that inference serving systems can produce higher-accuracy inferences by dynamically selecting a model based on the data. By considering the properties of geo-distributed data sources, systems can optimally navigate trade-offs between resource usage, accuracy, and latency.
Multi-Tenant Geo-Distributed Data Analytics
(2019-07) Jonathan, Albert
Geo-distributed data analytics has gained much interest in recent years due to the need for extracting insights from geo-distributed data. Traditionally, data analytics has been done within a cluster/data center environment. However, analyzing geo-distributed data using existing cluster-based systems typically cannot satisfy the timeliness requirement of most applications and result in wasteful resource consumption due to the fundamental differences of the environments, especially due to the scarce, highly heterogeneous, and dynamic nature of the wide-area resources: compute power and network bandwidth. This thesis addresses the challenges faced by geo-distributed data analytics systems in ensuring high-performance and reliable execution of multiple data analytics applications/queries. Specifically, the focus is on sharing resources across multiple users, applications, and computing frameworks. Sharing resources is attractive as it increases resource utilization and reduces operational cost. However, ensuring high-performance execution of multiple applications in a shared environment is challenging as they may compete for the same resources, especially in a wide-area environment with scarce resources. Furthermore, dynamics such as workload variation, resource variation, stragglers, and failures are inevitable in large-scale distributed systems. These can cause large resource perturbation that significantly affect the performance of query executions. This thesis makes the following contributions. First, we present a resource sharing technique across multiple geo-distributed data analytics frameworks. The main challenge here is how to elastically partition resources while allowing high locality scheduling to each individual framework, which is critical to the execution performance of geo-distributed analytics queries. We then address the problem of how to identify and exploit common executions across multiple queries to mitigate wasteful resource consumption. We demonstrate that traditional multi-query optimization may degrade the overall query execution performance due to its lack of support for network awareness. Finally, we highlight the importance of adaptability in ensuring reliable query execution in the presence of dynamics, both for single and multiple query executions. We propose a systematic approach that can selectively determine which queries to adapt and how to adapt them based on the types of queries, dynamics, and optimization goals.
Optimizing Timeliness, Accuracy, and Cost in Geo-Distributed Data-Intensive Computing Systems
(2016-12) Heintz, Benjamin
Big Data touches every aspect of our lives, from the way we spend our free time to the way we make scientific discoveries. Netflix streamed more than 42 billion hours of video in 2015, and in the process recorded massive volumes of data to inform video recommendations and plan investments in new content. The CERN Large Hadron Collider produces enough data to fill more than one billion DVDs every week, and this data has led to the discovery of the Higgs boson particle. Such large scale computing is challenging because no one machine is capable of ingesting, storing, or processing all of the data. Instead, applications require distributed systems comprising many machines working in concert. Adding to the challenge, many data streams originate from geographically distributed sources. Scientific sensors such as LIGO span multiple sites and generate data too massive to process at any one location. The machines that analyze these data are also geo-distributed; for example Netflix and Facebook users span the globe, and so do the machines used to analyze their behavior. Many applications need to process geo-distributed data on geo-distributed systems with low latency. A key challenge in achieving this requirement is determining where to carry out the computation. For applications that process unbounded data streams, two performance metrics are critical: WAN traffic and staleness (i.e., delay in receiving results). To optimize these metrics, a system must determine when to communicate results from distributed resources to a central data warehouse. As an additional challenge, constrained WAN bandwidth often renders exact computation infeasible. Fortunately, many applications can tolerate inaccuracy, albeit with diverse preferences. To support diverse applications, systems must determine what partial results to communicate in order to achieve the desired staleness-error tradeoff. This thesis presents answers to these three questions--where to compute, when to communicate, and what partial results to communicate--in two contexts: batch computing, where the complete input data set is available prior to computation; and stream computing, where input data are continuously generated. We also explore the challenges facing emerging programming models and execution engines that unify stream and batch computing.
Transaction and data consistency models for cloud applications
(2014-02) Padhye, Vinit A.
The emergence of cloud computing and large-scale Internet services has given rise to new classes of data management systems, commonly referred to as NoSQL systems. The NoSQL systems provide high scalability and availability, however they provide only limited form of transaction support and weak consistency models. There are many applications that require more useful transaction and data consistency models than those currently provided by the NoSQL systems. In this thesis, we address the problem of providing scalable transaction support and appropriate consistency models for cluster based as well as geo-replicated NoSQL systems. The models we develop in this thesis are founded upon the snapshot isolation (SI) model which has been recognized as attractive for scalability. In supporting transactions on cluster-based NoSQL systems, we introduce a notion of decoupled transaction management in which transaction management functions are decoupled from storage system and integrated with the application layer. We present two system architectures based on this concept. In the first system architecture all transaction management functions are executed in a fully decentralized manner by the application processes. The second architecture is based on a hybrid approach in which the conflict detection functions are performed by a dedicated service. Because the SI model can lead to non-serializable transaction executions, we investigate two approaches for ensuring serializability. We perform a comparative evaluation of the two architectures and approaches for guaranteeing serializability and demonstrate their scalability. For transaction management in geo-replicated systems, we propose an SI based transaction model, referred to as causal snapshot isolation (CSI), which provides causal consistency using asynchronous replication. The causal consistency model provides more useful consistency guarantees than the eventual consistency model. We build upon the CSI model to provide an efficient transaction model for partially replicated databases, addressing the unique challenges raised due to partial replication in supporting snapshot isolation and causal consistency. Through experimental evaluations, we demonstrate the scalability and performance of our mechanisms.

University Digital Conservancy

Browse by Subject

Browsing by Subject "Distributed Systems"