Browsing by Subject "locally repairable codes"
Now showing 1 - 1 of 1
- Results Per Page
- Sort Options
Item Data Estimation and Recovery for Distributed Systems(2018-06) Agarwal, AbhishekThis thesis is devoted to a range of data estimation problems in applied mathematics and signal processing motivated by applications in data storage, broadcast transmission, and group testing. We use a common toolset of diverse information theoretic, combinatorial and probabilistic arguments to attack these problems. There are two primary themes in the data estimation problems we look at : 1. Estimation of data when part of the data is lost 2. Data estimation when constraints on input sparsity are known The first estimation problem is well known in classical coding theory. We look at a more recently emerging variant of the classical problem in the context of handling constraints related to data storage and transmission. Large data storage networks encounter frequent server failures which need to be repaired in a cost effective manner. A simple technique of data replication, although inefficient in terms of storage, provides suprisingly low repair costs. Error-correcting codes, on the other hand, are efficient in terms of storage but require large repair cost for single server failures. The trade-off between storage and repair cost has been studied intensively in the past few years due to its significance to big data storage. We study a particular class of codes, called Locally Repairable Codes, that optimize this trade-off for a specific cost metric, locality, that denotes the number of other servers accessed to repair a single failure. For traditional error-correcting codes trade-offs (bounds) between their parameters (minimum distance, code size, and length) have been well studied. Even for Locally Repairable Codes, analogous bounds have been discovered especially for constructions of the codes in a large field. We derive bounds on the size of Locally Repairable Codes for small alphabets. Another issue that is crucial for the design of distributed systems is their resilience to adversarial attacks. We want to guarantee security of the data from an eavesdropper that can access any subset of a fixed number of storage nodes. We analyze the limits and constructions of secure codes for distributed storage systems. Our techniques are readily applicable to a variety of distributed systems such as systems with co-operative local repair (i.e. more than one failures occur simultaneously) and distributed systems represented by arbitrary graphs with vertices corresponding to storage servers. We also study an interesting practical problem related to repairable codes on graphs known as Index Coding. Consider a broadcast transmitter that wants to send a separate stream of data to all the users in a network such that users may have information about data corresponding to some of the other users. We can represent this side information with a graph on the set of users as vertices, where the neighbors of the vertices represents their side information. Our aim is to minimize the broadcast transmissions while satisfying the requirements of each user. For the same graph representing the storage nodes and the users in the distributed storage and Index Coding problems, the optimal linear schemes are known to be dual to each other. We provide a solution to the Index Coding problem in terms of existing graph theoretic quantities, combining several existing notions into a single generalized solution. The second part of the thesis focuses on data estimation problems where we use known sparsity information about the input to reduce the number of measurements required for its estimation. In particular, we focus on the Group Testing problem. In this problem, we want to identify a sparse set of defective items using a small number of tests by exploiting the information about the sparsity of the defectives. Besides its intrinsic appeal as a fundamental estimation problem, this problem has a variety of diverse applications, such as in bioinformatics, wireless communications, and pattern finding. First, we focus on the problem of identifying only the number of defectives (sparsity) with zero-error. We analyze bounds on the number of tests needed to estimate the sparsity. Further, we focus on the problem of identifying the defective set with a small probability of error. There is a large gap in the number of tests required between existing construction schemes and impossibility results when the number of defectives grows linearly with the number of items. We exploit correlation between tests to improve the converse bounds in this case.