Between Dec 19, 2024 and Jan 2, 2025, datasets can be submitted to DRUM but will not be processed until after the break. Staff will not be available to answer email during this period, and will not be able to provide DOIs until after Jan 2. If you are in need of a DOI during this period, consider Dryad or OpenICPSR. Submission responses to the UDC may also be delayed during this time.
 

Data Permutation Recovery from Noisy Data: Error Probability and Privacy

Loading...
Thumbnail Image

Persistent link to this item

Statistics
View Statistics

Journal Title

Journal ISSN

Volume Title

Title

Data Permutation Recovery from Noisy Data: Error Probability and Privacy

Published Date

2024-05

Publisher

Type

Thesis or Dissertation

Abstract

This doctoral thesis investigates the data permutation recovery problem from noisy observations, a fundamental challenge at the intersection of data science and privacy-preserving technologies. The main question is: Given a noisy observation of data, according to which permutation was the original data sorted? This thesis addresses both the exact and approximate permutation recovery problems under various noise conditions. It begins by formulating the permutation recovery problem within the statistical hypothesis testing framework, and characterizes the linear regime, where the true permutation can be optimally estimated by only a linear transformation of the observation and a sorting operation. In particular, under Gaussian data and noise, this thesis derives the necessary and sufficient conditions for the linear regime in terms of the noise covariance matrix. Subsequently, this thesis shifts the focus to the examination of the error probability associated with linear decoders in the presence of Gaussian noise, but arbitrary data distribution. This analysis reveals the noise-dominated nature of the permutation recovery problem, illustrating how the error probability scales in both low- and high-noise regimes, and providing insights into the behavior of the error probability under different noise and data distributions. Specifically, the error probability of the permutation recovery problem grows linearly in the noise standard deviation σ in the low-noise regime, implying the noise-dominated nature of the problem. Advancing into the realm of data privacy, the thesis explores the private rank- ing recovery problem, aiming to establish fundamental trade-offs between estimation accuracy and privacy. Leveraging differential privacy metrics, it evaluates the effectiveness of various privacy-preserving mechanisms while ensuring the fidelity of the ranking recovery. Regarding the generalized permutation recovery problem, this thesis also proposes an approximate version of it. It demonstrates that this approximate version leads to an error probability having sub-linear behavior in σ in the low-noise regime, which is a notable difference with respect to the exact recovery. This finding highlights the potential of approximate recovery in enhancing the robustness and efficiency of permutation recovery. In summary, the thesis contributes significantly to our understanding of permutation recovery in noisy and privacy-sensitive environments. The thesis not only advances theoretical foundations but also provides practical strategies for tackling the challenges inherent in recovering data permutations, thereby offering valuable perspectives for advancements in data processing and privacy-preserving techniques.

Description

University of Minnesota Ph.D. dissertation.May 2024. Major: Electrical/Computer Engineering. Advisor: Martina Cardone. 1 computer file (PDF); xi, 165 pages.

Related to

Replaces

License

Collections

Series/Report Number

Funding information

Isbn identifier

Doi identifier

Previously Published Citation

Other identifiers

Suggested citation

Jeong, Minoh. (2024). Data Permutation Recovery from Noisy Data: Error Probability and Privacy. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/264317.

Content distributed via the University Digital Conservancy may be subject to additional license and use restrictions applied by the depositor. By using these files, users agree to the Terms of Use. Materials in the UDC may contain content that is disturbing and/or harmful. For more information, please see our statement on harmful content in digital repositories.