Sparsity control for robustness and social data analysis.

The information explosion propelled by the advent of personal computers, the Internet, and the global-scale communications has rendered statistical learning from data increasingly important for analysis and processing. The ability to mine valuable information from unprecedented volumes of data will facilitate preventing or limiting the spread of epidemics and diseases, identifying trends in global financial markets, protecting critical infrastructure including the smart grid, and understanding the social and behavioral dynamics of emergent social-computational systems. Along with data that adhere to postulated models, present in large volumes of data are also those that do not – the so-termed outliers. This thesis contributes in several issues that pertain to resilience against outliers, a fundamental aspect of statistical inference tasks such as estimation, model selection, prediction, classification, tracking, and dimensionality reduction, to name a few. The recent upsurge of research toward compressive sampling and parsimonious signal representations hinges on signals being sparse, either naturally, or, after projecting them on a proper basis. The present thesis introduces a neat link between sparsity and robustness against outliers, even when the signals involved are not sparse. It is argued that controlling sparsity of model residuals leads to statistical learning algorithms that are computationally affordable and universally robust to outlier models. Even though focus is placed first on robustifying linear regression, the universality of the developed framework is highlighted through diverse generalizations that pertain to: i) the information used for selecting the sparsity-controlling parameters; ii) the nominal data model; and iii) the criterion adopted to fit the chosen model. Explored application domains include preference measurement for consumer utility function estimation in marketing, and load curve cleansing – a critical task in power systems engineering and management. Finally, robust principal component analysis (PCA) algorithms are developed to extract the most informative low-dimensional structure, from (grossly corrupted) high-dimensional data. Beyond its ties to robust statistics, the developed outlier-aware PCA framework is versatile to accommodate novel and scalable algorithms to: i) track the low-rank signal subspace as new data are acquired in real time; and ii) determine principal components robustly in (possibly) infinite-dimensional feature spaces. Synthetic and real data tests corroborate the effectiveness of the proposed robust PCA schemes, when used to identify aberrant responses in personality assessment surveys, as well as unveil communities in social networks, and intruders from video surveillance data.

Keywords

Big data

Compressed sensing

Electrical Engineering

Description

University of Minnesota Ph.D. dissertation. May 2012. Major: Electrical Engineering. Advisor: Professor Georgios B. Giannakis. 1 computer file (PDF); ix, 126 pages, appendices p. 110 115.

Collections

Dissertations

Suggested citation

Mateos Buckstein, Gonzalo. (2012). Sparsity control for robustness and social data analysis.. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/129576.

Content distributed via the University Digital Conservancy may be subject to additional license and use restrictions applied by the depositor. By using these files, users agree to the Terms of Use. Materials in the UDC may contain content that is disturbing and/or harmful. For more information, please see our statement on harmful content in digital repositories.

University Digital Conservancy

Sparsity control for robustness and social data analysis.

View/Download File

Persistent link to this item

Statistics

Journal Title

Journal ISSN

Volume Title

Title

Alternative title

Authors

Published Date

Publisher

Type

Abstract

Keywords

Description

Related to

Replaces

License

Collections

Series/Report Number

Funding information

Isbn identifier

Doi identifier

Previously Published Citation

Other identifiers

Suggested citation

University Digital Conservancy

University of Minnesota Twin Cities

Sparsity control for robustness and social data analysis.

View/Download File

Persistent link to this item

Statistics

Journal Title

Journal ISSN

Volume Title

Title

Alternative title

Authors

Published Date

Publisher

Type

Abstract

Keywords

Description

Related to

Replaces

License

Collections

Series/Report Number

Funding information

Isbn identifier

Doi identifier

Previously Published Citation

Other identifiers

Suggested citation