Browsing by Subject "Sparsity"
Now showing 1 - 7 of 7
Item Leveraging Sparsity and Low Rank for Large-Scale Networks and Data Science (2015-05)
Mardani, Morteza
We live in an era of "data deluge," with pervasive sensors collecting massive amounts of information on every bit of our lives, churning out enormous streams of raw data in a wide variety of formats. While big data may bring "big blessings," there are formidable challenges in dealing with large-scale datasets. The sheer volume of data often makes it impossible to run analytics using central processors and storage units. Network data are also often geographically spread, and collecting the data might be infeasible due to communication costs or privacy concerns. The disparate origin of data also renders datasets incomplete, so that a sizable portion of entries are missing. Moreover, large-scale data are prone to contain corrupted measurements and communication errors, and may even suffer from anomalies due to cyberattacks. In addition, as many sources continuously generate data in real time, analytics must often be performed online and without an opportunity to revisit past data. Last but not least, due to their variety, data are typically indexed by multiple dimensions. Toward our vision of facilitating learning, this thesis contributes to coping with these challenges by leveraging the low intrinsic dimensionality of data by means of sparsity and low rank. To build a versatile model capturing various data irregularities, the thesis focuses first on a low-rank plus compressed-sparse matrix model, which proves successful in unveiling traffic anomalies in backbone networks. Leveraging the nuclear and ℓ1-norms, exact reconstruction guarantees are established for a convex estimator of the unknowns. Inspired by the crucial task of network traffic monitoring, the scope of this model and recovery task is broadened to a tomographic task of jointly mapping out nominal and anomalous traffic from undersampled linear measurements. Despite the success of nuclear-norm minimization in capturing the low dimensionality of data, it scales very poorly with the data size, mainly due to its tangled nature; this hinders decentralized and streaming analytics. To mitigate this computational challenge, the thesis puts forth a neat framework that draws on a bilinear characterization of the nuclear norm to gain separability at the expense of non-convexity. Notwithstanding, it is proven that under certain conditions the stationary points of the non-convex program coincide with the optimum of the convex counterpart. Using this idea along with the theory of alternating minimization, we develop lightweight algorithms with low communication overhead for in-network processing, as well as provably convergent online algorithms suitable for streaming analytics. All in all, the major innovative claim is that even under the constraints of distributed computation and sequential acquisition, one can achieve the accurate reconstruction guarantees offered by batch nuclear-norm minimization. Finally, inspired by the k-space data interpolation task arising in dynamic magnetic resonance imaging, a novel tensor subspace learning framework is introduced to handle streaming multidimensional data. It capitalizes on the PARAFAC decomposition and effects low tensor rank by means of Tikhonov regularization, which enjoys separability and offers real-time MRI reconstruction tailored to, e.g., image-guided radiation therapy applications.
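To make the low-rank plus sparse model concrete: a common convex formulation (a generic baseline, not the thesis' distributed or online algorithms) estimates a low-rank component L and a sparse component S from data X by minimizing 0.5*||X - L - S||_F^2 + lam_nuc*||L||_* + lam_l1*||S||_1. A minimal alternating proximal-gradient sketch in Python follows; the regularization weights and step size are illustrative placeholders.

```python
import numpy as np

def svt(M, tau):
    """Singular-value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def soft(M, tau):
    """Entrywise soft-thresholding: proximal operator of tau * l1-norm."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def lowrank_plus_sparse(X, lam_nuc=1.0, lam_l1=0.1, n_iter=200, step=0.5):
    """Alternating proximal-gradient steps on
    0.5*||X - L - S||_F^2 + lam_nuc*||L||_* + lam_l1*||S||_1."""
    L, S = np.zeros_like(X), np.zeros_like(X)
    for _ in range(n_iter):
        L = svt(L - step * (L + S - X), step * lam_nuc)   # prox step in L
        S = soft(S - step * (S + L - X), step * lam_l1)   # prox step in S
    return L, S
```

Singular-value thresholding is the proximal operator of the nuclear norm, which is what makes this block-wise splitting work.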
Item Leveraging sparsity for genetic and wireless cognitive networks (2013-08)
Bazerque, Juan Andres
Sparse graphical models can capture the uncertainty of interconnected systems while promoting parsimony and simplicity, two attributes that can be utilized to identify the topology of networks and to control processes defined on them. This thesis advocates such models in the context of learning the structure of gene-regulatory networks, for which it is argued that single nucleotide polymorphisms can be seen as perturbation data that are critical to identifying edge directionality. Applied to the immune-related gene network, these models facilitate the discovery of new regulation pathways. Learning gene-regulating interactions is critical not only to understanding how cells differentiate and behave, but also to deciphering the mechanisms triggering diseases with a genetic component. The impact here is on the development of a new generation of drugs designed to target specific genes. In particular, the genetic interactions of an uncharacterized chemical compound are identified by comparing its effect on the fitness of Saccharomyces cerevisiae (yeast) to that of double-deletion knockouts. As drug targeting is limited by expensive and time-consuming laboratory tests, a judicious design of experiments is instrumental in reducing the required number of diagnostic mutant strains. During in-vitro experiments with 82 test drugs, an orderly data reduction of 30% was shown possible without altering the identification of the primary chemical-genetic interactions. Sparsity in wireless cognitive networks emerges due to the geographical distribution of sources, and also due to the scarcity of the radio-frequency spectrum used for transmission. In this context, sparsity is leveraged for mapping the interference temperature across space while identifying unoccupied frequency bands. This is achieved by a novel so-termed nonparametric basis pursuit (NBP) method, which entails a basis expansion model with coefficients belonging to a function space. The spatial awareness markedly impacts spectral efficiency, especially when cognitive radios collaborate to reach consensus in a decentralized manner. Tested in a simulated communication setting, NBP successfully captures both shadowing and path-loss effects. In additional tests with real-field RF measurements, the spectrum maps reveal the frequency bands utilized for transmission as well as the positions of the sources. Finally, a blind NBP alternative is introduced to yield a Bayesian nuclear-norm regularization approach for matrix completion. In this context, it becomes possible to incorporate prior covariance information, which enables smoothing and prediction. Blind NBP can be further applied to impute missing entries of third- or higher-order data arrays (tensors). These attractive features of blind NBP are illustrated for network flow prediction and for the imputation of missing entries in three-way ribonucleic-acid (RNA) sequencing arrays and magnetic-resonance-imaging (MRI) tensors.
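The matrix-completion connection can be illustrated with the basic nuclear-norm-regularized estimator that blind NBP builds on (the thesis' Bayesian, covariance-informed version is more elaborate). A minimal proximal-gradient sketch, with hypothetical parameter values and a 0/1 observation mask:

```python
import numpy as np

def complete_matrix(X_obs, mask, lam=1.0, n_iter=300):
    """Nuclear-norm-regularized matrix completion:
    min_Z 0.5*||mask * (X_obs - Z)||_F^2 + lam*||Z||_*
    via proximal gradient (iterative singular-value thresholding)."""
    Z = np.zeros_like(X_obs)
    for _ in range(n_iter):
        Y = Z - mask * (Z - X_obs)                 # gradient step on the data fit
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        Z = (U * np.maximum(s - lam, 0.0)) @ Vt    # nuclear-norm prox (SVT)
    return Z
```

A unit step size is safe here because the masking operator has Lipschitz constant one.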
Item Non-Convex Phase Retrieval Algorithms and Performance Analysis (2018-04)
Wang, Gang
High-dimensional signal estimation plays a fundamental role in various science and engineering applications, including optical and medical imaging, wireless communications, and power system monitoring. The ability to devise solution procedures that maintain high computational and statistical efficiency will facilitate increasing the resolution and speed of lensless imaging, identifying artifacts in products intended for military or national security, and protecting critical infrastructure including the smart power grid. This thesis contributes in both theory and methods to the fundamental problem of phase retrieval of high-dimensional (sparse) signals from magnitude-only measurements. Our vision is to leverage exciting advances in non-convex optimization and statistical learning to devise algorithmic tools that are simple, scalable, and easy to implement, while being computationally and statistically (near-)optimal. Phase retrieval is approached from a non-convex optimization perspective. To gain statistical and computational efficiency, the magnitude data (instead of the intensities) are fitted under the least-squares or maximum-likelihood criterion, which leads to optimization models that trade off smoothness for 'low-order' non-convexity. To solve the resultant challenging non-convex and non-smooth optimization, the thesis introduces a two-stage algorithmic framework termed amplitude flow. The amplitude flows start with a careful initialization, which is subsequently refined by a sequence of regularized gradient-type iterations. Both stages are lightweight, and they scale well with problem dimensions. Due to the highly non-convex landscape, judicious gradient regularization techniques such as trimming (i.e., truncation) and iterative reweighting are devised to boost the exact phase recovery performance. It is shown that successive iterates of the amplitude flows provably converge to the global optimum at a geometric rate, corroborating their efficiency in terms of computational, storage, and data resources. The amplitude flows are also demonstrated to be stable vis-a-vis additive noise. Sparsity plays an instrumental role in many scientific fields, which has led to the upsurge of research referred to as compressive sampling. In diverse applications, the signal is naturally sparse or admits a sparse representation after some known and deterministic linear transformation. This thesis also accounts for phase retrieval of sparse signals by putting forth sparsity-cognizant amplitude flow variants. Although the analysis, comparisons, and corroborating tests in this thesis focus on non-convex phase retrieval, a succinct overview of other areas is provided to highlight the universality of the novel algorithmic framework to a number of intriguing future research directions.
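A hedged sketch of the two-stage recipe the abstract describes, for the real-valued case: a spectral-type initialization (one of several used in this literature) followed by truncated gradient iterations on the amplitude-based least-squares loss f(z) = (1/2m) * sum_i (|a_i'z| - psi_i)^2, given psi = |A x|. The truncation rule, step size, and threshold below are illustrative choices, not necessarily the thesis' exact algorithm; recovery is only up to a global sign.

```python
import numpy as np

def amplitude_flow(A, psi, n_iter=500, mu=0.6, gamma=0.7):
    """Truncated-amplitude-flow-style sketch: recover x from psi = |A @ x|."""
    m, n = A.shape
    # Spectral-type initialization: leading eigenvector of a weighted data
    # matrix, scaled to match the average measured energy.
    Y = (A * (psi ** 2)[:, None]).T @ A / m
    _, V = np.linalg.eigh(Y)                 # eigenvalues in ascending order
    z = V[:, -1] * np.sqrt(np.mean(psi ** 2))
    for _ in range(n_iter):
        Az = A @ z
        # Trimming: keep only measurements whose sign estimate sign(a_i'z)
        # is trustworthy, i.e., |a_i'z| is not too small relative to psi_i.
        keep = np.abs(Az) >= psi / (1.0 + gamma)
        grad = A[keep].T @ (Az[keep] - psi[keep] * np.sign(Az[keep])) / m
        z = z - mu * grad                    # gradient step on the kept terms
    return z
```

With i.i.d. Gaussian rows of A and enough measurements, iterations of this type are the setting in which the abstract's geometric-rate convergence guarantees are stated.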
Item Robust hybrid linear modeling and its applications (2012-08)
Wang, Yi
Hybrid Linear Modeling (HLM) uses a set of affine subspaces to model data and has been widely used in computer vision. However, many segmentation algorithms need to know d and K a priori. Therefore, determining the dimension d and the number K of subspaces is an important problem in HLM. In this manuscript, we suggest two automatic ways to empirically find d and K. One obtains a local estimate of the dimension by examining the geometric structure of a neighborhood. The other finds K, or both d and K, by detecting the "elbow" of the least-squares error. We provide a partial justification of the elbow method for special cases. We also demonstrate the accuracy and speed of our methods on synthetic and real hybrid linear data. Another challenge in HLM is dealing with highly corrupted data. We study the related problems of denoising images corrupted by impulsive noise and blind inpainting (i.e., inpainting when the deteriorated region is unknown). Our basic approach is to model the set of patches of pixels in an image as a union of low-dimensional subspaces, corrupted by sparse but perhaps large-magnitude noise. For this purpose, we develop a robust and iterative method for single-subspace modeling and extend it to an iterative algorithm for modeling multiple subspaces. We prove convergence for both algorithms and carefully compare our methods with other recent ideas for such robust modeling. We demonstrate state-of-the-art performance of our method for both imaging problems.
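One generic way to detect the "elbow" of a decreasing least-squares error curve (a common heuristic, not necessarily the exact criterion justified in the manuscript) is to pick the point farthest from the chord joining the curve's endpoints, after normalizing both axes:

```python
import numpy as np

def elbow_index(errors):
    """Index of the 'elbow' of a decreasing error curve: the point farthest
    from the chord joining the first and last points, with both axes
    normalized to [0, 1] so the distance is scale-free."""
    e = np.asarray(errors, dtype=float)
    x = np.linspace(0.0, 1.0, len(e))
    y = (e - e.min()) / (e.max() - e.min())
    d = np.array([x[-1] - x[0], y[-1] - y[0]])
    d /= np.linalg.norm(d)                       # unit vector along the chord
    px, py = x - x[0], y - y[0]
    dist = np.abs(px * d[1] - py * d[0])         # perpendicular distance
    return int(np.argmax(dist))

# Hypothetical usage: errs[k] = least-squares fitting error with k+1 subspaces
# K_hat = elbow_index(errs) + 1
```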
Item Signal approximation via the Gopher Fast Fourier Transform (University of Minnesota. Institute for Mathematics and Its Applications, 2010-07)
Segal, I. Ben; Iwen, M.A.

Item Sparse spectrum fitting in array processing (2013-02)
Zheng, Jimeng
In this thesis, we focus on the application of sparsity-encouraging regularization techniques to the problem of Direction-Of-Arrival (DOA) estimation using sensor arrays. By developing sparse representation models for the spatial covariance matrix of correlated or uncorrelated sources, respectively, the DOA estimation problem is reformulated under the framework of Sparse Signal Reconstruction (SSR). The ℓ1-norm regularization technique is employed to formulate sparsity-exploiting DOA estimation algorithms, which can estimate DOAs and signal powers simultaneously. The algorithm specialized for uncorrelated sources, Sparse Spectrum Fitting (SpSF), is attractive for its computational complexity, resolution capability, low large-error threshold, and robustness with respect to source correlation (when combined with spatial smoothing and forward-backward smoothing). Despite the similarity between the formulation of SpSF and ordinary SSR algorithms such as Lasso [1], SpSF can perform much better than predicted by the existing theoretical results in the SSR literature, because of the extra positivity constraints in its formulation (which are included since the optimization variables of SpSF represent signal powers). Thus, we begin this thesis by developing and justifying its formulations. This is followed by a discussion of the existence and uniqueness of the solutions of SpSF, which provides an explicit formula for the maximum number of sources whose DOAs can be reliably estimated by SpSF. Although it is hard to directly relate this maximum number of sources to the number of sensors M, we recognize that this maximum number is actually the degree of freedom of the co-array and, as an exception, we prove that with Uniform Linear Arrays (ULAs) SpSF can work with M-1 sources. Further, by combining with the uniqueness result, the estimation consistency of SpSF with respect to infinitely many snapshots and sensors is obtained. Another focus of this thesis is the selection of the regularization parameter (RP) of SpSF. Inspired by the theoretical analysis in the first part of this thesis, the Diagonal Loading technique is used to significantly reduce the sensitivity of SpSF with respect to its RP, which further enables the control of such sensitivity. Based on the Discrepancy Principle, an iterative version of SpSF, Iterative-SpSF (I-SpSF), is formulated as a parameter-free optimization algorithm and shown to achieve similar or even better performance than SpSF using exhaustively searched but fixed RPs. Following that, by analyzing SpSF's probability of perfect support recovery, an upper bound on this probability is proposed. This upper bound can accurately approximate the probability of perfect DOA estimation by SpSF (perfect DOA estimates are possible only when the true DOAs belong to the grid of candidate directions). Based on this upper bound and a Monte Carlo evaluation process, an automatic and data-adaptive RP selector is proposed for the purpose of DOA estimation when the number of snapshots is finite. This technique is robust with respect to its parameters and can help SpSF achieve almost the same or even better performance than exhaustively searched, fixed RPs for uncorrelated and correlated sources, respectively. Extensions of SpSF to the cases of off-grid DOAs and moving sources are considered, and two algorithms, SpSF with Modeling Uncertainty (SpSFMU) and Persistently Active Block Sparsity (PABS), are proposed for the two cases, respectively. By using extra variables to parameterize the modeling errors caused by off-grid DOAs, SpSFMU is formulated as a convex optimization problem that can provide continuous DOA estimates. In the presence of off-grid DOAs, SpSFMU can be used either to improve the estimation performance of SpSF or to reduce its computational complexity. In contrast, PABS is introduced as a general estimator for inconsistent but persistently active sparse models. PABS is formulated using a novel objective function that promotes block-level sparsity and element-level sparsity simultaneously. Then, on the basis of the sparse representation model of array snapshots, PABS is applied to the DOA estimation of moving sources and shown to exhibit significant performance improvement over C-HiLasso and SPICE. Extensive simulation results are included in each chapter to illustrate the effectiveness and performance of the presented algorithms and theoretical analyses.
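The SpSF formulation for uncorrelated sources can be sketched as a nonnegative sparse fit of the vectorized sample covariance over a grid of candidate directions: min_{p>=0} ||vec(R_hat) - Phi p||^2 + lam * sum(p), where column k of Phi is vec(a_k a_k^H) for steering vector a_k. The sketch below uses projected gradient descent; the noise-power term is omitted for brevity, and the grid, lam, and iteration count are illustrative:

```python
import numpy as np

def spsf_spectrum(R_hat, sensor_pos, grid_deg, lam=0.1, n_iter=2000):
    """Simplified SpSF sketch: nonnegative sparse spectrum fit of vec(R_hat).
    sensor_pos: linear-array positions in half-wavelengths."""
    theta = np.deg2rad(np.asarray(grid_deg))
    A = np.exp(1j * np.pi * np.outer(sensor_pos, np.sin(theta)))  # steering mtx
    Phi = np.stack([np.outer(a, a.conj()).ravel() for a in A.T], axis=1)
    r = R_hat.ravel()
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2          # safe gradient step
    p = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = 2 * np.real(Phi.conj().T @ (Phi @ p - r)) + lam
        p = np.maximum(p - step * grad, 0.0)          # project onto p >= 0
    return p   # peaks of p indicate estimated DOAs; values estimate powers
```

Because the powers are constrained nonnegative, the ℓ1 penalty reduces to the linear term lam * sum(p), which is why the positivity constraints change the behavior relative to an ordinary Lasso.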
Item Sparsity control for robustness and social data analysis (2012-05)
Mateos Buckstein, Gonzalo
The information explosion propelled by the advent of personal computers, the Internet, and global-scale communications has rendered statistical learning from data increasingly important for analysis and processing. The ability to mine valuable information from unprecedented volumes of data will facilitate preventing or limiting the spread of epidemics and diseases, identifying trends in global financial markets, protecting critical infrastructure including the smart grid, and understanding the social and behavioral dynamics of emergent social-computational systems. Along with data that adhere to postulated models, present in large volumes of data are also those that do not: the so-termed outliers. This thesis contributes to several issues that pertain to resilience against outliers, a fundamental aspect of statistical inference tasks such as estimation, model selection, prediction, classification, tracking, and dimensionality reduction, to name a few. The recent upsurge of research toward compressive sampling and parsimonious signal representations hinges on signals being sparse, either naturally or after projecting them onto a proper basis. The present thesis introduces a neat link between sparsity and robustness against outliers, even when the signals involved are not sparse. It is argued that controlling the sparsity of model residuals leads to statistical learning algorithms that are computationally affordable and universally robust to outlier models. Even though focus is placed first on robustifying linear regression, the universality of the developed framework is highlighted through diverse generalizations that pertain to: i) the information used for selecting the sparsity-controlling parameters; ii) the nominal data model; and iii) the criterion adopted to fit the chosen model. Explored application domains include preference measurement for consumer utility function estimation in marketing, and load curve cleansing, a critical task in power systems engineering and management. Finally, robust principal component analysis (PCA) algorithms are developed to extract the most informative low-dimensional structure from (grossly corrupted) high-dimensional data. Beyond its ties to robust statistics, the developed outlier-aware PCA framework is versatile enough to accommodate novel and scalable algorithms that: i) track the low-rank signal subspace as new data are acquired in real time; and ii) determine principal components robustly in (possibly) infinite-dimensional feature spaces. Synthetic and real data tests corroborate the effectiveness of the proposed robust PCA schemes when used to identify aberrant responses in personality assessment surveys, as well as to unveil communities in social networks and intruders from video surveillance data.
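The residual-sparsity idea at the heart of this abstract admits a compact sketch for linear regression: model outliers as a sparse vector o and control its sparsity with an ℓ1 penalty, min_{beta, o} ||y - X beta - o||^2 + lam * ||o||_1. A minimal alternating-minimization sketch, with an illustrative lam (the thesis also addresses how to select such parameters):

```python
import numpy as np

def robust_regression(X, y, lam=1.0, n_iter=100):
    """Outlier-robust linear regression via sparsity-controlled residuals:
    min_{beta, o} ||y - X beta - o||^2 + lam * ||o||_1,
    solved by alternating least squares and soft-thresholding."""
    o = np.zeros(len(y))
    for _ in range(n_iter):
        # beta-update: ordinary least squares on the outlier-cleansed data
        beta, *_ = np.linalg.lstsq(X, y - o, rcond=None)
        # o-update: soft-threshold the residuals (prox of lam*||.||_1)
        r = y - X @ beta
        o = np.sign(r) * np.maximum(np.abs(r) - lam / 2.0, 0.0)
    return beta, o
```

Larger lam declares fewer observations to be outliers; as lam grows without bound, o vanishes and the estimate reduces to ordinary least squares.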