We live in an era of "data deluge," with pervasive sensors collecting massive amounts of information on every aspect of our lives, churning out enormous streams of raw data in a wide variety of formats. While big data may bring "big blessings," there are formidable challenges in dealing with large-scale datasets. The sheer volume of data often makes it impossible to run analytics using central processors and storage units. Network data are also often geographically spread, and collecting them may be infeasible due to communication costs or privacy concerns. The disparate origins of data render datasets incomplete, so a sizable portion of entries are missing. Moreover, large-scale data are prone to contain corrupted measurements and communication errors, and may even suffer from anomalies due to cyberattacks. As many sources continuously generate data in real time, analytics must often be performed online, without an opportunity to revisit past data. Last but not least, due to their variety, data are typically indexed by multiple dimensions. Toward the vision of facilitating learning, this thesis copes with these challenges by leveraging the low intrinsic dimensionality of data through sparsity and low rank. To build a versatile model capturing various data irregularities, the thesis focuses first on a low-rank plus compressed-sparse matrix model, which proves successful in unveiling traffic anomalies in backbone networks. Leveraging the nuclear and ℓ1-norms, exact reconstruction guarantees are established for a convex estimator of the unknowns. Inspired by the crucial task of network traffic monitoring, the scope of this model and recovery task is broadened to a tomographic task of jointly mapping out nominal and anomalous traffic from undersampled linear measurements.
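To make the estimator described above concrete, a sketch of the kind of convex program it refers to is given below; the specific symbols (X for the low-rank nominal-traffic matrix, A for the sparse anomaly matrix, R for a known compression or routing operator, and the tuning parameters λ and ε) are illustrative assumptions, not notation taken verbatim from the thesis:

```latex
% Low-rank plus compressed-sparse recovery (illustrative formulation):
% observations Y are modeled as Y \approx X + R A + noise, with X low rank
% and A sparse. The nuclear norm promotes low rank, the \ell_1-norm sparsity.
\min_{X,\,A}\ \|X\|_{*} + \lambda \|A\|_{1}
\quad \text{subject to} \quad \|Y - X - R A\|_{F} \le \epsilon
```

Here λ trades off the two regularizers and ε accounts for measurement noise; with R the identity and ε = 0, this reduces to the familiar exact low-rank-plus-sparse decomposition.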
Despite the success of nuclear-norm minimization in capturing the low dimensionality of data, it scales very poorly with the data size, mainly due to its entangled nature, which hinders decentralized and streaming analytics. To mitigate this computational challenge, this thesis puts forth a framework that leverages a bilinear characterization of the nuclear norm to bring separability at the expense of nonconvexity. Notwithstanding, it is proven that under certain conditions the stationary points of the nonconvex program coincide with the optimum of the convex counterpart. Using this idea along with the theory of alternating minimization, lightweight algorithms with low communication overhead are developed for in-network processing, together with provably convergent online ones suitable for streaming analytics. All in all, the major innovative claim is that even under the constraints of distributed computation and sequential acquisition, one can attain the accurate reconstruction guarantees offered by batch nuclear-norm minimization. Finally, inspired by the k-space data interpolation task arising in dynamic magnetic resonance imaging, a novel tensor subspace learning framework is introduced to handle streaming multidimensional data. It capitalizes on the PARAFAC decomposition and effects low tensor rank by means of Tikhonov regularization, which enjoys separability and offers real-time MRI reconstruction tailored to, e.g., image-guided radiation therapy applications.
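The bilinear idea can be illustrated with a minimal sketch. It rests on the well-known characterization ‖X‖* = min over all factorizations X = PQᵀ of ½(‖P‖F² + ‖Q‖F²), so penalizing the Frobenius norms of the factors acts as a separable surrogate for the nuclear norm, and each subproblem of an alternating scheme is a closed-form ridge regression. The function name, parameter choices, and data below are illustrative assumptions, not code or settings from the thesis:

```python
import numpy as np

def nucnorm_bifactor(Y, rank, lam=0.1, iters=100, seed=0):
    """Alternating minimization for the bilinear nuclear-norm surrogate
    (illustrative sketch, not the thesis implementation):
        min_{P,Q}  0.5*||Y - P Q^T||_F^2 + (lam/2)*(||P||_F^2 + ||Q||_F^2)
    Each factor update is a ridge regression with a closed-form solution."""
    rng = np.random.default_rng(seed)
    m, n = Y.shape
    P = rng.standard_normal((m, rank))
    Q = rng.standard_normal((n, rank))
    reg = lam * np.eye(rank)
    for _ in range(iters):
        # Update P with Q fixed: P = Y Q (Q^T Q + lam I)^{-1}
        P = Y @ Q @ np.linalg.inv(Q.T @ Q + reg)
        # Update Q with P fixed: Q = Y^T P (P^T P + lam I)^{-1}
        Q = Y.T @ P @ np.linalg.inv(P.T @ P + reg)
    return P, Q

# Usage: recover a synthetic rank-3 matrix from noisy observations.
rng = np.random.default_rng(1)
X_true = rng.standard_normal((40, 3)) @ rng.standard_normal((3, 30))
Y = X_true + 0.01 * rng.standard_normal((40, 30))
P, Q = nucnorm_bifactor(Y, rank=3)
rel_err = np.linalg.norm(X_true - P @ Q.T) / np.linalg.norm(X_true)
```

Because the objective separates over the rows of P and Q, such updates lend themselves to the in-network and online variants mentioned above, whereas a direct nuclear-norm solver would require a full singular value decomposition per iteration.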
University of Minnesota Ph.D. dissertation. May 2015. Major: Electrical/Computer Engineering. Advisor: Georgios B. Giannakis. 1 computer file (PDF); xii, 241 pages.
Leveraging Sparsity and Low Rank for Large-Scale Networks and Data Science.
Retrieved from the University of Minnesota Digital Conservancy,
Content distributed via the University of Minnesota's Digital Conservancy may be subject to additional license and use restrictions applied by the depositor.