Browsing by Subject "Dictionary learning"
Now showing 1 - 7 of 7
Item: A convex model for matrix factorization and dimensionality reduction on physical space and its application to blind hyperspectral unmixing (University of Minnesota. Institute for Mathematics and Its Applications, 2010-10)
Möller, Michael; Esser, Ernie; Osher, Stanley; Sapiro, Guillermo; Xin, Jack

Item: Dictionary learning and sparse coding for unsupervised clustering (University of Minnesota. Institute for Mathematics and Its Applications, 2009-09)
Sprechmann, Pablo; Sapiro, Guillermo

Item: Distributed and robust techniques for statistical learning (2012-05)
Forero, Pedro Andrés

The last decade has been marked by the advent of networked systems able to gather tremendous amounts of data. Testimony to this trend are data collection projects and digital services such as Google Books, Internet marketing, and social networking sites. To fully exploit the potential benefits hidden in large collections of data, this thesis argues that more emphasis must be placed on data processing. Statistical learning approaches are needed that uncover the "right" information within the data while coping with its complexities. Dealing with vast amounts of data, possibly distributed across multiple locations and often contaminated with outliers (inconsistent data) and missing entries, poses formidable processing challenges. This thesis takes a step toward overcoming these challenges by proposing novel problem formulations and capitalizing on contemporary tools from optimization and compressive sampling.

Power-limited networked systems deployed for data acquisition can extend their service life by collaboratively processing data in situ rather than transmitting all data back to a centralized processing unit. With this premise in mind, the viability of a fully distributed framework for clustering and classification is explored. Capitalizing on the idea of consensus, algorithms are developed with performance guarantees equivalent to those of a centralized algorithm having access to all network data. Due to their wide applicability and popularity, the focus is on developing distributed alternatives to support vector machines, K-means, and expectation-maximization algorithms.

Managing the quality of data poses a major challenge. Outliers are hard to identify, especially in high-dimensional data. The presence of outliers can be due to faulty sensors, malicious sources, model mismatch, or rarely seen events. In all cases, ill-handled outliers can deteriorate the performance of any information processing and management scheme. Robust clustering algorithms are developed that rely on a data model explicitly capturing outliers. This outlier-aware data model translates the rare occurrences of outliers in the data into sparsity of pertinent outlier variables, thereby establishing a neat link between clustering and the area of compressive sampling (see the sketch after this abstract). A similar outlier-aware model is used to derive robust versions of multidimensional scaling algorithms for high-dimensional data visualization. In this context, a robust multidimensional scaling algorithm able to cope with a common structured outlier contamination is also developed.

Using data with missing entries is also challenging. Missing data can occur due to faulty sensors, privacy concerns, and limited measurement budgets. Specifically, prediction of a dynamical process evolving on a network based on observations at a few nodes is explored. Here, tools from semi-supervised learning and dictionary learning are leveraged to develop batch and online topology- and data-driven prediction algorithms able to cope with missing data.
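The outlier-sparsity idea in this abstract lends itself to a compact illustration. What follows is a minimal sketch under stated assumptions, not the thesis's exact algorithm: an outlier-aware K-means in which each point x_i is modeled as its cluster center plus an outlier vector o_i, and a group-lasso penalty keeps most o_i exactly zero. The function name and parameters are illustrative.

```python
import numpy as np

def robust_kmeans(X, k, lam, n_iters=50, seed=0):
    """Outlier-aware K-means sketch: block-coordinate descent on
        sum_i 0.5 * ||x_i - c_{l_i} - o_i||^2  +  lam * sum_i ||o_i||_2,
    where the group-lasso penalty drives most outlier vectors o_i to zero."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    C = X[rng.choice(n, size=k, replace=False)].astype(float)  # initial centers
    O = np.zeros((n, d))                                       # outlier variables
    for _ in range(n_iters):
        R = X - O                                  # outlier-compensated points
        dists = ((R[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)              # assignment step
        for j in range(k):                         # center update
            if np.any(labels == j):
                C[j] = R[labels == j].mean(axis=0)
        E = X - C[labels]                          # per-point residuals
        norms = np.linalg.norm(E, axis=1, keepdims=True)
        O = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0) * E
    return labels, C, O   # points with nonzero rows of O are flagged as outliers
```

Larger values of lam flag fewer points as outliers; as lam grows without bound, the sketch reduces to plain K-means.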
Item: Hierarchical dictionary learning for invariant classification (University of Minnesota. Institute for Mathematics and Its Applications, 2009-09)
Bar, Leah; Sapiro, Guillermo

Item: Similarity search in visual data (2013-01)
Cherian, Anoop

Contemporary times have witnessed a significant increase in the amount of data available on the Internet. Organizing such big data so that it is easily and promptly accessible is a necessity of growing importance. Among the various data modalities such as text and audio, visual data (in the form of images and videos) constitute a major share of this available content. Contrary to other data modalities, visual data pose several significant challenges to storage and retrieval: (i) choosing an appropriate representation that captures the essence of visual data is often non-trivial, and (ii) visual search and retrieval are often subjective; as a result, computing semantically meaningful results is hard. On the other hand, visual data possess rich structure, and exploiting this structure may help address these challenges. Motivated by these observations, this thesis explores new algorithms for efficient similarity search in structured visual data, where "structure" refers to the mathematical representation that captures desirable data properties. We deal with two classes of such structures that are common in computer vision: (i) symmetric positive definite matrices, as covariances, and (ii) sparse representations of data in a dictionary learned from the data.

Covariance-valued data has found immense success in mainstream computer vision applications such as visual surveillance, emotion recognition, and face recognition, and is of fundamental importance in other disciplines such as magnetic resonance imaging and speech recognition. A technical challenge in computing similarities on such matrix-valued data is its non-Euclidean nature. These matrices belong to a curved manifold where distances between data points are no longer along straight lines, but along curved geodesics. As a result, state-of-the-art measures for comparing covariances tend to be slow. To address this issue, we propose a novel similarity measure on covariance matrices, the Jensen-Bregman LogDet divergence, which is fast while preserving retrieval accuracy compared to natural distances on the manifold (see the sketch after this abstract). To scale our retrieval framework to large covariance datasets, we propose a metric tree data structure based on this new measure. Next, since clustering is an important ingredient of several search algorithms, we investigate this component independently and propose a novel unsupervised algorithm based on the Dirichlet process mixture model for clustering covariance-valued data.

The second part of this thesis addresses similarity search for high-dimensional vector-valued data. Such data is ubiquitous not only in computer vision, but also in several other disciplines, including data mining, machine learning, and robotics. As the dimensionality of the data increases, computing meaningful similarities becomes increasingly difficult due to the curse of dimensionality. Our approach to this problem is inspired by the principles of dictionary learning and sparse coding. Our main idea is to learn an overcomplete dictionary of subspaces from the data so that each data point can be approximated by a sparse linear combination of these subspaces. We introduce a tuple-based data descriptor on these sparse combinations, the Subspace Combination Tuple, which is storage efficient, fast in retrieval, and provides superior accuracy for nearest-neighbor retrieval against the state of the art. These benefits come at a price: the sparse representations are often sensitive to data perturbations. To circumvent this issue, we propose several algorithms for robust dictionary learning and sparse coding. Extending the sparse coding framework to matrix-valued data for hashing covariances forms the third part of this thesis. Towards this end, we propose a novel generalized dictionary learning framework. We describe the theoretical motivations and provide extensive experimental evidence demonstrating the benefits of our algorithms.
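The Jensen-Bregman LogDet divergence mentioned above has a closed form that needs only determinants, which is what makes it fast compared to geodesic distances requiring eigendecompositions. The divergence is as published; the small implementation below is a sketch:

```python
import numpy as np

def jbld(X, Y):
    """Jensen-Bregman LogDet divergence between SPD matrices X and Y:
        J(X, Y) = log det((X + Y) / 2) - 0.5 * (log det X + log det Y).
    slogdet is used for numerical stability; inputs are assumed SPD."""
    _, ld_mid = np.linalg.slogdet((X + Y) / 2.0)
    _, ld_x = np.linalg.slogdet(X)
    _, ld_y = np.linalg.slogdet(Y)
    return ld_mid - 0.5 * (ld_x + ld_y)

# Illustrative usage on two random SPD matrices:
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)); A = A @ A.T + 5 * np.eye(5)
B = rng.standard_normal((5, 5)); B = B @ B.T + 5 * np.eye(5)
print(jbld(A, B))   # symmetric in its arguments; zero iff A == B
```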
Item: Sparse coding and dictionary learning based on the MDL principle (University of Minnesota. Institute for Mathematics and Its Applications, 2010-10)
Ramírez, Ignacio; Sapiro, Guillermo

Item: Structured sparse models with applications (2012-10)
Sprechmann, Pablo G.

Sparse models assume minimal prior knowledge about the data, asserting that the signal has many coefficients close or equal to zero when represented in a given domain. From a data-modeling point of view, sparsity can be seen as a form of regularization, that is, as a device to restrict or control the set of coefficient values the model may use to produce an estimate of the data. In this way, the flexibility of the model (its ability to fit given data) is reduced, and robustness is gained by ruling out unrealistic estimates of the coefficients. Implicitly, standard sparse models give the same relevance to every subset of nonzero coefficients, a number of subsets that grows exponentially with the number of atoms in the dictionary. This assumption is easily shown to be false in many practical cases: signals in general have a richer underlying structure that such models simply disregard. In many situations standard sparse models represent a very good trade-off between model simplicity and accuracy, but many practical settings could greatly benefit from exploiting the structure present in the data, potentially improving interpretability, performance, and processing speed.

The main goal of this thesis is to explore different ways of including data structure in sparse models and to evaluate them in real image and signal processing applications. The main directions of research are: (i) extending sparse models by imposing structure on the sparsity patterns of nonzero coefficients, in order to stabilize the estimation and account for valuable prior knowledge about the signals (see the sketch after this abstract); (ii) analyzing the impact of this in challenging real applications where estimating the model coefficients is severely ill-posed; as a fundamental example, the problem of monaural source separation is evaluated extensively throughout the thesis; and (iii) studying ways of exploiting the underlying structure of the data to speed up the coding process. One of the most important challenges in sparse modeling is the relatively high computational complexity of the inference algorithms, which is of critical importance when dealing with very large-scale (modern-size) applications as well as real-time processing.
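Direction (i) above, imposing structure on the sparsity patterns, is often instantiated with group-sparsity penalties. The following is a minimal sketch of group-sparse coding by proximal gradient descent, not the thesis's (faster) encoders; the function name, the groups argument, and the parameters are illustrative.

```python
import numpy as np

def group_sparse_code(D, x, groups, lam, n_iters=200):
    """Structured sparse coding sketch: solve
        min_a 0.5 * ||x - D a||_2^2 + lam * sum_g ||a_g||_2
    by proximal gradient (ISTA). The group-l2 penalty zeros out whole
    blocks of coefficients, enforcing a structured sparsity pattern."""
    a = np.zeros(D.shape[1])
    step = 1.0 / np.linalg.norm(D, 2) ** 2    # 1 / Lipschitz constant of the data term
    for _ in range(n_iters):
        z = a - step * (D.T @ (D @ a - x))    # gradient step on the data term
        for g in groups:                      # proximal step: group soft-thresholding
            ng = np.linalg.norm(z[g])
            z[g] *= max(1.0 - step * lam / ng, 0.0) if ng > 0 else 0.0
        a = z
    return a
```

With singleton groups (each coefficient its own group), the group penalty collapses to the l1 norm and the sketch reduces to the standard soft-thresholding encoder of plain sparse coding.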