Numerical linear algebra techniques for effective data analysis.

Loading...
Thumbnail Image

Persistent link to this item

Statistics
View Statistics

Journal Title

Journal ISSN

Volume Title

Title

Numerical linear algebra techniques for effective data analysis.

Published Date

2010-09

Publisher

Type

Thesis or Dissertation

Abstract

Data analysis is a process of inspecting and obtaining useful information from the data, with the goal of knowledge and scientific discovery. It brings together several disciplines in mathematics and computer science, including statistics, machine learning, database, data mining, and pattern recognition, to name just a few. A typical challenge with the current era of information technology is the availability of large volumes of data, together with ``the curse of dimensionality''. From the computational point of view, such a challenge urges efficient algorithms that can scale with the size and the dimension of the data. Numerical linear algebra lays a solid foundation for this task via its rich theory and elegant techniques. There are a large amount of examples which show that numerical linear algebra consists of a crucial ingredient in the process of data analysis. In this thesis, we elaborate on the above viewpoint via four problems, all of which have significant real-world applications. We propose efficient algorithms based on matrix techniques for solving each problem, with guaranteed low computational costs and high quality results. In the first scenario, a set of so called Lanczos vectors are used as an alternative to the principal eigenvectors/singular vectors in some processes of reducing the dimension of the data. The Lanczos vectors can be computed inexpensively, and they can be used to preserve the latent information of the data, resulting in a quality as good as by using eigenvectors/singular vectors. In the second scenario, we consider the construction of a nearest-neighbors graph. Under the framework of divide and conquer and via the use of the Lanczos procedure, two algorithms are designed with sub-quadratic (and close to linear) costs, way more efficient than existing practical algorithms when the data at hand are of very high dimension. In the third scenario, a matrix blocking algorithm for reordering and finding dense diagonal blocks of a sparse matrix is adapted, to identify dense subgraphs of a sparse graph, with broad applications in community detection for social, biological and information networks. Finally, in the fourth scenario, we visit the classical problem of sampling a very high dimensional Gaussian distribution in statistical data analysis. A technique of computing a function of a matrix times a vector is developed, to remedy traditional techniques (such as via the Cholesky factorization of the covariance matrix) that are limited to mega-dimensions in practice. The new technique has a potential for sampling Guassian distributions in tera/peta-dimensions, which is typically required for large-scale simulations and uncertainty quantifications./

Description

University of Minnesota Ph.D. dissertation. September 2010. Major: Computer Science. Advisor: Yousef Saad. 1 computer file (PDF); xi, 162 pages. Ill. (some col.)

Related to

Replaces

License

Collections

Series/Report Number

Funding information

Isbn identifier

Doi identifier

Previously Published Citation

Suggested citation

Chen, Jie. (2010). Numerical linear algebra techniques for effective data analysis.. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/95933.

Content distributed via the University Digital Conservancy may be subject to additional license and use restrictions applied by the depositor. By using these files, users agree to the Terms of Use. Materials in the UDC may contain content that is disturbing and/or harmful. For more information, please see our statement on harmful content in digital repositories.