Numerical linear algebra techniques for effective data analysis.

Data analysis is a process of inspecting and obtaining useful information from the data, with the goal of knowledge and scientific discovery. It brings together several disciplines in mathematics and computer science, including statistics, machine learning, database, data mining, and pattern recognition, to name just a few. A typical challenge with the current era of information technology is the availability of large volumes of data, together with ``the curse of dimensionality''. From the computational point of view, such a challenge urges efficient algorithms that can scale with the size and the dimension of the data. Numerical linear algebra lays a solid foundation for this task via its rich theory and elegant techniques. There are a large amount of examples which show that numerical linear algebra consists of a crucial ingredient in the process of data analysis. In this thesis, we elaborate on the above viewpoint via four problems, all of which have significant real-world applications. We propose efficient algorithms based on matrix techniques for solving each problem, with guaranteed low computational costs and high quality results. In the first scenario, a set of so called Lanczos vectors are used as an alternative to the principal eigenvectors/singular vectors in some processes of reducing the dimension of the data. The Lanczos vectors can be computed inexpensively, and they can be used to preserve the latent information of the data, resulting in a quality as good as by using eigenvectors/singular vectors. In the second scenario, we consider the construction of a nearest-neighbors graph. Under the framework of divide and conquer and via the use of the Lanczos procedure, two algorithms are designed with sub-quadratic (and close to linear) costs, way more efficient than existing practical algorithms when the data at hand are of very high dimension. In the third scenario, a matrix blocking algorithm for reordering and finding dense diagonal blocks of a sparse matrix is adapted, to identify dense subgraphs of a sparse graph, with broad applications in community detection for social, biological and information networks. Finally, in the fourth scenario, we visit the classical problem of sampling a very high dimensional Gaussian distribution in statistical data analysis. A technique of computing a function of a matrix times a vector is developed, to remedy traditional techniques (such as via the Cholesky factorization of the covariance matrix) that are limited to mega-dimensions in practice. The new technique has a potential for sampling Guassian distributions in tera/peta-dimensions, which is typically required for large-scale simulations and uncertainty quantifications./

Keywords

Data analysis

Data mining

Numerical linear algebra

Pattern recognition

Statistics

Computer Science

Description

University of Minnesota Ph.D. dissertation. September 2010. Major: Computer Science. Advisor: Yousef Saad. 1 computer file (PDF); xi, 162 pages. Ill. (some col.)

Collections

Dissertations

Suggested citation

Chen, Jie. (2010). Numerical linear algebra techniques for effective data analysis.. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/95933.

Content distributed via the University Digital Conservancy may be subject to additional license and use restrictions applied by the depositor. By using these files, users agree to the Terms of Use. Materials in the UDC may contain content that is disturbing and/or harmful. For more information, please see our statement on harmful content in digital repositories.

University Digital Conservancy

Numerical linear algebra techniques for effective data analysis.

View/Download File

Persistent link to this item

Statistics

Journal Title

Journal ISSN

Volume Title

Title

Alternative title

Authors

Published Date

Publisher

Type

Abstract

Keywords

Description

Related to

Replaces

License

Collections

Series/Report Number

Funding information

Isbn identifier

Doi identifier

Previously Published Citation

Other identifiers

Suggested citation

University Digital Conservancy

University of Minnesota Twin Cities

Numerical linear algebra techniques for effective data analysis.

View/Download File

Persistent link to this item

Statistics

Journal Title

Journal ISSN

Volume Title

Title

Alternative title

Authors

Published Date

Publisher

Type

Abstract

Keywords

Description

Related to

Replaces

License

Collections

Series/Report Number

Funding information

Isbn identifier

Doi identifier

Previously Published Citation

Other identifiers

Suggested citation