Browsing by Subject "Numerical linear algebra"
Item: Low dimensional approximations: problems and algorithms (2014-05). Ngo, Thanh Trung.

High dimensional data usually have intrinsic low rank representations. These low rank representations not only reveal the hidden structure of the data but also reduce the computational cost of data analysis. Therefore, finding low dimensional approximations of the data is an essential task in many data mining applications.

Classical low dimensional approximations rely on two universal tools: the eigenvalue decomposition and the singular value decomposition. These two different but related decompositions are of high importance in a large number of areas in science and engineering. As a result, research in numerical linear algebra has been conducted to derive efficient algorithms for solving eigenvalue and singular value problems. Because available solvers for these problems are so well developed, they are often used as black boxes in data analysis.

This thesis explores numerical linear algebra techniques and extends well-known methods for low rank approximations to solve new problems in data analysis. Specifically, we carefully analyze trace ratio optimization and argue that this problem can be solved efficiently. We also propose efficient algorithms for low rank matrix approximations with missing entries, and we reformulate and analyze classical problems from a different perspective, revealing the connection between the proposed methods and traditional methods in numerical linear algebra. The performance of the proposed algorithms is established both theoretically and through extensive experiments in dimension reduction and collaborative filtering.

Item: Numerical linear algebra techniques for effective data analysis (2010-09). Chen, Jie.

Data analysis is a process of inspecting and obtaining useful information from data, with the goal of knowledge and scientific discovery.
It brings together several disciplines in mathematics and computer science, including statistics, machine learning, databases, data mining, and pattern recognition, to name just a few. A typical challenge in the current era of information technology is the availability of large volumes of data, together with "the curse of dimensionality". From a computational point of view, such a challenge calls for efficient algorithms that can scale with the size and the dimension of the data. Numerical linear algebra lays a solid foundation for this task via its rich theory and elegant techniques; many examples show that it is a crucial ingredient in the process of data analysis.

In this thesis, we elaborate on the above viewpoint via four problems, all of which have significant real-world applications. We propose efficient algorithms based on matrix techniques for solving each problem, with guaranteed low computational costs and high quality results.

In the first scenario, a set of so-called Lanczos vectors is used as an alternative to the principal eigenvectors/singular vectors in some processes of reducing the dimension of the data. The Lanczos vectors can be computed inexpensively, and they preserve the latent information of the data, yielding results of quality comparable to those obtained with eigenvectors/singular vectors.

In the second scenario, we consider the construction of a nearest-neighbors graph. Within a divide-and-conquer framework and via the Lanczos procedure, two algorithms are designed with sub-quadratic (and close to linear) costs, far more efficient than existing practical algorithms when the data at hand are of very high dimension.
In the third scenario, a matrix blocking algorithm for reordering and finding dense diagonal blocks of a sparse matrix is adapted to identify dense subgraphs of a sparse graph, with broad applications in community detection for social, biological, and information networks.

Finally, in the fourth scenario, we revisit the classical problem of sampling a very high dimensional Gaussian distribution in statistical data analysis. A technique for computing a function of a matrix times a vector is developed to remedy traditional techniques (such as the Cholesky factorization of the covariance matrix) that are limited to mega-dimensions in practice. The new technique has the potential to sample Gaussian distributions in tera/peta-dimensions, as typically required for large-scale simulations and uncertainty quantification.
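The first abstract names the eigenvalue and singular value decompositions as the universal tools behind low rank approximation. As a minimal illustrative sketch (not any of the thesis's own algorithms), the truncated SVD gives the best rank-k approximation of a matrix in the least-squares sense:

```python
import numpy as np

def low_rank_approx(A, k):
    """Best rank-k approximation of A in the Frobenius norm,
    via the truncated SVD (Eckart-Young theorem)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

# A matrix of rank at most 8 is recovered (to machine precision)
# by its rank-8 approximation.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 8)) @ rng.standard_normal((8, 40))
A8 = low_rank_approx(A, 8)
```

In data analysis the interest is usually in the factors `U[:, :k]` and `s[:k] * Vt[:k, :]` themselves (as a reduced representation) rather than the reconstructed matrix.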
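The second abstract's first two scenarios rest on the Lanczos procedure. The following is a bare-bones sketch of symmetric Lanczos tridiagonalization, assuming a dense symmetric matrix, full reorthogonalization, and no early breakdown; the thesis's actual dimension-reduction and graph-construction algorithms are more involved:

```python
import numpy as np

def lanczos(A, q0, k):
    """k steps of the symmetric Lanczos procedure on A.
    Returns Q (n x k, orthonormal Lanczos vectors) and the
    tridiagonal matrix T = Q.T @ A @ Q."""
    n = A.shape[0]
    Q = np.zeros((n, k))
    alpha = np.zeros(k)
    beta = np.zeros(k)
    q = q0 / np.linalg.norm(q0)
    q_prev = np.zeros(n)
    b = 0.0
    for j in range(k):
        Q[:, j] = q
        w = A @ q - b * q_prev          # three-term recurrence
        alpha[j] = q @ w
        w -= alpha[j] * q
        w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)  # full reorthogonalization
        b = np.linalg.norm(w)
        beta[j] = b
        if b < 1e-12:                   # invariant subspace found
            break                       # (assumed not to happen here)
        q_prev, q = q, w / b
    T = np.diag(alpha) + np.diag(beta[:k - 1], 1) + np.diag(beta[:k - 1], -1)
    return Q, T

# On a random symmetric matrix, the Lanczos vectors are orthonormal
# and tridiagonalize A.
rng = np.random.default_rng(1)
M = rng.standard_normal((40, 40))
A = (M + M.T) / 2
Q, T = lanczos(A, rng.standard_normal(40), 8)
```

Each step costs one matrix-vector product, which is what makes Lanczos vectors so much cheaper than fully converged eigenvectors or singular vectors.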
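The "traditional technique" that the fourth scenario above contrasts against, Cholesky-based Gaussian sampling, can be sketched as follows; the mean and covariance here are small illustrative assumptions, and the thesis's matrix-function approach, which scales much further, is not reproduced:

```python
import numpy as np

def sample_gaussian(mu, Sigma, n_samples, rng):
    """Draw samples from N(mu, Sigma) via the Cholesky factorization:
    if Sigma = L @ L.T and z ~ N(0, I), then mu + L @ z ~ N(mu, Sigma).
    The O(d^3) factorization is what limits this approach to moderate
    dimensions d in practice."""
    L = np.linalg.cholesky(Sigma)
    Z = rng.standard_normal((n_samples, len(mu)))
    return mu + Z @ L.T

rng = np.random.default_rng(42)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
# Empirical moments of the samples approach mu and Sigma as
# n_samples grows.
X = sample_gaussian(mu, Sigma, 200_000, rng)
```

Replacing `L @ z` by `f(Sigma) @ z` with `f` a matrix square root, applied via matrix-vector products only, is the kind of matrix-function technique the abstract refers to.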