Browsing by Author "Jeon, Moongu"
Dimension Reduction Based on Centroids and Least Squares for Efficient Processing of Text Data (2001-02-08)
Jeon, Moongu; Park, Haesun; Rosen, J. Ben

Dimension reduction in today's vector space based information retrieval systems is essential for improving computational efficiency when handling massive data. In our previous work, we proposed a mathematical framework for lower dimensional representations of text data in vector space based information retrieval, along with a couple of dimension reduction methods based on minimization and a matrix rank reduction formula. One of these methods is the CentroidQR method, which applies an orthogonal transformation to the cluster centroids. Test results showed that, when a certain classification algorithm is applied, its classification results are exactly the same as those obtained in the full dimension. In this paper, we discuss the CentroidQR method in detail and mathematically prove its classification properties under two similarity measures: the L2 norm and the cosine.

Dimension Reduction for Text Data Representation Based on Cluster Structure Preserving Projection (2001-03-05)
Park, Haesun; Jeon, Moongu; Howland, Peg

In today's vector space information retrieval systems, dimension reduction is imperative for efficiently manipulating the massive quantity of data. To be useful, this lower dimensional representation must be a good approximation of the full document set. To that end, we adapt and extend the discriminant analysis projection used in pattern recognition. This projection preserves cluster structure by maximizing the scatter between clusters while minimizing the scatter within clusters. A limitation of discriminant analysis is that one of its scatter matrices must be nonsingular, which restricts its application to document sets in which the number of terms does not exceed the number of documents.
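The nonsingularity limitation can be made concrete with a small numerical check. The following minimal NumPy sketch is not code from the paper: it forms the within-cluster scatter matrix S_w and the between-cluster scatter matrix S_b for a hypothetical random term-document matrix stored with one column per document. The function name `scatter_matrices`, the matrix sizes, and the data are illustrative assumptions. Because each cluster's deviations from its own centroid sum to zero, S_w has rank at most n-k for n documents in k clusters, so it is singular whenever the number of terms m exceeds n-k.

```python
import numpy as np


def scatter_matrices(A, labels):
    """Within-cluster (S_w) and between-cluster (S_b) scatter matrices.

    A is an m x n term-document matrix (one column per document) and
    labels[j] is the cluster index of document j.
    """
    labels = np.asarray(labels)
    m = A.shape[0]
    c = A.mean(axis=1, keepdims=True)           # global centroid, m x 1
    S_w = np.zeros((m, m))
    S_b = np.zeros((m, m))
    for i in np.unique(labels):
        A_i = A[:, labels == i]                 # documents in cluster i
        c_i = A_i.mean(axis=1, keepdims=True)   # centroid of cluster i
        D = A_i - c_i                           # deviations from the cluster centroid
        S_w += D @ D.T
        d = c_i - c                             # centroid offset from the global centroid
        S_b += A_i.shape[1] * (d @ d.T)         # weighted by cluster size
    return S_w, S_b


# Hypothetical data: 50 terms, 12 documents, 3 clusters (more terms than documents).
rng = np.random.default_rng(0)
A = rng.random((50, 12))
labels = np.repeat([0, 1, 2], 4)
S_w, S_b = scatter_matrices(A, labels)

print(np.linalg.matrix_rank(S_w))  # at most n - k = 9, far below m = 50
print(np.linalg.matrix_rank(S_b))  # at most k - 1 = 2
```

The same argument bounds the rank of S_b by k-1, because the size-weighted deviations of the k cluster centroids from the global centroid sum to zero.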
We show that by using the generalized singular value decomposition (GSVD), we can achieve the same goal regardless of the relative dimensions of our data. We also show that, for k clusters, the right generalized singular vectors that correspond to the k-1 largest generalized singular values are all we need to compute the optimal transformation to the reduced dimension. In addition, applying the GSVD allows us to avoid the explicit formation of the scatter matrices in favor of working directly with the data matrix, thus improving the numerical properties of the approach. Finally, we present experimental results that confirm the effectiveness of our approach.

Lower Dimensional Representation of Text Data in Vector Space Based Information Retrieval (2000-12-06)
Park, Haesun; Jeon, Moongu; Rosen, J. Ben

Dimension reduction in today's vector space based information retrieval systems is essential for improving computational efficiency when handling massive data. In this paper, we propose a mathematical framework for lower dimensional representation of text data in vector space based information retrieval using minimization and a matrix rank reduction formula. We illustrate how the commonly used Latent Semantic Indexing based on the Singular Value Decomposition (LSI/SVD) can be derived from our mathematical framework as a dimension reduction method. We then propose a new approach that is more efficient and effective than LSI/SVD when a priori information on the cluster structure of the data is available. We discuss several advantages of the new methods over LSI/SVD in terms of computational efficiency and data representation in the reduced dimensional space. Experimental results are presented to illustrate the effectiveness of our approach on certain classification problems in the reduced dimensional space. These results were computed using an information retrieval test system we are now developing.
The results indicate that, for a successful lower dimensional representation of data, it is important to incorporate a priori knowledge of the data into dimension reduction.
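The CentroidQR classification property described in the first abstract can be checked numerically. The sketch below is a hedged illustration, not the authors' implementation. It assumes that the orthogonal transformation on centroids is the thin QR factorization C = QR of the m x k matrix of cluster centroids, that documents are reduced by multiplying with Q^T, and that the unnamed classification algorithm is nearest-centroid classification. The function names `centroid_qr` and `nearest_centroid` and the random data are hypothetical. Under these assumptions the argument is short: every centroid lies in the range of Q, so the squared L2 distance from a query q to a centroid c_i splits into ||Q^T q - Q^T c_i||^2 plus ||q - Q Q^T q||^2, and the second term is the same for every i. Likewise, the reduced-space cosine equals the full-space cosine multiplied by ||q|| / ||Q^T q||, a factor that does not depend on the centroid.

```python
import numpy as np


def centroid_qr(A, labels):
    """Centroid matrix C (m x k) and Q from its thin QR factorization C = Q R.

    A is an m x n term-document matrix with one column per document;
    labels[j] gives the cluster of document j. (Hypothetical helper.)
    """
    labels = np.asarray(labels)
    C = np.column_stack([A[:, labels == i].mean(axis=1) for i in np.unique(labels)])
    Q, _ = np.linalg.qr(C)  # reduced mode: Q is m x k with orthonormal columns
    return C, Q


def nearest_centroid(X, centroids, measure="l2"):
    """Index of the closest centroid (column) for each column of X."""
    if measure == "l2":
        # dist[j, i] = ||x_j - c_i||_2, computed by broadcasting over columns
        dist = np.linalg.norm(X[:, :, None] - centroids[:, None, :], axis=0)
        return dist.argmin(axis=1)
    if measure == "cosine":
        Xn = X / np.linalg.norm(X, axis=0)
        Cn = centroids / np.linalg.norm(centroids, axis=0)
        return (Xn.T @ Cn).argmax(axis=1)  # larger cosine means closer
    raise ValueError(f"unknown measure: {measure}")


# Hypothetical data: 200 terms, 30 training documents in 3 clusters, 20 queries.
rng = np.random.default_rng(1)
A = rng.random((200, 30))
labels = np.repeat([0, 1, 2], 10)
queries = rng.random((200, 20))

C, Q = centroid_qr(A, labels)
for measure in ("l2", "cosine"):
    full = nearest_centroid(queries, C, measure)                 # 200-dimensional space
    reduced = nearest_centroid(Q.T @ queries, Q.T @ C, measure)  # 3-dimensional space
    print(measure, np.array_equal(full, reduced))
```

The equivalence relies on Q having orthonormal columns that span the centroid space; it is not guaranteed for an arbitrary projection to k dimensions.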