Browsing by Author "Howland, Peg"
Now showing 1 - 5 of 5
Item: Dimension Reduction for Text Data Representation Based on Cluster Structure Preserving Projection (2001-03-05)
Park, Haesun; Jeon, Moongu; Howland, Peg

In today's vector space information retrieval systems, dimension reduction is imperative for efficiently manipulating the massive quantity of data. To be useful, this lower dimensional representation must be a good approximation of the full document set. To that end, we adapt and extend the discriminant analysis projection used in pattern recognition. This projection preserves cluster structure by maximizing the scatter between clusters while minimizing the scatter within clusters. A limitation of discriminant analysis is that one of its scatter matrices must be nonsingular, which restricts its application to document sets in which the number of terms does not exceed the number of documents. We show that by using the generalized singular value decomposition (GSVD), we can achieve the same goal regardless of the relative dimensions of our data. We also show that, for k clusters, the right generalized singular vectors that correspond to the k-1 largest generalized singular values are all we need to compute the optimal transformation to the reduced dimension. In addition, applying the GSVD allows us to avoid the explicit formation of the scatter matrices in favor of working directly with the data matrix, thus improving the numerical properties of the approach. Finally, we present experimental results that confirm the effectiveness of our approach.

Item: Equivalence of Several Two-stage Methods for Linear Discriminant Analysis (2003-09-25)
Howland, Peg; Park, Haesun

Linear discriminant analysis (LDA) has been used for decades to extract features that preserve class separability. It is classically defined as an optimization problem involving covariance matrices that represent the scatter within and between clusters. The requirement that one of these matrices be nonsingular restricts its application to data sets in which the dimension of the data does not exceed the sample size. Recently, the applicability of LDA has been extended by using the generalized singular value decomposition (GSVD) to circumvent the nonsingularity requirement. Alternatively, many studies have taken a two-stage approach in which the first stage reduces the dimension of the data enough so that it can be followed by classical LDA. In this paper, we justify the two-stage approach by establishing its equivalence to the single-stage LDA/GSVD method, provided either principal component analysis or latent semantic indexing is used in the first stage over a certain range of intermediate dimensions. We also present a computationally simpler choice for the first stage, and conclude with a discussion of the relative merits of each approach.
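The scatter-based criterion described in the first abstract can be sketched concretely. The following is a minimal illustration, not the papers' algorithm: NumPy and SciPy ship no GSVD routine, so it substitutes a regularized symmetric generalized eigenproblem for the GSVD step; the function name, the `reg` parameter, and the explicit scatter matrices are this sketch's own choices.

```python
import numpy as np
from scipy.linalg import eigh

def cluster_preserving_projection(A, labels, reg=1e-8):
    """Project an m x n term-document matrix A (columns are
    documents) to k-1 dimensions, where k is the number of
    clusters in `labels`, by maximizing between-cluster scatter
    relative to within-cluster scatter."""
    m, n = A.shape
    classes = np.unique(labels)
    c = A.mean(axis=1, keepdims=True)              # global centroid
    Sb = np.zeros((m, m))                          # between-cluster scatter
    Sw = np.zeros((m, m))                          # within-cluster scatter
    for cls in classes:
        Ai = A[:, labels == cls]
        ci = Ai.mean(axis=1, keepdims=True)        # cluster centroid
        Sb += Ai.shape[1] * (ci - c) @ (ci - c).T
        Sw += (Ai - ci) @ (Ai - ci).T
    # Regularization keeps the pencil definite even when the term
    # count m exceeds the document count n and Sw is singular.
    vals, vecs = eigh(Sb, Sw + reg * np.eye(m))
    return vecs[:, np.argsort(vals)[::-1][: len(classes) - 1]]  # m x (k-1)
```

Note that the abstract's point about working directly with the data matrix is precisely that forming Sb and Sw as above is numerically and computationally undesirable when the term count is large; a sketch of the factored alternative appears after the last item.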
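The two-stage route in the second abstract maps naturally onto off-the-shelf tools. Here is a minimal sketch with scikit-learn, assuming dense row-wise data of invented sizes; the intermediate dimension 40 is an arbitrary illustrative choice, whereas the paper identifies the specific range of intermediate dimensions over which the equivalence to LDA/GSVD holds.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
k = 3                                          # number of classes
X = rng.normal(size=(60, 500))                 # 60 docs, 500 terms
y = rng.integers(0, k, size=60)

# Stage 1 shrinks the dimension below the sample size so the
# within-class scatter in stage 2 is nonsingular; stage 2 is
# classical LDA, which retains at most k-1 dimensions.
two_stage = make_pipeline(
    PCA(n_components=40),
    LinearDiscriminantAnalysis(n_components=k - 1),
)
X_reduced = two_stage.fit_transform(X, y)
print(X_reduced.shape)                         # (60, 2)
```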
Item: Extension of Discriminant Analysis based on the Generalized Singular Value Decomposition (2002-05-30)
Howland, Peg; Park, Haesun

Discriminant analysis has been used for decades to extract features that preserve class separability. It is commonly defined as an optimization problem involving covariance matrices that represent the scatter within and between clusters. The requirement that one of these matrices be nonsingular limits its application to data sets with certain relative dimensions. We examine a number of optimization criteria, and extend their applicability by using the generalized singular value decomposition to circumvent the nonsingularity requirement. The result is a generalization of discriminant analysis that can be utilized in application areas such as information retrieval to reduce the dimension of data while preserving its cluster structure. In the process, we establish relationships between the solutions obtained by various methods, which allow us to refine the optimization criteria and to improve the algorithms for achieving them.

Item: Generalizing Discriminant Analysis Using the Generalized Singular Value Decomposition (2003-02-21)
Howland, Peg; Park, Haesun

Discriminant analysis has been used for decades to extract features that preserve class separability. It is commonly defined as an optimization problem involving covariance matrices that represent the scatter within and between clusters. The requirement that one of these matrices be nonsingular limits its application to data sets with certain relative dimensions. We examine a number of optimization criteria, and extend their applicability by using the generalized singular value decomposition to circumvent the nonsingularity requirement. The result is a generalization of discriminant analysis that can be applied even when the sample size is smaller than the dimension of the sample data. We use classification results from the reduced representation to compare the effectiveness of this approach with some alternatives, and conclude with a discussion of their relative merits.

Item: Text Classification using Support Vector Machines with Dimension Reduction (2003-02-21)
Kim, Hyunsoo; Howland, Peg; Park, Haesun

Support vector machines (SVMs) have been recognized as one of the most successful classification methods for many applications including text classification. Even though the learning ability and computational complexity of training in support vector machines may be independent of the dimension of the feature space, reducing computational complexity is an essential issue to efficiently handle a large number of terms in practical applications of text classification. In this paper, we adopt novel dimension reduction methods to reduce the dimension of the document vectors dramatically. We also introduce decision functions for the centroid-based classification algorithm and support vector classifiers to handle the classification problem where a document may belong to multiple classes. Our substantial experimental results show that with several dimension reduction methods that are designed particularly for clustered data, higher efficiency for both training and testing can be achieved without sacrificing prediction accuracy of text classification even when the dimension of the input space is significantly reduced.
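Both GSVD reports above, like the first item, rest on working with scatter factors rather than the scatter matrices themselves: S_b = H_b H_b^T and S_w = H_w H_w^T, with the GSVD applied to the pair of factors instead of the m x m matrices. A minimal sketch of building those factors from the data matrix, with variable names of my own choosing:

```python
import numpy as np

def scatter_factors(A, labels):
    """Return H_b (m x k) and H_w (m x n) such that
    S_b = H_b @ H_b.T and S_w = H_w @ H_w.T for the m x n
    term-document matrix A (columns are documents)."""
    classes = np.unique(labels)
    c = A.mean(axis=1)                      # global centroid
    Hb_cols = []
    Hw = A.astype(float).copy()
    for cls in classes:
        idx = labels == cls
        ci = A[:, idx].mean(axis=1)         # cluster centroid
        Hb_cols.append(np.sqrt(idx.sum()) * (ci - c))
        Hw[:, idx] -= ci[:, None]           # center within the cluster
    return np.column_stack(Hb_cols), Hw
```

The factors have only n + k columns in total, so downstream decompositions never touch an m x m matrix, which is the numerical advantage the abstracts claim when the number of terms m is large.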
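The last item's pipeline (reduce the dimension first, then classify, allowing multiple labels per document) can be sketched end to end. Assumptions to flag: the toy corpus and label sets are invented, the centroid projection stands in for the paper's cluster-based reduction methods, and one-vs-rest linear SVMs stand in for its decision functions.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

docs = ["stocks rally on earnings", "team wins league final",
        "court rules on stock trades", "final whistle ends the match"]
label_sets = [{"finance"}, {"sports"}, {"finance", "law"}, {"sports"}]

Y = MultiLabelBinarizer().fit_transform(label_sets)   # docs x classes
X = TfidfVectorizer().fit_transform(docs).toarray()   # docs x terms

# Centroid-based reduction: represent each document by its
# inner products with the k class centroids (terms -> k dims).
centroids = np.array([X[Y[:, j] == 1].mean(axis=0) for j in range(Y.shape[1])])
X_reduced = X @ centroids.T

# One binary SVM per class lets a document carry several labels.
clf = OneVsRestClassifier(LinearSVC()).fit(X_reduced, Y)
print(clf.predict(X_reduced))
```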