Browsing by Author "Park, Haesun"
Now showing 1 - 20 of 24
Item: A Balanced Term-Weighting Scheme for Effective Document Matching (2001-02-07)
Jung, Yunjae; Park, Haesun; Du, Ding-Zhu
A new weighting scheme for the vector space model is presented to improve the retrieval performance of an information retrieval system. In addition, a dimension compression method is introduced to reduce the computational cost of the weighting approach. The main idea of the approach is to consider not only the terms that occur but also the terms that are absent when finding similarity patterns among document and query vectors. Using a basic information retrieval development system that we are now building, we evaluate the effect of the balanced weighting scheme and compare it with various combinations of weighting schemes in terms of retrieval performance. The experimental results show that the proposed scheme produces recall-precision results similar to those of the cosine measure but, more importantly, enhances retrieval effectiveness. Since the scheme is based on the cosine measure, it inherits that measure's insensitivity to weight variance. The results convincingly illustrate that the new approach is effective and applicable.

Item: A Comparison of Generalized LDA Algorithms for Undersampled Problems (2003-12-11)
Park, Cheonghee; Park, Haesun
Linear Discriminant Analysis (LDA) is a dimension reduction method that finds an optimal linear transformation maximizing the between-class scatter and minimizing the within-class scatter. In undersampled problems, where the number of samples is smaller than the dimension of the data space, it is difficult to apply LDA because the scatter matrices are singular due to the high dimensionality. To make LDA applicable to undersampled problems, several generalizations of LDA have been proposed recently. In this paper, we present the theoretical and algorithmic relationships among several generalized LDA algorithms and compare their computational complexities and performance in text classification and face recognition. Toward a practical dimension reduction method for high-dimensional data, an efficient algorithm is also proposed.

Item: A Comparative Study of Linear and Nonlinear Feature Extraction Methods (2004-11-17)
Park, Cheonghee; Park, Haesun; Pardalos, Panos
Linear Discriminant Analysis (LDA) is a dimension reduction method that finds an optimal linear transformation maximizing the between-class scatter and minimizing the within-class scatter. However, in undersampled problems, where the number of samples is smaller than the dimension of the data space, it is difficult to apply LDA because the scatter matrices are singular due to the high dimensionality. To make LDA applicable, several generalizations of LDA have been proposed. This paper presents theoretical and algorithmic relationships among several generalized LDA algorithms. Utilizing these relationships, computationally efficient approaches to the algorithms are proposed. We also present nonlinear extensions of these LDA algorithms: the original data space is mapped to a feature space by an implicit nonlinear mapping through kernel methods, a generalized eigenvalue problem is formulated in the transformed feature space, and the generalized LDA algorithms are applied to solve it. The performance and computational complexity of these linear and nonlinear discriminant analysis algorithms are compared theoretically and experimentally.
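The generalized-LDA items above all start from the same optimization: maximize the between-class scatter while minimizing the within-class scatter, which leads to a generalized eigenvalue problem. The NumPy sketch below illustrates that classical formulation; the ridge term added to the within-class scatter is a generic workaround for the undersampled singularity, not one of the generalizations studied in these papers:

```python
import numpy as np
from scipy.linalg import eigh

def lda_transform(X, y, k, ridge=1e-6):
    """Classical LDA: solve the generalized eigenproblem Sb w = lambda Sw w.

    X: (n_samples, n_features); y: class labels; k: reduced dimension
    (at most #classes - 1 directions carry discriminant information).
    """
    classes = np.unique(y)
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sb = np.zeros((d, d))  # between-class scatter
    Sw = np.zeros((d, d))  # within-class scatter
    for c in classes:
        Xc = X[y == c]
        diff = (Xc.mean(axis=0) - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
        Sw += (Xc - Xc.mean(axis=0)).T @ (Xc - Xc.mean(axis=0))
    # Sw is singular when n_samples < n_features; a small ridge is a
    # crude fix -- the papers above use pseudo-inverse/GSVD instead.
    vals, vecs = eigh(Sb, Sw + ridge * np.eye(d))
    order = np.argsort(vals)[::-1]  # largest generalized eigenvalues first
    return vecs[:, order[:k]]       # columns define the LDA transformation
```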
Item: A Fast Dimension Reduction Algorithm with Applications on Face Recognition and Text Classification (2003-12-19)
Park, Cheonghee; Park, Haesun
In undersampled problems, where the number of samples is smaller than the dimension of the data space, it is difficult to apply Linear Discriminant Analysis (LDA) because the scatter matrices are singular due to the high dimensionality. We propose a fast dimension reduction method based on a simple modification of Principal Component Analysis (PCA) and the orthogonal decomposition. The proposed algorithm is an efficient way to perform LDA for undersampled problems. Our experimental results in face recognition and text classification demonstrate the effectiveness of the proposed method.

Item: A new optimization criterion for generalized discriminant analysis on undersampled problems (2003-06-10)
Ye, Jieping; Janardan, Ravi; Park, Cheonghee; Park, Haesun
We present a new optimization criterion for discriminant analysis. The new criterion extends the optimization criteria of classical linear discriminant analysis (LDA) by introducing the pseudo-inverse when the scatter matrices are singular. It is applicable regardless of the relative sizes of the data dimension and sample size, overcoming a limitation of classical LDA. Recently, a new algorithm called LDA/GSVD for structure-preserving dimension reduction was introduced, which extends classical LDA to very high-dimensional undersampled problems by using the generalized singular value decomposition (GSVD). The solution from the LDA/GSVD algorithm is a special case of the solution for the generalized criterion in this paper, which is also based on the GSVD. We also present an approximate solution for our GSVD-based method that reduces computational complexity by finding sub-clusters of each cluster and using their centroids to capture the structure of each cluster. This reduced problem yields much smaller matrices to which the GSVD can be applied efficiently. Experiments on text data with up to 7000 dimensions show that the approximation algorithm produces results close to those of the exact algorithm.

Item: An Effective Term-Weighting Scheme for Information Retrieval (2000-02-07)
Jung, Yunjae; Park, Haesun; Du, Ding-Zhu
The retrieval performance of an information retrieval system depends largely on its similarity measure, and a term-weighting scheme plays an important role in that measure. In this paper, we investigate existing weighting approaches and similarity measures and propose a new term-weighting scheme that supports the cosine similarity measure for retrieving relevant documents effectively on the basis of the vector space model. The new weighting technique considers not only the terms that occur but also the terms that are absent when computing similarity among document vector representations. The consideration of negatively weighted terms resolves the masking-by-zero problem of the inner product operation. According to the experimental results, the balanced term-weighting scheme performs more effectively, especially when the vector space of the data collection is relatively dense. Consequently, the balanced weighting scheme promises a significant contribution to the relevance effectiveness of information retrieval systems.
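Both term-weighting items above are grounded in the cosine measure over weighted document vectors. The sketch below uses plain tf-idf weighting (a standard scheme, not the balanced scheme these papers propose) and shows the inner product in which absent terms contribute zero, the masking-by-zero effect that motivates negatively weighted absent terms:

```python
import numpy as np

def tfidf(term_counts):
    """Standard tf-idf weighting (not the balanced scheme of the papers).
    term_counts: (n_docs, n_terms) raw counts."""
    tf = term_counts / np.maximum(term_counts.sum(axis=1, keepdims=True), 1)
    df = (term_counts > 0).sum(axis=0)               # document frequency
    idf = np.log(term_counts.shape[0] / np.maximum(df, 1))
    return tf * idf

def cosine_sim(q, d):
    """Cosine similarity; terms absent from q or d contribute nothing
    to the inner product (the 'masking by zero' effect)."""
    return q @ d / (np.linalg.norm(q) * np.linalg.norm(d) + 1e-12)

counts = np.array([[3, 0, 1, 0],
                   [0, 2, 0, 1],
                   [1, 1, 0, 0]], dtype=float)
W = tfidf(counts)
print(cosine_sim(W[0], W[2]))
```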
Item: An Efficient Algorithm for LDA Utilizing the Relationship between LDA and the Generalized Minimum Squared Error Solution (2004-03-03)
Park, Cheonghee; Park, Haesun
In this paper, we study the relationship between Linear Discriminant Analysis (LDA) and the generalized Minimum Squared Error (MSE) solution. We show that the generalized MSE solution is equivalent to applying a certain classification rule in the space transformed by LDA. The relationship of the MSE solution with Fisher Discriminant Analysis (FDA) is extended to multi-class problems and to undersampled problems, where classical LDA is not applicable due to the singularity of the scatter matrices. We propose an efficient algorithm for LDA that can be performed through its relationship with the MSE procedure without solving the eigenvalue problem. Extensive experiments verify the theoretical results and demonstrate that the classification rule induced by the MSE procedure can be effectively applied in the dimension-reduced space produced by LDA.

Item: Data Reduction in Support Vector Machines by a Kernelized Ionic Interaction Model (2003-09-22)
Kim, Hyunsoo; Park, Haesun
A major drawback of support vector machines is that the computational complexity of finding an optimal solution scales as $O(n^3)$, where $n$ is the number of training data points. In this paper, we introduce a novel ionic interaction model for data reduction in support vector machines. It is applied to select data points and exclude outliers in the kernel feature space, producing a data reduction algorithm with a computational complexity of about $n^3/4$ floating point operations. The instance-based learning algorithm has been successfully applied for data reduction in high-dimensional feature spaces obtained by kernel functions, and we also present a data reduction method based on the kernelized instance-based algorithm. Our tests illustrate that the computation time can be significantly reduced without any significant decrease in prediction accuracy.

Item: Dimension Reduction Based on Centroids and Least Squares for Efficient Processing of Text Data (2001-02-08)
Jeon, Moongu; Park, Haesun; Rosen, J. Ben
Dimension reduction in today's vector space based information retrieval systems is essential for improving computational efficiency in handling massive data. In our previous work, we proposed a mathematical framework for lower dimensional representations of text data in vector space based information retrieval, along with a couple of dimension reduction methods using minimization and the matrix rank reduction formula. One of the proposed methods is the CentroidQR method, which applies an orthogonal transformation derived from the centroids; our tests showed that its classification results are exactly the same as those obtained with the full dimension when a certain classification algorithm is applied. In this paper, we discuss the CentroidQR method in detail and prove its classification properties mathematically for two different similarity measures, L2 and cosine.

Item: Dimension Reduction for Text Data Representation Based on Cluster Structure Preserving Projection (2001-03-05)
Park, Haesun; Jeon, Moongu; Howland, Peg
In today's vector space information retrieval systems, dimension reduction is imperative for efficiently manipulating the massive quantity of data. To be useful, this lower dimensional representation must be a good approximation of the full document set. To that end, we adapt and extend the discriminant analysis projection used in pattern recognition. This projection preserves cluster structure by maximizing the scatter between clusters while minimizing the scatter within clusters. A limitation of discriminant analysis is that one of its scatter matrices must be nonsingular, which restricts its application to document sets in which the number of terms does not exceed the number of documents. We show that by using the generalized singular value decomposition (GSVD), we can achieve the same goal regardless of the relative dimensions of our data. We also show that, for k clusters, the right generalized singular vectors corresponding to the k-1 largest generalized singular values are all we need to compute the optimal transformation to the reduced dimension. In addition, applying the GSVD allows us to avoid explicitly forming the scatter matrices in favor of working directly with the data matrix, improving the numerical properties of the approach. Finally, we present experimental results that confirm the effectiveness of our approach.
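The CentroidQR method mentioned above orthogonally transforms the class centroids and projects the data onto them, reducing the dimension to the number of clusters. A minimal sketch of that idea (an illustration, not the authors' exact implementation):

```python
import numpy as np

def centroid_qr_projection(X, y):
    """Project data onto an orthonormal basis of the class centroids.

    X: (n_features, n_docs) term-document matrix; y: cluster labels.
    Returns Q (n_features, k) and the k-dimensional representation,
    where k is the number of clusters.
    """
    classes = np.unique(y)
    C = np.column_stack([X[:, y == c].mean(axis=1) for c in classes])
    Q, _ = np.linalg.qr(C)   # reduced QR: Q has orthonormal columns
    return Q, Q.T @ X        # k-dimensional representation of the docs
```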
Item: Equivalence of Several Two-stage Methods for Linear Discriminant Analysis (2003-09-25)
Howland, Peg; Park, Haesun
Linear discriminant analysis (LDA) has been used for decades to extract features that preserve class separability. It is classically defined as an optimization problem involving covariance matrices that represent the scatter within and between clusters. The requirement that one of these matrices be nonsingular restricts its application to data sets in which the dimension of the data does not exceed the sample size. Recently, the applicability of LDA has been extended by using the generalized singular value decomposition (GSVD) to circumvent the nonsingularity requirement. Alternatively, many studies have taken a two-stage approach in which the first stage reduces the dimension of the data enough that classical LDA can follow. In this paper, we justify the two-stage approach by establishing its equivalence to the single-stage LDA/GSVD method, provided either principal component analysis or latent semantic indexing is used in the first stage over a certain range of intermediate dimensions. We also present a computationally simpler choice for the first stage, and conclude with a discussion of the relative merits of each approach.

Item: Exponential Modeling with Unknown Model Order Using Structured Nonlinear Total Least Norm (2000-04-07)
Zhang, Lei; Park, Haesun; Rosen, J. Ben
A method based on Structured Nonlinear Total Least Norm is presented for estimating the parameters of exponentially damped sinusoidal signals in noise when the model order is unknown. It is compared with two existing methods to show its robustness in recovering correct parameter values when the model order is unknown, in spite of some large errors in the measured data.

Item: Extension of Discriminant Analysis based on the Generalized Singular Value Decomposition (2002-05-30)
Howland, Peg; Park, Haesun
Discriminant analysis has been used for decades to extract features that preserve class separability. It is commonly defined as an optimization problem involving covariance matrices that represent the scatter within and between clusters. The requirement that one of these matrices be nonsingular limits its application to data sets with certain relative dimensions. We examine a number of optimization criteria and extend their applicability by using the generalized singular value decomposition to circumvent the nonsingularity requirement. The result is a generalization of discriminant analysis that can be utilized in application areas such as information retrieval to reduce the dimension of the data while preserving its cluster structure. In the process, we establish relationships between the solutions obtained by various methods, which allow us to refine the optimization criteria and improve the algorithms for achieving them.
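A compact way to picture the two-stage approach analyzed in the equivalence paper above: PCA (or LSI) first shrinks the dimension enough for the scatter matrices to behave, and classical LDA follows in the intermediate space. The sketch below reuses the hypothetical lda_transform helper from the earlier block; the paper's result, that this pipeline can match single-stage LDA/GSVD over a range of intermediate dimensions, is what justifies such a construction:

```python
import numpy as np

def two_stage_lda(X, y, pca_dim, lda_dim):
    """Stage 1: PCA via thin SVD of the centered data matrix.
    Stage 2: classical LDA in the PCA-reduced space (uses the
    lda_transform sketch defined earlier in this listing)."""
    mu = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    P = Vt[:pca_dim].T                 # top principal directions
    Z = (X - mu) @ P                   # intermediate representation
    G = lda_transform(Z, y, lda_dim)   # classical LDA now applies
    return P @ G                       # use as (X_new - mu) @ (P @ G)
```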
Item: Fingerprint Classification Using Nonlinear Discriminant Analysis (2003-09-16)
Park, Cheonghee; Park, Haesun
We present a new approach to fingerprint classification based on nonlinear feature extraction. Utilizing the Discrete Fourier Transform, we construct reliable and efficient directional images that contain the representative part of the local ridge orientations in fingerprints, and we apply kernel discriminant analysis to the constructed directional images, reducing the dimension dramatically and extracting the most discriminant features. Kernel Discriminant Analysis is a nonlinear extension of Linear Discriminant Analysis (LDA) based on kernel functions: it performs LDA in the feature space transformed by a kernel-based nonlinear mapping, extracting optimal features that maximize class separability in the reduced dimensional space. We show the effectiveness of this feature extraction method in fingerprint classification; experimental results demonstrate competitive performance compared with other published results.

Item: Generalizing Discriminant Analysis Using the Generalized Singular Value Decomposition (2003-02-21)
Howland, Peg; Park, Haesun
Discriminant analysis has been used for decades to extract features that preserve class separability. It is commonly defined as an optimization problem involving covariance matrices that represent the scatter within and between clusters. The requirement that one of these matrices be nonsingular limits its application to data sets with certain relative dimensions. We examine a number of optimization criteria and extend their applicability by using the generalized singular value decomposition to circumvent the nonsingularity requirement. The result is a generalization of discriminant analysis that can be applied even when the sample size is smaller than the dimension of the sample data. We use classification results from the reduced representation to compare the effectiveness of this approach with some alternatives, and conclude with a discussion of their relative merits.

Item: Kernel Discriminant Analysis based on Generalized Singular Value Decomposition (2003-03-28)
Park, Cheonghee; Park, Haesun
In Linear Discriminant Analysis (LDA), a dimension reducing linear transformation is found so that clusters can be better distinguished from each other in the reduced dimensional space. However, LDA requires one of the scatter matrices to be nonsingular, and it does not easily capture nonlinearly clustered structure. We propose a nonlinear discriminant analysis based on kernel functions and the generalized singular value decomposition, called KDA/GSVD, which is a nonlinear extension of LDA and works regardless of the nonsingularity of the scatter matrices in either the input space or the feature space. Our experimental results show that the method is a very effective nonlinear dimension reduction method.
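Kernel discriminant analysis, which both the fingerprint paper and KDA/GSVD above rely on, can be approximated with an explicit "empirical kernel map": represent each sample by its row of kernel values against the training set and run a regularized LDA on that representation. This is a simplified stand-in for the implicit mapping and the GSVD machinery of the papers, and it reuses the hypothetical lda_transform sketch from earlier:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian RBF kernel matrix between the rows of A and of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def kda_fit(X, y, k, gamma=1.0):
    """Empirical kernel map + regularized LDA (a rough stand-in for
    kernel discriminant analysis; not the KDA/GSVD algorithm)."""
    K = rbf_kernel(X, X, gamma)  # each row: kernel values vs. training set
    G = lda_transform(K, y, k)   # LDA sketch from the earlier block
    return G

def kda_transform(X_new, X_train, G, gamma=1.0):
    """Map unseen points through the same kernel representation."""
    return rbf_kernel(X_new, X_train, gamma) @ G
```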
Item: Low Rank Approximation of a Hankel Matrix by Structured Total Least Norm (1997)
Park, Haesun; Zhang, Lei; Rosen, J. Ben
The structure preserving rank reduction problem arises in many important applications. The singular value decomposition (SVD), while giving the best low rank approximation to a given matrix, may not be appropriate for these applications since it does not preserve the given structure. We present a new method for structure preserving low rank approximation of a matrix, based on Structured Total Least Norm (STLN). STLN is an efficient method for obtaining an approximate solution to the overdetermined linear system AX ≈ B that preserves the given linear structure in A or [A | B], where errors can occur in both the right-hand side matrix B and the matrix A. The approximate solution can be obtained to minimize the error in the Lp norm, where p = 1, 2, or ∞. An algorithm is described for Hankel structure preserving low rank approximation using STLN with the Lp norm. Computational results are presented that compare the performance of the STLN-based method for the L1 and L2 norms with other existing methods for reduced rank approximation of Hankel matrices.

Item: Lower Dimensional Representation of Text Data in Vector Space Based Information Retrieval (2000-12-06)
Park, Haesun; Jeon, Moongu; Rosen, J. Ben
Dimension reduction in today's vector space based information retrieval systems is essential for improving computational efficiency in handling massive data. In this paper, we propose a mathematical framework for lower dimensional representation of text data in vector space based information retrieval using minimization and the matrix rank reduction formula. We illustrate how the commonly used Latent Semantic Indexing based on the Singular Value Decomposition (LSI/SVD) can be derived from our mathematical framework as a method for dimension reduction. We then propose a new approach that is more efficient and effective than LSI/SVD when a priori information on the cluster structure of the data is available. Several advantages of the new methods over LSI/SVD are discussed in terms of computational efficiency and data representation in the reduced dimensional space. Experimental results, computed using an information retrieval test system we are now developing, illustrate the effectiveness of our approach in certain classification problems in the reduced dimensional space. The results indicate that for a successful lower dimensional representation of the data, it is important to incorporate a priori knowledge about the data in the dimension reduction.

Item: Matrix Rank Reduction for Data Analysis and Feature Extraction (2003-02-28)
Park, Haesun; Elden, Lars
Numerical techniques for data analysis and feature extraction are discussed in the framework of matrix rank reduction. The singular value decomposition (SVD) and its properties are reviewed, and its relation to Latent Semantic Indexing (LSI) and Principal Component Analysis (PCA) is described. Methods that approximate the SVD are reviewed. A few basic methods for linear regression, in particular the Partial Least Squares (PLS) method, are presented and analyzed as rank reduction methods. Methods for feature extraction based on centroids and classical Linear Discriminant Analysis (LDA), as well as an improved LDA based on the generalized singular value decomposition (LDA/GSVD), are described.
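The SVD property these rank-reduction papers build on is the Eckart-Young theorem: truncating the SVD to the k largest singular triplets gives the best rank-k approximation in the 2-norm and the Frobenius norm. The Hankel paper above exists precisely because this optimal approximation does not preserve structure such as the Hankel pattern. A minimal sketch:

```python
import numpy as np

def best_rank_k(A, k):
    """Best rank-k approximation of A (Eckart-Young): keep the k
    largest singular triplets of the SVD A = U diag(s) V^T.
    Note: the result is generally NOT Hankel even when A is."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

A = np.random.default_rng(0).standard_normal((8, 5))
print(np.linalg.matrix_rank(best_rank_k(A, 2)))  # prints 2
```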
The effectiveness of these methods is illustrated using examples from information retrieval and two-dimensional representation of clustered data.

Item: Nonlinear Feature Extraction based on Centroids and Kernel Functions (2002-12-19)
Park, Cheonghee; Park, Haesun
A nonlinear feature extraction method is presented that can reduce the data dimension down to the number of clusters, providing dramatic savings in computational costs. The dimension reducing nonlinear transformation is obtained by implicitly mapping the input data into a feature space using a kernel function and then finding a linear mapping, based on an orthonormal basis of the centroids in the feature space, that maximally separates the between-cluster relationship. The experimental results demonstrate that the method extracts nonlinear features effectively, so that competitive classification performance can be obtained with linear classifiers in the dimension reduced space.
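A sketch of the centroid-plus-kernel construction in this last item, using the standard identity that the inner product between a mapped point and a cluster centroid in feature space equals the average kernel value against that cluster; a triangular solve then expresses the features with respect to an orthonormalized centroid basis. The function and variable names are illustrative, not taken from the paper:

```python
import numpy as np

def kernel_centroid_features(K, y):
    """Reduce kernel-mapped data to (#clusters) dimensions.

    K: (n, n) kernel matrix of the training data; y: cluster labels.
    f[i, j] = <phi(x_i), c_j> is the mean of K[i, class j];
    G[j, l] = <c_j, c_l>; Cholesky of G orthonormalizes the centroids.
    """
    classes = np.unique(y)
    f = np.column_stack([K[:, y == c].mean(axis=1) for c in classes])
    G = np.array([[K[np.ix_(y == a, y == b)].mean() for b in classes]
                  for a in classes])
    L = np.linalg.cholesky(G)         # G = L L^T (centroid Gram matrix)
    return np.linalg.solve(L, f.T).T  # coordinates in the orthonormal basis
```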