Browsing by Author "Kim, Hyunsoo"
Now showing 1 - 5 of 5
Item: Data Reduction in Support Vector Machines by a Kernelized Ionic Interaction Model (2003-09-22)
Kim, Hyunsoo; Park, Haesun

A major drawback of support vector machines is that the computational complexity of finding an optimal solution scales as $O(n^3)$, where $n$ is the number of training data points. In this paper, we introduce a novel ionic interaction model for data reduction in support vector machines. It is applied to select data points and exclude outliers in the kernel feature space, yielding a data reduction algorithm with a computational complexity of about $n^3/4$ floating point operations. The instance-based learning algorithm has been successfully applied for data reduction in high dimensional feature spaces obtained by kernel functions, and we also present a data reduction method based on this kernelized instance-based algorithm. We test the performance of our new methods and show that the computation time can be significantly reduced without any significant decrease in prediction accuracy. (An illustrative sketch of kernel-space data selection appears after the third item below.)

Item: Prediction of Protein Relative Solvent Accessibility with Support Vector Machines and Long-range Interaction 3D Local Descriptor (2003-01-31)
Kim, Hyunsoo; Park, Haesun

The prediction of protein relative solvent accessibility provides helpful information for the prediction of the tertiary structure of a protein. The SVMpsi method, which uses support vector machines (SVMs) and the position specific scoring matrix (PSSM) generated from PSI-BLAST, has been applied to achieve better prediction accuracy of relative solvent accessibility. We introduce a three dimensional local descriptor that captures the expected remote contacts given by the long-range interaction matrix as well as the neighboring sequence. Moreover, we apply feature weights to the kernels in support vector machines in order to reflect the degree of significance, which depends on the distance from the specific amino acid. Relative solvent accessibility based on a two-state model is predicted at 78.7%, 80.7%, 82.4%, and 87.4% accuracy for 25%, 16%, 5%, and 0% accessibility thresholds, respectively. Three-state prediction achieves 64.5% accuracy with 9% and 36% thresholds. The support vector machine approach has been successfully applied to solvent accessibility prediction by considering long-range interactions and handling unbalanced data. (A sketch of the kernel feature-weighting idea appears after the third item below.)

Item: Protein Secondary Structure Prediction Based on an Improved Support Vector Machines Approach (2003-01-27)
Kim, Hyunsoo; Park, Haesun

The prediction of protein secondary structure is an important step in the prediction of protein tertiary structure. While the neural network approach has been improved by the use of position specific scoring matrices (PSSMs) generated from PSI-BLAST, the support vector machine approach has only recently been introduced. A new protein secondary structure prediction method, SVMpsi, is developed to improve the current level of prediction by incorporating new tertiary classifiers and their jury decision system, efficient methods to handle unbalanced data, a new optimization strategy for maximizing the $Q_3$ measure, and the PSI-BLAST PSSM profiles. SVMpsi produces the highest published $Q_3$ and SOV94 scores to date on both the RS126 and CB513 data sets. For a new KP480 set, the prediction accuracy of SVMpsi was $Q_3 = 78.5$% and SOV94 = 82.8%. Moreover, blind test results for 136 non-redundant protein sequences that do not contain homologues of the training data sets were $Q_3 = 77.2$% and SOV94 = 81.8%. The cross-validation tests and the CASP5 experiment show that SVMpsi is a competitive method for predicting protein secondary structure. Multi-classification strategies based on the one-versus-one scheme and the directed acyclic graph (DAG) scheme are also investigated.
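The data reduction idea in the first item above can be sketched generically: score training points by their position in the kernel feature space and train the SVM on a selected subset. The selection rule below, keeping the points nearest the opposite class's feature-space centroid, is a hypothetical stand-in for the paper's ionic interaction model, and all data and parameters are synthetic.

```python
# Hedged sketch of data reduction before SVM training (first item above).
# Boundary-proximity selection via kernel-space centroid distances is a
# generic heuristic, NOT the paper's ionic interaction model.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (200, 5)), rng.normal(1, 1, (200, 5))])
y = np.array([0] * 200 + [1] * 200)
K = rbf_kernel(X, X, gamma=0.2)

def dist_to_centroid(K, idx):
    # Squared feature-space distance from every point to the mean of
    # phi(X[idx]), via the kernel trick:
    # k(x,x) - 2*mean_j k(x,x_j) + mean_{j,l} k(x_j,x_l).
    return np.diag(K) - 2 * K[:, idx].mean(axis=1) + K[np.ix_(idx, idx)].mean()

keep = []
for c in (0, 1):
    own = np.where(y == c)[0]
    other = np.where(y != c)[0]
    d = dist_to_centroid(K, other)[own]               # proximity to the other class
    keep.extend(own[np.argsort(d)[: len(own) // 4]])  # keep the nearest quarter

svm = SVC(kernel="rbf", gamma=0.2).fit(X[keep], y[keep])
print(f"trained on {len(keep)} of {len(X)} points")
```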
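The feature-weighting device in the second item can be emulated by scaling window positions before an RBF kernel, since scaling inputs by $\sqrt{w_j}$ yields the weighted kernel $k(x,z) = \exp(-\gamma \sum_j w_j (x_j - z_j)^2)$. The triangular weight profile, window size, and data below are illustrative assumptions, not the paper's tuned values.

```python
# Sketch of per-position feature weights in an RBF kernel (second item
# above). Scaling inputs by sqrt(w_j) is equivalent to the weighted kernel
#   k(x, z) = exp(-gamma * sum_j w_j * (x_j - z_j)^2).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
window = 15                                  # residues per window (assumed)
X = rng.normal(size=(400, window))           # placeholder per-position scores
y = (X[:, window // 2] > 0).astype(int)      # synthetic two-state RSA label

center = window // 2
# Weight decays linearly with distance from the central residue (1.0 -> 0.2).
weights = 1.0 - 0.8 * np.abs(np.arange(window) - center) / center

svm = SVC(kernel="rbf", gamma="scale").fit(X * np.sqrt(weights), y)
print(f"training accuracy: {svm.score(X * np.sqrt(weights), y):.3f}")
```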
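The one-versus-one scheme investigated in the third item trains one binary SVM per class pair and combines their votes. A minimal scikit-learn sketch on synthetic three-state labels (helix/strand/coil) follows; the features and labels are placeholders, not the paper's PSSM profiles.

```python
# Minimal sketch of one-versus-one multi-class SVM classification (third
# item above): one binary SVM per class pair, predictions combined by
# voting. Synthetic stand-ins replace PSSM-window features.
import numpy as np
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 20))           # placeholder feature windows
y = rng.integers(0, 3, size=300)         # 0=H (helix), 1=E (strand), 2=C (coil)

clf = OneVsOneClassifier(SVC(kernel="rbf", gamma="scale", C=1.0)).fit(X, y)
print(clf.predict(X[:5]))
```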
Item: Relationships Between Support Vector Classifiers and Generalized Linear Discriminant Analysis on Support Vectors (2004-02-15)
Kim, Hyunsoo; Park, Haesun

Linear discriminant analysis based on the generalized singular value decomposition (LDA/GSVD) has recently been introduced to circumvent the nonsingularity restriction that occurs in classical LDA, so that a dimension reducing transformation can be effectively obtained for undersampled problems. In this paper, relationships between support vector machines (SVMs) and generalized linear discriminant analysis applied to the support vectors are studied. Based on the GSVD, the weight vector of the hard margin SVM is proved to be equivalent to the dimension reducing transformation vector generated by LDA/GSVD applied to the support vectors of the binary-class problem. It is also shown that the dimension reducing transformation vector and the weight vector of soft margin SVMs are related when a subset of the support vectors is considered. These results generalize to kernelized SVMs and the kernelized KDA/GSVD. Illustrating this relationship, it is shown that a classification problem can be interpreted as a data reduction problem. (A numerical sketch of this relationship appears after the final item below.)

Item: Text Classification using Support Vector Machines with Dimension Reduction (2003-02-21)
Kim, Hyunsoo; Howland, Peg; Park, Haesun

Support vector machines (SVMs) have been recognized as one of the most successful classification methods for many applications, including text classification. Even though the learning ability and computational complexity of training support vector machines may be independent of the dimension of the feature space, reducing computational complexity is essential for efficiently handling the large number of terms that arise in practical text classification. In this paper, we adopt novel dimension reduction methods to reduce the dimension of the document vectors dramatically. We also introduce decision functions for the centroid-based classification algorithm and for support vector classifiers to handle the classification problem in which a document may belong to multiple classes. Our substantial experimental results show that, with several dimension reduction methods designed particularly for clustered data, higher efficiency for both training and testing can be achieved without sacrificing the prediction accuracy of text classification, even when the dimension of the input space is significantly reduced.
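The fourth item's SVM/LDA relationship can be checked numerically: fit a linear SVM, compute a discriminant direction from the support vectors alone, and compare the two directions. Classical LDA on well-conditioned synthetic data stands in for LDA/GSVD here, and a soft margin SVM is used, so this only approximates the stated hard margin equivalence.

```python
# Numerical check of the SVM/LDA-on-support-vectors relationship (fourth
# item above). Classical LDA stands in for LDA/GSVD; data are synthetic
# Gaussians, and the soft margin SVM only approximates the hard margin case.
import numpy as np
from sklearn.svm import SVC
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(-1, 1, (60, 2)), rng.normal(1, 1, (60, 2))])
y = np.array([0] * 60 + [1] * 60)

svm = SVC(kernel="linear", C=1.0).fit(X, y)
w_svm = svm.coef_.ravel()

# LDA trained on the support vectors only.
lda = LinearDiscriminantAnalysis().fit(svm.support_vectors_, y[svm.support_])
w_lda = lda.coef_.ravel()

cos = w_svm @ w_lda / (np.linalg.norm(w_svm) * np.linalg.norm(w_lda))
print(f"cosine similarity of the two directions: {cos:.3f}")
```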
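The pipeline shape described in the final item, reducing the dimension of document vectors before training an SVM, can be sketched with off-the-shelf components. Truncated SVD (LSI) is a generic stand-in for the paper's cluster-oriented reduction methods, and the 20-newsgroups subset and parameters are illustrative choices, not the paper's setup.

```python
# Sketch of dimension reduction followed by SVM text classification (final
# item above). TruncatedSVD is a generic stand-in for the paper's
# cluster-oriented reduction methods; data set and parameters are illustrative.
from sklearn.datasets import fetch_20newsgroups
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

cats = ["sci.med", "sci.space"]
train = fetch_20newsgroups(subset="train", categories=cats)
test = fetch_20newsgroups(subset="test", categories=cats)

clf = make_pipeline(
    TfidfVectorizer(),               # documents -> high-dimensional term space
    TruncatedSVD(n_components=100),  # drastic dimension reduction
    LinearSVC(),                     # linear SVM in the reduced space
)
clf.fit(train.data, train.target)
print(f"test accuracy: {clf.score(test.data, test.target):.3f}")
```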