Exploiting additional information to improve traditional inductive learning is an active research area in machine learning. In many supervised learning applications, the training data contains additional information that is not reflected in the training pairs. Examples include: (1) time series prediction, where future samples can be observed in the training data; (2) handwritten digit recognition, where the training examples are provided by several persons and this group information is not utilized during training; and (3) medical diagnosis, where a predictive (diagnostic) model, say for lung cancer, is estimated using a training set of male and female patients, so that gender can be considered additional group information.
Incorporating this additional information into learning may improve generalization. Recently, Vapnik proposed a general approach for incorporating additional information into learning, known as Learning Using Privileged Information (LUPI), together with learning with structured data (LWSD), which utilizes group information (Vapnik, 2006). An SVM-based methodology, SVM+, was proposed under the LUPI and LWSD settings (Vapnik, 2006). In this thesis, we first introduce an SVM+-based feature selection system. We then extend SVM+ to the multi-task learning (MTL) setting, where both training and test data can be naturally partitioned into several groups. SVM+-based MTL (SVM+MTL) methods for both classification and regression are proposed and analyzed. SVM+MTL estimates multiple models simultaneously, i.e., one model for each group/task. Task inter-dependency is modeled by sharing a common part of the decision function among the different groups. Connections and differences between SVM+ and SVM+MTL are discussed, and practical parameter tuning strategies are proposed for SVM+MTL. Empirical comparisons show that SVM+MTL performs very well on data sets with group information. Finally, generalized sequential minimal optimization (GSMO) methods are proposed for SVM+MTL training, in both the classification and regression settings.
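The idea of sharing a common part of the decision function among groups can be illustrated with a minimal sketch. It assumes a linear, additive form in which each group's model is the sum of a shared term and a group-specific correction. The additive linear form, the function and parameter names (svm_plus_mtl_decision, w, b, w_group, d_group), and the numeric weights are all hypothetical choices made for illustration and are not taken from the thesis. The sketch shows only how a prediction would be evaluated once the parameters are known, not how they are trained.

```python
from typing import Dict, Hashable, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def svm_plus_mtl_decision(
    x: Sequence[float],
    group: Hashable,
    w: Sequence[float],
    b: float,
    w_group: Dict[Hashable, Sequence[float]],
    d_group: Dict[Hashable, float],
) -> float:
    """Evaluate an assumed decision function of the form
    f_group(x) = (w . x + b) + (w_group . x + d_group).
    """
    common = dot(w, x) + b  # part shared by all groups
    correction = dot(w_group[group], x) + d_group[group]  # group-specific part
    return common + correction


# Hypothetical parameters for two groups; the values are made up.
w, b = [1.0, -1.0], 0.0
w_group = {"male": [0.5, 0.0], "female": [-0.5, 0.0]}
d_group = {"male": 0.0, "female": 2.0}

x = [1.0, 2.0]
score_male = svm_plus_mtl_decision(x, "male", w, b, w_group, d_group)
score_female = svm_plus_mtl_decision(x, "female", w, b, w_group, d_group)
print(score_male, score_female)
```

With these made-up weights, the same input receives scores of opposite sign in the two groups: the shared term contributes the same amount to both, and only the group-specific correction differs. This is the sense in which one model is obtained per group while the groups remain coupled through the common part.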