In this dissertation we propose penalized likelihood estimators for use in statistical classification problems. Our main focus is on the development of fusion penalties. Chapter 1 presents an introduction to penalized likelihood estimation using fusion penalties. Chapter 2 introduces the ridge fusion method for jointly estimating precision matrices for use in quadratic discriminant analysis and semi-supervised model-based clustering. A ridge penalty and a ridge fusion penalty are used to introduce shrinkage and to promote similarity between the estimates. Blockwise coordinate descent is used for the optimization. Tuning parameter selection is addressed in both the supervised and semi-supervised settings using cross-validation with validation likelihood. Chapter 3 presents a second method for jointly estimating multiple precision matrices for use in quadratic discriminant analysis, in which the classes share a common correlation matrix. The correlation decomposition of the precision matrix is penalized to create sparse estimates of the common inverse correlation matrix, and a ridge fusion penalty is used to promote similarity of the estimates of the inverse standard deviations of each variable. A two-step algorithm is proposed that simultaneously selects tuning parameters for the two penalties. The merits of this method are shown through simulations. In Chapter 4 we turn from fusion penalties in quadratic discriminant analysis to fusion penalties in multinomial logistic regression. We propose group fused multinomial regression, a novel method for reducing the number of response categories in multinomial logistic regression. An alternating direction method of multipliers (ADMM) algorithm is used for optimization, and convergence results are established. A line search algorithm and an AIC criterion are developed to select the group structure. A simulation study is presented to show the ability of group fused multinomial regression to select the correct group structure.
Chapter 5 summarizes our work and discusses some future directions. We also provide some insight into the connection between the ridge fusion method proposed in Chapter 2 and fusion penalties in multinomial logistic regression.