Author: Chang, Yangyang
Dates: 2023-09-19 (record); 2023-03 (issued)
URI: https://hdl.handle.net/11299/257017
Description: University of Minnesota Ph.D. dissertation. March 2023. Major: Electrical Engineering. Advisor: Gerald Sobelman. 1 computer file (PDF); xv, 116 pages.

Abstract: In this era of data explosion, the diversity and complexity of data are steadily increasing, and the corresponding data processing models have become massive and complicated. In particular, when handling high-dimensional data, large inputs, and low-latency requirements, modern machine learning methods such as deep neural networks (DNNs) demand advanced hardware (e.g., high-power graphics processing units or tensor processing units). To run such models efficiently on embedded platforms, lightweight machine learning designs with small computation and memory requirements are essential. This thesis presents six innovative designs of lightweight machine learning models. The introduced designs include a quantized vision transformer, optimized binarized neural networks (BNNs), and lightweight convolutional neural networks (CNNs). For high-dimensional multitasking continuous and discrete optimization, the lightweight designs include modified multifactorial and cross-target evolutionary algorithms (EAs). For data compression, the thesis proposes a hybrid compressor for medical electrocardiogram (ECG) data on an embedded system. In future work, the proposed structures can be integrated into an overall lightweight design framework covering full-system compression, optimization, and quantization.

Quantization using a small number of bits shows promise for reducing latency and memory usage in DNNs. However, most quantization methods cannot readily handle complicated functions such as the exponential and square root, and prior approaches involve complex processes that must interact with floating-point calculations during the quantization pass. The quantized vision transformer proposed in this thesis provides a robust method for full integer quantization of the vision transformer without requiring any intermediate floating-point computations. The quantization techniques can be applied in a variety of hardware and software implementations, including processor/memory architectures and FPGAs.

BNNs have shown promise for low-power embedded systems, but they are typically designed starting from existing architectures based on floating-point number representations. It is also hard to meet classification accuracy requirements because the weights and activations are limited to ±1. This thesis applies an efficient genetic algorithm (GA) to optimize a fully connected binarized architecture and increase BNN performance without changing its basic operators. The simulation results demonstrate the effectiveness of the proposed method in improving the performance of BNNs.

Novel design frameworks for lightweight CNNs are proposed for image classification tasks on embedded systems. Scalable lightweight CNN architectures are first proposed. The population-based metaheuristic approaches of the GA, cuckoo search (CS), the multifactorial evolutionary algorithm (MFEA), and a proposed hybrid evolutionary approach are then used to optimize these architectures. The proposed optimization process uses no assumptions (e.g., weight sharing) or approximations (e.g., surrogate functions).
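As a rough, generic illustration of how a population-based metaheuristic can search over CNN hyperparameters, the following Python sketch encodes a few hyperparameters as a small chromosome and evolves them with a toy GA. The chromosome layout, operators, and placeholder fitness function are assumptions made for this sketch only; they are not the encoding methods or evaluation procedure developed in the thesis.

    # Illustrative only: the chromosome layout, operators, and placeholder
    # fitness function are assumptions for this sketch, not the thesis's methods.
    import numpy as np

    rng = np.random.default_rng(0)

    # Chromosome: [filters in block 1, filters in block 2, kernel size].
    FILTER_CHOICES = [8, 16, 32, 64]
    KERNEL_CHOICES = [3, 5, 7]

    def random_chromosome():
        return [int(rng.choice(FILTER_CHOICES)),
                int(rng.choice(FILTER_CHOICES)),
                int(rng.choice(KERNEL_CHOICES))]

    def fitness(chrom):
        # Placeholder for "validation accuracy minus a computation penalty";
        # a real search would train and evaluate the decoded CNN here.
        f1, f2, k = chrom
        proxy_accuracy = float(np.log(f1 * f2))       # stand-in: larger nets score higher
        compute_penalty = 1e-4 * f1 * f2 * k * k      # stand-in: penalize multiply count
        return proxy_accuracy - compute_penalty

    def evolve(pop_size=8, generations=20):
        pop = [random_chromosome() for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=fitness, reverse=True)       # rank by fitness
            parents = pop[: pop_size // 2]            # truncation selection
            children = []
            while len(parents) + len(children) < pop_size:
                a, b = rng.choice(len(parents), size=2, replace=False)
                child = parents[a][:1] + parents[b][1:]   # simple crossover
                if rng.random() < 0.3:                    # mutation: resample one gene
                    i = int(rng.integers(3))
                    child[i] = random_chromosome()[i]
                children.append(child)
            pop = parents + children
        return max(pop, key=fitness)

    print("best hyperparameters found:", evolve())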
Two encoding methods are proposed for the most computationally critical parts of the CNNs, and the metaheuristic approaches are compared for small population sizes. The results of these metaheuristic approaches are evaluated using the metrics of computation time and classification accuracy, and the final architecture, which offers a favorable tradeoff between the amount of computation and accuracy, is identified.

For high-dimensional, multitasking, continuous optimization problems, multifactorial optimization has become one of the most promising paradigms for evolutionary multitasking in computational intelligence. This thesis presents an in-depth analysis of this approach by considering several variations of the standard MFEA. Using a simpler structure together with enhanced operators, two new MFEAs are proposed in which redundant hyperparameters are removed and the operators are simplified. Compared with the traditional MFEA, the two proposed MFEAs produce better results and are well suited to embedded system implementation.

To handle both non-convex continuous and NP-hard discrete optimization problems, this thesis proposes the class algorithm, a new type of evolutionary algorithm inspired by the concepts of division of labor and specialization. Individuals form subpopulations of different classes, each with its own characteristics, and the entire population evolves through influences among individuals within and between the subpopulations. The class algorithm outperforms other evolutionary algorithms on many test functions from single-objective continuous optimization benchmarks. Compared with mature application software, it also shows a strong ability to solve large-scale discrete optimization problems: its computation time is only 0.48 or 0.36 of that of published GA results when run serially or in parallel, respectively, and it is well suited to embedded systems as well as traditional hardware platforms. In summary, compared with traditional EAs, the class algorithm not only performs better but also has a smaller runtime.

Cardiovascular diseases are the leading cause of death worldwide. Patients with heart disease can be monitored by analyzing the ECG. However, the large amount of data poses a burden for a system implemented on an embedded platform with limited memory and computational capability. Traditionally, lossless compression methods have been favored for reducing memory requirements because of the critical nature of the application. However, if reconstruction from a lossy signal does not significantly affect diagnostic capability, lossy methods become attractive due to their larger compression ratios. This thesis proposes a hybrid lossy/lossless compression system with good signal fidelity and compression ratio characteristics. Its performance is evaluated after decompression using DNNs that have been shown to have good classification capabilities. For the CODE (Clinical Outcomes in Digital Electrocardiology) dataset, the proposed hybrid compressor achieves an average compression ratio of 5.18 with a mean squared error of 0.20, and DNN-based diagnoses of the decompressed waveforms produce, on average, only 0.8 additional erroneous diagnoses out of a total of 402 cases compared to using the original ECG data.
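For reference, the following Python sketch computes the two signal-level metrics reported above, compression ratio and mean squared error, using their common definitions on a synthetic stand-in signal; the exact definitions, units, and preprocessing used in the thesis may differ, and the data below is not from the CODE or PTB-XL datasets.

    # Hedged sketch of the reported signal-level metrics (common definitions only).
    import numpy as np

    def compression_ratio(original_bytes: int, compressed_bytes: int) -> float:
        # Ratio of original storage size to compressed size (higher is better).
        return original_bytes / compressed_bytes

    def mean_squared_error(original: np.ndarray, reconstructed: np.ndarray) -> float:
        # Average squared difference between original and decompressed samples.
        return float(np.mean((original - reconstructed) ** 2))

    rng = np.random.default_rng(0)
    ecg = rng.standard_normal(5000).astype(np.float32)                   # stand-in ECG record
    ecg_rec = ecg + rng.normal(0.0, 0.1, ecg.shape).astype(np.float32)   # stand-in reconstruction

    print("CR :", compression_ratio(ecg.nbytes, ecg.nbytes // 5))
    print("MSE:", mean_squared_error(ecg, ecg_rec))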
For the PTB-XL dataset, the hybrid compressor achieves a high average compression ratio of 4.91 with a mean squared error of 0.01. In addition, the decompressed ECGs have only a 2.46% lower macro-averaged area under the receiver operating characteristic curve (AUC) score than the original ECGs.

Language: en
Keywords: Data Compression; Evolutionary Algorithm; Lightweight Architecture Design; Multitasking Optimization; Network Architecture Search; Quantization
Title: Design Approaches for Lightweight Machine Learning Models
Type: Thesis or Dissertation