Robustness and safety of deep learning models.

Abstract

Deep learning (DL) refers to a data-driven machine learning technique in which neural networks with many layers are used to model complex patterns and relationships in data. DL models have revolutionized numerous complex real-world tasks ranging from image recognition to natural language processing, demonstrating significant performance gains over "traditional" approaches. Despite this impressive performance, concerns about their robustness and reliability persist: DL models are known to be sensitive to adversarial attacks, data distribution shifts, and other perturbations, which can lead to significant performance degradation. As a result, the adoption of DL models in high-stakes real-world applications remains limited today, and addressing the robustness of DL models is an emerging and critical research area. In this thesis, we present our findings on the robustness of DL models. First, we point out a robustness challenge for DL classifiers: current adversarial robustness evaluation may not be rigorous, and robustness conclusions drawn from such evaluation may not be trustworthy. Based on our analysis, we express the pessimistic view that universal robustness for DL classifiers is too ambitious a goal to achieve. Next, we discuss the robustness challenge in DL-based watermarking. Although existing DL-based watermarking systems have been shown to be robust to traditional digital corruptions (e.g., JPEG compression, additive noise), we show that small but carefully crafted perturbations can easily break them, requiring no knowledge of the watermarking system itself. We also show that incorporating low-frequency components into the image watermark is necessary for robust image watermarking. Then, we discuss selective classification (prediction with a reject option) as a way to accept the imperfection of DL models while making the best use of them. We propose a confidence score based on the raw logit output of DL classifiers and demonstrate its superior potential for selective classification, reducing the liability of mistakes made by DL models. Lastly, we discuss future research directions based on our work, including potential ways to make DL classifiers more robust, how to develop more reliable DL-based watermarking systems, and ways to achieve reliable selective classification in practice.
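For illustration, below is a minimal sketch of selective classification with a raw-logit confidence score. The exact score proposed in the thesis is not reproduced here; max-logit thresholding is a hypothetical stand-in, and the threshold value is arbitrary.

    import numpy as np

    def selective_predict(logits: np.ndarray, threshold: float):
        """Selective classification with a reject option.

        logits: array of shape (n_samples, n_classes) of raw
                (pre-softmax) classifier outputs.
        threshold: confidence cutoff; samples whose score falls
                   below it are rejected rather than predicted.
        Returns (predictions, accepted) arrays.
        """
        confidence = logits.max(axis=1)       # max raw logit as the confidence score (hypothetical stand-in)
        predictions = logits.argmax(axis=1)   # usual argmax class
        accepted = confidence >= threshold    # reject option: abstain on low-confidence samples
        return predictions, accepted

    # Example: two samples, three classes; the second is rejected.
    logits = np.array([[4.2, 0.1, -1.3],
                       [0.9, 1.0,  0.8]])
    preds, accepted = selective_predict(logits, threshold=2.0)
    print(preds)     # [0 1]
    print(accepted)  # [ True False]

In practice, the threshold would be tuned on held-out data to trade coverage (how many inputs the model answers) against selective risk (error rate on the accepted inputs).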

Description

University of Minnesota Ph.D. dissertation. May 2025. Major: Electrical/Computer Engineering. Advisor: Ju Sun. 1 computer file (PDF); xx, 150 pages.

Suggested citation

Liang, Hengyue. (2025). Robustness and safety of deep learning models. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/275904.
