Backdoors in contemporary artificial intelligence: the good and the bad
Authors
Xian, Xun
Published Date
May 2025
Abstract
Recent decades have witnessed tremendous developments in artificial intelligence (AI). From shallow feedforward neural networks to the most recent billion-parameter transformers, these innovations have significantly changed almost every aspect of our lives. However, extensive evidence has shown that these AI techniques are vulnerable to adversarial attacks, raising significant safety concerns about their use. Among the various adversarial attacks against AI, backdoor attacks have gained particular attention due to their stealthy nature. A backdoor attack refers to a generic class of attacks (from the attacker's perspective) that manipulate an AI system, e.g., a deep neural network, into making targeted predictions on inputs of the attacker's choosing. These attacks are especially difficult to detect and defend against because the malicious behavior activates only in the presence of a specific backdoor trigger, while the AI system behaves as usual on normal inputs. This doctoral thesis delves into backdoor attacks and their peripheral fields in contemporary AI, exploring three key aspects: their fundamental mechanisms, defensive strategies against them, and their beneficial applications.

• Understanding the fundamental mechanisms. In the first part of this thesis, our goal is to understand the fundamental mechanism behind backdoor attacks. To achieve this goal, we provide a theoretical understanding of when and how backdoor attacks against deep neural networks can be effective. In particular, we show that backdoor attacks are more effective when backdoor triggers follow certain patterns. These insights help explain why some empirical methods are more effective than others and, in turn, guide the design of more effective attacks. (A minimal poisoning sketch follows this abstract.)

• Developing detection-based defenses. In the second part of this thesis, we focus on defending against backdoor attacks. Our goal is to develop a detection-based framework that (i) offers provable guarantees on detection performance and (ii) applies to both convolutional neural networks for image inputs and transformer-based architectures for text inputs. To this end, we first provide a mathematical understanding of the backdoor detection problem and then build our framework on these insights, drawing on techniques from conformal prediction and deep representation learning. (A conformal-detection sketch follows this abstract.)

• Exploring beneficial applications. In the third part, we dive into the beneficial aspects of backdoor attacks. In particular, backdoors in the computer vision domain are often implemented as pixel patches. This form lends itself well to serving as a watermark for copyright protection, an increasingly important application in the era of generative AI. Building on this idea, we propose a robust and agile plug-and-play watermark detection framework, referred to as RAW. By leveraging a novel backdoor-motivated watermark scheme learning approach, RAW achieves up to 100× faster watermark injection and significantly enhanced robustness against adversarial attacks. Combined with techniques from the randomized smoothing literature, RAW provides provable guarantees on the false positive rate for misclassifying a watermarked image, even under adversarial attempts at watermark removal. (A randomized-smoothing verification sketch follows this abstract.)
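To make the trigger mechanism concrete, here is a minimal Python sketch of pixel-patch data poisoning. It illustrates the generic attack class described in the abstract, not the specific constructions analyzed in the thesis; the patch location, size, intensity, poisoning rate, and helper names are all illustrative assumptions.

```python
import numpy as np

def stamp_trigger(image: np.ndarray, patch_size: int = 4, value: float = 1.0) -> np.ndarray:
    """Stamp a small square patch (the backdoor trigger) in the bottom-right corner.

    Assumes `image` is an (H, W, C) float array in [0, 1]; the corner location
    and intensity are arbitrary illustrative choices.
    """
    poisoned = image.copy()
    poisoned[-patch_size:, -patch_size:, :] = value
    return poisoned

def poison_dataset(images: np.ndarray, labels: np.ndarray, target_label: int,
                   poison_rate: float = 0.05, seed: int = 0):
    """Stamp the trigger on a small random fraction of images and relabel them."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(poison_rate * len(images))
    for i in rng.choice(len(images), size=n_poison, replace=False):
        images[i] = stamp_trigger(images[i])
        labels[i] = target_label  # targeted prediction of the attacker's choosing
    return images, labels
```

A model trained on such a poisoned set tends to predict `target_label` whenever the patch is present while behaving normally on clean inputs, which is exactly the stealthy behavior the abstract describes.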
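The thesis's detection framework is not reproduced here, but the flavor of its provable guarantee can be conveyed with a split conformal sketch. Assume `calibration_scores` are nonconformity scores (for example, distances in a learned representation space) computed on clean, trusted inputs; the resulting threshold then caps the false-positive rate on exchangeable clean inputs at `alpha`. The function names and the choice of score are assumptions for illustration.

```python
import numpy as np

def conformal_threshold(calibration_scores: np.ndarray, alpha: float = 0.05) -> float:
    """Finite-sample threshold from n clean calibration scores.

    Flagging a new input whenever its score exceeds the
    ceil((n + 1) * (1 - alpha))-th smallest calibration score guarantees that
    a clean (exchangeable) input is falsely flagged with probability <= alpha.
    """
    n = len(calibration_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:  # too few calibration points to certify this alpha
        return float("inf")
    return float(np.sort(calibration_scores)[k - 1])

def flag_suspicious(test_scores: np.ndarray, threshold: float) -> np.ndarray:
    """Mark inputs whose nonconformity score exceeds the conformal threshold."""
    return test_scores > threshold
```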
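Finally, a hedged sketch of how a randomized-smoothing-style vote can control a watermark detector's error rate: query a hypothetical binary detector on Gaussian-perturbed copies and accept only when a one-sided binomial test rejects the null that the detector fires at its clean-image rate `p0`. This is a generic illustration of the randomized smoothing idea; RAW's actual scheme and guarantee are developed in the thesis.

```python
import numpy as np
from scipy.stats import binomtest

def smoothed_votes(image: np.ndarray, detector, sigma: float = 0.1,
                   n_samples: int = 200, seed: int = 0) -> int:
    """Count 'watermarked' votes from a binary detector on noisy copies.

    `detector` (hypothetical) maps an image to 0/1; sigma and n_samples are
    illustrative smoothing parameters.
    """
    rng = np.random.default_rng(seed)
    votes = 0
    for _ in range(n_samples):
        noisy = image + sigma * rng.standard_normal(image.shape)
        votes += int(detector(noisy))
    return votes

def verify_watermark(votes: int, n_samples: int, p0: float = 0.5,
                     alpha: float = 0.05) -> bool:
    """Declare 'watermarked' only when votes are significantly above p0.

    If the detector fires on at most a p0 fraction of noisy copies of
    unwatermarked images, this one-sided test bounds the chance of wrongly
    declaring such an image watermarked by alpha.
    """
    return binomtest(votes, n_samples, p=p0, alternative="greater").pvalue <= alpha
```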
Description
University of Minnesota Ph.D. dissertation. May 2025. Major: Electrical/Computer Engineering. Advisors: Jie Ding, Mingyi Hong. 1 computer file (PDF); xiv, 122 pages.
Suggested Citation
Xian, Xun. (2025). Backdoors in contemporary artificial intelligence: the good and the bad. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/275935.
