Aligning human and AI systems: framework, algorithm design and applications in large language models

Abstract

Aligning artificial intelligence (AI) systems with human values is essential to unlocking their full potential while ensuring ethical and equitable outcomes. As AI systems become increasingly integrated into our lives, it is crucial that they not only perform tasks effectively but also align with human preferences and values. This alignment helps prevent harm, mitigates biases, and ensures that AI systems serve humanity responsibly. My thesis develops a comprehensive framework for inverse reinforcement learning (IRL), which learns from demonstration data, and covers problem formulation, algorithm design, theoretical analysis, and the application of IRL to aligning large language models with demonstration datasets. The thesis is structured into four technical chapters (Chapters 2 to 5). The second chapter addresses the theoretical foundations of AI alignment, focusing on the IRL problem. IRL seeks to infer a reward function from expert demonstrations, which can then guide an agent's behavior. We propose novel IRL algorithms capable of learning from both expert demonstrations and online interactions with the environment, and we provide theoretical guarantees for convergence and optimality under specific conditions. The third chapter explores the challenges of IRL in scenarios where only offline datasets are available. We identify limitations in existing IRL approaches and introduce a new method tailored for offline learning; theoretical guarantees for this approach are established, and its effectiveness is demonstrated through experiments on various tasks. The fourth chapter discusses the connections between the proposed algorithms and existing methods in the literature, providing a foundational understanding of the theoretical underpinnings of our work. It also discusses how to combine the proposed algorithms with existing methods, highlighting the potential for leveraging complementary strengths to enhance AI alignment. The fifth chapter transitions to practical applications, focusing on aligning large language models (LLMs) through IRL training pipelines. We adapt the proposed IRL algorithm to train LLMs and demonstrate its effectiveness through extensive experiments. Additionally, we discuss the challenges, limitations, and future directions for research in AI alignment and reinforcement learning. The thesis concludes by summarizing the key contributions and findings of my PhD work, emphasizing the importance of reinforcement learning in AI alignment and the need for continued research in this area.
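For readers unfamiliar with the core mechanism the abstract refers to, the sketch below illustrates the classic maximum-entropy IRL update (Ziebart et al., 2008), in which a linear reward is adjusted until the learner's expected state visitations match the expert's. This is a generic textbook illustration under simplifying assumptions (tabular MDP, known dynamics, linear reward), not the algorithms proposed in the thesis; all names (`soft_value_iteration`, `state_visitation`, `maxent_irl_step`, `phi`, and so on) are hypothetical.

```python
import numpy as np

# Illustrative maximum-entropy IRL on a tabular MDP with known dynamics.
# Reward is linear in state features: r(s) = phi(s) . w.
# Generic textbook sketch; not the thesis's proposed algorithm.

def soft_value_iteration(P, r, gamma=0.95, iters=200):
    """Entropy-regularized value iteration.
    P: (A, S, S) transition tensor, r: (S,) state reward.
    Returns a stochastic policy pi(a|s) of shape (A, S)."""
    A, S, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = r[None, :] + gamma * (P @ V)                  # (A, S) soft Q-values
        Vmax = Q.max(axis=0)
        V = Vmax + np.log(np.exp(Q - Vmax).sum(axis=0))   # log-sum-exp over actions
    return np.exp(Q - V[None, :])                         # soft-optimal policy

def state_visitation(P, policy, p0, T=50):
    """Expected state-visitation counts over a horizon of T steps,
    starting from initial state distribution p0."""
    d, svf = p0.copy(), p0.copy()
    for _ in range(T - 1):
        d = np.einsum('s,as,ast->t', d, policy, P)        # propagate one step
        svf += d
    return svf

def maxent_irl_step(w, phi, P, p0, expert_svf, lr=0.1, gamma=0.95):
    """One gradient ascent step on the max-entropy IRL log-likelihood:
    the gradient is expert feature counts minus the learner's expected counts."""
    r = phi @ w                                           # linear reward, shape (S,)
    policy = soft_value_iteration(P, r, gamma)
    grad = phi.T @ (expert_svf - state_visitation(P, policy, p0))
    return w + lr * grad
```

Iterating `maxent_irl_step` until the feature counts match recovers a reward under which the expert behaves soft-optimally; the thesis's contributions, per the abstract, extend this kind of reward estimation to settings with online interaction, purely offline data, and LLM alignment.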

Description

University of Minnesota Ph.D. dissertation. May 2025. Major: Electrical/Computer Engineering. Advisor: Mingyi Hong. 1 computer file (PDF); xi, 168 pages.

Suggested citation

Zeng, Siliang. (2025). Aligning human and AI systems: framework, algorithm design and applications in large language models. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/275937.
