A ceiling with golden lights along either side and a light shining down from the ceiling.

Strengthening Trust in Large Language Models

Home / Strengthening Trust in Large Language Models

Overview

Evaluating and Improving Trustworthiness in Large Language Models

The emergence of large language models (LLMs) such as ChatGPT has revolutionized information retrieval and interaction, particularly in intelligence analysis. However, concerns about their trustworthiness in security-critical applications persist due to security risks, factuality issues and biases. The project proposes a multifaceted approach to evaluating and enhancing the trustworthiness of large language models (LLMs), addressing key concerns related to security, factuality and biases. By evaluating and enhancing the trustworthiness of LLMs, the project aims to provide intelligence analysts with reliable and accurate information.

Solution 

Solution: The project proposes a comprehensive approach:

  1. Assessing Security Risks: Investigating adversarial prompt attacks and proposing defense strategies, particularly in the context of natural language generation. In response to identified vulnerabilities, the project will propose defense strategies to mitigate the risks posed by adversarial prompt attacks.
  2. Ensuring Factuality: Evaluating the factual vulnerability of LLMs to be injected with fabricated facts by malicious actors by simulating scenarios where attackers manipulate the training process or model parameters. To address factuality concerns, the project will develop efficient methods for updating LLMs with new information and correcting identified misinformation
  3. Mitigating Biases: Analyzing biases in decision processes and proposing bias mitigation strategies. Building on the identified biases, the project will propose adversarial training methods to mitigate biases in LLM outputs. To address factuality concerns, the project will develop efficient methods for updating LLMs with new information and correcting identified misinformation.

Impact 

Enhancing the trustworthiness of LLMs will enable intelligence analysts to make better-informed decisions based on reliable data. By addressing security risks, factuality concerns and biases, this project will unlock the full potential of LLMs in intelligence analysis and other security-critical applications. Through rigorous evaluation, novel attack and defense strategies and bias mitigation techniques, the project aims to provide intelligence analysts with reliable and accurate information, enabling more informed decision-making.

Research Leadership Team 

Principal Investigator:  Jinghui Chen, Assistant Professor, Penn State University
Investigator:  Lu Lin, Assistant Professor, Penn State University

Data analytics

Present

Discover more projects

The CAOE is committed to developing innovative tools and techniques to safeguard our homeland from potential threats and vulnerabilities.