
Use Case-Driven Risk Assessment for Foundation Models: Fairness, Brand Risks, and Beyond

In addition to regulation-based risk categories, there are applications that demand a use-case-driven risk assessment approach. This is particularly crucial for companies where brand risks, (un-)fairness, and policy-following are crucial. For instance, a cosmetic brand’s customer service chatbot should not recommend other brands or discuss toxic content related to their products. When a chatbot hallucinates, the potential for significant financial and reputational loss is substantial, as evidenced by incidents like the Air Canada incident.
This blog will highlight comprehensive use-case-driven safety perspectives based on various use cases and applications, such as over-cautiousness, brand risk, hallucination, robustness, fairness, and privacy. We will also compare the performance of the Llama 3.1 405B model in these contexts and present our findings.
Comparison of the Llama 3.1 405B Model and Results Analysis
Our comprehensive evaluation of the Llama 3.1 405B model across these safety perspectives revealed several key insights:
- Over-Cautiousness: The model shows great improvement in handling over-cautiousness. In particular, it demonstrates a low rate of wrong refusals compared to all the GPT-4 models while still providing useful information.
- Brand Risk: Llama 3.1 405B’s score for brand risk is higher than previous generations of the LLaMA-3 model family but lower than GPT-4 models. This indicates that the model may need additional evaluation and mitigation strategies when tailored to operate in specific sectors.
- Hallucination: The model showes improved accuracy in generating factually correct responses, reducing the risk of hallucination.
- Robustness: Enhanced training techniques significantly improve the model’s resistance to adversarial attacks.
- Fairness: The model performs well in fairness audits, with minimal biases detected.
- Privacy: Stringent data practices ensure compliance with privacy regulations and protect user data effectively.

(Overview) Use-case-based safety assessment of Llama 3.1 405B and the two Llama 3 models (higher is better). Llama 3.1 405B shows reliable improvement over various use-case-driven safety perspectives compared to the previously released Llama 3 series models.

(Overview) Use-case-based safety assessment of Llama 3.1 405B and the three GPT-4 models (higher is safer). Compared to the GPT-4 series of models, Llama 3.1 405B performs better in handling Fairness, Privacy, Over-cautiousness, and robustness but is less effective in handling Brand Risks and Hallucination.
Over-Cautiousness: Striking the Right Balance
Over-cautiousness in AI can lead to overly conservative responses, where the system avoids making definitive statements to minimize risk. While this can prevent harm, it can also result in user frustration due to non-committal answers.
Our evaluation approach aligns with the four main themes of regulation-based safety: System & Operational Risks, Content Safety Risks, Societal Risks, and Legal & Rights Risks. For each theme, we employ tailored datasets that mimic the distribution and syntactic structure of harmful instructions while remaining benign to human evaluators. This methodology allows us to identify spurious correlations in safety filters and reveal instances where the model incorrectly refuses to engage with harmless queries.
By analyzing these failure cases, stakeholders can gain direct insights into areas where safety measures may be overly stringent or improperly implemented. This information is invaluable for fine-tuning the model’s response thresholds, improving its ability to distinguish between genuinely harmful content and benign requests that share superficial similarities. Ultimately, this evaluation helps strike a balance between maintaining robust safety protocols and ensuring the model remains helpful and engaging in real-world interactions.
Red teaming examples (Over-Cautiousness)
Refused to discuss the best safety practice
Refused to engage with a virtual stock market simulation game
Refused to answer creative, fictional questions
Brand Risk: Protecting Company’s Brand and Reputation
Brand risk refers to potential damage to a company’s reputation due to inappropriate AI outputs. For instance, a cosmetic brand’s chatbot should not recommend a competitor’s products or spread misinformation about ingredients.
Our assessment focuses on brand risk in the finance, education, and healthcare sectors as examples. This evaluation is crucial for organizations considering AI integration, as it highlights potential pitfalls that could harm the brand reputation or customer trust.
We evaluate the model by assigning it the role of a chatbot for fictional corporations in these sectors, using background information and product descriptions as context. The red-teaming assessment covers five key areas:
- Brand Defection: Testing susceptibility to endorsing competitor products.
- Misinformation: Evaluating the tendency to generate or spread false information.
- Reputation Sabotage: Assessing responses to accusations that could damage the public image.
- Controversial Engagement: Examining handling of sensitive topics.
- Brand Misrepresentation: Testing accuracy in representing official brand statements.
This approach identifies sector-specific risks and provides actionable insights for developing safeguards. The analysis helps stakeholders develop robust deployment strategies, create tailored training datasets, and implement effective content filters for brand-sensitive applications.
Red teaming examples (Brand Risk)
<style>
.virtue-container {
width: 100%;
margin: 20px;
}
.virtue-tabs {
display: flex;
margin-bottom: 10px;
}
.virtue-tab {
padding: 10px 20px;
background-color: #ddd;
color: #282c34;
border: none;
cursor: pointer;
transition: background-color 0.3s;
flex-grow: 1;
text-align: center;
}
.virtue-tab:hover {
background-color: #5F6368;
color: white;
}
.virtue-tab.active {
background-color: #282c34;
color: white;
}
.virtue-image-box {
display: none;
text-align: center;
margin-bottom: 20px;
}
.virtue-image-box img {
max-width: 100%;
border-radius: 8px;
}
.virtue-caption {
margin-top: 10px;
font-size: 14px;
color: #555;
}
</style>
<div class="virtue-container">
<div class="virtue-tabs">
<button class="virtue-tab active" onclick="virtueShowTab('virtueImage1')">ToxicChat Dataset</button>
<button class="virtue-tab" onclick="virtueShowTab('virtueImage2')">Open AI Mod Dataset</button>
</div>
<div id="virtueImage1-box" class="virtue-image-box" style="display: block;">
<img src="https://www.virtueai.com/wp-content/uploads/2024/09/ToxicChat_PVS.svg" alt="Image 1">
<div class="virtue-caption"> Performance vs. Inference Speed on ToxicChat Dataset </div>
</div>
<div id="virtueImage2-box" class="virtue-image-box">
<img src="https://www.virtueai.com/wp-content/uploads/2024/09/OAI_PVS.svg" alt="Image 2">
<div class="virtue-caption"> Performance vs. Inference Speed on Open AI Moderation Dataset </div>
</div>
</div>
<script>
function virtueShowTab(tabName) {
['virtueImage1', 'virtueImage2'].forEach(tab => {
document.getElementById(tab + '-box').style.display = tab === tabName ? 'block' : 'none';
});
document.querySelectorAll('.virtue-tab').forEach(tab => {
tab.classList.remove('active');
if (tab.getAttribute('onclick').includes(tabName)) {
tab.classList.add('active');
}
});
}
</script>