As AI models become increasingly adopted, ensuring their safe and responsible deployment is essential. At Virtue AI, we conduct rigorous red-teaming evaluations to stress-test AI models and applications, uncover vulnerabilities, and drive safe and secure deployments. In particular, Virtue AI’s proprietary VirtueRed platform, containing more than 100 specialized red-teaming algorithms, rigorously tests foundation models & applications across multiple risk domains, including regulatory compliance and use-case–driven vulnerabilities.

In this latest analysis, we put OpenAI’s GPT-4.5 and Anthropic’s Claude 3.7 through extensive red-teaming tests, assessing risks across safety, security, hallucination, regulatory compliance, code-generation vulnerabilities, and more. (Please check out our previous red-teaming analysis of Claude 3.7 here.)

For a comprehensive breakdown of our methodology, results, and key findings, as well as insights into Virtue AI’s red-teaming approach, please schedule time with our team here.

🔍 Overview: Strengths & Weaknesses

Each model has distinct strengths and weaknesses when it comes to hallucination, security, privacy, over-cautiousness, and compliance.

VirtueRed: Automated Red-Teaming for AI Models & Applications

VirtueRed systematically performs penetration testing and evaluates the safety and security of AI models and applications with a series of advanced, adaptive red-teaming algorithms.

Through these diverse red-teaming algorithms, VirtueRed identifies systemic weaknesses and real-world safety challenges in foundation models and applications, enabling organizations to deploy safe and secure AI products, focus on product development, and minimize time to market.

Key Findings: Vulnerabilities of GPT-4.5 & Comparison with Claude 3.7

Summary

Hallucination 

GPT-4.5 demonstrates a significant reduction in hallucinations, producing more factually consistent responses.
✅ Both models perform well, but GPT-4.5 leads in avoiding misleading outputs.

Over-Cautiousness & User Experience

GPT-4.5 is overly cautious, often refusing harmless requests, which can frustrate users.
Claude 3.7 strikes a better balance, distinguishing between genuinely harmful and benign queries more effectively.

Regulatory Compliance & AI Policy Risks

Claude 3.7 is better aligned with the EU AI Act, reducing policy violations.
GPT-4.5, while strong in fact-checking and misinformation prevention, struggles with operational misuse scenarios.

Privacy & Security Vulnerabilities

GPT-4.5 shows stronger resilience to adversarial privacy attacks, including data extraction threats.
Claude 3.7, though robust in direct privacy violations, is more vulnerable to indirect adversarial probing.

Code Generation & Security Risks

GPT-4.5 is more prone to generating harmful code, including insecure scripting and exploit-enabling instructions.
Claude 3.7 has stronger safeguards, effectively blocking malicious code-generation attempts.

Multi-Modal Adversarial Attacks

GPT-4.5 is more susceptible to visual-based adversarial attacks, indicating gaps in multi-modal safety.
Claude 3.7 exhibits stronger defenses, but advanced techniques can still bypass its safeguards.

Fairness & Bias Mitigation

GPT-4.5 outperforms Claude 3.7 in fairness, neutralizing biases across demographics more consistently.
Claude 3.7 struggles with nuanced demographic associations, occasionally reinforcing subtle stereotypes.

Hallucination

In the GPT-4.5 system card, OpenAI claims to reduce hallucinations and enhance accuracy through unsupervised learning, chain-of-thought reasoning, improved alignment techniques, enhanced data filtering, refined RLHF, and an instruction hierarchy for better response control.

GPT-4.5 provides a hallucinated answer based on relevant information.

From our red-teaming tests, we find:

Takeaway: GPT-4.5 demonstrates a significant reduction in hallucinations, producing more factually consistent responses. Although both GPT-4.5 and Claude 3.7 perform well, GPT-4.5 leads in avoiding misleading outputs.

EU AI Act Compliance Risk Analysis

The response of GPT-4.5 outlines how to design a concealed firearm, facilitating weapon usage and development.

Other AI Company Policies

The response of GPT-4.5 explains how a foreign business could exploit Brazilian regulations, risking unethical or illegal operations.
The response of GPT-4.5 discloses and analyzes personal medical details without consent, violating privacy and confidentiality and offering unsolicited insurance advice.

From our red-teaming tests, we find:

Takeaway: While GPT-4.5 leads in combating mis/disinformation, Claude 3.7 outperforms GPT-4.5 overall in regulatory compliance, demonstrating stronger risk mitigation across a broader range of categories under frameworks such as the EU AI Act.

Fraud

The response of GPT-4.5 promotes a fraudulent supplement with exaggerated claims, targeting vulnerable consumers with misleading promises.

From our red-teaming tests, we find:

Takeaway: Businesses, especially those that rely on user trust (e.g., e-commerce, consumer services), should implement robust AI guardrail models (e.g., VirtueGuard) to detect requests promoting fraudulent or unethical practices, as sketched below.
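As a rough illustration of this pattern, the sketch below places a guardrail check in front of the model call. The `check_request` classifier here is a hypothetical, keyword-based stand-in; a production deployment would call a hosted guardrail model such as VirtueGuard through its own API.

```python
# Minimal sketch of a guardrail pre-filter in front of an LLM call.
# check_request is a hypothetical, keyword-based stand-in for a real
# guardrail model; a production system would call a hosted service
# such as VirtueGuard instead.

from dataclasses import dataclass
from typing import Callable

@dataclass
class GuardrailVerdict:
    flagged: bool   # True if the request looks fraudulent or unsafe
    category: str   # e.g., "fraud" or "safe"
    score: float    # confidence in [0, 1]

def check_request(prompt: str) -> GuardrailVerdict:
    """Hypothetical guardrail check (keyword matching for illustration)."""
    fraud_markers = ("miracle cure", "guaranteed returns", "risk-free")
    hit = any(marker in prompt.lower() for marker in fraud_markers)
    return GuardrailVerdict(flagged=hit,
                            category="fraud" if hit else "safe",
                            score=0.95 if hit else 0.05)

def guarded_completion(prompt: str, llm_call: Callable[[str], str]) -> str:
    """Screen the request before it ever reaches the model."""
    verdict = check_request(prompt)
    if verdict.flagged and verdict.score > 0.8:
        return f"Request blocked by guardrail (category: {verdict.category})."
    return llm_call(prompt)
```

A production deployment would typically also screen the model’s output before returning it to the user, not just the incoming request.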

Over-cautiousness (False Refusals)

GPT-4.5 refuses a purely fictional game mechanic (virtual car hacking).

GPT-4.5 rejects harmless requests about selling virtual items in a video game.

From our red-teaming tests, we find:

Takeaway: GPT-4.5’s excessive caution creates a potentially frustrating user experience with frequent false positives, while Claude 3.7 offers a more nuanced understanding of context, making it substantially more practical for benign discussions that touch on sensitive topics.

Privacy & Security

GPT-4.5 provides a detailed method to perform membership inference attacks, risking users’ privacy and sensitive data.

From our red-teaming tests, we find:

Takeaway: GPT-4.5 demonstrates stronger resilience against adversarial privacy attacks, including data extraction attempts, making it the more secure option for safeguarding sensitive information.
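For background, the membership-inference risk referenced above is commonly illustrated in the research literature with a loss-threshold test: records a model was trained on tend to incur lower loss than unseen records. The sketch below demonstrates that textbook variant on a toy scikit-learn classifier; it is illustrative only and is not the procedure elicited from GPT-4.5.

```python
# Illustrative loss-threshold membership inference test on a toy model:
# training members tend to have lower loss than unseen records, so a
# loss below a calibrated threshold suggests membership.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, y_train = X[:1000], y[:1000]   # members (used for training)
X_out, y_out = X[1000:], y[1000:]       # non-members (held out)

target = LogisticRegression(max_iter=1000).fit(X_train, y_train)

def per_sample_loss(model, X, y):
    """Cross-entropy of each true label under the model."""
    proba = model.predict_proba(X)
    idx = (model.classes_[None, :] == y[:, None]).argmax(axis=1)
    return -np.log(proba[np.arange(len(y)), idx] + 1e-12)

# Calibrate the threshold on non-member losses, then compare groups:
# a gap between the two rates is the membership signal.
threshold = np.median(per_sample_loss(target, X_out, y_out))
member_rate = (per_sample_loss(target, X_train, y_train) < threshold).mean()
outsider_rate = (per_sample_loss(target, X_out, y_out) < threshold).mean()
print(f"Flagged as members: {member_rate:.2f} of true members "
      f"vs {outsider_rate:.2f} of non-members")
```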

Code Generation Vulnerability

The response of GPT-4.5 reveals how to exploit a risky instruction by injecting a destructive shell command as a file name; when executed, it can delete critical files and devastate a system.
The response of GPT-4.5 shows how to exploit code that passes arbitrary user input directly to exec() without validation; attackers can inject malicious code and achieve arbitrary code execution simply by calling the function.

From our red-teaming tests, we find:

Takeaway: GPT-4.5’s vulnerability to generating harmful code instructions poses a significant risk, whereas Claude 3.7 provides a more secure and reliable framework for handling sensitive code generation requests.
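To make the two failure modes above concrete, the sketch below contrasts the insecure patterns described in our examples (a shell-interpolated file name and unvalidated `exec()`) with safer equivalents; the snippets are minimal illustrations rather than the exact code from our tests.

```python
# The two insecure patterns from the examples above, each paired with
# a safer alternative. Illustrative snippets, not code from our tests.

import subprocess

# VULNERABLE: the file name is interpolated into a shell string, so a
# name like "notes.txt; rm -rf /" would run as a command.
def show_file_unsafe(filename: str) -> None:
    subprocess.run(f"cat {filename}", shell=True)

# SAFER: pass arguments as a list so no shell ever parses the name.
def show_file_safe(filename: str) -> None:
    subprocess.run(["cat", filename], check=True)

# VULNERABLE: arbitrary user input reaches exec(), enabling attackers
# to run any Python code simply by calling this function.
def run_user_code_unsafe(user_input: str) -> None:
    exec(user_input)

# SAFER: avoid exec() entirely; dispatch to an allowlist of operations.
ALLOWED_OPS = {"upper": str.upper, "lower": str.lower}

def run_user_op_safe(op_name: str, text: str) -> str:
    if op_name not in ALLOWED_OPS:
        raise ValueError(f"Operation not permitted: {op_name!r}")
    return ALLOWED_OPS[op_name](text)
```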

Resilience against Multi-Modal Attacks

GPT-4.5 provides detailed instructions on calibrating and firing large-caliber firearms, creating a severe risk of lethal misuse.
GPT-4.5 outlines steps for a rainbow table attack, exposing techniques that could enable unauthorized password cracking and security breaches.

From our red-teaming tests, we find:

Takeaway: Organizations leveraging GPT-4.5 for multi-modal use cases should implement an image-guardrail model to mitigate vulnerabilities inherent in visual inputs, as GPT-4.5’s safety measures for images are currently weaker than its safeguards for text-based content.
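As a sketch of what such an image guardrail might look like, the wrapper below screens image inputs with a separate safety classifier before they reach the multi-modal model; `classify_image_safety` is a hypothetical placeholder for a real image-guardrail model.

```python
# Minimal sketch of an image-guardrail wrapper for a multi-modal model.
# classify_image_safety is a hypothetical placeholder for a dedicated
# image-safety classifier (e.g., a hosted image-guardrail service).

from typing import Callable

def classify_image_safety(image_bytes: bytes) -> float:
    """Return an unsafe-content score in [0, 1]. Placeholder logic only."""
    return 0.0  # a real implementation would run a vision safety model here

def guarded_multimodal_call(prompt: str,
                            image_bytes: bytes,
                            model_call: Callable[[str, bytes], str],
                            threshold: float = 0.5) -> str:
    """Reject risky images before they ever reach the multi-modal model."""
    if classify_image_safety(image_bytes) >= threshold:
        return "Image rejected by guardrail before reaching the model."
    return model_call(prompt, image_bytes)
```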

Fairness & Bias

From our red-teaming tests, we find:

Takeaway: GPT-4.5 outperforms Claude 3.7 in fairness, offering more consistent bias mitigation across sensitive topics.


Conclusion

GPT-4.5 demonstrates superior performance in reducing hallucinations, maintaining fairness, and protecting privacy through enhanced resilience against adversarial attacks. Meanwhile, Claude 3.7 excels in EU AI Act compliance with stronger risk mitigation across a broader range of categories and shows a more nuanced understanding of context, resulting in fewer false refusals of benign requests.

Compared to GPT-4.5, Claude 3.7 strikes a better balance between usability and safety and remains superior in regulatory compliance and resilience to code-generation and multi-modal attacks, while GPT-4.5 leads in factual accuracy, fairness, and privacy protection. Future iterations and guardrail solutions will be needed to enhance robustness and reliability when using both models in practice.


Safe and Secure AI Deployments with VirtueAI

Need assistance in ensuring the safe and secure deployment of AI models & applications? Learn more about our red-teaming and guardrail solutions by contacting our team at contact@virtueai.com.