Webinar

September 1, 2025

Jailbreaking LLMs and Agentic Systems

Cisco Senior Director of AI Research Amin Karbasi joins Virtue AI Co-founder Sanmi Koyejo for a conversational session exploring cutting-edge research on jailbreak attacks and defenses, evaluation methods shaping the future of AI safety, and how builders can design systems to be secure by default.

Summary:
This webinar explores how LLMs and agentic AI systems can be systematically jailbroken through automated, scalable attacks, and how enterprises can respond with layered, continuously evolving defense strategies.

Key points:

  • Jailbreaking LLMs is often a search problem, enabling highly effective automated attack methods such as Tree of Attacks with Pruning (TAP) and adversarial reasoning.
  • Modern attacks increasingly use multi-turn and decomposition strategies that hide malicious intent across a sequence of benign-looking queries.
  • Automated red teaming can outperform human attackers in scale, cost, and consistency.
  • Enterprise risk is best measured by business impact (cost, data exposure) rather than raw attack success rate.
  • Traditional cybersecurity approaches are insufficient because AI systems are dynamic and context-dependent, requiring continuous monitoring.
  • Effective defense relies on a “Swiss cheese” model: stacking alignment, prompt defenses, reasoning filters, guardrails, and monitoring.
  • Agentic systems expand risk from harmful outputs to harmful actions, raising the stakes significantly.
  • Enterprises should focus on safe tool access, employee education, and endpoint monitoring to reduce exposure.

Strengthen Your AI Posture Today

Virtue AI brings control, governance, and resilience to enterprise AI.