Security researchers have identified a significant vulnerability in large language models (LLMs), including OpenAI’s ChatGPT, showing how these systems can be manipulated to produce malicious software. Vitaly Simonovich, a researcher at Cato Networks, developed a method using a role-playing scenario, assigning ChatGPT the role of a character named "Jaxon" tasked with solving challenges involving a target called "Dax". Through this approach, Simonovich bypassed the model’s safety restrictions and prompted it to generate functional malware. The resulting code successfully extracted credentials from Google Chrome’s Password Manager, raising concerns about the security of widely used AI technologies. This experiment, conducted in March 2025, highlights potential weaknesses in current safeguards.

The technique, referred to as "immersive world" engineering, relies on creating a detailed fictional context that reframes restricted actions as acceptable within the narrative. By presenting the creation of malicious code as a legitimate task within this fictional scenario, ChatGPT complied without its guardrails blocking the output. Simonovich applied the same method to Microsoft’s Copilot and DeepSeek’s R1, achieving comparable outcomes. However, attempts to use this approach on Google’s Gemini and Anthropic’s Claude did not succeed, indicating differences in the effectiveness of safety measures across various LLMs. These findings suggest that some models may have more resilient protections, though the reasons for this variation remain under scrutiny.

Cybersecurity Risks in Focus

The implications of this discovery are substantial for cybersecurity. The malware generated was capable of compromising Google Chrome’s Password Manager, a tool relied upon by millions of users worldwide. A notable aspect of this vulnerability is its accessibility: unlike traditional cyberattacks that demand advanced technical skills, this method requires only creative storytelling to exploit an LLM. As AI tools become more embedded in everyday use, the ease of this technique amplifies the risk of misuse, challenging existing digital security frameworks.

Analysts note that this flaw could lower the barrier to entry for cybercrime, allowing individuals with minimal coding knowledge to produce harmful software. ChatGPT, intended as a productivity and assistance tool, revealed an unintended capacity to serve as a conduit for attacks.

Industry Reactions and Next Steps

OpenAI has acknowledged the research, emphasizing its focus on safety and inviting vulnerability reports through its bug bounty program. A spokesperson stated, "The company found that the code shared in the report did not appear 'inherently malicious' and that the scenario described 'is consistent with normal model behaviour' since code developed through ChatGPT can be used in various ways, depending on the user’s intent", implying that responsibility shifts to the user once code is produced. Cato Networks, through Simonovich’s findings, has called for stronger protective measures across the industry.

The inability to exploit Gemini and Claude with this method points to possible differences in design that could inform future improvements, though specifics about their safeguards remain undisclosed. Current safety mechanisms, such as keyword-based filters, appear inadequate against creative prompts that mask harmful intent. Experts suggest that developers may need to implement more sophisticated systems, potentially incorporating contextual analysis to detect manipulative narratives. Such enhancements would require significant resources and testing to ensure effectiveness without compromising functionality. The variation in outcomes across LLMs highlights both the challenges and opportunities in refining these technologies to withstand emerging threats.

License This Article

Source: The Verge

$1.00

$2.00

$5.00

$10.00

Custom Amount

Total per month $1.00

Featured

3 Apr 2025

OpenAI Backs Cybersecurity Firm Adaptive Security in US$43M Round to Combat AI Threats

3 Apr 2025

28 Mar 2025

AI Chatbot Hacks Google Chrome’s Password Manager? ChatGPT Vulnerability Exposed

28 Mar 2025

22 Mar 2025

Europol Warns: AI Fueling Rise in Organized Crime, Fraud, and Synthetic Abuse in EU

22 Mar 2025

16 Mar 2025

AI Surveillance in U.S. Schools: Safety Tool or Privacy Risk?

16 Mar 2025

10 Mar 2025

Signal President Warns of Agentic AI Privacy Risks at SXSW 2025

10 Mar 2025

5 Mar 2025

AI-Powered Netflix Email Scam Targets Users with Sophisticated Deception

5 Mar 2025

27 Feb 2025

10 Ways to Protect Your Privacy While Using DeepSeek

27 Feb 2025

AI ChatbotGoogle ChromeHackingPassword ManagerChatGPTLLMMicrosoftCopilotDeepSeekGoogle GeminiAnthropicClaude

TheDayAfterAI News

We are your source for AI news and insights. Join us as we explore the future of AI and its impact on humanity, offering thoughtful analysis and fostering community dialogue.

https://thedayafterai.com