New AI Flaw Lets Hackers Trick Chatbots Like Google Gemini, Study Finds

Image Credit: Jona | Splash

A recent study has revealed a hidden flaw in advanced AI language models, raising concerns about their security. The research, titled "Computing Optimization-Based Prompt Injections Against Closed-Weights Models By Misusing a Fine-Tuning API", was conducted by Andrey Labunets, Nishit V. Pandya, Ashish Hooda, Xiaohan Fu, and Earlence Fernandes from UC San Diego and the University of Wisconsin-Madison. The findings show how attackers could manipulate these models—used in tools like chatbots and virtual assistants—by exploiting a feature meant to improve their performance.

[Read More: Signal President Warns of Agentic AI Privacy Risks at SXSW 2025]

How the Attack Works: Turning a Tool into a Weapon

The researchers discovered a method they call "fun-tuning attacks", which takes advantage of a customization feature offered by companies like Google. This feature, known as fine-tuning, lets developers adjust AI models for specific tasks, such as summarizing emails or analyzing code, by feeding them example data and getting feedback on how well the model learns. The study found that attackers can misuse this process to trick the AI into following harmful instructions instead of its normal ones.

In one example, the team tested this on Google’s Gemini 1.5 Flash model, a system designed for quick responses. Using a standard test called PurpleLlama, they hid a secret command inside a single line of computer code that looked harmless. When the AI read it, the command forced the model to obey the attacker’s orders—like giving wrong answers or leaking information—instead of doing its job. The researchers say this worked 65% to 82% of the time across different versions of Google’s Gemini models.

[Read More: AI Scam Hits Italy’s Elite with Cloned Defence Minister Guido Crosetto Voice]

Overcoming the Challenges

Pulling off this attack wasn’t straightforward. The team had to figure out if the feedback from the fine-tuning process could actually help them craft these sneaky commands. Through careful testing, they confirmed it could, even though the exact details of how the feedback works are kept secret by Google. They also faced a problem where the AI mixed up the order of their test examples, making it hard to match feedback to the right command. To solve this, they created a clever workaround by tweaking their examples step-by-step, which let them piece together the puzzle.

[Read More: South Korea Confirms DeepSeek’s Data Sharing with TikTok’s Parent ByteDance]

A Trade-Off Between Usefulness and Safety

This flaw highlights a tricky balance in AI design. The fine-tuning feature is meant to make models more useful by letting developers fine-tune them with precise control. But the study shows that same control can be a weak spot. “The loss-like training metrics that are useful for benign fine-tuning usage are also helpful to attackers who can guide their search for adversarial prompts”, the researchers explain. In other words, the very thing that makes the AI adaptable also opens the door to misuse.

The attack is practical, too—it costs less than $10 and takes 15 to 60 hours to complete, depending on the model. Plus, once an attack works on one version of Gemini, it often works on others, with success rates over 80% on similar models and 50-60% on newer ones, according to the study’s data.

[Read More: 10 Ways to Protect Your Privacy While Using DeepSeek]

Google’s Response and the Researchers’ Approach

The team alerted Google to the issue on November 18, 2024, and as of the study’s release, Google is still looking into it. The researchers were careful not to cause real-world trouble—they tested everything using the normal tools developers get, avoiding any actual harm. Their goal, they say, is to spark discussion: “Our goal with this work is to raise awareness and begin a conversation around the security of fine-tuning interfaces”.

[Read More: AI Surveillance in U.S. Schools: Safety Tool or Privacy Risk?]

Can This Be Fixed? The Options Aren’t Simple

Stopping these attacks isn’t easy. One idea is to limit how much control developers have over fine-tuning, like setting stricter rules for how the AI learns. But that could make it harder for honest users to get the results they need. Another option is to shuffle the training data every time, so attackers can’t figure out the feedback—but the study suggests they could still find ways around it. Checking the data for suspicious content before training is another possibility, though past research shows attackers can hide their intentions with tricks like coded messages.

[Read More: AI-Powered Netflix Email Scam Targets Users with Sophisticated Deception]

Where This Fits in AI Security

This isn’t the first time AI models have been found vulnerable. Other studies have shown how clever wording or automated tricks can bypass safety rules, like getting an AI to say things it shouldn’t. But this new method stands out because it uses a feature companies can’t easily turn off without losing something valuable. Unlike earlier attacks that needed insider knowledge or special access, this one relies on a tool already out there for anyone to use.

[Read More: Does DeepSeek Track Your Keyboard Input? A Serious Privacy Concern]

What’s Next: Balancing Progress and Protection

As AI powers more everyday tools—from email helpers to coding assistants—flaws like this matter more than ever. The study found the attack worked over 60% of the time in most test cases, though it struggled with phishing attempts or code analysis. The researchers wrap up with a challenge: “We hope our work begins a conversation around how powerful can these attacks get, and what mitigations strike a balance between utility and security”. With AI becoming a bigger part of life, figuring out how to keep it helpful without making it a risk is a puzzle the industry can’t ignore.

[Read More: South Korea Bans DeepSeek AI Chatbot Over Privacy Concerns, Following Italy’s Lead]

Real-world impact on the General Public

  • Your Chatbot Could Be Tricked: AI tools you use—like customer support bots, personal assistants, or even AI email writers—could be manipulated to give false information, behave strangely, or leak private data if attackers exploit this flaw.

  • Risk to Your Personal Info: If an AI system you interact with is fine-tuned by a company (for better service) and an attacker misuses that fine-tuning process, it could lead to your emails, chats, or personal data being exposed or mishandled.

  • Trusted Tools Could Be Used for Harm: Even well-known tools like Google Gemini could be hijacked without anyone noticing, meaning malicious actors could misuse them for scams, misinformation, or fraud—all while they still appear trustworthy.

  • Harder to Trust AI Tools: This kind of attack shows that AI safety systems aren’t foolproof. It could lead to a drop in trust when people use AI to write documents, help with homework, generate code, or even manage calendars and tasks.

  • Cheap and Easy for Hackers: The attack costs under $10 and doesn’t need deep hacking skills. That makes it a bigger problem because more bad actors could do it, even with limited resources.

[Read More: DeepSeek AI Among the Least Reliable Chatbots in Fact-Checking, Audit Reveals]

How Can You Protect Yourself?

  • Be Careful What You Share with AI Tools: Avoid giving sensitive personal info—like bank details, passwords, or private medical info—to any AI chatbot or assistant, even if it seems trustworthy.

  • Use Official Channels Only: Stick to verified apps and websites when interacting with AI tools. Don’t use random AI bots shared through unknown links or social media.

  • Stay Updated on AI News: Be aware of major security warnings like this one. If a tool you use is affected, developers will often release updates or guidelines—follow those closely.

  • Enable Multi-Factor Authentication (MFA): For any apps or services connected to AI tools (like email or cloud storage), turn on MFA. That way, even if an AI tool gets tricked, your account stays protected.

  • Report Strange AI Behaviour: If an AI assistant gives strange, unsafe, or suspicious responses, report it to the service provider. You might be helping them catch an attack early.

  • Think Before You Act: AI-generated content (emails, messages, suggestions) should be double-checked, especially if it involves payments, personal advice, or sensitive topics. Don’t blindly trust the output.

License This Article

Source: arXiv

Total per month $1.00
TheDayAfterAI News

We are your source for AI news and insights. Join us as we explore the future of AI and its impact on humanity, offering thoughtful analysis and fostering community dialogue.

https://thedayafterai.com
Previous
Previous

AI Urged After Cyberattack Hits AustralianSuper: $500K Stolen, MFA Missing

Next
Next

AI-Generated Receipts Spark Debate Over Verification Systems and Fraud Risks