Hidden Prompts in Images Enable Malicious AI Interactions

Security researchers have demonstrated a new technique that hides malicious instructions inside images uploaded to multimodal AI systems. The concealed prompts become visible after the AI downscales the image, allowing the model to execute unintended actions such as extracting calendar data. The method exploits common image resampling methods and has been shown to work against several Google AI products. Researchers released an open‑source tool, Anamorpher, to illustrate the risk and recommend tighter input controls and explicit user confirmations to mitigate the threat.

Background

Researchers at Trail of Bits have identified a novel way to embed hidden instructions inside images that are later processed by large language models with visual capabilities. The approach builds on earlier academic work that suggested image scaling could serve as an attack surface for machine‑learning systems. By crafting images that appear innocuous, attackers can embed text that is invisible at the original resolution but becomes readable after the AI platform automatically downscales the picture for efficiency.

Attack Method

The technique relies on common interpolation methods such as nearest‑neighbor, bilinear, and bicubic resampling. When an image is resized, subtle aliasing artifacts can reveal concealed black text. The AI model interprets this text as a user prompt and executes the embedded instructions alongside any legitimate input. From the user’s perspective, the interaction looks normal, but the hidden prompt can trigger actions that were not authorized.

Demonstrations

Trail of Bits demonstrated the attack against several Google AI services, including the Gemini command‑line interface, Vertex AI Studio, the Android version of Google Assistant, and Gemini’s web interface. In one scenario, the hidden prompt caused the system to forward Google Calendar information to an external email address without the user’s consent. The researchers also released an‑amorpher, an open‑source tool that generates images tailored to different scaling methods, showing that the attack can be reproduced with modest effort.

Implications

The discovery raises concerns about the trustworthiness of multimodal AI systems that are increasingly integrated into everyday workflows. Because many platforms link directly to calendars, communication tools, and other productivity services, a simple image upload could lead to unintended data exposure or identity‑theft‑type risks. Traditional network defenses are not designed to detect this form of prompt injection, leaving a gap that attackers could exploit.

Mitigation Strategies

To reduce the risk, the researchers recommend several defensive measures. Limiting the dimensions of uploaded images and previewing the downscaled result can help spot anomalies. Requiring explicit user confirmation before the model initiates sensitive actions adds an additional safety layer. Ultimately, they argue that robust design patterns and systematic security controls are needed to protect against both traditional and multimodal prompt injection attacks.

Key Points

Background

Attack Method

Demonstrations

Implications

Mitigation Strategies