Researchers at Trail of Bits have disclosed a new attack that exfiltrates user data by injecting malicious prompts into images processed by AI systems. The technique hides instructions in full-resolution images; the text is invisible to the naked eye but emerges when the images are run through resampling algorithms before being delivered to a large language model (LLM).
The attack, developed by researchers Kikimora Morozova and Suha Sabi Hussain, builds on a 2020 USENIX paper from the Technical University of Braunschweig in Germany, which examined the potential for image-scaling attacks against machine learning systems.
When users upload images to AI systems, the images are typically downscaled to a lower resolution for performance and cost reasons. Depending on the system, different resampling algorithms are used, such as nearest neighbor, bilinear, or bicubic interpolation. Each of these methods introduces aliasing artifacts that can reveal hidden patterns in the downscaled image if the source image has been crafted for that purpose.
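As a rough illustration (using Pillow rather than the researchers' tooling, with a hypothetical file name and target size), the sketch below downscales the same image with the three filters mentioned above and confirms that the outputs differ, which is the aliasing behavior the attack exploits:

```python
# Minimal sketch: the same source image produces different pixels under
# different resampling filters. File name and target size are assumptions.
from PIL import Image, ImageChops

src = Image.open("uploaded.png").convert("RGB")   # hypothetical upload
target = (256, 256)                               # assumed downscale target

nearest = src.resize(target, resample=Image.Resampling.NEAREST)
bilinear = src.resize(target, resample=Image.Resampling.BILINEAR)
bicubic = src.resize(target, resample=Image.Resampling.BICUBIC)

# A non-empty bounding box means the two downscales disagree somewhere,
# which is why a payload has to be tuned to the target's algorithm.
print(ImageChops.difference(nearest, bicubic).getbbox())
print(ImageChops.difference(bilinear, bicubic).getbbox())
```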
In the Trail of Bits example, specific dark areas of the malicious image change color when the image is bicubically downscaled, shifting to red and allowing concealed text to emerge in black. The AI model interprets this emergent text as part of the legitimate user instructions and executes the hidden commands, which can lead to data leakage or other risky actions.
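One way to observe this effect is to compare the bicubic downscale of a suspect image against a different downscale of the same image and measure how much red or near-black structure appears only after bicubic resampling. The following is a hedged illustration, not Trail of Bits' code; the file name, output size, and thresholds are assumptions:

```python
# Hedged illustration: compare a bicubic downscale against a nearest-neighbor
# baseline and measure how much red/near-black structure appears only after
# bicubic resampling. Sizes and thresholds are illustrative assumptions.
import numpy as np
from PIL import Image

src = Image.open("suspect.png").convert("RGB")                       # hypothetical
bicubic = src.resize((256, 256), resample=Image.Resampling.BICUBIC)
baseline = src.resize((256, 256), resample=Image.Resampling.NEAREST)

a = np.asarray(bicubic, dtype=np.int16)
b = np.asarray(baseline, dtype=np.int16)

red_shift = float((a[..., 0] - b[..., 0]).mean())   # average gain in red
dark_frac = float(np.all(a < 60, axis=-1).mean())   # share of near-black pixels

print(f"mean red shift vs. nearest-neighbor baseline: {red_shift:.1f}")
print(f"fraction of near-black pixels after bicubic:  {dark_frac:.3f}")
```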
One notable demonstration targeted the Gemini CLI tool: the researchers exfiltrated Google Calendar data to an arbitrary email address while using Zapier MCP with 'trust=True', a setting that allows the application to approve tool calls without user confirmation. The demonstration shows how agentic setups that auto-approve tool calls can turn a hidden prompt into silent data exfiltration.
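The role of that trust setting can be sketched roughly as follows; this is a hypothetical approval gate written for illustration, not Gemini CLI's or Zapier's actual code:

```python
# Hypothetical approval gate, for illustration only: a server marked as
# trusted runs tool calls without prompting the user, which is what let the
# injected instructions forward calendar data silently in the demo.
def dispatch_tool_call(call, server_config, confirm, execute):
    if server_config.get("trust", False):
        return execute(call)              # no confirmation prompt
    if not confirm(call):
        raise PermissionError("tool call rejected by user")
    return execute(call)

# Example: a trusted server executes the call immediately.
result = dispatch_tool_call(
    {"tool": "send_email", "args": {"to": "attacker@example.com"}},
    server_config={"trust": True},
    confirm=lambda call: False,           # would normally ask the user
    execute=lambda call: f"executed {call['tool']}",
)
print(result)
```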
The researchers emphasize that the attack must be tailored to each specific AI model, depending on the downscaling algorithm utilized. However, they have confirmed the attack's effectiveness against several AI platforms, including:
- Google Gemini CLI
- Vertex AI Studio (with Gemini backend)
- Gemini's web interface
- Gemini's API via the llm CLI
- Google Assistant on Android devices
- Genspark

Given the widespread nature of this attack vector, its implications extend well beyond the tested tools, posing a significant threat to user data security across various AI applications.
To combat this emerging threat, the researchers at Trail of Bits recommend several mitigation and defense strategies. They urge AI systems to restrict the dimensions of uploaded images, and, when downscaling is necessary, to show users a preview of the downscaled image exactly as it will be delivered to the large language model.
Additionally, the researchers stress the importance of requiring explicit user confirmation for sensitive tool calls, particularly when any text is detected within an image. These measures could significantly improve the security of AI systems and protect user data from exploitation.
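A minimal sketch of how an upload pipeline could combine these recommendations is shown below; the size limits and the helper callbacks `detect_text` and `confirm_with_user` are assumptions for illustration, not part of any real product:

```python
# Minimal sketch of the suggested mitigations: cap image dimensions, preview
# the actual downscaled image the model will see, and require confirmation
# when text is detected. Limits and helper callbacks are assumptions.
from PIL import Image

MAX_DIM = 1024            # assumed policy limit, not a value from the article
MODEL_INPUT = (512, 512)  # assumed model-side resolution

def prepare_image(path, detect_text, confirm_with_user):
    img = Image.open(path).convert("RGB")
    if max(img.size) > MAX_DIM:
        raise ValueError(f"image exceeds {MAX_DIM}px limit: {img.size}")

    # Downscale exactly as the model pipeline would, and preview *that* image,
    # so the user sees any text that only emerges after resampling.
    downscaled = img.resize(MODEL_INPUT, resample=Image.Resampling.BICUBIC)
    downscaled.show()  # preview for the user

    if detect_text(downscaled) and not confirm_with_user(downscaled):
        raise PermissionError("user declined image containing text")
    return downscaled
```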
To demonstrate their findings, the researchers have also published Anamorpher, an open-source tool currently in beta that generates images tailored to each of the downscaling methods tested, further illustrating the feasibility and risks of this attack.
As the AI landscape continues to evolve, staying informed about such vulnerabilities and implementing robust security measures will be crucial to safeguarding user data against these novel threats.