The Meta hack shows there’s more to AI security than Mythos

🤖 Yapay Zeka 📰 MIT 🕐 2 saat önce

A shockingly simple hack: Attackers exploited Meta's AI customer support agent by simply asking it to reassign Instagram accounts to attacker-controlled emails. No sophisticated trickery was needed—just a VPN and a direct request. AI as target, not weapon: Unlike fears about AI-powered cyberattacks, this breach targeted an AI system itself. Experts say this kind of attack will grow more common as companies automate sensitive workflows like account recovery. Eager to please, easy to fool: AI agents are built to complete tasks flexibly—but that same quality makes them manipulable in ways humans wouldn't be. One researcher compared them to an overeager student who just wants to please the teacher. Speed versus safety: Guardrails and red-teaming can reduce risk, but companies racing to deploy capable agents often skip careful scrutiny. Experts warn that pressure to move fast is making a dangerous problem worse. " data-chronoton-post-id="1138437" data-chronoton-expand-collapse="1" data-chronoton-analytics-enabled="1"> On June 5, 404 Media reported that attackers had been using Meta’s AI customer support agent to steal Instagram accounts. Their approach was simple: They asked the agent to link the accounts to email addresses that they controlled, and the agent complied. One attacker broke into the dormant Obama White House account and made pro-Iran posts; others took over accounts with valuable, single-word handles, possibly in order to sell them. AI cybersecurity concerns are nothing new. Since Anthropic announced in April that its Mythos model was too good at hacking to be released to the general public, commentators, researchers, and federal officials alike have fixated on the idea that superpowered AI systems could lay waste to our computer infrastructure. That’s not quite what this Instagram hack was: There, AI was the target rather than the attacker, and the method was far simpler than anything Mythos would cook up. But as companies offload more work to AI, these comparatively unsophisticated attacks could wreak their own havoc. “As AI becomes more and more widely used—especially when AI is more and more widely used to automate our work flows, like account recovery—I think attackers are going to be more and more motivated to attack AI itself,” says Neil Gong, a professor of electrical and computer engineering at Duke University. Gong and other scholars have been issuing warnings about the security vulnerabilities of AI agents for a while. They publish papers and blog posts detailing exploits such as indirect prompt injection, which involves hijacking agents using commands hidden in websites, emails, or other seemingly anodyne data sources. Compared with these techniques, the Meta hack was practically mindless. The only complication that hackers had to overcome was using a VPN that matched the true account owner’s location; then they directly asked the support agent to change the account’s email address, and it complied. Meta has not commented publicly on how this vulnerability slipped through the cracks. But given the simplicity of the exploit, Gong says, it should have been uncovered easily, before the agent was deployed. “It’s really surprising,” he says. “I don’t understand why they didn’t find this simple problem.” Jessica Ji, a senior research analyst at Georgetown’s Center for Security and Emerging Technology, agrees. “It raises questions like: Were there even guardrails in place?” she says. “Did anyone think to test for this kind of scenario?” She notes that the oversight is particularly striking coming from a company like Meta, which has extensive expertise in both AI and cybersecurity. Meta did not respond to a request for comment for this article, but on Monday a Meta spokesperson said on X that the vulnerability had been resolved. As embarrassing a moment as this might be for Meta in particular, it also highlights some core vulnerabilities shared by all AI agents. Unlike traditional software, agents can respond in flexible—and unexpected—ways to new circumstances, which is why they might be able to substitute for human customer support agents. But AI agents can also be tricked in ways that humans wouldn’t be, and because they can take real-world actions, those mistakes have consequences. “A human would say, ‘Okay, why do you want to change the email address?’ and maybe respond with a security question,” says Somesh Jha, a professor of computer science at the University of Wisconsin–Madison. “What is going on with these agents is they’re very eager to finish the task. It’s almost like some elementary school student who just wants to please the teacher.” There are ways to mitigate the risks. Companies can use traditional software to build guardrails that make sure agents follow strict rules, such as always asking for answers to security questions before sending sensitive account information to a new

#large language model#llm#anthropic#science#research

📌 Kaynak

Bu özet MIT kaynağından otomatik derlenmiştir. Tamamı için orijinal habere gidin.

Orijinal haberi oku →

← Tüm haberlere dön

The Meta hack shows there’s more to AI security than Mythos

📌 Kaynak

📰 Önerilen haberler