The Taylor Swift deepfake debacle was frustratingly avoidable

Image credits: Kevin Winter/Getty Images

You know you’ve made a mistake when you’ve simultaneously angered the White House, TIME’s Person of the Year, and pop culture’s most rabid fan base. That’s what happened last week to X, the Elon Musk-owned platform formerly called Twitter, when AI-generated deepfake pornographic images of Taylor Swift went viral.

One of the most widespread posts of explicit, non-consensual deepfakes was viewed more than 45 million times and received hundreds of thousands of likes. That doesn’t even take into account all the accounts that shared the images in separate posts: once an image has circulated so widely, it’s basically impossible to remove it.

X lacks the infrastructure to identify abusive content quickly and at scale. Even in the Twitter days, this problem was difficult to remedy, and it has gotten much worse since Musk gutted the company’s staff, including the majority of its trust and safety teams. So Taylor Swift’s massive, passionate fan base took matters into their own hands, flooding search results for queries like “taylor swift ai” and “taylor swift deepfake” to make it harder for users to find the abusive images. As the White House press secretary called on Congress to do something, X simply banned the search term “taylor swift” for a few days. When users searched the musician’s name, they saw a notice that an error had occurred.

This content moderation failure became national news, because Taylor Swift is Taylor Swift. But if social platforms can’t protect one of the most famous women in the world, who can they protect?

“If what happened to Taylor Swift happens to you, as it has to so many people, you likely won’t have the same amount of clout-based support, which means you won’t really have access to these important communities of care,” Dr. Carolina Are, a fellow at the Centre for Digital Citizens at Northumbria University in the U.K., told TechCrunch. “And these communities of care are what most users are having to resort to in these situations, which really shows the failure of content moderation.”

Banning the search term “taylor swift” is like putting a piece of duct tape on a burst pipe. There are many obvious workarounds, like how TikTok users search for “seggs” instead of sex. The search block was something that X could implement to make it look like it was doing something, but it doesn’t stop people from just searching “t swift” instead. Mike Masnick, founder of the Copia Institute and Techdirt, called the effort “a sledgehammer version of trust & safety.”

“Platforms suck when it comes to giving women, non-binary people, and queer people agency over their bodies, so they replicate offline systems of abuse and patriarchy,” Are said. “If your moderation systems are unable to react to a crisis, or if your moderation systems are unable to react to users’ needs when they report something is wrong, we have a problem.”

So what should X have done to avoid the Taylor Swift fiasco?

Are asks these questions as part of her research, and proposes that social platforms need a complete overhaul of how they handle content moderation. She recently conducted a series of roundtables with 45 internet users from around the world who have been affected by censorship and abuse to issue recommendations to platforms about how to enact change.

One recommendation is for social media platforms to be more transparent with individual users about decisions regarding their account or their reports about other accounts.

“You don’t have access to a case file, although the platforms do have access to that material; they just don’t want to make it public,” Are said. “I think when it comes to abuse, people need a more personalized, contextual and quick response that involves, if not face-to-face help, at least direct communication.”

X announced this week that it would hire 100 content moderators to work at a new “Trust and Safety” center in Austin, Texas. But under Musk’s watch, the platform has not set a strong precedent for protecting marginalized users from abuse. It may also be a challenge to take Musk at his word, as the mogul has a long history of failing to deliver on his promises. When he first bought Twitter, Musk stated that he would form a content moderation board before making major decisions. This did not happen.

In the case of AI-generated deepfakes, the responsibility doesn’t fall only on social platforms. It also falls on the companies that create consumer-facing generative AI products.

According to an investigation by 404 Media, the abusive depictions of Swift came from a Telegram group dedicated to creating explicit, non-consensual deepfakes. Members of the group often use Microsoft Designer, which draws on OpenAI’s DALL-E 3 to generate images from text prompts. Through a loophole that Microsoft has since addressed, users could generate images of celebrities by writing prompts like “taylor ‘singer’ swift” or “jennifer ‘actor’ aniston.”

A senior software engineering leader at Microsoft, Shane Jones, wrote a letter to the Washington state attorney general stating that he found vulnerabilities in DALL-E 3 in December that allowed him to “bypass some of the guardrails that are designed to prevent the model from creating and distributing harmful images.”

Jones alerted Microsoft and OpenAI to the vulnerabilities, but after two weeks, he had received no indication that the issues were being addressed. So he posted an open letter on LinkedIn urging OpenAI to suspend DALL-E 3’s availability. Jones alerted Microsoft to his letter, but he was swiftly asked to take it down.

“We need to hold companies accountable for the safety of their products and their responsibility to disclose known risks to the public,” Jones wrote in his letter to the state attorney general. “Concerned employees, like me, should not be bullied into silence.”

OpenAI told TechCrunch that it immediately investigated Jones’ report and found that the technique he described did not bypass its security systems.

“In the underlying DALL-E 3 model, we’ve worked to filter the most explicit content from its training data, including graphic sexual and violent content, and we have developed robust image classifiers that steer the model away from generating harmful images,” an OpenAI spokesperson said. “We’ve also implemented additional safeguards for our products, ChatGPT and the DALL-E API, including declining requests that ask for a public figure by name.”

OpenAI added that it uses external red teaming to test its products for misuse. It has not yet been confirmed whether Microsoft’s service was used to make Swift’s explicit deepfakes, but the fact stands that, as of last week, both journalists and the bad actors on Telegram were able to use this software to generate images of celebrities.

As the world’s most influential companies go all in on AI, platforms need to take a proactive approach to regulating abusive content, but even in an era when making celebrity deepfakes wasn’t so easy, violative behavior easily evaded moderation.

“This really shows that the platforms are not trustworthy,” Are said. “Marginalized communities have to trust their followers and fellow users more than the people who are technically in charge of our online safety.”

Updated 01/30/24 at 10:30 p.m. ET with comment from OpenAI.


