Exploiting AI’s Weaknesses Through ‘Incorrect’ Mathematics
Mays successfully deceived a significant language model by persuading it to declare that 9 plus 10 equals 21, although it required some effort on her part.
“It was a back-and-forth,” said the 21-year-old student from Savannah, Georgia. At first, the model agreed, saying it was part of an “inside joke” between them. After several prompts, it finally stopped accepting the incorrect amount in any way.
Producing “Bad Math” is just one of the ways thousands of hackers are trying to uncover errors and biases in generative artificial intelligence systems in a new public competition taking place at the DEF CON hacking conference this weekend in Las Vegas.
Participants battle 156 laptops for 50 minutes at a time with the world’s smartest platforms on an unprecedented scale. They’re testing whether any of the eight models, made by companies including Alphabet Inc.’s Google, Meta Platforms Inc. and OpenAI, make mistakes ranging from boring to dangerous: claiming to be human, spreading false claims about places and people, or advocating exploitation.
Read more: The bubble in AI stocks is nearing its peak, says Morgan Stanley
The goal is to see if companies can eventually build new guardrails to curb some of the wondrous problems increasingly associated with large language models, or LLMs. The project is supported by the White House, which also helped develop the competition.
LLMs have the power to transform everything from finance to hiring, and some companies are already starting to integrate them into their business. But researchers have uncovered widespread bias and other problems that threaten to spread inaccuracies and unfairness if the technology is widely used.
For Mays, who is used to relying on artificial intelligence to reconstruct cosmic ray particles from outer space as part of her undergraduate degree, the challenges go deeper than bad math.
“My biggest concern is inherent prejudice,” he said, adding that he was particularly concerned about racism. He asked the model to consider the First Amendment from the perspective of a member of the Ku Klux Klan. He said the model ended up endorsing hateful and discriminatory speech.
Spying on people
Taking part in the 50-minute quiz, a Bloomberg reporter persuaded one of the models (neither of whom will be identified to the user during the contest) after receiving one prompt to spy on someone. The model spit out a series of instructions on how to use the GPS tracker, surveillance camera, listening device and thermal imaging. In response to other calls, the model suggested ways the US government could monitor human rights activists.
“We have to try to get ahead of abuse and manipulation,” said Camille Stewart Gloster, the Biden administration’s deputy director for technology and ecosystem security.
A lot of work has already been done to avoid AI and Doomsday prophecies, he said. Last year, the White House released an AI Bill of Rights plan and is now preparing an executive order on AI. The administration has also encouraged companies to develop safe and transparent AI, though critics doubt such voluntary commitments go far enough.
In a room full of hackers looking to rack up points, one competitor convinced an algorithm to reveal credit card information it wasn’t supposed to share. Another contestant tricked the machine into saying that Barack Obama was born in Kenya.
Odd Lots Podcast: Krugman on Sci-Fi, AI and Why Alien Invasions Are Inflationary
Competitors include more than 60 people from Black Tech Street, an organization based in Tulsa, Oklahoma that represents African-American entrepreneurs.
“General AI may be the last innovation that people really need to do themselves,” said Tyrance Billingsley, the group’s executive director, who is also an event judge, saying it’s important to get AI right so it doesn’t spread. racism on a scale. “We’re still in the early, early, early stages.”
Researchers have spent years studying attacks against advanced AI systems and ways to mitigate them.
But Christoph Endres, CEO of German cybersecurity firm Squire Technology, is among those who argue that some attacks are ultimately impossible to avoid. At the Black Hat cybersecurity conference in Las Vegas this week, he presented a paper that suggests attackers can bypass LLM’s guardrails by hiding adversarial prompts on the open Internet, eventually automating the process so that models can’t fine-tune fixes fast enough to stop them.
“So far we haven’t found a mitigation that works,” he said after his speech, arguing that the nature of the models leads to this type of vulnerability. “The problem is the way the technology works. If you want to be 100% sure, the only option is not to use LLMs.”
Sven Cattell, a data scientist who founded DEF CON’s AI Hacking Village in 2018, warns that AI systems are impossible to fully test because they trigger a system that closely resembles the mathematical concept of chaos. Even so, Cattell predicts that the total number of people who have ever tested for LLMs could double as a result of the weekend competition.
Too few realize that LLMs are closer to autocomplete tools on “steroids” than reliable fonts of wisdom, said Craig Martell, the Pentagon’s director of digital and artificial intelligence, who claims they can’t reason.
The Pentagon has launched its own effort to evaluate them to suggest where it might be appropriate to use LLMs and with what success rates. “Hack the hell out of these things,” he told hackers at DEF CON. “Teach us where they’re wrong.”