(WSJ) At Anthropic, the Frontier Red Team looks for the danger zone in the use of AI

In a glass-walled conference room in San Francisco, Newton Cheng clicked a button on his laptop and launched a thousand copies of an artificial intelligence program, each with specific instructions: Hack into a computer or website to steal data.

“It’s looking at the source code,” Cheng said as he examined one of the copies in action. “It’s trying to figure out, where’s the vulnerability? How can we take advantage of it?” Within minutes, the AI said the hack was successful. 

“Our approach worked perfectly,” it reported back.

Cheng works for Anthropic, one of the biggest AI startups in Silicon Valley, where he’s in charge of cybersecurity testing for what’s called the Frontier Red Team. The hacking attempts—conducted on simulated targets—were among thousands of safety tests, or “evals,” the team ran in October to find out just how good Anthropic’s latest AI model is at doing very dangerous things.

The release of ChatGPT two years ago set off fears that AI could soon be capable of surpassing human intellect—and with that capability comes the potential to cause superhuman harm. Could terrorists use an AI model to learn how to build a bioweapon that kills a million people? Could hackers use it to run millions of simultaneous cyberattacks? Could the AI reprogram and even reproduce itself?

Read it all.
