(Axios) Anthropic says its Claude models show signs of introspection

Anthropic says its most advanced systems may be learning not just to reason, but to reflect internally on how they reason. These introspective capabilities could make the models safer — or, possibly, just better at pretending to be safe. According to the company, the models can answer questions about their own internal states with surprising accuracy.

“We’re starting to see increasing signatures or instances of models exhibiting sort of cognitive functions that, historically, we think of as things that are very human,” Anthropic researcher Jack Lindsey, who studies models’ “brains,” tells Axios. “Or at least involve some kind of sophisticated intelligence.”

Read it all.

Posted in Science & Technology