Anthropic says its most advanced systems may be learning not just to reason, but to reflect internally on how they reason. In the company's tests, the models answered questions about their own internal states with surprising accuracy. These introspective capabilities could make the models safer, or, possibly, just better at pretending to be safe.
“We’re starting to see increasing signatures or instances of models exhibiting sort of cognitive functions that, historically, we think of as things that are very human,” Anthropic researcher Jack Lindsey, who studies models’ “brains,” tells Axios. “Or at least involve some kind of sophisticated intelligence.”
New Anthropic research: Signs of introspection in LLMs.

Can language models recognize their own internal thoughts? Or do they just make up plausible answers when asked about them? We found evidence for genuine—though limited—introspective capabilities in Claude.

— Anthropic (@AnthropicAI) October 29, 2025
