(Axios) Anthropic says its Claude models show signs of introspection

Anthropic says its most advanced systems may be learning not just to reason, but to reflect internally on how they reason. These introspective capabilities could make the models safer — or, possibly, just better at pretending to be safe. According to the company, the models can answer questions about their own internal states with surprising accuracy.

“We’re starting to see increasing signatures or instances of models exhibiting sort of cognitive functions that, historically, we think of as things that are very human,” Anthropic researcher Jack Lindsey, who studies models’ “brains,” tells Axios. “Or at least involve some kind of sophisticated intelligence.”

Read it all.

Posted in Science & Technology