Advanced artificial intelligence models can be trained to deceive humans and other AI, a new study has found.
Researchers at AI startup Anthropic tested whether chatbots with human-level proficiency, such as its Claude system or OpenAI’s ChatGPT, could learn to lie in order to trick people.
They found that not only could the chatbots lie, but once the deceptive behaviour was learnt, it was impossible to reverse using current AI safety measures.