There is a lot of research and a lot of effort focused on the idea that if we can design AIs in a certain way, if we can teach them certain principles, if we can code into them certain goals, then we will be safe.
But there are two main problems with this approach. First, the very definition of AI is that it can learn and change by itself. So when you design an AI, it is, by definition, going to do all kinds of things you cannot anticipate.
The other, even bigger, problem is that we can think about AI like a baby or a child. You can educate a child to the best of your ability, and he or she will still surprise you, for better or worse. No matter how much you invest in their education, they are independent agents; they might eventually do something that surprises or even horrifies you.
And everybody who has any knowledge of education knows that in educating children, what you tell them matters far less than what you do. If you tell your kids, “Don’t lie,” and your kids watch you lying to other people, they will copy your behavior, not your instructions.
So if we now have this big project to educate AIs not to lie, but the AIs are given access to the world, watch how humans behave, and see some of the most powerful humans on the planet, including their own parents, lying, they will copy the behavior.
Will AIs lie a lot in the future? Yes, if they copy us humans, says this author https://t.co/jtvWbwp6Sw
— The Wall Street Journal (@WSJ) June 29, 2025
