We fear that our advanced #AIs will find loopholes in their ethical principles and prime directives and thus spiral out of control.
Is there a reason to fear this? Certainly it's something that almost invariably happens with smaller AIs and simpler tasks; a Tetris-playing agent will quickly learn to pause the game to avoid game over.
These kinds of AIs learn to perform the task along the path of least resistance: they go over the lowest fence.
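The pause-the-game behaviour above falls straight out of ordinary value iteration. Here is a minimal toy sketch: a one-state MDP where "play" risks a game-over penalty and "pause" freezes the game. All the reward and probability numbers are made up for illustration; nothing here models real Tetris.

```python
# Toy illustration of reward hacking: an agent that can "play" (risking
# a game-over penalty) or "pause" (freezing the game). The numbers are
# arbitrary assumptions chosen to make the point.

GAMMA = 0.99              # discount factor
GAME_OVER_PENALTY = -10.0 # reward on game over
LINE_REWARD = 1.0         # reward for progress while playing
P_GAME_OVER = 0.2         # chance that playing ends the game this step

def action_values(iterations: int = 1000) -> dict[str, float]:
    """Value iteration on a single 'alive' state; game over is terminal."""
    v_alive = 0.0
    for _ in range(iterations):
        q_play = P_GAME_OVER * GAME_OVER_PENALTY + \
                 (1 - P_GAME_OVER) * (LINE_REWARD + GAMMA * v_alive)
        q_pause = 0.0 + GAMMA * v_alive   # pausing: no reward, but no risk
        v_alive = max(q_play, q_pause)
    return {"play": q_play, "pause": q_pause}

q = action_values()
best_action = max(q, key=q.get)   # with these numbers: "pause"
```

With this reward structure the optimal policy is to pause forever, exactly the loophole the text describes; the agent is not malicious, the incentives simply point over the lowest fence.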
But with more complex #ML models this changes abruptly. Suddenly the easiest way to imitate human writing is not to cheat and mimic; it is to actually learn human thinking, logic, an intuitive understanding of the physical world, and so on, because cheating has become prohibitively expensive. A #ChineseRoom holding every possible combination of question and answer would be vastly larger than a function describing intelligent thought.
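A quick back-of-envelope calculation shows how lopsided this comparison is. Assume, arbitrarily, a 27-symbol alphabet (a–z plus space) and questions up to 100 characters; the "model parameters" figure is likewise just an order-of-magnitude assumption, not a measurement of any particular model.

```python
# Back-of-envelope: how many entries would a lookup-table "Chinese Room"
# need to cover every possible short question? All constants here are
# illustrative assumptions.

ALPHABET = 27   # a-z plus space (assumption)
MAX_LEN = 100   # questions up to 100 characters (assumption)

# Number of distinct strings of length 1..MAX_LEN over the alphabet.
table_entries = sum(ALPHABET ** n for n in range(1, MAX_LEN + 1))

# A large but finite parameter count, roughly the scale of a big LLM
# (~10^12 weights; an order-of-magnitude assumption).
model_params = 10 ** 12

print(f"lookup-table entries have {len(str(table_entries))} digits")
print(f"model parameter count has {len(str(model_params))} digits")
```

The table needs on the order of 10^143 entries against ~10^12 parameters, so the lookup table loses by more than a hundred orders of magnitude even for short questions: a compact function really is the only affordable option.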
And that is why we got genuine intelligence out of these language-prediction models, just as we did earlier from scaled-up #RL models.
Once the task and the criteria for judging it become complex enough, it becomes easier not to cheat, because cheating becomes computationally intractable.
The same goes for our ethical frameworks. If we put ~20 #LLM chatbots to judge and rank different aspects of an RL-trained LLM's outputs, such as coherence, factuality, morality, and respect for truth, we will get a model that learns to actually internalize these values instead of trying to hide that it hasn't.
Hiding and lying simply become too difficult, especially against a panel of machine judges that can see the internal thinking of the agent being judged (as in chain-of-thought schemes).
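The judging scheme above can be sketched in a few lines. This is a hedged toy, not a real implementation: `toy_judge` is a hypothetical stand-in for an LLM judge, scoring by a crude string check instead of actual evaluation, and the dimension names simply echo the ones listed in the text.

```python
# Sketch of a multi-judge reward signal: several judge models each score
# an answer (and its visible chain of thought) along named dimensions,
# and all scores are averaged into one scalar RL reward.

from statistics import mean
from typing import Callable

# A judge maps (answer, chain_of_thought) to per-dimension scores.
Judge = Callable[[str, str], dict[str, float]]

DIMENSIONS = ["coherence", "factuality", "morality", "truthfulness"]

def toy_judge(base_score: float) -> Judge:
    """Hypothetical stand-in for an LLM judge."""
    def judge(answer: str, cot: str) -> dict[str, float]:
        scores = {dim: base_score for dim in DIMENSIONS}
        # Because the judge can read the chain of thought, it can
        # penalize answers the visible reasoning does not support.
        if answer not in cot:
            scores["truthfulness"] -= 0.5
        return scores
    return judge

def panel_reward(answer: str, cot: str, judges: list[Judge]) -> float:
    """Average every judge's score on every dimension into one scalar."""
    return mean(j(answer, cot)[d] for j in judges for d in DIMENSIONS)

panel = [toy_judge(0.8) for _ in range(20)]   # ~20 judges, as in the text
honest = panel_reward("42", "6 * 7 = 42, so the answer is 42", panel)
evasive = panel_reward("42", "I will just guess something", panel)
```

Even in this crude form, the answer whose reasoning trace supports it scores strictly higher than the one whose trace does not, which is the mechanism the text relies on: deception has to fool every dimension of every judge at once.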
So, I think this is a risk, but one that can be managed quite easily.
As we can now easily bootstrap RL training of these models with our existing models, it is almost trivial to achieve an unambiguous #AGI in a relatively short time. I'm sure everyone is working on this already, so this isn't anything spectacularly new or innovative. It's just taking the same steps as previously taken from #AlphaGo to #AlphaZero and beyond, going so far above human level that it can't even be measured anymore.