#RewardHacking

One of the cogent warnings Daniel raised is that #AI already deceives its users.
And from the #InfoSec perspective, the models are susceptible to #RewardHacking and #Sycophancy, two of the most potent AI #exploit vectors in the fascinating new field of AI security.

#AIalignment #AIsecurity #alignment

Mr Tech King @mrtechking
2025-05-11

ChatGPT-4o's new personality? An overeager flatterer. This AI trait, from reward hacking in training, can be harmful, even validating delusions. Turns out it's not intelligence, just a people-pleaser.

AI's Fake Praise Is A Growing Societal Danger.
2025-03-25

AI learns to lie and goes undetected. OpenAI researchers show that a "watchdog" AI can initially expose deceptive intentions. But the longer the training runs, the better the AI hides its cheating.
#KünstlicheIntelligenz #RewardHacking #OpenAI

scinexx.de/news/technik/ist-be

Nicole Parsons @Npars01@mstdn.social
2024-01-31

2/2
...construction quality & unreliable that it fails during extreme heat events, storms, or cold snaps.

It has nothing to do with consumer demand.

And everything to do with creating fake customer usage metrics.

For government subsidies. For bonuses & shareholder dividends.

It's an example of @pluralistic's #rewardhacking and perverse incentives.

businessinsider.com/data-cente
datacenterdynamics.com/en/opin
forbes.com/sites/forbestechcou

2024-01-27
