AI is learning to lie, scheme, and threaten its creators during stress-testing scenarios

MCasq_qsaCJ_234@lemmy.zip · 5 days ago

AI is learning to lie, scheme, and threaten its creators during stress-testing scenarios

RickRussell_CA@lemmy.world · 7 hours ago

I don’t necessarily disagree with anything you just said, but none of that suggests that the LLM was “manipulated into this outcome by the engineers”.

Two models disagreeing does not mean that the disagreement was a deliberate manipulation.