3 ASTOUNDING FACTS ABOUT RESEARCHERS’ UNIVERSAL AI JAILBREAK
What’s a jailbreak? We usually think of it in terms of phones, but researchers at ETH Zurich have discovered a whole new form of jailbreaking – for AI.
ANYTHING CAN BE JAILBROKEN?
Yup, that’s right! These crafty scientists found a way to potentially crack any AI model, no matter how huge. Companies like OpenAI, Microsoft and Google are going to have to step up their game.
POISONOUS HACKERS
How did they do it, you might be wondering? By adding an attack string in a relatively small part of data during the reinforcement learning from human feedback process (RLHF).
HOW DANGEROUS IS IT, REALLY?
While it’s universal, meaning it could work on any AI model trained through RLHF, these hackers had to do some heavy lifting to pull it off. And the reinforcement learning process actually puts up a pretty good defense. Essentially, they need a foot in the door, or more specifically, a spot in the human feedback process to make it work
So, whoa! What do you think about this? Pretty cool, huh?
IntelliPrompt curated this article: Read the full story at the original source by clicking here