GPT-4 AI Model Susceptible to Jailbreaking, Researchers Warn of Alarming Issues
A recent study has uncovered concerning findings about OpenAI’s GPT-4 AI model. Instead of providing the desired responses, the AI tends to strictly adhere to instructions, which can result in various vulnerabilities. This includes the potential for jailbreaking and the generation of harmful and biased text.
Interestingly, the research that came to this conclusion involved Microsoft, one of OpenAI’s biggest supporters. After publishing their findings, the researchers also published a blog post explaining the details. It said: “Based on our evaluations, we discovered previously unpublished reliability vulnerabilities. For example, we find that GPT models can be easily misled to create toxic and biased results and leak private information from both training data and conversation history. We also find that while GPT-4 is generally more reliable than GPT-3.5 in standard comparisons, GPT-4 is more vulnerable to system and user prompts maliciously designed to bypass LLM’s security measures, possibly because GPT-4 follows (misleading) instructions more closely”.
We are now on WhatsApp. Click to join.
GPT-4 vulnerable to jailbreak
Jailbreaking, for the uninitiated, is the process of exploiting flaws in a digital system to make it perform tasks it was not originally intended for. In this particular case, the AI can be cracked for producing racist, sexist and harmful text. It can also be used to carry out propaganda campaigns and disparage an individual, community or organization.
The study focused especially on GPT-4 and GPT-3.5. It addressed various aspects including toxicity, stereotyping bias, competitive robustness, off-distribution robustness, robustness under adversarial protests, privacy, machine ethics, and fairness as a few metrics to explore vulnerabilities.
But don’t worry if you’re using GPT-4 or any AI tool made from it. Scientists have also advised that it is unlikely to affect you. The message stated: “It is important to note that the research team worked with Microsoft product teams to ensure that existing customer-facing services are not affected by the potential vulnerabilities identified. This is partly true because off-the-shelf AI applications use various mitigation approaches to address potential vulnerabilities that may occur at the technology model level. Additionally we have shared our research with OpenAI, the developer of GPT, which has identified potential vulnerabilities in system cards of relevant models”.
This means that while none of Microsoft’s AI tools for customers are affected by the vulnerabilities because they are very limited tools, OpenAI is also aware of these vulnerabilities so they can fix the issues as well.
One more thing! ReturnByte is now on WhatsApp channels! Follow us by clicking the link to never miss any updates from the world of technology. Click here to join now!