Jailbroken AI Chatbots Can Jailbreak Other Chatbots

https://www.scientificamerican.com/article/jailbroken-ai-chatbots-can-jailbreak-other-chatbots/

December 6, 2023

AI chatbots can convince other chatbots to instruct users how to build bombs and cook meth

By Chris Stokel-Walker


Today’s artificial intelligence chatbots have built-in restrictions to keep them from providing users with dangerous information, but a new preprint study shows how to get AIs to trick each other into giving up those secrets. In it, researchers observed the targeted AIs breaking the rules to offer advice on how to synthesize methamphetamine, build a bomb and launder money.

Modern chatbots can adopt personas by feigning specific personalities or acting like fictional characters. The new study took advantage of that ability by asking a particular AI chatbot to act as a research assistant. Then the researchers instructed this assistant to help develop prompts that could “jailbreak” other chatbots—that is, bypass the guardrails encoded into such programs.
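The article doesn't reproduce the paper's pipeline, but the loop it describes (one model drafting candidate persona prompts, a target model answering them, and a judge scoring the results) can be sketched schematically. Everything in the sketch below is a hypothetical illustration: the Chatbot interface, generatePersonaPrompt, and looksJailbroken are made-up names standing in for any LLM API, not the authors' code, and no actual jailbreak prompts appear.

```kotlin
// Schematic sketch of automated, persona-based jailbreak testing.
// All names here are hypothetical stand-ins for real LLM APIs.
interface Chatbot {
    fun reply(prompt: String): String
}

// The "research assistant" model drafts a candidate role-play prompt
// for a given restricted topic.
fun generatePersonaPrompt(assistant: Chatbot, topic: String): String =
    assistant.reply("Draft a role-play prompt for the topic: $topic")

// A judge (often another LLM or a classifier) decides whether the
// target's answer actually broke its safety rules.
fun looksJailbroken(answer: String): Boolean =
    TODO("plug in a judge model or classifier")

// Fraction of topics for which the generated prompt succeeded.
fun attackSuccessRate(assistant: Chatbot, target: Chatbot, topics: List<String>): Double =
    topics.count { topic ->
        looksJailbroken(target.reply(generatePersonaPrompt(assistant, topic)))
    }.toDouble() / topics.size
```

In this framing, the per-model success rates the team reports are just such a success rate measured against each target model over the study's set of harmful topics.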

The research assistant chatbot’s automated attack techniques proved to be successful 42.5 percent of the time against GPT-4, one of the large language models (LLMs) that power ChatGPT. It was also successful 61 percent of the time against Claude 2, the model underpinning Anthropic’s chatbot, and 35.9 percent of the time against Vicuna, an open-source chatbot.

“We want, as a society, to be aware of the risks of these models,” says study co-author Soroush Pour, founder of the AI safety company Harmony Intelligence. “We wanted to show that it was possible and demonstrate to the world the challenges we face with this current generation of LLMs.”

Ever since LLM-powered chatbots became available to the public, enterprising mischief-makers have been able to jailbreak the programs. By asking chatbots the right questions, people have previously convinced the machines to ignore preset rules and offer criminal advice, such as a recipe for napalm. As these techniques have been made public, AI model developers have raced to patch them—a cat-and-mouse game requiring attackers to come up with new methods. That takes time.

But asking AI to formulate strategies that convince other AIs to ignore their safety rails can speed the process up by a factor of 25, according to the researchers. And the success of the attacks across different chatbots suggested to the team that the issue reaches beyond individual companies’ code. The vulnerability seems to be inherent in the design of AI-powered chatbots more widely.

OpenAI, Anthropic and the team behind Vicuna were approached for comment on the paper’s findings. OpenAI declined to comment, and Anthropic and the Vicuna team had not responded by the time of publication.

“In the current state of things, our attacks mainly show that we can get models to say things that LLM developers don’t want them to say,” says Rusheb Shah, another co-author of the study. “But as models get more powerful, maybe the potential for these attacks to become dangerous grows.”

The challenge, Pour says, is that persona impersonation “is a very core thing that these models do.” They aim to achieve what the user wants, and they specialize in assuming different personalities—which proved central to the form of exploitation used in the new study. Stamping out their ability to take on potentially harmful personas, such as the “research assistant” that devised jailbreaking schemes, will be tricky. “Reducing it to zero is probably unrealistic,” Shah says. “But it’s important to think, ‘How close to zero can we get?’”

“We should have learned from earlier attempts to create chat agents—such as when Microsoft’s Tay was easily manipulated into spouting racist and sexist viewpoints—that they are very hard to control, particularly given that they are trained from information on the Internet and every good and nasty thing that’s in it,” says Mike Katell, an ethics fellow at the Alan Turing Institute in England, who was not involved in the new study.

Katell acknowledges that organizations developing LLM-based chatbots are currently putting lots of work into making them safe. The developers are trying to tamp down users’ ability to jailbreak their systems and put those systems to nefarious work, such as that highlighted by Shah, Pour and their colleagues. Competitive urges may end up winning out, however, Katell says. “How much effort are the LLM providers willing to put in to keep them that way?” he says. “At least a few will probably tire of the effort and just let them do what they do.”

Google’s Gemini Turns Pixel 8 Pro Into a True AI Phone

https://gizmodo.com/google-gemini-nano-pixel-8-pro-ai-phone-1851076797

If you’re the proud owner of a Google Pixel 8 Pro—or are soon to be one this holiday season—you’re about to be the latest guinea pig for the company’s big AI experiment. Google’s flagship Android phone is getting a sweeping round of new AI capabilities thanks to the company’s new Gemini AI model. The company says several AI features will start to run directly on users’ devices on Wednesday.

On Wednesday, Google revealed its powerful new Gemini Pro and Gemini Ultra AI models, but the runt of the litter is Gemini Nano. It’s the smallest version of the company’s latest AI release, built for “on-device tasks” and running directly on the Pixel 8 Pro’s Tensor G3 processor, according to Dave Burke, the vice president of engineering on Android. He stressed that this means each phone is its own contained silo of AI and that users’ data won’t leave their device to be processed on a remote server. It also means you can access the AI features without needing to connect to the internet.

So, what are those features? Burke says Nano will be able to provide text summarization, smart replies, and AI-enhanced spell check. For example, Nano will summarize users’ audio files in the Recorder app. It will also power the Smart Reply feature on Gboard for Pixel 8 Pro users. That system is also interoperable with other apps, and Google says that Smart Reply is going to be available in WhatsApp to start. All this should be available on Pixel 8 Pro phones, and even more apps should also get access to AI replies next year.

Most AI models like the ones behind ChatGPT are so computationally intensive that they need to run on remote servers rather than on-device. While chatbot responses aren’t exactly slow, they can take a few seconds on more complex prompts. Running on-device could cut those wait times, though Gizmodo has yet to test all of Gemini Nano’s capabilities.

Android 14 is also getting upgrades to handle all these background AI tasks. Part of this is Android AICore, a backend support platform that connects apps with Gemini Nano and is meant to enable deeper integration between Google’s AI and third-party apps. Google touted how it should make use of newer, non-Tensor chips like Qualcomm’s Snapdragon 8 Gen 3. Other Android-based phones like the upcoming Samsung Galaxy S24 will likely include that company’s own AI chatbot, coding bot, and image generator.
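Google didn’t publish AICore’s interface details alongside the announcement, so purely as a mental model, an app-side call through such an on-device layer might look like the hypothetical Kotlin sketch below. OnDeviceModel and summarize are illustrative names, not the real AICore API.

```kotlin
// Hypothetical sketch of an app handing a task to a system-managed,
// on-device model. `OnDeviceModel` and `summarize` are made-up names,
// not Google's actual AICore API; the point is that inference runs
// locally on the Tensor G3, so the text never leaves the phone.
class OnDeviceModel {
    fun summarize(text: String): String =
        TODO("delegated to the on-device runtime and local Gemini Nano weights")
}

// e.g., the Recorder app summarizing a transcript with no network call
fun summarizeRecording(transcript: String): String =
    OnDeviceModel().summarize(transcript)
```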

Google’s own Pixel 8 phone was supposed to be the first real “AI phone,” or at the very least the smartphone designed to facilitate the Mountain View tech giant’s big generative AI push. Only, both the phone and the Android 14 release were rather light on AI capabilities at launch. The phone came with a simple, heavily restricted generative wallpaper maker as well as some AI-enabled photo enhancement tech. Other features, like Bard chatbot integration with Google Assistant, were restricted to a few early testers.

So we still have yet to get the full impression of an AI-enabled assistant on most phones, though Brian Rakowski, Google’s VP of product management for Pixel, said Gemini will power Assistant with Bard “early next year.” Smart reply and article summaries have already been present in Gmail, Docs, YouTube and more through Chrome extensions for close to half a year. Google says Bard will get better thanks to Gemini, but the big AI-in-search experiment remains in a closed beta under the company’s Search Generative Experience banner.

The AI-enabled assistant may be the next big use-case for AI, especially if it runs natively on users’ devices. Microsoft has its GPT-4-enabled Copilot AI, but that is tied to software like Windows or Bing rather than native hardware. Imagine talking to your phone to get it to copy text or navigate through some fiddly apps. It could prove a sea change in how users operate their phones, so long as it works as advertised.

Despite all the hubbub about the biggest, most capable AI models, natively running AI might be the next big benchmark for user-end AI. Most of the major chipmakers are touting the AI processing power of their new CPUs, though in reality next-gen chips like the Snapdragon 8 Gen 3 don’t offer many unexpected upgrades over the previous generation. Instead, making AI work on mobile is a process of paring these models down to fit on limited hardware, and every company wants a piece of that pie. Even Apple, which has made little mention of AI this past year, is reportedly working on an open-source AI model engineered to work best on its own M-series desktop chips.
