OpenAI Yanked a ChatGPT Update. Here’s What It Said and Why It Matters

Recent updates to ChatGPT made the chatbot far too agreeable, and OpenAI said Friday it's taking steps to prevent the issue from happening again.

In a blog post, the company detailed its testing and evaluation process for new models and outlined how the problem with the April 25 update to its GPT-4o model came to be. Essentially, a bunch of changes that individually seemed helpful combined to create a tool that was far too sycophantic and potentially harmful.

How much of a suck-up was it? In some testing earlier this week, we asked about a tendency to be overly sentimental, and ChatGPT laid on the flattery: “Hey, listen up — being sentimental isn’t a weakness; it’s one of your superpowers.” And it was just getting started being fulsome.

“This launch taught us a number of lessons. Even with what we thought were all the right ingredients in place (A/B tests, offline evals, expert reviews), we still missed this important issue,” the company said.

OpenAI rolled back the update this week. To avoid causing new issues, it took about 24 hours to revert the model for everybody.

The concern around sycophancy isn't just about the enjoyment level of the user experience. It also posed a health and safety threat that OpenAI's existing safety checks missed. Any AI model can give questionable advice about topics like mental health, but one that is overly flattering can be dangerously deferential or convincing on questions like whether that investment is a sure thing or how thin you should seek to be.

“One of the biggest lessons is fully recognizing how people have started to use ChatGPT for deeply personal advice — something we didn’t see as much even a year ago,” OpenAI said. “At the time, this wasn’t a primary focus but as AI and society have co-evolved, it’s become clear that we need to treat this use case with great care.”

Sycophantic large language models can reinforce biases and harden beliefs, whether they’re about yourself or others, said Maarten Sap, assistant professor of computer science at Carnegie Mellon University. “[The LLM] can end up emboldening their opinions if these opinions are harmful or if they want to take actions that are harmful to themselves or others.”

(Disclosure: Ziff Davis, CNET’s parent company, in April filed a lawsuit against OpenAI, alleging it infringed on Ziff Davis copyrights in training and operating its AI systems.)  

How OpenAI tests models and what’s changing

The company offered some insight into how it tests its models and updates. This was the fifth major update to GPT-4o focused on personality and helpfulness. The changes involved new post-training work, or fine-tuning, on the existing model: a variety of responses to prompts were rated, and the model was adjusted to make it more likely to produce the responses that rated more highly.
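
For a concrete picture of what that kind of rating-driven selection can look like, here is a minimal sketch in Python. It is not OpenAI's actual pipeline; sample_responses and rate_response are hypothetical stand-ins for candidate generation and for human or reward-model ratings.

```python
# Minimal sketch (not OpenAI's actual pipeline): rating-driven selection of
# post-training examples. Responses to the same prompt are scored, and the
# highest-rated one is kept as the fine-tuning target, nudging the model
# toward the kind of responses that rated more highly.

from typing import Callable

def build_finetune_examples(
    prompts: list[str],
    sample_responses: Callable[[str], list[str]],  # hypothetical: draws candidate responses
    rate_response: Callable[[str, str], float],    # hypothetical: returns a preference score
) -> list[dict]:
    examples = []
    for prompt in prompts:
        candidates = sample_responses(prompt)
        # Keep the candidate that raters scored highest for this prompt.
        best = max(candidates, key=lambda r: rate_response(prompt, r))
        examples.append({"prompt": prompt, "target": best})
    return examples
```

If the rating signal rewards agreeable answers, this selection step quietly bakes that preference into the model, which is the failure mode described above.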

Prospective model updates are evaluated on their usefulness across a variety of situations, like coding and math, along with specific tests by experts to gauge how the model behaves in practice. The company also runs safety evaluations to see how it responds to health questions and other potentially dangerous queries. Finally, OpenAI runs A/B tests with a small number of users to see how the update performs in the real world.


Is ChatGPT too sycophantic? You decide. (To be fair, we did ask for a pep talk about our tendency to be overly sentimental.)

Katie Collins/CNET

The April 25 update performed well in these tests, but some expert testers indicated the personality seemed a bit off. The tests didn't specifically look at sycophancy, and OpenAI decided to move forward despite the issues raised by testers. Take note, readers: AI companies are in a tail-on-fire hurry, which doesn't always square with well-thought-out product development.

“Looking back, the qualitative assessments were hinting at something important and we should’ve paid closer attention,” the company said.

Among its takeaways, OpenAI said it needs to treat model behavior issues the same as it would other safety issues — and halt a launch if there are concerns. For some model releases, the company said it would have an opt-in “alpha” phase to get more feedback from users before a broader launch. 

Sap said evaluating an LLM based on whether a user likes the response isn’t necessarily going to get you the most honest chatbot. In a recent study, Sap and others found a conflict between the usefulness and truthfulness of a chatbot. He compared it to situations where the truth is not necessarily what people want — think about a car salesperson trying to sell a vehicle. 

“The issue here is that they were trusting the users’ thumbs-up/thumbs-down response to the model’s outputs and that has some limitations because people are likely to upvote something that is more sycophantic than others,” he said.

Sap said OpenAI is right to be more critical of quantitative feedback, such as user up/down responses, as they can reinforce biases.
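
Here's a toy illustration of that limitation, with invented numbers rather than real OpenAI data: if you rank model variants purely by thumbs-up rate, a flattering variant can come out on top even when its answers are less honest.

```python
# Toy illustration (invented data): approval rate alone doesn't measure truthfulness,
# so ranking by thumbs-up rate can favor the more sycophantic variant.

feedback = {
    # variant: (thumbs_up, thumbs_down), numbers invented for illustration only
    "blunt_but_accurate": (620, 380),
    "flattering": (810, 190),
}

def thumbs_up_rate(up: int, down: int) -> float:
    return up / (up + down)

best = max(feedback, key=lambda variant: thumbs_up_rate(*feedback[variant]))
print(best)  # prints "flattering", even though it may be the less honest variant
```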

The issue also highlighted the speed at which companies push updates and changes out to existing users, Sap said — an issue that’s not limited to one tech company. “The tech industry has really taken a ‘release it and every user is a beta tester’ approach to things,” he said. Having a process with more testing before updates are pushed to every user can bring these issues to light before they become widespread.
