OpenAI's New GPT Model is Scandalous

In May 2024, OpenAI unveiled GPT-4o (“o” for “omni”), the newest model in its GPT-4 family. While the new model improves massively on its predecessors, it also drew controversy within a week of its release.

With fast response times and the ability to accept any combination of text, image, video, and audio, GPT-4o is designed to come even closer to mimicking human conversation. One interesting change that allows it to outperform earlier GPT models is the way it processes different modes of communication. Previously, to handle audio input, the models would convert the audio to text, generate a text response, and then use text-to-speech to turn that response back into audio. This approach lost information, because the primary neural network received only the words, with nothing about how the speaker said them. GPT-4o instead uses a single neural network to process both input and output, so information such as the speaker’s tone is no longer lost.
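The difference can be illustrated with a toy sketch in Python. This is not OpenAI's implementation: `AudioClip`, `cascaded_view`, and `end_to_end_view` are hypothetical names invented for illustration, and the "audio" is just a pair of strings. The sketch only shows where the speaker's tone drops out of the older, step-by-step pipeline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioClip:
    """Hypothetical stand-in for a spoken input: the words plus how they were said."""
    words: str
    tone: str


@dataclass
class ModelInput:
    """What the main model actually gets to see."""
    words: str
    tone: Optional[str]


def cascaded_view(clip: AudioClip) -> ModelInput:
    # Older approach: speech-to-text runs first, so only the
    # transcript reaches the main model. The tone is discarded here.
    transcript = clip.words
    return ModelInput(words=transcript, tone=None)


def end_to_end_view(clip: AudioClip) -> ModelInput:
    # Single-network approach: the model receives the audio itself,
    # so cues such as tone remain available to it.
    return ModelInput(words=clip.words, tone=clip.tone)


clip = AudioClip(words="Great, another meeting.", tone="sarcastic")
print(cascaded_view(clip))
print(end_to_end_view(clip))
```

In the cascaded version, a sarcastic remark and a sincere one look identical to the model once they have been transcribed. In the end-to-end version, they remain distinguishable.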

Additionally, GPT-4o is capable of tweaking its previous image outputs. If a user asks the model to create an image and then requests a slight change, GPT-4o can make the edit while keeping the rest of the image intact. Microsoft’s Bing Chat fails to accomplish this: each image it produces is generally independent of any previous images in the conversation.

Overall, GPT-4o is an extremely impressive and effective model, and it significantly improves output in non-English languages as well. The voices behind the chatbot, however, sparked immediate controversy.

When enabling Voice Mode in ChatGPT, users are given five options for how they want the bot to sound: Breeze, Cove, Ember, Juniper, and Sky. According to OpenAI, each voice was recorded by a different voice actor selected from hundreds of auditions. Two weeks before Voice Mode was introduced in September 2023, Sam Altman, CEO of OpenAI, reached out to Scarlett Johansson’s team to see if she was interested in becoming one of the voices. She declined the offer, and Voice Mode was implemented with no involvement from her. Then, only three days before the launch of GPT-4o in May 2024, Altman reached out again to ask whether she would reconsider. According to Johansson’s statement, the demo was released before the two sides could connect, and GPT-4o’s Voice Mode proceeded without her.

The controversy concerns the Sky voice, which users immediately noticed sounded eerily similar to Scarlett Johansson, particularly in the sci-fi film Her, where she voices an AI. The situation was worsened by a tweet from Sam Altman on the day of GPT-4o’s public demonstration, which simply said “her.” These factors led users and news outlets to believe that Scarlett Johansson was in fact the voice behind ChatGPT’s Sky. The actress quickly put out a statement saying she was “shocked” and “angered,” claiming that OpenAI had copied her voice without her permission. OpenAI responded with a lengthy post detailing its process for choosing the voices, affirming that “AI voices should not deliberately mimic a celebrity’s distinctive voice” and stating that the Sky voice was not sampled from Johansson’s voice and was never intended to imitate it. Regardless, the company apologized and pulled the Sky voice from all of its products. The removal was described as temporary, but the voice remains unavailable as of this writing.

This feud between OpenAI and Scarlett Johansson is likely just one of many controversies to come. While the field of artificial intelligence is full of constant innovation, these advancements are always accompanied by ethical challenges. The line between human and machine-generated content is becoming increasingly blurred. This incident is a reminder of the fragile balance between technological progress and morality. As we move forward, developers will almost certainly face growing pressure to ensure that AI development is transparent, responsible, and safe.