OpenAI has begun rolling out the advanced voice mode for its chatbot ChatGPT, offering users hyper-realistic audio responses powered by GPT-4o. The rollout, which started on Tuesday, is initially limited to a small group of ChatGPT Plus users as an alpha release, with the feature expected to reach all Plus subscribers by fall 2024.
The Controversy and Scarlett Johansson
The demonstration of GPT-4o's voice in May made a strong public impression with its striking realism and fast responses, and because it closely resembled the voice of actress Scarlett Johansson, known for her role in the movie "Her." Following the demonstration, Johansson denied having authorized the use of her voice and retained legal counsel to protect her likeness. OpenAI denied that the voice was based on hers and subsequently removed it from the product. The controversy led OpenAI to delay the launch of the advanced voice mode in order to strengthen its safety measures.
Features of the Advanced Voice Mode
ChatGPT's new advanced voice mode is natively multimodal: a single model handles the entire exchange, with no auxiliary models involved. Previously, ChatGPT processed audio with three separate models: one to convert speech to text, another to process the message with GPT-4, and a third to convert the reply back into speech. The new version significantly reduces conversational latency and can detect emotional intonation such as sadness, joy, or even singing. OpenAI is introducing the feature gradually so it can closely monitor usage and ensure it stays within established guidelines. Selected users will receive a notification in the ChatGPT app and an email with instructions on how to use it.
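The latency difference between the two designs can be illustrated with a minimal sketch. All function names below are hypothetical stand-ins, not real OpenAI APIs; the point is only the shape of the two architectures.

```python
# Hypothetical sketch: old three-model pipeline vs. a single multimodal model.
# None of these functions correspond to actual OpenAI APIs.

def transcribe(audio: str) -> str:
    """Stage 1 of the old pipeline: speech-to-text."""
    return f"text({audio})"

def reason(text: str) -> str:
    """Stage 2 of the old pipeline: GPT-4 processes the transcript.
    Emotional cues in the original audio are already lost here."""
    return f"reply({text})"

def synthesize(text: str) -> str:
    """Stage 3 of the old pipeline: text-to-speech."""
    return f"audio({text})"

def legacy_voice_mode(audio: str) -> str:
    # Three chained models: each hop adds latency, and only plain
    # text crosses the boundaries between them.
    return synthesize(reason(transcribe(audio)))

def multimodal_voice_mode(audio: str) -> str:
    # GPT-4o-style: one model consumes audio and emits audio directly,
    # so intonation (sadness, joy, singing) can survive end to end.
    return f"audio(reply({audio}))"

print(legacy_voice_mode("hello"))      # audio passes through two text hops
print(multimodal_voice_mode("hello"))  # audio handled in a single step
```

The sketch makes the trade-off concrete: in the legacy path, only text crosses model boundaries, which is why emotional tone could not be preserved, whereas the single-model path keeps the audio signal intact from input to output.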
In the months following the initial demonstration, OpenAI tested GPT-4o's voice capabilities with more than 100 external testers speaking 45 different languages. A report on these safety tests is due to be published in early August. Additionally, the company has limited the advanced voice mode to four preset voices (Juniper, Breeze, Cove, and Ember), all created in collaboration with paid voice actors. The "Sky" voice shown initially is no longer available.
Furthermore, OpenAI has introduced new filters to prevent requests for generating copyrighted music or other audio content, in response to growing legal concerns regarding the use of audio models like GPT-4o. Recently, record labels have filed lawsuits against AI song generators such as Suno and Udio, marking a new frontier in copyright disputes.