ChatGPT mythbusting: no, it can’t hear and speak now

OpenAI announced this week that ChatGPT can “see, hear, and speak.” Following the announcement, I noticed posts on LinkedIn, some from highly followed and well-respected accounts, asserting that the underlying model is now multimodal with respect to audio, claiming, for example, that it can generate music.*

Nope!

The reality:

  • The new voice features are speech recognition and text-to-speech, functioning as an interface to the existing model (see the sketch after this list)

  • These let you speak your prompts and listen to the responses rather than typing and reading

  • The underlying model sees the text of your conversation, not the audio

  • The underlying model is only capable of consuming and generating text and images
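To make the distinction concrete, here is a minimal sketch of that speech-to-text → model → text-to-speech pipeline using the OpenAI Python SDK. The file names, voice, and model choices are illustrative assumptions, not OpenAI's actual ChatGPT implementation; the point is that the model call in the middle only ever handles text.

```python
# A minimal sketch of the voice-interface pattern, using the OpenAI
# Python SDK. Illustrative wiring only, not OpenAI's ChatGPT code.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Speech recognition: turn the user's spoken audio into text.
with open("question.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1", file=audio_file
    )

# 2. The language model only ever sees the transcribed text.
reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)
answer = reply.choices[0].message.content

# 3. Text-to-speech: turn the model's text reply back into audio.
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=answer)
speech.write_to_file("answer.mp3")
```

No audio ever reaches the language model itself: it is transcribed to text on the way in and synthesized from text on the way out.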


The hyperbolic wording of OpenAI’s announcement — saying ChatGPT can “hear” and “speak” — no doubt contributed to the mistaken idea that the model is capable of processing audio.

What about vision? Can ChatGPT “see”? The model can indeed consume and generate images, so this part is less misleading. But I could still do without the anthropomorphizing language.
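For reference, this is roughly what multimodal input looks like at the API level: the chat endpoint accepts text and image content parts, but no audio part. A hedged sketch, again assuming the OpenAI Python SDK; the model name and image URL are illustrative.

```python
# Sketch of sending an image to a vision-capable chat model via the
# OpenAI Python SDK. Model name and URL are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            # Content parts are text or image_url; there is no audio type.
            {"type": "image_url",
             "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```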

*ChatGPT can write music in text form, such as lyrics or chord progressions, when prompted. It can't generate actual audio files.

