Tuesday, June 24, 2025

OpenAI Unveils Advanced AI Models for Speech Transcription & Voice Generation

OpenAI has launched new AI models for text-to-speech (TTS) and speech-to-text (STT) in its API, improving realism, accuracy, and user control. The models support OpenAI’s broader "agentic" vision of building automated agents that can handle tasks such as customer interactions.

🔊 New Text-to-Speech Model: “gpt-4o-mini-tts”

- Produces nuanced, realistic speech.
- More steerable: users can adjust the tone (e.g., “mad scientist” or “mindfulness teacher”).
- Well suited to customer support and dynamic voice applications.

📝 New Speech-to-Text Models: “gpt-4o-transcribe” & “gpt-4o-mini-transcribe”

- Replace OpenAI’s Whisper model.
- Trained on high-quality, diverse audio, improving accuracy with accented speech and in noisy environments.
- Reduce hallucinations, a known issue with Whisper.
- Challenge: still struggle with Indic and Dravidian languages (around a 30% error rate).

Unlike Whisper, OpenAI’s new transcription models won’t be open-source, owing to their complexity and resource requirements. The company believes open-source AI is best suited to lightweight, on-device models.
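
To give a sense of how these models are typically called, here is a minimal sketch using the OpenAI Python SDK’s audio endpoints. The model names come from the announcement; the voice name, file paths, and instruction text are illustrative assumptions, not values from the article.

```python
# Minimal sketch. Assumes the OpenAI Python SDK is installed and OPENAI_API_KEY is set.
# The voice name, file paths, and instruction text below are illustrative examples.
from openai import OpenAI

client = OpenAI()

# Text-to-speech with gpt-4o-mini-tts: the instructions field steers the delivery/tone.
speech = client.audio.speech.create(
    model="gpt-4o-mini-tts",
    voice="coral",  # example voice name
    input="Thanks for calling. How can I help you today?",
    instructions="Speak like a calm mindfulness teacher.",
)
audio_bytes = speech.read()  # raw audio bytes from the response
with open("reply.mp3", "wb") as f:
    f.write(audio_bytes)

# Speech-to-text with gpt-4o-transcribe: send an audio file, get the transcript back.
with open("reply.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",
        file=audio_file,
    )
print(transcript.text)
```

The same transcription call with "gpt-4o-mini-transcribe" trades some accuracy for lower cost and latency, which is the usual reason to pick the mini variant.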
