OpenAI has launched new AI models for text-to-speech (TTS) and speech-to-text (STT) in its API, enhancing realism, accuracy, and user control. These models align with OpenAI’s broader "agentic" vision, aiming to create automated agents that can handle tasks like customer interactions.
🔊 New Text-to-Speech Model: “gpt-4o-mini-tts”
- Produces nuanced, realistic speech.
- More steerable: developers can direct the delivery with natural-language instructions (e.g., “mad scientist” or “mindfulness teacher”).
- Well suited to customer support and other dynamic voice applications.

📝 New Speech-to-Text Models: “gpt-4o-transcribe” & “gpt-4o-mini-transcribe”
- Replace OpenAI’s Whisper model.
- Trained on high-quality, diverse audio, improving accuracy on accented speech and in noisy environments.
- Reduce hallucinations, a known issue with Whisper.
- Challenges: the models still struggle with Indic and Dravidian languages, where error rates reach around 30%.

Unlike Whisper, OpenAI’s new transcription models won’t be open-source, owing to their size and resource requirements. The company believes open-source AI is best suited to lightweight, on-device models.
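As a rough illustration, the sketch below shows how these models might be called through the official `openai` Python SDK. The model names come from the announcement, but the specific parameters, the `instructions` field used to steer tone, the voice name, and the file paths, are assumptions and may differ from the current API reference.

```python
# Sketch only: assumes the official `openai` Python SDK (pip install openai)
# and an OPENAI_API_KEY set in the environment. The `instructions` parameter
# and the voice "coral" are assumptions, not confirmed API details.
from openai import OpenAI

client = OpenAI()

# Text-to-speech: steer the delivery with a natural-language instruction.
speech = client.audio.speech.create(
    model="gpt-4o-mini-tts",
    voice="coral",  # hypothetical voice choice
    input="Thanks for calling! How can I help you today?",
    instructions="Speak like a calm mindfulness teacher.",
)
speech.write_to_file("greeting.mp3")

# Speech-to-text: transcribe an audio file with the new model.
with open("greeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",
        file=audio_file,
    )
print(transcript.text)
```

In this sketch the steering happens entirely through the `instructions` string, which matches the article’s point that tone is controlled by plain-language direction rather than separate voice-tuning parameters.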