Microsoft has released MAI-Transcribe-1, its third in-house developed AI model, which it claims is the most accurate transcription model in the world.
With an average Word Error Rate of just 3.9 per cent, MAI-Transcribe-1 works across 25 languages – English, French, German, Italian, Spanish, Hindi, Portuguese, Czech, Danish, Finnish, Hungarian, Dutch, Polish, Romanian, Swedish, Japanese, Korean, Chinese, Arabic, Indonesian, Russian, Thai, Turkish, and Vietnamese.
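For readers unfamiliar with the metric, Word Error Rate is conventionally the number of word-level substitutions, deletions and insertions needed to turn the transcript into the reference text, divided by the number of words in the reference. The sketch below illustrates that standard definition; it is not Microsoft's evaluation code, the function name `word_error_rate` is hypothetical, and it assumes simple whitespace tokenisation with no text normalisation, which real benchmarks usually apply.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: tokenises on whitespace and applies no
    normalisation (casing, punctuation), unlike most real benchmarks.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref_words)


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"WER: {wer:.1%}")
```

In this toy example, one wrong word in a six-word reference gives a rate of 1/6, or about 16.7 per cent. By the same measure, a 3.9 per cent average corresponds to roughly four word errors per 100 reference words.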
Microsoft’s new AI model ranks first on the industry-standard FLEURS benchmark in 11 core languages and surpasses the likes of Whisper-large-v3 in the remaining 14. It also outperforms the recently launched Google Gemini 3.1 Flash in 11 of those 14 languages. MAI-Transcribe-1 is available in Microsoft Foundry, where the company says its batch transcription is 2.5x faster than its Azure Fast offering and costs just $0.36 per hour.
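To put the price in context, here is a rough back-of-the-envelope estimate. It assumes the quoted $0.36 is charged per hour of audio transcribed and is pro-rated by duration; neither detail is confirmed in the announcement, and the function name `batch_cost_usd` is purely illustrative.

```python
# Assumption: the quoted price applies per hour of audio, pro-rated.
PRICE_PER_AUDIO_HOUR_USD = 0.36


def batch_cost_usd(audio_seconds: float) -> float:
    """Estimate the transcription cost for a given amount of audio."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    audio_hours = audio_seconds / 3600
    return audio_hours * PRICE_PER_AUDIO_HOUR_USD


if __name__ == "__main__":
    # Example: an archive of 100 one-hour recordings.
    total_seconds = 100 * 3600
    print(f"Estimated cost: ${batch_cost_usd(total_seconds):.2f}")
```

Under those assumptions, 100 hours of recordings would come to about $36.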
The company says MAI-Transcribe-1 is highly accurate in all supported languages, making it an ideal choice for a wide range of speech-to-text use cases. While it does not support real-time transcription, Microsoft says it will add the feature in a future version. Alongside MAI-Transcribe-1, Microsoft also released two new AI models – MAI-Image-2 and MAI-Voice-1, which, as their names suggest, can generate images and audio.
The tech giant says MAI-Voice-1 is its flagship voice generation model that can “generate natural, realistic speech, rich with nuance, emotional range and expression that preserves speaker identity” even in long-form content. Capable of generating 60 seconds of audio in just 1 second, MAI-Voice-1 is also GPU-efficient. It is available in Copilot Audio Expressions and Copilot Podcasts.
As for MAI-Image-2, Microsoft says it focuses on “performance and speed” and that it ranks among the top three model families on the Arena.ai leaderboard. While Microsoft’s AI models may not be the largest or the fastest, the company hopes to sell them as cheaper alternatives to models from Google and OpenAI.