Microsoft Launches Three MAI Models on Foundry

microsoft.ai | ksl | Apr 03, 2026

Microsoft released three first-party MAI models, available now through its Foundry platform: Transcribe-1 for speech-to-text, Voice-1 for text-to-speech, and Image-2 for image generation. Transcribe-1 ranks first on the FLEURS benchmark for 11 core languages and runs 2.5x faster than Azure's existing batch transcription. Voice-1 generates 60 seconds of audio per second and creates custom voices from a few seconds of sample audio. Image-2 placed in the top 3 on Arena.ai and generates images twice as fast as the prior generation. The pricing is aggressive: $0.36 per hour for transcription and $22 per million characters for voice. By building its own production-grade multimodal models while maintaining the OpenAI partnership, Microsoft signals a hedging strategy that is becoming harder to read as purely complementary.
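To put the quoted rates in perspective, here is a minimal back-of-the-envelope sketch in Python. It uses only the figures from the summary ($0.36 per hour, $22 per million characters, 60 seconds of audio per second). The function names are hypothetical. The sketch also assumes that the hourly rate applies to audio duration and that billing is linear with no minimums or rounding, none of which the summary states.

```python
# Figures quoted in the summary; everything else here is an assumption.
TRANSCRIBE_USD_PER_HOUR = 0.36        # assumed to mean per hour of audio
VOICE_USD_PER_MILLION_CHARS = 22.00
VOICE_AUDIO_SECONDS_PER_SECOND = 60   # quoted Voice-1 generation rate


def transcription_cost(audio_seconds: float) -> float:
    """Estimated transcription cost, assuming linear pro-rated billing."""
    return audio_seconds / 3600 * TRANSCRIBE_USD_PER_HOUR


def voice_cost(characters: int) -> float:
    """Estimated text-to-speech cost, assuming linear per-character billing."""
    return characters / 1_000_000 * VOICE_USD_PER_MILLION_CHARS


def voice_generation_seconds(audio_seconds: float) -> float:
    """Time to synthesize the given audio length at the quoted rate."""
    return audio_seconds / VOICE_AUDIO_SECONDS_PER_SECOND


if __name__ == "__main__":
    print(f"10 h of transcription: ${transcription_cost(10 * 3600):.2f}")
    print(f"250,000 characters of speech: ${voice_cost(250_000):.2f}")
    print(f"Generating 1 h of audio: {voice_generation_seconds(3600):.0f} s")
```

Under these assumptions, costs scale linearly with volume, so actual billing may differ wherever the published pricing adds tiers, minimums, or rounding.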
