Microsoft Launches Three MAI Models on Foundry
Microsoft released three first-party MAI models, available now through its Foundry platform: Transcribe-1 for speech-to-text, Voice-1 for text-to-speech, and Image-2 for image generation. Transcribe-1 ranks first on FLEURS for 11 core languages and runs 2.5x faster than Azure's existing batch transcription. Voice-1 generates 60 seconds of audio per second and creates custom voices from a few seconds of sample audio. Image-2 landed in the top 3 on Arena.ai and doubles the speed of the prior generation. The pricing is aggressive: $0.36 per hour for transcription and $22 per million characters for voice. By building its own production-grade multimodal models while maintaining the OpenAI partnership, Microsoft signals a hedging strategy that is becoming harder to read as purely complementary.
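To put the quoted rates in concrete terms, here is a rough back-of-the-envelope sketch in Python. It uses only the two prices and the 60x voice throughput figure stated above. The function names are hypothetical and not part of any Microsoft SDK, and real billing may include tiers, minimums, or rounding that this ignores.

```python
# Rates and throughput as quoted in the summary above; everything else is illustrative.
TRANSCRIBE_USD_PER_AUDIO_HOUR = 0.36
VOICE_USD_PER_MILLION_CHARS = 22.0
VOICE_AUDIO_SECONDS_PER_SECOND = 60.0


def transcription_cost(audio_hours: float) -> float:
    """Estimated Transcribe-1 cost in USD for a given number of audio hours."""
    return audio_hours * TRANSCRIBE_USD_PER_AUDIO_HOUR


def voice_cost(characters: int) -> float:
    """Estimated Voice-1 cost in USD for a given number of input characters."""
    return characters / 1_000_000 * VOICE_USD_PER_MILLION_CHARS


def voice_generation_seconds(audio_seconds: float) -> float:
    """Approximate wall-clock seconds to synthesize audio at the quoted 60x rate."""
    return audio_seconds / VOICE_AUDIO_SECONDS_PER_SECOND


if __name__ == "__main__":
    # Hypothetical workloads, chosen only to make the arithmetic easy to follow.
    print(f"1,000 audio hours transcribed: ${transcription_cost(1_000):.2f}")
    print(f"5M characters synthesized:     ${voice_cost(5_000_000):.2f}")
    print(f"One hour of speech generated in about {voice_generation_seconds(3_600):.0f} s")
```

Under these assumptions, transcribing 1,000 hours of audio would come to about $360, synthesizing 5 million characters to about $110, and an hour of generated speech would take roughly a minute to produce.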