MAI-Transcribe in Azure Speech (preview)

Note

This feature is currently in public preview. This preview is provided without a service-level agreement, and is not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

MAI‑Transcribe models are speech recognition models developed by the Microsoft AI (MAI) Superintelligence team. These models are optimized for both high accuracy and high efficiency, and are available through the LLM Speech API.

The following models are supported:

mai-transcribe-1.5
mai-transcribe-1

Prerequisites

An Azure subscription. You can create one for free.
A Microsoft Foundry resource for Speech in the Azure portal.
The Speech resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For the current list of supported regions, see Speech service regions.
An audio file (less than 300 MB in size) in one of these formats: WAV, MP3, or FLAC.
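
The size and format prerequisites can be checked locally before you upload anything. The following is a minimal sketch and not part of the service. The function name check_audio_file is hypothetical, the 300 MB limit is read as 300,000,000 bytes (an assumption, because the unit isn't defined above), and the check looks only at the file extension, not at the actual audio encoding.

```python
from pathlib import Path

# Assumption: "300 MB" is read as 300,000,000 bytes (the stricter reading).
MAX_AUDIO_BYTES = 300 * 1000 * 1000
# Extensions for the formats listed in the prerequisites.
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".flac"}


def check_audio_file(path: str) -> list[str]:
    """Return a list of problems; an empty list means the file passes."""
    problems = []
    audio = Path(path)
    if audio.suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {audio.suffix or '(none)'}")
    if not audio.is_file():
        problems.append("file not found")
    elif audio.stat().st_size >= MAX_AUDIO_BYTES:
        # The prerequisite says "less than 300 MB", so the limit itself fails.
        problems.append("file is 300 MB or larger")
    return problems
```

For example, check_audio_file("YourAudioFile.wav") returns an empty list when the file exists, has a listed extension, and is under the size limit.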

Language support

By default, the model operates in multilingual mode. The following languages are currently supported:

Language code	Language	MAI-Transcribe-1.5 support	MAI-Transcribe-1 support
`ar`	Arabic	✅	✅
`as`	Assamese	✅
`bg`	Bulgarian	✅
`bn`	Bengali	✅
`ca`	Catalan	✅
`cs`	Czech	✅	✅
`da`	Danish	✅	✅
`de`	German	✅	✅
`el`	Greek	✅
`en`	English	✅	✅
`es`	Spanish	✅	✅
`et`	Estonian	✅
`fi`	Finnish	✅	✅
`fr`	French	✅	✅
`gu`	Gujarati	✅
`hi`	Hindi	✅	✅
`hu`	Hungarian	✅	✅
`id`	Indonesian	✅	✅
`it`	Italian	✅	✅
`ja`	Japanese	✅	✅
`kn`	Kannada	✅
`ko`	Korean	✅	✅
`lt`	Lithuanian	✅
`ml`	Malayalam	✅
`mr`	Marathi	✅
`nb`	Norwegian Bokmål	✅	✅
`nl`	Dutch	✅	✅
`or`	Odia	✅
`pa`	Punjabi (Gurmukhi script)	✅
`pl`	Polish	✅	✅
`pt`	Portuguese	✅	✅
`ro`	Romanian	✅	✅
`ru`	Russian	✅	✅
`sk`	Slovak	✅
`sl`	Slovenian	✅
`sv`	Swedish	✅	✅
`ta`	Tamil	✅
`te`	Telugu	✅
`th`	Thai	✅	✅
`tr`	Turkish	✅	✅
`uk`	Ukrainian	✅
`vi`	Vietnamese	✅	✅
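
If you force a single language, it can help to check the table programmatically first. The following is a sketch only: the language codes are copied from the table above, the helper name supports is hypothetical, and rows with a single checkmark are assumed to apply to mai-transcribe-1.5 only.

```python
# Languages with a checkmark in both columns of the table.
BOTH_MODELS = frozenset(
    "ar cs da de en es fi fr hi hu id it ja ko nb nl "
    "pl pt ro ru sv th tr vi".split()
)
# Languages with a single checkmark (assumed: mai-transcribe-1.5 only).
ONLY_1_5 = frozenset(
    "as bg bn ca el et gu kn lt ml mr or pa sk sl ta te uk".split()
)

SUPPORTED = {
    "mai-transcribe-1.5": BOTH_MODELS | ONLY_1_5,
    "mai-transcribe-1": BOTH_MODELS,
}


def supports(model: str, language_code: str) -> bool:
    """Return True if the table lists language_code for the given model."""
    return language_code in SUPPORTED.get(model, frozenset())
```

Under that assumption, supports("mai-transcribe-1", "ta") is False, while supports("mai-transcribe-1.5", "ta") is True.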

Use a MAI-Transcribe model

You can use MAI‑Transcribe models with the LLM Speech API to generate transcriptions from audio input.

Note the following limitations when you use a MAI-Transcribe model:

Diarization isn't supported.
Prompt-tuning isn't supported.
Phrase list and transcribe style are supported only in mai-transcribe-1.5.
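
These limitations can be caught on the client before a request is sent. The following is a hedged sketch with a hypothetical helper name, find_unsupported_options. The phraseList and transcribeStyle field names come from the examples in this article. The diarization and prompt field names are assumptions about the request definition and should be checked against the LLM Speech API reference.

```python
def find_unsupported_options(definition: dict) -> list[str]:
    """List options in a request definition that conflict with the limitations."""
    problems = []
    enhanced = definition.get("enhancedMode", {})
    model = enhanced.get("model", "")
    if not model.startswith("mai-transcribe"):
        return problems  # The limitations apply only to MAI-Transcribe models.

    # Assumed field name for speaker diarization settings.
    if "diarization" in definition:
        problems.append("diarization isn't supported")
    # Assumed field name for prompt-tuning inside enhancedMode.
    if "prompt" in enhanced:
        problems.append("prompt-tuning isn't supported")

    # Phrase list and transcribe style require mai-transcribe-1.5.
    if model != "mai-transcribe-1.5":
        if "phraseList" in definition:
            problems.append("phraseList requires mai-transcribe-1.5")
        if "transcribeStyle" in enhanced:
            problems.append("transcribeStyle requires mai-transcribe-1.5")
    return problems
```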

To start using transcription with enhanced mode, first follow the LLM Speech quickstart.

To use the MAI-Transcribe model, set the model property accordingly in the request.

curl --location 'https://YourResourceName.cognitiveservices.azure.com/speechtotext/transcriptions:transcribe?api-version=2025-10-15' \
--header 'Content-Type: multipart/form-data' \
--header 'Ocp-Apim-Subscription-Key: <YourSpeechResourceKey>' \
--form 'audio=@"YourAudioFile.wav"' \
--form 'definition={
  "enhancedMode": {
    "enabled": true,
    "model":"mai-transcribe-1.5"
  }
}'
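
The same request can be assembled from a script. The following sketch mirrors the curl command using only the Python standard library. The function name build_transcribe_request is hypothetical, the multipart body is encoded by hand, and the audio part is labeled application/octet-stream as an assumption. The function only builds the request and doesn't send it.

```python
import json
import urllib.request
import uuid
from pathlib import Path

API_VERSION = "2025-10-15"  # Same api-version as the curl example.


def build_transcribe_request(resource_name, key, audio_path, definition):
    """Build (but don't send) a multipart POST with 'audio' and 'definition' parts."""
    boundary = uuid.uuid4().hex
    audio = Path(audio_path)
    # Header of the 'audio' form part; the file bytes follow it.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio"; filename="{audio.name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    # The 'definition' form part carries the JSON, then the closing boundary.
    tail = (
        f"\r\n--{boundary}\r\n"
        'Content-Disposition: form-data; name="definition"\r\n\r\n'
        f"{json.dumps(definition)}\r\n"
        f"--{boundary}--\r\n"
    ).encode("utf-8")
    url = (
        f"https://{resource_name}.cognitiveservices.azure.com"
        f"/speechtotext/transcriptions:transcribe?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=head + audio.read_bytes() + tail,
        method="POST",
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "Ocp-Apim-Subscription-Key": key,
        },
    )
```

To send it, pass the returned object to urllib.request.urlopen.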

Optionally, specify a language code in locales to force recognition in a single language. For example:

--form 'definition={
  "locales": ["en"],
  "enhancedMode": {
    "enabled": true,
    "model":"mai-transcribe-1.5"
  }
}'

Optionally, for mai-transcribe-1.5, you can specify the style of the transcript output by using transcribeStyle. By default, the model returns a readability‑optimized transcript. You can set the value to verbatim to preserve the original spoken content, including filler words and disfluencies.

--form 'definition={
  "enhancedMode": {
    "enabled": true,
    "model":"mai-transcribe-1.5",
    "transcribeStyle":"verbatim"
  }
}'

Optionally, for mai-transcribe-1.5, you can add a list of phrases to increase accuracy in specialized domains by using phraseList. This implements entity biasing.

--form 'definition={
  "phraseList": {
    "phrases": ["Contoso", "Jessie", "Rehaan"]
  },
  "enhancedMode": {
    "enabled": true,
    "model": "mai-transcribe-1.5"
  }
}'
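
The optional settings above (locales, transcribeStyle, and phraseList) can be composed into one definition object. The following sketch uses a hypothetical helper, build_definition, and assumes the options can be combined in a single request, which this article doesn't state explicitly.

```python
import json


def build_definition(model="mai-transcribe-1.5", locale=None,
                     transcribe_style=None, phrases=None) -> dict:
    """Compose the 'definition' form field from the optional settings."""
    enhanced = {"enabled": True, "model": model}
    if transcribe_style is not None:
        # For example "verbatim"; shown above for mai-transcribe-1.5 only.
        enhanced["transcribeStyle"] = transcribe_style

    definition = {}
    if locale is not None:
        # A single language code forces recognition in that language.
        definition["locales"] = [locale]
    if phrases:
        # Phrase list; shown above for mai-transcribe-1.5 only.
        definition["phraseList"] = {"phrases": list(phrases)}
    definition["enhancedMode"] = enhanced
    return definition


if __name__ == "__main__":
    # Print JSON that can be pasted into --form 'definition=...'.
    print(json.dumps(
        build_definition(locale="en", transcribe_style="verbatim",
                         phrases=["Contoso", "Jessie", "Rehaan"]),
        indent=2,
    ))
```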

The same setting applies outside of curl. After you follow the LLM Speech quickstart for your programming language, specify the model in the enhancedMode property, the EnhancedMode property, or the EnhancedModeOptions object, whichever name that quickstart uses.

Use MAI-Transcribe with Voice Live

You can also use the MAI-Transcribe model for input audio transcription in the Voice Live API. Set the model field in the input_audio_transcription session configuration. For details, see How to customize Voice Live input and output.
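
As a rough illustration, the session configuration could be built as follows. This is a sketch under assumptions: the session.update event type and its nesting are assumed rather than taken from this article, the helper name is hypothetical, and the exact model identifier that Voice Live expects isn't stated here, so it's passed in as a parameter. Check the linked Voice Live article for the authoritative shape.

```python
import json


def session_update_with_transcription(model: str) -> str:
    """Return a JSON session update that sets the input transcription model."""
    event = {
        # Assumption: session settings are sent in a "session.update" event.
        "type": "session.update",
        "session": {
            # Field named in this article; the model value is caller-supplied.
            "input_audio_transcription": {"model": model},
        },
    }
    return json.dumps(event)
```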

For more information, see the following articles:

LLM Speech API
MAI-Voice in Azure Speech
How to customize Voice Live input and output

Last updated on 2026-06-02