MAI-Transcribe in Azure Speech (preview)

Note

This feature is currently in public preview. This preview is provided without a service-level agreement, and is not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

MAI‑Transcribe models are speech recognition models developed by the Microsoft AI (MAI) Superintelligence team. These models are optimized for both high accuracy and high efficiency, and are available through the LLM Speech API.

The following models are supported:

  • mai-transcribe-1.5
  • mai-transcribe-1

Prerequisites

Language support

By default, the model operates in multilingual mode. The following languages are currently supported:

Language code Language MAI-Transcribe-1.5 support MAI-Transcribe-1 support
ar Arabic
as Assamese
bg Bulgarian
bn Bengali
ca Catalan
cs Czech
da Danish
de German
el Greek
en English
es Spanish
et Estonian
fi Finnish
fr French
gu Gujarati
hi Hindi
hu Hungarian
id Indonesian
it Italian
ja Japanese
kn Kannada
ko Korean
lt Lithuanian
ml Malayalam
mr Marathi
nb Norwegian Bokmål
nl Dutch
or Odia
pa Punjabi (Gurmukhi script)
pl Polish
pt Portuguese
ro Romanian
ru Russian
sk Slovak
sl Slovenian
sv Swedish
ta Tamil
te Telugu
th Thai
tr Turkish
uk Ukrainian
vi Vietnamese

Use a MAI-Transcribe model

You can use MAI‑Transcribe models with the LLM Speech API to generate transcriptions from audio input.

Note the following limitations when you use a MAI-Transcribe model:

  • Diarization isn't supported.
  • Prompt-tuning isn't supported.
  • Phrase list and transcribe style are supported only in mai-transcribe-1.5.

To start using transcription with enhanced mode, first follow the LLM Speech quickstart. Then, to use a MAI-Transcribe model, set the model property inside enhancedMode in the request:

curl --location 'https://YourResourceName.cognitiveservices.azure.com/speechtotext/transcriptions:transcribe?api-version=2025-10-15' \
--header 'Content-Type: multipart/form-data' \
--header 'Ocp-Apim-Subscription-Key: <YourSpeechResourceKey>' \
--form 'audio=@"YourAudioFile.wav"' \
--form 'definition={
  "enhancedMode": {
    "enabled": true,
    "model":"mai-transcribe-1.5"
  }
}'
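For readers who script the call rather than run curl, the request above can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the function name build_transcribe_request and its parameters are hypothetical, and the application/octet-stream content type for the audio part is an assumption. The URL, API version, header name, and form field names are copied from the curl command.

```python
import json
import urllib.request
import uuid

API_VERSION = "2025-10-15"  # copied from the curl example


def build_transcribe_request(resource_name, key, audio_bytes, definition,
                             filename="YourAudioFile.wav"):
    """Build (but do not send) the multipart POST that the curl command issues.

    Hypothetical helper for illustration; not part of any Azure SDK.
    """
    url = (
        f"https://{resource_name}.cognitiveservices.azure.com"
        f"/speechtotext/transcriptions:transcribe?api-version={API_VERSION}"
    )
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Audio part, the counterpart of --form 'audio=@"YourAudioFile.wav"'
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="audio"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",  # assumed content type
        audio_bytes,
        b"\r\n",
        # JSON part, the counterpart of --form 'definition={...}'
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="definition"\r\n\r\n',
        json.dumps(definition).encode(),
        b"\r\n",
        f"--{boundary}--\r\n".encode(),
    ])
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Ocp-Apim-Subscription-Key": key,
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_transcribe_request(
    resource_name="YourResourceName",
    key="<YourSpeechResourceKey>",
    audio_bytes=b"",  # placeholder; read your .wav file in binary mode instead
    definition={"enhancedMode": {"enabled": True, "model": "mai-transcribe-1.5"}},
)
# Sending is left out on purpose: urllib.request.urlopen(request) would perform the call.
print(request.full_url)
```

Building the request separately from sending it makes it possible to inspect the URL and body before spending a call against the resource.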

Optionally, specify a language code in locales to force recognition in a single language. For example:

--form 'definition={
  "locales": ["en"],
  "enhancedMode": {
    "enabled": true,
    "model":"mai-transcribe-1.5"
  }
}'
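A client might check the language code against the supported list before sending the request. The following is a minimal sketch under stated assumptions: the helper definition_for_language is hypothetical, and the set of codes is copied from the language table on this page without distinguishing per-model support.

```python
# Language codes copied from the language support table on this page.
# Assumption: per-model differences in support are not modeled here.
SUPPORTED_LANGUAGE_CODES = frozenset("""
ar as bg bn ca cs da de el en es et fi fr gu hi hu id it ja kn ko lt ml mr
nb nl or pa pl pt ro ru sk sl sv ta te th tr uk vi
""".split())


def definition_for_language(model, language=None):
    """Return the definition payload, optionally forcing a single language.

    Hypothetical helper for illustration; not part of any Azure SDK.
    """
    definition = {"enhancedMode": {"enabled": True, "model": model}}
    if language is None:
        # No locales entry: the model stays in its default multilingual mode.
        return definition
    if language not in SUPPORTED_LANGUAGE_CODES:
        raise ValueError(f"Unsupported language code: {language!r}")
    return {"locales": [language], **definition}


print(definition_for_language("mai-transcribe-1.5", "en"))
```

Passing no language leaves locales out of the payload entirely, which keeps the default multilingual behavior.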

Optionally, for mai-transcribe-1.5, you can specify the style of the transcript output by using transcribeStyle. By default, the model returns a readability‑optimized transcript. You can set the value to verbatim to preserve the original spoken content, including filler words and disfluencies.

--form 'definition={
  "enhancedMode": {
    "enabled": true,
    "model":"mai-transcribe-1.5",
    "transcribeStyle":"verbatim"
  }
}'

Optionally, for mai-transcribe-1.5, you can use phraseList to add a list of phrases, such as names or domain-specific terms, to increase accuracy in specialized domains. The list biases recognition toward those phrases (entity biasing).

--form 'definition={
  "phraseList": {
    "phrases": ["Contoso", "Jessie", "Rehaan"]
  },
  "enhancedMode": {
    "enabled": true,
    "model": "mai-transcribe-1.5"
  }
}'
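Because transcribeStyle and phraseList are supported only in mai-transcribe-1.5, a client can reject unsupported combinations before calling the service. The sketch below is illustrative only: the helper enhanced_definition is hypothetical, and this page does not state whether the two options can be combined in one request, so allowing both together is an assumption.

```python
import json

# Per the limitations on this page, only this model accepts
# transcribeStyle and phraseList.
MODELS_WITH_STYLE_AND_PHRASES = {"mai-transcribe-1.5"}


def enhanced_definition(model, transcribe_style=None, phrases=None):
    """Build the definition payload and enforce the 1.5-only options.

    Hypothetical helper for illustration; not part of any Azure SDK.
    """
    if (transcribe_style or phrases) and model not in MODELS_WITH_STYLE_AND_PHRASES:
        raise ValueError(
            f"transcribeStyle and phraseList are not supported by {model}"
        )
    definition = {}
    if phrases:
        definition["phraseList"] = {"phrases": list(phrases)}
    enhanced = {"enabled": True, "model": model}
    if transcribe_style:
        enhanced["transcribeStyle"] = transcribe_style
    definition["enhancedMode"] = enhanced
    return definition


# The JSON printed here is what would go into the definition form field.
print(json.dumps(
    enhanced_definition("mai-transcribe-1.5", phrases=["Contoso", "Jessie", "Rehaan"]),
    indent=2,
))
```

Failing locally gives a clearer error than waiting for the service to reject the request.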

With other clients, the steps are the same: first follow the LLM Speech quickstart, then specify the model in the enhanced mode settings. Depending on the client, these settings appear as the enhancedMode property, the EnhancedMode property, or the EnhancedModeOptions object.

Use MAI-Transcribe with Voice Live

You can also use the MAI-Transcribe model for input audio transcription in the Voice Live API. Set the model field in the input_audio_transcription session configuration. For details, see How to customize Voice Live input and output.
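As a rough illustration, the session configuration might be built as follows. This sketch rests on several assumptions that this page does not confirm: that the configuration is sent as a session.update event with a nested session object, and that Voice Live accepts the same model identifiers listed on this page. The helper transcription_session_update is hypothetical. Check the Voice Live how-to guide for the exact shape.

```python
import json


def transcription_session_update(model):
    """Return a session configuration message as a JSON string.

    Hypothetical helper. The event type and nesting are assumptions;
    only the input_audio_transcription.model field comes from this page.
    """
    return json.dumps({
        "type": "session.update",  # assumed event name
        "session": {
            "input_audio_transcription": {"model": model},
        },
    })


# Assumption: the identifier matches a model name listed on this page.
print(transcription_session_update("mai-transcribe-1"))
```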