Hello Quynh Huynh (NON EA SC ALT),
Welcome to the Microsoft Q&A and thank you for posting your questions here.
I understand that you are getting a “General error” when training a language model.
To clear up the misconception above: Azure AI Speech – Custom Speech (speech‑to‑text model adaptation with plain‑text “language model” data) is different from Azure AI Language (CLU). Custom Speech uses plain text or structured text for language model adaptation, not labeled JSON datasets as in CLU.
Please follow the steps below; each one links to documentation with more details:
- Work in Speech Studio > Custom speech, not Language Studio/CLU. This is where you upload plain text to adapt the speech language model. - https://docs.azure.cn/en-us/ai-services/speech-service/speech-studio-overview and https://learn.microsoft.com/en-us/azure/ai-services/speech-service/custom-speech-overview
- Projects are locale‑specific; you must create (or recreate) the project with zh‑CN (Mandarin, Simplified) or zh‑TW (Taiwanese Mandarin) as needed. You cannot change the locale later. - https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-custom-speech-create-project Also, verify that zh‑CN/zh‑TW support Custom Speech (language model adaptation). The official table shows zh‑CN/zh‑TW/zh‑HK are supported for Custom Speech; yue‑CN and wuu‑CN support limited customization (often plain text only). - https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support
- Some features and locales are available only in certain Speech resource regions. Create a new Speech resource in a broadly supported region (e.g., East US or West Europe), then create the Custom Speech project there. Supported regions for Speech are listed here: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/regions After switching, re‑check the language support matrix; zh‑CN is listed with full Custom Speech support (Audio+Transcript / Plain text / Structured text). - https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support
- For “language model” training in Custom Speech, use plain text related to your domain; the recommended size is 1–200 MB. Structured text is also supported when your text follows patterns. Training with plain text usually finishes in minutes. - https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-custom-speech-test-and-train
- Use the Training data tab and select Plain text (or Structured text) as the type. The “Upload training and testing datasets” article covers both local file and Azure Blob/SAS URL options. - https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-custom-speech-upload-data
- From Train custom models, select the base model and the uploaded dataset(s) to train. This is the supported path for language‑model adaptation in Speech. - https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-custom-speech-train-model
- If you still hit the PUT 400 “General error”, use the following checks to get actionable diagnostics and unblock yourself:
- Check dataset status (accepted/rejected lines) and job status via the Speech CLI; this often yields clearer errors than the UI. The official sample shows how to upload a Language dataset and check its status (spx csr dataset status --wait).
- Inspect logs/monitoring on your Speech resource (Azure Monitor/diagnostic logs) for failure details and set alerts.
- Confirm that the combination of locale and dataset type is supported for your chosen Chinese variant (e.g., yue‑CN supports plain text customization; zh‑CN/zh‑HK/zh‑TW support broader customization). A mismatch here can lead to failures.
- Retry in another region or with a new project (same locale) if the error persists; transient or region‑specific issues are documented in community posts (training sometimes fails intermittently). If the failures continue, open a support ticket with the activityId captured from the error so the service team can investigate.
- Resource tier/quota sanity check: While F0 is supported, some users report odd behaviors that resolve after switching to Standard (S0). Consult Speech quotas & limits if you suspect throttling or resource constraints, and request a quota increase when needed.
- Browser/session hygiene (only if the above don’t expose a service/data issue): Clear cache or try a different browser; this has resolved 400‑class upload issues for some users. - https://github.com/Azure/custom-speech-stt/blob/main/automate.md
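Before uploading, it may also help to sanity-check the plain-text file locally. The Python sketch below is only an illustration, not an official tool: the function name check_plain_text_dataset is made up for this example, the 200 MB ceiling is taken from the size range mentioned above, and the "looks like JSON" test is a rough heuristic for catching CLU-style labeled data that was uploaded by mistake. The service may apply additional rules that this sketch does not cover.

```python
import codecs
import os

# Assumption: upper bound taken from the 1-200 MB range quoted in this answer.
MAX_BYTES = 200 * 1024 * 1024


def check_plain_text_dataset(path, max_bytes=MAX_BYTES):
    """Return a list of human-readable problems found in a plain-text dataset.

    Hypothetical helper for illustration only; an empty list means none of
    the checks below fired, not that the service will accept the file.
    """
    size = os.path.getsize(path)
    if size == 0:
        return ["file is empty"]

    problems = []
    if size > max_bytes:
        problems.append(f"file is {size} bytes, above the {max_bytes}-byte limit")

    # Read in binary mode so that invalid bytes are reported per line
    # instead of aborting the whole scan.
    with open(path, "rb") as handle:
        for number, raw in enumerate(handle, start=1):
            # Tolerate an optional UTF-8 byte order mark on the first line.
            if number == 1 and raw.startswith(codecs.BOM_UTF8):
                raw = raw[len(codecs.BOM_UTF8):]
            try:
                line = raw.decode("utf-8").strip()
            except UnicodeDecodeError:
                problems.append(f"line {number}: not valid UTF-8")
                continue
            # Heuristic: labeled JSON (as used by CLU) is not plain text.
            if line.startswith(("{", "[")):
                problems.append(
                    f"line {number}: looks like JSON rather than a plain-text utterance"
                )
    return problems
```

For example, check_plain_text_dataset("my-domain-text.txt") (a placeholder file name) returns an empty list when none of these checks fire. If it reports problems, fixing them first may save a failed upload, although a clean result does not guarantee that training will succeed.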
I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.
Please don't forget to close out the thread by upvoting this answer and accepting it if it is helpful.