Getting General Error when training language model

Question

Quynh Huynh (NON EA SC ALT) 40 Microsoft Employee

Chinese language options are not appearing when I try to train a new speech model (only Cantonese option), so I switched to training a language model instead. However, after uploading plain-text (.txt) files and clicking Train, I receive a General Error with a PUT 400 response in the console.

Quynh Huynh (NON EA SC ALT) 40 Reputation points Microsoft Employee

2026-01-21T22:42:34.44+00:00

Thanks for the input. I discovered the issue was due to file size. The Language Training page states you can:

Upload a .txt, .utt, .ttml, or .srt file up to 500 KB.

However, it appears that the 500 KB limit applies to all files combined under the same language model—not per file. Could you clarify whether the Speech module supports Simplified Chinese? A total limit of 500 KB is quite small for our training needs.

Anshika Varshney 12,775 Reputation points Microsoft External Staff Moderator

2026-01-22T11:14:56.8666667+00:00

Hi Quynh Huynh (NON EA SC ALT),
Thanks for sharing the clarification; that's a helpful finding.

You’re correct that the 500 KB limit is enforced at the project/language level, not per individual file. When the combined size of all uploaded training files for a language exceeds that limit, training can fail with a generic error, which is admittedly not very discoverable from the current error messaging.
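
A quick pre-upload check can catch this before the Train button does. The following is a minimal sketch, not part of the service: it assumes "500 KB" means 500 * 1024 bytes (the thread does not say which definition is used), and the function names are hypothetical.

```python
from pathlib import Path

# Assumption: "500 KB" is 500 * 1024 bytes; the thread does not confirm this.
LIMIT_BYTES = 500 * 1024


def combined_size(paths):
    """Return the total size in bytes of all files in `paths`."""
    return sum(Path(p).stat().st_size for p in paths)


def check_combined_limit(paths, limit=LIMIT_BYTES):
    """Return (total_bytes, fits) for the whole set of training files."""
    total = combined_size(paths)
    return total, total <= limit
```

Pass every file already attached to the language model together with the new ones, since the limit appears to apply to the combined set rather than to each file.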

Regarding language support, the Speech service does support Simplified Chinese (zh-CN), including speech recognition and related training scenarios, as documented in the supported languages list for Azure Speech. However, language availability and training constraints are handled separately from file-size validation, which is why the limitation can still surface even when the language itself is supported.

You’re also right that 500 KB total can be restrictive for real‑world training datasets. For now, the practical next steps are to:

Split or trim training data to stay within the limit

Focus on high‑quality, representative utterances rather than volume

Use multiple intents/projects where applicable
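
The first two suggestions can be sketched as a small trimming step. This is only an illustration under stated assumptions: one utterance per line, UTF-8 encoding, and a hypothetical helper name. It drops blank and duplicate lines, then keeps lines in their original order until the byte budget is reached.

```python
def trim_to_budget(lines, budget_bytes, encoding="utf-8"):
    """Keep unique, non-empty lines in order while they fit in budget_bytes.

    Returns (kept_lines, bytes_used). Size is counted on the encoded text
    plus one newline per line, so multi-byte characters are measured correctly.
    """
    kept, seen, used = [], set(), 0
    for raw in lines:
        line = raw.strip()
        if not line or line in seen:
            continue  # skip blanks and exact duplicates
        cost = len((line + "\n").encode(encoding))
        if used + cost > budget_bytes:
            break  # stop at the first line that would exceed the budget
        kept.append(line)
        seen.add(line)
        used += cost
    return kept, used
```

Because the loop stops at the first line that does not fit, putting the most representative utterances first matters. Chinese text typically takes three bytes per character in UTF-8, so the budget holds fewer characters than the same budget of ASCII text.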

Please let me know if the issue persists after these checks.

Thank you!

Answer 1

Hello Quynh Huynh (NON EA SC ALT),

Welcome to Microsoft Q&A, and thank you for posting your question here.

I understand that you are getting a general error when training a language model.

To clear up the misconception above: Azure AI Speech – Custom Speech (speech‑to‑text model adaptation with plain‑text “language model” data) is different from Azure AI Language (CLU). Custom Speech uses plain text or structured text for language model adaptation, not labeled JSON datasets as in CLU.

Therefore, follow the steps below; each is followed by links with more details:

Work in Speech Studio > Custom speech, not Language Studio/CLU. This is where you upload plain text to adapt the speech language model. - https://docs.azure.cn/en-us/ai-services/speech-service/speech-studio-overview and https://learn.microsoft.com/en-us/azure/ai-services/speech-service/custom-speech-overview
Projects are locale-specific, so create (or recreate) the project with zh-CN (Mandarin, Simplified) or zh-TW (Taiwanese Mandarin) as needed. You cannot change the locale later. - https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-custom-speech-create-project
Also verify that zh-CN/zh-TW support Custom Speech (language model adaptation). The official table shows zh-CN, zh-TW, and zh-HK as supported for Custom Speech; yue-CN and wuu-CN support limited customization (often plain text only). - https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support
Some features and locales are available only in certain Speech resource regions. Create a new Speech resource in a broadly supported region (e.g., East US or West Europe), then create the Custom Speech project there. Regions for Speech are listed at https://learn.microsoft.com/en-us/azure/ai-services/speech-service/regions. After switching, re-check the language support matrix; zh-CN is listed with full Custom Speech support (Audio+Transcript / Plain text / Structured text). - https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support
For “language model” training in Custom Speech, use plain text related to your domain (recommended size: 1–200 MB); structured text is also supported when your text follows patterns. Training with plain text usually finishes in minutes. - https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-custom-speech-test-and-train
Use the Training data tab and select Plain text (or Structured text) as the type. The “Upload training and testing datasets” article covers both local file and Azure Blob/SAS URL options. - https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-custom-speech-upload-data
From Train custom models, select the base model and the uploaded dataset(s) to train. This is the supported path for language‑model adaptation in Speech. - https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-custom-speech-train-model
If you still hit the PUT 400 “General error”, use the following checks to get actionable diagnostics:

Check dataset status (accepted/rejected lines) and job status via Speech CLI—this often yields clearer errors than the UI. The official sample shows how to upload a Language dataset and check status (spx csr dataset status --wait).
Inspect logs/monitoring on your Speech resource (Azure Monitor/diagnostic logs) for failure details and set alerts.
Confirm locale <-> dataset type is supported for your chosen Chinese variant (e.g., yue‑CN supports plain text customization; zh‑CN/zh‑HK/zh‑TW support broader customization). A mismatch here can lead to failures.
Retry with another region or new project (same locale) if the error persists—transient or region‑specific issues are documented in community posts (training sometimes fails intermittently). If the failures continue, open a Support ticket with the activityId captured from the error to have the service team investigate.
Resource tier/quota sanity check: While F0 is supported, some users report odd behaviors that resolve after switching to Standard (S0); consult Speech quotas & limits if you suspect throttling or resource constraints, and request quota increase when needed.
Browser/session hygiene (only if the above don’t expose a service/data issue): Clear cache or try a different browser; this has resolved 400‑class upload issues for some users.
- https://github.com/Azure/custom-speech-stt/blob/main/automate.md
- https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support
- https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-services-quotas-and-limits
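
For the Speech CLI status check mentioned in the first item, a thin wrapper might look like the sketch below. The `spx csr dataset status --wait` command is the one quoted above; the `--dataset` option carrying the dataset ID is an assumption here and should be confirmed against the Speech CLI help before use. Both function names are hypothetical.

```python
import shutil
import subprocess


def dataset_status_command(dataset_id):
    """Build the argv for the dataset status check quoted in the answer."""
    # Assumption: `--dataset` takes the dataset ID; verify in the Speech CLI help.
    return ["spx", "csr", "dataset", "status", "--dataset", dataset_id, "--wait"]


def run_dataset_status(dataset_id):
    """Run the status check; return its output, or None if spx is not on PATH."""
    if shutil.which("spx") is None:
        return None
    result = subprocess.run(
        dataset_status_command(dataset_id),
        capture_output=True,
        text=True,
        check=False,  # a failed status call still carries useful error text
    )
    return result.stdout + result.stderr
```

Keeping the output even on a non-zero exit code is deliberate: the point of the check is to read the error detail that the UI hides behind "General error".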

I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.

Please don't forget to close out the thread by upvoting and accepting this as the answer if it was helpful.

Answer 2

Hi Quynh Huynh (NON EA SC ALT),

Thank you for reaching out on Microsoft Q&A.

A general error while training a custom Azure AI Language (CLU / custom text classification / custom NER) model is usually not tied to a single misconfiguration. In most cases, this error appears when the service cannot complete validation or parsing of the training assets.

A few common areas worth reviewing:

Training data quality and format
Ensure that all uploaded files follow the documented schema exactly. This includes valid UTF‑8 encoding, correct JSON structure, and consistent labeling. Even a single malformed record or missing field can cause training to fail with a generic error.

Label consistency
Verify that all labels used in the training files are defined and consistently referenced. Mismatched or unused labels, or differences in casing, can result in training failures.

Data volume and distribution
Very small datasets, highly imbalanced label distribution, or labels with too few examples can cause training instability. While the service may accept the data upload, training can still fail silently at runtime.

Language and culture settings
Confirm that the project language matches the actual language of the training data. Mixed-language datasets or unsupported languages can also trigger generic errors.

Service limits and quotas
Check whether the Azure resource is hitting quota or regional limits (training jobs, storage, or concurrent operations). These conditions often surface as non-descriptive “general error” messages.
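
Several of these checks can be run locally before uploading. The sketch below is illustrative only: it covers UTF-8 validity, labels that differ only by casing, and labels with few examples. The `min_examples` threshold of 10 is a hypothetical default, not a documented service limit, and the function names are made up for this example.

```python
from collections import Counter, defaultdict


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the raw file bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def label_report(labels, min_examples=10):
    """Inspect one label string per training example.

    Returns (case_collisions, sparse_labels):
    - case_collisions: groups of labels that differ only by casing
    - sparse_labels: labels with fewer than min_examples occurrences
    """
    counts = Counter(labels)
    by_folded = defaultdict(set)
    for label in counts:
        by_folded[label.casefold()].add(label)
    collisions = sorted(sorted(group) for group in by_folded.values() if len(group) > 1)
    sparse = sorted(label for label, n in counts.items() if n < min_examples)
    return collisions, sparse
```

Read files in binary mode (`open(path, "rb").read()`) before passing them to `is_valid_utf8`, so the check sees the bytes exactly as they will be uploaded.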

Please let me know if there are any remaining questions or additional details I can help with. I’ll be glad to provide further clarification or guidance.

Hope this helps.
