
Getting General Error when training language model

Quynh Huynh (NON EA SC ALT) 40 Reputation points Microsoft Employee
2026-01-21T20:26:44.9266667+00:00

Chinese language options do not appear when I try to train a new speech model (only a Cantonese option is listed), so I switched to training a language model instead. However, after uploading plain-text (.txt) files and clicking Train, I receive a General Error with a PUT 400 response in the console.


Azure AI Video Indexer

An Azure video analytics service that uses AI to extract actionable insights from stored videos.


2 answers

  1. Sina Salam 29,846 Reputation points Volunteer Moderator
    2026-01-22T13:56:30.94+00:00

    Hello Quynh Huynh (NON EA SC ALT),

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that you are getting a General Error when training a language model.

    To clear up a possible misconception: Azure AI Speech – Custom Speech (speech‑to‑text model adaptation with plain‑text “language model” data) is different from Azure AI Language (CLU). Custom Speech uses plain text or structured text for language model adaptation, not labeled JSON datasets as in CLU.

    Follow the steps below; each one links to documentation with more detail:

    1. Work in Speech Studio > Custom speech, not Language Studio/CLU. This is where you upload plain text to adapt the speech language model. - https://docs.azure.cn/en-us/ai-services/speech-service/speech-studio-overview and https://learn.microsoft.com/en-us/azure/ai-services/speech-service/custom-speech-overview
    2. Projects are locale‑specific, so create (or recreate) the project with zh‑CN (Mandarin, Simplified) or zh‑TW (Taiwanese Mandarin) as needed. You cannot change the locale later. - https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-custom-speech-create-project Also verify that your locale supports Custom Speech (language model adaptation). The official table lists zh‑CN, zh‑TW, and zh‑HK as supported for Custom Speech; yue‑CN and wuu‑CN support limited customization (often plain text only). - https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support
    3. Some features and locales are available only in certain Speech resource regions. Create a new Speech resource in a broadly supported region (e.g., East US or West Europe), then create the Custom Speech project there. Supported regions are listed here: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/regions After switching, re‑check the language support matrix; zh‑CN is listed with full Custom Speech support (Audio+Transcript / Plain text / Structured text). - https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support
    4. For “language model” training in Custom Speech, use plain text related to your domain; the recommended size is 1–200 MB. Structured text is also supported when your text follows patterns. Training with plain text usually finishes in minutes. - https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-custom-speech-test-and-train
    5. Use the Training data tab and select Plain text (or Structured text) as the type. The “Upload training and testing datasets” article covers both local file and Azure Blob/SAS URL options. - https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-custom-speech-upload-data
    6. From Train custom models, select the base model and the uploaded dataset(s) to train. This is the supported path for language‑model adaptation in Speech. - https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-custom-speech-train-model
    7. If you still hit the PUT 400 “General error”, gather actionable diagnostics to unblock yourself:
    • Check dataset status (accepted/rejected lines) and job status via Speech CLI—this often yields clearer errors than the UI. The official sample shows how to upload a Language dataset and check status (spx csr dataset status --wait).
    • Inspect logs/monitoring on your Speech resource (Azure Monitor/diagnostic logs) for failure details and set alerts.
    • Confirm locale <-> dataset type is supported for your chosen Chinese variant (e.g., yue‑CN supports plain text customization; zh‑CN/zh‑HK/zh‑TW support broader customization). A mismatch here can lead to failures.
    • Retry with another region or new project (same locale) if the error persists—transient or region‑specific issues are documented in community posts (training sometimes fails intermittently). If the failures continue, open a Support ticket with the activityId captured from the error to have the service team investigate.
    • Resource tier/quota sanity check: While F0 is supported, some users report odd behaviors that resolve after switching to Standard (S0); consult Speech quotas & limits if you suspect throttling or resource constraints, and request quota increase when needed.
    • Browser/session hygiene (only if the above don’t expose a service/data issue): Clear cache or try a different browser; this has resolved 400‑class upload issues for some users. - https://github.com/Azure/custom-speech-stt/blob/main/automate.md
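
As a rough pre-upload check for step 4, the sketch below inspects a plain-text file locally before it goes to Speech Studio. The helper name `check_plain_text` and the report class are hypothetical, and the 1–200 MB range is simply the recommendation quoted above, not an enforced limit. It only confirms that the bytes decode as UTF-8 (with or without a byte order mark) and counts empty lines; it cannot tell you why the service returned a 400.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlainTextReport:
    size_mb: float
    line_count: int
    empty_lines: int
    problems: List[str] = field(default_factory=list)


def check_plain_text(data: bytes, min_mb: float = 1.0, max_mb: float = 200.0) -> PlainTextReport:
    """Hypothetical local sanity check for a plain-text adaptation file."""
    size_mb = len(data) / (1024 * 1024)
    try:
        # "utf-8-sig" accepts UTF-8 with or without a leading byte order mark.
        text = data.decode("utf-8-sig")
    except UnicodeDecodeError as exc:
        return PlainTextReport(size_mb, 0, 0, [f"not valid UTF-8 (first bad byte at offset {exc.start})"])

    lines = text.splitlines()
    empty = sum(1 for line in lines if not line.strip())
    problems = []
    if not min_mb <= size_mb <= max_mb:
        # Assumption: treated as a note only, since the range is a recommendation.
        problems.append(f"size {size_mb:.3f} MB is outside the recommended {min_mb:g}-{max_mb:g} MB range")
    if empty:
        problems.append(f"{empty} empty line(s)")
    return PlainTextReport(size_mb, len(lines), empty, problems)
```

To use it, read the file in binary mode, for example `check_plain_text(open("corpus.txt", "rb").read())`, and review `problems` before uploading.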

    I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.


    Please don't forget to close out the thread by upvoting and accepting this as an answer if it was helpful.


  2. Anshika Varshney 12,775 Reputation points Microsoft External Staff Moderator
    2026-01-21T22:14:46.5466667+00:00

    Hi Quynh Huynh (NON EA SC ALT),

    Thank you for reaching out on the Microsoft Q&A.

    A general error during training a custom Azure AI Language (CLU / custom text classification / custom NER) model is usually not tied to a single misconfiguration. In most cases, this error appears when the service cannot complete validation or parsing of the training assets.

    A few common areas worth reviewing:

    Training data quality and format
    Ensure that all uploaded files follow the documented schema exactly. This includes valid UTF‑8 encoding, correct JSON structure, and consistent labeling. Even a single malformed record or missing field can cause training to fail with a generic error.
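
A minimal sketch of the encoding and structure check described above, using only the Python standard library. The function name `validate_json_bytes` is hypothetical. It confirms that a file is UTF-8 and parses as JSON; it does not validate the file against the service's documented schema, which still has to be checked separately.

```python
import json
from typing import Optional


def validate_json_bytes(data: bytes) -> Optional[str]:
    """Return None if data is UTF-8 encoded JSON, else a description of the first problem."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"invalid UTF-8 at byte offset {exc.start}"
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # lineno/colno point at the position where parsing stopped.
        return f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return None
```

Running this over each training file before upload can narrow a generic error down to a specific file and position.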

    Label consistency
    Verify that all labels used in the training files are defined and consistently referenced. Mismatched or unused labels, or differences in casing, can result in training failures.
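
One way to check this, sketched under the assumption that you have already extracted the declared labels and the labels actually used from your own files (the extraction step depends on your project format and is not shown). The function name `label_issues` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def label_issues(declared: Set[str], used: Iterable[str]) -> Dict[str, List[str]]:
    """Report undefined labels, unused labels, and labels that differ only by casing."""
    used_set = set(used)
    by_folded = defaultdict(set)
    for label in declared | used_set:
        by_folded[label.casefold()].add(label)
    return {
        "undefined": sorted(used_set - declared),   # used in data, never declared
        "unused": sorted(declared - used_set),      # declared, never used in data
        "case_variants": sorted(
            ", ".join(sorted(group)) for group in by_folded.values() if len(group) > 1
        ),
    }
```

For example, `label_issues({"Order", "Refund"}, ["Order", "order", "Cancel"])` reports `Cancel` and `order` as undefined, `Refund` as unused, and `Order, order` as a casing conflict.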

    Data volume and distribution
    Very small datasets, highly imbalanced label distribution, or labels with too few examples can cause training instability. While the service may accept the data upload, training can still fail silently at runtime.
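
A quick way to spot these conditions is to count examples per label. In the sketch below, the function name and both thresholds (`min_examples=10`, `max_ratio=10.0`) are illustrative assumptions, not documented service limits; adjust them to whatever the documentation for your project type recommends.

```python
from collections import Counter
from typing import Iterable, List


def distribution_warnings(labels: Iterable[str], min_examples: int = 10, max_ratio: float = 10.0) -> List[str]:
    """Flag labels with few examples and a large gap between the most and least common label."""
    counts = Counter(labels)
    if not counts:
        return ["no labeled examples"]
    warnings = []
    for label, n in sorted(counts.items()):
        if n < min_examples:
            warnings.append(f"label '{label}' has only {n} example(s)")
    largest, smallest = max(counts.values()), min(counts.values())
    if largest / smallest > max_ratio:
        warnings.append(f"imbalanced: largest label has {largest} examples, smallest has {smallest}")
    return warnings
```

An empty result means no label fell below the chosen thresholds; it does not guarantee that training will succeed.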

    Language and culture settings
    Confirm that the project language matches the actual language of the training data. Mixed-language datasets or unsupported languages can also trigger generic errors.

    Service limits and quotas
    Check whether the Azure resource is hitting quota or regional limits (training jobs, storage, or concurrent operations). These conditions often surface as non-descriptive “general error” messages.

    Please let me know if you have any remaining questions or can share additional details; I'll be glad to provide further clarification or guidance.

    Hope this helps.
