Hello Kunal Singal,
Welcome to the Microsoft Q&A and thank you for posting your questions here.
I understand that you are getting a "403 Forbidden: Resource temporarily blocked" error on gpt-5.5 and want to know how to configure the resource for legitimate bursty traffic.
This error has been widespread recently across many services.
This issue is not a normal quota or rate-limit problem. A standard Azure OpenAI quota/rate-limit issue normally appears as HTTP 429, while your error is HTTP 403 with the message “Your resource has been temporarily blocked because we detected unusual behavior.” This points to a service-side protection / abuse-monitoring block, not something that can be fixed by increasing TPM/RPM or changing max_tokens. Azure abuse monitoring evaluates both content signals and usage behavior patterns, including recurrence, severity, and potential misuse indicators. - https://learn.microsoft.com/en-us/azure/foundry/openai/concepts/abuse-monitoring, https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/quota
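To make the distinction concrete, here is a minimal sketch of how a client could tell the two cases apart before deciding whether to retry. The function name `classify_response` and the category labels are hypothetical, and matching on the phrase "temporarily blocked" is an assumption based only on the error message quoted above, so adjust it to the response body you actually receive.

```python
def classify_response(status: int, body: str) -> str:
    """Map an HTTP status and response body to a coarse handling category.

    Hypothetical helper for illustration; not part of any Azure SDK.
    """
    if status == 429:
        # Ordinary quota / rate-limit throttling: safe to retry with backoff.
        return "throttled"
    if status == 403 and "temporarily blocked" in body.lower():
        # Assumption: the block message contains this phrase, as quoted above.
        # Retrying will not help; stop traffic and contact support.
        return "blocked"
    if status == 403:
        # Any other 403 (for example a permissions problem): also not retryable.
        return "forbidden"
    if 200 <= status < 300:
        return "ok"
    return "other"
```

Only the "throttled" category is worth retrying automatically. Treating "blocked" as a hard stop keeps the client from sending more traffic to a resource that is already under review.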
The best course of action is to stop all traffic to the affected resource, then open an Azure technical support request asking Microsoft to review the temporary block / unusual-behavior flag. Include the APIM request ID, timestamps, resource name, region, deployment name, and the full 403 response. Azure support is the only reliable path to both unblock the resource and confirm the backend trigger category. - https://learn.microsoft.com/en-us/azure/azure-portal/supportability/how-to-create-azure-support-request
After Microsoft clears the block, the workload should be redesigned for bursty Codex / coding-agent traffic. If the workload remains on Standard or Global Standard deployment, enforce client-side rate shaping, backoff, jitter, request smoothing, and gradual ramp-up. If the workload is production-critical and legitimately bursty, move it to Provisioned Throughput (PTU) because PTU is designed for predictable throughput and latency with allocated model-processing capacity. - https://learn.microsoft.com/en-us/azure/foundry/openai/concepts/provisioned-throughput, and https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/latency
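As a rough illustration of the client-side measures mentioned above, the sketch below combines request smoothing (a token bucket) with exponential backoff and full jitter. It uses only the Python standard library. All names (`TokenBucket`, `backoff_delay`, `call_with_retry`, `ResourceBlockedError`) and the default values are hypothetical, and `send` is assumed to be a callable you provide that performs one request and returns `(status_code, body)`. It does not cover gradual ramp-up, and it is not an official Azure pattern.

```python
import random
import time


class ResourceBlockedError(Exception):
    """Raised on HTTP 403 so the caller stops sending traffic."""


class TokenBucket:
    """Smooths bursts: about `rate` requests per second, at most `capacity` at once."""

    def __init__(self, rate, capacity, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def acquire(self):
        """Block until one token is available, then consume it."""
        while True:
            now = self.clock()
            # Refill in proportion to elapsed time, capped at capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            # Wait just long enough for the missing fraction of a token.
            self.sleep((1 - self.tokens) / self.rate)


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random.random):
    """Full jitter: a random delay between 0 and min(cap, base * 2**attempt)."""
    return rng() * min(cap, base * 2 ** attempt)


def call_with_retry(send, bucket, max_attempts=5, sleep=time.sleep):
    """Call send() through the bucket, retrying only 429 and 5xx responses."""
    for attempt in range(max_attempts):
        bucket.acquire()
        status, body = send()
        if status == 403:
            # A block is not a throttle: retrying only adds suspicious traffic.
            raise ResourceBlockedError(body)
        if status == 429 or status >= 500:
            if attempt < max_attempts - 1:
                sleep(backoff_delay(attempt))
            continue
        return status, body
    raise RuntimeError("retries exhausted")
```

For example, `TokenBucket(rate=2, capacity=5)` would let an agent send up to 5 requests immediately and then hold it to roughly 2 per second. The right values depend on your deployment's TPM/RPM quota, so treat these numbers as placeholders.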
I hope this is helpful! Do not hesitate to let me know if you have any other questions or need further steps or clarification.
Please don't forget to close out the thread by upvoting this reply and accepting it as the answer if it was helpful.