APPLIES TO: Developer | Basic | Basic v2 | Standard | Standard v2 | Premium | Premium v2
You can create a unified model API in Azure API Management to expose multiple LLM backends through a single client-facing endpoint. Client applications use one familiar API format, the OpenAI Chat Completions API, while API Management automatically translates each request to the format the backend model expects: the OpenAI Chat Completions API or the Anthropic Messages API.
Note
The unified model API is in preview and is currently rolling out to customers. In the classic tiers, early access to this feature is available through the AI Gateway Early release channel.
By centralizing model access behind a single API layer, you can:
- Standardize on a single API format for clients, independent of the formats used by backend models.
- Unify observability, security, and governance with policies across model providers.
- Configure model failover across model providers.
- Decouple client-facing model names from backend model names using aliases.
To learn more about managing AI APIs in API Management, see AI gateway capabilities in Azure API Management.
Supported backends
The unified model API supports the following backend API formats:
- OpenAI Chat Completions API
- Anthropic Messages API
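To give a feel for what translating between these two formats involves, here is a minimal sketch that maps a plain-text OpenAI Chat Completions request body to an Anthropic Messages request body. It is an illustration only, not API Management's actual translation logic. The function name `openai_chat_to_anthropic` and the `default_max_tokens` value are assumptions made for this example, and tools, images, and streaming are not handled.

```python
from typing import Any


def openai_chat_to_anthropic(
    request: dict[str, Any], default_max_tokens: int = 1024
) -> dict[str, Any]:
    """Map a plain-text OpenAI Chat Completions body to an Anthropic Messages body."""
    system_parts = []
    messages = []
    for message in request["messages"]:
        if message["role"] == "system":
            # The Anthropic Messages API takes the system prompt as a top-level field.
            system_parts.append(message["content"])
        else:
            messages.append({"role": message["role"], "content": message["content"]})

    body: dict[str, Any] = {
        "model": request["model"],
        "messages": messages,
        # max_tokens is required by the Anthropic Messages API;
        # the default used here is an arbitrary value chosen for this sketch.
        "max_tokens": request.get("max_tokens", default_max_tokens),
    }
    if system_parts:
        body["system"] = "\n".join(system_parts)
    if "temperature" in request:
        body["temperature"] = request["temperature"]
    return body


example = openai_chat_to_anthropic(
    {
        "model": "claude-sonnet-4.6",
        "messages": [
            {"role": "system", "content": "You are concise."},
            {"role": "user", "content": "What can you do?"},
        ],
    }
)
print(example)
```

With the unified model API, this kind of mapping happens inside API Management, so client code only ever builds the OpenAI-style request.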
Prerequisites
- An existing API Management instance. Create one if you haven't already.
- One or more model deployments in a supported backend.
- To track token usage by the API, see Emit custom metrics for prerequisites.
- To enforce content safety checks on the API, see Enforce content safety checks on LLM requests for prerequisites.
Create a unified model API - Azure portal
Use the following steps to create a unified model API in API Management.
When you create the API, API Management automatically configures:
- A /models endpoint for model discovery that lists all configured models.
- A single routing endpoint such as /llm/v1/chat/completions that accepts requests in the OpenAI Chat Completions format.
- Format translation logic for each backend model you add.
- Backend resources that direct requests to the correct provider endpoint.
To create a unified model API:
In the Azure portal, go to your API Management instance.
In the sidebar menu, under APIs, select Models > + Add > Unified model API.
On the Configure Unified Model API tab:
- Enter a Display name for the API. API Management automatically generates an API Name based on the display name, but you can edit it if you want.
- In API path, enter the path that clients use to call the API. The default is /llm/v1, which results in a chat completions endpoint at /llm/v1/chat/completions.
- Optionally select one or more Products to associate with the API.
- Select Next.
On the Configure models tab, select + Add to open the Add model pane, then configure the following settings for each model deployment:
Under Backend configuration:
- In Model, enter the backend model name (for example, gpt-4o or claude-sonnet-4.6).
- In API format, select the format the backend model expects, such as OpenAI Chat Completions API or Anthropic Messages API.
- In URL, enter the backend endpoint URL, for example, a model deployment in Foundry or, for other providers, the provider's API endpoint URL.
Under Authorization credentials, select how API Management authenticates to the backend:
- Headers: Enter a Header name (for example, api-key or Authorization) and the corresponding Header value (your API key or secret).
- Managed Identity: For model deployments in Azure, you can use the instance's system-assigned managed identity or a user-assigned managed identity to authenticate to the backend.
For an explanation of settings for the managed identity, see the reference for the authentication-managed-identity policy.
On the Manage token consumption tab, optionally configure policies to monitor and manage token usage.
On the Set up AI content safety tab, optionally configure the Azure AI Content Safety service to block prompts with unsafe content.
Select Review + create, then select Create.
Manage model aliases
Model aliases give clients a stable, provider-neutral name to use when calling a model. By assigning an alias like gpt or claude-sonnet, you decouple the client-facing model name from the actual backend deployment. When you upgrade a model or want to run an A/B test, you can update the alias target without any changes to client code.
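The decoupling that aliases provide can be pictured as a lookup table from client-facing names to backend settings. The following sketch is a conceptual illustration only, not how API Management stores or resolves aliases. The `BackendModel` class, the `ALIASES` table, the `resolve` function, the API format labels, and the `example.invalid` URLs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackendModel:
    model: str       # backend model name
    api_format: str  # hypothetical label for the backend API format
    url: str         # placeholder backend endpoint URL


# Client-facing alias -> backend configuration (all values are illustrative).
ALIASES = {
    "gpt": BackendModel(
        "gpt-4o", "openai-chat-completions", "https://example.invalid/openai"
    ),
    "claude-sonnet": BackendModel(
        "claude-sonnet-4.6", "anthropic-messages", "https://example.invalid/anthropic"
    ),
}


def resolve(alias: str) -> BackendModel:
    """Return the backend configuration that a client-facing alias points to."""
    try:
        return ALIASES[alias]
    except KeyError:
        raise ValueError(f"Unknown model alias: {alias!r}") from None


print(resolve("gpt").model)
```

Retargeting an alias amounts to replacing one table entry, which is why clients that send the alias don't need to change when the backend does.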
Update or add a model alias
To update a model alias after creating the unified model API:
- In the Azure portal, go to your API Management instance, then select APIs.
- Select the unified model API.
- Select the Models tab to update or add a model alias.
- To update a client-facing alias, select the alias you want to update, then update the Backend configuration to specify the backend model. Add Authorization credentials for the new backend.
- To add a new model, select + Add and configure the backend, authorization, and client settings as described in the previous section.
- Select Save.
Discover model aliases
Developers can discover available models and their aliases by calling the /models endpoint of the unified model API. API Management returns a list of models with their client-facing aliases.
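As a rough sketch, a client could query the /models endpoint with the Python standard library as shown below. Several details are assumptions rather than documented behavior: the key is sent as an `Authorization: Bearer` header (which is how the OpenAI SDK sends its `api_key`; your API might expect a different subscription key header), and the response is assumed to follow the OpenAI-style list shape `{"data": [{"id": ...}]}`. The function names are hypothetical.

```python
import json
import urllib.request


def build_models_request(base_url: str, subscription_key: str) -> urllib.request.Request:
    """Build a GET request for the /models endpoint under the API's base URL."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/models",
        # Assumption: the key is accepted as a bearer token, as the OpenAI SDK sends it.
        headers={"Authorization": f"Bearer {subscription_key}"},
        method="GET",
    )


def model_ids(payload: dict) -> list[str]:
    """Extract model names, assuming an OpenAI-style {"data": [{"id": ...}]} payload."""
    return [item["id"] for item in payload.get("data", [])]


def list_models(base_url: str, subscription_key: str) -> list[str]:
    """Call the endpoint and return the model names it reports."""
    request = build_models_request(base_url, subscription_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return model_ids(json.load(response))
```

To try it, call `list_models("https://<apim-instance>.azure-api.net/llm/v1", "<api-management-subscription-key>")` with your own values, and adjust the parsing if the response shape differs.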
Call the API from a client application
Client applications can call the unified model API using any OpenAI-compatible SDK. Point the SDK's base URL at your API Management endpoint and authenticate with an API Management subscription key or another supported authentication method.
The following example uses the Python OpenAI SDK and passes an API Management subscription key in the header for authentication. The request body specifies a client-facing model alias configured in API Management, for example, gpt or claude-sonnet:
from openai import OpenAI

client = OpenAI(
    base_url="https://<apim-instance>.azure-api.net/llm/v1",
    api_key="<api-management-subscription-key>",
)

# Specify the client-facing model alias
response = client.chat.completions.create(
    model="gpt",  # or "claude-sonnet", "gemini", or any other configured alias
    messages=[{"role": "user", "content": "What can you do?"}],
)
print(response.choices[0].message.content)
To switch to a different backend model, change only the model value. No other code changes are required.
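Because only the model value changes, one option is to keep the alias in configuration rather than in code. The sketch below builds the request arguments from an environment variable. The variable name `MODEL_ALIAS` and the helper `chat_request` are hypothetical and chosen for this example.

```python
import os


def chat_request(model_alias: str, prompt: str) -> dict:
    """Build keyword arguments for a chat completion call; only `model` varies per backend."""
    return {
        "model": model_alias,
        "messages": [{"role": "user", "content": prompt}],
    }


# MODEL_ALIAS is a hypothetical environment variable; "gpt" is the fallback alias.
selected_alias = os.environ.get("MODEL_ALIAS", "gpt")
request = chat_request(selected_alias, "What can you do?")
print(request["model"])
```

With the client from the preceding example, the call would be `client.chat.completions.create(**request)`, so switching backends becomes a configuration change.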