
Hosted agent runtime contract

Important

Items marked (preview) in this article are currently in public preview. This preview is provided without a service-level agreement, and we don't recommend it for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

A hosted agent is a container that fulfills a specific runtime contract with the Microsoft Foundry platform. This reference describes what the platform expects from your container and how the SDK adapter packages help you meet those requirements.

The SDK adapter packages implement the entire contract for you. If you use azure-ai-agentserver-responses or azure-ai-agentserver-invocations, you implement only your handler logic.

Contract requirements

Your container must:

| Requirement | Detail |
| --- | --- |
| Listen on port 8088 | HTTP/1.1, plain HTTP. The platform terminates TLS. |
| Serve a health probe | Return 200 OK from GET /readiness. |
| Implement a protocol endpoint | Serve at least one of POST /responses or POST /invocations. |
| Consume platform environment variables | Read the variables the platform injects at startup. |
| Shut down gracefully | Flush writes and close connections on SIGTERM. |
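
To make the first three requirements concrete, here is a minimal sketch of a container entry point written against the Python standard library only, without the SDK adapters. It is an illustration under assumptions, not the adapters' implementation: the `route` function, the `ContractHandler` class, and the echo payload shape are hypothetical names chosen for this example, and the sketch leaves out environment variable handling and graceful shutdown.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def route(method: str, path: str, body: bytes) -> tuple[int, dict]:
    """Map a request to (status code, JSON payload). Hypothetical helper."""
    if method == "GET" and path == "/readiness":
        return 200, {"status": "ready"}
    if method == "POST" and path == "/invocations":
        try:
            data = json.loads(body or b"{}")
        except ValueError:  # covers malformed JSON and undecodable bytes
            return 400, {"error": "invalid JSON"}
        # The payload shape is an assumption: an object with a "message" key.
        message = data.get("message", "Hello!") if isinstance(data, dict) else "Hello!"
        return 200, {"echo": message}
    return 404, {"error": "not found"}


class ContractHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # the contract specifies HTTP/1.1

    def _respond(self, method: str) -> None:
        # Always read the declared body so keep-alive connections stay in sync.
        length = int(self.headers.get("Content-Length") or 0)
        body = self.rfile.read(length) if length else b""
        status, payload = route(method, self.path, body)
        encoded = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(encoded)))
        self.end_headers()
        self.wfile.write(encoded)

    def do_GET(self) -> None:
        self._respond("GET")

    def do_POST(self) -> None:
        self._respond("POST")


if __name__ == "__main__":
    # Plain HTTP on all interfaces; the platform terminates TLS upstream.
    ThreadingHTTPServer(("0.0.0.0", 8088), ContractHandler).serve_forever()
```

Keeping the routing in a pure function makes the contract behavior easy to check without opening a socket, for example `route("GET", "/readiness", b"")` returns `(200, {"status": "ready"})`.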

Protocol endpoints

A protocol defines the HTTP contract between Foundry and your agent container. Your container implements at least one protocol endpoint.

Responses protocol

The responses protocol implements the OpenAI Responses API. The platform sends requests to POST /responses and expects either a JSON response or a Server-Sent Events (SSE) stream.

| Aspect | Detail |
| --- | --- |
| Endpoint | POST /responses |
| Input | OpenAI Responses API request (input, model, stream, and so on) |
| Output | JSON response object or SSE stream of response events |
| Conversation history | Hydrated automatically by the SDK adapter when conversation.id is present |
| Streaming | SSE with the text/event-stream content type |

Use the responses protocol as the default choice, because it's compatible with the OpenAI API ecosystem.
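
The SDK adapter produces the SSE stream for you, but it can help to see what a frame on the wire looks like. The following sketch encodes text chunks as Server-Sent Events. The helper names `format_sse` and `stream_text` are hypothetical, and the event payloads are deliberately simplified: the event names follow the OpenAI Responses streaming events, but real events carry more fields than shown here.

```python
import json
from typing import Iterable, Iterator


def format_sse(event: str, data: dict) -> str:
    """Encode one SSE frame: an event line, a data line, then a blank line."""
    # json.dumps escapes newlines, so the payload always fits on one data line.
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def stream_text(chunks: Iterable[str]) -> Iterator[str]:
    """Yield one delta frame per chunk, followed by a completion frame."""
    for chunk in chunks:
        yield format_sse(
            "response.output_text.delta",
            {"type": "response.output_text.delta", "delta": chunk},
        )
    yield format_sse("response.completed", {"type": "response.completed"})


if __name__ == "__main__":
    for frame in stream_text(["Hel", "lo"]):
        print(frame, end="")
```

A server would write these frames to a response whose content type is text/event-stream and flush after each one.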

Invocations protocol

The invocations protocol is a minimal pass-through protocol. You define the payload structure, and the platform passes it through without interpretation.

| Aspect | Detail |
| --- | --- |
| Endpoint | POST /invocations |
| Input | Any JSON payload your handler expects |
| Output | Any JSON response or SSE stream |
| Conversation history | Not managed. Your code handles state if needed. |
| Streaming | Optional, through SSE |

Use the invocations protocol when you need full control over the request and response payloads.
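
Because the platform doesn't interpret the invocations payload, validating it is your handler's job. The sketch below shows one way to do that with the standard library. The `EchoRequest` shape (a `message` string and an optional `stream` flag) and the `parse_echo_request` helper are hypothetical and exist only for this example; your own payload can look entirely different.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class EchoRequest:
    """Hypothetical payload shape for an invocations handler."""
    message: str
    stream: bool = False


def parse_echo_request(raw: bytes) -> EchoRequest:
    """Parse and validate a raw request body, raising ValueError on bad input."""
    try:
        data = json.loads(raw)
    except ValueError as exc:
        raise ValueError("body is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("body must be a JSON object")
    message = data.get("message")
    if not isinstance(message, str) or not message:
        raise ValueError("'message' must be a non-empty string")
    stream = data.get("stream", False)
    if not isinstance(stream, bool):
        raise ValueError("'stream' must be a boolean")
    return EchoRequest(message=message, stream=stream)
```

A handler can call this on the request body and translate a `ValueError` into a 400 response, which keeps malformed input from reaching the agent logic.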

SDK adapter packages

The adapter packages are protocol-specific and framework-agnostic. They work with any agent framework, including Microsoft Agent Framework, LangGraph, and custom code.

| Protocol | Python package | .NET package |
| --- | --- | --- |
| Responses | azure-ai-agentserver-responses | Azure.AI.AgentServer.Responses |
| Invocations | azure-ai-agentserver-invocations | Azure.AI.AgentServer.Invocations |

The adapter handles the following parts of the contract for you:

  • HTTP server setup on port 8088.
  • The health probe endpoint (GET /readiness).
  • Protocol-specific request parsing and response formatting.
  • Conversation history hydration (responses protocol).
  • SSE streaming infrastructure.
  • OpenTelemetry instrumentation.
  • Graceful shutdown on SIGTERM.
  • Platform environment variable consumption.

You implement a handler function that receives parsed requests and returns responses.

Handler examples

The complete bring-your-own samples for both protocols and both languages are in the foundry-samples repository.

Responses protocol example

This minimal handler forwards user input to a model from the Foundry model catalog through the Responses API. The SDK adapter hydrates conversation history, and the handler reads it through context.get_history() (Python) or context.GetHistoryAsync() (C#), so the agent maintains context across turns.

From bring-your-own/responses/hello-world/main.py:

import asyncio
import os

from azure.ai.agentserver.responses import (
    CreateResponse,
    ResponseContext,
    ResponsesAgentServerHost,
    ResponsesServerOptions,
    TextResponse,
)
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

# FOUNDRY_PROJECT_ENDPOINT is auto-injected in hosted Foundry containers and
# set by 'azd ai agent run' for local development.
_endpoint = os.environ["FOUNDRY_PROJECT_ENDPOINT"]
_model = os.environ["AZURE_AI_MODEL_DEPLOYMENT_NAME"]

_project_client = AIProjectClient(
    endpoint=_endpoint, credential=DefaultAzureCredential()
)
_responses_client = _project_client.get_openai_client().responses

app = ResponsesAgentServerHost(
    options=ResponsesServerOptions(default_fetch_history_count=20),
)


@app.response_handler
async def handler(
    request: CreateResponse,
    context: ResponseContext,
    _cancellation_signal: asyncio.Event,
):
    user_input = await context.get_input_text() or "Hello!"
    history = await context.get_history()

    # Build the model input from prior conversation turns + the current message.
    input_items = []
    for item in history:
        # Map history items to {"role": ..., "content": ...} dicts; see the
        # full sample for the unpacking helper.
        ...
    input_items.append({"role": "user", "content": user_input})

    response = await asyncio.get_running_loop().run_in_executor(
        None,
        lambda: _responses_client.create(
            model=_model,
            instructions="You are a helpful AI assistant.",
            input=input_items,
            store=False,  # platform manages history; don't store at model level
        ),
    )

    return TextResponse(context, request, text=response.output_text)


app.run()

Reference: ResponsesAgentServerHost, AIProjectClient, DefaultAzureCredential

Invocations protocol example

With the invocations protocol, your handler receives whatever JSON the caller posts and returns whatever JSON your code chooses. There's no built-in conversation history.

Pattern from bring-your-own/invocations/hello-world:

from starlette.requests import Request
from starlette.responses import JSONResponse, Response
from azure.ai.agentserver.invocations import InvocationAgentServerHost

app = InvocationAgentServerHost()


@app.invoke_handler
async def handle_invoke(request: Request) -> Response:
    data = await request.json()
    message = data.get("message", "Hello!")
    return JSONResponse({"echo": message})


if __name__ == "__main__":
    app.run()

The full samples also include conversation-history hydration, error handling, telemetry, toolbox integration, and Dockerfile and agent.yaml setup.

Health probe

The platform sends GET /readiness to determine whether your container is ready to serve traffic. Return 200 OK when the container is ready, or a non-200 status to signal that the platform should restart the instance. The SDK adapters register this endpoint automatically.
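
If you implement the probe yourself, the decision behind it can be as small as the following sketch. It assumes readiness is derived from a list of dependency checks. The `readiness_status` helper is hypothetical, and the choice of 503 for the not-ready case is an assumption: the contract only distinguishes 200 from non-200.

```python
from typing import Callable, Iterable


def readiness_status(checks: Iterable[Callable[[], bool]]) -> int:
    """Return 200 if every check passes, otherwise 503 (assumed non-200 code)."""
    for check in checks:
        try:
            if not check():
                return 503
        except Exception:
            # A check that raises is treated the same as a failed check.
            return 503
    return 200
```

Because a non-200 response tells the platform to restart the instance, keep the checks limited to conditions that a restart can actually fix.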

Network and transport

| Property | Value |
| --- | --- |
| Protocol | HTTP/1.1 |
| Default port | 8088 (override with the PORT environment variable) |
| Bind address | 0.0.0.0 (all interfaces) |
| TLS | Terminated by the platform. Your container serves plain HTTP. |
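
For a container that doesn't use the SDK adapters, the bind address can be resolved as in this sketch. The `resolve_bind_address` helper is a hypothetical name, and rejecting out-of-range ports is a choice made for this example rather than documented platform behavior.

```python
import os
from typing import Mapping

DEFAULT_PORT = 8088
BIND_HOST = "0.0.0.0"  # all interfaces, as the table above specifies


def resolve_bind_address(env: Mapping[str, str] = os.environ) -> tuple[str, int]:
    """Return (host, port), honoring the PORT override when it is set."""
    raw = env.get("PORT", "").strip()
    if not raw:
        return BIND_HOST, DEFAULT_PORT
    port = int(raw)  # raises ValueError if PORT is not an integer
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT out of range: {port}")
    return BIND_HOST, port
```

Passing the environment as a parameter keeps the function testable: `resolve_bind_address({})` returns `("0.0.0.0", 8088)`, and `resolve_bind_address({"PORT": "9000"})` returns `("0.0.0.0", 9000)`.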

Graceful shutdown

When the platform sends SIGTERM, your container must stop accepting new requests, finish in-flight requests, flush pending writes to $HOME (the session filesystem), and exit cleanly. The SDK adapters handle this sequence automatically.
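
The shutdown sequence can be sketched with asyncio as follows. This is an illustration of the pattern, not the adapters' implementation: the `GracefulShutdown` class and its method names are hypothetical, and the flush step is left as a comment because it depends on what your agent writes.

```python
import asyncio
import signal


class GracefulShutdown:
    """Tracks in-flight work and drains it once a stop is requested."""

    def __init__(self) -> None:
        self._stopping = asyncio.Event()
        self._in_flight = set()

    @property
    def accepting(self) -> bool:
        return not self._stopping.is_set()

    def track(self, coro) -> asyncio.Task:
        """Schedule a request coroutine, or reject it if shutdown has begun."""
        if not self.accepting:
            coro.close()  # avoid a "never awaited" warning for the rejected work
            raise RuntimeError("shutting down; not accepting new requests")
        task = asyncio.ensure_future(coro)
        self._in_flight.add(task)
        task.add_done_callback(self._in_flight.discard)
        return task

    def request_stop(self) -> None:
        self._stopping.set()

    async def wait_and_drain(self) -> None:
        """Block until a stop is requested, then wait for in-flight tasks."""
        await self._stopping.wait()
        if self._in_flight:
            await asyncio.gather(*self._in_flight, return_exceptions=True)


async def main() -> None:
    shutdown = GracefulShutdown()
    # add_signal_handler is available on Unix event loops only.
    asyncio.get_running_loop().add_signal_handler(
        signal.SIGTERM, shutdown.request_stop
    )
    await shutdown.wait_and_drain()
    # Flush pending writes here (for example, close files under $HOME).


if __name__ == "__main__":
    asyncio.run(main())
```

Registering the handler on the event loop, rather than with `signal.signal`, means the stop request runs as a normal loop callback and can safely touch asyncio objects.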

Platform environment variables

The platform injects environment variables into your container at startup. Your code can read the following key variables:

| Variable | Purpose |
| --- | --- |
| FOUNDRY_PROJECT_ENDPOINT | Foundry project endpoint for API calls |
| FOUNDRY_AGENT_NAME | The agent's name |
| FOUNDRY_AGENT_VERSION | The agent's version |
| FOUNDRY_AGENT_SESSION_ID | The current session ID |
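
One way to consume these variables is to read them once at startup into a typed object, as in this sketch. The `PlatformEnv` dataclass and `load_platform_env` helper are hypothetical names. Treating the endpoint as required and the other three as optional is an assumption made for local development convenience, not documented platform behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class PlatformEnv:
    """Snapshot of the platform-injected variables listed above."""
    project_endpoint: str
    agent_name: Optional[str]
    agent_version: Optional[str]
    session_id: Optional[str]


def load_platform_env(env: Mapping[str, str] = os.environ) -> PlatformEnv:
    """Read the platform variables, failing fast if the endpoint is missing."""
    endpoint = env.get("FOUNDRY_PROJECT_ENDPOINT")
    if not endpoint:
        raise RuntimeError("FOUNDRY_PROJECT_ENDPOINT is not set")
    return PlatformEnv(
        project_endpoint=endpoint,
        agent_name=env.get("FOUNDRY_AGENT_NAME"),
        agent_version=env.get("FOUNDRY_AGENT_VERSION"),
        session_id=env.get("FOUNDRY_AGENT_SESSION_ID"),
    )
```

Failing at startup when the endpoint is missing surfaces a misconfigured environment immediately instead of on the first model call.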