Deploy the extension for Agentic Retrieval in Foundry Local

After you complete the prerequisites, follow the steps in this article to deploy the Agentic Retrieval extension.

To try Agentic Retrieval without the need for local hardware, see Quickstart: Install Agentic Retrieval.

Important

Agentic Retrieval in Foundry Local is currently in PREVIEW. See the Supplemental Terms of Use for Microsoft Azure Previews for legal terms that apply to Azure features that are in beta, preview, or otherwise not yet released into general availability.

Prerequisites

Before you begin, complete the deployment prerequisites for Agentic Retrieval.

Deploy the extension

Deploy Agentic Retrieval by using either the Azure portal or Azure CLI.

  1. In the Azure portal, go to the Azure Kubernetes cluster on Azure Local.

  2. Select Settings > Extensions > + Add, and then select Agentic Retrieval from the list.

  3. Select Create.

  4. On the Basics tab, provide the following information:

    | Field | Value |
    | --- | --- |
    | Subscription | Select the subscription that contains your Azure Kubernetes Service (AKS) cluster on Azure Local. |
    | Resource group | Select the resource group that contains your AKS Arc cluster. |
    | Deployment name | Provide a name for the deployment. |
    | Region | Select the region to deploy Agentic Retrieval. |
    | Cluster | Select the cluster that you want to deploy Agentic Retrieval to. |

    Screenshot of the Basics tab with fields to enter the project and instance details.

  5. Select Next.

  6. On the Configurations tab, provide the following information:

    | Section | Field | Value |
    | --- | --- | --- |
    | Capabilities | | Select one or both components to include in the deployment. |
    | | Agentic Retrieval Engine | Select this option to install the agentic retrieval engine. |
    | | Knowledge sources layer | Select this option to install the knowledge sources layer. |
    | Deployment mode | Deployment mode | Select GPU or CPU based on your available hardware. This setting applies to the Knowledge Sources layer. |
    | SharePoint Server | Enable SharePoint data source | Optional. If you want to connect to SharePoint by using Key Vault authentication, select this option. |
    | | Key Vault name | Required only when SharePoint ingestion is selected. Enter the Azure Key Vault name. |
    | | KV cert secret name | Required only when SharePoint ingestion is selected. Enter the Key Vault secret name that stores the certificate. |
    | | KV cert password secret name | Required only when SharePoint ingestion is selected. Enter the Key Vault secret name that stores the certificate password. |
    | | Workload identity client ID | Required only when SharePoint ingestion is selected. Enter the workload identity client ID (GUID). |
    | NFS kerberos authentication | Enable kerberos authentication | Optional. If you want to connect to an NFS server by using Kerberos authentication, select this option. |
    | | Kerberos SPN | Required only when Kerberos is selected. Enter the SPN in the format service/host@REALM (for example, nfs/edgerag-svc@CONTOSO.COM). |
    | Inference model | Language model source | Select Foundry Local or Bring your own. |
    | | Application ID | Required only when Foundry Local is selected. |
    | | Language model name | Required. Enter your deployed language model name. |
    | | LLM endpoint | Required. Enter your OpenAI-compatible endpoint URL. For example: https://<Foundry_Resource_Name>.openai.azure.com/openai/deployments/<model_name>/chat/completions?api-version=<API_VERSION>. For Foundry Local on Azure Local, use your cluster-internal endpoint. |
    | | Max token (K) | Required. Enter a value from 4K to 2048K. |

    Screenshot of the configuration tab where you select the model type and other configurations.

  7. Select Next.

  8. On the Access tab, provide the following information:

    | Section | Field | Value |
    | --- | --- | --- |
    | SSL settings | SSL CNAME | Enter the domain name for your system. The domain name should match the redirect URI used during app registration and must not include the https:// prefix. For example, arcrag.contoso.com. |
    | | Kubernetes SSL secret name | Enter the name of the Kubernetes secret to store the SSL certificate. By default, Agentic Retrieval uses a self-signed SSL certificate in this secret. After installation, you can replace it with a signed certificate. |
    | Entra ID | Entra application ID | Enter the application ID from the enterprise application you registered for authentication. |
    | | Entra tenant ID | Enter the tenant ID from the enterprise application you registered for authentication. |

    Screenshot of the access tab with SSL settings and Entra application fields.

  9. Select Review + create.

  10. Review and validate the parameters you provided.

  11. Select Create to complete the Agentic Retrieval deployment.

  12. When the deployment is complete, under Extensions, validate that the extension types microsoft.arc.rag and microsoft.extensiondiagnostics are listed.
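
Several fields on the Configurations and Access tabs have a fixed format: the Kerberos SPN, the SSL CNAME, the Max token (K) range, and the workload identity client ID. As a rough illustration, the following shell sketch checks candidate values before you enter them in the portal. The function names are hypothetical and not part of any Azure tooling, and the patterns are simplified approximations of the formats described in the tables, not the portal's own validation rules.

```shell
#!/bin/sh
# Hypothetical pre-flight checks for portal field values.
# These helpers are illustrative only; the portal may apply different rules.

# Kerberos SPN: service/host@REALM, for example nfs/edgerag-svc@CONTOSO.COM.
is_valid_spn() {
  printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+@[A-Za-z0-9_.-]+$'
}

# SSL CNAME: a bare domain name with no https:// prefix and no path.
is_valid_cname() {
  printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9]([A-Za-z0-9.-]*[A-Za-z0-9])?$'
}

# Max token (K): a whole number from 4 to 2048.
is_valid_max_token_k() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+$' || return 1
  [ "$1" -ge 4 ] && [ "$1" -le 2048 ]
}

# Workload identity client ID: a GUID in 8-4-4-4-12 hexadecimal form.
is_guid() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
}
```

For example, `is_valid_cname "https://arcrag.contoso.com"` fails because of the scheme prefix, while `is_valid_cname "arcrag.contoso.com"` succeeds.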

The Agentic Retrieval extension deployment typically takes about 30 minutes but can take longer depending on your connectivity.
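
The extension check in the last deployment step can also be done from a terminal. The sketch below defines a hypothetical helper, `has_required_extensions`, that reads a list of extension types (one per line) and confirms that both expected types are present. The commented `az k8s-extension list` call assumes that the cluster is registered as an Arc-connected cluster (`connectedClusters`); the cluster and resource group names are placeholders.

```shell
#!/bin/sh
# has_required_extensions: reads extension types from stdin, one per line,
# and succeeds only if both expected types are present (case-insensitive).
has_required_extensions() {
  installed=$(cat)
  for ext in microsoft.arc.rag microsoft.extensiondiagnostics; do
    if ! printf '%s\n' "$installed" | grep -Fxqi "$ext"; then
      echo "missing extension type: $ext" >&2
      return 1
    fi
  done
}

# Example usage (assumed cluster type and placeholder names):
# az k8s-extension list \
#   --cluster-name "<cluster-name>" \
#   --resource-group "<resource-group>" \
#   --cluster-type connectedClusters \
#   --query "[].extensionType" --output tsv | has_required_extensions
```

The helper only confirms that the extension types are listed. It doesn't check the provisioning state of each extension.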

Verify deployment by mode

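After deployment, verify that the pods running in the arc-rag namespace match the components you selected under Capabilities on the Configurations tab (both components, the Agentic Retrieval Engine only, or the Knowledge sources layer only):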

| Mode | Expected pods |
| --- | --- |
| Combined | All Knowledge Layer pods (ingestionapi, inferencingflow, vectordb-api-server, embedding models, docling, milvus, postgres) + all Agentic Layer pods (agent-manager, agents-runtime, knowledge-sources, indexed-sources-mcp-server) |
| Agentic | Agentic Layer pods only (agent-manager, agents-runtime, knowledge-sources, indexed-sources-mcp-server, postgres) |
| Knowledge | Knowledge Layer pods only (ingestionapi, inferencingflow, vectordb-api-server, embedding models, docling, milvus, postgres) |

Run the following command to check:

kubectl get pods -n arc-rag
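
To compare that output against the expected pods for each mode, a small filter can help. The following sketch defines a hypothetical `check_pods` function that reads `kubectl get pods --no-headers` output and reports any expected component that has no pod in the Running state. It assumes that each pod name contains the component name listed for the mode, which might not hold for every release, and it skips the embedding model pods because no concrete pod names are given for them.

```shell
#!/bin/sh
# Component names copied from the expected pods per mode.
KNOWLEDGE_PODS="ingestionapi inferencingflow vectordb-api-server docling milvus postgres"
AGENTIC_CORE="agent-manager agents-runtime knowledge-sources indexed-sources-mcp-server"

# check_pods MODE: reads "kubectl get pods --no-headers" output from stdin.
# Prints each expected component without a Running pod; returns 1 if any is missing.
check_pods() {
  case "$1" in
    combined)  expected="$KNOWLEDGE_PODS $AGENTIC_CORE" ;;
    agentic)   expected="$AGENTIC_CORE postgres" ;;
    knowledge) expected="$KNOWLEDGE_PODS" ;;
    *) echo "unknown mode: $1" >&2; return 2 ;;
  esac
  pods=$(cat)
  missing=0
  for name in $expected; do
    # Assumption: column 1 is the pod NAME and column 3 is STATUS.
    if ! printf '%s\n' "$pods" | awk -v n="$name" \
        'index($1, n) > 0 && $3 == "Running" { found = 1 } END { exit !found }'; then
      echo "not running: $name"
      missing=1
    fi
  done
  return "$missing"
}

# Example usage:
# kubectl get pods -n arc-rag --no-headers | check_pods combined
```

An empty result with exit status 0 means every listed component has at least one Running pod.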

Verify end-to-end connectivity

After deployment, verify that the extension can communicate with Foundry Local:

  1. Check that the Foundry model is available. Because endpoint.enabled=false, reach the service through a port-forward:

    kubectl port-forward svc/gpt-oss-20b 5000:5000 -n foundry-local-operator
    # In another terminal:
    curl http://localhost:5000/v1/models
    
  2. Test a chat completion through the port-forward from step 1:

    curl -X POST http://localhost:5000/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "gpt-oss-20b",
        "messages": [
          {"role": "user", "content": "Hello, what is 2+2?"}
        ],
        "temperature": 0.7,
        "max_tokens": 100
      }'
    
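  3. Test the extension's inference endpoint. This example assumes that the endpoint is reachable at localhost:3001, for example through a port-forward: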

    curl -X POST "http://localhost:3001/edgeai/chat/completions?api-version=2024-10-01-preview" \
      -H "Content-Type: application/json" \
      -H "x-user-role: dev" \
      -d '{
        "messages": [{"role": "user", "content": "Test question"}],
        "data_sources": [{"type": "milvus", "parameters": {"endpoint": "", "index_name": "edgeragapp"}}]
      }'
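
The model availability check in step 1 can be scripted so that it fails clearly when the model isn't listed. The sketch below is a minimal example that assumes the `/v1/models` response follows the OpenAI-style shape `{"data": [{"id": "..."}]}`. The helper names `has_model` and `wait_for_model` are hypothetical, and the retry count and delay are arbitrary.

```shell
#!/bin/sh
# has_model NAME: reads a /v1/models JSON response from stdin and succeeds
# when an entry with that id is present.
# Assumes NAME contains no regular expression metacharacters.
has_model() {
  grep -Eq "\"id\"[[:space:]]*:[[:space:]]*\"$1\""
}

# wait_for_model URL NAME [TRIES]: polls the models endpoint until NAME is listed.
# Useful right after starting the port-forward, which can take a moment to connect.
wait_for_model() {
  url=$1
  model=$2
  tries=${3:-10}
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -fsS --max-time 5 "$url" 2>/dev/null | has_model "$model"; then
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  return 1
}

# Example usage, with the port-forward from step 1 running in another terminal:
# wait_for_model http://localhost:5000/v1/models gpt-oss-20b && echo "model available"
```

Matching with `grep` keeps the sketch free of extra dependencies. A JSON-aware tool is a more robust choice if one is available on your workstation.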
    

Configure post-deployment authentication

After deploying the Agentic Retrieval extension, complete the authentication task that matches your language model source: