# Learning APIM Series (3) - Configuring multiple Azure OpenAI instances with APIM and managed identity

# Enable APIM managed identity

Make sure that a managed identity is enabled on the APIM resource.

# Prepare Azure OpenAI resource

## Add role assignment to APIM managed identity

(1) Add a role assignment with the Cognitive Services OpenAI User role

(2) Assign it to the APIM managed identity

Note: Repeat this role assignment on every Azure OpenAI resource that APIM will call.

# Configure APIM with Azure OpenAI backends

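## Add Azure OpenAI backend

Add one APIM backend for each Azure OpenAI resource. The policies in this post refer to them by the backend IDs `aoai-jpe-i-01-20230814` and `aoai-jpe-i-02-20230814`.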

# Add Azure API with OpenAI

(1) On the APIs pane of APIM, select Add API, then OpenAPI

## Create from OpenAPI specification

(1) Select Full in the Create from OpenAPI specification dialog

(2) Download the OpenAPI specification from https://raw.githubusercontent.com/Azure/azure-rest-api-specs/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference/stable/2023-05-15/inference.json

(3) In the downloaded file, change the `servers` setting to your Azure OpenAI endpoint URL

(4) Upload the JSON file as the OpenAPI specification

(5) Set the API URL suffix to `openai`

(6) Add Starter and Unlimited as products
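Step (3) can also be scripted. Below is a minimal Python sketch, assuming the downloaded file is an OpenAPI 3 document with a top-level `servers` list and that the server URL should keep the `/openai` path segment (check both against the file you downloaded). The function name `set_server_url` and the endpoint `my-aoai-01.openai.azure.com` are hypothetical.

```python
import json


def set_server_url(spec: dict, endpoint: str) -> dict:
    """Return a copy of an OpenAPI 3 document whose `servers` list points at `endpoint`."""
    patched = dict(spec)  # shallow copy; the input dict is left unchanged
    base = endpoint.rstrip("/")
    # Assumption: the paths in the spec are relative to <endpoint>/openai.
    patched["servers"] = [{"url": f"{base}/openai"}]
    return patched


# Small in-memory stand-in for the downloaded inference.json.
sample = {
    "openapi": "3.0.0",
    "servers": [{"url": "https://{endpoint}/openai"}],
    "paths": {},
}

# Hypothetical Azure OpenAI endpoint.
patched = set_server_url(sample, "https://my-aoai-01.openai.azure.com/")
print(json.dumps(patched["servers"]))
```

For the real file, load it with `json.load`, pass the result through `set_server_url`, and write it back with `json.dump` before uploading it in step (4).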

# Set the Inbound policy to add the managed identity token

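Add the following policy to the inbound section. It requests a token for `https://cognitiveservices.azure.com` with the APIM managed identity and sends it to Azure OpenAI as a bearer token in the `Authorization` header.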

<authentication-managed-identity resource="https://cognitiveservices.azure.com" output-token-variable-name="msi-access-token" ignore-error="false" />
<set-header name="Authorization" exists-action="override">
    <value>@("Bearer " + (string)context.Variables["msi-access-token"])</value>
</set-header>


# Add Inbound policy for Azure OpenAI redundancy

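Add the following policy to the inbound section. It picks one of the two backends at random for each request. If the cached `is-down` flag of the chosen instance is set, the request ends in the 500 response instead.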

<set-variable name="urlId" value="@{
    // Pick backend 1 or 2 at random; the upper bound of Next(1, 3) is exclusive.
    Random rnd = new Random();
    int urlId = rnd.Next(1, 3);
    return urlId.ToString();
}" />
<cache-lookup-value key="@("instance-01-is-down")" variable-name="instance-01-is-down" />
<cache-lookup-value key="@("instance-02-is-down")" variable-name="instance-02-is-down" />
<choose>
    <when condition="@(context.Variables.GetValueOrDefault<string>("urlId").Equals("1") && (!context.Variables.GetValueOrDefault<bool>("instance-01-is-down")))">
        <set-backend-service backend-id="aoai-jpe-i-01-20230814" />
    </when>
    <when condition="@(context.Variables.GetValueOrDefault<string>("urlId").Equals("2") && (!context.Variables.GetValueOrDefault<bool>("instance-02-is-down")))">
        <set-backend-service backend-id="aoai-jpe-i-02-20230814" />
    </when>
    <otherwise>
        <return-response>
            <set-status code="500" reason="InternalServerError" />
            <set-header name="Microsoft-Azure-Api-Management-Correlation-Id" exists-action="override">
                <value>@{return Guid.NewGuid().ToString();}</value>
            </set-header>
            <set-body>@("A gateway-related error occurred while processing the request. urlId=" + context.Variables.GetValueOrDefault<string>("urlId"))</set-body>
        </return-response>
    </otherwise>
</choose>

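The selection logic is easier to reason about outside of policy XML. The following Python sketch mirrors the `<choose>` block, assuming that a missing cache entry counts as not down. The names `choose_backend` and `pick_url_id` are made up for illustration, and the `fallback` option is a hypothetical extension that the policy above does not implement.

```python
import random

# Backend IDs as used in the policy.
BACKENDS = {"1": "aoai-jpe-i-01-20230814", "2": "aoai-jpe-i-02-20230814"}


def pick_url_id(rng=random):
    """Same range as rnd.Next(1, 3): the upper bound is exclusive, so 1 or 2."""
    return str(rng.randrange(1, 3))


def choose_backend(url_id, down, fallback=False):
    """Return a backend ID, or None for the branch that returns the 500 response.

    `down` maps "1"/"2" to the cached instance-0N-is-down flag.
    `fallback` is a hypothetical extension, not part of the policy.
    """
    if url_id in BACKENDS and not down.get(url_id, False):
        return BACKENDS[url_id]
    if fallback:
        # Try any other instance that is not flagged as down.
        for other_id, backend in BACKENDS.items():
            if other_id != url_id and not down.get(other_id, False):
                return backend
    return None


print(choose_backend("1", {}))
print(choose_backend("1", {"1": True}))
print(choose_backend("1", {"1": True}, fallback=True))
```

With `fallback=False`, a request whose random pick is flagged as down gets the 500 response even though the other instance may be healthy. If that is not the behavior you want, one option is an extra `<when>` branch per instance that routes to the remaining one.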

# Add retry Backend policy

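Add the following policy to the backend section. When a backend answers with 429, the policy flags that instance as down in the cache for 10 seconds and retries the request against the other instance.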

<retry condition="@(context.Response.StatusCode == 429)" count="2" interval="1" max-interval="10" delta="1" first-fast-retry="true">
    <choose>
        <when condition="@(context.Response != null && (context.Response.StatusCode == 429))">
            <choose>
                <when condition="@(context.Variables.GetValueOrDefault<string>("urlId").Equals("1"))">
                    <cache-store-value key="@("instance-01-is-down")" value="@(true)" duration="10" />
                    <set-backend-service backend-id="aoai-jpe-i-02-20230814" />
                </when>
                <when condition="@(context.Variables.GetValueOrDefault<string>("urlId").Equals("2"))">
                    <cache-store-value key="@("instance-02-is-down")" value="@(true)" duration="10" />
                    <set-backend-service backend-id="aoai-jpe-i-01-20230814" />
                </when>
                <otherwise />
            </choose>
        </when>
        <otherwise />
    </choose>
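    <!-- Buffer the request body so that it can be sent again on a retry -->
    <forward-request buffer-request-body="true" />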
</retry>

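To see how the retry and the cached flag interact, here is a small Python simulation. It is a sketch, not the gateway's implementation: `TtlFlags` is a made-up stand-in for the APIM cache, `send` is a caller-supplied function that returns an HTTP status code, and the wait between retries is left out.

```python
import time


class TtlFlags:
    """Stand-in for cache-store-value / cache-lookup-value with a duration in seconds."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._expires = {}

    def store(self, key, duration):
        self._expires[key] = self._clock() + duration

    def lookup(self, key):
        # True while the flag has not expired; a missing key is never set.
        return self._clock() < self._expires.get(key, float("-inf"))


OTHER = {"1": "2", "2": "1"}


def forward_with_retry(url_id, send, flags, count=2):
    """Send via instance `url_id`; on 429, flag it as down and switch to the other one."""
    target = url_id
    status = send(target)  # initial attempt
    retries = 0
    while status == 429 and retries < count:
        # As in the policy, the flag and the switch are based on the original urlId.
        flags.store(f"instance-0{url_id}-is-down", 10)
        target = OTHER[url_id]
        status = send(target)
        retries += 1
    return target, status


calls = []


def fake_send(instance):
    # Pretend instance 1 is throttled and instance 2 is healthy.
    calls.append(instance)
    return 429 if instance == "1" else 200


now = [0.0]  # fake clock so the example does not have to sleep
flags = TtlFlags(clock=lambda: now[0])
result = forward_with_retry("1", fake_send, flags)
print(result, calls)
print(flags.lookup("instance-01-is-down"))
now[0] = 11.0
print(flags.lookup("instance-01-is-down"))
```

While the flag is set, the inbound policy no longer routes new requests to the throttled instance. After the 10 seconds have passed, the flag expires and the instance is eligible again.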

# Configure APIM subscription to accept `api-key` header and query parameter

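In the Settings tab of the API, under Subscription, set both the header name and the query parameter name to `api-key`. Clients written for Azure OpenAI can then send the APIM subscription key in the same place where they would normally send an Azure OpenAI key.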
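On the client side, the key can then travel either way. The following Python sketch shows both options, assuming the query parameter is also named `api-key`. The helper name `with_api_key`, the host `apim.example.net`, and the key value are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_api_key(url, key, via="header"):
    """Return (url, headers) carrying the APIM subscription key as `api-key`."""
    if via == "header":
        return url, {"api-key": key}
    if via == "query":
        parts = urlsplit(url)
        # Keep the existing query parameters and append the key.
        query = parse_qsl(parts.query, keep_blank_values=True) + [("api-key", key)]
        return urlunsplit(parts._replace(query=urlencode(query))), {}
    raise ValueError("via must be 'header' or 'query'")


base = (
    "https://apim.example.net/openai/deployments/gpt-35-turbo"
    "/chat/completions?api-version=2023-05-15"
)
print(with_api_key(base, "abc123", via="header"))
print(with_api_key(base, "abc123", via="query"))
```

The header variant keeps the key out of URLs, which tend to end up in logs, so it is usually the better default.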

# Test the APIM in a VM with VNet integration

## Get the subscription key from APIM

## Add APIM FQDN to hosts file

On the test VM, map the APIM FQDN to the private IP address of APIM.

vi /etc/hosts

Add the following line:

10.0.4.4        apim-jpe-20230818.azure-api.net

## Use curl to test APIM

curl -X POST "https://apim-jpe-20230818.azure-api.net/openai/deployments/gpt-35-turbo/chat/completions?api-version=2023-05-15" \
     -H "Host: apim-jpe-20230818.azure-api.net" \
     -H "Content-Type: application/json" \
     -H "api-key: <your_APIM_subscription_key>" \
     -d '{
           "model": "gpt-35-turbo",
           "messages": [
             {"role": "system", "content": "Assistant is a large language model trained by OpenAI."},
             {"role": "user", "content": "Who were the founders of Microsoft?"}
           ]
         }'

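The same request can be built from Python with only the standard library. This is a sketch: `build_chat_request` and `first_reply` are hypothetical helper names, and the response shape (`choices[0].message.content`) is the usual chat completions format. The block only builds the request, so it also runs without network access.

```python
import json
import urllib.request


def build_chat_request(host, deployment, key, question, api_version="2023-05-15"):
    """Build a POST request for the chat completions operation behind APIM."""
    url = (
        f"https://{host}/openai/deployments/{deployment}"
        f"/chat/completions?api-version={api_version}"
    )
    body = {
        "messages": [
            {"role": "system", "content": "Assistant is a large language model trained by OpenAI."},
            {"role": "user", "content": question},
        ]
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": key},
        method="POST",
    )


def first_reply(response_json):
    """Extract the assistant message from a chat completions response."""
    return response_json["choices"][0]["message"]["content"]


req = build_chat_request(
    "apim-jpe-20230818.azure-api.net",
    "gpt-35-turbo",
    "<your_APIM_subscription_key>",
    "Who were the founders of Microsoft?",
)
print(req.get_method(), req.full_url)

# To send it from the VM once the hosts entry is in place:
# with urllib.request.urlopen(req) as resp:
#     print(first_reply(json.load(resp)))
```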

# Reference

- Azure OpenAI API 2023-05-15 Swagger spec
- Authenticate with managed identity
- api-management-policy-snippets: Back-end API redundancy.policy.xml
