Create Self Service AI using Azure Foundry with user token monitoring and limit


Stavros Koureas 16 Reputation points

We need a process in which a user requests access to AI usage, we as administrators approve the request, and the user then receives API keys, sees which APIs are available, and configures them in their AI tools.

The closest way to do this seems to be Azure API Management connected to Azure Foundry (aka AI Gateway). There we can define the available APIs, such as completions, responses, and messages, and configure policies.

One challenge is that, for this to work, the client must send the subscription key either in a header or in a query parameter while "Subscription required" is set to true.

👁 APIM

This works well from Postman when the subscription key is sent in the Ocp-Apim-Subscription-Key header, or in the api-key header, as long as the header name matches the APIM configuration.

But some AI tools, such as VS Code, emit the key in the "Authorization" header. Even when that header name matches the APIM configuration, APIM cannot parse the key, because the value of "Authorization" is "Bearer" followed by the key, like this:

Bearer xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

So I tried a ton of approaches, such as the inbound policy below, which extracts the key from the part after "Bearer" in the "Authorization" header, or from the "api-key" header, or from the "Ocp-Apim-Subscription-Key" header. I then override Ocp-Apim-Subscription-Key, because I need APIM to update context.Subscription?.Key, but this does not happen.

<!-- Resolve the client key: Authorization (Bearer) first, then api-key, then Ocp-Apim-Subscription-Key. -->
<set-variable name="clientKey" value="@{
    var auth = context.Request.Headers.GetValueOrDefault("Authorization", "");
    if (auth.StartsWith("Bearer ")) {
        var token = auth.Substring(7).Trim();
        if (token != "") { return token; }
    }

    var apiKey = context.Request.Headers.GetValueOrDefault("api-key", "").Trim();
    if (apiKey != "") { return apiKey; }

    var subKey = context.Request.Headers.GetValueOrDefault("Ocp-Apim-Subscription-Key", "").Trim();
    if (subKey != "") { return subKey; }

    return "";
}" />
<!-- Copy the resolved key into the header APIM normally reads for subscriptions. -->
<set-header name="Ocp-Apim-Subscription-Key" exists-action="override">
    <value>@((string)context.Variables["clientKey"])</value>
</set-header>
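
Since context.Subscription is not populated in this flow, the policy itself would have to reject calls that carry no key at all. The fragment below is only a minimal sketch of such a guard, assuming the clientKey variable set above. The status text and body message are illustrative, and the check only tests that a key is present, not that it is a valid APIM subscription key.

```xml
<!-- Sketch: place after the set-variable above, inside <inbound>. -->
<choose>
    <!-- clientKey is "" when none of the three headers carried a key. -->
    <when condition="@(string.IsNullOrEmpty(context.Variables.GetValueOrDefault<string>("clientKey", "")))">
        <return-response>
            <set-status code="401" reason="Unauthorized" />
            <set-header name="WWW-Authenticate" exists-action="override">
                <value>Bearer</value>
            </set-header>
            <!-- Illustrative message; adjust to your own conventions. -->
            <set-body>Missing key. Send it as Authorization: Bearer, api-key, or Ocp-Apim-Subscription-Key.</set-body>
        </return-response>
    </when>
</choose>
```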

For debugging purposes, if we use the outbound policy below, we see that clientKey is populated:

<return-response>
	<set-status code="200" reason="DEBUG" />
	<set-body>@((string)context.Variables["clientKey"])</set-body>
</return-response>

But if we use the outbound policy below, we see that context.Subscription is null:

<return-response>
	<set-status code="200" reason="DEBUG" />
	<set-body>
	 @{ return "Subscription Key=" + (context.Subscription?.Key); }
	</set-body>
</return-response>

So it seems that context.Subscription is evaluated earlier and is not re-evaluated afterwards.
As a result, apps like VS Code GitHub Copilot cannot be validated properly or monitored.

The policy does extract the subscription key, and the variable can be used, for example, by another policy to limit tokens per subscription, but it does not feed into APIM's internal mechanism.
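
As an illustration of limiting on the variable, here is a sketch that keys APIM's azure-openai-token-limit policy on clientKey instead of on the subscription. It is not the exact policy in use here: the limit of 5000 tokens per minute and the response header name x-remaining-tokens are placeholder assumptions. For LLM APIs that are not Azure OpenAI APIs, llm-token-limit would be the policy to look at instead.

```xml
<!-- Sketch: token limit counted per extracted key rather than per APIM subscription. -->
<!-- Place inside <inbound>, after clientKey has been set. -->
<azure-openai-token-limit
    counter-key="@(context.Variables.GetValueOrDefault<string>("clientKey", ""))"
    tokens-per-minute="5000"
    estimate-prompt-tokens="false"
    remaining-tokens-header-name="x-remaining-tokens" />
```

Callers that exceed the limit receive 429 Too Many Requests, but the counter lives outside context.Subscription, so it is not reflected in subscription-level reports.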

👁 APIM2

Any idea?

Pravallika KV 17,025 Reputation points • Microsoft External Staff • Moderator
Hi @Stavros Koureas ,

Thanks for reaching out to Microsoft Q&A.

It sounds like you’re trying to build a self-service / governance flow for AI usage on Azure AI Foundry using AI Gateway (APIM), where:

Users request access

Admins approve

Users get an API key / token

APIM (via AI Gateway) enforces limits/quotas and provides monitoring per user/subscription

The specific issue is that when clients (e.g. VS Code Copilot) send credentials as Authorization: Bearer <key>, extracting the key in an APIM policy and setting Ocp-Apim-Subscription-Key works for your own custom logic, but APIM's built-in subscription evaluation, throttling, and metrics still don't use it, so context.Subscription stays null.

What's happening:

In APIM, the platform’s subscription resolution and the population of context.Subscription are typically done as part of the APIM pipeline before your later inbound policy logic runs (or at least before the internal mechanisms that depend on that resolved subscription get updated). So even if you:

successfully parse the key into a variable (clientKey), and

override/set Ocp-Apim-Subscription-Key,

…the APIM internals that determine context.Subscription (and therefore subscription-based throttling/monitoring) may not be re-evaluated based on that overridden value. That matches your “clientKey is populated in debug, but context.Subscription is null” result.

From the Foundry “AI Gateway (APIM behind the scenes)” documentation, the key point is that AI Gateway uses APIM to govern, secure, and monitor Foundry resources and that token limits can be applied (including per-project controls) once AI Gateway is configured. In other words, you want the traffic to be governed by APIM’s normal mechanisms rather than just your custom variable parsing.

So, the recommendation to align with APIM’s expected subscription validation path is:

Ensure the client uses the header/query format APIM is configured to require

Your own screenshots show you can configure APIM to require:

Header name: Ocp-Apim-Subscription-Key

Query parameter: Ocp-Apim-Subscription-Key

If a client can’t be configured to send that header exactly (because it always uses Authorization: Bearer ...), then APIM may never correctly associate the request to a subscription, which in turn breaks subscription-scoped monitoring/throttling.

If the client can only send Authorization: Bearer ..., then treat this as an authentication mapping problem

Your policy extraction works for your own custom throttling variables, but it doesn’t hook into APIM’s subscription mechanism in time (your context.Subscription stays null).

So, for Copilot/VS Code style clients, the usual framing is:

you need a supported way to map the presented bearer token/identity to an APIM “subscription” (or to an APIM policy mechanism that doesn’t rely on context.Subscription)

otherwise APIM’s built-in subscription-based quotas/metrics won’t work as expected.

Because the provided documentation set doesn’t include any APIM policy pattern or official guidance for “rewrite Bearer -> make APIM treat it as Ocp-Apim-Subscription-Key for context.Subscription”, I can’t accurately suggest a specific policy approach that will reliably repopulate context.Subscription.

Confirm you’re using AI Gateway correctly and validating via APIM metrics/logs

AI Gateway verification steps include checking:

APIM Monitoring → Metrics → Requests

APIM Monitoring → Logs using the GatewayLogs table

ensuring any configured token limits produce 429 Too Many Requests when exceeded

That validation helps distinguish “gateway isn’t seeing the request properly” vs “APIM sees the request but can’t attribute it to a subscription.”

Stavros Koureas 16 Reputation points

Hi @Pravallika KV, this is exactly the case: the problem is that the API key cannot be mapped properly. It would be nice to have more flexibility here, for example by allowing a policy at this level to extract the subscription key. That would also make it possible to support other AI tools, such as VS Code OpenAI Codex and VS Code Claude Code.

Currently I have built my own custom throttling policy that uses my own variable rather than context.Subscription and the meter key. But there are still other issues: for example, the Developer portal cannot show users their statistics, because it identifies the user and subscription through this internal mechanism.
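
One possible way to get per-key usage numbers outside the Developer portal is to emit token metrics with a custom dimension. This is only a sketch under assumptions: it relies on Application Insights being connected to APIM with custom metrics enabled, the namespace ai-usage and the dimension name Client Key Suffix are made up for illustration, and it emits only the last four characters of the key so that the full key never lands in telemetry. It does not restore the Developer portal statistics.

```xml
<!-- Sketch: emit token usage tagged with a short, non-secret suffix of the extracted key. -->
<azure-openai-emit-token-metric namespace="ai-usage">
    <dimension name="API ID" />
    <dimension name="Client Key Suffix" value="@{
        var key = context.Variables.GetValueOrDefault<string>("clientKey", "");
        return key.Length > 4 ? key.Substring(key.Length - 4) : key;
    }" />
</azure-openai-emit-token-metric>
```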

I tested Azure Foundry directly, and it works with both the "api-key" header and the "Authorization" -> "Bearer xxxxxxxxxxxxx" header. So it makes sense for APIM to cover this too, since it is the AI Gateway for Foundry.

I tested many AI IDE tools, such as the VS Code GitHub Copilot, OpenAI Codex, and Anthropic Claude Code extensions, and AI desktop tools, such as OpenAI Codex desktop and Anthropic Claude Code desktop, and all of them emit the header "Authorization" -> "Bearer xxxxxxxxxxxxx".

Another idea was to put Azure Front Door in front of APIM to manipulate the header and remove the "Bearer " part, but Azure Front Door seems to have a similar limitation: it can match a header against a regex, but cannot rewrite the header with one.
Anshika Varshney 13,405 Reputation points • Microsoft External Staff • Moderator
Hi @Stavros Koureas

Thank you for the detailed feedback and for sharing your findings from testing various AI tools and authentication patterns.

I understand the challenge you're describing. Many AI clients and IDE integrations (such as OpenAI-compatible tools, coding assistants, and desktop AI applications) commonly send credentials using the:

Authorization: Bearer <token>

header rather than the subscription key mechanisms traditionally used by API Management subscriptions.

Given your testing results, it makes sense that additional flexibility in how APIM identifies and maps callers could simplify scenarios involving Azure AI Foundry and other OpenAI-compatible services.

Your observations regarding:

Custom throttling policies based on your own variables rather than context.Subscription

Difficulty exposing usage statistics through the Developer Portal

The need to support both api-key and Authorization: Bearer authentication patterns

Front Door limitations around header transformations

provide useful real-world feedback for this scenario.

From a practical standpoint, your current custom throttling approach appears to be a reasonable workaround when subscription-based identification doesn't align with the authentication model used by the client application.

Since you've already validated that Azure AI Foundry accepts both API key and Bearer token authentication methods, documenting these findings and sharing them as product feedback may help the team better understand the requirements for AI Gateway and OpenAI-compatible client integrations.

Thank you for sharing the testing results across multiple AI tools and IDE extensions. This additional context is helpful for others who may be implementing similar architectures using Azure AI Foundry, APIM, and AI Gateway capabilities.

Do let me know if you have any further queries.

Thank you!

Stavros Koureas 16 Reputation points

Hello @Anshika Varshney, thanks for your input, but I am wondering how to proceed from here, since there is currently no way to make everything work as expected, and we would like to keep using Azure Foundry in my organization.

Can you pass this feedback to the Azure API Management engineering team?

I think it would be easy to implement, either by allowing a policy or regex at this level or by letting us define the authentication scheme.
Stavros Koureas 16 Reputation points
After many hours over the past few days, we managed to find a workaround.

We put Azure Front Door in front of APIM and manipulated the header there.

It took us a while to understand how header manipulation works in Front Door. The solution is to match requests whose Authorization header begins with Bearer, and then override the header with this magic expression:

{http_req_header_authorization:7}

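The :7 is an offset into the header value, so the expression yields everything after the first 7 characters, that is, after "Bearer ". It would definitely be nice if Azure fixed this, but for now it does the trick.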

URL: https://learn.microsoft.com/en-us/answers/questions/5922844/create-self-service-ai-using-azure-foundry-with-us
