![]() |
VOOZH | about |
dotnet add package Azure.AI.Inference --version 1.0.0-beta.5
NuGet\Install-Package Azure.AI.Inference -Version 1.0.0-beta.5
<PackageReference Include="Azure.AI.Inference" Version="1.0.0-beta.5" />
<PackageVersion Include="Azure.AI.Inference" Version="1.0.0-beta.5" />Directory.Packages.props
<PackageReference Include="Azure.AI.Inference" />Project file
paket add Azure.AI.Inference --version 1.0.0-beta.5
#r "nuget: Azure.AI.Inference, 1.0.0-beta.5"
#:package Azure.AI.Inference@1.0.0-beta.5
#addin nuget:?package=Azure.AI.Inference&version=1.0.0-beta.5&prereleaseInstall as a Cake Addin
#tool nuget:?package=Azure.AI.Inference&version=1.0.0-beta.5&prereleaseInstall as a Cake Tool
The client library (in preview) does inference, including chat completions, for AI models deployed by Azure AI Foundry and Azure Machine Learning Studio. It supports Serverless API endpoints and Managed Compute endpoints (formerly known as Managed Online Endpoints). The client library makes services calls using REST API version 2024-05-01-preview, as documented in Azure AI Model Inference API. For more information see Overview: Deploy AI models in Azure AI Foundry portal.
Use the model inference client library to:
With some minor adjustments, this client library can also be configured to do inference for Azure OpenAI endpoints. See samples with azure_openai in their name, in the samples folder.
Product documentation | Samples | API reference documentation | Package (NuGet) | SDK source code
https://your-host-name.your-azure-region.inference.ai.azure.com, where your-host-name is your unique model deployment host name and your-azure-region is the Azure region where the model is deployed (e.g. eastus2).Install the client library for .NET with NuGet:
dotnet add package Azure.AI.Inference --prerelease
The package makes use of common Azure credential providers. To use credential providers provided with the Azure SDK, please install the Azure.Identity package:
dotnet add package Azure.Identity
The package includes ChatCompletionsClient . It is created by providing your endpoint and credential information to the object:
var endpoint = new Uri(System.Environment.GetEnvironmentVariable("AZURE_AI_CHAT_ENDPOINT"));
var credential = new AzureKeyCredential(System.Environment.GetEnvironmentVariable("AZURE_AI_CHAT_KEY"));
var client = new ChatCompletionsClient(endpoint, credential, new AzureAIInferenceClientOptions());
All clients provide a get_model_info method to retrive AI model information. This makes a REST call to the /info route on the provided endpoint, as documented in the REST API reference.
var endpoint = new Uri(System.Environment.GetEnvironmentVariable("AZURE_AI_CHAT_ENDPOINT"));
var credential = new AzureKeyCredential(System.Environment.GetEnvironmentVariable("AZURE_AI_CHAT_KEY"));
var client = new ChatCompletionsClient(endpoint, credential, new AzureAIInferenceClientOptions());
Response<ModelInfo> modelInfo = client.GetModelInfo();
Console.WriteLine($"Model name: {modelInfo.Value.ModelName}");
Console.WriteLine($"Model type: {modelInfo.Value.ModelType}");
Console.WriteLine($"Model provider name: {modelInfo.Value.ModelProviderName}");
AI model information is cached in the client, and futher calls to get_model_info will access the cached value and wil not result in a REST API call.
The ChatCompletionsClient has a method named complete. The method makes a REST API call to the /chat/completions route on the provided endpoint, as documented in the REST API reference.
See simple chat completion examples below. More can be found in the samples folder.
The EmbeddingsClient has a method named embed. The method makes a REST API call to the /embeddings route on the provided endpoint, as documented in the REST API reference.
See simple text embedding example below. More can be found in the samples folder.
The REST API defines common model parameters for chat completions. If the model you are targeting has additional parameters you would like to set, the client library allows you easily do so. See Chat completions with additional model-specific parameters.
The request and response payloads of the Azure AI Model Inference API is mostly compatible with OpenAI REST APIs for chat completions. Therefore, with some minor adjustments, this client library can be configured to do inference using Azure OpenAI endpoints. See samples with azure_openai in their name, in the samples folder, and the comments there.
We guarantee that all client instance methods are thread-safe and independent of each other (guideline). This ensures that the recommendation of reusing client instances is always safe, even across threads.
Client options | Accessing the response | Long-running operations | Handling failures | Diagnostics | Mocking | Client lifetime
In the following sections you will find simple examples of:
The examples create a client as mentioned in Create and authenticate a client directly, using key. Only mandatory input settings are shown for simplicity.
See the Samples folder for full working samples for synchronous and asynchronous handling.
This example demonstrates how to generate a single chat completions, with key authentication.
var endpoint = new Uri(System.Environment.GetEnvironmentVariable("AZURE_AI_CHAT_ENDPOINT"));
var credential = new AzureKeyCredential(System.Environment.GetEnvironmentVariable("AZURE_AI_CHAT_KEY"));
var client = new ChatCompletionsClient(endpoint, credential, new AzureAIInferenceClientOptions());
var requestOptions = new ChatCompletionsOptions()
{
Messages =
{
new ChatRequestSystemMessage("You are a helpful assistant."),
new ChatRequestUserMessage("How many feet are in a mile?"),
},
};
Response<ChatCompletions> response = client.Complete(requestOptions);
System.Console.WriteLine(response.Value.Content);
The following types or messages are supported: SystemMessage,UserMessage, AssistantMessage, ToolMessage. See also samples:
UserMessage that includes sending an image URL or image data from a local file.ToolMessage.Alternatively, you can read a BinaryData object based on a JSON string instead of using the strongly typed classes like ChatRequestSystemMessage and ChatRequestUserMessage:
var endpoint = new Uri(System.Environment.GetEnvironmentVariable("AZURE_AI_CHAT_ENDPOINT"));
var credential = new AzureKeyCredential(System.Environment.GetEnvironmentVariable("AZURE_AI_CHAT_KEY"));
var client = new ChatCompletionsClient(endpoint, credential, new AzureAIInferenceClientOptions());
var requestOptions = new ChatCompletionsOptions()
{
Messages =
{
new ChatRequestSystemMessage("You are a helpful assistant."),
new ChatRequestUserMessage("How many feet are in a mile?"),
},
};
string jsonMessages = "{\"messages\": [{\"role\": \"system\", \"content\": \"You are a helpful assistant.\"}, {\"role\": \"user\", \"content\": \"How many feet are in a mile?\"}]}";
BinaryData messages = BinaryData.FromString(jsonMessages);
requestOptions = ModelReaderWriter.Read<ChatCompletionsOptions>(messages);
Response<ChatCompletions> response = client.Complete(requestOptions);
System.Console.WriteLine(response.Value.Content);
To generate completions for additional messages, simply call client.Complete multiple times using the same client.
This example demonstrates how to generate a single chat completions with streaming response, with key authentication.
var endpoint = new Uri(System.Environment.GetEnvironmentVariable("AZURE_AI_CHAT_ENDPOINT"));
var credential = new AzureKeyCredential(System.Environment.GetEnvironmentVariable("AZURE_AI_CHAT_KEY"));
var client = new ChatCompletionsClient(endpoint, credential, new AzureAIInferenceClientOptions());
var requestOptions = new ChatCompletionsOptions()
{
Messages =
{
new ChatRequestSystemMessage("You are a helpful assistant."),
new ChatRequestUserMessage("How many feet are in a mile?"),
},
};
StreamingResponse<StreamingChatCompletionsUpdate> response = await client.CompleteStreamingAsync(requestOptions);
StringBuilder contentBuilder = new();
await foreach (StreamingChatCompletionsUpdate chatUpdate in response)
{
if (!string.IsNullOrEmpty(chatUpdate.ContentUpdate))
{
contentBuilder.Append(chatUpdate.ContentUpdate);
}
}
System.Console.WriteLine(contentBuilder.ToString());
In the above foreach loop, the updates are progressively added to a string builder as they are streamed in, and then printed out once complete. The updates could be printed as they come in as well.
To generate completions for additional messages, simply call client.complete multiple times using the same client.
In this example, extra JSON elements are inserted at the root of the request body by setting AdditonalProperties when calling the Complete method. These are intended for AI models that require extra parameters beyond what is defined in the REST API.
Note that by default, the service will reject any request payload that includes unknown parameters (ones that are not defined in the REST API Request Body table). In order to change the default service behaviour, when the Complete method includes AdditonalProperties, the client library will automatically add the HTTP request header "unknown_params": "pass-through".
Azure_AI_Inference_ChatCompletionsWithAdditionalPropertiesScenario
var endpoint = new Uri(System.Environment.GetEnvironmentVariable("AZURE_AI_CHAT_ENDPOINT"));
var credential = new AzureKeyCredential(System.Environment.GetEnvironmentVariable("AZURE_AI_CHAT_KEY"));
var client = new ChatCompletionsClient(endpoint, credential, new AzureAIInferenceClientOptions());
var requestOptions = new ChatCompletionsOptions()
{
Messages =
{
new ChatRequestSystemMessage("You are a helpful assistant."),
new ChatRequestUserMessage("How many feet are in a mile?"),
},
AdditionalProperties = { { "foo", BinaryData.FromString("\"bar\"") } }, // Optional, add additional properties to the request to pass to the model
};
Response<ChatCompletions> response = client.Complete(requestOptions);
System.Console.WriteLine(response.Value.Choices[0].Message.Content);
This example demonstrates how to get text embeddings, with key authentication, assuming endpoint and key are already defined.
var endpoint = new Uri(System.Environment.GetEnvironmentVariable("AZURE_AI_EMBEDDINGS_ENDPOINT"));
var credential = new AzureKeyCredential(System.Environment.GetEnvironmentVariable("AZURE_AI_EMBEDDINGS_KEY"));
var client = new EmbeddingsClient(endpoint, credential, new AzureAIInferenceClientOptions());
var input = new List<string> { "King", "Queen", "Jack", "Page" };
var requestOptions = new EmbeddingsOptions(input);
Response<EmbeddingsResult> response = client.Embed(requestOptions);
foreach (EmbeddingItem item in response.Value.Data)
{
List<float> embedding = item.Embedding.ToObjectFromJson<List<float>>();
Console.WriteLine($"Index: {item.Index}, Embedding: <{string.Join(", ", embedding)}>");
}
The length of the embedding vector depends on the model, but you should see something like this:
data[0]: length=1024, [0.0013399124, -0.01576233, ..., 0.007843018, 0.000238657]
data[1]: length=1024, [0.036590576, -0.0059547424, ..., 0.011405945, 0.004863739]
data[2]: length=1024, [0.04196167, 0.029083252, ..., -0.0027484894, 0.0073127747]
To generate embeddings for additional phrases, simply call client.embed multiple times using the same client.
Azure AI Inference client library supports tracing and metrics with OpenTelemetry. Refer to Azure SDK Diagnostics documentation for general information on OpenTelemetry support in Azure client libraries.
Distributed tracing and metrics with OpenTelemetry are supported in Azure AI Inference in experimental mode and could be enabled through either of these steps:
AZURE_EXPERIMENTAL_ENABLE_ACTIVITY_SOURCE environment variable to true.Azure.Experimental.EnableActivitySource context switch to true in your application codeRefer to Azure Monitor documentation on how to use Azure Monitor OpenTelemetry Distro.
With the Azure Monitor OpenTelemetry Distro, you only need to opt-into Azure SDK experimental telemetry features with one of the ways documented at the beginning of this section. The distro enables activity sources and meters for Azure AI Inference automatically.
The following section provides an example on how to configure OpenTelemetry and enable Azure AI Inference tracing and metrics if your OpenTelemetry distro does not include Azure AI Inference by default.
In this example we're going to export traces and metrics to console, and to the local OTLP destination. Aspire dashboard can be used for local testing and exploration.
To run this example, you'll need to install the following dependencies (HTTP tracing and metrics instrumentation as well as console and OTLP exporters):
dotnet add package OpenTelemetry.Instrumentation.Http
dotnet add package OpenTelemetry.Exporter.Console
dotnet add package OpenTelemetry.Exporter.OpenTelemetryProtocol
These packages also bring OpenTelemetry SDK as a dependency.
// Enables experimental Azure SDK observability
AppContext.SetSwitch("Azure.Experimental.EnableActivitySource", true);
// By default instrumentation captures chat messages without content
// since content can be very verbose and have sensitive information.
// The following AppContext switch enables content recording.
AppContext.SetSwitch("Azure.Experimental.TraceGenAIMessageContent", true);
using var tracerProvider = Sdk.CreateTracerProviderBuilder()
.AddHttpClientInstrumentation()
.AddSource("Azure.AI.Inference.*")
.ConfigureResource(r => r.AddService("sample"))
.AddConsoleExporter()
.AddOtlpExporter()
.Build();
using var meterProvider = Sdk.CreateMeterProviderBuilder()
.AddHttpClientInstrumentation()
.AddMeter("Azure.AI.Inference.*")
.ConfigureResource(r => r.AddService("sample"))
.AddConsoleExporter()
.AddOtlpExporter()
.Build();
Check out OpenTelemetry .NET and your observability provider documentation on how to configure OpenTelemetry.
The complete, get_model_info methods raise a RequestFailedException for a non-success HTTP status code response from the service. The exception's code will hold the HTTP response status code. The exception's message contains a detailed message that may be helpful in diagnosing the issue:
try
{
client.Complete(requestOptions);
}
catch (RequestFailedException e)
{
Console.WriteLine($"Exception status code: {e.Status}");
Console.WriteLine($"Exception message: {e.Message}");
Assert.IsTrue(e.Message.Contains("Extra inputs are not permitted"));
}
To report issues with the client library, or request additional features, please open a GitHub issue here
Have a look at the Samples folder, containing fully runnable C# code for doing inference using synchronous and asynchronous methods.
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
| Product | Versions Compatible and additional computed target framework versions. |
|---|---|
| .NET | net5.0 net5.0 was computed. net5.0-windows net5.0-windows was computed. net6.0 net6.0 was computed. net6.0-android net6.0-android was computed. net6.0-ios net6.0-ios was computed. net6.0-maccatalyst net6.0-maccatalyst was computed. net6.0-macos net6.0-macos was computed. net6.0-tvos net6.0-tvos was computed. net6.0-windows net6.0-windows was computed. net7.0 net7.0 was computed. net7.0-android net7.0-android was computed. net7.0-ios net7.0-ios was computed. net7.0-maccatalyst net7.0-maccatalyst was computed. net7.0-macos net7.0-macos was computed. net7.0-tvos net7.0-tvos was computed. net7.0-windows net7.0-windows was computed. net8.0 net8.0 is compatible. net8.0-android net8.0-android was computed. net8.0-browser net8.0-browser was computed. net8.0-ios net8.0-ios was computed. net8.0-maccatalyst net8.0-maccatalyst was computed. net8.0-macos net8.0-macos was computed. net8.0-tvos net8.0-tvos was computed. net8.0-windows net8.0-windows was computed. net9.0 net9.0 was computed. net9.0-android net9.0-android was computed. net9.0-browser net9.0-browser was computed. net9.0-ios net9.0-ios was computed. net9.0-maccatalyst net9.0-maccatalyst was computed. net9.0-macos net9.0-macos was computed. net9.0-tvos net9.0-tvos was computed. net9.0-windows net9.0-windows was computed. net10.0 net10.0 was computed. net10.0-android net10.0-android was computed. net10.0-browser net10.0-browser was computed. net10.0-ios net10.0-ios was computed. net10.0-maccatalyst net10.0-maccatalyst was computed. net10.0-macos net10.0-macos was computed. net10.0-tvos net10.0-tvos was computed. net10.0-windows net10.0-windows was computed. |
| .NET Core | netcoreapp2.0 netcoreapp2.0 was computed. netcoreapp2.1 netcoreapp2.1 was computed. netcoreapp2.2 netcoreapp2.2 was computed. netcoreapp3.0 netcoreapp3.0 was computed. netcoreapp3.1 netcoreapp3.1 was computed. |
| .NET Standard | netstandard2.0 netstandard2.0 is compatible. netstandard2.1 netstandard2.1 was computed. |
| .NET Framework | net461 net461 was computed. net462 net462 was computed. net463 net463 was computed. net47 net47 was computed. net471 net471 was computed. net472 net472 was computed. net48 net48 was computed. net481 net481 was computed. |
| MonoAndroid | monoandroid monoandroid was computed. |
| MonoMac | monomac monomac was computed. |
| MonoTouch | monotouch monotouch was computed. |
| Tizen | tizen40 tizen40 was computed. tizen60 tizen60 was computed. |
| Xamarin.iOS | xamarinios xamarinios was computed. |
| Xamarin.Mac | xamarinmac xamarinmac was computed. |
| Xamarin.TVOS | xamarintvos xamarintvos was computed. |
| Xamarin.WatchOS | xamarinwatchos xamarinwatchos was computed. |
Showing the top 5 NuGet packages that depend on Azure.AI.Inference:
| Package | Downloads |
|---|---|
|
Microsoft.Extensions.AI.AzureAIInference
Implementation of generative AI abstractions for Azure.AI.Inference. |
|
|
AutoGen.AzureAIInference
Azure AI Inference Intergration for AutoGen. |
|
|
Aspire.Azure.AI.Inference
A client for Azure AI Inference SDK that integrates with Aspire, including logging and telemetry. |
|
|
Azure.Projects
Azure.Projects simplifies getting started with Azure in .NET. |
|
|
JS.Abp.AI.Azure
Package Description |
Showing the top 7 popular GitHub repositories that depend on Azure.AI.Inference:
| Repository | Stars |
|---|---|
|
microsoft/aspire
Aspire is the tool for code-first, extensible, observable dev and deploy.
|
|
|
danielgerlag/workflow-core
Lightweight workflow engine for .NET Standard
|
|
|
axzxs2001/Asp.NetCoreExperiment
原来所有项目都移动到**OleVersion**目录下进行保留。新的案例装以.net 5.0为主,一部分对以前案例进行升级,一部分将以前的工作经验总结出来,以供大家参考!
|
|
|
rwjdk/MicrosoftAgentFrameworkSamples
Samples demonstrating the Microsoft Agent Framework in C#
|
|
|
bingbing-gui/dotnet-agent-playbook
一个面向 .NET + AI Agent 开发的实践型仓库,涵盖 Web、云原生与微服务场景,聚焦智能应用的工程化落地。
|
|
|
microsoft/AIforITOps
Workshop for IT/Ops teams to learn how to manage AI-enabled applications on Microsoft Azure.
|
|
|
AzureCosmosDB/cosmosdb-nosql-copilot
Build a copilot application with Azure OpenAI Service, Azure Cosmos DB & Azure App Service.
|
| Version | Downloads | Last Updated |
|---|---|---|
| 1.0.0-beta.5 | 839,915 | 5/14/2025 |
| 1.0.0-beta.4 | 962,352 | 3/19/2025 |
| 1.0.0-beta.3 | 162,254 | 2/14/2025 |
| 1.0.0-beta.2 | 385,306 | 10/24/2024 |
| 1.0.0-beta.1 | 191,601 | 8/7/2024 |