azure-ai-inference-py

📁 microsoft/agent-skills 📅 Jan 29, 2026


Install command

npx skills add https://github.com/microsoft/agent-skills --skill azure-ai-inference-py


Azure AI Inference SDK for Python

Client library for Azure AI model inference with chat completions and embeddings.

Installation

pip install azure-ai-inference

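# With OpenTelemetry tracing (quote the extra so shells such as zsh don't expand the brackets)
pip install "azure-ai-inference[opentelemetry]"

# For Entra ID authentication (DefaultAzureCredential)
pip install azure-identity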

Environment Variables

# Inference endpoint
AZURE_INFERENCE_ENDPOINT=https://<resource>.services.ai.azure.com/models
AZURE_INFERENCE_CREDENTIAL=<your-api-key>  # If using API key

# Optional: specific model deployment
AZURE_INFERENCE_MODEL=gpt-4o-mini
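It can help to read these variables once and fail fast when the endpoint is missing. A minimal sketch, assuming a missing `AZURE_INFERENCE_CREDENTIAL` means you intend to use Entra ID; `InferenceSettings` and `load_settings` are hypothetical names, not part of the SDK.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class InferenceSettings:
    """Hypothetical container for the AZURE_INFERENCE_* variables."""
    endpoint: str
    api_key: Optional[str] = None  # None: fall back to Entra ID (assumption of this sketch)
    model: Optional[str] = None    # None: let a single-model endpoint pick the model

    @property
    def uses_api_key(self) -> bool:
        return bool(self.api_key)


def load_settings(env: Mapping[str, str] = os.environ) -> InferenceSettings:
    """Read the variables above, raising early if the endpoint is not set."""
    endpoint = env.get("AZURE_INFERENCE_ENDPOINT", "").strip()
    if not endpoint:
        raise RuntimeError("AZURE_INFERENCE_ENDPOINT is not set")
    return InferenceSettings(
        endpoint=endpoint,
        # Empty strings are treated the same as unset variables
        api_key=env.get("AZURE_INFERENCE_CREDENTIAL") or None,
        model=env.get("AZURE_INFERENCE_MODEL") or None,
    )


if __name__ == "__main__":
    settings = load_settings({"AZURE_INFERENCE_ENDPOINT": "https://example.services.ai.azure.com/models"})
    print(settings.endpoint, settings.uses_api_key)
```

Passing a plain dict instead of `os.environ` keeps the helper easy to test.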

Authentication

API Key

from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential
import os

client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"])
)

Entra ID (Recommended)

import os
from azure.ai.inference import ChatCompletionsClient
from azure.identity import DefaultAzureCredential

client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=DefaultAzureCredential()
)

Chat Completions

Basic Completion

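import os
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

# Read connection details from the environment variables listed above
endpoint = os.environ["AZURE_INFERENCE_ENDPOINT"]
key = os.environ["AZURE_INFERENCE_CREDENTIAL"]

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(key)
)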

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="What is Azure AI?")
    ],
    model="gpt-4o-mini"  # Optional for single-model endpoints
)

print(response.choices[0].message.content)
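Reading `message.content` directly hides why the model stopped. A minimal sketch, assuming the response shape used in the basic completion (`response.choices[0]` with `finish_reason` and `message.content`) and the finish reasons `"stop"`, `"length"`, `"content_filter"` and `"tool_calls"`; `first_choice_text` and `TruncatedResponseError` are hypothetical helpers, and the `SimpleNamespace` object only imitates a real response so the sketch runs offline.

```python
from types import SimpleNamespace
from typing import Any


class TruncatedResponseError(RuntimeError):
    """Raised when the model stopped because it hit the token limit."""


def first_choice_text(response: Any) -> str:
    """Return the first choice's text, refusing truncated or filtered output."""
    choice = response.choices[0]
    if choice.finish_reason == "length":
        raise TruncatedResponseError("Output hit the token limit; consider raising max_tokens")
    if choice.finish_reason == "content_filter":
        raise RuntimeError("Output was blocked by the content filter")
    # content can be None, for example when the model returned tool calls instead
    return choice.message.content or ""


if __name__ == "__main__":
    # Stand-in with the same attribute shape as a ChatCompletions response
    fake = SimpleNamespace(choices=[SimpleNamespace(
        finish_reason="stop",
        message=SimpleNamespace(content="Azure AI is a set of cloud AI services."),
    )])
    print(first_choice_text(fake))
```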

Streaming Completions

response = client.complete(
    stream=True,
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Write a poem about Azure.")
    ]
)

for update in response:
    if update.choices:
        print(update.choices[0].delta.content or "", end="")
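Streaming prints text as it arrives, but you often also need the full reply afterwards (for logging or to append to the conversation history). A minimal sketch that applies the same guards as the loop above: updates may carry no `choices`, and `delta.content` may be `None`. `collect_stream` is a hypothetical helper, and the test objects only imitate real stream updates.

```python
from typing import Any, Iterable


def collect_stream(updates: Iterable[Any], echo: bool = False) -> str:
    """Join streamed delta chunks into the complete reply text."""
    parts = []
    for update in updates:
        if not update.choices:
            continue  # skip updates without choices, as in the loop above
        piece = update.choices[0].delta.content or ""
        if echo:
            print(piece, end="", flush=True)
        parts.append(piece)
    if echo:
        print()  # end the line after the last chunk
    return "".join(parts)
```

With a real client this would be called as `full_text = collect_stream(client.complete(stream=True, messages=...), echo=True)`.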

With Default Settings

# Set defaults at client creation
client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(key),
    temperature=0.5,
    max_tokens=1000
)

# Defaults applied to all calls, can be overridden per-call
response = client.complete(
    messages=[UserMessage(content="Hello")],
    temperature=0.8  # Overrides default
)
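The precedence rule can be pictured as a dictionary merge in which per-call values win. This is a plain-Python model for reasoning about the behavior, not the SDK's internal code; `effective_settings` is hypothetical, and ignoring `None` overrides is an assumption of this sketch.

```python
from typing import Any, Dict


def effective_settings(defaults: Dict[str, Any], **overrides: Any) -> Dict[str, Any]:
    """Model of default/override precedence: per-call values replace client defaults.

    Overrides given as None are skipped so they do not erase a default
    (an assumption of this sketch, not documented SDK behavior).
    """
    merged = dict(defaults)
    merged.update({name: value for name, value in overrides.items() if value is not None})
    return merged


client_defaults = {"temperature": 0.5, "max_tokens": 1000}
print(effective_settings(client_defaults, temperature=0.8))
# {'temperature': 0.8, 'max_tokens': 1000}
```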

Embeddings

import os
from azure.ai.inference import EmbeddingsClient
from azure.core.credentials import AzureKeyCredential

client = EmbeddingsClient(
    endpoint="https://<resource>.services.ai.azure.com/models",
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"])
)

response = client.embed(
    input=["Your text string goes here"],
    model="text-embedding-3-small"
)

embedding = response.data[0].embedding
print(f"Embedding dimensions: {len(embedding)}")
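A common next step is comparing embeddings, for example to rank documents for semantic search. Passing several strings in `input=[...]` returns one entry per string in `response.data`. A minimal standard-library sketch of cosine similarity, assuming each `embedding` is a list of floats; `cosine_similarity` and `rank_by_similarity` are hypothetical helpers, and the toy 3-dimensional vectors stand in for real embeddings.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("Embeddings must have the same dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("Cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: Sequence[float], candidates: List[Sequence[float]]) -> List[int]:
    """Return candidate indices ordered from most to least similar to query."""
    scores = [cosine_similarity(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    query = [1.0, 0.0, 0.0]
    docs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [-1.0, 0.0, 0.0]]
    print(rank_by_similarity(query, docs))  # [1, 0, 2]
```

For large collections a vector index or NumPy would be faster; the pure-Python version just makes the formula explicit.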

Async Client

import asyncio
import os
from azure.ai.inference.aio import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

async def main():
    # "async with" closes the client even if the request raises
    async with ChatCompletionsClient(
        endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
        credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"])
    ) as client:
        response = await client.complete(
            messages=[
                SystemMessage(content="You are a helpful assistant."),
                UserMessage(content="What is Azure AI?")
            ]
        )
        print(response.choices[0].message.content)

asyncio.run(main())

Model Information

# Get model info (Serverless API / Managed Compute only)
model_info = client.get_model_info()

print(f"Model name: {model_info.model_name}")
print(f"Model provider: {model_info.model_provider_name}")
print(f"Model type: {model_info.model_type}")

Using load_client

from azure.ai.inference import load_client
from azure.core.credentials import AzureKeyCredential

# Auto-detect client type based on endpoint
client = load_client(
    endpoint=endpoint,
    credential=AzureKeyCredential(key)
)

print(f"Created client of type: {type(client).__name__}")

Tool Calling

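import json

from azure.ai.inference.models import (
    UserMessage, AssistantMessage, ToolMessage,
    ChatCompletionsToolDefinition, FunctionDefinition
)

def get_weather(location: str) -> str:
    # Placeholder: replace with a real weather lookup
    return json.dumps({"location": location, "forecast": "unknown"})

tools = [
    ChatCompletionsToolDefinition(
        function=FunctionDefinition(
            name="get_weather",
            description="Get current weather for a location",
            parameters={
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"}
                },
                "required": ["location"]
            }
        )
    )
]

messages = [UserMessage(content="What's the weather in Seattle?")]
response = client.complete(messages=messages, tools=tools)

# Handle tool calls in response
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    # Keep the assistant's tool-call turn in the conversation history
    messages.append(AssistantMessage(tool_calls=tool_calls))
    for tool_call in tool_calls:
        # Only get_weather is defined; its arguments arrive as a JSON string
        args = json.loads(tool_call.function.arguments)
        messages.append(ToolMessage(content=get_weather(**args), tool_call_id=tool_call.id))
    # Second call lets the model answer using the tool results
    response = client.complete(messages=messages, tools=tools)

print(response.choices[0].message.content)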
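As the number of tools grows, a name-to-function registry keeps the dispatch step generic. A minimal sketch; `TOOL_REGISTRY`, `tool`, `dispatch`, and the placeholder `get_weather` are hypothetical names, and the `SimpleNamespace` object only imitates the `tool_call.function.name` / `tool_call.function.arguments` shape. Each returned string would be sent back to the model as a `ToolMessage` tied to `tool_call.id`.

```python
import json
from types import SimpleNamespace
from typing import Any, Callable, Dict

# Hypothetical registry mapping tool names to local Python functions
TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Decorator that registers fn under its own function name."""
    TOOL_REGISTRY[fn.__name__] = fn
    return fn


@tool
def get_weather(location: str) -> dict:
    # Placeholder result; a real implementation would call a weather service
    return {"location": location, "forecast": "unknown"}


def dispatch(tool_call: Any) -> str:
    """Run one tool call and return its result as a JSON string."""
    fn = TOOL_REGISTRY.get(tool_call.function.name)
    if fn is None:
        return json.dumps({"error": f"unknown tool {tool_call.function.name!r}"})
    try:
        args = json.loads(tool_call.function.arguments or "{}")
        return json.dumps(fn(**args))
    except (json.JSONDecodeError, TypeError) as exc:
        # Report malformed or unexpected arguments to the model instead of crashing
        return json.dumps({"error": str(exc)})


if __name__ == "__main__":
    call = SimpleNamespace(
        id="call_1",
        function=SimpleNamespace(name="get_weather", arguments='{"location": "Seattle"}'),
    )
    print(dispatch(call))  # {"location": "Seattle", "forecast": "unknown"}
```

Returning errors as JSON lets the model see what went wrong and retry with corrected arguments.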

Message Types

Type	Description
`SystemMessage`	System instructions for the model
`UserMessage`	User input (text, images, audio)
`AssistantMessage`	Model responses
`ToolMessage`	Tool execution results
`DeveloperMessage`	Developer-level instructions

Client Types

Client	Purpose
`ChatCompletionsClient`	Chat completions (including streaming and tool calling)
`EmbeddingsClient`	Text embeddings
`ImageEmbeddingsClient`	Image embeddings
`load_client`	Factory function that auto-detects the client type from the endpoint

Best Practices

Use Entra ID for production authentication
Set defaults at client creation for consistent behavior
Use streaming for long responses so output appears incrementally
Close async clients explicitly or use context managers
Specify model when endpoint serves multiple deployments
Use load_client when you don’t know the endpoint type
No need to cache model_info yourself; the client caches it after the first get_model_info() call

Reference Files

File	Contents
references/streaming.md	Streaming responses, async iteration, SSE patterns, error handling
references/tool-calling.md	Function/tool calling patterns, tool registry, multi-turn conversations
scripts/chat_completion.py	CLI for chat completions, embeddings, and interactive mode
