Azure AI Inference

Logfire supports instrumenting calls to Azure AI Inference with the logfire.instrument_azure_ai_inference() method:

from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

import logfire

client = ChatCompletionsClient(
    endpoint='https://my-endpoint.inference.ai.azure.com',
    credential=AzureKeyCredential('my-api-key'),
)

logfire.configure()
logfire.instrument_azure_ai_inference(client)

response = client.complete(
    model='gpt-4',
    messages=[
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': 'Please write me a limerick about Python logging.'},
    ],
)
print(response.choices[0].message.content)

With that you get:

  • a span around the call which records the duration and captures any exceptions that might occur
  • a human-readable display of the conversation with the model
  • details of the response, including the number of tokens used

Installation

Install Logfire with the azure-ai-inference extra:

pip install 'logfire[azure-ai-inference]'
uv add 'logfire[azure-ai-inference]'

Methods covered

The following methods are covered:

  • ChatCompletionsClient.complete()
  • EmbeddingsClient.embed()

All methods are covered with both sync (azure.ai.inference) and async (azure.ai.inference.aio) clients.

Streaming Responses

When instrumenting streaming responses, Logfire creates two spans: one around the initial request and one around the streamed response.

from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

import logfire

client = ChatCompletionsClient(
    endpoint='https://my-endpoint.inference.ai.azure.com',
    credential=AzureKeyCredential('my-api-key'),
)

logfire.configure()
logfire.instrument_azure_ai_inference(client)

response = client.complete(
    model='gpt-4',
    messages=[{'role': 'user', 'content': 'Write Python to show a tree of files.'}],
    stream=True,
)
for chunk in response:
    if chunk.choices:
        delta = chunk.choices[0].delta
        if delta and delta.content:
            print(delta.content, end='', flush=True)

Embeddings

You can also instrument the EmbeddingsClient:

from azure.ai.inference import EmbeddingsClient
from azure.core.credentials import AzureKeyCredential

import logfire

client = EmbeddingsClient(
    endpoint='https://my-endpoint.inference.ai.azure.com',
    credential=AzureKeyCredential('my-api-key'),
)

logfire.configure()
logfire.instrument_azure_ai_inference(client)

response = client.embed(
    model='text-embedding-ada-002',
    input=['Hello world'],
)
print(len(response.data[0].embedding))

Async Support

Async clients from azure.ai.inference.aio are fully supported:

from azure.ai.inference.aio import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

import logfire

client = ChatCompletionsClient(
    endpoint='https://my-endpoint.inference.ai.azure.com',
    credential=AzureKeyCredential('my-api-key'),
)

logfire.configure()
logfire.instrument_azure_ai_inference(client)

Global Instrumentation

If no client is passed, the ChatCompletionsClient and EmbeddingsClient classes (both sync and async) are instrumented globally:

import logfire

logfire.configure()
logfire.instrument_azure_ai_inference()