
Programmatically Comparing Draft vs Production Fabric Data Agent Responses

Using the Fabric data agent Python SDK


Principal Program Manager, Microsoft Fabric CAT, helping users and organizations build scalable, insightful, secure solutions. Blogs and opinions are my own and do not represent my employer.

A Fabric data agent has a draft and a published version. This lets the developer test configuration changes before publishing them.

You can also use the data agent Python SDK to test the agent programmatically. You can learn more about the SDK here and find notebook samples in this repo. Below, I show how to compare the data agent's responses from the two stages.

Imagine I am testing new instructions:

  • In the draft stage, I used the agent instruction: "Always return amounts rounded to nearest hundred, e.g. 1451 should be 1500, and 45,179 should be 45,200"

  • For the published stage, the instruction is: "Always return amounts with $xyz, e.g. $123.4"

I should get the same answer, formatted differently according to each stage's instructions: a rounded number from the draft version and a precise, $-prefixed amount from the production version.
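To know in advance what each stage should return, here is a minimal local sketch of the two formatting rules. The helper names `round_to_hundred` and `as_dollars` are hypothetical and not part of the SDK. The sketch assumes half-up rounding; note that Python's built-in `round()` rounds halves to even, so `round(1450, -2)` returns 1400.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to_hundred(amount):
    """Round half-up to the nearest hundred (the draft instruction), e.g. 1451 -> 1500."""
    # Decimal avoids float artifacts; ROUND_HALF_UP sends exact halves away from zero
    hundreds = (Decimal(str(amount)) / 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(hundreds * 100)


def as_dollars(amount):
    """Prefix with $ and add thousands separators (the published instruction), e.g. $123.4."""
    return f"${amount:,}"


total = 45179  # illustrative value, not a real query result
print(round_to_hundred(total), as_dollars(total))  # prints: 45200 $45,179
```

Whatever the true total is, the draft answer should match `round_to_hundred(total)` and the production answer should carry the unrounded figure.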

Code

The trick is to set the ai_skill_stage parameter of FabricOpenAI to "sandbox" (draft) or "production" (published).

```python
%pip install fabric-data-agent-sdk -q
```

```python
import time
from fabric.dataagent.client import FabricOpenAI

DATA_AGENT_NAME = "<DataAgentName>"
MODEL = "gpt-4o"

# One client per stage: "sandbox" targets the draft agent, "production" the published one
sbx  = FabricOpenAI(artifact_name=DATA_AGENT_NAME, ai_skill_stage="sandbox")
prod = FabricOpenAI(artifact_name=DATA_AGENT_NAME, ai_skill_stage="production")

# Create one assistant per client and keep only its id
asst_sbx  = sbx.beta.assistants.create(model=MODEL, instructions="You are the DRAFT (sandbox) data agent.").id
asst_prod = prod.beta.assistants.create(model=MODEL, instructions="You are the PUBLISHED (production) data agent.").id


def ask(client, assistant_id, q, *, timeout_s=300):
    """Ask one question on a fresh thread and return the assistant's final reply text."""
    # A new thread per question keeps earlier answers from leaking into the context
    tid = client.beta.threads.create().id
    client.beta.threads.messages.create(thread_id=tid, role="user", content=q)
    run = client.beta.threads.runs.create(thread_id=tid, assistant_id=assistant_id)

    # Poll every 2 seconds until the run reaches a terminal status or the timeout elapses
    end = time.time() + timeout_s
    while run.status not in {"completed", "failed", "cancelled", "expired", "incomplete"}:
        if time.time() > end:
            raise TimeoutError(f"timeout (status={run.status})")
        time.sleep(2)
        run = client.beta.threads.runs.retrieve(thread_id=tid, run_id=run.id)

    if run.status != "completed":
        raise RuntimeError(f"run status={run.status}")

    # Messages are listed newest first, so the first assistant message is the answer
    for m in client.beta.threads.messages.list(thread_id=tid, order="desc").data:
        if m.role == "assistant":
            return m.content[0].text.value
    return ""


def compare(q):
    """Ask the same question of the draft and the published agent."""
    return ask(sbx, asst_sbx, q), ask(prod, asst_prod, q)


q = "what's the total transaction amount"
draft, production = compare(q)

print("DRAFT:", draft)
print("\nPRODUCTION:", production)
```
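Reading the two printed answers works for one question. As a rough automated check, the sketch below pulls the first number out of each response and tests whether the draft figure is a multiple of 100 within 50 of the production figure, which is what the rounding instruction implies. `first_amount` and `amounts_agree` are hypothetical helpers, and taking the first number in the text is a naive heuristic: a year or a count mentioned earlier in the answer would fool it.

```python
import re
from decimal import Decimal

# Optional minus, digits with optional thousands separators, optional decimal part
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def first_amount(text):
    """Return the first number in text as a Decimal ($ and commas ignored), or None."""
    m = _NUMBER.search(text)
    return Decimal(m.group().replace(",", "")) if m else None


def amounts_agree(draft, production):
    """True if the draft amount looks like the production amount rounded to the nearest hundred."""
    d, p = first_amount(draft), first_amount(production)
    if d is None or p is None:
        return False
    # A tolerance of 50 accepts either tie-breaking rule at exact halves (e.g. 1450)
    return d % 100 == 0 and abs(d - p) <= 50
```

After running `compare(q)`, `amounts_agree(draft, production)` gives a quick yes/no on whether the two stages returned the same underlying total.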

Result

This is handy when you want to tune the data agent and compare its draft responses against production before publishing.
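When tuning instructions you will usually test more than one question. Below is a minimal sketch of a batch run, assuming you pass in callables that wrap the `ask` function from the code above. `compare_many` and `_safe_ask` are hypothetical names. Errors such as timeouts are recorded in the row instead of stopping the batch, and the questions run sequentially.

```python
from typing import Callable, Dict, List


def _safe_ask(ask_fn: Callable[[str], str], question: str) -> str:
    """Call ask_fn; turn any exception into an 'ERROR: ...' string so the batch keeps going."""
    try:
        return ask_fn(question)
    except Exception as exc:
        return f"ERROR: {exc}"


def compare_many(
    questions: List[str],
    ask_draft: Callable[[str], str],
    ask_prod: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Ask every question of both stages and collect the answers side by side."""
    rows = []
    for q in questions:
        rows.append({
            "question": q,
            "draft": _safe_ask(ask_draft, q),
            "production": _safe_ask(ask_prod, q),
        })
    return rows
```

In the notebook you could call `compare_many(questions, lambda q: ask(sbx, asst_sbx, q), lambda q: ask(prod, asst_prod, q))` and, if pandas is available, inspect the result with `pandas.DataFrame(rows)`.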