Programmatically Comparing Draft vs Production Fabric Data Agent Responses

A Fabric data agent has a draft stage and a published stage, so the developer can test configuration changes before publishing them.

You can also use the data agent SDK to test the agent programmatically. You can learn more about it here, and there are notebook samples in this repo. Let me show you how to compare the data agent's responses from the two stages.

Imagine I am testing new instructions:

In the draft stage, I used the agent instruction: Always return amounts rounded to nearest hundred, e.g. 1451 should be 1500, and 45,179 should be 45,200
In the published stage, the instruction is: Always return amounts with $xyz, e.g. $123.4

I should get the same answer from both stages, formatted differently based on the instructions: a rounded number from the draft and a precise amount with a $ sign from the published version.
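To make the expected difference concrete, here is a minimal sketch of the two formatting rules in plain Python. The helper names round_to_hundred and as_dollars are hypothetical and only illustrate what each set of instructions asks for; the agent itself applies the rules through its instructions, not through this code.

```python
def round_to_hundred(amount: float) -> int:
    """Draft rule: round a positive amount to the nearest hundred."""
    # Adding 0.5 and truncating rounds halves up, which avoids the
    # round-half-to-even behaviour of Python's built-in round().
    return int(amount / 100 + 0.5) * 100


def as_dollars(amount: float) -> str:
    """Published rule: prefix with $ and keep one decimal place."""
    return f"${amount:,.1f}"


print(round_to_hundred(1451))  # 1500
print(as_dollars(123.4))       # $123.4
```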

Code

The trick is to set the ai_skill_stage parameter to "sandbox" for the draft agent and "production" for the published agent when creating the client.

%pip install fabric-data-agent-sdk -q

import time
from fabric.dataagent.client import FabricOpenAI

DATA_AGENT_NAME = "<DataAgentName>"  # name of the data agent to test
MODEL = "gpt-4o"

# One client per stage: "sandbox" is the draft, "production" is the published agent
sbx  = FabricOpenAI(artifact_name=DATA_AGENT_NAME, ai_skill_stage="sandbox")
prod = FabricOpenAI(artifact_name=DATA_AGENT_NAME, ai_skill_stage="production")

# Create an assistant on each client and keep its id
asst_sbx  = sbx.beta.assistants.create(model=MODEL, instructions="You are the DRAFT (sandbox) data agent.").id
asst_prod = prod.beta.assistants.create(model=MODEL, instructions="You are the PUBLISHED (production) data agent.").id


def ask(client, assistant_id, q, *, timeout_s=300):
    """Send question q on a new thread and return the assistant's reply text."""
    tid = client.beta.threads.create().id
    client.beta.threads.messages.create(thread_id=tid, role="user", content=q)
    run = client.beta.threads.runs.create(thread_id=tid, assistant_id=assistant_id)

    # Poll every 2 seconds until the run reaches a terminal status or times out
    end = time.time() + timeout_s
    while run.status not in {"completed", "failed", "cancelled", "expired", "incomplete"}:
        if time.time() > end:
            raise TimeoutError(f"timeout (status={run.status})")
        time.sleep(2)
        run = client.beta.threads.runs.retrieve(thread_id=tid, run_id=run.id)

    if run.status != "completed":
        raise RuntimeError(f"run status={run.status}")

    # Messages are listed newest first, so the first assistant message is the latest reply
    for m in client.beta.threads.messages.list(thread_id=tid, order="desc").data:
        if m.role == "assistant":
            return m.content[0].text.value
    return ""


def compare(q):
    """Ask the same question in both stages; returns (draft, production)."""
    return ask(sbx, asst_sbx, q), ask(prod, asst_prod, q)


q = "what's the total transaction amount"
draft, production = compare(q)

print("DRAFT:", draft)
print("\nPRODUCTION:", production)
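Because both stages should return the same underlying number, the comparison can also be checked in code rather than by eye. The sketch below is not part of the SDK: extract_amounts and same_amount are hypothetical helpers, and the regular expression only handles simple figures such as 45,179 or $123.4.

```python
import re

# Matches figures such as 1500, 45,179 or 123.4
# (optional thousands separators and an optional decimal part).
_NUMBER = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?")


def extract_amounts(text: str) -> list[float]:
    """Return every number found in a response, ignoring $ signs and commas."""
    return [float(m.replace(",", "")) for m in _NUMBER.findall(text)]


def same_amount(draft_text: str, prod_text: str, tolerance: float = 50.0) -> bool:
    """True when the first number in each response agrees within tolerance.

    A tolerance of 50 allows for a draft answer rounded to the nearest hundred.
    """
    d, p = extract_amounts(draft_text), extract_amounts(prod_text)
    if not d or not p:
        return False  # nothing to compare in at least one response
    return abs(d[0] - p[0]) <= tolerance
```

In the notebook, same_amount(draft, production) can be called on the two strings returned by compare(q).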

Result

This is handy when you are tuning the data agent and want to compare its draft responses against production before publishing.
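When tuning, one question is rarely enough. As a rough sketch, the same idea can be extended to a list of test questions. compare_many is a hypothetical helper, and fake_compare is only a stand-in so the example runs without a Fabric workspace.

```python
from typing import Callable, Iterable


def compare_many(
    questions: Iterable[str],
    compare_fn: Callable[[str], tuple[str, str]],
) -> list[dict]:
    """Run each question through both stages and collect the answers side by side."""
    rows = []
    for q in questions:
        try:
            draft, production = compare_fn(q)
            error = None
        except Exception as exc:  # e.g. a timeout or a failed run
            draft, production, error = "", "", str(exc)
        rows.append({
            "question": q,
            "draft": draft,
            "production": production,
            "identical": error is None and draft.strip() == production.strip(),
            "error": error,
        })
    return rows


# Stand-in for the real comparison function (assumption: it returns two strings).
def fake_compare(q: str) -> tuple[str, str]:
    return f"draft answer to: {q}", f"production answer to: {q}"


for row in compare_many(["what's the total transaction amount"], fake_compare):
    print(row["question"], "|", row["draft"], "|", row["production"])
```

In the notebook, pass the compare function defined earlier instead of fake_compare. A failing question is recorded in the error field rather than stopping the whole batch.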
