Quick Test : Fabric AI Skills As A Function

Can AI Skills be used as a function to generate data?

In the last blog I wrote, I showed how to call the AI Skills endpoint in a Fabric notebook. Being able to call the endpoint programmatically creates many opportunities to integrate AI Skills in different applications. One that I thought of was using AI Skills as a function. Function Calling or Tools is a specific use case in Gen AI to create structured output based on a function or behavior instructed by the user. I am not referring to that as AI Skills can only return a table. Instead, what if you created a number of these AI Skills that are grounded in your data with specific functions built to get the intended output? You could serve/share these with end users who can call these functions to generate data/results. Think of these as macros in Excel. See the example below:

I have a table of customers and I need to get the total order amount for each customer for last 5 transactions and their last order date. Yes, you can easily write a Python function to achieve this but what if I could use natural language to get the data I need for my specific use case without knowing Python or any other language? AI Skills to the rescue.

NameLastN
0Krystal Lu11
1Anne Hernandez9
2Richard Bailey8
3Carlos Carter3
4Xavier Roberts12
5Rebecca Wright6

Structured Output

Below function is similar to the one I shared in the last blog with couple of changes:

  • Still using Pydantic but this time I am including schema and data validation to ensure the amount returned cannot be negative and date can be parse as a date. AI Skills returns the date as a string. This ensures that my downstream application receives valid (not necessarily accurate) data.

  • The function returns the results as a JSON in the format:

      #sample output
      {
          "headers": ["TotalOrderAmount", "LastPurchaseDate"],
          "data": [
              {
                  "TotalOrderAmount": 4347.0,
                  "LastPurchaseDate": "2021-10-18"
              },
              {
                  "TotalOrderAmount": 1234.56,
                  "LastPurchaseDate": null
              },
              {
                  "TotalOrderAmount": 5678.9,
                  "LastPurchaseDate": "2023-05-12"
              }
          ]
      }
    

    This is to ensure I can use it further downstream in another applications, just a nice to have.

    Ask AI Skills:

      %pip install semantic-link pydantic --q
    
      import sempy.fabric as fabric
      from pydantic import BaseModel, field_validator
      from typing import List, Optional, Union, Any
      import pandas as pd
      import requests
      from datetime import datetime
    
      client = fabric.FabricRestClient()
    
      def ask_aiskill(workspace: str, aiskill: str, question: str) -> Optional[dict]:
          """
          Sandeep Pawar  |  fabric.guru  |  Aug-08-2024  | v0.5
          Returns results from AI Skills as a valid JSON object
          Parameters:
           - workspace (str): The workspace id
           - aiskill (str): AI skill id
           - question (str): The user question to pass to the AI skill in English.
    
          Returns:
           - JSON object with headers and validated values
    
          """
    
          class ModelDetails(BaseModel):
              sqlQueryVariations: int
              showExecutedSQL: bool
              includeSQLExplanation: bool
              enableExplanations: bool
              enableSemanticMismatchDetection: bool
              executeSql: bool
              enableBlockAdditionalContextByLength: bool
              additionalContextLanguageDetection: bool
              fewShotExampleCount: int
    
          class Response(BaseModel):
              artifactId: str
              promptContextId: str
              queryEndpoint: str
              result: Union[str, float]
              ResultRows: Any 
              ResultHeaders: List[str]
              ResultTypes: List[str]
              executedSQL: str
              additionalMessage: Optional[Union[str, dict]]
              prompt: str
              userQuestion: str
              modelBehavior: ModelDetails
              additionalContext: str
    
          class ValidatedRow(BaseModel): ## for schema and dta validation
              TotalOrderAmount: Optional[float]
              LastPurchaseDate: Optional[str]
    
              @field_validator('TotalOrderAmount')
              def check_total_order_amount(cls, value):
                  if value is not None and value < 0:
                      raise ValueError('TotalOrderAmount cannot be negative')
                  return value
    
              @field_validator('LastPurchaseDate', mode='before')
              def validate_last_purchase_date(cls, value):
                  if value is None:
                      return value
                  try:
                      datetime.strptime(value, '%Y-%m-%d')
                  except ValueError:
                      raise ValueError('LastPurchaseDate must be a valid date in yyyy-mm-dd format or null')
                  return value
    
          payload = {"userQuestion": question}
          try:
              wsid = fabric.resolve_workspace_id(workspace)
          except:
              print("Check workspace name/id")    
          url = f"v1/workspaces/{wsid}/aiskills/{aiskill}/query/deployment"
    
          try:
              response = client.post(url, json=payload)
              result = response.json()
              aiskill_response = Response(**result)
    
              # Responses
              result_header = aiskill_response.ResultHeaders
              result_rows = aiskill_response.ResultRows
    
              # Validate and construct JSON
              validated_data = []
              for row in result_rows:
                  validated_row = ValidatedRow(
                      TotalOrderAmount=row[0],
                      LastPurchaseDate=row[1]
                  )
                  validated_data.append(validated_row.dict())
    
              return {'headers': result_header, 'data': validated_data}
    
          except requests.exceptions.HTTPError as h_err:
              print(f"HTTP error: {h_err}")
          except requests.exceptions.RequestException as r_err:
              print(f"Request error: {r_err}")
          except Exception as err:
              print(f"Error: {err}")
    
          return None
    
  • Use AI Skills as a Function:

    Apply above AI Skills as a function to below df:

  •     from tqdm import tqdm
        import pandas as pd
    
        def apply_aiskills(df, workspace, aiskill):
            """
            Apply the AI Skills to the specified df to generate results, experimental.
    
            """
            all_results = []  
            for _, row in tqdm(df.iterrows(), total=df.shape[0], desc="Processing Rows"):
                name = row['Name']
                last_n = row['LastN']
                question = f"total order amount by {name} and date of last purchase based on last {last_n} orders"
                result = ask_aiskill(workspace, aiskill, question)
                if result and 'data' in result:
                    for item in result['data']:
                        result_df = pd.DataFrame({
                            'Name': [name],
                            'TotalOrderAmount': [item.get('TotalOrderAmount')],
                            'LastPurchaseDate': [item.get('LastPurchaseDate')]
                        })
                        all_results.append(result_df)
                else:
                    result_df = pd.DataFrame({
                        'Name': [name],
                        'TotalOrderAmount': [None],
                        'LastPurchaseDate': [None]
                    })
                    all_results.append(result_df)
    
            # Combne teh dfs
            results_df = pd.concat(all_results, ignore_index=True)
            return results_df
    
        workspace = "fd91c6-xxxxx-89dbecdb27ea"
        aiskill = "846438f3-xxxxx-6210"
    
        results_df = apply_aiskills(df, workspace, aiskill)
        results_df
    

Here is the SQL query generated by AI Skills:

SELECT SUM(CAST(Quantity AS DECIMAL) * CAST(UnitPrice AS DECIMAL)) AS TotalOrderAmount, MAX(CAST(OrderDate AS DATE)) AS LastOrderDate
FROM (
    SELECT TOP 5 Quantity, UnitPrice, OrderDate
    FROM [dbo].[sales]
    WHERE FirstName = 'Krystal' AND LastName = 'Lu'
    ORDER BY OrderDate DESC
) subquery;

There you have it. Again, in this case it would have been easier to just write a Python function but here are the applications where I think this could potentially be used for :

  • Being able to ask natural language questions and generate structured data

  • Integrate in other applications

  • This AI Skills is grounded in lakehouse data but like Azure AI Copilot Studio or Google AI Studio what if AI Skills also supported unstructured data? That would be a game changer. Being able to generate structured responses like above from varying data sources (Lakehouse, Files, SharePoint, Onedrive etc.) would be fantastic.

  • Based on my tests, I am 99% sure I can build this as a Python function in Excel. In a notebook, you could use sparkSQL but what if you just published this AI Skills and user could just enter this as a function in an Excel cell and get the results? Amazing, right !!!

Notes

  • Take a look at the SQL query. There two columns in the table FirstName and LastName. I did not instruct the model to combine these two columns. My table has had just name, NL2SQL model combined it to mean Name. I did add additional context that unit price and quantity are varchar in the table and sales amount, order amount, revenue should be calculated as quantity * unit price.

  • This was more of an experiment than anything else. I am super excited for all the applications and opportunities that are possible.

  • I had to add instructions and give several SQL examples as few shot to ensure I get the response I want.

  • Here it is assumed that the user will systematically validate AI Skills before publishing. I will write more about it soon. It's a heavy and important topic so need to find more time.

  • I am not sure if I can batch the API calls to reduce latency, if anyone knows, let me know.

  • I would like to see an AI Skills Hub or Library in Fabric for discoverability. The hub can have all the published AI Skills I have access to, lineage of data sources used, example use cases, prompts used, examples, examples of where it doesn't work (based on developer's or user's submissions), validation etc. Making it open, discoverable, auditable and transparent will help gain trust and adoption.

  • I have few more experiments & applications in mind, stay tuned!

Did you find this article valuable?

Support Sandeep Pawar by becoming a sponsor. Any amount is appreciated!