# Extracting Fabric Spark Driver Logs Using API

I am back from another successful Fabric Conference. Hundreds of new features have been released, and Marc Reguera has an excellent [list of all the blogs published](https://www.linkedin.com/pulse/all-published-blogs-from-fabcon-marc-reguera-raxec/?trackingId=0VqW5vOuzs%2BqJ42wmcvnfw%3D%3D). However, not everything made it to the official blog. One feature I was eagerly anticipating is the API to get Spark driver logs. Previously, you could save the logs to [Log Analytics, blob storage, or an Eventhouse](https://blog.fabric.microsoft.com/en-US/blog/announcing-the-fabric-apache-spark-diagnostic-emitter-collect-logs-and-metrics/), but there was no API to access the logs, until now. In this blog, I use Semantic Link to get the latest log for a PySpark notebook. You can use a similar approach for a Spark Job Definition (SJD). I use user identity below, but the API also supports service principals (SPN). You can read my [last blog](https://fabric.guru/using-service-principal-authentication-with-fabricrestclient) to learn how to use SPN authentication with `FabricRestClient`. Semantic Link is optional: you can use `requests` or any other method to call the API, including the brand new [Fabric CLI](https://microsoft.github.io/fabric-cli/).

Read the official documentation for details, requirements and limitations: [Get Spark driver logs using Spark monitoring APIs - Microsoft Fabric | Microsoft Learn](https://learn.microsoft.com/en-us/fabric/data-engineering/driver-log)

## Get the session info

The function below gets the Livy ID and application ID of a notebook's most recent session. If you want to extract the logs for all sessions, modify the function (see the commented part in the code).

```python
## Python notebook or Pyspark notebook

import sempy.fabric as fabric
from sempy.fabric.exceptions import FabricHTTPException
import re
import json
import time

def get_latest_session_info(notebook_id, workspace_id):
    """
    By: Sandeep Pawar | fabric.guru | Apr 06, 2025

    Get the latest Livy session and application ID for a specific notebook.   
    Returns a dictionary with livyId and applicationId if found

    Read the API documentation for requirements, limitations and details.
    """
    client = fabric.FabricRestClient()
    workspace_id = fabric.resolve_workspace_id(workspace_id)    
    try:
        url = f"v1/workspaces/{workspace_id}/items/{notebook_id}"
        response = client.get(url)
        
        if response.status_code != 200:
            print(f"Error getting notebook info: {response.status_code}")
            return None
            
        notebook_info = response.json()
        print(f"Found notebook: {notebook_info.get('displayName', 'Unknown')}")
    except Exception as e:
        print(f"Error accessing notebook: {str(e)}")
        return None

    try:
        # different endpoints to get the session info
        urls = [
            f"v1/workspaces/{workspace_id}/notebooks/{notebook_id}/livySessions",
            f"v1/workspaces/{workspace_id}/notebooks/{notebook_id}/activeSessions"
        ]
        
        for url in urls:
            try:
                print(f"Trying endpoint: {url}")
                response = client.get(url)
                
                if response.status_code == 200:
                    sessions = response.json().get('value', [])
                    if sessions:
                        print(f"Found {len(sessions)} sessions")
                        ## GET THE LATEST LOG ONLY ##
                        ## IF YOU WANT ALL LOGS, SAVE ALL SESSIONS ##                             
                        if 'creationTime' in sessions[0]: 
                            sessions.sort(key=lambda x: x.get('creationTime', ''), reverse=True)
                        
                        latest_session = sessions[0]
                        livy_id = latest_session.get('livyId')
                        app_id = latest_session.get('applicationId')
                        
                        if not app_id:
                            session_str = json.dumps(latest_session)
                            app_match = re.search(r'application_\d+_\d+', session_str)
                            if app_match:
                                app_id = app_match.group(0)
                                
                        if livy_id and app_id:
                            print(f"Found livyId: {livy_id}")
                            print(f"Found applicationId: {app_id}")
                            return {"livyId": livy_id, "applicationId": app_id}
            except Exception as e:
                print(f"Error {url}: {str(e)}")
                continue
    except Exception as e:
        print(f"Error: {str(e)}")
    
    print("Could not find livy session and application ID info")
    return None
```
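If you want every session rather than only the latest one, the selection step can be pulled into a small helper. This is a minimal sketch and not part of the function above: `extract_all_sessions` is a hypothetical name, and it assumes the same `livyId`, `applicationId` and `creationTime` fields that `get_latest_session_info` looks for, with the same regex fallback for the application ID. Sorting the timestamps as strings only works if they all share one ISO-style format.

```python
import json
import re

def extract_all_sessions(sessions):
    """
    Hypothetical helper: return livyId/applicationId pairs for every
    session, newest first.

    `sessions` is the list under the 'value' key of the API response.
    Field names are the ones get_latest_session_info assumes.
    """
    # Newest first; sessions without a creationTime sort last
    ordered = sorted(sessions, key=lambda s: s.get("creationTime") or "", reverse=True)

    results = []
    for session in ordered:
        livy_id = session.get("livyId")
        app_id = session.get("applicationId")

        # Same fallback as above: look for an application id anywhere in the payload
        if not app_id:
            match = re.search(r"application_\d+_\d+", json.dumps(session))
            app_id = match.group(0) if match else None

        # Skip sessions where either id is missing
        if livy_id and app_id:
            results.append({"livyId": livy_id, "applicationId": app_id})
    return results
```

Each returned pair can then be passed to a log-retrieval call one session at a time.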

## Get the driver log

The function below retrieves the driver log for a given Livy ID and application ID. If you don't pass them, it looks up the latest session using the function above.

```python
## Python notebook or Pyspark notebook

def get_driver_logs(notebook_id, workspace_id, livy_id=None, app_id=None, file_name="stderr"):
    """
    By: Sandeep Pawar | fabric.guru | Apr 06, 2025

    Get driver logs for a pyspark notebook using the API. Returns log data or metadata
    Read the API documentation for requirements, limitations and details.

    """
    workspace_id = fabric.resolve_workspace_id(workspace_id)
    
    if not livy_id or not app_id:
        session_info = get_latest_session_info(notebook_id, workspace_id)
        if not session_info:
            return None
            
        livy_id = session_info.get('livyId')
        app_id = session_info.get('applicationId')
    
    client = fabric.FabricRestClient()
        
    try:
        meta_url = f"v1/workspaces/{workspace_id}/notebooks/{notebook_id}/livySessions/{livy_id}/applications/{app_id}/logs?type=driver&meta=true&fileName={file_name}"
        print(f"Getting log metadata: {meta_url}")
        meta_response = client.get(meta_url)
        
        if meta_response.status_code != 200:
            print(f"Error: {meta_response.status_code}")
            return None
            
        log_meta = meta_response.json()
        print(f"Successfully retrieved metadata")
        
        # logs
        content_url = f"v1/workspaces/{workspace_id}/notebooks/{notebook_id}/livySessions/{livy_id}/applications/{app_id}/logs?type=driver&fileName={file_name}"
        print(f"Getting logs: {content_url}")
        content_response = client.get(content_url)
        
        if content_response.status_code != 200:
            print(f"Error: {content_response.status_code}")
            return {"metadata": log_meta}
            
        log_content = content_response.text
        return {"metadata": log_meta, "content": log_content}
    except Exception as e:
        print(f"Error: {str(e)}")
        return None
```
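Since Semantic Link is optional, here is a rough sketch of the same call with `requests`. Treat it as an illustration under assumptions: `build_driver_log_url` and `fetch_driver_log` are hypothetical names, the base URL is the public Fabric REST endpoint, `workspace_id` must be the workspace GUID (nothing here resolves a name), and you need to supply a valid bearer token yourself; acquiring the token is not shown.

```python
FABRIC_API_BASE = "https://api.fabric.microsoft.com"

def build_driver_log_url(workspace_id, notebook_id, livy_id, app_id,
                         file_name="stderr", meta=False):
    """Hypothetical helper: absolute URL for the driver log endpoint used above."""
    path = (
        f"/v1/workspaces/{workspace_id}/notebooks/{notebook_id}"
        f"/livySessions/{livy_id}/applications/{app_id}/logs"
    )
    # meta=true returns the log file metadata instead of its content
    query = "type=driver" + ("&meta=true" if meta else "") + f"&fileName={file_name}"
    return f"{FABRIC_API_BASE}{path}?{query}"

def fetch_driver_log(token, workspace_id, notebook_id, livy_id, app_id,
                     file_name="stderr"):
    """Hypothetical helper: download the log content as text.

    `token` is assumed to be a valid bearer token for the Fabric API.
    """
    import requests  # imported here so the URL helper works without it

    url = build_driver_log_url(workspace_id, notebook_id, livy_id, app_id, file_name)
    response = requests.get(
        url,
        headers={"Authorization": f"Bearer {token}"},
        timeout=60,
    )
    response.raise_for_status()  # raise on any 4xx/5xx response
    return response.text
```

Unlike the function above, this raises on HTTP errors instead of printing and returning `None`, which is usually easier to handle in a scheduled job.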

## Example

```python
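notebook_id = "948220f4-9f6f-49eb-b2ab-e518a42ead3a"  # notebook ID; you can use semantic-link-labs to resolve it from the name if needed
workspace_id = "Sales"  # either workspace name or workspace ID
result = get_driver_logs(notebook_id, workspace_id)  # None on failure, otherwise a dict with 'metadata' and, if retrieved, 'content'
latest_log = result.get("content") if result else None  # use 'metadata' instead for the log metadata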
```

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1743992660974/18fe7d60-22e8-4e53-969c-a1783c7d49d8.png align="center")

## Use case

What do we do with the logs? There are many use cases: monitoring, optimization, resource management, and more. There are so many that I could probably write a new blog every week for the next couple of months. Below is one example (I will write a separate blog on this): identifying whether there was a [fallback](https://learn.microsoft.com/en-us/fabric/data-engineering/native-execution-engine-overview?tabs=sparksql#fallback-mechanism) in the [Native Execution Engine](https://fabric.guru/eli5-what-is-native-execution-engine-in-fabric) and the reasons for it. Velox doesn't support JSON, so the fallback below is expected.

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1743992897751/8e50fb9a-84ed-45be-9137-4e0623971009.png align="center")
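As a starting point for this kind of analysis, the log content can be scanned line by line. This is only a sketch: `find_log_lines` is a hypothetical helper, and the default keyword `"fallback"` is my assumption about what to search for, so check it against your own driver log before relying on it.

```python
def find_log_lines(log_text, keyword="fallback"):
    """
    Hypothetical helper: return (line_number, line) tuples for every log
    line that contains `keyword`, ignoring case. Line numbers start at 1.
    """
    needle = keyword.lower()
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if needle in line.lower()
    ]
```

Pass it the `content` string returned by the log function, and swap the keyword for whatever you are hunting for, such as `"WARN"` or `"ERROR"`.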

Here are the other APIs: [Spark monitoring APIs to get Spark application details - Microsoft Fabric | Microsoft Learn](https://learn.microsoft.com/en-us/fabric/data-engineering/spark-monitoring-api-overview). Above I showed the driver log API, but you can similarly get the executor logs for I/O details, resource consumption, etc.

More blogs to come, stay tuned!
