# Extracting Spark Event Logs in Fabric for Monitoring & Optimization

I wrote [a blog a couple of weeks ago](https://fabric.guru/extracting-fabric-spark-driver-logs-using-api) about extracting Spark driver logs using REST API. In this blog, I will share how to call APIs to get the spark event logs, parse the logs for spark performance metrics. This can be used for debugging, optimizing & monitoring spark applications in Fabric. Note that in this case I am retrieving the logs for an executed application. If you want to do real time monitoring, use the [spark emitter](https://blog.fabric.microsoft.com/en-us/blog/announcing-the-fabric-apache-spark-diagnostic-emitter-collect-logs-and-metrics/) + Eventstream instead. [Workspace Monitoring](https://learn.microsoft.com/en-us/fabric/fundamentals/workspace-monitoring-overview) in Fabric does not *yet* include spark logs.

### Steps:

* Get application id and livy id of the application (notebook or SJD)
    
* Retrieve event log zip and save it to a lakehouse
    
* Extract the event log JSON and save it to a lakehouse
    
* Parse the event log for spark metrics
    

## Get Session Info

Use the `get_latest_session_info()` function from my [previous blog](https://fabric.guru/extracting-fabric-spark-driver-logs-using-api). This will give you the **latest** application id and the livy id which will be required for further API calls. Note that if you want to get ids for all previous sessions modify the function accordingly.

```python
## Fabric Python or Pyspark notebook
## Function from https://fabric.guru/extracting-fabric-spark-driver-logs-using-api

notebook_id = "996b5d64-xxxxxxxx-ff2ba76d0fc8"
workspace_id = fabric.get_notebook_workspace_id() #or replace with your workspace id
session = get_latest_session_info(notebook_id, workspace_id)
livy_id = session['livyId']
app_id = session['applicationId']
```

## Retrieve Spark Event Log

Spark event log can be several hundred MBs or even GBs so attach a default lakehouse to save the log zip file to a lakehouse.

```python
import os
import zipfile
import glob
import sempy.fabric as fabric

def get_spark_event_log(notebook_id, workspace_id, output_path, livy_id=None, app_id=None):
    """
    Sandeep Pawar | fabric.guru | May 19,2025
    Gets Spark event logs and saves to specified path in an attached lakehouse.
    """
    if not livy_id or not app_id:
        session_info = get_latest_session_info(notebook_id, workspace_id)
        if not session_info:
            return "Error: Could not retrieve session info"
        livy_id = session_info.get('livyId')
        app_id = session_info.get('applicationId')
    
    if os.path.isdir(output_path) or output_path.endswith('/'):
        os.makedirs(output_path, exist_ok=True)
        output_path = os.path.join(output_path, f"spark_log_{app_id}.zip")
    else:
        os.makedirs(os.path.dirname(output_path), exist_ok=True)
    
    client = fabric.FabricRestClient()
    #refer to https://learn.microsoft.com/en-us/rest/api/fabric/spark/livy-sessions for API details
    #/1/ below is for the first try
    endpoint = f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}/notebooks/{notebook_id}/livySessions/{livy_id}/applications/{app_id}/1/logs"
    
    try:
        print(f"Retrieving logs from endpoint: {endpoint}")
        response = client.get(endpoint)
        
        if response.status_code != 200:
            return f"Error: Received status code {response.status_code}. Response: {response.text}"
        
        with open(output_path, "wb") as f:
            f.write(response.content)
        
        file_size = len(response.content)
        return f"Successfully saved event logs ({file_size/1024/1024:.2f} MB) to {output_path}"
    
    except Exception as e:
        return f"Error retrieving logs: {str(e)}"
```

## Extract Event Log

The zip file contains the event log which needs to be extracted. Below functions unzips the file and save it to the specified path in the lakehouse.

```python
## Be sure to attach a default lakehouse
# I assumed it's .zip, adjust if other zip compression in the future
def unzip_spark_log(zip_path, extract_path):
    """
    Sandeep Pawar | fabric.guru | May 19, 2025
    Extracts Spark event log zip to specified directory. 
  
    """
    if os.path.isdir(zip_path):
        zip_files = glob.glob(os.path.join(zip_path, "*.zip"))
        if not zip_files:
            return f"Error: {zip_path} is a directory and no zip files were found within it"
        zip_path = zip_files[0]
        print(f"Using zip file: {zip_path}")
    
    if not os.path.exists(zip_path):
        return f"Error: Zip file {zip_path} doesn't exist"
    
    os.makedirs(extract_path, exist_ok=True)
    
    try:
        with zipfile.ZipFile(zip_path, 'r') as zip_ref:
            file_list = zip_ref.namelist()
            zip_ref.extractall(extract_path)
            
        return f"Extracted {len(file_list)} files to {extract_path}: {', '.join(file_list)}"
        
    except Exception as e:
        return f"Error extracting logs: {str(e)}"
```

Example:

```python
##python or pyspark notebook
output_path = "/lakehouse/default/Files/rawlogs"
get_spark_event_log(notebook_id, workspace_id,output_path , livy_id=livy_id, app_id=app_id) #saves the zip file
unzip_spark_log(output_path, output_path) #unzips the zip file
```

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1747715798186/c4ff546e-6e8c-4a11-af7c-cbfe6cb8b2c1.png align="center")

## Parse log for spark metrics:

Parse the event log JSON to get the performance metrics. Below I am extracting the stage metrics across all the stages. From this you can get metrics like data spill, GC time, shuffle read/write, CPU time, idle time, memory used etc which can help you optimize the spark application (more on this later).

```python
##pyspark notebook
%%pyspark
df = spark.read.json(f"Files/rawlogs/application_xxxxxxx")
df1 = df.filter("Event='SparkListenerStageCompleted'").select("`Stage Info`.*")
df1.createOrReplaceTempView("t2")
df2 = spark.sql("select 'Submission Time','Completion Time', 'Number of Tasks', 'Stage ID', t3.col.* from t2 lateral view explode(Accumulables) t3")
df2.createOrReplaceTempView("t4")
result = spark.sql("select Name, sum(Value) as value from t4 group by Name order by Name")
display(result)
```

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1747716080783/3257cd75-a60e-49b7-9451-9c462c8ce7ac.png align="center")

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1747716130137/2bcb4dbb-c5ca-477c-8b51-ad6cfc2d0220.png align="center")

Similar to stage-level, you can also get task-level metrics, or aggregate metrics for all jobs in a workspace etc. There are other couple of interesting APIs which I will cover in future blogs.

I will share more on how to interpret these metrics and how they can provide insights into the application. Note that Fabric offers many built-in [monitoring capabilities](https://learn.microsoft.com/en-us/fabric/data-engineering/spark-monitoring-overview). APIs give you the ability to access detailed metrics and create additional custom metrics.

You can download the event log manually as well from the spark history server:

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1747755139148/8793f06c-74ae-4dfc-a56a-c4ffb912e6a5.png align="center")

I would like to thank [Jenny Jiang from Microsoft](https://www.linkedin.com/in/jenny-jiang-8b57036/) for answering my questions.

## References:

* [Extracting Fabric Spark Driver Logs Using API](https://fabric.guru/extracting-fabric-spark-driver-logs-using-api)
    
* [Announcing the Fabric Apache Spark Diagnostic Emitter: Collect Logs and Metrics | Microsoft Fabric Blog | Microsoft Fabric](https://blog.fabric.microsoft.com/en-us/blog/announcing-the-fabric-apache-spark-diagnostic-emitter-collect-logs-and-metrics/)
    
* [Livy Sessions - REST API (Spark) | Microsoft Learn](https://blog.fabric.microsoft.com/en-us/blog/announcing-the-fabric-apache-spark-diagnostic-emitter-collect-logs-and-metrics/)
    
* [Apache Spark monitoring overview - Microsoft Fabric | Microsoft L](https://blog.fabric.microsoft.com/en-us/blog/announcing-the-fabric-apache-spark-diagnostic-emitter-collect-logs-and-metrics/)[earn](https://learn.microsoft.com/en-us/fabric/data-engineering/spark-monitoring-overview)
    
* [https://github.com/LucaCanali/sparkMeasure](https://github.com/LucaCanali/sparkMeasure)
    
* [Monitoring and Instrumentation - Spark 3.5.5 Documentation](https://spark.apache.org/docs/latest/monitoring.html)
    
* [How to explore Apache Spark metrics with Spark listeners - Databricks](https://kb.databricks.com/metrics/explore-spark-metrics)
    
* [groupon/spark-metrics: A library to expose more of Apache Spark's metrics system](https://github.com/groupon/spark-metrics)
    
* [Workspace monitoring overview - Microsoft Fabric | Microsoft Learn](https://learn.microsoft.com/en-us/fabric/fundamentals/workspace-monitoring-overview)
