Checking If Delta Table in Fabric is V-order Optimized
V-order is highly recommended for optimal Delta & Direct Lake performance in Fabric. Here I show how to check if the table has been V-order optimized.
V-Order
V-order is a write-time optimization to the parquet file format. When a Delta table is created using any of the Fabric engines (Dataflow Gen2, Spark notebooks, Pipelines, DWH), it is automatically V-Order'd. This not only helps with the size of the table but can also significantly improve Direct Lake dataset read performance. While it's not required, Delta tables with V-order are highly recommended for the fastest Direct Lake and Delta read performance. For more on V-order, read this official article.
However, there is no direct way to identify whether a table already has V-order. There are three ways to check, but let me show two easy ones. I will cover the third in a future blog post. When a Delta table is created, the transaction logs have a metadata property related to V-order.
I created two Delta tables in a Fabric lakehouse, one with and the other without V-order, by changing the Spark configuration.
Manually
You can manually inspect the transaction logs in the Lakehouse by going to Lakehouse > Table name > Right Click > View Files > _delta_log and inspecting the latest .json file. In the transaction logs, VORDER is either set to false or missing if the table is not V-Order'd. If the Delta table is V-Order'd, the property is set to true.
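As a rough sketch, the same inspection can be scripted with only the Python standard library. This is an illustration rather than an official Fabric method. It assumes the lakehouse is mounted so that the table's _delta_log folder is reachable as a local directory. It makes no assumption about where in the JSON the VORDER property sits and simply searches every nesting level. The function names find_key and vorder_from_latest_commit are hypothetical.

```python
import json
from pathlib import Path


def find_key(obj, name):
    """Recursively yield every value stored under a key matching `name` (case-insensitive)."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key.lower() == name.lower():
                yield value
            yield from find_key(value, name)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_key(item, name)


def vorder_from_latest_commit(delta_log_dir):
    """Return the VORDER values found in the newest commit file, or [] if there are none."""
    # Commit files are named with zero-padded version numbers, so a plain sort
    # puts the latest version last.
    commits = sorted(p for p in Path(delta_log_dir).glob("*.json") if p.stem.isdigit())
    if not commits:
        return []
    values = []
    with open(commits[-1], encoding="utf-8") as f:
        for line in f:  # each line of a commit file is one JSON object
            line = line.strip()
            if line:
                values.extend(find_key(json.loads(line), "VORDER"))
    return values
```

An empty list means the property is missing from the latest commit, which, as described above, indicates the table is not V-Order'd.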
Programmatically
If you have several Delta tables, many transaction logs, or need to check this as part of your DataOps validation process, you can use pyarrow to check the schema metadata. This only checks the metadata, so the table data does not need to be read. Below is the Python code I used:
def check_vorder(table_name_path):
    '''
    Author: Sandeep Pawar | fabric.guru | Jun 6, 2023
    Provide table_name_path as '//lakehouse/default/Tables/<table_name>'
    If the Delta table is V-ordered, returns the value of the V-order
    property as a string; otherwise, returns 'Table is not V-ordered'.
    Returns None if the path does not exist.
    You must first mount the lakehouse to use the local filesystem API.
    '''
    import os

    if not os.path.exists(table_name_path):
        print(f'{os.path.basename(table_name_path)} does not exist')
        return None

    import pyarrow.dataset as ds

    # Only the parquet schema metadata is inspected; no table data is read.
    # schema.metadata is None when there is no key-value metadata at all.
    metadata = ds.dataset(table_name_path).schema.metadata or {}
    vorder_key = b'com.microsoft.parquet.vorder.enabled'
    if vorder_key in metadata:
        # Metadata values are bytes, so decode them to a plain string.
        return metadata[vorder_key].decode()
    return "Table is not V-ordered"
There is another robust method using Spark that can detect the parquet files in a Delta table that are not V-Order'd, but I will cover that in a future blog post.