Cross-referencing Notebooks In The Updated Fabric Notebook Copilot

At FabCon Atlanta last week, the updated notebook Copilot for data engineering and data science was announced. It brings agentic capabilities to Copilot and is much more intelligent and Fabric-aware than the previous version. You can read the documentation here. For example, you can now do and ask Copilot the following, none of which was possible previously:

Use Copilot without starting a session. Just open the notebook, open Copilot, and start asking questions and making changes. This saves you CUs.
list items in the workspace: list all the lakehouses in this workspace
take actions: mount the and lakehouses and make the default lakehouse
get ABFSS paths: give me abfs path of lakehouse <lakehouse_1>.
create Spark pool configurations using %%configure: add configuration to use 4 cores
refer to content in a cell by cell number: explain cell 11

Give it a try and you will be surprised how well it works.
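For orientation, here is roughly what two of those prompts resolve to if you did them by hand. This is a minimal sketch, not Copilot's actual output. The helper names `lakehouse_abfss_path` and `configure_cell` and the sample workspace and lakehouse names are made up for illustration, the path assumes name-based OneLake addressing with names that contain no spaces, and the `driverCores`/`executorCores` keys are my assumption of what "use 4 cores" would map to in a `%%configure` cell.

```python
import json

ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def lakehouse_abfss_path(workspace: str, lakehouse: str, relative_path: str = "") -> str:
    """Build a OneLake ABFSS URI for a lakehouse addressed by display name."""
    base = f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse"
    if not relative_path:
        return base
    return f"{base}/{relative_path.lstrip('/')}"


def configure_cell(cores: int) -> str:
    """Render the text of a %%configure cell requesting `cores` driver and executor cores."""
    # Assumption: the same core count is wanted for the driver and each executor.
    settings = {"driverCores": cores, "executorCores": cores}
    return "%%configure\n" + json.dumps(settings, indent=4)


# Hypothetical names, used only to show the shape of the results.
print(lakehouse_abfss_path("my_workspace", "lakehouse_1", "Tables/orders"))
print(configure_cell(4))
```

The point is not to replace Copilot here, only to show that both answers are small, checkable artifacts: a URI you can paste into a read call and a JSON body you can review before running the cell.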

But one of my favorite features is the ability to read and refer to other notebooks. For example, I can ask Copilot to read notebook_1 from the same workspace. Think of the implications for a second. Below is one example of how this can be helpful.

Cross-referencing notebooks

In a Fabric workspace, I created a notebook with a markdown cell that includes rules from the Palantir PySpark style guide. This style guide is an opinionated guide to PySpark code style for common situations and the associated best practices, based on the most frequently recurring topics across PySpark repositories. Below is a summarized version in markdown:

PySpark Style Guide

Purpose: This notebook is a style contract for AI-assisted code generation and review. When referenced from another notebook (e.g., refer to @pyspark_style_guide), you MUST apply every rule below to all PySpark code produced or reviewed in that session.

Adapted from Palantir PySpark Style Guide (MIT License).

VERSIONS

Use features and APIs supported by the following versions:

Spark 3.5

Delta 3.2

Python 3.11

Enforcement Checklist

When reviewing or generating PySpark code, walk through each check below in order. Flag every violation found. Do not skip checks.

| # | Check | What to look for |
|---|---|---|
| C1 | Imports | Any bare `from pyspark.sql.functions import ...` or alias other than `F`, `T`, `W` |
| C2 | Column access | Any `df.colName` dot-access outside of a join `on=` clause |
| C3 | String column refs | Any `F.col('x')` that could just be `'x'` (Spark 3.0+) |
| C4 | Variable names | Any generic or single-letter dataframe names (`df`, `o`, `d`, `t`) |
| C5 | Magic values | Any literal string, number, or threshold inline in `filter`, `when`, `withColumn`, `select` that is not a named constant |
| C6 | Select contract | More than one function per column in a `select`, or a `.when()` expression inside a `select` |
| C7 | withColumnRenamed | Any use. Replace with `select` + `.alias()` |
| C8 | Empty columns | Any `lit('')`, `lit('NA')`, `lit('N/A')`. Must be `lit(None)` |
| C9 | Logical density | More than 3 boolean expressions in a single `.filter()` or `F.when()` without named variables |
| C10 | Chain length | More than 5 chained statements in one block |
| C11 | Chain mixing | Joins, filters, withColumn, and selects mixed in the same chain |
| C12 | Join hygiene | Any `.join()` missing explicit `how=` |
| C13 | Right joins | Any `how='right'`. Swap df order, use `left` |
| C14 | Window frames | Any `Window.partitionBy(...).orderBy(...)` without explicit `.rowsBetween()` or `.rangeBetween()` |
| C15 | Window nulls | `F.first()` or `F.last()` without `ignorenulls=True` |
| C16 | Global windows | Empty `W.partitionBy()` or window without `orderBy` used for aggregation. Use `.agg()` instead |
| C17 | Otherwise fallback | `.otherwise(<catch-all value>)` masking unexpected data. Use `None` or omit |
| C18 | Line continuation | Any `\` for multiline. Wrap in parentheses instead |
| C19 | UDFs | Any `@udf` or `F.udf()`. Rewrite with native functions |
| C20 | Comments | Comments that describe what code does instead of why a decision was made |
| C21 | Dead code | Commented-out code blocks. Remove them |
| C22 | Function size | Functions over ~70 lines or files over ~250 lines |

Anti-Patterns (find and fix these)

Each pattern below is a regex-like signature. If you see it, it is a violation.

AP1: Bare function imports

# VIOLATION: any of these
from pyspark.sql.functions import col, when, sum, lit
import pyspark.sql.functions as func

# FIX: always
from pyspark.sql import functions as F
from pyspark.sql import types as T
from pyspark.sql import Window as W

AP2: Dot-access column references

# VIOLATION: df.column_name anywhere except join on=
df.select(df.order_id, df.amount)
df.withColumn('x', df.price * df.qty)

# FIX: use string refs
df.select('order_id', 'amount')
df.withColumn('x', F.col('price') * F.col('qty'))

AP3: Inline magic values

# VIOLATION: bare literals in logic
df.filter(F.col('amount') > 500)
F.when(F.col('status') == 'shipped', 'In Transit')
df.filter(F.col('days') < 365)

# FIX: named constants at top of cell/function
HIGH_VALUE_THRESHOLD = 500
STATUS_SHIPPED = 'shipped'
LABEL_IN_TRANSIT = 'In Transit'
ONE_YEAR_DAYS = 365

df.filter(F.col('amount') > HIGH_VALUE_THRESHOLD)
F.when(F.col('status') == STATUS_SHIPPED, LABEL_IN_TRANSIT)
df.filter(F.col('days') < ONE_YEAR_DAYS)

AP4: Complex logic inside .when() or .filter()

# VIOLATION: more than 3 conditions inline
df.filter(
    (F.col('a') == 'x') & (F.col('b') > 10) & (F.col('c') != 'y')
    & ((F.col('d') == 'online') | (F.col('d') == 'partner'))
)

# FIX: named boolean expressions, max 3 in the final filter
is_valid_type = (F.col('a') == TYPE_X)
above_threshold = (F.col('b') > MIN_THRESHOLD)
not_excluded = (F.col('c') != EXCLUDED_STATUS)
is_target_channel = (F.col('d') == CHANNEL_ONLINE) | (F.col('d') == CHANNEL_PARTNER)

is_eligible = is_valid_type & above_threshold & not_excluded
flagged = is_eligible & is_target_channel
df.filter(flagged)

AP5: .when() inside select

# VIOLATION: conditional logic embedded in select
df.select(
    'order_id',
    F.when(F.col('status') == 'shipped', 'In Transit')
     .when(F.col('status') == 'delivered', 'Complete')
     .alias('status_label'),
)

# FIX: select plain columns, then withColumn for derived logic
df = df.select('order_id', 'status')
df = df.withColumn(
    'status_label',
    F.when(F.col('status') == STATUS_SHIPPED, LABEL_IN_TRANSIT)
     .when(F.col('status') == STATUS_DELIVERED, LABEL_COMPLETE)
)

AP6: Empty column sentinels

# VIOLATION
df.withColumn('notes', F.lit(''))
df.withColumn('review_date', F.lit('N/A'))

# FIX
df.withColumn('notes', F.lit(None))
df.withColumn('review_date', F.lit(None))

AP7: Missing window frame

# VIOLATION: implicit frame
w = W.partitionBy('customer_id').orderBy('order_date')

# FIX: always explicit
w = (W.partitionBy('customer_id')
      .orderBy('order_date')
      .rowsBetween(W.unboundedPreceding, 0))

AP8: Blanket .otherwise()

# VIOLATION: masks unexpected values
F.when(..., 'A').when(..., 'B').otherwise('Unknown')

# FIX: omit otherwise (returns null) or use lit(None) explicitly
F.when(..., 'A').when(..., 'B')

AP9: Monster chains

# VIOLATION: mixed concerns, too long
df = (df.select(...).filter(...).withColumn(...).join(...).drop(...).withColumn(...))

# FIX: separate by concern, max 5 per block
df = (
    df
    .select(...)
    .filter(...)
)
df = df.withColumn(...)
df = df.join(..., how='inner')

AP10: Backslash continuation

# VIOLATION
df = df.filter(F.col('a') == 'x') \
       .filter(F.col('b') > 10)

# FIX: parentheses
df = (
    df
    .filter(F.col('a') == 'x')
    .filter(F.col('b') > 10)
)

Quick Reference (for code generation)

When writing new code, apply these defaults:

Imports: F, T, W only

Columns: string refs where possible, F.col() when needed

Descriptive df names: orders_df, active_orders, not df, o

Constants: every literal in logic gets a SCREAMING_SNAKE name

Selects: plain columns + one transform each, no .when() inside

Chains: max 5 lines, group by concern (filter/select, then enrich, then join)

Joins: always how=, always left not right, alias for disambiguation

Windows: always explicit frame, always ignorenulls=True on first/last

Empty cols: F.lit(None), never lit('') or lit('NA')

No UDFs, no .otherwise() fallbacks, no \ continuations

Comments explain why, not what. No commented-out code.

I named the notebook PYSPARK_STYLE_GUIDE. It's all caps intentionally (more on this later).
In another notebook, which already has some PySpark code, I opened Copilot.
I asked: List all notebooks in this workspace. The response included the PYSPARK_STYLE_GUIDE notebook.

My notebook has one cell with a large code block (intentional). I prompted Copilot:

refer to @PYSPARK_STYLE_GUIDE and fix the code without losing the function and purpose

https://youtu.be/gtj_f7oBeuk

💡 As with anything AI, be sure to always back up, test, and verify.

Copilot read the style notebook and applied the rules to the cells in this notebook. You could also use this to extract code patterns from other notebooks, e.g. how did <notebook_name> ingest the data, use the same library as <notebook_name> to create ML features, etc. Super handy.
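On the verify part: because the anti-patterns are written as regex-like signatures, a few of them can be checked mechanically after Copilot rewrites a cell. The sketch below is my own illustration, not something Copilot or Fabric provides. `RULES` and `find_violations` are hypothetical names, only a handful of checks are covered (C1, C7, C8, C13, C18, C19), and line-level regexes will miss multi-line constructs and can flag text inside string literals.

```python
import re

# Each rule: (check id from the style guide, line-level signature, message).
RULES = [
    ("C1", re.compile(r"^\s*from\s+pyspark\.sql\.functions\s+import\b"),
     "bare function import"),
    ("C1", re.compile(r"^\s*import\s+pyspark\.sql\.functions\s+as\s+(?!F\b)\w+"),
     "functions alias other than F"),
    ("C7", re.compile(r"\.withColumnRenamed\("),
     "withColumnRenamed used"),
    ("C8", re.compile(r"\blit\(\s*(['\"])(?:|NA|N/A)\1\s*\)"),
     "empty-column sentinel instead of lit(None)"),
    ("C13", re.compile(r"how\s*=\s*['\"]right(?:_?outer)?['\"]"),
     "right join"),
    ("C18", re.compile(r"\\\s*$"),
     "backslash line continuation"),
    ("C19", re.compile(r"@udf\b|\bF\.udf\("),
     "UDF"),
]


def find_violations(source: str) -> list[tuple[int, str, str]]:
    """Return (line number, check id, message) for every signature hit."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # comment lines are out of scope for these checks
        for check_id, pattern, message in RULES:
            if pattern.search(line):
                hits.append((line_no, check_id, message))
    return hits


# A deliberately non-compliant snippet, held as text so no Spark session is needed.
SAMPLE = "\n".join([
    "from pyspark.sql.functions import col, lit",
    "orders = orders.withColumnRenamed('amt', 'amount')",
    "orders = orders.withColumn('notes', lit('N/A'))",
    "orders = orders.join(customers, on='customer_id', how='right')",
    "orders = orders.filter(col('amount') > 0) \\",
    "    .select('order_id')",
])

for line_no, check_id, message in find_violations(SAMPLE):
    print(f"line {line_no}: {check_id} {message}")
```

Running something like this over a cell before and after the Copilot rewrite gives a quick signal on whether the mechanical rules were applied. The judgment-based checks (naming, comments, chain grouping) still need a human read.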

Your BI/DE/DS team could also create reference pattern notebooks and refer to them to drive consistency and quality. Note that you can list items in another workspace but can't reference notebooks across workspaces.

This was for Copilot in Fabric notebooks. In an upcoming blog, I will share how I use Skills for Fabric for development.
