RAG in Fabric Notebook Using Microsoft Harrier Multilingual Text Embedding Model
Last week Microsoft released an open-source text embedding model called Harrier in three sizes: 270M, 0.6B, and 27B. I have been testing it in my RAG pipeline, and so far it has crushed all my metrics. It is currently the number 1 model on the MTEB v2 leaderboard.
The 27B model is obviously too big for a Fabric notebook, but the 270M and 0.6B variants, despite being small, are extremely capable and fast even with CPU inferencing in a Fabric Python notebook. For comparison, the text-embedding-ada-002 model I have been using in my pipeline scores R1 = 0.68; the 270M model topped that easily with R1 = 0.76 and lower latency. Bonus: no Fabric CU consumption (other than Python execution time). In a Fabric notebook, you can save the model in a lakehouse and load it for inferencing. The initial load, especially for the 0.6B model, may take a while, but retrieval is fast.
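The save-once, load-from-lakehouse pattern can be sketched as below. This is a minimal sketch, not the author's notebook code: the Hugging Face model id and the lakehouse mount path are assumptions, and the loading API assumed here is sentence-transformers.

```python
from pathlib import Path

# Assumed names -- the actual model id and lakehouse path will differ
# in your workspace.
MODEL_ID = "microsoft/harrier-270m"                       # assumed HF id
CACHE_DIR = Path("/lakehouse/default/Files/models/harrier-270m")

def model_source(cache_dir: Path = CACHE_DIR, model_id: str = MODEL_ID) -> str:
    """Return the lakehouse copy if it exists, otherwise the hub id."""
    return str(cache_dir) if cache_dir.exists() else model_id

def load_embedder():
    """Download once, persist to the lakehouse, load from there afterwards."""
    from sentence_transformers import SentenceTransformer  # lazy import
    src = model_source()
    model = SentenceTransformer(src, device="cpu")  # CPU inference in Fabric
    if src == MODEL_ID:  # first run: cache the weights in the lakehouse
        CACHE_DIR.parent.mkdir(parents=True, exist_ok=True)
        model.save(str(CACHE_DIR))
    return model
```

After the first run, later sessions load the weights straight from the lakehouse, which avoids repeated downloads and makes the slow initial load a one-time cost.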
Take a look at this sample notebook on how to operationalize it in Fabric.
snippets/harrier-ragbench-techqa-fabricguru.ipynb at main · pawarbi/snippets
- Use %%configure to upgrade the compute and attach a lakehouse
- Download the model
- Instantiate it
- Load the data
- Embed and retrieve
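The embed-and-retrieve step boils down to encoding the corpus once and ranking chunks by cosine similarity. Here is a minimal NumPy sketch; the model.encode() calls shown in the comments assume a sentence-transformers style API, and the variable names are mine, not the notebook's.

```python
import numpy as np

def top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 3):
    """Rank documents by cosine similarity to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                 # cosine similarity per document
    idx = np.argsort(-scores)[:k]  # best-scoring indices first
    return idx, scores[idx]

# In the notebook this would look something like (names assumed):
#   doc_vecs  = model.encode(chunks)        # embed the corpus once
#   query_vec = model.encode([question])[0]
#   idx, scores = top_k(query_vec, doc_vecs)
```

Embedding the corpus once and reusing the matrix per query is what keeps retrieval fast even on CPU.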
Just for the sake of testing, I also added a Harrier + BM25 hybrid pipeline to improve retrieval; the R3 score improved to 99%.
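One common way to merge BM25's lexical ranking with a dense ranking like Harrier's is reciprocal rank fusion. Whether the notebook fuses the two lists this way is an assumption; the function below is an illustrative sketch of the technique, not the author's implementation.

```python
def rrf(rankings, k: int = 60):
    """Reciprocal rank fusion: merge several best-first ranked id lists.

    Each document's fused score is the sum of 1 / (k + rank) across lists;
    k damps the dominance of top ranks (k = 60 is a conventional default).
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Usage (ids from each retriever are assumed available):
#   bm25_ids  = best-first ids from BM25
#   dense_ids = best-first ids from Harrier
#   fused = rrf([bm25_ids, dense_ids])
```

Fusing ranks rather than raw scores sidesteps the problem that BM25 scores and cosine similarities live on incompatible scales.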