Introduction to Databricks Photon

What is Photon?

Photon is a vectorized query engine written in C++ that leverages data and instruction-level parallelism available in CPUs.

It is 100% compatible with the Apache Spark APIs, which means you don’t have to rewrite your existing code (SQL, Python, R, Scala) to benefit from it.

Photon is an ANSI-compliant engine. At launch it focused primarily on SQL, but its scope is now much larger and covers more ingestion sources, formats, APIs and methods.

What are the advantages of Photon?

1) Cheaper and faster

Photon is built from the ground up for the fastest performance at a lower cost. According to Databricks, it provides up to 80% TCO savings and speeds up data and analytics workloads by up to 12x.

2) Built for all use cases

Photon is the first engine that enables data teams to standardize on one set of APIs for all workloads.

How does Photon help to optimize costs?

Because your queries run faster, the VMs run for less time and you spend less on them.

Because your time to market is shorter, your product reaches your customers sooner.
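To make the cost reasoning concrete, here is a minimal arithmetic sketch. Every rate and runtime in it is a hypothetical placeholder, not Databricks pricing. It also assumes that a Photon cluster consumes DBUs at a higher hourly rate than a non-Photon one, so the saving depends on the speedup outweighing that higher rate.

```python
# All rates and runtimes below are hypothetical placeholders, not real pricing.

def job_cost(runtime_hours, vm_rate_per_hour, dbu_per_hour, price_per_dbu):
    """Total cost of a job: VM cost plus DBU cost for the time the cluster runs."""
    vm_cost = runtime_hours * vm_rate_per_hour
    dbu_cost = runtime_hours * dbu_per_hour * price_per_dbu
    return vm_cost + dbu_cost

# Baseline run without Photon (assumed: 4 hours).
baseline = job_cost(runtime_hours=4.0, vm_rate_per_hour=2.0,
                    dbu_per_hour=4.0, price_per_dbu=0.5)

# Same job with Photon (assumed: 2x faster, twice the DBUs per hour).
with_photon = job_cost(runtime_hours=2.0, vm_rate_per_hour=2.0,
                       dbu_per_hour=8.0, price_per_dbu=0.5)

savings = 1 - with_photon / baseline
print(f"baseline={baseline}, with_photon={with_photon}, savings={savings:.0%}")
```

With these made-up numbers the VM bill is halved while the DBU bill stays flat, so the job ends up 25% cheaper overall. Plugging in your own rates and measured runtimes shows whether a given workload actually benefits.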

How can I enable Photon?

Photon is enabled by default on SQL warehouses.

Photon is also enabled by default on clusters running Databricks Runtime 9.1 LTS and above.
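When a cluster is created through the Clusters API rather than the UI, the engine is selected with the `runtime_engine` field. The sketch below only builds the request body. The helper name `photon_cluster_spec`, the cluster name, and the placeholder runtime version and node type are assumptions for illustration, so check the values available in your workspace before using it.

```python
import json

def photon_cluster_spec(cluster_name, spark_version, node_type_id, num_workers):
    """Build a cluster definition that asks for the Photon engine.

    Hypothetical helper: it only assembles the JSON body and does not
    call the Databricks API.
    """
    return {
        "cluster_name": cluster_name,
        "spark_version": spark_version,  # placeholder, use a runtime from your workspace
        "node_type_id": node_type_id,    # placeholder, use a Photon-capable instance type
        "num_workers": num_workers,
        "runtime_engine": "PHOTON",      # "STANDARD" would disable Photon
    }

spec = photon_cluster_spec("photon-demo", "<runtime-version>", "<node-type>", 2)
print(json.dumps(spec, indent=2))
```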

How can I find all the functions that are supported by Photon?

You can write the following Scala code to get the list of all the available functions supported by Photon.

To benefit from the most recently added functions, make sure you are running the latest runtime.

What are the operations that are highlighted in yellow?

If an operation is vectorized and executed by Photon, it is highlighted in yellow in the DAG.

What happens if Photon is enabled for my cluster and I run an unsupported function?

Features not supported by Photon run the same way they would with Databricks Runtime.
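Besides the yellow highlighting in the DAG, you can see which parts of a query ran in Photon by reading the physical plan, where operators executed by Photon carry a `Photon` prefix. The sketch below is a plain-text helper that separates the two groups. The function name `split_operators` and the sample plan are illustrative assumptions, not output captured from a real cluster.

```python
import re

# Matches the operator name at the start of a plan line, after any tree
# drawing characters ("+-", ":") and an optional codegen marker such as "*(1)".
OPERATOR = re.compile(r"^[\s:+\-]*(?:\*\(\d+\)\s*)?([A-Za-z]\w*)")

def split_operators(plan_text):
    """Return (photon_ops, other_ops) found in a physical plan string."""
    photon_ops, other_ops = [], []
    for line in plan_text.splitlines():
        # Skip blank lines and section headers like "== Physical Plan ==".
        if not line.strip() or line.lstrip().startswith("=="):
            continue
        match = OPERATOR.match(line)
        if match is None:
            continue
        name = match.group(1)
        if name.startswith("Photon"):
            photon_ops.append(name)
        else:
            other_ops.append(name)
    return photon_ops, other_ops

# Illustrative plan text (an assumption, not real cluster output).
SAMPLE_PLAN = """\
== Physical Plan ==
*(1) Project [my_udf(id) AS out]
+- ColumnarToRow
   +- PhotonResultStage
      +- PhotonGroupingAgg(keys=[id], functions=[sum(amount)])
         +- PhotonScan parquet [id, amount]
"""

photon_ops, other_ops = split_operators(SAMPLE_PLAN)
print("Ran in Photon:", photon_ops)
print("Ran outside Photon:", other_ops)
```

On a cluster, the text printed by `df.explain()` can be pasted into the helper. Operators that end up in the second list are the ones that fell back to the regular Databricks Runtime execution path.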

Where can I find the Photon paper?

You can read the paper here: https://cs.stanford.edu/~matei/papers/2022/sigmod_photon.pdf

At SIGMOD 2022, Apache Spark was awarded the SIGMOD Systems Award and the Databricks Photon paper was awarded the Best Industry Paper Award.

Bonus: for more information, feel free to watch Simon Whiteley’s video.

Useful Links:

Databricks Sets official Data Warehousing Performance Record : https://www.databricks.com/blog/2021/11/02/databricks-sets-official-data-warehousing-performance-record.html

Photon product : https://www.databricks.com/product/photon

Photon Documentation : https://docs.databricks.com/runtime/photon.html

Radical Speed on the Lakehouse: Photon Under the Hood
