What is Photon?
Photon is a vectorized query engine written in C++ that leverages data and instruction-level parallelism available in CPUs.
It’s 100% compatible with Apache Spark APIs, which means you don’t have to rewrite your existing code (SQL, Python, R, Scala) to benefit from it.
Photon is an ANSI-compliant engine. It initially focused on SQL, but since launch its scope has grown considerably to cover more ingestion sources, formats, APIs and methods.
What are the advantages of Photon?
1) Cheaper and faster
Built from the ground up for the fastest performance at lower cost. It provides up to 80% TCO savings while accelerating data and analytics workloads by up to 12x.
2) Built for all use cases
Photon is the first engine that enables data teams to standardize on one set of APIs for all workloads.
How does Photon help to optimize costs?
By making queries run faster, Photon shortens the time your VMs are running, so you spend less on compute.
By accelerating your time to market, it makes your product available to your customers sooner.
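The cost argument above can be made concrete with a little arithmetic. The following is a minimal sketch, not a pricing model: the function names and all the numbers are hypothetical, and it assumes the hourly rate may differ between the two configurations.

```python
def job_cost(runtime_hours: float, hourly_rate: float) -> float:
    """Cost of one job run: time the cluster is up multiplied by its hourly rate."""
    return runtime_hours * hourly_rate


def savings_fraction(baseline_hours: float, baseline_rate: float,
                     speedup: float, photon_rate: float) -> float:
    """Fraction of the baseline cost saved when the job runs `speedup` times faster.

    All inputs are illustrative assumptions supplied by the caller.
    """
    baseline = job_cost(baseline_hours, baseline_rate)
    accelerated = job_cost(baseline_hours / speedup, photon_rate)
    return 1 - accelerated / baseline


# Hypothetical example: a 4-hour job at 10 currency units per hour,
# running 2x faster on a configuration that costs 12 units per hour.
print(round(savings_fraction(4.0, 10.0, 2.0, 12.0), 2))  # 0.4, i.e. 40% cheaper
```

The point of the sketch is that a speedup only turns into savings when it outweighs any difference in the hourly rate, so it is worth plugging in your own measured runtimes and prices.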
How can I enable Photon?
Photon is enabled by default on SQL warehouses.
Photon is enabled by default on clusters.
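If you create clusters programmatically rather than through the UI, the engine can also be stated explicitly. The sketch below builds a request body for the Databricks Clusters API, where the `runtime_engine` field accepts `PHOTON` or `STANDARD`. The helper name `photon_cluster_spec` and the placeholder values are assumptions for illustration; check the field names against the API version you use.

```python
from typing import Any, Dict


def photon_cluster_spec(name: str, spark_version: str, node_type_id: str,
                        num_workers: int, use_photon: bool = True) -> Dict[str, Any]:
    """Build a cluster definition with the runtime engine set explicitly.

    Hypothetical helper: it only assembles a dictionary, it does not call any API.
    """
    return {
        "cluster_name": name,
        "spark_version": spark_version,   # a runtime version string available in your workspace
        "node_type_id": node_type_id,     # a VM type available in your cloud region
        "num_workers": num_workers,
        "runtime_engine": "PHOTON" if use_photon else "STANDARD",
    }


# Placeholder values: replace them with a real runtime version and node type.
spec = photon_cluster_spec("photon-demo", "<spark-version>", "<node-type-id>", 2)
print(spec["runtime_engine"])  # PHOTON
```

Setting the field explicitly keeps the choice visible in version-controlled cluster definitions instead of relying on the default.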
How can I find all the functions that are supported by Photon?
You can write the following Scala code to get the list of all the available functions supported by Photon.
To benefit from the most recently added functions, make sure you are on the latest runtime.
What are the operations that are highlighted in yellow?
Operations that are vectorized and executed by Photon are highlighted in yellow in the DAG.
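A text-based counterpart to the yellow highlighting is to look at the physical plan (for example, what `EXPLAIN` or `df.explain()` prints), where operators executed by Photon have names starting with `Photon`. The sketch below is a hypothetical helper that extracts those names from plan text. The sample plan is hand-written for illustration and is not real output.

```python
import re
from typing import List


def photon_operators(plan_text: str) -> List[str]:
    """Return the distinct operator names starting with 'Photon', in order of appearance."""
    seen: List[str] = []
    for name in re.findall(r"\bPhoton[A-Za-z]+", plan_text):
        if name not in seen:
            seen.append(name)
    return seen


# Hand-written, illustrative plan text (an assumption, not captured from a cluster).
sample_plan = """
== Physical Plan ==
ColumnarToRow
+- PhotonResultStage
   +- PhotonGroupingAgg
      +- PhotonScan parquet
"""

print(photon_operators(sample_plan))
# ['PhotonResultStage', 'PhotonGroupingAgg', 'PhotonScan']
```

An empty result for a query you expected to be accelerated is a hint that some part of it uses a feature Photon does not support.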
What happens if Photon is enabled for my cluster and I run an unsupported function?
Features not supported by Photon run the same way they would with Databricks Runtime.
Where can I find the Photon paper?
You can read the paper here: https://cs.stanford.edu/~matei/papers/2022/sigmod_photon.pdf
- Apache Spark was awarded the SIGMOD Systems Award
- Databricks Photon was awarded the Best Industry paper Award
Bonus: for more information, feel free to watch Simon Whiteley’s video.
Useful Links:
Databricks Sets Official Data Warehousing Performance Record : https://www.databricks.com/blog/2021/11/02/databricks-sets-official-data-warehousing-performance-record.html
Photon product : https://www.databricks.com/product/photon
Photon Documentation : https://docs.databricks.com/runtime/photon.html
Radical Speed on the Lakehouse: Photon Under the Hood