Turbocharge Azure Databricks with Photon powered Delta Engine
Published Sep 22 2020 08:00 AM
Microsoft

Today we are excited to announce the preview of Photon powered Delta Engine on Azure Databricks, a fast, easy, and collaborative analytics and AI service. Built from scratch in C++ and fully compatible with Spark APIs, Photon is a vectorized query engine that leverages modern CPU architecture along with Delta Lake to enhance Apache Spark 3.0’s performance by up to 20x.

 

The need for faster insight

As organizations worldwide embrace data-driven decision-making, it has become imperative for them to invest in a platform that can quickly analyze massive volumes and many types of data. However, this has been a challenge. While storage and network performance have increased 10x, CPU processing speeds have increased only marginally.

 

Image: Hardware Trends, 2010-2020

 

This raises the question: if CPUs have become the bottleneck, how can we achieve the next level of performance? With Photon, the answer lies in greater parallelism of CPU processing at both the data level and the instruction level.

Introducing Photon powered Delta Engine

Photon powered Delta Engine is a 100% Apache Spark-compatible vectorized query engine designed for extremely fast parallel processing of data. Written from the ground up in C++ to exploit modern CPU architecture through data-level and instruction-level parallelism, the engine uses optimization techniques described in the paper MonetDB/X100: Hyper-Pipelining Query Execution.
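To make "vectorized" concrete, here is a minimal sketch in Python of the general idea described in the MonetDB/X100 paper: instead of pushing one row at a time through the whole query, the engine processes small batches of column values with simple, tight loops per operation. This is not Photon's implementation. The table, the function names, and the batch size are all hypothetical, and pure Python shows only the shape of the control flow, not the speedup a compiled C++ engine might get from such loops.

```python
from array import array

# Hypothetical toy data: two columns of a small sales table, stored column-wise.
prices = array("d", [9.5, 20.0, 3.25, 41.0, 15.75, 8.0])
quantities = array("q", [3, 1, 10, 2, 4, 7])


def revenue_row_at_a_time(prices, quantities, min_price):
    """Row-at-a-time style: each row goes through filter, multiply, and sum."""
    total = 0.0
    for price, qty in zip(prices, quantities):
        if price >= min_price:
            total += price * qty
    return total


def select_ge(column, threshold):
    """Filter primitive: positions within the batch whose value passes."""
    return [i for i, value in enumerate(column) if value >= threshold]


def multiply(col_a, col_b, selection):
    """Arithmetic primitive: multiply two columns at the selected positions."""
    return array("d", [col_a[i] * col_b[i] for i in selection])


def revenue_vectorized(prices, quantities, min_price, batch_size=4):
    """Batch-at-a-time style: each primitive runs over a whole batch of values."""
    total = 0.0
    for start in range(0, len(prices), batch_size):
        price_batch = prices[start:start + batch_size]
        qty_batch = quantities[start:start + batch_size]
        selection = select_ge(price_batch, min_price)
        total += sum(multiply(price_batch, qty_batch, selection))
    return total


print(revenue_row_at_a_time(prices, quantities, 9.0))
print(revenue_vectorized(prices, quantities, 9.0))
```

Both functions compute the same answer. The difference is that each primitive in the batch version is a simple loop over contiguous column values, which is the kind of code a compiler can typically map onto data-level (SIMD) and instruction-level parallelism.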

 

Photon is one of the three key components of Delta Engine, alongside an improved query optimizer and a caching layer. Together, these three components accelerate performance for big data use cases such as data engineering, data science, machine learning, and data analytics.

 

 

Image: Delta Engine’s 3 components: 1) Query optimizer, 2) Photon native execution engine, and 3) Caching

Up to 20x faster performance

Azure Databricks was already blazing fast compared to Apache Spark, and now the Photon powered Delta Engine enables even faster performance for modern analytics and AI workloads on Azure. We ran a 30TB test derived from the industry-standard TPC-DS* benchmark to measure processing speed and found the Photon powered Delta Engine to be 20x faster than Spark 2.4.

 

Image: 30TB Elapsed Times, Performance Comparison

Industry-leading Spark-based analytics & AI platform on Azure

 

With Azure Databricks, customers can set up an optimized Apache Spark environment in minutes. Native integration with Azure Active Directory and other Azure services such as Azure Synapse Analytics and Azure Machine Learning enables customers to build end-to-end modern data warehouse, machine learning, and real-time analytics solutions.

 

Now with the preview of Photon powered Delta Engine, customers can benefit from the added performance boost to gain faster insights.

 


 

Get Started Today

Start today by requesting access to the Photon Preview here. Learn more about modern data engineering with Azure Databricks by attending a live event or viewing this webinar, and bring your questions to our next Azure Databricks Office Hours.

 

*Since these are results of a test derived from TPC-DS, they are not comparable to published TPC-DS results.

Last update: Oct 14 2020 09:53 PM