Number of questions: 60
Type of questions: Multiple choice
Duration: 120 minutes
Passing score: 70%
Where to register for the certification: https://www.webassessor.com/databricks
Expiration: 2 years
Topics covered:
- Databricks tooling
- Data Processing
- Data Modeling
- Security and governance
- Monitoring and logging
- Testing and deployment
Practice tests: No practice exams are available yet.
How to prepare for the certification:
Complete the Advanced Data Engineer Professional course (Databricks Academy)
Complete the Advanced Data Engineering notebooks (strongly recommended)
Read the Databricks documentation (recommended)
Features you should know before taking the exam (illustrative code sketches follow this list):
Delta optimization (OPTIMIZE/ZORDER, Auto Optimize)
Delta Lake (time travel, MERGE, optimization, CTAS, INSERT)
Managed and external Delta tables
Delta Live Tables (DLT + Auto Loader)
Structured Streaming (watermarking + windowing + joins)
Incremental processing (Auto Loader, COPY INTO)
Slowly changing dimensions (SCD)
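To make the optimization features concrete, here is a minimal sketch assuming a Databricks notebook where spark is predefined; the table events and the column user_id are placeholders, not names from the exam guide:

```python
# Minimal sketch: Delta optimization in a Databricks notebook (placeholder table/column names).

# OPTIMIZE compacts small files; ZORDER co-locates data on a commonly filtered column.
spark.sql("OPTIMIZE events ZORDER BY (user_id)")

# Auto Optimize: optimized writes and auto compaction applied on every write to the table.
spark.sql("""
    ALTER TABLE events SET TBLPROPERTIES (
        'delta.autoOptimize.optimizeWrite' = 'true',
        'delta.autoOptimize.autoCompact'   = 'true'
    )
""")
```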
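In the same spirit, a sketch of the core Delta Lake operations named above (CTAS, external tables, time travel, MERGE); every table name and the storage path are illustrative assumptions:

```python
# Minimal sketch: core Delta Lake operations (placeholder table names and path).

# CTAS: create a managed Delta table from the result of a query.
spark.sql("""
    CREATE TABLE customers_silver AS
    SELECT * FROM customers_bronze WHERE customer_id IS NOT NULL
""")

# External table: data files live at an explicit path rather than a managed location.
spark.sql("""
    CREATE TABLE customers_external
    LOCATION 'abfss://container@account.dfs.core.windows.net/customers'
    AS SELECT * FROM customers_bronze
""")

# Time travel: query an earlier version of the table.
spark.sql("SELECT * FROM customers_silver VERSION AS OF 3")

# MERGE: upsert a batch of changes into the target table.
spark.sql("""
    MERGE INTO customers_silver AS t
    USING customers_updates AS s
    ON t.customer_id = s.customer_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```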
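For incremental processing and Structured Streaming, a sketch combining Auto Loader with a watermarked, windowed aggregation; the paths, the event_time and user_id columns, and the target table are assumptions made for illustration:

```python
# Minimal sketch: Auto Loader ingestion plus a watermarked, windowed streaming aggregation.
# Paths, schema location, column names and the target table are illustrative placeholders.
from pyspark.sql import functions as F

raw = (spark.readStream
       .format("cloudFiles")                                   # Auto Loader source
       .option("cloudFiles.format", "json")
       .option("cloudFiles.schemaLocation", "/mnt/schemas/events")
       .load("/mnt/raw/events")
       .withColumn("event_time", F.col("event_time").cast("timestamp")))

counts = (raw
          .withWatermark("event_time", "10 minutes")           # bound state for late-arriving data
          .groupBy(F.window("event_time", "5 minutes"), "user_id")
          .count())

(counts.writeStream
       .option("checkpointLocation", "/mnt/checkpoints/event_counts")
       .outputMode("append")
       .toTable("event_counts"))

# COPY INTO is the batch-style alternative for idempotent incremental loads, e.g.:
# spark.sql("COPY INTO bronze_events FROM '/mnt/raw/events' FILEFORMAT = JSON")
```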
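Slowly changing dimensions are usually exercised through MERGE; the following is one rough Type 2 step (expire the current row when a tracked attribute changes), with all names invented for illustration and a second insert pass still needed to add the new current rows:

```python
# Minimal sketch: part of an SCD Type 2 pattern with MERGE (placeholder names).
# This step expires the current row when the address changes and inserts brand-new customers;
# a follow-up insert is still needed to add the new "current" row for changed customers.
spark.sql("""
    MERGE INTO dim_customers AS t
    USING customer_updates AS s
    ON t.customer_id = s.customer_id AND t.is_current = true
    WHEN MATCHED AND t.address <> s.address THEN
      UPDATE SET is_current = false, end_date = s.effective_date
    WHEN NOT MATCHED THEN
      INSERT (customer_id, address, is_current, effective_date, end_date)
      VALUES (s.customer_id, s.address, true, s.effective_date, null)
""")
```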
Additional resources:
Data Engineering with Databricks Session 1
Minimally Qualified Candidate:
The minimally qualified candidate should be able to:
- Understand how to use, and the benefits of using, the Databricks platform and its tools, including:
  - Platform (notebooks, clusters, Jobs, Databricks SQL, relational entities, Repos)
  - Apache Spark (PySpark, DataFrame API, basic architecture)
  - Delta Lake (SQL-based Delta APIs, basic architecture, core functions)
  - Databricks CLI (deploying notebook-based workflows)
  - Databricks REST API (configuring and triggering production pipelines)
- Build data processing pipelines using the Spark and Delta Lake APIs, including:
  - Building batch-processed ETL pipelines
  - Building incrementally processed ETL pipelines
  - Optimizing workloads
  - Deduplicating data
  - Using Change Data Capture (CDC) to propagate changes
- Model data management solutions, including:
  - Lakehouse (bronze/silver/gold architecture, databases, tables, views, and the physical layout)
  - General data modeling concepts (keys, constraints, lookup tables, slowly changing dimensions)
- Build production pipelines using best practices around security and governance, including:
  - Managing notebook and job permissions with ACLs
  - Creating row- and column-oriented dynamic views to control user/group access (see the dynamic view sketch after this list)
  - Securely storing personally identifiable information (PII)
  - Securely deleting data on request, in line with GDPR and CCPA
- Configure alerting and storage to monitor and log production jobs, including:
  - Setting up notifications
  - Configuring SparkListener
  - Recording logged metrics
  - Navigating and interpreting the Spark UI
  - Debugging errors
- Follow best practices for managing, testing and deploying code, including:
  - Managing dependencies
  - Creating unit tests (see the unit-test sketch after this list)
  - Creating integration tests
  - Scheduling Jobs
  - Versioning code/notebooks
  - Orchestrating Jobs
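Dynamic views, mentioned above, are worth practicing hands-on. Here is a minimal sketch that redacts a column and filters rows based on group membership with the is_member function; the table, column names and the admins group are placeholders:

```python
# Minimal sketch: a dynamic view for row- and column-level access control (placeholder names).
spark.sql("""
    CREATE OR REPLACE VIEW customers_redacted AS
    SELECT
      customer_id,
      CASE WHEN is_member('admins') THEN email ELSE 'REDACTED' END AS email,  -- column-level control
      region
    FROM customers_silver
    WHERE is_member('admins') OR region = 'US'                                -- row-level control
""")
```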
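For the testing items, a sketch of a unit test for a small deduplication helper, runnable locally with pytest; the function dedupe_latest and its columns are invented for illustration:

```python
# Minimal sketch: unit-testing a DataFrame transformation with pytest and a local SparkSession.
# The helper dedupe_latest and all column names are illustrative, not from the exam guide.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window


def dedupe_latest(df, key_col, ts_col):
    """Keep only the most recent row per key."""
    w = Window.partitionBy(key_col).orderBy(F.col(ts_col).desc())
    return (df.withColumn("_rn", F.row_number().over(w))
              .filter("_rn = 1")
              .drop("_rn"))


def test_dedupe_latest():
    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df = spark.createDataFrame(
        [(1, "2023-01-01"), (1, "2023-01-02"), (2, "2023-01-01")],
        ["id", "updated_at"],
    )
    result = dedupe_latest(df, "id", "updated_at")
    assert result.count() == 2
    assert result.filter("id = 1").first()["updated_at"] == "2023-01-02"
```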
Article written by Youssef Mrini