Number of questions: 45
Type of questions: Multiple choice
Duration: 90 minutes
Passing score: 70%
Where to register for the certification: https://www.webassessor.com/databricks
Expiration: 2 years
Topics covered:
- Databricks Machine Learning
- ML workflows
- Spark ML
- Scaling ML Models
Practice tests: No practice exams are available yet.
How to prepare for the certification:
- Complete Scalable Machine Learning with Apache Spark (GitHub repos)
- Implementing MLOps in the Databricks Lakehouse
- Getting Started with Databricks Machine Learning
Features you should know before taking the exam:
- Databricks Runtime for Machine Learning
- Feature engineering with scikit-learn
- Feature engineering with MLlib
Additional resources:
- Architecting MLOps on the Lakehouse
- Build Reliable Production Data and ML Pipelines with Git Support
- Automate Your Data and ML Workflows with GitHub Actions for Databricks
- Save Time on Data and ML Workflows with Repair and Rerun
Minimally Qualified Candidate:
- Use Databricks Machine Learning and its capabilities within machine learning workflows, including:
  - Databricks Machine Learning (clusters, Repos, Jobs)
  - Databricks Runtime for Machine Learning (basics, libraries)
  - AutoML (classification, regression, forecasting)
  - Feature Store (basics)
  - MLflow (Tracking, Models, Model Registry)
- Implement correct decisions in machine learning workflows, including:
  - Exploratory data analysis (summary statistics, outlier removal)
  - Feature engineering (missing value imputation, one-hot encoding)
  - Tuning (hyperparameter basics, hyperparameter parallelization)
  - Evaluation and selection (cross-validation, evaluation metrics)
- Implement machine learning solutions at scale using Spark ML and other tools, including:
  - Distributed ML concepts
  - Spark ML modeling APIs (data splitting, training, evaluation, estimators vs. transformers, pipelines)
  - Hyperopt
  - Pandas API on Spark
  - Pandas UDFs and Pandas Function APIs
- Understand advanced scaling characteristics of classical machine learning models, including:
  - Distributed linear regression
  - Distributed decision trees
  - Ensembling methods (bagging, boosting)
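To make a few of the workflow topics listed above more concrete, here is a minimal plain-Python sketch of missing value imputation, one-hot encoding, and cross-validation splitting. It is for illustration only: it uses no Spark ML, MLlib, or scikit-learn API, and the helper names (impute_mean, one_hot, k_fold_indices) and the sample values are hypothetical, not taken from the exam material. In practice you would use the equivalent library components on Databricks.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def one_hot(categories):
    """Return the sorted distinct levels and one 0/1 vector per input row."""
    levels = sorted(set(categories))
    rows = [[1 if c == level else 0 for level in levels] for c in categories]
    return levels, rows


def k_fold_indices(n_rows, k):
    """Yield (train_idx, val_idx); each row lands in exactly one validation fold."""
    folds = [list(range(i, n_rows, k)) for i in range(k)]
    for i, val in enumerate(folds):
        train = sorted(j for m, fold in enumerate(folds) if m != i for j in fold)
        yield train, val


# Hypothetical toy data, chosen only to show the shape of each result.
ages = impute_mean([20, None, 40])
levels, encoded = one_hot(["red", "blue", "red"])
splits = list(k_fold_indices(6, 3))
```

Each helper is deliberately small: imputation fills gaps with a summary statistic, one-hot encoding creates one binary column per category, and k-fold splitting validates on every row exactly once across the folds.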
Article written by Youssef Mrini