The Core Idea

In Databricks, compute is fully separated from storage. That means you always choose your compute — and the exam will test whether you choose correctly for a given scenario.

The decision comes down to four things: who is using it, what the workload is, what performance is needed, and what the governance constraints are.

Exam Mindset
Every question about compute is really a question about the workload. Map the scenario to the workload type first, then the compute choice becomes obvious.

The Three Compute Types

⚙️ Clusters: Spark compute (driver + workers). For engineering, notebooks, and ML. Keyword: "Spark processing".

📊 SQL Warehouses: Optimised for SQL queries and BI tools. For analysts and dashboards. Keyword: "BI / SQL analytics".

☁️ Serverless: Fully managed infra. For quick queries and ad-hoc work without setup. Keyword: "no infra management".

1. Clusters (All-Purpose & Job)

Clusters are Spark compute environments — a driver node plus workers. They're the backbone of data engineering and ML work.

All-purpose clusters are interactive, designed for notebooks and development. Job clusters spin up for a single job, then terminate automatically — cheaper and the right choice for production pipelines.
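To make the distinction concrete, here is a hedged sketch of a job definition in the style of the Databricks Jobs API. The job name, notebook path, node type, and Spark version are illustrative placeholders, not values from this guide:

```json
{
  "name": "nightly-etl",
  "tasks": [
    {
      "task_key": "transform",
      "notebook_task": { "notebook_path": "/Pipelines/nightly_etl" },
      "new_cluster": {
        "spark_version": "15.4.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        "num_workers": 4
      }
    }
  ]
}
```

The `new_cluster` block is what makes this a job cluster: the compute is created for the run and torn down when the run finishes, rather than sitting idle like an all-purpose cluster.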

Use clusters when
Running PySpark ETL, streaming jobs, batch transformations, ML training, or any workload requiring custom libraries, fine-grained Spark control, or Unity Catalog governance.

2. SQL Warehouses (Databricks SQL)

SQL Warehouses are optimised purely for SQL queries and BI tools. They come in three flavours: Classic, Pro, and Serverless. The underlying compute is still Spark, but the interface and optimisations are geared entirely towards query performance.

Use SQL Warehouses when
Business users are querying data, Power BI or Tableau dashboards are involved, or you're serving your Gold/reporting layer.

3. Serverless Compute

Serverless means Databricks manages the infrastructure entirely — no cluster config, no warm-up decisions. It is available both as a Serverless SQL Warehouse and (in newer platform versions) as serverless compute for jobs.

Use serverless when
You want fast ad-hoc queries with zero setup overhead — especially for analysts who don't want to manage clusters.
Don't use serverless when
You're in a restricted networking environment, need private endpoints, or have complex Unity Catalog storage configurations. In those cases, a standard cluster is the correct choice — and the exam may test exactly this.
🏗️ Real-world note

A common real-world pattern: serverless fails in an enterprise environment due to private networking or Unity Catalog storage restrictions — but switching to a standard cluster with Unity Catalog resolves the issue immediately. If you see a scenario on the exam where serverless is ruled out for "security or networking reasons", the answer is Cluster.

Decision Framework

When you see a scenario on the exam, run it through this table:

| Scenario | Best Compute |
| --- | --- |
| PySpark ETL pipeline | Cluster |
| Scheduled pipeline (Jobs / ADF) | Job Cluster |
| Power BI or Tableau dashboard | SQL Warehouse |
| Business analyst running SQL queries | SQL Warehouse |
| Quick ad-hoc queries, no infra management | Serverless SQL Warehouse |
| Secure enterprise pipeline with Unity Catalog | Cluster |
| ML model training | Cluster |
| Gold layer reporting for CFO dashboard | SQL Warehouse |
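The decision framework can be drilled with a few lines of Python. This is a toy study aid whose keyword rules mirror the table above — the function name and keywords are my own, not any Databricks API:

```python
def pick_compute(scenario: str) -> str:
    """Toy exam drill: map a scenario description to a compute type.

    The keyword rules mirror the decision table; this is a study aid,
    not an official Databricks API.
    """
    s = scenario.lower()
    # Zero-setup, ad-hoc work -> fully managed compute
    if any(k in s for k in ("serverless", "ad-hoc", "no infra")):
        return "Serverless SQL Warehouse"
    # Analysts, dashboards, and the Gold/reporting layer -> SQL Warehouse
    if any(k in s for k in ("power bi", "tableau", "dashboard", "analyst", "gold layer")):
        return "SQL Warehouse"
    # Scheduled production pipelines -> ephemeral job clusters (cheaper)
    if "scheduled" in s or "adf" in s:
        return "Job Cluster"
    # Everything else Spark-heavy: PySpark ETL, ML training, Unity Catalog
    return "Cluster"
```

For example, `pick_compute("Scheduled pipeline (Jobs / ADF)")` returns `"Job Cluster"`, while `pick_compute("ML model training")` falls through to `"Cluster"`.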

Exam-Style Practice Questions

Q1 You need to run a scheduled ETL pipeline written in PySpark. Which compute type should you use? (Answer: Job cluster)
Q2 Business users need to query the Gold layer via Power BI. What is the appropriate compute? (Answer: SQL Warehouse)
Q3 A data analyst wants to run quick SQL queries with minimal infrastructure setup. What should they use? (Answer: Serverless SQL Warehouse)
Q4 Your pipeline requires custom Python libraries, full Spark control, and secure data access via Unity Catalog. What compute do you choose? (Answer: Cluster)
Q5 Serverless compute is failing in your environment due to private endpoint restrictions. What is the correct alternative? (Answer: A standard cluster)
Q6 A data engineer is developing and testing PySpark transformations interactively in a notebook. Which compute is most appropriate? (Answer: All-purpose cluster)

Common Exam Traps

These are the mistakes the exam is designed to catch:

- Choosing an all-purpose cluster for a scheduled production pipeline, when a job cluster is cheaper and terminates automatically.
- Choosing a cluster for BI dashboards or analyst SQL, when a SQL Warehouse is the optimised choice.
- Choosing serverless in a restricted networking environment with private endpoints or complex Unity Catalog storage configurations, where a standard cluster is the correct answer.

⚡ Quick Memory Trick
People run SQL → SQL Warehouse. Code runs Spark → Cluster. Nobody manages infra → Serverless. Three sentences, three compute types covered.