The Core Idea

In Databricks, compute is fully separated from storage. That means you always choose your compute — and the exam will test whether you choose correctly for a given scenario.

The decision comes down to four things: who is using it, what the workload is, what performance is needed, and what the governance constraints are.

Exam Mindset
Every question about compute is really a question about the workload. Map the scenario to the workload type first, then the compute choice becomes obvious.

The Three Compute Types

⚙️ Clusters: Spark compute (driver + workers). For engineering, notebooks, and ML. Keyword: "Spark processing".

📊 SQL Warehouses: Optimised for SQL queries and BI tools. For analysts and dashboards. Keyword: "BI / SQL analytics".

☁️ Serverless: Fully managed infra. For quick queries and ad-hoc work without setup. Keyword: "no infra management".

1. Clusters (All-Purpose & Job)

Clusters are Spark compute environments — a driver node plus workers. They're the backbone of data engineering and ML work.

All-purpose clusters are interactive, designed for notebooks and development. Job clusters spin up for a single job, then terminate automatically — cheaper and the right choice for production pipelines.
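To make the distinction concrete, here is a hedged sketch of a job definition in the style of the Databricks Jobs API. The job name, notebook path, node type, and Spark version are illustrative placeholders, not values from this guide:

```json
{
  "name": "nightly-etl",
  "tasks": [
    {
      "task_key": "transform",
      "notebook_task": { "notebook_path": "/Pipelines/nightly_etl" },
      "new_cluster": {
        "spark_version": "15.4.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        "num_workers": 4
      }
    }
  ]
}
```

The `new_cluster` block is what makes this a job cluster: the compute is created for the run and torn down when the run finishes, rather than sitting idle like an all-purpose cluster.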

Use clusters when
Running PySpark ETL, streaming jobs, batch transformations, ML training, or any workload requiring custom libraries, fine-grained Spark control, or Unity Catalog governance.

2. SQL Warehouses (Databricks SQL)

SQL Warehouses are optimised purely for SQL queries and BI tools. They come in three flavours: Classic, Pro, and Serverless. The underlying compute is still Spark, but the interface and optimisations are geared entirely towards query performance.

Use SQL Warehouses when
Business users are querying data, Power BI or Tableau dashboards are involved, or you're serving your Gold/reporting layer.

3. Serverless Compute

Serverless means Databricks manages the infrastructure entirely — no cluster config, no warm-up decisions. It is available both as a Serverless SQL Warehouse and (in newer platform versions) as serverless compute for jobs.

Use serverless when
You want fast ad-hoc queries with zero setup overhead — especially for analysts who don't want to manage clusters.
Don't use serverless when
You're in a restricted networking environment, need private endpoints, or have complex Unity Catalog storage configurations. In those cases, a standard cluster is the correct choice — and the exam may test exactly this.
🏗️ Real-world note

A common real-world pattern: serverless fails in an enterprise environment due to private networking or Unity Catalog storage restrictions — but switching to a standard cluster with Unity Catalog resolves the issue immediately. If you see a scenario on the exam where serverless is ruled out for "security or networking reasons", the answer is Cluster.

Decision Framework

When you see a scenario on the exam, run it through this table:

| Scenario | Best Compute |
| --- | --- |
| PySpark ETL pipeline | Cluster |
| Scheduled pipeline (Jobs / ADF) | Job Cluster |
| Power BI or Tableau dashboard | SQL Warehouse |
| Business analyst running SQL queries | SQL Warehouse |
| Quick ad-hoc queries, no infra management | Serverless SQL Warehouse |
| Secure enterprise pipeline with Unity Catalog | Cluster |
| ML model training | Cluster |
| Gold layer reporting for CFO dashboard | SQL Warehouse |
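The decision framework can be drilled with a few lines of Python. This is a toy study aid whose keyword rules mirror the table above — the function name and keywords are my own, not any Databricks API:

```python
def pick_compute(scenario: str) -> str:
    """Toy exam drill: map a scenario description to a compute type.

    The keyword rules mirror the decision table; this is a study aid,
    not an official Databricks API.
    """
    s = scenario.lower()
    # Zero-setup, ad-hoc work -> fully managed compute
    if any(k in s for k in ("serverless", "ad-hoc", "no infra")):
        return "Serverless SQL Warehouse"
    # Analysts, dashboards, and the Gold/reporting layer -> SQL Warehouse
    if any(k in s for k in ("power bi", "tableau", "dashboard", "analyst", "gold layer")):
        return "SQL Warehouse"
    # Scheduled production pipelines -> ephemeral job clusters (cheaper)
    if "scheduled" in s or "adf" in s:
        return "Job Cluster"
    # Everything else Spark-heavy: PySpark ETL, ML training, Unity Catalog
    return "Cluster"
```

For example, `pick_compute("Scheduled pipeline (Jobs / ADF)")` returns `"Job Cluster"`, while `pick_compute("ML model training")` falls through to `"Cluster"`.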

Exam-Style Practice Questions

Q1 You need to run a scheduled ETL pipeline written in PySpark. Which compute type should you use? (Answer: Job cluster)
Q2 Business users need to query the Gold layer via Power BI. What is the appropriate compute? (Answer: SQL Warehouse)
Q3 A data analyst wants to run quick SQL queries with minimal infrastructure setup. What should they use? (Answer: Serverless SQL Warehouse)
Q4 Your pipeline requires custom Python libraries, full Spark control, and secure data access via Unity Catalog. What compute do you choose? (Answer: Cluster)
Q5 Serverless compute is failing in your environment due to private endpoint restrictions. What is the correct alternative? (Answer: A standard cluster)
Q6 A data engineer is developing and testing PySpark transformations interactively in a notebook. Which compute is most appropriate? (Answer: All-purpose cluster)

Common Exam Traps

These are the mistakes the exam is designed to catch:

- Choosing an all-purpose cluster for a scheduled production pipeline, when a job cluster is cheaper and terminates automatically.
- Choosing a cluster for BI dashboards or analyst SQL, when a SQL Warehouse is the optimised choice.
- Choosing serverless in a restricted networking environment with private endpoints or complex Unity Catalog storage configurations, where a standard cluster is the correct answer.

⚡ Quick Memory Trick
People run SQL → SQL Warehouse. Code runs Spark → Cluster. Nobody manages infra → Serverless. Three sentences, three compute types covered.