Hands-on DataOps with Databricks, Terraform & GitHub Actions
- Chandan Kumar
- Jul 9
- 2 min read
A Step-by-Step Guide to Automating Databricks Deployments Using Infrastructure-as-Code
Why DataOps + DevOps for Databricks?
As teams scale their cloud-native data platforms, automation and reproducibility become essential. Manual provisioning and notebook execution just don’t cut it anymore. That’s where Infrastructure as Code (IaC) and CI/CD come in.
In this post, we’ll walk through what I presented in my latest webinar — a real-world automation pipeline that provisions Azure Databricks using Terraform, manages ETL notebooks and jobs, and schedules them using GitHub Actions.
Whether you're just getting started or already running Spark jobs in production, this guide will help you think like a platform engineer while working with data tools.
Architecture Overview
Here’s the high-level architecture we implemented:

Key components:
Terraform modules for reusable infrastructure
Azure for cloud resources (Databricks, Resource Groups, VNets)
GitHub Actions for automation
Databricks Jobs API for job orchestration
Fivetran for file ingestion (optional for real-time demo)
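Two Terraform providers do most of the heavy lifting here: azurerm for the Azure resources and databricks for workspace-level objects. A typical required_providers block, with illustrative version pins:

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0" # illustrative pin
    }
    databricks = {
      source  = "databricks/databricks"
      version = "~> 1.0" # illustrative pin
    }
  }
}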
Modular Terraform Setup for Azure Databricks
We created two major layers:
infra/: Core Infrastructure
Resource Group
Virtual Network
Azure Databricks Workspace
Network Security Groups
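At the bottom of this layer sit the plain Azure resources. A minimal sketch of what they look like (resource names and the address range below are illustrative, not the exact values from the webinar):

resource "azurerm_resource_group" "this" {
  name     = "${local.prefix}-rg"
  location = var.region
}

resource "azurerm_virtual_network" "this" {
  name                = "${local.prefix}-vnet"
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name
  address_space       = ["10.0.0.0/16"] # illustrative range
}

resource "azurerm_network_security_group" "this" {
  name                = "${local.prefix}-nsg"
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name
}

The workspace itself is provisioned through a reusable module: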
module "databricks_workspace" {
source = "../../../modules/databricks_workspace"
workspace_name = "${local.prefix}-workspace"
resource_group_name = var.resource_group_name
region = var.region
managed_resource_group_name = "${local.prefix}-managed-rg"
vnet_id = module.network.vnet_id
...
}
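Inside modules/databricks_workspace, the central resource is azurerm_databricks_workspace. A rough sketch of what the module wraps, assuming input variables that mirror the arguments above (the real module interface may differ):

resource "azurerm_databricks_workspace" "this" {
  name                        = var.workspace_name
  resource_group_name         = var.resource_group_name
  location                    = var.region
  sku                         = "premium"
  managed_resource_group_name = var.managed_resource_group_name

  # VNet injection: run cluster nodes inside your own subnets
  # (recent azurerm versions also expect the subnet/NSG association IDs)
  custom_parameters {
    virtual_network_id  = var.vnet_id
    public_subnet_name  = var.public_subnet_name  # assumed variable
    private_subnet_name = var.private_subnet_name # assumed variable
  }
}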
apps/: Deploying Jobs, Notebooks, and Workflows
We created a simple Spark job as a Python script and uploaded it as a Databricks notebook:
resource "databricks_notebook" "nightly_job_notebook" {
path = "/Shared/nightly_task"
language = "PYTHON"
content_base64 = base64encode(file(var.notebook_file_path))
}
And the corresponding job:
resource "databricks_job" "nightly_serverless_job" {
name = "Nightly Python Job - Serverless"
notebook_task {
notebook_path = databricks_notebook.nightly_job_notebook.path
}
schedule {
quartz_cron_expression = "0 0 * * * ?"
timezone_id = "UTC"
}
job_cluster {
job_cluster_key = "serverless_cluster"
new_cluster {
spark_version = "13.3.x-scala2.12"
runtime_engine = "PHOTON"
num_workers = 1
}
}
}
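One thing the snippets above gloss over: the databricks_* resources need the Databricks provider to be configured against the workspace created by the infra layer. A sketch of one way to wire it up (the variable names are assumptions; how you pass the workspace ID between layers depends on your state setup):

provider "databricks" {
  # Authenticate via Azure AD against the workspace provisioned in infra/
  azure_workspace_resource_id = var.databricks_workspace_id # assumed variable

  # Alternatively, use a workspace URL plus a personal access token:
  # host  = var.databricks_host
  # token = var.databricks_token
}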
GitHub Actions CI/CD for Terraform
We added a .github/workflows/terraform.yml pipeline:
name: Deploy Databricks Infra

on:
  push:
    paths:
      - 'apps/**'
      - 'infra/**'
  workflow_dispatch:

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
      - name: Terraform Init
        run: terraform init
      - name: Terraform Apply
        run: terraform apply -auto-approve
This triggers a deployment automatically on any change under infra/ or apps/, or lets you kick one off manually via workflow_dispatch.
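For the apply step to run non-interactively, the workflow also needs two things not shown above: Azure credentials for Terraform (the ARM_CLIENT_ID, ARM_CLIENT_SECRET, ARM_SUBSCRIPTION_ID, and ARM_TENANT_ID environment variables, typically injected from GitHub repository secrets) and a remote state backend, since each runner starts from a clean slate. A minimal backend sketch, assuming a pre-created storage account:

terraform {
  backend "azurerm" {
    resource_group_name  = "tfstate-rg"   # assumed, pre-created
    storage_account_name = "tfstatedemo"  # assumed, pre-created
    container_name       = "tfstate"
    key                  = "databricks.tfstate"
  }
}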
Testing with Databricks Community Edition
To make the workshop accessible, we demonstrated how to:
Create a free Databricks Community Edition account
Run the same jobs and notebooks without Azure billing
Sync code from GitHub manually or using databricks-cli
What You’ll Walk Away With
By the end of this exercise, you'll be able to:
Deploy Azure Databricks workspaces using Terraform
Organize your infrastructure and application layers cleanly
Manage Spark jobs, notebooks, and workflows as code
Automate it all using GitHub Actions
What’s Next?
The repository is available at https://github.com/kchandan/azure-databricks-terraform
In the upcoming sessions and course, we’ll dive into:
🔐 Secure secret management (Key Vault + Databricks secrets)
🧩 Advanced CI/CD pipelines
🧬 Integrating Fivetran, dbt, and Unity Catalog
🌍 Multi-environment (dev/staging/prod) strategies
