Hands-on DataOps with Databricks, Terraform & GitHub Actions


A Step-by-Step Guide to Automating Databricks Deployments Using Infrastructure-as-Code


Why DataOps + DevOps for Databricks?


As teams scale their cloud-native data platforms, automation and reproducibility become essential. Manual provisioning and notebook execution just don’t cut it anymore. That’s where Infrastructure as Code (IaC) and CI/CD come in.


In this post, we’ll walk through what I presented in my latest webinar: a real-world automation pipeline that provisions Azure Databricks with Terraform, manages ETL notebooks and jobs as code, and schedules them with GitHub Actions.


Whether you're just getting started or already running Spark jobs in production, this guide will help you think like a platform engineer while working with data tools.


Architecture Overview


Here’s the high-level architecture we implemented:

Architecture of the Databricks Data Pipeline

Key components:

  • Terraform modules for reusable infrastructure

  • Azure for cloud resources (Databricks, Resource Groups, VNets)

  • GitHub Actions for automation

  • Databricks Jobs API for job orchestration

  • Fivetran for file ingestion (optional for real-time demo)



Modular Terraform Setup for Azure Databricks


We created two major layers:
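At the repository level, the split looks roughly like this (an illustrative layout, consistent with the module paths used below, not an exact listing of the repo):

modules/
  databricks_workspace/
  network/
infra/
  envs/
    dev/
      main.tf
apps/
  notebooks/
    nightly_task.py
  main.tf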


infra/: Core Infrastructure

  • Resource Group

  • Virtual Network

  • Azure Databricks Workspace

  • Network Security Groups

module "databricks_workspace" {
  source                          = "../../../modules/databricks_workspace"
  workspace_name                  = "${local.prefix}-workspace"
  resource_group_name             = var.resource_group_name
  region                          = var.region
  managed_resource_group_name     = "${local.prefix}-managed-rg"
  vnet_id                         = module.network.vnet_id
  ...
}
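One thing the snippet above doesn’t show is provider configuration, which both layers need. A minimal sketch might look like the following, assuming the workspace module exposes its URL and Azure resource ID as outputs (the output names here are placeholders for whatever your module defines):

terraform {
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
    }
    databricks = {
      source = "databricks/databricks"
    }
  }
}

provider "azurerm" {
  features {}
}

# Authenticate the Databricks provider against the workspace created above.
# workspace_url / workspace_resource_id are assumed module outputs.
provider "databricks" {
  host                        = module.databricks_workspace.workspace_url
  azure_workspace_resource_id = module.databricks_workspace.workspace_resource_id
}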

apps/: Deploying Jobs, Notebooks, and Workflows


We created a simple Spark job as a Python script and uploaded it as a Databricks notebook:

resource "databricks_notebook" "nightly_job_notebook" {
  path     = "/Shared/nightly_task"
  language = "PYTHON"
  content_base64 = base64encode(file(var.notebook_file_path))
}
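The notebook source itself is just a .py file in the repo, wired in through a variable. A minimal declaration might look like this (the default path is only an example):

variable "notebook_file_path" {
  description = "Local path to the Python script uploaded as the notebook source"
  type        = string
  default     = "notebooks/nightly_task.py" # example path; adjust to your repo layout
}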

And the corresponding job:

resource "databricks_job" "nightly_serverless_job" {
  name = "Nightly Python Job - Serverless"
  notebook_task {
    notebook_path = databricks_notebook.nightly_job_notebook.path
  }
  schedule {
    quartz_cron_expression = "0 0 * * * ?"
    timezone_id = "UTC"
  }
  job_cluster {
    job_cluster_key = "serverless_cluster"
    new_cluster {
      spark_version = "13.3.x-scala2.12"
      runtime_engine = "PHOTON"
      num_workers = 1
    }
  }
}
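Optionally, you can surface the job’s URL after terraform apply so it’s easy to find in the workspace UI, assuming the provider version you’re on exposes the job’s url attribute (recent versions do):

output "nightly_job_url" {
  description = "Link to the nightly job in the Databricks workspace UI"
  value       = databricks_job.nightly_serverless_job.url
}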

GitHub Actions CI/CD for Terraform

We added a .github/workflows/terraform.yml pipeline:

name: Deploy Databricks Infra

on:
  push:
    paths:
      - 'apps/**'
      - 'infra/**'
  workflow_dispatch:

jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      # Azure service principal credentials read by the azurerm/databricks providers.
      # The secret names below are examples; use whatever your repository defines.
      ARM_CLIENT_ID: ${{ secrets.ARM_CLIENT_ID }}
      ARM_CLIENT_SECRET: ${{ secrets.ARM_CLIENT_SECRET }}
      ARM_SUBSCRIPTION_ID: ${{ secrets.ARM_SUBSCRIPTION_ID }}
      ARM_TENANT_ID: ${{ secrets.ARM_TENANT_ID }}
    steps:
      - uses: actions/checkout@v3

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3

      - name: Terraform Init
        run: terraform init

      - name: Terraform Apply
        run: terraform apply -auto-approve

This triggers a deployment automatically whenever anything under infra/ or apps/ changes, and lets you kick one off manually via workflow_dispatch.
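One detail the workflow glosses over: running terraform init in CI only makes sense with remote state. If you don’t have one yet, an Azure Storage backend is a common choice; a minimal sketch (all names below are placeholders) looks like this:

terraform {
  backend "azurerm" {
    resource_group_name  = "tfstate-rg"         # placeholder
    storage_account_name = "tfstatestorage001"  # placeholder; must be globally unique
    container_name       = "tfstate"
    key                  = "databricks-infra.tfstate"
  }
}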


Testing with Databricks Community Edition


To make the workshop accessible, we demonstrated how to:

  • Create a free Databricks Community Edition account

  • Run the same jobs and notebooks without Azure billing

  • Sync code from GitHub manually or using databricks-cli


What You’ll Walk Away With


By the end of this exercise, you’ll be able to:

  • Deploy Azure Databricks workspaces using Terraform

  • Organize your infrastructure and application layers cleanly

  • Manage Spark jobs, notebooks, and workflows as code

  • Automate it all using GitHub Actions


What’s Next?



In the upcoming sessions and course, we’ll dive into:


  • 🔐 Secure secret management (Key Vault + Databricks secrets)

  • 🧩 Advanced CI/CD pipelines

  • 🧬 Integrating Fivetran, dbt, and Unity Catalog

  • 🌍 Multi-environment (dev/staging/prod) strategies

