Using Databricks Asset Bundles for Client Projects
Introduction
Databricks Asset Bundles (DABs) are an Infrastructure-as-Code tool for managing Databricks resources, such as notebooks, jobs, Delta Live Tables, and clusters, using YAML configurations. They enable standardized, automated workflows for client projects, improving efficiency, consistency, and compliance. This post outlines when to use DABs, how to implement them, and best practices.
When to Use DABs
DABs are suited for:

- Projects with multiple teams needing standardized configurations.
- Deployments across development, staging, and production environments.
- Regulated industries requiring versioned configurations for compliance.
- Workflows needing CI/CD automation for rapid delivery.
- Projects with recurring patterns where templates reduce setup time.
How to Implement DABs
1. Install Databricks CLI: Follow the installation instructions for your environment, then set up authentication by running:

```bash
databricks auth login --host dbc-464ba720-0425.cloud.databricks.com
```

Follow the browser-based instructions. This creates a configuration profile you can reuse later. To use a profile, pass it with the `-p` flag:

```bash
databricks clusters list -p <profile-name>
```
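For reference, the login flow above stores the profile in `~/.databrickscfg`; the resulting entry looks roughly like this (the profile name is a placeholder, and the exact fields depend on your CLI version):

```
[<profile-name>]
host      = https://dbc-464ba720-0425.cloud.databricks.com
auth_type = databricks-cli
```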
2. Initialize Project: Synaptiq has a custom template (Synaptiq Bundle Template) to use when creating bundles for client projects. You can also use some of the default built-in templates. Run:

```bash
databricks bundle init synaptiq-dab-template
```

This creates a project scaffold with a `databricks.yml` file and directories for notebooks, code, jobs, and pipelines.
3. Define Resources: Specify jobs or pipelines in `databricks.yml`. For example, a job might reference a notebook at `./notebooks/ingest.py`:

```yaml
bundle:
  name: <project-name>

resources:
  jobs:
    ingestion:
      tasks:
        - task_key: ingest
          notebook_task:
            notebook_path: ./notebooks/ingest.py
```

You can also include resource YAML files:

```yaml
bundle:
  name: <project-name>

include:
  - resources/jobs/*.yml
```

See Databricks Asset Bundle Configuration for the full YAML specification.
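As a sketch, an included resource file could look like the following (the file name and job key are illustrative, not part of any particular template; note that relative paths in a bundle resolve from the file in which they are declared):

```yaml
# resources/jobs/ingestion.yml -- matched by the include glob above
resources:
  jobs:
    ingestion:
      name: ingestion
      tasks:
        - task_key: ingest
          notebook_task:
            # relative to this file's location, not the bundle root
            notebook_path: ../../notebooks/ingest.py
```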
4. Validate and Deploy: Validate with `databricks bundle validate`. Deploy to a workspace with `databricks bundle deploy -t dev`.
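The `-t dev` flag selects a deployment target defined in `databricks.yml`. A minimal sketch of a targets section (target names and modes here are assumptions, reusing the workspace host from the authentication step):

```yaml
targets:
  dev:
    # development mode prefixes resource names per user and pauses schedules
    mode: development
    default: true
    workspace:
      host: https://dbc-464ba720-0425.cloud.databricks.com
  prod:
    mode: production
    workspace:
      host: https://dbc-464ba720-0425.cloud.databricks.com
```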
5. Execute and Monitor: Run a workflow with `databricks bundle run -t dev <job-or-pipeline-key>`, where the key is the resource name defined under `resources` (for example, `ingestion` above). You can also run individual jobs or pipelines from the Databricks web UI.
6. Customize for Clients: Use variables like `${var.client_catalog}` for client-specific settings, such as schemas. See Substitutions and Variables in Databricks Asset Bundles for how to define and use variables in the bundle.
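A minimal sketch of defining and consuming such a variable (the catalog name, job, and parameter are illustrative):

```yaml
variables:
  client_catalog:
    description: Unity Catalog catalog for this client
    default: acme_dev

resources:
  jobs:
    ingestion:
      tasks:
        - task_key: ingest
          notebook_task:
            notebook_path: ./notebooks/ingest.py
            base_parameters:
              # substituted at deploy time
              catalog: ${var.client_catalog}
```

The default can be overridden per client or per environment at deploy time, for example with `databricks bundle deploy -t prod --var="client_catalog=acme_prod"`.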
Best Practices
- Project Structure: Use `/notebooks` for IPython notebooks, `/<client_name>` for client-specific code, `/src` for shared utilities, `/resources` for YAML files, and `/tests` for testing.
- Version Control: Store DABs in Git with branches like `feature/<project-name>`.
- CI/CD Automation: Automate validation and deployment with GitHub Actions or Azure DevOps.
- Testing: Use pytest for unit tests and Nutter for integration tests in CI/CD pipelines.
- Security: Store credentials in secret scopes or environment variables.
- Templates: Create reusable DAB templates for common workflows, like ETL pipelines.
- Documentation: Include a `README.md` with setup and client-specific details.
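As a sketch of the CI/CD automation practice above, a minimal GitHub Actions workflow might look like the following (the secret names, branch, and target are assumptions to adapt per project):

```yaml
name: deploy-bundle

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      # assumed repository secrets; service-principal auth is preferable to PATs
      DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
      DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
    steps:
      - uses: actions/checkout@v4
      # installs the Databricks CLI on the runner
      - uses: databricks/setup-cli@main
      - name: Validate bundle
        run: databricks bundle validate
      - name: Deploy to dev
        run: databricks bundle deploy -t dev
```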