Streamlining Databricks Projects with Our Custom Asset Bundle Template
Why Use a DAB Template?
Databricks Asset Bundles (DABs) are instrumental in managing data pipelines, jobs, and infrastructure as code. Our custom template enhances this by:
- Standardizing Structure: Establishes a consistent folder structure for code, notebooks, and resources.
- Enforcing Best Practices: Integrates version control, testing, and environment isolation.
- Streamlining CI/CD: Provides integration with GitHub Actions for automated validation and deployment.
- Enhancing Collaboration: Facilitates team onboarding and contributions.
This section details the implementation process.
Step 1: Set Up Databricks CLI
- Databricks CLI (v0.218.0 or higher): Follow the installation instructions for your environment, then set up authentication by running:
databricks auth login --host dbc-464ba720-0425.cloud.databricks.com
Follow the browser-based instructions. This creates a configuration profile you can reuse later. To use a profile, pass it with the -p flag:
databricks clusters list -p <profile-name>
Step 2: Initialize a New Project
Create a new project using our template:
databricks bundle init https://github.com/FocusedDiversity/synaptiq-dab-template
You’ll be prompted for:
- Client Name: A unique shortname for the client (e.g. HarmonyCares).
- Project Name: A unique shortname for the project (e.g. CHA).
- Cloud Provider: Select aws (or your preferred cloud).
- Cloud Resources: Provide the cloud-specific information required for deployment.
This generates a project with our standard structure:
synaptiq-client_name-project_name/
├── .github/
├── client_name/
├── docs/
├── notebooks/
│ ├── utils/
│ ├── bronze/
│ ├── silver/
│ └── gold/
├── resources/
│ ├── clusters/
│ ├── jobs/
│ └── pipelines/
├── tests/
├── databricks.yml
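The generated databricks.yml ties the bundle together and defines deployment targets. Its exact contents depend on your answers during init, but a minimal sketch looks roughly like this (the target names and include glob are illustrative assumptions, not the template's exact output):

```yaml
# databricks.yml -- illustrative sketch; the template's generated file may differ
bundle:
  name: synaptiq-client_name-project_name

include:
  # Pull in every resource definition under resources/ (adjust the glob to your layout)
  - resources/*/*.yml

targets:
  dev:
    # Development mode isolates deployments per user
    mode: development
    default: true
    workspace:
      host: https://dbc-464ba720-0425.cloud.databricks.com

  prod:
    mode: production
    workspace:
      host: https://dbc-464ba720-0425.cloud.databricks.com
```

In development mode, deployed resources are prefixed with your username, so multiple developers can iterate in the same workspace without colliding.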
Step 3: Customize Your Project
- Add Code: Place reusable Python modules in notebooks/utils/ and notebooks in notebooks/bronze/, notebooks/silver/, or notebooks/gold/.
- Define Resources: Update the YAML files in resources/ to define jobs, clusters, or pipelines. For example, edit resources/sample_job.yml to specify a job that runs an ingest notebook (see the sketch after this list).
- Write Tests: Add unit tests in tests/unit/ using pytest to validate your code.
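As a reference for the Define Resources step, here is a minimal sketch of what resources/sample_job.yml might contain. The notebook path, cluster settings, and schedule below are illustrative assumptions rather than values shipped with the template:

```yaml
# resources/sample_job.yml -- illustrative sketch; adapt names and paths to your project
resources:
  jobs:
    sample_job:
      name: sample-ingest-job
      tasks:
        - task_key: ingest_bronze
          notebook_task:
            # Hypothetical notebook; point this at your own ingest notebook
            notebook_path: ../notebooks/bronze/ingest.py
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: m5.xlarge
            num_workers: 2
      schedule:
        # Assumed schedule for illustration: run daily at 06:00 UTC
        quartz_cron_expression: "0 0 6 * * ?"
        timezone_id: UTC
```

Any job, cluster, or pipeline defined this way is picked up by the bundle's include patterns and deployed alongside your code.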
Step 4: Validate and Deploy
Validate your bundle to catch errors:
databricks bundle validate
Deploy to your development workspace:
databricks bundle deploy --target dev
Run a job or pipeline:
databricks bundle run sample_job
Step 5: Integrate with CI/CD
Our template includes a sample GitHub Actions workflow (.github/workflows/ci.yml) that:
- Validates the bundle on pull requests.
- Deploys to dev/staging/prod based on branch.
- Runs tests using pytest.
Pushing changes to GitHub triggers the workflow, so validation, tests, and deployment run automatically.
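The workflow that ships with the template is the source of truth, but a CI job along these lines would cover the validate, test, and deploy flow described above (the secret names, branch mapping, and action versions are assumptions):

```yaml
# .github/workflows/ci.yml -- illustrative sketch, not the template's exact workflow
name: CI

on:
  pull_request:
  push:
    branches: [main]

jobs:
  validate-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Install the Databricks CLI via the official setup action
      - uses: databricks/setup-cli@main

      # Run unit tests
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install pytest && pytest tests/

      # Validate the bundle on every pull request and push
      - run: databricks bundle validate
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}

      # Deploy to the dev target when changes land on main
      - if: github.ref == 'refs/heads/main'
        run: databricks bundle deploy --target dev
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
```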
What’s Next?
This template is your foundation for building robust Databricks projects. Customize it for client-specific needs and contribute improvements to the template repo.
Resources:
- Databricks Asset Bundles Documentation (https://docs.databricks.com/aws/en/dev-tools/bundles/settings)
- Databricks Asset Bundle Templates Documentation (https://docs.databricks.com/aws/en/dev-tools/bundles/templates)
- Databricks CLI Guide (https://docs.databricks.com/aws/en/dev-tools/cli/tutorial)
- Template Repository (https://github.com/FocusedDiversity/synaptiq-dab-template)