Leveraging Databricks Asset Bundles
How we used Databricks Asset Bundles with Slingshot to streamline pipelines.
Capital One Slingshot is a data cloud management solution built and used by Capital One. We recently expanded Slingshot’s core offering beyond Snowflake to include Databricks optimization, releasing numerous new cost optimization features for Databricks.
As we expanded Slingshot’s offering to Databricks, we quickly ran into challenges: the platform’s complexity made it hard to keep pipelines consistent, repeatable and maintainable. This was especially true because multiple engineering teams across the organization use Slingshot to deploy and manage pipelines on Databricks.
Our engineering teams also have certain requirements that influenced the solution we chose to implement. We needed:
- Repeatable pipelines across multiple environments (dev, QA, prod)
- The ability to integrate with enterprise CI/CD tools (e.g., Jenkins) with minimal effort
- Support for a growing number of jobs and scalability
- The ability to easily allow different teams to have independent jobs, while also sharing code as needed
- Support for native features like Serverless Jobs, DLT, dashboards, etc.
After some research we landed on two options: building our own Python-based deployment system, or using a Databricks-native offering called Asset Bundles.
Databricks Asset Bundles
Databricks Asset Bundles is a CI/CD tool that allows you to deploy various assets (Jobs, DLT tables, etc.) together to Databricks workspaces. The tool has a concept of a “bundle,” which is essentially a collection of assets that can be repeatedly deployed across workspaces. A bundle will contain the following:
- Config files: Configuration of a job, workspace, etc.
- Source files: Notebooks, Python wheel files, etc.
Databricks Asset Bundles meets our requirements: seamless CI/CD integration, version control for deployments, consistent portability across environments and simplified management of processes and pipelines.
However, we discovered a significant gap. A crucial aspect of our workflows hinges on effective catalog management, which in turn relies on a precise structure of schemas and tables being in place. This becomes especially problematic when deploying into new or isolated environments where the required catalog structure simply doesn't exist yet.
To solve this, we integrated a dedicated setup task into our primary job. This task automatically creates or updates the catalog and the required tables at the beginning of every run. This addition has dramatically enhanced the reliability of our deployments and effectively eliminated the need for tedious manual pre-configuration.
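As a rough sketch of how this can look in a bundle's job definition (the job and task names and the setup notebook path below are illustrative, not our actual ones), the setup task runs first and every downstream task depends on it; the setup notebook issues idempotent CREATE ... IF NOT EXISTS statements for the catalog, schemas and tables:

resources:
  jobs:
    primary_job:
      tasks:
        # Illustrative setup task: creates or updates the catalog, schemas and tables
        - task_key: catalog_setup
          notebook_task:
            notebook_path: ./setup_catalog_and_tables.py
        # Downstream work only starts once the catalog structure exists
        - task_key: process_data
          depends_on:
            - task_key: catalog_setup
          notebook_task:
            notebook_path: ./process_data.py

Because the setup statements are idempotent, the same job can be deployed into a brand-new workspace or re-run in an existing one without any manual pre-configuration.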
The setup
Given the benefits we identified with Asset Bundles and the additional task for catalog management, we decided to move forward with implementation.
Here’s a basic layout of the architecture implemented for Asset Bundles:
As you can see, we follow a standard flow to deploy with Asset Bundles. Developers create a PR, which is deployed via Jenkins. Jenkins then pulls the Asset Bundles image (a Python package) and deploys the files. We inject secrets, define the target environment and so on in the CI/CD pipeline. Deployment happens automatically on merge for all environments except production, which requires prior approval from our change management system.
For a deeper look, let’s walk through a code example of what we do to deploy Asset Bundles. One unique aspect is that we set up multiple assets to be deployed: for example, ‘Team A’ might have one asset and ‘Team B’ another, with the code structured so it can be shared between the two where necessary. And because we now have many asset files, we also changed how we structure files so they are easier to manage.
File structure
src/
  team_a_bundle/
    databricks.yml
  team_b_bundle/
    bundle_configs/
      job_1_config.yaml
      job_2_config.yaml
    databricks.yml
tests/
Jenkinsfile

As you can see, we have maintained a fairly clean file structure:
- Each “bundle” has its own folder that contains all the code for that specific bundle (a shared folder in a bundle is also an option, but can be more complex).
- Within each bundle we have a databricks.yml file. This is the primary code for a Databricks bundle and contains common variables that can be leveraged across jobs and other assets. This file can either contain the definition of an asset, or reference the asset files from another location (in the example above, under bundle_configs); a short sketch of that referencing pattern follows below.
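As a sketch of how that referencing can look (the file names, paths and job names here are illustrative), the databricks.yml in team_b_bundle can pull in the job definitions via an include list, while each file under bundle_configs defines a single job:

# team_b_bundle/databricks.yml
bundle:
  name: team_b_bundle

include:
  - bundle_configs/*.yaml

# bundle_configs/job_1_config.yaml
resources:
  jobs:
    job_1:
      name: "[${bundle.environment}] Team B Job 1"
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ./notebooks/job_1_main.py

This keeps each team's job definitions in separate files while common settings stay in the team's databricks.yml.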
Let’s take this a step further and look at a fuller example of a databricks.yml file:
# yaml-language-server: $schema=bundle_config_schema.json
bundle:
  name: sample_dbx_bundle
variables:
  s3_bucket:
    description: The S3 bucket for this sample
    default: ""
workspace:
  host: https://sample.cloud.databricks.com
resources:
  jobs:
    sample_job:
      name: "[${bundle.environment}] Sample DBx Job"
      tasks:
        - task_key: serverless_task
          depends_on:
            - task_key: dlt_medium_pipeline
          notebook_task:
            notebook_path: ./fe_medium_report.py
        - task_key: standard_compute_task
          depends_on:
            - task_key: serverless_task
          notebook_task:
            notebook_path: ./fe_medium_report.py
          new_cluster:
            spark_version: 13.1.x-scala2.12
            num_workers: 1
            node_type_id: i3.xlarge
environments:
  development:
    default: true
  dev:
    workspace:
      host: https://dev-sample-workspace.cloud.databricks.com/
    variables:
      s3_bucket: "s3://sdev-bucket"
  qa:
    workspace:
      host: https://qa-sample-workspace.cloud.databricks.com/
    variables:
      s3_bucket: "s3://sdev-bucket"
  prod:
    workspace:
      host: https://prod-sample-workspace.cloud.databricks.com/
    variables:
      s3_bucket: "s3://sdev-bucket"

Bundle definition
The above databricks.yml file defines a Databricks Asset Bundle configuration. Let's break down its structure and components.
- `bundle`: The top-level key, indicating the start of the bundle definition.
  - `name`: Specifies the name of the Databricks bundle: `sample_dbx_bundle`.

Variables
- `variables`: Defines variables that can be used within the bundle configuration.
  - `s3_bucket`: The name of the variable being defined.
    - `description`: Describes what the variable represents.
    - `default`: Sets the default value for the variable to an empty string (`""`).

Workspace
- `workspace`: Configures the Databricks workspace.
  - `host`: Sets the default host URL for the Databricks workspace: `https://sample.cloud.databricks.com`.

Resources
- `resources`: Defines the resources that will be deployed as part of the bundle.
  - `jobs`: Specifies the Databricks jobs to be deployed.
    - `sample_job`: Defines a job named `sample_job`.
      - `name`: Sets the name of the job, including the environment: `"[${bundle.environment}] Sample DBx Job"`.
      - `tasks`: Defines the tasks within the job.
        - `task_key`: Defines a task named `serverless_task`.
          - `depends_on`: Specifies that this task depends on `dlt_medium_pipeline`.
          - `notebook_task`: Defines a notebook task.
            - `notebook_path`: Sets the path to the notebook: `./fe_medium_report.py`.
        - `task_key`: Defines a task named `standard_compute_task`.
          - `depends_on`: Specifies that this task depends on `serverless_task`.
          - `notebook_task`: Defines a notebook task.
            - `notebook_path`: Sets the path to the notebook: `./fe_medium_report.py`.
          - `new_cluster`: Defines a new cluster for the task.
            - `spark_version`: Sets the Spark version: `13.1.x-scala2.12`.
            - `num_workers`: Sets the number of workers: `1`.
            - `node_type_id`: Sets the node type ID: `i3.xlarge`.

Environments
- `environments`: Defines the different deployment environments, with `development` marked as the default.
  - `dev` / `qa` / `prod`: Each named environment overrides the defaults as needed.
    - `workspace`: Overrides the default workspace settings.
      - `host`: Sets the host URL for that environment's workspace.
    - `variables`: Overrides the default variables.
      - `s3_bucket`: Sets the environment-specific value for the variable.

This databricks.yml file configures a Databricks Asset Bundle named `sample_dbx_bundle` with a job, variables and specific settings for different environments (dev, QA and prod). It defines how the job will be executed, including tasks, dependencies and cluster configuration, while also allowing for environment-specific overrides for workspace details and variables like the S3 bucket.
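One detail the sample doesn't show is how a variable such as `s3_bucket` actually reaches a task. A minimal sketch of one way to wire it up, assuming the notebook reads a parameter named `s3_bucket` (that parameter name is our illustration, not part of the sample above), is to pass the variable through as a base parameter:

      tasks:
        - task_key: serverless_task
          notebook_task:
            notebook_path: ./fe_medium_report.py
            # Illustrative only: resolves to the per-environment override of s3_bucket
            base_parameters:
              s3_bucket: ${var.s3_bucket}

With this in place, deploying the same bundle to dev, QA or prod automatically injects the matching bucket without any per-environment code changes.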
Conclusion
Databricks Asset Bundles was a great choice to make our deployment pipeline simpler and more scalable across development teams. Its structured approach enables our engineering teams to easily standardize operations, reduce manual errors and achieve consistent deployments across diverse environments.
If your organization is looking to optimize your Databricks deployment strategy, we recommend Asset Bundles as a powerful enabler for robust and repeatable CI/CD.

