OCT. 31, 2025
12 Min Read
What is Unity Catalog?
Unity Catalog is Databricks’ unified governance layer for data and AI assets across workspaces. Rather than handling ACLs per workspace or per table, it enables granular controls at the catalog, schema, and table levels with built-in auditing for compliance. The result is simpler governance, stronger security, and better data discovery, all while supporting multi-cloud environments. If you plan to scale a data platform responsibly, Unity Catalog isn’t optional. It’s foundational.
What is Terraform?
Terraform is an open-source Infrastructure as Code (IaC) framework for declaring, provisioning, and operating cloud infrastructure in a repeatable, consistent manner. Managing Unity Catalog resources with Terraform lets you automate setup, track changes in version control, and promote reliable deployments across environments, improving scalability, auditability, and team collaboration.
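Everything that follows assumes the AWS and Databricks Terraform providers are already configured. If you’re starting from scratch, a minimal provider setup might look like the sketch below; the version constraints, region, and workspace URL are placeholders, and authentication is assumed to come from environment variables or a CLI profile rather than hard-coded credentials.
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    databricks = {
      source  = "databricks/databricks"
      version = "~> 1.0"
    }
  }
}

# AWS provider for the S3 bucket and IAM role; the region is a placeholder.
provider "aws" {
  region = "us-east-1"
}

# Databricks provider for Unity Catalog resources. Credentials are expected
# via environment variables (e.g. DATABRICKS_TOKEN) or a configuration profile.
provider "databricks" {
  host = "https://<your-workspace>.cloud.databricks.com"
}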
Create an S3 bucket
To back a Unity Catalog catalog on Databricks for AWS, you need cloud storage. In this case, an S3 bucket. The bucket associated with catalog creation will hold two categories of content:
- Catalog metadata (schemas, tables, functions, and related entities).
- The data stored within those objects (for example, managed table data).
When creating a schema under this catalog, you can optionally point its storage to a different location, such as another S3 bucket, if that better fits your setup.
If your AWS infrastructure isn’t managed via Terraform, create the S3 bucket directly in your AWS account, or ask your Cloud/DevOps team to provision it if you lack permissions.
If you do manage AWS with Terraform, this walkthrough assumes your AWS provider is already configured. You might keep bucket definitions in a main.tf under your AWS module or in a dedicated buckets.tf; if you’re unsure, the main.tf within your AWS module is a safe default. Wherever you define buckets, paste and adapt the example below:
resource "aws_s3_bucket" "databricks_development_bucket" {
  bucket = "databricks-development"
}
Keep the bucket URI handy for later (for example, s3://my-bucket-name). With the snippet above, that would be s3://databricks-development, subject to your organization’s naming rules.
If your team doesn’t yet have a naming standard, establish a clear, consistent convention now—it prevents confusion and saves time down the road.
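If the bucket is managed in Terraform, you can also expose the URI as an output so it doesn’t need to be hard-coded elsewhere; a small optional sketch:
output "databricks_development_bucket_uri" {
  description = "S3 URI of the bucket backing the Unity Catalog development catalog"
  value       = "s3://${aws_s3_bucket.databricks_development_bucket.bucket}"
}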
Create an IAM role
If AWS resources aren’t managed by Terraform in your environment, create an IAM role in AWS that can access the S3 bucket you just set up. If you can’t create roles, have your Cloud/DevOps/Infrastructure team do this step.
Be sure to capture the role’s ARN (Amazon Resource Name)—you’ll need it later. It typically looks like:
arn:aws:iam::<account-id>:role/databricks-development
Replace <account-id> and databricks-development with values appropriate for your account.
If you do manage AWS with Terraform, we assume your AWS provider is configured. In the relevant .tf file, you’ll define the IAM role using the snippets below. Before the code, complete these steps:
Get Databricks account ID
You must include the Databricks account ID and an external ID in the IAM role’s trust policy. In the Databricks account console, open your profile menu to see your name, email, and account ID; copy the ID.

Get Databricks metastore external ID
In the account console, open Catalog from the left menu. Select the metastore you’ll use for this catalog. On the metastore details page, copy the External ID. You’ll place it in the role’s trust relationship.

Next, substitute DATABRICKS_ACCOUNT_ID and EXTERNAL_ID with the values you collected:
resource "aws_iam_role" "databricks_development_role" {
  name = "databricks_development_role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          AWS = "arn:aws:iam::<DATABRICKS_ACCOUNT_ID>:root"
        }
        Action = "sts:AssumeRole"
        Condition = {
          StringEquals = {
            "sts:ExternalId" = "<EXTERNAL_ID>"
          }
        }
      }
    ]
  })
}
The next step is attaching a policy that permits S3 access, allowing Databricks to read and write data. Without this, Databricks may assume the role but won’t be able to interact with the bucket.
resource "aws_iam_role_policy" "databricks_s3_access" {
  name = "databricks_s3_access"
  role = aws_iam_role.databricks_development_role.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "s3:GetObject",
          "s3:PutObject",
          "s3:DeleteObject",
          "s3:ListBucket"
        ]
        Resource = [
          aws_s3_bucket.databricks_development_bucket.arn,
          "${aws_s3_bucket.databricks_development_bucket.arn}/*"
        ]
      }
    ]
  })
}
Under the Resource list, the first entry:
aws_s3_bucket.databricks_development_bucket.arn
grants permissions on the bucket itself. This is required for bucket-level actions such as ListBucket.
The second line:
"${aws_s3_bucket.databricks_development_bucket.arn}/*"
applies to all objects within the bucket, for example: arn:aws:s3:::databricks-development/*.
This is needed for object-level operations like GetObject, PutObject, and DeleteObject.
Create a storage credential
With the S3 bucket in place, the next task is to allow Databricks to access it by creating a storage credential.
For most Unity Catalog entities (storage credentials, catalogs, external locations, and so on), the workspace you use to create them doesn’t matter. Because Unity Catalog is centralized, as long as the metastore is attached to the workspace, you can manage these objects from any workspace.
This guide assumes your Databricks provider is configured in Terraform. You can place the following in a dedicated storage-credentials.tf or directly in main.tf, depending on your layout. Note: I’ve included the access control block alongside the credential for convenience.
resource "databricks_storage_credential" "s3_unity_catalog_development" {
  owner   = databricks_group.groups["account_admins"].display_name
  name    = "s3-unity-catalog-development"
  comment = "This is a storage credential for databricks-development S3 bucket to use for Sigma writeback in Unity Catalog"
  aws_iam_role {
    role_arn = aws_iam_role.databricks_development_role.arn
  }
}
resource "databricks_grants" "external_creds_s3_unity_catalog_development" {
  grant {
    privileges = ["ALL_PRIVILEGES", "MANAGE"]
    principal  = databricks_group.groups["account_admins"].display_name
  }
  grant {
    privileges = ["ALL_PRIVILEGES", "MANAGE"]
    principal  = databricks_group.groups["metastore_admins"].display_name
  }
  storage_credential = databricks_storage_credential.s3_unity_catalog_development.id
}
In the first resource, the IAM role references the role created earlier. Setting owner is optional—by default, the creator (often the CI/CD service principal) becomes the owner. Avoid single-user or single-principal ownership for critical UC resources; assign ownership to appropriate groups such as account_admins and/or metastore_admins. The name and comment clarify intent and usage.
The second resource grants permissions. The grant entries specify:
- principal: the target of the grant (group, service principal, or user).
- privileges: the permissions to assign.
Refer to the Unity Catalog documentation for which privileges apply to each securable.
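The grant examples in this guide reference a databricks_group.groups map, which is assumed to be defined elsewhere in your configuration. If you don’t have one yet, a minimal sketch could look like the following; the group names are illustrative, and whether the groups end up account-level or workspace-local depends on how your Databricks provider is configured.
# Illustrative group names; adjust to your organization.
locals {
  databricks_group_names = ["account_admins", "metastore_admins", "engineers", "analysts"]
}

# One databricks_group resource per name, addressable as databricks_group.groups["<name>"].
resource "databricks_group" "groups" {
  for_each     = toset(local.databricks_group_names)
  display_name = each.key
}
If groups are managed outside Terraform, the databricks_group data source can be referenced in the same way instead.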
Create an external location
With the storage credential ready, define an external location. You can use a dedicated external_locations.tf or add this to main.tf, depending on your Terraform structure.
resource "databricks_external_location" "databricks_development" {
  url             = "s3://databricks-development/"
  owner           = databricks_group.groups["account_admins"].display_name
  name            = "databricks-development-external-location"
  credential_name = databricks_storage_credential.s3_unity_catalog_development.name
  comment         = "This is an external location pointing to databricks-development S3 bucket in AWS."
}
resource "databricks_grants" "external_location_databricks_development" {
  grant {
    privileges = ["ALL_PRIVILEGES", "MANAGE"]
    principal  = databricks_group.groups["account_admins"].display_name
  }
  grant {
    privileges = ["ALL_PRIVILEGES", "MANAGE"]
    principal  = databricks_group.groups["metastore_admins"].display_name
  }
  external_location = databricks_external_location.databricks_development.name
}
This mirrors the storage credential step. The first resource declares the external location, points to the correct S3 URL, and ties it to the credential. The second resource applies the desired permissions to the external location.
Create a catalog
With prerequisites finished, you can create the Unity Catalog catalog. Either place this in a catalogs.tf file or drop it into main.tf, based on your preferences.
resource "databricks_catalog" "development" {
  storage_root   = databricks_external_location.databricks_development.url
  owner          = databricks_group.groups["account_admins"].display_name
  name           = "development"
  isolation_mode = "ISOLATED"
  comment        = "A catalog for development purposes."
}
This defines a catalog named development, attaches the external location as its storage root, sets an admin group as owner, configures isolation, and adds a descriptive comment.
Creating the catalog alone doesn’t make it visible in workspaces. To surface it, bind the catalog to specific workspace(s) with the snippet below.
locals {
  workspace_id_development  = 1234567890
}
resource "databricks_workspace_binding" "catalog_development_ws_dev" {
  workspace_id   = local.workspace_id_development
  securable_name = databricks_catalog.development.name
  binding_type   = "BINDING_TYPE_READ_WRITE"
}
If you haven’t already defined your workspace ID elsewhere, create a local value as above. You can find the ID in the workspace URL: it’s the numeric value after o= in the query string.
Here, we bind the development catalog to the development workspace by providing the workspace ID, the catalog name in securable_name, and the binding_type.
Note: The binding shown grants read/write. Use a read-only binding (binding_type = "BINDING_TYPE_READ_ONLY") if that better fits your access model.
Repeat this resource for each workspace that should see the catalog.
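If several workspaces need the catalog, a map of workspace IDs combined with for_each avoids duplicating the binding resource; a sketch that would replace the single binding above (the keys and IDs are placeholders):
locals {
  catalog_development_workspace_ids = {
    development = 1234567890
    staging     = 9876543210
  }
}

resource "databricks_workspace_binding" "catalog_development" {
  for_each       = local.catalog_development_workspace_ids
  workspace_id   = each.value
  securable_name = databricks_catalog.development.name
  binding_type   = "BINDING_TYPE_READ_WRITE"
}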
Grant permissions to groups, users, or service principals
To control who can use the catalog, copy and adapt the following into catalogs.tf or another appropriate file:
resource "databricks_grants" "catalog_development" {
  grant {
    privileges = ["USE_CATALOG"]
    principal  = databricks_group.groups["engineers"].display_name
  }
  grant {
    privileges = ["ALL_PRIVILEGES"]
    principal  = databricks_group.groups["account_admins"].display_name
  }
  grant {
    privileges = ["ALL_PRIVILEGES"]
    principal  = databricks_group.groups["metastore_admins"].display_name
  }
  catalog = databricks_catalog.development.id
}
This block assigns specific privileges to the designated groups or principals for the catalog.
Create a schema and a table in the catalog
To create a schema within the catalog, place a block like this in a schema.tf or in the appropriate module file:
resource "databricks_schema" "my_demo_schema" {
  catalog_name = databricks_catalog.development.id
  name         = "my_demo_schema"
  owner        = databricks_group.groups["account_admins"].display_name
  storage_root = databricks_external_location.databricks_development.url
  comment      = "My demo schema"
}
This specifies the parent catalog, the new schema’s name, and its storage root. Here we reuse the external location created earlier, but you can point a schema to any external location, even one that differs from the catalog’s storage root.
This flexibility helps separate domains, isolate team storage, or differentiate staging from production.
We also assign ownership and provide a brief description via comment.
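For example, a second schema could be backed by a completely separate external location; in the sketch below, databricks_external_location.analytics_staging is hypothetical and assumed to be defined the same way as the external location created earlier.
resource "databricks_schema" "analytics_staging" {
  catalog_name = databricks_catalog.development.id
  name         = "analytics_staging"
  owner        = databricks_group.groups["account_admins"].display_name
  # Hypothetical external location backed by a different S3 bucket.
  storage_root = databricks_external_location.analytics_staging.url
  comment      = "Schema whose data lives in its own external location"
}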
To manage schema-level access, add grants similar to the catalog example:
resource "databricks_grants" "my_demo_schema" {
  schema = databricks_schema.my_demo_schema.id
  grant {
    privileges = ["USE_SCHEMA", "READ_VOLUME", "SELECT", "EXECUTE"]
    principal  = data.databricks_group.groups["engineers"].display_name
  }
  grant {
    privileges = ["USE_SCHEMA", "READ_VOLUME", "SELECT", "EXECUTE"]
    principal  = data.databricks_group.groups["analysts"].display_name
  }
}
Edge cases
A few real-world scenarios are worth calling out.
If you created your external location, storage credential, catalog, or schema in the UI and now want Terraform to manage them, you have two options:
- Delete them in the UI and recreate everything via Terraform as shown above. (Not recommended for production.)
- Import the existing objects into the Terraform state and start managing them in place.
To import, add an import block (supported in Terraform 1.5 and later) alongside the resource definition in your configuration, then run terraform plan and terraform apply. For example:
import {
  to = databricks_catalog.development
  id = "development"
}
Here we import a catalog named development into the state. The id is the catalog name as shown in the Databricks UI, and the to address must match a resource block in your configuration, so the import block is paired with a matching resource definition:
import {
  to = databricks_catalog.development
  id = "development"
}
resource "databricks_catalog" "development" {
  storage_root   = databricks_external_location.databricks_development.url
  owner          = databricks_group.groups["account_admins"].display_name
  name           = "development"
  isolation_mode = "ISOLATED"
  comment        = "A catalog for development purposes."
}
You only need to run the import once to bring the existing resource under Terraform. After a successful import and deploy, remove the import block from the code.
The same pattern applies to other Unity Catalog objects such as external locations, storage credentials, and schemas.
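For instance, an existing external location could be brought under management with the same kind of import block, assuming the id matches the external location’s name as shown in the Databricks UI:
import {
  to = databricks_external_location.databricks_development
  id = "databricks-development-external-location"
}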
If a resource won’t be managed by Terraform but must be referenced from Terraform-managed configuration, use a data source instead.
For example, suppose AWS roles are managed elsewhere in your org, and the cloud team has provided the role ARN. You can reference it in a Terraform-managed resource (like a storage credential) by defining a data block:
data "aws_iam_role" "databricks_development_role" {
  name = "databricks_development_role"
}
resource "databricks_storage_credential" "s3_unity_catalog_development" {
  owner   = databricks_group.groups["account_admins"].display_name
  name    = "s3-unity-catalog-development"
  comment = "This is a storage credential for databricks-development S3 bucket to use for Sigma writeback in Unity Catalog"
  aws_iam_role {
    role_arn = data.aws_iam_role.databricks_development_role.arn
  }
}
Note: Because this is a data source, reference it with the data. prefix in Terraform:
aws_iam_role {
  role_arn = data.aws_iam_role.databricks_development_role.arn
}
Terraform brings rigor, scale, and automation to platform management—including Unity Catalog. Defining resources as code reduces manual mistakes, makes deployments reproducible, and keeps configurations versioned and auditable. That matters even more with multiple teams and workspaces, where separation of duties and clear access control are key.
While the initial setup can feel involved, once established it provides a secure, governed, automated foundation. Across development, staging, and production, IaC lets you evolve safely and predictably.
Summary
We stepped through creating a Unity Catalog catalog on AWS with Terraform—from setting up the S3 bucket and IAM role to configuring storage credentials, external locations, catalogs, schemas, and grants. We also touched on practical edge cases like importing existing resources into Terraform and referencing externally managed items with data sources. Using Terraform for this workflow gives you a maintainable, scalable governance model across your Databricks environment.

