NOV. 14, 2025
7 Min Read
The goal of this example is to automate a repetitive task: locating invoice numbers and fetching their corresponding PDF files.
Why Databricks Apps?
Traditionally, creating an internal data-facing application required bringing in a frontend developer proficient in React or JavaScript, setting up infrastructure across cloud environments, and coordinating efforts between multiple teams. This usually meant multiple approval cycles and added overhead before anything could be built. Databricks Apps reduce this friction by letting you deploy applications directly inside your workspace (handling authentication, hosting, and access management automatically), so you can focus purely on functionality rather than infrastructure.
In this guide, we’ll look at how Databricks streamlines that process, examine the main components of an app, and walk through screenshots showing how everything fits together.
Introduction to Databricks Apps
Databricks Apps offer a simplified way to build and run custom applications natively within your Databricks workspace. You can easily create:
- Chat-style apps powered by serving endpoints
- Data-driven apps connected to SQL Warehouses
- Simple, self-contained “Hello World” prototypes
Because authentication, SSO, and deployment are managed for you, there’s no need for external hosting. The platform supports frameworks such as Dash, Flask, Gradio, Shiny, and Streamlit. At the 2025 Data and AI Summit, Databricks expanded this support to include React, JavaScript, and Node.js, giving developers even more flexibility when building front-end interfaces.
For our scenario, we’ll focus on building a data app, taking advantage of its direct integration with Unity Catalog and the fine-grained access control it provides.
Creating a new data app
To get started:
- Go to your Databricks workspace and navigate to Compute → Apps.
- Select Create new app, and then choose Data app.

- Configure resources by choosing an existing SQL Warehouse or creating a new one.

Next, review the app’s authorization details — you’ll notice Databricks automatically creates a service principal for SQL Warehouse access.

Provide a name and optional description for your app (the defaults are fine if you prefer).

Deployment takes just a few minutes. Once complete, you’ll get a link to open the app, along with details on code location and logs. You can modify the app’s source code directly in Databricks or locally, manage permissions, and redeploy updated versions using the Deploy button. If needed, you can also change the deployment source path.

The About this app panel displays metadata such as the creation date, last modification time, and app ID.

Under Authorization, you’ll see the assigned service principal and its ID. You can rename it from the Databricks Account Console if you’d like a clearer identifier.

The Deployments section tracks past deployments and includes logs for debugging purposes. Meanwhile, Environments lists environment variables, which can be referenced directly in your code or configuration files.
Exploring the default data app
Every Databricks workspace comes with a prebuilt sample app — the New York Taxi Fare demo — showcasing how interactive dashboards can be built on top of SQL Warehouse data. Opening the app displays fare distribution visualizations with input filters for ZIP codes and a “predict” button. This example highlights how quickly you can deliver interactive, data-backed applications for business users without additional infrastructure.
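To make the pattern concrete, here is a minimal sketch of how a Streamlit data app can query a table through a SQL Warehouse. This is not the sample app’s actual code: the table, column names, and warehouse environment variable are assumptions, and the connection pattern relies on the databricks-sql-connector and databricks-sdk packages being listed in requirements.txt.

```python
import os
import streamlit as st
from databricks import sql
from databricks.sdk.core import Config

# Inside a Databricks App, Config() picks up the app's service principal
# credentials automatically; no tokens are hardcoded.
cfg = Config()

# Hypothetical names: the warehouse ID would normally come from an
# environment variable defined in app.yaml, and the table is a placeholder.
WAREHOUSE_ID = os.getenv("DATABRICKS_WAREHOUSE_ID", "<warehouse-id>")
TABLE = "samples.nyctaxi.trips"

st.title("Taxi fare explorer")
pickup_zip = st.text_input("Pickup ZIP code", "10001")

if st.button("Query"):
    with sql.connect(
        server_hostname=cfg.host,
        http_path=f"/sql/1.0/warehouses/{WAREHOUSE_ID}",
        credentials_provider=lambda: cfg.authenticate,
    ) as conn:
        with conn.cursor() as cur:
            cur.execute(
                f"SELECT fare_amount FROM {TABLE} "
                "WHERE pickup_zip = :zip LIMIT 1000",
                {"zip": int(pickup_zip)},
            )
            rows = cur.fetchall()
    st.bar_chart([r.fare_amount for r in rows])
```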

Accessing and modifying the app code
To view or change your app’s code:
- In the app overview, click the deployment source path link.
- You’ll be redirected to your workspace files (typically under Users > your_user > Databricks apps > app_name). The app folder contains three key files:
- app.py – the primary logic and UI code
- app.yaml – configuration and environment details, including the SQL Warehouse
- requirements.txt – dependency list; add any extra libraries your app needs here

You can edit app.py to adjust business logic or the user interface, tweak variables in app.yaml, and extend requirements.txt for additional dependencies. If you prefer local development, Databricks provides syncing instructions, but you can also develop entirely within the workspace for simplicity.
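As a small illustration, app.py can read the warehouse configured in app.yaml from an environment variable at startup. The variable name below is an assumption and should match whatever your app.yaml actually declares:

```python
import os

# Hypothetical variable name; it must match the env entry in app.yaml.
warehouse_id = os.getenv("DATABRICKS_WAREHOUSE_ID")
if not warehouse_id:
    raise RuntimeError(
        "DATABRICKS_WAREHOUSE_ID is not set - check the env section of app.yaml"
    )

http_path = f"/sql/1.0/warehouses/{warehouse_id}"
```

Any additional packages that app.py imports then go into requirements.txt so they are installed when the app is deployed.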
Building a custom invoice processing app
As a practical example, I created an app that allows business users to retrieve invoice PDFs based on a list of invoice numbers. Users enter the numbers (comma-separated), and the app performs the following:
- Queries a Delta table containing pre-signed S3 URLs for each invoice.
- Downloads the corresponding PDFs from S3 using those URLs.
- Creates a timestamped folder in Google Drive.
- Uploads the PDFs and writes a log file that records the process status.
This design keeps S3 credentials secure by never exposing direct bucket access to end users. Once processing completes, the app displays a status message and a button linking to the Google Drive folder. For each invoice, the log specifies whether it was skipped (already uploaded), successfully retrieved, or missing.
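To show roughly how the entry point can look, here is a hedged Streamlit sketch of the form and the invoice-number cleanup. The fetch_and_upload function is a stand-in for the S3-to-Google-Drive pipeline described in the rest of this post, not the app’s real implementation.

```python
import streamlit as st


def fetch_and_upload(invoice_numbers: list[str]) -> list[dict]:
    """Placeholder for the real S3 -> Google Drive pipeline sketched later."""
    return [{"invoice": n, "status": "not found"} for n in invoice_numbers]


st.title("Invoice PDF fetcher")
raw = st.text_area("Invoice numbers (comma-separated)", placeholder="INV-1001, INV-1002")

if st.button("Fetch and Upload PDFs"):
    # Normalize the input: split on commas, trim whitespace, drop blanks and
    # duplicates while preserving the order entered.
    invoices = list(dict.fromkeys(n.strip() for n in raw.split(",") if n.strip()))
    if not invoices:
        st.warning("Please enter at least one invoice number.")
    else:
        results = fetch_and_upload(invoices)
        st.success(f"Processed {len(invoices)} invoice number(s).")
        st.table(results)
```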
Below is the simple input interface where users enter invoice numbers and trigger the Fetch and Upload PDFs action.

After running, the interface updates to show the status summary for each invoice.

Navigating the Google Drive output
Clicking Open Google Drive folder takes you to the latest timestamped directory that corresponds to your most recent run.

Inside, you’ll find the uploaded PDFs alongside a log file that records processing details.

The log file summarizes the state of each invoice:
- Skipped – previously processed and stored in an earlier folder
- Found – successfully matched to a pre-signed URL and uploaded
- Not found – missing from both the processed records and the pre-signed URLs table

Behind the scenes: key components
Here’s an overview of the underlying design (omitting detailed implementation code):
- PDF files in S3 – stored securely; end users don’t receive direct access
- Pre-signed URLs table – generated daily by a scheduled notebook that creates short-lived (one-day) URLs stored in Delta; a sketch of that notebook follows this list
- Processed data table – tracks previously handled invoices, timestamps, and their processing status
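As a rough illustration of that scheduled notebook, the sketch below lists invoice PDFs in a bucket, generates one-day pre-signed URLs with boto3, and overwrites a Delta table. The bucket, prefix, and table names are placeholders, and it assumes the job’s compute already has AWS credentials that can read the bucket.

```python
import boto3
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already available in a notebook

BUCKET = "my-invoice-bucket"                        # placeholder
PREFIX = "invoices/"                                # placeholder
TARGET_TABLE = "finance.invoices.presigned_urls"    # placeholder table name

s3 = boto3.client("s3")
rows = []

# Walk every PDF under the prefix and generate a URL valid for 24 hours.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if not key.lower().endswith(".pdf"):
            continue
        url = s3.generate_presigned_url(
            "get_object",
            Params={"Bucket": BUCKET, "Key": key},
            ExpiresIn=86400,  # one day
        )
        # Assume the invoice number is the file name without the extension.
        invoice_number = key.rsplit("/", 1)[-1].removesuffix(".pdf")
        rows.append((invoice_number, url))

# Overwrite the table so only today's (still valid) URLs remain.
df = spark.createDataFrame(rows, schema="invoice_number STRING, presigned_url STRING")
df.write.format("delta").mode("overwrite").saveAsTable(TARGET_TABLE)
```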
App workflow (a code sketch follows the list):
- Clean and validate the user’s invoice list.
- For each invoice, check the processed table. If present, mark as skipped.
- If absent, look up the pre-signed URLs table. If a match exists, download the file from S3, connect to Google Drive via API, create a timestamped directory, and upload the file. Log it as found.
- If not located in either table, log as not found.
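A condensed sketch of that loop is shown below. The table names are placeholders, it assumes a databricks-sql cursor plus a Google Drive client and timestamped folder created as described in the next section, and it omits the error handling and log-file writing the real app performs.

```python
import io
import requests
from googleapiclient.http import MediaIoBaseUpload


def process_invoices(invoices, cursor, drive, folder_id):
    """Return per-invoice statuses; `cursor` is a databricks-sql cursor,
    `drive` a Google Drive v3 service, `folder_id` the timestamped folder."""
    results = []
    for inv in invoices:
        # 1. Already processed? Mark as skipped.
        cursor.execute(
            "SELECT 1 FROM finance.invoices.processed WHERE invoice_number = :inv",
            {"inv": inv},
        )
        if cursor.fetchone():
            results.append({"invoice": inv, "status": "skipped"})
            continue

        # 2. Look up a pre-signed URL for the invoice.
        cursor.execute(
            "SELECT presigned_url FROM finance.invoices.presigned_urls "
            "WHERE invoice_number = :inv",
            {"inv": inv},
        )
        row = cursor.fetchone()
        if row is None:
            results.append({"invoice": inv, "status": "not found"})
            continue

        # 3. Download from S3 via the pre-signed URL and upload to Drive.
        pdf_bytes = requests.get(row.presigned_url, timeout=60).content
        media = MediaIoBaseUpload(io.BytesIO(pdf_bytes), mimetype="application/pdf")
        drive.files().create(
            body={"name": f"{inv}.pdf", "parents": [folder_id]},
            media_body=media,
            fields="id",
        ).execute()
        results.append({"invoice": inv, "status": "found"})
    return results
```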
Since the app executes within Databricks, it inherits Databricks’ unified governance framework through Unity Catalog. The app’s service principal must have permission to read and write to the Delta tables. For Google Drive operations, a separate Google service account performs authenticated API requests on the app’s behalf. That account’s JSON key should be stored securely using Databricks Secrets; for setup details, see the earlier article on managing Databricks secrets with the CLI, which covers the secret storage steps.
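For illustration, here is a hedged sketch of that Google Drive authentication, assuming the service account’s JSON key is surfaced to the app as an environment variable backed by a Databricks secret; the variable name and parent folder ID are placeholders.

```python
import json
import os
from datetime import datetime

from google.oauth2 import service_account
from googleapiclient.discovery import build

# GDRIVE_SA_KEY is a placeholder env var populated from a Databricks secret.
sa_info = json.loads(os.environ["GDRIVE_SA_KEY"])
creds = service_account.Credentials.from_service_account_info(
    sa_info, scopes=["https://www.googleapis.com/auth/drive"]
)
drive = build("drive", "v3", credentials=creds)

# Create the timestamped folder for this run under a known parent folder.
PARENT_FOLDER_ID = "<shared-drive-folder-id>"  # placeholder
folder = drive.files().create(
    body={
        "name": datetime.now().strftime("%Y-%m-%d_%H-%M-%S"),
        "mimeType": "application/vnd.google-apps.folder",
        "parents": [PARENT_FOLDER_ID],
    },
    fields="id",
).execute()
folder_id = folder["id"]
```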
To enable larger batch processing, the app could also be extended to accept a CSV file of invoice numbers. Two access layers are essential for this workflow:
- Permission to S3 and Delta tables (through AWS credentials or a Databricks service principal)
- Permission to Google Drive (via a Google service account)
Conclusion
Databricks Apps provide a practical way to create interactive tools that live alongside your data, taking advantage of Databricks’ built-in authentication and governance. In this example, we walked through building a data app from the ground up, deploying it with a SQL Warehouse, and extending it to automate invoice retrieval from S3 to Google Drive. By offloading infrastructure and identity management to Databricks, developers can focus on the core logic and user flow — whether for a quick dashboard or a complete operational workflow like this automated invoice PDF app.

