North American asset manager streamlines data management with a modern lakehouse on Databricks and AWS

A global asset manager transformed its fragmented data ecosystem by adopting the Databricks Lakehouse on AWS, enabling real-time analytics, streamlined governance, and AI readiness across the enterprise.
About
This privately held investment management firm offers a comprehensive suite of investment products, risk management solutions, and advisory services. It manages funds for investors and institutions worldwide and employs associates across offices around the globe.
200+
datasets processed in near real-time
12-Wk
Proof of Concept (POC) delivered measurable success
200+
associates trained on Databricks tools
Challenge
Before adopting Databricks, the organization faced significant data-management limitations that constrained agility and slowed innovation. Multiple teams relied on disparate tools—including pandas notebooks, on-prem compute, and ad-hoc scripts—which resulted in duplicated efforts and inconsistent workflows. Legacy systems struggled to handle increasing data volumes across environments such as S3, PostgreSQL, Snowflake, and other legacy databases.
Development teams lacked a centralized platform for experimentation, spending more time managing infrastructure than producing insight. Analysts, data scientists, and engineers operated in separate environments without a unified workspace, delaying cross-functional collaboration and reducing productivity. The on-prem and cloud-based hybrid setup also increased maintenance costs and deployment complexity, making it difficult to scale analytics or introduce advanced capabilities such as streaming, machine learning, or AI at an enterprise level.
The firm needed a unified data foundation capable of supporting real-time analytics, AI-driven insights, and robust governance for its investment and research ecosystem.
Approach
The firm selected Lumenalta to design a phased modernization strategy that would migrate legacy pipelines and analytics workloads to the Databricks Lakehouse Platform on AWS. The engagement began in August 2024 with an initial proof of concept and has since expanded into an ongoing enterprise adoption effort.
This modernization initiative aimed to unify disparate data pipelines within a single Lakehouse architecture, streamline analytics workflows using Databricks SQL and Delta Live Tables, and strengthen data governance and compliance through Unity Catalog and automated testing frameworks. The approach balanced innovation with operational stability, ensuring that migration efforts did not disrupt existing business processes while laying the groundwork for long-term scalability.
Solution

Phase 1 – 12-week Proof of Concept (POC) success:
- Delta Lake for unified storage: All ingestion and transformation pipelines were standardized on Delta Lake, ensuring ACID transactions, schema enforcement, and time travel for reliable historical insight.
- Automated data pipelines: Implemented Databricks Auto Loader to process over 200 datasets in near real time, reducing manual intervention (sketched after this list).
- Efficient ETL workflows: Adopted Delta Live Tables (DLT) to improve reliability, reduce pipeline errors, and accelerate deployment cycles (sketched after this list).
- Federated queries: Linked external financial and reference data (e.g., S&P 500 data from Snowflake) to internal sources for faster dashboard creation (sketched after this list).
- Unity Catalog for governance: Provided full data lineage, auditing, and fine-grained access control, improving compliance and reducing access-approval times.
- Databricks SQL for self-service BI: Business analysts gained direct access to curated Delta tables for ad hoc reporting without engineering dependencies.
- GenAI-assisted automation: Integrated AI for data-quality validation and documentation within Databricks Jobs, cutting manual effort and improving consistency (sketched after this list).
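The Auto Loader pattern can be illustrated with a minimal PySpark sketch. It assumes a Databricks notebook or job where `spark` is predefined; the S3 paths and the `main.bronze.trades` table name are hypothetical placeholders, not the firm's actual pipeline objects. The final line also shows the Delta time travel capability noted above:

```python
# Minimal Auto Loader sketch: incrementally ingest files from S3 into a Delta
# table. Assumes a Databricks runtime where `spark` is predefined; all paths
# and table names are hypothetical placeholders.
(
    spark.readStream.format("cloudFiles")              # Auto Loader source
        .option("cloudFiles.format", "json")           # incoming file format
        .option("cloudFiles.schemaLocation",           # schema tracking/evolution
                "s3://example-bucket/_schemas/trades")
        .load("s3://example-bucket/raw/trades/")
        .writeStream
        .option("checkpointLocation",                  # exactly-once bookkeeping
                "s3://example-bucket/_checkpoints/trades")
        .trigger(availableNow=True)                    # drain new files, then stop
        .toTable("main.bronze.trades")                 # managed Delta table
)

# Delta time travel: audit the table as of an earlier version.
snapshot_v3 = spark.read.option("versionAsOf", 3).table("main.bronze.trades")
```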
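For the Delta Live Tables workflows, a representative pipeline step might look like the following. The source table `positions_raw`, the column names, and the expectation rules are illustrative assumptions; the `dlt` module is only available inside a DLT pipeline:

```python
# Hedged DLT sketch: a declarative table with data expectations that catch
# pipeline errors before they reach downstream consumers.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Cleansed positions feed")
@dlt.expect_or_drop("valid_quantity", "quantity >= 0")  # quarantine bad rows
@dlt.expect("has_isin", "isin IS NOT NULL")             # log violations, keep rows
def positions_clean():
    return (
        dlt.read_stream("positions_raw")                # upstream pipeline table
           .withColumn("ingested_at", F.current_timestamp())
    )
```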
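Federated queries follow the pattern below once a Unity Catalog connection and foreign catalog over the Snowflake source have been configured. The `snowflake_md` catalog and all schema and table names are assumptions for illustration:

```python
# Hedged federated-query sketch: join an internal Delta table with reference
# data served live from Snowflake through a Unity Catalog foreign catalog.
enriched = spark.sql("""
    SELECT p.portfolio_id,
           p.ticker,
           p.market_value,
           b.close_price            -- reference data read from Snowflake
    FROM   main.gold.positions AS p
    JOIN   snowflake_md.public.sp500_prices AS b
      ON   p.ticker = b.ticker
""")
enriched.show(5)
```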
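The case study does not specify which model or prompts power the GenAI-assisted documentation. One plausible shape uses the built-in `ai_query` SQL function against a Databricks model serving endpoint; the endpoint name, prompt, and table below are assumptions:

```python
# Hedged sketch of GenAI-assisted documentation via `ai_query`. The serving
# endpoint name and target table are assumptions, not the firm's actual setup.
spark.sql("""
    SELECT column_name,
           ai_query(
               'databricks-meta-llama-3-1-70b-instruct',   -- assumed endpoint
               CONCAT('Write a one-line description for a column named ',
                      column_name, ' in a financial positions dataset.')
           ) AS suggested_comment
    FROM   main.information_schema.columns
    WHERE  table_name = 'positions'
    LIMIT  10
""").show(truncate=False)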
Phase 2 – Enterprise Rollout of two lines of business (LOBs):
Following POC success, two key lines of business were migrated:
- LOB 1 – AWS EMR Migration: Transitioned from AWS EMR to Databricks and dbt for unified orchestration and faster transformations.
- LOB 2 – Risk Analysis Modernization: Migrated R-based ETL and compute to Databricks and Delta Lake on AWS, enhancing scalability and reliability.
Each migration leveraged the Databricks Lakehouse architecture to consolidate storage, computation, and governance.
Ongoing Enhancements
- Implementing row- and column-level security within Unity Catalog to further strengthen governance (sketched after this list).
- Training 200+ associates on Databricks tools to build an analytics-driven culture.
- Exploring Mosaic AI components (Lakebase, Lakeflow, Lakebridge, Agent Bricks) to expand AI-ready capabilities.
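A minimal sketch of the row- and column-level security work, using Unity Catalog row filters and column masks run as SQL from a notebook. All function, table, column, and group names are assumptions for illustration:

```python
# Row filter: analysts see only the desk they are entitled to.
spark.sql("""
    CREATE OR REPLACE FUNCTION main.security.desk_filter(desk STRING)
    RETURNS BOOLEAN
    RETURN is_account_group_member('admins') OR desk = 'equities'
""")
spark.sql("""
    ALTER TABLE main.gold.positions
    SET ROW FILTER main.security.desk_filter ON (desk)
""")

# Column mask: hide account numbers from non-privileged users.
spark.sql("""
    CREATE OR REPLACE FUNCTION main.security.mask_account(account STRING)
    RETURNS STRING
    RETURN CASE WHEN is_account_group_member('admins') THEN account
                ELSE '***' END
""")
spark.sql("""
    ALTER TABLE main.gold.positions
    ALTER COLUMN account_number SET MASK main.security.mask_account
""")
```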
Key Highlights & Impact
The adoption of the Databricks Lakehouse Platform has streamlined operations, enhanced collaboration, and established a scalable foundation for AI-enabled investment insights. By consolidating pipelines from R scripts, SQL Server ETL, AWS Glue jobs, and Jupyter notebooks into one Lakehouse environment, teams can now manage data more efficiently and accelerate reporting cycles. The unified workspace enables analysts, engineers, and data scientists to collaborate in real time, improving speed and consistency of analysis across the enterprise.

The platform has also simplified infrastructure and reduced maintenance overhead through cloud-native scalability, while Databricks SQL and Delta Live Tables have delivered faster, more reliable analytics. Unity Catalog has strengthened governance, ensuring centralized control over permissions, lineage, and auditing—an essential capability for meeting strict regulatory requirements.
This modernization initiative has positioned the firm to expand into advanced analytics and AI, with early exploration of Mosaic AI tools preparing the organization for future use cases in predictive modeling and portfolio optimization. The partnership continues to evolve as the company scales its Lakehouse capabilities across the enterprise.
Platforms
- AWS
- Databricks
- Delta Lake
- Databricks SQL
- Unity Catalog
- Delta Live Tables
- PySpark
- Python
- sparklyr
Capabilities
- Data-driven infrastructure
- AI-powered BI tools
- Integrated GenAI for automated pipelines
- Enhanced data governance and compliance
- Cloud-native scalability
- Unified developer workspace
