Medical Lab Achieves 50% Compute Savings via Databricks Migration
Sagis Diagnostics, a leading U.S. pathology lab, replaced its fragmented Azure SQL setup with a unified Databricks Lakehouse built by Dataforest. The migration consolidated 21 data sources, automated analytics, and ensured HIPAA compliance — delivering full data transparency, pay-per-use efficiency, and a ~50% reduction in compute costs.
~50% compute cost reduction through optimized architecture
21 integrated data sources unified under Medallion Architecture
3 Genie spaces deployed for self-service BI
Sagis Diagnostics, a US-based, physician-owned subspecialty diagnostic pathology laboratory, partners with healthcare providers and insurance companies to deliver precise diagnostic data analysis and pathology services.
Python
Spark
Azure SQL
Databricks
Genie (LLM)
THE CHALLENGE
Migrating from Azure SQL to Databricks for Scalability, Compliance, and Cost Efficiency
Sagis Diagnostics needed to migrate from a legacy Azure SQL Server environment to Databricks to unify diagnostics and billing data, enable advanced analytics, and ensure compliance with healthcare data standards.
The Azure system covered only 20–25% of the required functionality, lacked scalability for growing data volumes, and cost around $20,000 per year while utilizing only a small portion of its capacity.
The new solution also needed to support long-term data growth, improve observability, and consolidate all BI, AI, and compliance workflows into a single Lakehouse platform.
Transform Legacy SQL Scripts into Functional Jobs
The previous environment relied on static SQL scripts that lacked automation and consistency across data workflows. This limited scalability and increased maintenance overhead.
Ensure Data Compliance in Databricks (Patient Data)
Sagis Diagnostics processes sensitive patient data, requiring strict compliance with HIPAA and healthcare data protection standards.
Achieve Full Observability Across Data Pipelines
Before the migration, the client lacked visibility into how data was transformed, validated, and consumed, making it difficult to track issues or verify accuracy. This limited visibility from ingestion to BI/AI output created blind spots around data freshness, schema drift, quality regressions, and the downstream blast radius of any change.
Implementation Challenges During Migration
At the start of the project, there was no clear documentation of the existing Azure setup. The client also faced limited access rights, missing connectors, and Databricks platform updates that required continuous adaptation and consultation with Databricks support.
THE SOLUTION
Unified Migration to a Modern, Compliant Lakehouse Architecture
We migrated all data and pipelines from Azure SQL to a Databricks-based enterprise data warehouse, implemented a Medallion Architecture (Bronze/Silver/Gold), and rewrote legacy SQL scripts into automated, production-ready Databricks jobs. The new platform provides governed data storage, real-time observability, and cost-efficient compute scaling.
Automated Databricks Job Conversion
All legacy SQL scripts were converted into fully automated Databricks jobs with error handling, scheduling, and integration into the business logic layer.
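As a rough illustration of what adding error handling around a converted script can look like, the sketch below retries a failing step a fixed number of times and logs each failure. It is plain Python, not Databricks Jobs configuration, and the source does not describe the actual implementation; `run_with_retries`, `max_attempts`, and `delay_s` are hypothetical names introduced here.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("pipeline")


def run_with_retries(step: Callable[[], T], name: str,
                     max_attempts: int = 3, delay_s: float = 0.0) -> T:
    """Run one pipeline step, retrying on failure and logging each error."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception as exc:  # illustrative; real jobs should catch narrower errors
            log.warning("step %s failed (attempt %d/%d): %s",
                        name, attempt, max_attempts, exc)
            if attempt == max_attempts:
                raise  # re-raise so the scheduler can mark the run as failed
            time.sleep(delay_s)
    raise AssertionError("unreachable")
```

Re-raising after the last attempt keeps failures visible to whatever schedules the job, rather than letting a broken step pass silently.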
Key Deliverables:
- Delta Lake-based medallion architecture with CDC ingestion
- Production-ready Databricks environment
- 3-tier medallion pipeline (Bronze/Silver/Gold) processing 30+ tables
- 3 fully functional LLM (Genie) spaces tailored for AI/BI business needs
- Cost-monitoring dashboard for precise compute control
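To make the CDC ingestion idea concrete, here is a minimal sketch of the underlying logic: keep only the latest change per key, then upsert or delete in the target table. It uses plain Python dictionaries rather than Spark or Delta Lake, and the event shape (`key`, `op`, `seq`, `data`) is an assumption for illustration. On Delta Lake this kind of upsert is commonly expressed with a MERGE statement; the source does not say which mechanism was used.

```python
from typing import Dict, Iterable, TypedDict


class ChangeEvent(TypedDict):
    key: str    # primary key of the row
    op: str     # "insert", "update", or "delete"
    seq: int    # increasing change sequence number
    data: dict  # row payload (ignored for deletes)


def apply_changes(target: Dict[str, dict],
                  events: Iterable[ChangeEvent]) -> Dict[str, dict]:
    """Apply CDC events to a keyed table, using only the latest change per key."""
    latest: Dict[str, ChangeEvent] = {}
    for ev in events:
        prev = latest.get(ev["key"])
        if prev is None or ev["seq"] > prev["seq"]:
            latest[ev["key"]] = ev

    result = dict(target)  # leave the input table untouched
    for key, ev in latest.items():
        if ev["op"] == "delete":
            result.pop(key, None)
        else:
            result[key] = ev["data"]
    return result
```

Deduplicating by sequence number first means out-of-order events within a batch do not overwrite newer values.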
Data Lineage and Monitoring Dashboards
Dataforest implemented automated data lineage and monitoring dashboards within Databricks, featuring real-time data refresh tracking, anomaly detection, and event-based alerts. This provided full transparency, faster troubleshooting, and greater confidence in data reliability. Clear runbooks and domain-level SLOs ensured faster incident resolution, safer change management, and reliable, compliant analytics.
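A refresh-tracking check of the kind described above might look like the following sketch, which flags tables whose last refresh is older than an allowed age. This is an assumption about the general technique, not the dashboards Dataforest built; the function name, the table names in the usage example, and the 24-hour default are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def stale_tables(last_refresh: Dict[str, datetime],
                 max_age: Dict[str, timedelta],
                 now: datetime,
                 default_age: timedelta = timedelta(hours=24)) -> List[str]:
    """Return the tables whose last refresh is older than their allowed age."""
    stale = []
    for table, refreshed_at in last_refresh.items():
        allowed = max_age.get(table, default_age)
        if now - refreshed_at > allowed:
            stale.append(table)
    return sorted(stale)
```

For example, with a one-hour limit on a hypothetical `gold.billing` table, a refresh three hours ago would be reported as stale, and the result could feed an event-based alert.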
Compliance-First Databricks Environment
Patient data used for AI/BI and ML training was fully anonymized to maintain HIPAA compliance while enabling advanced analytics.
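The source does not describe how anonymization was performed. As one hedged illustration of a related technique, the sketch below drops direct identifiers and replaces the patient ID with a keyed hash (HMAC-SHA256). Strictly speaking this is pseudonymization rather than full anonymization, and it is not a claim about what satisfies HIPAA. The field names (`patient_id`, `name`, `ssn`, `date_of_birth`) are hypothetical.

```python
import hashlib
import hmac
from typing import Any, Dict, Sequence


def pseudonymize(record: Dict[str, Any], secret_key: bytes,
                 id_field: str = "patient_id",
                 drop_fields: Sequence[str] = ("name", "ssn", "date_of_birth"),
                 ) -> Dict[str, Any]:
    """Drop direct identifiers and replace the ID with a keyed SHA-256 token."""
    out = {k: v for k, v in record.items() if k not in drop_fields}
    token = hmac.new(secret_key,
                     str(record[id_field]).encode("utf-8"),
                     hashlib.sha256).hexdigest()
    out[id_field] = token  # same input ID and key always yield the same token
    return out
```

Because the token is deterministic for a given key, records for the same patient can still be joined across tables without exposing the original identifier.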
Incremental Implementation and Knowledge Transfer
Our engineers reconstructed undocumented logic by analyzing legacy SQL patterns and rebuilding missing connectors. Access management was standardized, and an adaptive update policy was implemented to synchronize with Databricks’ frequent releases. Continuous documentation and proactive communication ensured smooth handover and maintainability.
THE RESULT
Unified, Compliant, and Scalable Data Platform, with Pay-Per-Use Compute Reducing Costs from $20k to ~$10k Annually
Sagis Diagnostics migrated from Azure SQL to a Databricks-based enterprise data warehouse, unifying diagnostics and billing data in a single governed environment. The Medallion Architecture (Bronze/Silver/Gold) ensured data quality, scalability, and traceability.
All BI, AI, and ETL workflows were consolidated into Databricks, enabling transparent data management and efficient collaboration across teams. An AutoML pipeline enabled a triage denial prediction model, with automated training for underutilized claims prediction.
The result is an AI-ready Lakehouse that accelerates reporting, enhances visibility, and reduces pay-per-use compute costs by nearly 50%, establishing a future-ready foundation for predictive analytics and LLM-powered BI in healthcare.
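The headline figure follows directly from the two annual costs quoted in this case study (about $20,000 before and roughly $10,000 after). The helper below is a hypothetical name used only to show the arithmetic.

```python
def savings_fraction(before: float, after: float) -> float:
    """Fractional cost reduction going from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Annual compute cost figures quoted in the case study.
print(f"{savings_fraction(20_000, 10_000):.0%}")  # prints "50%"
```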
Key outcomes included:
- Integration of 21 data sources into a governed Medallion Architecture (Bronze/Silver/Gold).
- Consolidation of all data from two vendors into one governed platform with real-time CDC ingestion.
- Deployment of 2 dashboards, delivery of functional SQL code for 20 dashboards and widgets, and creation of 3 Genie spaces for self-service BI.
- ML-ready feature store with automated denial prediction model, ready for evaluation and deployment.
- Full data observability and compliance readiness with automated lineage tracking and schema documentation for future AI-driven analytics.
- ~50% compute cost reduction through optimized, pay-per-use architecture.
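One practical use of lineage tracking is answering "what breaks downstream if this table changes?". The sketch below walks a lineage graph breadth-first to collect every dependent table. The graph format (a dictionary mapping each table to its direct consumers) and the table names in the usage example are assumptions for illustration, not the lineage model used in the project.

```python
from collections import deque
from typing import Dict, List, Set


def downstream(lineage: Dict[str, List[str]], start: str) -> Set[str]:
    """Return every table reachable from `start` by following lineage edges."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child != start and child not in seen:
                seen.add(child)
                queue.append(child)
    return seen
```

For a hypothetical graph where `bronze.claims` feeds `silver.claims`, which in turn feeds two Gold tables, calling `downstream(graph, "bronze.claims")` returns all three dependents.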
Additional Value Delivered (Client Feedback):
- Fast onboarding: Seamless adoption of the Databricks environment with guided support.
- Proactive documentation: All pipelines and jobs were fully documented without the need for client follow-ups.
- Engineering excellence: High technical quality, structured communication, and timely delivery ensured a smooth migration and reliable system performance.
Why Sagis Diagnostics Chose Dataforest as Their Software Development Partner
“Dataforest got us off the ground really quickly, and they even provided documentation without us having to ask for it — that was really impressive.”