Sagis Diagnostics, a leading U.S. pathology lab, replaced its fragmented Azure SQL setup with a unified Databricks Lakehouse built by Dataforest. The migration consolidated 21 data sources, automated analytics, and ensured HIPAA compliance — delivering full data transparency, pay-per-use efficiency, and a ~50% reduction in compute costs.
~50% compute cost reduction through optimized architecture
21 integrated data sources unified under Medallion Architecture
3 Genie spaces deployed for self-service BI
Python
Spark
Azure SQL
Databricks
Genie (LLM)
THE CHALLENGE
Sagis Diagnostics needed to migrate from a legacy Azure SQL Server environment to Databricks to unify diagnostics and billing data, enable advanced analytics, and ensure compliance with healthcare data standards.
The Azure system covered only 20–25% of the required functionality, lacked scalability for growing data volumes, and cost around $20,000 per year while utilizing only a small portion of its capacity.
The new solution also needed to support long-term data growth, improve observability, and consolidate all BI, AI, and compliance workflows into a single Lakehouse platform.
The previous environment relied on static SQL scripts that lacked automation and consistency across data workflows. This limited scalability and increased maintenance overhead.
Sagis Diagnostics processes sensitive patient data, requiring strict compliance with HIPAA and healthcare data protection standards.
Before the migration, the client lacked visibility into how data was transformed, validated, and consumed, making it difficult to track issues or verify accuracy. Limited visibility from ingestion to BI/AI output created blind spots around data freshness, schema drift, quality regressions, and the downstream blast radius of any change.
At the start of the project, there was no clear documentation of the existing Azure setup. The client also faced limited access rights, missing connectors, and Databricks platform updates that required continuous adaptation and consultation with Databricks support.
THE SOLUTION
We migrated all data and pipelines from Azure SQL to a Databricks-based enterprise data warehouse, implemented a Medallion Architecture (Bronze/Silver/Gold), and rewrote legacy SQL scripts as automated, production-ready Databricks jobs. The new platform provides governed data storage, real-time observability, and cost-efficient compute scaling.
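To make the Bronze/Silver/Gold idea concrete, here is a minimal sketch in plain Python rather than Spark. It is an illustration only, not the code delivered to Sagis Diagnostics: the field names (`claim_id`, `source`, `amount`) and the cleaning rules are assumptions chosen for the example.

```python
from collections import defaultdict

def to_silver(bronze_rows):
    """Clean raw Bronze records: drop rows without a claim ID, remove duplicates, normalize types."""
    silver = []
    seen = set()
    for row in bronze_rows:
        claim_id = (row.get("claim_id") or "").strip()
        if not claim_id or claim_id in seen:
            continue  # skip incomplete or duplicate records
        seen.add(claim_id)
        silver.append({
            "claim_id": claim_id,
            "source": row["source"].lower(),
            "amount": float(row["amount"]),
        })
    return silver

def to_gold(silver_rows):
    """Aggregate Silver records into a reporting-ready total per source."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["source"]] += row["amount"]
    return dict(totals)

# Hypothetical raw input as it might land in the Bronze layer.
bronze = [
    {"claim_id": "C1", "source": "Billing", "amount": "120.50"},
    {"claim_id": "C1", "source": "Billing", "amount": "120.50"},  # duplicate
    {"claim_id": "", "source": "Diagnostics", "amount": "10"},    # missing ID
    {"claim_id": "C2", "source": "diagnostics", "amount": "80"},
]
gold = to_gold(to_silver(bronze))
```

Each layer only reads from the one before it, which is what makes quality issues traceable back to a specific stage.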
All legacy SQL scripts were converted into fully automated Databricks jobs with error handling, scheduling, and integration into the business logic layer.
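As a rough sketch of what "error handling" around a converted script can look like, the snippet below wraps a task in a retry loop with logging. It is not the actual job code and does not use Databricks APIs; `run_with_retries` and `flaky_load` are hypothetical names introduced for illustration.

```python
import logging
import time

logger = logging.getLogger("jobs")

def run_with_retries(task, max_attempts=3, delay_seconds=0.0):
    """Run task(); log and retry on failure, re-raising after the last attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception as exc:
            logger.warning("attempt %d/%d failed: %s", attempt, max_attempts, exc)
            if attempt == max_attempts:
                raise
            time.sleep(delay_seconds)

calls = {"count": 0}

def flaky_load():
    """Hypothetical task that fails once, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 2:
        raise RuntimeError("source temporarily unavailable")
    return "loaded"

result = run_with_retries(flaky_load)
```

In a scheduled job, the final re-raise matters: it lets the scheduler mark the run as failed instead of silently continuing.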
Key Deliverables:
Patient data used for AI/BI and ML training was fully anonymized to maintain HIPAA compliance while enabling advanced analytics.
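One common building block for this kind of work is dropping direct identifiers and replacing the patient key with a keyed hash. The sketch below shows that technique under stated assumptions: the field names and the key are hypothetical, and this is not a description of the method used in the project. Keyed hashing is pseudonymization and is generally not sufficient on its own for HIPAA de-identification.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-managed-secret"  # hypothetical; keep real keys in a secret store
DIRECT_IDENTIFIERS = {"name", "ssn", "phone"}   # hypothetical field names

def pseudonymize(record):
    """Drop direct identifiers and replace patient_id with a keyed SHA-256 hash."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    digest = hmac.new(SECRET_KEY, str(record["patient_id"]).encode(), hashlib.sha256)
    cleaned["patient_id"] = digest.hexdigest()
    return cleaned

# Hypothetical record with made-up values.
record = {"patient_id": 1042, "name": "Jane Doe", "ssn": "000-00-0000", "test_code": "BX-7"}
safe = pseudonymize(record)
```

Because the hash is deterministic for a given key, the same patient maps to the same token across tables, so joins for analytics still work.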
Dataforest implemented automated data lineage and monitoring dashboards within Databricks, featuring real-time data refresh tracking, anomaly detection, and event-based alerts. This provided full transparency, faster troubleshooting, and greater confidence in data reliability. Clear runbooks and domain-level SLOs ensured faster incident resolution, safer change management, and reliable, compliant analytics.
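Refresh tracking and anomaly detection can be illustrated with two small checks: one for staleness and one for unusual row counts. This is a simplified sketch, not the monitoring logic built in Databricks; the 24-hour limit, the z-score threshold of 3, and the sample row counts are assumptions.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean, stdev

def is_stale(last_refresh, now, max_age=timedelta(hours=24)):
    """True when the table was last refreshed longer ago than max_age."""
    return now - last_refresh > max_age

def is_anomalous(history, latest, threshold=3.0):
    """Flag latest when it is more than `threshold` standard deviations from the historical mean."""
    if len(history) < 2:
        return False  # not enough history to judge
    spread = stdev(history)
    if spread == 0:
        return latest != history[0]
    return abs(latest - mean(history)) / spread > threshold

now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
stale = is_stale(datetime(2025, 1, 1, 6, 0, tzinfo=timezone.utc), now)  # refreshed 30 hours ago

row_counts = [1000, 1020, 990, 1010, 1005]   # hypothetical daily row counts
drop_alert = is_anomalous(row_counts, 400)   # sudden drop in volume
normal = is_anomalous(row_counts, 1008)      # within the usual range
```

Checks like these would typically feed an alerting channel, so a stale or shrinking table is noticed before it reaches a dashboard.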
Our engineers reconstructed undocumented logic by analyzing legacy SQL patterns and rebuilding missing connectors. Access management was standardized, and an adaptive update policy was implemented to synchronize with Databricks’ frequent releases. Continuous documentation and proactive communication ensured smooth handover and maintainability.
THE RESULT
Sagis Diagnostics migrated from Azure SQL to a Databricks-based enterprise data warehouse, unifying diagnostics and billing data in a single governed environment. The Medallion Architecture (Bronze/Silver/Gold) ensured data quality, scalability, and traceability.
All BI, AI, and ETL workflows were consolidated into Databricks, enabling transparent data management and efficient collaboration across teams. An AutoML pipeline supports a triage denial prediction model, with automated training for underutilized claims prediction.
The result is an AI-ready Lakehouse that accelerates reporting, enhances visibility, and reduces pay-per-use compute costs by nearly 50%, establishing a future-ready foundation for predictive analytics and LLM-powered BI in healthcare.
~50% compute cost reduction through optimized, pay-per-use architecture
21 integrated data sources unified under Medallion Architecture
3 Genie spaces deployed for self-service BI
“Dataforest got us off the ground really quickly, and they even provided documentation without us having to ask for it — that was really impressive.”