Legacy ETL to Azure Modernization Playbook
Migrate legacy ETL tools (Informatica, SSIS, DataStage, Pentaho) to Azure Data Factory + Microsoft Fabric + PySpark with parity validation and incremental rollout.
Tell us what you're building
We use these answers to surface the prompts, skills, and MCP configs that fit your stack — and to substitute stack values like {{database}} into the prompts you copy. Content (PRDs, code, etc.) stays in your repo. Everything you enter here is stored in your browser — nothing is sent to a server.
Steps you'll go through
- 01
Pipeline Inventory and Audit
6-10 weeksInventory every legacy mapping / workflow / package — active vs dead, who consumes it, complexity, criticality, owner. Pipeline counts always come in higher than expected.
- 02
Strategy Decision (per Pipeline Category)
2-3 weeksNot every pipeline needs the same approach. Categorize each into rewrite, tool-assisted translation, redesign, retire, or replace-with-platform-feature.
- 03
Target Architecture and Framework
4-6 weeksStand up the Azure-side architecture and the metadata-driven pipeline framework before migrating any specific pipeline. Reuses Phases 2-5 of the Azure Data Platform Modernization playbook.
- 04
Behavior Parity Test Framework
4-6 weeksLock down legacy pipeline behavior as ground truth — including the quirks (rounding, null handling, date / sort / aggregation behavior). The new pipelines must match these to keep trust.
- 05
Pilot Migration (1-3 Representative Pipelines)
6-10 weeksValidate the framework end-to-end on real pipelines: one simple, one medium, one complex. The pilot is where you discover what the framework didn't anticipate.
- 06
Mass Migration
1-3 quartersMigrate the bulk of pipelines in batches using the established patterns. Common batching: by source system, by consumer, by complexity, or by owner.
- 07
Cutover and Decommission Legacy ETL
4-8 weeksDisable legacy pipelines, verify zero traffic for 30+ days, decommission infrastructure, cancel licenses, archive legacy code for compliance and reference.
- 08
Optimize and Stabilize
Ongoing — especially the first 6 months post-migrationTune performance, rationalize cost, build operational maturity, refine patterns. The migration is over — the platform owns the work now.