Choosing the right migration strategy: beyond lift and shift
Every application fits one of these paths
Rehost (Lift & Shift)
Move applications as-is to new infrastructure. No code changes. Fastest path but least optimized.
Use when: datacenter exit deadlines, simple apps, quick wins.
Avoid when: apps need architectural improvement to perform in the cloud.
Re-platform (Lift & Reshape)
Make targeted optimizations during migration: managed databases, container orchestration, cloud-native storage.
Use when: apps mostly work but need cloud-specific tuning.
Avoid when: tightly coupled monoliths need deeper restructuring.
Refactor (Rearchitect)
Restructure the application: break monoliths into microservices, adopt event-driven patterns, go serverless.
Use when: core business applications you'll maintain for 5+ years.
Avoid when: apps are nearing end-of-life or have limited strategic value.
Rebuild (Start Fresh)
Rewrite from scratch using cloud-native technologies. Maximum flexibility, maximum investment.
Use when: the legacy tech stack is a dead end, or business rules need rethinking.
Avoid when: the existing codebase is salvageable; don't rewrite for ego.
Replace (Buy vs. Build)
Swap the custom application for a SaaS or off-the-shelf solution. No migration, just adoption.
Use when: commodity capabilities are better served by existing products.
Avoid when: the functionality is core and differentiating, and defines your business.
Real-world strategy decisions
Legacy Java monolith (15 years old): 1 core app, 200K+ lines.
Decision: phased refactor using the strangler fig pattern over 8 months.
Why: too critical to rewrite, too coupled to just rehost. Incremental extraction let the team learn microservices while keeping the system running.
Fleet of 40 internal tools: 40 small apps, mixed tech stacks.
Decision: rehost 30, replace 8 with SaaS, refactor 2.
Why: not every app deserves the same investment. Portfolio analysis saved 6 months of unnecessary refactoring.
E-commerce platform migration: 1 platform, high traffic, seasonal peaks.
Decision: re-platform to containers with auto-scaling.
Why: the app was architecturally sound. It just needed managed infrastructure, container orchestration, and elastic scaling.
Building a migration risk assessment matrix
A migration risk assessment matrix scores each application across five dimensions before you select a strategy. First, dependency complexity — how many upstream and downstream systems does this application integrate with? An application with two API consumers is a different risk profile from one with 40 downstream dependencies. Map every integration point, classify each as synchronous or asynchronous, and identify which ones require coordinated cutover versus which can tolerate temporary dual-write patterns.
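The integration-point mapping described above can be sketched in a few lines. This is a minimal illustration, not a tool: the `Integration` fields and function name are assumptions chosen for the example.

```python
# Sketch of integration-point mapping: classify each dependency as
# synchronous or asynchronous and flag which ones need coordinated cutover.
# Field and function names are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class Integration:
    name: str
    direction: str             # "upstream" or "downstream"
    synchronous: bool
    coordinated_cutover: bool  # False: tolerates a temporary dual-write period


def cutover_plan(integrations: list[Integration]) -> dict[str, list[str]]:
    """Split integrations into those needing coordinated cutover and the rest."""
    plan: dict[str, list[str]] = {"coordinated": [], "tolerates_dual_write": []}
    for i in integrations:
        bucket = "coordinated" if i.coordinated_cutover else "tolerates_dual_write"
        plan[bucket].append(i.name)
    return plan
```

An application with two entries in the `coordinated` bucket is a very different migration from one with forty.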
Second, data sensitivity — does the application handle PII, financial records, healthcare data, or other regulated information? Data sensitivity affects your migration approach directly. Regulated data may require encryption in transit during migration, geographic restrictions on intermediate storage, and chain-of-custody documentation for audit purposes. A rehost of an application handling payment card data has compliance requirements that a rehost of an internal wiki does not.
Third, availability requirements — what is the maximum acceptable downtime during migration? Zero-downtime migrations require parallel-run patterns and incremental traffic shifting, which increase complexity and cost. If the business can tolerate a four-hour maintenance window, a simpler cutover approach becomes viable.
Fourth, team expertise — does the team responsible for this application have experience with the target platform? A team migrating to Kubernetes for the first time with a critical production application is a higher risk than a team that has already operated three services on the platform.
Fifth, rollback feasibility — if the migration fails, how quickly and completely can you revert? Applications with stateless architectures roll back easily. Applications with data migrations, schema changes, or external integration changes may not be reversible without data loss.
Score each dimension on a 1-to-5 scale and multiply by a weight reflecting your organization's priorities. The total risk score determines not just which strategy to use but how much investment in testing, parallel runs, and rollback infrastructure each migration deserves. High-risk applications get dedicated migration teams, extensive rehearsals, and conservative cutover plans. Low-risk applications can move in batches with lighter oversight.
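The weighted-scoring step can be sketched as follows. The specific weights and tier thresholds here are illustrative assumptions; set them to reflect your organization's priorities.

```python
# Sketch of the weighted risk-scoring step. Weights and tier thresholds
# are illustrative assumptions, not a standard.

WEIGHTS = {
    "dependency_complexity": 3,
    "data_sensitivity": 3,
    "availability": 2,
    "team_expertise": 1,
    "rollback_feasibility": 2,
}


def risk_score(scores: dict[str, int]) -> int:
    """Multiply each 1-to-5 dimension score by its weight and sum."""
    for dim, value in scores.items():
        if dim not in WEIGHTS:
            raise ValueError(f"unknown dimension: {dim}")
        if not 1 <= value <= 5:
            raise ValueError(f"{dim} must be scored 1-5, got {value}")
    return sum(WEIGHTS[d] * v for d, v in scores.items())


def risk_tier(total: int) -> str:
    """Bucket the total into oversight tiers (ratio cutoffs are assumptions)."""
    max_total = 5 * sum(WEIGHTS.values())  # 55 with the weights above
    ratio = total / max_total
    if ratio >= 0.7:
        return "high"    # dedicated team, rehearsals, conservative cutover
    if ratio >= 0.4:
        return "medium"
    return "low"         # batch migration, lighter oversight
```

A payments service scoring 5s on dependency complexity and data sensitivity lands in the high tier; an internal wiki scoring 1s across the board lands in the low tier.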
Parallel-run validation patterns
Shadow traffic (read path validation)
Mirror production read traffic to the new system while continuing to serve responses from the old system. Compare response payloads between old and new for every request. This pattern is safe because the new system's responses are discarded — users always get results from the proven system. It is ideal for validating that the new system produces identical results under real production load and data patterns. Use a traffic mirroring proxy or service mesh feature (Istio mirror policy, Envoy shadow routing) to duplicate requests without impacting the primary request path. Log discrepancies and fix them before moving to write-path validation.
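The comparison step can be sketched in application code. This is a simplified illustration of the pattern (in practice the mirroring happens in the proxy or mesh); `serve_shadowed` and the log shape are assumptions for the example.

```python
# Sketch of shadow-read comparison: the user always gets the old system's
# response; the new system's response is compared, logged on mismatch, and
# discarded. Function names and log fields are illustrative assumptions.
import json

discrepancy_log: list[dict] = []


def compare_payloads(path: str, old: dict, new: dict) -> bool:
    """Record a discrepancy if the mirrored response differs from the primary."""
    if old != new:
        discrepancy_log.append({
            "path": path,
            "old": json.dumps(old, sort_keys=True),
            "new": json.dumps(new, sort_keys=True),
        })
        return False
    return True


def serve_shadowed(path: str, old_backend, new_backend) -> dict:
    """Serve from the old system; shadow the request to the new system."""
    old_resp = old_backend(path)
    try:
        new_resp = new_backend(path)
        compare_payloads(path, old_resp, new_resp)
    except Exception:
        # A failure in the shadow path must never affect the user's request.
        discrepancy_log.append({"path": path, "error": "shadow backend failed"})
    return old_resp  # users always get the proven system's answer
```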
Dual-write with reconciliation (write path validation)
For systems that process writes, implement dual-write — send every write operation to both the old and new systems, with the old system remaining the system of record. Run a continuous reconciliation process that compares the state of both systems and reports divergences. This is more complex than shadow reads because write operations have side effects, ordering matters, and failure handling must account for partial success scenarios. Implement the dual-write at the application layer, not the database layer, so you can handle failures gracefully — if the write to the new system fails, log the failure and continue, rather than failing the user's request.
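A minimal sketch of that failure-handling rule, using in-memory stores in place of the two systems. The store types and the `failed_writes` log are assumptions for the example; a real implementation would also need ordering and retry handling.

```python
# Sketch of application-layer dual-write: the old system remains the system
# of record; a failed write to the new system is logged, never surfaced to
# the user. In-memory dict stores stand in for the two systems.
failed_writes: list[dict] = []


def dual_write(key: str, value, old_store: dict, new_store: dict) -> None:
    old_store[key] = value          # system of record: failures here DO propagate
    try:
        new_store[key] = value      # best-effort write to the new system
    except Exception as exc:
        failed_writes.append({"key": key, "error": str(exc)})


def reconcile(old_store: dict, new_store: dict) -> list:
    """Continuous reconciliation: return keys whose state diverges."""
    keys = set(old_store) | set(new_store)
    return sorted(k for k in keys if old_store.get(k) != new_store.get(k))
```

The reconciliation report is what tells you when the new system is trustworthy enough to promote.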
Percentage-based traffic shifting
Once parallel runs confirm functional correctness, shift live traffic incrementally — 1 percent, then 5 percent, then 25 percent, then 50 percent, then 100 percent. At each stage, monitor error rates, latency percentiles, and business metrics for a minimum stabilization period (typically 24 to 48 hours) before increasing the percentage. Use feature flags or weighted routing in your load balancer to control the split. Define automatic rollback triggers: if error rates exceed the old system's baseline by more than a defined threshold, traffic automatically shifts back. This approach catches load-dependent issues that shadow testing at full production volume might miss because the new system is actually serving real users.
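The stage progression and rollback trigger can be sketched as follows. The stage list mirrors the percentages above; the 2x error-rate threshold and function names are illustrative assumptions.

```python
# Sketch of percentage-based traffic shifting with an automatic rollback
# trigger. The 2x baseline threshold is an illustrative assumption.
import random

STAGES = [1, 5, 25, 50, 100]


def route(percent_to_new: int, rng=random.random) -> str:
    """Weighted routing decision for a single request."""
    return "new" if rng() * 100 < percent_to_new else "old"


def next_stage(current: int, new_error_rate: float,
               baseline_error_rate: float, threshold: float = 2.0) -> int:
    """After the stabilization period, advance one stage,
    or roll back to 0% if error rates breach the trigger."""
    if new_error_rate > baseline_error_rate * threshold:
        return 0  # automatic rollback: all traffic back to the old system
    i = STAGES.index(current)
    return STAGES[min(i + 1, len(STAGES) - 1)]
```

In production the routing decision lives in the load balancer or feature-flag system; this sketch only shows the control logic.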
Data migration integrity verification
Implement verification at three levels. Row-count verification is the most basic — confirm that the number of records in the source matches the destination for every table. This catches bulk failures like dropped batches or truncated imports but misses data corruption within individual records. Checksum verification computes a hash (SHA-256 or similar) of each record's content in both source and destination and compares them. This catches field-level corruption, encoding issues, and transformation errors. For large datasets, compute checksums in parallel batches rather than sequentially. Semantic verification validates that the data makes business sense in its new context — referential integrity is maintained, computed fields produce correct results, and business rules are satisfied. This layer catches issues that survive checksum verification, like timezone conversion errors where the bits are technically correct but the business meaning has changed.
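The first two levels can be sketched over in-memory tables. A real run would stream records from both databases in batches; the canonical-JSON encoding here is one reasonable choice, not the only one.

```python
# Sketch of row-count and checksum verification over in-memory tables
# (lists of dicts keyed by a primary-key field).
import hashlib
import json


def row_counts_match(source: list, dest: list) -> bool:
    """Level 1: catches bulk failures like dropped batches."""
    return len(source) == len(dest)


def record_checksum(record: dict) -> str:
    """SHA-256 over a canonical (sorted-key) JSON encoding of the record."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def checksum_mismatches(source: list, dest: list, key: str) -> list:
    """Level 2: return keys whose record content differs between systems."""
    src = {r[key]: record_checksum(r) for r in source}
    dst = {r[key]: record_checksum(r) for r in dest}
    return sorted(k for k in src if src[k] != dst.get(k))
```

Semantic verification (level 3) cannot be written generically; it is a suite of business-rule checks specific to your schema.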
For migrations involving schema transformations (common in re-platform and refactor strategies), build a dedicated data validation suite that runs before cutover. This suite should include known-answer tests — pre-computed expected results for specific queries that exercise complex joins, aggregations, and business logic. If the quarterly revenue report produces different numbers on the new system than the old system, you have a data integrity issue that needs resolution before cutover, not after.
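A known-answer test is simple to express: pre-compute results on the old system, then run the same queries against the new one. The query names and figures below are illustrative assumptions.

```python
# Sketch of a known-answer test suite: expected results pre-computed on the
# OLD system, replayed against the new system before cutover.
# Query names and figures are illustrative assumptions.

KNOWN_ANSWERS = {
    "q3_revenue": 1_250_000,
    "active_customers": 4_812,
}


def run_known_answer_suite(run_query) -> list[str]:
    """run_query(name) executes the named query on the new system.
    Returns descriptions of any queries whose results diverge."""
    failures = []
    for name, expected in KNOWN_ANSWERS.items():
        actual = run_query(name)
        if actual != expected:
            failures.append(f"{name}: expected {expected}, got {actual}")
    return failures
```

An empty failure list is a cutover precondition; any divergence blocks the migration until resolved.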
Account for data that changes during migration. Unless you can freeze writes to the source system during migration (rarely feasible for production systems), you need a change data capture (CDC) strategy. Capture all writes to the source system during the migration window and replay them against the destination after the bulk migration completes. Tools like Debezium, AWS DMS, and Oracle GoldenGate handle this automatically, but verify that the CDC stream captures deletes and updates, not just inserts. Run your integrity verification after CDC replay completes, not after the bulk migration, to ensure the final synchronized state is correct.
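Conceptually, CDC replay is applying captured change events in order after the bulk copy. CDC tools handle this for you; this toy sketch (with an assumed event shape) only illustrates why deletes and updates must be in the stream, not just inserts.

```python
# Toy sketch of CDC replay after the bulk migration: apply captured inserts,
# updates, and deletes in capture order. Event shape is an assumption;
# real tools (Debezium, AWS DMS, GoldenGate) manage this stream for you.

def replay_changes(dest: dict, events: list[dict]) -> dict:
    """Apply captured change events to the destination, in order."""
    for ev in events:
        op, key = ev["op"], ev["key"]
        if op in ("insert", "update"):
            dest[key] = ev["value"]
        elif op == "delete":
            dest.pop(key, None)  # a stream missing deletes leaves stale rows
        else:
            raise ValueError(f"unknown op: {op}")
    return dest
```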
Rollback planning for each migration strategy
Rehost rollback: DNS and routing reversion
Rehost migrations are the simplest to roll back because the application code has not changed. Keep the source environment running in a read-only or standby state for at least two weeks after cutover. Rollback is a DNS change or load balancer configuration update that routes traffic back to the original infrastructure. The primary risk is data divergence — any data written to the new environment after cutover needs to be replicated back to the source if you roll back. Plan for this by maintaining database replication in both directions during the stabilization period.
Re-platform rollback: configuration and dependency mapping
Re-platform migrations introduce new dependencies (managed databases, container orchestration, cloud-native services) that complicate rollback. If you migrated from a self-managed PostgreSQL instance to Amazon RDS, rolling back means having a synchronized self-managed instance ready. Document every platform dependency change and maintain the original dependency stack in a warm standby state. Test the rollback path explicitly — do not assume that reverting configuration changes is sufficient. Platform-level changes often have subtle side effects on connection pooling, timeout behavior, and failover mechanics that only surface under production load.
Refactor rollback: the strangler fig safety net
Refactored applications have different code, different architectures, and often different data models — making clean rollback difficult. The strangler fig pattern provides a built-in rollback mechanism: because the old system continues to handle traffic for un-migrated functionality, you can reroute traffic for migrated features back to the old system by updating the routing layer. This works only if you maintain the old system's capability for migrated features during the stabilization period. Set a clear decommission date for old system capabilities — typically 30 days after the last feature migration — and do not remove old code until that date passes without incident.
Rebuild rollback: accept the constraint
Rebuild migrations are the hardest to roll back because the new system shares little with the old one. In practice, rollback for a rebuild means continuing to operate the old system — which means the old system must remain operational throughout the migration period. Budget for this dual-operation cost explicitly. The rollback plan for a rebuild is less about technical reversion and more about operational continuity: if the rebuild fails acceptance criteria, the old system continues serving production while the team addresses the gaps. Define clear go/no-go criteria for each phase of the rebuild cutover, and do not decommission the old system until the new system has operated successfully in production for a minimum stabilization period — typically 60 to 90 days for business-critical applications.
The best migration strategy is a portfolio strategy.
Don't apply one approach to every application. Assess each workload independently, map it to the right strategy, and sequence your migration to deliver quick wins first while investing in the applications that matter most long-term.