Tech migration is hard

May 2018

Some Background

Here's a scenario that keeps playing out in the technology landscape.
Team A builds a new API that solves Problem X. Now that the API is built, we just need Teams B, C, D, and E to migrate to the new API.
But wait! Team C is running low on resources and really doesn't have time to do the migration.
Hold on! Team D never accounted for the migration in their Quarterly Release Planning. The earliest they can fit it in is Q4 of next year.
Wait! Now you hear from Team Member Y that no one can go live with the new features until Team E re-architects their client-facing application and middleware.
Meanwhile the benefits of the new API are just sitting there without anyone taking advantage of them.

Classic Migration Patterns

Let's talk through some of the options that we have.

Number 1 - Let's all hold hands and jump in the pool

This classic pattern (a.k.a. LAHHAJITP) is the most obvious, simplest, and riskiest.
Basically, on a specific date, everyone migrates from API A to API B at once.

Pros

No one uses the deprecated system after the cutover date.
Benefits of the new system are live for everyone immediately.
Doesn't require the application to be architected in a specific way.

Cons

A ton of coordination is needed to move the changes smoothly through all 3 environments. Case in point: a recent API Gateway upgrade. (Notice how the page has 281 edits spanning Sept 21st 2017 to Apr 12th 2018: over 6 months to upgrade one detail!)
A ton of risk, especially if any database migration is involved. If anything goes wrong, there can be extremely painful rollbacks and team members up all night. Case in point: the recent TSB fiasco.
If something subtle is broken, it is broken for everyone. There are no guinea pigs or early adopters to weed out issues in Prod.

Number 2 - Piece by Piece

This pattern (a.k.a. Strangler Pattern) is a popular choice.
Basically, the old system and the new system live side by side, and consumers gradually migrate as they have bandwidth.
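To make the side-by-side idea concrete, here is a minimal Python sketch. It assumes that a single routing layer (a facade) sits in front of both systems and tracks which consumers have migrated. The names `StranglerFacade`, `legacy_api`, `new_api`, and the team identifiers are hypothetical and are not taken from any real system.

```python
from typing import Callable, Set

# Hypothetical stand-ins for the old API and the new API.
def legacy_api(request: dict) -> dict:
    return {"served_by": "legacy", "request": request}

def new_api(request: dict) -> dict:
    return {"served_by": "new", "request": request}

class StranglerFacade:
    """Routes each consumer to the old or new API depending on whether it has migrated."""

    def __init__(self, old: Callable[[dict], dict], new: Callable[[dict], dict]) -> None:
        self._old = old
        self._new = new
        self._migrated: Set[str] = set()

    def mark_migrated(self, consumer: str) -> None:
        # A consumer flips over only when its own team has had the bandwidth to migrate.
        self._migrated.add(consumer)

    def handle(self, consumer: str, request: dict) -> dict:
        # Both systems stay live; each request goes to whichever one this consumer is on.
        handler = self._new if consumer in self._migrated else self._old
        return handler(request)

    def remaining(self, consumers: Set[str]) -> Set[str]:
        # Consumers still on the old system; it can only be retired once this set is empty.
        return consumers - self._migrated

facade = StranglerFacade(legacy_api, new_api)
facade.mark_migrated("team_b")
print(facade.handle("team_b", {"id": 1})["served_by"])  # new
print(facade.handle("team_c", {"id": 1})["served_by"])  # legacy
print(sorted(facade.remaining({"team_b", "team_c", "team_d", "team_e"})))  # ['team_c', 'team_d', 'team_e']
```

Nothing in the facade pushes the `remaining` set toward empty. The old system keeps working for anyone who hasn't migrated, which is exactly the missing sense of urgency described under Cons.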

Pros

Much safer and far less risky, and usage of the new system can be scaled up at a relaxed pace.
Coordinating the migration is much easier, since everyone can migrate at their own pace.

Cons

No sense of urgency for consumers to migrate. As long as the old system still works, most teams will not prioritize the migration over other features.
It is very difficult and costly to maintain 2 systems side by side, and the team maintaining both instances does a lot of context switching.
The application has to be architected so that the old and new systems can run side by side. This typically involves crazy database hacks that threaten the stability of the new system.

Number 3 - Full Service

This pattern (a.k.a. FS) is a newer concept that may be gaining some steam. Basically, the folks who built the system go into the consumers' codebases and do the migration for them.

Pros

Safe and low risk, because the migration team actually understands the API they are migrating to. (Pull requests are still reviewed by the consuming teams' SMEs to avoid breaking their systems, and changes are tested in Test and Beta.)
The API builders control the coordination, which lets them go live with as many teams as they think they can safely handle.
The migration is done by the team that literally knows the new API best. There's no ambiguity about what different fields do, because the SMEs are the builders.
Consumer teams' timelines are no longer a blocker, and Quarterly Release Planning becomes less of a confusing information overload.

Cons

More work for the team that built the API.

The Argument for Number 3 - Full Service

I've seen a ton of timeline mismatches over the last couple of years. It's extremely difficult to keep track of where everyone in the enterprise is.
Here's an obligatory Medium post about a form of the Full Service strategy used at Google: https://medium.freecodecamp.org/how-google-builds-a-web-framework-5eeddd691dea. Here's a key paragraph:

You break it, you fix it

When AngularDart authors want to introduce a breaking change, they have to go and fix it for their users. Since everything at Google lives in a single repo, it’s trivial to find out whom they’re breaking, and they can start fixing right away. Any breaking change to AngularDart also includes all the fixes to that change in all the Google apps that depend on it. So the breakage and the fix go into the repo simultaneously and — of course — after proper code review by all affected parties. Let’s give a concrete example. When someone from the AngularDart team makes a change that affects code in the AdWords app, they go to that app’s source code and fix it. They can run AdWords’ existing tests in the process, and they can add new ones. Then, they put all of that into their change list and ask for review. Since their change list touches code in both the AngularDart repo and the AdWords repo, the system automatically requires code review approval from both of those teams. Only then can the change be submitted.

For the most part, all code lives in GitHub or some other easily accessible location. We have engineers who can write code just waiting for teams to migrate to their API, so why can't they do the migration themselves? This would really help us adopt a more open-source mindset and allow teams to focus on what they do best.