Technical teams need to plan and test their ability to roll back changes that are negatively impacting users. This needs to be a formal part of your process. Sometimes that means falling forward: fixing the broken functionality in production. Sometimes it means falling back: reverting to a known state where the application was functioning properly.
Let's be honest: none of us are perfect all of the time. Occasionally, we are going to miss something and move code into the production environment that breaks things and makes life miserable for the folks actually using the application.
What have you done to prepare for this moment?
- Does your implementation plan include steps to proactively monitor the system as changes are moved into production?
- Does your implementation plan allow time for you to review patterns of application activity and confirm that nothing in them indicates something is broken within the application?
- Does your implementation plan include validation steps to ensure that the application is behaving as expected and that the new or changed functionality is working?
- Does your implementation plan identify risk points and fallback strategies?
- Have you tested your implementation plan?
- Have you tested your fallback plan?
Yes, planning your implementations takes time. Yes, planning validation steps and inserting them into the overall implementation plan takes additional time. Yes, testing your implementation plan chews up more time. And, yes, testing your fallback plan takes even more time. That all being true, failing to do any of this means that you're operating by the seat of your pants, and when things go bad, your recovery will take longer and hurt you worse.
- When your application fails, your reputation with your customers takes a hit.
- The longer the outage, the greater the risk that your customers will walk away from you and begin to use a competing application.
- Will an outage impact your revenue stream?
Not to hit Microsoft too hard, but in April they released a patch to Windows 8.1 that was required before any other updates could be applied. It impacted retail customers as well as corporate customers and was not fixed until early May. I'm sure that they ran their standard quality processes against the code prior to the initial release in April, but something obviously got past the testers, and it wasn't a minor issue. It took time for Microsoft to hit the reset button, and in the meantime their customers were in an uproar.
I'm going to walk softly here because, as a developer, it could happen to me or to the teams that I lead. I'm not trying to single out Microsoft. Anyone who writes software has had to recover from a misstep when loading code into production or sending out a new release. What I'm advocating is that you implement solid processes that minimize the risk to your customers, your team and your organization.
What are you doing to mitigate the risk of an implementation/software release going bad?
Tags: Development; Programming; Programming Languages; Change; Decisions; Decision Making; Project Management; SDLC; Lifecycle
For more information on David L. Collison: LinkedIn Profile