Keep It Right

As the pace of change and the demand for rapid time to market continue to increase, software companies have less and less time to envision, design, build and test complex solutions. The Software as a Service model has created both the mechanism and the expectation that applications delivered this way will continuously evolve. Due to these dynamics, the software industry is rapidly moving away from a traditional Waterfall approach toward more Agile-ish processes, where scrum teams working in short iterations deliver smaller subsets of functionality to the end user in sprints. That shift warrants its own post, but it lays the foundation for one of the key challenges in this model: don’t break what already works.

In many circles, “keep it right” is referred to as regression testing. Traditionally, record-and-playback tools are used to perform this function. These tools require significant domain expertise, are time-consuming to maintain, and are prone to becoming obsolete against the rapidly changing landscape of an application developed in an Agile-ish fashion. The routine is to write hundreds or thousands of use cases, implement them in a record-and-playback tool, rewrite the use cases that changed, re-record, and then start over again (an endless loop). My experience is that these scripts quickly become out of date and useless because of the time and cost involved in maintaining them, and the result is 100 percent manual regression testing. “Keep it right” gets very expensive.

The rapid release process lowers some of the risk by default: smaller sets of changes deployed more quickly limit the number of things that can break. But that’s not enough when dealing with highly complex applications with millions of users who expect them to work. There is a better way, which I refer to as “Log & Compare.” The fundamental shift is that “Log & Compare” removes the need to write and maintain use cases, record them manually, and then fix broken recordings.

The most fundamental element of this new approach is that all new functionality resides behind a switch and stays turned off until it meets the specified requirements. The basic premise is that each HTTP request is routed to the production app server as well as to a pre-release app server. The full HTTP request content is written to a log file on both the production and pre-release servers. Processing occurs, a response is generated, and that response is logged on both servers as well. The response is then returned to the client from production and discarded in the pre-release environment.
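To make the flow concrete, here is a minimal sketch of that mirroring step in Python, assuming a simple front-end proxy and the `requests` library. The host names, log paths, and helper functions are hypothetical; in a real deployment each server would write its own log and the mirroring would typically live in a load balancer or gateway rather than in application code.

```python
# Minimal sketch of the "Log & Compare" mirroring step (assumed names/paths).
import json
import requests

PROD_URL = "https://prod.example.com"        # assumed production app server
PRERELEASE_URL = "https://pre.example.com"   # assumed pre-release app server

def log_traffic(path, method, body, response_body, log_file):
    """Append the full request and response content to a per-environment log."""
    with open(log_file, "a") as f:
        f.write(json.dumps({
            "path": path,
            "method": method,
            "request": body,
            "response": response_body,
        }) + "\n")

def handle(path, method, headers, body):
    # Route the same request to production and to pre-release.
    prod_resp = requests.request(method, PROD_URL + path, headers=headers, data=body)
    pre_resp = requests.request(method, PRERELEASE_URL + path, headers=headers, data=body)

    # Log the request and response for each environment.
    log_traffic(path, method, body, prod_resp.text, "production.log")
    log_traffic(path, method, body, pre_resp.text, "prerelease.log")

    # Only the production response goes back to the client;
    # the pre-release response is discarded after logging.
    return prod_resp.status_code, prod_resp.text
```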

You now have log files in production that represent all inbound and outbound HTTP traffic, and the same set of log files in your pre-release environment, which contains the new code. Now a simple comparison can be made between these two sets of files. Theoretically, they should be identical; only new functionality with its activation switch turned on should produce anomalies. Prior to production activation, all new code should be turned off in the pre-release environment, allowing for a pure regression “Log & Compare” process.
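Given those two sets of logs, the comparison itself can be very simple. Here is a minimal sketch, assuming the JSON-lines log format from the mirroring sketch above and that requests arrive in the same order in both environments; the file names are hypothetical.

```python
# Minimal sketch of the "Compare" step over the two JSON-lines logs.
import json

def load_entries(log_file):
    with open(log_file) as f:
        return [json.loads(line) for line in f]

def compare_logs(prod_log, pre_log):
    """Pair up production and pre-release entries and report any differences."""
    anomalies = []
    for prod, pre in zip(load_entries(prod_log), load_entries(pre_log)):
        if prod["response"] != pre["response"]:
            anomalies.append({
                "path": prod["path"],
                "production": prod["response"],
                "prerelease": pre["response"],
            })
    return anomalies

if __name__ == "__main__":
    diffs = compare_logs("production.log", "prerelease.log")
    if diffs:
        print(f"{len(diffs)} responses differ -- investigate before release")
    else:
        print("Pre-release matches production: regression check passed")
```

With all new code switched off in pre-release, any anomaly this reports is a regression; once the switches are turned on, the same comparison shows exactly where the new functionality changes behavior.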

“Keep it right” is one of the most important aspects of deploying new code to users. If it worked yesterday, it better work today. Decrease the amount of new code being deployed by reducing the time between deployments, and then test that new code daily against what’s happening in production. You will eliminate significant cost, greatly improve your odds of “keeping it right,” and most importantly, keep your customers loving your software.