In order to more fully understand this reality, we must take into account other dimensions of a broader reality.
- John Archibald Wheeler
In “Architect for a Tomorrow”, and further in “Multi-Style Architecture for an API-centric World”, I explored the architectural styles we can use to stay nimble and react to customer and market needs. My focus was squarely on functional qualities, though I did mention using another dimension to fulfill system qualities such as scalability, so that you can keep up with the market’s love of your product and are not loved to death.
According to André B. Bondi in a 2000 WOSP paper, “scalability is the capability of a system, network or process to handle a growing amount of work, or its potential to be enlarged in order to accommodate that growth”.
In “Multi-Style Architecture for an API-centric World”, I described how architecture layering can be coupled with functional segmentation to organize your architecture in a way that allows rapid functional evolution. These same dimensions can also be applied to technical scaling: each componentized capability can be expanded independently, in proportion to the demand placed on it, so the overall solution keeps up.
We can add another dimension to that model to arrive at an architecture that can be scaled to near infinity: splitting our systems along similar contexts such as marketing channels, employer groups, users or other data groupings. This is also called horizontal data partitioning (along “rows” rather than “columns”) or sharding.
One of the best descriptions of these dimensions I have seen is from “The Art of Scalability”. This is a great book that rightly addresses all aspects of scalable companies, including, but not limited to, the technical scaling of their solutions. The book does a great job of explaining how important it is to consider organization and processes as well. While it does not provide any groundbreaking innovations for experienced technology practitioners, its most enduring contribution is the insight gained by combining several best practices into the metaphor of the “AKF Scale Cube” (AKF Partners: Abbott, Keeven & Fisher Partners).
This model explains how we can extend the well-proven layered technical architecture scaling by augmenting it with both functional decomposition and sharding to provide three dimensions of availability and scalability tuning:
- Technical Architectural Layering (X-Axis)
- Functional Decomposition Segmentation – Componentization to Modules & Microservices (Y-Axis)
- Horizontal Data Partitioning - Shards (Z-Axis)
Figure 1 Scale cube [derived] from "The Art of Scalability"
Dimensions of Architecture
Technical Architecture Layering
Technical Architecture Layering represents the set of application patterns that support separation of concerns such as presentation, application logic and data storage. This separation includes externalizing protocol agents such as HTTP agents (web servers, IP load balancers, etc.). Examples include the three-tiered architectures used in first-generation web applications. In this model, scale is achieved by simple load balancing across cloned hardware that hosts components such as web servers or application containers (application servers, autonomous Java applications, etc.).
Figure 2 X-Axis – Horizontal Technical duplication - Traditional 3-Tier Web Application with component level cloning – Load Balancing to Clones from "The Art of Scalability"
Traditional monolithic web applications often follow this pattern, but have limited scale because of the coarse tuning restrictions of shared resources inherent in high functional coupling and a single data store.
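As a rough illustration of X-Axis scaling, the sketch below balances requests round-robin across identical clones. It is a minimal sketch only: the class name `RoundRobinBalancer` and the clone names are hypothetical, and a real deployment would rely on a dedicated load balancer rather than application code.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Distributes requests evenly across identical clones (X-Axis)."""

    def __init__(self, clones):
        if not clones:
            raise ValueError("at least one clone is required")
        self._next_clone = cycle(list(clones))

    def route(self, request):
        # Every clone can serve every request, so the choice ignores
        # what the request actually asks for.
        return next(self._next_clone), request


# Hypothetical clone names, for illustration only.
balancer = RoundRobinBalancer(["web-1", "web-2", "web-3"])
assignments = [balancer.route(f"GET /page/{i}")[0] for i in range(6)]
print(assignments)
```

Because the clones are interchangeable, adding capacity means adding a name to the list. What this cannot do is tune one function independently of another, which is the coarse-tuning limit described above.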
Great strides have been made in recent years that have served to extend the scale capabilities and cost efficiencies of such solutions. The virtualization of computing, network and storage resources, along with highly efficient (and cost-effective) cache mechanisms and local data partitioning for storage and access scaling, has contributed to the longevity of many systems. Used in isolation, these advances will not avoid the inevitable limitations, but they can contribute to longevity when used in concert with other scaling and availability techniques.
Functional Segmentation – Components: Modules and Microservices
Functional Segmentation (decomposition) is the complete encapsulation of functional components that provide a single business capability. These capabilities are separated into modules or autonomous microservices shaped by bounded contexts – meaning there is no cross-sharing of models or data except through a published API. Scale is achieved by intelligently routing requests for specific “things” (components) that are independently managed.
Figure 3 Y-Axis - Functional Decomposition - Components: Modules and Microservices – Routing to Things
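Routing to “things” can be sketched as a lookup from a business capability to the component that owns it. This is a minimal sketch under assumptions: the capability names, the service names, and the convention that the first URL path segment names the capability are all hypothetical.

```python
# Hypothetical capabilities and the components that own them.
SERVICES = {
    "catalog": "catalog-service",
    "orders": "order-service",
    "billing": "billing-service",
}


def route_by_function(path):
    """Return the component that owns the capability in the first path segment."""
    capability = path.strip("/").split("/")[0]
    try:
        return SERVICES[capability]
    except KeyError:
        raise LookupError(f"no component owns capability {capability!r}") from None


print(route_by_function("/orders/42"))
print(route_by_function("/catalog/shoes"))
```

Each entry in the table can be deployed, sized and evolved on its own, since callers only ever reach a component through its published route.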
Each component, module or microservice may in turn independently follow the relevant Technical Architectural Layering patterns to achieve a much higher fidelity of tuning based upon volume and load characteristics by function. In this approach, scaling by Technical Architecture Layering is greatly enhanced by combining Y-Axis scaling with X-Axis scaling so that we have smart routing to the “things” that are themselves load balanced clones:
Figure 4 Y-Axis Scaling with X-Axis Scaling - Smart Routing to Things that are Load Balanced Clones
This independent management by function allows for right-sizing of clones on the X-Axis based upon unique load characteristics. Even more significantly, it also allows for right-choice selection of the application stack technology to match the specific need. For instance, not all logic has to be Java EE and not all data management/storage has to be in an RDBMS; technology choices may be made based on the unique nature of the function and the skills of the team accountable for the function.
Horizontal Data Partitioning - Shards
Horizontal Data Partitioning is a data-driven approach to system management: data is separated by value at the row level so that storage (and ultimately the related computational work) can be spread across separate servers to distribute load. The separation is typically based on the value of key fields such as marketing channel, group/client or location, or on a consistent hashing model for pure load distribution. Scale is achieved by intelligently routing requests for specific contexts (shards) that are independently managed.
Figure 5 Z-Axis - Shards - Horizontal Data Partitioning - Routing to Context
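For the consistent hashing option mentioned above, a hash ring is one common way to route a key to a shard. The following is a minimal sketch, not a production implementation: the class name `ConsistentHashRing`, the shard names, the `vnodes` parameter and the choice of SHA-256 are all assumptions made for illustration.

```python
import bisect
import hashlib


def _stable_hash(value):
    # Python's built-in hash() is randomized per process for strings,
    # so a cryptographic digest is used to get a repeatable value.
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Maps keys to shards so that adding a shard relocates only some keys."""

    def __init__(self, shards, vnodes=100):
        if not shards:
            raise ValueError("at least one shard is required")
        # Each shard is placed on the ring many times to even out the load.
        self._ring = sorted(
            (_stable_hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def shard_for(self, key):
        # First ring point at or after the key's hash, wrapping to the start.
        index = bisect.bisect_left(self._points, _stable_hash(key))
        return self._ring[index % len(self._ring)][1]


# Hypothetical shard and key names.
before = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
after = ConsistentHashRing(["shard-a", "shard-b", "shard-c", "shard-d"])
keys = [f"customer-{i}" for i in range(1000)]
moved = [k for k in keys if before.shard_for(k) != after.shard_for(k)]
print(f"{len(moved)} of {len(keys)} keys moved to the new shard")
```

When the partition key is a business value such as a marketing channel, a plain lookup table from value to shard serves the same routing purpose; the ring is only needed when the goal is pure load distribution.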
Each shard may also independently follow the relevant Technical Architectural Layering patterns to provide tuning that is based upon the volume and load characteristics of the shard context – some groupings may be larger than others. In this approach, scaling by sizing the clones on the Technical Architecture Layering is greatly enhanced by combining Z-Axis scaling with X-Axis scaling:
Figure 6 Pods - Z-Axis Scaling with X-Axis Scaling – Smart Routing to Contexts that are Load Balanced Clones
This independent management allows right-sizing (the number and depth of clones) and unique tuning of application stack technology to match the specific load characteristics of the shard context, even though each shard is a logical clone of all others. This model is often referred to as “Pods” when each independent deployment also includes the technical architecture supporting the application logic that operates on the shard, that is, the stack along the X-axis. Scale is achieved in terms of growth and performance, and this level of isolation also limits the impact of any disruption (planned or not) to a single shard.
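The isolation property of pods can be illustrated with a small simulation. This is a sketch under stated assumptions: the `Pod` class, the channel names, the clone counts and the `healthy` flag are hypothetical stand-ins for real deployments and health checks.

```python
class Pod:
    """One shard's full stack: its own clones serving only its own context."""

    def __init__(self, context, clone_count):
        self.context = context
        self.clones = [f"{context}-app-{i}" for i in range(1, clone_count + 1)]
        self.healthy = True
        self._turn = 0

    def handle(self, request):
        if not self.healthy:
            raise RuntimeError(f"pod {self.context!r} is unavailable")
        # Round-robin across this pod's own clones (X-Axis inside the shard).
        clone = self.clones[self._turn % len(self.clones)]
        self._turn += 1
        return f"{clone} handled {request}"


# Hypothetical contexts; the busier channel is given more clones.
pods = {"retail": Pod("retail", 4), "broker": Pod("broker", 2)}


def route(context, request):
    return pods[context].handle(request)


pods["broker"].healthy = False  # simulate an outage or a planned deployment
print(route("retail", "GET /quote"))  # the retail pod is unaffected
```

Requests for the “broker” context fail while that pod is down, but nothing shared exists for the failure to spread through, so the “retail” context keeps serving.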
Volume = Architecture³
Finally, all three dimensions can be combined in various routing priorities to achieve near-infinite scaling. In my example, I route to Shards of contexts scoped by a “marketing channel”, which are further routed to Functionally Segmented Components that are load balanced across cloned components. This partitioning and segmentation creates the opportunity to use “shared nothing architecture” for virtually unlimited scale, extreme high-availability and low-risk continuous delivery.
Figure 7 Architecture Cubed - Routing to Shards with Routing to Components that are Load Balanced Clones
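The routing order in this example (shard, then component, then clone) can be sketched as a nested lookup. The topology below is entirely hypothetical: the channel names, capability names and clone names are assumptions made only to show the three steps together.

```python
from itertools import cycle

# Hypothetical topology: channel (Z-Axis) -> capability (Y-Axis) -> clones (X-Axis).
TOPOLOGY = {
    "retail": {"quotes": ["r-quotes-1", "r-quotes-2"], "enroll": ["r-enroll-1"]},
    "broker": {"quotes": ["b-quotes-1"], "enroll": ["b-enroll-1", "b-enroll-2"]},
}

# One independent round-robin iterator per (channel, capability) pair.
_balancers = {
    (channel, capability): cycle(clones)
    for channel, components in TOPOLOGY.items()
    for capability, clones in components.items()
}


def route(channel, capability):
    """Pick the shard first, then the component, then a clone within it."""
    try:
        return next(_balancers[(channel, capability)])
    except KeyError:
        raise LookupError(f"no route for {channel}/{capability}") from None


picks = [route("retail", "quotes") for _ in range(3)]
print(picks)
```

Note that each (channel, capability) pair has its own clone list, so the number of clones can differ per function and per shard.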
In this example, instances of the functional decomposition (Y-axis) are grouped within the shard context (Z-axis) based on an obvious data affinity around a “marketing channel”, which implies computational grouping with the data (move compute to the data). However, this does not have to be the only model. Depending upon your specific needs, moving compute to data may be the right answer. Alternatively, a different routing order can route first to functions that then retrieve data from the shards when “move data to compute” is the right answer, for example a global directory for a global authentication model or a global look-up coordinating a distributed search.
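The alternative routing order, where a function is reached first and then gathers data from the shards, resembles a scatter-gather search. The sketch below assumes hypothetical in-memory shards and a simple prefix match; a real coordinator would call remote shard services and handle timeouts and partial failures.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards keyed by marketing channel, each holding its own rows.
SHARDS = {
    "retail": ["alice", "alan", "bob"],
    "broker": ["alma", "carol"],
    "online": ["albert", "dave"],
}


def search_shard(channel, prefix):
    """Search a single shard and return only its local matches."""
    return [name for name in SHARDS[channel] if name.startswith(prefix)]


def global_search(prefix):
    """Fan the query out to every shard, then merge the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(
            pool.map(lambda channel: search_shard(channel, prefix), SHARDS)
        )
    return sorted(name for partial in partials for name in partial)


print(global_search("al"))  # ['alan', 'albert', 'alice', 'alma']
```

The trade-off is that every global query touches every shard, so this order suits look-ups that genuinely span contexts rather than the common case of requests that belong to one context.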
The Scale Cube helps us keep the critical dimensions of system scale in mind as we search for solutions and make decisions. As with many architectural metaphors and pattern languages, the Scale Cube provides a great framework to work through options, but you must find the specific answer for your specific business problem.
These concepts couple very well with modern infrastructure platforms such as public or private clouds, which open new opportunities through highly elastic computational resource management (right-sizing based on granular and even cyclical needs). Using the Scale Cube in concert with these platforms opens the door to cost-effective scaling without sacrificing our other critical system qualities, in a way that is not only economically responsible for our business but game changing.