Growth Beyond Minimal Viable Platform

Organic growth from the Minimal Viable Platform (MVP) to a more complex system is expected and encouraged. The next sections offer considerations for progressively growing from a lower to a higher level of fault-tolerant architecture.

Single Cluster Stretched Across Three Racks

Diagram: single cluster stretched across three racks.

Key items that change are:

  • Two more physical racks are added to the original, establishing a three rack / three AZ configuration.

The maximum number of hosts per cluster in this architecture is 64. The overall model supports a maximum of 192 hosts and 40,000 powered-on VMs.

Unchanged items as compared to MVP:

  • Management functions do not change location

By aligning the configuration with three physical racks, several HA features from different parts of the stack come into alignment. Each pair of hosts in a rack is designated as a vSAN fault domain. Fault domains protect shared storage against rack or chassis failure when a vSAN cluster spans multiple racks or blade server chassis; you create fault domains and add one or more hosts to each. Because highly available shared storage is a key contributor to platform resiliency in the event of a loss or failure, this stage adds a crucial improvement to the shared storage HA solution.
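To make the storage HA sizing concrete, the following minimal sketch (Python, with hypothetical rack names and host counts) applies the commonly cited vSAN guideline that tolerating n failures with RAID-1 mirroring requires at least 2n + 1 fault domains; three racks, with one fault domain per rack, is therefore enough to tolerate the loss of a full rack.

```python
# Minimal sketch: check whether a rack layout provides enough vSAN fault
# domains for a desired failures-to-tolerate (FTT) with RAID-1 mirroring.
# The 2*FTT + 1 guideline is the standard vSAN sizing rule; the rack names
# and host counts below are hypothetical, not from a real environment.

def fault_domains_required(ftt):
    """Fault domains needed to tolerate `ftt` failures with mirroring."""
    return 2 * ftt + 1

# Hypothetical three-rack layout: one fault domain per rack.
racks = {"rack-1": 2, "rack-2": 2, "rack-3": 2}  # rack -> hosts in that rack
fault_domains = len(racks)

ftt = 1  # tolerate the loss of one full rack
needed = fault_domains_required(ftt)
if fault_domains >= needed:
    print(f"{fault_domains} fault domains are enough for FTT={ftt} (rack loss tolerated).")
else:
    print(f"Need at least {needed} fault domains for FTT={ftt}.")
```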

Then, vSphere Host Groups are defined to align with the three racks of hosts, and each host group can be consumed by TAS for VMs or TKGI as an Availability Zone. The Host Groups feature is a convenient way to pin application container capacity to a cluster-aligned aggregate of compute and memory without having to dedicate the entire cluster, regardless of its size. It behaves much like a Resource Pool within the cluster, but without sub-segmenting the available compute and memory using shares.
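As an illustration of that alignment (not a configuration excerpt), the sketch below models a hypothetical mapping of racks to vSphere host groups and of host groups to the AZ names a PaaS would consume; every name in it is a placeholder.

```python
# Minimal sketch: one vSphere host group per rack, one AZ per host group.
# All rack, host group, and AZ names are hypothetical placeholders.

rack_to_host_group = {
    "rack-1": "az1-host-group",
    "rack-2": "az2-host-group",
    "rack-3": "az3-host-group",
}

host_group_to_az = {
    "az1-host-group": "az1",
    "az2-host-group": "az2",
    "az3-host-group": "az3",
}

# Each AZ traces back to exactly one rack, so a rack failure takes out at
# most one AZ worth of capacity.
az_to_rack = {host_group_to_az[hg]: rack for rack, hg in rack_to_host_group.items()}
assert len(az_to_rack) == len(rack_to_host_group)
print(az_to_rack)  # {'az1': 'rack-1', 'az2': 'rack-2', 'az3': 'rack-3'}
```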

If the intent is to dedicate the entire installation to application containers, Host Groups are less valuable because you can simply set the entire cluster as an AZ. However, if you want to blend application containers with any other kind of workload, or even run two different application container technologies at once, Host Groups fit well for segmenting capacity without the constraint of resource shares.

Migration from the original rack to the added racks is possible with TAS for VMs by modifying the vSphere host group definitions and performing a bosh recreate operation. The PaaS deploys all non-singleton components evenly across all AZs as long as the instance counts selected are evenly divisible by the number of AZs in use. Be certain to review how many instances of non-singleton jobs are defined, for example Gorouters, and use a count evenly divisible by the number of AZs, which is three in this model, as illustrated in the sketch below.
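The following minimal sketch shows why instance counts that are not divisible by the AZ count produce an uneven spread. It uses a simple round-robin assignment purely for illustration; it is not a description of BOSH's actual placement algorithm.

```python
# Minimal sketch: spread N instances of a job across AZs round-robin to show
# why counts divisible by the AZ count keep the spread even. Illustration
# only; this is not BOSH's actual placement logic.
from collections import Counter

def spread(instances, azs):
    """Assign instances to AZs in round-robin order and count per AZ."""
    return dict(Counter(azs[i % len(azs)] for i in range(instances)))

azs = ["az1", "az2", "az3"]

print(spread(3, azs))  # {'az1': 1, 'az2': 1, 'az3': 1} -- even
print(spread(4, azs))  # {'az1': 2, 'az2': 1, 'az3': 1} -- uneven
print(spread(6, azs))  # {'az1': 2, 'az2': 2, 'az3': 2} -- even
```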

Multiple Cluster Design

Diagram: TA4V multiple cluster design.

This third-stage design is production-ready, with ample high availability and redundant capacity for full-time enterprise use. This size is appropriate for combined TKGI and TAS for VMs deployments, if desired, using host groups (or Resource Pools, with or without host groups) to organize the components of each against cluster capacity. There is no practical reason to scale horizontally (more clusters/AZs) beyond this model, as the upper limits of aggregate compute and memory far exceed the capability of the software components that make it up. If true exascale capacity is desired, a better approach is to deploy three of these models in parallel rather than one larger one.

Key items that change at this point in growth:

  • High Availability (HA) of the IaaS layer is significantly improved with the addition of new vSAN domains and increased host capacity.
  • High Availability (HA) of the PaaS layer is improved with a separation of management from payload and a stronger alignment of PaaS HA (AZs) to IaaS HA constructs, including vSphere HA and vSAN fault tolerance.
  • A new cluster is deployed strictly for management functions, isolating them from resource use/abuse in the payload clusters and also isolating the HA that supports them.
  • A second and third payload cluster are deployed (for a total of four clusters) yielding three clusters strictly for payloads generated by the system.
  • All non-singleton PaaS components are deployed in multiples of three to take advantage of three Availability Zones. HA features of both the IaaS and PaaS are aligned and in effect for the AZs (see the sketch after this list).
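As a sketch of the topology described in this list (all cluster, AZ, and job names are hypothetical), the snippet below models one dedicated management cluster plus three payload clusters, each payload cluster exposed as one AZ, with non-singleton instance counts kept at multiples of the AZ count.

```python
# Minimal sketch of the multiple-cluster layout: one management cluster plus
# three payload clusters, each payload cluster consumed as one AZ.
# Cluster, AZ, and job names are hypothetical placeholders.

clusters = {
    "mgmt-cluster": {"role": "management", "az": None},
    "payload-cluster-1": {"role": "payload", "az": "az1"},
    "payload-cluster-2": {"role": "payload", "az": "az2"},
    "payload-cluster-3": {"role": "payload", "az": "az3"},
}

payload_azs = [c["az"] for c in clusters.values() if c["role"] == "payload"]
assert len(payload_azs) == 3  # three AZs backed by three payload clusters

# Non-singleton jobs sized in multiples of the AZ count stay evenly spread.
non_singleton_counts = {"gorouter": 3, "diego_cell": 9}
for job, count in non_singleton_counts.items():
    assert count % len(payload_azs) == 0, f"{job} count is not a multiple of the AZ count"

print("Management is isolated; payload jobs spread evenly across", payload_azs)
```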

Migrating directly from the MVP1 model to this model will be a challenge, as all management VMs from vSphere, NSX-T Data Center, and the PaaS must be moved to a dedicated cluster rather than blended with the app containers created by TKGI and/or TAS for VMs. A fresh install of the PaaS layer (at least) makes this model the easiest to install, as it gives you the opportunity to place management components in the proper cluster and to evacuate the pre-existing clusters of anything other than payload (app) components.

Migrating from the Single Cluster/Three Racks (MVP2) model to this one also involves challenges in relocating management functions from the (now) payload cluster to the new management cluster. This can be accomplished as long as networks and storage are shared between the current management location and the new management cluster, to facilitate vMotion and Storage vMotion of those VMs. Alternatively, all management functions could be consolidated onto a single rack's hosts before the new capacity is added. Either way, the approach is to evacuate all management functions from the existing racks targeted for payloads in favor of placement on the new management cluster, however that is best accomplished in your environment.

TKGI Multiple Cluster Design

Diagram: TKGI multiple cluster design.