vSphere Reference Architecture

This guide provides reference architectures for Pivotal Cloud Foundry (PCF), including Pivotal Application Service (PAS) and Pivotal Container Service (PKS), on vSphere.

Overview

Pivotal validates the reference architectures described in this topic against multiple production-grade usage scenarios. These designs are sized for up to 1500 app instances.

This document does not replace the installation instructions provided in the PCF on vSphere Requirements topic. Instead, it gives examples of how to apply those instructions to real-world production environments.

PCF Products                               Validated Version
vSphere                                    6.5
VMware NSX-T                               2.1
Pivotal Cloud Foundry Operations Manager   2.0, 2.1
PAS                                        2.0, 2.1
PKS                                        1.0

Base vSphere Reference Architecture

This reference architecture includes VMware vSphere and NSX-T, a software-defined network virtualization platform that runs on VMware ESXi virtual hosts.

To use all features listed in this topic, you must have Advanced or above licensing from VMware for NSX-T. If you are not using NSX-T, see Non-SDN Deployments.

This reference design supports capacity growth at the vSphere and PCF levels as well as installation security through the NSX-T firewall. It allocates a minimum of three servers to each cluster, as recommended by vSphere, and spreads PAS components across three clusters for high availability.

If you want to deploy PCF without NSX-T, see vSphere Reference Architecture, which includes NSX-V design considerations.

Non-SDN Deployments

This section provides information about VLAN and container-to-container networking considerations for PCF deployments that do not rely on software-defined networking (SDN).

VLAN Considerations

Without an SDN environment, you draw network address space and broadcast domains from the greater data center pool. For this approach, you need to deploy PAS and PKS in a particular alignment.

Diagram: Load balancing for non-SDN deployments

If you are designing a non-SDN environment, ensure that VLANs and address space for PAS and PKS do not overlap. In addition, keep in mind that non-SDN PCF environments have the following characteristics:

  • Load balancing is performed external to the PCF installation.
  • Client SSL termination must happen at the load balancer, at the Gorouters, or both. Plan certificate placement accordingly.
  • Firewalling is handled external to the PCF installation.

Container-to-Container Networking and CNI Considerations

Changing your container-to-container networking strategy after deployment is possible. However, with NSX-T, you need both the VMware NSX-T Container Plug-In for PCF tile and NSX-T infrastructure. For information about deploying PAS on vSphere with NSX-T internal networking using the VMware NSX-T Container Plug-In for PCF tile, see Deploying PAS with NSX-T Networking.

Migrating from a non-SDN environment to an SDN-enabled solution is possible but best considered as a greenfield deployment. Inserting an SDN layer under an active PCF installation is disruptive.

PAS Without SDN

You can use PAS without an SDN overlay. For more information, see PCF on vSphere Requirements.

PKS Without SDN

PKS is supplied with NSX-T SDN included, along with the associated licensing. However, you can ignore the bundled NSX-T SDN solution and instead use Flannel, a built-in network stack that handles container networking.

For more information about PKS without NSX-T, see Installing PKS on vSphere.

SDN-Enabled Deployments With NSX-T

This section provides information about network requirements, routing, and load balancing for PAS and PKS deployments that rely on SDN.

Network Requirements for PAS

PAS requires a number of statically defined networks to host its main components. Inside an NSX-T deployment, the NSX-T routers manage a series of non-routable address banks, defined as follows:

Note: These static networks have a smaller host address space assigned than in the reference designs for PCF v1.x.

  • Infrastructure: 192.168.1.0/24
  • Deployment: 192.168.2.0/24
  • Services: 192.168.3.0/24

    The Services network can be used with Ops Manager tiles that you might install in addition to PAS. Some of these tiles may require on-demand network capacity. Pivotal recommends adding a network for each tile that needs such resources and pairing the two networks in that tile’s configuration.

  • OD-Services#: 192.168.4.0 - 192.168.9.0 in /24 segments

    For example, the Redis for PCF tile asks for Network and Services Network. The first one is for placing the broker, and the second one is for deploying worker VMs to support the service. In this scenario, you can deploy a new network OD-Services1 and instruct Redis for PCF to use the Services network for the broker and the OD-Services1 network for the workers. The next tile can also use the Services network for the broker and a new OD-Services2 network for workers and so on.

  • Isolation Segment##: 192.168.10.0 - 192.168.63.0 in /24 segments

    Isolation segments can be added to an existing PAS installation. This range of address space is designated for use when adding one or more segments. Deploy a /24 network from this range for each new isolation segment. This address bank reserves capacity for more than 50 isolation segments, each with more than 250 VMs.

  • On-Demand-Org##: 192.168.128.0/17 = 128 orgs/foundation

    New to the PCF v2.x reference design is the concept of dynamically assigned org networks attached to automatically generated T1 routers. The operator does not define these networks in Ops Manager, but instead allows them to be built by supplying a non-overlapping block of address space for this purpose. This is configurable in the NCP pane of the VMware NSX-T Container Plug-In for PCF tile in Ops Manager. The default subnet size is /24, and a new /24 network is assigned to each org. Because a /17 block contains 2^7 = 128 /24 networks, this allows for 128 orgs per foundation.

This reference uses a pattern that follows previous references. However, all networks now break on the /24 boundary, and the third octet is numerically sequential (1, 2, 3).

Network Requirements for PKS

When deploying PKS using Ops Manager, you must allow for a block of address space for dynamic networks that PKS deploys per namespace. The recommended address space is 172.20.0.0/16.

Network Requirements for External Routing

Routable external IPs on the provider side, for example, for NATs, PAS orgs, and load balancers, are assigned to the T0 router, which sits in front of the PCF installation. There are two approaches to assigning address space blocks for this purpose:

  • Without PKS or with PKS Ingress: The T0 router needs routable address space to advertise to its BGP peers. Select a network range with ample address space that can be split into two logical blocks: one advertised as a route for traffic, and the other used for T0 DNATs and SNATs, load balancers, and other jobs. Compared to NSX-V, SNATs consume much more address space.

  • With PKS No Ingress: Compared to the approach above, this approach consumes much more address space for load balancer VIPs. Allow for four times the address space, because Kubernetes services of type LoadBalancer allocate addresses frequently.

Plan for a /25 of provider routable address space, or a /23 for PKS No Ingress; a /23 provides four times the address space of a /25 (512 addresses compared to 128).

NSX-T handles the routing between a T0 router and any T1 routers associated with it.

PAS with NSX-T

Expanding PAS with SDN features is best considered as a greenfield effort. Inserting an SDN layer under a working PAS installation is non-trivial and likely triggers a rebuild. NSX-T constructs that can be used by PAS include the following:

  • Logical networks (vWires), encapsulated broadcast domains
  • VLAN exhaustion avoidance through the use of logical networks
  • Routing services and NAT/SNAT to network fence the PCF installation
  • Load balancing services to pool systems such as Gorouters
  • SSL termination at the load balancer
  • Distributed routing and firewall services at the hypervisor

Diagram: PAS with NSX-T

The VMware NSX-T Container Plug-In for PCF tile provides a container networking stack in place of the built-in Silk solution and interlocks with NSX-T already deployed on the IaaS. You cannot use this tile unless NSX-T is already established on the IaaS, and these technologies cannot be retrofitted into an established PKS or PAS installation.

To deploy PAS with NSX-T, you need to provide the following in the VMware NSX-T Container Plug-In for PCF tile:

  • NSX Manager host
  • NSX Manager username and password
  • NSX Manager CA certificate
  • PAS foundation name, which must match the tag added to the T0 router, external NAT pool, and IP block
  • Subnet prefix, which controls the size of each org network; the default is /24, or 254 usable addresses

You must also select Enable SNAT for Container Networks in the tile.
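
The sketch below gathers these inputs in one place as a planning aid. It is not an actual tile configuration file format, and the key names, hostname, and foundation name are illustrative placeholders; record the equivalent values for your environment before configuring the tile in Ops Manager.

# Illustrative planning checklist only -- not a real configuration file format.
nsx_manager_host: nsx-manager.example.com   # hypothetical hostname
nsx_manager_username: admin
nsx_manager_password: "<your password>"
nsx_manager_ca_cert: nsx-manager-ca.pem
foundation_name: pas1            # must match the tag on the T0 router,
                                 # external NAT pool, and IP block
subnet_prefix: 24                # size of each org network; default /24 (254 addresses)
enable_snat: true                # Enable SNAT for Container Networks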

For more information about configuring PAS with NSX-T networking, see Deploying PAS with NSX-T Networking.

Each new job, such as an isolation segment, is placed on a broadcast domain or logical switch connected to a T1 router that acts as the gateway for that network. This approach provides a DNAT and SNAT control point and a firewall boundary.

The one exception to this is the Services network, which shares a T1 router with any associated OD-Services# networks. Because these networks are considered part of the same system, inserting any NATs or firewall rules between them is not recommended.

Load Balancing for PAS

Without NSX-T, you need to choose a suitable load balancer to send traffic to the Gorouters and other systems. Installations approaching production level typically use external load balancing from hardware appliance vendors or other network-layer solutions.

With NSX-T, load balancing is available in the SDN layer. Each load balancer is a logical entity tied to resources in the Edge Cluster and aligned to the network or networks represented by a T1 router. These load balancers operate at Layer 4. There is no SSL termination available on the NSX-T load balancer at this time.

Common deployments of load balancing in PAS are as follows:

  • HTTP/HTTPS traffic to and from Gorouters
  • TCP traffic to and from TCP routers
  • MySQL proxies
  • Diego Brain

NSX-T load balancers can support many VIPs, so it is best to think of deploying one load balancer per network (T1) and one-to-many VIPs on that load balancer per job. When designing your deployment, consider how many load balancers are needed and how much capacity is available for them.

PKS with NSX-T

PKS is supplied with NSX-T SDN included, along with the associated licensing.

To allow for dynamic constructs to be built, you need to provide the following when configuring the PKS tile:

  • NSX Manager host
  • NSX Manager username and password
  • NSX Manager CA certificate
  • T0 router to connect dynamically created namespace networks to
  • NSX IP block from which to pull networks for each new namespace
  • Floating IP pool ID created for externally facing IPs (NATs and VIPs)
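
As with the VMware NSX-T Container Plug-In for PCF tile above, the sketch below is a planning checklist only. It is not an actual tile configuration file format, and the key names and values are illustrative placeholders; gather the equivalent values for your environment before configuring the PKS tile.

# Illustrative planning checklist only -- not a real configuration file format.
nsx_manager_host: nsx-manager.example.com   # hypothetical hostname
nsx_manager_username: admin
nsx_manager_ca_cert: nsx-manager-ca.pem
t0_router: pks-t0                  # T0 router that namespace networks connect to
namespace_ip_block: 172.20.0.0/16  # block from which per-namespace networks are pulled
floating_ip_pool_id: "<pool-id>"   # pool for externally facing IPs (NATs and VIPs)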

For more information about configuring PKS with NSX-T networking, see Installing and Configuring PKS with NSX-T Integration.

Diagram: PKS with NSX-T

New T1 routers are spawned on-demand as new namespaces are added to PKS. Because these can grow rapidly, a large address block is desired for this use. The size of the address block is configurable and /24 by default.
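
For example, creating a Kubernetes namespace is enough to trigger this provisioning. The minimal manifest below is a sketch; the namespace name music1 is a hypothetical example that matches the ingress example later in this topic. When the manifest is applied (for example, with kubectl apply), NSX-T pulls a new network from the configured IP block and attaches it to a newly created T1 router.

apiVersion: v1
kind: Namespace
metadata:
  name: music1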

If you want to deploy PKS without NSX-T, see Deploying PKS Without NSX-T for more information.

Ingress Routing and Load Balancing for PKS

This section provides information about ingress routing (Layer 7) and load balancing (Layer 4) for your PKS deployment.

Ingress Routing

NSX-T provides a native ingress router. Third-party options include Istio or Nginx, running as containers in the cluster.

Wildcard DNS entries are needed to point at the ingress service, similar to the Gorouters in PAS. Domain information for ingress is defined in the manifest of the Kubernetes deployment, as in the example below, where a wildcard entry for *.pks.domain.com would cover the host music1.pks.domain.com.

Note: HTTPS with NSX-T-provided ingress is not currently supported.

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: music-ingress
  namespace: music1
spec:
  rules:
  - host: music1.pks.domain.com
    http:
      paths:
      - path: /.*
        backend:
          serviceName: music-service
          servicePort: 8080

Load Balancing

When pushing a Kubernetes deployment with type set to LoadBalancer, NSX-T automatically creates a new VIP for the deployment on the existing load balancer for that namespace.

You need to specify a listening and translation port in the service, along with a name for tagging. You also specify a protocol to use. See the example below.

apiVersion: v1
kind: Service
metadata:
  ...
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
    name: web

Deploying PKS Without NSX-T

You can deploy the PKS tile in Ops Manager either alongside the PAS tile or without it.

Note: You should not share an Ops Manager and BOSH Director installation between PKS and PAS in a production environment.

If you want to deploy PKS without NSX-T, select Flannel as your container network interface on the Networking tab of the PKS tile.

Select from networks already identified in Ops Manager to deploy the PKS API Gateway and the cluster or clusters.

If following the reference design for PAS, you can use the Services and OD-Services# networks for these jobs. If no PAS networks are available, define a network named PKS-Services and another named PKS-Cluster in Ops Manager.

Storage Design

Shared storage is a requirement for PCF. You can allocate networked storage to the host clusters following one of two common approaches: horizontal or vertical. The approach you follow should reflect how your data center arranges its storage and host blocks in its physical layout. Highly redundant storage systems are the best choice for a fully HA PaaS. Hyperconverged systems are a good example of storage that is highly redundant, scales with capacity growth, and aligns with the AZ strategy of PCF.

  • Vertical: You grant each cluster its own dedicated datastores, creating a cluster-aligned storage strategy. vSphere vSAN is an example of this architecture. This alignment is the best overall choice for PCF because it matches the AZ model of the PaaS one for one. Loss of storage in a cluster constitutes loss of an AZ, which is a recoverable failure when at least two other AZs remain operational.

    For example, with six datastores ds01 through ds06, you assign datastores ds01 and ds02 to your first cluster, ds03 and ds04 to your second cluster, and ds05 and ds06 to your third cluster. You then provision your first PCF installation to use ds01, ds03, and ds05 and your second PCF installation to use ds02, ds04, and ds06. With this arrangement, all VMs in the same installation and cluster share a dedicated datastore (see the sketch after this list).

  • Horizontal: You grant all hosts access to all datastores and assign a subset to each installation. This alignment is less desirable because a loss of storage could potentially impact all AZs in use and cause a non-recoverable situation, depending on what was lost. For this approach, a strong redundancy design at the storage array is highly encouraged.

    For example, with six datastores ds01 through ds06, you grant all nine hosts access to all six datastores. You then provision your first PCF installation to use stores ds01 through ds03 and your second PCF installation to use ds04 through ds06.
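
To make the cluster-aligned (vertical) example concrete, the sketch below summarizes its datastore assignment. This is an illustrative summary only, not a configuration file format; the cluster, installation, and datastore names come from the example above.

# Vertical (cluster-aligned) datastore assignment -- illustrative summary only.
cluster-1:
  pcf-installation-1: [ds01]
  pcf-installation-2: [ds02]
cluster-2:
  pcf-installation-1: [ds03]
  pcf-installation-2: [ds04]
cluster-3:
  pcf-installation-1: [ds05]
  pcf-installation-2: [ds06]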

Note: If a datastore is part of a vSphere Storage Cluster using sDRS (storage DRS), you must disable the s-vMotion feature on any datastores used by PCF. Otherwise, s-vMotion activity can rename independent disks and cause BOSH to malfunction. For more information, see How to Migrate PCF to a New Datastore in vSphere.

A further step toward resiliency is to place the PCF management plane, Ops Manager and the BOSH Director, on storage separate from that used by the clusters and resource pools serving the AZs for app use. In addition, placing the blobstore on separate storage ensures that storage loss in an AZ does not cripple app deployment or scaling during the outage.

Storage Capacity and Type

Pivotal recommends the following capacity allocation for PAS installations:

  • For production use, at least 8 TB of data storage, either as one 8-TB store or a number of smaller volumes adding up to 8 TB. Frequent development may require significantly more storage to accommodate new code and buildpacks. Optionally, the blobstore can be placed on storage separate from the storage offered to the AZs for app use.

  • For small installations, 4-6 TB of data storage is recommended.

The primary consumer of storage is the NFS/WebDAV blobstore.

Note: PCF does not currently support using vSphere Storage Clusters with the versions of PCF validated for this reference architecture. Datastores should be listed in the vSphere tile by their native name, not the cluster name created by vCenter for the storage cluster.

When planning storage allocation for PKS installations based on Pod and Node needs, consider the following chart.

Chart: Storage design

Compute and HA Considerations

In PAS, plan for a conservative consolidation ratio of 4:1 vCPUs to pCPUs for containers. A more conservative ratio is 2:1.

A standard Diego cell is 4x16 (XL) by default. At a 4:1 ratio, you have a 16-vCPU budget, or about 12 containers.

If you want to run more containers in a cell, scale the vCPUs accordingly. However, high core count VMs become increasingly hard to schedule on the pCPU without high physical core counts in the socket.

HA considerations include the following:

  • PAS is not considered HA until at least two AZs are defined. Pivotal recommends using three AZs for HA.

    PCF achieves redundancy through the AZ construct, and the loss of an AZ is not considered catastrophic. The BOSH Resurrector can replace lost VMs as needed to repair a foundation. Foundation backup and restoration is accomplished externally by BBR (BOSH Backup and Restore).

  • PKS has no inherent HA capabilities to design for. To support PKS, design HA at the IaaS, storage, power, and access layers.

Scaling and Capacity Management

This section provides information about scaling and capacity management for PCF.

Small PCF Installations

A small-sized PCF v2.0 or v2.1 foundation looks as follows:

  • 1 vSphere cluster/1 AZ
  • 2 resource pools: 1 for NSX-T components and 1 for PAS
  • 3 hosts minimum for vSphere HA (4 hosts for vSphere vSAN)
  • Shared storage/vSAN
  • 1 NSX Manager
  • 1 NSX Controller
  • 1 Large Edge VM = 4 small LBs
  • Ops Manager, the BOSH Director, and Small Footprint Runtime

This approach is best used as a small, development-only system or a proof of concept. Small Footprint Runtime has no upgrade path to the standard PAS tile.

Medium PCF Installations

Growing to a second AZ and cluster enables HA in the PaaS and IaaS layers. For this design, Pivotal recommends replacing Small Footprint Runtime with PAS. The second AZ doubles compute consumption and expands the NSX-T footprint.

A medium-sized PCF v2.0 or v2.1 foundation looks as follows:

  • 2 vSphere clusters/2 AZs
  • 3 resource pools: 1 for NSX-T components and 2 for PAS
  • 3 hosts minimum for vSphere HA (4 hosts for vSphere vSAN) per cluster
  • Shared storage/vSAN
  • 1 NSX Manager
  • 2 NSX Controllers
  • 1 Large Edge VM = 4 small LBs
  • Ops Manager, the BOSH Director, and PAS

Full PCF Installations

A production-ready PCF v2.0 or v2.1 foundation looks as follows:

  • 3 vSphere clusters/3 AZs
  • 4 resource pools: 1 for NSX-T components and 3 for PAS
  • 3 hosts minimum for vSphere HA (4 hosts for vSphere vSAN)
  • Shared Storage/vSAN
  • 1 NSX Manager
  • 3 NSX Controllers
  • 2 Large Edge VMs = 8 small LBs
  • Ops Manager, the BOSH Director, and PAS

If you deploy PKS alongside PAS, add 1 vSphere cluster/AZ and 1 resource pool for PKS to your design.

PAS and PKS with NSX-T

A fully meshed PCF 2.1 installation designed using the recommendations provided in this topic looks as follows:

Diagram: PAS and PKS with NSX-T

Common components are the NSX T0 router and the associated T1 routers. This approach ensures that any cross-traffic between PKS and PAS apps stays within the bounds of the T0. This also provides a one-stop access point to the whole installation, which simplifies deployment automation for multiple, identical installations.

Note that each design, green PKS and orange PAS, has its own Ops Manager and BOSH Director installation. You should not share an Ops Manager and BOSH Director installation between PKS and PAS in a production environment.

AZs are aligned to vSphere clusters, with resource pools as an optional organizational tool to place multiple foundations into the same capacity. PKS v1.0 is a single-AZ deployment. You can align PKS to any AZ or cluster. Keeping PKS and PAS in completely separate AZs is not required.

You can use resource pools in vSphere clusters as AZ constructs to stack different installations of PCF. As server capacity continues to increase, dedicating independent server clusters to a single installation becomes inefficient. For example, customers commonly deploy servers with 768 GB of RAM or more.

If you want to split the PAS and PKS installations into separate network trees, behind separate T0 routers, ensure that approach meets your needs by reviewing VMware’s recommendations for T0 to T0 routing.
