High Availability

This topic describes how to implement high availability for Altoros Heartbeat for Pivotal Cloud Foundry (PCF) using an external load balancer.

External Load Balancer Configuration

To implement high availability for the installation, you must set up and configure an external load balancer. Any highly available load balancer that supports both TCP and UDP is acceptable.

Heartbeat uses the following two ports to accept metrics:

  • TCP 2003
  • UDP 8125

TCP 2003 is the load balancer port for the Graphite endpoint. It is used to receive component metrics from PCF and external services sent by the collectd agent. To implement high availability in your Altoros Heartbeat installation, configure your load balancer to accept these TCP connections and balance them across Heartbeat back ends. For information about getting back end IP addresses for your configuration, see Find Back End IPs below. If high availability is not required, configure your collectd agent to send metrics to any back end as described in Installing collectd Add-On for PCF.
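As a rough smoke test of the TCP 2003 path, the following sketch sends a single metric line through the load balancer. It assumes the Graphite endpoint accepts the standard Graphite plaintext format (metric path, value, Unix timestamp, newline), which this topic does not state explicitly. The host name lb.example.com, the metric path, and both function names are hypothetical.

```python
import socket
import time


def format_graphite_line(path, value, timestamp=None):
    """Build one line as '<path> <value> <timestamp>' followed by a newline.

    Assumption: the endpoint speaks the Graphite plaintext protocol.
    """
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_graphite_metric(host, path, value, port=2003, timeout=5.0):
    """Open a TCP connection to the load balancer and send one metric line."""
    line = format_graphite_line(path, value)
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(line.encode("ascii"))
    return line


# Example call (placeholder host and metric path, not Heartbeat values):
# send_graphite_metric("lb.example.com", "test.ha.smoke", 1)
```

If the connection succeeds through the load balancer but fails against an individual back end IP, that back end is a likely candidate for investigation.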

UDP 8125 is the load balancer port for the StatsD endpoint. It is used to receive metrics from applications deployed to PCF, both built-in (e.g. JMX) and custom (sent to StatsD from an app).
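To illustrate what a custom application metric on this port might look like, the sketch below sends one datagram to the load balancer. It assumes the endpoint accepts the common StatsD line format (name, colon, value, pipe, type). The host name, metric names, and function names are hypothetical.

```python
import socket


def format_statsd_metric(name, value, metric_type="c"):
    """Build a StatsD payload as '<name>:<value>|<type>'.

    Assumption: "c" marks a counter and "ms" a timer, as in common StatsD usage.
    """
    return f"{name}:{value}|{metric_type}"


def send_statsd_metric(host, name, value, metric_type="c", port=8125):
    """Send one metric as a UDP datagram. UDP gives no delivery confirmation."""
    payload = format_statsd_metric(name, value, metric_type).encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload


# Example call (placeholder host and metric name):
# send_statsd_metric("lb.example.com", "myapp.requests", 1)
```

Because the sender receives no acknowledgement, a successful call only shows that the datagram left the client, not that a back end processed it.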

Because this port uses UDP, your load balancer may not be able to run health checks against it. In this case, use TCP port 8126 to test server availability. In addition to a simple TCP connection check, an advanced TCP health check is available: send the health command to the TCP port and analyze the output, which is either up or down.
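The advanced check can be sketched as a small script that a load balancer or monitoring job might run against each back end. The exact reply format is an assumption: the parser accepts either a bare up or down, or a prefixed form such as "health: up". The function names are hypothetical.

```python
import socket


def parse_health_response(response):
    """Return True for 'up', False for 'down', None if neither word appears."""
    # Assumption: the reply may carry a prefix such as "health:".
    words = response.lower().replace(":", " ").split()
    if "up" in words:
        return True
    if "down" in words:
        return False
    return None


def check_backend_health(host, port=8126, timeout=3.0):
    """Send the health command over TCP and interpret the reply.

    Connection errors, timeouts, and unrecognized replies count as unhealthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"health\n")
            reply = conn.recv(1024).decode("ascii", errors="replace")
    except OSError:
        return False
    return parse_health_response(reply) is True
```

Treating an unrecognized reply as unhealthy is a conservative choice: it keeps traffic away from a back end whose state cannot be confirmed.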

As with Graphite metrics, StatsD metrics are sent to the VMs with heartbeat-backend in their names. For information about getting back end IP addresses for these VMs, see Find Back End IPs below. Use these IP addresses when configuring the load balancer.

Find Back End IPs

To find the IP address for your Heartbeat back end:

  1. Open the Status tab.
  2. Copy the value in the Jobs column for the Heartbeat PCF Monitoring back end job.


Scale Up

The default installation can be scaled up to improve performance. To scale up, increase the number of back end and front end jobs. You can scale up without a load balancer and high availability, but in that case back end operation and the metrics flow can be compromised.

Note: The default installation of the Altoros Heartbeat for PCF tile is highly performant and has a large capacity for receiving and processing metrics. Poor performance might not be caused by an insufficient number of instances. For assistance with detecting and eliminating performance problems with your installation, contact Altoros.

To increase the number of back end and front end jobs:

  1. From the Settings tab, click Resource Config.


  2. In the Instances column, select the required number of front end and back end instances.

  3. Navigate to the Ops Manager Installation Dashboard and click Apply Changes.

Scale Down

The installation can be scaled down when resource consumption decreases. To decrease the number of back end and front end jobs, follow the steps described in Scale Up above, selecting a lower number of instances.

You can scale down front end jobs without restriction. For back end jobs, keep at least two instances to prevent metrics loss. Scaling down can reduce high availability for some previously received metrics, because only one copy of them is retained. New values of these metrics are stored in multiple copies according to the new configuration. Scale down carefully. If you need assistance, contact Altoros support.
