Redis for PCF Smoke Tests


Redis for Pivotal Cloud Foundry (PCF) runs a set of smoke tests during installation to confirm system health. The tests run in the org system and in the space redis-smoke-tests. The tests run as an application instance with a restrictive Application Security Group (ASG).

Smoke Test Steps

The smoke tests perform the following steps for each available service plan:

  1. Target the org system and space redis-smoke-tests (creating them if they do not exist).
  2. Deploy an instance of the CF Redis Example App to this space.
  3. Create a Redis instance and bind it to the CF Redis Example App.
  4. Create a service key to retrieve the Redis instance IP address.
  5. Create a restrictive security group, redis-smoke-tests-sg, and bind it to the space.
  6. Check that the CF Redis Example App can write to and read from the Redis instance.
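The steps above map roughly onto cf CLI commands. The following Python sketch only builds the command lines for one service plan; it is not the errand's actual implementation. The app name, service name (p-redis), instance name, key name, and rules file name are assumptions chosen for illustration, as is the choice to push with --no-start and start the app after the security group is bound. The bind-security-group line uses the positional ORG SPACE form of cf CLI v6; newer CLI versions take the space through a --space flag.

```python
ORG = "system"
SPACE = "redis-smoke-tests"
SECURITY_GROUP = "redis-smoke-tests-sg"


def smoke_test_commands(plan,
                        app="cf-redis-example-app",      # assumed app name
                        service="p-redis",               # assumed service name
                        instance="smoke-test-instance",  # hypothetical
                        key="smoke-test-key",            # hypothetical
                        rules_file="redis-smoke-tests-sg.json"):  # hypothetical
    """Return cf CLI invocations for one service plan, in step order."""
    return [
        # Step 1: create (if needed) and target the org and space.
        ["cf", "create-org", ORG],
        ["cf", "create-space", SPACE, "-o", ORG],
        ["cf", "target", "-o", ORG, "-s", SPACE],
        # Step 2: deploy the example app without starting it yet (assumption).
        ["cf", "push", app, "--no-start"],
        # Step 3: create a Redis instance and bind it to the app.
        ["cf", "create-service", service, plan, instance],
        ["cf", "bind-service", app, instance],
        # Step 4: create and read a service key to learn the instance IP.
        ["cf", "create-service-key", instance, key],
        ["cf", "service-key", instance, key],
        # Step 5: create the restrictive security group and bind it to the space.
        ["cf", "create-security-group", SECURITY_GROUP, rules_file],
        ["cf", "bind-security-group", SECURITY_GROUP, ORG, SPACE],
        # Step 6 starts here: the read/write check itself is an HTTP request
        # to the running app, not a cf command, so it is not listed.
        ["cf", "start", app],
    ]
```

Calling smoke_test_commands("shared-vm") and joining each entry with spaces gives a readable list of commands to compare against the errand log when a step fails.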

Security Groups

Smoke tests create a new application security group for the CF Redis Example App (redis-smoke-tests-sg) and delete it after the tests finish. This security group has the following rules:

[
    {
      "protocol": "tcp",
      "destination": "<broker IP address>",
      "ports": "32768-61000" // Ephemeral port range (assigned to shared-vm instances)
    }
]

This allows outbound traffic from the test app to the Redis shared-VM nodes.
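To reproduce this rule by hand, for example when debugging connectivity from a test app, the rules file can be generated rather than typed. The sketch below is a minimal illustration, not part of the product; the function name and the default port range argument are assumptions. It omits the inline // comment shown above, because JSON itself has no comment syntax and a rules file passed to cf create-security-group must be valid JSON.

```python
import json


def smoke_test_asg_rules(destination_ip, ports="32768-61000"):
    """Return the security group rules as a JSON string.

    destination_ip is the address retrieved from the service key;
    the default ports value is the ephemeral range used by shared-VM instances.
    """
    rule = {
        "protocol": "tcp",
        "destination": destination_ip,
        "ports": ports,
    }
    # The rules file is a JSON array of rule objects.
    return json.dumps([rule], indent=2)
```

Writing the returned string to a file gives a rules file in the same shape as the one shown above.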

Smoke Tests Resilience

Smoke tests can fail for reasons outside of the Redis deployment, for example, network latency causing timeouts or the Cloud Foundry instance dropping requests. They might also fail because they are run in the wrong space.

The smoke tests implement a retry policy for commands issued to CF, for two reasons:

  - To avoid smoke test failures due to temporary issues such as the ones mentioned above.
  - To ensure that the service instances and bindings created for testing are cleaned up.

Smoke tests retry failed commands against CF. They use a linear back-off with a baseline of 0.2 seconds, for a maximum of 30 attempts per command. The wait before each retry grows by the baseline: 0.2s, then 0.4s, then 0.6s, and so on. Therefore, assuming that the first attempt is at 0s and every attempt fails instantly, subsequent retries occur at 0.2s, 0.6s, 1.2s, and so on, until either the command succeeds or the maximum number of attempts is reached.
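The retry times quoted above follow from waits that grow by one baseline per retry. The sketch below works through that arithmetic in whole milliseconds to avoid floating-point noise. It assumes that the limit of 30 attempts includes the first attempt and that every attempt fails instantly; the function name is illustrative.

```python
def retry_schedule_ms(baseline_ms=200, max_attempts=30):
    """Return the start time of each attempt in milliseconds.

    The wait after the n-th failed attempt is n * baseline_ms,
    so the waits are 200 ms, 400 ms, 600 ms, and so on.
    """
    times = []
    elapsed = 0
    for attempt in range(1, max_attempts + 1):
        times.append(elapsed)
        elapsed += attempt * baseline_ms  # wait before the next attempt
    return times
```

Under these assumptions the first four attempts start at 0, 200, 600, and 1200 ms, and the 30th and final attempt starts at 87,000 ms, so a command that never succeeds is abandoned after roughly a minute and a half.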

The linear back-off was selected as a good middle ground between:

  - Situations where the system is generally unstable, such as load-balancing issues, where a higher number of retries is preferred.
  - Situations where the system is suffering from a failure that lasts a few seconds, such as the restart of a Cloud Foundry VM, where it is preferable to wait before reattempting the command.
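The retry policy described in this section can be sketched as a small wrapper. This is a minimal illustration under stated assumptions, not the smoke tests' own code: the function name, the use of exceptions to signal a failed command, and the injectable sleep parameter are all hypothetical.

```python
import time


def retry_with_linear_backoff(command, baseline=0.2, max_attempts=30,
                              sleep=time.sleep):
    """Call command() until it succeeds or max_attempts is reached.

    After the n-th failure the wrapper waits n * baseline seconds,
    so early retries are quick and later ones give the system time to recover.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return command()
        except Exception as error:
            last_error = error
            if attempt < max_attempts:
                sleep(attempt * baseline)  # linear back-off
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

Short early waits cover the generally unstable case, where many quick retries help, while the growing waits cover failures that last a few seconds. Passing a custom sleep function makes the wrapper easy to exercise without real delays.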

Considerations

The above retry policy does not guard against a more permanent Cloud Foundry downtime or network connectivity issues. In this case, commands fail after the maximum number of attempts and might leave claimed instances behind. Pivotal recommends disabling automatic smoke test runs and manually releasing any claimed instances in case of upgrades or scheduled downtimes.

Troubleshooting

If errors occur while the smoke tests run, they are summarised at the end of the errand log output. Detailed logs can be found where the failure occurs. Some common failures are listed below.

Error: Failed to target Cloud Foundry
Cause: Your PCF is unresponsive.
Solution: Examine the detailed error message in the logs and check the PCF Troubleshooting Guide for advice.

Error: Failed to bind Redis service instance to test app.
Cause: Your deployment’s broker has not been registered with PCF.
Solution: Examine the broker-registrar installation step output and troubleshoot any problems.

If you encounter an error while running smoke tests, it can help to search the log for other occurrences of the error summary printed at the end of the tests, for example, Failed to target Cloud Foundry. Look out for TIP: ... in the logs next to any error output for further troubleshooting hints.
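For a long errand log, this search can be scripted. The following sketch assumes the log has been saved as plain text and that hints appear on lines containing the literal text TIP:. The function name is hypothetical.

```python
def find_error_context(log_text, summary):
    """Return (line_number, line) pairs worth reading first.

    A line is kept if it contains the error summary printed at the
    end of the tests, or if it contains a TIP: hint.
    """
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if summary in line or "TIP:" in line:
            hits.append((number, line.strip()))
    return hits
```

Passing the saved log text and a summary such as Failed to target Cloud Foundry returns the matching line numbers, which point to where the detailed output for that failure begins.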