Pivotal Cloud Cache Operator Guide
This document describes how a Pivotal Cloud Foundry (PCF) operator can install, configure, and maintain Pivotal Cloud Cache (PCC).
You must have access to a Service Network in order to install PCC.
Minimum Version Requirements
PCC requires PCF with PCF Elastic Runtime v1.9.11 or later.
Follow the steps below to start installing PCC on PCF:
- Download the tile from the Pivotal Network.
- Click Import a Product to import the tile into Ops Manager.
- Click the + symbol next to the uploaded product description.
- Click on the Cloud Cache tile. From here, you can click through the Settings, Status, Credentials, and Logs tabs to start configuring PCC.
Assign Availability Zones and Networks
- Under Place singleton jobs in, select the Availability Zone (AZ) where your singleton virtual machines (VMs) will reside.
- Under Balance other jobs in, select the AZ(s) you want to use for distributing other GemFire VMs. We recommend selecting all of them.
- For Network, select your Elastic Runtime network.
- For Service Network, select the network to be used for GemFire VMs.
- Click Save.
Settings: Smoke Tests
Click the Settings heading on the left side of the page to go to the next set of configurable properties. Here, you can choose a plan to use for smoke tests.
This section allows you to configure the smoke-tests errand that runs after tile installation. The errand verifies that your installation was successful.
From the list shown, select a plan to use when the smoke-tests errand runs. Ensure the selected plan is enabled on the plan-configuration page. If the selected plan is not enabled, the smoke-tests errand will fail.
Pivotal recommends that you use the smallest four-server plan for smoke tests. Because smoke tests create and later destroy a service instance of this plan, using a very small plan reduces installation time.
By default, the smoke-tests errand runs in the system org.
Settings: Allow Outbound Internet Access
The Allow outbound internet access from service instances checkbox is unchecked by default.
Check this box if you want to allow outbound internet traffic (this is IaaS dependent; see Ops Manager documentation for details). For configuring an external syslog endpoint, this box may need to be checked.
Check Allow outbound internet access from service instances if BOSH is configured to use an external blob store; this enables the PCC tile to allow external internet access.
Settings: External Syslog
PCC supports forwarding syslog to an external log management service (for example, Papertrail, Splunk, or your custom enterprise log sink). To enable remote syslog for the service broker, provide a host and port in the External Syslog Host and External Syslog Port fields.
The broker logs are useful for debugging problems creating, updating, and binding service instances.
Remote syslog sends unencrypted logs by default. To secure log transmission, enable TLS by selecting the Enable TLS option button. If several peer servers may respond to remote syslog connections, provide a regex in the Permitted Peer for TLS Communication field (for example, *.example.com). You may need to provide the CA certificate of the log management service endpoint if the server certificate is not signed by a known authority (for example, an internal syslog server).
By default, only the broker logs are forwarded to your configured log management service. To forward server and locator logs from all service instances, check the “Send service instance logs to external syslog” checkbox.
You may want to enable remote syslog for service instances if you would like to monitor the health of the clusters. However, this will generate a large volume of logs, which is why it is disabled by default. The broker logs only include information about service instance creation, not on-going cluster health. Please note that service instance logs will be sent to the same host and port configured for the broker logs.
You can configure five individual plans for your developers. Select the Plan 1 through Plan 5 tabs to configure each of them.
The Enable Plan toggle is checked by default. If you do not want to add this plan to the CF service catalog, select Disable Plan. You must enable at least one plan.
The CF Service Access dropdown allows you to configure the plan’s visibility in the CF Marketplace:
- Enable Service Access displays the service plan to all developers in the CF Marketplace.
- Disable Service Access does not display the service plan to developers in the CF Marketplace, and access cannot be enabled at a later time.
- Leave Service Access Unchanged does not display the service plan in the CF Marketplace by default, but access can be enabled at a later time.
The Plan Name text field allows you to customize the name of the plan. This plan name is displayed to developers when they view the service in the Marketplace.
The Plan Description text field allows you to supply a plan description. The description is displayed to developers when they view the service in the Marketplace.
The Service Instance Quota sets the maximum number of PCC clusters that can exist simultaneously.
When developers create or update a service instance, they can specify the number of servers in the cluster. The Maximum servers per cluster field allows operators to set an upper bound on the number of servers developers can request. If developers do not explicitly specify the number of servers in a service instance, a new cluster has the number of servers specified in the Default Number of Servers field.
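The interaction between the requested size, the plan default, and the plan maximum can be sketched as follows. This is only an illustration: the function name and parameters are hypothetical, and it assumes that a request above the maximum is rejected rather than silently reduced, which the text above does not state.

```python
from typing import Optional


def resolve_server_count(requested: Optional[int], default: int, maximum: int) -> int:
    """Return the number of servers a cluster would get under a plan.

    Assumptions (not confirmed by the guide): a request above the plan
    maximum is rejected, and a cluster needs at least one server.
    """
    # Fall back to the plan's Default Number of Servers when nothing is requested.
    count = default if requested is None else requested
    if count < 1:
        raise ValueError("a cluster needs at least one server")
    if count > maximum:
        raise ValueError(f"requested {count} servers, plan allows at most {maximum}")
    return count


if __name__ == "__main__":
    print(resolve_server_count(None, default=4, maximum=10))
    print(resolve_server_count(6, default=4, maximum=10))
```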
The Availability zones for service instances setting determines which AZs are used for a particular cluster. The members of a cluster are distributed evenly across AZs.
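To get a feel for what an even distribution means for a given cluster size, the following sketch assigns members to AZs in round-robin order. The AZ names and the round-robin strategy are assumptions made for illustration; the actual placement is decided by BOSH and may order members differently.

```python
from typing import Dict, List


def spread_across_azs(member_count: int, azs: List[str]) -> Dict[str, int]:
    """Count how many members land in each AZ under round-robin placement.

    Round-robin is an assumed strategy that yields an even spread: no AZ
    ends up with more than one member above any other AZ.
    """
    if not azs:
        raise ValueError("at least one AZ is required")
    placement = {az: 0 for az in azs}
    for i in range(member_count):
        placement[azs[i % len(azs)]] += 1
    return placement


if __name__ == "__main__":
    # Hypothetical AZ names; a four-server cluster over three AZs.
    print(spread_across_azs(4, ["az1", "az2", "az3"]))
```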
WARNING: After you have selected AZs for your service network, do not add additional AZs; doing so causes existing service instances to lose data on update.
The remaining fields control the VM type and persistent disk type for servers and locators. The total size of the cache is directly related to the number of servers and the amount of memory of the selected server VM type. We recommend the following configuration:
- For the VM type for the Locator VMs field, select a VM that has at least 1 GB of RAM and 4 GB of disk space.
- For the Persistent disk type for the Locator VMs field, select 10 GB or higher.
- For the VM type for the Server VMs field, select a VM that has at least 4 GB of RAM and 8 GB of disk space.
- For the Persistent disk type for the Server VMs field, select 10 GB or higher.
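The recommendations in the list above can be captured as a small check. This is a sketch only: the `VmChoice` type and the function name are hypothetical, and the sizes of the VM and disk types available in your environment must be looked up in Ops Manager.

```python
from typing import List, NamedTuple


class VmChoice(NamedTuple):
    ram_gb: float
    disk_gb: float
    persistent_disk_gb: float


# Minimums taken from the recommendations listed above.
RECOMMENDED_MINIMUMS = {
    "locator": VmChoice(ram_gb=1, disk_gb=4, persistent_disk_gb=10),
    "server": VmChoice(ram_gb=4, disk_gb=8, persistent_disk_gb=10),
}


def check_vm_choice(role: str, choice: VmChoice) -> List[str]:
    """Return a list of ways the choice falls short of the recommendation."""
    minimum = RECOMMENDED_MINIMUMS[role]
    problems = []
    for field in VmChoice._fields:
        if getattr(choice, field) < getattr(minimum, field):
            problems.append(
                f"{role} {field} is {getattr(choice, field)}, "
                f"recommended at least {getattr(minimum, field)}"
            )
    return problems


if __name__ == "__main__":
    for problem in check_vm_choice("locator", VmChoice(0.5, 4, 5)):
        print(problem)
```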
When you finish configuring the plan, click Save to save your configuration options.
Ensure you import the correct type of stemcell indicated on this tab.
You can download the latest available stemcells from the Pivotal Network.
The BOSH layer that underlies PCF generates healthmonitor metrics for all VMs in the deployment.
However, these metrics are not included in the Loggregator Firehose by default.
To get these metrics, do either of the following:
- To send BOSH HM metrics through the Firehose, install the open-source HM Forwarder.
- To retrieve BOSH health metrics outside of the Firehose, install the JMX Bridge for PCF tile.
Follow the steps below to upgrade PCC on PCF:
- Download the new version of the tile from the Pivotal Network.
- Upload the product to Ops Manager.
- Click Add next to the uploaded product.
- Click on the Cloud Cache tile and review your configuration options.
- Click Apply Changes.
Follow the steps below to update plans in Ops Manager.
- Click on the Cloud Cache tile.
- Click on the plan you want to update under the Information section.
- Edit the fields with the changes you want to make to the plan.
- Click the Save button at the bottom of the page.
- Click PCF Ops Manager to navigate to the Installation Dashboard.
- Click Apply Changes.
Plan changes are not applied to existing service instances until you run the upgrade-all-service-instances BOSH errand. You must use the BOSH CLI to run this errand. Until you run this errand, developers cannot update service instances.
Changes to fields that can be overridden by optional parameters, for example new_size_percentage, will change the default value of these instance properties, but will not affect existing service instances.
If you change the allowed limits of an optional parameter, for example the maximum number of servers per cluster, existing service instances in violation of the new limits will not be modified.
When existing instances are upgraded, all plan changes will be applied to them.
To uninstall PCC, follow the steps below from the Installation Dashboard:
- Click the trash can icon in the bottom-right-hand corner of the tile.
- Click Apply Changes.
You can visualize the performance of your cluster by downloading the statistics files from your servers. These files are located in the persistent store on each VM. To copy these files to your workstation, run the following command:
bosh scp server/0:/var/vcap/store/gemfire-server/statistics.gfs /tmp
See the Pivotal GemFire Installing and Running VSD topic for information about loading the statistics files into Pivotal GemFire VSD.
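If a cluster has several servers, the copy command shown above needs to be repeated per server. The sketch below only builds the command lines and does not run them. It assumes that servers are addressed as server/0, server/1, and so on, and that the destination may be given as a file name so that the copies do not overwrite each other; the function name and the destination naming scheme are hypothetical.

```python
from typing import List

# Path of the statistics file on each server VM, as given in the command above.
STATS_PATH = "/var/vcap/store/gemfire-server/statistics.gfs"


def stats_copy_commands(server_count: int, dest_dir: str = "/tmp") -> List[str]:
    """Build one `bosh scp` command line per server index (assumed 0..n-1)."""
    commands = []
    for index in range(server_count):
        # Distinct local file names keep one server's file from replacing another's.
        target = f"{dest_dir}/statistics-server-{index}.gfs"
        commands.append(f"bosh scp server/{index}:{STATS_PATH} {target}")
    return commands


if __name__ == "__main__":
    for command in stats_copy_commands(3):
        print(command)
```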
Error: “Creating p-cloudcache SERVICE-NAME failed”
The smoke tests could not create an instance of GemFire. To troubleshoot why the deployment failed, use the cf CLI to create a new service instance using the same plan and download the logs of the service deployment from BOSH.
Error: “Deleting SERVICE-NAME failed”
The smoke test attempted to clean up a service instance it created and failed to delete the service using the cf delete-service command. To troubleshoot this issue, run bosh logs to view the logs on the broker or the service instance to see why the deletion may have failed.
Error: “Cannot connect to the cluster SERVICE-NAME”
The smoke test was unable to connect to the cluster.
To troubleshoot the issue, review the logs of your load balancer, and review the logs of your CF Router to ensure the route to your PCC cluster is properly registered.
You can also create a service instance and try to connect to it using the gfsh CLI. This requires creating a service key.
Error: “Could not perform create/put on Cloud Cache cluster”
The smoke test was unable to write data to the cluster. The user may not have permissions to create a region or write data.
Error: “Could not retrieve value from Cloud Cache cluster”
The smoke test was unable to read back the data it wrote. Data loss can happen if a cluster member improperly stops and starts again or if the member machine crashes and is resurrected by BOSH. Run bosh logs to view the logs on the broker to see if there were any interruptions to the cluster by a service
PCC clients communicate with PCC servers on port 40404 and with locators on port 55221. Both of these ports must be reachable from the Elastic Runtime network to the service network.
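One way to verify reachability is a plain TCP connection attempt from a machine on the Elastic Runtime network. The following is a minimal sketch using only the Python standard library; the host addresses in the usage section are placeholders, and a successful connection shows only that the port is open, not that the cluster is healthy.

```python
import socket

SERVER_PORT = 40404   # PCC servers, per the text above
LOCATOR_PORT = 55221  # PCC locators, per the text above


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder addresses; substitute the IPs of your own locator and server VMs.
    checks = [("10.0.8.4", LOCATOR_PORT), ("10.0.8.5", SERVER_PORT)]
    for host, port in checks:
        state = "reachable" if port_reachable(host, port) else "NOT reachable"
        print(f"{host}:{port} {state}")
```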
Membership Port Range
PCC servers and locators communicate with each other using UDP and TCP. The current port range for this communication is
If you have a firewall between VMs, ensure this port range is open.
BOSH Director and VMs on Different Networks
A deployment will fail if the VMs in the service network cannot reach the BOSH Director VM on another network. VMs on a different service network need to be able to talk to the BOSH Director.
If the VMs cannot communicate with the BOSH Director, the following error message appears:
Director task 1257
Started preparing deployment > Preparing deployment. Done (00:00:00)
Started preparing package compilation > Finding packages to compile. Done (00:00:00)
Started creating missing vms
Started creating missing vms > locator/d328ac30-fd48-424f-a28f-5087d94c07fd (0)
Started creating missing vms > locator/c41e4988-257c-47a4-bb97-1828dae15df4 (2)
Started creating missing vms > server/54f3690f-2366-405e-aca0-e0d630753e91 (0)
Started creating missing vms > server/cbbb4739-aae0-472f-9f7f-713d6ac15d07 (3)
Started creating missing vms > locator/36758e47-6f89-4003-968d-55620ca28e8a (1)
Started creating missing vms > server/abc4f156-1403-4187-a1f7-35f1ff4d961c (1)
Started creating missing vms > server/5a9c74e8-881e-4f49-99e3-0a708b2b583c (2). Failed: Timed out pinging to 37ebdd5e-4fe8-49cd-8a0f-cc72837a361c after 600 seconds (00:10:35)
Failed creating missing vms > server/cbbb4739-aae0-472f-9f7f-713d6ac15d07 (3): Timed out pinging to db1d846b-a2cd-4c49-942e-3cf1b48edac1 after 600 seconds (00:10:37)
Failed creating missing vms > locator/d328ac30-fd48-424f-a28f-5087d94c07fd (0): Timed out pinging to 578a74a3-5511-4263-adbd-5028c6b7d1ab after 600 seconds (00:10:38)
Failed creating missing vms > server/54f3690f-2366-405e-aca0-e0d630753e91 (0): Timed out pinging to 73aec92f-b563-4f50-b22a-283072455b6e after 600 seconds (00:10:39)
Failed creating missing vms > locator/36758e47-6f89-4003-968d-55620ca28e8a (1): Timed out pinging to 634738c2-2338-4d32-973a-63f91dcfd922 after 600 seconds (00:10:39)
Failed creating missing vms > locator/c41e4988-257c-47a4-bb97-1828dae15df4 (2): Timed out pinging to 35b63dff-a7ff-44e2-984d-481dd9d1337b after 600 seconds (00:10:41)
Failed creating missing vms > server/abc4f156-1403-4187-a1f7-35f1ff4d961c (1): Timed out pinging to e6a59079-974b-4c05-8ce3-d250b582a571 after 600 seconds (00:10:54)
Failed creating missing vms (00:10:54)
Error 450002: Timed out pinging to 37ebdd5e-4fe8-49cd-8a0f-cc72837a361c after 600 seconds
Task 1257 error
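When the task output is long, it can help to extract just the instances that timed out. The sketch below does this with a regular expression written against the wording of the sample output above; the pattern and function name are illustrative, and output from other BOSH versions may be phrased differently.

```python
import re
from typing import List, Tuple

# Matches entries such as:
#   Failed creating missing vms > server/<id> (3): Timed out pinging to <agent> after 600 seconds
TIMEOUT_PATTERN = re.compile(
    r"Failed creating missing vms > "
    r"(?P<instance>[\w-]+/[0-9a-f-]+) \((?P<index>\d+)\): "
    r"Timed out pinging to (?P<agent>[0-9a-f-]+) after (?P<seconds>\d+) seconds"
)


def timed_out_instances(task_output: str) -> List[Tuple[str, int, str]]:
    """Return (instance, index, agent id) for each VM that timed out.

    The search runs over the whole text, so it works whether or not the
    output still has its original line breaks.
    """
    return [
        (m["instance"], int(m["index"]), m["agent"])
        for m in TIMEOUT_PATTERN.finditer(task_output)
    ]


if __name__ == "__main__":
    sample = (
        "Failed creating missing vms > server/cbbb4739-aae0-472f-9f7f-713d6ac15d07 (3): "
        "Timed out pinging to db1d846b-a2cd-4c49-942e-3cf1b48edac1 after 600 seconds (00:10:37)"
    )
    for instance, index, agent in timed_out_instances(sample):
        print(f"{instance} (index {index}) never answered; agent {agent}")
```

If every server and locator in the list timed out, as in the output above, the cause is more likely network-wide (for example, no route from the service network to the BOSH Director) than a problem with a single VM.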