
Pivotal Cloud Cache

Overview

Pivotal Cloud Cache (PCC) is a high-performance, highly available caching layer for Pivotal Cloud Foundry (PCF). PCC offers an in-memory key-value store. The product delivers low-latency responses to a large number of concurrent data access requests.

PCC provides a service broker for on-demand creation of in-memory data clusters that are dedicated to the PCF space and tuned for the intended use cases defined by the plan. Service operators can create multiple plans to support different use cases. PCC uses Pivotal GemFire.

You can use PCC to store any kind of data object using the Pivotal GemFire Java client library.
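
For example, a developer can create, bind, and delete a PCC service instance with standard cf CLI commands. A minimal sketch, using the caching-small plan and test instance name from the workflow example later in this document (the app name my-app is hypothetical):

    # View available Cloud Cache plans in the marketplace
    cf marketplace -s p-cloudcache

    # Create a service instance named "test" from the caching-small plan
    cf create-service p-cloudcache caching-small test

    # Bind an app (hypothetical name) to the instance and restage it
    cf bind-service my-app test
    cf restage my-app

    # Delete the instance when it is no longer needed
    cf delete-service test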

This documentation does the following:

  • Describes the features and architecture of PCC
  • Provides the PCF operator with instructions for installing, configuring, and maintaining PCC
  • Provides app developers with instructions for choosing a service plan, creating and deleting PCC service instances, and binding apps

Product Snapshot

Current Pivotal Cloud Cache details:

  • Version: v1.0.7
  • Release Date: Aug 14, 2017
  • Software Component Version: GemFire v9.0.4
  • Compatible Ops Manager Version: v1.9.x and v1.10.x
  • Compatible Elastic Runtime Version: v1.9.21+ and v1.10.x
  • vSphere Support: Yes
  • Azure Support: Yes
  • GCP Support: Yes
  • OpenStack Support: Yes
  • AWS Support: Yes
  • IPsec Support: No

Known Issues

Service Instance Upgrade Fails When Pulse Has Open Connections

Affected Versions: 1.0.0 - 1.0.5

The upgrade-all-service-instances errand fails. On the Cloud Cache service instance deployment, the failure appears with an error similar to the following:

 19:38:27 | Updating instance locator: locator/d6d98feb-f005-49ca-b1c1-3e2803811cc8 (0) (canary) (00:10:20)
            L Error: Action Failed get_task: Task f425d048-954e-424c-642f-3ba2f2b3bec4 result: Unmounting persistent disk: Running command: 'umount /dev/sdc1', stdout: '', stderr: 'umount: /var/vcap/store: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
         ': exit status 1

This is due to a bug in GemFire in which the locators cannot stop successfully while there is an open connection to the Pulse dashboard. The issue is tracked in the Apache Geode JIRA.
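
Before retrying, you can identify which processes are keeping the persistent disk busy by using the utilities suggested in the error output. A quick check, assuming the fuser and lsof utilities are available on the stemcell (run as root on the affected locator VM):

    # List processes holding files open on the mounted filesystem
    fuser -vm /var/vcap/store

    # Alternatively, recursively list open files under the mount point
    lsof +D /var/vcap/store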

Mitigation Option 1

If possible, close all connections to the Pulse dashboards before upgrading the Pivotal Cloud Cache tile.

Mitigation Option 2

If you have not yet run the upgrade-all-service-instances errand, follow the steps below (a consolidated command sketch appears after this list):

  1. SSH onto the Operations Manager VM.
  2. Identify each Cloud Cache deployment using the BOSH CLI.
  3. For each Cloud Cache deployment, use bosh ssh to ssh onto a locator VM.
  4. On the locator VM, change to the root user by running sudo su.
  5. On the locator VM, use monit to stop the route_registrar job by running monit stop route_registrar.
  6. Repeat steps 3-5 for every locator in each Cloud Cache deployment.
  7. Navigate back to the Operations Manager dashboard and click Apply Changes.
  8. The upgrade-all-service-instances errand should now complete successfully.
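
A minimal sketch of steps 2-5 above, assuming the BOSH v2 CLI with a hypothetical environment alias my-env; on-demand service deployments typically have names of the form service-instance_<guid>:

    # From the Operations Manager VM: list deployments to find the Cloud Cache instances
    bosh -e my-env deployments

    # SSH onto a locator VM in one of the Cloud Cache deployments
    bosh -e my-env -d service-instance_<guid> ssh locator/0

    # On the locator VM: become root, then stop the route_registrar job
    sudo su
    monit stop route_registrar
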
Mitigation Option 3

If you have already run the upgrade-all-service-instances errand and now have a stopped locator in one of your Cloud Cache deployments, follow the steps below:

  1. SSH onto the Operations Manager VM.
  2. Identify the locator that is in a stopped state using the BOSH CLI. You can do this by running bosh vms.
  3. Use bosh ssh to ssh onto the stopped locator VM.
  4. On the locator VM, change to the root user by running sudo su.
  5. (Optional) On the locator VM, run monit summary to confirm that the gemfire-locator process is not monitored.
  6. Kill the process by running kill -9 "$(ps -ef | grep '[L]ocatorLauncher' | head -n 1 | awk '{print $2}')". The bracketed grep pattern keeps the grep command itself out of the match.

After killing the LocatorLauncher process on the stopped locator VM, proceed with Mitigation Option 2 above.

Pulse Monitoring Tool Issue

The topology diagram might not be accurate and might display more members in a cluster than the cluster actually contains. However, the numerical value displayed on the top bar is accurate.

Updating a PCC service instance fails if syslog TLS is disabled

Affected Versions: 1.0.0 - 1.0.6

If you have syslog enabled but want to disable syslog TLS, you will not be able to upgrade PCC. You will encounter the following error:

Service broker error: previous manifest locator job must contain syslog.tls properties if syslog properties exist

Release Notes

Pivotal Cloud Cache v1.0.7

New in This Release

  • Improved password requirement of the testing utility
  • Increased the monit timeout so that smoke tests complete successfully
  • Improved the kill command to force-kill in the event of a gfsh stop failure
  • Fixed a failure when upgrading a Cloud Cache service instance with syslog TLS disabled and the Send service instance logs to external syslog option enabled

Pivotal Cloud Cache v1.0.6

New in This Release

  • Fixed service instance upgrade failure when Pulse has open connections

Pivotal Cloud Cache v1.0.5

New in This Release

  • This version is based on the On-Demand Services SDK (ODB) v0.16. It fixes the issue with the upgrade-all-service-instances errand returning success when it has actually failed.

Pivotal Cloud Cache v1.0.4

New in This Release

  • Rewrote smoke test using Go to fix timeout issues on some environments

Pivotal Cloud Cache v1.0.3

New in This Release

  • Upgraded to GemFire v9.0.3
  • Upgraded to v0.15.3 of On-Demand Service Broker
  • Fixed JVM parameters for garbage collection
  • Updated URL routes to read “cloudcache” rather than “gemfire”

Pivotal Cloud Cache v1.0.2

New in This Release

  • Upgraded to v0.15.2 of On-Demand Service Broker
  • Increased the smoke test canary timeout to five minutes
  • Set the global service instance quota to 50 instances
  • Added an option to disable a plan and service access separately
  • Validated that the plan description doesn't contain any spaces

Pivotal Cloud Cache v1.0.1

New in This Release

  • Upgraded to v0.15.1 of On-Demand Service Broker
  • Smoke test now runs in the system organization
  • Improved reliability of the Smoke Test errand
  • Added support for the gfsh export logs command
  • Optimized the size of the tile to make downloading and uploading faster
  • Ensured no data loss when performing a stemcell upgrade for regions with redundancy
  • Added post-deployment checks that all locators properly join the cluster
  • The PCC broker can be configured to send logs to a syslog server
  • PCC service instances now send logs to the configured syslog endpoint if enabled
  • Added an option to provide public IP addresses to VMs, which is necessary to allow egress on Google Cloud Platform
  • Added an errand, run when the tile is deleted, that deletes all PCC service instances

Pivotal Cloud Cache v1.0.0

Minimum Version Requirements

  • PCC requires PCF with Elastic Runtime v1.9.11 or later.

New in This Release

  • Support for application look-aside cache pattern
  • Multi AZ support
  • On-demand provisioning
  • MVP role-based access control
  • Quotas, both for service instances and for number of servers in a cluster
  • GemFire v9.0.2
  • Manage and configure clusters using the Cloud Foundry Command Line Interface (cf CLI) service commands and the GemFire gfsh CLI

Known Issues

  • The Upgrade all service instances errand can only upgrade the first 50 instances created.
  • On PCF for Google Cloud Platform deployments, Ops Manager service network VMs are not assigned the correct firewall rules. As a result, these VMs cannot communicate with the BOSH Director, and the Cloud Cache Broker will fail to create service instances. As a workaround, modify your firewall to use subnet CIDR-based rules.
  • The PCC broker will accept requests to modify instances that have ongoing BOSH operations, resulting in an asynchronous failure.

Architecture

PCC deploys cache clusters that use Pivotal GemFire to provide high availability, replication guarantees, and eventual consistency.

When you first spin up a cluster, you’ll have three locators and at least four servers.

    graph TD;
      Client
      subgraph P-CloudCache Cluster
        subgraph locators
          Locator1
          Locator2
          Locator3
        end
        subgraph servers
          Server1
          Server2
          Server3
          Server4
        end
      end
      Client==>Locator1
      Client-->Server1
      Client-->Server2
      Client-->Server3
      Client-->Server4

If you scale the cluster up, you’ll have more servers, increasing the capacity of the cache. There will always be three locators.

    graph TD;
      Client
      subgraph P-CloudCache Cluster
        subgraph locators
          Locator1
          Locator2
          Locator3
        end
        subgraph servers
          Server1
          Server2
          Server3
          Server4
          Server5
          Server6
          Server7
        end
      end
      Client==>Locator1
      Client-->Server1
      Client-->Server2
      Client-->Server3
      Client-->Server4
      Client-->Server5
      Client-->Server6
      Client-->Server7

When a client connects to the cluster, it first connects to a locator. The locator replies with the IP address of a server for it to talk to. The client then connects to that server.

    sequenceDiagram
      participant Client
      participant Locator
      participant Server1
      Client->>+Locator: What servers can I talk to?
      Locator->>-Client: Server1
      Client->>Server1: Hello!

When the client wants to read or write data, it sends a request directly to the server.

    sequenceDiagram
      participant Client
      participant Server1
      Client->>+Server1: What’s the value for KEY?
      Server1->>-Client: VALUE

If the server doesn’t have the data locally, it fetches it from another server.

    sequenceDiagram
      participant Client
      participant Server1
      participant Server2
      Client->>+Server1: What’s the value for KEY?
      Server1->>+Server2: What’s the value for KEY?
      Server2->>-Server1: VALUE
      Server1->>-Client: VALUE

Workflow

The workflow for the PCF admin setting up a PCC service plan:

    graph TD;
      subgraph PCF Admin Actions
        s1
        s2
      end
      subgraph Developer Actions
        s4
      end
      s1[1. Upload P-CloudCache.pivotal to Ops Manager]
      s2[2. Configure CloudCache Service Plans, i.e. caching-small]
      s1-->s2
      s3[3. Ops Manager deploys CloudCache Service Broker]
      s2-->s3
      s4[4. Developer calls `cf create-service p-cloudcache caching-small test`]
      s3-->s4
      s5[5. Ops Manager creates a CloudCache cluster following the caching-small specifications]
      s4-->s5

  • PCC for PCF can be used as a cache. It supports the look-aside cache pattern.
  • PCC can be used to store objects in key/value format, where value can be any object.
  • PCC works with gfsh v9.0.0 and later.
  • Any gfsh command not explained in the PCC documentation is not supported.
  • PCC supports basic OQL queries, with no support for joins (see the example after this list).
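
For example, a basic OQL query can be run through gfsh once connected to the cluster. A sketch, where the region name /my-region and the field names are hypothetical (queries with joins are not supported):

    gfsh>query --query="SELECT e.id, e.name FROM /my-region e WHERE e.id = 1"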

Limitations

  • Scale down of the cluster is not supported.
  • Plan migrations (for example, using the -p flag with the cf update-service command) are not supported; see the example after this list.
  • WAN (Cross Data Center) replication is not supported.
  • Persistent regions are not supported.
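
For illustration, the plan-migration limitation means a command such as the following is not supported (the instance name test is reused from the workflow example; the target plan name is hypothetical):

    # Not supported in PCC v1.0.x: moving an existing instance to a different plan
    cf update-service test -p some-larger-plan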

Security

Pivotal recommends that you do the following:

  • Run PCC for PCF in its own network
  • Use a load balancer to block direct, outside access to the Gorouter

To allow PCC network access from apps, you must create application security groups that allow access on the following ports:

  • 1099
  • 8080
  • 40404
  • 55221

For more information, see the PCF Application Security Groups topic.
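
A minimal sketch of an ASG that opens these ports, assuming a hypothetical service network CIDR of 10.0.8.0/24 and hypothetical org and space names. Save the rules as pcc-asg.json, then create and bind the group with the cf CLI:

    # pcc-asg.json -- destination should be your PCC service network CIDR
    [
      {
        "protocol": "tcp",
        "destination": "10.0.8.0/24",
        "ports": "1099,8080,40404,55221"
      }
    ]

    # Create the security group and bind it to the org and space running the apps
    cf create-security-group pcc-asg pcc-asg.json
    cf bind-security-group pcc-asg my-org my-space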

Authentication

Clusters are created with two default users: cluster_operator and developer. A cluster can only be accessed using one of these two users. All client applications, gfsh, and JMX clients need to use one of these users to access the cluster.
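
For example, connecting gfsh as the cluster_operator user might look like the following sketch; the HTTPS endpoint URL and password are hypothetical placeholders (in practice, obtain the real values from a service key):

    gfsh>connect --use-http=true --url=https://cloudcache-example.com/gfsh --user=cluster_operator --password=<password>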

Authorization

Default user roles cluster_operator and developer have different permissions:

  • cluster_operator role has DATA:MANAGE, DATA:WRITE, and DATA:READ permissions.
  • developer role has DATA:WRITE and DATA:READ permissions.

You can find more details about these permissions in the Pivotal GemFire Implementing Authorization topic.
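
As an illustration of the difference, a gfsh session authenticated as the developer user can read and write data but not manage the cluster. A sketch, with a hypothetical region name:

    # Authenticated as developer: DATA:WRITE and DATA:READ operations succeed
    gfsh>put --region=/example --key=foo --value=bar
    gfsh>get --region=/example --key=foo

    # This requires DATA:MANAGE and is rejected for the developer role
    gfsh>create region --name=example2 --type=PARTITION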

Feedback

Please send bug reports, feature requests, or questions to the Pivotal Cloud Foundry Feedback list.
