Service Backups for Pivotal Cloud Foundry
BOSH operators running services (e.g. Redis service broker for Cloud Foundry) may want to back up certain files from the virtual machines running these services so that they can restore them after a disaster.
Configuration
The Service Backup BOSH release backs up a directory on the instance VM on which it is colocated to one of several supported destination types: AWS S3 (and S3-compatible blobstores), Azure blobstore, Google Cloud Storage, and SCP.
Uploading a Service Backup Release
Service Backup is distributed as a BOSH final release. To add it to your BOSH Director, download the latest final release tarball from VMware Tanzu Network and upload it to the Director.
Configuring the Manifest
Service Backup is designed to be colocated on service instance VMs, and must be included in that service’s BOSH deployment manifest.
Add Service Backups to Your Service Deployment
Shown below is a template manifest that adds an S3 backup destination to a Redis service deployment. For more information about changing the backup destination, see Backup Destinations below.
---
properties:
  service-backup:
    destinations:
    - type: s3
      name: <OPTIONAL: destination name>
      config:
        bucket_name: <bucket>
        bucket_path: <path in bucket>
        access_key_id: <aws access key>
        secret_access_key: <aws secret key>
        endpoint_url: <OPTIONAL: S3 compatible endpoint URL>
        region: <OPTIONAL: S3 region, required for Signature Version 4 regions>
    source_folder: <directory to back up>
    cron_schedule: <cron schedule>
    backup_user: <OPTIONAL: backup user>
    source_executable: <OPTIONAL: run before each backup>
    exit_if_in_progress: <OPTIONAL: exits if another backup is already running. Defaults to false>
    missing_properties_message: <OPTIONAL: message to log when properties are missing in the manifest. Defaults to 'Provide these missing fields in your manifest.'>
    service_identifier_executable: <OPTIONAL: command that prints the service instance ID on stdout>
    cleanup_executable: <OPTIONAL: command to run after each backup>
    add_deployment_name_to_backup_path: <OPTIONAL: if set to true, includes the deployment name in the destination path. Defaults to false>
    alerts: # optional
      product_name: <product name>
      config:
        cloud_controller:
          url: <Cloud Foundry API URL>
          user: <Cloud Foundry username with SpaceAuditor role in cf_space>
          password: <Cloud Foundry password>
        notifications:
          service_url: <Cloud Foundry notification service URL>
          cf_org: <Cloud Foundry org name>
          cf_space: <Cloud Foundry space name>
          reply_to: <OPTIONAL: email reply-to address. This is required for some SMTP servers>
          client_id: <UAA client ID with authorities to send notifications>
          client_secret: <UAA client secret>
        timeout_seconds: <OPTIONAL: default is 60>
        skip_ssl_validation: <OPTIONAL: ignore TLS certificate verification errors>
releases:
- name: redis
  version: latest
- name: service-backup
  version: latest
instance_groups:
- name: redis-server
  jobs:
  - name: redis-server
    release: redis
  - name: service-backup
    release: service-backup
Configure the Backup Schedule
Backups will be triggered according to the schedule given by the cron_schedule property. See robfig/cron for cron expression syntax.
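For example, a minimal sketch that runs a backup at 02:00 every day, assuming the six-field (seconds-first) expression format that robfig/cron accepts; check the release's job spec for the exact format expected by your version:
properties:
  service-backup:
    cron_schedule: '0 0 2 * * *'   # sec min hour day-of-month month day-of-week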
Configure the Backup User
By default, the service backup process runs as the vcap user. This can be changed by setting the backup_user property.
If you are also providing your own source_executable, be sure that the backup_user you create has execute permissions for it.
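For example, a sketch that runs backups as a hypothetical backup-runner user instead of vcap:
properties:
  service-backup:
    backup_user: backup-runner   # hypothetical user; must be able to execute source_executable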
Defining the Files to Be Backed Up
The source_folder property names a local path from which backups are uploaded. All files in this location are uploaded; therefore, Pivotal recommends that this directory be separate from the directory that files are written to. This avoids uploading files that are still being written.
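For example, a sketch using a hypothetical directory that holds only completed backup artifacts, separate from the service's live data directory:
properties:
  service-backup:
    source_folder: /var/vcap/store/service-backups   # hypothetical path; everything in it is uploaded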
Preparing the Files to Be Backed Up
The source_executable is an optional property that names a command to run before each backup. Any required arguments can be provided space-separated after the executable name. The property is useful for services that require some operation to be performed before backing files up, for example triggering a Redis memory dump to disk. The service author may also use the source_executable to decide which instance performs the backup. For example, the service author can choose to only trigger backups on replica nodes to avoid issues with data consistency.
If the property is not specified, it is ignored and nothing is executed.
If the property is specified, then when the BOSH lifecycle runs stop scripts, any running processes of source_executable are identified and sent the SIGTERM signal. This attempts to stop the main service-backup process, which first forwards the signal to any active backup processes and allows them to trap the signal and clean up any used resources, such as open files, before potential removal or update of VMs. If the main service-backup process does not terminate successfully with the SIGTERM signal within 15 seconds, SIGQUIT and then SIGKILL signals are sent in turn and the main service-backup process is terminated by force. In the event that SIGQUIT or SIGKILL is sent, the signal is not forwarded to active backup processes, but no new backup processes will be initiated by the main process.
If a suitable executable is not included in the service release, you can add one by publishing it in a separate release, as its own package and job, and colocating it into the deployment.
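For example, a sketch in which a hypothetical script shipped by the service release dumps Redis data to disk before each backup; the path and flag below are illustrative only:
properties:
  service-backup:
    source_executable: /var/vcap/jobs/redis-backup-scripts/bin/dump-to-disk --if-replica   # hypothetical script and argument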
Cleaning Up the Files Which Were Backed Up
The optional cleanup_executable property names a local executable to clean up backups. Tokens are split on spaces; the first token is the command to execute and the remaining tokens are passed as arguments to the command.
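For example, a sketch with a hypothetical cleanup command; the first space-separated token is the executable and the rest are passed to it as arguments:
properties:
  service-backup:
    cleanup_executable: /var/vcap/jobs/redis-backup-scripts/bin/cleanup /var/vcap/store/service-backups   # hypothetical command and argument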
Sending an Alert When a Backup Fails
When the optional alerts properties are configured, an alert will be sent to SpaceDevelopers in the configured cf_space when a backup fails. Alerts are sent using the Cloud Foundry Notifications Service.
Correlating BOSH Instances to Cloud Foundry Service Instances
BOSH operators might want to correlate BOSH-deployed VM instances with CF service instances, in which case the service author must provide a binary that returns a string identifier for the service instance. This will appear in all log messages under the data element identifier. For example:
{ "source": "ServiceBackup", "message": "doing-stuff", "data": { "backup_guid":"244eadb0-91e7-45da-9a7f-3616a59a6e61", "identifier": "service_identifier" }, "timestamp": "2021-04-05T16:28:16.000000000Z", "log_level": "info" }
Add the optional service_identifier_executable key to your manifest. Tokens are split on spaces; the first token is the command to execute and the remaining tokens are passed as arguments to the command.
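For example, a sketch that reads the identifier from a hypothetical file holding the service instance GUID:
properties:
  service-backup:
    service_identifier_executable: cat /var/vcap/store/service/instance_id   # hypothetical; must print the identifier on stdout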
Naming Backup Destinations
Each destination can be given an optional name property. This will appear in the log messages for uploads to that destination. For example:
{ "timestamp":"2021-04-05T16:28:16.000000000Z", "source":"ServiceBackup", "message":"ServiceBackup.WithIdentifier.about to upload /path/to/file to S3 remote path bucket_name/2016/07/04", "log_level":"info", "data": { "backup_guid":"244eadb0-91e7-45da-9a7f-3616a59a6e61", "destination_name": "some-destination-name", "identifier": "service_identifier", "session":"1" } }
Identifying Logs for a Backup Run
The log lines of a particular backup run can be identified by correlating their unique backup_guid. For example:
{"timestamp":"2021-04-05T16:28:16.000000000Z","source":"ServiceBackup","message":"ServiceBackup.WithIdentifier.Upload backup started","log_level":"info","data":{"backup_guid":"244eadb0-91e7-45da-9a7f-3616a59a6e61","identifier":"service_identifier","session":"1"}}
{"timestamp":"2021-04-05T16:28:17.000000000Z","source":"ServiceBackup","message":"ServiceBackup.WithIdentifier.Upload backup completed successfully","log_level":"info","data":{"backup_guid":"244eadb0-91e7-45da-9a7f-3616a59a6e61","duration_in_seconds":8.081000000000001e-06,"identifier":"service_identifier","session":"1","size_in_bytes":200}}
{"timestamp":"2021-04-05T16:28:17.000000000Z","source":"ServiceBackup","message":"ServiceBackup.WithIdentifier.Cleanup started","log_level":"info","data":{"backup_guid":"244eadb0-91e7-45da-9a7f-3616a59a6e61","identifier":"service_identifier","session":"1"}}
{"timestamp":"2021-04-05T16:28:17.000000000Z","source":"ServiceBackup","message":"ServiceBackup.WithIdentifier.Cleanup debug info","log_level":0,"data":{"backup_guid":"244eadb0-91e7-45da-9a7f-3616a59a6e61","cmd":"creator-cmd","identifier":"service_identifier","out":"Cleanup Complete\n","session":"1"}}
Triggering Manual Service Backups
BOSH operators might want to trigger a one-off, manual backup. To do this:
- SSH onto a BOSH-deployed VM that has a service-backup job running on it.
- Execute /var/vcap/jobs/service-backup/bin/manual-backup.
Disabling Service Backups
Backups can be disabled by removing the service-backup section from your manifest and then redeploying. You can still leave the job on your instance group if you wish.
Backup Destinations
Service Backup supports S3 (AWS, Ceph S3, Swift with the S3 compatibility module), Azure blobstore, Google Cloud Storage, and SCP. To change the backup destination, change the destinations value in the manifest:
S3
properties:
  service-backup:
    destinations:
    - type: s3
      name: <OPTIONAL: destination name>
      config:
        bucket_name: <bucket>
        bucket_path: <path in bucket>
        access_key_id: <aws access key>
        secret_access_key: <aws secret key>
        endpoint_url: <OPTIONAL: url for S3-compatible blobstore>
        region: <OPTIONAL: S3 region, required for Signature Version 4 regions>
New S3 Buckets
If the bucket does not exist in S3, then it will be created in the us-east-1 region. To create the bucket in another region, configure the region property.
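For example, a sketch that has a new bucket created in eu-central-1 instead of the default us-east-1; bucket and credential values are placeholders:
properties:
  service-backup:
    destinations:
    - type: s3
      config:
        bucket_name: my-backups-bucket     # placeholder bucket name
        bucket_path: backups
        access_key_id: <aws access key>
        secret_access_key: <aws secret key>
        region: eu-central-1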
Existing S3 Buckets
If the bucket exists in S3 and requires the Signature Version 4 Signing Process, then the region property must be configured. Some S3 regions require Signature Version 4, e.g. eu-central-1. To obtain a bucket’s region, run aws s3api get-bucket-location --bucket BUCKET_NAME.
AWS IAM
If you are using AWS ensure that the IAM user has the right permissions. Create a new custom policy (IAM > Policies > Create Policy > Create Your Own Policy) and paste in the following permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ServiceBackupPolicy",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads",
        "s3:ListMultipartUploadParts",
        "s3:CreateBucket",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::MY_BUCKET_NAME/*",
        "arn:aws:s3:::MY_BUCKET_NAME"
      ]
    }
  ]
}
The s3:CreateBucket permission is required because the tool will attempt to create the bucket if it does not already exist. If the desired bucket already exists, the s3:CreateBucket permission is not required.
Finally, attach this policy to your AWS user (IAM > Policies > Policy Actions > Attach).
AWS CLI version
The current release uses the aws CLI version aws-cli/1.11.91 Python/3.6.1 Darwin/15.6.0 botocore/1.5.54 to upload.
S3-compatible blobstores
By default, backups are sent to AWS S3. To use an S3-compatible blobstore like RiakS2, set the endpoint_url property.
Service Backup uses the AWS CLI to back up files to S3 or S3-compatible blobstores. If the endpoint has a self-signed SSL server certificate, then the root CA certificate must be added to the default system trust store. This can be done using BOSH trusted certs.
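For example, a sketch that points the S3 destination at a hypothetical S3-compatible endpoint; the URL and bucket values are placeholders:
properties:
  service-backup:
    destinations:
    - type: s3
      config:
        endpoint_url: https://objectstore.example.com   # hypothetical S3-compatible endpoint
        bucket_name: my-backups-bucket
        bucket_path: backups
        access_key_id: <access key>
        secret_access_key: <secret key>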
Azure
properties:
  service-backup:
    destinations:
    - type: azure
      name: <OPTIONAL: destination name>
      config:
        storage_account: <storage account>
        storage_access_key: <storage key>
        container: <container name>
        path: <path in container>
        endpoint: <OPTIONAL: Azure Storage endpoint>
By default, backups are sent to the public Azure blobstore. If you are not using an Azure public region, you can specify a different endpoint name using the endpoint property.
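For example, a sketch targeting the Azure China storage endpoint; the endpoint-suffix format shown here is an assumption, so check the release's job spec for the exact value the endpoint property expects:
properties:
  service-backup:
    destinations:
    - type: azure
      config:
        storage_account: <storage account>
        storage_access_key: <storage key>
        container: <container name>
        path: <path in container>
        endpoint: core.chinacloudapi.cn   # assumed endpoint-suffix format for a non-public Azure region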
Azure client version
The current release uses the blobxfer CLI version 1.0.0 to upload.
Google Cloud Storage
properties:
  service-backup:
    destinations:
    - type: gcs
      name: <OPTIONAL: destination name>
      config:
        service_account_json: |
          <GCP service account key JSON literal>
        project_id: <GCP project ID>
        bucket_name: <GCP Storage bucket name, does not have to already exist>
The service account must have “Storage Admin” IAM permissions. You can generate the JSON key for a service account with:
gcloud iam service-accounts keys \
create key-lives-here.json \
--iam-account $IAM_ACCOUNT_ADDRESS
If the bucket does not already exist, service-backup will create it. It uses the API default attributes for the bucket, summarised here:
- Bucket ACL rules: for the project that owns the configured service account, owners own the bucket, editors can write the bucket (CRUD objects), and viewers can read the bucket.
- Objects in bucket ACL rules: for the project that owns the configured service account, owners and editors can read/write/delete the object, and viewers can read the object.
- Location: US
- Storage class: Standard. See the storage classes documentation.
If you create the bucket in advance, then you must ensure that the service account has access to write to it. The “Storage Admin” IAM permission should ensure this.
Google Cloud Storage version
The current release uses the Google GCS storage golang package at commit 86c12b7 to upload.
SCP
properties:
  service-backup:
    destinations:
    - type: scp
      name: <OPTIONAL: destination name>
      config:
        user: <ssh username>
        server: <ssh server>
        destination: <path to upload to on server>
        fingerprint: <host-fingerprint> # optional
        key: |
          -----BEGIN EXAMPLE RSA PRIVATE KEY-----
          ...
          -----END EXAMPLE RSA PRIVATE KEY-----
        port: <optional ssh port. Defaults to 22>
The fingerprint field expects the entire output of the ssh-keyscan utility for the host. If the fingerprint is provided and doesn’t match, then the backup will fail. If it is empty, the fingerprint of the host is requested immediately before the upload and used instead. A fingerprint should be configured to prevent server spoofing and man-in-the-middle attacks. For more information, see http://man.openbsd.org/ssh#authentication
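For example, a sketch with a hypothetical host entry in the format produced by ssh-keyscan; the host name is illustrative and the key material is truncated:
properties:
  service-backup:
    destinations:
    - type: scp
      config:
        user: backup
        server: backups.example.com            # hypothetical host
        destination: /var/backups
        fingerprint: backups.example.com ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAI...   # hypothetical ssh-keyscan output
        key: |
          -----BEGIN EXAMPLE RSA PRIVATE KEY-----
          ...
          -----END EXAMPLE RSA PRIVATE KEY-----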
SCP version
The current release uses the scp binary that is included in the stemcell. Check your stemcell for its scp version.
Multiple destinations
properties:
  service-backup:
    destinations:
    - type: s3
      name: <OPTIONAL: destination name>
      config:
        bucket_name: <bucket>
        bucket_path: <path in bucket>
        access_key_id: <aws access key>
        secret_access_key: <aws secret key>
    - type: scp
      name: <OPTIONAL: destination name>
      config:
        user: <ssh username>
        server: <ssh server>
        destination: <path to upload to on server>
        key: |
          -----BEGIN EXAMPLE RSA PRIVATE KEY-----
          ...
          -----END EXAMPLE RSA PRIVATE KEY-----
        port: <optional ssh port. Defaults to 22>
The tool can be provided with configuration for multiple destinations in the destinations property. The tool will upload backups to all the provided destinations sequentially.
You can configure multiple destinations of the same type, for example: two S3 buckets in different regions.
Operating
Locating the Backups
If add_deployment_name_to_backup_path is set to true, the tool will include the deployment name in your destination bucket / folder, for example <bucket-name>/<bucket-path>/<deployment-name>/*.
The tool will then create a date-based folder structure in your destination bucket / folder as follows: <bucket-name>/<bucket-path>/YYYY/MM/DD/ or <bucket-name>/<bucket-path>/<deployment-name>/YYYY/MM/DD/. The tool uses the system time of the BOSH VM it is running on to calculate the date; for example, if your VM is using UTC time, the folder structure will reflect this.
For example, on S3 the provided path is appended with the current date such that the resultant path is /my/remote/path/inside/bucket/YYYY/MM/DD/ and hence the backups are accessible at s3://my-bucket-name/my/remote/path/inside/bucket/YYYY/MM/DD/.
If add_deployment_name_to_backup_path is set to true and the deployment is called deployed-service, the resultant path will be /my/remote/path/inside/bucket/deployed-service/YYYY/MM/DD/ and the backups are accessible at s3://my-bucket-name/my/remote/path/inside/bucket/deployed-service/YYYY/MM/DD/.
Logging
Service Backup logs to files in /var/vcap/sys/log/service-backup, and also to syslog.
To forward syslog to a third-party syslog drain (for example, Papertrail), Pivotal recommends colocating syslog-release.
Monitoring
Below are log messages that you may choose to monitor for. Each message appears on a single line in the logs; they have been formatted here for easy reading.
Internal Checksums
If you are publishing a tile to be consumed by Ops Manager 1.8.x or 1.9.x, you will need to build your tile using releases with SHA1 internal checksums. Service Backup releases are published using SHA2 internal checksums. You can convert these releases to use SHA1 internal checksums using the BOSH CLI command sha1ify-release.
Troubleshooting
ServiceBackup.Error scheduling job
This error occurs when the cron schedule is invalid, e.g. “* * * * * 99”:
{
  "timestamp": "2021-04-05T16:28:16.000000000Z",
  "source": "ServiceBackup",
  "message": "ServiceBackup.Error scheduling job",
  "log_level": "error",
  "data": {
    "error": "End of range (99) above maximum (6): 99"
  }
}
ServiceBackup.No destination provided - skipping backup
This warning will be logged when no destination is provided.
{
  "timestamp": "2021-04-05T16:28:16.000000000Z",
  "source": "ServiceBackup",
  "message": "ServiceBackup.No destination provided - skipping backup",
  "log_level": "info",
  "data": {}
}
ServiceBackup.Perform backup completed with error
This error will be logged when performing a backup fails, e.g. when the backup command exits status 1:
{
  "timestamp": "2021-04-05T16:28:16.000000000Z",
  "source": "ServiceBackup",
  "message": "ServiceBackup.Perform backup completed with error",
  "log_level": "error",
  "data": {
    "backup_guid": "3ebc4d08-62a3-4919-a02c-841c1afe51d0",
    "error": "exit status 1"
  }
}
ServiceBackup.Upload backup completed with error
This error will occur when uploading a backup fails. For example, when the S3 credentials provided are not authorised to create a bucket:
{
  "timestamp": "2021-04-05T16:28:16.000000000Z",
  "source": "ServiceBackup",
  "message": "ServiceBackup.Upload backup completed with error",
  "log_level": "error",
  "data": {
    "backup_guid": "3ae36910-3c1d-4e53-b28e-62105aee1e79",
    "error": "error in create bucket: exit status 1, output: make_bucket failed: s3://doesnotexist5f7c0f16-bba4-49c9-9f60-371366edea3b An error occurred (AccessDenied) when calling the CreateBucket operation: Access Denied\n"
  }
}
And when the SCP host fingerprint is invalid:
{
  "timestamp": "2021-04-05T16:28:16.000000000Z",
  "source": "ServiceBackup",
  "message": "ServiceBackup.Upload backup completed with error",
  "log_level": "error",
  "data": {
    "backup_guid": "7bd784ae-2375-43fd-bd9b-b4a43c2368c2",
    "error": "error checking if remote path exists: 'exit status 255', output: 'No ECDSA host key is known for localhost and you have requested strict checking.\r\nHost key verification failed.\r\n'"
  }
}
ServiceBackup.Backup currently in progress
This error will occur when the tool is configured with exit_if_in_progress: true and a backup starts whilst another backup is in progress.
{
  "timestamp": "2021-04-05T16:28:16.000000000Z",
  "source": "ServiceBackup",
  "message": "ServiceBackup.Backup currently in progress, exiting. Another backup will not be able to start until this is completed.",
  "log_level": "error",
  "data": {
    "backup_guid": "94ca42ed-c289-4b15-a76a-23fe55d4955b",
    "error": "backup operation rejected"
  }
}
ServiceBackup.Error running
An unexpected error has occurred.