Monitoring and Troubleshooting

This topic broadly outlines techniques for troubleshooting your Concourse for PCF installation.

Troubleshooting with Fly Commands

To get help on all fly commands, run: fly --help.

The following selected fly commands give you useful information to help you troubleshoot Concourse for PCF environments and pipelines.

Troubleshooting Concourse for PCF Environments

You can use the following fly commands to troubleshoot possible environment problems.

Fly Command | Description
containers | Lists active containers. This confirms which container or task is placed on which worker.
workers | Lists registered workers. This helps you verify that the number of containers does not exceed the maximum number allowable.
prune-worker | Removes a non-running worker. Stops Concourse for PCF from tracking an out-of-commission worker.
volumes | Lists active volumes. Checks disk usage across workers.
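
The environment checks above can be sketched as two small shell functions. This is a minimal sketch, not an official procedure: the function names are hypothetical, and the target alias and worker name you pass in are whatever you saved with fly login and saw in the workers listing.

```shell
# Sketch only. inspect_environment and prune_worker are hypothetical helper
# names; the fly subcommands are the ones listed in the table above.

# $1 = fly target alias saved by `fly login` (for example "ci", an assumed name)
inspect_environment() {
  fly -t "$1" workers      # registered workers
  fly -t "$1" containers   # active containers and the worker each one is on
  fly -t "$1" volumes      # active volumes across workers
}

# $1 = fly target alias, $2 = name of the non-running worker to remove
prune_worker() {
  fly -t "$1" prune-worker --worker "$2"
}
```

For example, inspect_environment ci prints the three listings in order, and prune_worker ci WORKER-NAME removes a worker that the workers listing shows as no longer running.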

Troubleshooting Pipelines

You can use the following fly commands to troubleshoot possible pipeline problems.

Fly Command | Description
pipelines | Lists configured pipelines.
builds | Shows build history. This is useful for listing build IDs of one-off tasks previously run using execute.
validate-pipeline | Validates a pipeline's configuration without calling set-pipeline.
check-resource | Checks for new versions of a resource. This is useful when developing a new resource.
watch | Shows logs of in-progress builds.
intercept | Opens an interactive session in the container of a build step from a running or recent build, so that you can inspect its state.
execute | Submits local tasks. This is useful for spinning up a task quickly to test it before putting it in a job.
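
A typical pipeline debugging pass using the commands above might look like the following. This is a sketch under assumptions: the function names are hypothetical, and the target alias, pipeline, job, and resource names are placeholders for your own.

```shell
# Sketch only. The helper names are hypothetical; the fly subcommands are the
# ones listed in the table above.

# $1 = path to a pipeline configuration file (for example pipeline.yml)
check_pipeline_config() {
  # Validates the configuration file without calling set-pipeline.
  fly validate-pipeline --config "$1"
}

# $1 = fly target alias, $2 = PIPELINE/JOB
inspect_job() {
  fly -t "$1" builds --job "$2" --count 5   # recent builds of the job
  fly -t "$1" watch --job "$2"              # log output of the job's latest build
}

# $1 = fly target alias, $2 = PIPELINE/RESOURCE
recheck_resource() {
  fly -t "$1" check-resource --resource "$2"
}
```

For example, inspect_job ci my-pipeline/unit lists recent builds and then shows the latest build log. If the log is not enough, fly -t ci intercept --job my-pipeline/unit opens a session in one of that build's containers, and fly -t ci execute --config task.yml runs a task from a local file as a one-off build.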

Common Concourse for PCF Issues

The following shows some common problems and solutions.

Problem: The worker is out of disk space
Error Description: The error reports that a storage volume cannot be created and may state that permission is denied.
Solution: Increase the persistent disk size for the worker or increase the number of worker VMs.

Problem: Container limit is reached
Error Description: Cannot create container: limit of 250 containers reached. This error state is unlikely to appear.
Solution:
  1. Check the output of fly containers.
  2. Increase the number of worker VMs.
  3. Decrease gc_interval if it is set to a custom value. A large interval could mean that there are too many expired containers.

Problem: Job does not start
Error Description: This error may present as the build getting stuck in the Pending state.
Solution: Restart the ATC job:
  1. Log in as a root user on the Concourse for PCF web VMs where the ATC job is located.
  2. Run the monit restart atc command.

Problem: Build fails when updating Concourse for PCF from a job on the same instance
Error Description: When a build fails after BOSH deploys a Concourse for PCF update from a job running on that Concourse for PCF instance, the job typically fails with a "worker for container not found" error.
Solution: This is expected behavior; the BOSH Director will recreate the worker VM. Run the job again.

Problem: BOSH cannot finish worker upgrade while tasks are running
Error Description: BOSH is not able to restart the worker job to finalize the upgrade until all work is completed.
Solution: If you have a long-running task, wait for the task to be completed. If you need to upgrade quickly, cancel running tasks and jobs.
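
For the container limit problem above, the fly containers check can be turned into a per-worker count. The sketch below rests on an assumption you should verify against your fly version: that the worker name is the second whitespace-separated column of the fly containers output. The function names are hypothetical, and the default threshold of 250 is taken from the error message above.

```shell
# Sketch only. Reads `fly containers` output on stdin.
# ASSUMPTION: the worker name is the second whitespace-separated column.
# If your fly version prints a header row, it is counted like any other row.

# Prints one "<worker> <count>" line per worker, sorted by worker name.
count_containers_by_worker() {
  awk 'NF >= 2 { n[$2]++ } END { for (w in n) print w, n[w] }' | sort
}

# Prints only the workers whose container count is at or above the limit.
# $1 = limit (defaults to 250, the limit named in the error message)
workers_at_limit() {
  count_containers_by_worker | awk -v limit="${1:-250}" '$2 + 0 >= limit + 0'
}
```

Usage, with a hypothetical target alias: fly -t ci containers | count_containers_by_worker. Piping the same output into workers_at_limit 200 gives an earlier warning than waiting for the limit itself to be reached.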

Access Concourse Logs

You might need to contact Pivotal Support for help identifying a problem. In that case, support might ask you to send job log files.

For general information about accessing log files, see Location and use of logs and Advanced Troubleshooting with the BOSH CLI.
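
If support asks for job logs, one way to collect them is with the BOSH CLI logs command. This is a sketch under assumptions, not the documented procedure: it assumes BOSH CLI v2, a deployment whose instance groups are named web and worker, and a hypothetical helper name. Check the real instance group names in your deployment before running it.

```shell
# Sketch only. fetch_concourse_logs is a hypothetical helper name.
# ASSUMPTIONS: BOSH CLI v2 is installed, and the deployment has instance
# groups named "web" and "worker"; adjust both names to match your deployment.

# $1 = BOSH environment alias, $2 = deployment name, $3 = output directory
fetch_concourse_logs() {
  mkdir -p "$3"
  bosh -e "$1" -d "$2" logs web --dir="$3"      # logs from the web VMs
  bosh -e "$1" -d "$2" logs worker --dir="$3"   # logs from the worker VMs
}
```

For example, fetch_concourse_logs my-env concourse ./concourse-logs downloads the log archives into ./concourse-logs, where my-env and concourse are placeholder environment and deployment names.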

Other Troubleshooting Resources

The following links provide other troubleshooting resources.

Topic | Link
Enabling syslog forwarding and getting logs for other Concourse components | VM Logs
Common BOSH issues | BOSH tips