Monitoring and Troubleshooting

This topic broadly outlines techniques for troubleshooting your Concourse for PCF installation.

Concourse for PCF Log Collection

Find the logs for a specific job in the VM on which that job was running. Those logs are stored at /var/vcap/sys/log/<CONCOURSE-JOB-NAME>/*.log.
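As a minimal sketch of the lookup described above, the following shell function prints the tail of every log file for one job. It assumes you are already logged in to the VM that ran the job. The function name show_job_logs and the LOG_ROOT override are illustrative and not part of Concourse for PCF.

```shell
# LOG_ROOT defaults to the directory named above; overriding it is only
# useful for trying the function out against a scratch directory.
LOG_ROOT="${LOG_ROOT:-/var/vcap/sys/log}"

# Print the last N lines (default 50) of each *.log file for one job.
# $1 stands in for <CONCOURSE-JOB-NAME>; $2 is the optional line count.
show_job_logs() {
  local job_name="$1"
  local lines="${2:-50}"
  local log_dir="${LOG_ROOT}/${job_name}"
  local found=0
  local f

  for f in "${log_dir}"/*.log; do
    # An unmatched glob is returned literally, so skip non-existent paths.
    [ -e "$f" ] || continue
    found=1
    echo "==> ${f} <=="
    tail -n "${lines}" "$f"
  done

  if [ "${found}" -eq 0 ]; then
    echo "no log files found in ${log_dir}" >&2
    return 1
  fi
}

# Example (job name is a placeholder):
#   show_job_logs <CONCOURSE-JOB-NAME> 100
```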

Troubleshooting With Fly Commands

Concourse for PCF environment troubleshooting

containers: Lists active containers
Use this to confirm which worker each container or task was placed on.
workers: Lists registered workers
This helps you verify that the number of containers on a worker does not exceed the maximum allowed.
prune-worker: Reaps a non-running worker
Stops Concourse for PCF from tracking an out-of-commission worker.
volumes: Lists active volumes
Checks disk usage across workers.
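The environment commands above can be combined into a short shell sketch. It assumes fly is installed and that you have already logged in to a target. The target name, the worker name, the function names, and the FLY_BIN override are hypothetical placeholders.

```shell
# FLY_BIN is an illustrative override for the path to the fly binary.
FLY_BIN="${FLY_BIN:-fly}"

# Run the three read-only environment checks against one target.
# $1 is the fly target name you used with "fly login".
env_report() {
  local target="$1"
  local cmd
  for cmd in workers containers volumes; do
    echo "### fly ${cmd}"
    "${FLY_BIN}" -t "${target}" "${cmd}" || return 1
  done
}

# Stop tracking a worker that is no longer running.
# $1 is the target, $2 is the worker name as shown by "fly workers".
prune_worker() {
  local target="$1"
  local worker_name="$2"
  "${FLY_BIN}" -t "${target}" prune-worker -w "${worker_name}"
}

# Example (placeholder names):
#   env_report my-target
#   prune_worker my-target my-worker-name
```

Keeping the read-only checks separate from prune-worker is deliberate: the report is safe to run at any time, while pruning changes what Concourse for PCF tracks.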

Pipeline troubleshooting

pipelines: Lists configured pipelines
builds: Shows build history
This is useful for getting build IDs of one-off tasks you’ve run using execute.
validate-pipeline: Validates a pipeline’s configuration
Checks a pipeline for validity without calling set-pipeline.
check-resource: Checks for new versions
This is useful when developing a new resource.
watch: Views logs of in-progress builds
intercept: Accesses a running or recent build’s steps
execute: Submits local tasks
This is useful for spinning up a task quickly to test before putting it in a job.
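The pipeline commands above might be wrapped as follows. This is a sketch under stated assumptions: fly is installed, you are logged in to a target, and the -c, -r, and -j flags behave as in recent fly releases (confirm with fly help for your version). The function names, the FLY_BIN override, and all file, pipeline, job, and resource names are hypothetical.

```shell
# FLY_BIN is an illustrative override for the path to the fly binary.
FLY_BIN="${FLY_BIN:-fly}"

# Validate a local pipeline file without calling set-pipeline.
# $1 is the path to the pipeline configuration file.
validate_pipeline() {
  "${FLY_BIN}" validate-pipeline -c "$1"
}

# Check one resource for new versions. $1 is the target,
# $2 is PIPELINE/RESOURCE.
check_resource() {
  "${FLY_BIN}" -t "$1" check-resource -r "$2"
}

# Stream the logs of a job's build. $1 is the target, $2 is PIPELINE/JOB.
watch_job() {
  "${FLY_BIN}" -t "$1" watch -j "$2"
}

# Open a session in a container from a job's build.
# $1 is the target, $2 is PIPELINE/JOB.
intercept_job() {
  "${FLY_BIN}" -t "$1" intercept -j "$2"
}

# Submit a local task configuration as a one-off build.
# $1 is the target, $2 is the path to the task configuration file.
run_task() {
  "${FLY_BIN}" -t "$1" execute -c "$2"
}

# Example (placeholder names):
#   validate_pipeline pipeline.yml
#   check_resource my-target my-pipeline/my-resource
#   watch_job my-target my-pipeline/my-job
```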

Common Concourse for PCF Issues

Problem: The worker is out of disk space
Error: An error displays about the inability to create a volume. It may say that permissions are denied.
Solution: Increase the persistent disk for the worker, or increase the number of worker VMs.
Other information: N/A

Problem: Container limit reached
Error: Cannot create container: limit of 250 containers reached
Solution:
Check fly containers.
Increase the number of worker VMs.
Decrease gc_interval if it is set to a custom value. A large interval can mean that expired containers are kept long enough to build up.
Other information: This error state is unlikely to appear.

Problem: Job doesn’t start
Error: The build may get stuck in the Pending state.
Solution: Restart the ATC job.
Other information: To restart the ATC, log in as a root user on the Concourse for PCF web VMs (the ones on which the ATC job is located), then run monit restart atc.

Problem: Updating Concourse for PCF in a Concourse for PCF job fails
Error: When a build fails after BOSH deploys a Concourse for PCF update from a job running on that same Concourse for PCF instance, the job typically errors with “worker for container not found.”
Solution: This is expected behavior; the BOSH Director will recreate the worker VM. Run the job again.
Other information: N/A

Problem: BOSH can’t finish a worker upgrade while tasks are running
Error: If you have a long-running task, BOSH cannot finalize the upgrade by restarting the worker job until all of the work has stopped.
Solution: Wait for the work to complete. If you need the upgrade to finish quickly, cancel the running tasks and jobs.
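For the container limit and stuck job cases above, the checks could look like the following shell sketch. It makes two loud assumptions: that the worker name is the second whitespace-separated column of the fly containers output (pass a different column number if your fly version differs), and that restart_atc is run as root on a web VM. The function names and the MONIT_BIN override are illustrative.

```shell
# Count containers per worker from "fly containers" output read on stdin.
# ASSUMPTION: the worker name is in column 2; pass another column as $1.
# A header row, if your fly version prints one, is counted as its own entry.
count_containers_per_worker() {
  local col="${1:-2}"
  awk -v c="${col}" 'NF >= c { n[$c]++ } END { for (w in n) print n[w], w }' |
    sort -rn
}

# MONIT_BIN is an illustrative override for the path to monit.
MONIT_BIN="${MONIT_BIN:-monit}"

# Restart the ATC job, then print the monit summary so you can see
# the job status. Run this as root on a web VM.
restart_atc() {
  "${MONIT_BIN}" restart atc && "${MONIT_BIN}" summary
}

# Example (placeholder target name):
#   fly -t my-target containers | count_containers_per_worker
#   restart_atc
```

Comparing the per-worker counts against the limit reported in the error (250 in the message above) shows which worker is close to full.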

Other Troubleshooting Resources

For information about metrics for monitoring Concourse, see Metrics. For information about enabling syslog forwarding and about getting logs for other Concourse components, see VM Logs. For information about common BOSH issues, see the BOSH tips.
