Monitoring and Troubleshooting Your Installation

This topic broadly outlines techniques for troubleshooting your Concourse OLM installation.

Concourse OLM Log Collection

Logs for a specific job are stored on the VM where that job ran, at /var/vcap/sys/log/<CONCOURSE-JOB-NAME>/*.log.
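As a minimal sketch of how you might read those logs after connecting to the VM, the function below prints the last lines of every log file for one job. The function name show_job_logs and the LOG_ROOT override are hypothetical and exist only for illustration; the default directory is the one given above.

```shell
# Hypothetical helper: print the tail of every log file for one job.
# $1 = job name, $2 = number of lines to show (defaults to 20).
# LOG_ROOT defaults to the directory named in this topic; override it for testing.
# The body runs in a subshell ( ... ) so its variables do not leak into your session.
show_job_logs() (
  job="$1"
  lines="${2:-20}"
  root="${LOG_ROOT:-/var/vcap/sys/log}"
  for f in "$root/$job"/*.log; do
    [ -e "$f" ] || continue   # skip when the glob matches nothing
    echo "==> $f"
    tail -n "$lines" "$f"
  done
)
```

For example, show_job_logs atc 50 would show the last 50 lines of each log, assuming the job's log directory is named atc on that VM.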

Troubleshooting With Fly Commands

Concourse OLM environment troubleshooting:

containers: Lists active containers. Confirms which container or task was placed on which worker.
workers: Lists registered workers. Helps you verify that the number of containers on a worker does not exceed the maximum allowed.
prune-worker: Reaps a non-running worker. Stops Concourse OLM from tracking an out-of-commission worker.
volumes: Lists active volumes. Helps you check disk usage across workers.
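The environment commands above can be run together for a quick snapshot of worker health. This is a sketch under stated assumptions, not an official procedure: the function name worker_snapshot is hypothetical, and the target name is assumed to have been saved earlier with fly login.

```shell
# Hypothetical helper: print workers, containers, and volumes for one fly target.
# $1 = fly target name (assumed to exist from an earlier `fly login`).
worker_snapshot() {
  echo "== workers =="
  fly -t "$1" workers
  echo "== containers =="
  fly -t "$1" containers
  echo "== volumes =="
  fly -t "$1" volumes
}
```

For example, run worker_snapshot MY-TARGET. To stop tracking a worker that is no longer running, run fly -t MY-TARGET prune-worker -w WORKER-NAME, where both names are placeholders.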

Pipeline troubleshooting:

pipelines: Lists configured pipelines.
builds: Shows build history. Useful for getting the build IDs of one-off tasks that you ran using execute.
validate-pipeline: Validates a pipeline’s configuration. Checks a pipeline for validity without calling set-pipeline.
check-resource: Checks for new versions of a resource. Useful when developing a new resource.
watch: Shows the logs of in-progress builds.
intercept: Accesses the steps of a running or recent build.
execute: Submits local tasks. Useful for quickly testing a task before putting it in a job.
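As one way to combine the pipeline commands above when a job misbehaves, the sketch below streams the job's most recent build and then opens a shell in one of its steps. The function name inspect_job and all argument values are hypothetical placeholders.

```shell
# Hypothetical helper: stream the latest build of a job, then open a shell in one step.
# $1 = fly target, $2 = PIPELINE/JOB, $3 = step name. All three are placeholders.
inspect_job() {
  fly -t "$1" watch -j "$2"               # follow the logs of the job's latest build
  fly -t "$1" intercept -j "$2" -s "$3"   # enter the container for the named step
}
```

For example, run inspect_job MY-TARGET MY-PIPELINE/MY-JOB MY-STEP. To check a pipeline file before setting it, run fly -t MY-TARGET validate-pipeline -c pipeline.yml, and to force a resource check, run fly -t MY-TARGET check-resource -r MY-PIPELINE/MY-RESOURCE.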

Common Concourse OLM Issues

Problem: The worker is out of disk space
Error: An error appears about an inability to create a volume. It may say that permissions are denied.
Solution: Increase the persistent disk for the worker, or increase the number of worker VMs.
Other information: N/A

Problem: Container limit reached
Error: Cannot create container: limit of 250 containers reached
Solution: Check fly containers. Increase the number of worker VMs. Decrease gc_interval if it is set to a custom value (a large interval could mean that expired containers are kept long enough to build up).
Other information: This error state is unlikely to appear.

Problem: Job doesn’t start
Error: This error may present as the build getting stuck in the Pending state.
Solution: Restart the ATC job. To restart the ATC, log in as a root user on the Concourse OLM web VMs (the ones on which the ATC job is located), then run monit restart atc.
Other information: N/A

Problem: Updating Concourse OLM in a Concourse OLM job fails
Error: When a build fails after BOSH deploys a Concourse OLM update from a job running on that Concourse OLM instance, the job typically errors with “worker for container not found.”
Solution: This is expected behavior; the BOSH Director will recreate the worker VM. Run the job again.
Other information: N/A

Problem: BOSH can’t finish a worker upgrade while tasks are running
Error: If you have a long-running task, BOSH cannot finalize the upgrade by restarting the worker job until all of the work has stopped.
Solution: Wait for the work to complete. If you need to finish the upgrade quickly, cancel running tasks and jobs.
Other information: N/A
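For the container limit problem, it can help to see how containers are spread across workers. The sketch below counts rows per worker from fly containers output. It assumes the output is whitespace-separated, has no header row when piped, and carries the worker name in the second column; confirm this against your fly version before relying on it. The function name count_by_worker is hypothetical.

```shell
# Hypothetical helper: count containers per worker from rows read on stdin.
# Assumption: each row is whitespace-separated and column 2 is the worker name.
count_by_worker() {
  awk '{ n[$2]++ } END { for (w in n) print n[w], w }' | sort -rn
}
```

For example, run fly -t MY-TARGET containers | count_by_worker, where MY-TARGET is a placeholder. A worker whose count approaches the limit in the error message is a candidate for the solutions listed for that problem.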

Other Troubleshooting Resources

For information about metrics for monitoring Concourse, see Metrics. For information about enabling syslog forwarding and about getting logs for other Concourse components, see VM Logs. For information about common BOSH issues, see the BOSH tips.
