Monitoring and Troubleshooting
This topic broadly outlines techniques for troubleshooting your Concourse for VMware Tanzu installation.
Troubleshoot with Fly Commands
fly is the Concourse CLI. To get help on all fly commands, run:
In particular, the fly commands summarized below provide information that helps you troubleshoot Concourse environments and pipelines.
Troubleshoot Concourse for VMware Tanzu Environments
You can use the following fly commands to troubleshoot possible environment problems.
|fly Command||Short Description|
||Lists active containers, their type, and which worker they are running on.|
||Lists registered workers. This helps you verify that the number of containers does not exceed the maximum number allowable.|
||Removes a non-running worker. Stops Concourse from tracking an out-of-commission worker.|
||Lists active volumes and the worker on which they are located.|
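As a rough illustration of the worker check described above, the sketch below flags workers whose container count is approaching a limit. It assumes that `fly workers --json` returns a list of objects with `name` and `active_containers` fields, and that the 250-container limit quoted later in this topic applies per worker. The 80% threshold and the sample worker names are hypothetical.

```python
import json
import subprocess

# Assumption: the limit quoted in the "Container limit is reached" error applies per worker.
CONTAINER_LIMIT = 250


def workers_near_limit(workers, limit=CONTAINER_LIMIT, threshold=0.8):
    """Return (name, active_containers) for workers at or above threshold * limit."""
    cutoff = limit * threshold
    flagged = []
    for worker in workers:
        count = worker.get("active_containers", 0)
        if count >= cutoff:
            flagged.append((worker.get("name", "unknown"), count))
    # Busiest workers first.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


def fetch_workers(target):
    """Run fly and parse its JSON output. Requires a logged-in fly target."""
    result = subprocess.run(
        ["fly", "-t", target, "workers", "--json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Hypothetical sample data standing in for fetch_workers("ci").
    sample = [
        {"name": "worker-a", "active_containers": 231},
        {"name": "worker-b", "active_containers": 40},
    ]
    print(workers_near_limit(sample))  # prints [('worker-a', 231)]
```

Keeping the check separate from the fly call makes it possible to try the logic on saved output before pointing it at a live environment.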
You can use the following fly commands to troubleshoot possible pipeline problems.
|fly Command||Short Description|
||Lists configured pipelines.|
||Shows build history. This is useful for listing build IDs of one-off tasks run previously using
||Validates a pipeline's configuration without calling
||Checks for new versions. This is useful when developing a new resource.|
||Shows logs of in-progress builds.|
||Displays build steps for a running or recent build and optionally connects to one of the active containers.|
||Submits local tasks. This is useful for spinning up a task quickly to test it before putting it in a job.|
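A pipeline check along these lines can be scripted. The sketch below only builds the command lines and does not run them. It assumes the standard fly subcommands `pipelines`, `validate-pipeline`, `builds`, and `watch`. The target `ci`, pipeline `main`, job `unit`, and file `pipeline.yml` are hypothetical placeholders.

```python
import shlex


def fly_command(target, subcommand, *args):
    """Build an argv list for fly. Pass it to subprocess.run to execute it."""
    return ["fly", "-t", target, subcommand, *args]


def pipeline_checks(target, pipeline, job, config_path):
    """Return fly invocations for inspecting one job in one pipeline."""
    job_ref = f"{pipeline}/{job}"
    return [
        fly_command(target, "pipelines"),                            # list configured pipelines
        fly_command(target, "validate-pipeline", "-c", config_path),  # validate the local config
        fly_command(target, "builds", "-j", job_ref),                # build history for the job
        fly_command(target, "watch", "-j", job_ref),                 # stream the latest build log
    ]


if __name__ == "__main__":
    # Hypothetical target, pipeline, job, and config file names.
    for argv in pipeline_checks("ci", "main", "unit", "pipeline.yml"):
        print(shlex.join(argv))
```

Building argv lists rather than shell strings avoids quoting problems if a pipeline or job name contains unusual characters.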
Common Concourse Issues
The following sections describe some common problems and their solutions.
The worker is out of disk space
Problem: An error reports that a storage volume cannot be created, and might also report that permission was denied.
Solution: Increase persistent disk size for the worker or increase the number of worker VMs.
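Before resizing disks, it can help to see which workers hold the most volumes. The sketch below is one possible way to do that. It assumes that `fly volumes --json` returns a list of objects with a `worker_name` field. The volume count is only a rough proxy for disk usage, not a measurement of bytes used, and the sample data is hypothetical.

```python
from collections import Counter


def volumes_per_worker(volumes):
    """Count volumes by worker and return (worker_name, count), largest first."""
    counts = Counter(v.get("worker_name", "unknown") for v in volumes)
    return counts.most_common()


if __name__ == "__main__":
    # Hypothetical sample standing in for the parsed output of `fly -t ci volumes --json`.
    sample = [
        {"id": "v1", "worker_name": "worker-a"},
        {"id": "v2", "worker_name": "worker-a"},
        {"id": "v3", "worker_name": "worker-b"},
    ]
    for name, count in volumes_per_worker(sample):
        print(f"{name}: {count} volumes")
```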
Container limit is reached
Problem: The error "Cannot create container: limit of 250 containers reached" appears. This error is uncommon.
Solution: Increase the number of worker VMs, change the container placement strategy, or decrease gc_interval if it is set to a custom value. A large interval can leave too many expired containers waiting to be collected.
Job does not start
Problem: The build might get stuck in the Pending state.
Solution: Restart the ATC job. Log in as a root user on the Concourse web VM where the ATC job is located, and run the monit restart atc command.
Build fails when updating Concourse from a job on the same instance
Problem: A job running on a Concourse instance uses BOSH to deploy an update to that same instance, and the build fails, typically with a "worker for container not found" error.
Solution: This is expected behavior; the BOSH Director re-creates the worker VM. Run the job again.
BOSH cannot finish worker upgrade while tasks are running
Problem: BOSH is not able to restart the worker job to finalize the upgrade until all work is completed.
Solution: If you have a long-running task, wait for the task to be completed. If you need to upgrade quickly, cancel running tasks and jobs.
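If canceling is the chosen route, the running builds can be listed and turned into abort commands. The sketch below is a minimal example, not a definitive procedure. It assumes that `fly builds --json` returns objects with `id` and `status` fields, that running builds have the status `started`, and that `fly abort-build -b` accepts a build ID. The target name `ci` and the sample builds are hypothetical.

```python
def abort_commands(builds, target):
    """Return one fly abort-build argv list per build that is still running."""
    commands = []
    for build in builds:
        # Assumption: "started" marks a build that is currently running.
        if build.get("status") == "started":
            commands.append(
                ["fly", "-t", target, "abort-build", "-b", str(build["id"])]
            )
    return commands


if __name__ == "__main__":
    # Hypothetical sample standing in for the parsed output of `fly -t ci builds --json`.
    sample = [
        {"id": 101, "status": "succeeded"},
        {"id": 102, "status": "started"},
        {"id": 103, "status": "pending"},
    ]
    for argv in abort_commands(sample, "ci"):
        print(" ".join(argv))  # prints: fly -t ci abort-build -b 102
```

The function only prints the commands it would run, so the list can be reviewed before any build is actually aborted.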
Access Concourse Logs
You might need to contact VMware Global Support Services for help identifying a problem. In that case, support might ask you to send job log files.
Other Troubleshooting Resources
The following links provide other troubleshooting resources.
|Enabling syslog forwarding and getting logs for other Concourse components||VM Logs|
|Common BOSH issues||BOSH tips|