Monitoring and Troubleshooting

This topic outlines techniques for troubleshooting your Concourse for VMware Tanzu installation.


Troubleshoot with Fly Commands

fly is the Concourse CLI. To get help on all fly commands, run:

fly --help

In particular, the fly commands and summaries listed below provide useful information to help you troubleshoot Concourse environments and pipelines.


Troubleshoot Concourse for VMware Tanzu Environments

You can use the following fly commands to troubleshoot possible environment problems.

fly Command      Description
containers       Lists active containers, their type, and the worker each one is running on.
workers          Lists registered workers. Use the container count shown for each worker to verify that it does not exceed the maximum allowed.
prune-worker     Removes a non-running worker. Stops Concourse from tracking an out-of-commission worker.
volumes          Lists active volumes and the worker on which each one is located.
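
As a rough sketch, the listing commands in the table can be combined into a single environment overview. The function name concourse_env_report is hypothetical, and the target name you pass to it is assumed to have been saved earlier with fly login.

```shell
# Sketch: print an environment overview for one Concourse target.
# Usage: concourse_env_report TARGET
# TARGET is a placeholder for a target name saved with `fly login`.
concourse_env_report() {
  target="$1"
  if [ -z "$target" ]; then
    echo "usage: concourse_env_report TARGET" >&2
    return 1
  fi
  echo "== workers =="
  fly -t "$target" workers
  echo "== containers =="
  fly -t "$target" containers
  echo "== volumes =="
  fly -t "$target" volumes
}
```

For example, concourse_env_report ci prints all three listings for a target named ci. If the workers listing shows a worker that is no longer running, fly -t ci prune-worker -w WORKER_NAME removes it.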

Troubleshoot Pipelines

You can use the following fly commands to troubleshoot possible pipeline problems.

fly Command         Description
pipelines           Lists configured pipelines.
builds              Shows build history. This is useful for listing the build IDs of one-off tasks previously run with execute.
validate-pipeline   Validates a pipeline's configuration without calling set-pipeline.
check-resource      Checks for new versions of a resource. This is useful when developing a new resource.
watch               Shows logs of in-progress builds.
intercept           Displays the build steps for a running or recent build and optionally connects to one of the active containers.
execute             Submits local tasks. This is useful for spinning up a task quickly to test it before putting it in a job.
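
One way to use validate-pipeline is as a gate in front of set-pipeline. The sketch below is illustrative only: safe_set_pipeline is a hypothetical helper, and the target, pipeline name, and file path are placeholders.

```shell
# Sketch: validate a pipeline file and apply it only if validation passes.
# Usage: safe_set_pipeline TARGET PIPELINE CONFIG_FILE
safe_set_pipeline() {
  target="$1"
  pipeline="$2"
  config="$3"
  # validate-pipeline works on the local file and needs no target.
  if ! fly validate-pipeline -c "$config"; then
    echo "validation failed; not setting $pipeline" >&2
    return 1
  fi
  fly -t "$target" set-pipeline -p "$pipeline" -c "$config"
}
```

After the pipeline is set, fly -t TARGET watch -j PIPELINE/JOB follows a running build of a job, and fly -t TARGET intercept -j PIPELINE/JOB opens a shell in one of its containers.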

Common Concourse Issues

The following shows some common problems and solutions.

The worker is out of disk space

Problem: An error reports that a storage volume cannot be created. The error might also state that permission was denied.

Solution: Increase persistent disk size for the worker or increase the number of worker VMs.

Container limit is reached

Problem: The error "Cannot create container: limit of 250 containers reached" appears. This error state is unlikely to appear.

Solution: Try one or more of the following: increase the number of worker VMs, change the container placement strategy, or decrease gc_interval if it is set to a custom value. A large interval can leave too many expired containers on the workers.
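
To see how close each worker is to the limit, the output of fly workers can be filtered by container count. This is a minimal sketch that assumes the worker name is in the first column and the container count in the second; verify the column order against your fly version. The function name and the default threshold of 200 are arbitrary choices for illustration.

```shell
# Sketch: list workers whose container count is at or above a threshold.
# Assumes `fly workers` prints the worker name in column 1 and the
# container count in column 2; check this against your fly version.
# Usage: workers_near_limit TARGET [THRESHOLD]   (THRESHOLD defaults to 200)
workers_near_limit() {
  target="$1"
  threshold="${2:-200}"
  # Rows whose second field is not a number (such as a header) are skipped.
  fly -t "$target" workers |
    awk -v max="$threshold" '$2 ~ /^[0-9]+$/ && $2 + 0 >= max + 0 { print $1, $2 }'
}
```

Running workers_near_limit ci 240 prints the name and container count of each worker on target ci that holds 240 or more containers, and prints nothing if no worker is that full.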

Job does not start

Problem: The build might get stuck in the Pending state.

Solution: Restart the ATC job. Log in as a root user on the Concourse web VM where the ATC job is located, and then run the monit restart atc command.
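
The restart can be wrapped with a before-and-after status check, as in the sketch below. It assumes you already have a root shell on the web VM; restart_atc is a hypothetical helper, and the five-second default pause is an arbitrary value.

```shell
# Sketch: restart the ATC job and show monit's view of it before and after.
# Run as root on the Concourse web VM (for example after `sudo -i`).
# Usage: restart_atc [WAIT_SECONDS]
restart_atc() {
  monit summary          # state of all monitored jobs before the restart
  monit restart atc      # the restart command named in the solution
  sleep "${1:-5}"        # give monit a few seconds; the delay is arbitrary
  monit summary          # check whether atc is reported as running again
}
```

If the second summary still shows the job as not running, repeat monit summary after a short wait before taking further action.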

Build fails when updating Concourse from a job on the same instance

Problem: A job running on a Concourse instance triggers a BOSH deployment that updates that same instance, and the build fails. Typically the job fails with a "worker for container not found" error.

Solution: This is expected behavior; the BOSH Director re-creates the worker VM. Run the job again.

BOSH cannot finish worker upgrade while tasks are running

Problem: BOSH is not able to restart the worker job to finalize the upgrade until all work is completed.

Solution: If you have a long-running task, wait for it to finish. If you need to upgrade quickly, cancel the running tasks and jobs.
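
Running builds can be canceled from the command line with fly abort-build. The following sketch aborts several builds of one job in a row; abort_job_builds is a hypothetical helper, and the target, job, and build names are placeholders you would take from the output of fly builds.

```shell
# Sketch: abort one or more builds of a job so the worker can finish draining.
# Usage: abort_job_builds TARGET PIPELINE/JOB BUILD_NAME [BUILD_NAME...]
abort_job_builds() {
  if [ "$#" -lt 3 ]; then
    echo "usage: abort_job_builds TARGET PIPELINE/JOB BUILD_NAME..." >&2
    return 1
  fi
  target="$1"
  job="$2"
  shift 2
  for build in "$@"; do
    # Stop at the first failure so the remaining builds are not touched.
    fly -t "$target" abort-build -j "$job" -b "$build" || return 1
  done
}
```

For example, abort_job_builds ci demo/deploy 41 42 aborts builds 41 and 42 of the deploy job in the demo pipeline on target ci.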


Access Concourse Logs

You might need to contact VMware Global Support Services for help identifying a problem. In that case, support might ask you to send job log files.

For general information about accessing log files, see Location and use of logs in the BOSH documentation and Advanced Troubleshooting with the BOSH CLI.
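
As an illustration, log bundles can be downloaded with the bosh logs command. In this sketch the deployment name, instance group name, and destination directory are all placeholders, and fetch_concourse_logs is a hypothetical helper; use the names that bosh deployments and bosh instances report for your environment.

```shell
# Sketch: download log bundles for one instance group of a BOSH deployment.
# Usage: fetch_concourse_logs DEPLOYMENT GROUP [DEST_DIR]
# DEST_DIR defaults to ./concourse-logs (an arbitrary choice).
fetch_concourse_logs() {
  deployment="$1"
  group="$2"
  dest="${3:-./concourse-logs}"
  if [ -z "$deployment" ] || [ -z "$group" ]; then
    echo "usage: fetch_concourse_logs DEPLOYMENT GROUP [DEST_DIR]" >&2
    return 1
  fi
  mkdir -p "$dest" || return 1
  bosh -d "$deployment" logs "$group" --dir "$dest"
}
```

The downloaded archives in the destination directory are the files you would attach to a support request.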


Other Troubleshooting Resources

The following links provide other troubleshooting resources.

Topic                                                                              Link
Enabling syslog forwarding and getting logs for other Concourse components       VM Logs
Common BOSH issues                                                                 BOSH tips