Frequently Asked Questions
What's the fastest way to get started?
If you have Docker installed, the quickest way to run Concourse on a local machine is to paste this line into your terminal:
```shell
docker run -p 8080:8080 concourse/concourse quickstart
```
This pulls the latest version of Concourse and runs it on your local machine. Concourse will be running at localhost:8080, and you can log in with the username test and password test.
Next, install Concourse's CLI component, fly, by downloading it from the web UI, then target your local Concourse as the test user:
```shell
fly -t tutorial login -c http://localhost:8080 -u test -p test
```
From here, you can head to a Hello World tutorial to start learning how to put Concourse to work.
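As a preview of what such a tutorial covers, a minimal hello-world pipeline config might look like the sketch below (the job, task, and image names are illustrative, not from any particular tutorial):

```yaml
jobs:
- name: hello-world
  plan:
  - task: say-hello
    config:
      platform: linux
      image_resource:
        type: registry-image
        source: {repository: busybox}
      run:
        path: echo
        args: ["Hello, world!"]
```

Saving this as a file and running fly set-pipeline with it is typically the first exercise in such tutorials.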
Max Containers Error
After running smoothly for a period of time, Concourse sometimes gives a 'max containers reached' error and stops working. What gives? Support's current recommendation is to recreate the worker VM; is there a better solution?
The short answer is that recreating worker VMs is the fastest and easiest way to solve the problem and get back up and running. This bug has been fixed in later releases (v5.5.4 onward), so upgrading is the key to a long-term fix.
Why does Concourse do this?
Originally, Concourse could only garbage-collect containers that were tracked in the database. In some niche cases, it is possible for containers and/or volumes to be created on the worker, but the database (via the web node) assumes their creation has failed. If this occurs, these untracked containers can pile up on the worker and use resources, eventually leading to the 'max containers reached' error.
How did the team fix this in v5.5.4?
Concourse now garbage collects containers that aren't tracked in the database, cleaning up these orphaned containers before they can cause problems. See Issue #3600 for more details.
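If you want to see how close your workers are to the limit, fly can show per-worker container counts (the target name below is a placeholder for your own fly target):

```shell
# Lists registered workers along with their current container counts;
# a worker whose count sits at the configured maximum is the one
# triggering the 'max containers reached' error.
fly -t <target> workers
```

Watching this output over time makes it easy to spot a worker that is accumulating orphaned containers.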
High Database CPU
I am currently running a version of Concourse that is pre-v6.0.0, and my database CPU is constantly under heavy load. What can I do to lower it if I cannot upgrade right away?
The v6.0.0 release included major improvements to the scheduling algorithm that significantly reduced database CPU, so the best way to solve DB load issues is to upgrade. However, if upgrading is not an option and the high database CPU is indeed caused by the scheduling algorithm, there are two temporary workarounds.
Verify the problem by running DB Queries
One way to find out if the scheduling algorithm is the cause of high database CPU is to run a few queries in your database. After connecting to your database, you can run the query below to list the slowest queries. If scheduling-related queries (such as fetching build inputs or resource versions for a pipeline) appear among the results, the high CPU is most likely due to the scheduling algorithm:
```sql
-- Assumes the pg_stat_statements extension is enabled on your database;
-- column names may differ slightly between Postgres versions.
SELECT substring(query, 1, 120) AS query,
       calls,
       round(total_time::numeric, 2) AS total_time_ms,
       round(mean_time::numeric, 2)  AS mean_time_ms
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 20;
```
Which Workaround do I need?
- If the slowest query is the one to fetch build inputs for a pipeline, the first workaround, Delete and Re-Set Pipelines is recommended.
- If the slowest query is the query to fetch all the resource versions in a pipeline, the second workaround, Manually Delete Resource Versions, is recommended.
First Workaround - Delete and Re-set Pipelines
The first workaround requires deleting and re-setting pipelines, meaning it will wipe out build history and resource version history. This workaround essentially reduces the number of rows in the tables that are causing the scheduling algorithm queries to be slow, which is the main cause of the high database CPU. It is only recommended if it is not important to keep around existing build and resource version history. You can use the following steps to implement this workaround.
- Identify busy pipelines that have a lot of resource versions and build history. There are a few recommended ways to figure out which pipelines are considered "busy":
- Manually figure out which pipelines have a lot of builds or resource versions. You can do so by hopping into the Concourse UI and taking a look at the jobs in each pipeline. If a pipeline has more than (for example) 1000 builds or 100 resource versions, that is a good threshold for considering it busy.
- Determine which pipelines have a lot of build inputs from the result of this query. Build inputs are the real culprit behind the slowness of the scheduling algorithm, and this query fetches the number of build inputs per pipeline, ordered from most to least:

```sql
SELECT p.id AS pipeline_id, p.name AS pipeline_name, t.name AS team_name, c.build_input_versions
FROM pipelines p
JOIN teams t ON p.team_id = t.id
LEFT JOIN (
    SELECT b.pipeline_id, count(*) AS build_input_versions
    FROM build_resource_config_version_inputs i
    JOIN builds b ON b.id = i.build_id
    GROUP BY b.pipeline_id
    ORDER BY count(*) DESC
) c ON c.pipeline_id = p.id
ORDER BY c.build_input_versions DESC;
```

- Another way to find busy pipelines is by finding pipelines that have a lot of builds. This query will list all the pipelines that have more than 1000 builds:

```sql
SELECT p.id AS pipeline_id, p.name AS pipeline_name, t.name AS team_name, b.num_builds
FROM pipelines p
JOIN teams t ON p.team_id = t.id
JOIN (
    SELECT b.pipeline_id, count(*) AS num_builds
    FROM builds b
    GROUP BY b.pipeline_id
    HAVING count(*) > 1000
) b ON b.pipeline_id = p.id
ORDER BY b.num_builds DESC;
```
- Once you have your list of busy pipelines, you can pause them. You can either do this through fly by manually running fly -t <target> pause-pipeline -p <pipeline-name> within the appropriate teams, or you can pause the pipelines through the database using the following query:
```sql
-- Replace <busy-pipeline-ids> with the ids of the pipelines found above.
UPDATE pipelines
SET paused = true
WHERE id IN (<busy-pipeline-ids>);
```
- After all the busy pipelines have been paused, you now need to rename them so that the original names can be reused for the new pipelines. For example, a renamed pipeline might be set to <pipeline-name>-old. These paused and renamed pipelines will exist in order to keep around the build history. You can do so through fly by manually renaming each pipeline with fly -t <target> rename-pipeline --old-name <pipeline-name> --new-name <pipeline-name>-old
or you can rename the pipelines through the database using the following query:
```sql
-- Replace <busy-pipeline-ids> with the ids of the paused pipelines.
UPDATE pipelines
SET name = name || '-old'
WHERE id IN (<busy-pipeline-ids>);
```
- Set and unpause new pipelines using the same pipeline configs of the busy pipelines. These will be the pipelines that will continue to run workloads but without all the past histories of builds and resource versions.
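For example, re-creating a pipeline under its original name with fly might look like this (the target, pipeline name, and config file name are placeholders):

```shell
# Set the new pipeline from the saved config of the old busy pipeline,
# then unpause it so it starts scheduling again with a clean history.
fly -t <target> set-pipeline -p <pipeline-name> -c pipeline.yml
fly -t <target> unpause-pipeline -p <pipeline-name>
```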
- The last step is to destroy all the paused pipelines. You can do so through fly by manually destroying each pipeline with fly -t <target> destroy-pipeline -p <renamed-pipeline-name>
or you can destroy the pipelines through the database using the following query:
```sql
-- Replace <renamed-pipeline-ids> with the ids of the renamed pipelines.
-- Assumes dependent rows are removed via cascading foreign keys;
-- take a database backup before running this.
DELETE FROM pipelines
WHERE id IN (<renamed-pipeline-ids>);
```
Second Workaround - Manually Delete Resource Versions
This workaround involves manually deleting resource versions from the database. It requires access to the database and will remove a number of old resource versions. To implement it, the only step is to run the following query. The suggested max_versions number is 100, which means the 100 most recent versions of every resource are kept and all versions older than the 100th are deleted. The max_versions number can be adjusted by modifying the number passed into the last_check_order_to_retain function.
```sql
-- NOTE: the original query (which uses a helper function named
-- last_check_order_to_retain) is not reproduced here. The sketch below is
-- an equivalent approach under an assumed schema: it keeps the newest 100
-- versions per resource config scope (max_versions = 100) and deletes the
-- rest. Verify table and column names against your schema, and take a
-- database backup before running it.
DELETE FROM resource_config_versions v
USING (
    SELECT id,
           row_number() OVER (
               PARTITION BY resource_config_scope_id
               ORDER BY check_order DESC
           ) AS rn
    FROM resource_config_versions
) ranked
WHERE v.id = ranked.id
  AND ranked.rn > 100;
```