
Frequently Asked Questions



What's the fastest way to get started?

Provided you have Docker installed, the quickest way to run Concourse on a local machine is to paste this line into your terminal:

wget https://concourse-ci.org/docker-compose.yml && docker-compose up -d

This pulls the latest version of Concourse and runs it on your local machine. Concourse will be available at localhost:8080, and you can log in with the username test and password test.

Next, install Concourse's CLI component, fly, by downloading it from the web UI and target your local Concourse as the test user:

fly -t tutorial login -c http://localhost:8080 -u test -p test

From here, you can head to a Hello World tutorial to start learning how to put Concourse to work.
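As a first exercise, here is a minimal "hello world" pipeline sketch. The pipeline, job, task, and file names here are illustrative, not from any official tutorial:

```yaml
# hello.yml - a minimal single-job pipeline
jobs:
- name: hello-world
  plan:
  - task: say-hello
    config:
      platform: linux
      image_resource:
        type: registry-image
        source: {repository: busybox}
      run:
        path: echo
        args: ["Hello, world!"]
```

You could set and unpause it against the local Concourse with fly -t tutorial set-pipeline -p hello-world -c hello.yml followed by fly -t tutorial unpause-pipeline -p hello-world.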


Max Containers Error

After running smoothly for a period of time, Concourse sometimes gives a 'max containers reached' error and stops working. What gives? Support's current recommendation is to recreate the worker VM; is there a better solution?

The short answer is that recreating worker VMs is the fastest and easiest way to solve the problem and get back up and running. This bug has been fixed in later releases (v5.5.4 onward), so upgrading is the key to a long-term fix.

Why does Concourse do this?

Originally, Concourse could only garbage-collect containers that were tracked in the database. In some niche cases, containers and/or volumes can be created on the worker while the database (via the web node) assumes their creation failed. When this happens, the untracked containers pile up on the worker and consume resources, eventually leading to the 'max containers reached' error.

How did the team fix this in v5.5.4?

Concourse now garbage collects containers that aren't tracked in the database, cleaning up these orphaned containers before they can cause problems. See Issue #3600 for more details.


High Database CPU

I am currently running a pre-v6.0.0 version of Concourse and my database CPU is constantly under heavy load. What can I do to lower it if I cannot upgrade right away?

The v6.0.0 release included major improvements to the scheduling algorithm that greatly reduced database CPU, so the best way to solve DB load issues is to upgrade. However, if upgrading is not an option and the high database CPU is indeed caused by the scheduling algorithm, there are two temporary workarounds.

Verify the problem by running DB Queries

One way to find out whether the scheduling algorithm is the cause of high database CPU is to inspect the slowest queries running against your database. If the following queries appear among the slowest, the high CPU is most likely due to the scheduling algorithm:
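For example, if the pg_stat_statements extension is enabled on your PostgreSQL database, a query along these lines lists the most expensive statements. This is a sketch: the column names shown are for PostgreSQL 13+; older versions use total_time and mean_time instead.

```sql
-- Requires the pg_stat_statements extension to be enabled.
-- Column names are for PostgreSQL 13+; use total_time / mean_time on older versions.
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```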

-- query to fetch build inputs for a pipeline
SELECT v.id, v.check_order, r.id, i.build_id, i.name, b.job_id, b.status = 'succeeded' 
FROM build_resource_config_version_inputs i 
JOIN builds b ON b.id = i.build_id 
JOIN resource_config_versions v ON v.version_md5 = i.version_md5 
JOIN resources r ON r.id = i.resource_id 
WHERE r.resource_config_scope_id = v.resource_config_scope_id 
AND (r.id, v.version_md5) NOT IN (SELECT resource_id, version_md5 from resource_disabled_versions) 
AND v.check_order <> 0 AND r.pipeline_id = ?;

-- query to fetch resource versions for a pipeline
SELECT v.id, v.check_order, r.id 
FROM resource_config_versions v 
JOIN resources r ON r.resource_config_scope_id = v.resource_config_scope_id 
LEFT JOIN resource_disabled_versions d ON d.resource_id = r.id AND d.version_md5 = v.version_md5 
WHERE v.check_order <> 0 AND d.resource_id IS NULL AND d.version_md5 IS NULL AND r.pipeline_id = 1;

Which Workaround do I need?

First Workaround - Delete and Re-set Pipelines

The first workaround requires deleting and re-setting pipelines, which will wipe out build history and resource version history. It works by reducing the number of rows in the tables that make the scheduling algorithm's queries slow, which is the main cause of the high database CPU. It is only recommended if keeping existing build and resource version history is not important. Use the following steps to implement this workaround.

  1. Identify busy pipelines, i.e. those with a lot of resource versions and build history. There are a few ways to figure out which pipelines count as "busy"; here are some recommended ones:

    • Manually figure out which pipelines have a lot of builds or resource versions by hopping into the Concourse UI and looking at the jobs in each pipeline. More than (for example) 1000 builds or 100 resource versions per job is a reasonable threshold for calling a pipeline busy.
    • Determine which pipelines have a lot of build inputs using the query below. Build inputs are the real culprit behind the scheduling algorithm's slowness, and this query fetches the number of build inputs per pipeline, ordered from most to least.

    SELECT p.id as pipeline_id, p.name as pipeline_name, t.name as team_name, c.build_input_versions 
    FROM pipelines p 
    JOIN teams t ON p.team_id = t.id 
    LEFT JOIN (
      SELECT b.pipeline_id, count(*) as build_input_versions 
        FROM build_resource_config_version_inputs i 
        JOIN builds b ON b.id = i.build_id 
        GROUP BY b.pipeline_id ORDER BY count(*) DESC
        ) c
    ON c.pipeline_id = p.id
    ORDER BY c.build_input_versions DESC;
    
    • Another way to find busy pipelines is to look for pipelines with a lot of builds. This query lists all pipelines that have more than 1000 builds.

    SELECT p.id as pipeline_id, p.name as pipeline_name, t.name as team_name, b.num_builds
    FROM pipelines p
    JOIN teams t ON p.team_id = t.id 
    JOIN (
        SELECT b.pipeline_id, count(*) AS num_builds
        FROM builds b
        GROUP BY b.pipeline_id
        HAVING count(*) > 1000
    ) b
    ON b.pipeline_id = p.id
    ORDER BY b.num_builds DESC; 
    
  2. Once you have your list of busy pipelines, you can now pause them. You can either do this through fly by manually running fly -t <target> pause-pipeline -p <pipeline-name> within the appropriate teams or you can pause the pipelines through the database using the following query:

UPDATE pipelines
SET paused = true
WHERE id IN (<list of busy pipeline ids>);
  3. After all the busy pipelines have been paused, rename them so the original names can be re-used for the new pipelines. For example, a renamed pipeline might become <pipeline-name>-old. These paused, renamed pipelines exist only to keep the build history around. You can rename them through fly by running fly -t <target> rename-pipeline --old-name <pipeline-name> --new-name <pipeline-name>-old or through the database using the following query:
UPDATE pipelines
SET name = name || '-old'
WHERE id IN (<list of busy pipeline ids>);
  4. Set and unpause new pipelines using the same pipeline configs as the busy pipelines. These pipelines will continue to run workloads, but without all the past build and resource version history.
  5. The last step is to destroy all the paused pipelines. You can do so through fly by destroying each pipeline with fly -t <target> destroy-pipeline -p <renamed-pipeline-name> or through the database using the following query:
DELETE FROM pipelines
WHERE id IN (<list of busy pipeline ids>);
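Taken together, steps 2 through 5 can be sketched as a dry run in the shell. This is a sketch, not an official procedure: the target name example, the pipeline names, and the <pipeline>.yml config paths are placeholder assumptions, and the script only prints the fly commands rather than executing them (in practice you would verify the new pipelines are healthy before destroying the old ones):

```shell
# Dry run: print the fly commands for steps 2-5 of the first workaround.
# "example", the pipeline names, and the *.yml config paths are placeholders.
workaround_commands() {
  target=$1; shift
  for p in "$@"; do
    echo "fly -t $target pause-pipeline -p $p"                            # step 2: pause
    echo "fly -t $target rename-pipeline --old-name $p --new-name $p-old" # step 3: rename
    echo "fly -t $target set-pipeline -p $p -c $p.yml"                    # step 4: re-set...
    echo "fly -t $target unpause-pipeline -p $p"                          # ...and unpause
    echo "fly -t $target destroy-pipeline -p $p-old"                      # step 5: destroy old
  done
}

workaround_commands example ci deploy
```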

Second Workaround - Manually Delete Resource Versions

This workaround involves manually deleting old resource versions from the database, so it requires database access. To implement it, the only step is to run the following query. The suggested value of max_versions is 100, meaning the 100 most recent versions of every resource are kept and all older versions are deleted. You can adjust this by changing the number passed to the last_check_order_to_retain function.

DROP FUNCTION IF EXISTS last_check_order_to_retain;
CREATE FUNCTION last_check_order_to_retain(max_versions integer) RETURNS TABLE (rcs integer, check_order_retain integer, result integer, deleted_versions integer) as
$$
DECLARE
    r RECORD;

BEGIN

    FOR r IN
        SELECT count(*) a, resource_config_scope_id 
        FROM resource_config_versions 
        WHERE check_order != 0 
        GROUP BY resource_config_scope_id 
        HAVING count(*) > max_versions 
        ORDER BY a DESC
    LOOP
        rcs := r.resource_config_scope_id;
        result := r.a;

        SELECT check_order INTO check_order_retain 
        FROM resource_config_versions
        WHERE resource_config_scope_id = r.resource_config_scope_id AND check_order != 0
        ORDER BY check_order DESC
        OFFSET max_versions LIMIT 1;
        WITH deleted AS (
            DELETE FROM resource_config_versions
            WHERE resource_config_scope_id = r.resource_config_scope_id and check_order < check_order_retain and check_order != 0 and version not in (
                SELECT version
                FROM resource_pins
                WHERE resource_id in (
                    SELECT resource_id
                    FROM resource_config_scopes
                    WHERE id = r.resource_config_scope_id
                )
            ) RETURNING *)
        SELECT count(*) INTO deleted_versions FROM deleted;     
        RETURN NEXT;
    END LOOP;
END;
$$ language plpgsql;

-- You can change the number passed in as a parameter; 100 is the suggested value.
select * from last_check_order_to_retain(100);