Configuring Installation Values for VMware Spring Cloud® Data Flow for Kubernetes

This topic describes how to configure the VMware Spring Cloud® Data Flow for Kubernetes (SCDF for Kubernetes) installation resources before deploying to the Kubernetes cluster. Before configuring the installation resources, ensure that you have followed the steps described in Preparing to Install SCDF for Kubernetes and Installing Command-Line Tools.

Overview

Configuration of SCDF for Kubernetes involves a number of features:

  • Support for multiple deployment environments
  • Application properties configuration
  • Kubernetes resources configuration
  • Support for air-gapped environments

For information about these features and the steps to configure them, see the following sections.

Support for Multiple Deployment Environments

The SCDF for Kubernetes installation process uses Kustomize, a tool for customizing Kubernetes resource configuration. For each resource, Kustomize takes a base configuration file and applies patch files that modify the base configuration to target different deployment environments. Kustomize is built into the kubectl CLI.

SCDF for Kubernetes provides Kustomize configuration files for a development environment and for a production environment. The development environment is intended for quick exploration of SCDF for Kubernetes. Its configuration provisions a RabbitMQ message broker and a PostgreSQL database.

Directory Structure

The “SCDF for Kubernetes installation files” archive, downloaded from VMware Tanzu Network, contains the following directory structure:

  • bin: Scripts for installing to the development environment, as well as a script for performing image relocation
  • apps: Kubernetes resource files and application configuration files to support deployment to multiple environments
  • services: Kubernetes resource files for deploying RabbitMQ and PostgreSQL to the development environment

The apps directory contains the following directory structure, based on Kustomize naming conventions.

.
├── data-flow
│   ├── images
│   ├── kustomize
│   │   ├── base
│   │   └── overlays
│   └── schemas
├── ingress
│   └── kustomize
│       ├── base
│       └── overlays
└── skipper
    ├── images
    ├── kustomize
    │   ├── base
    │   └── overlays
    └── schemas

Each application included as part of SCDF for Kubernetes has a directory: the data-flow directory contains the Data Flow server, and the skipper directory contains the Skipper server. The ingress directory contains configuration files for an Ingress resource.

Each of the two application directories contains the following directory structure:

  • images: Container images (only present if you have downloaded and extracted the “SCDF for Kubernetes installation images” archive)
  • kustomize: Kubernetes and application configuration files for use with Kustomize
  • schemas: Database schemas (you can install these manually if you do not want each server to install them on startup)

For each application, you can edit files in the kustomize directory to configure the application and Kubernetes resources. The contents of the directory are shown below.

.
├── base
│   ├── deployment.yaml
│   ├── kustomization.yaml
│   ├── role-binding.yaml
│   ├── roles.yaml
│   ├── service-account.yaml
│   └── service.yaml
└── overlays
    ├── dev
    │   ├── application.yaml
    │   ├── deployment-patch.yaml
    │   ├── kustomization.yaml
    │   └── service-patch.yaml
    └── production
        ├── application.yaml
        ├── deployment-patch.yaml
        ├── kustomization.yaml
        └── service-patch.yaml

The base Directory

The base directory contains Kubernetes resource files with default configuration values. These files are then patched by files in the overlays directory. The kustomization.yaml file references the other YAML files in the directory.

The overlays Directory

The overlays directory contains one subdirectory for each deployment environment: a dev subdirectory and a production subdirectory. Within each subdirectory, a kustomization.yaml file references the kustomization.yaml file in the base directory, and adds additional patches to modify values that are unique to the target deployment environment.

You can set the target namespace to deploy, and add labels and name prefixes, by adding additional configuration fields to the kustomization.yaml file. See the Kustomize documentation for information about the additional fields that are available.

Configuration Steps

Within each environment-specific subdirectory in the overlays directory, the application.yaml file is a Spring Boot configuration file that you can edit to configure application (the Data Flow server or Skipper server). For example, you can edit application.yaml to configure spring.datasource properties or spring.cloud.dataflow.features properties.

Also within each environment-specific subdirectory, the deployment-patch.yaml and service-patch.yaml files are used to patch the base Kubernetes resource files, substituting values that are appropriate for the specific environment. For example, you might configure memory or CPU allocations and ingress configuration using one set of values for your dev environment and another for your production environment.

For information about how to configure specific settings, see the following sections.

Database Configuration

You can customize database configuration settings in the application.yaml and deployment-patch.yaml files. These files are preconfigured to connect to a PostgreSQL database, which can be installed either by using the manifests provided in services/dev/postgresql or by installing the bitnami/postgresql Helm chart.

Database Connections Settings

To use a different database, you must replace or remove the secret volume and volumeMount in the deployment-patch.yaml file. If you wish to replace the secret, VMware recommends that you provide database-password as the path for the password key. This minimizes the changes needed in the application.yaml file.

You must modify database settings for both the skipper application and the data-flow application. The following example uses a secret named postgresql with a postgresql-password key. (You can specify additional keys that are available in the secret, as long as they reference the paths that you specify in the application.yaml file.)

      containers:
      - name: skipper
        volumeMounts:
          - name: database
            mountPath: /workspace/runtime/secrets/database
            readOnly: true
  ...

      volumes:
        - name: database
          secret:
            secretName: postgresql
            items:
            - key: postgresql-password
              path: database-password

In the application.yaml file, you must edit settings under spring.datasource. Update the url, username, password, and driverClassName values to match your database settings, as shown in the example below.

spring:

  ...

  datasource:
    url: jdbc:postgresql://database-host:5432/dataflow-db
    username: postgres
    password: ${database-password}
    driverClassName: org.postgresql.Driver
    testOnBorrow: true
    validationQuery: "SELECT 1"

Initialize the Database Schema

To initialize your database schema, you can use the DDL scripts provided in apps/skipper/schemas and apps/data-flow/schemas. SCDF for Kubernetes provides DDL scripts for IBM DB2, MySQL, Oracle, PostgreSQL, and Microsoft SQL Server. If applying multiple files, ensure that you apply them in the correct order (the V1- file first, then the V2- file, then the V3- file, and so on).

If you manually initialize the schema for the database, you can disable automatic database initialization for the Skipper and Data Flow server applications. Add the following settings to the application.yaml files for the applications:

spring:

  ...

  jpa:
    hibernate:
      ddlAuto: none
  flyway:
    enabled: false

Add a JDBC Driver (Optional)

The published application artifacts for SCDF for Kubernetes include JDBC drivers for MySQL (via the MariaDB driver), HSQLDB, PostgreSQL, and embedded H2. If you wish, you can instead use your own JDBC driver.

To add a JDBC driver, perform the following steps:

  1. Download the “SCDF Pro Server jar file” release artifact from VMware Tanzu Network.

  2. Download the server JAR files for Skipper and the Composed Task Runner:

    $ wget https://repo1.maven.org/maven2/org/springframework/cloud/spring-cloud-skipper-server/2.6.1/spring-cloud-skipper-server-2.6.1.jar
    $ wget https://repo1.maven.org/maven2/org/springframework/cloud/spring-cloud-dataflow-composed-task-runner/2.7.1/spring-cloud-dataflow-composed-task-runner-2.7.1.jar
    
  3. Within each application directory (the apps/data-flow and apps/skipper directories), create a directory for the new JDBC driver:

    $ mkdir -p BOOT-INF/lib
    
  4. Copy your JDBC driver jar to the new directories.

  5. Add the driver to the downloaded JAR files:

    $ jar -u0f spring-cloud-skipper-server-2.6.1.jar BOOT-INF/lib/*.jar
    $ jar -u0f scdf-pro-server-1.1.2.jar BOOT-INF/lib/*.jar
    $ jar -u0f spring-cloud-dataflow-composed-task-runner-2.7.1.jar BOOT-INF/lib/*.jar
    
  6. Create new container images using the updated server JAR files. The easiest way to build new container images is to use the pack CLI utility. Use a base image that is based on JDK 8.

     

    First, create a registry_prefix environment variable containing the prefix that you want to use:

    $ export registry_prefix=registry.example.com/example
    

    Then build the three container images.

     

    To build the container image for the skipper app, run the following command:

    $ pack build --env BP_JVM_VERSION=8 \
      --path spring-cloud-skipper-server-2.6.1.jar \
      --builder gcr.io/paketo-buildpacks/builder:base \
      --publish ${registry_prefix}/spring-cloud-skipper-server:2.6.1-jdbc
    

    To build the container image for the data-flow app, run the following command:

    $ pack build --env BP_JVM_VERSION=8 \
      --path scdf-pro-server-1.1.2.jar \
      --builder gcr.io/paketo-buildpacks/builder:base \
      --publish ${registry_prefix}/scdf-pro-server:1.1.2-jdbc
    

    To build the container image for the composed-task-runner app, run the following command:

    $ pack build --env BP_JVM_VERSION=8 \
      --path spring-cloud-dataflow-composed-task-runner-2.7.1.jar \
      --builder gcr.io/paketo-buildpacks/builder:base \
      --publish ${registry_prefix}/spring-cloud-dataflow-composed-task-runner:2.7.1-jdbc
    
  7. Update your Kustomize manifests with the locations of the new images.

     

    For the skipper application, update the newName and newTag values in apps/skipper/kustomize/overlays/production/kustomization.yaml. Update the digest field with the new image digest, or remove the digest field altogether.

    images:
    - name: springcloud/spring-cloud-skipper-server  # Used for Kustomize matching
      newName: registry.example.com/example/spring-cloud-skipper-server
      newTag: 2.6.1-jdbc
    configMapGenerator:
    - name: skipper
      files:
        - bootstrap.yaml
        - application.yaml
    bases:
      - ../../base
    patches:
    - deployment-patch.yaml
    - service-patch.yaml
    

    For the data-flow application, update the newName and newTag values in apps/data-flow/kustomize/overlays/production/kustomization.yaml. Update the digest field with the new image digest, or remove the digest field altogether.

    images:
    - name: springcloud/spring-cloud-dataflow-server  # Used for Kustomize matching
      newName: registry.example.com/example/scdf-pro-server
      newTag: 1.1.2-jdbc
      configMapGenerator:
      - name: scdf-server
        files:
          - bootstrap.yaml
          - application.yaml
      bases:
        - ../../base
      patches:
      - deployment-patch.yaml
      - service-patch.yaml
    

    For the composed-task-runner image, update the apps/data-flow/kustomize/overlays/production/application.yaml file. Update the spring.cloud.dataflow.task.composedTaskRunnerUri entry to match your new image. The value should be prefixed with docker://. You can provide either a tag or a digest that matches your new image. See the following example:

     spring:
      cloud:
        config:
          enabled: false
        dataflow:
          server:
            uri: http://${SCDF_SERVER_SERVICE_HOST}:${SCDF_SERVER_SERVICE_PORT}
          features:
            schedules-enabled: true
          task:
            composedTaskRunnerUri: docker://registry.example.com/my-project/spring-cloud-dataflow-composed-task-runner:2.7.1-jdbc
    
    ...
    

If you add your own JDBC driver, you should manually initialize the database schema and disable automatic database initialization for the Skipper and Data Flow server applications. See Initializing the Database Schema.

Message Broker Configuration

You can customize message broker configuration settings in the application.yaml and the deployment-patch.yaml files. These files are preconfigured to connect to a RabbitMQ broker, which can be installed either by using the manifests provided in services/dev/rabbitmq or by installing the bitnami/rabbitmq Helm chart.

Use RabbitMQ as the Message Broker

If you want to use RabbitMQ as the message broker, review the skipper application settings provided in the Kustomize overlays for the dev and production environments. As mentioned earlier, the default configuration connects to a RabbitMQ broker that runs in the same namespace as SCDF for Kubernetes.

If your RabbitMQ broker runs in a different namespace, or if you use an external broker, you must edit the deployment-patch.yaml file in dev or production overlays. In the volume and container.volumeMounts sections, remove references to the rabbitmq secret.

The RabbitMQ broker connection settings are located in the application.yaml file for the Skipper application. This file is included in the dev and production overlays. When deploying a Spring Cloud Stream application, SCDF for Kubernetes passes the value of the environmentVariables field to the application.

The configuration must include the following environment variables:

  • SPRING_RABBITMQ_HOST
  • SPRING_RABBITMQ_PORT
  • SPRING_RABBITMQ_USERNAME
  • SPRING_RABBITMQ_PASSWORD

If the RabbitMQ cluster runs in the same namespace as SCDF for Kubernetes, you can reference the service environment variables provided by Kubernetes. If not, replace these references with the actual values.

spring:
  cloud:
    skipper:
      server:
        platform:
          kubernetes:
            accounts:
              default:
                environmentVariables: 'SPRING_CLOUD_CONFIG_ENABLED=false,SPRING_RABBITMQ_HOST=${RABBITMQ_SERVICE_HOST},SPRING_RABBITMQ_PORT=${RABBITMQ_SERVICE_PORT_AMQP},SPRING_RABBITMQ_USERNAME=user,SPRING_RABBITMQ_PASSWORD=${rabbitmq-password}'

Setting SPRING_CLOUD_CONFIG_ENABLED to false prevents Spring Cloud Stream applications from attempting to discover a Spring Cloud Config server instance.

Use Kafka as the Message Broker

If you want to use Kafka as the message broker, review the skipper application settings provided in the Kustomize overlays for the dev and production environments. Edit the deployment-patch.yaml file. In the volume and container.volumeMounts sections, remove references to the rabbitmq secret.

The Kafka broker connection settings are located in the application.yaml file for the Skipper application. This file is included in the dev and production overlays.

You can either provide your own Kafka cluster, or use the bitnami/kafka Helm chart to install one. The following example assumes that you used the Helm chart. If using a different Kafka installation, adjust the environment variables specified in the application.yaml file.

Remove or comment out the line with environmentVariables for RabbitMQ settings and uncomment the line with the corresponding Kafka settings. The value of the environmentVariables property is passed to the deployed Spring Cloud Stream apps.

The configuration must include the following environment variables:

  • SPRING_CLOUD_STREAM_KAFKA_BINDER_BROKERS (value: the host and port of the Kafka broker)
  • SPRING_CLOUD_STREAM_KAFKA_BINDER_ZK_NODES (value: the host and port for ZooKeeper)

If the Kafka cluster runs in the same namespace as SCDF for Kubernetes, you can reference the service environment variables provided by Kubernetes. If not, replace these references with the actual values.

spring:
  cloud:
    skipper:
      server:
        platform:
          kubernetes:
            accounts:
              default:
                environmentVariables: 'SPRING_CLOUD_CONFIG_ENABLED=false,SPRING_CLOUD_STREAM_KAFKA_BINDER_BROKERS=${KAFKA_SERVICE_HOST}:${KAFKA_SERVICE_PORT},SPRING_CLOUD_STREAM_KAFKA_BINDER_ZK_NODES=${KAFKA_ZOOKEEPER_SERVICE_HOST}:${KAFKA_ZOOKEEPER_SERVICE_PORT}'

Setting SPRING_CLOUD_CONFIG_ENABLED to false prevents Spring Cloud Stream applications from attempting to discover a Spring Cloud Config server instance.

Ingress Resource

If your Kubernetes cluster supports Kubernetes Services of type LoadBalancer, you may use that type of service to provision a load balancer via an Ingress Controller such as NGINX or Contour.

If your cluster has an ingress controller configured, you can use an Ingress resource manager to manage external access to the Data Flow server service.

Edit the ingress-patch.yaml file in either the dev overlay (apps/ingress/kustomize/overlays/dev/) or the production overlay (apps/ingress/kustomize/overlays/production/). Under spec.rules, update the host entry, providing the DNS name required for your ingress controller configuration.

If your DNS name is data-flow.example.com, add that DNS name as the host:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: scdf-ingress
spec:
  rules:
  - host: data-flow.example.com
    http:
      paths:
      - backend:
          serviceName: scdf-server
          servicePort: 80
        path: /

Security

The Spring Cloud Data Flow and Skipper servers use the Spring Security library to configure authentication and authorization with OAuth 2.0 and OpenID Connect. You can use this to integrate SCDF for Kubernetes into Single Sign-On (SSO) environments.

To configure the servers, edit the application.yaml configuration file for each server. In the Spring Cloud Data Flow Reference Guide, the Security section describes how to configure the Data Flow server, and the Azure section describes the configuration to use with Azure Active Directory.

To configure the application.yaml file for the Skipper server, follow the same steps, using the same documentation from the Spring Cloud Data Flow Reference Guide.

Container Registry

The Data Flow server displays Spring Boot application metadata, which is stored as a label in the container image. To retrieve this metadata, you can configure the Data Flow server to access the container registry for the application images that you will deploy.

Place this configuration under the spring.cloud.dataflow.container.registry-configurations section of the application.yaml file. For examples of how to configure this section for Docker Hub, JFrog Artifactory, Amazon Elastic Container Registry, Azure Container Registry, and Harbor, see the Container Metadata section of the Spring Cloud Data Flow Reference Guide.

Application ImagePullSecrets

When launching a Spring Cloud Task or Spring Cloud Stream application, the Kubernetes nodes must be able to pull the corresponding container image. If the image is stored in a private repository, you must provide authentication information for the nodes to use in pulling the image from a registry.

You can provide this authentication information by creating a secret in the namespace where you will install SCDF for Kubernetes. To create the secret and configure the SCDF for Kubernetes server applications to use it, follow the steps below.

  1. Set username and password environment variables. These variables should contain your login credentials for the registry that stores the application images.

    $ registry=<your registry host>
    $ username=<your user name>
    $ password=<your password>
    
  2. Run the kubectl create secret command, using the variables set in the previous step.

    $ kubectl create secret \
    docker-registry scdf-apps-regcred \
    --docker-server=${registry} \
    --docker-username=${username} \
    --docker-password=${password}
    
  3. Add the name of the secret to the application.yaml file for the Skipper application. This will be used for Spring Cloud Stream applications.

    spring:
      cloud:
        skipper:
          server:
            platform:
              kubernetes:
                accounts:
                  default:
                    imagePullSecret: scdf-apps-regcred
    

    If you want to use the Composed Task Runner (CTR), create a secret that contains the credentials for the appropriate registry (either the Tanzu Net registry, or if you relocated the CTR image, the registry to which you relocated the image).

  4. Add the name of the secret to the application.yaml file for the Data Flow application. This will be used for Spring Cloud Task applications.

    spring:
      cloud:
        dataflow:
          task:
            platform:
              kubernetes:
                accounts:
                  default:
                    imagePullSecret: scdf-apps-regcred
    

Task Limiting

SCDF for Kubernetes provides a configurable limit for the maximum number of concurrently-running tasks. This limit can prevent the saturation of IaaS or hardware resources. If the number of concurrently running tasks on a platform instance is greater than or equal to the limit, the next task launch request fails, and SCDF for Kubernetes returns an error message through the RESTful API, the shell, or the UI.

By default, the concurrent task limit is set to 20. You can configure the limit by setting the maximumConcurrentTasks property in the account:

spring:
  cloud:
    dataflow:
      task:
        platform:
          kubernetes:
            accounts:
              default:
                namespace: default
                maximumConcurrentTasks: 40
                limits:
                  memory: 1024Mi

A successful task launch results in a running pod, which is expected to eventually either complete or fail. Because there is no convenient way to identify the pods that correspond to a task, SCDF for Kubernetes counts only pods that were launched by the Data Flow server. The task launcher provides the task ID as a label in the pod metadata, and SCDF for Kubernetes filters all running pods by this label.

Next: Configure Monitoring for SCDF for Kubernetes

For information about configuring monitoring for SCDF for Kubernetes, see Configuring Monitoring for SCDF for Kubernetes.