Administration and Maintenance¶
Understanding build parameters¶
Please refer to Build Parameters for information on how options are configured within OSBS builds.
When a user submits a new build request to OSBS, that request is for an orchestrator build. When the orchestrator build wants to create worker builds, it also does this through osbs-client.
As a result there are two osbs.conf files to consider:
- the one external to OSBS, for creating orchestrator builds, and
- the one internal to OSBS, stored in a Kubernetes secret (named by client_config_secret) in the orchestrator cluster
These can have the same content. The important features are discussed below.
can_orchestrate defaults to false. The API method create_orchestrator_build will fail unless can_orchestrate is true for the chosen instance section.
reactor_config_map specifies the name of a Kubernetes configmap holding the Server-side Configuration for atomic-reactor. A pre-build plugin will read its value from the REACTOR_CONFIG environment variable.
When client_config_secret is specified, it is the name of a Kubernetes secret (holding a key osbs.conf) for use by atomic-reactor when it creates worker builds. The orchestrate_build plugin is told the path to this secret.
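For example, assuming the secret is to be named osbsconf (matching the example configuration later in this section) and the worker osbs.conf has already been written locally (the path below is a placeholder), it could be created with:

oc create secret generic osbsconf --from-file=osbs.conf=/path/to/worker-osbs.conf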
When token_secrets is specified, the listed secrets (space separated) will be mounted in the OpenShift build. When ":" is used, the secret will be mounted at the specified path, i.e. the format is:

token_secrets = secret:path secret:path ...

This allows an osbs.conf file (from client_config_secret) to be constructed with a known value to use for token_file.
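As a sketch, a secret holding worker service account tokens, matching the workertoken secret and token file names used in the examples below, might be created as follows (the local paths are placeholders):

oc create secret generic workertoken \
    --from-file=worker01-serviceaccount-token=/path/to/worker01-token \
    --from-file=osd-serviceaccount-token=/path/to/osd-token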
When an entry with the pattern node_selector.platform (for some platform) is specified, builds for this platform submitted to this cluster must include the given node selector, so as to run on a node of the correct architecture. This allows for installations that have mixed-architecture clusters and where node labels differentiate the architectures of the nodes. If the value is none, this platform is the only one available and no node selector is required.
When a section name begins with “platform:” it is interpreted not as an OSBS instance but as a platform description. The remainder of the section name is the platform name being described. The section has the following keys:
- architecture (optional) – the GOARCH for the platform; the platform name is assumed to be the same as the GOARCH if this is not specified
build_image specifies the build image (AKA "buildroot") to be used for building container images, to be set in the BuildConfig OpenShift objects under the .spec.strategy.customStrategy.from object. This can be a full reference to a specific container image in a container registry, or it can reference an OpenShift ImageStreamTag. Updating this globally effectively deploys a different version of OSBS.
It takes one of the following forms:
- an OpenShift ImageStreamTag reference – use the image from the specified ImageStreamTag
- a pullspec (including registry, repository, and either tag or digest) – pull the image from the specified location
Deploy OSBS on OpenShift¶
The orchestrator cluster will have a service account (with edit role) created for use by Koji builders. Those Koji builders will use the service account’s persistent token to authenticate to the orchestrator cluster and submit builds to it.
Since the orchestrator build initiates worker builds on the worker cluster, it must have permission to do so. A service account should be created on each worker cluster in order to generate a persistent token. This service account should have edit role. On the orchestrator cluster, a secret for each worker cluster should be created to store the corresponding service account tokens. When osbs-client creates the orchestrator build it must specify the names of the secret files to be mounted in the BuildConfig. The orchestrator build will extract the token from the mounted secret file.
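A minimal sketch of these steps, assuming a service account named osbs-worker in a namespace called worker-namespace (both placeholder names), might look like:

# On each worker cluster: create the service account and give it the edit role
oc create serviceaccount osbs-worker -n worker-namespace
oc policy add-role-to-user edit -z osbs-worker -n worker-namespace

# Obtain its persistent token so it can be stored in a secret on the
# orchestrator cluster (see token_secrets above)
oc sa get-token osbs-worker -n worker-namespace > worker01-serviceaccount-token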
Server-side Configuration for atomic-reactor¶
This will list the maximum number of jobs that should be active at any given time for each cluster. It will also list worker clusters in order of preference. It may also contain additional environment configuration such as ODCS integration.
The runtime configuration will take the form of a Kubernetes secret with content as in the example below:
---
clusters:
  x86_64:
  - name: prod-x86_64-osd
    max_concurrent_builds: 16
  - name: prod-x86_64
    max_concurrent_builds: 6
    enabled: true
  - name: prod-other
    max_concurrent_builds: 2
    enabled: false
  ppc64le:
  - name: prod-ppc64le
    max_concurrent_builds: 6
odcs:
  signing_intents:
  - name: release
    keys: [AB123]
  - name: beta
    keys: [BT456, AB123]
  - name: unsigned
    keys: []
  # Value must match one of the names above.
  default_signing_intent: release
This maps each platform to a list of clusters and their concurrent build limits. For each platform to build for, a worker cluster is chosen as follows:
- clusters with the enabled key set to false are discarded
- each remaining cluster in turn will be queried to discover all currently active worker builds (not failed, complete, in error, or cancelled)
- the cluster load is computed by dividing the number of active worker builds by the specified maximum number of concurrent builds allowed on the cluster
- the worker build is submitted to whichever cluster has the lowest load; in this way, an even load distribution across all clusters is enforced
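For example, using the configuration above with hypothetical numbers: if prod-x86_64-osd is currently running 8 active worker builds (load 8/16 = 0.5) and prod-x86_64 is running 4 (load 4/6 ≈ 0.67), a new x86_64 worker build is submitted to prod-x86_64-osd; prod-other is never considered because it is disabled.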
There are several throttles preventing too many worker builds being submitted. Each worker cluster can be configured to only schedule a certain number of worker builds at a time by setting a default resource request. The orchestrator cluster will similarly only run a certain number of orchestrator builds at a time based on the resource request in the orchestrator build JSON template. A Koji builder will only run a certain number of containerbuild tasks based on its configured capacity.
This mechanism can also be used to temporarily disable a worker cluster by removing it from the list or adding enabled: false to the cluster description for each platform.
The odcs section is used for ODCS related configuration. It contains:
- signing_intents – list of signing intents in their restrictive order
- default_signing_intent – name of the default signing intent to be used when one is not provided
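As a sketch, the configuration file could then be loaded into the cluster under the name referenced by reactor_config_map in the example below (reactorconf). The key name config.yaml and the local file path are assumptions, and some deployments store this configuration as a secret rather than a configmap:

oc create configmap reactorconf --from-file=config.yaml=/path/to/reactor-config.yaml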
Setting up koji for container image builds¶
Example configuration file: Koji builder¶
The configuration required for submitting an orchestrator build is different from that required for the orchestrator build itself to submit worker builds. The osbs.conf used by the Koji builder would include:
[general]
build_json_dir = /usr/share/osbs/

[platform:x86_64]
architecture = amd64

[default]
openshift_url = https://orchestrator.example.com:8443/
build_image = example.registry.com/buildroot:blue
distribution_scope = public
can_orchestrate = true  # allow orchestrator builds

# This secret contains configuration relating to which worker
# clusters to use and what their capacities are:
reactor_config_map = reactorconf

# This secret contains the osbs.conf which atomic-reactor will use
# when creating worker builds
client_config_secret = osbsconf

# These additional secrets are mounted inside the build container
# and referenced by token_file in the build container's osbs.conf
token_secrets = workertoken:/var/run/secrets/atomic-reactor/workertoken

# and auth options, registries, secrets, etc

[scratch]
openshift_url = https://orchestrator.example.com:8443/
build_image = example.registry.com/buildroot:blue
reactor_config_map = reactorconf
client_config_secret = osbsconf
token_secrets = workertoken:/var/run/secrets/atomic-reactor/workertoken

# All scratch builds have distribution-scope=private
distribution_scope = private

# This causes koji output not to be configured, and for the low
# priority node selector to be used.
scratch = true

# and auth options, registries, secrets, etc
This shows the configuration required to submit a build to the orchestrator cluster using create_orchestrator_build.
Also shown is the configuration for scratch builds, which will be identical to regular builds but with “private” distribution scope for built images and with the scratch option enabled.
Example configuration file: inside builder image¶
The osbs.conf used by the builder image for the orchestrator cluster, which is contained in the Kubernetes secret named by client_config_secret above, would include:
[general]
build_json_dir = /usr/share/osbs/

[platform:x86_64]
architecture = amd64

[prod-mixed]
openshift_url = https://worker01.example.com:8443/
node_selector.x86_64 = beta.kubernetes.io/arch=amd64
node_selector.ppc64le = beta.kubernetes.io/arch=ppc64le
use_auth = true

# This is the path to the token specified in a token_secrets secret.
token_file = /var/run/secrets/atomic-reactor/workertoken/worker01-serviceaccount-token

# The same builder image is used for the orchestrator and worker
# builds, but used with different configuration. It should not
# be specified here.
# build_image = registry.example.com/buildroot:blue

# and auth options, registries, secrets, etc

[prod-osd]
openshift_url = https://api.prod-example.openshift.com/
node_selector.x86_64 = none
use_auth = true
token_file = /var/run/secrets/atomic-reactor/workertoken/osd-serviceaccount-token

# and auth options, registries, secrets, etc
In this configuration file there are two worker clusters, one which builds for both x86_64 and ppc64le platforms using nodes with specific labels (prod-mixed), and another which only accepts x86_64 builds (prod-osd).
Supporting Operator Manifests extraction¶
To support extraction of operator manifests, the operator-manifests build type must be added to Koji:

koji call addBType operator-manifests
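To confirm that the build type was added, the listBTypes hub call can be used (assuming it is available in the deployed Koji version):

koji call listBTypes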
Enabling integration with OMPS service¶
To enable the optional integration with the OMPS service, which allows operator manifests to be pushed automatically to an application registry (such as Quay), an omps section must be added to the atomic-reactor configuration. See configuration details in config.json.
omps:
  omps_url: https://omps-service.example.com
  omps_namespace: organization
  omps_secret: /dir/where/token/file/will/be/mounted
  appregistry_url: https://quay.io/cnr
Priority of Container Image Builds¶
For a build system it’s desirable to prioritize different kinds of builds in order to better utilize resources. Unfortunately, OpenShift’s scheduling algorithm does not support setting a priority value for a given build. To achieve some sort of build prioritization, we can leverage node selectors to allocate different resources to different build types.
Consider the following types of container builds:
- scratch build
- explicit build
- auto rebuild
As the name implies, scratch builds are meant to be one-off, unofficial container builds. No guarantees are made about storing the created container images long term, and they are not meant to be shipped to customers. These are clearly low priority builds.
Explicit builds are those triggered by a user, either directly via fedpkg/koji CLI, or indirectly via pungi (as in the case of base images). These are official builds that will go through the normal life cycle of being tested and, eventually, shipped.
Auto rebuilds are created by OpenShift when a change in the parent image is detected. It's likely that layered images should be rebuilt in order to pick up changes in the latest parent image.
Any explicit build or auto rebuild may or may not be high priority. In some cases a build is high priority due to a security fix; in other cases it could be due to an in-progress feature. For this reason, it cannot be said that all explicit builds are higher priority than auto rebuilds, or vice versa.
However, auto rebuilds have the potential of completely consuming OSBS’s infrastructure. There must be some mechanism to throttle the amount of auto rebuilds. For this reason, OSBS uses a different node selector for each different build type:
- scratch build: builds_scratch=true
- explicit build: builds_explicit=true
- auto rebuild: builds_auto=true
By controlling each type of build individually, OSBS has the necessary control for adjusting its infrastructure.
For example, consider an OpenShift cluster with 5 compute nodes labeled as follows:
- Node 1: builds_scratch=true, builds_explicit=true, builds_auto=true
- Node 2: builds_explicit=true
- Node 3: builds_auto=true
- Node 4: builds_explicit=true, builds_auto=true
- Node 5: builds_explicit=true, builds_auto=true
In this case, scratch builds can be scheduled only on Node 1; explicit builds on any node except Node 3; and auto builds on any node except Node 2.
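The build type labels are ordinary node labels and can be applied with oc label. For example, for the first node (the node name is a placeholder):

oc label node node1 builds_scratch=true builds_explicit=true builds_auto=true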
Worker Builds Node Selectors¶
The build type node selectors are only applied to worker builds. This gives more granular control over available resources. Since worker builds are the ones that actually perform the container image building steps, they require more resources than orchestrator builds. For this reason, a deployment is likely to have more nodes available for worker builds than for orchestrator builds. This is important because the number of nodes available defines the granularity with which builds are spread across the cluster.
For instance, consider a large deployment in which only 2 orchestrator nodes are needed. If build type node selectors were applied to orchestrator builds, builds could only be throttled by a factor of 2. In contrast, the same deployment may use 20 worker nodes, allowing builds to be throttled by a factor of 20.
Orchestrator Builds Allocation¶
Usually in a deployment, the number of allowed orchestrator builds matches the number of allowed worker builds for any given platform. Additional orchestrator builds should be allowed in order to fully leverage the build type node selectors on worker builds, since some orchestrator builds will wait longer than usual for their worker builds to be scheduled. This provides a buffer that allows OpenShift to properly schedule worker builds according to their build type via node selectors. Because OpenShift scheduling is used, worker builds of the same type will run in the order they were submitted.
Koji Builder Capacity¶
The task load of the Koji builders used by OSBS will not reflect the actual load on the OpenShift cluster used by OSBS. The disparity is due to auto rebuilds not having a corresponding Koji task. This creates a scenario where a buildContainer Koji task is started, but the OpenShift build remains in pending state. The Koji builder capacity should be set based on how many nodes allow scratch builds and/or explicit builds. In the example above, there are 4 nodes that allow such builds.
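As a sketch, assuming the builder is named builder01.example.com (a placeholder) and the Koji version in use supports the --capacity option of edit-host, the capacity could be set to match those 4 nodes with:

koji edit-host builder01.example.com --capacity=4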
The log file, osbs-client.log, in a Koji task gives users a better understanding of any delays due to scheduling.
Builds will automatically cancel themselves if any worker build takes more than 3 hours to complete or the entire task takes more than 4 hours to complete. Administrators can override these run time values with the worker_max_run_hours and orchestrator_max_run_hours settings in the osbs-client configuration.
Obtaining Atomic Reactor stack trace¶
atomic-reactor captures SIGUSR1 signals. When it receives such a signal, atomic-reactor responds by showing the current stack trace for every thread that was running when the signal was received.
An administrator can use this to inspect the orchestrator or a specific worker build. It is especially useful for diagnosing stuck builds.
As an administrator, use podman kill --signal=SIGUSR1 <BUILDROOT_CONTAINER> or podman exec <BUILDROOT_CONTAINER> kill -s SIGUSR1 1 to send the signal to the buildroot container you wish to inspect.
atomic-reactor will dump stack traces for all its threads into the buildroot
container logs. For instance:
Thread 0x7f6e88a1b700 (most recent call first):
  File "/usr/lib/python2.7/site-packages/atomic_reactor/inner.py", line 277, in run
  File "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
  File "/usr/lib64/python2.7/threading.py", line 785, in __bootstrap

Current thread 0x7f6e95dbf740 (most recent call first):
  File "/usr/lib/python2.7/site-packages/atomic_reactor/util.py", line 74, in dump_traceback
  File "/usr/lib/python2.7/site-packages/atomic_reactor/util.py", line 1562, in dump_stacktraces
  File "/usr/lib64/python2.7/socket.py", line 476, in readline
  File "/usr/lib64/python2.7/httplib.py", line 620, in _read_chunked
  File "/usr/lib64/python2.7/httplib.py", line 578, in read
  File "/usr/lib/python2.7/site-packages/urllib3/response.py", line 203, in read
  File "/usr/lib/python2.7/site-packages/docker/client.py", line 247, in _stream_helper
  File "/usr/lib/python2.7/site-packages/atomic_reactor/util.py", line 297, in wait_for_command
  File "/usr/lib/python2.7/site-packages/atomic_reactor/plugins/build_docker_api.py", line 46, in run
  File "/usr/lib/python2.7/site-packages/atomic_reactor/plugin.py", line 239, in run
  File "/usr/lib/python2.7/site-packages/atomic_reactor/plugin.py", line 449, in run
  File "/usr/lib/python2.7/site-packages/atomic_reactor/inner.py", line 444, in build_docker_image
  File "/usr/lib/python2.7/site-packages/atomic_reactor/inner.py", line 547, in build_inside
  File "/usr/lib/python2.7/site-packages/atomic_reactor/cli/main.py", line 95, in cli_inside_build
  File "/usr/lib/python2.7/site-packages/atomic_reactor/cli/main.py", line 292, in run
  File "/usr/lib/python2.7/site-packages/atomic_reactor/cli/main.py", line 310, in run
  File "/usr/bin/atomic-reactor", line 11, in <module>
In this example, the build is stuck talking to the docker client: the current thread is blocked reading from a socket inside docker/client.py while waiting for a docker command to finish (wait_for_command).
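If needed, the dumped stack traces can then be read back from the buildroot container logs, for example:

podman logs <BUILDROOT_CONTAINER>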