= Fedora CoreOS Cincinnati SOP

Cincinnati is the update service/backend for Fedora CoreOS (FCOS) machines.
This SOP describes how to access, troubleshoot, and upgrade it.

== Contact Information
Owner::
Fedora CoreOS Team
Contact::
#fedora-coreos

== Details
Source::
https://github.com/coreos/fedora-coreos-cincinnati
Playbook::
https://pagure.io/fedora-infra/ansible/blob/main/f/playbooks/openshift-apps/coreos-cincinnati.yml
Location::
OpenShift cluster (production): https://console-openshift-console.apps.ocp.fedoraproject.org/
Project::
coreos-cincinnati:
https://console-openshift-console.apps.ocp.fedoraproject.org/k8s/cluster/projects/coreos-cincinnati
Deployment::
https://console-openshift-console.apps.ocp.fedoraproject.org/k8s/ns/coreos-cincinnati/deploymentconfigs/coreos-cincinnati
Containers::
* `fcos-graph-builder` (GB - raw updates graph)
* `fcos-policy-engine` (PE - frontend handling client requests)
Routes::
* `coreos-updates-raw` (GB web service)
* `coreos-updates-raw-status` (GB status and metrics)
* `coreos-updates` (PE web service)
* `coreos-updates-status` (PE status and metrics)

== Troubleshooting
Each FCOS Cincinnati service exposes live metrics in Prometheus format:
Graph-builder::
https://status.raw-updates.coreos.fedoraproject.org/metrics
Policy-engine::
https://status.updates.coreos.fedoraproject.org/metrics
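To inspect these endpoints without a browser, you can use a minimal Python sketch like the one below. It fetches one of the URLs above and turns the samples into a dictionary. It assumes only that the endpoints serve the standard Prometheus text exposition format. `parse_metrics` and `fetch_metrics` are illustrative helper names, not part of the service.

```python
"""Fetch and parse Prometheus text-format metrics from an FCOS Cincinnati status endpoint."""
import urllib.request

GRAPH_BUILDER_METRICS = "https://status.raw-updates.coreos.fedoraproject.org/metrics"
POLICY_ENGINE_METRICS = "https://status.updates.coreos.fedoraproject.org/metrics"


def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text exposition format into {series: value}.

    Comment lines (# HELP / # TYPE) and blank lines are skipped. The series key
    keeps its label set verbatim, e.g. 'requests_total{code="200"}'.
    This simple parser assumes label values contain no whitespace.
    """
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # A sample line is '<name>[{labels}] <value> [<timestamp>]'.
        parts = line.split()
        if len(parts) < 2:
            continue  # skip malformed lines rather than aborting
        try:
            samples[parts[0]] = float(parts[1])
        except ValueError:
            continue
    return samples


def fetch_metrics(url: str, timeout: float = 10.0) -> dict[str, float]:
    """Download the metrics page at `url` and parse it."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

For example, `fetch_metrics(POLICY_ENGINE_METRICS)` returns every policy-engine series currently exposed. Check the live output for the actual metric names before filtering on any of them.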

== Upgrades

=== Building a new version
FCOS Cincinnati is built as a container image directly from source,
referencing a pinned git commit.

To build a new version, first find the target commit (i.e. the
latest commit on the `main` branch) at
https://github.com/coreos/fedora-coreos-cincinnati.
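If you prefer not to browse GitHub, you can read the tip of `main` with `git ls-remote`, which does not need a local clone. The Python sketch below assumes `git` is installed and can reach GitHub. `parse_ls_remote` and `latest_main_commit` are illustrative helper names, not existing tooling.

```python
"""Look up the current tip of `main` in the upstream fedora-coreos-cincinnati repository."""
import re
import subprocess

UPSTREAM = "https://github.com/coreos/fedora-coreos-cincinnati"
SHA_RE = re.compile(r"^[0-9a-f]{40}$")  # full 40-character SHA-1 commit ID


def parse_ls_remote(output: str, ref: str = "refs/heads/main") -> str:
    """Return the commit ID for `ref` from `git ls-remote` output.

    Each output line has the form '<sha>\\t<ref>'.
    """
    for line in output.splitlines():
        sha, _, name = line.partition("\t")
        if name == ref:
            if not SHA_RE.match(sha):
                raise ValueError(f"unexpected commit ID {sha!r}")
            return sha
    raise LookupError(f"{ref} not found in ls-remote output")


def latest_main_commit(repo: str = UPSTREAM) -> str:
    """Query the remote for refs/heads/main without cloning it."""
    result = subprocess.run(
        ["git", "ls-remote", repo, "refs/heads/main"],
        check=True, capture_output=True, text=True,
    )
    return parse_ls_remote(result.stdout)
```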

Once you have identified the target commit, follow these steps to build
a new container image:

* update the `fcos_cincinnati_build_git_sha` playbook variable in
`roles/openshift-apps/coreos-cincinnati/vars/staging.yml`
* update the `fcos_cincinnati_build_git_sha` playbook variable in
`roles/openshift-apps/coreos-cincinnati/vars/production.yml`
* commit and push the update to the `fedora-infra/ansible` repository
* SSH to `batcave01.iad2.fedoraproject.org`
* run `sudo rbac-playbook openshift-apps/coreos-cincinnati.yml` using
your FAS password and your second-factor OTP
* schedule a new build by running
`sudo rbac-playbook -t build openshift-apps/coreos-cincinnati.yml`
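You can also script the two variable edits above. This is a minimal sketch that assumes each vars file holds the setting on a single top-level `fcos_cincinnati_build_git_sha: <sha>` line, optionally quoted. Check the actual file layout before relying on it. `set_git_sha` and `bump` are hypothetical helpers that rewrite only that one line and leave comments and the rest of the file intact.

```python
"""Hypothetical helper: pin a new commit in the coreos-cincinnati vars files."""
import re
from pathlib import Path

# Paths relative to the root of a fedora-infra/ansible checkout.
VARS_FILES = [
    Path("roles/openshift-apps/coreos-cincinnati/vars/staging.yml"),
    Path("roles/openshift-apps/coreos-cincinnati/vars/production.yml"),
]


def set_git_sha(text: str, variable: str, sha: str) -> str:
    """Return `text` with the top-level 'variable: value' line set to `sha`.

    Assumes (unverified) a single line of that form whose value may be wrapped
    in single or double quotes. Raises ValueError if the line is not found
    exactly once or `sha` is not a full commit ID.
    """
    if not re.fullmatch(r"[0-9a-f]{40}", sha):
        raise ValueError(f"not a full commit ID: {sha!r}")
    pattern = re.compile(
        rf"^({re.escape(variable)}:[ \t]*)(['\"]?)[^'\"\s#]*\2", re.MULTILINE
    )
    # Keep the key, spacing and quote style; swap only the value.
    new_text, count = pattern.subn(rf"\g<1>\g<2>{sha}\g<2>", text)
    if count != 1:
        raise ValueError(f"expected exactly one {variable} line, found {count}")
    return new_text


def bump(variable: str, sha: str, files: list[Path] = VARS_FILES) -> None:
    """Rewrite `variable` in every vars file in place."""
    for path in files:
        path.write_text(set_git_sha(path.read_text(), variable, sha))
```

For example, running `bump("fcos_cincinnati_build_git_sha", sha)` from the root of a `fedora-infra/ansible` checkout updates both files. Review the result with `git diff` before committing. The same helper also works for `fcos_cincinnati_deploy_git_sha`.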

=== Deploying a new version

Once the target commit has been built into a container image, follow
these steps to deploy the new image:

* update the `fcos_cincinnati_deploy_git_sha` playbook variable in
`roles/openshift-apps/coreos-cincinnati/vars/staging.yml`
* update the `fcos_cincinnati_deploy_git_sha` playbook variable in
`roles/openshift-apps/coreos-cincinnati/vars/production.yml`
* commit and push the update to the `fedora-infra/ansible` repository
* SSH to `batcave01.iad2.fedoraproject.org`
* run `sudo rbac-playbook openshift-apps/coreos-cincinnati.yml` using
your FAS password and your second-factor OTP
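Before committing, you may want to confirm that the vars files are consistent. In the usual flow, where you deploy the commit you have just built, each file's `fcos_cincinnati_deploy_git_sha` should equal its `fcos_cincinnati_build_git_sha`, and staging and production should agree. The sketch below checks exactly that. It is hypothetical and assumes each setting is a single top-level `variable: value` line, optionally quoted.

```python
"""Hypothetical pre-commit check for the coreos-cincinnati vars files."""
import re
from pathlib import Path

# Relative to the root of a fedora-infra/ansible checkout.
VARS_DIR = Path("roles/openshift-apps/coreos-cincinnati/vars")


def read_var(text: str, variable: str) -> str:
    """Return the value of a top-level 'variable: value' line, quotes stripped."""
    match = re.search(
        rf"^{re.escape(variable)}:[ \t]*(['\"]?)([^'\"\s#]*)\1", text, re.MULTILINE
    )
    if match is None:
        raise KeyError(variable)
    return match.group(2)


def check_pins(texts: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the pins look consistent."""
    problems = []
    deploy_shas = {}
    for env, text in texts.items():
        build = read_var(text, "fcos_cincinnati_build_git_sha")
        deploy = read_var(text, "fcos_cincinnati_deploy_git_sha")
        deploy_shas[env] = deploy
        if deploy != build:
            problems.append(f"{env}: deploy SHA {deploy} does not match build SHA {build}")
    if len(set(deploy_shas.values())) > 1:
        problems.append(f"environments disagree: {deploy_shas}")
    return problems


def check_repo(vars_dir: Path = VARS_DIR) -> list[str]:
    """Run check_pins on staging.yml and production.yml."""
    return check_pins(
        {env: (vars_dir / f"{env}.yml").read_text() for env in ("staging", "production")}
    )
```

Run `check_repo()` from the root of the `fedora-infra/ansible` checkout. An empty list means no problems were found.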

== Things that could go wrong

=== Application build is stuck
Issues in the underlying OpenShift cluster may leave builds
permanently stuck.
If a build has not completed within a reasonable amount of time
(i.e. 15 minutes):

* go to the build overview at https://console-openshift-console.apps.ocp.fedoraproject.org/k8s/ns/coreos-cincinnati/builds
* click on the build
* cancel it through the "Cancel Build" button
* go back to the build overview page
* schedule a new build through the "Start Build" button
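To spot stuck builds without clicking through the console, you can dump the project's Build objects with `oc get builds -n coreos-cincinnati -o json` and pass the JSON to a small Python filter like the sketch below. This assumes you have `oc` command-line access to the cluster, which this SOP does not cover. The filter flags builds still in the `New`, `Pending` or `Running` phase whose `status.startTimestamp` is older than the 15-minute threshold above. If a build has not started, it uses `metadata.creationTimestamp` instead. Cancelling and restarting still go through the console steps above.

```python
"""Flag OpenShift builds that have been unfinished for too long."""
import json
import sys
from datetime import datetime, timedelta, timezone

ACTIVE_PHASES = {"New", "Pending", "Running"}
THRESHOLD = timedelta(minutes=15)


def parse_time(value: str) -> datetime:
    """Parse a Kubernetes UTC timestamp such as '2024-05-01T12:00:00Z'."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def stuck_builds(build_list: dict, now: datetime, threshold: timedelta = THRESHOLD) -> list[str]:
    """Return names of active builds started (or created) more than `threshold` ago."""
    stuck = []
    for build in build_list.get("items", []):
        status = build.get("status", {})
        if status.get("phase") not in ACTIVE_PHASES:
            continue
        # Builds that never started have no startTimestamp; fall back to creation time.
        started = status.get("startTimestamp") or build["metadata"]["creationTimestamp"]
        if now - parse_time(started) > threshold:
            stuck.append(build["metadata"]["name"])
    return stuck


def main() -> None:
    """Read `oc get builds -o json` output from stdin and print stuck build names."""
    for name in stuck_builds(json.load(sys.stdin), datetime.now(timezone.utc)):
        print(name)
```

For example, with the code saved as `stuck_builds.py` (a hypothetical file name): `oc get builds -n coreos-cincinnati -o json | python3 -c 'import stuck_builds; stuck_builds.main()'`.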