persistentvolume-controller, "fedora-ostree-content-volume-2" already bound to a different claim #12555

Closed
opened 2025-05-12 14:33:12 +00:00 by c4rt0 · 15 comments

I'm trying to migrate a few CoreOS-related projects from DeploymentConfig to Deployment. The one I will focus on in this example is fedora-ostree-pruner (https://pagure.io/fedora-infra/ansible/blob/main/f/playbooks/openshift-apps/fedora-ostree-pruner.yml).
Currently the number of replicas is set to 1; after the build completes, the replica being created remains in the Pending state.

adamsky@fedorapc  ~/Work/ansible  ↱ main ±  oc get pods
NAME                                   READY   STATUS      RESTARTS   AGE
fedora-ostree-pruner-build-1-build     0/1     Completed   0          2m11s
fedora-ostree-pruner-f64475887-gjwt5   0/1     Pending     0          96s

Some investigation showed that the volume is already bound to a different claim:

 adamsky@fedorapc  ~/Work/ansible  ↱ main ±  oc describe pods
...
Volumes:
  fedora-ostree-content-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  fedora-ostree-content-volume
    ReadOnly:   false
...
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  107s  default-scheduler  0/8 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/8 nodes are available: 8 Preemption is not helpful for scheduling.

This is true for staging... and as I already found out, also for production 👽

OpenShift Workloads Dashboard returns:

Conditions:
  Type:     PodScheduled
  Status:   False
  Updated:  May 12, 2025, 1:12 PM
  Reason:   Unschedulable

Message:
0/8 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/8 nodes are available: 8 Preemption is not helpful for scheduling.
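
If someone with cluster access can take a look, the PV's claimRef should show which claim is currently holding it; something like this should do it (assuming the PV name from the error message is right):

  oc describe pv fedora-ostree-content-volume-2
  oc get pv fedora-ostree-content-volume-2 -o jsonpath='{.spec.claimRef.namespace}/{.spec.claimRef.name}'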

Describe what you would like us to do: Due to permission issues I am unable to dig deeper into the matter; it would be nice to get some assistance in figuring this out.


When do you need this to be done by? (YYYY/MM/DD) : ASAP


Author

cc @cverna


On stg we tried deleting the PVC and re-running the playbook, but that didn't seem to help.

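For what it's worth: deleting the PVC alone does not necessarily free the PV. A statically created PV usually moves to Released and keeps the old claimRef once its claim is gone, so a new claim cannot bind to it. If that is what happened here (just an assumption, since we cannot inspect the PVs ourselves), an admin could clear the stale reference with something like:

  oc patch pv <pv-name> --type merge -p '{"spec":{"claimRef":null}}'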
Contributor

Metadata Update from @zlopez:

  • Issue priority set to: Waiting on Assignee (was: Needs Review)
  • Issue tagged with: Needs investigation, high-gain

You need to specify the storage class name in your PVC definition, otherwise it uses ocs-storagecluster-ceph-rbd by default.
If you want to use a NetApp volume (NFS), then use

spec:
  storageClassName: ""

(Note that there is no PV fedora-ostree-content-volume-2 on stg, so the PVC will stay in Pending until an admin creates that PV.)
If you want an RBD volume, then use storageClassName: ocs-storagecluster-ceph-rbd and remove the volumeName. ODF will provision a new volume for you automatically.
Note that you cannot use ReadWriteMany in Filesystem mode with RBD.
If you want to use that combination, use the ocs-storagecluster-cephfs storage class instead.
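
To make that concrete, the static NetApp/NFS variant of the claim would look roughly like this (a sketch only: the size is made up, and the claim/PV names are taken from this ticket, not from the actual template):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fedora-ostree-content-volume
spec:
  # empty storage class disables dynamic provisioning, so the claim
  # only binds to a PV an admin created in advance
  storageClassName: ""
  # pin the claim to that specific pre-created PV
  volumeName: fedora-ostree-content-volume-2
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi

For the RBD variant you would instead drop volumeName, set storageClassName: ocs-storagecluster-ceph-rbd, and stick to ReadWriteOnce (or use ocs-storagecluster-cephfs if ReadWriteMany is needed).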


@c4rt0 It looks like in coreos-ostree-importer we have a pvc template --> https://pagure.io/fedora-infra/ansible/blob/main/f/roles/openshift-apps/coreos-ostree-importer/templates/pvc.yml.j2

Maybe we should have the same for fedora-ostree-pruner

Author

> @c4rt0 It looks like in coreos-ostree-importer we have a pvc template --> https://pagure.io/fedora-infra/ansible/blob/main/f/roles/openshift-apps/coreos-ostree-importer/templates/pvc.yml.j2
>
> Maybe we should have the same for fedora-ostree-pruner

I can see we already do, and from my understanding that is what @darknao is referring to. It's identical to the one in coreos-ostree-importer:
https://pagure.io/fedora-infra/ansible/blob/main/f/roles/openshift-apps/fedora-ostree-pruner/templates/pvc.yml.j2

Thank you both for your answers. I think we can use ocs-storagecluster-cephfs, as it is probably a good idea to keep the existing ReadWriteMany. To be completely honest, I'm not sure we need the RBD volume.
I will post an update here as soon as I have tested the above.


If the volume is shared between apps in and out of OpenShift, then the NetApp (NFS) volume makes sense.
If it's shared with other pods in the same namespace (or more than one replica is running), then CephFS.
For any other use (single pod, one replica), RBD provides the best performance.
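
For the CephFS case the spec side of the claim would then just be (again only a sketch, with a made-up size):

spec:
  storageClassName: ocs-storagecluster-cephfs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi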


The volumes for fedora-ostree-pruner and coreos-ostree-importer are special because they are essentially maps into a netapp volume where the main (compose & prod) ostree repos are stored.

I guess we need the volume created in staging for us?

Author

We spent some time with @cverna on it today. We have the volume specified, but we still have the pod hanging both in staging and in production. As you can see, our updated pvc file (https://pagure.io/fedora-infra/ansible/blob/main/f/roles/openshift-apps/fedora-ostree-pruner/templates/pvc.yml.j2) contains both storageClassName and volumeName :/

Author
 ✘ adamsky@fedorapc  ~/Work/ansible   PR/pvc_patch  oc get pvc
NAME                           STATUS    VOLUME                           CAPACITY   ACCESS MODES   STORAGECLASS                VOLUMEATTRIBUTESCLASS   AGE
fedora-ostree-content-volume   Pending   fedora-ostree-content-volume-1   0                         ocs-storagecluster-cephfs   <unset>                 38m
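
With both storageClassName and volumeName set, the claim only binds if the named PV carries the same storage class and is not held by another claim; my guess is that fedora-ostree-content-volume-1 does not match ocs-storagecluster-cephfs, which would keep the claim Pending forever. Someone with enough permissions could confirm via the claim's events and the PV definition:

  oc describe pvc fedora-ostree-content-volume
  oc get pv fedora-ostree-content-volume-1 -o yaml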
Author

After reverting the changes in pvc.yml.j2 (https://pagure.io/fedora-infra/ansible/pull-request/2624) back to its original state, and after a manual binding of the volume by @kevin, the fedora-ostree-pruner in production is working once again as expected, using a Deployment:

 adamsky@fedora  ~/Work/ansible   PR/revert_to_DC  oc get deployment
NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
fedora-ostree-pruner   1/1     1            1           28m

Staging is still an issue.


I created the PV in stg (fedora-ostree-content-volume-2),
but the PVC still seems to refer to fedora-ostree-content-volume-1?
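
(That would explain the Pending claim: a PVC that pins volumeName: fedora-ostree-content-volume-1 can never bind to a PV named fedora-ostree-content-volume-2. Either the volumeName in pvc.yml.j2 has to point at the new PV, or the PV can be pre-bound to the claim from the PV side, roughly like the fragment below; the namespace here is only a guess.)

spec:
  claimRef:
    name: fedora-ostree-content-volume
    namespace: fedora-ostree-pruner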


Metadata Update from @kevin:

  • Issue assigned to kevin

I think this is all sorted now?

If not, please re-open...


Metadata Update from @kevin:

  • Issue close_status updated to: Fixed with Explanation
  • Issue status updated to: Closed (was: Open)