httpd 2.4.61 causing issue in fedora infrastructure #12092
Reference: infrastructure/fedora-infrastructure#12092
Describe what you would like us to do:
I'm creating this ticket to track what machines were affected by this httpd issue and to be able to close the other tickets related to this issue as those are actually solved.
So for now we have httpd updates disabled for:
When do you need this to be done by? (YYYY/MM/DD)
This is waiting for apache folks to provide a fix.
ipsilon02.iad2.fedoraproject.org and ipsilon01.stg.iad2.fedoraproject.org also. ;)
I tested out https://bodhi.fedoraproject.org/updates/FEDORA-2024-e83af0855e and it's still causing the issue. I downgraded ipsilon01.stg back to a working version, so people can use it.
@luhliarik is backporting patches for the regressions fixed upstream, so hopefully the issue will be fixed soon.
Is this solved? I see the errata pushed out?
Or do we need to wait for freeze to end to confirm?
I reached out to @luhliarik a few days back, but didn't get any response.
The last version I tested still had the issue.
Any news here?
The ipsilon hosts are on f39 still... we could try moving to f40 and see if the issue is solved there? (But I suspect not, because it's the same httpd version).
@kevin I will try to ping @luhliarik again as he probably forgot about this.
@luhliarik is still looking into it. There is no new upstream version, so he needs to patch the current one.
I created a new httpd-2.4.62-4.fc42 build that includes the latest mod_rewrite regression fixes. Feel free to test it.
I installed the new build on ipsilon01.stg and login on https://stg.pagure.io works. I can confirm that this build solves the issue we have.
Now we need to officially get it into F39 and EPEL9 so we can fix our servers.
After further testing on https://src.stg.fedoraproject.org, we (me and @luhliarik) found that the issue is still there. Not sure why it didn't show up on https://stg.pagure.io, but unfortunately the update does not solve the issue for us.
@kevin Would it be possible to give @luhliarik access to ipsilon01.stg, so he can test out the changes himself?
Sure. We should look at moving them to f40 (or I suppose we could wait for f41 to come out and move them to f41 then).
@kevin What is needed to give @luhliarik the correct permissions?
Let's wait for F41 release before upgrading the server.
Well, it seems we do not have any IPA groups/shell access/sudo defined for ipsilon, so I think we need to set at least
ipa_client_shell_groups
ipa_client_sudo_groups
ipa_host_group
ipa_host_group_desc
in inventory/groups/ipsilon_stg
(make sure STAGING ONLY)
Then, add them to a group listed there. Or make a new group for it in staging...
I think that should work?
@kevin I created a PR for it. Once it gets merged I will create the group as well.
Looks good to me.
I merged the PR, but when creating the sysadmin-ipsilon group on staging I got the following error:
@kevin Have you seen this before?
Found this in the ipa01.stg logs:
I have seen this before... there's some IPA uid/gid range thing... from time to time you need to allocate new ranges. It should be in the IPA docs... I don't recall the details. ;(
@kevin I remember that, I just didn't realize this was the issue. Let me fix that.
I added the DNA ranges, but it seems the replicas are somewhat broken. I can see plenty of errors in /var/log/dirsrv/slapd-STG-FEDORAPROJECT-ORG/errors on ipa01.stg and I'm trying to fix that.
Currently trying to re-initialize the replica by running ipa-replica-manage re-initialize, but it doesn't seem to be helping. Not sure what actually happened and why it's not working as it should.
I changed the group that has access in ipsilon_stg to sysadmin-noc and added @luhliarik to the group.
@kevin Do I need to run some playbook to reflect the changes? Or do I need to add permissions to the sysadmin-noc group on staging IPA?
Nothing should be needed. sssd on the vm will read the info from ipa, so it should be updated...
I notified @luhliarik that he should now have access to the machine.
ok, so, I moved our ipsilon servers to f41. (because f39 went eol last week).
And the problem still seems to be happening. ;(
We can't keep kicking this can down the road.
I saved off the xml and lv for the old f39 ones, I guess I can roll back to them... but... I really don't want to do that.
I might look and see if I can figure out anything first.
I downgraded the prod ones to httpd-2.4.59-2.fc41 and it resumed working again. ;(
At least the old package is still available on fc41.
I reached out to @luhliarik and learned that most of the regressions should be fixed in the latest httpd version.
I tested out ipsilon01.stg with httpd-2.4.63-1.fc41.x86_64 and it seems that the SSO from stg.pagure.io is working without issue. I will try the same with people01 and see the output, but it seems that this issue is solved now.
Unfortunately on EPEL9 there is still httpd-2.4.62-1.el9_5.2.x86_64, which has the issue. So people01 still has the problem.
Updated httpd on ipsilon01 and ipsilon02 and the issue didn't show up on pagure.io and src.fedoraproject.org, but I'm still getting too many redirects from https://copr.fedorainfracloud.org/. So it doesn't fix the issue completely, but it does at least partially.
The problem is that I can't undo the dnf transaction:
I will try and find the builds in koji and restore it from that.
I reverted the changes using the old koji build. The packages are in the /root/httpd_rpm_backup folder on ipsilon01 and ipsilon02, just in case we need them in the future.
Just a slight side note... we are in freeze now, so we should be careful about making changes to prod without +1s. ;)
Too bad it didn't fix it entirely though. ;(
Forgot that this needs a freeze break :-/. Not sure whether the cause is that COPR is still using OpenID authentication or that there is still something wrong with the httpd package (there is definitely a regression, but the migration to OIDC could maybe solve the issue as well).
Any news on a fix here?
Unfortunately not, the last experiment with the update didn't work as expected.
I will try to update it again and test it out with COPR as that was the only one which didn't work and it's not migrated to OIDC.
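For repeating the "too many redirects" check after each httpd update, something like the following could help. It is only a rough sketch, not a tool anyone in this ticket used: the names `make_fetcher` and `trace_redirects` are made up for illustration, and the 20-hop limit is an assumption roughly matching what browsers tolerate. It follows redirects one hop at a time, keeping cookies between hops, and reports the chain. It does not log in, so it can only catch loops that happen before or without authentication; the real SSO flow may still need a browser test.

```python
import http.cookiejar
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, List, Optional, Tuple

REDIRECT_CODES = {301, 302, 303, 307, 308}


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so each hop can be inspected."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError for the 3xx response


def make_fetcher(timeout: float = 10.0) -> Callable[[str], Optional[str]]:
    """Return a function mapping a URL to its redirect target, or None."""
    jar = http.cookiejar.CookieJar()  # cookies are shared across hops
    opener = urllib.request.build_opener(
        _NoRedirect, urllib.request.HTTPCookieProcessor(jar)
    )

    def fetch(url: str) -> Optional[str]:
        try:
            with opener.open(url, timeout=timeout):
                return None  # non-redirect success: the chain ends here
        except urllib.error.HTTPError as err:
            location = err.headers.get("Location")
            if err.code in REDIRECT_CODES and location:
                # Location may be relative, so resolve it against the URL
                return urllib.parse.urljoin(url, location)
            raise  # any other HTTP error is passed on to the caller

    return fetch


def trace_redirects(
    start_url: str,
    fetch: Callable[[str], Optional[str]],
    max_hops: int = 20,  # assumed limit, similar to browser behaviour
) -> Tuple[List[str], bool]:
    """Follow redirects; return (chain, too_many_redirects)."""
    chain = [start_url]
    for _ in range(max_hops):
        next_url = fetch(chain[-1])
        if next_url is None:
            return chain, False
        chain.append(next_url)
    return chain, True  # still redirecting after max_hops hops
```

Calling `trace_redirects("https://copr.fedorainfracloud.org/", make_fetcher())` would return the list of URLs visited and a flag saying whether the hop limit was hit. The fetch function is passed in separately so the loop logic can be exercised with a fake mapping instead of real network access.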