This is a temp host until we ship the first round of systems out next month to the
new datacenter. At that point it will go away; it will come back later in June,
after we have gotten things moved and back up again.
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
This is causing f31 updates not to be synced. The cron job reports:
Subject: Cron <s3-mirror@mm-backend01> /usr/local/bin/lock-wrapper s3sync-updates-current "/usr/local/bin/s3-sync-path.sh /pub/fedora/linux/updates/31/Everything/x86_64/os" 2>&1 |
/usr/local/bin/nag-once s3-updates-current.sh 1d 2>&1
Syntax: /usr/local/bin/s3-sync-path.sh /pub/path/to/sync/
NOTE! Path must end with a trailing /
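So the fix is presumably just adding the missing trailing slash to the path in that
cron job (a sketch of the corrected invocation, assuming nothing else in the
command changes):

    /usr/local/bin/s3-sync-path.sh /pub/fedora/linux/updates/31/Everything/x86_64/os/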
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
We have been having the cluster fall over for still-unknown reasons,
but this patch should at least help prevent that:
First, we increase the net_ticktime parameter from its default of 60 to 120.
RabbitMQ sends 4 'ticks' to the other cluster members over this interval, and if
25% of them are lost it assumes that cluster member is down. All these VMs are
on the same network and in the same datacenter, but perhaps heavy load
from other VMs sometimes causes a tick not to arrive in time?
http://www.rabbitmq.com/nettick.html
Also, set our partitioning strategy to autoheal. Currently, if a cluster
member gets booted out, it gets paused and stops processing entirely.
With autoheal, RabbitMQ will try to figure out a 'winning' partition and restart
all the nodes that are not in that partition.
https://www.rabbitmq.com/partitions.html
Hopefully the first change will make partitions less likely, and the second
will make them repair without causing massive pain to the cluster.
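For reference, a minimal sketch of what those two settings look like in the classic
Erlang-term /etc/rabbitmq/rabbitmq.config format (the actual change is applied via
our config management, so the file location and surrounding entries here are
assumptions):

    [
      {kernel, [
        %% net tick time in seconds; default is 60, raised to 120
        {net_ticktime, 120}
      ]},
      {rabbit, [
        %% on a partition, pick a winning partition and restart the losing nodes
        {cluster_partition_handling, autoheal}
      ]}
    ].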
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
openqa-aarch64-02.qa is broken in some very mysterious way:
https://pagure.io/fedora-infrastructure/issue/8750
Until we can figure that out, this should prevent it from picking up
normal jobs, while still letting us manually target a job at it whenever we
need to for debugging.
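One common way to do this in openQA (an assumption about the mechanism, not
necessarily what this patch does) is to give the host a non-default WORKER_CLASS
in its workers.ini, so the scheduler only sends it jobs that explicitly request
that class:

    # /etc/openqa/workers.ini on openqa-aarch64-02.qa (class name is hypothetical)
    [global]
    WORKER_CLASS = qemu_aarch64_debug

Normal aarch64 jobs keep their usual class and skip this worker, while a job
posted with WORKER_CLASS=qemu_aarch64_debug lands on it for debugging.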
Signed-off-by: Adam Williamson <awilliam@redhat.com>