Using the 'fix outage' clause in freeze here. ;)
Basically adjust db-koji01 to use more memory and avoid
saturating i/o. With these settings, page loads look faster
and i/o is not saturated. We should try adding more cpus and such,
but that will require a reboot, so I am avoiding that for now.
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
db-koji01 is our only postgresql 15 install so far, but split out the
config from the 12 one we are using on RHEL8 to avoid making changes
there.
Also, let's try tweaking things:
- I am bumping cpus up to 88
- Tweak max workers/etc
- Try a higher i/o level since this db server is running on a virthost
with ssds.
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
First, we need to pipe stderr into the grep to filter out the timescaledb
warnings; |& does that.
Second, there's no reason to back up the staging database, so disable that.
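
The stderr filtering described above can be sketched in Python. This is
only an illustration, not the script this change touches: the marker
string, the function names and the command being run are all
hypothetical, since the actual warning text is not given here.

```python
import subprocess
import sys

# Hypothetical marker; the real timescaledb warning text is not shown in this log.
HARMLESS_MARKERS = ("hypothetical harmless warning",)


def filter_stderr(stderr_text, markers=HARMLESS_MARKERS):
    """Return non-empty stderr lines that contain none of the harmless markers."""
    return [
        line
        for line in stderr_text.splitlines()
        if line.strip() and not any(marker in line for marker in markers)
    ]


def run_filtered(cmd):
    """Run cmd, capture its stderr, and return (returncode, remaining stderr lines)."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        text=True,
    )
    return proc.returncode, filter_stderr(proc.stderr)
```

The point is the same as with |&: the filter has to see stderr, and
anything that does not match a known-harmless marker still gets through.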
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
db-datanommer02 uses timescaledb. When you do a pg_dump there are warnings
due to this, but according to upstream they are all completely harmless.
So, to avoid an email to everyone every day, let's just try to suppress
these, while hopefully not suppressing real errors if they ever occur.
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
The datanommer_ro user was created in the task, but was never granted the
privilege to read from the datanommer2 db. This commit fixes that.
Signed-off-by: Michal Konečný <mkonecny@redhat.com>
Looks like this role hasn't been used on a Fedora box for a
while so things are kinda broken. Re-arrange all the package
install sections to be together, use newer package names on
Fedora (the Fedora and EL >= 8 sections are identical for now
but I figured I'd keep them separate in case that changes), and
use the newer config file, not the older one, on Fedora.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
In prod db-fas01 is rhel7 and uses postgresql 9.6.
In staging db-fas01.stg is rhel8, and should also use postgresql 9.6,
but we were blanket making rhel8 hosts use postgresql 12.
We could drop this by reinstalling db-fas01.stg with rhel7, or waiting
until we finally kill fas2 and just setting them both to use postgresql
12.
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
Turns out the copy module doesn't actually have an 'absent' state.
So, just remove this (we no longer need it as there's a timer on koji
hub that does this from there).
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
db-koji01 has been running with this since before the mass rebuild. It
seems to run at a higher load, but it processes faster and does not
stall when doing backups or when long/bad koji-gc queries for old
versions of texlive hit it.
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
Turns out we were not setting effective_cache_size even though it was set
for some servers (pagure). Adjust a few parameters on db-koji to try and
get some more performance out of it.
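
For context, a commonly cited starting point (from the PostgreSQL
documentation and wiki, not from this change) is shared_buffers at about
25% of RAM and effective_cache_size at roughly 50-75% of RAM. The values
actually set for db-koji are not given here, so the sketch below only
shows that rule of thumb; the function name is made up.

```python
def suggest_memory_settings(total_ram_mb):
    """Rule-of-thumb starting values only; not the values used on db-koji."""
    return {
        # About 25% of RAM for PostgreSQL's own buffer cache.
        "shared_buffers": f"{total_ram_mb // 4}MB",
        # About 75% of RAM as the planner's estimate of available cache.
        "effective_cache_size": f"{total_ram_mb * 3 // 4}MB",
    }
```

effective_cache_size does not allocate anything; it is only a planner
hint, so leaving it unset means the planner works from a much smaller
default estimate.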
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
I took the default postgresql.conf from postgresql 12 and then added in
various changes we already manually made and variable substitutions we
already had set up back in the postgresql 9.2 days.
This will apply to db-koji01, db-qa01, db-datanommer01 at least.
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
In phx2 we had a warm standby database host (db-koji02), but we no
longer have that host in iad2, so we shouldn't try and make db-koji01
handle that. Also, this was just changed mistakenly as it's the warm
standby host that should get the recovery.conf file.
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
Basically, if the variables are defined in the host, use them, otherwise
use the current values.
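
The fallback logic amounts to a per-key override, sketched here in
Python. The setting names and default values are placeholders, not the
ones in the real template; in a Jinja2 template the same effect usually
comes from the `default` filter.

```python
# Placeholder defaults standing in for the values currently in the config.
CURRENT_VALUES = {"shared_buffers": "2GB", "max_connections": 100}


def effective_settings(host_vars, current=CURRENT_VALUES):
    """Use the host's value when it defines one, otherwise keep the current value."""
    return {key: host_vars.get(key, value) for key, value in current.items()}
```

A host that defines only max_connections gets its own value for that key
and the current value for everything else.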
Signed-off-by: Pierre-Yves Chibon <pingou@pingoured.fr>
Our postgresql.conf is from postgresql 9.2 while RHEL8 ships 10.x which
leads to postgresql no longer wanting to start (as seen on pagure-stg01).
Signed-off-by: Pierre-Yves Chibon <pingou@pingoured.fr>
By default apache uses prefork and a limit of 250. It's possible that this limit was
the thing causing us issues over the last week. This moves to the event mpm and ups limits
a lot. The limits on db connections also need to go up, or the increased workers will just
cause the db server to overload.
With this setup, builders are no longer dropping out, but it's not clear if it's solved
all the issues we have been seeing.
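
The relationship between the worker limit and the db connection limit
can be sketched as a simple budget check. This is an assumption-laden
illustration: the one-connection-per-worker ratio and the reserved
headroom are guesses, and the function names are made up.

```python
def required_db_connections(max_request_workers, conns_per_worker=1, reserved=10):
    """Connections needed if every worker holds conns_per_worker connections,
    plus some reserved headroom for admin and other clients (assumed values)."""
    return max_request_workers * conns_per_worker + reserved


def db_can_absorb(max_connections, max_request_workers, **kwargs):
    """True if the db connection limit covers the worst case from the web tier."""
    return max_connections >= required_db_connections(max_request_workers, **kwargs)
```

With the old limit of 250 workers, a db limit of 300 would pass this
check; raising the worker limit without raising the db limit would not.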
Signed-off-by: Kevin Fenzi <kevin@scrye.com>