Added the infra SOPs ported to asciidoc.

This commit is contained in:
Adam Saleh 2021-07-26 10:39:47 +02:00
parent 8a7f111a12
commit a0301e30f1
148 changed files with 18575 additions and 17 deletions

View file

@ -7,11 +7,9 @@ they may be maintained by other people or team).
Services handling identity and providing personal space to our contributors.
Accounts https://accounts.fedoraproject.org/[accounts.fp.o]::
Our directory and identity management tool provides community members with a single account to log in to Fedora
services. Registering an account there is one of the first things to do if you plan to work on Fedora.
Fedora People https://fedorapeople.org/[fedorapeople.org]::
Personal web space provided to community members to share files, git

View file

@ -1 +0,0 @@
* xref:index.adoc[Communishift documentation]

View file

@ -1,10 +0,0 @@
:experimental:
= Communishift documentation
link:https://console-openshift-console.apps.os.fedorainfracloud.org/[Communishift] is the name for the OpenShift community cluster run by the Fedora project.
It's intended to be a place where community members can test/deploy/run things that are of benefit to the community at a lower SLE (Service Level Expectation) than services directly run and supported by infrastructure, additionally doing so in a self service manner.
It's also an incubator for applications that may someday be more fully supported once they prove their worth.
Finally, it's a place for Infrastructure folks to learn and test and discover OpenShift in a less constrained setting than our production clusters.
This documentation focuses on implementation details of Fedora's OpenShift instance, not on OpenShift usage in general.
These instructions are already covered by link:https://docs.openshift.com/container-platform/4.1/welcome/index.html[upstream documentation].

View file

@ -1 +1,144 @@
* xref:index.adoc[Sysadmin Guide]
** xref:2-factor.adoc[Two factor auth]
** xref:accountdeletion.adoc[Account Deletion SOP]
** xref:anitya.adoc[Anitya Infrastructure SOP]
** xref:ansible.adoc[ansible - SOP in review ]
** xref:apps-fp-o.adoc[apps-fp-o - SOP in review ]
** xref:archive-old-fedora.adoc[archive-old-fedora - SOP in review ]
** xref:arm.adoc[arm - SOP in review ]
** xref:askbot.adoc[askbot - SOP in review ]
** xref:aws-access.adoc[aws-access - SOP in review ]
** xref:basset.adoc[basset - SOP in review ]
** xref:bastion-hosts-info.adoc[bastion-hosts-info - SOP in review ]
** xref:bladecenter.adoc[bladecenter - SOP in review ]
** xref:blockerbugs.adoc[blockerbugs - SOP in review ]
** xref:bodhi.adoc[bodhi - SOP in review ]
** xref:bugzilla2fedmsg.adoc[bugzilla2fedmsg - SOP in review ]
** xref:bugzilla.adoc[bugzilla - SOP in review ]
** xref:cloud.adoc[cloud - SOP in review ]
** xref:collectd.adoc[collectd - SOP in review ]
** xref:communishift.adoc[communishift - SOP in review ]
** xref:compose-tracker.adoc[compose-tracker - SOP in review ]
** xref:contenthosting.adoc[contenthosting - SOP in review ]
** xref:copr.adoc[copr - SOP in review ]
** xref:cyclades.adoc[cyclades - SOP in review ]
** xref:darkserver.adoc[darkserver - SOP in review ]
** xref:database.adoc[database - SOP in review ]
** xref:datanommer.adoc[datanommer - SOP in review ]
** xref:debuginfod.adoc[debuginfod - SOP in review ]
** xref:denyhosts.adoc[denyhosts - SOP in review ]
** xref:departing-admin.adoc[departing-admin - SOP in review ]
** xref:dns.adoc[dns - SOP in review ]
** xref:docs.fedoraproject.org.adoc[docs.fedoraproject.org - SOP in review ]
** xref:fas-notes.adoc[fas-notes - SOP in review ]
** xref:fas-openid.adoc[fas-openid - SOP in review ]
** xref:fedmsg-certs.adoc[fedmsg-certs - SOP in review ]
** xref:fedmsg-gateway.adoc[fedmsg-gateway - SOP in review ]
** xref:fedmsg-introduction.adoc[fedmsg-introduction - SOP in review ]
** xref:fedmsg-irc.adoc[fedmsg-irc - SOP in review ]
** xref:fedmsg-new-message-type.adoc[fedmsg-new-message-type - SOP in review ]
** xref:fedmsg-relay.adoc[fedmsg-relay - SOP in review ]
** xref:fedmsg-websocket.adoc[fedmsg-websocket - SOP in review ]
** xref:fedocal.adoc[fedocal - SOP in review ]
** xref:fedorapackages.adoc[fedorapackages - SOP in review ]
** xref:fedorapastebin.adoc[fedorapastebin - SOP in review ]
** xref:fedora-releases.adoc[fedora-releases - SOP in review ]
** xref:fedorawebsites.adoc[fedorawebsites - SOP in review ]
** xref:fmn.adoc[fmn - SOP in review ]
** xref:fpdc.adoc[fpdc - SOP in review ]
** xref:freemedia.adoc[freemedia - SOP in review ]
** xref:freenode-irc-channel.adoc[freenode-irc-channel - SOP in review ]
** xref:freshmaker.adoc[freshmaker - SOP in review ]
** xref:gather-easyfix.adoc[gather-easyfix - SOP in review ]
** xref:gdpr_delete.adoc[gdpr_delete - SOP in review ]
** xref:gdpr_sar.adoc[gdpr_sar - SOP in review ]
** xref:geoip-city-wsgi.adoc[geoip-city-wsgi - SOP in review ]
** xref:github2fedmsg.adoc[github2fedmsg - SOP in review ]
** xref:github.adoc[github - SOP in review ]
** xref:gitweb.adoc[gitweb - SOP in review ]
** xref:greenwave.adoc[greenwave - SOP in review ]
** xref:guestdisk.adoc[guestdisk - SOP in review ]
** xref:guestedit.adoc[guestedit - SOP in review ]
** xref:haproxy.adoc[haproxy - SOP in review ]
** xref:hosted_git_to_svn.adoc[hosted_git_to_svn - SOP in review ]
** xref:hotfix.adoc[hotfix - SOP in review ]
** xref:hotness.adoc[hotness - SOP in review ]
** xref:hubs.adoc[hubs - SOP in review ]
** xref:ibm_rsa_ii.adoc[ibm_rsa_ii - SOP in review ]
** xref:index.adoc[index - SOP in review ]
** xref:infra-git-repo.adoc[infra-git-repo - SOP in review ]
** xref:infra-hostrename.adoc[infra-hostrename - SOP in review ]
** xref:infra-raidmismatch.adoc[infra-raidmismatch - SOP in review ]
** xref:infra-repo.adoc[infra-repo - SOP in review ]
** xref:infra-retiremachine.adoc[infra-retiremachine - SOP in review ]
** xref:infra-yubikey.adoc[infra-yubikey - SOP in review ]
** xref:ipsilon.adoc[ipsilon - SOP in review ]
** xref:iscsi.adoc[iscsi - SOP in review ]
** xref:jenkins-fedmsg.adoc[jenkins-fedmsg - SOP in review ]
** xref:kerneltest-harness.adoc[kerneltest-harness - SOP in review ]
** xref:kickstarts.adoc[kickstarts - SOP in review ]
** xref:koji.adoc[koji - SOP in review ]
** xref:koji-archive.adoc[koji-archive - SOP in review ]
** xref:koji-builder-setup.adoc[koji-builder-setup - SOP in review ]
** xref:koschei.adoc[koschei - SOP in review ]
** xref:layered-image-buildsys.adoc[layered-image-buildsys - SOP in review ]
** xref:librariesio2fedmsg.adoc[librariesio2fedmsg - SOP in review ]
** xref:linktracking.adoc[linktracking - SOP in review ]
** xref:loopabull.adoc[loopabull - SOP in review ]
** xref:mailman.adoc[mailman - SOP in review ]
** xref:making-ssl-certificates.adoc[making-ssl-certificates - SOP in review ]
** xref:massupgrade.adoc[massupgrade - SOP in review ]
** xref:mastermirror.adoc[mastermirror - SOP in review ]
** xref:mbs.adoc[mbs - SOP in review ]
** xref:memcached.adoc[memcached - SOP in review ]
** xref:message-tagging-service.adoc[message-tagging-service - SOP in review ]
** xref:mirrorhiding.adoc[mirrorhiding - SOP in review ]
** xref:mirrormanager.adoc[mirrormanager - SOP in review ]
** xref:mirrormanager-S3-EC2-netblocks.adoc[mirrormanager-S3-EC2-netblocks - SOP in review ]
** xref:mote.adoc[mote - SOP in review ]
** xref:nagios.adoc[nagios - SOP in review ]
** xref:netapp.adoc[netapp - SOP in review ]
** xref:new-hosts.adoc[new-hosts - SOP in review ]
** xref:nonhumanaccounts.adoc[nonhumanaccounts - SOP in review ]
** xref:nuancier.adoc[nuancier - SOP in review ]
** xref:odcs.adoc[odcs - SOP in review ]
** xref:openqa.adoc[openqa - SOP in review ]
** xref:openshift.adoc[openshift - SOP in review ]
** xref:openvpn.adoc[openvpn - SOP in review ]
** xref:orientation.adoc[orientation - SOP in review ]
** xref:outage.adoc[outage - SOP in review ]
** xref:packagedatabase.adoc[packagedatabase - SOP in review ]
** xref:packagereview.adoc[packagereview - SOP in review ]
** xref:pagure.adoc[pagure - SOP in review ]
** xref:pdc.adoc[pdc - SOP in review ]
** xref:pesign-upgrade.adoc[pesign-upgrade - SOP in review ]
** xref:planetsubgroup.adoc[planetsubgroup - SOP in review ]
** xref:privatefedorahosted.adoc[privatefedorahosted - SOP in review ]
** xref:publictest-dev-stg-production.adoc[publictest-dev-stg-production - SOP in review ]
** xref:rabbitmq.adoc[rabbitmq - SOP in review ]
** xref:rdiff-backup.adoc[rdiff-backup - SOP in review ]
** xref:registry.adoc[registry - SOP in review ]
** xref:requestforresources.adoc[requestforresources - SOP in review ]
** xref:resultsdb.adoc[resultsdb - SOP in review ]
** xref:retrace.adoc[retrace - SOP in review ]
** xref:reviewboard.adoc[reviewboard - SOP in review ]
** xref:scmadmin.adoc[scmadmin - SOP in review ]
** xref:selinux.adoc[selinux - SOP in review ]
** xref:sigul-upgrade.adoc[sigul-upgrade - SOP in review ]
** xref:simple_koji_ci.adoc[simple_koji_ci - SOP in review ]
** xref:sshaccess.adoc[sshaccess - SOP in review ]
** xref:sshknownhosts.adoc[sshknownhosts - SOP in review ]
** xref:staging.adoc[staging - SOP in review ]
** xref:status-fedora.adoc[status-fedora - SOP in review ]
** xref:syslog.adoc[syslog - SOP in review ]
** xref:tag2distrepo.adoc[tag2distrepo - SOP in review ]
** xref:torrentrelease.adoc[torrentrelease - SOP in review ]
** xref:unbound.adoc[unbound - SOP in review ]
** xref:virt-image.adoc[virt-image - SOP in review ]
** xref:virtio.adoc[virtio - SOP in review ]
** xref:virt-notes.adoc[virt-notes - SOP in review ]
** xref:voting.adoc[voting - SOP in review ]
** xref:waiverdb.adoc[waiverdb - SOP in review ]
** xref:wcidff.adoc[wcidff - SOP in review ]
** xref:wiki.adoc[wiki - SOP in review ]
** xref:zodbot.adoc[zodbot - SOP in review ]

View file

@ -0,0 +1,98 @@
= Two factor auth
Fedora Infrastructure has implemented a form of two factor auth for
people who have sudo access on Fedora machines. In the future we may
expand this to include more than sudo but this was deemed to be a high
value, low hanging fruit.
== Using two factor
http://fedoraproject.org/wiki/Infrastructure_Two_Factor_Auth
To enroll a Yubikey, use the fedora-burn-yubikey script like normal. To
enroll using FreeOTP or Google Authenticator, go to
https://admin.fedoraproject.org/totpcgiprovision/
=== What's enough authentication?
FAS Password+FreeOTP or FAS Password+Yubikey. Note: don't actually enter
a +, simply enter your FAS password and then press your Yubikey or enter your
FreeOTP code.
== Administrating and troubleshooting two factor
Two factor auth is implemented by a modified copy of the
https://github.com/mricon/totp-cgi project doing the authentication and
pam_url submitting the authentication tokens.
totp-cgi runs on the fas servers (currently fas01.stg and
fas01/fas02/fas03 in production), listening on port 8443 for pam_url
requests.
FreeOTP, Google authenticator and yubikeys are supported as tokens to
use with your password.
=== FreeOTP, Google authenticator:
The FreeOTP application is preferred; however, Google Authenticator works as
well. (Note that Google Authenticator is not open source.)
This is handled via totpcgi. There's a command line tool to manage
users, totpprov. See 'man totpprov' for more info. Admins can use this
tool to revoke lost tokens (google authenticator only) with 'totpprov
delete-user username'
To enroll using FreeOTP or Google Authenticator for production machines,
go to https://admin.fedoraproject.org/totpcgiprovision/
To enroll using FreeOTP or Google Authenticator for staging machines, go
to https://admin.stg.fedoraproject.org/totpcgiprovision/
You'll be prompted to login with your fas username and password.
Note that staging and production differ.
=== YubiKeys:
Yubikeys are enrolled and managed in FAS. Users can self-enroll using
the fedora-burn-yubikey utility included in the fedora-packager package.
=== What do I do if I lose my token?
Send an email to admin@fedoraproject.org that is encrypted/signed with
your GPG key from FAS, or that otherwise identifies that you are you.
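A minimal sketch of preparing such a request with GPG; the filename is only illustrative, and you should sign with the key you have registered in FAS:
....
# write up the request, then clearsign it with your GPG key
gpg --clearsign token-reset-request.txt
# attach or paste the resulting token-reset-request.txt.asc in your mail to admin@fedoraproject.org
....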
=== How to remove a token (so the user can re-enroll)?
First we MUST verify that the user is who they say they are, using any
of the following:
* Personal contact where the person can be verified by member of
sysadmin-main.
* Correct answers to security questions.
* Email request to admin@fedoraproject.org that is gpg encrypted by the
key listed for the user in fas.
Then:
. For google authenticator:
.. ssh into batcave01 as root
.. ssh into os-master01.iad2.fedoraproject.org
.. $ oc project fas
.. $ oc get pods
.. $ oc rsh <pod> (pick one of the totpcgi pods from the above list)
.. $ totpprov delete-user <username>
. For yubikey: login to one of the fas machines and run:
+
....
/usr/local/bin/yubikey-remove.py username
....
The user can then go to
https://admin.fedoraproject.org/totpcgiprovision/ and reprovision a new
device.
If the user emails admin@fedoraproject.org with the signed request, make
sure to reply to all indicating that a reset was performed. This is so
that other admins don't step in and reset it again after it's been reset
once.

View file

@ -0,0 +1,294 @@
= Account Deletion SOP
For the most part we do not delete accounts. In the case that a deletion
is paramount, it will need to be coordinated with appropriate entities.
Disabling accounts is another story but is limited to those with the
appropriate privileges. Reasons for accounts to be disabled can be one
of the following:
* Person has placed SPAM on the wiki or other sites.
* It is seen that the account has been compromised by a third party.
* A person wishes to leave the Fedora Project and wants the account
disabled.
== Contents
* Disabling
** Disable Accounts
** Disable Groups
* User Requested disables
* Renames
** Rename Accounts
** Rename Groups
* Deletion
** Delete Accounts
** Delete Groups
== Disable
Disabling accounts is the easiest to accomplish as it just blocks people
from using their account. It does not remove the account name and
associated UID so we don't have to worry about future, unintentional
collisions.
=== Disable Accounts
To begin with, accounts should not be disabled until there is a ticket
in the Infrastructure ticketing system. After that the contents inside
the ticket need to be verified (to make sure people aren't playing
pranks or someone is in a crappy mood). This needs to be logged in the
ticket (who looked, what they saw, etc.). Then the account can be
disabled:
....
ssh db02
sudo -u postgres psql fas2
fas2=# begin;
fas2=# select * from people where username = 'FOOO';
....
Here you need to verify that the account looks right, that there is only
one match, or other issues. If there are multiple matches you need to
contact one of the main sysadmin-db's on how to proceed.:
....
fas2=# update people set status = 'admin_disabled' where username = 'FOOO';
fas2=# commit;
fas2=# \q
....
=== Disable Groups
There is no explicit way to disable groups in FAS2. Instead, we close
the group for adding new members and optionally remove existing members
from it. This can be done from the web UI if you are an administrator of
the group or you are in the accounts group. First, go to the group info
page. Then click the (edit) link next to Group Details. Make sure that
the Invite Only box is checked. This will prevent other users from
requesting the group on their own.
If you want to remove the existing users, View the Group info, then
click on the View Member List link. Click on All under the Results
heading. Then go through and click on Remove for each member.
Doing this in the database instead can be quicker if you have a lot of
people to remove. Once again, this requires someone in sysadmin-db to do
the work:
....
ssh db02
sudo -u postgres psql fas2
fas2=# begin;
fas2=# update groups set invite_only = true where name = 'FOOO';
fas2=# commit;
fas2=# begin;
fas2=# select p.name, g.name, r.role_status from people as p, person_roles as r, groups as g
where p.id = r.person_id and g.id = r.group_id
and g.name = 'FOOO';
fas2=# -- Make sure that the list of users in the groups looks correct
fas2=# delete from person_roles where person_roles.group_id = (select id from groups where name = 'FOOO');
fas2=# -- number of rows in both of the above should match
fas2=# commit;
fas2=# \q
....
== User Requested Disables
According to our Privacy Policy, a user may request that their personal
information be removed from FAS if they want to disable their account. We can
do this but need to do some extra work beyond simply setting the account
status to disabled.
=== Record User's CLA information
If the user has signed the CLA/FPCA, then they may have contributed
something to Fedora that we'll need to contact them about at a later
date. For that, we need to keep at least the following information:
* Fedora username
* human name
* email address
All of this information should be on the CLA email that is sent out when
a user signs up. We need to verify with spot (Tom Callaway) that he has
that record. If not, we need to get it to him. Something like:
....
select id, username, human_name, email, telephone, facsimile, postal_address from people where username = 'USERNAME';
....
and send it to spot to keep.
=== Remove the personal information
The following sequence of db commands should do it:
....
fas2=# begin;
fas2=# select * from people where username = 'USERNAME';
....
Here you need to verify that the account looks right, that there is only
one match, or other issues. If there are multiple matches you need to
contact one of the main sysadmin-db's on how to proceed.:
....
fas2=# update people set human_name = '', gpg_keyid = null, ssh_key = null, unverified_email = null, comments = null, postal_address = null, telephone = null, facsimile = null, affiliation = null, ircnick = null, status = 'inactive', locale = 'C', timezone = null, latitude = null, longitude = null, country_code = null, email = 'disabled1@fedoraproject.org' where username = 'USERNAME';
....
Make sure only one record was updated:
....
fas2=# select * from people where username = 'USERNAME';
....
Make sure the correct record was updated:
....
fas2=# commit;
....
[NOTE]
.Note
====
The email address is both not null and unique in the database. Due to
this, you need to set it to a new string for every user who requests
deletion like this.
====
== Renames
In general, renames do not require as much work as deletions but they
still require coordination. This is because renames do not change the
UID/GID but some of our applications save information based on
username/groupname rather than UID/GID.
=== Rename Accounts
[WARNING]
.Warning
====
Needs more eyes: this list may not be complete.
====
* Check the databases for koji, pkgdb, and bodhi for occurrences of
the old username and update them to the new username.
* Check fedorapeople.org for home directories and yum repositories under
the old username that would need to be renamed
* Check (or ask the user to check and update) mailing list subscriptions
on fedorahosted.org and lists.fedoraproject.org under the old
username@fedoraproject.org email alias
* Check whether the user has a username@fedoraproject.org bugzilla
account in python-fedora and update that. Also ask the user to update
that in bugzilla.
* If the user is in a sysadmin-* group, check for home directories on
bastion and other infrastructure boxes that are owned by them and need
to be renamed (could also just tell the user to back up any files there
themselves because they're getting a new home directory).
* grep through ansible for occurrences of the username
* Check for entries in trac on fedorahosted.org for the username as an
"Assigned to" or "CC" entry.
* Add other places to check here
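For the "grep through ansible" step above, a minimal sketch run from the working copy on batcave01 (the username is a placeholder):
....
cd /srv/web/infra/ansible
grep -rn 'oldusername' .   # placeholder username; review each hit and update it to the new name
....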
=== Rename Groups
[WARNING]
.Warning
====
Needs more eyes: this list may not be complete.
====
* grep through ansible for occurrences of the group name.
* Check for group-members,group-admins,group-sponsors@fedoraproject.org
email alias presence in any fedorahosted.org or lists.fedoraproject.org
mailing list
* Check for entries in trac on fedorahosted.org for the username as an
"Assigned to" or "CC" entry.
* Add other places to check here
== Deletion
Deletion is the toughest one to audit because it requires that we look
through our systems looking for the UID and GID in addition to looking
for the username and password. The UID and GID are used on things like
filesystem permissions so we have to look there as well. Not catching
these places may lead to security issues should the UID/GID ever be
reused.
[NOTE]
.Note
====
Recommended to rename instead: when it is not strictly necessary to purge all
traces of an account, it's highly recommended to rename the user or group
to something like DELETED_oldusername instead of deleting. This avoids
the problems and additional checking that we have to do below.
====
=== Delete Accounts
[WARNING]
.Warning
====
Needs more eyes: this list may be incomplete. More people need to look
at this and find places that may need to be updated.
====
* Check everything for the #Rename Accounts case.
* Figure out what boxes a user may have had access to in the past. This
means you need to look at all the groups a user may ever have been
approved for (even if they are not approved for those groups now). For
instance, any git*, svn*, bzr*, hg* groups would have granted access to
hosted03 and hosted04. packager would have granted access to
pkgs.fedoraproject.org. Pretty much any group grants access to
fedorapeople.org.
* For those boxes, run a find over the files there to see if the UID
owns any files on the system:
+
....
# find / -uid 100068 -print
....
+
Any files owned by that uid must be reassigned to another user or
removed.
[WARNING]
.Warning
====
What to do about backups? Backups pose a special problem as they may
contain the uid that's being removed. Need to decide how to handle this
====
* Add other places to check here
=== Delete Groups
[WARNING]
.Warning
====
Needs more eyes: this list may be incomplete. More people need to look
at this and find places that may need to be updated.
====
* Check everything for the #Rename Groups case.
* Figure out what boxes may have had files owned by that group. This
means that you'd need to look at the users in that group, what boxes
they have shell accounts on, and then look at those boxes. Groups used
for hosted would also need hosted03 and hosted04 added to that list, plus
the box that serves the hosted mailing lists.
* For those boxes, run a find over the files there to see if the GID
owns any files on the system:
+
....
# find / -gid 100068 -print
....
+
Any files owned by that GID must be reassigned to another group or
removed.
[WARNING]
.Warning
====
What to do about backups? Backups pose a special problem as they may
contain the gid that's being removed. Need to decide how to handle this
====
* Add other places to check here

View file

@ -0,0 +1,210 @@
= Anitya Infrastructure SOP
Anitya is used by Fedora to track upstream project releases and maps
them to downstream distribution packages, including (but not limited to)
Fedora.
Anitya staging instance: https://stg.release-monitoring.org
Anitya production instance: https://release-monitoring.org
Anitya project page: https://github.com/fedora-infra/anitya
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin, #fedora-apps
Persons::
zlopez
Location::
iad2.fedoraproject.org
Servers::
Production
+
* os-master01.iad2.fedoraproject.org
+
Staging
+
* os-master01.stg.iad2.fedoraproject.org
Purpose::
Map upstream releases to Fedora packages.
== Hosts
The current deployment is made up of the release-monitoring OpenShift
namespace.
=== release-monitoring
This OpenShift namespace runs the following pods:
* The apache/mod_wsgi application for release-monitoring.org
* A libraries.io SSE client
* A service checking for new releases
This OpenShift project relies on:
* A postgres db server running in OpenShift
* Lots of external third-party services. The anitya webapp can scrape
pypi, rubygems.org, sourceforge and many others on command.
* Lots of external third-party services. The check service makes all
kinds of requests out to the Internet that can fail in various ways.
* Fedora messaging RabbitMQ hub for publishing messages
Things that rely on this host:
* `hotness-sop` is a fedora messaging consumer running in Fedora Infra
in OpenShift. It listens for Anitya messages from here and performs
actions on koji and bugzilla.
== Releasing
The release process is described in
https://anitya.readthedocs.io/en/latest/contributing.html#release-guide[Anitya
documentation].
=== Deploying
Staging deployment of Anitya is deployed in OpenShift on
os-master01.stg.iad2.fedoraproject.org.
To deploy the staging instance of Anitya you need to push changes to the
staging branch on https://github.com/fedora-infra/anitya[Anitya GitHub]. A
GitHub webhook will then automatically deploy a new version of Anitya on
staging.
Production deployment of Anitya is deployed in OpenShift on
os-master01.iad2.fedoraproject.org.
To deploy the production instance of Anitya you need to push changes to the
production branch on https://github.com/fedora-infra/anitya[Anitya
GitHub]. A GitHub webhook will then automatically deploy a new version of
Anitya on production.
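A minimal sketch of such a deployment push; only the staging/production branch names come from this SOP, while the assumption that the release changes come from the master branch is illustrative:
....
git clone https://github.com/fedora-infra/anitya.git && cd anitya
git checkout production                 # or: staging
git merge --ff-only origin/master       # assumption: the release changes live on master
git push origin production              # the GitHub webhook then triggers the OpenShift deployment
....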
==== Configuration
To deploy the new configuration, you need
https://fedora-infra-docs.readthedocs.io/en/latest/sysadmin-guide/sops/sshaccess.html[ssh
access] to batcave01.iad2.fedoraproject.org and
https://fedora-infra-docs.readthedocs.io/en/latest/sysadmin-guide/sops/ansible.html[permissions
to run the Ansible playbook].
All the following commands should be run from batcave01.
First, ensure there are no configuration changes required for the new
update. If there are, update the Ansible anitya role(s) and optionally
run the playbook:
....
$ sudo rbac-playbook openshift-apps/release-monitoring.yml
....
The configuration changes could be limited to staging only using:
....
$ sudo rbac-playbook openshift-apps/release-monitoring.yml -l staging
....
This is recommended for testing new configuration changes.
==== Upgrading
===== Staging
To deploy a new version of Anitya to staging you need to push changes to the
staging branch on https://github.com/fedora-infra/anitya[Anitya GitHub]. A
GitHub webhook will then automatically deploy the new version of Anitya on
staging.
===== Production
To deploy a new version of Anitya to production you need to push changes to
the production branch on https://github.com/fedora-infra/anitya[Anitya
GitHub]. A GitHub webhook will then automatically deploy the new version of
Anitya on production.
Congratulations! The new version should now be deployed.
== Administrating release-monitoring.org
Anitya web application offers some functionality to administer itself.
User admin status is tracked in Anitya database. Admin users can grant
or revoke admin privileges for users in the
https://release-monitoring.org/users[users tab].
Admin users have additional functionality available in the web interface. In
particular, admins can view flagged projects, remove projects, remove
package mappings, etc.
For more information see
https://anitya.readthedocs.io/en/stable/admin-user-guide.html[Admin user
guide] in Anitya documentation.
=== Flags
Anitya lets users flag projects for administrator attention. This is
accessible to administrators in the
https://release-monitoring.org/flags[flags tab].
== Monitoring
To monitor the activity of Anitya you can connect to Fedora infra
OpenShift and look at the state of pods.
For staging look at the [.title-ref]#release-monitoring# namespace in
https://os.stg.fedoraproject.org/console/project/release-monitoring/overview[staging
OpenShift instance].
For production look at the [.title-ref]#release-monitoring# namespace in
https://os.fedoraproject.org/console/project/release-monitoring/overview[production
OpenShift instance].
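If you prefer the command line, a minimal sketch with the oc client (assuming you are already logged in to the relevant cluster):
....
oc project release-monitoring   # switch to the namespace
oc get pods                     # check the state of the pods
oc logs <pod-name>              # inspect the output of a specific pod
....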
== Troubleshooting
This section contains various issues encountered during deployment or
configuration changes and possible solutions.
=== Fedmsg messages aren't sent
*Issue:* Fedmsg messages aren't sent.
*Solution:* Set the USER environment variable in the pod.
*Explanation:* Fedmsg uses the USER env variable as the username inside
messages. Without USER set, it just crashes and doesn't send anything.
=== Cronjob is crashing
*Issue:* Cronjob pod is crashing on start, even after configuration
change that should fix the behavior.
*Solution:* Restart the cronjob. This could be done by OPS.
*Explanation:* Every time the cronjob is executed after a crash, it tries
to reuse the pod with the bad configuration instead of
creating a new one with the new configuration.
=== Database migration is taking too long
*Issue:* Database migration is taking a few hours to complete.
*Solution:* Stop every pod and cronjob before migration.
*Explanation:* When creating new index or doing some other complex
operation on database, the migration script needs exclusive access to
the database.
=== Old version is deployed instead of the new one
*Issue:* The pod is deployed with an old version of Anitya, but it says
that it was triggered by the correct commit.
*Solution:* Set [.title-ref]#dockerStrategy# in buildconfig.yml to
noCache.
*Explanation:* OpenShift caches the layers of docker containers by default,
so if there is no change in the Dockerfile it will just use the
cached version and not run the commands again.

View file

@ -0,0 +1,252 @@
= Ansible infrastructure SOP/Information.
== Background
Fedora infrastructure used to use func and puppet for system change
management. We are now using ansible for all system change management and
ad-hoc tasks.
== Overview
Ansible runs from batcave01 or backup01. These hosts run a ssh-agent
that has unlocked the ansible root ssh private key. (This is unlocked
manually by a human with the passphrase each reboot, the passphrase
itself is not stored anywhere on the machines). Using 'sudo -i',
sysadmin-main members can use this agent to access any machines with the
ansible root ssh public key setup, either with 'ansible' for one-off
commands or 'ansible-playbook' to run playbooks.
Playbooks are idempotent (or should be), meaning you should be able to
re-run the same playbook over and over and it should reach a state
where 0 items are changing.
Additionally (see below) there is a rbac wrapper that allows members of
some other groups to run playbooks against specific hosts.
=== GIT repositories
There are 2 git repositories associated with Ansible:
* The Fedora Infrastructure Ansible repository and replicas.
+
[CAUTION]
.Caution
====
This is a public repository. Never commit private data to this repo.
====
+
image:ansible-repositories.png[image]
+
This repository exists as several copies or replicas:
** The "upstream" repository on Pagure.
+
https://pagure.io/fedora-infra/ansible
+
This repository is the public facing place where people can contribute
(e.g. pull requests) as well as the authoritative source. Members of the
`sysadmin` FAS group or the `fedora-infra` Pagure group have commit
access to this repository.
+
To contribute changes, fork the repository on Pagure and submit a Pull
Request. Someone from the aforementioned groups can then review and
merge them.
+
It is recommended that you configure git to use `pull --rebase` by
default by running `git config --bool pull.rebase true` in your ansible
clone directory. This configuration prevents unneeded merges which can
occur if someone else pushes changes to the remote repository while you
are working on your own local changes.
** Two bare mirrors on [.title-ref]#batcave01#, `/srv/git/ansible.git`
and `/srv/git/mirrors/ansible.git`
+
[CAUTION]
.Caution
====
These are public repositories. Never commit private data to these
repositories. Don't commit or push to these repos directly, unless
Pagure is unavailable.
====
+
The `mirror_pagure_ansible` service on [.title-ref]#batcave01# receives
bus messages about changes in the repository on Pagure, fetches these
into `/srv/git/mirrors/ansible.git` and pushes from there to
`/srv/git/ansible.git`. When this happens, various actions are triggered
via git hooks:
*** The working copy at `/srv/web/infra/ansible` is updated.
*** A mail about the changes is sent to [.title-ref]#sysadmin-members#.
*** The changes are announced on the message bus, which in turn triggers
announcements on IRC.
+
You can check out the repo locally on [.title-ref]#batcave01# with:
+
....
git clone /srv/git/ansible.git
....
+
If the Ansible repository on Pagure is unavailable, members of the
[.title-ref]#sysadmin# group may commit directly, provided this
procedure is followed:
[arabic]
. The synchronization service is stopped and disabled:
+
....
sudo systemctl disable --now mirror_pagure_ansible.service
....
. Changes are applied to the repository on [.title-ref]#batcave01#.
. After Pagure is available again, the changes are pushed to the
repository there.
. The synchronization service is enabled and started:
+
....
sudo systemctl enable --now mirror_pagure_ansible.service
....
** `/srv/web/infra/ansible` on [.title-ref]#batcave01#, the working copy
from which playbooks are run.
+
[CAUTION]
.Caution
====
This is a public repository. Never commit private data to this repo.
Don't commit or push to this repo directly, unless Pagure is
unavailable.
====
+
You can also access it via the web interface at:
https://pagure.io/fedora-infra/ansible/
* `/srv/git/ansible-private` on [.title-ref]#batcave01#.
+
[CAUTION]
.Caution
====
This is a private repository for passwords and other sensitive data. It
is not available in cgit, nor should it be cloned or copied remotely.
====
+
This repository is only accessible to members of 'sysadmin-main'.
=== Cron job/scheduled runs
With the use of run_ansible-playbook_cron.py, which is run daily via cron, we
walk through the playbooks and run them with [.title-ref]#--check --diff#
params to perform a dry run.
This way we make sure all the playbooks are idempotent and there are no
unexpected changes on servers (or in playbooks).
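The same kind of dry run can be done by hand from the working copy; a minimal sketch, assuming a hypothetical group playbook:
....
cd /srv/web/infra/ansible
ansible-playbook playbooks/groups/proxies.yml --check --diff   # hypothetical playbook path
....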
=== Logging
We have in place a callback plugin that stores history for any
ansible-playbook runs and then sends a report each day to
sysadmin-logs-members with any CHANGED or FAILED actions. Additionally,
there's a fedmsg plugin that reports start and end of ansible playbook
runs to the fedmsg bus. Ansible also logs to syslog verbose reporting of
when and what commands and playbooks were run.
=== role based access control for playbooks
There's a wrapper script on batcave01 called 'rbac-playbook' that allows
non sysadmin-main members to run specific playbooks against specific
groups of hosts. This is part of the ansible_utils package. The upstream
for ansible_utils is: https://bitbucket.org/tflink/ansible_utils
To add a new group:
[arabic]
. add the playbook name and sysadmin group to the rbac-playbook
(ansible-private repo)
. add that sysadmin group to sudoers on batcave01 (also in
ansible-private repo)
To use the wrapper:
....
sudo rbac-playbook playbook.yml
....
== Directory setup
=== Inventory
The inventory directory tells ansible all the hosts that are managed by
it and the groups they are in. All files in this dir are concatenated
together, so you can split out groups/hosts into separate files for
readability. They are in ini file format.
Additionally under the inventory directory are host_vars and group_vars
subdirectories. These are files named for the host or group and
containing variables to set for that host or group. You should strive to
set variables at the highest level possible; precedence is in
global, group, host order.
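A minimal sketch of how the pieces fit together; every hostname, group name and value below is hypothetical, not taken from the real inventory:
....
# inventory/builders  (ini-format hosts and groups)
[buildvm]
buildvm01.example.fedoraproject.org
buildvm02.example.fedoraproject.org
# inventory/group_vars/buildvm  (variables for the whole group, YAML)
mem_size: 4096
# inventory/host_vars/buildvm01.example.fedoraproject.org  (host value overrides the group value)
mem_size: 8192
....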
=== Vars
This directory contains global variables as well as OS specific
variables. Note that in order to use the OS specific ones you must have
'gather_facts' as 'True' or ansible will not have the facts it needs to
determine the OS.
=== Roles
Roles are a collection of tasks/files/templates that can be used on any
host or group of hosts that all share that role. In other words, roles
should be used except in cases where configuration only applies to a
single host. Roles can be reused between hosts and groups and are more
portable/flexible than tasks or specific plays.
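For reference, a typical role layout looks roughly like this (a generic Ansible sketch, not a listing of any particular role in this repository):
....
roles/myrole/                 # hypothetical role name
    defaults/main.yml         # default variable values
    files/                    # static files copied to hosts
    handlers/main.yml         # handlers triggered via 'notify' (service restarts etc.)
    tasks/main.yml            # the tasks the role performs
    templates/                # jinja2 templates
    vars/main.yml             # role variables
....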
=== Scripts
In the ansible git repo under scripts are a number of utility scripts for
sysadmins.
=== Playbooks
In the ansible git repo there's a directory for playbooks. The top level
contains utility playbooks for sysadmins. These playbooks perform
one-off functions or gather information. Under this directory are hosts
and groups playbooks. These playbooks are for specific hosts and groups
of hosts, from provision to fully configured. You should only use a host
playbook in cases where there will never be more than one of that thing.
=== Tasks
This directory contains one-off tasks that are used in playbooks. Some
of these should be migrated to roles (we had this setup before roles
existed in ansible). Those that are truly only used on one host/group
could stay as isolated tasks.
=== Syntax
Ansible now warns about deprecated syntax. Please fix any cases you see
related to deprecation warnings.
Templates use the jinja2 syntax.
== Libvirt virtuals
* TODO: add steps to make new libvirt virtuals in staging and production
* TODO: merge in new-hosts.txt
== Cloud Instances
* TODO: add how to make new cloud instances
* TODO: merge in from ansible README file.
== rdiff-backups
see:
https://fedora-infra-docs.readthedocs.io/en/latest/sysadmin-guide/sops/rdiff-backup.html
== Additional Reading/Resources
Upstream docs:::
https://docs.ansible.com/
Example repo with all kinds of examples:::
* https://github.com/ansible/ansible-examples
* https://gist.github.com/marktheunissen/2979474
Jinja2 docs:::
http://jinja.pocoo.org/docs/

View file

@ -0,0 +1,31 @@
= apps-fp-o SOP
Updating and maintaining the landing page at
https://apps.fedoraproject.org/
== Contact Information
Owner:::
Fedora Infrastructure Team
Contact:::
#fedora-apps, #fedora-admin
Servers:::
proxy0*
Purpose:::
Have a nice landing page for all our webapps.
== Description
We have a number of webapps, many of which our users don't know about.
This page was created so there was a central place where users could
stumble through them and learn.
The page is generated by an ansible role in ansible/roles/apps-fp-o/. It
makes use of an RPM package, the source code for which is at
https://github.com/fedora-infra/apps.fp.o
You can update the page by updating the apps.yaml file in that ansible
role.
When ansible is run next, the two ansible handlers should see your
changes and regenerate the static html and json data for the page.
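For orientation only, an entry in apps.yaml might look something like the following; the field names and values here are hypothetical, and the real schema is defined by the apps-fp-o role and the apps.fp.o package:
....
- name: Example App                              # hypothetical entry
  url: https://example.fedoraproject.org
  description: One line summary shown on the landing page.
  icon: example.png
....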

View file

@ -0,0 +1,60 @@
= How to Archive Old Fedora Releases
The Fedora download servers contain terabytes of data, and to allow for
mirrors to not have to take all of that data, infrastructure regularly
moves data of end of lifed releases (from /pub/fedora/linux) to the
archives section (/pub/archive/fedora/linux)
== Steps Involved
[arabic]
. log into batcave01.phx2.fedoraproject.org and ssh to bodhi-backend01
+
....
$ sudo -i ssh root@bodhi-backend01.iad2.fedoraproject.org
# su - ftpsync
....
. Then change into the releases directory.
+
$ cd /pub/fedora/linux/releases
. Check to see that the target directory doesn't already exist.
+
$ ls /pub/archive/fedora/linux/releases/
. If the target directory does not already exist, do a recursive link
copy of the tree you want to the target
+
$ cp -lvpnr 21 /pub/archive/fedora/linux/releases/21
. If the target directory already exists, then we need to do a recursive
rsync to update any changes in the trees since the previous copy.
+
$ rsync -avAXSHP --delete ./21/ /pub/archive/fedora/linux/releases/21/
. We now do the updates and updates/testing in similar ways.
+
....
$ cd ../updates/
$ cp -lpnr 21 /pub/archive/fedora/linux/updates/21
$ cd testing
$ cp -lpnr 21 /pub/archive/fedora/linux/updates/testing/21
....
+
Alternatively, if this is a later refresh of an older copy:
+
....
$ cd ../updates/
$ rsync -avAXSHP 21/ /pub/archive/fedora/linux/updates/21/
$ cd testing
$ rsync -avAXSHP 21/ /pub/archive/fedora/linux/updates/testing/21/
....
[arabic, start=7]
. Do the same with fedora-secondary.
. Announce to the mirror list this has been done and that in 2 weeks you
will move the old trees to archives.
. In two weeks, log into mm-backend01 and run the archive script
+
....
sudo -u mirrormanager mm2_move-to-archive --originalCategory="Fedora Linux" --archiveCategory="Fedora Archive" --directoryRe='/21/Everything'
....
. If there are problems, the postgres DB may have issues and so you need
to get a DBA to update the backend to fix items.
. Wait an hour or so then you can remove the files from the main tree.
+
....
ssh bodhi-backend01
cd /pub/fedora/linux
cd releases/21
ls                 # make sure you have stuff here
rm -rf *
ln ../20/README .
cd ../../updates/21
ls                 # make sure you have stuff here
rm -rf *
ln ../20/README .
cd ../testing/21
ls                 # make sure you have stuff here
rm -rf *
ln ../20/README .
....
This should complete the archiving.

View file

@ -0,0 +1,205 @@
= Fedora ARM Infrastructure
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin, sysadmin-main, sysadmin-releng
Location::
Phoenix
Servers::
arm01, arm02, arm03, arm04
Purpose::
Information on working with the arm SOCs
== Description
We have 4 arm chassis in phx2, each containing 24 SOCs (System On Chip).
Each chassis has 2 physical network connections going out from it. The
first one is used for the management interface on each SOC. The second
one is used for eth0 for each SOC.
Current allocations (2016-03-11):
arm01::
primary builders attached to koji.fedoraproject.org
arm02::
primary arch builders attached to koji.fedoraproject.org
arm03::
In cloud network, public qa/packager and copr instances
arm04::
primary arch builders attached to koji.fedoraproject.org
== Hardware Configuration
Each SOC has:
* eth0 and eth1 (unused) and a management interface.
* 4 cores
* 4GB ram
* a 300GB disk
SOCs are addressed by:
....
arm{chassisnumber}-builder{number}.arm.fedoraproject.org
....
Where chassisnumber is 01 to 04 and number is 00-23
== PXE installs
Kickstarts for the machines are in the kickstarts repo.
PXE config is on noc01. (or cloud-noc01.cloud.fedoraproject.org for
arm03)
The kickstart installs the latest Fedora and sets them up with a base
package set.
== IPMI tool Management
The SOCs are managed via their mgmt interfaces using a custom ipmitool
as well as a custom python script called 'cxmanage'. The ipmitool
changes have been submitted upstream and cxmanage is under review in
Fedora.
The ipmitool is currently installed on noc01 and has the ability to talk
to the SOCs on their management interfaces. noc01 also serves dhcp and is a
pxeboot server for the SOCs.
However you will need to add it to your path:
....
export PATH=$PATH:/opt/calxeda/bin/
....
Some common commands:
To set the SOC to boot the next time only with pxe:
....
ipmitool -U admin -P thepassword -H arm03-builder11-mgmt.arm.fedoraproject.org chassis bootdev pxe
....
To set the SOC power off:
....
ipmitool -U admin -P thepassword -H arm03-builder11-mgmt.arm.fedoraproject.org power off
....
To set the SOC power on:
....
ipmitool -U admin -P thepassword -H arm03-builder11-mgmt.arm.fedoraproject.org power on
....
To get a serial over lan console from the SOC:
....
ipmitool -U admin -P thepassword -H arm03-builder11-mgmt.arm.fedoraproject.org -I lanplus sol activate
....
== DISK mapping
Each SOC has a disk. They are however mapped to the internal 00-23 in a
non direct manner:
....
HDD Bay EnergyCard SOC (Port 1) SOC Num
0 0 3 03
1 0 0 00
2 0 1 01
3 0 2 02
4 1 3 07
5 1 0 04
6 1 1 05
7 1 2 06
8 2 3 11
9 2 0 08
10 2 1 09
11 2 2 10
12 3 3 15
13 3 0 12
14 3 1 13
15 3 2 14
16 4 3 19
17 4 0 16
18 4 1 17
19 4 2 18
20 5 3 23
21 5 0 20
22 5 1 21
23 5 2 22
....
Looking at the system from the front, the bay numbering starts from left
to right.
== cxmanage
The cxmanage tool can be used to update firmware or gather diag info.
Until cxmanage is packaged, you can use it from a python virtualenv:
....
virtualenv --system-site-packages cxmanage
cd cxmanage
source bin/activate
pip install --extra-index-url=http://sources.calxeda.com/python/packages/ cxmanage
<use cxmanage>
deactivate
....
Some cxmanage commands
....
cxmanage sensor arm03-builder00-mgmt.arm.fedoraproject.org
Getting sensor readings...
1 successes | 0 errors | 0 nodes left | .
MP Temp 0
arm03-builder00-mgmt.arm.fedoraproject.org: 34.00 degrees C
Minimum : 34.00 degrees C
Maximum : 34.00 degrees C
Average : 34.00 degrees C
... (and about 20 more sensors)...
....
....
cxmanage info arm03-builder00-mgmt.arm.fedoraproject.org
Getting info...
1 successes | 0 errors | 0 nodes left | .
[ Info from arm03-builder00-mgmt.arm.fedoraproject.org ]
Hardware version : EnergyCard X04
Firmware version : ECX-1000-v2.1.5
ECME version : v0.10.2
CDB version : v0.10.2
Stage2boot version : v1.1.3
Bootlog version : v0.10.2
A9boot version : v2012.10.16-3-g66a3bf3
Uboot version : v2013.01-rc1_cx_2013.01.17
Ubootenv version : v2013.01-rc1_cx_2013.01.17
DTB version : v3.7-4114-g34da2e2
....
firmware update:
....
cxmanage --internal-tftp 10.5.126.41:6969 --all-nodes fwupdate package ECX-1000_update-v2.1.5.tar.gz arm03-builder00-mgmt.arm.fedoraproject.org
....
(Note that this runs against the 00 management interface for the chassis
and updates all the nodes, and that we must run a tftp server on port
6969 for firewall handling.)
== Links
http://sources.calxeda.com/python/packages/cxmanage/
== Contacts
help.desk@boston.co.uk is the contact to send repair requests to.

View file

@ -0,0 +1,359 @@
= Ask Fedora SOP
Ask Fedora, at https://ask.fedoraproject.org, is an Askbot-based question
and answer support forum for the Fedora community. The production instance
is at https://ask.fedoraproject.org and the staging instance
is at http://ask.stg.fedoraproject.org/
This page describes how to set up and customize it from scratch.
== Contents
[arabic]
. Contact Information
. Creating database
. Setting up the forum
. Adding administrators
. Change settings within the forum
. Database tweaks
. Debugging
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin
Persons::
anyone from the sysadmin team
Sponsor::
nirik
Location::
phx2
Servers::
ask01 , ask01.stg
Purpose::
To host Ask Fedora
== Creating database
We use the postgresql database backend. To add the database to a
postgresql server:
....
# psql -U postgres
postgres# create user askfedora with password 'xxx';
postgres# create database askfedora;
postgres# ALTER DATABASE askfedora owner to askfedora;
postgres# \q
....
Now set up the db tables if this is a new install:
....
python manage.py syncdb
python manage.py migrate askbot
python manage.py migrate django_authopenid #embedded login application
....
== Setting up the forum
Askbot is packaged and available in Rawhide, Fedora 16 and EPEL 6. On a
RHEL 6 system, you need to install the EPEL 6 repo first:
....
# yum install askbot
....
The /etc/askbot/sites/ask/conf/settings.py file should look something
like:
....
DATABASE_ENGINE = 'postgresql_psycopg2'
DATABASE_NAME = 'testaskbot'
DATABASE_USER = 'askbot'
DATABASE_PASSWORD = 'xxxxx'
DATABASE_HOST = '127.0.0.1'
DATABASE_PORT = '5432'
# Outgoing mail server settings
#
DEFAULT_FROM_EMAIL = 'askfedora@fedoraproject.org'
EMAIL_SUBJECT_PREFIX = '[Askfedora]'
EMAIL_HOST='127.0.0.1'
EMAIL_PORT='25'
# This variable points to the Askbot plugin which will be used for user
# authentication. Not enabled yet because we don't need FAS auth but use
# Fedora id as a openid provider.
#
# ASKBOT_CUSTOM_AUTH_MODULE = 'authfas'
....
Now the Ask Fedora website should be accessible from the browser.
== Adding administrators
As of Askbot version 0.7.21, the first user who logs in automatically
becomes the administrator. In previous versions, you have to do the
following:
....
# cd /etc/askbot/sites/ask/conf/
# python manage.py add_admin 1
Do you really wish to make user (id=1, name=pjp) a site administrator?
yes/no: yes
....
Once a user is marked as an administrator, they can go into anyone's
profile, go to the "moderation" tab at the end, and mark them as
administrator or moderator, as well as block or suspend a user.
== Change settings within the forum
Data entry and display::
* Disable "Allow asking questions anonymously"
* Enable "Force lowercase the tags"
* Change "Format of tag list" to "cloud"
* Change "Minimum length of search term for Ajax search" to "3"
* Change "Number of questions to list by default" to "50"
* Change "What should "unanswered question" mean?" to "Question has no answers"
Email and email alert settings::
* Change "Default news notification frequency" to "Instantly"
Flatpages - about, privacy policy, etc.::
Change "Text of the Q&A forum About page (html format)" to the following:
+
....
Ask Fedora provides a community edited knowledge base and support forum
for the Fedora community. Make sure you read the FAQ and search for
existing questions before asking yours. If you want to provide feedback,
just ask a question in this site! Tag your questions "meta" to highlight your
questions to the administrators of Ask Fedora.
....
Login provider settings::
* Disable "Activate local login"
Q&A forum website parameters and urls::
* Change "Site title for the Q&A forum" to "Ask Fedora: Community Knowledge Base and Support Forum"
* Change "Comma separated list of Q&A site keywords" to "Ask Fedora, forum, community, support, help"
* Change "Copyright message to show in the footer" to "All content is under Creative Commons Attribution Share Alike License. Ask Fedora is community maintained and Red Hat or Fedora Project is not responsible for content"
* Change "Site description for the search engines" to "Ask Fedora: Community Knowledge Base and Support Forum"
* Change "Short name for your Q&A forum" to "Ask Fedora"
* Change "Base URL for your Q&A forum, must start with http or https" to "http://ask.fedoraproject.org"
Sidebar widget settings - main page::
* Disable "Show avatar block in sidebar"
* Disable "Show tag selector in sidebar"
Skin and User Interface settings::
* Upload "Q&A site logo"
* Upload "Site favicon". Must be an ICO format file because that is the only one IE supports as a favicon.
* Enable "Apply custom style sheet (CSS)"
* Upload the following custom CSS:
+
....
#ab-main-nav a {
color: #333333;
background-color: #d8dfeb;
border: 1px solid #888888;
border-bottom: none;
padding: 0px 12px 3px 12px;
height: 25px;
line-height: 30px;
margin-right: 10px;
font-size: 18px;
font-weight: 100;
text-decoration: none;
display: block;
float: left;
}
#ab-main-nav a.on {
height: 24px;
line-height: 28px;
border-bottom: 1px solid #0a57a4;
border-right: 1px solid #0a57a4;
border-top: 1px solid #0a57a4;
border-left: 1px solid #0a57a4; /*background:#A31E39; */
background: #0a57a4;
color: #FFF;
font-weight: 800;
text-decoration: none
}
#ab-main-nav a.special {
font-size: 18px;
color: #072b61;
font-weight: bold;
text-decoration: none;
}
/* tabs stuff */
.tabsA { float: right; }
.tabsC { float: left; }
.tabsA a.on, .tabsC a.on, .tabsA a:hover, .tabsC a:hover {
background: #fff;
color: #072b61;
border-top: 1px solid #babdb6;
border-left: 1px solid #babdb6;
border-right: 1px solid #888a85;
border-bottom: 1px solid #888a85;
height: 24px;
line-height: 26px;
margin-top: 3px;
}
.tabsA a.rev.on, tabsA a.rev.on:hover {
padding: 0px 2px 0px 7px;
}
.tabsA a, .tabsC a{
background: #f9f7eb;
border-top: 1px solid #eeeeec;
border-left: 1px solid #eeeeec;
border-right: 1px solid #a9aca5;
border-bottom: 1px solid #888a85;
color: #888a85;
display: block;
float: left;
height: 20px;
line-height: 22px;
margin: 5px 0 0 4px;
padding: 0 7px;
text-decoration: none;
}
.tabsA .label, .tabsC .label {
float: left;
font-weight: bold;
color: #777;
margin: 8px 0 0 0px;
}
.tabsB a {
background: #eee;
border: 1px solid #eee;
color: #777;
display: block;
float: left;
height: 22px;
line-height: 28px;
margin: 5px 0px 0 4px;
padding: 0 11px 0 11px;
text-decoration: none;
}
a {
color: #072b61;
text-decoration: none;
cursor: pointer;
}
div.side-box
{
width:200px;
padding:10px;
border:3px solid #CCCCCC;
margin:0px;
background: -moz-linear-gradient(top, #DDDDDD, #FFFFFF);
}
....
== Database tweaks
To automatically delete expired sessions, we run a trigger that makes
PostgreSQL delete them upon inserting a new one.
The code used to create this trigger was:
....
askfedora=# CREATE FUNCTION delete_old_sessions() RETURNS trigger
askfedora-# LANGUAGE plpgsql
askfedora-# AS $$
askfedora$# BEGIN
askfedora$# DELETE FROM django_session WHERE expire_date<current_timestamp;
askfedora$# RETURN NEW;
askfedora$# END
askfedora$# $$;
CREATE FUNCTION
askfedora=# CREATE TRIGGER old_sessions_gc
askfedora-# AFTER INSERT ON django_session
askfedora-# EXECUTE PROCEDURE delete_old_sessions();
....
In case this trigger causes any problems, please remove it by running:
`DROP TRIGGER old_sessions_gc ON django_session;`
To make this perform well, we have a custom index that's not in upstream
askbot; please remember to add that when recreating the trigger:
....
CREATE INDEX CONCURRENTLY django_session_expire_date ON django_session (expire_date);
....
If you deleted the trigger, or reinstalled without the trigger, please make
sure to run `manage.py clean_sessions` regularly, so you don't end up
with a database that grows too large.
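A minimal sketch of scheduling that cleanup via cron, assuming the settings directory used earlier in this SOP (the file name and schedule are only examples):
....
# /etc/cron.d/askbot-clean-sessions  (hypothetical file)
0 4 * * * root cd /etc/askbot/sites/ask/conf/ && python manage.py clean_sessions
....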
== Debugging
Set DEBUG to True in the settings.py file and restart Apache.
== Auth issues
Users can log in to Ask with a variety of social media accounts. Once
they log in with one, they can attach other ones as well.
If a user forgets what social media they used, you can look in the
database:
Log in to the database host (db01.phx2.fedoraproject.org):
....
# sudo -u postgres psql askfedora
psql> select * from django_authopenid_userassociation where user_id like '%username%';
....
If they can log in again with the same auth, ask them to do so. If not,
you can add the Fedora Account System OpenID auth to allow them to log in
with that:
....
psql> insert into django_authopenid_userassociation (user_id, openid_url, provider_name)
      VALUES (2595, 'http://name.id.fedoraproject.org', 'fedoraproject');
....
Use the ID from the previous query and replace name with the user's FAS
name.

View file

@ -0,0 +1,152 @@
= Amazon Web Services Access
AWS includes a highly granular set of access policies, which can be
combined into roles and groups. Ipsilon is used to translate between IAM
policy groupings and groups in the Fedora Account System (FAS). Tags and
namespaces are used to keep role resources separate.
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin
Persons::
nirik, pfrields
Location::
?
Servers::
N/A
Purpose::
Provide AWS resource access to contributors via FAS group membership.
== Accessing the AWS Console
To access the AWS Console via Ipsilon authentication, use
https://id.fedoraproject.org/saml2/SSO/Redirect?SPIdentifier=urn:amazon:webservices&RelayState=https://console.aws.amazon.com[this
SAML link].
You must be in the
https://admin.fedoraproject.org/accounts/group/view/aws-iam[aws-iam FAS
group] (or another group with access) to perform this action.
=== Adding a role to AWS IAM
Sign into AWS via the URL above, and visit
https://console.aws.amazon.com/iam/home[Identity and Access Management
(IAM)] in the Security, Identity and Compliance tools.
Choose Roles to view current roles. Confirm there is not already a role
matching the one you need. If not, create a new role as follows:
[arabic]
. Select _Create role_.
. Select _SAML 2.0 federation_.
. Choose the SAML provider _id.fedoraproject.org_, which should already
be populated as a choice from previous use.
. Select the attribute _SAML:aud_. For value, enter
_https://signin.aws.amazon.com/saml_. Do not add a condition. Proceed to
the next step.
. Assign the appropriate policies from the pre-existing IAM policies.
It's unlikely you'll have to create your own, which is outside the scope
of this SOP. Then proceed to the next step.
. Set the role name and description. It is recommended you use the
_same_ role name as the FAS group for clarity. Fill in a longer
description to clarify the purpose of the role. Then choose _Create
role_.
Note or copy the Role ARN (Amazon Resource Name) for the new role.
You'll need this in the mapping below.
=== Adding a group to FAS
When finished, login to FAS and create a group to correspond to the new
role. Use the prefix _aws-_ to denote new AWS roles in FAS. This makes
them easier to locate in a search.
It may be appropriate to set group ownership for _aws-_ groups to an
Infrastructure team principal, and then add others as users or sponsors.
This is especially worth considering for groups that have modify (full)
access to an AWS resource.
=== Adding an IAM role mapping in Ipsilon
Add the new role mapping for FAS group to Role ARN in the ansible git
repo, under _roles/ipsilon/files/infofas.py_. Current mappings look like
this:
....
aws_groups = {
'aws-master': 'arn:aws:iam::125523088429:role/aws-master',
'aws-iam': 'arn:aws:iam::125523088429:role/aws-iam',
'aws-billing': 'arn:aws:iam::125523088429:role/aws-billing',
'aws-atomic': 'arn:aws:iam::125523088429:role/aws-atomic',
'aws-s3-readonly': 'arn:aws:iam::125523088429:role/aws-s3-readonly'
}
....
Add your mapping to the dictionary as shown. Start a new build/rollout
of the ipsilon project in openshift to make the changes live.
=== User accounts
If you only need to use the web interface to aws, a role (and associated
policy) should be all you need. However, if you need cli access, you
will need a user and a token. Users should be named the same as the role
they are associated with.
=== Role and User policies
Each Role (and user if there is a user needed for the role) should have
the same policy attached to it. Policies are named
'fedora-$rolename-$service' ie, 'fedora-infra-ec2'. A copy of polices is
available in the ansible repo under files/aws/iam/policies. These are in
json form.
Policies are set up such that roles/users can do most things with a
resource if it is untagged. If it is tagged, it MUST be tagged with
their group: FedoraGroup / $groupname. If it is tagged with another
group's name, they cannot do anything with or to that resource (aside
from seeing that it exists).
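For illustration only (this fragment is not copied from the ansible
repo; the action list and group value are made-up examples), a
tag-scoped statement in such a policy might look roughly like this:

....
{
    "Effect": "Allow",
    "Action": ["ec2:StartInstances", "ec2:StopInstances"],
    "Resource": "*",
    "Condition": {
        "StringEquals": { "aws:ResourceTag/FedoraGroup": "infra" }
    }
}
....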
If there's a permission you need, please file a ticket and it will be
evaluated.
Users MUST keep tokens private and secure. YOU are responsible for all
use of tokens issued to you from Fedora Infrastructure. Report any
compromised or possibly public tokens as soon as you are aware.
Users MUST tag resources with their FedoraGroup tag within one day, or
the resource may be removed.
=== ec2
Users/roles with ec2 permissions should always tag their instances with
their FedoraGroup as soon as possible. Untagged resources can be
terminated at any time.
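For example, tagging an instance right after launch with the aws CLI
might look like this (the instance ID and group value are placeholders):

....
aws ec2 create-tags --resources i-0123456789abcdef0 --tags Key=FedoraGroup,Value=infra
....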
=== s3
Users/roles with s3 permissions will be given specific bucket(s) that
they can manage/use. Care should be taken to make sure nothing in them
is public that should not be.
=== cloudfront
Please file a ticket if you need cloudfront and infrastructure will do
any needed setup if approved.
== Regions
Users/groups are encouraged to use regions 'near' them or wherever makes
the most sense. If you are trying to create ec2 instances you will need
infrastructure to create a vpc in the region with network, etc. File a
ticket for such requests.
== Other Notes
AWS resource access that is not read-only should be treated with care.
In some cases, Amazon or other entities may absorb AWS costs, so changes
in usage can cause issues if not controlled or monitored. If you have
doubts about access, consult the Fedora Project Leader or Fedora
Engineering Manager.
View file

@ -0,0 +1,118 @@
= Basset anti-spam service
Since the Fedora Project has come under targeted spam attacks, we have
decided to create a service that all our applications can hook into to
have a central repository for anti-spam procedures. Basset is this
service, and it's hosted on https://pagure.io/basset.
== Contents
[arabic]
. Contact Information
. Overview
. FAS
. Trac
. Wiki
. Setup
. Outage
== Contact Information
Owner::
Patrick Uiterwijk (puiterwijk)
Contact::
#fedora-admin, #fedora-apps, #fedora-noc, sysadmin-main
Location::
basset01
Purpose::
Centralized anti-spam
== Overview
Basset is a central anti-spam service: it receives messages from
services when certain actions happen, and will then decide to accept
or deny the request, or pass it on to an administrator.
At the moment, we have the following modules live: FAS, trac, wiki.
== FAS
This module receives notifications from FAS about new users
registrations and new users signing the FPCA. With Basset enabled, FAS
will not automatically accept a new user registration or a FPCA signing,
but instead let Basset know a user tried to perform these actions and
then depend on Basset to enact this.
In the case of registration this is done by setting the user to a
spamcheck_awaiting status. As soon as Basset has made a decision, it will
set the user to spamcheck_manual, spamcheck_denied or active. If it sets
the user to active, it will also send the welcome email to the user. If
it made a wrong decision, and the user is set as spamcheck_manual or
spamcheck_denied, a member of the accounts team can go to that user's
page and click the "Enable" button to override the decision. If this
needed to be done, please notify puiterwijk so that the rules Basset
uses can be updated.
For the case of the FPCA, FAS will request the cla_fpca group
membership, but not sponsor the user. At the moment that Basset decides
it accepts the request, it will sponsor the user into the group. If it
declined the FPCA request, it will remove the user from the group. To
override this decision, a member of the accounts group can go to FAS and
manually add the user to the cla_fpca group and sponsor them into it.
== Trac
For Trac, if a post gets denied, the content item gets deleted, the Trac
account gets blocked cross-instance and the FAS account gets blocked.
To unblock the user, log in to hosted03, and remove
/srv/web/trac/blocks/$username. For info on how to unblock the FAS user,
see the notes under FAS.
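For example (the fully qualified hostname is an assumption here):

....
$ ssh hosted03.fedoraproject.org
$ sudo rm /srv/web/trac/blocks/$username
....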
== Wiki
For Wiki, if an edit gets denied, the page gets deleted, the wiki
account gets blocked and the FAS account gets blocked.

For the wiki parts of undoing this, follow the regular mediawiki unblock
procedures using:
* https://fedoraproject.org/wiki/Special:BlockList to check if an user
is blocked or not
* https://fedoraproject.org/wiki/Special:Unblock to unblock that user
Don't forget to unblock the account as in FAS.
== Setup
At this moment, Basset runs on a single server (basset01(.stg)), which
runs the frontend, message broker and worker all together. For all of it
to work, the following services are used:

* httpd (frontend)
* rabbitmq-server (broker)
* mongod (mongo database server for storage of internal info)
* basset-worker (worker)
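To quickly check whether all of these are running (assuming they are
installed as systemd units with these names), you could run:

....
$ systemctl status httpd rabbitmq-server mongod basset-worker
....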
== Outage
The consequences of certain services not being up vary:

If the httpd or frontend aren't up, no new messages will come in. FAS
will set the user to spamcheck_awaiting, but not submit it to Basset.
Work is in progress on a script to submit such entries to the queue
after the Basset frontend is back. However, since this part of the code
is so small, this is not likely to be the part that's down. (You can
tell this is the case because the FAS logs will show an error instead of
"result: checking".)
If the worker or the mongo server are down, no messages will be
processed, but all messages queued up will be processed the moment both
of the services start again: as long as a message makes it into the
queue, it will be processed until completion.
If the worker encounters an error during processing of a message, it
will dump a traceback into the journal log, and stop processing any
messages. Resolve the condition reported in the error and restart the
basset-worker service, and all work will be continued, starting with the
message it was processing when it errored out.
This means that as long as the message is queued, the worker will pick
it up and handle it.
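A minimal recovery sequence on basset01 might look like this (assuming
the worker runs as a systemd unit named basset-worker, as listed in the
Setup section):

....
# inspect the traceback from the failed message
$ journalctl -u basset-worker -e
# after fixing the reported condition, restart the worker;
# it resumes with the message that errored out
$ systemctl restart basset-worker
....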
View file

@ -0,0 +1,31 @@
= Fedora Bastion Hosts
== Description
There are 2 primary bastion hosts in the phx2 datacenter. One will be
active at any given time and the second will be a hot spare, ready to
take over. Switching between bastion hosts is currently a manual process
that requires changes in ansible.
There is also a bastion-comm01 bastion host for the qa.fedoraproject.org
network. This is used in cases where users only need to access
resources in the qa.fedoraproject.org network.
All of the bastion hosts have an external IP that is mapped into them.
The reverse dns for these IPs is controlled by RHIT, so any changes must
be carefully coordinated.
The active bastion host performs the following functions:
* Outgoing smtp from fedora servers. This includes email aliases,
mailing list posts, build and commit notices, etc.
* Incoming smtp from servers in phx2 or on the fedora vpn. Incoming mail
directly from the outside is NOT accepted or forwarded.
* ssh access to all phx2/vpn connected servers.
* openvpn hub. This is the hub that all vpn clients connect to and talk
to each other via. Taking down or stopping this service will be a major
outage of services as all proxy and app servers use the vpn to talk to
each other.
When rebuilding these machines, care must be taken to match up the dns
names externally, and to preserve the ssh host keys.
View file

@ -0,0 +1,52 @@
= BladeCenter Access Infrastructure SOP
Many of the builders in PHX are blades in a blade center. A few other
machines are also on blades.
== Contents
[arabic]
. Contact Information
. Common Tasks
____
[arabic]
. Logging into the web interface
. Using the Serial Console of Blades
____
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin, sysadmin-main
Location::
PHX
Purpose::
Contains blades used for buildsystems, etc
== Common Tasks
=== Logging into the web interface
The web interface to the bladecenters lets you reset power, etc. They
are bc01-mgmt and bc02-mgmt.
=== Using the Serial Console of Blades
All of the blades are set up with a serial console over lan (SOL). To
use this, ssh into the bladecenter. You can then pick your system and
bring up a console with:
....
env -T system:blade[x]
console -o
....
where x is the blade number (can be determined from web interface, etc)
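Putting it together, a session for blade 3 on bc01 might look like this
(the blade number is just an example):

....
$ ssh bc01-mgmt
env -T system:blade[3]
console -o
....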
To leave the console session, press `Esc` followed by `(`.
For more details on BladeCenter SOL, see
http://www-304.ibm.com/systems/support/supportsite.wss/docdisplay?brandind=5000008&lndocid=MIGR-54666
View file

@ -0,0 +1,157 @@
= Blockerbugs Infrastructure SOP
https://pagure.io/fedora-qa/blockerbugs[Blockerbugs] is an app developed
by Fedora QA to aid in tracking items related to release blocking and
freeze exception bugs in branched Fedora releases.
== Contents
[arabic]
. Contact Information
. File Locations
. Upgrade Process
* Upgrade Preparation (for all upgrades)
* Minor Upgrade (no db change)
* Major Upgrade (with db changes)
== Contact Information
Owner::
Fedora QA Devel
Contact::
#fedora-qa
Location::
Phoenix
Servers::
blockerbugs01.phx2, blockerbugs02.phx2, blockerbugs01.stg.phx2
Purpose::
Hosting the https://pagure.io/fedora-qa/blockerbugs[blocker bug
tracking application] for QA
== File Locations
`/etc/blockerbugs/settings.py` - configuration for the app
=== Node Roles
blockerbugs01.stg.phx2::
the staging instance, it is not load balanced
blockerbugs01.phx2::
one of the load balanced production nodes, it is responsible for
running bugzilla/bodhi/koji sync
blockerbugs02.phx2::
the other load balanced production node. It does not do any sync
operations
== Building for Infra
=== Do not use mock
For whatever reason, the `epel7-infra` koji tag rejects SRPMs with the
`el7.centos` dist tag. Make sure that you build SRPMs with:
....
rpmbuild -bs --define='dist .el7' blockerbugs.spec
....
Also note that this expects the release tarball to be in
`~/rpmbuild/SOURCES/`.
=== Building with Koji
You'll need to ask someone who has rights to build into `epel7-infra`
tag to make the build for you:
....
koji build epel7-infra blockerbugs-0.4.4.11-1.el7.src.rpm
....
[NOTE]
.Note
====
The fun bit of this is that `python-flask` is only available on `x86_64`
builders. If your build is routed to one of the non-x86_64, it will
fail. The only solution available to us is to keep submitting the build
until it's routed to one of the x86_64 builders and doesn't fail.
====
Once the build is complete, it should be automatically tagged into
`epel7-infra-stg` (after a ~15 min delay), so that you can test it on
blockerbugs staging instance. Once you've verified it's working well,
ask someone with infra rights to move it to `epel7-infra` tag so that
you can update it in production.
== Upgrading
Blockerbugs is currently configured through ansible and all
configuration changes need to be done through ansible.
=== Upgrade Preparation (all upgrades)
Blockerbugs is not packaged in epel, so the new build needs to exist in
the infrastructure stg repo for deployment to stg or the infrastructure
repo for deployments to production.
See the blockerbugs documentation for instructions on building a
blockerbugs RPM.
=== Minor Upgrades (no database changes)
Run the following on *both* `blockerbugs01.phx2` and
`blockerbugs02.phx2` if updating in production.
[arabic]
. Update ansible with config changes, push changes to the ansible repo:
+
....
roles/blockerbugs/templates/blockerbugs-settings.py.j2
....
. Clear yum cache and update the blockerbugs RPM:
+
....
yum clean expire-cache && yum update blockerbugs
....
. Restart httpd to reload the application:
+
....
service httpd restart
....
=== Major Upgrades (with database changes)
Run the following on *both* `blockerbugs01.phx2` and
`blockerbugs02.phx2` if updating in production.
[arabic]
. Update ansible with config changes, push changes to the ansible repo:
+
....
roles/blockerbugs/templates/blockerbugs-settings.py.j2
....
. Stop httpd on *all* relevant instances (if load balanced):
+
....
service httpd stop
....
. Clear yum cache and update the blockerbugs RPM on all relevant
instances:
+
....
yum clean expire-cache && yum update blockerbugs
....
. Upgrade the database schema:
+
....
blockerbugs upgrade_db
....
. Check the upgrade by running a manual sync to make sure that nothing
unexpected went wrong:
+
....
blockerbugs sync
....
. Start httpd back up:
+
....
service httpd start
....
View file

@ -0,0 +1,429 @@
= Bodhi Infrastructure SOP
Bodhi is used by Fedora developers to submit potential package updates
for releases and to manage buildroot overrides. From here, bodhi handles
all of the dirty work, from sending around emails, dealing with Koji, to
composing the repositories.
Bodhi production instance: https://bodhi.fedoraproject.org

Bodhi project page: https://github.com/fedora-infra/bodhi
== Contents
[arabic]
. Contact Information
. Adding a new pending release
. 0-day Release Actions
. Configuring all bodhi nodes
. Pushing updates
. Monitoring the bodhi composer output
. Resuming a failed push
. Performing a production bodhi upgrade
. Syncing the production database to staging
. Release EOL
. Adding notices to the front page or new update form
. Using the Bodhi Shell to modify updates by hand
. Using the Bodhi shell to fix uniqueness problems with e-mail addresses
. Troubleshooting and Resolution
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin
Persons::
bowlofeggs
Location::
Phoenix
Servers::
* bodhi-backend01.phx2.fedoraproject.org (composer)
* os.fedoraproject.org (web front end and backend task workers for
non-compose tasks)
* bodhi-backend01.stg.phx2.fedoraproject.org (staging composer)
* os.stg.fedoraproject.org (staging web front end and backend task
workers for non-compose tasks)
Purpose::
Push package updates, and handle new submissions.
== Adding a new pending release
Adding and modifying releases is done using the
[.title-ref]#bodhi-manage-releases# tool.
You can add a new pending release by running this command:
....
bodhi-manage-releases create --name F23 --long-name "Fedora 23" --id-prefix FEDORA --version 23 --branch f23 --dist-tag f23 --stable-tag f23-updates --testing-tag f23-updates-testing --candidate-tag f23-updates-candidate --pending-stable-tag f23-updates-pending --pending-testing-tag f23-updates-testing-pending --override-tag f23-override --state pending
....
== Pre-Beta Bodhi config
Enable the pre_beta policy in the bodhi config in ansible, in
`ansible/roles/bodhi2/base/templates/production.ini.j2`.
Uncomment or add the following lines:
....
#f29.status = pre_beta
#f29.pre_beta.mandatory_days_in_testing = 3
#f29.pre_beta.critpath.min_karma = 1
#f29.pre_beta.critpath.stable_after_days_without_negative_karma = 14
....
== Post-Beta Bodhi config
Enable the post_beta policy in the bodhi config in ansible, in
`ansible/roles/bodhi2/base/templates/production.ini.j2`.
Comment or remove the following lines corresponding to pre_beta policy:
....
#f29.status = pre_beta
#f29.pre_beta.mandatory_days_in_testing = 3
#f29.pre_beta.critpath.min_karma = 1
#f29.pre_beta.critpath.stable_after_days_without_negative_karma = 14
....
Uncomment or add the following lines for post_beta policy
....
#f29.status = post_beta
#f29.post_beta.mandatory_days_in_testing = 7
#f29.post_beta.critpath.min_karma = 2
#f29.post_beta.critpath.stable_after_days_without_negative_karma = 14
....
== 0-day Release Actions
* update atomic config
* run the ansible playbook
Going from pending to a proper release in bodhi requires a few steps:
Change state from pending to current:
....
bodhi-manage-releases edit --name F23 --state current
....
You may also need to disable any pre-beta or post-beta policy defined
in the bodhi config in ansible:
....
ansible/roles/bodhi2/base/templates/production.ini.j2
....
Comment out or remove the lines related to the pre- and post-beta policy:
....
#f29.status = post_beta
#f29.post_beta.mandatory_days_in_testing = 7
#f29.post_beta.critpath.min_karma = 2
#f29.post_beta.critpath.stable_after_days_without_negative_karma = 14
#f29.status = pre_beta
#f29.pre_beta.mandatory_days_in_testing = 3
#f29.pre_beta.critpath.min_karma = 1
#f29.pre_beta.critpath.stable_after_days_without_negative_karma = 14
....
== Configuring all bodhi nodes
Run this command from the [.title-ref]#ansible# checkout to configure
all of bodhi in production:
....
# This will configure the backends
$ sudo rbac-playbook playbooks/groups/bodhi2.yml
# This will configure the frontend
$ sudo rbac-playbook openshift-apps/bodhi.yml
....
== Pushing updates
SSH into the [.title-ref]#bodhi-backend01# machine and run:
....
$ sudo -u apache bodhi-push
....
You can restrict the updates by release and/or request:
....
$ sudo -u apache bodhi-push --releases f23,f22 --request stable
....
You can also push specific builds:
....
$ sudo -u apache bodhi-push --builds openssl-1.0.1k-14.fc22,openssl-1.0.1k-14.fc23
....
This will display a list of updates that are ready to be pushed.
== Monitoring the bodhi composer output
You can monitor the bodhi composer via the `bodhi` CLI tool, or via the
systemd journal on `bodhi-backend01`:
....
# From the comfort of your own laptop.
$ bodhi composes list
# From bodhi-backend01
$ journalctl -f -u fedmsg-hub
....
== Resuming a failed push
If a push fails for some reason, you can easily resume it on
`bodhi-backend01` by running:
....
$ sudo -u apache bodhi-push --resume
....
== Performing a bodhi upgrade
=== Build Bodhi
Bodhi is deployed from the infrastructure Koji repositories. At the time
of this writing, it is deployed from the `f29-infra` and `f29-infra-stg`
(for staging) repositories. Bodhi is built for these repositories from
the `master` branch of the
https://src.fedoraproject.org/rpms/bodhi[bodhi dist-git repository].
As an example, to build a Bodhi beta for the `f29-infra-stg` repository,
you can use this command:
....
$ rpmbuild --define "dist .fc29.infra" -bs bodhi.spec
Wrote: /home/bowlofeggs/rpmbuild/SRPMS/bodhi-3.13.0-0.0.beta.e0ca5bc.fc29.infra.src.rpm
$ koji build f29-infra /home/bowlofeggs/rpmbuild/SRPMS/bodhi-3.13.0-0.0.beta.e0ca5bc.fc29.infra.src.rpm
....
When building a Bodhi release that is intended for production, we should
build from the production dist-git repo instead of uploading an SRPM:
....
$ koji build f29-infra git+https://src.fedoraproject.org/rpms/bodhi.git#d64f40408876ec85663ec52888c4e44d92614b37
....
All builds against the `f29-infra` build target will go into the
`f29-infra-stg` repository. If you wish to promote a build from staging
to production, you can do something like this command:
....
$ koji move-build f29-infra-stg f29-infra bodhi-3.13.0-1.fc29.infra
....
=== Staging
The upgrade playbook will apply configuration changes after running the
alembic upgrade. Sometimes you may need changes applied to the Bodhi
systems in order to get the upgrade playbook to succeed. If you are in
this situation, you can apply those changes by running the bodhi-backend
playbook:
....
sudo rbac-playbook -l staging groups/bodhi-backend.yml
....
In the
https://pagure.io/fedora-infra/ansible/blob/main/f/inventory/group_vars/os_masters_stg[os_masters inventory],
edit the `bodhi_version` variable, setting it to the version you wish to
deploy to staging. For example, to deploy `bodhi-3.13.0-1.fc29.infra` to
staging, I would set that variable like this:
....
bodhi_version: "bodhi-3.13.0-1.fc29.infra"
....
Run these commands:
....
# Synchronize the database from production to staging
$ sudo rbac-playbook manual/staging-sync/bodhi.yml -l staging
# Upgrade the Bodhi backend on staging
$ sudo rbac-playbook manual/upgrade/bodhi.yml -l staging
# Upgrade the Bodhi frontend on staging
$ sudo rbac-playbook openshift-apps/bodhi.yml -l staging
....
=== Production
The upgrade playbook will apply configuration changes after running the
alembic upgrade. Sometimes you may need changes applied to the Bodhi
systems in order to get the upgrade playbook to succeed. If you are in
this situation, you can apply those changes by running the bodhi-backend
playbook:
....
sudo rbac-playbook groups/bodhi-backend.yml -l bodhi-backend
....
In the
https://pagure.io/fedora-infra/ansible/blob/main/f/inventory/group_vars/os_masters[os_masters inventory],
edit the `bodhi_version` variable, setting it to the version you wish to
deploy to production. For example, to deploy `bodhi-3.13.0-1.fc29.infra`
to production, I would set that variable like this:
....
bodhi_version: "bodhi-3.13.0-1.fc29.infra"
....
To update the bodhi RPMs in production:
....
# Update the backend VMs (this will also run the migrations, if any)
$ sudo rbac-playbook manual/upgrade/bodhi.yml -l bodhi-backend
# Update the frontend
$ sudo rbac-playbook openshift-apps/bodhi.yml
....
== Syncing the production database to staging
This can be useful for testing issues with production data in staging:
....
$ sudo rbac-playbook manual/staging-sync/bodhi.yml -l staging
....
== Release EOL
....
bodhi-manage-releases edit --name F21 --state archived
....
== Adding notices to the front page or new update form
You can easily add notification messages to the front page of bodhi
using the [.title-ref]#frontpage_notice# option in
[.title-ref]#ansible/roles/bodhi2/base/templates/production.ini.j2#. If
you want to flash a message on the New Update Form, you can use the
[.title-ref]#newupdate_notice# variable instead. This can be useful for
announcing things like service outages, etc.
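For illustration, such notices in production.ini.j2 might look like this
(the option names come from above; the message text is made up):

....
frontpage_notice = Bodhi will be unavailable on 2019-06-01 from 10:00 to 12:00 UTC for database maintenance.
newupdate_notice = Composes are currently delayed; new updates may take longer than usual to push.
....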
== Using the Bodhi Shell to modify updates by hand
The "bodhi shell" is a Python shell with the SQLAlchemy session and
transaction manager initialized. It can be run from any
production/staging backend instance and allows you to modify any models
by hand.
....
sudo pshell /etc/bodhi/production.ini
# Execute a script that sets up the `db` and provides a `delete_update` function.
# This will eventually be shipped in the bodhi package, but can also be found here.
# https://raw.githubusercontent.com/fedora-infra/bodhi/develop/tools/shelldb.py
>>> execfile('shelldb.py')
....
At this point you have access to a [.title-ref]#db# SQLAlchemy Session
instance, a [.title-ref]#t# [.title-ref]#transaction# module, and
[.title-ref]#m# for the [.title-ref]#bodhi.models#.
....
# Fetch an update, and tweak it as necessary.
>>> up = m.Update.get(u'FEDORA-2016-4d226a5f7e', db)
# Commit the transaction
>>> t.commit()
....
Here is an example of merging two updates together and deleting the
original.
....
>>> up = m.Update.get(u'FEDORA-2016-4d226a5f7e', db)
>>> up.builds
[<Build {'epoch': 0, 'nvr': u'resteasy-3.0.17-2.fc24'}>, <Build {'epoch': 0, 'nvr': u'pki-core-10.3.5-1.fc24'}>]
>>> b = up.builds[0]
>>> up2 = m.Update.get(u'FEDORA-2016-5f63a874ca', db)
>>> up2.builds
[<Build {'epoch': 0, 'nvr': u'resteasy-3.0.17-3.fc24'}>]
>>> up.builds.remove(b)
>>> up.builds.append(up2.builds[0])
>>> delete_update(up2)
>>> t.commit()
....
== Using the Bodhi shell to fix uniqueness problems with e-mail addresses
Bodhi currently enforces uniqueness on user e-mail addresses. There is
https://github.com/fedora-infra/bodhi/issues/2387[an issue] filed to
drop this upstream, but for the time being the constraint is enforced.
This can be a problem for users who have more than one FAS account if
they make one account use an e-mail address that was previously used by
another account, if that other account has not logged into Bodhi since
it was changed to use a different address. One way the user can fix this
themselves is to log in to Bodhi with the old account so that Bodhi
learns about its new address. However, an admin can also fix this by
hand by using the Bodhi shell.
For example, suppose a user has created `user_1` and `user_2`. Suppose
that `user_1` used to use `email_a@example.com` but has been changed to
use `email_b@example.com` in FAS, and `user_2` is now configured to use
`email_a@example.com` in FAS. If `user_2` attempts to log in to Bodhi,
it will cause a uniqueness violation since Bodhi does not know that
`user_1` has changed to `email_b@example.com`. The user can simply log
in as `user_1` to fix this, which will cause Bodhi to update its e-mail
address to `email_b@example.com`. Or an admin can fix it with a shell on
one of the Bodhi backend servers like this:
....
[bowlofeggs@bodhi-backend02 ~][PROD]$ sudo -u apache pshell /etc/bodhi/production.ini
2018-05-29 20:21:36,366 INFO [bodhi][MainThread] Using python-bugzilla
2018-05-29 20:21:36,367 DEBUG [bodhi][MainThread] Using Koji Buildsystem
2018-05-29 20:21:42,559 INFO [bodhi.server][MainThread] Bodhi ready and at your service!
Python 2.7.14 (default, Mar 14 2018, 13:36:31)
[GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux2
Type "help" for more information.
Environment:
app The WSGI application.
registry Active Pyramid registry.
request Active request object.
root Root of the default resource tree.
root_factory Default root factory used to create `root`.
Custom Variables:
m bodhi.server.models
>>> u = m.User.query.filter_by(name=u'user_1').one()
>>> u.email = u'email_b@example.com'
>>> m.Session().commit()
....
== Troubleshooting and Resolution
=== Atomic OSTree compose failure
If the Atomic OSTree compose fails with some sort of [.title-ref]#Device
or Resource busy# error, then run [.title-ref]#mount# to see if there
are any stray [.title-ref]#tmpfs# mounts still active:
....
tmpfs on /var/lib/mock/fedora-22-updates-testing-x86_64/root/var/tmp/rpm-ostree.bylgUq type tmpfs (rw,relatime,seclabel,mode=755)
....
You can then [.title-ref]#umount
/var/lib/mock/fedora-22-updates-testing-x86_64/root/var/tmp/rpm-ostree.bylgUq#
and resume the push again.
=== nfs repodata cache IOError
Sometimes you may hit an IOError during the updateinfo.xml generation
process from createrepo_c:
....
IOError: Cannot open /mnt/koji/mash/updates/epel7-160228.1356/../epel7.repocache/repodata/repomd.xml: File /mnt/koji/mash/updates/epel7-160228.1356/../epel7.repocache/repodata/repomd.xml doesn't exists or not a regular file
....
This issue will be resolved with NFSv4, but in the mean time it can be
worked around by removing the [.title-ref]#.repocache# directory and
resuming the push:
....
rm -fr /mnt/koji/mash/updates/epel7.repocache
....
View file

@ -0,0 +1,122 @@
= Bugzilla Sync Infrastructure SOP
We do not run bugzilla.redhat.com. If bugzilla itself is down we need to
get in touch with Red Hat IT or one of the bugzilla hackers (for
instance, Dave Lawrence (dkl)) in order to fix it.
Infrastructure has some scripts that perform administrative functions on
bugzilla.redhat.com. These scripts sync information from FAS and the
Package Database into bugzilla.
== Contents
[arabic]
. Contact Information
. Description
. Troubleshooting and Resolution
____
[arabic]
. Errors while syncing bugzilla with the PackageDB
____
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin
Persons::
abadger1999
Location::
Phoenix, Denver (Tummy), Red Hat Infrastructure
Servers::
(fas1, app5) => Need to migrate these to bapp1, bugzilla.redhat.com
Purpose::
Sync Fedora information to bugzilla.redhat.com
== Description
At present there are two scripts that sync information from Fedora into
bugzilla.
=== export-bugzilla.py
`export-bugzilla.py` is the first script. It is responsible for syncing
Fedora Accounts into bugzilla. It adds Fedora packagers and bug triagers
into a bugzilla group that gives the users extra permissions within
bugzilla. This script is run off of a cron job on FAS1. The source code
resides in the FAS git repo in `fas/scripts/export-bugzilla.*` however
the code we run on the servers presently lives in ansible:
....
roles/fas_server/files/export-bugzilla
....
=== pkgdb-sync-bugzilla
The other script is pkgdb-sync-bugzilla. It is responsible for syncing
the package owners and cclists to bugzilla from the pkgdb. The script
runs off a cron job on app5. The source code is in the packagedb bzr
repo at
`packagedb/fedora-packagedb-stable/server-scripts/pkgdb-sync-bugzilla.*`.
Just like FAS, a separate copy is presently installed from ansible to
`/usr/local/bin/pkgdb-sync-bugzilla` but that should change ASAP as the
present fedora-packagedb package installs
`/usr/bin/pkgdb-sync-bugzilla`.
== Troubleshooting and Resolution
=== Errors while syncing bugzilla with the PackageDB
One frequent problem is that people will sign up to watch a package in
the packagedb but their email address in FAS isn't a bugzilla email
address. When this happens the scripts that try to sync the packagedb
information to bugzilla encounter an error and send an email like this:
....
Subject: Errors while syncing bugzilla with the PackageDB
The following errors were encountered while updating bugzilla with information
from the Package Database. Please have the problems taken care of:
({'product': u'Fedora', 'component': u'aircrack-ng', 'initialowner': u'baz@zardoz.org',
'initialcclist': [u'foo@bar.org', u'baz@zardoz.org']}, 504, 'The name foo@bar.org is not a
valid username. \n Either you misspelled it, or the person has not\n registered for a
Red Hat Bugzilla account.')
....
When this happens we attempt to contact the person with the problematic
mail address and get them to change it. Here's a boilerplate message:
....
To: foo@bar.org
Subject: Fedora Account System Email vs Bugzilla Email
Hello,
You are signed up to receive bug reports against the aircrack-ng package
in Fedora. Unfortunately, the email address we have for you in the
Fedora Account System is not a valid bugzilla email address. That means
that bugzilla won't send you mail and we're getting errors in the script
that syncs the cclist into bugzilla.
There's a few ways to resolve this:
1) Create a new bugzilla account with the email foo@bar.org as
an account at https://bugzilla.redhat.com.
2) Change an existing account on https://bugzilla.redhat.com to use the
foo@bar.org email address.
3) Change your email address in https://admin.fedoraproject.org/accounts
to use an email address that matches with an existing bugzilla email
address.
Please let me know what you want to do!
Thank you,
....
If the user does not reply someone in the cvsadmin group needs to go
into the pkgdb and remove the user from the cclist for the package.
View file

@ -0,0 +1,71 @@
= bugzilla2fedmsg SOP
Receive events from bugzilla over the RH "unified messagebus" and
rebroadcast them over our own fedmsg bus.
== Contact Information
Owner::
Messaging SIG, Fedora Infrastructure Team
Contact::
#fedora-apps, #fedora-fedmsg, #fedora-admin, #fedora-noc
Servers::
bugzilla2fedmsg01
Purpose::
Rebroadcast bugzilla events on our bus.
== Description
bugzilla2fedmsg is a small service running as the 'moksha-hub' process
which receives events from bugzilla via the RH "unified messagebus" and
rebroadcasts them to our fedmsg bus.
[NOTE]
.Note
====
Unlike _all_ of our other fedmsg services, this one runs as the
'moksha-hub' process and not as the 'fedmsg-hub'.
====
The bugzilla2fedmsg package provides a plugin to the moksha-hub that
connects out over the STOMP protocol to a 'fabric' of JBOSS activemq
FUSE brokers living in the Red Hat DMZ. We authenticate with a cert/key
pair that is kept in /etc/pki/fedmsg/. Those brokers should push
bugzilla events over STOMP to our moksha-hub daemon. When a message
arrives, we query bugzilla about the change to get some 'more
interesting' data to stuff in our payload, then we sign the message
using a fedmsg cert and fire it off to the rest of our bus.
This service has no database, no memcached usage. It depends on those
STOMP brokers and being able to query bugzilla.rh.com.
== Relevant Files
All managed by ansible, of course:

* STOMP config: /etc/moksha/production.ini
* fedmsg config: /etc/fedmsg.d/
* certs: /etc/pki/fedmsg
* code: /usr/lib/python2.7/site-packages/bugzilla2fedmsg.py
== Useful Commands
To look at logs, run:
....
$ journalctl -u moksha-hub -f
....
To restart the service, run:
....
$ systemctl restart moksha-hub
....
== Internal Contacts
If we need to contact someone from the RH internal "unified messagebus"
team, search for "unified messagebus" in mojo. It is operated as a joint
project between RHIT and PnT Devops. See also the `#devops-message` IRC
channel, internally.
View file

@ -0,0 +1,169 @@
= Fedora OpenStack
== Quick Start
Controller:
....
sudo rbac-playbook hosts/fed-cloud09.cloud.fedoraproject.org.yml
....
Compute nodes:
....
sudo rbac-playbook groups/openstack-compute-nodes.yml
....
== Description
If you need to install OpenStack, either make sure the machine is
clean, or use the `ansible.git/files/fedora-cloud/uninstall.sh` script
to brute-force wipe it.
[NOTE]
.Note
====
by default, the script does not wipe the LVM group with the VMs; you
have to clean them manually. There is a commented-out line in that
script for this.
====
On fed-cloud09, remove the file
`/etc/packstack_sucessfully_finished` to force packstack and a few
other commands to run again.
After that wipe, you have to:
....
ifdown eth1
configure eth1 to become normal Ethernet with ip
yum install openstack-neutron-openvswitch
/usr/bin/systemctl restart neutron-ovs-cleanup
ifup eth1
....
Additionally, when reprovisioning OpenStack, all volumes on the Dell
EqualLogic are preserved and you have to manually remove them (or remove
them from OpenStack before it is reprovisioned). SSH to the Dell
EqualLogic (credentials are at the bottom of `/etc/cinder/cinder.conf`)
and run:
....
show (to get list of volumes)
volume select <volume_name> offline
volume delete <volume_name>
....
Before installing make sure:

* the rdo repo is enabled
* `yum install openstack-packstack openstack-packstack-puppet openstack-puppet-modules`
* edit `/usr/lib/python2.7/site-packages/packstack/plugins/dashboard_500.py`
and add the missing parentheses:
+
....
host_resources.append((ssl_key, 'ssl_ps_server.key'))
....
Now you can run playbook:
....
sudo rbac-playbook hosts/fed-cloud09.cloud.fedoraproject.org.yml
....
If you run it after a wipe (i.e. the db has been reset), you have to:

* import ssh keys of users (only possible via webUI - RHBZ 1128233)
* reset user passwords
== Compute nodes
The compute node setup is much simpler and is written as a role. Use:
....
vars_files:
- ... SNIP
- /srv/web/infra/ansible/vars/fedora-cloud.yml
- "{{ private }}/files/openstack/passwords.yml"
roles:
... SNIP
- cloud_compute
....
Define a host variable in `inventory/host_vars/FQDN.yml`:
....
compute_private_ip: 172.23.0.10
....
You should also add the IP to `vars/fedora-cloud.yml`.

When adding a new compute node, please also update
`files/fedora-cloud/hosts`.
[IMPORTANT]
.Important
====
When reinstalling, make sure you have removed all members on the Dell
EqualLogic (credentials are in /etc/cinder/cinder.conf on the compute
node), otherwise the space will stay blocked!
====
== Updates
Our openstack cloud should have updates applied and reboots when the
rest of our servers are updated and rebooted. This will cause an outage,
please make sure to schedule it.
[arabic]
. Stop copr-backend process on copr-be.cloud.fedoraproject.org
. Kill all copr-builder instances.
. Kill all transient/scratch instances.
. Update all instances we control. copr, persistent, infrastructure, qa
etc.
. Shutdown all instances
. Update and reboot fed-cloud09
. Update and reboot all compute nodes
. Start up all instances that are shutdown in step 5.
TODO: add commands for above as we know them.
== Troubleshooting
* Could not connect to a VM? Check your security group; the default SG
does not allow any connections.
* packstack ended up with an error: it is likely a race condition in
puppet - BZ 1135529. Just run it again.
* ERROR : append() takes exactly one argument (2 given):
`vi /usr/lib/python2.7/site-packages/packstack/plugins/dashboard_500.py`
and add one more surrounding ().
* Local ip for ovs agent must be set when tunneling is enabled:
restart fed-cloud09 or: ssh to fed-cloud09; ifdown eth1; ifup eth1;
ifup br-ex
* mongodb problem? follow
https://ask.openstack.org/en/question/54015/mongodbpp-error-when-installing-rdo-on-centos-7/?answer=54076#post-id-54076
* `WARNING:keystoneclient.httpclient:Failed to retrieve management_url from token`:
+
....
keystone --os-token $ADMIN_TOKEN --os-endpoint \
https://fedorainfracloud.org:35357/v2.0/ endpoint-create --region 'RegionOne' \
--service 91358b81b1aa40d998b3a28d0cfc86e7 --region 'RegionOne' --publicurl \
'https://fedorainfracloud.org:5000/v2.0' --adminurl 'http://172.24.0.9:35357/v2.0' \
--internalurl 'http://172.24.0.9:5000/v2.0'
....
== Fedora Classroom about our instance
http://meetbot.fedoraproject.org/fedora-classroom/2015-05-11/fedora-classroom.2015-05-11-15.02.log.html
View file

@ -0,0 +1,62 @@
= Collectd SOP
Collectd ( https://collectd.org/ ) is a client/server setup that gathers
system information from clients and allows the server to display that
information over various time periods.
Our server instance runs on log01.phx2.fedoraproject.org and most other
servers run clients that connect to the server and provide it with data.
'''''
[arabic]
. Contact Information
. Collectd info
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin
Location::
https://admin.fedoraproject.org/collectd/
Servers::
log01 and all/most other servers as clients
Purpose::
provide load and system information on servers.
== Configuration
The collectd roles configure collectd on the various machines:
* collectd/base - This is the base client role for most servers.
* collectd/server - This is the server role, for use on log01.
* collectd/other - There are various other subroles for different types
of clients.
== Web interface
The server web interface is available at:
https://admin.fedoraproject.org/collectd/
== Restarting
collectd runs as a normal systemd or sysvinit service, so you can
restart it with `systemctl restart collectd` or `service collectd
restart`.
== Removing old hosts
Collectd keeps information around until it's deleted, so you may need to
sometime go remove data from a host or hosts thats no longer used. To do
this:
[arabic]
. Login to log01
. cd /var/lib/collectd/rrd
. sudo rm -rf oldhostname
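Put together as a shell session (replace oldhostname with the actual
host):

....
$ ssh log01.phx2.fedoraproject.org
$ cd /var/lib/collectd/rrd
$ sudo rm -rf oldhostname
....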
== Bug reporting
Collectd is in Fedora/EPEL and we use their packages, so report bugs to
bugzilla.redhat.com.
View file

@ -0,0 +1,76 @@
= Communishift SOP
Communishift is an OpenShift deployment hosted and maintained by Fedora
Infrastructure that is available to the community to host applications.
Fedora Infrastructure does not maintain the applications in Communishift
and is only responsible for the OpenShift deployment itself.
Production instance:
https://console-openshift-console.apps.os.fedorainfracloud.org/
== Contact information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin
Persons::
nirik
Location::
Phoenix
Servers::
* os-node01.fedorainfracloud.org
* os-node02.fedorainfracloud.org
* os-node03.fedorainfracloud.org
* os-node04.fedorainfracloud.org
* os-node05.fedorainfracloud.org
* os-node06.fedorainfracloud.org
* os-node07.fedorainfracloud.org
* os-node08.fedorainfracloud.org
* os-node09.fedorainfracloud.org
* os-node10.fedorainfracloud.org
* os-node11.fedorainfracloud.org
* virthost-os01.fedorainfracloud.org
* virthost-os02.fedorainfracloud.org
* virthost-os03.fedorainfracloud.org
* virthost-aarch64-os01.fedorainfracloud.org
* virthost-aarch64-os02.fedorainfracloud.org
Purpose::
Allow community members to host services for the Fedora Project.
== Onboarding new users
To allow new users to create projects in Communishift, begin by adding
them to the `communishift` FAS group.
At the time of this writing, there is no automation to sync users from
the `communishift` FAS group to OpenShift, so you will need to log in to
the Communishift instance and grant that user permissions to create
projects. For example, to grant `bowlofeggs` permissions, you would do
this:
....
$ oc adm policy add-cluster-role-to-user self-provisioner bowlofeggs
$ oc create clusterquota for-bowlofeggs --project-annotation-selector openshift.io/requester=bowlofeggs --hard pods=10 --hard persistentvolumeclaims=5
....
This will grant bowlofeggs the ability to provision up to 10 pods and 5
volumes.
== KVM access
We allow applications access to the kvm device so they can run emulation
faster. Anytime the cluster is re-installed, run:
....
#!/bin/bash
set -eux
if ! oc get --namespace=default ds/device-plugin-kvm &>/dev/null; then
    oc create --namespace=default -f https://raw.githubusercontent.com/kubevirt/kubernetes-device-plugins/master/manifests/kvm-ds.yml
fi
....
See the
https://github.com/kubevirt/kubernetes-device-plugins/blob/master/docs/README.kvm.md[upstream
docs] as well as the
https://pagure.io/fedora-infrastructure/issue/8208[original request] for
this.
View file

@ -0,0 +1,30 @@
= Compose Tracker SOP
Compose Tracker tracks the pungi composes and creates a ticket in a
pagure repo for composes that do not reach FINISHED status, with a tail
of the debug log and the koji tasks associated with it.

Compose Tracker: https://pagure.io/releng/compose-tracker

Failed Composes Repo: https://pagure.io/releng/failed-composes
== Contents
[arabic]
. Contact Information
== Contact Information
Owner::
Fedora Release Engineering Team
Contact::
#fedora-releng
Persons::
dustymabe mohanboddu
Purpose::
Track failed composes
== More Information
For information about the tool and deployment on Fedora Infra Openshift
please look at the documentation in
https://pagure.io/releng/compose-tracker/blob/master/f/README.md
View file

@ -0,0 +1,127 @@
= Content Hosting Infrastructure SOP
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin, sysadmin-main, fedora-infrastructure-list
Location::
Phoenix
Servers::
secondary1, netapp[1-3], torrent1
Purpose::
Policy regarding hosting, removal and pruning of content.
Scope::
download.fedora.redhat.com, alt.fedoraproject.org,
archives.fedoraproject.org, secondary.fedoraproject.org,
torrent.fedoraproject.org
== Description
Fedora hosts both Fedora content and some non-Fedora content. Our
resources are finite and as such we have to have some policy around when
to remove old content. This SOP describes the criteria used to remove content.
The spirit of this SOP is to allow more people to host content and give
it a try, prove that it's useful. If it's not popular or useful, it will
get removed. Also out of date or expired content will be removed.
=== What hosting options are available
Aside from the hosting at https://pagure.io/ we have a series of mirrors
we're allowing people to use. They are located at:
* http://archive.fedoraproject.org/pub/archive/ - For archives of
historical Fedora releases
* http://secondary.fedoraproject.org/pub/fedora-secondary/ - For
secondary architectures
* http://alt.fedoraproject.org/pub/alt/ - For misc content / catchall
* http://torrent.fedoraproject.org/ - For torrent hosting
* http://spins.fedoraproject.org/ - For official Fedora Spins hosting,
mirrored somewhat
* http://download.fedoraproject.com/pub/ - For official Fedora Releases,
mirrored widely
=== Who can host? What can be hosted?
Any official Fedora content can be hosted and made available for mirroring.
Official content is determined by the Council by virtue of allowing
people to use the Fedora trademark. People representing these teams will
be allowed to host.
=== Non Official Hosting
People wanting to host unofficial bits may request approval for hosting.
Create a ticket at https://pagure.io/fedora-infrastructure/ explaining
what and why Fedora should host it. Such requests will be reviewed by
the Fedora Infrastructure team.
Requests for non-official hosting that may conflict with existing Fedora
policies will be escalated to the Council for approval.
=== Licensing
Anything hosted with Fedora must come with a Free software license that
is approved by Fedora. See http://fedoraproject.org/wiki/Licensing for
more.
== Requesting Space
* Make sure you have a Fedora account -
https://admin.fedoraproject.org/accounts/
* Ensure you have signed the Fedora Project Contributor Agreement (FPCA)
* Submit a hosting request - https://pagure.io/fedora-infrastructure/
** Include who you are, and any group you are working with (e.g. a SIG)
** Include space requirements
** Include an estimate of the number of downloads expected (if you can).
** Include the nature of the bits you want to host.
* Apply for the hosted-content group -
https://admin.fedoraproject.org/accounts/group/view/hosted-content
== Using Space
A dedicated namespace in the mirror will be assigned to you. It will be
your responsibility to upload content, remove old content, stay within
your quota, etc. If you have any questions or concerns about this please
let us know. Generally you will use rsync. For example:
....
rsync -av --progress ./my.iso secondary01.fedoraproject.org:/srv/pub/alt/mySpace/
....
[IMPORTANT]
.Important
====
None of our mirrored content is backed up. Ensure that you keep backups
of your content.
====
== Content Pruning / Purging / Removal
The following guidelines / tests will be used to determine whether or
not to remove content from the mirror.
=== Expired / Old Content
If content meets any of the following criteria it may be removed:
* Content that has reached the end of life (is no longer receiving
updates).
* Pre-release content that has been superseded.
* EOL releases that have been moved to archives.
* N-2 or greater releases. If more than 3 versions of a piece of content
are on the mirror, the oldest may be removed.
=== Limited Use Content
If content meets any of the following criteria it may be removed:
* Content with exceedingly limited seeders or downloaders, with little
prospect of increasing those numbers and which is older than 1 year.
* Content such as videos or audio which are several years old.
=== Catch All Removal
Fedora reserves the right to remove any content for any reason at any
time. We'll do our best to host things but sometimes we'll need space or
just need to remove stuff for legal or policy reasons.
View file

@ -0,0 +1,417 @@
= Copr
Copr is a build system for 3rd party packages.
Frontend:::
* http://copr.fedorainfracloud.org/
Backend:::
* http://copr-be.cloud.fedoraproject.org/
Package signer:::
* copr-keygen.cloud.fedoraproject.org
Dist-git::
* copr-dist-git.fedorainfracloud.org
Devel instances (NO NEED TO CARE ABOUT THEM, JUST THOSE ABOVE):::
* http://copr-fe-dev.cloud.fedoraproject.org/
* http://copr-be-dev.cloud.fedoraproject.org/
* copr-keygen-dev.cloud.fedoraproject.org
* copr-dist-git-dev.fedorainfracloud.org
== Contact Information
Owner::
msuchy (mirek)
Contact::
#fedora-admin, #fedora-buildsys
Location::
Fedora Cloud
Purpose::
Build system
== This document
This document provides condensed information to help you keep Copr
alive and working. For more sophisticated business processes, please see
https://docs.pagure.org/copr.copr/maintenance_documentation.html
== TROUBLESHOOTING
Almost every problem with Copr is due to a problem with spawning builder
VMs, or with processing the action queue on the backend.
=== VM spawning/termination problems
Try to restart copr-backend service:
....
$ ssh root@copr-be.cloud.fedoraproject.org
$ systemctl restart copr-backend
....
If this doesn't solve the problem, try to follow logs for some clues:
....
$ tail -f /var/log/copr-backend/{vmm,spawner,terminator}.log
....
As a last resort, you can terminate all builders and let copr-backend
throw away all information about them. This will obviously interrupt all
running builds and reschedule them:
....
$ ssh root@copr-be.cloud.fedoraproject.org
$ systemctl stop copr-backend
$ cleanup_vm_nova.py
$ redis-cli
> FLUSHALL
$ systemctl start copr-backend
....
Sometimes OpenStack cannot handle spawning too many VMs at the same
time, so it is safer to edit, on copr-be.cloud.fedoraproject.org:
....
vi /etc/copr/copr-be.conf
....
and change:
....
group0_max_workers=12
....
to "6". Start the copr-backend service and some time later increase it
back to the original value. Copr automatically detects changes in the
config and increases the number of workers.
The set of aarch64 VMs isn't maintained by OpenStack, but by Copr's
backend itself. Steps to diagnose:
....
$ ssh root@copr-be.cloud.fedoraproject.org
[root@copr-be ~][PROD]# systemctl status resalloc
● resalloc.service - Resource allocator server
...
[root@copr-be ~][PROD]# less /var/log/resallocserver/main.log
[root@copr-be ~][PROD]# su - resalloc
[resalloc@copr-be ~][PROD]$ resalloc-maint resource-list
13569 - aarch64_01_prod_00013569_20190613_151319 pool=aarch64_01_prod tags=aarch64 status=UP
13597 - aarch64_01_prod_00013597_20190614_083418 pool=aarch64_01_prod tags=aarch64 status=UP
13594 - aarch64_02_prod_00013594_20190614_082303 pool=aarch64_02_prod tags=aarch64 status=STARTING
...
[resalloc@copr-be ~][PROD]$ resalloc-maint ticket-list
879 - state=OPEN tags=aarch64 resource=aarch64_01_prod_00013569_20190613_151319
918 - state=OPEN tags=aarch64 resource=aarch64_01_prod_00013608_20190614_135536
904 - state=OPEN tags=aarch64 resource=aarch64_02_prod_00013594_20190614_082303
919 - state=OPEN tags=aarch64
...
....
Be careful when there's some resource in `STARTING` state. If that's so,
check
`/usr/bin/tail -F -n +0 /var/log/resallocserver/hooks/013594_alloc`.
Copr takes tickets from the resalloc server; if the resources fail to
spawn, the tickets are not assigned an appropriately tagged resource
for a long time.
If that happens (it shouldn't) and there's some inconsistency between
resalloc's database and the actual status on aarch64 hypervisors
(`ssh copr@virthost-aarch64-os0{1,2}.fedorainfracloud.org`) - use
`virsh` there to introspect theirs statuses - use
`resalloc-maint resource-delete`, `resalloc ticket-close` or `psql`
commands to fix-up the resalloc's DB.
=== Backend Troubleshooting
Information about status of Copr backend services:
....
systemctl status copr-backend*.service
....
Utilization of workers:
....
ps axf
....
Worker processes change their $0 (process title) to show which task they
are working on and on which builder.
To list which VM builders are tracked by copr-vmm service:
....
/usr/bin/copr_get_vm_info.py
....
=== Appstream builder troubleshooting
Appstream builder is painfully slow when running on a repository with a
huge amount of packages. See
https://github.com/hughsie/appstream-glib/issues/301 . You might need to
disable it for some projects:
....
$ ssh root@copr-be.cloud.fedoraproject.org
$ cd /var/lib/copr/public_html/results/<owner>/<project>/
$ touch .disable-appstream
# You should probably also delete existing appstream data because
# they might be obsolete
$ rm -rf ./appdata
....
=== Backend action queue issues
First check the number of not-yet-processed actions. If that number
isn't equal to zero, and is not decreasing relatively fast (say a
single action takes longer than 30s), there might be some problem.
Logs for the action dispatcher can be found in:
....
/var/log/copr-backend/action_dispatcher.log
....
Check that there's no stuck process under the `Action dispatch` parent
process in `pstree -a copr` output.
== Deploy information
Using playbooks and rbac:
....
$ sudo rbac-playbook groups/copr-backend.yml
$ sudo rbac-playbook groups/copr-frontend-cloud.yml
$ sudo rbac-playbook groups/copr-keygen.yml
$ sudo rbac-playbook groups/copr-dist-git.yml
....
https://pagure.io/copr/copr/blob/master/f/copr-setup.txt The
[.title-ref]#copr-setup.txt# manual is severely outdated, but there is
no up-to-date alternative. We should extract useful information from it
and put it here in the SOP or into
https://docs.pagure.org/copr.copr/maintenance_documentation.html and
then throw the [.title-ref]#copr-setup.txt# away.
The copr-backend service (which spawns several processes) should run on
the backend. The backend spawns VMs in the Fedora Cloud. You cannot log
in to those machines directly. You have to:
....
$ ssh root@copr-be.cloud.fedoraproject.org
$ su - copr
$ copr_get_vm_info.py
# find IP address of the VM that you want
$ ssh root@172.16.3.3
....
Instances can be easily terminated in
https://fedorainfracloud.org/dashboard
=== Order of start up
When reprovisioning, you should start the copr-keygen and copr-dist-git
machines first (in any order). Then you can start copr-be. You can start
it sooner, but make sure that the copr-* services are stopped.

The copr-fe machine is completely independent and can be started at any
time. If the backend is stopped it will just queue jobs.
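As a sketch only (service and host names are taken from elsewhere in
this SOP; the exact ssh invocations are illustrative), the start-up
order could look like:

....
# keygen and dist-git first, in any order
$ ssh root@copr-keygen.cloud.fedoraproject.org 'systemctl start signd'
$ ssh root@copr-dist-git.fedorainfracloud.org 'systemctl start copr-dist-git httpd'
# then the backend
$ ssh root@copr-be.cloud.fedoraproject.org 'systemctl start copr-backend'
....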
== Logs
=== Backend
* /var/log/copr-backend/action_dispatcher.log
* /var/log/copr-backend/actions.log
* /var/log/copr-backend/backend.log
* /var/log/copr-backend/build_dispatcher.log
* /var/log/copr-backend/logger.log
* /var/log/copr-backend/spawner.log
* /var/log/copr-backend/terminator.log
* /var/log/copr-backend/vmm.log
* /var/log/copr-backend/worker.log
There are also several logs for non-essential features, such as
copr_prune_results.log, hitcounter.log and cleanup_vms.log, that you
shouldn't be worried about.
=== Frontend
* /var/log/copr-frontend/frontend.log
* /var/log/httpd/access_log
* /var/log/httpd/error_log
=== Keygen
* /var/log/copr-keygen/main.log
=== Dist-git
* /var/log/copr-dist-git/main.log
* /var/log/httpd/access_log
* /var/log/httpd/error_log
== Services
=== Backend
* copr-backend
** copr-backend-action
** copr-backend-build
** copr-backend-log
** copr-backend-vmm
* redis
* lighttpd
All the [.title-ref]#copr-backend-*.service# are configured to be a part
of the [.title-ref]#copr-backend.service# so e.g. in case of restarting
all of them, just restart the [.title-ref]#copr-backend.service#.
=== Frontend
* httpd
* postgresql
=== Keygen
* signd
=== Dist-git
* httpd
* copr-dist-git
== PPC64LE Builders
Builders for PPC64 are located at rh-power2.fit.vutbr.cz and anyone with
access to the buildsys ssh key can get there using keys, as
msuchy@rh-power2.fit.vutbr.cz.

The `bin/` directory there contains these scripts (for VMs vm26 through
vm29): destroy-all.sh, get-one-vm.sh, reinit-vmXX.sh,
virsh-destroy-vmXX.sh and virsh-start-vmXX.sh.

* bin/destroy-all.sh - destroys all VMs and reinits them
* reinit-vmXX.sh - copies the VM image from the template
* virsh-destroy-vmXX.sh - destroys a VM
* virsh-start-vmXX.sh - starts a VM
* get-one-vm.sh - starts one VM and returns its IP - this is used in
Copr playbooks

In case of a big queue of PPC64 tasks, simply call bin/destroy-all.sh;
it will destroy stuck VMs and the copr backend will spawn new VMs.
== Ports opened for public
Frontend:
[width="86%",cols="13%,17%,16%,54%",options="header",]
|===
|Port |Protocol |Service |Reason
|22 |TCP |ssh |Remote control
|80 |TCP |http |Serving Copr frontend website
|443 |TCP |https |^^
|===
Backend:
[width="86%",cols="13%,17%,16%,54%",options="header",]
|===
|Port |Protocol |Service |Reason
|22 |TCP |ssh |Remote control
|80 |TCP |http |Serving build results and repos
|443 |TCP |https |^^
|===
Distgit:
[width="86%",cols="13%,17%,16%,54%",options="header",]
|===
|Port |Protocol |Service |Reason
|22 |TCP |ssh |Remote control
|80 |TCP |http |Serving cgit interface
|443 |TCP |https |^^
|===
Keygen:
[width="86%",cols="13%,17%,16%,54%",options="header",]
|===
|Port |Protocol |Service |Reason
|22 |TCP |ssh |Remote control
|===
== Resources justification
Copr currently uses the following resources.
=== Frontend
* RAM: 2G (out of 4G) and some swap
* CPU: 2 cores (3400MHz) with load 0.92, 0.68, 0.65
Most of the memory is eaten by PostgreSQL, followed by Apache. The CPU
usage is also mainly driven by those two services, but in the reversed
order.
I don't think we can settle for any instance that provides less than 2G
RAM (obviously), but ideally we need 3G+. A 2-core CPU is good enough.
* Disk space: 17G for system and 8G for [.title-ref]#pgsqldb# directory
If needed, we are able to clean-up the database directory of old dumps
and backups and get down to around 4G disk space.
=== Backend
* RAM: 5G (out of 16G)
* CPU: 8 cores (3400MHz) with load 4.09, 4.55, 4.24
The backend takes care of spinning up builders and running ansible
playbooks on them, running [.title-ref]#createrepo_c# (on big
repositories) and so on. Copr utilizes two queues: one for builds, which
are delegated to OpenStack builders, and one for actions. Actions,
however, are processed directly by the backend, so they can spike our
load. We would ideally like to have the same computing power that we
have now. Maybe we can go lower than 16G RAM, possibly down to 12G RAM.
* Disk space: 30G for the system, 5.6T (out of 6.8T) for build results
Currently, we have 1.3T of backup data that is going to be deleted soon,
but nevertheless, we cannot go any lower on storage. Disk space is a
long-term issue for us and we need to make a lot of compromises just to
survive our daily increase (which is around 10G of new data). Many
features are blocked by not having enough storage. We cannot go any
lower, and we also cannot go much longer with the current storage.
=== Distgit
* RAM: ~270M (out of 4G), but climbs to ~1G when busy
* CPU: 2 cores (3400MHz) with load 1.35, 1.00, 0.53
Personally, I wouldn't downgrade the machine too much. Possibly we can
live with 3G RAM, but I wouldn't go any lower.
* Disk space: 7G for system, 1.3T dist-git data
We currently employ a lot of aggressive cleaning strategies on our
distgit data, so we can't go any lower than what we have.
=== Keygen
* RAM: ~150M (out of 2G)
* CPU: 1 core (3400MHz) with load 0.10, 0.31, 0.25
We are basically running just [.title-ref]#signd# and
[.title-ref]#httpd# here, both with minimal resource requirements. The
memory usage is topped by [.title-ref]#systemd-journald#.
* Disk space: 7G for system and ~500M (out of ~700M) for GPG keys
We are slowly pushing the GPG key storage to its limit, so in the case
of migrating copr-keygen somewhere, we would like to scale it up to at
least 1G.
= Cyclades
cyclades notes
[arabic]
. login as root - the default password is tslinux
. change the password for root and admin to our password from the
phx2-access.txt file in the private repo
. port forward to the web browser for the cyclades:
+
`ssh -L 8080:rack47-serial.phx2.fedoraproject.org:80`
. connect to localhost:8080 in your web browser
. login with root and the password you set above
. click on 'security'
. click on 'moderate'
. logout, then port forward port 443 as above:
+
`ssh -L 8080:rack47-serial.phx2.fedoraproject.org:443`
. click on the 'wizard' button at lower left
. proceed through the wizard. Info needed:
* serial ports are set to 115200 8N1 by default
* do not setup buffering
* give it the ip of our syslog server
. click 'apply changes'
. hope
. log back in
. name/setup the port aliases
= Darkserver SOP
To set up a http://darkserver.fedoraproject.org instance based on the
Darkserver project to provide GNU_BUILD_ID information for packages. A
devel instance can be seen at http://darkserver01.dev.fedoraproject.org
and the staging instance is
http://darkserver01.stg.phx2.fedoraproject.org/. This page describes how
to set up the server.
== Contents
[arabic]
. Contact Information
. Installing the server
. Setting up the database
. SELinux Configuration
. Koji plugin setup
. Debugging
== Contact Information
Owner:::
Fedora Infrastructure Team
Contact:::
#fedora-admin
Persons:::
kushal mether
Sponsor:::
nirik
Location:::
phx2
Servers:::
darkserver01 , darkserver01.stg, darkserver01.dev
Purpose:::
To host Darkserver
== Installing the Server
....
root@localhost# yum install darkserver
....
== Setting up the database
We are using MySQL as the database. We will need two users: one for the
koji plugin and one for darkserver:
....
root@localhost# mysql -u root
mysql> CREATE DATABASE darkserver;
mysql> GRANT INSERT ON darkserver.* TO kojiplugin@'koji-hub-ip' IDENTIFIED BY 'XXX';
mysql> GRANT SELECT ON darkserver.* TO dark@'darkserver-ip' IDENTIFIED BY 'XXX';
....
Setup this db configuration in the conf file under
`/etc/darkserver/darkserverweb.conf`:
....
[darkserverweb]
host=db host name
user=dark
password=XXX
database=darkserver
....
Now set up the db tables if it is a new install.
(For this you may need to `'GRANT * ON darkserver.*'` to the web user,
and then `'REVOKE * ON darkserver.*'` after running.)
....
root@localhost# python /usr/lib/python2.6/site-packages/darkserverweb/manage.py syncdb
....
== SELinux Configuration
Do the following to allow the webserver to connect to the database:
....
root@localhost# setsebool -P httpd_can_network_connect_db 1
....
== Setting up the Koji plugin
Install the package:
....
root@localhost# yum install darkserver-kojiplugin
....
Then fill in the configuration file under
`/etc/koji-hub/plugins/darkserver.conf`:
....
[darkserver]
host=db host name
user=kojiplugin
password=XXX
database=darkserver
port=3306
....
Then enable the plugin in the koji hub configuration.
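As a rough sketch only - the plugin path and option names below are
assumptions, so verify them against the hub's existing
`/etc/koji-hub/hub.conf`:
....
[hub]
PluginPath = /usr/lib/koji-hub-plugins
Plugins = darkserver
....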
== Debugging
Set DEBUG to True in `/etc/darkserver/settings.py` file and restart
Apache.
= Database Infrastructure SOP
Our database servers provide database storage for many of our apps.
== Contents
[arabic]
. Contact Information
. Description
. Creating a New Postgresql Database
. Troubleshooting and Resolution
.. Connection issues
.. Some useful queries
... What queries are running
... Seeing how "dirty" a table is
... XID Wraparound
.. Restart Procedure
... Koji
... Bodhi
. Note about TurboGears and MySQL
. Restoring from backups or specific dbs
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin, sysadmin-main, sysadmin-dba group
Location::
Phoenix
Servers::
db01, db03, db-fas01, db-datanommer02, db-koji01, db-s390-koji01,
db-arm-koji01, db-ppc-koji01, db-qa01, dbqastg01
Purpose::
Provides database connection to many of our apps.
== Description
db01, db03 and db-fas01 are our primary servers. db01 and db-fas01 run
PostgreSQL. db03 contains mariadb. db-koji01, db-s390-koji01,
db-arm-koji01, db-ppc-koji01 contain secondary kojis. db-qa01 and
db-qastg01 contain resultsdb. db-datanommer02 contains all stored
messages in a postgresql database.
== Creating a New Postgresql Database
Creating a new database on our postgresql server isn't hard, but there are
several steps that should be taken to make the database server as secure
as possible.
We want to separate the database permissions so that we don't have the
user/password combination that can do anything it likes to the database
on every host (the webapp user can usually do a lot of things even
without those extra permissions but every little bit helps).
Say we have an app called "raffle". We'd have three users:
* raffleadmin: able to make any changes they want to this particular
database. It should not be used in day to day but only for things like
updating the database schema when an update occurs. We could very likely
disable this account in the db whenever we are not using it.
* raffleapp: the database user that the web application uses. This will
likely need to be able to insert and select from all tables. It will
probably need to update most tables as well. There may be some tables
that it does _not_ need delete on. It should almost certainly not need
schema modifying permissions. (With postgres, it likely also needs
permission to insert/select on sequences as well).
* rafflereadonly: Only able to read data from tables, not able to modify
anything. Sadly, we aren't using this often but it can be useful for
scripts that need to talk directly to the database without modifying it.
....
db2 $ sudo -u postgres createuser -P -E NEWDBadmin
Password: <randomly generated password>
db2 $ sudo -u postgres createuser -P -E NEWDBapp
Password: <randomly generated password>
db2 $ sudo -u postgres createuser -P -E NEWDBreadonly
Password: <randomly generated password>
db2 $ sudo -u postgres createdb -E utf8 NEWDB -O NEWDBadmin
db2 $ sudo -u postgres psql NEWDB
NEWDB=# revoke all on database NEWDB from public;
NEWDB=# revoke all on schema public from public;
NEWDB=# grant all on schema public to NEWDBadmin;
NEWDB=# [grant permissions to NEWDBapp as appropriate for your app]
NEWDB=# [grant permissions to NEWDBreadonly as appropriate for a user that
is only trusted enough to read information]
NEWDB=# grant connect on database NEWDB to nagiosuser;
....
If your application needs to have the NEWDBapp user and password to
connect to the database, you probably want to add these to ansible as
well. Put the password in the private repo on batcave01. Then use a
template file to incorporate it into the config file. See fas.pp for an
example.
== Troubleshooting and Resolution
=== Connection issues
There are no known outstanding issues with the database itself. Remember
that every time either database is restarted, services will have to be
restarted (see below).
=== Some useful queries
==== What queries are running
This can help you find out what queries are currently running on the
server:
....
select datname, pid, query_start, backend_start, query from
pg_stat_activity where state<>'idle' order by query_start;
....
This can help you find how many connections to the db server are for
each individual database:
....
select datname, count(datname) from pg_stat_activity group by datname
order by count desc;
....
==== Seeing how "dirty" a table is
We've added a function from postgres's contrib directory to tell how
dirty a table is. By dirty we mean, how many tuples are active, how many
have been marked as having old data (and therefore "dead") and how much
free space is allocated to the table but not used.:
....
\c fas2
\x
select * from pgstattuple('visit_identity');
table_len | 425984
tuple_count | 580
tuple_len | 46977
tuple_percent | 11.03
dead_tuple_count | 68
dead_tuple_len | 5508
dead_tuple_percent | 1.29
free_space | 352420
free_percent | 82.73
\x
....
Vacuum should clear out dead_tuples. Only a vacuum full, which will lock
the table and therefore should be avoided, will clear out free space.
==== XID Wraparound
Find out how close we are to having to perform a vacuum of a database
(as opposed to individual tables of the db). We should schedule a vacuum
when about 50% of the transaction ids have been used (approximately
530,000,000 xids):
....
select datname, age(datfrozenxid), pow(2, 31) - age(datfrozenxid) as xids_remaining
from pg_database order by xids_remaining;
....
More information on XID wraparound is available in the PostgreSQL
documentation.
== Restart Procedure
If the database server needs to be restarted it should come back on its
own. Otherwise each service on it can be restarted:
....
service mysqld restart
service postgresql restart
....
=== Koji
Any time postgresql is restarted, koji needs to be restarted. Please
also see the Restarting Koji SOP.
=== Bodhi
Anytime postgresql is restarted, Bodhi will need to be restarted; no SOP
currently exists for this.
== TurboGears and MySQL
[NOTE]
.Note about TurboGears and MySQL
====
There's a known bug in TurboGears that causes MySQL clients not to
automatically reconnect when lost. Typically a restart of the TurboGears
application will correct this issue.
====
== Restoring from backups or specific dbs
Our backups store the latest copy in /backups/ on each db server. These
backups are created automatically by the db-backup script run from cron.
Look in /usr/local/bin for the backup script.
To restore partially or completely you need to:
[arabic]
. setup postgres on a system
. start postgres/run initdb
+
if this new system running postgres has already run ansible then it will
have wrong config files in /var/lib/pgsql/data - clear them out before
you start postgres so initdb can work
. grab the backups you need from /backups - also grab global.sql; edit
global.sql to only create/alter the dbs you care about
. as postgres run: `psql -U postgres -f global.sql`
. when this completes you can restore each db with (as the postgres
user): `createdb $dbname` and then `pg_restore -d $dbname
dbname_backup_file.db`
. restart postgres and check your data.
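A minimal worked example, assuming a hypothetical database named
`raffle` and a backup file produced by the db-backup script:
....
$ sudo -u postgres psql -U postgres -f global.sql
$ sudo -u postgres createdb raffle
$ sudo -u postgres pg_restore -d raffle /backups/raffle_backup_file.db
....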
= datanommer SOP
Consume fedmsg bus activity and stuff it in a postgresql db.
== Contact Information
Owner::
Messaging SIG, Fedora Infrastructure Team
Contact::
#fedora-apps, #fedora-fedmsg, #fedora-admin, #fedora-noc
Servers::
busgateway01
Purpose::
Save fedmsg bus activity
== Description
datanommer is a set of three modules:
python-datanommer-models::
Schema definition and API for storing new items and querying existing
items
python-datanommer-consumer::
A plugin for the fedmsg-hub that actively listens to the bus and
stores events.
datanommer-commands::
A set of CLI tools for querying the DB.
datanommer will one day serve as a backend for future web services like
datagrepper and dataviewer.
Source: https://github.com/fedora-infra/datanommer/
Plan: https://fedoraproject.org/wiki/User:Ianweller/statistics_plus_plus
== CLI tools
Dump the db into a file as json:
....
$ datanommer-dump > datanommer-dump.json
....
When was the last bodhi message?:
....
$ # It was 678 seconds ago
$ datanommer-latest --category bodhi --timesince
[678]
....
When was the last bodhi message in more readable terms?:
....
$ # It was 12 minutes and 43 seconds ago
$ datanommer-latest --category bodhi --timesince --human
[0:12:43.087949]
....
What was that last bodhi message?:
....
$ datanommer-latest --category bodhi
[{"bodhi": {
"topic": "org.fedoraproject.stg.bodhi.update.comment",
"msg": {
"comment": {
"group": null,
"author": "ralph",
"text": "Testing for latest datanommer.",
"karma": 0,
"anonymous": false,
"timestamp": 1360349639.0,
"update_title": "xmonad-0.10-10.fc17"
},
"agent": "ralph"
},
}}]
....
Show me stats on datanommer messages by topic:
....
$ datanommer-stats --topic
org.fedoraproject.stg.fas.group.member.remove has 10 entries
org.fedoraproject.stg.logger.log has 76 entries
org.fedoraproject.stg.bodhi.update.comment has 5 entries
org.fedoraproject.stg.busmon.colorized-messages has 10 entries
org.fedoraproject.stg.fas.user.update has 10 entries
org.fedoraproject.stg.wiki.article.edit has 106 entries
org.fedoraproject.stg.fas.user.create has 3 entries
org.fedoraproject.stg.bodhitest.testing has 4 entries
org.fedoraproject.stg.fedoratagger.tag.create has 9 entries
org.fedoraproject.stg.fedoratagger.user.rank.update has 5 entries
org.fedoraproject.stg.wiki.upload.complete has 1 entries
org.fedoraproject.stg.fas.group.member.sponsor has 6 entries
org.fedoraproject.stg.fedoratagger.tag.update has 1 entries
org.fedoraproject.stg.fas.group.member.apply has 17 entries
org.fedoraproject.stg.__main__.testing has 1 entries
....
== Upgrading the DB Schema
datanommer uses "python-alembic" to manage its schema. When developers
want to add new columns or features, these should/must be tracked in
alembic and shipped with the RPM.
In order to run upgrades on our stg/prod dbs:
[arabic]
. ssh to busgateway01\{.stg}
. `cd /usr/share/datanommer.models/`
. Run:
+
....
$ alembic upgrade +1
....
+
over and over again, until the db is fully upgraded.
= Fedora Debuginfod Service - SOP
Debuginfod is the software that lies behind the service at
https://debuginfod.fedoraproject.org/ and
https://debuginfod.stg.fedoraproject.org/ . These services run on 1 VM
each in the stg and prod infrastructure at IAD2.
== Contact Information
Owner:::
RH perftools team + Fedora Infrastructure Team
Contact:::
@fche in #fedora-noc
Servers:::
VMs
Purpose:::
Serve elf/dwarf/source-code debuginfo for supported releases to
debugger-like tools in Fedora.
Repository:::
https://sourceware.org/elfutils/Debuginfod.html
https://fedoraproject.org/wiki/Debuginfod
== How it works
One virtual machine in prod NFS-mounts the koji build system's RPM
repository, read-only. The production VM has a virtual twin in the
staging environment. They each run elfutils debuginfod to index
designated RPMs into a large local sqlite database. They answer HTTP
queries received from users on the Internet via reverse-proxies at the
https://debuginfod.fedoraproject.org/ URL. The reverse proxies apply
gzip compression on the data and provide redirection of the root `/`
location only into the fedora wiki.
Normally, it is autonomous and needs no maintenance. It should come back
nicely after many kinds of outage. The software is based on elfutils in
Fedora, but may occasionally track a custom COPR build with backported
patches from future elfutils versions.
== Configuration
The daemon uses systemd and `/etc/sysconfig/debuginfod` to set basic
parameters. These have been tuned from the distro defaults via
experimental hand-editing or ansible. Key parameters are:
[arabic]
. The -I/-X include/exclude regexes. These tell debuginfod what fedora
versions to include RPMs for. If index disk space starts to run low, one
can eliminate some older fedoras from the index to free up space (after
the next groom cycle).
. The --fdcache related parameters. These tell debuginfod how much data
to cache from RPMs. (Some debuginfo files - kernel, llvm, gtkweb, ... -
are huge and worth retaining instead of repeatedly extracting.) This is
a straight disk space vs. time tradeoff.
. The -t (scan interval) parameter. Scanning lets an index get bigger,
as new RPMs in koji are examined and their contents indexed. Each pass
takes a bunch of hours to traverse the entire koji NFS directory
structure to fstat() everything for newness or change. A smaller scan
interval lets debuginfod react quicker to koji builds coming into
existence, but increases load on the NFS server. More -n (scan threads)
may help the indexing process go faster, if the networking fabric & NFS
server are underloaded.
. The -g (groom interval) parameter. Grooming lets an index get smaller,
as files removed from koji will be forgotten about. It can be run very
intermittently - weekly or less - since it takes many hours and cannot
run concurrently with scanning.
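For illustration only, a daemon invocation combining these parameters
might look like the sketch below; the regexes, intervals, thread count
and cache size are placeholders, not our production settings (the real
values are set via `/etc/sysconfig/debuginfod` from ansible):
....
# index only f34/f35 RPMs, rescan every 4 hours, groom weekly
debuginfod -n 4 -t 14400 -g 604800 \
    -I 'f3[45]' -X 'f3[0-3]' \
    --fdcache-mbs=4096 \
    -R /mnt/fedora_koji_prod/koji/packages
....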
A quick:
....
systemctl restart debuginfod
....
activates the new settings.
In case of some drastic failure like database corruption or signs of
penetration/abuse, one can shut down the server with systemd, and/or
stop traffic at the incoming proxy configuration level. The index sqlite
database under `/var/cache/debuginfod` may be deleted, if necessary, but
keep in mind that it takes days to reindex the relevant parts of koji.
Alternately, with the services stopped, the 150GB+ sqlite database files
may be freely copied between the staging and production servers, if that
helps during disaster recovery.
== Monitoring
=== Prometheus
The debuginfod daemons answer the standard /metrics URL endpoint to
serve a variety of operational metrics in prometheus. Important metrics
include:
[arabic]
. filesys_free_ratio - free space on the filesystems. (These are also
monitored via fedora-infra nagios.) If the free space on the database or
tmp partition falls low, further indexing or even service may be
impacted. Add more disk space if possible, or start eliding older fedora
versions from the database via the -I/-X daemon options.
. thread_busy - number of busy threads. During indexing, 1-6 threads may
be busy for minutes or even days, intermittently. User requests show up
as "buildid" (real request) or "buildid-after-you" (deferred duplicate
request) labels. If there are more than a handful of "buildid" ones,
there may be an overload/abuse underway, in which case it's time to
identify the excessive traffic via the logs and get a temporary iptables
block going. Or perhaps there is an outage or slowdown of the koji NFS
storage system, in which case there's not much to do.
. error_count. These should be zero or near zero all the time.
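For a quick manual check outside of prometheus, the metrics endpoint can
be fetched directly (the grep pattern is just an example):
....
$ curl -s https://debuginfod.fedoraproject.org/metrics | grep -E 'thread_busy|error_count'
....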
=== Logs
The debuginfod daemons produce voluminous logs into the local systemd
journal, whence the traffic moves to the usual fedora-infra log01
server, `/var/log/hosts/debuginfod*/YYYY/MM/DD/messages.log`. The lines
related to HTTP GET identify the main webapi traffic, with originating
IP addresses in the XFF: field, and response size and elapsed service
time in the last columns. These can be useful in tracking down possible
abuse:
....
Jun 28 22:36:43 debuginfod01 debuginfod[381551]: [Mon 28 Jun 2021 10:36:43 PM GMT] (381551/2413727): 10.3.163.75:43776 UA:elfutils/0.185,Linux/x86_64,fedora/35 XFF:*elided* GET /buildid/90910c1963bbcf700c0c0c06ee3bf4c5cc831d3a/debuginfo 200 335440 0+0ms
....
The lines related to prometheus /metrics are usually no big deal.
The log also includes info about errors and indexing progress.
Interesting may be the lines like:
....
Jun 28 22:36:43 debuginfod01 debuginfod[381551]: [Mon 28 Jun 2021 10:36:43 PM GMT] (381551/2413727): serving fdcache archive /mnt/fedora_koji_prod/koji/packages/valgrind/3.17.0/3.fc35/x86_64/valgrind-3.17.0-3.fc35.x86_64.rpm file /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so
....
which identify the file names derived from requests (which RPMs the
buildids map to). These can provide some indirect distro telemetry: what
packages and binaries are being debugged and for which architectures?
= Denyhosts Infrastructure SOP
Denyhosts provides a protection against brute force attacks.
== Contents
[arabic]
. Contact Information
. Description
. Troubleshooting and Resolution
.. Connection issues
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin, sysadmin-main group
Location::
Anywhere
Servers::
All
Purpose::
Denyhosts provides a protection against brute force attacks.
== Description
All of our servers now implement denyhosts to protect against brute
force attacks. Very few boxes should be in the 'allowed' list.
Especially internally.
== Troubleshooting and Resolution
=== Connection issues
The most common issue will be legitimate logins failing. First, try to
figure out why a host ended up on the deny list (tcptraceroute, failed
login attempts, etc are all good candidates). Next, follow the
directions below. The example below is for a host (10.0.0.1) being
banned. Log in to the box from a different host and, as root, do the
following:
....
cd /var/lib/denyhosts
sed -si '/10.0.0.1/d' * /etc/hosts.deny
/etc/init.d/denyhosts restart
....
That should correct the problem.
= Departing admin SOP
From time to time admins depart the project, this SOP checks any access
they may no longer need.
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin, sysadmin-main
Location::
Everywhere
Servers::
all
== Description
From time to time people with admin access to various parts of the
project may leave the project or no longer wish to contribute. This SOP
attempts to list the process for removing access they no longer need.
[arabic, start=0]
. First, make sure that this SOP is needed. Verify the person has left
the project and what areas they might wish to still contribute to.
. Gather info: fas username, email address, knowledge of passwords.
. Check the following areas with the following commands:
+
____
email address in ansible::
* Check: `git grep email@address`
* Remove: `git commit`
koji admin::
* Check: `koji list-permissions --user=username`
* Remove: `koji revoke-permission permissionname username`
wiki pages::
* Check: look for https://fedoraproject.org/wiki/User:Username
* Remove: delete page, or modify with info they are no longer
contributing.
packages::
* Check: Download
https://admin.fedoraproject.org/pkgdb/lists/bugzilla?tg_format=plain
and grep
* Remove: remove from cc, orphan packages or reassign.
fas account::
* Check: check username in fas
* Remove: set user inactive
+
[NOTE]
.Note
====
If there are scripts or files needed, save homedir of user.
====
passwords::
* Check: if departing admin knew sensitive passwords.
* Remove: Change passwords.
+
[NOTE]
.Note
====
root pw, management interfaces, etc
====
____
= DNS repository for fedoraproject
We've set this up so we can easily (and quickly) edit and deploy dns
changes with a record of who changed what and why. This system also lets
us edit out proxies from rotation for our many and varied websites
quickly and with a minimum of opportunity for error. Finally, it checks
to make sure that all of the zone changes will actually work before they
are allowed.
== DNS Infrastructure SOP
We have 5 DNS servers:
ns02.fedoraproject.org::
hosted at ibiblio (ipv6 enabled)
ns05.fedoraproject.org::
hosted at internetx (ipv6 enabled)
ns13.rdu2.fedoraproject.org::
in rdu2, internal to rdu2.
ns01.iad2.fedoraproject.org::
in iad2, internal to iad2.
ns02.iad2.fedoraproject.org::
in iad2, internal to iad2.
== Contents
[arabic]
. Contact Information
. Troubleshooting, Resolution and Maintenance
.. DNS update
.. Adding a new zone
. GeoDNS
.. Non geodns fedoraproject.org IPs
.. Adding and removing countries
.. IP Country Mapping
. resolv.conf
.. Phoenix
.. Non-Phoenix
== Contact Information
Owner:::
Fedora Infrastructure Team
Contact:::
#fedora-admin, sysadmin-main, sysadmin-dns
Location:::
ServerBeach and ibiblio and internetx and phx2.
Servers:::
ns02, ns05, ns13.rdu2, ns01.iad2, ns02.iad2
Purpose:::
Provides DNS to our users
== Troubleshooting, Resolution and Maintenance
== Check out the DNS repository
You can get the dns repository from `/srv/git/dns` on `batcave01`:
....
$ git clone /srv/git/dns
....
== Adding a new Host
Adding a new host requires adding it to DNS and to ansible; see
new-hosts.rst for the details.
== Editing the domain(s)
We have three domains which need to be able to change on demand for
proxy rotation/removal:
* fedoraproject.org.
* getfedora.org.
* cloud.fedoraproject.org.
The other domains are edited only when we add/subtract a host or move it
to a new ip. Not much else.
If you need to edit a domain that is NOT in the above list:
* change to the 'master' subdir, edit the domain as usual (remember to
update the serial), save it.
If you need to edit one of the domains in the above list (replace
fedoraproject.org with the domain from above):
* if you need to add/change a host in fedoraproject.org that is not '@'
or 'wildcard':
** edit fedoraproject.org.template
** make your changes
** do not edit the serial or anything surrounded by \{\{ }} unless you
REALLY know what you are doing
* if you need to only add/remove a proxy during an outage or due to a
networking issue, run:
** `./zone-template fedoraproject.org.cfg disable ip [ip] [ip]` to
disable the ip of the proxy you want removed
** `./zone-template fedoraproject.org.cfg enable ip [ip] [ip]` to
reverse the disable
** `./zone-template fedoraproject.org.cfg reset` to reset to all ips
enabled
* if you want to add an all new proxy as '@' or 'wildcard' for
fedoraproject.org:
** edit fedoraproject.org.cfg
** add the ip to the correct section of the ipv4 or ipv6 in the config
** save the file
** check the file for validity by running `python fedoraproject.org.cfg`
and looking for errors or tracebacks
When complete run:
____
git add .
git commit -a -m 'description of your change here'
____
It is important to commit this before running the do-domains script as
it makes it easier to track the changes.
In all cases then run:
* `./do-domains`
* if that completes successfully then run:
+
....
git add .
git commit -a -m 'description of your change here'
git push
....
* nameservers update from dns via cron every 10minutes.
The above git process can be achieved with the below bash function where
the commit message is passed as an arg when running.:
....
dnscommit()
{
local args=$1
cd ~/dns;
git commit -a -m "${args}"
git pull --rebase && ./do-domains && git add built && git commit -a -m "Signed DNS" && git push
}
....
If you need an update to be live more quickly, run this on all of the
nameservers (as root):
....
/usr/local/bin/update-dns
....
To run this via ansible from batcave do:
....
$ sudo rbac-playbook update_dns.yml
....
this will pull from the git tree, update all of the zones and reload the
name server.
== DNS update
DNS config files are ansible managed on batcave01.
From your local machine run:
....
git clone ssh://git@pagure.io/fedora-infra/ansible.git
cd ansible/roles/dns/files/
...make changes needed...
git commit -m "What you did"
git push
....
It should update within a half hour. You can test the new configs with
dig:
....
dig @ns01.fedoraproject.org fedoraproject.org
....
== Adding a new zone
First, name the zone and generate a new set of keys for it. Run this on
ns01. Note it could take SEVERAL minutes to run:
....
/usr/sbin/dnssec-keygen -a RSASHA1 -b 1024 -n ZONE c.fedoraproject.org
/usr/sbin/dnssec-keygen -a RSASHA1 -b 2048 -n ZONE -f KSK c.fedoraproject.org
....
Then copy the created .key and .private files to the private git repo
(You need to be sysadmin-main to do this). The directory is
`private/private/dnssec`.
* add the zone in zones.conf in `ansible/roles/dns/files/zones.conf`
* save and commit - but do not push
* Add zone file to the master subdir in this repo
* git add and commit the file
* check the zone by running check-domains
* if you intend to have this be a dnssec signed zone then you must
** create a new key:
+
....
/usr/sbin/dnssec-keygen -a RSASHA1 -b 1024 -n ZONE $domain.org
/usr/sbin/dnssec-keygen -a RSASHA1 -b 2048 -n ZONE -f KSK $domain.org
....
** put the files this generates into /srv/privatekeys/dnssec on
batcave01
*** edit the do-domains file in this dir and add your domain to the
signed_domains entry at the top
*** edit the zone you just created and add the contents of the .key
files to the bottom of the zone
If this is a subdomain of fedoraproject.org:
* run dnssec-dsfromkey on each of the .key files generated (see the
example after this list)
* paste that output into the bottom of fedoraproject.org.template
* commit everything to the dns tree
* push your changes
* push your changes to the ansible repo
* test
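For the dnssec-dsfromkey step above, a minimal example with a
hypothetical key file name and elided digests; the resulting DS lines
are what get pasted into fedoraproject.org.template:
....
$ dnssec-dsfromkey Kc.fedoraproject.org.+005+12345.key
c.fedoraproject.org. IN DS 12345 5 1 <sha1-digest>
c.fedoraproject.org. IN DS 12345 5 2 <sha256-digest>
....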
If you add a new child zone, such as c.fedoraproject.org or
vpn.fedoraproject.org you will also need to add the contents of
dsset-childzone.fedoraproject.org (for example), to the main
fedoraproject.org zonefile, so that DNSSEC has a valid trust path to
that zone.
You also must set the NS delegation entries near the top of the
fedoraproject.org zone file; these are necessary to keep dnssec-signzone
from whining with this error msg:
....
dnssec-signzone: fatal: 'xxxxx.example.com': found DS RRset without NS RRset
....
Look for the: "vpn IN NS" records at the top of fedoraproject.org and
copy them for the new child zone.
== GeoDNS
As part of our Content Distribution Network we use geodns for certain
zones. At the moment just `fedoraproject.org` and `*.fedoraproject.org`
zones. We've got proxy servers all over the US and in Europe. We are now
sending users to proxy servers that are near them. The current list of
available 'zone areas' are:
* DEFAULT
* EU
* NA
DEFAULT contains all the zones. So someone who does not seem to be in
or near the EU or NA would get directed to any random set. (South
Africa, for example, doesn't get directed to any particular server.)
[IMPORTANT]
.Important
====
Don't forget to increase the serial number in the fedoraproject.org zone
file. Even if you're making a change to one of the geodns IPs. There is
only one serial number for all setups and that serial number is in the
fedoraproject.org zone.
====
[NOTE]
.Non geodns fedoraproject.org IPs
====
If you're adding a server that is just in one location and isn't going
to get geodns balanced, just add that host to the fedoraproject.org
zone.
====
=== Adding and removing countries
Our setup actually requires us to specify which countries go to which
servers. To do this, simply edit the named.conf file in ansible. Below
is an example of what counts as "NA" (North America):
....
view "NA" {
match-clients { US; CA; MX; };
recursion no;
zone "fedoraproject.org" {
type master;
file "master/NA/fedoraproject.org.signed";
};
include "etc/zones.conf";
};
....
=== IP Country Mapping
The IP -> Location mapping is done via a config file that exists on the
dns servers themselves (it's not ansible controlled). The file, located
at `/var/named/chroot/etc/GeoIP.acl` is generated by the `GeoIP.sh`
script (that script is in ansible).
[WARNING]
.Warning
====
This is known to be a less efficient means of doing geodns than the
patched version from kernel.org. We're using this version at the moment
because it's in Fedora and works. The level of DNS traffic we see is
generally low enough that the inefficiencies aren't that noticed. For
example, average load on the servers before this geodns was .2, now it's
around .4
====
== resolv.conf
In order to make the network more transparent to the admins, we do a lot
of search based relative names. Below is a list of what a resolv.conf
should look like.
[IMPORTANT]
.Important
====
Any machine that is not on our vpn or has not yet joined the vpn should
_NOT_ have the vpn.fedoraproject.org search until after it has been
added to the vpn (if it ever does).
====
Phoenix::
....
search phx2.fedoraproject.org vpn.fedoraproject.org fedoraproject.org
....
Phoenix in the QA network:::
....
search qa.fedoraproject.org vpn.fedoraproject.org phx2.fedoraproject.org fedoraproject.org
....
Non-Phoenix::
....
search vpn.fedoraproject.org fedoraproject.org
....
The idea here is that we can, when need be, set up local domains to
contact instead of having to go over the VPN directly, but still have
sane configs. For example, if we tell the proxy server to hit "app1" and
that box is in PHX, it will go directly to app1; if it's not, it will go
over the vpn to app1.
= docs SOP
____
Fedora Documentation - Documentation for installing and using Fedora
____
== Contact Information
Owner:::
docs, Fedora Infrastructure Team
Contact:::
#fedora-docs
Servers:::
proxy*
Purpose:::
Provide documentation for users and contributors.
== Description:
The Fedora Documentation Project was created to provide documentation
for fedora users and contributors. It's like "The Bible" for using
Fedora and other software used by the Fedora Project. It uses Publican,
a free and open-source publishing tool. Publican generates html pages
from content in DocBook XML format. The source files are in a git repo
and publican builds html files from these source files whenever changes
are made. As these are static pages, they are available on all the
proxy servers, which serve our requests for docs.fedoraproject.org.
== Updates process:
The fedora docs writers update and build their docs and then push the
completed output into a git repo. This git repo is then pulled by each
of the Fedora proxies and served as static content.
Note that docs is talking about setting up a new process, this SOP needs
updating when that happens.
== Reporting bugs:
Bugs can be reported at the Fedora Documentation's Bugzilla. Here's the
link:
https://bugzilla.redhat.com/enter_bug.cgi?product=Fedora%20Documentation
Errors or problems in the wiki can be modified by anyone with a FAS
account.
== Contributing to the Fedora Documentation Project:
If you find the existing documentation insufficient or outdated or any
particular page is not available in your language feel free to improve
the documentation by contributing to Fedora Documentation Project. You
can find more details here:
https://fedoraproject.org/wiki/Join_the_Docs_Project
Translation of documentation is taken care by the Fedora Localization
Project aka L10N. More details can be found at:
https://fedoraproject.org/wiki/L10N
== Publican wiki:
More details about Publican can be found at the publican wiki here:
https://sourceware.org/publican/en-US/index.html
= Fedora Account System
Notes about FAS and how to do things in it:
* Where are the certs for fas accounts for koji, etc? On fas01 in
/var/lib/fedora-ca - makefile targets allow you to do things with them.
Look in index.txt for certs. Ones marked with an 'R' in the left-most
column are 'REVOKED'.
To revoke a cert:
....
cd /var/lib/fedora-ca
....
Find the cert number in index.txt - the number is the 3rd column in the
file - you can match it to the user by searching for their username. You
want the highest number cert for their account.
Once you have the number, run (as root or fas):
....
make revoke cert=newcerts/$that_number.pem
....
== How to gather information about a user
You'll want to have direct access to query the database for this. The
common way is to have someone in sysadmin-db ssh to the postgres db
hosting FAS (currently db01). Then access it via ident auth on the box:
....
sudo -u postgres psql fas2
....
There are several tables that will have information about a user. Some
of it is redundant, but it's good to check all the sources; there
shouldn't be inconsistencies:
....
select * from people where username = 'USERNAME';
....
Of interest here are:
id::
for later queries
password_changed::
tells when the password was last changed
last_seen::
last login to fas (including through jsonfas from other TG1/2 apps.
Maybe wiki and insight as well. Not fedorahosted trac, shell login,
etc)
status_change::
last time that the user's status was updated via the website. Usually
triggered when the user was marked inactive for a mass password change
and then they reset their password.
Next table is the log table:
....
select * from log where author_id = ID_FROM_PREV_QUERY or description ~ '.*USERNAME.*';
....
The FAS writes certain events to the log table. This will get those
events. We use both the author_id field (who made the change) and the
username in a description regex search because a few changes are made to
users by admins. Fields of interest are pretty self explanatory here:
changetime::
when the log was made
description::
description of the event that's being logged
[NOTE]
.Note
====
FAS does not log every event that happens to a user. Only "important"
ones. FAS also cannot record direct changes to the database here (for
instance, when we mark accounts inactive administratively via the db).
====
Lastly, there are the groups and person_roles tables. When a user joins
a group, the person_roles table is updated to reflect the user's status
in the group, when they applied, and when they were approved:
....
select groups.name, person_roles.* from person_roles, groups where person_id = ID_FROM_INITIAL_QUERY and groups.id = person_roles.group_id;
....
This will give you the following fields to pay attention to:
name::
Name of the group
role_status::
If this is unapproved, it just means the user applied for it. If it is
approved, it means they are actually in the group.
creation::
When the user applied to the group
approval::
When the user was approved to be in the group
role_type::
What role the person has or wants to have in the group
sponsor_id::
If you suspect something is suspicious with one of the roles, you may
want to ask the sponsor if they remember sponsoring this person
== Account Deletion and renaming
[NOTE]
.Note
====
See also accountdeletion.rst for information on how to disable, rename,
and remove accounts.
====
== Pseudo Users
[NOTE]
.Note
====
See also nonhumanaccounts.rst for information on creating pseudo user
accounts for use in pkgdb/bugzilla.
====
== fas staging
We have a staging fas db set up on db-fas01.stg.phx2.fedoraproject.org -
it is accessed by fas01.stg.phx2.fedoraproject.org.
This system is not autopopulated from production fas - it must be done
manually. To do this you must:
* dump the fas2 db on db-fas01.phx2.fedoraproject.org:
+
....
sudo -u postgres pg_dump -C fas2 > fas2.dump
scp fas2.dump db-fas01.stg.phx2.fedoraproject.org:/tmp
....
* then on fas01.stg.phx2.fedoraproject.org:
+
....
/etc/init.d/httpd stop
....
* then on db02.stg.phx2.fedoraproject.org:
+
....
echo "drop database fas2\;" | sudo -u postgres psql ; cat fas2.dump | sudo -u postgres psql
....
* then on fas01.stg.phx2.fedoraproject.org:
+
....
/etc/init.d/httpd start
....
that should do it.
= FAS-OpenID
FAS-OpenID is the OpenID server of Fedora infrastructure.
Live instance is at https://id.fedoraproject.org/ Staging instance is at
https://id.dev.fedoraproject.org/
== Contact Information
Owner::
Patrick Uiterwijk (puiterwijk)
Contact::
#fedora-admin, #fedora-apps, #fedora-noc
Location::
openid0\{1,2}.phx2.fedoraproject.org openid01.stg.fedoraproject.org
Purpose::
Authentication & Authorization
== Trusted roots
FAS-OpenID has a set of "trusted roots", which contains websites which
are always trusted, and thus FAS-OpenID will not show the Approve/Reject
form to the user when they login to any such site.
As a policy, we will only add websites to this list which Fedora
Infrastructure controls. If anyone ever asks to add a website to this
list, just answer with this default message:
....
We only add websites we (Fedora Infrastructure) maintain to this list.
This feature was put in because it wouldn't make sense to ask for permission
to send data to the same set of servers that it already came from.
Also, if we were to add external websites, we would need to judge their
privacy policy etc.
Also, people might start complaining that we added site X but not their site,
maybe causing us "political" issues later down the road.
As a result, we do NOT add external websites.
....
= fedmsg (Fedora Messaging) Certs, Keys, and CA - SOP
X509 certs, private RSA keys, Certificate Authority, and Certificate
Revocation List.
== Contact Information
Owner::
Messaging SIG, Fedora Infrastructure Team
Contact::
#fedora-admin, #fedora-apps, #fedora-noc
Servers::
* app0[1-7]
* packages0[1-2]
* fas0[1-3]
* pkgs01
* busgateway01,
* value0\{1,3}
* releng0\{1,4}
* relepel03
Purpose::
Certify fedmsg messages come from authentic sources.
== Description
fedmsg sends JSON-encoded messages from many services to a zeromq
messaging bus. We're not concerned with encrypting the messages, only
with signing them so an attacker cannot spoof.
Every instance of each service on each host has its own cert and private
key, signed by the CA. By convention, we name the certs
<service>-<fqdn>.\{crt,key}. For instance, bodhi has the following certs:
* bodhi-app01.phx2.fedoraproject.org
* bodhi-app02.phx2.fedoraproject.org
* bodhi-app03.phx2.fedoraproject.org
* bodhi-app01.stg.phx2.fedoraproject.org
* bodhi-app02.stg.phx2.fedoraproject.org
* more
Scripts to generate new keys, sign them, and revoke them live in the
ansible repo in `ansible/roles/fedmsg/files/cert-tools/`. The keys and
certs themselves (including ca.crt and the CRL) live in the private repo
in `private/fedmsg-certs/keys/`
fedmsg is locally configured to find the key it needs by looking in
`/etc/fedmsg.d/ssl.py` which is kept in ansible in
`ansible/roles/fedmsg/templates/fedmsg.d/ssl.py.erb`.
Each service-host has its own key. This means:
* A key is not shared across multiple instances of a service on
different machines. i.e., bodhi on app01 and bodhi on app02 should have
different key/cert pairs.
* A key is not shared across multiple services on a host. i.e.,
mediawiki on app01 and bodhi on app01 should have different key/cert
pairs.
The attempt here is to minimize the number of potential attack vectors.
Each private key should be readable only by the service that needs it.
bodhi runs under mod_wsgi in apache and should run as its own unique
bodhi user (not as apache). The permissions for its private key, when
deployed by ansible, should be read-only for that local bodhi user.
For more information on how fedmsg uses these certs see
http://fedmsg.readthedocs.org/en/latest/crypto.html
== Configuring the Scripts
Usage of the main scripts is described in more detail below. They are
located in `ansible/roles/fedmsg/files/cert-tools`.
Before you use them, you'll need to point them at the right directory to
modify. By default, this is `~/private/fedmsg-certs/keys/`. You can
change that by editing `ansible/roles/fedmsg/files/cert-tools/vars` in
the event that you have the private repo checked out to an alternate
location.
There are other configuration values defined in that script. Most will
not need to be changed.
== Wiping and Rebuilding Everything
There is a script in `ansible/roles/fedmsg/files/cert-tools/` named
`rebuild-all-fedmsg-certs`. You can run it with no arguments to wipe out
the old and generate a new CA root certificate, a signing cert and key,
and all key/cert pairs for all service-hosts.
[NOTE]
.Note
====
Warning -- Obviously, this will wipe everything. Do you want that?
====
== Adding a new key for a new service-host
First, checkout the ansible private repo as that's where the keys are
going to be stored. The scripts will assume this is checked out to
~/private.
In `ansible/roles/fedmsg/files/cert-tools` run:
....
$ source ./vars
$ ./build-and-sign-key <service>-<fqdn>
....
For instance, if we bring up a new app host,
app10.phx2.fedoraproject.org, we'll need to generate a new cert/key pair
for each fedmsg-enabled service that will be running on it, so you'd
run:
....
$ source ./vars
$ ./build-and-sign-key shell-app10.phx2.fedoraproject.org
$ ./build-and-sign-key bodhi-app10.phx2.fedoraproject.org
$ ./build-and-sign-key mediawiki-app10.phx2.fedoraproject.org
....
Just creating the keys isn't quite enough, there are four more things
you'll need to do.
The private keys are created in your checkout of the private repo under
~/private/private/fedmsg-certs/keys. There will be four files for each
cert you created: <hexdigits>.pem (ex: 5B.pem) and
<service>-<fqdn>.\{crt,csr,key}. git add, commit, and push all of those.
Second, you need to edit
`ansible/roles/fedmsg/files/cert-tools/rebuild-all-fedmsg-certs` and add
the argument of the commands you just ran, so that next time certs need
to be blown away and recreated, the new service-hosts will be included.
For the examples above, you would need to add to the list:
....
shell-app10.phx2.fedoraproject.org
bodhi-app10.phx2.fedoraproject.org
mediawiki-app10.phx2.fedoraproject.org
....
You need to ensure that the keys are distributed to the host with the
proper permissions. Only the bodhi user should be able to access bodhi's
private key. This can be accomplished by using the `fedmsg::certificate`
in ansible. It should distribute your new keys to the correct hosts and
correctly permission them.
Lastly, if you haven't already updated the global fedmsg config, you'll
need to. You need to add your new service-node to `fedmsg.d/endpoint.py`
and to `fedmsg.d/ssl.py`. Those can be found in
`ansible/roles/fedmsg/templates/fedmsg.d`. See
http://fedmsg.readthedocs.org/en/latest/config.html for more information
on the layout and meaning of those files.
== Revoking a key
In `ansible/roles/fedmsg/files/cert-tools` run:
....
$ source ./vars
$ ./revoke-full <service>-<fqdn>
....
This will alter `private/fedmsg-certs/keys/crl.pem` which should be
picked up and served publicly, and then consumed by all fedmsg consumers
globally.
`crl.pem` is publicly available at
http://fedoraproject.org/fedmsg/crl.pem
[NOTE]
.Note
====
Even though crl.pem lives in the private repo, we're just keeping it
there for convenience. It really _should_ be served publicly, so don't
panic. :)
====
[NOTE]
.Note
====
At the time of this writing, the CRL is not actually used. I need one
publicly available first so we can test it out.
====
= fedmsg-gateway SOP
Outgoing raw ZeroMQ message stream.
[NOTE]
.Note
====
see also: fedmsg-websocket
====
== Contact Information
Owner:::
Messaging SIG, Fedora Infrastructure Team
Contact:::
#fedora-apps, #fedora-admin, #fedora-noc
Servers:::
busgateway01, proxy0*
Purpose:::
Expose raw ZeroMQ messages outside the FI environment.
== Description
Users outside of Fedora Infrastructure can listen to the production
message bus by connecting to specific addresses. This is required for
local users to run their own hubs and message processors ("Consumers").
It is also required for user-facing tools like fedmsg-notify to work.
The specific public endpoints are:
production::
tcp://hub.fedoraproject.org:9940
staging::
tcp://stg.fedoraproject.org:9940
fedmsg-gateway, the daemon running on busgateway01, is listening to the
FI production fedmsg bus and will relay every message that it receives
out to a special ZMQ pub endpoint bound to port 9940. haproxy mediates
connections to the fedmsg-gateway daemon.
== Connection Flow
Clients connecting through haproxy on proxy0*:9940 are redirected to
busgateway0*:9940. This can be found in the haproxy.cfg entry for
`listen fedmsg-raw-zmq 0.0.0.0:9940`.
This is different than the apache reverse proxy pass setup we have for
the app0* and packages0* machines. _That_ flow looks something like
this:
....
Client -> apache(proxy01) -> haproxy(proxy01) -> apache(app01)
....
The flow for the raw zmq stream provided by fedmsg-gateway looks
something like this:
....
Client -> haproxy(proxy01) -> fedmsg-gateway(busgateway01)
....
haproxy is listening on a public port.
At the time of this writing, haproxy does not actually load balance
zeromq session requests across multiple busgateway0* machines, but there
is nothing stopping us from adding them. New hosts can be added in
ansible and pressed from busgateway01's template. Add them to the
fedmsg-raw-zmq listen in haproxy's config and it should Just Work.
== Increasing the Maximum Number of Concurrent Connections
HTTP requests are typically very short (a few seconds at most). This
means that the number of concurrent tcp connections we require for most
of our services is quite low (1024 is overkill). ZeroMQ tcp connections,
on the other hand, are expected to live for quite a long time.
Consequently we needed to scale up the number of possible concurrent tcp
connections.
All of this is in ansible and should be handled for us automatically if
we bring up new nodes.
* The pam_limits user limit for the fedmsg user was increased from 1024
to 160000 on busgateway01.
* The pam_limits user limit for the haproxy user was increased from 1024
to 160000 on the proxy0* machines.
* The zeromq High Water Mark (HWM) was increased to 160000 on
busgateway01.
* The maximum number of connections allowed was increased in
haproxy.cfg.
== Nagios
New nagios checks were added for this that check to see if the number of
concurrent connections through haproxy is approaching the maximum number
allowed.
You can check these numbers by hand by inspecting the haproxy web
interface: https://admin.fedoraproject.org/haproxy/proxy1#fedmsg-raw-zmq
Look at the "Sessions" section. "Cur" is the current number of sessions
versus "Max", the maximum number seen at the same time and "Limit", the
maximum number of concurrent connections allowed.
== RHIT
We had RHIT open up port 9940 special to proxy01.phx2 for this.
= fedmsg introduction and basics, SOP
General information about fedmsg
== Contact Information
Owner::
Messaging SIG, Fedora Infrastructure Team
Contact::
#fedora-apps, #fedora-admin, #fedora-noc
Servers::
Almost all of them.
Purpose::
Introduce sysadmins to fedmsg tools and config
== Description
fedmsg is a system that links together most of our webapps and services
into a message mesh or net (often called a "bus"). It is built on top of
the zeromq messaging library.
fedmsg has its own developer documentation that is a good place to check
if this or other SOPs don't provide enough information -
http://fedmsg.rtfd.org
== Tools
Generally, fedmsg-tail and fedmsg-logger are the two most commonly used
tools for debugging and testing. To see if bus-connectivity exists
between two machines, log onto each of them and run the following on the
first:
....
$ echo testing from $(hostname) | fedmsg-logger
....
And run the following on the second:
....
$ fedmsg-tail --really-pretty
....
== Configuration
fedmsg configuration lives in /etc/fedmsg.d/
`/etc/fedmsg.d/endpoints.py` keeps the list of every possible fedmsg
endpoint. It acts as a global index that defines the bus.
See fedmsg.readthedocs.org/en/latest/config/ for a full glossary of
configuration values.
== Logs
fedmsg daemons keep their logs in /var/log/fedmsg. fedmsg message hooks
in existing apps (like bodhi) will log any errors to the logs of the app
they've been added to (like /var/log/httpd/error_log).
= fedmsg-irc SOP
____
Echo fedmsg bus activity to IRC.
____
== Contact Information
Owner::
Messaging SIG, Fedora Infrastructure Team
Contact::
#fedora-apps, #fedora-fedmsg, #fedora-admin, #fedora-noc
Servers::
value03
Purpose::
Echo fedmsg bus activity to IRC
== Description
fedmsg-irc is a daemon running on value03 and value01.stg. It is
listening to the fedmsg bus and echoing that activity to the
#fedora-fedmsg channel in IRC.
It can be configured to ignore certain messages, join certain rooms, and
take on a different nick by editing the values in `/etc/fedmsg.d/irc.py`
and restarting it with `sudo service fedmsg-irc restart`
See http://fedmsg.readthedocs.org/en/latest/config/#term-irc for more
information on configuration.
= Adding a new fedmsg message type
== Instrumenting the program
First, figure out how you're going to publish the message. Is it from a
shell script or from a long-running process?
If it's from a shell script, you just need to add a
[.title-ref]#fedmsg-logger# statement to the script. Remember to set the
[.title-ref]#--modname# and [.title-ref]#--topic# for your new message's
fully-qualified topic.
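A hedged example - the modname, topic and message body here are made up
for illustration:
....
$ echo '{"repo": "rawhide", "success": true}' | \
    fedmsg-logger --modname compose --topic done --json-input
....
On production this would publish a message with a fully-qualified topic
like org.fedoraproject.prod.compose.done.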
If it's from a python process, you just need to add a
`fedmsg.publish(..)` call. The same concerns about modname and topic
apply here.
If this is a short-lived python process, you'll want to add
[.title-ref]#active=True# to the call to `fedmsg.publish(..)`. This will
make the fedmsg lib "actively" reach out to our fedmsg-relay running on
busgateway01.
If it is a long-running python process (like a WSGI thread), then you
don't need to pass any extra arguments. You don't want it to reach out
to the fedmsg-relay if possible. Your process will require that some
"endpoints" are created for it in `/etc/fedmsg.d/`. More on that below.
== Supporting infrastructure
You need to make sure that the machine this is running on has a cert and
key that can be read by the program to sign its message. If you don't
have a cert already, then you need to create it in the private repo. Ask
a sysadmin-main member.
Then you need to declare those certs in the [.title-ref]#fedmsg_certs#
data structure stored typically in our ansible `group_vars/` for this
service. Declare both the name of the cert, what group and user it
should be owned by, and in the `can_send:` section, declare the list of
topics that this cert should be allowed to publish.
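The exact layout is best copied from an existing service's group_vars;
as a rough, assumed sketch (field names taken from the description
above, values hypothetical):
....
fedmsg_certs:
- service: raffle
  owner: root
  group: raffle
  can_send:
  - raffle.entry.new
....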
If this is a long-running python process that is _not_ passing
[.title-ref]#active=True# to the call to
[.title-ref]#fedmsg.publish(..)#, then you have to also declare
endpoints for it. You do that by specifying the `fedmsg_wsgi_procs` and
`fedmsg_wsgi_vars` in the `group_vars` for your service. The iptables
rules and fedmsg endpoints should be automatically created for you on
the next playbook run.
== Supporting code
At this point, you can push the change out to production and be
publishing messages "okay". Everything should be fine.
However, your message will show up blank in datagrepper, in IRC, and in
FMN, and everywhere else we try to render it. You _must_ then follow up
and write a new [.title-ref]#Processor# for it in the fedmsg_meta
library we maintain:
https://github.com/fedora-infra/fedmsg_meta_fedora_infrastructure
You also _must_ write a test case for it there. The docs listing all
topics we publish at http://fedora-fedmsg.rtfd.org/ is automatically
generated from the test suite. Please don't forget this.
Lastly, you should cut a release of fedmsg_meta and deploy it using the
[.title-ref]#playbooks/manual/upgrade/fedmsg.yml# playbook, which should
update all the relevant hosts.
== Corner cases
If the process publishing the new message lives _outside_ our main
network, you have to jump through more hoops. Look at abrt, koschei, and
copr for examples of how to configure this (you need a special firewall
rule, and they need to be configured to talk to our "inbound gateway"
running on the proxies).
= fedmsg-relay SOP
Bridge ephemeral scripts into the fedmsg bus.
== Contact Information
Owner::
Messaging SIG, Fedora Infrastructure Team
Contact::
#fedora-apps, #fedora-admin, #fedora-noc
Servers::
app01
Purpose::
Bridge ephemeral bash and python scripts into the fedmsg bus.
== Description
fedmsg-relay is running on app01, which is a bad choice. We should look
to move it to a more isolated place in the future. busgateway01 would be
a better choice.
"Ephemeral" scripts like `pkgdb2branch.py`, the post-receive git hook on
pkgs01, and anywhere fedmsg-logger is used all depend on fedmsg-relay.
Instead of emitting messages "directly" to the rest of the bus, they use
fedmsg-relay as an intermediary.
Check that fedmsg-relay is running by looking for it in the process
list. You can restart it in the standard way with
`sudo service fedmsg-relay restart`. Check for its logs in
`/var/log/fedmsg/fedmsg-relay.log`.
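A quick health check on the relay host therefore looks something like this:
....
pgrep -af fedmsg-relay                  # is the daemon running?
sudo service fedmsg-relay restart       # restart it if needed
sudo tail -f /var/log/fedmsg/fedmsg-relay.log
....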
Ephemeral scripts know where the fedmsg-relay is by looking for the
relay_inbound and relay_outbound values in the global fedmsg config.
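To see which endpoints the scripts will use, you can dump the merged config
and look for those keys, for example:
....
sudo fedmsg-config | grep -A2 'relay_inbound\|relay_outbound'
....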
== But What is it Doing? And Why?
The fedmsg bus is designed to be "passive" in its normal operation. A
mod_wsgi process under httpd sets up its fedmsg publisher socket to
passively emit messages on a certain port. When some other service wants
to receive these messages, it is up to that service to know where
mod_wsgi is emitting and to actively connect there. In this way,
emitting is passive and listening is active.
We get a problem when we have a one-off or "ephemeral" script that is
not a long-running process -- a script like pkgdb2branch that runs when a
user invokes it and ends shortly after. Listeners who want these scripts'
messages will find that they are usually not available when they try to
connect.
To solve this problem, we introduced the "fedmsg-relay" daemon which is
a kind of "passive"-to-"passive" adaptor. It binds to an outbound port
on one end where it will publish messages (like normal) but it also
binds to another port where it listens passively for inbound
messages. Ephemeral scripts then actively connect to the passive inbound
port of the fedmsg-relay to have their payloads echoed on the
bus-proper.
See http://fedmsg.readthedocs.org/en/latest/topology/ for a diagram.

View file

@ -0,0 +1,70 @@
= websocket SOP
websocket communication with Fedora apps.
see-also: `fedmsg-gateway.txt`
== Contact Information
Owner::
Messaging SIG, Fedora Infrastructure Team
Contact::
#fedora-apps, #fedora-admin, #fedora-noc
Servers::
busgateway01, proxy0*, app0*
Purpose::
Expose a websocket server for FI apps to use
== Description
WebSocket is a protocol (an extension of HTTP/1.1) by which client web
browsers can establish full-duplex socket communications with a server
-- the "real-time web".
In our case, webapps served from app0* and packages0* will include
javascript code instructing client browsers to establish a second
connection to our WebSocket server. They point browsers to the following
addresses:
production::
wss://hub.fedoraproject.org:9939
staging::
wss://stg.fedoraproject.org:9939
The websocket server itself is a fedmsg-hub daemon running on
busgateway01. It is configured to enable its websocket server component
in the presence of certain configuration values.
haproxy mediates connections to the fedmsg-hub websocket server daemon.
An stunnel daemon provides SSL support.
== Connection Flow
The connection flow is much the same as in the fedmsg-gateway.txt SOP,
but is somewhat more complicated.
"Normal" HTTP requests to our app servers traverse the following chain:
....
Client -> apache(proxy01) -> haproxy(proxy01) -> apache(app01)
....
The flow for a websocket requests looks something like this:
....
Client -> stunnel(proxy01) -> haproxy(proxy01) -> fedmsg-hub(busgateway01)
....
stunnel is listening on a public port, negotiates the SSL connection,
and redirects the connection to haproxy who in turn hands it off to the
fedmsg-hub websocket server listening on busgateway01.
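A quick way to confirm the public endpoint is answering (and presenting a
certificate) is to open a TLS connection to it by hand, for example:
....
openssl s_client -connect hub.fedoraproject.org:9939 < /dev/null
....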
At the time of this writing, haproxy does not actually load balance
zeromq session requests across multiple busgateway0* machines, but there
is nothing stopping us from adding them. New hosts can be added in
ansible and pressed from busgateway01's template. Add them to the
fedmsg-websockets listen section in haproxy's config and it should Just Work.
== RHIT
We had RHIT open up port 9939 specifically to proxy01.phx2 for this.

View file

@ -0,0 +1,34 @@
= Fedocal SOP
Fedocal is a web-based group calendar application that is made available
to the various groups within the Fedora project.
== Contents
[arabic]
. Contact Information
. Documentation Links
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin
Location::
https://apps.fedoraproject.org/calendar
Servers::
Purpose::
To provide links to the documentation for fedocal as it exists
elsewhere on the internet; it was decided that a link document
would be a better use of resources than rewriting the book.
== Documentation Links
For information on the latest and greatest in fedocal please review:
http://fedocal.readthedocs.org/en/latest/
For documentation on the usage of fedocal please consult:
http://fedocal.readthedocs.org/en/latest/usage.html

View file

@ -0,0 +1,360 @@
= Fedora Release Infrastructure SOP
This SOP contains all of the steps required by the Fedora Infrastructure
team in order to get a release out. Much of this work overlaps with the
Release Engineering team (and at present share many of the same
members). Some work may get done by releng, some may get done by
Infrastructure, as long as it gets done, it doesn't matter.
== Contact Information
Owner:::
Fedora Infrastructure Team, Fedora Release Engineering Team
Contact:::
#fedora-admin, #fedora-releng, sysadmin-main, sysadmin-releng
Location:::
N/A
Servers:::
All
Purpose:::
Releasing a new version of Fedora
== Preparations
Before a release ships, the following items need to be completed.
[arabic]
. New website from the websites team (typically hosted at
http://getfedora.org/_/)
. Verify mirror space (for all test releases as well)
. Verify with rel-eng that permissions on content are right on the mirrors.
Don't leak.
. Communication with Red Hat IS (give at least 2 months notice, then
reminders as the time comes near) (final release only)
. Infrastructure change freeze
. Modify Template:FedoraVersion to reference new version. (Final release
only)
. Move old releases to archive (post final release only)
. Switch release from development/N to normal releases/N/ tree in
mirrormanager (post final release only)
== Change Freeze
The rules are simple:
* Hosts with the ansible variable "freezes" set to "True" are frozen.
* You may make changes as normal on hosts that are not frozen. (For
example, staging is never frozen)
* Changes to frozen hosts require a freeze break request sent to the
fedora infrastructure list, containing a description of the problem or
issue, actions to be taken and (if possible) patches to ansible that
will be applied. These freeze breaks must then get two approvals from
sysadmin-main or sysadmin-releng group members before being applied.
* Changes to recover from outages are acceptable to frozen hosts if
needed.
Change freezes will be sent to the fedora-infrastructure-list and begin
3 weeks before each release and the final release. The freeze will end
one day after the release. Note, if the release slips during a change
freeze, the freeze just extends until the day after a release ships.
You can get a list of frozen/non-frozen hosts by:
....
git clone https://pagure.io/fedora-infra/ansible.git
scripts/freezelist -i inventory
....
== Notes about release day
Release day is always an interesting and unique event. After the final
sprint from test to the final release a lot of the developers will be
looking forward to a bit of time away, as well as some sleep. Once
Release Engineering has built the final tree, and synced it to the
mirrors it is our job to make sure everything else (except the bit flip)
gets done as painlessly and easily as possible.
[NOTE]
.Note
====
All communication is typically done in #fedora-admin. Typically these
channels are laid back and staying on topic isn't strictly enforced. On
release day this is not true. We encourage people to come, stay in the
room and be quiet unless they have a specific task or question related
to release day. It's nothing personal, but release day can get out of
hand quickly.
====
During normal load, our websites function as normal. This is
especially true since we've moved the wiki to mod_fcgi. On release day
our load spikes a great deal. During the Fedora 6 launch many services
were offline for hours. Some (like the docs) were off for days. A large
part of this outage was due to the wiki not being able to handle the
load, part was a lack of planning by the Infrastructure team, and part
is still a mystery. There are questions as to whether or not all of the
traffic was legit or a ddos.
The Fedora 7 release went much better. Some services were offline for
minutes at a time but very little of it was out longer than that. The
wiki crashed, as it always does. We had made sure to make the
fedoraproject.org landing page static though. This helped a great deal
though we did see load on the proxy boxes as spiky.
Recent releases have been quite smooth due to a number of changes: we
have a good deal more bandwidth on the master mirrors, more CPUs and memory,
and prerelease versions are much easier to come by for those
interested before release day.
== Day Prior to Release Day
=== Step 1 (Torrent)
Set up the torrent. All files can be synced with the torrent box but just
not published to the world. Verify with sha1sum. Follow the instructions
on the torrentrelease.txt sop up to and including step 4.
=== Step 2 (Website)
Verify the website design / content has been finalized with the websites
team. Update the Fedora version number wiki template if this is a final
release. It will need to be changed in
https://fedoraproject.org/wiki/Template:CurrentFedoraVersion
Additionally, there are redirects in the ansible
playbooks/include/proxies-redirects.yml file for Cloud Images. These
should be pushed as soon as the content is available. See:
https://pagure.io/fedora-infrastructure/issue/3866 for example
=== Step 3 (Mirrors)
Verify enough mirrors are setup and have Fedora ready for release. If
for some reason something is broken it needs to be fixed. Many of the
mirrors are running a check-in script. This lets us know who has Fedora
without having to scan everyone. Hide the Alpha, Beta, and Preview
releases from the publiclist page.
You can check this by looking at:
....
wget "http://mirrors.fedoraproject.org/mirrorlist?path=pub/fedora/linux/releases/test/28-Beta&country=global"
(replace 28 and Beta with the version and release.)
....
== Release day
=== Step 1 (Prep and wait)
Verify the mirrors are ready and that the torrent has valid copies of
its files (use sha1sum)
Do not move on to step two until the Release Engineering team has given
the ok for the release. It is the releng team's decision as to whether
or not we release and they may pull the plug at any moment.
=== Step 2 (Torrent)
Once given the ok to release, the Infrastructure team should publish the
torrent and encourage people to seed. Complete the steps on the
https://fedora-infra-docs.readthedocs.io/en/latest/sysadmin-guide/sops/torrentrelease.html
after step 4.
=== Step 3 (Bit flip)
The mirrors sit and wait for a single permissions bit to be altered so
that they show up to their services. The bit flip (done by the releng
team) will replicate out to the mirrors. Verify that the mirrors have
received the change by seeing if it is actually available, just use a
spot check. Once that is complete move on.
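A spot check can be as simple as asking mirrormanager for mirrors carrying
the release path and requesting the top of the tree from the master mirror
(the release number below is only an example):
....
wget -q -O- "https://mirrors.fedoraproject.org/mirrorlist?path=pub/fedora/linux/releases/34&country=global" | head
curl -I "http://dl.fedoraproject.org/pub/fedora/linux/releases/34/"
....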
=== Step 4 (Website)
Once all of the distribution pieces are verified (mirrors and torrent),
all that is left is to publish the website. At present this is done by
making sure the master branch of fedora-web is pulled by the
syncStatic.sh script in ansible. It will sync in an hour normally but on
release day people don't like to wait that long, so do the following on
sundries01
____
sudo -u apache /usr/local/bin/lock-wrapper syncStatic 'sh -x
/usr/local/bin/syncStatic'
____
Once that completes, on batcave01:
....
sudo -i ansible proxy\* "/usr/bin/rsync --delete -a --no-owner --no-group bapp02::getfedora.org/ /srv/web/getfedora.org/"
....
Verify http://getfedora.org/ is working.
=== Step 5 (Docs)
Just as with the website, the docs site needs to be published. Follow
these steps:
....
/root/bin/docs-sync
....
=== Step 6 (Monitor)
Once the website is live, keep an eye on various news sites for the
release announcement. Closely watch the load on all of the boxes, proxy,
application and otherwise. If something is getting overloaded, see
suggestions on this page in the "Juggling Resources" section.
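One way to keep an eye on load (run from batcave01, like the other ad-hoc
ansible commands in this SOP) is a periodic uptime sweep; the host pattern
below is only an example:
....
sudo -i ansible 'proxy*:app*' -m shell -a 'uptime'
....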
=== Step 7 (Badges) (final release only)
We have some badge rules that are dependent on which release of Fedora
we're on. As you have time, please perform the following on your local
box:
....
$ git clone ssh://git@pagure.io/fedora-badges.git
$ cd badges
....
Edit `rules/tester-it-still-works.yml` and update the release tag to
match the now old but stable release. For instance, if we just released
fc21, then the tag in that badge rule should be fc20.
Edit `rules/tester-you-can-pry-it-from-my-cold-dead-hands.yml` and
update the release tag to match the release that is about to reach EOL.
For instance, if we just released f28, then the tag in that badge rule
should be f26. Commit the changes:
....
$ git commit -a -m 'Updated tester badge rule for f28 release.'
$ git push origin master
....
Then, on batcave, perform the following:
....
$ sudo -i ansible-playbook $(pwd)/playbooks/manual/push-badges.yml
....
=== Step 8 (Done)
Just chill, keep an eye on everything and make changes as needed. If you
can't keep a service up, try to redirect randomly to some of the
mirrors.
== Priorities
Priorities during release day (in order):
[arabic]
. {blank}
+
Website::
Anything related to a user landing at fedoraproject.org, and clicking
through to a mirror or torrent to download something must be kept up.
This is distribution, and without it we can potentially lose many
users.
. {blank}
+
Linked addresses::
We do not have direct control over what Hacker News, Phoronix or
anyone else links to. If they link to something on the wiki and it is
going down, or to any other site we control, a rewrite should be
put in place to direct them to http://fedoraproject.org/get-fedora.
. {blank}
+
Torrent::
The torrent server has never had problems during a release. Make sure
it is up.
. {blank}
+
Release Notes::
Typically grouped with the docs site, the release notes are often
linked to (this is fine, no need to redirect) but keep an eye on the
logs and ensure that where we've said the release notes are, that they
can be found there. In previous releases we sometimes had to make this
available in more than one spot.
. {blank}
+
docs.fedoraproject.org::
People will want to see what's new in Fedora and get further
documentation about it. Much of this is in the release notes.
. {blank}
+
wiki::
Because it is so resource heavy, and because it is so developer
oriented we have no choice but to give the wiki a lower priority.
. Everything else.
== Juggling Resources
In our environment we're running different things on many different
servers. Using Xen we can easily give machines more or less RAM and
processors. We can take down builders and bring up application servers.
The trick is to be smart and make sure you understand what is causing
the problem. These are some tips to keep in mind:
* IPTables based bandwidth and connection limiting (successful in the
past)
* Altering the weight on the proxy balancers
* Create static pages out of otherwise dynamic content
* Redirect pages to a mirror
* Add a server / remove un-needed servers
== CHECKLISTS:
=== Beta:
* Announce infrastructure freeze 3 weeks before Beta
* Change /topic in #fedora-admin
* mail infrastructure list a reminder.
* File all tickets
* new website
* check mirror permissions, mirrormanager, check mirror sizes, release
day ticket.
After release is a "go":
* Make sure torrents are setup and ready to go.
* fedora-web needs a branch for fN-beta. In it:
* Beta used on get-prerelease
* get-prerelease doesn't direct to release
* verify is updated with Beta info
* releases.txt gets a branched entry for preupgrade
* bfo gets updated to have a Beta entry.
After release:
* Update /topic in #fedora-admin
* post to infrastructure list that freeze is over.
=== Final:
* Announce infrastructure freeze 2 weeks before Final
* Change /topic in #fedora-admin
* mail infrastructure list a reminder.
* File all tickets
* new website, check mirror permissions, mirrormanager, check
mirror sizes, release day ticket.
After release is a "go":
* Make sure torrents are setup and ready to go.
* fedora-web needs a branch for fN. In it:
* get-prerelease does direct to release
* verify is updated with Final info
* bfo gets updated to have a Final entry.
* update wiki version numbers and names.
After release:
* Update /topic in #fedora-admin
* post to infrastructure list that freeze is over.
* Move MirrorManager repository tags from the development/$version/
Directory objects, to the releases/$version/ Directory objects. This is
done using the `move-devel-to-release --version=$version` command on
bapp02. This is usually done a week or two after release.

View file

@ -0,0 +1,108 @@
= Fedora Packages SOP
This SOP is for the Fedora Packages web application.
https://apps.fedoraproject.org/packages
== Contents
[arabic]
. Contact Information
. Deploying to the servers
. Maintenance
. Checking for AGPL violations
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin, #fedora-apps
Persons::
cverna
Location::
PHX2
Servers::
packages03.phx2.fedoraproject.org packages04.phx2.fedoraproject.org
packages03.stg.phx2.fedoraproject.org
Purpose::
Web interface for searching packages information
== Deploying to the servers
=== Deploying
Once the new version is built, it needs to be deployed. To deploy the
new version, you need
https://fedora-infra-docs.readthedocs.io/en/latest/sysadmin-guide/sops/sshaccess.html[ssh
access] to batcave01.phx2.fedoraproject.org and
https://fedora-infra-docs.readthedocs.io/en/latest/sysadmin-guide/sops/ansible.html[permissions
to run the Ansible playbook].
All the following commands should be run from batcave01.
You can check the upstream documentation on how to build a new release.
This process results in a fedora-packages rpm available in the infra-tag
rpm repo.
You should make use of the staging instance in order to test the new
version of the application.
=== Upgrading
To upgrade, run the upgrade playbook:
....
$ sudo rbac-playbook manual/upgrade/packages.yml
....
This will upgrade the fedora-packages package and restart the Apache web
server and fedmsg-hub service.
=== Rebuild the xapian Database
If you need to rebuild the xapian database then you can run the
following playbook:
....
$ sudo rbac-playbook manual/rebuild/fedora-packages.yml
....
== Maintenance
The web application is served by httpd and managed by the httpd
service:
....
$ sudo systemctl restart httpd
....
can be used to restart the service if needed. The application log files
are available under the `/var/log/httpd/` directory.
The xapian database is updated by a fedmsg consumer. You can restart the
fedmsg-hub service if needed by using:
....
$ sudo systemctl restart fedmsg-hub
....
To check the consumer logs you can use:
....
$ sudo journalctl -u fedmsg-hub
....
== Checking for AGPL violations
To remain AGPL compliant, we must ensure that all modifications to the
code are made available in the SRPM that we link to in the footer of the
application. You can easily query our app servers to determine if any
AGPL-violating code modifications have been made to the package:
....
func-command --host="*app*" --host="community*" "rpm -V fedoracommunity"
....
You can safely ignore any changes to non-code files in the output. If
any violations are found, the Infrastructure Team should be notified
immediately.

View file

@ -0,0 +1,82 @@
= Fedora Pastebin SOP
[arabic]
. Contact Information
. Introduction
. Installation
. Dashboard
. Add a word to censored list
== 1. Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin
Persons::
athmane herlo
Sponsor::
nirik
Location::
phx2
Servers::
paste01.stg, paste01.dev
Purpose::
To host Fedora Pastebin
== 2. Introduction
Fedora pastebin is powered by sticky-notes which is included in EPEL.
Fedora theming (skin) is included in the ansible role.
== 3. Installation
Sticky-notes needs a MySQL db and a user with 'select, update, delete,
insert' privileges.
It's recommended to dump and import db from a working installation to
save time (skipping the installation and tweaking).
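A minimal sketch of that dump/import, assuming the database is called
sticky_notes on both hosts:
....
# on the working installation
mysqldump sticky_notes > sticky_notes.sql
# on the new installation (database and user created beforehand)
mysql sticky_notes < sticky_notes.sql
....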
By default the installation is locked, i.e. you can't relaunch it.
However, you can unlock the installation by commenting out the line
containing `$gsod->trigger` in `/etc/sticky-notes/install.php`, then
pointing the web browser to '/install'.
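For example, commenting that line out could be done with something like the
following (the exact match may need adjusting to the installed version):
....
sudo sed -i '/gsod->trigger/ s|^|// |' /etc/sticky-notes/install.php
....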
The configuration file containing general settings and DB credentials is
located in `/etc/sticky-notes/config.php`
== 4. Dashboard
Sticky-notes has a dashboard (URL: /admin/) that can be used to:
* {blank}
+
Manage pastes:::
** deleting paste
** getting information about the paste author (IP/Date/time etc...)
* Manage users (aka admins) which can log into the dashboard
* Manage IP Bans (add / delete banned IPs).
* Authentication (not needed)
* {blank}
+
Site configuration:::
** General configuration (included in config.php).
** Project Honey Pot configuration (not a FOSS service)
** Word censor configuration: a list of words to be censored in
pastes.
== 5. Add a word to censored list
If a word is in the censored list, any paste containing that word will be
rejected. To add one, edit the variable '$sg_censor' in the sticky-notes
configuration file:
....
$sg_censor = "WORD1
WORD2
...
...
WORDn";
....

View file

@ -0,0 +1,303 @@
= Websites Release SOP
____
* {blank}
[arabic]
. Preparing the website for a release
** 1.1 Obsolete GPG key of the EOL Fedora release
** 1.2 Update GPG key
*** 1.2.1 Steps
* {blank}
[arabic, start=2]
. Update website
** 2.1 For Alpha
** 2.2 For Beta
** 2.3 For GA
* {blank}
[arabic, start=3]
. Fire in the hole
* {blank}
[arabic, start=4]
. Tips
** 4.1 Merging branches
[arabic]
. Preparing the website for a new release cycle
____
1.1 Obsolete GPG key
One month after a Fedora release the release number 'FXX-2' (i.e. 1
month after F21 release, F19 will be EOL) will be EOL (End of Life). At
this point we should drop the GPG key from the list in verify/ and move
the keys to the obsolete keys page in keys/obsolete.html.
1.2 Update GPG key
After another couple of weeks and as the next release approaches, watch
the fedora-release package for a new key to be added. Use the
update-gpg-keys script in the fedora-web git repository to add it to
static/. Manually add it to /keys and /verify in all websites where we
use these keys:
____
* arm.fpo
* getfedora.org
* labs.fpo
* spins.fpo
____
1.2.1 Steps
[loweralpha]
. Get a copy of the new key(s) from the fedora-release repo; you will
find FXX-primary and FXX-secondary keys. Save them in ./tools to make
the update easier.
+
https://pagure.io/fedora-repos
. Start by editing ./tools/update-gpg-keys and adding the key-ids of any
obsolete keys to the obsolete_keys list.
. Then run that script to add the new key(s) to the fedora.gpg block:
+
fedora-web git:(master) cd tools/ tools git:(master) ./update-gpg-keys
RPM-GPG-KEY-fedora-23-primary tools git:(master) ./update-gpg-keys
RPM-GPG-KEY-fedora-23-secondary
+
This will add the key(s) to the keyblock in static/fedora.gpg and create
a text file for the key in static/$KEYID.txt as well. Verify that these
files have been created properly and contain all the keys that they
should.
* Handy checks: gpg static/fedora.gpg or gpg static/$KEYID.txt
* Adding "--with-fingerprint" option will add the fingerprint to the
output
+
The output of fedora.gpg should contain only the actual keys, not the
obsolete keys. The single text files should contain the correct
information for the uploaded key.
. Next, add new key(s) to the list in data/verify.html and move the new
key information to the keys page in data/content/keys/index.html. A
script to aid in generating the HTML code for new keys is in
./tools/make-gpg-key-html. It will print HTML to stdout for each
RPM-GPG-KEY-* file given as arguments. This is suitable for copy/paste
(or directly importing if your editor supports this). Check the copied
HTML code and select if the key info is for a primary or secondary key
(output says 'Primary or Secondary').
+
tools git:(master) ./make-gpg-key-html RPM-GPG-KEY-fedora-23-primary
+
Build the website with 'make en test' and carefully verify that the data
is correct. Please double check all keys in
http://localhost:5000/en/keys and http://localhost:5000/en/verify.
+
NOTE: the tool will give you an outdated output, adapt it to the new
websites and bootstrap layout!
____
____
[arabic, start=2]
. Update website
____
2.1 For Alpha
____
[loweralpha]
. Create the fXX-alpha branch from master fedora-web git:(master) git
push origin master:refs/heads/f22-alpha
+
and checkout to the new branch: fedora-web git:(master) git checkout -t
-b f22-alpha origin/f22-alpha
. Update the global variables: change curr_state to Alpha for all arches.
. Add the Alpha banner: upload the FXX-Alpha banner to
static/images/banners/f22alpha.png, which should appear in every
$\{PRODUCT}/download/index.html page. Make sure the banner is shown in
all sidebars, also in labs, spins, and arm.
. Check all Download links and paths in
$\{PRODUCT}/prerelease/index.html. You can find all paths on bapp01 (sudo
su - mirrormanager first) or you can look at the download page
http://dl.fedoraproject.org/pub/alt/stage
. Add CHECKSUM files to static/checksums and verify that the paths are
correct. The files should be on sundries01 and you can query them with:
`+find /pub/fedora/linux/releases/test/17-Alpha/ -type f -name '*CHECKSUM*' -exec cp '{}' . \;+`
Remember to add the right checksums to the right websites (same path).
. Add EC2 AMI IDs for Alpha. All IDs are now in the globalvar.py file.
We get all data from there, even the redirect path to track the AMI IDs.
We now also have a script which is useful to get all the AMI IDs
uploaded with fedimg. Execute it to get the latest uploads, but don't
run the script too early, as new builds are added constantly. fedora-web
git:(fXX-alpha) python ~/fedora-web/tools/get_ami.py
. Add CHECKSUM files also to http://spins.fedoraproject.org in
static/checksums. Verify the paths are correct in
data/content/verify.html. (see point e) to query them on sundries01).
Same for labs.fpo and arm.fpo.
. Verify all paths and links on http://spins.fpo, labs.fpo and arm.fpo.
. Update Alpha Image sizes and pre_cloud_composedate in
./build.d/globalvar.py. Verify they are right in Cloud images and Docker
image.
. Update the new POT files and push them to Zanata (ask a maintainer to
do so) every time you change text strings.
. Add this build to stg.fedoraproject.org (ansible syncStatic.sh.stg) to
test the pages online.
____
. Release Date:
____
* Merge the fXX-alpha branch to master and correct conflicts manually
* Remove the redirect of prerelease pages in ansible, edit:
* ansible/playbooks/include/proxies-redirects.yml
* ask a sysadmin-main to run playbook
* When ready and about 90 minutes before Release Time push to master
* Tag the commit as new release and push it too: $ git tag -a FXX-Alpha
-m 'Releasing Fedora XX Alpha' $ git push --tags
* If needed follow "Fire in the hole" below.
____
2.2 For Beta
____
[loweralpha]
. Create the fXX-beta branch from master fedora-web git:(master) git
push origin master:refs/heads/f22-beta
+
and checkout to the new branch: fedora-web git:(master) git checkout -t
-b f22-beta origin/f22-beta
. Update the global variables: change curr_state to Beta for all arches.
. Add the Beta banner: upload the FXX-Beta banner to
static/images/banners/f22beta.png, which should appear in every
$\{PRODUCT}/download/index.html page. Make sure the banner is shown in
all sidebars, also in labs, spins, and arm.
. Check all Download links and paths in
$\{PRODUCT}/prerelease/index.html. You can find all paths on bapp01 (sudo
su - mirrormanager first) or you can look at the download page
http://dl.fedoraproject.org/pub/alt/stage
. Add CHECKSUM files to static/checksums and verify that the paths are
correct. The files should be on sundries01 and you can query them with:
`+find /pub/fedora/linux/releases/test/17-Beta/ -type f -name '*CHECKSUM*' -exec cp '{}' . \;+`
Remember to add the right checksums to the right websites (same path).
. Add EC2 AMI IDs for Beta. All IDs are now in the globalvar.py file. We
get all data from there, even the redirect path to track the AMI IDs. We
now also have a script which is useful to get all the AMI IDs uploaded
with fedimg. Execute it to get the latest uploads, but don't run the
script too early, as new builds are added constantly. fedora-web
git:(fXX-beta) python ~/fedora-web/tools/get_ami.py
. Add CHECKSUM files also to http://spins.fedoraproject.org in
static/checksums. Verify the paths are correct in
data/content/verify.html. (see point e) to query them on sundries01).
Same for labs.fpo and arm.fpo.
. Remove static/checksums/Fedora-XX-Alpha* in all websites.
. Verify all paths and links on http://spins.fpo, labs.fpo and arm.fpo.
. Update Beta Image sizes and pre_cloud_composedate in
./build.d/globalvar.py. Verify they are right in Cloud images and Docker
image.
. Update the new POT files and push them to Zanata (ask a maintainer to
do so) every time you change text strings.
. Add this build to stg.fedoraproject.org (ansible syncStatic.sh.stg) to
test the pages online.
____
. Release Date:
____
* Merge the fXX-beta branch to master and correct conflicts manually
* When ready and about 90 minutes before Release Time push to master
* Tag the commit as new release and push it too: $ git tag -a FXX-Beta
-m 'Releasing Fedora XX Beta' $ git push --tags
* If needed follow "Fire in the hole" below.
____
2.3 For GA
____
[loweralpha]
. Create the fXX branch from master fedora-web git:(master) git push
origin master:refs/heads/f22
+
and checkout to the new branch: fedora-web git:(master) git checkout -t
-b f22 origin/f22
. Update the global variables: change curr_state for all arches.
. Check all Download links and paths in $\{PRODUCT}/download/index.html.
You can find all paths on bapp01 (sudo su - mirrormanager first) or you
can look at the download page http://dl.fedoraproject.org/pub/alt/stage
. Add CHECKSUM files to static/checksums and verify that the paths are
correct. The files should be on sundries01 and you can query them with:
`+find /pub/fedora/linux/releases/17/ -type f -name '*CHECKSUM*' -exec cp '{}' . \;+`
Remember to add the right checksums to the right websites (same path).
. At some point freeze translations. Add an empty PO_FREEZE file to
every website's directory you want to freeze.
. Add EC2 AMI IDs for GA. All IDs are now in the globalvar.py file. We
get all data from there, even the redirect path to track the AMI IDs. We
now also have a script which is useful to get all the AMI IDs uploaded
with fedimg. Execute it to get the latest uploads, but don't run the
script too early, as new builds are added constantly. fedora-web
git:(fXX) python ~/fedora-web/tools/get_ami.py
. Add CHECKSUM files also to http://spins.fedoraproject.org in
static/checksums. Verify the paths are correct in
data/content/verify.html. (see point e) to query them on sundries01).
Same for labs.fpo and arm.fpo.
. Remove static/checksums/Fedora-XX-Beta* in all websites.
. Verify all paths and links on http://spins.fpo, labs.fpo and arm.fpo.
. Update GA Image sizes and cloud_composedate in ./build.d/globalvar.py.
Verify they are right in Cloud images and Docker image.
. Update static/js/checksum.js and check if the paths and checksum still
match.
. Update the new POT files and push them to Zanata (ask a maintainer to
do so) every time you change text strings.
. Add this build to stg.fedoraproject.org (ansible syncStatic.sh.stg) to
test the pages online.
____
. Release Date:
____
* Merge the fXX branch to master and correct conflicts manually
* Add the redirect of prerelease pages in ansible, edit:
* ansible/playbooks/include/proxies-redirects.yml
* ask a sysadmin-main to run playbook
* Unfreeze translations by deleting the PO_FREEZE files
* When ready and about 90 minutes before Release Time push to master
* Update the short links for the Cloud Images for 'Fedora XX', 'Fedora
XX-1' and 'Latest'
* Tag the commit as new release and push it too: $ git tag -a FXX -m
'Releasing Fedora XX' $ git push --tags
* If needed follow "Fire in the hole" below.
____
____
[arabic, start=3]
. Fire in the hole
____
We now use ansible for everything, and normally use a regular build to
make the websites live. If something is not happening as expected, you
should get in contact with a sysadmin-main to run the ansible playbook
again.
All our stuff, such as SyncStatic.sh and SyncTranslation.sh scripts are
now also in ansible!
Staging server app02 and production server bapp01 do not exist anymore;
now our staging websites are on sundries01.stg and the production ones on
sundries01. Change your scripts accordingly and as sysadmin-web you
should have access to those servers as before.
____
____
[arabic, start=4]
. Tips
____
4.1 Merging branches
Suggested by Ricky. This can be useful if you're _sure_ all new changes
on the devel branch should go into the master branch. Conflicts will be
solved directly accepting only the changes in the devel branch. If
you're not 100% sure do a normal merge and fix conflicts manually!
$ git merge f22-beta $ git checkout --theirs f22-beta [list of
conflicting po files] $ git commit

View file

@ -0,0 +1,204 @@
= FedMsg Notifications (FMN) SOP
Route individualized notifications to Fedora contributors over email
and IRC.
== Contact Information
=== Owner
* Messaging SIG
* Fedora Infrastructure Team
=== Contact
____
* #fedora-apps for FMN development
* #fedora-fedmsg for an IRC feed of all fedmsgs
* #fedora-admin for problems with the deployment of FMN
* #fedora-noc for outage/crisis alerts
____
=== Servers
Production servers:
____
* notifs-backend01.phx2.fedoraproject.org (RHEL 7)
* notifs-web01.phx2.fedoraproject.org (RHEL 7)
* notifs-web02.phx2.fedoraproject.org (RHEL 7)
____
Staging servers:
____
* notifs-backend01.stg.phx2.fedoraproject.org (RHEL 7)
* notifs-web01.stg.phx2.fedoraproject.org (RHEL 7)
* notifs-web02.stg.phx2.fedoraproject.org (RHEL 7)
____
=== Purpose
Route notifications to users
== Description
fmn is a pair of systems intended to route fedmsg notifications to
Fedora contributors and users.
There is a web interface running on notifs-web01 and notifs-web02 that
allows users to login and configure their preferences to select this or
that type of message.
There is a backend running on notifs-backend01 where most of the work is
done.
The backend process is a 'fedmsg-hub' daemon, controlled by systemd.
== Hosts
=== notifs-backend
This host runs:
* `fedmsg-hub.service`
* One or more `fmn-worker@.service`. Currently notifs-backend01 runs
`fmn-worker@\{1-4}.service`
* `fmn-backend@1.service`
* `fmn-digests@1.service`
* `rabbitmq-server.service`, an AMQP broker used to communicate between
the services.
* `redis.service`, used for caching.
This host relies on a PostgreSQL database running on
db01.phx2.fedoraproject.org.
=== notifs-web
This host runs:
* A Python WSGI application via Apache httpd that serves the
https://apps.fedoraproject.org/notifications[FMN web user interface].
This host relies on a PostgreSQL database running on
db01.phx2.fedoraproject.org.
== Deployment
Once upstream releases a new version of
https://github.com/fedora-infra/fmn[fmn],
https://github.com/fedora-infra/fmn.web[fmn-web], or
https://github.com/fedora-infra/fmn.sse[fmn-sse] by creating a Git tag, a
new version can be built and deployed into Fedora infrastructure.
=== Building
FMN is packaged in Fedora and EPEL as
https://admin.fedoraproject.org/pkgdb/package/rpms/python-fmn/[python-fmn]
(the backend),
https://admin.fedoraproject.org/pkgdb/package/rpms/python-fmn-web/[python-fmn-web]
(the frontend), and the optional
https://admin.fedoraproject.org/pkgdb/package/rpms/python-fmn-sse/[python-fmn-sse].
Since all the hosts run RHEL 7, you need to build all these packages for
EPEL 7.
=== Configuration
If there are any configuration updates required by the new version of
FMN, update the `notifs` Ansible roles on
batcave01.phx2.fedoraproject.org. Remember to use:
....
{% if env == 'staging' %}
<new config here>
{% else %}
<retain old config>
{% endif %}
....
Use this pattern when deploying the update to staging. You can apply configuration
updates to staging by running:
....
$ sudo rbac-playbook -l staging groups/notifs-backend.yml
$ sudo rbac-playbook -l staging groups/notifs-web.yml
....
Simply drop the `-l staging` to update the production configuration.
=== Upgrading
To upgrade the
https://admin.fedoraproject.org/pkgdb/package/rpms/python-fmn/[python-fmn],
https://admin.fedoraproject.org/pkgdb/package/rpms/python-fmn-web/[python-fmn-web],
and
https://admin.fedoraproject.org/pkgdb/package/rpms/python-fmn-sse/[python-fmn-sse]
packages, apply configuration changes, and restart the services, you
should use the manual upgrade playbook:
....
$ sudo rbac-playbook -l staging manual/upgrade/fmn.yml
....
Again, drop the `-l staging` flag to upgrade production.
Be aware that the FMN services take a significant amount of time to
start up as they pre-heat their caches before starting work.
== Service Administration
Disable an account (on notifs-backend01):
....
$ sudo -u fedmsg /usr/local/bin/fmn-disable-account USERNAME
....
Restart:
....
$ sudo systemctl restart fedmsg-hub
....
Watch logs:
....
$ sudo journalctl -u fedmsg-hub -f
....
Configuration:
....
$ ls /etc/fedmsg.d/
$ sudo fedmsg-config | less
....
Monitor performance:
....
http://threebean.org/fedmsg-health-day.html#FMN
....
Upgrade (from batcave):
....
$ sudo -i ansible-playbook /srv/web/infra/ansible/playbooks/manual/upgrade/fmn.yml
....
== Mailing Lists
We use FMN as a way to forward certain kinds of messages to mailing
lists so people can read them the good old fashioned way that they like
to. To accomplish this, we create 'bot' FAS accounts with their own FMN
profiles and we set their email addresses to the lists in question.
If you need to change the way some set of messages are forwarded, you
can do it from the FMN web interface (if you are an FMN admin as defined
in the config file in roles/notifs/frontend/). You can navigate to
https://apps.fedoraproject.org/notifications/USERNAME.id.fedoraproject.org
to do this.
If the account exists as a FAS user already (for instance, the
`virtmaint` user) but it does not yet exist in FMN, you can add it to
the FMN database by logging in to notifs-backend01 and running
`fmn-create-user --email DESTINATION@EMAIL.COM --create-defaults FAS_USERNAME`.

View file

@ -0,0 +1,100 @@
= FPDC SOP
Fedora Product Definition Center is a service that aims to replace
https://pdc.fedoraproject.org/[PDC] in Fedora. It is meant to be a
database with REST API access used to store data needed by other
services.
== Contact Information
Owner::
Infrastructure Team
Contact::
#fedora-apps, #fedora-admin
Persons::
cverna, abompard
Location::
Phoenix (Openshift)
Public addresses::
* fpdc.fedoraproject.org
* fpdc.stg.fedoraproject.org
Servers::
* os.fedoraproject.org
* os.stg.fedoraproject.org
Purpose::
Centralize metadata and facilitate access.
== Systems
FPDC is built using the Django REST Framework and uses a PostgreSQL
database to store the metadata. The application runs on OpenShift and
uses the Source-to-Image technology to build the container directly from
the https://github.com/fedora-infra/fpdc[git repository].
In the staging and production environments, the application is
automatically rebuilt for every new commit on the `staging`
or `production` branch. This is achieved by configuring a
GitHub webhook to trigger an OpenShift deployment.
For example, a new deployment to staging would look like this:
____
git clone git@github.com:fedora-infra/fpdc.git +
cd fpdc +
git checkout staging +
git rebase master +
git push origin staging
____
The initial OpenShift project deployment is manual and is done using the
following ansible playbook:
....
sudo rbac-playbook openshift-apps/fpdc.yml
....
This will create a new fpdc project in Openshift with all the needed
configuration.
== Logs
Logs can be retrieved using the OpenShift command line:
....
$ oc login os-master01.phx2.fedoraproject.org
You must obtain an API token by visiting https://os.fedoraproject.org/oauth/token/request
$ oc login os-master01.phx2.fedoraproject.org --token=<Your token here>
$ oc -n fpdc get pods
fpdc-28-bfj52 1/1 Running 522 28d
$ oc logs fpdc-28-bfj52
....
== Database migrations
FPDC uses the `recreate` deployment configuration of
OpenShift, which means that OpenShift will bring down the pods currently
running and recreate new ones with the new version of the application.
In the phase between the pods being down and the new pods being up, the
database migrations are run in an independent pod.
== Things that could go wrong
Hopefully not much. If something goes wrong, it is currently advised to
kill the pods to trigger a fresh deployment:
....
$ oc login os-master01.phx2.fedoraproject.org
You must obtain an API token by visiting https://os.fedoraproject.org/oauth/token/request
$ oc login os-master01.phx2.fedoraproject.org --token=<Your token here>
$ oc -n fpdc get pods
fpdc-28-bfj52 1/1 Running 522 28d
$ oc delete pod fpdc-28-bfj52
....
It is also possible to roll back to a previous version:
....
$ oc -n fpdc get dc
NAME REVISION DESIRED CURRENT TRIGGERED BY
fpdc 39 1 1 config,image(fpdc:latest)
$ oc -n fpdc rollback fpdc
....

View file

@ -0,0 +1,261 @@
= FreeMedia Infrastructure SOP
This page defines the SOP for the Fedora FreeMedia Program. It covers
the infrastructural things as well as procedural things.
== Contents
[arabic]
. Location of Resources
. Location on Ansible
. Opening of the form
. Closing of the Form
. Tentative timeline
. How to
____
[arabic]
. Open
. Close
____
____
[arabic, start=7]
. Handling of tickets
____
____
[arabic]
. Login
. Rejecting Invalid Tickets
. Accepting Valid Tickets
____
____
[arabic, start=8]
. Handling of non fulfilled requests
. How to handle membership applications
____
== Location of Resources
* The web form is at
https://fedoraproject.org/freemedia/FreeMedia-form.html
* The Trac instance is at https://fedorahosted.org/freemedia/report
== Location on ansible
$PWD = `roles/freemedia/files`
Freemedia form::
FreeMedia-form.html
Backup form::
FreeMedia-form.html.orig
Closed form::
FreeMedia-close.html
Backend processing script::
process.php
Error Document::
FreeMedia-error.html
== Opening of the form
The form will be opened on the First day of each month.
== Closing of the Form
=== Tentative timeline
The form will be closed after a couple of days. This may vary according
to the capacity.
== How to
* The form is available at `roles/freemedia/files/FreeMedia-form.html`
and `roles/freemedia/files/FreeMedia-form.html.orig`
* The closed form is at `roles/freemedia/files/FreeMedia-close.html`
=== Open
* Go to roles/freemedia/tasks
* Open `main.yml`
* Go to line 32.
* To open, change the line to read: src="FreeMedia-form.html"
* After opening the form, go to trac and grant "Ticket Create and Ticket
View" privilege to "Anonymous".
=== Close
* Go to roles/freemedia/tasks
* Open main.yml
* Go to line 32.
* To close, change the line to read: src="FreeMedia-close.html"
* After closing the form, go to trac and remove "Ticket Create and
Ticket View" privilege from "Anonymous".
[NOTE]
.Note
====
* Have to check about monthly cron.
* Have to write about changing init.pp for closing and opening.
====
== Handling of tickets
=== Login
* {blank}
+
Contributors are requested to visit::
https://fedorahosted.org/freemedia/report
* Please login with your FAS account.
=== Rejecting Invalid Tickets
* If a ticket is invalid, don't accept the request. Go to "resolve as:"
and select "invalid" and then press "Submit Changes".
* A ticket is Invalid if
+
____
** No Valid email-id is provided.
** The region does not match the country.
** No Proper Address is given.
____
* If a ticket is duplicate, accept one copy, close the others as
duplicate Go to "resolve as:" and select "duplicate" and then press
"Submit Changes".
=== Accepting Valid Tickets
* If you wish to fulfill a request, please ensure, per the above
section, that it is not liable to be discarded.
* Now "Accept" the ticket from the "Action" field at the bottom, and
press the "Submit Changes" button.
* These accepted tickets will be available from
https://fedorahosted.org/freemedia/report under both "My Tickets" and
"Accepted Tickets for XX" (XX = your region, e.g. APAC).
* When you ship the request, please go to the ticket again, go to
"resolve as:" from the "Action" field and select "Fixed" and then press
"Submit Changes".
* If an accepted ticket is not finalised by the end of the month, it
should be closed with "shipping status unknown" in a comment.
=== Handling of non fulfilled requests
We shall close all the pending requests by the end of the month.
* Please check your region.
=== How to handle membership applications
Steps to become a member of the Free-media Group.
[arabic]
. Create an account in Fedora Account System (FAS)
. Create a user page in the Fedora Wiki with contact data, like
User:<nick-name>. There are templates.
. Apply to Free-Media Group in FAS
. Apply to Free-Media mailing list subscription
==== Rules for deciding on membership applications
[cols=",,,,",options="header"]
|===
|Case |Applied to Free-Media Group |User Page Created |Applied to Free-Media List |Action
|1 |Yes |Yes |Yes |Approve Group and mailing list applications
|2 |Yes |Yes |No |Put on hold + write to subscribe to list within a week
|3 |Yes |No |whatever |Put on hold + write to make User Page within a week
|4 |No |No |Yes |Reject
|===
[NOTE]
.Note
====
[arabic]
. As you need to have an FAS account for steps 2 and 3, this is not
included in the decision rules above.
. The time to be on hold is one week. If no action is taken after one
week, the application has to be rejected.
. When writing to ask an applicant to fulfil the remaining steps, send a CC
to the other Free-media sponsors to let them know the application has been
reviewed.
====

View file

@ -0,0 +1,72 @@
= Freenode IRC Channel Infrastructure SOP
Fedora uses the freenode IRC network for its IRC communications. If you
want to make a new Fedora-related IRC channel, please follow these
guidelines.
== Contents
[arabic]
. Contact Information
. Is a new channel needed?
. Adding new channel
. Recovering/fixing an existing channel
== Contact Information
Owner:::
Fedora Infrastructure Team
Contact:::
#fedora-admin
Location:::
freenode
Servers:::
none
Purpose:::
Provides a channel for Fedora contributors to use.
== Is a new channel needed?
First you should see if one of the existing Fedora channels will meet
your needs. Adding a new channel can give you a less noisy place to
focus on something, but at the cost of fewer people being involved. If
your topic/area is development related, perhaps the main #fedora-devel
channel will meet your needs.
== Adding new channel
* Make sure the channel is in the #fedora-* namespace. This allows the
Fedora Group Coordinator to make changes to it if needed.
* Found the channel. You do this by /join #channelname, then /msg
chanserv register #channelname
* Setup GUARD mode. This allows ChanServ to be in the channel for easier
management: `/msg chanserv set #channel GUARD on`
* Add some other operators/managers to the access list. This would allow
them to manage the channel if you are asleep or absent:
+
....
/msg chanserv access #channel add NICK +ARfiorstv
....
You can see what the various flags mean at
http://toxin.jottit.com/freenode_chanserv_commands#cs03
You may want to consider adding some or all of the folks in #fedora-ops
who manage other channels to help you with yours. You can see this list
with `/msg chanserv access #fedora-ops list`.
* Set default modes. `/msg chanserv set mlock #channel +Ccnt` (The t for
topic lock is optional, if your channel would like to have people change
the topic often).
* If your channel is of general interest, add it to the main communicate
page of IRC Channels, and possibly announce it to your target audience.
* You may want to request that zodbot join your channel if you need its
functions. You can request that in #fedora-admin.
== Recovering/fixing an existing channel
If there is an existing channel in the #fedora-* namespace that has a
missing founder/operator, please contact the Fedora Group Coordinator
(User:Spot) and request it be reassigned. Follow the above procedure
on the channel once done so it is set up and has enough operators/managers
to not need reassigning again.

View file

@ -0,0 +1,149 @@
= Freshmaker SOP
[NOTE]
.Note
====
Freshmaker is very new and changing rapidly. We'll try to keep this up
to date as best we can.
====
Freshmaker is a service that watches message bus activity and tries
to rebuild _compound_ artifacts when their constituent pieces change.
== Contact Information
Owner::
Factory2 Team, Release Engineering Team, Infrastructure Team
Contact::
#fedora-modularity, #fedora-admin, #fedora-releng
Persons::
jkaluza, cqi, qwan, sochotni, threebean
Location::
Phoenix
Public addresses::
* freshmaker.fedoraproject.org
Servers::
* freshmaker-frontend0[1-2].phx2.fedoraproject.org
* freshmaker-backend01.phx2.fedoraproject.org
Purpose::
Rebuild compound artifacts. See description for more detail.
== Description
See also
http://fedoraproject.org/wiki/Infrastructure/Factory2/Focus/Freshmaker
for some of the original (old) thinking on Freshmaker.
As per the summary above, Freshmaker is a bus-oriented system that
watches for changes to smaller pieces of content, and triggers rebuilds
of larger pieces of content.
It doesn't do the actual _builds_ itself, but instead requests rebuilds
in our existing build systems.
It handles a number of different content types. In Fedora, we would like
to roll out rebuilds in the following order:
=== Module Builds
When a spec file changes on a particular dist-git branch, trigger
rebuilds of all modules that declare dependencies on that rpm branch.
Consider the _traditional workflow_ today. You make a patch to the
[.title-ref]#f27# of your package, and you know you need to build that
patch for f27, and then later submit an update for this single build.
Packagers know what to do.
Consider the _modular workflow_. You make a patch to the
`2.2` branch of your package, but now, which modules do you
rebuild? Maybe you had one in mind that you wanted to fix, but are there
others that you forgot about -- that you don't even know about? Kevin
could maintain a module that pulls in my rpm branch and he never told
me. Even if he did, I have to now maintain a list of modules that depend
on my rpm, and request rebuilds of them every time I patch my .spec file.
This is unmanageable.
Freshmaker deals with this by watching the bus for dist-git fedmsg
messages. When it sees a change on a branch, it looks up the list of
modules that depend on that branch, and requests rebuilds of them in the
MBS.
=== Container Slow Flow
When a traditional rpm or modular rpm is _shipped stable_, this triggers
rebuilds of all containers that ever included previous versions of this
rpm.
This applies to both modular and non-modular contexts. Today, you build
an rpm that fixes a CVE, but _some other person_ maintains a container
that includes your RPM. Maybe they never told you about this. Maybe they
didn't notice your CVE fix. Their container will remain outdated and
vulnerable.. forever?
Freshmaker deals with this by watching the bus for dist-git messages
about rpms being shipped to the stable updates repo. When they're
shipped, it looks up all containers that ever included previous versions
of the rpm in question, and it triggers rebuilds of them.
_Waiting_ until the rpm ships to stable is _necessary_ because the
container build process doesn't know about unshipped content. This is
how containers are built manually today, and it is annoying. Which
brings us to the more complicated...
=== Container Fast Flow
When a traditional rpm or modular rpm is _signed_, generate a repo
containing it and rebuild all containers that ever included that rpm
before. This is the better version of the slow flow, but is more
complicated so we're deferring it until after we've proved the first two
cases out.
Freshmaker will do this by requesting an interim build repo from ODCS
(the On Demand Compose Service). ODCS can be given the appropriate koji
tag and will produce a repo of (pre-signed) rpms. Freshmaker will
request a rebuild of the container and will pass the ODCS repo url in.
This gives us an auditable trail of disposable repos.
== Systems
There is a frontend and a backend.
Everything in the previous section describes the backend behavior.
The frontend exists to provide an HTTP API that can be queried to find
out the status of the backend: What is it doing? What is it planning to
do? What has it done already?
== Observing Freshmaker Behavior
There is currently no command line tool to query Freshmaker, but
Freshmaker provides a REST API which can be used to observe Freshmaker
behavior. This is available at the following URLs:
* https://freshmaker.fedoraproject.org/api/1/events
* https://freshmaker.fedoraproject.org/api/1/builds
The first [.title-ref]#/events# URL should return a list of events that
Freshmaker has noticed, recorded, and is handling. Handled events should
produce associated builds.
The second [.title-ref]#/builds# URL should return a list of builds that
Freshmaker has submitted and is monitoring. Each build should be
traceable back to the event that triggered it.
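For example, to eyeball recent activity from a shell:
....
curl -s https://freshmaker.fedoraproject.org/api/1/events | python -m json.tool | less
curl -s https://freshmaker.fedoraproject.org/api/1/builds | python -m json.tool | less
....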
== Logs
The frontend logs are on freshmaker-frontend0[1-2] in
`/var/log/httpd/error_log`.
The backend logs are on freshmaker-backend01. Look in the journal for
the [.title-ref]#fedmsg-hub# service.
== Upgrading
The package in question is [.title-ref]#freshmaker#. Please use the
[.title-ref]#playbooks/manual/upgrade/freshmaker.yml# playbook.
== Things that could go wrong
TODO. We don't know yet. Probably lots of things.

View file

@ -0,0 +1,39 @@
= Fedora gather easyfix SOP
Fedora-gather-easyfix, as the name says, gathers tickets marked as easyfix
from multiple sources (Pagure, GitHub and fedorahosted currently),
providing a single place for newcomers to find small tasks to work on.
== Contents
[arabic]
. Contact Information
. Documentation Links
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin
Location::
http://fedoraproject.org/easyfix/
Servers::
sundries01, sundries02, sundries01.stg
Purpose::
Gather easyfix tickets from multiple sources.
Upstream sources are hosted on github at:
https://github.com/fedora-infra/fedora-gather-easyfix/
The files are then mirrored to our ansible repo, under the
`easyfix/gather` role.
The project is a simple script, `gather_easyfix.py`, gathering information
from the projects set on the
https://fedoraproject.org/wiki/Easyfix[Fedora wiki] and outputting a
single HTML file. This HTML file is then improved via the CSS and
JavaScript files present in the sources.
The generated HTML file, together with the CSS and JS files, is then
synced to the proxies for public consumption. :)

View file

@ -0,0 +1,121 @@
= GDPR Delete SOP
This SOP covers how Fedora Infrastructure handles General Data
Protection Regulation (GDPR) Delete Requests. It contains information
about how system administrators will use tooling to respond to Delete
requests, as well as how application developers can integrate their
applications with that tooling.
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin
Persons::
nirik
Location::
Phoenix
Servers::
batcave01.phx2.fedoraproject.org, plus various application servers, which
will run scripts to delete data.
Purpose::
Respond to Delete requests.
== Responding to a Deletion Request
This section covers how a system administrator will use our
`gdpr-delete.yml` playbook to respond to a Delete request.
When processing a Delete request, perform the following steps:
[arabic, start=0]
. Verify that the requester is who they say they are. If the request
came in email ask them to file an issue at
https://pagure.io/fedora-pdr/new_issue Use the following in email reply
to them:
+
`In order to verify your identity, please file a new issue at https://pagure.io/fedora-pdr/new_issue using the appropriate issue type. Please note this form requires you to sign in to your account to verify your identity.`
+
If the request has come via Red Hat internal channels as an explicit
request to delete, mark the ticket with the tag `rh`. This tag will help
delineate requests for any future reporting needs.
+
If they do not have a FAS account, indicate to them that there is no
data to be deleted. Use this response:
+
`Your request for deletion has been reviewed. Since there is no related account in the Fedora Account System, the Fedora infrastructure does not store data relevant for this deletion request. Note that some public content related to Fedora you may have previously submitted without an account, such as to public mailing lists, is not deleted since accurate maintenance of this data serves Fedora's legitimate business interests, the public interest, and the interest of the open source community.`
. Identify the user's FAS account name. The Delete playbook will use this
FAS account to delete the required data. Update the fedora-pdr issue
saying the request has been received. There is a 'quick response' in the
pagure issue tracker to note this.
. Login to FAS and clear the `Telephone number` entry, set Country to
`Other`, clear `Latitude`, `Longitude`, `IRC Nick` and `GPG Key ID`,
set `Time Zone` to UTC and `Locale` to `en`, and set the user status to
`disabled`. If the user is not in cla_done plus one group, you are done:
update the ticket and close it. This step will be folded into the
following one once we implement it.
. If the user is in cla_done + one group, they may have additional data:
Run the gdpr delete playbook on `batcave01`. You will need to define one
Ansible variable for the playbook: `gdpr_delete_fas_user`, the FAS
username of the user.
+
....
$ sudo ansible-playbook playbooks/manual/gdpr/delete.yml -e gdpr_delete_fas_user=bowlofeggs
....
+
After the script completes, update the ticket that the request is
completed and close it. There is a 'quick response' in the pagure issue
tracker to note this.
== Integrating an application with our delete playbook
This section covers how an infrastructure application can be configured
to integrate with our `delete.yml` playbook. To integrate, you must
create a script and Ansible variables so that your application is
compatible with this playbook.
=== Script
You need to create a script and have your project's Ansible role install
that script somewhere (most likely on a host from your project - for
example fedocal's is going on `fedocal01`.) It's not a bad idea to put
your script into your upstream project. This script should accept one
environment variable as input: `GDPR_DELETE_USERNAME`. This will be a
FAS username.
Some scripts may need secrets embedded in them - if you must do this be
careful to install the script with `0700` permissions, ensuring that
only `gdpr_delete_script_user` (defined below) can run them. Bodhi
worked around this concern by having the script run as `apache` so it
could read Bodhi's server config file to get the secrets, so it does not
have secrets in its script.
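For reference, a minimal sketch of such a script; the application,
database and table below are hypothetical, not a real Fedora service:

....
#!/bin/bash
# Sketch of a GDPR delete script. "myapp" and its table are hypothetical.
set -euo pipefail

# The delete playbook exports the FAS username to remove.
user="${GDPR_DELETE_USERNAME:?GDPR_DELETE_USERNAME must be set}"

# Remove the user's personal data from the application's database.
psql -d myapp -c "DELETE FROM user_profiles WHERE username = '${user}';"
....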
=== Variables
In addition to writing a script, you need to define some Ansible
variables for the host that will run your script:
[cols=",,",options="header",]
|===
|Variable |Description |Example
|`gdpr_delete_script` |The full path to the script. |`/usr/bin/fedocal-delete`
|`gdpr_delete_script_user` |The user the script should be run as |`apache`
|===
You also need to add the host that the script should run on to the
`[gdpr_delete]` group in `inventory/inventory`:
....
[gdpr_delete]
fedocal01.phx2.fedoraproject.org
....

View file

@ -0,0 +1,153 @@
= GDPR SAR SOP
This SOP covers how Fedora Infrastructure handles General Data
Protection Regulation (GDPR) Subject Access Requests (SAR). It contains
information about how system administrators will use tooling to respond
to SARs, as well as how application developers can integrate their
applications with that tooling.
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin
Persons::
bowlofeggs
Location::
Phoenix
Servers::
batcave01.phx2.fedoraproject.org Various application servers, which
will run scripts to collect SAR data.
Purpose::
Respond to SARs.
== Responding to a SAR
This section covers how a system administrator will use our `sar.yml`
playbook to respond to a SAR.
When processing a SAR, perform the following steps:
[arabic, start=0]
. Verify that the requester is who they say they are. If the request
came in email and the user has a FAS account, ask them to file an issue
at https://pagure.io/fedora-pdr/new_issue Use the following in email
reply to them:
+
`In order to verify your identity, please file a new issue at https://pagure.io/fedora-pdr/new_issue using the appropriate issue type. Please note this form requires you to sign in to your account to verify your identity.`
+
If the request has come via Red Hat internal channels as an explicit
request to delete, mark the ticket with the tag `rh`. This tag will help
delineate requests for any future reporting needs.
. Identify an e-mail address for the requester, and if applicable, their
FAS account name. The SAR playbook will use both of these since some
applications have data associated with FAS accounts and others have data
associated with e-mail addresses. Update the fedora-pdr issue saying the
request has been received. There is a 'quick response' in the pagure
issue tracker to note this.
. Run the SAR playbook on `batcave01`. You will need to define three
Ansible variables for the playbook. `sar_fas_user` will be the FAS
username, if applicable; this may be omitted if the requester does not
have a FAS account. `sar_email` will be the e-mail address associated
with the user. `sar_tar_output_path` will be the path you want the
playbook to write the resulting tarball to, and should have a `.tar.gz`
extension. For example, if `bowlofeggs` submitted a SAR and his e-mail
address is `bowlof@eggs.biz`, you might run the playbook like this:
+
....
$ sudo ansible-playbook playbooks/manual/gdpr/sar.yml -e sar_fas_user=bowlofeggs \
-e sar_email=bowlof@eggs.biz -e sar_tar_output_path=/home/bowlofeggs/bowlofeggs.tar.gz
....
. Generate a random sha512 with something like:
`openssl rand 512 | sha512sum` and then move the output file to
/srv/web/infra/pdr/the-sha512.tar.gz
. Update the ticket to fixed / processed on pdr requests to have a link
to https://infrastructure.fedoraproject.org/infra/pdr/the-sha512.tar.gz
and tell them it will be available for one week.
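A sketch of steps 4 and 5 from a shell, reusing the paths from the
example above:

....
$ hash=$(openssl rand 512 | sha512sum | awk '{print $1}')
$ sudo mv /home/bowlofeggs/bowlofeggs.tar.gz /srv/web/infra/pdr/${hash}.tar.gz
$ echo "https://infrastructure.fedoraproject.org/infra/pdr/${hash}.tar.gz"
....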
== Integrating an application with our SAR playbook
This section covers how an infrastructure application can be configured
to integrate with our `sar.yml` playbook. To integrate, you must create
a script and Ansible variables so that your application is compatible
with this playbook.
=== Script
You need to create a script and have your project's Ansible role install
that script somewhere (most likely on a host from your project - for
example Bodhi's is going on `bodhi-backend02`.) It's not a bad idea to
put your script into your upstream project - there are plans for
upstream Bodhi to ship `bodhi-sar`, for example. This script should
accept two environment variables as input: `SAR_USERNAME` and
`SAR_EMAIL`. Not all applications will use both, so do what makes sense
for your application. The first will be a FAS username and the second
will be an e-mail address. Your script should gather the required
information related to those identifiers and print it in a machine
readable format to stdout. Bodhi, for example, prints information to
stdout in `JSON`.
Some scripts may need secrets embedded in them - if you must do this be
careful to install the script with `0700` permissions, ensuring that
only `sar_script_user` (defined below) can run them. Bodhi worked around
this concern by having the script run as `apache` so it could read
Bodhi's server config file to get the secrets, so it does not have
secrets in its script.
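As an illustration, a minimal SAR script could look like the following; a
real script would query the application's own storage instead of emitting
an empty record:

....
#!/bin/bash
# Sketch of a SAR script: print machine-readable data for the user to stdout.
set -euo pipefail

# The SAR playbook exports one or both of these variables.
user="${SAR_USERNAME:-}"
email="${SAR_EMAIL:-}"

# Replace this with real queries against the application's data store.
printf '{"username": "%s", "email": "%s", "records": []}\n' "$user" "$email"
....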
=== Variables
In addition to writing a script, you need to define some Ansible
variables for the host that will run your script:
[cols=",,",options="header",]
|===
|Variable |Description |Example
|`sar_script` |The full path to the script. |`/usr/bin/bodhi-sar`
|`sar_script_user` |The user the script should be run as |`apache`
|`sar_output_file` |The name of the file to write into the output
tarball |`bodhi.json`
|===
You also need to add the host that the script should run on to the
`[sar]` group in `inventory/inventory`:
....
[sar]
bodhi-backend02.phx2.fedoraproject.org
....
=== Variables for OpenShift apps
When you need to add OpenShift app to SAR playbook, you need to add
following variables to existing `sar_openshift` dictionary:
[cols=",,",options="header",]
|===
|Variable |Description |Example
|`sar_script` |The full path to the script. |`/usr/local/bin/sar.py`
|`sar_output_file` |The name of the file to write into the output
tarball |`anitya.json`
|`openshift_namespace` |The namespace in which the application is
running |`release-monitoring`
|`openshift_pod` |The pod name in which the script will be run
|`release-monitoring-web`
|===
The `sar_openshift` dictionary is located in
`inventory/group_vars/os_masters`:
....
sar_openshift:
# Name of the app
release-monitoring:
sar_script: /usr/local/bin/sar.py
sar_output_file: anitya.json
openshift_namespace: release-monitoring
openshift_pod: release-monitoring-web
....

View file

@ -0,0 +1,62 @@
= geoip-city-wsgi SOP
A simple web service that returns geoip information as a JSON-formatted
dictionary in utf-8. In particular, it's used by anaconda[1] to get the
most probable territory code, based on the public IP of the caller.
== Contents
[arabic]
. Contact Information
. Basic Function
. Ansible Roles
. Apps depending of geoip-city-wsgi
. Documentation Links
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-apps, #fedora-admin, #fedora-noc
Location::
https://geoip.fedoraproject.org
Servers::
sundries*, sundries*-stg
Purpose::
A simple web service that returns geoip information as a JSON-formatted
dictionary in utf-8. In particular, it's used by anaconda[1] to get the
most probable territory code, based on the public IP of the caller.
== Basic Function
* Users go to https://geoip.fedoraproject.org/city
* The website is exposed via
`/etc/httpd/conf.d/geoip-city-wsgi-proxy.conf`.
* Returns geoip information as a JSON-formatted dict in utf-8
* It also currently accepts one override: ?ip=xxx.xxx.xxx.xxx, e.g.
https://geoip.fedoraproject.org/city?ip=18.0.0.1 which then uses the
passed IP address instead of the determined IP address of the client.
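For example, to query the service from a shell:

....
$ curl https://geoip.fedoraproject.org/city
$ curl "https://geoip.fedoraproject.org/city?ip=18.0.0.1"
....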
== Ansible Roles
The geoip-city-wsgi role
https://pagure.io/fedora-infra/ansible/blob/main/f/roles/geoip-city-wsgi
is present in the sundries playbook
https://pagure.io/fedora-infra/ansible/blob/main/f/playbooks/groups/sundries.yml
and the proxy tasks are present in
https://pagure.io/fedora-infra/ansible/blob/main/f/playbooks/include/proxies-reverseproxy.yml
== Apps depending on geoip-city-wsgi
unknown
== Documentation Links
* app: https://geoip.fedoraproject.org
* source: https://github.com/fedora-infra/geoip-city-wsgi
* bugs: https://github.com/fedora-infra/geoip-city-wsgi/issues
* role: https://pagure.io/fedora-infra/ansible/blob/main/f/tree/roles/geoip-city-wsgi
[1] https://fedoraproject.org/wiki/Anaconda

View file

@ -0,0 +1,67 @@
= Using github for Infra Projects
We're presently using github to host git repositories and issue tracking
for some infrastructure projects. Anything we need to know should be
recorded here.
== Setting up a new repo
Create projects inside of the fedora-infra group:
https://github.com/fedora-infra
That will allow us to more easily track what projects we have.
[TODO] How do we create a new project and import it?
* After creating a new repo, click on the Settings tab to set up some
fancy things.
+
If using git-flow for your project:
** Set the default branch from 'master' to 'develop'. Having the default
branch be develop is nice: new contributors will automatically start
committing there if they're not paying attention to what branch they're
on. You almost never want to commit directly to the master branch.
+
If there does not exist a develop branch, you should create one by
branching off of master.:
+
....
$ git clone GIT_URL
$ git checkout -b develop
$ git push --all
....
** Set up an IRC hook for notifications. From the "settings" tab click
on "Webhooks & Services." Under the "Add Service" dropdown, find "IRC"
and click it. You might need to enter your password. In the form, you
probably want the following values:
*** Server, irc.freenode.net
*** Port, 6697
*** Room, #fedora-apps
*** Nick, <nothing>
*** Branch Regexes, <nothing>
*** Password, <nothing>
*** Ssl, <on>
*** Message Without Join, <on>
*** No Colors, <off>
*** Long Url, <off>
*** Notice, <on>
*** Active, <on>
== Add an EasyFix label
The EasyFix label is used to mark bugs that are potentially fixable by
new contributors getting used to our source code or relatively new to
python programming. GitHub doesn't provide this label automatically so
we have to add it. You can add the label from the issues page of the
repository or use this curl command to add it:
....
curl -k -u '$GITHUB_USERNAME:$GITHUB_PASSWORD' https://api.github.com/repos/fedora-infra/python-fedora/labels -H "Content-Type: application/json" -d '{"name":"EasyFix","color":"3b6eb4"}'
....
Please try to use the same color for consistency between Fedora
Infrastructure Projects. You can then add the github repo to the list
that easyfix.fedoraproject.org scans for easyfix tickets here:
https://fedoraproject.org/wiki/Easyfix

View file

@ -0,0 +1,50 @@
= github2fedmsg SOP
Bridge github events onto our fedmsg bus.
* App: https://apps.fedoraproject.org/github2fedmsg/
* Source: https://github.com/fedora-infra/github2fedmsg/
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-apps, #fedora-admin, #fedora-noc
Servers::
github2fedmsg01
Purpose::
Bridge github events onto our fedmsg bus.
== Description
github2fedmsg is a small Python Pyramid app that bridges github events
onto our fedmsg bus by way of github's "webhooks" feature. It is what
allows us to have IRC notifications of github activity via fedmsg. It
has two phases of operation:
* Infrequently, a user will log in to github2fedmsg via Fedora OpenID.
They then push a button to also log in to github.com. They are then
logged in to github2fedmsg with _both_ their FAS account and their
github account.
+
They are then presented with a list of their github repositories. They
can toggle each one: "on" or "off". When they turn a repo on, our webapp
makes a request to github.com to install a "webhook" for that repo with
a callback URL to our app.
* When events happen to that repo on github.com, github looks up our
callback URL and makes an http POST request to us, informing us of the
event. Our github2fedmsg app receives that, validates it, and then
republishes the content to our fedmsg bus.
== What could go wrong?
* Restarting the app or rebooting the host shouldn't cause a problem. It
should come right back up.
* Our database could die. We have a db with a list of all the repos we
have turned on and off. We would want to restore that from backup.
* If github gets compromised, they might have to revoke all of their
application credentials. In that case, our app would fail to work. There
are _lots_ of private secrets set in our private repo that allow our app
to talk to github.com. There are inline comments there with instructions
about how to generate new keys and secrets.

View file

@ -0,0 +1,26 @@
= Gitweb Infrastructure SOP
Gitweb-caching is the web interface we use to expose git to the web at
http://git.fedorahosted.org/git/
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin, sysadmin-hosted
Location::
Serverbeach
Servers::
hosted[1-2]
Purpose::
Http access to git sources.
== Basic Function
* Users go to http://git.fedorahosted.org/git/
* Pages are generated from cache stored in `/var/cache/gitweb-caching/`.
* The website is exposed via
`/etc/httpd/conf.d/git.fedoraproject.org.conf`.
* Main config file is `/var/www/gitweb-caching/gitweb_config.pl`. This
pulls git repos from /git/.

View file

@ -0,0 +1,112 @@
= Greenwave SOP
== Contact Information
Owner::
Factory2 Team, Fedora QA Team, Infrastructure Team
Contact::
#fedora-qa, #fedora-admin
Persons::
gnaponie (giulia), mprahl, lucarval, ralph (threebean)
Location::
Phoenix
Public addresses::
* https://greenwave-web-greenwave.app.os.fedoraproject.org/api/v1.0/version
* https://greenwave-web-greenwave.app.os.fedoraproject.org/api/v1.0/policies
* https://greenwave-web-greenwave.app.os.fedoraproject.org/api/v1.0/decision
Servers::
* In OpenShift.
Purpose::
Provide gating decisions.
== Description
* See
http://fedoraproject.org/wiki/Infrastructure/Factory2/Focus/Greenwave[the
focus document] for background.
* See https://pagure.io/docs/greenwave/[the upstream docs] for more
detailed info.
Greenwave's job is:
* answering yes/no questions (or making decisions)
* about artifacts (RPM packages, source tarballs, …)
* at certain gating points in our pipeline
* based on test results
* according to some policy
In particular, we'll be using Greenwave to provide yes/no gating
decisions _to Bodhi_ about rpms in each update. Greenwave will do this
by consulting resultsdb and waiverdb for individual test results and
then combining those results into an aggregate decision.
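As an illustration, a decision can be requested with a POST to the
decision endpoint. The decision context, product version and subject
below are only examples; the real values depend on the configured
policies:

....
$ curl -s -X POST \
    https://greenwave-web-greenwave.app.os.fedoraproject.org/api/v1.0/decision \
    -H 'Content-Type: application/json' \
    -d '{"decision_context": "bodhi_update_push_stable",
         "product_version": "fedora-34",
         "subject_type": "koji_build",
         "subject_identifier": "example-1.0-1.fc34"}'
....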
The _policies_ for how those results should be combined or ignored, are
defined in ansible in
`roles/openshift-apps/greenwave/templates/configmap.yml`. We expect to
grow these over time to new use cases (rawhide compose gating, etc..)
== Observing Greenwave Behavior
Login to `os-master01.phx2.fedoraproject.org` as `root` (or,
authenticate remotely with openshift using
`oc login https://os.fedoraproject.org`), and run:
....
$ oc project greenwave
$ oc status -v
$ oc logs -f dc/greenwave-web
....
== Database
Greenwave currently has no database (and we'd like to keep it that way).
It relies on `resultsdb` and `waiverdb` for information.
== Upgrading
You can roll out configuration changes by changing the files in
`roles/openshift-apps/greenwave/` and running the
`playbooks/openshift-apps/greenwave.yml` playbook.
To understand how the software is deployed, take a look at these two
files:
* `roles/openshift-apps/greenwave/templates/imagestream.yml`
* `roles/openshift-apps/greenwave/templates/buildconfig.yml`
See that we build a fedora-infra specific image on top of an app image
published by upstream. The `latest` tag is automatically deployed to
staging. This should represent the latest commit to the `master` branch
of the upstream git repo that passed its unit and functional tests.
The `prod-fedora` tag is manually controlled. To upgrade prod to match
what is in stage, move the `prod-fedora` tag to point to the same image
as the `latest` tag. Our buildconfig is configured to poll that tag, so
a new os.fp.o build and deployment should be automatically created.
You can watch the build and deployment with `oc` commands.
You can poll this URL to see what version is live at the moment:
https://greenwave-web-greenwave.app.os.fedoraproject.org/api/v1.0/version
== Troubleshooting
In case of problems with greenwave messaging, check the logs of the
container dc/greenwave-fedmsg-consumers to see if there is something
wrong:
....
$ oc logs -f dc/greenwave-fedmsg-consumers
....
It is also possible to check if greenwave is actually publishing
messages by looking at
https://apps.fedoraproject.org/datagrepper/raw?category=greenwave&delta=127800&rows_per_page=1[this
link] and checking the time of the last message.
In case of problems with greenwave webapp, check the logs of the
container dc/greenwave-web:
....
$ oc logs -f dc/greenwave-web
....

View file

@ -0,0 +1,134 @@
= Guest Disk Resize SOP
Resize disks in our kvm guests
== Contents
[arabic]
. Contact Information
. How to do it
.. KVM/libvirt Guests
== Contact Information
Owner:::
Fedora Infrastructure Team
Contact:::
#fedora-admin, sysadmin-main
Location:::
PHX, Tummy, ibiblio, Telia, OSUOSL
Servers:::
All xen servers, kvm/libvirt servers.
Purpose:::
Resize guest disks
== How to do it
=== KVM/libvirt Guests
[arabic]
. SSH to the kvm server and resize the guest's logical volume. If you
want to be extra careful, make a snapshot of the LV first:
+
....
lvcreate -n [guest name]-snap -L 10G -s /dev/VolGroup00/[guest name]
....
+
Optional, but always good to be careful
. Shutdown the guest:
+
....
sudo virsh shutdown [guest name]
....
. Disable the guest's lv:
+
....
lvchange -an /dev/VolGroup00/[guest name]
....
. Resize the lv:
+
....
lvresize -L [NEW TOTAL SIZE]G /dev/VolGroup00/[guest name]
or
lvresize -L +XG /dev/VolGroup00/[guest name]
(to add X GB to the disk)
....
. Enable the lv:
+
....
lvchange -ay /dev/VolGroup00/[guest name]
....
. Bring the guest back up:
+
....
sudo virsh start [guest name]
....
. Login into the guest:
+
....
sudo virsh console [guest name]
You may wish to boot single user mode to avoid services coming up and going down again
....
. On the guest, run:
+
....
fdisk /dev/vda
....
. Delete the LVM partition on the guest you want to add space to and
recreate it with the maximum size. Make sure to set its type to LV (8e):
+
....
p to list partitions
d to delete selected partition
n to create new partition (default values should be ok)
t to change partition type (set to 8e)
w to write changes
....
. Run partprobe:
+
....
partprobe
....
. Check the size of the partition:
+
....
fdisk -l /dev/vdaN
....
+
If this still reflects the old size, then reboot the guest and verify
that its size changed correctly when it comes up again.
. Login to the guest again, and run:
+
....
pvresize /dev/vdaN
....
. A vgs should now show the new size. Use lvresize to resize the root
lv:
+
....
lvresize -L [new root partition size]G /dev/GuestVolGroup00/root
(pvs will tell you how much space is available)
....
. Finally, resize the root partition:
+
....
resize2fs /dev/GuestVolGroup00/root
(If the root fs is ext4)
or
xfs_growfs /dev/GuestVolGroup00/root
(if the root fs is xfs)
....
+
Verify that everything worked out, and delete the snapshot you made if
you made one.

View file

@ -0,0 +1,80 @@
= Guest Editing SOP
Various virsh commands
== Contents
[arabic]
. Contact Information
. How to do it
.. add/remove cpus
.. resize memory
== Contact Information
Owner:::
Fedora Infrastructure Team
Contact:::
#fedora-admin, sysadmin-main
Location:::
PHX, Tummy, ibiblio, Telia, OSUOSL
Servers:::
All xen servers, kvm/libvirt servers.
Purpose:::
Add/remove guest CPUs and resize guest memory
== How to do it
=== Add cpu
[arabic]
. SSH to the virthost server
. Calculate the number of CPUs the system needs
. `sudo virsh setvcpus <guest> <num_of_cpus> --config` - ie:
+
....
sudo virsh setvcpus bapp01 16 --config
....
. Shutdown the virtual system
. Start the virtual system
[NOTE]
.Note
====
Note that using [.title-ref]#virsh reboot# is insufficient. You have to
actually stop the domain and start it with `virsh destroy <guest>` and
`virsh start <guest>` for the change to take effect.
====
[arabic, start=6]
. Login and check that cpu count matches
. *Remember to update the group_vars in ansible* to match the new value
you set, if appropriate.
=== Resize memory
[arabic]
. SSH to the virthost server
. Calculate the amount of memory the system needs in kb
. `sudo virsh setmem <guest> <num_in_kilobytes> --config` - ie:
+
....
sudo virsh setmem bapp01 16777216 --config
....
. Shutdown the virtual system
. Start the virtual system
[NOTE]
.Note
====
Note that using [.title-ref]#virsh reboot# is insufficient. You have to
actually stop the domain and start it with `virsh destroy <guest>` and
`virsh start <guest>` for the change to take effect.
====
[arabic, start=6]
. Login and check that memory matches
. *Remember to update the group_vars in ansible* to match the new value
you set, if appropriate.

View file

@ -0,0 +1,143 @@
= Haproxy Infrastructure SOP
haproxy is an application that does load balancing at the tcp layer or
at the http layer. It can do generic tcp balancing but it does
specialize in http balancing. Our proxy servers are still running apache
and that is what our users connect to. But instead of using
mod_proxy_balancer and ProxyPass balancer://, we do a ProxyPass to
http://localhost:10001/ or http://localhost:10002/. haproxy must
be told to listen to an individual port for each farm. All haproxy farms
are listed in /etc/haproxy/haproxy.cfg.
== Contents
[arabic]
. Contact Information
. How it works
. Configuration example
. Stats
. Advanced Usage
== Contact Information
Owner:::
Fedora Infrastructure Team
Contact:::
#fedora-admin, sysadmin-main, sysadmin-web group
Location:::
Phoenix, Tummy, Telia
Servers:::
proxy1, proxy2, proxy3, proxy4, proxy5
Purpose:::
Provides load balancing from the proxy layer to our application layer.
== How it works
haproxy is a load balancer. If you're familiar, this section won't be
that interesting. haproxy in its normal usage acts just like a web
server. It listens on a port for requests. Unlike most webservers though
it then sends that request to one of our back end application servers
and sends the response back. This is referred to as reverse proxying. We
typically configure haproxy to send a check to a specific url and look for
the response code. If this url isn't set, it just does basic checks to
/. In most of our configurations we're using round robin balancing. That is,
request 1 goes to app1, request 2 goes to app2, request 3 goes to app3,
request 4 goes to app1, and the whole process repeats.
[WARNING]
.Warning
====
These checks do add load to the app servers. As well as additional
connections. Be smart about which url you're checking as it gets checked
often. Also be sure to verify the application servers can handle your
new settings, monitor them closely for the hour or two after you make
changes.
====
== Configuration example
The below example is how our fedoraproject wiki could be configured.
Each application should have its own farm. Even though it may have an
identical configuration to another farm, this allows easy addition and
subtraction of specific nodes when we need them.:
....
listen fpo-wiki 0.0.0.0:10001
balance roundrobin
server app1 app1.fedora.phx.redhat.com:80 check inter 2s rise 2 fall 5
server app2 app2.fedora.phx.redhat.com:80 check inter 2s rise 2 fall 5
server app4 app4.fedora.phx.redhat.com:80 backup check inter 2s rise 2 fall 5
option httpchk GET /wiki/Infrastructure
....
* The first line "listen ...." Says to create a farm called 'fpo-wiki'.
Listening on all IP's on port 10001. fpo-wiki can be arbitrary but make
it something obvious. Aside from that the important bit is :10001.
Always make sure that when creating a new farm, its listening on a
unique port. In Fedora's case we're starting at 10001, and moving up by
one. Just check the config file for the lowest open port above 10001.
* The next line "balance roundrobin" says to use round robin balancing.
* The server lines each add a new node to the balancer farm. In this
case the wiki is being served from app1, app2 and app4. If the wiki is
available at http://app1.fedora.phx.redhat.com/wiki/, then this
config would be used in conjunction with "RewriteRule ^/wiki/(.*)
http://localhost:10001/wiki/$1 [P,L]".
* 'server' means we're adding a new node to the farm
* 'app1' is the worker name; it is analogous to fpo-wiki but should
match the short hostname of the node to make it easy to follow.
* 'app1.fedora.phx.redhat.com:80' is the hostname and port to be
contacted.
* 'check' means to check via bottom line "option httpchk GET
/wiki/Infrastructure" which will use /wiki/Infrastructure to verify the
wiki is working. If that URL fails, that entire node will be taken out
of the farm mix.
* 'inter 2s' means to check every 2 seconds. 2s is the same as 2000 in
this case.
* 'rise 2' means to not put this node back in the mix until it has had
two successful connections in a row. haproxy will continue to check
every 2 seconds whether a node is up or down
* 'fall 5' means to take a node out of the farm after 5 failures.
* 'backup' You'll notice that app4 has a 'backup' option. We don't
actually use this for the wiki but do for other farms. It basically
means to continue checking and treat this node like any other node but
don't send it any production traffic unless the other two nodes are
down.
All of these options can be tweaked so keep that in mind when changing
or building a new farm. There are other configuration options in this
file that are global. Please see the haproxy documentation for more
info:
....
/usr/share/doc/haproxy-1.3.14.6/haproxy-en.txt
....
== Stats
In order to view the stats for a farm please see the stats page. Each
proxy server has its own stats page since each one is running its own
haproxy server. To view the stats point your browser to
https://admin.fedoraproject.org/haproxy/shorthostname/ so proxy1 is at
https://admin.fedoraproject.org/haproxy/proxy1/ The trailing / is
important.
* https://admin.fedoraproject.org/haproxy/proxy1/
* https://admin.fedoraproject.org/haproxy/proxy2/
* https://admin.fedoraproject.org/haproxy/proxy3/
* https://admin.fedoraproject.org/haproxy/proxy4/
* https://admin.fedoraproject.org/haproxy/proxy5/
== Advanced Usage
haproxy has some more advanced usage that we've not needed to worry
about yet but is worth mentioning. For example, one could send users to
just one app server based on session id. If user A happened to hit app1
first and user B happened to hit app4 first. All subsequent requests for
user A would go to app1 and user B would go to app4. This is handy for
applications that cannot normally be balanced because of shared storage
needs or other locking issues. This won't solve all problems though and
can have negative effects: for example, when app1 goes down, user A would
either lose their session or be unable to work until app1 comes back
up. Please do some thorough testing before looking into this option.
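If we ever did want it, cookie-based persistence is a small change to a
farm definition. A sketch (not used in any current farm):

....
listen fpo-app 0.0.0.0:10002
    balance roundrobin
    cookie SERVERID insert indirect nocache
    server app1 app1.fedora.phx.redhat.com:80 cookie app1 check inter 2s rise 2 fall 5
    server app2 app2.fedora.phx.redhat.com:80 cookie app2 check inter 2s rise 2 fall 5
....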

View file

@ -0,0 +1,191 @@
= Fedorahosted migrations
Migrating hosted repositories to that of another type.
== Contents
[arabic]
. Contact Information
. Description
. SVN to GIT migration
.. Questions left to be answered with this SOP
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin, sysadmin-hosted
Location::
Serverbeach
Servers::
hosted1, hosted2
Purpose::
Migrate hosted SCM repositories to that of another SCM.
== Description
fedorahosted.org can be used to host open source projects. Occasionally
those projects want to change the SCM they utilize. This document
provides documentation for doing so.
[arabic]
. An SCM for maintaining the code. The currently supported SCMs include
Mercurial, Git, Bazaar, or SVN. Note: there is no CVS.
. A trac instance, which provides a mini-wiki for hosting information
and also provides a ticketing system.
. A mailing list
[IMPORTANT]
.Important
====
This page is for administrators only. People wishing to request a hosted
project should use the Ticketing System; see the new project
request template. (Requires Fedora Account)
====
== SVN to GIT migration
=== FAS User Prep
Currently you must manually generate $PROJECTNAME-users.txt by grabbing
a list of people in the FAS group and recording them in the following
format:
....
$fasusername = FirstName LastName <$emailaddress>
....
This is error prone, and will stop the git-svn fetch below if an author
appears that doesn't exist in the list of users:
....
svn log --quiet | awk '/^r/ {print $3}' | sort -u
....
The above will generate a list of users in the svn repo.
If all users are FAS users you can use the following script to create a
users file (written by tmz (Todd Zullinger)):
....
#!/bin/bash
if [ -z "$1" ]; then
echo "usage: $0 <svn repo>" >&2
exit 1
fi
svnurl=file:///svn/$1
if ! svn info $svnurl &>/dev/null; then
echo "$1 is not a valid svn repo." >&2
fi
svn log -q $svnurl | awk '/^r[0-9]+/ {print $3}' | sort -u | while read user; do
name=$( (getent passwd $user 2>/dev/null | awk -F: '{print $5}') || '' )
[ -z "$name" ] && name=$user
email="$user@fedoraproject.org"
echo "$user=$name <$email>"
done
....
=== Doing the conversion
[arabic]
. Log into hosted1
. Make a temporary directory to convert the repos in:
+
....
$ sudo mkdir /tmp/tmp-$PROJECTNAME.git
$ cd /tmp/tmp-$PROJECTNAME.git
....
. Create an git repo ready to receive migrated SVN data:
+
....
$ sudo git-svn init http://svn.fedorahosted.org/svn/$PROJECTNAME --no-metadata
....
. Tell git to fetch and convert the repository:
+
....
$ git svn fetch
....
+
[NOTE]
====
This creation of a temporary repository is necessary because SVN leaves a
number of items floating around that git can ignore, and we want those
essentially ignored.
====
. From here, you'll want to follow "Creating a new git repo" as if
cloning an existing git repository to Fedorahosted.
. After that process is done - kindly remove the temporary repo that was
created:
+
....
$ sudo rm -rf /tmp/tmp-$PROJECTNAME.git
....
=== Doing the conversion (alternate)
Alternately, here's another way to do this (tmz):
Setup a working dir:
....
[tmz@hosted1 tmp (master)]$ mkdir im-chooser-conversion && cd im-chooser-conversion
....
Create authors file mapping svn usernames to Name <email> form git
uses.:
....
[tmz@hosted1 im-chooser-conversion (master)]$ ~tmz/svn-to-git-authors im-chooser > authors
....
Convert svn to git:
....
[tmz@hosted1 im-chooser-conversion (master)]$ git svn clone -s -A authors --no-metadata file:///svn/im-chooser
....
Move svn branches and tags into proper locations for the new git repo.
(git-svn leaves them as 'remote' branches/tags.):
....
[tmz@hosted1 im-chooser-conversion (master)]$ cd im-chooser
[tmz@hosted1 im-chooser (master)]$ mv .git/refs/remotes/tags/* .git/refs/tags/ && rmdir .git/refs/remotes/tags
[tmz@hosted1 im-chooser (master)]$ mv .git/refs/remotes/* .git/refs/heads/
....
Now 'git branch' and 'git tag' should display the branches/tags.
Create a bare repo from the converted git repo. Using `file://$(pwd)`
here ensures that git copies all objects to the new bare repo.:
....
[tmz@hosted1 im-chooser-conversion (master)]$ git clone --bare --shared file://$(pwd)/im-chooser im-chooser.git
....
Follow the steps in
https://fedoraproject.org/wiki/Hosted_repository_setup to finish setting
proper modes and permissions for the repo. Don't forget to update the
description file.
[NOTE]
.Note
====
This still leaves moving the converted bare repo (im-chooser.git) to
/git and fixing up the user/group.
====
== Questions left to be answered with this SOP
* Obviously we need to have requestor review the migration and confirm
it's ok.
* Do we then delete the old SCM contents?
* Do we need to change the FAS-group type to grant them access to
pull/push from it?

View file

@ -0,0 +1,51 @@
= HOTFIXES SOP
From time to time we have to quickly patch a problem or issue in
applications in our infrastructure. This process allows us to do that
and track what changed and be ready to remove it when the issue is fixed
upstream.
== Ansible based items:
For ansible, they should be placed after the task that installs the
package to be changed or modified. Either in roles or tasks.
hotfix tasks should be called "HOTFIX description" They should also link
in comments to any upstream bug or ticket. They should also have tags of
'hotfix'
The process is:
* Create a diff of any files changed in the fix.
* Check in the _original_ files to the role/task.
* Now check in your diffs of those same files.
* ansible will replace the files on the affected machines completely
with the fixed versions.
* If you need to back it out, you can revert the diff step, wait and
then remove the first checkin.
Example:
....
<task that installs the httpd package>
#
# install hash randomization hotfix
# See bug https://bugzilla.redhat.com/show_bug.cgi?id=812398
#
- name: hotfix - copy over new httpd init script
copy: src="{{ files }}/hotfix/httpd/httpd.init" dest=/etc/init.d/httpd
owner=root group=root mode=0755
notify:
- restart apache
tags:
- config
- hotfix
- apache
....
== Upstream changes
Also, if at all possible a bug should be filed with the upstream
application to get the fix in the next version. Hotfixes are something
we should strive to only carry a short time.

View file

@ -0,0 +1,147 @@
= The New Hotness
https://github.com/fedora-infra/the-new-hotness/[the-new-hotness] is a
https://fedora-messaging.readthedocs.io/en/stable/[fedora messaging
consumer] that subscribes to
https://release-monitoring.org/[release-monitoring.org] fedora messaging
notifications to determine when a package in Fedora should be updated.
For more details on the-new-hotness, consult the
http://the-new-hotness.readthedocs.io/[project documentation].
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin #fedora-apps
Persons::
zlopez
Location::
iad2.fedoraproject.org
Servers::
Production
+
* hotness01.iad2.fedoraproject.org
+
Staging
+
* hotness01.stg.iad2.fedoraproject.org
Purpose::
File issues when upstream projects release new versions of a package
== Hosts
The current deployment is made up of the-new-hotness OpenShift
namespace.
[[the-new-hotness-1]]
=== the-new-hotness
This OpenShift namespace runs following pods:
* A fedora messaging consumer
This OpenShift project relies on:
* Anitya (see the Anitya Infrastructure SOP) as the message publisher
* Fedora messaging RabbitMQ hub for consuming messages
* Koji for scratch builds
* Bugzilla for issue reporting
== Releasing
The release process is described in
https://the-new-hotness.readthedocs.io/en/stable/dev-guide.html#release-guide[the-new-hotness
documentation].
=== Deploying
Staging deployment of the-new-hotness is deployed in OpenShift on
os-master01.stg.iad2.fedoraproject.org.
To deploy staging instance of the-new-hotness you need to push changes
to staging branch on
https://github.com/fedora-infra/the-new-hotness[the-new-hotness GitHub].
GitHub webhook will then automatically deploy a new version of
the-new-hotness on staging.
Production deployment of the-new-hotness is deployed in OpenShift on
os-master01.iad2.fedoraproject.org.
To deploy production instance of the-new-hotness you need to push
changes to production branch on
https://github.com/fedora-infra/the-new-hotness[the-new-hotness GitHub].
GitHub webhook will then automatically deploy a new version of
the-new-hotness on production.
==== Configuration
To deploy the new configuration, you need
https://fedora-infra-docs.readthedocs.io/en/latest/sysadmin-guide/sops/sshaccess.html[ssh
access] to batcave01.iad2.fedoraproject.org and
https://fedora-infra-docs.readthedocs.io/en/latest/sysadmin-guide/sops/ansible.html[permissions
to run the Ansible playbook].
All the following commands should be run from batcave01.
First, ensure there are no configuration changes required for the new
update. If there are, update the Ansible the-new-hotness role(s) and optionally
run the playbook:
....
$ sudo rbac-playbook openshift-apps/the-new-hotness.yml
....
The configuration changes could be limited to staging only using:
....
$ sudo rbac-playbook openshift-apps/the-new-hotness.yml -l staging
....
This is recommended for testing new configuration changes.
==== Upgrading
===== Staging
To deploy new version of the-new-hotness you need to push changes to
staging branch on
https://github.com/fedora-infra/the-new-hotness[the-new-hotness GitHub].
GitHub webhook will then automatically deploy a new version of
the-new-hotness on staging.
===== Production
To deploy new version of the-new-hotness you need to push changes to
production branch on
https://github.com/fedora-infra/the-new-hotness[the-new-hotness GitHub].
GitHub webhook will then automatically deploy a new version of
the-new-hotness on production.
Congratulations! The new version should now be deployed.
== Monitoring Activity
It can be nice to check up on the-new-hotness to make sure its behaving
correctly. You can see all the Bugzilla activity using the
https://bugzilla.redhat.com/page.cgi?id=user_activity.html[user activity
query] (staging uses
https://partner-bugzilla.redhat.com/page.cgi?id=user_activity.html[partner-bugzilla.redhat.com])
and querying for the `upstream-release-monitoring@fedoraproject.org`
user.
You can also view all the Koji tasks dispatched by the-new-hotness. For
example, you can see the
https://koji.fedoraproject.org/koji/tasks?state=failed&owner=hotness[failed
tasks] it has created.
To monitor the pods of the-new-hotness you can connect to Fedora infra
OpenShift and look at the state of pods.
For staging look at the [.title-ref]#the-new-hotness# namespace in
https://os.stg.fedoraproject.org/console/project/release-monitoring/overview[staging
OpenShift instance].
For production look at the [.title-ref]#the-new-hotness# namespace in
https://os.fedoraproject.org/console/project/release-monitoring/overview[production
OpenShift instance].

View file

@ -0,0 +1,144 @@
= Fedora Hubs SOP
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin, sysadmin-main, sysadmin-tools, sysadmin-hosted
Location::
?
Servers::
<prod-srv-hostname>, <stg-srv-hostname>, hubs-dev.fedorainfracloud.org
Purpose::
Contributor and team portal.
== Description
Fedora Hubs aggregates user and team activity throughout the Fedora
infrastructure (and elsewhere) to show what a user or a team is doing.
It helps new people find a place to contribute.
=== Components
Fedora Hubs has the following components:
* a SQL database like PostgreSQL (in the Fedora infra we're using the
shared database).
* a Redis server that is used as a message bus (it is not critical if
the content is lost). System service: `redis`.
* a MongoDB server used to store the contents of the activity feeds.
It's JSON data, limited to 100 entries per user or group. Service:
`mongod`.
* a Flask-based WSGI app served by Apache + mod_wsgi, that will also
serve the JS front end as static files. System service: `httpd`.
* a Fedmsg listener that receives messages from the fedmsg bus and puts
them in Redis. System service: `fedmsg-hub`.
* a set of "triage" workers that pull the raw messages from Redis,
process them using SQL queries and puts work items in another Redis
queue. System service: `fedora-hubs-triage@`.
* a set of "worker" daemons that pull from this other Redis queue, work
on the items by making SQL queries and external HTTP requests (to Github
for example), and put reload notifications in the SSE Redis queue. They
also access the caching system, which can be local files or memcached.
System service: `fedora-hubs-worker@`.
* The SSE server (Twisted-based) that pulls from that Redis queue and
sends reload notifications to the connected browsers. It handles
long-lived HTTP connection but there is little activity: only the
notifications and a "keepalive ping" message every 30 seconds to every
connected browser. System service: `fedora-hubs-sse`. Apache is
configured to proxy the `/sse` path to this server.
== Managing the services
Restarting all the services:
....
systemctl restart fedmsg-hub fedora-hubs-\*
....
By default, 4 `triage` daemons and 4 `worker` daemons are enabled. To
add another `triage` daemon and another `worker` daemon, you can run:
....
systemctl enable --now fedora-hubs-triage@5.service
systemctl enable --now fedora-hubs-worker@5.service
....
It is not necessary to have the same number of `triage` and `worker`
daemons, in fact it is expected that more `worker` than `triage` daemons
will be necessary, as they do more time-consuming work.
== Hubs-specific operations
Other Hubs-specific operations are done using the
[.title-ref]#fedora-hubs# command:
....
$ fedora-hubs
Usage: fedora-hubs [OPTIONS] COMMAND [ARGS]...
Options:
--help Show this message and exit.
Commands:
cache Cache-related operations.
db Database-related operations.
fas FAS-related operations.
run Run daemon processes.
....
=== Manipulating the cache
The `cache` subcommand is used to do cache-related operations:
....
$ fedora-hubs cache
Usage: fedora-hubs cache [OPTIONS] COMMAND [ARGS]...
Cache-related operations.
Options:
--help Show this message and exit.
Commands:
clean Clean the specified WIDGETs (id or name).
coverage Check the cache coverage.
list List widgets for which there is cached data.
....
For example, to check the cache coverage:
....
$ fedora-hubs cache coverage
107 cached values found, 95 are missing.
52.97 percent cache coverage.
....
The cache coverage value is an interesting metric that could be used in
a Nagios check. A value below 50% could be considered as significant of
application slowdowns and could thus generate a warning.
=== Interacting with FAS
The `fas` subcommand is used to get information from FAS:
....
$ fedora-hubs fas
Usage: fedora-hubs fas [OPTIONS] COMMAND [ARGS]...
FAS-related operations.
Options:
--help Show this message and exit.
Commands:
create-team Create the team hub NAME from FAS.
sync-teams Sync all the team hubs NAMEs from FAS.
....
To add a new team hub for a FAS group, run:
....
$ fedora-hubs fas create-team <fas-group-name>
....

View file

@ -0,0 +1,60 @@
= IBM RSA II Infrastructure SOP
Many of our physical machines use RSA II cards for remote management.
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin, sysadmin-main
Location::
PHX, ibiblio
Servers::
All physical IBM machines
Purpose::
Provide remote management for our physical IBM machines
== Restarting the RSA II card
Normally, the RSA II can be restarted from the web/ssh interface. If you
are locked out of any outside access to the RSA II, follow these
instructions on the physical machine.
If the machine can be rebooted without issue, cut off all power to the
machine, wait a few seconds, and restart everything.
Otherwise, to restart the card without rebooting the machine:
[arabic]
. Download and install the IBM Remote Supervisor Adapter II Daemon:
.. `yum install usbutils libusb-devel` # (needed by the RSA II daemon)
.. Download the correct tarball from
http://www-947.ibm.com/systems/support/supportsite.wss/docdisplay?lndocid=MIGR-5071676&brandind=5000008
(TODO: check if this can be packaged in Fedora)
.. Extract the tarball and run `sudo ./install.sh --update`
. Download and extract the IBM Advanced Settings Utility from
http://www-947.ibm.com/systems/support/supportsite.wss/docdisplay?lndocid=TOOL-ASU&brandind=5000016
+
[WARNING]
====
This tarball dumps files in the current working directory.
====
. Issue a `sudo ./asu64 rebootrsa` to reboot the RSA II.
. Clean up: `yum remove ibmusbasm64`
== Other Resources
http://www.redbooks.ibm.com/abstracts/sg246495.html may be a useful
resource to refer to when working with this.

View file

@ -0,0 +1,73 @@
= System Administrator Guide
Welcome to The Fedora Infrastructure system administration guide.
[[sysadmin-getting-started]]
== Getting Started
If you haven't already, you should complete the general
`getting-started` guide. Once you've completed that, you're ready to get
involved in the
https://admin.fedoraproject.org/accounts/group/view/fi-apprentice[Fedora
Infrastructure Apprentice] group.
=== Fedora Infrastructure Apprentice
The
https://admin.fedoraproject.org/accounts/group/view/fi-apprentice[Fedora
Infrastructure Apprentice] group in the Fedora Account System grants
read-only access to many Fedora infrastructure machines. This group is
used for new folks to look around at the infrastructure setup, check
machines and processes and see where they might like to contribute
moving forward. This also allows apprentices to examine and gather info
on problems, then propose solutions.
[NOTE]
.Note
====
This group will be pruned often of inactive folks who miss the monthly
email check-in on the
https://lists.fedoraproject.org/admin/lists/infrastructure.lists.fedoraproject.org/[infrastructure
mailing list]. There's nothing personal in this and you're welcome to
re-join later when you have more time, we just want to make sure the
group only has active members.
====
Members of the https://admin.fedoraproject.org/accounts/group/view/fi-apprentice[Fedora
Infrastructure Apprentice] group have ssh/shell access to many machines,
but no sudo rights or ability to commit to the
https://pagure.io/fedora-infra/ansible/[Ansible repository] (but they do
have read-only access). Apprentices can, however, contribute to the
infrastructure documentation by making a pull request to the
https://pagure.io/infra-docs/[infra-docs] repository. Access is via the
bastion.fedoraproject.org machine and from there to each machine. See
the `ssh-sop` for instructions on how to set up SSH. You can see a list
of hosts that allow apprentice access by using:
....
$ ./scripts/hosts_with_var_set -i inventory/ -o ipa_client_shell_groups=fi-apprentice
....
from a checkout of the https://pagure.io/fedora-infra/ansible/[Ansible
repository]. The Ansible repository is hosted on pagure.io at
`https://pagure.io/fedora-infra/ansible.git`.
=== Selecting a Ticket
Start by checking out the
https://pagure.io/fedora-infrastructure/issues?status=Open&tags=easyfix[easyfix
tickets]. Tickets marked with this tag are a good place for apprentices
to learn how things are setup, and also contribute a fix.
Since apprentices do not have commit access to the
https://pagure.io/fedora-infra/ansible/[Ansible repository], you should
make your change, produce a patch with `git diff`, and attach it to the
infrastructure ticket you are working on. It will then be reviewed.
[[sops]]
== Standard Operating Procedures
Below is a table of contents containing all the standard operating
procedures for Fedora Infrastructure applications. For information on
how to write a new standard operating procedure, consult the guide on
`develop-sops`.

View file

@ -0,0 +1,55 @@
= Infrastructure Git Repos
Setting up an infrastructure git repo - and the push mechanisms for the
magicks
We have a number of git repos (in /git on batcave) that manage files for
ansible, our docs, our common host info database and our kickstarts. This
is a doc on how to set up a new one of these, if it is needed.
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin, sysadmin-main
Location::
Phoenix
Servers::
batcave01.phx2.fedoraproject.org, batcave-comm01.qa.fedoraproject.org
== Steps
Create the bare repo:
....
mkdir $git_dir
setfacl -m d:g:$yourgroup:rwx -m d:g:$othergroup:rwx \
-m g:$yourgroup:rwx -m g:$othergroup:rwx $git_dir
cd $git_dir
git init --bare
....
edit up config - add these lines to the bottom:
....
[hooks]
# (normally sysadmin-members@fedoraproject.org)
mailinglist = emailaddress@yourdomain.org
emailprefix =
maildomain = fedoraproject.org
reposource = /path/to/this/dir
repodest = /path/to/where/you/want/the/files/dumped
....
edit up description - make it something useful, then set up the hooks:
....
cd hooks
rm -f *.sample
cp hooks from /git/infra-docs/hooks/ on batcave01 to this path
....
modify sudoers so that users in the groups that can commit to this repo
can run /usr/local/bin/syncgittree.sh without entering a password
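For example, a sudoers entry along these lines (the group name here is
illustrative):

....
%sysadmin-web ALL=(ALL) NOPASSWD: /usr/local/bin/syncgittree.sh
....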

View file

@ -0,0 +1,115 @@
= Infrastructure Host Rename SOP
This page is intended to guide you through the process of renaming a
virtual node.
== Contents
[arabic]
. Introduction
. Finding out where the host is
. Preparation
. Renaming the Logical Volume
. Doing the actual rename
. Telling ansible about the new host
. VPN Stuff
== Introduction
Throughout this SOP, we will refer to the old hostname as $oldhostname
and the new hostname as $newhostname. We will refer to the Dom0 host
that the vm resides on as $vmhost.
If this process is being followed so that a temporary-named host can
replace a production host, please be sure to follow the
Infrastructure retire machine SOP to properly decommission the old
host before continuing.
== Finding out where the host is
In order to rename the host, you must have access to the Dom0 (host) on
which the virtual server resides. To find out which host that is, log in
to batcave01, and run:
....
grep $oldhostname /var/log/virthost-lists.out
....
The first column of the output will be the Dom0 of the virtual node.
== Preparation
SSH to $oldhostname. If the new name is replacing a production box,
change the IP Address that it binds to, in
`/etc/sysconfig/network-scripts/ifcfg-eth0`.
Also change the hostname in `/etc/sysconfig/network`.
At this point, you can `sudo poweroff` $oldhostname.
Open an ssh session to $vmhost, and make sure that the node is listed as
`shut off`. If it is not, you can force it off with:
....
virsh destroy $oldhostname
....
== Renaming the Logical Volume
Find out the name of the logical volume (on $vmhost):
....
virsh dumpxml $oldhostname | grep 'source dev'
....
This will give you a line that looks like
`<source dev='/dev/VolGroup00/$oldhostname'/>` which tells you that
`/dev/VolGroup00/$oldhostname` is the path to the logical volume.
Run `/usr/sbin/lvrename <old path> <new path>`, where the old path is the
path you found above and the new path is the same path with $newhostname
at the end instead of $oldhostname. For example:

....
/usr/sbin/lvrename /dev/VolGroup00/noc03-tmp /dev/VolGroup00/noc01
....
== Doing the actual rename
Now that the logical volume has been renamed, we can rename the host in
libvirt.
Dump the configuration of $oldhostname into an xml file, by running:
....
virsh dumpxml $oldhostname > $newhostname.xml
....
Open up $newhostname.xml, and change all instances of $oldhostname to
$newhostname.
Save the file and run:
....
virsh define $newhostname.xml
....
If there are no errors above, you can undefine $oldhostname:
....
virsh undefine $oldhostname
....
Power on $newhostname, with:
....
virsh start $newhostname
....
And remember to set it to autostart:
....
virsh autostart $newhostname
....
== VPN Stuff
TODO

View file

@ -0,0 +1,75 @@
= Infrastructure/SOP/Raid Mismatch Count
What to do when a raid device has a mismatch count
== Contents
[arabic]
. Contact Information
. Description
. Correction
.. Step 1
.. Step 2
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin, sysadmin-main
Location::
All
Servers::
Physical hosts
Purpose::
Correct mismatch counts on software RAID devices.
== Description
In some situations a raid device may indicate there is a count mismatch
as listed in:
....
/sys/block/mdX/md/mismatch_cnt
....
Anything other than 0 is considered not good. Though if the number is
low it's probably nothing to worry about. To correct this situation try
the directions below.
== Correction
More than anything these steps are to A) Verify there is no problem and
B) make the error go away. If step 1 and step 2 don't correct the
problems, PROCEED WITH CAUTION. The steps below, however, should be
relatively safe.
Issue a repair (replace mdX with the questionable raid device):
....
echo repair > /sys/block/mdX/md/sync_action
....
Depending on the size of the array and disk speed this can take a while.
Watch the progress with:
....
cat /proc/mdstat
....
Issue a check. It's this check that will reset the mismatch count if
there are no problems. Again replace mdX with your actual raid device.:
....
echo check > /sys/block/mdX/md/sync_action
....
Just as before, you can watch the progress with:
....
cat /proc/mdstat
....

View file

@ -0,0 +1,113 @@
= Infrastructure Yum Repo SOP
In some cases RPM's in Fedora need to be rebuilt for the Infrastructure
team to suit our needs. This repo is provided to the public (except for
the RHEL RPMs). Rebuilds go into this repo which are stored on the
netapp and shared via the proxy servers after being built on koji.
For basic instructions, read the standard documentation on Fedora wiki:
- https://fedoraproject.org/wiki/Using_the_Koji_build_system
This document will only outline the differences between the "normal"
repos and the infra repos.
== Contents
[arabic]
. Contact Information
. Building an RPM
. Tagging an existing build
. Promoting a staging build
. Koji package list
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin
Location::
PHX https://kojipkgs.fedoraproject.org/repos-dist/
Servers::
koji batcave01 / Proxy Servers
Purpose::
Provides infrastructure repo for custom Fedora Infrastructure rebuilds
== Building an RPM
Building an RPM for Infrastructure is significantly easier than building
an RPM for Fedora. Basically get your SRPM ready, then submit it to koji
for building to the $repo-infra target. (e.g. epel7-infra).
Example:
....
rpmbuild --define "dist .el7.infra" -bs test.spec
koji build epel7-infra test-1.0-1.el7.infra.src.rpm
....
[NOTE]
.Note
====
Remember to build it for every dist / arch you need to deploy it on.
====
After it has been built, you will see it's tagged as
$repo-infra-candidate; this means that it is a candidate for being
signed. The automatic signing system will pick it up and sign the
package for you without any further intervention. You can track when
this is done by checking the build info: when it is moved from
$repo-infra-candidate to $repo-infra-stg, it has been signed. You can
check this on the web interface (look under "Tags"), or via:
....
koji buildinfo test-1.0-1.el7.infra
....
After the build has been tagged into the $repo-infra-stg tag,
tag2distrepo will automatically create a distrepo task, which will
update the repository so that the package is available on staging hosts.
After this time, you can run `yum clean all` and then install the packages via
`yum install` or `yum update`.
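For example, on a staging host (package name is hypothetical):
....
yum clean all
yum install test
....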
== Tagging existing builds
If you already have a real build and want to use it in the
infrastructure before it has landed in stable, you can tag it into the
respective infra-candidate tag. For example, if you have an epel7 build
of test2-1.0-1.el7.infra, run:
....
koji tag epel7-infra-candidate test2-1.0-1.el7.infra
....
And then the same autosigning and repogen from the previous section
applies.
== Promoting a staging build
After getting autosigned, builds will land in the respective infra-stg
tag, for example epel7-infra-stg. These tags go into repos that are
enabled on staging machines, but not on production. If you decide, after
testing, that the build is good enough for production, you can promote
it by running:
....
koji move epel7-infra-stg epel7-infra test2-1.0-1.el7.infra
....
== Koji package list
If you try to build a package into the infra tags and koji says something
like `BuildError: package test not in list for tag epel7-infra-candidate`,
that means the package has not been added to the list for building in that
particular tag. Either add the package to the respective Fedora/EPEL
branches (this is the preferred method, since we should always aim to get
everything packaged for Fedora/EPEL), or add the package to the listing for
the respective tag.
To add package to infra tag, run:
....
koji add-pkg $tag $package --owner=$user
....
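For example, to allow a hypothetical package `test` owned by `jdoe` into the
epel7 infra tag:
....
koji add-pkg epel7-infra-candidate test --owner=jdoe
....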
= Infrastructure retire machine SOP
== Introduction
When a machine (be it a virtual instance or real physical hardware) is
decommissioned, a set of steps must be followed to ensure that the
machine is properly removed from the set of machines we manage and
doesn't cause problems down the road.
== Retire process
[arabic]
. Ensure that the machine is no longer used for anything. Use git-grep,
stop services, etc.
. Remove the machine from ansible. Make sure you not only remove the main
machine name, but also any aliases it might have (or move them to an
active server if they are active services). Make sure to search for the
IP address(es) of the machine as well. Ensure dns is updated to remove
the machine.
. Remove the machine from any labels in hardware devices like consoles or
the like.
. Revoke the ansible cert for the machine.
. Move the machine xml definition to ensure it does NOT start on boot. You
can move it to 'name-retired-YYYY-MM-DD'.
. Ensure any backend storage the machine was using is freed or renamed to
name-retired-YYYY-MM-DD.
== TODO
fill in commands
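Until this TODO is filled in, the following sketch illustrates the kind of
commands involved (hostnames, volume group, and paths are examples only;
adapt to the actual host and vmhost):
....
# 1. look for remaining references in the ansible repo (on batcave01)
git grep -i $hostname

# 5. on the vmhost: keep the xml under a retired name, stop autostart, undefine
virsh autostart --disable $hostname
virsh shutdown $hostname
virsh dumpxml $hostname > /root/$hostname-retired-YYYY-MM-DD.xml
virsh undefine $hostname

# 6. rename the backing logical volume
lvrename /dev/VolGroup00/$hostname /dev/VolGroup00/$hostname-retired-YYYY-MM-DD
....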
= Infrastructure/SOP/Yubikey
This document describes how yubikey authentication works
== Contents
[arabic]
. Contact Information
. User Information
. Host Admins
.. pam_yubico
. Server Admins
.. Basic architecture
.. ykval
.. ykksm
.. Physical Yubikey info
. fas integration
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin, sysadmin-main
Location::
Phoenix
Servers::
fas*, db02
Purpose::
Provides yubikey authentication in Fedora
== Config Files
* `/etc/httpd/conf.d/yk-ksm.conf`
* `/etc/httpd/conf.d/yk-val.conf`
* `/etc/ykval/ykval-config.php`
* `/etc/ykksm/ykksm-config.php`
* `/etc/fas.cfg`
== User Information
See Infrastructure/Yubikey
== Host Admins
=== pam_yubico
The /etc/yubikeyid file, generated from FAS, works like an authorized_keys
file and maps valid keys to users. It is downloaded from FAS:
https://admin.fedoraproject.org/accounts/yubikey/dump
== Server Admins
=== Basic architecture
Yubikey authentication takes place in 3 basic phases.
[arabic]
. User presses yubikey which generates a one time password
. {blank}
+
The one time password makes its way to the yk-val application which::
verifies it is not a replay
. {blank}
+
yk-val passes that otp on to the yk-ksm application which verifies the::
key itself is a valid key
If all of those steps succeed, the ykval application sends back an OK
and authentication is considered successful. The two applications are
defined below, if either of them is unavailable, yubikey authentication
will fail.
==== ykval
Database: db02:ykval
The database contains 3 tables:

clients:: just a list of valid clients. These are not users; these are
systems able to authenticate against ykval. In our case Fedora is the only
client, so there's just one entry here.
queue:: used for distributed setups (we don't do this).
yubikeys:: maps which yubikey belongs to which user.
ykval is installed on fas* and is located at:
http://localhost/yk-val/verify
Purpose: Is to map keys to users and protect against replay attacks
==== ykksm
Database: db02:ykksm
The database contains one table, yubikeys, which maps who created keys, what
key was created, when, the public name and serial number, whether it's
active, etc.
ykksm is installed on fas* at http://localhost/yk-ksm
Purpose: verify if a key is a valid known key or not. Nothing contacts
this service directly except for ykval. This should be considered the
“high security” portion of the system as access to this table would
allow users to make their own yubikeys.
==== Physical Yubikey info
The actual yubikey contains information to generate a one time password.
The important bits to know are that the beginning of the otp contains the
identifier of the key (used similarly to how ssh uses authorized_keys) and
that the rest of it contains lots of bits of information, including an
incrementing serial.
Sample key: `ccccfcdaivjrvdhvzfljbbievftnvncljhibkulrftt`
Breaking this up, the first 12 characters are the identifier, which can be
considered 'public':
`ccccfcdaivj rvdhvzfljbbievftnvncljhibkulrftt`
The second half is the otp part.
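For illustration, the split can be reproduced on the command line with bash
substring expansion (a sketch using the sample key above):
....
otp=ccccfcdaivjrvdhvzfljbbievftnvncljhibkulrftt
echo "public id: ${otp:0:12}"
echo "otp part:  ${otp:12}"
....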
== fas integration
Fas integration has two main parts. The first is key generation, the next is
activation. The fas-plugin-yubikey contains the bits for both, as well as
verification. Users call on this page to generate the key info:
https://admin.fedoraproject.org/accounts/yubikey/genkey
The fas password field automatically detects whether someone is using a
otp or a regular password. It then sends otp requests to yk-val for
verification.
= Ipsilon Infrastructure SOP
== Contents
[arabic]
. Contact Information
. Description
. Known Issues
. Restarting
. Configuration
. Common actions
.. Registering OpenID Connect Scopes
.. Generate an OpenID Connect token
.. Create OpenID Connect secrets for apps
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin
Primary upstream contact::
Patrick Uiterwijk - FAS: puiterwijk
Backup upstream contact::
Simo Sorce - FAS: simo (irc: simo) Howard Johnson - FAS: merlinthp
(irc: MerlinTHP) Rob Crittenden - FAS: rcritten (irc: rcrit)
Location::
Phoenix
Servers::
ipsilon01.phx2.fedoraproject.org ipsilon02.phx2.fedoraproject.org
ipsilon01.stg.phx2.fedoraproject.org
Purpose::
Ipsilon is our central authentication service that is used to
authenticate users against FAS. It is separate from FAS.
== Description
Ipsilon is our central authentication agent that is used to authenticate
users against FAS. It is separate from FAS. The only service that is not
using this currently is the wiki. It is a web service that is presented
via httpd and is load balanced by our standard haproxy setup.
== Known issues
No known issues at this time. There is not currently a logout option for
ipsilon, but it is not considered an issue. If group memberships are
updated in ipsilon the user will need to wait a few minutes for them to
replicate to the all the systems.
== Restarting
To restart the application you simply need to ssh to the servers for the
problematic region and issue a `service httpd restart`. This should
rarely be required.
== Configuration
Configuration is handled by the ipsilon.yaml playbook in Ansible. This
can also be used to reconfigure the application, if that becomes necessary.
== Common actions
This section describes some common configuration actions.
=== OpenID Connect Scope Registration
As documented on
https://fedoraproject.org/wiki/Infrastructure/Authentication,
application developers can request their own scopes. When a request for
this comes in, look in ansible/roles/ipsilon/files/oidc_scopes/ and copy
an example module. Copy this to a new file, so we have a file per scope
set. Fill in the information:
* name is an Ipsilon-internal name. This should not include any spaces.
* display_name is the name of the category of scopes that is displayed to
the user.
* scopes is a dictionary with the full scope identifier (with namespace)
as keys. The values are dicts with the following keys:
** display_name: the complete display name for this scope. This is what
the user gets shown to accept/reject.
** claims: a list of additional "claims" (pieces of user information) an
application will get when the user consents to this scope. For most
scopes, this will be the empty list.
In ansible/roles/ipsilon/tasks/main.yml, add the name of the new file
(without .py) to the with_items of "Copy OpenID Connect scope
registrations"). To enable, open
ansible/roles/ipsilon/templates/configuration.conf, and look for the
lines starting with "openidc enabled extensions". Add the name of the
plugin (in the "name" field of the file) to the environment this
scopeset has been requested for. Run the ansible ipsilon.yml playbook.
=== Generate an OpenID Connect token
There is a handy script in the Ansible project under
`scripts/generate-oidc-token` that can help you generate an OIDC token.
It has a self-explanatory `--help` argument, and it will print out some
SQL that you can run against Ipsilon's database, as well as the token
that you seek.
The `SERVICE_NAME` (the required positional argument) is the name of the
application that wants to use the token to perform actions against
another service.
To generate the scopes, you can visit our link:[authentication] docs and
find the service you want the token to be used for. Each service has a
base namespace (a URL) and one or more scopes for that namespace. To
form a scope for this script, you concatenate the namespace of the
service with the scope you want to grant the service. You can provide
the script the -s flag multiple times if you want to grant more than one
scope to the same token.
As an example, to give Bodhi access to create waivers in WaiverDB, you
can see that the base namespace is
`https://waiverdb.fedoraproject.org/oidc/` and that there is a
`create-waiver` scope. You can run this to generate Ipsilon SQL and a
token with that scope:
....
[bowlofeggs@batcave01 ansible][PROD]$ ./scripts/generate-oidc-token bodhi -e 365 -s https://waiverdb.fedoraproject.org/oidc/create-waiver
Run this SQL against Ipsilon's database:
--------START CUTTING HERE--------
BEGIN;
insert into token values ('2a5f2dff-4e93-4a8d-8482-e62f40dce046','username','bodhi@service');
insert into token values ('2a5f2dff-4e93-4a8d-8482-e62f40dce046','security_check','-ptBqVLId-kUJquqkVyhvR0DbDULIiKp1eqbXqG_dfVK9qACU6WwRBN3-7TRfoOn');
insert into token values ('2a5f2dff-4e93-4a8d-8482-e62f40dce046','client_id','bodhi');
insert into token values ('2a5f2dff-4e93-4a8d-8482-e62f40dce046','expires_at','1557259744');
insert into token values ('2a5f2dff-4e93-4a8d-8482-e62f40dce046','type','Bearer');
insert into token values ('2a5f2dff-4e93-4a8d-8482-e62f40dce046','issued_at','1525723744');
insert into token values ('2a5f2dff-4e93-4a8d-8482-e62f40dce046','scope','["openid", "https://someapp.fedoraproject.org/"]');
COMMIT;
-------- END CUTTING HERE --------
Token: 2a5f2dff-4e93-4a8d-8482-e62f40dce046_-ptBqVLId-kUJquqkVyhvR0DbDULIiKp1eqbXqG_dfVK9qACU6WwRBN3-7TRfoOn
....
Once you have the SQL, you can run it against Ipsilon's database, and
you can provide the token to the application through some secure means
(such as putting it into Ansible's secrets and telling the requestor the
Ansible variable they can use to access it).
=== Create OpenID Connect secrets for apps
Application wanting to use OpenID Connect need to register against our
OpenID Connect server (Ipsilon). Since we do not allow self-registration
(except on iddev.fedorainfracloud.org) for obvious reasons, the secrets
need to be created and configured per application and environment
(production vs staging).
To do so:

* Go to the private ansible repository.
* Edit the file: `files/ipsilon/openidc.{{env}}.static`
* At the bottom of this file, add the information concerning the
application you are adding. This will look something like:
____
....
fedocal client_name="fedocal"
fedocal client_secret="<long random string>"
fedocal redirect_uris=["https://calendar.stg.fedoraproject.org/oidc_callback"]
fedocal client_uri="https://calendar.stg.fedoraproject.org/"
fedocal ipsilon_internal={"type":"static","client_id":"fedocal","trusted":true}
fedocal contacts=["admin@fedoraproject.org"]
fedocal client_id=null
fedocal policy_uri="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
fedocal grant_types="authorization_code"
fedocal response_types="code"
fedocal application_type="web"
fedocal subject_type="pairwise"
fedocal logo_uri=null
fedocal tos_uri=null
fedocal jwks_uri=null
fedocal jwks=null
fedocal sector_identifier_uri=null
fedocal request_uris=[]
fedocal require_auth_time=null
fedocal token_endpoint_auth_method="client_secret_post"
fedocal id_token_signed_response_alg="RS256"
fedocal request_object_signing_alg="none"
fedocal initiate_login_uri=null
fedocal default_max_age=null
fedocal default_acr_values=null
fedocal client_secret_expires_at=0
....
____
In most situations, only the first 5 lines (up to `ipsilon_internal`)
will change. If the application is not using flask-oidc or is not
maintained by the Fedora Infrastructure, the first 11 lines (up to
`application_type`) may change. The remaining lines require a deeper
understanding of OpenID Connect and Ipsilon.
[NOTE]
.Note
====
`client_id` in `ipsilon_internal` must match the beginning of the line,
and the `client_id` field must either match the beginning of the line or
be `null` as in the example here.
====
[NOTE]
.Note
====
In our OpenID connect server, OIDC.user_getfield('nickname') will return
the FAS username, which we know from FAS is unique. However, not all
OpenID Connect servers enforce this constraint, so the application code
may rely on the `sub` which is the only key that is sure to be unique.
If the application relies on `sub` and wants `sub` to return the FAS
username, then the configuration should be adjusted with:
`subject_type="public"`.
====
After adjusting this file, you will need to make the `client_secret`
available to the application via ansible, for this simply add it to
`vars.yml` as we do for the other private variables and provide the
variable name to the person who requested it.
Finally, commit and push the changes to both files and run the
`ipsilon.yml` playbook.
= iSCSI
iscsi allows one to share and mount block devices using the scsi
protocol over a network. Fedora currently connects to a netapp that has
an iscsi export.
== Contents
[arabic]
. Contact Information
. Typical uses
. iscsi basics
.. Terms
.. iscsi's basic login / logout procedure
. Logging in
. Logging out
. Important note about creating new logical volumes
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin, sysadmin-main
Location::
Phoenix
Servers::
xen[1-15]
Purpose::
Provides iscsi connectivity to our netapp.
== Typical uses
The best uses for Fedora are for servers that are not part of a farm or
live replicated. For example, we wouldn't put app1 on the iscsi share
because we don't gain anything from it. Shutting down app1 to move it
isn't an issue because app1 is part of our application server farm.
noc1, however, is not replicated. It's a stand alone box that, at best,
would have a non-live failover. By placing this host on an iscsi share,
we can make it more highly available as it allows us to move that box
around our virtualization infrastructure without rebooting it or even
taking it down.
== iscsi basics
=== Terms
* initiator means client
* target means server
* swab means mop
* deck means floor
=== iscsi's basic login / logout procedure is
[arabic]
. Notify your client that a new target is available (similar to editing
/etc/fstab for a new nfs mount)
. Login to the iscsi target (similar to running "mount /my/nfs")
. Logout from the iscsi target (similar to running "umount /my/nfs")
. Delete the target from the client (similar to removing the nfs mount
from /etc/fstab)
==== Logging in
Most mounts are covered by ansible so this should be automatic. In the
event that something goes wrong though, the best way to fix this is:
* Notify the client of the target:
+
....
iscsiadm --mode node --targetname iqn.1992-08.com.netapp:sn.118047036 --portal 10.5.88.21:3260 -o new
....
* Log in to the new target:
+
....
iscsiadm --mode node --targetname iqn.1992-08.com.netapp:sn.118047036 --portal 10.5.88.21:3260 --login
....
* Scan and activate lvm:
+
....
pvscan
vgscan
vgchange -ay xenGuests
....
Once this is done, one should be able to run "lvs" to see the logical
volumes
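To confirm the session is actually established, you can also list active
iSCSI sessions before checking the logical volumes (a quick check):
....
iscsiadm --mode session
lvs
....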
==== Logging out
Logging out isn't normally needed, for example rebooting a machine
automatically logs the initiator out. Should a problem arise though here
are the steps:
* Disable the logical volume:
+
....
vgchange -an xenGuests
....
* log out:
+
....
iscsiadm --mode node --targetname iqn.1992-08.com.netapp:sn.118047036 --portal 10.5.88.21:3260 --logout
....
[NOTE]
.Note
====
`Cannot deactivate volume group`
If the vgchange command fails with an error about not being able to
deactivate the volume group, this means that one of the logical volumes
is still in use. By running "lvs" you can get a list of volume groups.
Look in the Attr column. There are 6 attrs listed. The 5th column
usually has a '-' or an 'a'. 'a' means its active, - means it is not. To
the right of that (the last column) you will see an '-' or an 'o'. If
you see an 'o' that means that logical volume is still mounted and in
use.
====
[IMPORTANT]
.Important
====
Note about creating new logical volumes
At present we do not have logical volume locking on the xen servers.
This is dangerous and being worked on. Basically when you create a new
volume on a host, you need to run:
....
pvscan
vgscan
lvscan
....
on the other virtualization servers.
====
= Jenkins Fedmsg SOP
Send information about Jenkins builds to fedmsg.
== Contact Information
Owner::
Ricky Elrod, Fedora Infrastructure Team
Contact::
#fedora-apps
== Reinstalling when it disappears
For an as-of-yet unknown reason, the plugin sometimes seems to
disappear, though it still shows as "installed" on Jenkins.
To re-install it, grab `fedmsg.hpi` from
`/srv/web/infra/bigfiles/jenkins`. Go to the Jenkins web
interface and log in. Click `Manage Jenkins` ->
`Manage Plugins` -> `Advanced`. Upload the
plugin and on the page that comes up, check the box to have Jenkins
restart when running jobs are finished.
== Configuration Values
These are written here in case the Jenkins configuration ever gets lost.
This is how to configure the jenkins-fedmsg-emit plugin.
Assume the plugin is already installed.
Go to "Configure Jenkins" -> "System Configuration"
Towards the bottom, look for "Fedmsg Emitter"
Values:

* Signing: Checked
* Fedmsg Endpoint: tcp://209.132.181.16:9941
* Environment Shortname: prod
* Certificate File: /etc/pki/fedmsg/jenkins-jenkins.fedorainfracloud.org.crt
* Keystore File: /etc/pki/fedmsg/jenkins-jenkins.fedorainfracloud.org.key
= Kerneltest-harness SOP
The kerneltest-harness is the web application used to gather and present
statistics about kernel test results.
== Contents
[arabic]
. Contact Information
. Documentation Links
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin
Location::
https://apps.fedoraproject.org/kerneltest/
Servers::
kerneltest01, kerneltest01.stg
Purpose::
Provide a system to gather and present kernel tests results
== Add a new Fedora release
* Login
* On the front page, in the menu on the left side, if there is a
`Fedora Rawhide` release, click on `(edit)`.
* Bump the `Release number` on `Fedora Rawhide`
to avoid conflicts with the new release you're creating
* Back on the index page, click on `New release`
* Complete the form:
+
Release number::
This would be the integer version of the Fedora release, for example
24 for Fedora 24.
Support::
The current status of the Fedora release
+
** Rawhide for Fedora Rawhide
** Test for branched release
** Release for released Fedora
** Retired for retired release of Fedora
== Upload new test results
The kernel tests are available on the
https://git.fedorahosted.org/cgit/kernel-tests.git/[kernel-test] git
repository.
Once run with `runtests.sh`, you can upload the resulting
file either using `fedora_submit.py` or the UI.
If you choose the UI the steps are simply:
* Login
* Click on `Upload` in the main menu on the top
* Select the result file generated by running the tests
* Submit
= Kickstart Infrastructure SOP
Kickstart scripts provide our install infrastructure. We have a plethora
of different kickstarts to best match the system you are trying to
install.
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin, sysadmin-main
Location::
Everywhere we have machines.
Servers::
batcave01 (stores kickstarts and install media)
Purpose::
Provides our install infrastructure
== Introduction
Our kickstart infrastructure lives on batcave01. All install media and
kickstart scripts are located on batcave01. Because the RHEL binaries
are not public we have these bits blocked. You can add needed IPs to
(from batcave01):
....
ansible/roles/batcave/files/allows
....
== Physical Machine (kvm virthost)
[NOTE]
.Note
====
PXE Booting: If PXE booting, just follow the prompts after doing the pxe
boot (most hosts will pxeboot via the console by hitting F12).
====
=== Prep
This only works on an already booted box; many boxes at our colocations
may have to be rebuilt by the people in those locations first. Also make
sure the IP you are about to boot to install from is allowed to our IP
restricted infrastructure.fedoraproject.org as noted above (in
Introduction).
Download the vmlinuz and initrd images.
for a rhel6 install:
....
wget https://infrastructure.fedoraproject.org/repo/rhel/RHEL6-x86_64/images/pxeboot/vmlinuz \
-O /boot/vmlinuz-install
wget https://infrastructure.fedoraproject.org/repo/rhel/RHEL6-x86_64/images/pxeboot/initrd.img \
-O /boot/initrd-install.img
grubby --add-kernel=/boot/vmlinuz-install \
--args="ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-6-nohd \
repo=https://infrastructure.fedoraproject.org/repo/rhel/RHEL6-x86_64/ \
ksdevice=link ip=$IP gateway=$GATEWAY netmask=$NETMASK dns=$DNS" \
--title="install el6" --initrd=/boot/initrd-install.img
....
for a rhel7 install:
....
wget https://infrastructure.fedoraproject.org/repo/rhel/RHEL7-x86_64/images/pxeboot/vmlinuz -O /boot/vmlinuz-install
wget https://infrastructure.fedoraproject.org/repo/rhel/RHEL7-x86_64/images/pxeboot/initrd.img -O /boot/initrd-install.img
....
For phx2 hosts:
....
grubby --add-kernel=/boot/vmlinuz-install \
--args="ks=http://10.5.126.23/repo/rhel/ks/hardware-rhel-7-nohd \
repo=http://10.5.126.23/repo/rhel/RHEL7-x86_64/ \
net.ifnames=0 biosdevname=0 bridge=br0:eth0 ksdevice=br0 \
ip={{ br0_ip }}::{{ gw }}:{{ nm }}:{{ hostname }}:br0:none" \
--title="install el7" --initrd=/boot/initrd-install.img
....
(You will need to set up the br1 device, if any, after install.)
For non phx2 hosts:
....
grubby --add-kernel=/boot/vmlinuz-install \
--args="ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-7-ext \
repo=https://infrastructure.fedoraproject.org/repo/rhel/RHEL7-x86_64/ \
net.ifnames=0 biosdevname=0 bridge=br0:eth0 ksdevice=br0 \
ip={{ br0_ip }}::{{ gw }}:{{ nm }}:{{ hostname }}:br0:none" \
--title="install el7" --initrd=/boot/initrd-install.img
....
Fill in the br0 ip, gateway, etc
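For illustration only (hypothetical addresses and hostname), a filled-in
ip= argument might look like:
....
ip=10.5.126.100::10.5.126.254:255.255.255.0:example01.phx2.fedoraproject.org:br0:none
....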
The default here is to use the hardware-rhel-7-nohd config which
requires you to connect via VNC to the box and configure its drives. If
this is a new machine or you are fine with blowing everything away, you
can instead use
https://infrastructure.fedoraproject.org/rhel/ks/hardware-rhel-6-minimal
as your kickstart
If you know the number of hard drives the system has there are other
kickstarts which can be used.
2 disk system::::
ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-7-02disk
or external::::
ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-7-02disk-ext
4 disk system::::
ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-7-04disk
or external::::
ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-7-04disk-ext
6 disk system::::
ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-7-06disk
or external::::
ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-7-06disk-ext
8 disk system::::
ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-7-08disk
or external::::
ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-7-08disk-ext
10 disk system::::
ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-7-10disk
or external::::
ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-7-10disk-ext
Double and triple check your configuration settings (On RHEL-6
`cat /boot/grub/menu.lst` and on RHEL-7 `cat /boot/grub2/grub.cfg`),
especially your IP information. In places like ServerBeach not all hosts
have the same netmask or gateway. Once everything looks right, run the
commands to set it up to boot the installer on the next boot.
RHEL-6:
....
echo "savedefault --default=0 --once" | grub --batch
shutdown -r now
....
RHEL-7:
....
grub2-reboot 0
shutdown -r now
....
=== Installation
Once the box logs you out, start pinging the IP address. It will
disappear and come back. Once you can ping it again, try to open up a
VNC session. It can take a couple of minutes after the box is back up
for it to actually allow vnc sessions. The VNC password is in the
kickstart script on batcave01:
....
grep vnc /mnt/fedora/app/fi-repo/rhel/ks/hardware-rhel-7-nohd
vncviewer $IP:1
....
If using the standard kickstart script, one can watch as the install
completes itself; there should be no need to do anything. If using the
hardware-rhel-6-nohd script, one will need to configure the drives. The
password is in the kickstart file in the kickstart repo.
=== Post Install
Run ansible on the box asap to set root passwords and other security
features. Don't leave a newly installed box sitting around.
= Koji Archive SOP

This SOP documents how to archive Fedora EOL'd builds from the DEFAULT
volume to the archived volume.

Before archiving the builds, identify if any of the EOL'd release builds
are still being used in the current releases. For example, to test if
f28 builds are still being used in f32, use:

 $ koji list-tagged f32 | grep fc28

Tag all these builds to koji's do-not-archive-yet tag, so that they won't
be archived. To do that, first add the packages to the do-not-archive-yet
tag:

 $ koji add-pkg do-not-archive-yet --owner <username> pkg1 pkg2 ...

Then tag the builds to the do-not-archive-yet tag:

 $ koji tag-build do-not-archive-yet build1 build2 ...

Then update the archive policy which is available in the releng repo
(https://pagure.io/releng/blob/master/f/koji-archive-policy).

Run the following from compose-x86-01.phx2.fedoraproject.org:

 $ cd
 $ wget https://pagure.io/releng/raw/master/f/koji-archive-policy
 $ git clone https://pagure.io/koji-tools/
 $ cd koji-tools
 $ ./koji-change-volumes -p compose_koji -v ~/archive-policy

In any case, if you need to move a build back to the DEFAULT volume:

 $ koji add-pkg do-not-archive-yet --owner <username> pkg1
 $ koji tag-build do-not-archive-yet build1
 $ koji set-build-volume DEFAULT <n-v-r>
= Setup Koji Builder SOP
== Contents
* Setting up a new koji builder
* Resetting/installing an old koji builder
== Builder Setup
Setting up a new koji builder involves a goodly number of steps:
=== Network Overview
[arabic]
. First get an instance spun up following the kickstart sop.
. Define a hostname for it on the .125 network and a $hostname-nfs name
for it on the .127 network.
. Make sure the instance has 2 network connections:
* eth0 should be on the .125 network
* eth1 should be on the .127 network
+
For a VM, eth0 should be on br0 and eth1 on br1 on the vmhost.
=== Setup Overview
* install the system as normal:
+
....
virt-install -n $builder_fqdn -r $memsize \
-f $path_to_lvm --vcpus=$numprocs \
-l http://10.5.126.23/repo/rhel/RHEL6-x86_64/ \
-x "ksdevice=eth0 ks=http://10.5.126.23/repo/rhel/ks/kvm-rhel-6 \
ip=$ip netmask=$netmask gateway=$gw dns=$dns \
console=tty0 console=ttyS0" \
--network=bridge=br0 --network=bridge=br1 \
--vnc --noautoconsole
....
* run python `/root/tmp/setup-nfs-network.py` this should print out the
-nfs hostname that you made above
* change root pw
* disable selinux on the machine in /etc/sysconfig/selinux
* reboot
* setup ssl cert into private/builders - use fqdn of host as DN
** login to fas01 as root
** `cd /var/lib/fedora-ca`
** `./kojicerthelper.py normal --outdir=/tmp/ \ --name=$fqdn_of_the_new_builder --cadir=. --caname=Fedora`
** info for the cert should be like this:
+
....
Country Name (2 letter code) [US]:
State or Province Name (full name) [North Carolina]:
Locality Name (eg, city) [Raleigh]:
Organization Name (eg, company) [Fedora Project]:
Organizational Unit Name (eg, section) []:Fedora Builders
Common Name (eg, your name or your servers hostname) []:$fqdn_of_new_builder
Email Address []:buildsys@fedoraproject.org
....
** scp the file in `/tmp/${fqdn}_key_and_cert.pem` over to batcave01
** put the file in the private repo under `private/builders/${fqdn}.pem`
** `git add` + `git commit`
** `git push`
* run `./sync-hosts` in infra-hosts repo; `git commit; git push`
* as a koji admin run:
+
....
koji add-host $fqdn i386 x86_64
(note: those are yum basearchs on the end - season to taste)
....
=== Resetting/installing an old koji builder
* disable the builder in koji (ask a koji admin)
* halt the old system (halt -p)
* undefine the vm instance on the buildvmhost:
+
....
virsh undefine $builder_fqdn
....
* reinstall it - from the buildvmhost run:
+
....
virt-install -n $builder_fqdn -r $memsize \
-f $path_to_lvm --vcpus=$numprocs \
-l http://10.5.126.23/repo/rhel/RHEL6-x86_64/ \
-x "ksdevice=eth0 ks=http://10.5.126.23/repo/rhel/ks/kvm-rhel-6 \
ip=$ip netmask=$netmask gateway=$gw dns=$dns \
console=tty0 console=ttyS0" \
--network=bridge=br0 --network=bridge=br1 \
--vnc --noautoconsole
....
* watch install via vnc:
+
....
vncviewer -via bastion.fedoraproject.org $builder_fqdn:1
....
* when the install finishes:
** start the instance on the buildvmhost:
+
....
virsh start $builder_fqdn
....
** set it to autostart on the buildvmhost:
+
....
virsh autostart $builder_fqdn
....
* when the guest comes up
** login via ssh using the temp root password
** python /root/tmp/setup-nfs-network.py
** change root password
** disable selinux in /etc/sysconfig/selinux
** reboot
** ask a koji admin to re-enable the host
= Koji Infrastructure SOP
[NOTE]
.Note
====
We are transitioning from two buildsystems, koji for Fedora and plague
for EPEL, to just using koji. This page documents both.
====
Koji and plague are our buildsystems. They share some of the same
machines to do their work.
== Contents
[arabic]
. Contact Information
. Description
. Add packages into Buildroot
. Troubleshooting and Resolution
.. Restarting Koji
.. kojid won't start or some builders won't connect
.. OOM (Out of Memory) Issues
... Increase Memory
... Decrease weight
.. Disk Space Issues
.. Unmounting (being sure filesystems in chroots are unmounted before you
delete the chroots)
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin, sysadmin-build group
Persons::
mbonnet, dgilmore, f13, notting, mmcgrath, SmootherFrOgZ
Location::
Phoenix
Servers::
* koji.fedoraproject.org
* buildsys.fedoraproject.org
* xenbuilder[1-4]
* hammer1, ppc[1-4]
Purpose::
Build packages for Fedora.
== Description
Users submit builds to koji.fedoraproject.org or
buildsys.fedoraproject.org. From there it gets passed on to the
builders.
[IMPORTANT]
.Important
====
At present plague and koji are unaware of each other. A result of this
may be an overloaded builder. An easy fix for this is not clear at this
time.
====
== Add packages into Buildroot
Some contributors may need to build packages against freshly built
packages which are not in the buildroot yet. Koji has override tags as an
inheritance to the build tag in order to include them into the buildroot,
which can be set by:
....
koji tag-pkg dist-$release-override <package_nvr>
....
== Troubleshooting and Resolution
=== Restarting Koji
If for some reason koji needs to be restarted, make sure to restart the
koji master first, then the builders. If the koji master has been down
for a short enough time the builders do not need to be restarted:
....
service httpd restart
service kojira restart
service kojid restart
....
[IMPORTANT]
.Important
====
If postgres becomes interrupted in some way, koji will need to be
restarted. As long as the koji master daemon gets restarted the builders
should reconnect automatically. If the db server has been restarted and
the builders don't seem to be building, restart their daemons as well.
====
=== kojid won't start or some builders won't connect
In the event that some items are able to connect to koji while some are
not, please make sure that the database is not filled up on connections.
This is common if koji crashes and the db connections aren't properly
cleared. Upon restart many of the connections are full so koji cannot
reconnect. Clearing old connections is easy, guess about how long it the
new koji has been up and pick a number of minutes larger then that and
kill those queries. From db3 as postgres run:
....
echo "select procpid from pg_stat_activity where usename='koji' and now() - query_start \
>= '00:40:00' order by query_start;" | psql koji | grep "^ " | xargs kill
....
=== OOM (Out of Memory) Issues
Out of memory issues occur from time to time on the build machines.
There are a couple of options for correction. The first fix is to just
restart the machine and hope it was a one time thing. If the problem
continues please choose from one of the following options.
==== Increase Memory
The xen machines can have memory increased on their corresponding xen
hosts. At present this is the table:
[width="34%",cols="44%,56%",]
|===
|xen3 |xenbuilder1
|xen4 |xenbuilder2
|disabled |xenbuilder3
|xen8 |xenbuilder4
|===
Edit `/etc/xen/xenbuilder[1-4]` and add more memory.
==== Decrease weight
Each builder has a weight as to how much work can be given to it.
Presently the only way to alter weight is actually changing the database
on db3:
....
$ sudo su - postgres
-bash-2.05b$ psql koji
koji=# select * from host limit 1;
id | user_id | name | arches | task_load | capacity | ready | enabled
---+---------+------------------------+-----------+-----------+----------+-------+---------
6 | 130 | ppc3.fedora.redhat.com | ppc ppc64 | 1.5 | 4 | t | t
(1 row)
koji=# update host set capacity=2 where name='ppc3.fedora.redhat.com';
....
Simply update capacity to a lower number.
=== Disk Space Issues
The builders use a lot of temporary storage. Failed builds also get left
on the builders; most should get cleaned up, but plague does not. The
easiest thing to do is remove some older cache dirs.
Step one is to turn off both koji and plague:
....
/etc/init.d/plague-builder stop
/etc/init.d/kojid stop
....
Next check to see what file system is full:
....
df -h
....
[IMPORTANT]
.Important
====
If any one of the following directories is full, send an outage
notification as outlined in: [62]Infrastructure/OutageTemplate to the
fedora-infrastructure-list and fedora-devel-list, then contact Mike
McGrath
* /mnt/koji
* /mnt/ntap-fedora1/scratch
* /pub/epel
* /pub/fedora
====
Typically just / will be full. The next thing to do is determine if
we have any extremely large builds left on the builder. Typical
locations include /var/lib/mock and /mnt/build (/mnt/build actually is
on the local filesystem):
....
du -sh /var/lib/mock/* /mnt/build/*
....
`/var/lib/mock/dist-f8-build-10443-1503`::
classic koji build
`/var/lib/mock/fedora-6-ppc-core-57cd31505683ef1afa533197e91608c5a2c52864`::
classic plague build
If nothing jumps out immediately, just start deleting files older than
one week. Once enough space has been freed start koji and plague back
up:
....
/etc/init.d/plague-builder start
/etc/init.d/kojid start
....
=== Unmounting
[WARNING]
.Warning
====
Should there be mention of being sure filesystems in chroots are
unmounted before you delete the chroots?
Res ipsa loquitur.
====
= Koschei SOP
Koschei is a continuous integration system for RPM packages. Koschei
runs package scratch builds after dependency change or after time elapse
and reports package buildability status to interested parties.
Production instance: https://apps.fedoraproject.org/koschei

Staging instance: https://apps.stg.fedoraproject.org/koschei
== Contact Information
Owner::
mizdebsk, msimacek
Contact::
#fedora-admin
Location::
Fedora Cloud
Purpose::
continuous integration system
== Deployment
Koschei deployment is managed by two Ansible playbooks:
....
sudo rbac-playbook groups/koschei-backend.yml
sudo rbac-playbook groups/koschei-web.yml
....
== Description
Koschei is deployed on two separate machines - `koschei-backend` and
`koschei-web`
Frontend (`koschei-web`) is a Flask WSGi application running with httpd.
It displays information to users and allows editing package groups and
changing priorities.
Backend (`koschei-backend`) consists of multiple services:
* `koschei-watcher` - listens to fedmsg events for complete builds and
changes build states in the database
* `koschei-repo-resolver` - resolves package dependencies in given repo
using hawkey and compares them with previous iteration to get a
dependency diff. It resolves all packages in the newest repo available
in Koji. The output is a base for scheduling new builds
* `koschei-build-resolver` - resolves complete builds in the repo in
which they were done in Koji. Produces the dependency differences
visible in the frontend
* `koschei-scheduler` - schedules new builds based on multiple criteria:
** dependency priority - dependency changes since last build valued by
their distance in the dependency graph
** manual and static priorities - set manually in the frontend. Manual
priority is reset after each build, static priority persists
** time priority - time elapsed since the last build
* `koschei-polling` - polls the same types of events as koschei-watcher
without reliance on fedmsg. Additionally takes care of package list
synchronization and other regularly executed tasks
== Configuration
Koschei configuration is in `/etc/koschei/config-backend.cfg` and
`/etc/koschei/config-frontend.cfg`, and is merged with the default
configuration in `/usr/share/koschei/config.cfg` (the ones in `/etc`
overrides the defaults in `/usr`). Note the merge is recursive. The
configuration contains all configurable items for all Koschei services
and the frontend. The alterations to configuration that aren't temporary
should be done through ansible playbook. Configuration changes have no
effect on already running services -- they need to be restarted, which
happens automatically when using the playbook.
== Disk usage
Koschei doesn't keep on disk anything that couldn't be recreated easily
- all important data is stored in PostgreSQL database, configuration is
managed by Ansible, code installed by RPM and so on.
To speed up operation and reduce load on external servers, Koschei
caches some data obtained from services it integrates with. Most
notably, YUM repositories downloaded from Koji are kept in
`/var/cache/koschei/repodata`. Each repository takes about 100 MB of
disk space. Maximal number of repositories kept at time is controlled by
`cache_l2_capacity` parameter in `config-backend.cfg`
(`config-backend.cfg.j2` in Ansible). If repodata cache starts to
consume too much disk space, that value can be decreased - after
restart, `koschei-*-resolver` will remove least recently used cache
entries to respect configured cache capacity.
== Database
Koschei needs to connect to a PostgreSQL database, other database
systems are not supported. Database connection is specified in the
configuration under the `database_config` key that can contain the
following keys: `username, password, host, port, database`.
After an update of koschei, the database needs to be migrated to the new
schema. This happens automatically when using the upgrade playbook.
Alternatively, it can be executed manually using:
....
koschei-admin alembic upgrade head
....
The backend services need to be stopped during the migration.
== Managing koschei services
Koschei services are systemd units managed through `systemctl`. They can
be started and stopped independently in any order. The frontend is run
using httpd.
== Suspending koschei operation
For stopping builds from being scheduled, stopping the
`koschei-scheduler` service is enough. For planned Koji outages, it's
recommended to stop `koschei-scheduler`. It is not necessary, as koschei
can recover from Koji errors and network errors automatically, but when
Koji builders are stopped, it may cause unexpected build failures that
would be reported to users. Other services can be left running as they
automatically restart themselves on Koji and network errors.
== Limiting Koji usage
Koschei is by default limited to 30 concurrently running builds. This
limit can be changed in the configuration under `koji_config.max_builds`
key. There's also Koji load monitoring, which prevents builds from being
scheduled when Koji load is higher than a certain threshold. That should
prevent scheduling builds during mass rebuilds, so it's not necessary to
stop scheduling during those.
== Fedmsg notifications
Koschei optionally supports sending fedmsg notifications for package
state changes. The fedmsg dispatch can be turned on and off in the
configuration (key `fedmsg-publisher.enabled`). Koschei doesn't supply
configuration for fedmsg, it lets the library to load it's own (in
`/etc/fedmsg.d/`).
== Setting admin announcement
Koschei can display announcement in web UI. This is mostly useful to
inform users about outages or other problems.
To set announcement, run as koschei user:
....
koschei-admin set-notice "Koschei operation is currently suspended due to scheduled Koji outage"
....
or:
....
koschei-admin set-notice "Sumbitting scratch builds by Koschei is currently disabled due to Fedora 23 mass rebuild"
....
To clear announcement, run as koschei user:
....
koschei-admin clear-notice
....
== Adding package groups
Packages can be added to one or more groups.
To add new group named "mynewgroup", run as koschei user:
....
koschei-admin add-group mynewgroup
....
To add new group named "mynewgroup" and populate it with some packages,
run as koschei user:
....
koschei-admin add-group mynewgroup pkg1 pkg2 pkg3
....
== Set package static priority
Some packages are more or less important and can have higher or lower
priority. Any user can change manual priority, which is reset after
package is rebuilt. Admins can additionally set static priority, which
is not affected by package rebuilds.
To set static priority of package "foo" to value "100", run as koschei
user:
....
koschei-admin --collection f27 set-priority --static foo 100
....
== Branching a new Fedora release
After branching occurs and Koji build targets have been created, Koschei
should be updated to reflect the new state. There is a special admin
command for this purpose, which takes care of copying the configuration
and also last builds from the history.
To branch the collection from Fedora 27 to Fedora 28, use the following:
....
koschei-admin branch-collection f27 f28 -d 'Fedora 27' -t f28 --bugzilla-version 27
....
Then you can optionally verify that the collection configuration is
correct by visiting https://apps.fedoraproject.org/koschei/collections
and examining the configuration of the newly branched collection.
= Layered Image Build System
The
https://docs.pagure.org/releng/layered_image_build_service.html[Fedora
Layered Image Build System], often referred to as
https://github.com/projectatomic/osbs-client[OSBS] (OpenShift Build
Service) as that is the upstream project that this is based on, is used
to build Layered Container Images in the Fedora Infrastructure via Koji.
== Contents
[arabic]
. Contact Information
. Overview
. Setup
. Outage
== Contact Information
Owner::
Clement Verna (cverna)
Contact::
#fedora-admin, #fedora-releng, #fedora-noc, sysadmin-main,
sysadmin-releng
Location::
osbs-control01, osbs-master01, osbs-node01, osbs-node02
registry.fedoraproject.org, candidate-registry.fedoraproject.org
+
osbs-control01.stg, osbs-master01.stg, osbs-node01.stg,
osbs-node02.stg registry.stg.fedoraproject.org,
candidate-registry.stg.fedoraproject.org
+
x86_64 koji buildvms
Purpose::
Layered Container Image Builds
== Overview
The build system is set up such that Fedora Layered Image maintainers
will submit a build to Koji via the `fedpkg container-build` command from
a `container` namespace within
https://src.fedoraproject.org/projects/container/*[DistGit]. This will
trigger the build to be scheduled in
https://www.openshift.org/[OpenShift] via
https://github.com/projectatomic/osbs-client[osbs-client] tooling, this
will create a custom
https://docs.openshift.org/latest/dev_guide/builds.html[OpenShift Build]
which will use the pre-made buildroot container image that we have
created. The https://github.com/projectatomic/atomic-reactor[Atomic
Reactor] (`atomic-reactor`) utility will run within the buildroot and
prep the build container where the actual build action will execute; it
will also handle uploading the
https://fedoraproject.org/wiki/Koji/ContentGenerators[Content Generator]
metadata back to https://fedoraproject.org/wiki/Koji[Koji] and upload
the built image to the candidate docker registry. This will run on a
host with iptables rules restricting access to the docker bridge, this
is how we will further limit the access of the buildroot to the outside
world verifying that all sources of information come from Fedora.
Completed layered image builds are hosted in a candidate docker registry
which is then used to pull the image and perform tests.
== Setup
The Layered Image Build System setup is currently as follows (more
detailed view available in the
https://docs.pagure.org/releng/layered_image_build_service.html[RelEng
Architecture Document]):
....
=== Layered Image Build System Overview ===
+--------------+ +-----------+
| | | |
| koji hub +----+ | batcave |
| | | | |
+--------------+ | +----+------+
| |
V |
+----------------+ V
| | +----------------+
| koji builder | | +-----------+
| | | osbs-control01 +--------+ |
+-+--------------+ | +-----+ | |
| +----------------+ | | |
| | | |
| | | |
| | | |
V | | |
+----------------+ | | |
| | | | |
| osbs-master01 +------------------------------+ [ansible]
| +-------+ | | | |
+----------------+ | | | | |
^ | | | | |
| | | | | |
| V V | | |
| +-----------------+ +----------------+ | | |
| | | | | | | |
| | osbs-node01 | | osbs-node02 | | | |
| | | | | | | |
| +-----------------+ +----------------+ | | |
| ^ ^ | | |
| | | | | |
| | +-----------+ | |
| | | |
| +------------------------------------------+ |
| |
+-------------------------------------------------------------+
....
=== Deployment
From batcave you can run the following
....
$ sudo rbac-playbook groups/osbs/deploy-cluster.yml
....
This is going to deploy the OpenShift cluster used by OSBS. Currently
the playbook deploys 2 clusters (x86_64 and aarch64). Ansible tags can
be used to deploy only one of these if needed, for example
`osbs-x86-deploy-openshift`.
If the openshift-ansible playbook fails it can be easier to run it
directly from osbs-control01 and use the verbose mode.
....
$ ssh osbs-control01.iad2.fedoraproject.org
$ sudo -i
# cd /root/openshift-ansible
# ansible-playbook -i cluster-inventory playbooks/prerequisites.yml
# ansible-playbook -i cluster-inventory playbooks/deploy_cluster.yml
....
Once these playbooks have been successful, you can configure OSBS on the
cluster. For that, use the following playbook:
....
$ sudo rbac-playbook groups/osbs/configure-osbs.yml
....
When this is done we need to get the new koji service token and update
its value in the private repository
....
$ ssh osbs-master01.iad2.fedoraproject.org
$ sudo -i
# oc -n osbs-fedora sa get-token koji
dsjflksfkgjgkjfdl ....
....
The token needs to be saved in the private ansible repo in
`files/osbs/production/x86-64-osbs-koji`. Once this is done
you can run the builder playbook to update that token.
....
$ sudo rbac-playbook groups/buildvm.yml -t osbs
....
=== Operation
Koji Hub will schedule the containerBuild on a koji builder via the
koji-containerbuild-hub plugin, the builder will then submit the build
in OpenShift via the koji-containerbuild-builder plugin which uses the
osbs-client python API that wraps the OpenShift API along with a custom
OpenShift Build JSON payload.
The Build is then scheduled in OpenShift and its logs are captured by
the koji plugins. Inside the buildroot, atomic-reactor will upload the
built container image as well as provide the metadata to koji's content
generator.
== Outage
If Koji is down, then builds can't be scheduled, but repairing Koji is
outside the scope of this document.
Builds will also fail if either the candidate-registry.fedoraproject.org
or registry.fedoraproject.org Container Registries are unavailable, but
repairing those is also outside the scope of this document.
=== OSBS Failures
OpenShift Build System itself can have various types of failures that
are known about and the recovery procedures are listed below.
==== Ran out of disk space
Docker uses a lot of disk space, and while the osbs-nodes have been
allotted what is considered to be ample disk space for builds (since they
are automatically cleaned up periodically) it is possible this will run
out.
To resolve this, run the following commands:
....
# These command will clean up old/dead docker containers from old OpenShift
# Pods
$ for i in $(sudo docker ps -a | awk '/Exited/ { print $1 }'); do sudo docker rm $i; done
$ for i in $(sudo docker images -q -f 'dangling=true'); do sudo docker rmi $i; done
# This command should only be run on osbs-master01 (it won't work on the
# nodes)
#
# This command will clean up old builds and related artifacts in OpenShift
# that are older than 30 days (We can get more aggressive about this if
# necessary, the main reason these still exist is in the event we need to
# debug something. All build info we care about is stored in Koji.)
$ oadm prune builds --orphans --keep-younger-than=720h0m0s --confirm
....
==== A node is broken, how to remove it from the cluster?
If a node is having an issue, the following command will effectively
remove it from the cluster temporarily.
In this example, we are removing osbs-node01
....
$ oadm manage-node osbs-node01.phx2.fedoraproject.org --schedulable=false
....
==== Container Builds are unable to access resources on the network
Sometimes the Container Builds will fail and the logs will show that the
buildroot is unable to access networked resources (docker registry, dnf
repos, etc).
This is because of a bug in OpenShift v1.3.1 (current upstream release
at the time of this writing) where an OpenVSwitch flow is left behind
when a Pod is destroyed instead of the flow being deleted along with the
Pod.
Method to confirm the issue is unfortunately multi-step since it's not a
cluster-wide issue but isolated to the node experiencing the problem.
First in the koji createContainer task there is a log file called
openshift-incremental.log and in there you will find a key:value in some
JSON output similar to the following:
....
'openshift_build_selflink': u'/oapi/v1/namespaces/default/builds/cockpit-f24-6'
....
The last field of the value, in this example `cockpit-f24-6` is the
OpenShift build identifier. We need to ssh into `osbs-master01` and get
information about which node that ran on.
....
# On osbs-master01
# Note: the output won't be pretty, but it gives you the info you need
$ sudo oc get build cockpit-f24-6 -o yaml | grep osbs-node
....
Once you know what machine you need, ssh into it and run the following:
....
$ sudo docker run --rm -ti buildroot /bin/bash
# now attempt to run a curl command
$ curl https://google.com
# This should get refused, but if this node is experiencing the networking
# issue then this command will hang and eventually time out
....
How to fix:
Reboot the affected node that's experiencing the issue, when the node
comes back up OpenShift will rebuild the flow tables on OpenVSwitch and
things will be back to normal.
....
systemctl reboot
....
= librariesio2fedmsg SOP
librariesio2fedmsg is a small service that converts Server-Sent Events
from https://libraries.io/[libraries.io] to fedmsgs.
librariesio2fedmsg is an instance of
https://github.com/fedora-infra/sse2fedmsg[sse2fedmsg] using the
http://firehose.libraries.io/events[libraries.io firehose] running on
https://os.fedoraproject.org/[OpenShift] and publishes its fedmsgs
through the busgateway01.phx2.fedoraproject.org relay using the
`org.fedoraproject.prod.sse2fedmsg.librariesio` topic.
== Updating
sse2fedmsg is installed directly from its git repository, so once a new
release is tagged in sse2fedmsg, just update the tag in the git URL
provided to pip in the
https://infrastructure.fedoraproject.org/infra/ansible/roles/openshift-apps/librariesio2fedmsg/files/[build
config].
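The relevant line in the build config is a pip-style git URL, so bumping the
tag looks roughly like this (tag value is hypothetical):
....
git+https://github.com/fedora-infra/sse2fedmsg.git@1.2.1#egg=sse2fedmsg
....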
== Deploying
Run the playbook to apply the new OpenShift configuration:
....
$ sudo rbac-playbook openshift-apps/librariesio2fedmsg.yml
....
= Link tracking
Using link tracking is an easy way for us to find out how people are
getting to our download page. People might click over to our download
page from any of a number of areas, and knowing the relative usage of
those links can help us understand what materials we're producing are
more effective than others.
== Adding links
Each link should be constructed by adding ? to the URL, followed by a
short code that includes:
* an indicator for the link source (such as the wiki release notes)
* an indicator for the Fedora release in specific (such as F15 for the
final, or F15a for the Alpha test release)
So a link to get.fp.o from the one-page release notes would become
http://get.fedoraproject.org/?opF15.
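Following the same pattern, a few more hypothetical examples built from the
codes listed below:
....
http://get.fedoraproject.org/?fpF15      (front page)
http://get.fedoraproject.org/?wkrnF15a   (wiki test-phase release notes, F15 Alpha)
http://get.fedoraproject.org/?stF15      (status update or blog post)
....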
== FAQ
I want to copy a link to my status update for social networking, or my
blog.::
If you're posting a status update to identi.ca, for example, use the
link tracking code for status updates. Don't copy a link straight from
an announcement that already includes the announcement's link tracking
code. You can copy the link itself, but remember to change the portion
after the ? to the st code for status updates and blogs, followed by
the Fedora release version (such as F16a, F16b, or F16), like this:
+
....
http://fedoraproject.org/get-prerelease?stF16a
....
I want to point people to the announcement from my blog. Should I use
the announcement link tracking code?::
The URL itself is the announcement URL. Add the link tracking code for
blogs, which starts with ?st and ends with the Fedora release version,
like this:
+
....
http://fedoraproject.org/wiki/F16_release_announcement?stF16a
....
== The codes
[NOTE]
.Note
====
Additions to this table are welcome.
====
[cols=",",options="header",]
|===
|Link source |Code
|Email announcements |an
|Wiki announcements |wkan
|Front page |fp
|Front page of wiki |wkfp
|The press release Red Hat makes |rhpr
|http://redhat.com/fedora |rhf
|Test phase release notes on the wiki |wkrn
|Official release notes |rn
|Official installation guide |ig
|One-page release notes |op
|Status links (blogs, social media) |st
|===

View file

@ -0,0 +1,148 @@
= Loopabull
https://github.com/maxamillion/loopabull[Loopabull] is an event-driven
https://www.ansible.com/[Ansible]-based automation engine. This is used
for various tasks, originally slated for
https://pagure.io/releng-automation[Release Engineering Automation].
== Contents
[arabic]
. Contact Information
. Overview
. Setup
. Outage
== Contact Information
Owner::
Adam Miller (maxamillion), Pierre-Yves Chibon (pingou)
Contact::
#fedora-admin, #fedora-releng, #fedora-noc, sysadmin-main,
sysadmin-releng
Location::
loopabull01.phx2.fedoraproject.org
loopabull01.stg.phx2.fedoraproject.org
Purpose::
Event Driven Automation of tasks within the Fedora Infrastructure and
Fedora Release Engineering
== Overview
The https://github.com/maxamillion/loopabull[loopabull] system is set up
so that when an event takes place within the infrastructure and a
http://www.fedmsg.com/en/latest/[fedmsg] is sent, loopabull consumes
that message, triggers an https://www.ansible.com/[Ansible]
http://docs.ansible.com/ansible/playbooks.html[playbook] that shares a
name with the fedmsg topic, and provides the payload of the fedmsg to the
playbook as
https://github.com/ansible/ansible/blob/devel/docs/man/man1/ansible-playbook.1.asciidoc.in[extra
variables].
== Setup
The setup is relatively simple; the Overview above describes it, and a
more detailed version can be found in the [.title-ref]#releng docs#.
....
+-----------------+ +-------------------------------+
| | | |
| fedmsg +------------>| Looper |
| | | (fedmsg handler plugin) |
| | | |
+-----------------+ +-------------------------------+
|
|
+-------------------+ |
| | |
| | |
| Loopabull +<-------------+
| (Event Loop) |
| |
+---------+---------+
|
|
|
|
V
+----------+-----------+
| |
| ansible-playbook |
| |
+----------------------+
....
=== Deployment
Loopabull is deployed on two hosts, one for the production instance:
`loopabull01.prod.phx2.fedoraproject.org` and one for the staging
instance: `loopabull01.stg.phx2.fedoraproject.org`.
Each host is running loopabull with 5 workers reacting to fedmsg
notifications.
== Expanding loopabull
The documentation to expand loopabull's usage is documented at:
https://pagure.io/Fedora-Infra/loopabull-tasks
== Outage
In the event that loopabull isn't responding or isn't running playbooks
as it should be, the following scenarios should be approached.
=== What is going on?
There are a few commands that may help figure out what is going on:
* Check the status of the different services:
....
systemctl | grep loopabull
....
* Follow the logs of the different services:
....
journalctl -lfu loopabull -u loopabull@1 -u loopabull@2 -u loopabull@3 \
-u loopabull@4 -u loopabull@5
....
If a playbook returns a non-zero error code, the worker running it will
be stopped. If that happens, you may want to carefully review the logs
to assess what led to this situation so it can be prevented in the
future.
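If only a single worker stopped (say worker 3), restarting just that
templated unit is usually enough; a sketch, adjust the instance number
as needed:
....
# restart only the stopped worker instance
systemctl restart loopabull@3
....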
* Monitoring the queue size
The loopabull service listens to the fedmsg bus and puts the messages,
as they come in, into a rabbitmq/amqp queue for the workers to process.
If you want to see the number of messages pending to be processed by the
workers, you can check the queue size using:
....
rabbitmqctl list_queues
....
The output will be something like:
....
Listing queues ...
workers 489989
...done.
....
Where `workers` is the name of the queue used by loopabull and `489989`
is the number of messages in that queue (yes, that day we were recovering
from a several-day-long outage).
=== Network Interruption
Sometimes if the network is interrupted, the loopabull service will hang
because the fedmsg listener will hold a dead socket open. The service
and its workers simply need to be restarted at that point.
....
systemctl restart loopabull loopabull@1 loopabull@2 loopabull@3 \
loopabull@4 loopabull@5
....

View file

@ -0,0 +1,115 @@
= Mailman Infrastructure SOP
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin, sysadmin-main, sysadmin-tools, sysadmin-hosted
Location::
phx2
Servers::
mailman01, mailman02, mailman01.stg
Purpose::
Provides mailing list services.
== Description
Mailing list services for Fedora projects are located on the
mailman01.phx2.fedoraproject.org server.
== Common Tasks
=== Creating a new mailing list
* Log into mailman01
* `sudo -u mailman mailman3 create <listname>@lists.fedora(project|hosted).org --owner <username>@fedoraproject.org --notify`
+
[IMPORTANT]
.Important
====
Please make sure to add a valid description to the newly created list
(to avoid [no description available] on the listinfo index).
====
== Removing content from archives
We don't.
It's not easy to remove content from the archives and it's generally
useless as well because the archives are often mirrored by third parties
as well as being in the inboxes of all of the people on the mailing list
at that time. Here's an example message to send to someone who requests
removal of archived content:
....
Greetings,
We're sorry to say that we don't remove content from the mailing list archives.
Doing so is a non-trivial amount of work and usually doesn't achieve anything
because the content has already been disseminated to a wide audience that we do
not control. The emails have gone out to all of the subscribers of the mailing
list at that time and also (for a great many of our lists) been copied by third
parties (for instance: http://markmail.org and http://gmane.org).
Sorry we cannot help further,
Mailing lists and their owners
....
== Checking Membership
Do you need to check who owns a certain mailing list without having to
search around on the lists' front pages?
Mailman has a nice tool that will help us list members by type.
Get a full list of all the mailing lists hosted on the server:
....
sudo -u mailman mailman3 lists
....
Get the list of regular members for example@example.com:
....
sudo -u mailman mailman3 members example@example.com
....
Get the list of owners for example@example.com:
....
sudo -u mailman mailman3 members -R owner example@example.com
....
Get the list of moderators for example@example.com:
....
sudo -u mailman mailman3 members -R moderator example@example.com
....
== Troubleshooting and Resolution
=== List Administration
Specific users are marked as 'site admins' in the database.
Please file an issue if you feel you need to have this access.
=== Restart Procedure
If the server needs to be restarted, mailman should come back on its
own. Otherwise, each service on it can be restarted:
....
sudo service mailman3 restart
sudo service postfix restart
....
== How to delete a mailing list
Delete a list, but keep the archives:
....
sudo -u mailman mailman3 remove <listname>
....

View file

@ -0,0 +1,53 @@
= SSL Certificate Creation SOP
Every now and then you will need to create an SSL certificate for a
Fedora Service.
== Creating a CSR for a new server.
Know your hostname, e.g. `lists.fedoraproject.org`:
....
export ssl_name=<fqdn of host>
....
Create the key and CSR. 8192-bit keys do not work with various boxes, so
we currently use 4096:
....
openssl genrsa -out ${ssl_name}.pem 4096
openssl req -new -key ${ssl_name}.pem -out ${ssl_name}.csr
Country Name (2 letter code) [XX]:US
State or Province Name (full name) []:NM
Locality Name (eg, city) [Default City]:Raleigh
Organization Name (eg, company) [Default Company Ltd]:Red Hat
Organizational Unit Name (eg, section) []:Fedora Project
Common Name (eg, your name or your server's hostname)
[]:lists.fedorahosted.org
Email Address []:admin@fedoraproject.org
Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:
An optional company name []:
....
Send the CSR to the signing authority and wait for a cert. Place all
three files into the private directory so that you can make certs in the future.
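If you want to sanity-check the CSR before sending it off, plain openssl
can print its contents (nothing Fedora-specific here):
....
openssl req -in ${ssl_name}.csr -noout -text
....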
== Creating a temporary self-signed certificate.
Repeat the steps above but add in the following:
....
openssl x509 -req -days 30 -in ${ssl_name}.csr -signkey ${ssl_name}.pem -out ${ssl_name}.cert
Signature ok
subject=/C=US/ST=NM/L=Raleigh/O=Red Hat/OU=Fedora
Project/CN=lists.fedorahosted.org/emailAddress=admin@fedoraproject.org
Getting Private key
....
We only want a self-signed certificate to be good for a short time, so
30 days sounds good.

View file

@ -0,0 +1,418 @@
= Mass Upgrade Infrastructure SOP
Every once in a while, we need to apply mass upgrades to our servers for
various security and other upgrades.
== Contents
[arabic]
. Contact Information
. Preparation
. Staging
. Special Considerations
+
____
* Disable builders
* Post reboot action
* Schedule autoqa01 reboot
* Bastion01 and Bastion02 and openvpn server
* Special yum directives
____
. Update Leader
. Group A reboots
. Group B reboots
. Group C reboots
. Doing the upgrade
. Doing the reboot
. Aftermath
== Contact Information
Owner:::
Fedora Infrastructure Team
Contact:::
#fedora-admin, sysadmin-main, infrastructure@lists.fedoraproject.org,
#fedora-noc
Location:::
All over the world.
Servers:::
all
Purpose:::
Apply kernel/other upgrades to all of our servers
== Preparation
[arabic]
. Determine which host group you are going to be doing updates/reboots
on.
+
Group "A"::
servers that end users will see or note being down and anything that
depends on them.
Group "B"::
servers that contributors will see or note being down and anything
that depends on them.
Group "C"::
servers that infrastructure will notice are down, or are redundant
enough to reboot some with others taking the load.
. Appoint an 'Update Leader' for the updates.
. Follow the Outage Infrastructure SOP and send advance notification
to the appropriate lists. Try to schedule the update at a time when many
admins are around to help/watch for problems and when the impact for the
affected group is lower. If possible, do NOT do multiple groups on the
same day.
. Plan an order for rebooting the machines considering two factors:
+
____
* Location of systems on the kvm or xen hosts. [You will normally reboot
all systems on a host together]
* Impact of systems going down on other services, operations and users.
Thus, since the database and NFS servers are the backbone of many
other systems, they and the systems on the same xen boxes would be
rebooted before other boxes.
____
. To aid in organizing a mass upgrade/reboot with many people helping,
it may help to create a checklist of machines in a gobby document.
. Schedule downtime in nagios.
. Make doubly sure that the various app owners are aware of the reboots.
== Staging
____
Any updates that can be tested in staging or a pre-production
environment should be tested there first, including new kernels, updates
to core database applications/libraries, web applications, libraries,
etc.
____
== Special Considerations
While this may not be a complete list, here are some special things that
must be taken into account before rebooting certain systems:
=== Disable builders
Before the following machines are rebooted, all koji builders should be
disabled and all running jobs allowed to complete:
____
* db04
* nfs01
* kojipkgs02
____
Builders can be removed from koji, updated and re-added. Use:
....
koji disable-host NAME
and
koji enable-host NAME
....
[NOTE]
.Note
====
you must be a koji admin
====
Additionally, rel-eng and builder boxes may need a special version
of rpm. Make sure to check with rel-eng on any rpm upgrades for them.
=== Post reboot action
The following machines require post-boot actions (mostly entering
passphrases). Make sure admins that have the passphrases are on hand for
the reboot:
____
* backup-2 (LUKS passphrase on boot)
* sign-vault01 (NSS passphrase for sigul service)
* sign-bridge01 (NSS passphrase for sigul bridge service)
* serverbeach* (requires fixing firewall rules):
____
Each serverbeach host needs 3 or 4 iptables rules added anytime it's
rebooted or libvirt is upgraded:
....
iptables -I FORWARD -o virbr0 -j ACCEPT
iptables -I FORWARD -i virbr0 -j ACCEPT
iptables -t nat -I POSTROUTING -s 192.168.122.3/32 -j SNAT --to-source 66.135.62.187
....
[NOTE]
.Note
====
The source is the internal guest IP; the to-source is the external IP
that maps to that guest IP. If there are multiple guests, each one needs
the above SNAT rule inserted.
====
=== Schedule autoqa01 reboot
There is currently an autoqa01.c host on cnode01. Check with QA folks
before rebooting this guest/host.
=== Bastion01 and Bastion02 and openvpn server
We need one of the bastion machines to be up to provide openvpn for all
machines. Before rebooting bastion02, modify the
`manifests/nodes/bastion0*.phx2.fedoraproject.org.pp` files to start the
openvpn server on bastion01, wait for all clients to re-connect, reboot
bastion02, and then revert back to it as the openvpn hub.
=== Special yum directives
Sometimes we will wish to exclude or otherwise modify the yum.conf on a
machine. For this purpose, all machines have an include, making them
read
http://infrastructure.fedoraproject.org/infra/hosts/FQHN/yum.conf.include
from the infrastructure repo. If you need to make such changes, add them
to the infrastructure repo before doing updates.
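As a purely hypothetical illustration, such an include file might hold
back a package on one host during the update run:
....
# infra/hosts/<FQHN>/yum.conf.include (hypothetical content)
exclude=somepackage*
....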
== Update Leader
Each update should have a Leader appointed. This person will be in
charge of doing any read-write operations, and delegating to others to
do tasks. If you aren't specifically asked by the Leader to reboot or
change something, please don't. The Leader will assign out machine
groups to reboot, or ask specific people to look at machines that didn't
come back up from reboot or aren't working right after reboot. It's
important to avoid multiple people operating on a single machine in a
read-write manner and interfering with changes.
== Group A reboots
Group A machines are end-user-critical ones. Outages here should be
planned at least a week in advance and announced to the announce list.
List of machines currently in the A group (note: this list is going to
be automated).
These hosts are grouped based on the virt host they reside on:
* torrent02.fedoraproject.org
* ibiblio02.fedoraproject.org
* people03.fedoraproject.org
* ibiblio03.fedoraproject.org
* collab01.fedoraproject.org
* serverbeach09.fedoraproject.org
* db05.phx2.fedoraproject.org
* virthost03.phx2.fedoraproject.org
* db01.phx2.fedoraproject.org
* virthost04.phx2.fedoraproject.org
* db-fas01.phx2.fedoraproject.org
* proxy01.phx2.fedoraproject.org
* virthost05.phx2.fedoraproject.org
* ask01.phx2.fedoraproject.org
* virthost06.phx2.fedoraproject.org
These are the rest:
* bapp02.phx2.fedoraproject.org
* bastion02.phx2.fedoraproject.org
* app05.fedoraproject.org
* backup02.fedoraproject.org
* bastion01.phx2.fedoraproject.org
* fas01.phx2.fedoraproject.org
* fas02.phx2.fedoraproject.org
* log02.phx2.fedoraproject.org
* memcached03.phx2.fedoraproject.org
* noc01.phx2.fedoraproject.org
* ns02.fedoraproject.org
* ns04.phx2.fedoraproject.org
* proxy04.fedoraproject.org
* smtp-mm03.fedoraproject.org
* batcave02.phx2.fedoraproject.org
* mm3test.fedoraproject.org
* packages02.phx2.fedoraproject.org
=== Group B reboots
This Group contains machines that contributors use. Announcements of
outages here should be at least a week in advance and sent to the
devel-announce list.
These hosts are grouped based on the virt host they reside on:
* db04.phx2.fedoraproject.org
* bvirthost01.phx2.fedoraproject.org
* nfs01.phx2.fedoraproject.org
* bvirthost02.phx2.fedoraproject.org
* pkgs01.phx2.fedoraproject.org
* bvirthost03.phx2.fedoraproject.org
* kojipkgs02.phx2.fedoraproject.org
* bvirthost04.phx2.fedoraproject.org
These are the rest:
* koji04.phx2.fedoraproject.org
* releng03.phx2.fedoraproject.org
* releng04.phx2.fedoraproject.org
=== Group C reboots
Group C are machines that infrastructure uses, or can be rebooted in
such a way as to continue to provide services to others via multiple
machines. Outages here should be announced on the infrastructure list.
Group C hosts that have proxy servers on them:
* proxy02.fedoraproject.org
* ns05.fedoraproject.org
* hosted-lists01.fedoraproject.org
* internetx01.fedoraproject.org
* app01.dev.fedoraproject.org
* darkserver01.dev.fedoraproject.org
* fakefas01.fedoraproject.org
* proxy06.fedoraproject.org
* osuosl01.fedoraproject.org
* proxy07.fedoraproject.org
* bodhost01.fedoraproject.org
* proxy03.fedoraproject.org
* smtp-mm02.fedoraproject.org
* tummy01.fedoraproject.org
* app06.fedoraproject.org
* noc02.fedoraproject.org
* proxy05.fedoraproject.org
* smtp-mm01.fedoraproject.org
* telia01.fedoraproject.org
* app08.fedoraproject.org
* proxy08.fedoraproject.org
* coloamer01.fedoraproject.org
Other Group C hosts:
* ask01.stg.phx2.fedoraproject.org
* app02.stg.phx2.fedoraproject.org
* proxy01.stg.phx2.fedoraproject.org
* releng01.stg.phx2.fedoraproject.org
* value01.stg.phx2.fedoraproject.org
* virthost13.phx2.fedoraproject.org
* db-fas01.stg.phx2.fedoraproject.org
* pkgs01.stg.phx2.fedoraproject.org
* packages01.stg.phx2.fedoraproject.org
* virthost11.phx2.fedoraproject.org
* app01.stg.phx2.fedoraproject.org
* koji01.stg.phx2.fedoraproject.org
* db02.stg.phx2.fedoraproject.org
* fas01.stg.phx2.fedoraproject.org
* virthost10.phx2.fedoraproject.org
* autoqa01.qa.fedoraproject.org
* autoqa-stg01.qa.fedoraproject.org
* bastion-comm01.qa.fedoraproject.org
* batcave-comm01.qa.fedoraproject.org
* virthost-comm01.qa.fedoraproject.org
* compose-x86-01.phx2.fedoraproject.org
* compose-x86-02.phx2.fedoraproject.org
* download01.phx2.fedoraproject.org
* download02.phx2.fedoraproject.org
* download03.phx2.fedoraproject.org
* download04.phx2.fedoraproject.org
* download05.phx2.fedoraproject.org
* download-rdu01.vpn.fedoraproject.org
* download-rdu02.vpn.fedoraproject.org
* download-rdu03.vpn.fedoraproject.org
* fas03.phx2.fedoraproject.org
* secondary01.phx2.fedoraproject.org
* memcached04.phx2.fedoraproject.org
* virthost01.phx2.fedoraproject.org
* app02.phx2.fedoraproject.org
* value03.phx2.fedoraproject.org
* virthost07.phx2.fedoraproject.org
* app03.phx2.fedoraproject.org
* value04.phx2.fedoraproject.org
* ns03.phx2.fedoraproject.org
* darkserver01.phx2.fedoraproject.org
* virthost08.phx2.fedoraproject.org
* app04.phx2.fedoraproject.org
* packages02.phx2.fedoraproject.org
* virthost09.phx2.fedoraproject.org
* hosted03.fedoraproject.org
* serverbeach06.fedoraproject.org
* hosted04.fedoraproject.org
* serverbeach07.fedoraproject.org
* collab02.fedoraproject.org
* serverbeach08.fedoraproject.org
* dhcp01.phx2.fedoraproject.org
* relepel01.phx2.fedoraproject.org
* sign-bridge02.phx2.fedoraproject.org
* koji03.phx2.fedoraproject.org
* bvirthost05.phx2.fedoraproject.org
* (disable each builder in turn, update and reenable).
* ppc11.phx2.fedoraproject.org
* ppc12.phx2.fedoraproject.org
* backup03
== Doing the upgrade
If possible, system upgrades should be done in advance of the reboot
(with relevant testing of new packages on staging). To do the upgrades,
make sure that the Infrastructure RHEL repo is updated as necessary to
pull in the new packages (see the Infrastructure Yum Repo SOP).
On batcave01, as root run:
....
func-yum [--host=hostname] update
....
[NOTE]
.Note
====
`--host` can be specified multiple times and takes wildcards.
====
Ping people as necessary if you are unsure about any packages.
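For example, to update only a couple of hosts (the hostnames here are
just illustrative):
....
func-yum --host=proxy01.phx2.fedoraproject.org --host='app0*.phx2.fedoraproject.org' update
....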
Additionally, you can see which machines still need to be rebooted with:
....
sudo func-command --timeout=10 --oneline /usr/local/bin/needs-reboot.py | grep yes
....
You can also see which machines would need a reboot if updates were all
applied with:
....
sudo func-command --timeout=10 --oneline /usr/local/bin/needs-reboot.py after-updates | grep yes
....
== Doing the reboot
In the order determined above, reboots will usually be grouped by the
virtualization hosts that the servers are on. You can see the guests per
virt host on batcave01 in `/var/log/virthost-lists.out`.
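For example, to see which guests are listed for a particular virthost
(the exact file format may vary):
....
grep virthost13 /var/log/virthost-lists.out
....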
To reboot sets of boxes based on which virthost they are on, we've
written a special script that facilitates it:
....
func-vhost-reboot virthost-fqdn
....
ex:
....
sudo func-vhost-reboot virthost13.phx2.fedoraproject.org
....
== Aftermath
[arabic]
. Make sure that everything's running fine
. Reenable nagios notification as needed
. Make sure to perform any manual post-boot setup (such as entering
passphrases for encrypted volumes).
. Close outage ticket.
=== Non-virthost reboots
If you need to reboot specific hosts and make sure they recover -
consider using:
....
sudo func-host-reboot hostname hostname1 hostname2 ...
....
If you want to reboot the hosts one at a time, waiting for each to come
back before rebooting the next, pass `-o` to func-host-reboot.

View file

@ -0,0 +1,79 @@
= Master Mirror Infrastructure SOP
== Contents
[arabic]
. Contact Information
. PHX Master Mirror Setup
. RDU I2 Master Mirror Setup
. Raising Issues
== Contact Information
Owner:::
Red Hat IS
Contact:::
#fedora-admin, Red Hat ticket
Location:::
PHX
Servers:::
server[1-5].download.phx.redhat.com
Purpose:::
Provides the master mirrors for Fedora distribution
== PHX Master Mirror Setup
The master mirrors are accessible as:
....
download1.fedora.redhat.com -> CNAME to download3.fedora.redhat.com
download2.fedora.redhat.com -> currently no DNS entry
download3.fedora.redhat.com -> 209.132.176.20
download4.fedora.redhat.com -> 209.132.176.220
download5.fedora.redhat.com -> 209.132.176.221
....
from the outside. download.fedora.redhat.com is a round robin to the
above IPs.
The external IPs correspond to internal load balancer IPs that balance
between server[1-5]:
....
209.132.176.20 -> 10.9.24.20
209.132.176.220 -> 10.9.24.220
209.132.176.221 -> 10.9.24.221
....
The load balancers then balance between the below Fedora IPs on the
rsync servers:
....
10.8.24.21 (fedora1.download.phx.redhat.com) - server1.download.phx.redhat.com
10.8.24.22 (fedora2.download.phx.redhat.com) - server2.download.phx.redhat.com
10.8.24.23 (fedora3.download.phx.redhat.com) - server3.download.phx.redhat.com
10.8.24.24 (fedora4.download.phx.redhat.com) - server4.download.phx.redhat.com
10.8.24.25 (fedora5.download.phx.redhat.com) - server5.download.phx.redhat.com
....
== RDU I2 Master Mirror Setup
[NOTE]
.Note
====
This section is awaiting confirmation from RH - information here may not
be 100% accurate yet.
====
download-i2.fedora.redhat.com (rhm-i2.redhat.com) is a round robin
between:
....
204.85.14.3 - 10.11.45.3
204.85.14.5 - 10.11.45.5
....
== Raising Issues
Issues with any of this setup should be raised in a helpdesk ticket.

View file

@ -0,0 +1,217 @@
= Module Build Service Infra SOP
The MBS is a build orchestrator on top of Koji for "modules".
https://fedoraproject.org/wiki/Changes/ModuleBuildService
== Contact Information
Owner::
Release Engineering Team, Infrastructure Team
Contact::
#fedora-modularity, #fedora-admin, #fedora-releng
Persons::
jkaluza, fivaldi, breilly, mikem
Location::
Phoenix
Public addresses::
* mbs.fedoraproject.org
Servers::
* mbs-frontend0[1-2].phx2.fedoraproject.org
* mbs-backend01.phx2.fedoraproject.org
Purpose::
Build modules for Fedora.
== Description
Users submit builds to mbs.fedoraproject.org referencing their modulemd
file in dist-git. (In the future, users will not submit their own module
builds. The [.title-ref]#freshmaker# daemon (running in infrastructure)
will watch for .spec file changes and modulemd.yaml file changes -- it
will submit the relevant module builds to the MBS on behalf of users.)
The request to build a module is received by the MBS flask app running
on the mbs-frontend nodes.
Cursory validation of the submitted modulemd is performed on the
frontend: are the named packages valid? Are their branches valid? The
MBS keeps a copy of the modulemd and appends additional data describing
which branches pointed to which hashes at the time of submission.
A fedmsg from the frontend triggers the backend to start building the
module. First, tags and build/srpm-build groups are created. Then, a
module-build-macros package is synthesized and submitted as an srpm
build. When it is complete and available in the buildroot, the rest of
the rpm builds are submitted.
These are grouped and limited in two ways:
* First, there is a global NUM_CONCURRENT_BUILDS config option that
controls how many koji builds the MBS is allowed to have open at any
time. It serves as a throttle.
* Second, a given module may specify that its components should have a
certain "build order". If there are 50 components, it may say that the
first 25 of them are in one buildorder batch, and the second 25 are in
another buildorder batch. The first batch will be submitted and, when
complete, tagged back into the buildroot. Only after they are available
will the second batch of 25 begin.
When the last component is complete, the MBS backend marks the build as
"done", and then marks it again as "ready". (There is currently no
meaning to the "ready" state beyond "done". We reserved that state for
future CI interactions.)
== Observing MBS Behavior
=== The mbs-build command
The https://pagure.io/fm-orchestrator[fm-orchestrator repo] and the
[.title-ref]#module-build-service# package provide an
[.title-ref]#mbs-build# command with a few subcommands. For general
help:
....
$ mbs-build --help
....
To generate a report of all currently active module builds:
....
$ mbs-build overview
ID State Submitted Components Owner Module
---- ------- -------------------- ------------ ------- -----------------------------------
570 build 2017-06-01T17:18:11Z 35/134 psabata shared-userspace-f26-20170601141014
569 build 2017-06-01T14:18:04Z 14/15 mkocka mariadb-f26-20170601141728
....
To generate a report of an individual module build, given its ID:
....
$ mbs-build info 569
NVR State Koji Task
---------------------------------------------- -------- ------------------------------------------------------------
libaio-0.3.110-7.module_414736cc COMPLETE https://koji.fedoraproject.org/koji/taskinfo?taskID=19803741
BUILDING https://koji.fedoraproject.org/koji/taskinfo?taskID=19804081
libedit-3.1-17.20160618cvs.module_414736cc COMPLETE https://koji.fedoraproject.org/koji/taskinfo?taskID=19803745
compat-openssl10-1.0.2j-6.module_414736cc COMPLETE https://koji.fedoraproject.org/koji/taskinfo?taskID=19803746
policycoreutils-2.6-5.module_414736cc COMPLETE https://koji.fedoraproject.org/koji/taskinfo?taskID=19803513
selinux-policy-3.13.1-255.module_414736cc COMPLETE https://koji.fedoraproject.org/koji/taskinfo?taskID=19803748
systemtap-3.1-5.module_414736cc COMPLETE https://koji.fedoraproject.org/koji/taskinfo?taskID=19803742
libcgroup-0.41-11.module_ea91dfb0 COMPLETE https://koji.fedoraproject.org/koji/taskinfo?taskID=19685834
net-tools-2.0-0.42.20160912git.module_414736cc COMPLETE https://koji.fedoraproject.org/koji/taskinfo?taskID=19804010
time-1.7-52.module_414736cc COMPLETE https://koji.fedoraproject.org/koji/taskinfo?taskID=19803747
desktop-file-utils-0.23-3.module_ea91dfb0 COMPLETE https://koji.fedoraproject.org/koji/taskinfo?taskID=19685835
libselinux-2.6-6.module_ea91dfb0 COMPLETE https://koji.fedoraproject.org/koji/taskinfo?taskID=19685833
module-build-macros-0.1-1.module_414736cc COMPLETE https://koji.fedoraproject.org/koji/taskinfo?taskID=19803333
checkpolicy-2.6-1.module_414736cc COMPLETE https://koji.fedoraproject.org/koji/taskinfo?taskID=19803514
dbus-glib-0.108-2.module_ea91dfb0 COMPLETE https://koji.fedoraproject.org/koji/taskinfo?taskID=19685836
....
To actively watch a module build in flight, given its ID:
....
$ mbs-build watch 570
Still building:
libXrender https://koji.fedoraproject.org/koji/taskinfo?taskID=19804885
libXdamage https://koji.fedoraproject.org/koji/taskinfo?taskID=19805153
Failed:
libXxf86vm https://koji.fedoraproject.org/koji/taskinfo?taskID=19804903
Summary:
2 components in the BUILDING state
34 components in the COMPLETE state
1 components in the FAILED state
97 components in the undefined state
psabata's build #570 of shared-userspace-f26 is in the "build" state
....
=== The releng repo
There are more tools located in the [.title-ref]#scripts/mbs/# directory
of the releng repo: https://pagure.io/releng/blob/master/f/scripts/mbs
== Cancelling a module build
Users can cancel their own module builds with:
....
$ mbs-build cancel $BUILD_ID
....
MBS admins can also cancel builds of any user.
[NOTE]
.Note
====
MBS admins are defined as members of the groups listed in the
[.title-ref]#ADMIN_GROUPS# configuration options in
[.title-ref]#roles/mbs/common/templates/config.py#.
====
== Logs
The frontend logs are on mbs-frontend0[1-2] in
`/var/log/httpd/error_log`.
The backend logs are on mbs-backend01. Look in the journal for the
[.title-ref]#fedmsg-hub# service.
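For example, to follow the backend logs (plain journalctl usage):
....
sudo journalctl -u fedmsg-hub -f
....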
== Upgrading
The package in question is [.title-ref]#module-build-service#. Please
use the [.title-ref]#playbooks/manual/upgrade/mbs.yml# playbook.
== Managing Bootstrap Modules
In general, modules use other modules to define their buildroots, but
what defines the buildroot of the very first module? For this, we use
"bootstrap" modules which are manually selected. For some history on
this, see these tickets:
* https://pagure.io/releng/issue/6791
* https://pagure.io/fedora-infrastructure/issue/6097
The tag for a bootstrap module needs to be manually created and
populated by Release Engineering. Builds for that tag are curated and
selected from other Fedora tags, with care to ensure that only as many
builds are added as needed.
The existence of the tag is not enough for the bootstrap module to be
usable by MBS. MBS discovers the bootstrap module as a possible
dependency for other yet-to-be-built modules by querying PDC. During
normal operation, these entries in PDC are automatically created by
pdc-updater on pdc-backend02, but for the bootstrap tag they need to be
manually created and linked to the new bootstrap tag.
The fm-orchestrator repo has a
https://pagure.io/fm-orchestrator/blob/master/f/bootstrap[bootstrap/]
directory with tools that we used to create the first bootstrap entries.
If you need to create a new bootstrap entry or modify an existing one,
use these tools for inspiration. They are not general purpose and will
likely have to be modified to do what is needed. In particular, see
[.title-ref]#import-to-pdc.py# as an example of creating a new entry and
[.title-ref]#activate-in-pdc.py# for an example of editing an existing
entry.
To be usable, you'll need a token with rights to speak to staging/prod
PDC. See the PDC SOP for information on client configuration in
[.title-ref]#/etc/pdc.d/# and on where to find those tokens.
== Things that could go wrong
=== Overloading koji
If koji is overloaded, it should be acceptable to _stop_ the fedmsg-hub
daemon on mbs-backend01 at any time.
[NOTE]
.Note
====
As builds finish in koji, they will be _missed_ by the backend, but
when it restarts it should find them in datagrepper. If that fails as
well, the MBS backend has a poller which should start up ~5 minutes
after startup and check koji for anything it may have missed, at which
point it will resume functioning.
====
If koji continues to be overloaded after startup, try decreasing the
[.title-ref]#NUM_CONCURRENT_BUILDS# option in the config file in
[.title-ref]#roles/mbs/common/templates/#.

View file

@ -0,0 +1,71 @@
= Memcached Infrastructure SOP
Our memcached setup is currently only used for wiki sessions. With
mediawiki, sessions stored in files over NFS or in the DB are very slow.
Memcached is a non-blocking solution for our session storage.
== Contents
[arabic]
. Contact Information
. Checking Status
. Flushing Memcached
. Restarting Memcached
. Configuring Memcached
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin, sysadmin-main, sysadmin-web groups
Location::
PHX
Servers::
memcached03, memcached04
Purpose::
Provide caching for Fedora web applications.
== Checking Status
Our memcached instances are currently firewalled to only allow access
from wiki application servers. To check the status of an instance, use:
....
echo stats | nc memcached0{3,4} 11211
....
from an allowed host.
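If you only care about a few of the counters, filtering the stats output
works fine (the stat names below are standard memcached counters):
....
echo stats | nc memcached03 11211 | grep -E 'curr_connections|get_hits|get_misses'
....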
== Flushing Memcached
Sometimes incorrect content gets cached, and the cache should be flushed.
To do this, use:
....
echo flush_all | nc memcached0{3,4} 11211
....
from an allowed host.
== Restarting Memcached
Note that restarting a memcached instance will drop all sessions stored
on that instance. As mediawiki uses hashing to distribute sessions
across multiple instances, restarting one out of two instances will
result in about half of the total sessions being dropped.
To restart memcached:
....
sudo /etc/init.d/memcached restart
....
== Configuring Memcached
Memcached is currently set up as a role in the ansible git repo. The
two main tunables are MAXCONN (the maximum number of concurrent
connections) and CACHESIZE (the amount of memory to use for storage).
These variables can be set through $memcached_maxconn and
$memcached_cachesize in ansible. Additionally, other options (as
described in the memcached manpage) can be set via $memcached_options.

View file

@ -0,0 +1,85 @@
= Message Tagging Service SOP
== Contact Information
Owner::
Factory2 Team, Fedora QA Team, Infrastructure Team
Contact::
#fedora-qa, #fedora-admin
Persons::
cqi, lucarval, vmaljulin
Location::
Phoenix
Servers::
* In OpenShift.
Purpose::
Tag module build
== Description
Message Tagging Service, aka MTS, is an event-driven microservice that
tags a module build when triggered by a specific MBS event.
MTS listens on the message bus for the MBS event
`mbs.build.state.change`. Once a message is received, the module build
represented by that message is checked against a set of predefined
rules. Each rule definition has a destination tag; if a rule matches the
build, that destination tag is applied to the build. Only module builds
in the ready state are handled by MTS for now.
== Observing Behavior
Log in to `os-master01.phx2.fedoraproject.org` as `root` (or
authenticate remotely with OpenShift using
`oc login https://os.fedoraproject.org`), and run:
....
oc project mts
oc status -v
oc logs -f dc/mts
....
== Database
MTS does not use a database.
== Configuration
Please do remember to increase `MTS_CONFIG_VERSION` so that OpenShift
creates a new pod after running the playbook.
== Deployment
You can roll out configuration changes by changing the files in
`roles/openshift-apps/message-tagging-service/` and running the
`playbooks/openshift-apps/message-tagging-service.yml` playbook.
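A sketch of the usual invocation, following the same pattern as the
other openshift-apps playbooks in this guide:
....
sudo rbac-playbook openshift-apps/message-tagging-service.yml
....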
=== Stage
The MTS docker image is built automatically and pushed to upstream
quay.io. By default, the `latest` tag is applied to a fresh image. Apply
the `stg` tag to the image, then run the playbook
`playbooks/openshift-apps/message-tagging-service.yml` with the
`staging` environment.
=== Prod
If everything works well, apply the `prod` tag to the docker image in
quay.io, then run the playbook with the `prod` environment.
== Update Rules
https://pagure.io/fedora-infra/ansible/blob/main/f/roles/openshift-apps/message-tagging-service/files/mts-rules.yml[Rules
file] is managed alongside the playbook role in the same repository.
For detailed information of rules format, please refer to
https://pagure.io/modularity/blob/master/f/drafts/module-tagging-service/format.md[documentation]
under Modularity.
== Troubleshooting
In case of problems with MTS, check the logs:
....
oc logs -f dc/mts
....

View file

@ -0,0 +1,36 @@
= Mirror hiding Infrastructure SOP
At times, such as release day, there may be a conflict between Red Hat
trying to release content for RHEL, and Fedora trying to release Fedora.
One way to limit the pain to Red Hat on release day is to hide
download.fedora.redhat.com from the publiclist and mirrorlist
redirector, which will keep most people from downloading the content
from Red Hat directly.
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin, sysadmin-main, sysadmin-web group
Location::
Phoenix
Servers::
app3, app4
Purpose::
Hide Public Mirrors from the publiclist / mirrorlist redirector
== Description
To hide a public mirror, so it doesn't appear on the publiclist or the
mirrorlist, simply go into the MirrorManager administrative web user
interface, at https://admin.fedoraproject.org/mirrormanager. Fedora
sysadmins can see all Sites and Hosts. For each Site and Host, there is
a checkbox marked "private", which if set, will hide that Site (and all
its Hosts), or just that single Host, such that it won't appear on the
public lists.
To make a private-marked mirror public, simply clear the "private"
checkbox again.
This change takes effect at the top of each hour.

View file

@ -0,0 +1,20 @@
= AWS Mirrors
Fedora Infrastructure mirrors EPEL content (/pub/epel) into Amazon
Simple Storage Service (S3) in multiple regions, to make it fast for EC2
CentOS/RHEL users to get EPEL content from an effectively local mirror.
For this to work, we have private mirror entries in MirrorManager, one
for each region, which include the EC2 netblocks for that region.
Amazon updates their list of network blocks roughly monthly, as they
consume additional address space. Therefore, we need to make the
corresponding changes to MirrorManager's entries.
Amazon publishes their list of network blocks on their forum site, with
the subject "Announcement: Amazon EC2 Public IP Ranges". As of November
2014, this was https://forums.aws.amazon.com/ann.jspa?annID=1701.
As of November 19, 2014, Amazon publishes it as a JSON file we can
download: http://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html
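For example, a quick way to pull the EC2 prefixes for one region out of
that JSON file (the region name is illustrative; the layout is the one
AWS publishes):
....
curl -s https://ip-ranges.amazonaws.com/ip-ranges.json \
  | jq -r '.prefixes[] | select(.service=="EC2" and .region=="us-east-1") | .ip_prefix'
....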

Some files were not shown because too many files have changed in this diff.