Remove fedmsg and github2fedmsg from documentation

This commit removes all the documentation related to fedmsg and
github2fedmsg. It also removes remaining mentions of fedmsg where that
makes sense, or changes them to Fedora messaging.

I didn't update
modules/releng_misc_guide/pages/sop_pushing_updates.adoc as this needs
somebody with knowledge of the process to update it.

Signed-off-by: Michal Konecny <mkonecny@redhat.com>
Michal Konecny 2025-02-03 11:11:06 +01:00 committed by zlopez
parent 2c72e82f01
commit b2f3b6589a
19 changed files with 18 additions and 1136 deletions

View file

@@ -59,7 +59,6 @@ presented in our xref:sle.adoc[SLE Documentation].
* Docstranslation
* Documentation https://docs.fedoraproject.org/
* FAS2Discourse
* Fedmsg
* GeoIP https://geoip.fedoraproject.org/
* Ipsilon website
* Kerneltest https://apps.fedoraproject.org/kerneltest
@@ -86,7 +85,6 @@ presented in our xref:sle.adoc[SLE Documentation].
* Elections https://apps.fedoraproject.org/#Elections
* FedoCal https://apps.fedoraproject.org/#FedoCal
* Fedora People https://fedorapeople.org/
* github2fedmsg https://apps.fedoraproject.org/github2fedmsg
* Meetbot https://apps.fedoraproject.org/#Meetbot
* Packager dashboard https://packager-dashboard.fedoraproject.org/
* Packages https://packages.fedoraproject.org/

View file

@@ -224,8 +224,8 @@ variety of operating system/cloud combinations.
* https://pagure.io/sigul[sigul] -An automated gpg signing system
* https://github.com/rpm-software-management/mock/wiki[mock] -a tool for
building packages in pristine buildroots
* http://www.fedmsg.com/en/latest/[fedmsg] -Fedora Infrastructure
Message Bus
* https://fedora-messaging.readthedocs.io/en/stable/[Fedora Messaging]
-Fedora Infrastructure Message Bus
* https://github.com/rhinstaller/lorax[lorax] -tool to build install
trees and images
* http://www.openshift.org/[OpenShift] -Open Source Platform as a

View file

@@ -17,8 +17,6 @@ created from a Dockerfile and builds on top of that base image.
| Future Items to Integrate |
+------------------------------+
| +--------------------------+ |
| |PDC Integration | |
| +--------------------------+ |
| |New Hotness | |
| +--------------------------+ |
| |Other??? | |
@@ -62,7 +60,7 @@ created from a Dockerfile and builds on top of that base image.
| | |
[docker images] | |
| | |
| [fedmsg] |
| [fedora messaging] |
+---------------+-----------+ | |
| | | +---------------+
| +----------------------+ | | |
@@ -115,7 +113,7 @@ The main aspects of the Layered Image Build System are:
* A docker registry
** docker-distribution
* Taskotron
* fedmsg
* Fedora messaging
* RelEng Automation
The build system is set up such that Fedora Layered Image maintainers
@@ -142,9 +140,9 @@ world verifying that all sources of information come from Fedora.
Completed layered image builds are hosted in a candidate docker registry
which is then used to pull the image and perform tests with
https://taskotron.fedoraproject.org/[Taskotron]. The taskotron tests are
triggered by a http://www.fedmsg.com/en/latest/[fedmsg] message that is
emitted from https://fedoraproject.org/wiki/Koji[Koji] once the build is
complete. Once the test is complete, taskotron will send fedmsg which is
triggered by a https://fedora-messaging.readthedocs.io/en/stable/[Fedora messaging]
message that is emitted from https://fedoraproject.org/wiki/Koji[Koji] once the build is
complete. Once the test is complete, taskotron will send a Fedora messaging message which is
then caught by the [.title-ref]#RelEng Automation# Engine that will run
the Automatic Release tasks in order to push the layered image into a
stable docker registry in the production space for end users to consume.
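For illustration, a minimal consumer for such build events might look like
the following sketch (it assumes the fedora-messaging library and a
configured broker; the topic name is illustrative):
....
from fedora_messaging import api


def on_message(message):
    # react only to completed Koji builds; the topic suffix is illustrative
    if message.topic.endswith("buildsys.build.state.change"):
        print("build update:", message.body)


api.consume(on_message)
....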
@@ -230,13 +228,13 @@ be held in DistGit and maintained by the Layered Image maintainers.
https://pagure.io/releng-automation[RelEng Automation] is an ongoing
effort to automate as much of the RelEng process as possible by using
http://ansible.com/[Ansible] and being driven by
http://www.fedmsg.com/en/latest/[fedmsg] via
https://fedora-messaging.readthedocs.io/en/stable/[Fedora messaging] via
https://github.com/maxamillion/loopabull[Loopabull] to execute Ansible
Playbooks based on fedmsg events.
Playbooks based on Fedora messaging events.
==== Robosignatory
https://pagure.io/robosignatory[Robosignatory] is a fedmsg consumer that
https://pagure.io/robosignatory[Robosignatory] is a Fedora messaging consumer that
automatically signs artifacts and will be used to automatically sign
docker layered images for verification by client tools as well as end
users.
@@ -247,17 +245,9 @@ In the future various other components of the
https://fedoraproject.org/wiki/Infrastructure[Fedora Infrastructure]
will likely be incorporated.
===== PDC
https://pdc.fedoraproject.org/[PDC] is Fedora's implementation of
https://github.com/product-definition-center/product-definition-center[Product
Definition Center] which allows Fedora to maintain a database of each
Compose and all of its contents in a way that can be queried and used
to make decisions in a programmatic way.
===== The New Hotness
https://github.com/fedora-infra/the-new-hotness[The New Hotness] is a
http://www.fedmsg.com/en/latest/[fedmsg] consumer that listens to
release-monitoring.org and files bugzilla bugs in response (to notify
https://fedora-messaging.readthedocs.io/en/stable/[Fedora messaging] consumer
that listens to release-monitoring.org and files bugzilla bugs in response (to notify
packagers that they can update their packages).

View file

@@ -1,114 +0,0 @@
== Fedora RelEng Workflow Automation
The Fedora RelEng Workflow Automation is a means to allow RelEng to
define a pattern by which Release Engineering work is automated in an
uniform fashion. The automation technology of choice is
https://ansible.com/[ansible] and the "workflow engine" is powered by
https://github.com/maxamillion/loopabull[loopabull], which is an event
loop that allows us to pass the information contained within a
http://www.fedmsg.com/en/latest/[fedmsg] and insert it into
https://ansible.com/[ansible]
https://docs.ansible.com/ansible/playbooks.html[playbooks]. This will
effectively create an event driven workflow that can take action
conditionally based on the contents of arbitrary
http://www.fedmsg.com/en/latest/[fedmsg] data.
Background on the topic can be found in the
https://fedoraproject.org/wiki/Changes/ReleaseEngineeringAutomationWorkflowEngine[Release
Engineering Automation Workflow Engine] Change proposal, as well as in
the https://pagure.io/releng-automation[releng-automation] pagure
repository.
=== RelEng Workflow Automation Architecture
By using http://www.fedmsg.com/en/latest/[fedmsg] as the source of
information feeding the event loop, we will configure
https://github.com/maxamillion/loopabull[loopabull] to listen for
specific
https://fedora-fedmsg.readthedocs.io/en/latest/topics.html[fedmsg
topics] which will correspond with https://ansible.com/[ansible]
https://docs.ansible.com/ansible/playbooks.html[playbooks]. When one of
the appropriate
https://fedora-fedmsg.readthedocs.io/en/latest/topics.html[fedmsg
topics] is encountered across the message bus, its message payload is
then injected into the corresponding playbook as an extra set of
variables. A member of the Fedora Release Engineering Team can at that
point use this as a means to perform whatever arbitrary action or series
of actions they can otherwise perform with https://ansible.com/[ansible]
(including what we can enable via custom
https://docs.ansible.com/ansible/modules.html[modules]) based on the
input of the message payload.
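A rough sketch of that mechanism, assuming a hypothetical
one-playbook-per-topic convention (loopabull's real implementation differs
in detail):
....
import json
import subprocess


def handle(topic, payload):
    playbook = "/srv/playbooks/%s.yml" % topic  # hypothetical mapping
    # the message payload is handed to ansible as an extra set of variables
    subprocess.run(
        ["ansible-playbook", playbook, "--extra-vars", json.dumps(payload)],
        check=True,
    )
....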
The general overview of the architecture is below as well as a
description of how it works:
....
+------------+
| fedmsg |
| |
+---+--------+
| ^
| |
| |
| |
| |
| |
V |
+------------------+-----------------+
| |
| Release Engineering |
| Workflow Automation Engine |
| |
| - RabbitMQ |
| - fedmsg-rabbitmq-serializer |
| - loopabull |
| |
+----------------+-------------------+
|
|
|
|
V
+-----------------------+
| |
| composer/bodhi/etc |
| |
+-----------------------+
....
The flow of data will begin with an event somewhere in the
https://fedoraproject.org/wiki/Infrastructure[Fedora Infrastructure]
that sends a http://www.fedmsg.com/en/latest/[fedmsg] across the message
bus, then the messages will be taken in and serialized in to a
https://www.rabbitmq.com/[rabbitmq] worker queue using
https://pagure.io/fedmsg-rabbitmq-serializer[fedmsg-rabbitmq-serializer].
Then https://github.com/maxamillion/loopabull[loopabull] will be
listening to the rabbitmq worker queue for tasks to come in. Once a
message is received, it is processed and once it is either no-op'd or a
corresponding ansible playbook is run to completion, the message will be
`ack`'d and cleared from the worker queue. This will allow us to
scale loopabull instances independently from the message queue as well
as ensure that work is not lost because of a downed or busy loopabull
instance. Also, as a point of note, the loopabull service instances will
be scaled using https://freedesktop.org/wiki/Software/systemd/[systemd]
https://fedoramagazine.org/systemd-template-unit-files/[unit templates].
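The ack-after-work pattern described above can be sketched with a plain pika
consumer (the queue name and the run_playbook helper are hypothetical):
....
import json

import pika


def on_task(channel, method, properties, body):
    run_playbook(json.loads(body))  # hypothetical: no-op or run a playbook
    # ack only after the playbook finished, so a crashed worker loses no work
    channel.basic_ack(delivery_tag=method.delivery_tag)


connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.basic_qos(prefetch_count=1)  # one in-flight task per worker
channel.basic_consume(queue="releng-tasks", on_message_callback=on_task)
channel.start_consuming()
....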
Once a playbook has been triggered, it will run tasks on remote systems
on behalf of a loopabull automation user. These users can be privileged
if need be, however the scope of their privilege is based on the purpose
they serve. These user accounts are provisioned by the
https://fedoraproject.org/wiki/Infrastructure[Fedora Infrastructure]
Team based on the requirements of the
`RelEng Task Automation User Request Standard Operating
Procedure (SOP) <sop_requesting_task_automation_user>` document and
tasks are subject to code and security audit.
=== Fedora Lib RelEng
https://pagure.io/flr[Fedora Lib RelEng] (flr) is a library and set of
command line tools to expose the library that aims to provide re-usable
code for common tasks that need to be done in Release Engineering.
Combining this set of command line tools when necessary with the Release
Engineering Automation pipeline allows for easy separation of
permissions and responsibilities via sudo permissions on remote hosts.
This is explained in more detail on the project's pagure page.

View file

@@ -162,15 +162,6 @@ OpenShift instance].
This section contains various issues encountered during deployment or
configuration changes and possible solutions.
=== Fedmsg messages aren't sent
*Issue:* Fedmsg messages aren't sent.
*Solution:* Set USER environment variable in pod.
*Explanation:* Fedmsg uses the USER env variable as a username inside
messages. Without USER set, it just crashes and doesn't send anything.
=== Cronjob is crashing
*Issue:* Cronjob pod is crashing on start, even after configuration

View file

@@ -139,9 +139,8 @@ unexpected changes on servers (or playbooks).
We have in place a callback plugin that stores history for any
ansible-playbook runs and then sends a report each day to
sysadmin-logs-members with any CHANGED or FAILED actions. Additionally,
there's a fedmsg plugin that reports start and end of ansible playbook
runs to the fedmsg bus. Ansible also logs to syslog verbose reporting of
sysadmin-logs-members with any CHANGED or FAILED actions.
Ansible also logs to syslog verbose reporting of
when and what commands and playbooks were run.
=== role based access control for playbooks

View file

@@ -1,6 +1,6 @@
= datanommer SOP
Consume fedmsg bus activity and stuff it in a postgresql db.
Consume fedora messaging activity and stuff it in a postgresql db.
== Contact Information
@@ -11,7 +11,7 @@ Contact::
Servers::
busgateway01
Purpose::
Save fedmsg bus activity
Save fedora messaging bus activity
== Description
@@ -21,7 +21,7 @@ python-datanommer-models::
Schema definition and API for storing new items and querying existing
items (see the sketch after this list)
python-datanommer-consumer::
A plugin for the fedmsg-hub that actively listens to the bus and
A plugin for Fedora messaging that actively listens to the bus and
stores events.
datanommer-commands::
A set of CLI tools for querying the DB.
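As a sketch of querying through python-datanommer-models (this assumes the
init() and Message.grep() API of recent datanommer releases; the database
URI is illustrative):
....
from datanommer.models import Message, init

init(uri="postgresql://datanommer:password@localhost/datanommer")

# page through stored messages in the "bodhi" category
total, pages, messages = Message.grep(categories=["bodhi"], rows_per_page=10)
for msg in messages:
    print(msg.timestamp, msg.topic)
....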

View file

@@ -1,104 +0,0 @@
= fedmsg-gateway SOP
Outgoing raw ZeroMQ message stream.
[NOTE]
====
See also: <<fedmsg-websocket.adoc#>>
====
== Contact Information
Owner:::
Messaging SIG, Fedora Infrastructure Team
Contact:::
#fedora-apps, #fedora-admin, #fedora-noc
Servers:::
busgateway01, proxy0*
Purpose:::
Expose raw ZeroMQ messages outside the FI environment.
== Description
Users outside of Fedora Infrastructure can listen to the production
message bus by connecting to specific addresses. This is required for
local users to run their own hubs and message processors ("Consumers").
The specific public endpoints are:
production::
tcp://hub.fedoraproject.org:9940
staging::
tcp://stg.fedoraproject.org:9940
_fedmsg-gateway_, the daemon running on _busgateway01_, is listening to the
FI production fedmsg bus and will relay every message that it receives
out to a special ZMQ pub endpoint bound to port 9940. haproxy mediates
connections to the _fedmsg-gateway_ daemon.
== Connection Flow
Clients connecting through haproxy on `proxy0*:9940` are redirected to
`busgateway0*:9940`. This can be found in the `haproxy.cfg` entry for
`listen fedmsg-raw-zmq 0.0.0.0:9940`.
This is different than the apache reverse proxy pass setup we have for
the _app0*_ and _packages0*_ machines. _That_ flow looks something like
this:
....
Client -> apache(proxy01) -> haproxy(proxy01) -> apache(app01)
....
The flow for the raw zmq stream provided by _fedmsg-gateway_ looks
something like this:
....
Client -> haproxy(proxy01) -> fedmsg-gateway(busgateway01)
....
_haproxy_ is listening on a public port.
At the time of this writing, _haproxy_ does not actually load balance
zeromq session requests across multiple _busgateway0*_ machines, but there
is nothing stopping us from adding them. New hosts can be added in
ansible and pressed from _busgateway01_'s template. Add them to the
fedmsg-raw-zmq listen in _haproxy_'s config and it should Just Work.
== Increasing the Maximum Number of Concurrent Connections
HTTP requests are typically very short (a few seconds at most). This
means that the number of concurrent tcp connections we require for most
of our services is quite low (1024 is overkill). ZeroMQ tcp connections,
on the other hand, are expected to live for quite a long time.
Consequently we needed to scale up the number of possible concurrent tcp
connections.
All of this is in ansible and should be handled for us automatically if
we bring up new nodes.
* The pam_limits user limit for the fedmsg user was increased from 1024
to 160000 on _busgateway01_.
* The pam_limits user limit for the haproxy user was increased from 1024
to 160000 on the _proxy0*_ machines.
* The zeromq High Water Mark (HWM) was increased to 160000 on
_busgateway01_.
* The maximum number of connections allowed was increased in
`haproxy.cfg`.
== Nagios
New nagios checks were added for this that check to see if the number of
concurrent connections through haproxy is approaching the maximum number
allowed.
You can check these numbers by hand by inspecting the _haproxy_ web
interface: https://admin.fedoraproject.org/haproxy/proxy1#fedmsg-raw-zmq
Look at the "Sessions" section. "Cur" is the current number of sessions
versus "Max", the maximum number seen at the same time and "Limit", the
maximum number of concurrent connections allowed.
== RHIT
We had RHIT open up port 9940 special to _proxy01.iad2_ for this.

View file

@@ -1,57 +0,0 @@
= fedmsg introduction and basics, SOP
General information about fedmsg
== Contact Information
Owner::
Messaging SIG, Fedora Infrastructure Team
Contact::
#fedora-apps, #fedora-admin, #fedora-noc
Servers::
Almost all of them.
Purpose::
Introduce sysadmins to fedmsg tools and config
== Description
_fedmsg_ is a system that links together most of our webapps and services
into a message mesh or net (often called a "bus"). It is built on top of
the zeromq messaging library.
_fedmsg_ has its own developer documentation that is a good place to check
if this or other SOPs don't provide enough information -
http://fedmsg.rtfd.org
== Tools
Generally, _fedmsg-tail_ and _fedmsg-logger_ are the two most commonly used
tools for debugging and testing. To see if bus-connectivity exists
between two machines, log onto each of them and run the following on the
first:
....
$ echo testing from $(hostname) | fedmsg-logger
....
And run the following on the second:
....
$ fedmsg-tail --really-pretty
....
== Configuration
_fedmsg_ configuration lives in `/etc/fedmsg.d/`
`/etc/fedmsg.d/endpoints.py` keeps the list of every possible fedmsg
endpoint. It acts as a global index that defines the bus.
See https://fedmsg.readthedocs.org/en/stable/configuration/ for a full glossary of
configuration values.
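For illustration, an entry in `/etc/fedmsg.d/endpoints.py` looks roughly
like this (hostnames and ports are illustrative, not the real Fedora
Infrastructure values):
....
# /etc/fedmsg.d/endpoints.py -- fedmsg config files are plain python
config = dict(
    endpoints={
        # "modname.hostname" -> every socket that service may publish on
        "bodhi.app01": ["tcp://app01.example.org:3000"],
        "fedoratagger.app01": ["tcp://app01.example.org:3001"],
    },
)
....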
== Logs
_fedmsg_ daemons keep their logs in `/var/log/fedmsg`. _fedmsg_ message hooks
in existing apps (like bodhi) will log any errors to the logs of the app
they've been added to (like `/var/log/httpd/error_log`).

View file

@@ -1,73 +0,0 @@
= Adding a new fedmsg message type
== Instrumenting the program
First, figure out how you're going to publish the message. Is it from a
shell script or from a long-running process?
If it's from a shell script, you just need to add a
_fedmsg-logger_ statement to the script. Remember to set the
_--modname_ and _--topic_ for your new message's
fully-qualified topic.
If it's from a python process, you just need to add a
`fedmsg.publish(..)` call. The same concerns about modname and topic
apply here.
If this is a short-lived python process, you'll want to add
_active=True_ to the call to `fedmsg.publish(..)`. This will
make the _fedmsg_ lib "actively" reach out to our _fedmsg-relay_ running on
_busgateway01_.
If it is a long-running python process (like a WSGI thread), then you
don't need to pass any extra arguments. You don't want it to reach out
to the _fedmsg-relay_ if possible. Your process will require that some
"endpoints" are created for it in `/etc/fedmsg.d/`. More on that below.
== Supporting infrastructure
You need to make sure that the machine this is running on has a cert and
key that can be read by the program to sign its message. If you don't
have a cert already, then you need to create it in the private repo. Ask
a sysadmin-main member.
Then you need to declare those certs in the _fedmsg_certs
data structure stored typically in our ansible `group_vars/` for this
service. Declare both the name of the cert, what group and user it
should be owned by, and in the `can_send:` section, declare the list of
topics that this cert should be allowed to publish.
If this is a long-running python process that is _not_ passing
_active=True_ to the call to
`fedmsg.publish(..)`, then you have to also declare
endpoints for it. You do that by specifying the `fedmsg_wsgi_procs` and
`fedmsg_wsgi_vars` in the `group_vars` for your service. The iptables
rules and _fedmsg_ endpoints should be automatically created for you on
the next playbook run.
== Supporting code
At this point, you can push the change out to production and be
publishing messages "okay". Everything should be fine.
However, your message will show up blank in _datagrepper_, in matrix, and in
_FMN_, and everywhere else we try to render it. You _must_ then follow up
and write a new _Processor_ for it in the _fedmsg_meta_
library we maintain:
https://github.com/fedora-infra/fedmsg_meta_fedora_infrastructure
You also _must_ write a test case for it there. The docs listing all
topics we publish at http://fedora-fedmsg.rtfd.org/ are automatically
generated from the test suite. Please don't forget this.
Lastly, you should cut a release of _fedmsg_meta_ and deploy it using the
`playbooks/manual/upgrade/fedmsg.yml` playbook, which should
update all the relevant hosts.
== Corner cases
If the process publishing the new message lives _outside_ our main
network, you have to jump through more hoops. Look at _abrt_, _koschei_, and
_copr_ for examples of how to configure this (you need a special firewall
rule, and they need to be configured to talk to our "inbound gateway"
running on the proxies).

View file

@@ -1,56 +0,0 @@
= fedmsg-relay SOP
Bridge ephemeral scripts into the fedmsg bus.
== Contact Information
Owner::
Messaging SIG, Fedora Infrastructure Team
Contact::
#fedora-apps, #fedora-admin, #fedora-noc
Servers::
app01
Purpose::
Bridge ephemeral bash and python scripts into the fedmsg bus.
== Description
_fedmsg-relay_ is running on _app01_, which is a bad choice. We should look
to move it to a more isolated place in the future. _busgateway01_ would be
a better choice.
"Ephemeral" scripts like `pkgdb2branch.py`, the post-receive git hook on
_pkgs01_, and anywhere _fedmsg-logger_ is used all depend on _fedmsg-relay_.
Instead of emitting messages "directly" to the rest of the bus, they use
fedmsg-relay as an intermediary.
Check that _fedmsg-relay_ is running by looking for it in the process
list. You can restart it in the standard way with
`sudo service fedmsg-relay restart`. Check for its logs in
`/var/log/fedmsg/fedmsg-relay.log`
Ephemeral scripts know where the _fedmsg-relay_ is by looking for the
relay_inbound and relay_outbound values in the global fedmsg config.
== But What is it Doing? And Why?
The _fedmsg_ bus is designed to be "passive" in its normal operation. A
_mod_wsgi_ process under _httpd_ sets up its _fedmsg_ publisher socket to
passively emit messages on a certain port. When some other service wants
to receive these messages, it is up to that service to know where
_mod_wsgi_ is emitting and to actively connect there. In this way,
emitting is passive and listening is active.
We get a problem when we have a one-off or "ephemeral" script that is
not a long-running process -- a script like _pkgdb2branch_ which is run
when a user runs it and which ends shortly after. Listeners who want
these scripts' messages will find that they are usually not available
when they try to connect.
To solve this problem, we introduced the "_fedmsg-relay_" daemon which is
a kind of "passive"-to-"passive" adaptor. It binds to an outbound port
on one end where it will publish messages (like normal) but it also
binds to another port where it listens passively for inbound
messages. Ephemeral scripts then actively connect to the passive inbound
port of the _fedmsg-relay_ to have their payloads echoed on the
bus-proper.
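Conceptually, the adaptor is just a forwarding loop between two bound
sockets. A heavily simplified pyzmq sketch (the socket types and ports are
illustrative, not necessarily what fedmsg-relay really uses):
....
import zmq

ctx = zmq.Context()
inbound = ctx.socket(zmq.PULL)
inbound.bind("tcp://*:9941")   # ephemeral scripts actively connect here
outbound = ctx.socket(zmq.PUB)
outbound.bind("tcp://*:9940")  # listeners actively connect here

while True:
    # echo every inbound payload onto the bus-proper
    outbound.send_multipart(inbound.recv_multipart())
....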

View file

@@ -1,70 +0,0 @@
= websocket SOP
websocket communication with Fedora apps.
See-also: <<fedmsg-gateway.adoc#>>
== Contact Information
Owner::
Messaging SIG, Fedora Infrastructure Team
Contact::
#fedora-apps, #fedora-admin, #fedora-noc
Servers::
busgateway01, proxy0*, app0*
Purpose::
Expose a websocket server for FI apps to use
== Description
_WebSocket_ is a protocol (an extension of HTTP/1.1) by which client web
browsers can establish full-duplex socket communications with a server
--the "real-time web".
In our case, webapps served from _app0*_ and _packages0*_ will include
javascript code instructing client browsers to establish a second
connection to our _WebSocket_ server. They point browsers to the following
addresses:
production::
wss://hub.fedoraproject.org:9939
staging::
wss://stg.fedoraproject.org:9939
The websocket server itself is a _fedmsg-hub_ daemon running on
_busgateway01_. It is configured to enable its websocket server component
in the presence of certain configuration values.
_haproxy_ mediates connections to the _fedmsg-hub_ _websocket_ server daemon.
An _stunnel_ daemon provides SSL support.
== Connection Flow
The connection flow is much the same as in the <<fedmsg-gateway.adoc#>>,
but is somewhat more complicated.
"Normal" HTTP requests to our app servers traverse the following chain:
....
Client -> apache(proxy01) -> haproxy(proxy01) -> apache(app01)
....
The flow for a websocket request looks something like this:
....
Client -> stunnel(proxy01) -> haproxy(proxy01) -> fedmsg-hub(busgateway01)
....
stunnel is listening on a public port, negotiates the SSL connection,
and redirects the connection to haproxy who in turn hands it off to the
_fedmsg-hub_ websocket server listening on _busgateway01_.
At the time of this writing, _haproxy_ does not actually load balance
zeromq session requests across multiple _busgateway0*_ machines, but there
is nothing stopping us from adding them. New hosts can be added in
ansible and pressed from _busgateway01_'s template. Add them to the
_fedmsg-websockets_ listen in _haproxy_'s config and it should Just Work.
== RHIT
We had RHIT open up port 9939 special to _proxy01.iad2_ for this.

View file

@@ -1,51 +0,0 @@
= github2fedmsg SOP
Bridge github events onto our fedmsg bus.
App: https://apps.fedoraproject.org/github2fedmsg/
Source: https://github.com/fedora-infra/github2fedmsg/
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-apps, #fedora-admin, #fedora-noc
Servers::
github2fedmsg01
Purpose::
Bridge github events onto our fedmsg bus.
== Description
github2fedmsg is a small Python Pyramid app that bridges github events
onto our fedmsg bus by way of github's "webhooks" feature. It is what
allows us to have notifications of github activity via fedmsg. It
has two phases of operation:
* Infrequently, a user will log in to github2fedmsg via Fedora OpenID.
They then push a button to also log in to github.com. They are then
logged in to github2fedmsg with _both_ their FAS account and their
github account.
+
They are then presented with a list of their github repositories. They
can toggle each one: "on" or "off". When they turn a repo on, our webapp
makes a request to github.com to install a "webhook" for that repo with
a callback URL to our app.
* When events happen to that repo on github.com, github looks up our
callback URL and makes an http POST request to us, informing us of the
event. Our github2fedmsg app receives that, validates it, and then
republishes the content to our fedmsg bus (the validation step is sketched below).
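github signs each delivery with the webhook secret, so the validation step
boils down to an HMAC check; a minimal sketch (assuming github's sha1-based
X-Hub-Signature header):
....
import hashlib
import hmac


def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    # github sends "X-Hub-Signature: sha1=<hexdigest of the raw body>"
    expected = "sha1=" + hmac.new(secret, body, hashlib.sha1).hexdigest()
    return hmac.compare_digest(expected, signature_header)
....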
== What could go wrong?
* Restarting the app or rebooting the host shouldn't cause a problem. It
should come right back up.
* Our database could die. We have a db with a list of all the repos we
have turned on and off. We would want to restore that from backup.
* If github gets compromised, they might have to revoke all of their
application credentials. In that case, our app would fail to work. There
are _lots_ of private secrets set in our private repo that allow our app
to talk to github.com. There are inline comments there with instructions
about how to generate new keys and secrets.

View file

@@ -98,11 +98,6 @@ xref:developer_guide:sops.adoc[Developing Standard Operating Procedures].
* xref:failedharddrive.adoc[Replacing Failed Hard Drives]
* xref:fas-openid.adoc[FAS-OpenID]
* xref:fedmsg-certs.adoc[fedmsg (Fedora Messaging) Certs, Keys, and CA]
* xref:fedmsg-gateway.adoc[fedmsg-gateway]
* xref:fedmsg-introduction.adoc[fedmsg introduction and basics]
* xref:fedmsg-new-message-type.adoc[Adding a new fedmsg message type]
* xref:fedmsg-relay.adoc[fedmsg-relay]
* xref:fedmsg-websocket.adoc[WebSocket]
* xref:fedocal.adoc[Fedocal]
* xref:fedora-releases.adoc[Fedora Release Infrastructure]
* xref:fedorawebsites.adoc[Websites Release]
@@ -111,7 +106,6 @@ xref:developer_guide:sops.adoc[Developing Standard Operating Procedures].
* xref:gdpr_sar.adoc[GDPR SAR]
* xref:geoip-city-wsgi.adoc[geoip-city-wsgi]
* xref:github.adoc[Using github for Infra Projects]
* xref:github2fedmsg.adoc[github2fedmsg]
* xref:greenwave.adoc[Greenwave]
* xref:guest_migrate.adoc[Migrate Guest VMs]
* xref:guestdisk.adoc[Guest Disk Resize]
@@ -139,7 +133,6 @@ xref:developer_guide:sops.adoc[Developing Standard Operating Procedures].
* xref:mailman.adoc[Mailman Infrastructure]
* xref:massupgrade.adoc[Mass Upgrade Infrastructure]
* xref:mastermirror.adoc[Master Mirror Infrastructure]
* xref:mbs.adoc[Module Build Service Infra]
* xref:memcached.adoc[Memcached Infrastructure]
* xref:message-tagging-service.adoc[Message Tagging Service]
* xref:mini_initiatives.adoc[Mini initiative Process]
@@ -152,13 +145,11 @@ xref:developer_guide:sops.adoc[Developing Standard Operating Procedures].
* xref:new-virtual-hosts.adoc[Virtual Host Addition]
* xref:nonhumanaccounts.adoc[Non-human Accounts Infrastructure]
* xref:openshift_sops.adoc[Openshift SOPs]
* xref:odcs.adoc[On Demand Compose Service]
* xref:openqa.adoc[OpenQA Infrastructure]
* xref:openvpn.adoc[OpenVPN]
* xref:outage.adoc[Outage Infrastructure]
* xref:packagereview.adoc[Package Review]
* xref:pagure.adoc[Pagure Infrastructure]
* xref:pdc.adoc[PDC]
* xref:pesign-upgrade.adoc[Pesign upgrades/reboots]
* xref:planetsubgroup.adoc[Planet Subgroup Infrastructure]
* xref:publictest-dev-stg-production.adoc[Machine Classes]

View file

@@ -1,204 +0,0 @@
= Module Build Service Infra SOP
The MBS is a build orchestrator on top of Koji for "modules".
https://fedoraproject.org/wiki/Changes/ModuleBuildService
== Contact Information
Owner::
Release Engineering Team, Infrastructure Team
Contact::
fedora-admin, #fedora-releng
Persons::
jkaluza, fivaldi, breilly, mikem
Public addresses::
* mbs.fedoraproject.org
Servers::
* mbs-frontend0[1-2].iad2.fedoraproject.org
* mbs-backend01.iad2.fedoraproject.org
Purpose::
Build modules for Fedora.
== Description
Users submit builds to _mbs.fedoraproject.org_ referencing their modulemd
file in https://src.fedoraproject.org/[dist-git]. (In the future,
users will not submit their own module
builds. The _freshmaker_ daemon (running in infrastructure)
will watch for `.spec` file changes and `modulemd.yaml` file changes -- it
will submit the relevant module builds to the MBS on behalf of users.)
The request to build a module is received by the MBS flask app running
on the `mbs-frontend` nodes.
Cursory validation of the submitted modulemd is performed on the
frontend: are the named packages valid? Are their branches valid? The
MBS keeps a copy of the modulemd and appends additional data describing
which branches pointed to which hashes at the time of submission.
A fedmsg from the frontend triggers the backend to start building the
module. First, tags and build/srpm-build groups are created. Then, a
module-build-macros package is synthesized and submitted as an srpm
build. When it is complete and available in the buildroot, the rest of
the rpm builds are submitted.
These are grouped and limited in two ways:
* First, there is a global `NUM_CONCURRENT_BUILDS` config option that
controls how many koji builds the MBS is allowed to have open at any
time. It serves as a throttle.
* Second, a given module may specify that its components should have a
certain "build order". If there are 50 components, it may say that the
first 25 of them are in one buildorder batch, and the second 25 are in
another buildorder batch. The first batch will be submitted and, when
complete, tagged back into the buildroot. Only after they are available
will the second batch of 25 begin (see the sketch after this list).
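A sketch of that batching logic (submit_srpm_build, wait_for_completion,
and tag_into_buildroot are hypothetical helpers standing in for the koji
calls MBS actually makes):
....
from itertools import groupby


def build_in_batches(components):
    # components: list of (name, buildorder) pairs from the modulemd
    ordered = sorted(components, key=lambda c: c[1])
    for _, batch in groupby(ordered, key=lambda c: c[1]):
        tasks = [submit_srpm_build(name) for name, _ in batch]
        wait_for_completion(tasks)  # block until the whole batch is done
        tag_into_buildroot(tasks)   # make it available to the next batch
....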
When the last component is complete, the MBS backend marks the build as
"done", and then marks it again as "ready". (There is currently no
meaning to the "ready" state beyond "done". We reserved that state for
future CI interactions.)
== Observing MBS Behavior
=== The mbs-build command
The https://pagure.io/fm-orchestrator[fm-orchestrator repo] and the
_module-build-service_ package provide an
_mbs-build_ command with a few subcommands. For general
help:
....
$ mbs-build --help
....
To generate a report of all currently active module builds:
....
$ mbs-build overview
ID State Submitted Components Owner Module
---- ------- -------------------- ------------ ------- -----------------------------------
570 build 2017-06-01T17:18:11Z 35/134 psabata shared-userspace-f26-20170601141014
569 build 2017-06-01T14:18:04Z 14/15 mkocka mariadb-f26-20170601141728
....
To generate a report of an individual module build, given its ID:
....
$ mbs-build info 569
NVR State Koji Task
---------------------------------------------- -------- ------------------------------------------------------------
libaio-0.3.110-7.module_414736cc COMPLETE https://koji.fedoraproject.org/koji/taskinfo?taskID=19803741
BUILDING https://koji.fedoraproject.org/koji/taskinfo?taskID=19804081
libedit-3.1-17.20160618cvs.module_414736cc COMPLETE https://koji.fedoraproject.org/koji/taskinfo?taskID=19803745
compat-openssl10-1.0.2j-6.module_414736cc COMPLETE https://koji.fedoraproject.org/koji/taskinfo?taskID=19803746
policycoreutils-2.6-5.module_414736cc COMPLETE https://koji.fedoraproject.org/koji/taskinfo?taskID=19803513
selinux-policy-3.13.1-255.module_414736cc COMPLETE https://koji.fedoraproject.org/koji/taskinfo?taskID=19803748
systemtap-3.1-5.module_414736cc COMPLETE https://koji.fedoraproject.org/koji/taskinfo?taskID=19803742
libcgroup-0.41-11.module_ea91dfb0 COMPLETE https://koji.fedoraproject.org/koji/taskinfo?taskID=19685834
net-tools-2.0-0.42.20160912git.module_414736cc COMPLETE https://koji.fedoraproject.org/koji/taskinfo?taskID=19804010
time-1.7-52.module_414736cc COMPLETE https://koji.fedoraproject.org/koji/taskinfo?taskID=19803747
desktop-file-utils-0.23-3.module_ea91dfb0 COMPLETE https://koji.fedoraproject.org/koji/taskinfo?taskID=19685835
libselinux-2.6-6.module_ea91dfb0 COMPLETE https://koji.fedoraproject.org/koji/taskinfo?taskID=19685833
module-build-macros-0.1-1.module_414736cc COMPLETE https://koji.fedoraproject.org/koji/taskinfo?taskID=19803333
checkpolicy-2.6-1.module_414736cc COMPLETE https://koji.fedoraproject.org/koji/taskinfo?taskID=19803514
dbus-glib-0.108-2.module_ea91dfb0 COMPLETE https://koji.fedoraproject.org/koji/taskinfo?taskID=19685836
....
To actively watch a module build in flight, given its ID:
....
$ mbs-build watch 570
Still building:
libXrender https://koji.fedoraproject.org/koji/taskinfo?taskID=19804885
libXdamage https://koji.fedoraproject.org/koji/taskinfo?taskID=19805153
Failed:
libXxf86vm https://koji.fedoraproject.org/koji/taskinfo?taskID=19804903
Summary:
2 components in the BUILDING state
34 components in the COMPLETE state
1 components in the FAILED state
97 components in the undefined state
psabata's build #570 of shared-userspace-f26 is in the "build" state
....
=== The releng repo
There are more tools located in the `scripts/mbs/` directory
of the releng repo: https://pagure.io/releng/blob/main/f/scripts/mbs
== Cancelling a module build
Users can cancel their own module builds with:
....
$ mbs-build cancel $BUILD_ID
....
MBS admins can also cancel builds of any user.
[NOTE]
====
MBS admins are defined as members of the groups listed in the
`ADMIN_GROUPS` configuration options in
`roles/mbs/common/templates/config.py`.
====
== Logs
The frontend logs are on mbs-frontend0[1-2] in
`/var/log/httpd/error_log`.
The backend logs are on mbs-backend01. Look in the journal for the
`fedmsg-hub` service.
== Upgrading
The package in question is `module-build-service`. Please
use the `playbooks/manual/upgrade/mbs.yml` playbook.
== Managing Bootstrap Modules
In general, modules use other modules to define their buildroots, but
what defines the buildroot of the very first module? For this, we use
"bootstrap" modules which are manually selected. For some history on
this, see these tickets:
* https://pagure.io/releng/issue/6791
* https://pagure.io/fedora-infrastructure/issue/6097
The tag for a bootstrap module needs to be manually created and
populated by Release Engineering. Builds for that tag are curated and
selected from other Fedora tags, with care to ensure that only as many
builds are added as needed.
The existence of the tag is not enough for the bootstrap module to be
usable by MBS. MBS discovers the bootstrap module as a possible
dependency for other yet-to-be-built modules by querying PDC. During
normal operation, these entries in PDC are automatically created by
`pdc-updater` on _pdc-backend02_, but for the bootstrap tag they need to be
manually created and linked to the new bootstrap tag.
To be usable, you'll need a token with rights to speak to staging/prod
PDC. See the PDC SOP for information on client configuration in
`/etc/pdc.d/` and on where to find those tokens.
== Things that could go wrong
=== Overloading koji
If koji is overloaded, it should be acceptable to _stop_ the fedmsg-hub
daemon on _mbs-backend01_ at any time.
[NOTE]
====
As builds finish in koji, they will be _missed_ by the backend, but
when it restarts it should find them in datagrepper. If that fails as
well, the mbs backend has a poller which should start up ~5 minutes
after startup that checks koji for anything it may have missed, at which
point it will resume functioning.
====
If koji continues to be overloaded after startup, try decreasing the
`NUM_CONCURRENT_BUILDS` option in the config file in
`roles/mbs/common/templates/`.

View file

@@ -1,163 +0,0 @@
= On Demand Compose Service SOP
[NOTE]
====
The ODCS is very new and changing rapidly. We'll try to keep this up to
date as best we can.
====
The ODCS is a service generating temporary composes from Koji tag(s)
using Pungi.
== Contact Information
Owner::
Factory2 Team, Release Engineering Team, Infrastructure Team
Contact::
fedora-admin, #fedora-releng
Persons::
jkaluza, cqi, qwan, threebean
Public addresses::
* odcs.fedoraproject.org
Servers::
* odcs-frontend0[1-2].iad2.fedoraproject.org
* odcs-backend01.iad2.fedoraproject.org
Purpose::
Generate temporary composes from Koji tag(s) using Pungi.
== Description
ODCS clients submit requests for a compose to _odcs.fedoraproject.org_. The
requests are submitted using the `python2-odcs-client` Python module or just
using plain JSON.
The request contains all the information needed to build a compose:
* *source type*: Type of compose source, for example "tag" or "module"
* *source*: Name of Koji tag or list of modules defined by
name-stream-version.
* *packages*: List of packages to include in a compose.
* *seconds to live*: Number of seconds after which the compose is removed
from the filesystem and is marked as "removed".
* *flags*: Various flags further defining the compose - for example the
"no_deps" flag saying that the *packages* dependencies
should not be included in a compose.
The request is received by the ODCS flask app running on odcs-frontend
nodes. The frontend does input validation of the request and then adds
the compose request to the database in the "wait" state and sends a fedmsg
message about this event. The compose request gets its unique id which
can be used by a client to query its status using the frontend REST API.
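For illustration, submitting such a request with plain JSON could look like
the sketch below (the tag, package, and exact field names are illustrative,
and authentication is omitted):
....
import requests

payload = {
    "source": {
        "type": "tag",
        "source": "f30-build",   # hypothetical Koji tag
        "packages": ["httpd"],
    },
    "seconds-to-live": 3600,
}
resp = requests.post("https://odcs.fedoraproject.org/api/1/composes/",
                     json=payload)
print(resp.json())  # includes the compose id and its current state
....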
The odcs-backend node then handles the compose requests in "wait" state
and starts generating the compose using the Pungi tool. It does so by
generating all the configuration files for Pungi and executing the "pungi"
executable. The backend also changes the compose request status to
"generating" and sends a fedmsg message about this event.
The number of concurrent pungi processes can be set using the
_num_concurrent_pungi_ variable in ODCS configuration file.
The output directory for a compose is shared between frontend and
backend node. Once the compose is generated, the backend changes the
status of the compose request to "done" and again sends a fedmsg message about
this event.
The shared directory with a compose is available using httpd on the
frontend node and ODCS client can access the generated compose. By
default this is on https://odcs.fedoraproject.org/composes/ URL.
If the compose generation goes wrong, the backend changes the state of
the compose request to "failed" and again sends fedmsg message about
this event. The "failed" compose is still available for
*seconds to live* time in the shared directory for further
examination of pungi logs if needed.
After the *seconds to live* time, the backend node removes
the compose from filesystem and changes the state of compose request to
"removed".
If there are compose requests for the very same composes, the ODCS will
reuse the older compose instead of generating a new one and point the new
compose to the older one.
The "removed" compose can be renewed by a client to generate the same
compose as in the past. The *seconds to live* attribute of a
compose can be extended by a client when needed.
== Observing ODCS Behavior
There is currently no command line tool to query ODCS, but ODCS provides
a REST API which can be used to observe the ODCS behavior. This is
available on https://odcs.fedoraproject.org/api/1/composes.
The API can be filtered by the following keys entered as HTTP GET variables:
* owner
* source_type
* source
* state
It is also possible to see all the current composes in the compose
output directory, which is available on the frontend on
https://odcs.fedoraproject.org/composes.
== Removing compose before its expiration time
Members of the FAS group defined in the _admins_ section of the ODCS
configuration can remove any compose by sending a DELETE request to the
following URL:
https://odcs.fedoraproject.org/api/1/composes/$compose_id
== Logs
The frontend logs are on odcs-frontend0[1-2] in
`/var/log/httpd/error_log` or `/var/log/httpd/ssl_error_log`.
The backend logs are on odcs-backend01. Look in the journal for the
_odcs-backend_ service.
== Upgrading
The package in question is _odcs-server_. Please use the
https://pagure.io/fedora-infra/ansible/blob/main/f/playbooks/manual/upgrade/odcs.yml[playbooks/manual/upgrade/odcs.yml]
playbook.
== Things that could go wrong
=== Not enough space on shared volume
In case there are too many composes, a member of the FAS group defined in
the _admins_ section of the ODCS configuration file should:
* Remove the oldest composes to get some free space immediately. A list
of such composes can be found on
https://odcs.fedoraproject.org/composes/ by sorting by the Last modified
field.
* Decrease the *max_seconds_to_live* in ODCS configuration
file.
=== The OIDC token expires
This will cause the cron job to fail on the backend. Tokens have a lifetime of one year, and should therefore be regenerated periodically.
To regenerate the token, run the following command in the ansible repo:
....
scripts/generate-oidc-token odcs-prod -e 365 -u releng-odcs@service -s https://id.fedoraproject.org/scope/groups -s https://pagure.io/odcs/new-compose -s https://pagure.io/odcs/renew-compose -s https://pagure.io/odcs/delete-compose
....
Follow the instructions given by the script: run the SQL command on the Ipsilon database server:
....
ssh db-fas01.iad2.fedoraproject.org
sudo -u postgres -i ipsilon
ipsilon=# BEGIN;
[...]
ipsilon=# COMMIT;
....
Save the value of the token generated by the script in the ansible-private repo under `files/releng/production/releng-odcs-oidc-token`.
Deploy the change by running the `playbooks/groups/odcs.yml` playbook.

View file

@@ -468,12 +468,3 @@ but is run on the openQA servers as it seems like as good a place as any
to do it. As with all other message consumers, if making manual changes
or updates to the components, remember to restart the consumer service
afterwards.
== Autocloud ResultsDB forwarder (autocloudreporter)
An ansible role called `autocloudreporter` also runs on the openQA
production server. This has nothing to do with openQA at all, but is run
there for convenience. This role deploys a fedmsg consumer that listens
for fedmsgs indicating that Autocloud (a separate automated test system
which tests cloud images) has completed a test run, then forwards those
results to ResultsDB.

View file

@@ -1,185 +0,0 @@
= PDC SOP
Store metadata about composes we produce and "component groups".
App: https://pdc.fedoraproject.org/
Source for frontend: https://github.com/product-definition-center/product-definition-center
Source for backend: https://github.com/fedora-infra/pdc-updater
== Contact Information
Owner::
Release Engineering, Fedora Infrastructure Team
Contact::
#fedora-apps, #fedora-releng, #fedora-admin, #fedora-noc
Servers::
pdc-web0\{1,2}, pdc-backend01
Purpose::
Store metadata about composes and "component groups"
== Description
The Product Definition Center (PDC) is a webapp and API designed for
storing and querying product metadata. We automatically populate our
instance with data from our existing releng tools/processes. It doesn't
do much on its own, but the goal is to enable us to develop more sane
tooling down the road for future releases.
The webapp is a django app running on pdc-web0\{1,2}. Unlike most of our
other apps, it does not use OpenID for authentication, but it instead
uses SAML2. It uses _mod_auth_mellon_ to achieve this (in
cooperation with ipsilon). The webapp allows new data to be POST'd to it
by admin users.
The backend is a _fedmsg-hub_ process running on
_pdc-backend01_. It listens for new composes over fedmsg and then POSTs
data about those composes to PDC. It also listens for changes to the
fedora atomic host git repo in pagure and updates "component groups" in
PDC to reflect what rpm components constitute fedora atomic host.
For long-winded history and explanation, see the original Change
document: https://fedoraproject.org/wiki/Changes/ProductDefinitionCenter
[NOTE]
====
PDC is being replaced by fpdc (Fedora Product Definition Center)
====
== Upgrading the Software
There is an upgrade playbook in `playbooks/manual/upgrade/pdc.yml` which
will upgrade both the frontend and the backend if new packages are
available. Database schema upgrades should be handled automatically with
a run of that playbook.
== Logs
Logs for the frontend are in `/var/log/httpd/error_log` on
pdc-web0\{1,2}.
Logs for the backend can be accessed with
`journalctl -u fedmsg-hub -f` on _pdc-backend01_.
== Restarting Services
The frontend runs under apache. So either `apachectl graceful`
or `systemctl restart httpd` should do it.
The backend runs as a _fedmsg-hub_, so
`systemctl restart fedmsg-hub` should restart it.
== Scripts
The _pdc-updater_ package (installed on _pdc-backend01_) provides three
scripts:
* `pdc-updater-audit`
* `pdc-updater-retry`
* `pdc-updater-initialize`
A possible failure scenario is that we will lose a fedmsg message and
the backend will not update the frontend with info about that compose.
To detect this, we provide the `pdc-updater-audit` command
(which gets run once daily by cron with emails sent to the releng-cron
list). It compares all of the entries in PDC with all of the entries in
kojipkgs and then raises an alert if there is a discrepancy.
Another possible failure scenario is that the fedmsg message is
published and received correctly, but there is some processing error
while handling it. The event occurred, but the import to the PDC db
failed. The `pdc-updater-audit` script should detect this
discrepancy, and then an admin will need to manually repair the problem
and retry the event with the `pdc-updater-retry` command.
If doomsday occurs and the whole thing is totally hosed, you can delete
the db and re-ingest all information available from releng with the
`pdc-updater-initialize` tool. (Creating the initial schema needs to
happen on pdc-web01 with the standard django settings.py commands.)
== Manually Updating Information
In general, you shouldn't have to do these things. `pdc-updater` will
automatically create new releases and update information, but if you
ever need to manipulate PDC data, you can do it with the _pdc-client_
tool. A copy is installed on _pdc-backend01_ and there are some
credentials there you'll need, so ssh there first.
Make sure that you are root so that you can read
`/etc/pdc.d/fedora.json`.
Try listing all of the releases:
....
$ pdc -s fedora release list
....
Deactivating an EOL release:
....
$ pdc -s fedora release update fedora-21-updates --deactivate
....
[NOTE]
====
There are lots more attributes you can manipulate on a release (you can
change the type, and rename them, etc..) See `pdc --help`
and `pdc release --help` for more information.
====
Listing all composes:
....
$ pdc -s fedora compose list
....
We're not sure yet how to flag a compose as the Gold compose, but when
we do, the answer should appear here:
https://github.com/product-definition-center/product-definition-center/issues/428
== Adding superusers
A small group of release engineers needs to be superusers to set EOL
dates and add/remove components. You can grant them permissions to do
this via some direct database calls. First find out their email address
listed in fas, then log in to _db01.iad2.fedoraproject.org_:
....
sudo -u postgresql psql pdc pdc-
# update kerb_auth_user set is_superuser = 'true' where email = 'usersemailfromfas';
....
The user will now have privs with their normal tokens.
== Updating SAML2 certificates
As stated previously, the authentication uses SAML2 with _mod_auth_mellon_ (as the Service Provider on PDC's side) and Ipsilon (as the Identity Provider). This form of authentication relies on SSL certificates and XML metadata.
PDC's certificates live in the _ansible-private_ repository, in `files/saml2/pdc{,.stg}.fedoraproject.org/certificate.{pem,key}`. They are generated from the PKI in `files/saml2/{staging,production}/`. The certificates can be self-signed as long as they are properly embedded in the metadata XML file and this file is distributed identically to PDC and to Ipsilon.
To renew the certificate, generate a new one with the provided script in the _ansible-private_ repo:
....
$ files/saml2/staging/build-key-server pdc.stg.fedoraproject.org
$ mv files/saml2/staging/keys/pdc.stg.fedoraproject.org.crt files/saml2/pdc.stg.fedoraproject.org/certificate.pem
$ mv files/saml2/staging/keys/pdc.stg.fedoraproject.org.key files/saml2/pdc.stg.fedoraproject.org/certificate.key
....
And for production:
....
$ files/saml2/production/build-key-server pdc.fedoraproject.org
$ mv files/saml2/production/keys/pdc.fedoraproject.org.crt files/saml2/pdc.fedoraproject.org/certificate.pem
$ mv files/saml2/production/keys/pdc.fedoraproject.org.key files/saml2/pdc.fedoraproject.org/certificate.key
....
And commit the changes:
....
$ git commit -a -s -m "PDC: new certificate"
$ git pull --rebase
$ git push
....
Then run the PDC and the Ipsilon playbooks. The PDC playbook will push the new certificates and re-generate the `metadata.xml` file in `/etc/httpd/saml2/`. The Ipsilon playbook will retrieve this `metadata.xml` file from the PDC server and insert it into the `/etc/ipsilon/root/configuration.conf` file.

View file

@@ -50,8 +50,7 @@ given permissions by virtual host.
The /pubsub virtual host is the generic publish-subscribe virtual host
used by most applications. Messages published via AMQP are sent to the
"amq.topic" exchange. Messages being bridged from fedmsg into AMQP are
sent via "zmq.topic".
"amq.topic" exchange.
==== /public_pubsub