Remove fedmsg and github2fedmsg from documentation
This commit removes all documentation related to fedmsg and github2fedmsg. It removes mentions of fedmsg where that makes sense, or changes them to Fedora messaging. I didn't update modules/releng_misc_guide/pages/sop_pushing_updates.adoc, as that needs somebody with knowledge of the process to update it. Signed-off-by: Michal Konecny <mkonecny@redhat.com>
parent 2c72e82f01, commit b2f3b6589a
19 changed files with 18 additions and 1136 deletions
@@ -59,7 +59,6 @@ presented in our xref:sle.adoc[SLE Documentation].
 * Docstranslation
 * Documentation https://docs.fedoraproject.org/
 * FAS2Discourse
-* Fedmsg
 * GeoIP https://geoip.fedoraproject.org/
 * Ipsilon website
 * Kerneltest https://apps.fedoraproject.org/kerneltest
@@ -86,7 +85,6 @@ presented in our xref:sle.adoc[SLE Documentation].
 * Elections https://apps.fedoraproject.org/#Elections
 * FedoCal https://apps.fedoraproject.org/#FedoCal
 * Fedora People https://fedorapeople.org/
-* github2fedmsg https://apps.fedoraproject.org/github2fedmsg
 * Meetbot https://apps.fedoraproject.org/#Meetbot
 * Packager dashboard https://packager-dashboard.fedoraproject.org/
 * Packages https://packages.fedoraproject.org/
@@ -224,8 +224,8 @@ variety of operating system/cloud combinations.
 * https://pagure.io/sigul[sigul] -An automated gpg signing system
 * https://github.com/rpm-software-management/mock/wiki[mock] -a tool for
 building packages in prestine buildroots
-* http://www.fedmsg.com/en/latest/[fedmsg] -Fedora Infrastructure
-Message Bus
+* https://fedora-messaging.readthedocs.io/en/stable/[Fedora Messaging]
+-Fedora Infrastructure Message Bus
 * https://github.com/rhinstaller/lorax[lorax] -tool to build install
 trees and images
 * http://www.openshift.org/[OpenShift] -Open Source Platform as a
@@ -17,8 +17,6 @@ created from a Dockerfile and builds on top of that base image.
 | Future Items to Integrate    |
 +------------------------------+
 | +--------------------------+ |
-| |PDC Integration           | |
-| +--------------------------+ |
 | |New Hotness               | |
 | +--------------------------+ |
 | |Other???                  | |
@@ -62,7 +60,7 @@ created from a Dockerfile and builds on top of that base image.
(ASCII architecture diagram, alignment lost in extraction: a "[docker images]"
flow feeds the build pipeline; the single changed line replaces the "[fedmsg]"
label with "[fedora messaging]".)
@@ -115,7 +113,7 @@ The main aspects of the Layered Image Build System are:
 * A docker registry
 ** docker-distribution
 * Taskotron
-* fedmsg
+* Fedora messaging
 * RelEng Automation

 The build system is setup such that Fedora Layered Image maintainers
@@ -142,9 +140,9 @@ world verifying that all sources of information come from Fedora.
 Completed layered image builds are hosted in a candidate docker registry
 which is then used to pull the image and perform tests with
 https://taskotron.fedoraproject.org/[Taskotron]. The taskotron tests are
-triggered by a http://www.fedmsg.com/en/latest/[fedmsg] message that is
-emitted from https://fedoraproject.org/wiki/Koji[Koji] once the build is
-complete. Once the test is complete, taskotron will send fedmsg which is
+triggered by a https://fedora-messaging.readthedocs.io/en/stable/[Fedora messaging]
+message that is emitted from https://fedoraproject.org/wiki/Koji[Koji] once the build is
+complete. Once the test is complete, taskotron will send fedora message which is
 then caught by the [.title-ref]#RelEng Automation# Engine that will run
 the Automatic Release tasks in order to push the layered image into a
 stable docker registry in the production space for end users to consume.
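For illustration, listening for such Koji events with Fedora messaging looks
roughly like the sketch below; the topic suffix checked here is an assumption
based on common Fedora bus topics, not something this commit defines.

....
from fedora_messaging import api

def on_message(message):
    """Called once per message delivered to our queue."""
    # Topic name is illustrative; the real Koji topic may differ.
    if message.topic.endswith("buildsys.build.state.change"):
        print("Koji build changed state:", message.body)

api.consume(on_message)  # blocks forever, dispatching each message
....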
@@ -230,13 +228,13 @@ be held in DistGit and maintained by the Layered Image maintainers.
 https://pagure.io/releng-automation[RelEng Automation] is an ongoing
 effort to automate as much of the RelEng process as possible by using
 http://ansible.com/[Ansible] and being driven by
-http://www.fedmsg.com/en/latest/[fedmsg] via
+https://fedora-messaging.readthedocs.io/en/stable/[Fedora messaging] via
 https://github.com/maxamillion/loopabull[Loopabull] to execute Ansible
-Playbooks based on fedmsg events.
+Playbooks based on Fedora messaging events.

 ==== Robosignatory

-https://pagure.io/robosignatory[Robosignatory] is a fedmsg consumer that
+https://pagure.io/robosignatory[Robosignatory] is a Fedora messaging consumer that
 automatically signs artifacts and will be used to automatically sign
 docker layered images for verification by client tools as well as end
 users.
@@ -247,17 +245,9 @@ In the future various other components of the
 https://fedoraproject.org/wiki/Infrastructure[Fedora Infrastructure]
 will likely be incorporated.

-===== PDC
-
-https://pdc.fedoraproject.org/[PDC] is Fedora's implementation of
-https://github.com/product-definition-center/product-definition-center[Product
-Definition Center] which allows Fedora to maintain a database of each
-Compose and all of it's contents in a way that can be queried and used
-to make decisions in a programatic way.
-
 ===== The New Hotness

 https://github.com/fedora-infra/the-new-hotness[The New Hotness] is a
-http://www.fedmsg.com/en/latest/[fedmsg] consumer that listens to
-release-monitoring.org and files bugzilla bugs in response (to notify
+https://fedora-messaging.readthedocs.io/en/stable/[Fedora messaging] consumer
+that listens to release-monitoring.org and files bugzilla bugs in response (to notify
 packagers that they can update their packages).
@@ -1,114 +0,0 @@
== Fedora RelEng Workflow Automation

The Fedora RelEng Workflow Automation is a means to allow RelEng to
define a pattern by which Release Engineering work is automated in a
uniform fashion. The automation technology of choice is
https://ansible.com/[ansible] and the "workflow engine" is powered by
https://github.com/maxamillion/loopabull[loopabull], which is an event
loop that allows us to pass the information contained within a
http://www.fedmsg.com/en/latest/[fedmsg] and insert it into
https://ansible.com/[ansible]
https://docs.ansible.com/ansible/playbooks.html[playbooks]. This will
effectively create an event driven workflow that can take action
conditionally based on the contents of arbitrary
http://www.fedmsg.com/en/latest/[fedmsg] data.

Background on the topic can be found in the
https://fedoraproject.org/wiki/Changes/ReleaseEngineeringAutomationWorkflowEngine[Release
Engineering Automation Workflow Engine] Change proposal, as well as in
the https://pagure.io/releng-automation[releng-automation] pagure
repository.

=== RelEng Workflow Automation Architecture

By using http://www.fedmsg.com/en/latest/[fedmsg] as the source of
information feeding the event loop, we will configure
https://github.com/maxamillion/loopabull[loopabull] to listen for
specific
https://fedora-fedmsg.readthedocs.io/en/latest/topics.html[fedmsg
topics] which will correspond with https://ansible.com/[ansible]
https://docs.ansible.com/ansible/playbooks.html[playbooks]. When one of
the appropriate
https://fedora-fedmsg.readthedocs.io/en/latest/topics.html[fedmsg
topics] is encountered across the message bus, its message payload is
then injected into the corresponding playbook as an extra set of
variables. A member of the Fedora Release Engineering Team can at that
point use this as a means to perform whatever arbitrary action or series
of actions they can otherwise perform with https://ansible.com/[ansible]
(including what we can enable via custom
https://docs.ansible.com/ansible/modules.html[modules]) based on the
input of the message payload.
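A minimal sketch of that topic-to-playbook dispatch, assuming an invented
ROUTES mapping and playbook names (loopabull's real configuration and code
differ):

....
import json
import subprocess

# Hypothetical mapping of message topics to playbooks; loopabull reads
# an equivalent mapping from its configuration.
ROUTES = {
    "org.fedoraproject.prod.buildsys.build.state.change": "koji-build.yml",
}

def handle(topic, payload):
    """Run the playbook that corresponds to a topic, feeding the
    message payload in as extra variables."""
    playbook = ROUTES.get(topic)
    if playbook is None:
        return  # no-op for topics we don't route
    subprocess.run(
        ["ansible-playbook", playbook, "--extra-vars", json.dumps(payload)],
        check=True,
    )
....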
The general overview of the architecture is below as well as a
description of how it works:

....
      +------------+
      |   fedmsg   |
      |            |
      +---+--------+
          |  ^
          |  |
          |  |
          |  |
          |  |
          V  |
+------------------+-----------------+
|                                    |
|        Release Engineering         |
|     Workflow Automation Engine     |
|                                    |
|  - RabbitMQ                        |
|  - fedmsg-rabbitmq-serializer      |
|  - loopabull                       |
|                                    |
+----------------+-------------------+
                 |
                 |
                 V
      +-----------------------+
      |                       |
      |   composer/bodhi/etc  |
      |                       |
      +-----------------------+
....

The flow of data will begin with an event somewhere in the
https://fedoraproject.org/wiki/Infrastructure[Fedora Infrastructure]
that sends a http://www.fedmsg.com/en/latest/[fedmsg] across the message
bus, then the messages will be taken in and serialized in to a
https://www.rabbitmq.com/[rabbitmq] worker queue using
https://pagure.io/fedmsg-rabbitmq-serializer[fedmsg-rabbitmq-serializer].
Then https://github.com/maxamillion/loopabull[loopabull] will be
listening to the rabbitmq worker queue for tasks to come in. Once a
message is received, it is processed and once it is either no-op'd or a
corresponding ansible playbook is run to completion, the message will be
`ack`'d and cleared from the worker queue. This will allow for us to
scale loopabull instances independently from the message queue as well
as ensure that work is not lost because of a downed or busy loopabull
instance. Also, as a point of note, the loopabull service instances will
be scaled using https://freedesktop.org/wiki/Software/systemd/[systemd]
https://fedoramagazine.org/systemd-template-unit-files/[unit templates].
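That ack-on-completion behaviour amounts to roughly the following sketch,
assuming a pika/RabbitMQ consumer; the queue name and dispatch helper are
placeholders:

....
import pika

def run_playbook_for(body):
    """Placeholder: dispatch the serialized message to ansible-playbook,
    as sketched in the loopabull example earlier."""

def on_task(channel, method, properties, body):
    """Process one serialized fedmsg from the worker queue; only ack
    once the playbook run (or deliberate no-op) has finished."""
    try:
        run_playbook_for(body)
        channel.basic_ack(delivery_tag=method.delivery_tag)
    except Exception:
        # Leave the message un-acked so another loopabull instance
        # can pick it up; work is not lost to a downed consumer.
        channel.basic_nack(delivery_tag=method.delivery_tag, requeue=True)

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.basic_consume(queue="releng-tasks", on_message_callback=on_task)
channel.start_consuming()
....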
Once a playbook has been triggered, it will run tasks on remote systems
on behalf of a loopabull automation user. These users can be privileged
if need be, however the scope of their privilege is based on the purpose
they serve. These user accounts are provisioned by the
https://fedoraproject.org/wiki/Infrastructure[Fedora Infrastructure]
Team based on the requirements of the
`RelEng Task Automation User Request Standard Operating
Procedure (SOP) <sop_requesting_task_automation_user>` document and
tasks are subject to code and security audit.

=== Fedora Lib RelEng

https://pagure.io/flr[Fedora Lib RelEng] (flr) is a library and set of
command line tools to expose the library that aims to provide re-usable
code for common tasks that need to be done in Release Engineering.
Combining this set of command line tools when necessary with the Release
Engineering Automation pipeline allows for easy separation of
permissions and responsibilities via sudo permissions on remote hosts.
This is explained in more detail on the project's pagure page.
@@ -162,15 +162,6 @@ OpenShift instance].
 This section contains various issues encountered during deployment or
 configuration changes and possible solutions.

-=== Fedmsg messages aren't sent
-
-*Issue:* Fedmsg messages aren't sent.
-
-*Solution:* Set USER environment variable in pod.
-
-*Explanation:* Fedmsg is using USER env variable as a username inside
-messages. Without USER env set it just crashes and didn't send anything.
-
 === Cronjob is crashing

 *Issue:* Cronjob pod is crashing on start, even after configuration
@@ -139,9 +139,8 @@ unexpected changes on servers (or playbooks).

 We have in place a callback plugin that stores history for any
 ansible-playbook runs and then sends a report each day to
-sysadmin-logs-members with any CHANGED or FAILED actions. Additionally,
-there's a fedmsg plugin that reports start and end of ansible playbook
-runs to the fedmsg bus. Ansible also logs to syslog verbose reporting of
+sysadmin-logs-members with any CHANGED or FAILED actions.
+Ansible also logs to syslog verbose reporting of
 when and what commands and playbooks were run.

 === role based access control for playbooks
@@ -1,6 +1,6 @@
 = datanommer SOP

-Consume fedmsg bus activity and stuff it in a postgresql db.
+Consume fedora messaging activity and stuff it in a postgresql db.

 == Contact Information

@@ -11,7 +11,7 @@ Contact::
 Servers::
 busgateway01
 Purpose::
-Save fedmsg bus activity
+Save fedora messaging bus activity

 == Description

@@ -21,7 +21,7 @@ python-datanommer-models::
 Schema definition and API for storing new items and querying existing
 items
 python-datanommer-consumer::
-A plugin for the fedmsg-hub that actively listens to the bus and
+A plugin for the fedora messaging that actively listens to the bus and
 stores events.
 datanommer-commands::
 A set of CLI tools for querying the DB.
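In spirit, the consumer side amounts to something like this sketch; the
table layout and connection string are assumptions, and the real datanommer
schema is considerably richer:

....
import json
import psycopg2
from fedora_messaging import api

conn = psycopg2.connect("dbname=datanommer")  # assumed DSN

def store(message):
    """Persist every bus message: topic and raw body."""
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO messages (topic, body) VALUES (%s, %s)",
            (message.topic, json.dumps(message.body)),
        )

api.consume(store)  # blocks, storing each message as it arrives
....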
@@ -1,104 +0,0 @@
= fedmsg-gateway SOP

Outgoing raw ZeroMQ message stream.

[NOTE]
====
See also: <<fedmsg-websocket.adoc#>>
====

== Contact Information

Owner:::
Messaging SIG, Fedora Infrastructure Team
Contact:::
#fedora-apps, #fedora-admin, #fedora-noc
Servers:::
busgateway01, proxy0*
Purpose:::
Expose raw ZeroMQ messages outside the FI environment.

== Description

Users outside of Fedora Infrastructure can listen to the production
message bus by connecting to specific addresses. This is required for
local users to run their own hubs and message processors ("Consumers").

The specific public endpoints are:

production::
tcp://hub.fedoraproject.org:9940
staging::
tcp://stg.fedoraproject.org:9940

_fedmsg-gateway_, the daemon running on _busgateway01_, is listening to the
FI production fedmsg bus and will relay every message that it receives
out to a special ZMQ pub endpoint bound to port 9940. haproxy mediates
connections to the _fedmsg-gateway_ daemon.
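Consuming that raw stream needs only a ZeroMQ SUB socket. A minimal sketch,
assuming the two-part topic/body framing fedmsg used on the wire:

....
import zmq

ctx = zmq.Context()
sock = ctx.socket(zmq.SUB)
sock.connect("tcp://hub.fedoraproject.org:9940")
sock.setsockopt(zmq.SUBSCRIBE, b"")  # empty prefix = all topics

while True:
    topic, body = sock.recv_multipart()  # fedmsg sent [topic, JSON body]
    print(topic.decode(), body[:80])
....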
== Connection Flow

Clients connecting through haproxy on `proxy0*:9940` are redirected to
`busgateway0*:9940`. This can be found in the `haproxy.cfg` entry for
`listen fedmsg-raw-zmq 0.0.0.0:9940`.

This is different than the apache reverse proxy pass setup we have for
the _app0*_ and _packages0*_ machines. _That_ flow looks something like
this:

....
Client -> apache(proxy01) -> haproxy(proxy01) -> apache(app01)
....

The flow for the raw zmq stream provided by _fedmsg-gateway_ looks
something like this:

....
Client -> haproxy(proxy01) -> fedmsg-gateway(busgateway01)
....

_haproxy_ is listening on a public port.

At the time of this writing, _haproxy_ does not actually load balance
zeromq session requests across multiple _busgateway0*_ machines, but there
is nothing stopping us from adding them. New hosts can be added in
ansible and pressed from _busgateway01_'s template. Add them to the
fedmsg-raw-zmq listen in _haproxy_'s config and it should Just Work.

== Increasing the Maximum Number of Concurrent Connections

HTTP requests are typically very short (a few seconds at most). This
means that the number of concurrent tcp connections we require for most
of our services is quite low (1024 is overkill). ZeroMQ tcp connections,
on the other hand, are expected to live for quite a long time.
Consequently we needed to scale up the number of possible concurrent tcp
connections.

All of this is in ansible and should be handled for us automatically if
we bring up new nodes.

* The pam_limits user limit for the fedmsg user was increased from 1024
to 160000 on _busgateway01_.
* The pam_limits user limit for the haproxy user was increased from 1024
to 160000 on the _proxy0*_ machines.
* The zeromq High Water Mark (HWM) was increased to 160000 on
_busgateway01_.
* The maximum number of connections allowed was increased in
`haproxy.cfg`.

== Nagios

New nagios checks were added for this that check to see if the number of
concurrent connections through haproxy is approaching the maximum number
allowed.

You can check these numbers by hand by inspecting the _haproxy_ web
interface: https://admin.fedoraproject.org/haproxy/proxy1#fedmsg-raw-zmq

Look at the "Sessions" section. "Cur" is the current number of sessions
versus "Max", the maximum number seen at the same time and "Limit", the
maximum number of concurrent connections allowed.

== RHIT

We had RHIT open up port 9940 special to _proxy01.iad2_ for this.
@@ -1,57 +0,0 @@
= fedmsg introduction and basics, SOP

General information about fedmsg

== Contact Information

Owner::
Messaging SIG, Fedora Infrastructure Team
Contact::
#fedora-apps, #fedora-admin, #fedora-noc
Servers::
Almost all of them.
Purpose::
Introduce sysadmins to fedmsg tools and config

== Description

_fedmsg_ is a system that links together most of our webapps and services
into a message mesh or net (often called a "bus"). It is built on top of
the zeromq messaging library.

_fedmsg_ has its own developer documentation that is a good place to check
if this or other SOPs don't provide enough information -
http://fedmsg.rtfd.org

== Tools

Generally, _fedmsg-tail_ and _fedmsg-logger_ are the two most commonly used
tools for debugging and testing. To see if bus-connectivity exists
between two machines, log onto each of them and run the following on the
first:

....
$ echo testing from $(hostname) | fedmsg-logger
....

And run the following on the second:

....
$ fedmsg-tail --really-pretty
....

== Configuration

_fedmsg_ configuration lives in `/etc/fedmsg.d/`

`/etc/fedmsg.d/endpoints.py` keeps the list of every possible fedmsg
endpoint. It acts as a global index that defines the bus.
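For illustration, an entry in that index looks roughly like this sketch of
the fedmsg config style; the service name and port are invented:

....
# /etc/fedmsg.d/endpoints.py -- hypothetical excerpt
config = dict(
    endpoints={
        # Each service lists the zmq addresses it may bind and publish on;
        # consumers read this index to know where to connect.
        "myapp.app01": [
            "tcp://app01.iad2.fedoraproject.org:3005",
        ],
    },
)
....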
See https://fedmsg.readthedocs.org/en/stable/configuration/ for a full glossary of
configuration values.

== Logs

_fedmsg_ daemons keep their logs in `/var/log/fedmsg`. _fedmsg_ message hooks
in existing apps (like bodhi) will log any errors to the logs of the app
they've been added to (like `/var/log/httpd/error_log`).
@@ -1,73 +0,0 @@
= Adding a new fedmsg message type

== Instrumenting the program

First, figure out how you're going to publish the message: is it from a
shell script or from a long running process?

If it's from a shell script, you need to just add a
_fedmsg-logger_ statement to the script. Remember to set the
_--modname_ and _--topic_ for your new message's
fully-qualified topic.

If it's from a python process, you need to just add a
`fedmsg.publish(..)` call. The same concerns about modname and topic
apply here.

If this is a short-lived python process, you'll want to add
_active=True_ to the call to `fedmsg.publish(..)`. This will
make the _fedmsg_ lib "actively" reach out to our _fedmsg-relay_ running on
_busgateway01_.
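A minimal sketch of such a call; the modname, topic, and message body are
placeholders:

....
import fedmsg

# From a short-lived script: actively push through the fedmsg-relay.
# 'myapp' and 'thing.did_stuff' are illustrative names only.
fedmsg.publish(
    topic="thing.did_stuff",
    modname="myapp",
    msg={"user": "someuser", "result": "ok"},
    active=True,  # omit for long-running WSGI processes
)
....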
If it is a long-running python process (like a WSGI thread), then you
don't need to pass any extra arguments. You don't want it to reach out
to the _fedmsg-relay_ if possible. Your process will require that some
"endpoints" are created for it in `/etc/fedmsg.d/`. More on that below.

== Supporting infrastructure

You need to make sure that the machine this is running on has a cert and
key that can be read by the program to sign its message. If you don't
have a cert already, then you need to create it in the private repo. Ask
a sysadmin-main member.

Then you need to declare those certs in the _fedmsg_certs
data structure stored typically in our ansible `group_vars/` for this
service. Declare both the name of the cert, what group and user it
should be owned by, and in the `can_send:` section, declare the list of
topics that this cert should be allowed to publish.

If this is a long-running python process that is _not_ passing
_active=True_ to the call to
`fedmsg.publish(..)`, then you have to also declare
endpoints for it. You do that by specifying the `fedmsg_wsgi_procs` and
`fedmsg_wsgi_vars` in the `group_vars` for your service. The iptables
rules and _fedmsg_ endpoints should be automatically created for you on
the next playbook run.

== Supporting code

At this point, you can push the change out to production and be
publishing messages "okay". Everything should be fine.

However, your message will show up blank in _datagrepper_, in matrix, and in
_FMN_, and everywhere else we try to render it. You _must_ then follow up
and write a new _Processor_ for it in the _fedmsg_meta_
library we maintain:
https://github.com/fedora-infra/fedmsg_meta_fedora_infrastructure

You also _must_ write a test case for it there. The docs listing all
topics we publish at http://fedora-fedmsg.rtfd.org/ are automatically
generated from the test suite. Please don't forget this.

Lastly, you should cut a release of _fedmsg_meta_ and deploy it using the
`playbooks/manual/upgrade/fedmsg.yml` playbook, which should
update all the relevant hosts.

== Corner cases

If the process publishing the new message lives _outside_ our main
network, you have to jump through more hoops. Look at _abrt_, _koschei_, and
_copr_ for examples of how to configure this (you need a special firewall
rule, and they need to be configured to talk to our "inbound gateway"
running on the proxies).
@@ -1,56 +0,0 @@
= fedmsg-relay SOP

Bridge ephemeral scripts into the fedmsg bus.

== Contact Information

Owner::
Messaging SIG, Fedora Infrastructure Team
Contact::
#fedora-apps, #fedora-admin, #fedora-noc
Servers::
app01
Purpose::
Bridge ephemeral bash and python scripts into the fedmsg bus.

== Description

_fedmsg-relay_ is running on _app01_, which is a bad choice. We should look
to move it to a more isolated place in the future. _busgateway01_ would be
a better choice.

"Ephemeral" scripts like `pkgdb2branch.py`, the post-receive git hook on
_pkgs01_, and anywhere _fedmsg-logger_ is used all depend on _fedmsg-relay_.
Instead of emitting messages "directly" to the rest of the bus, they use
fedmsg-relay as an intermediary.

Check that _fedmsg-relay_ is running by looking for it in the process
list. You can restart it in the standard way with
`sudo service fedmsg-relay restart`. Check for its logs in
`/var/log/fedmsg/fedmsg-relay.log`

Ephemeral scripts know where the _fedmsg-relay_ is by looking for the
relay_inbound and relay_outbound values in the global fedmsg config.

== But What is it Doing? And Why?

The _fedmsg_ bus is designed to be "passive" in its normal operation. A
_mod_wsgi_ process under _httpd_ sets up its _fedmsg_ publisher socket to
passively emit messages on a certain port. When some other service wants
to receive these messages, it is up to that service to know where
_mod_wsgi_ is emitting and to actively connect there. In this way,
emitting is passive and listening is active.

We get a problem when we have a one-off or "ephemeral" script that is
not a long-running process -- a script like _pkgdb2branch_ which is run
when a user runs it and which ends shortly after. Listeners who want
these scripts' messages will find that they are usually not available
when they try to connect.

To solve this problem, we introduced the "_fedmsg-relay_" daemon which is
a kind of "passive"-to-"passive" adaptor. It binds to an outbound port
on one end where it will publish messages (like normal) but it also
binds to another port where it listens passively for inbound
messages. Ephemeral scripts then actively connect to the passive inbound
port of the _fedmsg-relay_ to have their payloads echoed on the
bus-proper.
|
|||
= websocket SOP
|
||||
|
||||
websocket communication with Fedora apps.
|
||||
|
||||
See-also: <<fedmsg-gateway.adoc#>>
|
||||
|
||||
== Contact Information
|
||||
|
||||
Owner::
|
||||
Messaging SIG, Fedora Infrastructure Team
|
||||
Contact::
|
||||
#fedora-apps, #fedora-admin, #fedora-noc
|
||||
Servers::
|
||||
busgateway01, proxy0*, app0*
|
||||
Purpose::
|
||||
Expose a websocket server for FI apps to use
|
||||
|
||||
== Description
|
||||
|
||||
_WebSocket_ is a protocol (an extension of HTTP/1.1) by which client web
|
||||
browsers can establish full-duplex socket communications with a server
|
||||
--the "real-time web".
|
||||
|
||||
In our case, webapps served from _app0*_ and _packages0*_ will include
|
||||
javascript code instructing client browsers to establish a second
|
||||
connection to our _WebSocket_ server. They point browsers to the following
|
||||
addresses:
|
||||
|
||||
production::
|
||||
wss://hub.fedoraproject.org:9939
|
||||
staging::
|
||||
wss://stg.fedoraproject.org:9939
|
||||
|
||||
The websocket server itself is a _fedmsg-hub_ daemon running on
|
||||
_busgateway01_. It is configured to enable its websocket server component
|
||||
in the presence of certain configuration values.
|
||||
|
||||
_haproxy_ mediates connections to the _fedmsg-hub_ _websocket_ server daemon.
|
||||
An _stunnel_ daemon provides SSL support.
|
||||
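A rough Python equivalent of what those browser clients do, using the
third-party websocket-client package; the `__topic_subscribe__` control
message is recalled from old fedmsg docs and should be treated as an
assumption:

....
import json
import websocket  # pip install websocket-client

ws = websocket.create_connection("wss://hub.fedoraproject.org:9939")
# Ask the hub to start streaming matching topics to this connection.
ws.send(json.dumps({"topic": "__topic_subscribe__",
                    "body": "org.fedoraproject.*"}))
while True:
    print(ws.recv())  # one JSON-encoded bus message per frame
....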
== Connection Flow

The connection flow is much the same as in the <<fedmsg-gateway.adoc#>>,
but is somewhat more complicated.

"Normal" HTTP requests to our app servers traverse the following chain:

....
Client -> apache(proxy01) -> haproxy(proxy01) -> apache(app01)
....

The flow for a websocket request looks something like this:

....
Client -> stunnel(proxy01) -> haproxy(proxy01) -> fedmsg-hub(busgateway01)
....

stunnel is listening on a public port, negotiates the SSL connection,
and redirects the connection to haproxy who in turn hands it off to the
_fedmsg-hub_ websocket server listening on _busgateway01_.

At the time of this writing, _haproxy_ does not actually load balance
zeromq session requests across multiple _busgateway0*_ machines, but there
is nothing stopping us from adding them. New hosts can be added in
ansible and pressed from _busgateway01_'s template. Add them to the
_fedmsg-websockets_ listen in _haproxy_'s config and it should Just Work.

== RHIT

We had RHIT open up port 9939 special to _proxy01.iad2_ for this.
@@ -1,51 +0,0 @@
= github2fedmsg SOP

Bridge github events onto our fedmsg bus.

App: https://apps.fedoraproject.org/github2fedmsg/

Source: https://github.com/fedora-infra/github2fedmsg/

== Contact Information

Owner::
Fedora Infrastructure Team
Contact::
#fedora-apps, #fedora-admin, #fedora-noc
Servers::
github2fedmsg01
Purpose::
Bridge github events onto our fedmsg bus.

== Description

github2fedmsg is a small Python Pyramid app that bridges github events
onto our fedmsg bus by way of github's "webhooks" feature. It is what
allows us to have notifications of github activity via fedmsg. It
has two phases of operation:

* Infrequently, a user will log in to github2fedmsg via Fedora OpenID.
They then push a button to also log in to github.com. They are then
logged in to github2fedmsg with _both_ their FAS account and their
github account.
+
They are then presented with a list of their github repositories. They
can toggle each one: "on" or "off". When they turn a repo on, our webapp
makes a request to github.com to install a "webhook" for that repo with
a callback URL to our app.
* When events happen to that repo on github.com, github looks up our
callback URL and makes an http POST request to us, informing us of the
event. Our github2fedmsg app receives that, validates it, and then
republishes the content to our fedmsg bus.
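The receive-validate-republish step amounts to something like the sketch
below. This is not the app's actual code: GitHub's `X-Hub-Signature`
HMAC-SHA1 header is real, but the Flask framing stands in for Pyramid and
the secret and republish helper are placeholders.

....
import hashlib
import hmac

from flask import Flask, abort, request

app = Flask(__name__)
WEBHOOK_SECRET = b"from-the-private-repo"  # placeholder

def republish_to_bus(payload):
    """Placeholder: hand the validated event to the message bus."""

@app.route("/webhook", methods=["POST"])
def webhook():
    # GitHub signs each delivery with HMAC-SHA1 of the raw body.
    expected = "sha1=" + hmac.new(
        WEBHOOK_SECRET, request.data, hashlib.sha1
    ).hexdigest()
    if not hmac.compare_digest(expected,
                               request.headers.get("X-Hub-Signature", "")):
        abort(403)
    republish_to_bus(request.get_json())
    return "", 200
....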
== What could go wrong?

* Restarting the app or rebooting the host shouldn't cause a problem. It
should come right back up.
* Our database could die. We have a db with a list of all the repos we
have turned on and off. We would want to restore that from backup.
* If github gets compromised, they might have to revoke all of their
application credentials. In that case, our app would fail to work. There
are _lots_ of private secrets set in our private repo that allow our app
to talk to github.com. There are inline comments there with instructions
about how to generate new keys and secrets.
@@ -98,11 +98,6 @@ xref:developer_guide:sops.adoc[Developing Standard Operating Procedures].
 * xref:failedharddrive.adoc[Replacing Failed Hard Drives]
 * xref:fas-openid.adoc[FAS-OpenID]
 * xref:fedmsg-certs.adoc[fedmsg (Fedora Messaging) Certs, Keys, and CA]
-* xref:fedmsg-gateway.adoc[fedmsg-gateway]
-* xref:fedmsg-introduction.adoc[fedmsg introduction and basics]
-* xref:fedmsg-new-message-type.adoc[Adding a new fedmsg message type]
-* xref:fedmsg-relay.adoc[fedmsg-relay]
-* xref:fedmsg-websocket.adoc[WebSocket]
 * xref:fedocal.adoc[Fedocal]
 * xref:fedora-releases.adoc[Fedora Release Infrastructure]
 * xref:fedorawebsites.adoc[Websites Release]
@@ -111,7 +106,6 @@ xref:developer_guide:sops.adoc[Developing Standard Operating Procedures].
 * xref:gdpr_sar.adoc[GDPR SAR]
 * xref:geoip-city-wsgi.adoc[geoip-city-wsgi]
 * xref:github.adoc[Using github for Infra Projects]
-* xref:github2fedmsg.adoc[github2fedmsg]
 * xref:greenwave.adoc[Greenwave]
 * xref:guest_migrate.adoc[Migrate Guest VMs]
 * xref:guestdisk.adoc[Guest Disk Resize]
@@ -139,7 +133,6 @@ xref:developer_guide:sops.adoc[Developing Standard Operating Procedures].
 * xref:mailman.adoc[Mailman Infrastructure]
 * xref:massupgrade.adoc[Mass Upgrade Infrastructure]
 * xref:mastermirror.adoc[Master Mirror Infrastructure]
-* xref:mbs.adoc[Module Build Service Infra]
 * xref:memcached.adoc[Memcached Infrastructure]
 * xref:message-tagging-service.adoc[Message Tagging Service]
 * xref:mini_initiatives.adoc[Mini initiative Process]
@@ -152,13 +145,11 @@ xref:developer_guide:sops.adoc[Developing Standard Operating Procedures].
 * xref:new-virtual-hosts.adoc[Virtual Host Addition]
 * xref:nonhumanaccounts.adoc[Non-human Accounts Infrastructure]
 * xref:openshift_sops.adoc[Openshift SOPs]
-* xref:odcs.adoc[On Demand Compose Service]
 * xref:openqa.adoc[OpenQA Infrastructure]
 * xref:openvpn.adoc[OpenVPN]
 * xref:outage.adoc[Outage Infrastructure]
 * xref:packagereview.adoc[Package Review]
 * xref:pagure.adoc[Pagure Infrastructure]
-* xref:pdc.adoc[PDC]
 * xref:pesign-upgrade.adoc[Pesign upgrades/reboots]
 * xref:planetsubgroup.adoc[Planet Subgroup Infrastructure]
 * xref:publictest-dev-stg-production.adoc[Machine Classes]
@@ -1,204 +0,0 @@
= Module Build Service Infra SOP

The MBS is a build orchestrator on top of Koji for "modules".

https://fedoraproject.org/wiki/Changes/ModuleBuildService

== Contact Information

Owner::
Release Engineering Team, Infrastructure Team
Contact::
fedora-admin, #fedora-releng
Persons::
jkaluza, fivaldi, breilly, mikem
Public addresses::
* mbs.fedoraproject.org
Servers::
* mbs-frontend0[1-2].iad2.fedoraproject.org
* mbs-backend01.iad2.fedoraproject.org
Purpose::
Build modules for Fedora.

== Description

Users submit builds to _mbs.fedoraproject.org_ referencing their modulemd
file in https://src.fedoraproject.org/[dist-git]. (In the future,
users will not submit their own module
builds. The _freshmaker_ daemon (running in infrastructure)
will watch for `.spec` file changes and `modulemd.yaml` file changes -- it
will submit the relevant module builds to the MBS on behalf of users.)

The request to build a module is received by the MBS flask app running
on the `mbs-frontend` nodes.

Cursory validation of the submitted modulemd is performed on the
frontend: are the named packages valid? Are their branches valid? The
MBS keeps a copy of the modulemd and appends additional data describing
which branches pointed to which hashes at the time of submission.

A fedmsg from the frontend triggers the backend to start building the
module. First, tags and build/srpm-build groups are created. Then, a
module-build-macros package is synthesized and submitted as an srpm
build. When it is complete and available in the buildroot, the rest of
the rpm builds are submitted.

These are grouped and limited in two ways (see the sketch after this
list):

* First, there is a global `NUM_CONCURRENT_BUILDS` config option that
controls how many koji builds the MBS is allowed to have open at any
time. It serves as a throttle.
* Second, a given module may specify that its components should have a
certain "build order". If there are 50 components, it may say that the
first 25 of them are in one buildorder batch, and the second 25 are in
another buildorder batch. The first batch will be submitted and, when
complete, tagged back into the buildroot. Only after they are available
will the second batch of 25 begin.
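A toy illustration of that throttling and batching; the names are invented
and this is not MBS's actual code:

....
import itertools

NUM_CONCURRENT_BUILDS = 5  # global throttle, as in the MBS config

def submit_to_koji(component): ...
def wait_until_complete(): ...
def tag_into_buildroot(batch): ...

def build_module(components):
    """Submit components batch by batch, at most N koji builds open."""
    # Group components by their declared buildorder value.
    ordered = sorted(components, key=lambda c: c["buildorder"])
    for order, batch in itertools.groupby(ordered,
                                          key=lambda c: c["buildorder"]):
        batch = list(batch)
        # Throttle submissions within the batch.
        for i in range(0, len(batch), NUM_CONCURRENT_BUILDS):
            for component in batch[i:i + NUM_CONCURRENT_BUILDS]:
                submit_to_koji(component)
            wait_until_complete()
        tag_into_buildroot(batch)  # the next batch builds against these
....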
When the last component is complete, the MBS backend marks the build as
"done", and then marks it again as "ready". (There is currently no
meaning to the "ready" state beyond "done". We reserved that state for
future CI interactions.)

== Observing MBS Behavior

=== The mbs-build command

The https://pagure.io/fm-orchestrator[fm-orchestrator repo] and the
_module-build-service_ package provide an
_mbs-build_ command with a few subcommands. For general
help:

....
$ mbs-build --help
....

To generate a report of all currently active module builds:

....
$ mbs-build overview
ID    State    Submitted             Components    Owner    Module
----  -------  --------------------  ------------  -------  -----------------------------------
570   build    2017-06-01T17:18:11Z  35/134        psabata  shared-userspace-f26-20170601141014
569   build    2017-06-01T14:18:04Z  14/15         mkocka   mariadb-f26-20170601141728
....

To generate a report of an individual module build, given its ID:

....
$ mbs-build info 569
NVR                                             State     Koji Task
----------------------------------------------  --------  ------------------------------------------------------------
libaio-0.3.110-7.module_414736cc                COMPLETE  https://koji.fedoraproject.org/koji/taskinfo?taskID=19803741
                                                BUILDING  https://koji.fedoraproject.org/koji/taskinfo?taskID=19804081
libedit-3.1-17.20160618cvs.module_414736cc      COMPLETE  https://koji.fedoraproject.org/koji/taskinfo?taskID=19803745
compat-openssl10-1.0.2j-6.module_414736cc       COMPLETE  https://koji.fedoraproject.org/koji/taskinfo?taskID=19803746
policycoreutils-2.6-5.module_414736cc           COMPLETE  https://koji.fedoraproject.org/koji/taskinfo?taskID=19803513
selinux-policy-3.13.1-255.module_414736cc       COMPLETE  https://koji.fedoraproject.org/koji/taskinfo?taskID=19803748
systemtap-3.1-5.module_414736cc                 COMPLETE  https://koji.fedoraproject.org/koji/taskinfo?taskID=19803742
libcgroup-0.41-11.module_ea91dfb0               COMPLETE  https://koji.fedoraproject.org/koji/taskinfo?taskID=19685834
net-tools-2.0-0.42.20160912git.module_414736cc  COMPLETE  https://koji.fedoraproject.org/koji/taskinfo?taskID=19804010
time-1.7-52.module_414736cc                     COMPLETE  https://koji.fedoraproject.org/koji/taskinfo?taskID=19803747
desktop-file-utils-0.23-3.module_ea91dfb0       COMPLETE  https://koji.fedoraproject.org/koji/taskinfo?taskID=19685835
libselinux-2.6-6.module_ea91dfb0                COMPLETE  https://koji.fedoraproject.org/koji/taskinfo?taskID=19685833
module-build-macros-0.1-1.module_414736cc       COMPLETE  https://koji.fedoraproject.org/koji/taskinfo?taskID=19803333
checkpolicy-2.6-1.module_414736cc               COMPLETE  https://koji.fedoraproject.org/koji/taskinfo?taskID=19803514
dbus-glib-0.108-2.module_ea91dfb0               COMPLETE  https://koji.fedoraproject.org/koji/taskinfo?taskID=19685836
....

To actively watch a module build in flight, given its ID:

....
$ mbs-build watch 570
Still building:
  libXrender https://koji.fedoraproject.org/koji/taskinfo?taskID=19804885
  libXdamage https://koji.fedoraproject.org/koji/taskinfo?taskID=19805153
Failed:
  libXxf86vm https://koji.fedoraproject.org/koji/taskinfo?taskID=19804903

Summary:
  2 components in the BUILDING state
  34 components in the COMPLETE state
  1 components in the FAILED state
  97 components in the undefined state
psabata's build #570 of shared-userspace-f26 is in the "build" state
....

=== The releng repo

There are more tools located in the `scripts/mbs/` directory
of the releng repo: https://pagure.io/releng/blob/main/f/scripts/mbs

== Cancelling a module build

Users can cancel their own module builds with:

....
$ mbs-build cancel $BUILD_ID
....

MBS admins can also cancel builds of any user.

[NOTE]
====
MBS admins are defined as members of the groups listed in the
`ADMIN_GROUPS` configuration options in
`roles/mbs/common/templates/config.py`.
====

== Logs

The frontend logs are on mbs-frontend0[1-2] in
`/var/log/httpd/error_log`.

The backend logs are on mbs-backend01. Look in the journal for the
`fedmsg-hub` service.

== Upgrading

The package in question is `module-build-service`. Please
use the `playbooks/manual/upgrade/mbs.yml` playbook.

== Managing Bootstrap Modules

In general, modules use other modules to define their buildroots, but
what defines the buildroot of the very first module? For this, we use
"bootstrap" modules which are manually selected. For some history on
this, see these tickets:

* https://pagure.io/releng/issue/6791
* https://pagure.io/fedora-infrastructure/issue/6097

The tag for a bootstrap module needs to be manually created and
populated by Release Engineering. Builds for that tag are curated and
selected from other Fedora tags, with care to ensure that only as many
builds are added as needed.

The existence of the tag is not enough for the bootstrap module to be
useable by MBS. MBS discovers the bootstrap module as a possible
dependency for other yet-to-be-built modules by querying PDC. During
normal operation, these entries in PDC are automatically created by
`pdc-updater` on _pdc-backend02_, but for the bootstrap tag they need to be
manually created and linked to the new bootstrap tag.

To be usable, you'll need a token with rights to speak to staging/prod
PDC. See the PDC SOP for information on client configuration in
`/etc/pdc.d/` and on where to find those tokens.

== Things that could go wrong

=== Overloading koji

If koji is overloaded, it should be acceptable to _stop_ the fedmsg-hub
daemon on _mbs-backend01_ at any time.

[NOTE]
====
As builds finish in koji, they will be _missed_ by the backend, but
when it restarts it should find them in datagrepper. If that fails as
well, the mbs backend has a poller which should start up ~5 minutes
after startup that checks koji for anything it may have missed, at which
point it will resume functioning.
====
If koji continues to be overloaded after startup, try decreasing the
`NUM_CONCURRENT_BUILDS` option in the config file in
`roles/mbs/common/templates/`.
@@ -1,163 +0,0 @@
= On Demand Compose Service SOP

[NOTE]
====
The ODCS is very new and changing rapidly. We'll try to keep this up to
date as best we can.
====

The ODCS is a service generating temporary composes from Koji tag(s)
using Pungi.

== Contact Information

Owner::
Factory2 Team, Release Engineering Team, Infrastructure Team
Contact::
fedora-admin, #fedora-releng
Persons::
jkaluza, cqi, qwan, threebean
Public addresses::
* odcs.fedoraproject.org
Servers::
* odcs-frontend0[1-2].iad2.fedoraproject.org
* odcs-backend01.iad2.fedoraproject.org
Purpose::
Generate temporary composes from Koji tag(s) using Pungi.

== Description

ODCS clients submit a request for a compose to _odcs.fedoraproject.org_. The
requests are submitted using the `python2-odcs-client` Python module or just
using plain JSON (see the sketch after this list).

The request contains all the information needed to build a compose:

* *source type*: Type of compose source, for example "tag" or "module".
* *source*: Name of a Koji tag or list of modules defined by
name-stream-version.
* *packages*: List of packages to include in a compose.
* *seconds to live*: Number of seconds after which the compose is removed
from the filesystem and is marked as "removed".
* *flags*: Various flags further defining the compose - for example the
"no_deps" flag saying that the *packages* dependencies
should not be included in a compose.
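A hedged example of such a plain-JSON submission; the endpoint path matches
the REST API described below, but the payload field names are assumptions
rather than a verified schema:

....
import requests

payload = {
    "source": {
        "type": "tag",          # or "module"
        "source": "f32-build",  # an illustrative Koji tag
        "packages": ["httpd"],
    },
    "seconds-to-live": 3600,
    "flags": ["no_deps"],
}
resp = requests.post(
    "https://odcs.fedoraproject.org/api/1/composes/",
    json=payload,
    # a real request also needs authentication (OIDC token headers)
)
print(resp.json())
....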
The request is received by the ODCS flask app running on odcs-frontend
nodes. The frontend does input validation of the request and then adds
the compose request to the database with "wait" state and sends a fedmsg
message about this event. The compose request gets its unique id which
can be used by a client to query its status using the frontend REST API.

The odcs-backend node then handles the compose requests in "wait" state
and starts generating the compose using the Pungi tool. It does so by
generating all the configuration files for Pungi and executing the "pungi"
executable. The backend also changes the compose request status to
"generating" and sends a fedmsg message about this event.

The number of concurrent pungi processes can be set using the
_num_concurrent_pungi_ variable in the ODCS configuration file.

The output directory for a compose is shared between the frontend and
backend node. Once the compose is generated, the backend changes the
status of the compose request to "done" and again sends a fedmsg message
about this event.

The shared directory with a compose is available using httpd on the
frontend node and the ODCS client can access the generated compose. By
default this is on the https://odcs.fedoraproject.org/composes/ URL.

If the compose generation goes wrong, the backend changes the state of
the compose request to "failed" and again sends a fedmsg message about
this event. The "failed" compose is still available for
*seconds to live* time in the shared directory for further
examination of pungi logs if needed.

After the *seconds to live* time, the backend node removes
the compose from the filesystem and changes the state of the compose
request to "removed".

If there are compose requests for the very same composes, the ODCS will
reuse the older compose instead of generating a new one and points the new
compose to the older one.

The "removed" compose can be renewed by a client to generate the same
compose as in the past. The *seconds to live* attribute of a
compose can be extended by a client when needed.

== Observing ODCS Behavior

There is currently no command line tool to query ODCS, but ODCS provides
a REST API which can be used to observe the ODCS behavior. This is
available on https://odcs.fedoraproject.org/api/1/composes.

The API can be filtered by the following keys entered as HTTP GET variables:

* owner
* source_type
* source
* state

It is also possible to see all the current composes in the compose
output directory, which is available on the frontend on
https://odcs.fedoraproject.org/composes.

== Removing a compose before its expiration time

Members of the FAS group defined in the _admins_ section of the ODCS
configuration can remove any compose by sending a DELETE request to the
following URL:

https://odcs.fedoraproject.org/api/1/composes/$compose_id

== Logs

The frontend logs are on odcs-frontend0[1-2] in
`/var/log/httpd/error_log` or `/var/log/httpd/ssl_error_log`.

The backend logs are on odcs-backend01. Look in the journal for the
_odcs-backend_ service.

== Upgrading

The package in question is _odcs-server_. Please use the
https://pagure.io/fedora-infra/ansible/blob/main/f/playbooks/manual/upgrade/odcs.yml[playbooks/manual/upgrade/odcs.yml]
playbook.

== Things that could go wrong

=== Not enough space on shared volume

In case there are too many composes, a member of the FAS group defined in
the ODCS configuration file _admins_ section should:

* Remove the oldest composes to get some free space immediately. A list
of such composes can be found on
https://odcs.fedoraproject.org/composes/ by sorting by the Last modified
field.
* Decrease the *max_seconds_to_live* in the ODCS configuration
file.

=== The OIDC token expires

This will cause the cron job to fail on the backend. Tokens have a lifetime of one year, and should therefore be periodically regenerated.

To regenerate the token, run the following command in the ansible repo:

....
scripts/generate-oidc-token odcs-prod -e 365 -u releng-odcs@service -s https://id.fedoraproject.org/scope/groups -s https://pagure.io/odcs/new-compose -s https://pagure.io/odcs/renew-compose -s https://pagure.io/odcs/delete-compose
....

Follow the instructions given by the script: run the SQL command on the Ipsilon database server:

....
ssh db-fas01.iad2.fedoraproject.org
sudo -u postgres -i ipsilon
ipsilon=# BEGIN;
[...]
ipsilon=# COMMIT;
....

Save the value of the token generated by the script in the ansible-private repo under `files/releng/production/releng-odcs-oidc-token`.

Deploy the change by running the `playbooks/groups/odcs.yml` playbook.
@@ -468,12 +468,3 @@ but is run on the openQA servers as it seems like as good a place as any
 to do it. As with all other message consumers, if making manual changes
 or updates to the components, remember to restart the consumer service
 afterwards.
-
-== Autocloud ResultsDB forwarder (autocloudreporter)
-
-An ansible role called `autocloudreporter` also runs on the openQA
-production server. This has nothing to do with openQA at all, but is run
-there for convenience. This role deploys a fedmsg consumer that listens
-for fedmsgs indicating that Autocloud (a separate automated test system
-which tests cloud images) has completed a test run, then forwards those
-results to ResultsDB.
@@ -1,185 +0,0 @@
= PDC SOP

Store metadata about composes we produce and "component groups".

App: https://pdc.fedoraproject.org/

Source for frontend: https://github.com/product-definition-center/product-definition-center

Source for backend: https://github.com/fedora-infra/pdc-updater

== Contact Information

Owner::
Release Engineering, Fedora Infrastructure Team
Contact::
#fedora-apps, #fedora-releng, #fedora-admin, #fedora-noc
Servers::
pdc-web0\{1,2}, pdc-backend01
Purpose::
Store metadata about composes and "component groups"

== Description

The Product Definition Center (PDC) is a webapp and API designed for
storing and querying product metadata. We automatically populate our
instance with data from our existing releng tools/processes. It doesn't
do much on its own, but the goal is to enable us to develop more sane
tooling down the road for future releases.

The webapp is a django app running on pdc-web0\{1,2}. Unlike most of our
other apps, it does not use OpenID for authentication, but it instead
uses SAML2. It uses _mod_auth_mellon_ to achieve this (in
cooperation with ipsilon). The webapp allows new data to be POST'd to it
by admin users.

The backend is a _fedmsg-hub_ process running on
_pdc-backend01_. It listens for new composes over fedmsg and then POSTs
data about those composes to PDC. It also listens for changes to the
fedora atomic host git repo in pagure and updates "component groups" in
PDC to reflect what rpm components constitute fedora atomic host.

For long-winded history and explanation, see the original Change
document: https://fedoraproject.org/wiki/Changes/ProductDefinitionCenter

[NOTE]
====
PDC is being replaced by fpdc (Fedora Product Definition Center)
====

== Upgrading the Software

There is an upgrade playbook in `playbooks/manual/upgrade/pdc.yml` which
will upgrade both the frontend and the backend if new packages are
available. Database schema upgrades should be handled automatically with
a run of that playbook.

== Logs

Logs for the frontend are in `/var/log/httpd/error_log` on
pdc-web0\{1,2}.

Logs for the backend can be accessed with
`journalctl -u fedmsg-hub -f` on _pdc-backend01_.

== Restarting Services

The frontend runs under apache. So either `apachectl graceful`
or `systemctl restart httpd` should do it.

The backend runs as a _fedmsg-hub_, so
`systemctl restart fedmsg-hub` should restart it.

== Scripts

The _pdc-updater_ package (installed on _pdc-backend01_) provides three
scripts:

* `pdc-updater-audit`
* `pdc-updater-retry`
* `pdc-updater-initialize`

A possible failure scenario is that we will lose a fedmsg message and
the backend will not update the frontend with info about that compose.
To detect this, we provide the `pdc-updater-audit` command
(which gets run once daily by cron with emails sent to the releng-cron
list). It compares all of the entries in PDC with all of the entries in
kojipkgs and then raises an alert if there is a discrepancy.

Another possible failure scenario is that the fedmsg message is
published and received correctly, but there is some processing error
while handling it. The event occurred, but the import to the PDC db
failed. The `pdc-updater-audit` script should detect this
discrepancy, and then an admin will need to manually repair the problem
and retry the event with the `pdc-updater-retry` command.

If doomsday occurs and the whole thing is totally hosed, you can delete
the db and re-ingest all information available from releng with the
`pdc-updater-initialize` tool. (Creating the initial schema needs to
happen on pdc-web01 with the standard django settings.py commands.)

== Manually Updating Information

In general, you shouldn't have to do these things. `pdc-updater` will
automatically create new releases and update information, but if you
ever need to manipulate PDC data, you can do it with the _pdc-client_
tool. A copy is installed on _pdc-backend01_ and there are some
credentials there you'll need, so ssh there first.

Make sure that you are root so that you can read
`/etc/pdc.d/fedora.json`.

Try listing all of the releases:

....
$ pdc -s fedora release list
....

Deactivating an EOL release:

....
$ pdc -s fedora release update fedora-21-updates --deactivate
....

[NOTE]
====
There are lots more attributes you can manipulate on a release (you can
change the type, and rename them, etc..) See `pdc --help`
and `pdc release --help` for more information.
====

Listing all composes:

....
$ pdc -s fedora compose list
....

We're not sure yet how to flag a compose as the Gold compose, but when
we do, the answer should appear here:
https://github.com/product-definition-center/product-definition-center/issues/428

== Adding superusers

Some small group of release engineers need to be superuser to set eol
dates and add/remove components. You can grant them permissions to do
this via some direct database calls. First find out their email address
listed in fas, then login to _db01.iad2.fedoraproject.org_:

....
sudo -u postgresql psql pdc pdc-
# update kerb_auth_user set is_superuser = 'true' where email = 'usersemailfromfas';
....

The user will now have privs with their normal tokens.

== Updating SAML2 certificates

As stated previously, the authentication uses SAML2 with _mod_auth_mellon_ (as the Service Provider on PDC's side) and Ipsilon (as the Identity Provider). This form of authentication relies on SSL certificates and XML metadata.

PDC's certificates live in the _ansible-private_ repository, in `files/saml2/pdc{,.stg}.fedoraproject.org/certificate.{pem,key}`. They are generated from the PKI in `files/saml2/{staging,production}/`. The certificates can be self-signed as long as they are properly embedded in the metadata XML file and this file is distributed identically to PDC and to Ipsilon.

To renew the certificate, generate a new one with the provided script in the _ansible-private_ repo:

....
$ files/saml2/staging/build-key-server pdc.stg.fedoraproject.org
$ mv files/saml2/staging/keys/pdc.stg.fedoraproject.org.crt files/saml2/pdc.stg.fedoraproject.org/certificate.pem
$ mv files/saml2/staging/keys/pdc.stg.fedoraproject.org.key files/saml2/pdc.stg.fedoraproject.org/certificate.key
....

And for production:

....
$ files/saml2/production/build-key-server pdc.fedoraproject.org
$ mv files/saml2/production/keys/pdc.fedoraproject.org.crt files/saml2/pdc.fedoraproject.org/certificate.pem
$ mv files/saml2/production/keys/pdc.fedoraproject.org.key files/saml2/pdc.fedoraproject.org/certificate.key
....

And commit the changes:

....
$ git commit -a -s -m "PDC: new certificate"
$ git pull --rebase
$ git push
....

Then run the PDC and the Ipsilon playbooks. The PDC playbook will push the new certificates and re-generate the `metadata.xml` file in `/etc/httpd/saml2/`. The Ipsilon playbook will retrieve this `metadata.xml` file from the PDC server and insert it into the `/etc/ipsilon/root/configuration.conf` file.
@@ -50,8 +50,7 @@ given permissions by virtual host.

 The /pubsub virtual host is the generic publish-subscribe virtual host
 used by most applications. Messages published via AMQP are sent to the
-"amq.topic" exchange. Messages being bridged from fedmsg into AMQP are
-sent via "zmq.topic".
+"amq.topic" exchange.
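Publishing to that virtual host from an application looks roughly like this
sketch with pika; the broker host, credentials, and topic are placeholders:

....
import json
import pika

params = pika.ConnectionParameters(
    host="rabbitmq.example.org",  # placeholder broker
    virtual_host="/pubsub",
    credentials=pika.PlainCredentials("app", "secret"),  # placeholder
)
with pika.BlockingConnection(params) as connection:
    channel = connection.channel()
    channel.basic_publish(
        exchange="amq.topic",
        routing_key="org.fedoraproject.prod.myapp.event",  # illustrative
        body=json.dumps({"hello": "world"}),
    )
....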

 ==== /public_pubsub