= Haproxy Infrastructure SOP

haproxy is an application that does load balancing at the TCP layer or
at the HTTP layer. It can do generic TCP balancing, but it specializes
in HTTP balancing. Our proxy servers are still running Apache, and that
is what our users connect to. But instead of using mod_proxy_balancer
and ProxyPass balancer://, we do a ProxyPass to http://localhost:10001/
or http://localhost:10002/. haproxy must be told to listen on an
individual port for each farm. All haproxy farms are listed in
/etc/haproxy/haproxy.cfg.
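
For illustration only, a minimal sketch of what the Apache side of this
arrangement can look like, whether done with ProxyPass or an equivalent
RewriteRule [P] rule; the exact vhost configuration lives in the proxy
servers' Apache configs, and the path and port here are just examples:

....
# Hypothetical Apache fragment on a proxy server: hand /wiki/ requests
# to the local haproxy farm on port 10001 instead of mod_proxy_balancer.
ProxyPass        /wiki/ http://localhost:10001/wiki/
ProxyPassReverse /wiki/ http://localhost:10001/wiki/
....
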
== Contents
* <<_contact_information>>
* <<_how_it_works>>
* <<_configuration_example>>
* <<_stats>>
* <<_advanced_usage>>

== Contact Information
Owner:::
Fedora Infrastructure Team
Contact:::
#fedora-admin, sysadmin-main, sysadmin-web group
Location:::
Phoenix, Tummy, Telia
Servers:::
proxy1, proxy2, proxy3, proxy4, proxy5
Purpose:::
Provides load balancing from the proxy layer to our application layer.

== How it works

haproxy is a load balancer; if you're already familiar with one, this
section won't be that interesting. In normal usage haproxy acts just
like a web server: it listens on a port for requests. Unlike most web
servers, though, it then sends each request to one of our back end
application servers and relays the response back. This is referred to
as reverse proxying. We typically configure haproxy to send a health
check to a specific URL and look at the response code; if no such URL
is configured, it just does basic checks against /. In most of our
configurations we're using round robin balancing, i.e. request 1 goes
to app1, request 2 goes to app2, request 3 goes to app3, request 4
goes to app1, and the whole process repeats.

[WARNING]
====
These checks add load and additional connections to the app servers.
Be smart about which URL you're checking, as it gets checked often.
Also be sure to verify the application servers can handle your new
settings, and monitor them closely for an hour or two after you make
changes.
====
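
If check load becomes a concern, the check URL and interval can be
tuned. A minimal sketch, assuming a hypothetical lightweight
/static/ping URL on the app servers:

....
# Hypothetical tuning: point the check at a cheap URL and check less
# often so the health checks add less load to the app servers.
option httpchk GET /static/ping
server app1 app1.fedora.iad2.redhat.com:80 check inter 10s rise 2 fall 5
....
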
== Configuration example

The example below shows how our fedoraproject wiki could be configured.
Each application should have its own farm, even if it has an identical
configuration to another farm; this allows easy addition and removal of
specific nodes when we need them:

....
listen fpo-wiki 0.0.0.0:10001
balance roundrobin
server app1 app1.fedora.iad2.redhat.com:80 check inter 2s rise 2 fall 5
server app2 app2.fedora.iad2.redhat.com:80 check inter 2s rise 2 fall 5
server app4 app4.fedora.iad2.redhat.com:80 backup check inter 2s rise 2 fall 5
option httpchk GET /wiki/Infrastructure
....

* The first line, "listen ....", says to create a farm called _fpo-wiki_,
listening on all IPs on port 10001. _fpo-wiki_ can be arbitrary, but make
it something obvious. Aside from that, the important bit is :10001.
Always make sure that a new farm is listening on a unique port. In
Fedora's case we're starting at 10001 and moving up by one; just check
the config file for the lowest open port above 10001 (see the sketch
after this list).
* The next line _balance roundrobin_ says to use round robin balancing.
* The server lines each add a new node to the balancer farm. In this
case the wiki is being served from app1, app2 and app4. If the wiki is
available at http://app1.fedora.iad2.redhat.com/wiki/, then this
config would be used in conjunction with "RewriteRule ^/wiki/(.*)
http://localhost:10001/wiki/$1 [P,L]".
* _server_ means we're adding a new node to the farm
* _app1_ is the worker name; it is analogous to fpo-wiki but should
match the short hostname of the node to make it easy to follow.
* _app1.fedora.iad2.redhat.com:80_ is the hostname and port to be
contacted.
* _check_ means to health check the node using the last line, "option
httpchk GET /wiki/Infrastructure", which requests /wiki/Infrastructure
to verify the wiki is working. If that URL fails, the entire node is
taken out of the farm mix.
* _inter 2s_ means to check every 2 seconds. 2s is the same as 2000
(milliseconds) in this case.
* _rise 2_ means to not put this node back in the mix until it has had
two successful connections in a row. haproxy will continue to check
every 2 seconds whether a node is up or down.
* _fall 5_ means to take a node out of the farm after 5 failures.
* _backup_: You'll notice that app4 has a _backup_ option. We don't
actually use this for the wiki but do for other farms. It basically
means to continue checking and treat this node like any other node, but
don't send it any production traffic unless the other two nodes are
down.
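
As an illustration of adding a new farm on the next free port, here is
a hedged sketch; the farm name _fpo-docs_, port 10002, and check URL
/docs/ are made up for this example, and the real farms live in
/etc/haproxy/haproxy.cfg:

....
listen fpo-docs 0.0.0.0:10002
balance roundrobin
server app1 app1.fedora.iad2.redhat.com:80 check inter 2s rise 2 fall 5
server app2 app2.fedora.iad2.redhat.com:80 check inter 2s rise 2 fall 5
option httpchk GET /docs/
....
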
All of these options can be tweaked, so keep that in mind when changing
or building a new farm. There are other configuration options in this
file that are global. Please see the haproxy documentation for more
info:

....
/usr/share/doc/haproxy-1.3.14.6/haproxy-en.txt
....
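
For orientation only, the global and per-farm defaults sit at the top
of haproxy.cfg in sections like the following sketch; the values shown
are placeholders, not our production settings:

....
global
daemon
maxconn 4096

defaults
mode http
balance roundrobin
....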

== Stats

In order to view the stats for a farm, see the stats page. Each proxy
server has its own stats page, since each one is running its own
haproxy instance. To view the stats, point your browser to
https://admin.fedoraproject.org/haproxy/shorthostname/ so proxy1 is at
https://admin.fedoraproject.org/haproxy/proxy1/ for example. The
trailing / is important.

* https://admin.fedoraproject.org/haproxy/proxy1/
* https://admin.fedoraproject.org/haproxy/proxy2/
* https://admin.fedoraproject.org/haproxy/proxy3/
* https://admin.fedoraproject.org/haproxy/proxy4/
* https://admin.fedoraproject.org/haproxy/proxy5/
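
For reference, a stats page like this can be exposed from haproxy
itself and then proxied by Apache on the proxy server. A minimal
sketch only; the local stats port 10000 and the Apache rule below are
hypothetical, and the real values (including any authentication) are
defined in our haproxy and Apache configs:

....
# Hypothetical stats listener in /etc/haproxy/haproxy.cfg
listen stats 127.0.0.1:10000
mode http
stats enable
stats uri /

# Hypothetical Apache rule on the proxy server
ProxyPass /haproxy/proxy1/ http://localhost:10000/
....
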
== Advanced Usage

haproxy has some more advanced usage that we've not needed to worry
about yet but that is worth mentioning. For example, one could send
users to just one app server based on session id: if user A happened
to hit app1 first and user B happened to hit app4 first, all
subsequent requests for user A would go to app1 and all requests for
user B would go to app4. This is handy for applications that cannot
normally be balanced because of shared storage needs or other locking
issues. This won't solve all problems though, and can have negative
effects; for example, when app1 goes down, user A would either lose
their session or be unable to work until app1 comes back up. Please do
thorough testing before looking into this option.
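
One common way to get this kind of stickiness in haproxy is
cookie-based persistence. The sketch below is illustrative only; the
farm name, port, and cookie name are made up, and we do not currently
run this in production:

....
listen fpo-stickyapp 0.0.0.0:10003
balance roundrobin
# Insert a SERVERID cookie so each client keeps hitting the same node.
cookie SERVERID insert indirect nocache
server app1 app1.fedora.iad2.redhat.com:80 cookie app1 check inter 2s rise 2 fall 5
server app2 app2.fedora.iad2.redhat.com:80 cookie app2 check inter 2s rise 2 fall 5
option httpchk GET /
....

With a setup like this, the first response sets a cookie naming the
chosen server, and subsequent requests carrying that cookie are routed
back to the same node.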