Add blockers to dl.fedoraproject.org

Looked at the logs of servers being hit by the 'non-responsive' bots.
The following were hit heavily, multiple times a day, every day:

100006 nagios.fedoraproject.org-access.log
102150 koschei.fedoraproject.org-access.log
162296 lists.fedoraproject.org-access.log
495776 fedoraproject.org-access.log
850471 dl.fedoraproject.org-access.log

Added blocks to dl.fedoraproject.org to try to lower its hit rate. The
others need review from people who know their internals better.
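
The commit message does not say how the per-host counts were produced. As a rough sketch only, hits from the listed spiders could be counted per log file along these lines. The function name `count_bot_hits`, the sample log lines, and the assumption that the logs use the combined log format (User-Agent as the last quoted field) are all illustrative and not taken from the commit.

```python
import re

# Same spider names as the RewriteCond in this commit; IGNORECASE mirrors [NC].
BOT_PATTERN = re.compile(
    r"Bytespider|ClaudeBot|Amazonbot|YandexBot|ChatGLM-Spider|GPTBot|"
    r"Barkrowler|YisouSpider|MJ12bot",
    re.IGNORECASE,
)

# Assumption: combined log format, where the User-Agent is the last quoted field.
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_bot_hits(lines):
    """Count log lines whose User-Agent field names one of the spiders."""
    hits = 0
    for line in lines:
        match = LAST_QUOTED_FIELD.search(line)
        if match and BOT_PATTERN.search(match.group(1)):
            hits += 1
    return hits


# Made-up sample lines for illustration only.
SAMPLE_LOG = [
    '192.0.2.1 - - [08/Jul/2024:12:00:00 -0400] "GET /pub/ HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Bytespider)"',
    '192.0.2.2 - - [08/Jul/2024:12:00:01 -0400] "GET /pub/ HTTP/1.1" 200 512 "-" "curl/8.0"',
    '192.0.2.3 - - [08/Jul/2024:12:00:02 -0400] "GET /pub/ HTTP/1.1" 200 512 "-" "mj12bot/v1.4"',
]

print(count_bot_hits(SAMPLE_LOG))  # prints 2
```

In practice the function would be fed an open log file (for example `count_bot_hits(open(path))`), once per host.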

Signed-off-by: Stephen Smoogen <ssmoogen@redhat.com>
Stephen Smoogen 2024-07-08 12:16:52 -04:00 committed by zlopez
parent 7e426dbf37
commit 969bbfcf2a
3 changed files with 13 additions and 5 deletions

@@ -3,8 +3,13 @@ RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "lftp"
RewriteRule ^.*$ https://fedoraproject.org/wiki/Infrastructure/Mirroring#Tools_to_avoid [R,L]
RewriteRule ^/$ /pub [R=302,L]
# Spiders-gone-wild
# These spiders may not follow robots.txt and will
# hit admin sections which consume large amounts of CPU
RewriteCond %{HTTP_USER_AGENT} ^.*(Bytespider|ClaudeBot|Amazonbot|YandexBot|ChatGLM-Spider|GPTBot|Barkrowler|YisouSpider|MJ12bot).*$ [NC]
RewriteRule .* - [F,L]
RewriteRule ^/$ /pub [R=302,L]
RedirectMatch 302 ^/pub/fedora/linux/atomic/(.*$) https://kojipkgs.fedoraproject.org/atomic/$1
RedirectMatch 302 ^/pub/fedora/linux/atomic https://kojipkgs.fedoraproject.org/atomic/

@@ -37,7 +37,7 @@ RewriteRule ^/$ /archives [R,L]
# Spiders-gone-wild
# These spiders may not follow robots.txt and will
# hit admin sections which consume large amounts of CPU
RewriteCond %{HTTP_USER_AGENT} ^.*(Bytespider|ClaudeBot|Amazonbot|YandexBot|claudebot|ChatGLM-Spider|GPTBot|Barkrowler|YisouSpider|MJ12bot).*$ [NC]
RewriteCond %{HTTP_USER_AGENT} ^.*(Bytespider|ClaudeBot|Amazonbot|YandexBot|ChatGLM-Spider|GPTBot|Barkrowler|YisouSpider|MJ12bot).*$ [NC]
RewriteRule .* - [F,L]
# Old static archives
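
The one-line change in this hunk appears to drop the lowercase `claudebot` alternative, which is redundant because `[NC]` already makes the match case-insensitive. The sketch below checks that claim under two assumptions: that the first of the two `RewriteCond` lines is the removed one, and that Python's `re.IGNORECASE` behaves like `[NC]` for a pattern this simple. The helper `is_blocked` and the sample User-Agent strings are hypothetical.

```python
import re

# Pattern assumed to be the old line, with the extra lowercase "claudebot".
OLD_PATTERN = re.compile(
    r"^.*(Bytespider|ClaudeBot|Amazonbot|YandexBot|claudebot|ChatGLM-Spider"
    r"|GPTBot|Barkrowler|YisouSpider|MJ12bot).*$",
    re.IGNORECASE,  # stands in for the [NC] flag
)

# Pattern assumed to be the new line, without the duplicate.
NEW_PATTERN = re.compile(
    r"^.*(Bytespider|ClaudeBot|Amazonbot|YandexBot|ChatGLM-Spider"
    r"|GPTBot|Barkrowler|YisouSpider|MJ12bot).*$",
    re.IGNORECASE,
)


def is_blocked(pattern, user_agent):
    """Return True if the User-Agent would satisfy the RewriteCond."""
    return pattern.search(user_agent) is not None


# Hypothetical User-Agent strings, not taken from real logs.
SAMPLE_AGENTS = [
    "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
    "claudebot",
    "CLAUDEBOT/2",
    "Mozilla/5.0 (X11; Linux x86_64) Firefox/127.0",
    "curl/8.0",
]

for agent in SAMPLE_AGENTS:
    # Both patterns should agree on every sample.
    assert is_blocked(OLD_PATTERN, agent) == is_blocked(NEW_PATTERN, agent)
    print(f"{agent!r}: blocked={is_blocked(NEW_PATTERN, agent)}")
```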

@@ -138,10 +138,13 @@ MaxConnectionsPerChild 1000
# RewriteEngine On
# RewriteCond %{REQUEST_URI} ^/fedora-web/websites$
# RewriteRule .* - [F]
# Reject Bytespider spider
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} .*Bytespider.*
RewriteRule .* - [F]
# Spiders-gone-wild
# These spiders may not follow robots.txt and will
# hit admin sections which consume large amounts of CPU
RewriteCond %{HTTP_USER_AGENT} ^.*(Bytespider|ClaudeBot|Amazonbot|YandexBot|ChatGLM-Spider|GPTBot|Barkrowler|YisouSpider|MJ12bot).*$ [NC]
RewriteRule .* - [F,L]
<Location /apache-status>
SetHandler server-status