mm2-crawler: reduce database load during crawling

The two crawlers used to start 25 threads (each) every 12 hours to crawl
the mirrors. The second crawler was already started two hours later to
reduce the database load. This commit increases the delay of the second
crawler to 6 hours, so that every crawler has the MM database for 6
hours on its own. At the same time the number of threads per crawler is
reduced from 25 to 20 to further reduce the load on the database.
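
As a rough illustration of the new schedule (a sketch only, not part of
this change): the cron entry fires every 12 hours and the second crawler
sleeps before it starts. The function names below are made up for this
example, and the "head start" only equals exclusive database time under
the assumption that a crawl finishes before the other crawler starts.

```bash
#!/usr/bin/env bash
# Sketch: start hours of both crawlers for a given delay of the second one.
# The 12-hour period comes from the cron entry "0 */12 * * *".
PERIOD_HOURS=12

# Print the daily start hours (0-23) of a crawler delayed by $1 hours.
start_hours() {
    local delay=$1 h out=""
    for ((h = 0; h < 24; h += PERIOD_HOURS)); do
        out+="$(( (h + delay) % 24 )) "
    done
    echo "${out% }"
}

# Print two numbers: hours the first crawler has before the second one
# starts, and hours the second one has before the first starts again.
head_start() {
    local delay=$1
    echo "$delay $((PERIOD_HOURS - delay))"
}

for delay in 2 6; do
    echo "delay ${delay}h: crawler01 at $(start_hours 0)," \
         "crawler02 at $(start_hours "$delay")," \
         "head starts $(head_start "$delay")"
done
```

With the old 2 hour delay the first crawler had only 2 hours before the
second one started; with 6 hours both crawlers get the same 6 hour head
start.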

In addition the crawl timeout has been increased from 3 to 4 hours. This
is because pub/archive in particular has grown, as has pub/fedora with
the addition of the modular tree. Crawl timeouts are now seen more
often, which can lead to mirrors being auto-disabled.

The main reason for these changes is that the logs show that the actual
crawling of the mirrors does not always take most of the time; updating
the state of all directories of each mirror in the database can take a
very long time. By reducing the number of parallel accesses to the
database, in the best case from 50 to 20, the crawling should get
faster (hopefully).
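
The "50 to 20" figure can be reproduced with simple arithmetic. This is
only a sketch: peak_threads is a name invented for this example, and it
treats every crawler thread as one potential parallel database access,
which is an assumption.

```bash
#!/usr/bin/env bash
# Upper bound on crawler threads that may touch the database at once.
# $1: threads per crawler, $2: number of crawlers running at the same time
peak_threads() {
    echo $(( $1 * $2 ))
}

echo "before: $(peak_threads 25 2) threads while both crawlers overlap"
echo "after:  $(peak_threads 20 1) threads if the runs no longer overlap"
```

If a crawl still takes longer than 6 hours and the two runs overlap, the
same calculation gives 40 instead of 20, which is why 20 is the best
case.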

Signed-off-by: Adrian Reber <adrian@lisas.de>
Adrian Reber 2017-10-08 13:15:48 +02:00
parent 5864ea7d9d
commit 7146663478

@@ -1,7 +1,7 @@
 # run the crawler twice a day
 # logs sent to /var/log/mirrormanager/crawler.log and crawl/* by default
 #
-# [ "`hostname -s`" == "mm-crawler02" ] && sleep 2h is used to start the crawl
+# [ "`hostname -s`" == "mm-crawler02" ] && sleep 6h is used to start the crawl
 # later on the second crawler to reduce the number of parallel accesses to
 # the database
 #
@@ -11,4 +11,4 @@
 # wait for 5 minutes to give the crawler a chance to shutdown. After that the
 # crawler is killed. To make sure we only end the cron started crawler we look
 # for the following process "/usr/bin/python /usr/bin/mm2_crawler --threads 25".
-0 */12 * * * mirrormanager [ "`hostname -s`" == "mm-crawler02" ] && sleep 2h; pkill -14 -f "^/usr/bin/python2 -s /usr/bin/mm2_crawler --threads 25"; sleep 5m; pkill -9 -f "^/usr/bin/python2 -s /usr/bin/mm2_crawler --threads 25"; /usr/bin/mm2_crawler --threads 25 --timeout-minutes 180 `/usr/local/bin/run_crawler.sh 2` > /dev/null 2>&1
+0 */12 * * * mirrormanager [ "`hostname -s`" == "mm-crawler02" ] && sleep 6h; pkill -14 -f "^/usr/bin/python2 -s /usr/bin/mm2_crawler --threads 20"; sleep 5m; pkill -9 -f "^/usr/bin/python2 -s /usr/bin/mm2_crawler --threads 20"; /usr/bin/mm2_crawler --threads 20 --timeout-minutes 240 `/usr/local/bin/run_crawler.sh 2` > /dev/null 2>&1
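
The shutdown sequence in the cron entry above (signal 14, wait, signal 9)
can be read as the following pattern. This is a minimal sketch, not code
from the repository: stop_then_kill is a hypothetical helper name, and
the grace period is passed in seconds instead of the "5m" used in the
cron line.

```bash
#!/usr/bin/env bash
# Hypothetical helper: ask matching processes to stop, then kill leftovers.
# $1: regular expression matched against the full command line (pkill -f)
# $2: grace period in seconds between the two signals
stop_then_kill() {
    local pattern=$1 grace=$2
    # Signal 14 is SIGALRM on Linux; the cron entry sends it first so the
    # crawler gets a chance to shut down. pkill exits non-zero when nothing
    # matches, which is not an error here.
    pkill -14 -f "$pattern" || true
    sleep "$grace"
    # Signal 9 is SIGKILL and cannot be caught, so anything that still
    # matches the pattern is terminated.
    pkill -9 -f "$pattern" || true
}

# Usage matching the new cron entry (5 minutes = 300 seconds):
# stop_then_kill '^/usr/bin/python2 -s /usr/bin/mm2_crawler --threads 20' 300
```

The pattern is anchored with "^" and includes "--threads 20" so that only
the cron-started crawler is matched; a pattern without the anchor could
also match the shell that runs the cron command line itself.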