Start the crawl later on the second crawler.

Even with rsync as the crawl method, some hosts take a very long time
to crawl. The rsync network connection is only open for a short time,
but with both crawlers reading from and writing to the database, it
takes a very long time until the status of all directories is updated.
Therefore this patch introduces a 3-hour delay of the crawl on the
second crawler. This could also be solved with two different cron.d
files, one for each crawler.
Adrian Reber 2015-05-13 11:23:21 +00:00
parent 9a06295fdf
commit 703a46bada


@@ -1,4 +1,8 @@
 # run the crawler twice a day
 # logs sent to /var/log/mirrormanager/crawler.log and crawl/* by default
 # 32GB of RAM is not enough for 75 threads, 38 seems to work so far
-0 */12 * * * mirrormanager /usr/bin/mm2_crawler --threads 38 `/usr/local/bin/run_crawler.sh 2` > /dev/null 2>&1
+#
+# [ "`hostname -s`" == "mm-crawler02" ] && sleep 3h is used to start the crawl
+# later on the second crawler to reduce the number of parallel accesses to
+# the database
+0 */12 * * * mirrormanager [ "`hostname -s`" == "mm-crawler02" ] && sleep 3h; /usr/bin/mm2_crawler --threads 38 `/usr/local/bin/run_crawler.sh 2` > /dev/null 2>&1
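
The host check in the cron line above could also live in a small wrapper script. The following is only a sketch and not part of this commit: the function name `crawl_delay` is hypothetical. It uses `=` rather than `==` style matching and a delay in seconds rather than `3h`, since `==` inside `[` and the `h` suffix for `sleep` are bash and GNU extensions, and cron runs commands with `/bin/sh` by default.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not from the commit): print the start
# delay in seconds for the short hostname passed as $1.
crawl_delay() {
    case "$1" in
        mm-crawler02) echo 10800 ;;  # second crawler waits 3 hours
        *)            echo 0 ;;      # all other hosts start immediately
    esac
}

# Intended use inside a wrapper, left commented out so the sketch can be
# sourced without starting a crawl:
# sleep "$(crawl_delay "$(hostname -s)")"
# /usr/bin/mm2_crawler --threads 38 `/usr/local/bin/run_crawler.sh 2`
```

With such a wrapper the cron entry stays identical on both crawlers, and only the hostname decides whether the crawl starts late.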