From 703a46badad38fe7ba06db70de88691dabfcf5ee Mon Sep 17 00:00:00 2001
From: Adrian Reber
Date: Wed, 13 May 2015 11:23:21 +0000
Subject: [PATCH] Start the crawl later on the second crawler.

Even with rsync as the crawl method, some hosts take a very long time
to crawl. The network connection for rsync is only open for a short
time, but with both crawlers reading from and writing to the database
it takes a very long time until the status of all directories is
updated. This patch therefore introduces a 3-hour delay of the crawl
on the second crawler.

This could also be solved with two different cron.d files, one for
each crawler.
---
 roles/mirrormanager/crawler/files/crawler.cron | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/roles/mirrormanager/crawler/files/crawler.cron b/roles/mirrormanager/crawler/files/crawler.cron
index 3d695ca515..3f6d89495c 100644
--- a/roles/mirrormanager/crawler/files/crawler.cron
+++ b/roles/mirrormanager/crawler/files/crawler.cron
@@ -1,4 +1,8 @@
 # run the crawler twice a day
 # logs sent to /var/log/mirrormanager/crawler.log and crawl/* by default
 # 32GB of RAM is not enough for 75 threads, 38 seems to work so far
-0 */12 * * * mirrormanager /usr/bin/mm2_crawler --threads 38 `/usr/local/bin/run_crawler.sh 2` > /dev/null 2>&1
+#
+# [ "`hostname -s`" == "mm-crawler02" ] && sleep 3h is used to start the crawl
+# later on the second crawler to reduce the number of parallel accesses to
+# the database
+0 */12 * * * mirrormanager [ "`hostname -s`" == "mm-crawler02" ] && sleep 3h; /usr/bin/mm2_crawler --threads 38 `/usr/local/bin/run_crawler.sh 2` > /dev/null 2>&1
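
The alternative the commit message mentions (two different cron.d files,
one per crawler) is not part of this patch, but a minimal sketch of it
could look like the following. The file names and the 3-hour stagger
expressed as a cron schedule are assumptions for illustration; the
command line itself is the one from the patch:

# /etc/cron.d/mm-crawler01 (hypothetical; deployed only on the first crawler)
0 */12 * * * mirrormanager /usr/bin/mm2_crawler --threads 38 `/usr/local/bin/run_crawler.sh 2` > /dev/null 2>&1

# /etc/cron.d/mm-crawler02 (hypothetical; deployed only on the second crawler)
# 3-23/12 fires at 03:00 and 15:00, i.e. 3 hours after the first crawler,
# so the two crawlers do not access the database in parallel
0 3-23/12 * * * mirrormanager /usr/bin/mm2_crawler --threads 38 `/usr/local/bin/run_crawler.sh 2` > /dev/null 2>&1

This avoids the hostname test and the sleeping cron job entirely, at the
cost of keeping two files in sync, which is the trade-off the commit
message alludes to.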