Start the crawl later on the second crawler.
Even with rsync as the crawl method, some hosts take a very long time to crawl. The rsync network connection is open only briefly, but with both crawlers reading from and writing to the database, it takes a very long time until the status of all directories is updated. This patch therefore delays the crawl on the second crawler by 3 hours. The same effect could be achieved with two different cron.d files, one for each crawler.
parent 9a06295fdf
commit 703a46bada
1 changed file with 5 additions and 1 deletion
@@ -1,4 +1,8 @@
 # run the crawler twice a day
 # logs sent to /var/log/mirrormanager/crawler.log and crawl/* by default
 # 32GB of RAM is not enough for 75 threads, 38 seems to work so far
-0 */12 * * * mirrormanager /usr/bin/mm2_crawler --threads 38 `/usr/local/bin/run_crawler.sh 2` > /dev/null 2>&1
+#
+# [ "`hostname -s`" == "mm-crawler02" ] && sleep 3h is used to start the crawl
+# later on the second crawler to reduce the number of parallel accesses to
+# the database
+0 */12 * * * mirrormanager [ "`hostname -s`" == "mm-crawler02" ] && sleep 3h; /usr/bin/mm2_crawler --threads 38 `/usr/local/bin/run_crawler.sh 2` > /dev/null 2>&1
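The hostname guard in the cron entry can also be written as a small helper, which keeps the cron line short and makes the delay testable on its own. The following is only a sketch: `crawl_delay` is a hypothetical function that is not part of MirrorManager or of this commit, and it assumes that only `mm-crawler02` should wait.

```shell
#!/bin/sh
# Sketch only: crawl_delay is a hypothetical helper, not part of MirrorManager.
# It prints the time a crawler host should wait before starting its crawl.
crawl_delay() {
    # $1: short hostname, e.g. the output of `hostname -s`
    case "$1" in
        mm-crawler02) echo "3h" ;;  # second crawler starts three hours later
        *)            echo "0" ;;   # every other host starts immediately
    esac
}

# Possible use in a wrapper script (left commented out so that sourcing
# this file has no side effects):
# sleep "$(crawl_delay "$(hostname -s)")"
```

A `case` statement avoids the `==` operator inside `[ ]`, which is a bash extension; this matters only where cron runs commands with a `/bin/sh` that is not bash. The `3h` suffix relies on GNU `sleep`, as the cron entry in this commit already does.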