On Wed, 18 Jul 2007, Stathakopoulos George wrote:
> Hello,
>
> I'm reposting this because I didn't find anything that can help to solve
> this issue.
I did not see a reply to this message I sent earlier:
-----------------------------------------------------------------------------
On Fri, 13 Jul 2007, Stathakopoulos Giorgos wrote:
> Hello all,
>
> Our CE (ce01.kallisto.hellasgrid.gr) is overloaded due to many
> processes of the following kinds:
>
> 1) globus-job-manager -conf /opt/globus/etc/globus-job-manager.conf
> -type fork -rdn jobmanager-fork -machine-type unknown -publish-jobs
> 2) /usr/bin/perl /opt/globus/libexec/globus-job-manager-script.pl -m pbs
> -f /tmp/gram_xxxxx -c remote_io_file_create
> 3) /opt/globus/libexec/globus-gass-cache-util -cleanup-tag -t
> https://ce01.kallisto.hellasgrid.gr:xxxxx/xxxxx/xxxxxxx
>
> The above processes start at a rate of about 50/hour and they stay
> running. After a few hours the CE runs out of memory and stops
> responding. We have to reboot it to get it back.
>
> We have the latest update of middleware installed.
>
> Any ideas?
In /var/log I noticed that /opt/lcg/sbin/cleanup-grid-accounts.sh last did
any real work on June 24; every later log is only about 100 bytes:
---------------------------------------------------------------------------------------
-rw-r--r-- 1 root root 92 Jul 14 02:16 cleanup-grid-accounts.log
-rw-r--r-- 1 root root 107 Jul 13 02:16 cleanup-grid-accounts.log.1.gz
-rw-r--r-- 1 root root 107 Jul 12 03:14 cleanup-grid-accounts.log.2.gz
[...]
-rw-r--r-- 1 root root 107 Jun 25 02:10 cleanup-grid-accounts.log.18.gz
-rw-r--r-- 1 root root 18094 Jun 24 02:10 cleanup-grid-accounts.log.19.gz
-rw-r--r-- 1 root root 25510 Jun 22 05:03 cleanup-grid-accounts.log.20.gz
-rw-r--r-- 1 root root 21685 Jun 21 05:07 cleanup-grid-accounts.log.21.gz
---------------------------------------------------------------------------------------
This is because /opt/lcg/etc/cleanup-grid-accounts.conf ends like this:
---------------------------------------------------------------------------------------
# next lines added by YAIM on Thu Jul 12 13:19:27 EEST 2007
ACCOUNTS='
'
---------------------------------------------------------------------------------------
Any idea how that happened? Please try the following:
---------------------------------------------------------------------------------------
/opt/glite/yaim/bin/yaim -r -s your-site-info.def -f config_users
---------------------------------------------------------------------------------------
Then check that /opt/lcg/etc/cleanup-grid-accounts.conf lists all grid accounts.
If the accounts do not get cleaned up regularly, that could slow things down a lot.
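
To make that check quickly, something like the following could work. It is
only a sketch: it assumes the conf file is plain shell syntax (as the excerpt
above suggests), and count_accounts is a helper name made up here, not part
of the LCG tools.

```shell
#!/bin/sh
# Sketch: count the account names listed in ACCOUNTS.
# Assumption: the conf file can be sourced as plain shell.
# count_accounts is a hypothetical helper, not an LCG command.
count_accounts() {
    (
        set -f                 # do not glob-expand the account names
        ACCOUNTS=''
        . "$1" || exit 1       # pass a path containing a slash
        set -- $ACCOUNTS       # split the list on whitespace/newlines
        echo "$#"
    )
}

conf=/opt/lcg/etc/cleanup-grid-accounts.conf
if [ -r "$conf" ]; then
    echo "$conf lists $(count_accounts "$conf") account(s)"
fi
```

With the file as shown above (ACCOUNTS containing only a newline) this
reports 0 accounts; after a successful config_users run it should report
a non-zero number.
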
You use GPFS for the home directories: maybe it has problems with the large
numbers of hard links under the .globus/.gass_cache subdirectories?
Is GPFS in good shape? Any hardware errors?
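
To get a rough idea of how many multiply-linked files are involved, a sketch
along these lines might help. count_multilinked is a made-up helper, and the
/home/* layout in the loop is an assumption; adjust it to wherever the grid
home directories are mounted.

```shell
#!/bin/sh
# Sketch: count regular files with more than one hard link under a directory.
# count_multilinked is a hypothetical helper name.
count_multilinked() {
    find "$1" -type f -links +1 2>/dev/null | wc -l | tr -d ' '
}

# Assumption: home directories live under /home; change the glob as needed.
for cache in /home/*/.globus/.gass_cache; do
    [ -d "$cache" ] || continue
    echo "$cache: $(count_multilinked "$cache")"
done
```

Note that walking large trees may itself be slow on a struggling file
system, so it may be better to try it on one account first.
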