On Sat, 14 Jul 2007, Kyriakos Ginis wrote:
> We also have observed a recent increase in the processes spawned through
> the fork jobmanager. Has anything been changed recently regarding the
> way the jobs are submitted and monitored by the RB/WMS?
This does not look good:
--------------------------------------------------------------------------
$ globus-job-run ce.hep.ntua.gr /bin/df
GRAM Job submission failed because cannot access cache files in ~/.globus/.gass_cache, check permissions, quota, and disk space (error code 76)
--------------------------------------------------------------------------
Using a VOMS proxy:
--------------------------------------------------------------------------
$ globus-job-run ce.hep.ntua.gr /bin/ls -ld /home/dteamsgm02
drwx------ 4 14057 dteamsgm 4096 Jun 5 08:43 /home/dteamsgm02
$ globus-job-run ce.hep.ntua.gr /usr/bin/id dteamsgm02
uid=14067(dteamsgm02) gid=1420(dteamsgm) groups=1420(dteamsgm),2688(dteam)
--------------------------------------------------------------------------
The directory has the wrong ownership: it is owned by UID 14057, whereas
dteamsgm02 has UID 14067. This does not explain the increase in the
number of fork jobmanager processes, but it should be fixed anyway.
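
The comparison above can be scripted. This is only a sketch: the
function name check_home_owner is made up for illustration, and it
assumes GNU stat (Linux) is available on the CE.

```shell
# Hypothetical helper: compare the numeric owner of a directory with the
# UID of the account that is supposed to own it.
check_home_owner() {
    account=$1
    home=$2
    want=$(id -u "$account") || return 2     # UID according to the passwd database
    have=$(stat -c %u "$home") || return 2   # numeric owner; GNU stat assumed
    if [ "$want" = "$have" ]; then
        echo "OK $home is owned by $account (UID $want)"
    else
        echo "MISMATCH $home is owned by UID $have, but $account has UID $want"
        return 1
    fi
}
```

For example, "check_home_owner dteamsgm02 /home/dteamsgm02" run as root
on the CE would report the mismatch shown above.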
Note that we recently advised against sgm/prd pool account prefixes
like "dteamsgm": they begin with the ordinary pool prefix "dteam", so
the corresponding accounts may end up being taken by ordinary users as
well. The sgm/prd accounts can have names like these:
dtmsgm01
dtmprd01
sgmdtm01
prddtm01
Note that it is best to use 8 characters at most, to avoid unexpected
side effects in utilities like "ps", which prints the UID instead of
the account name if the name exceeds 8 characters.
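
A quick way to spot account names longer than 8 characters is sketched
below. The function name long_account_names is hypothetical; the file
argument defaults to /etc/passwd.

```shell
# Hypothetical helper: list account names longer than 8 characters.
# Reads a passwd-format file (first colon-separated field is the name).
long_account_names() {
    awk -F: 'length($1) > 8 { print $1 }' "${1:-/etc/passwd}"
}
```

Any name it prints (e.g. dteamsgm02, which has 10 characters) is a
candidate for renaming.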
When you adapt users.conf, verify the consistency between the accounts
listed in that file, /etc/passwd and /etc/grid-security/gridmapdir.
Accounts that should no longer be used should only be removed when
the service is in scheduled downtime and activity has drained.
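
The consistency check can be sketched as follows. Assumptions: the
function name check_accounts is made up, and users.conf is taken to be
colon-separated with the login name in the second field
(UID:LOGIN:GIDs:GROUPs:VO:FLAG:); adjust the field number if your file
differs. Only pool accounts have a file in gridmapdir, so a static
account will be reported as missing there, which is expected.

```shell
# Hypothetical helper: report accounts listed in users.conf that are
# missing from the passwd file or from the gridmapdir.
# Assumption: the login name is field 2 of each users.conf line.
check_accounts() {
    users_conf=$1
    passwd=$2
    gridmapdir=$3
    cut -d: -f2 "$users_conf" | sort -u | while read -r name; do
        [ -n "$name" ] || continue
        grep -q "^$name:" "$passwd" || echo "$name: not in $passwd"
        [ -e "$gridmapdir/$name" ] || echo "$name: no file in $gridmapdir"
    done
}
```

A typical call would be:
check_accounts /path/to/users.conf /etc/passwd /etc/grid-security/gridmapdir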
To clean up /etc/grid-security/gridmapdir the following procedure
should be applied:
1. cd /etc/grid-security/gridmapdir
2. For any unwanted account name file "abc123" run this command:
       ls -li abc123
3. If the link count (the third column) is 1, the file can be removed.
4. If the link count is 2, note the inode number of the file (the
   first column) and run this command, with the inode number filled in:
       ls -li | awk '$1 == inode_number'
   For example:
       ls -li | awk '$1 == 2467912'
   That will report 2 files: the unwanted account name file and
   the file whose name contains '%' characters and represents
   the user mapped to the account. Both must then be removed.
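
The procedure above can be sketched as a dry run that only prints the
files to remove, so that the list can be reviewed first. The function
name gridmap_candidates is hypothetical; GNU stat and a find that
supports -samefile are assumed.

```shell
# Hypothetical helper: print the gridmapdir files that belong to an
# unwanted account name. Nothing is removed.
gridmap_candidates() {
    dir=$1
    name=$2
    links=$(stat -c %h "$dir/$name") || return 2   # hard link count; GNU stat assumed
    case $links in
        1)  # unmapped account: only the account name file exists
            echo "$dir/$name" ;;
        2)  # mapped account: also list the '%' file sharing the inode
            find "$dir" -maxdepth 1 -samefile "$dir/$name" ;;
        *)  echo "unexpected link count $links for $name" >&2
            return 1 ;;
    esac
}
```

For example, "gridmap_candidates /etc/grid-security/gridmapdir abc123"
prints one or two paths, which can then be removed by hand once the
service is in downtime.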