On Sun, 27 Nov 2005, Filippidis christos wrote:
> hi again,
>
> After sending many SFT jobs to xg009.inp.demokritos.gr (from
> https://monitoring.egee.man.poznan.pl/) I found that the SFTs succeeded
> from time to time: I got 6 successful SFT jobs in a row and then
> the same JS error again (on the same WN). I realised that before the
> successful SFT jobs I had changed max_running for the dteam queue
> from 1 to 6 (qmgr -c "set queue dteam max_running=6").
> Then I changed it back to 1 and got the JS error again, even though
> no other dteam job was running and there was a free WN.
Do you have one WN that can only run dteam jobs? If so, it may be in bad
shape, causing all dteam jobs to fail as long as max_running=1.
> After a couple of hours I ran qmgr -c "set queue dteam
> max_running=10" and got a successful SFT job.
>
> strange ...
>
> I tried the following but I don't see any change:
>
>
> "In /etc/grid-security/gridmapdir/ there are hard links
>
> (with strange names like
> %2fc%3dch%2fo%3dcern%2fou%3dgrid%2fcn%3dpiotr%20nyczyk%209654) to each
> pool account that is taken. They have the same inode number (ls -li
> <filename>) as the pool account file they point to. If there's no pool
> account file left free, run
>
> /opt/edg/sbin/lcg-expiregridmapdir.pl"
That advice is not good. One must ensure /etc/cron.d contains a job
called "lcg-expiregridmapdir" and that it runs regularly. When all pool
accounts are taken, the only recourse is to create more pool accounts.
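
To see how many pool accounts are still free, something like the sketch
below may help. It relies on the hard-link layout described in the quoted
text: it assumes a free pool account file has a link count of 1, while a
leased one has a second link (the URL-encoded DN). The function names are
made up for illustration, and the default paths are the ones mentioned
above.

```shell
# List pool account files that are still free in a gridmapdir.
# Assumption: a free pool account file has link count 1; a leased one
# has link count 2 because the encoded-DN name is a hard link to it.
list_free_pool_accounts() {
    dir="${1:-/etc/grid-security/gridmapdir}"
    # Names starting with '%' are encoded-DN links, not pool accounts.
    find "$dir" -maxdepth 1 -type f -links 1 ! -name '%*' \
        | sed 's|.*/||' | sort
}

# Succeed if a cron file named lcg-expiregridmapdir exists in the
# given cron directory (default /etc/cron.d).
has_expire_cron_job() {
    cron_dir="${1:-/etc/cron.d}"
    [ -f "$cron_dir/lcg-expiregridmapdir" ]
}

# Example use:
#   list_free_pool_accounts | grep dteam
#   has_expire_cron_job || echo "expiry cron job is missing"
```

If the first command prints nothing for dteam, all dteam pool accounts are
taken and more need to be created.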