Hi, back to this (last time all the problems vanished and I could not find a test case to debug further).
This time I have one: a job that
- finished in LSF > 2 hours
- did not finish in the MW (I never got a gridinfo message of the form "[18480-8664] Job 1273062133:lcglsf:internal_2414998632:26378.1273062130 (ID 5329059) has finished")
The job is:
LSF: 5325625
[boccali@gridce2 ~]$ bacct -l 5325625
...
Job <5325625>, Job Name <GRIDJOB>, User <cmsprd>, Project <default>, Status <DO
NE>, Queue <cms>, Command <#! /bin/sh;#;# LSF batch job sc
ript built by Globus Job Manager;#BSUB -J GRIDJOB;#BSUB -q
cms;#BSUB -i /dev/null;#BSUB -e /dev/null;#BSUB -o /dev/nu
ll;#BSUB -f "/home/cmsprd/.lcgjm/globus-cache-export.w2030
4/globus-cache-export.w20304.gpg > globus-cache-export.w20
304.gpg";X509_USER_PROXY="/home/cmsprd/.globus/job/gridce2
.pi.infn.it/12479.1273033934/x509_up"; export X509_USER_PR
OXY;GLOBUS_REMOTE_IO_URL="/home/cmsprd/.lcgjm/.remote_io_p
tr/remote_io_file-12479.1273033934"; export GLOBUS>
Wed May 5 06:33:42: Submitted from host <gridce2>, CWD <$HOME>, Input File </d
ev/null>, Output File </dev/null>;
Wed May 5 06:33:45: Dispatched to <gridwn154>;
Wed May 5 13:53:20: Completed <done>.
It finished 1.5 hours ago.
So the question is: how can I see locally why the grid does not consider the job finished? Now that I have a real case at hand, I would like some pointers on what to look for in /home/cmsprd/.lcgjm/globus-cache-export.w20304 (I guess). Or should I look elsewhere?
For example, the job output is there:
-rw-r--r-- 1 cmsprd cms 10240 May 5 13:53 gridwn154.1007.import.txt.tar
so I can rule out communication problems between the WN and the CE at the end of the job ...
Thanks a lot,
Tommaso
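A minimal sketch of how the second point above could be checked mechanically: scan a log for the gridinfo "has finished" line carrying this job's ID. It assumes that the number in "(ID ...)" is the LSF job ID and that these messages end up in a plain-text log. Which log file actually holds them is not known here, so the helper name `find_finished_line` and the log path are hypothetical.

```python
import re
from typing import Iterable, Optional


def find_finished_line(lines: Iterable[str], lsf_id: int) -> Optional[str]:
    """Return the first gridinfo 'has finished' line for lsf_id, or None.

    Hypothetical helper. Assumption: the number inside '(ID ...)' is the
    LSF batch job ID, as in '(ID 5329059) has finished'.
    """
    # The closing ')' right after the ID prevents a prefix match,
    # e.g. 532905 does not match '(ID 5329059)'.
    pattern = re.compile(rf"\(ID {lsf_id}\) has finished")
    for line in lines:
        if pattern.search(line):
            return line.rstrip("\n")
    return None
```

For example, `with open(log_path) as f: find_finished_line(f, 5325625)` returning `None` would confirm that the finish message for this job never reached that (hypothetical) log.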
On 15 Apr 2010, at 19:36, Maarten Litmaath wrote:
> Hi Tommaso,
>
>>> Indeed, all the jobs production was complaining about did pass through a
>>> single LB (not sure whether it was the same WMS, though)
>>
>> That could be a clue and point to some problem with that LB or
>> a particular WMS using that LB. Can you check the wmproxy logs
>
> Sorry, I meant the gatekeeper logs!
>
>> for the WMS node(s) that submitted those jobs?