For the record, the problem was likely this bug:
https://savannah.cern.ch/bugs/?73329
Cheers, Massimo
On Thu, 2 Dec 2010, Massimo Sgaravatto - INFN Padova wrote:
> Debugging off-list ...
>
> On Thu, 2 Dec 2010, Nilsen Dimitri wrote:
>
>> It is a CREAM CE, and the strange thing is that it seems to affect only
>> one of our 3 CREAM CEs. Also, not all jobs stay in "running" forever; some
>> of them completed fine.
>> The status at CREAM is also done, and the time of "done" at LB and CREAM
>> seems to be the same. But if we look at the date of the entry in the LB
>> database, it is an hour earlier. Is it somehow a UTC thing?
>> CREAM and log output: 15:20:05
>> LB database: 14:20:05
>> (see the logs in my first mail)
>>
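On the UTC question: a minimal sketch, assuming the DATE field of the LB event is stored in UTC (CET is UTC+1 in December). Under that assumption the two timestamps are the same instant. The name lb_date_to_cet is made up for this illustration; it is not part of any gLite tool.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the DATE field of the LB event is recorded in UTC.
LB_DATE = "20101201142005.522378"

# CET is UTC+1; a fixed offset is enough for a date in December.
CET = timezone(timedelta(hours=1), "CET")


def lb_date_to_cet(stamp: str) -> datetime:
    """Parse a YYYYMMDDhhmmss.ffffff stamp as UTC and convert it to CET."""
    utc_time = datetime.strptime(stamp, "%Y%m%d%H%M%S.%f").replace(tzinfo=timezone.utc)
    return utc_time.astimezone(CET)


local = lb_date_to_cet(LB_DATE)
print(local.strftime("%Y-%m-%d %H:%M:%S %Z"))  # 2010-12-01 15:20:05 CET
```

If the assumption holds, the result matches the 15:20:05 in the CREAM log, so the one-hour difference alone would not indicate a problem.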
>> CREAM-log:
>> /opt/glite/var/log/glite-ce-cream.log.1:01 Dec 2010 15:20:05,606 INFO
>> org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor
>> (AbstractJobExecutor.java:2094) - (Worker Thread 19) JOB CREAM215908827
>> STATUS CHANGED: REALLY-RUNNING => DONE-OK [failureReason=reason=0]
>> [localUser=dcms063]
>> [gridJobId=https://lb-1-fzk.gridka.de:9000/HToevj_pLkQDZcEKfXQQCw]
>> [lrmsJobId=982499] [workerNode=c01-016-117]
>> [delegationId=12910455052E687689wms2D12Dfzk2Egridka2Ede]
>>
>> I read the documentation at
>> http://goc.grid.sinica.edu.tw/gocwiki/Jobs_sent_to_some_CE_stay_in_Running_state_forever
>> but I don't think there are any background processes. The job was simple,
>> just:
>> #!/bin/bash
>> /bin/hostname
>> /usr/bin/id
>> date
>>
>> What I don't understand: for glite-wms-job-status the argument is a job ID
>> that refers to the LB
>> (https://lb-1-fzk.gridka.de:9000/D4Fhkep6xv-fLQI_aEV1-w). So
>> glite-wms-job-status connects to the LB to check the status, right?
>> At the LB the job is marked as "done" in the database, so why does it
>> show "running"?
>>
>> Regards
>> Dimitri
>>
>>
>>
>>
>> On 12/01/2010 08:01 PM, Massimo Sgaravatto - INFN Padova wrote:
>>> Was this submitted to a CREAM-CE or LCG-CE ?
>>>
>>> In the former case, what is the status of that job wrt CREAM? (You can
>>> find this info in the glite-ce-cream.log.)
>>>
>>> Cheers, Massimo
>>>
>>>
>>> On Wed, 1 Dec 2010, Nilsen Dimitri wrote:
>>>
>>>> Hi
>>>>
>>>> we observe that many jobs stay in the Running state forever,
>>>> even though the job finished successfully and its output was copied back
>>>> to the WMS. What could be the reason?
>>>>
>>>> example:
>>>> gridka24 $ glite-wms-job-status
>>>> https://lb-1-fzk.gridka.de:9000/HToevj_pLkQDZcEKfXQQCw
>>>> ...Current Status: Running...
>>>>
>>>>
>>>> but:
>>>> gridka24 $ glite-wms-job-logging-info -v 2
>>>> https://lb-1-fzk.gridka.de:9000/HToevj_pLkQDZcEKfXQQCw
>>>> Event: Done
>>>> - Arrived = Wed Dec 1 15:20:05 2010 CET
>>>> - Exit code = 0
>>>> - Host = c01-016-117.gridka.de
>>>> - Reason = job completed
>>>>
>>>> @LB mysql db:
>>>> | HToevj_pLkQDZcEKfXQQCw | 14 | DG.LLLID=2430000
>>>> DG.USER="/O=GermanGrid/OU=Uni Karlsruhe/CN=Andreas Oehler"
>>>> DATE=20101201142005.522378 HOST="c01-016-117.gridka.de" PROG=edg-wms
>>>> LVL=SYSTEM DG.PRIORITY=4 DG.SOURCE="LRMS" DG.SRC_INSTANCE=""
>>>> DG.EVNT="Done"
>>>> DG.JOBID="https://lb-1-fzk.gridka.de:9000/HToevj_pLkQDZcEKfXQQCw"
>>>> DG.SEQCODE="UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000002:LM=000002:LRMS=000005:APP=000000:LBS=000000"
>>>>
>>>> DG.DONE.STATUS_CODE="OK" DG.DONE.REASON="job completed"
>>>> DG.DONE.EXIT_CODE="0"
>>>>
>>>> @WMS:
>>>> # cat
>>>> /var/glite/SandboxDir/HT/https_3a_2f_2flb-1-fzk.gridka.de_3a9000_2fHToevj_5fpLkQDZcEKfXQQCw/output/gc.stdout
>>>>
>>>> <some output, job completed correctly>
>>>>
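As a side note, the sandbox directory name above looks like the job ID with ':', '/' and '_' each written as an underscore followed by the two-digit hex code of the character. A small sketch that maps such a directory name back to the job ID, assuming this escaping rule (inferred only from the one path shown above) holds in general. decode_sandbox_name is a hypothetical helper, not a gLite command.

```python
import re


def decode_sandbox_name(name: str) -> str:
    """Turn each _XX (two hex digits) back into the character it encodes."""
    return re.sub(r"_([0-9a-fA-F]{2})", lambda m: chr(int(m.group(1), 16)), name)


# Directory name copied from the SandboxDir path quoted above.
dir_name = "https_3a_2f_2flb-1-fzk.gridka.de_3a9000_2fHToevj_5fpLkQDZcEKfXQQCw"
print(decode_sandbox_name(dir_name))
# https://lb-1-fzk.gridka.de:9000/HToevj_pLkQDZcEKfXQQCw
```

This can help when matching a sandbox directory on the WMS to the job ID passed to glite-wms-job-status.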
>>>> LB and WMS are different hosts and both have the latest updates.
>>>> I tried restarting the interlogd processes, with no effect.
>>>>
>>>> Regards
>>>> Dimitri
>>>>
>>>> --
>>>> Dimitri Nilsen, Dipl.-Ing(FH)
>>>>
>>>> Karlsruhe Institute of Technology (KIT)
>>>> Steinbuch Centre for Computing
>>>> Postfach 3640
>>>> 76344 Eggenstein-Leopoldshafen, Germany
>>>>
>>>> Tel.: +49 7247 82-8607
>>>> Fax.: +49 7247 82-4972
>>>> Email: [log in to unmask]
>>>>
>>>
>>> \|||/
>>> -----------0oo----( o o )----oo0-------------------
>>> (_)
>>> INFN Sezione di Padova
>>> Via Marzolo, 8
>>> 35131 Padova - Italy E-mail: massimo.sgaravatto [at] pd.infn.it
>>> Tel: ++39 0498275908 Skype: massimo.sgaravatto
>>> Fax: ++39 0498275952