JISCMail - LCG-ROLLOUT Archives

Dear Maarten,

Maarten Litmaath wrote:
> Hallo Christoph,
> 
>> It seems that the WMS recovered by itself (being in 
>> drain mode) over the weekend.  The WMS is full of Condor jobs in state 
>> "H" (hold). Do they do any harm? Some are weeks old already.
> 
> Normally held jobs do no harm, but the latest WMS version has an issue
> for which the admin may need to intervene occasionally:
> 
>     https://savannah.cern.ch/bugs/?69841

This looks rather similar to the picture that we saw on Friday afternoon. 
Actually I also removed a lot of held jobs. Perhaps that did the trick 
and let the WMS recover.

> A cleanup cron job for held jobs is included in this bug:
> 
>     https://savannah.cern.ch/bugs/?70401
> 
> The grace period of 1 week probably should be lowered to 1 day,
> or even just a few hours...

We will try the cron job.
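
For reference, the kind of selection such a cleanup performs could be sketched roughly as below. This is not the script attached to the bug, only an illustration under assumptions: the function names and the grace period values are hypothetical, and the sketch only prints a condor_rm command instead of running it. It relies on the standard Condor job attributes JobStatus (5 means held) and EnteredCurrentStatus (a Unix timestamp).

```python
import shlex
import time

HELD = 5  # Condor JobStatus code for a held job ("H" in condor_q)


def held_job_constraint(grace_hours, now):
    """Build a ClassAd expression matching jobs held longer than grace_hours."""
    cutoff = int(now - grace_hours * 3600)
    return f"JobStatus == {HELD} && EnteredCurrentStatus < {cutoff}"


def removal_command(grace_hours, now):
    """Return the condor_rm argument list; nothing is executed here."""
    return ["condor_rm", "-constraint", held_job_constraint(grace_hours, now)]


if __name__ == "__main__":
    # Hypothetical grace period of 1 day, as suggested above.
    print(shlex.join(removal_command(24, time.time())))
```

Printing the command first makes it easy to check the constraint with condor_q -constraint before anything is actually removed.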

>> Another question, perhaps someone knows the answer. Trying to get some 
>> understanding of the flow of a job through the WMS, I tried to follow a 
>> job that goes to a CREAM-CE. Are those jobs supposed to show up in the 
>> list of jobs shown by condor_q?
> 
> No.  On a WMS the jobs for CREAM are handled by ICE, while jobs sent to
> LCG-CE or ARC-CE instances are handled by Condor-G:
> 
>     https://twiki.cern.ch/twiki/bin/view/EGEE/EGEEgLiteJobSubmissionSchema
> 
> To see ICE details one can use /opt/glite/bin/queryDb on the WMS.
> The "-h" option shows how.

Thanks for the hints. The picture is rather busy, but it is clear once you 
know that ICE does not deal with Condor-G internally. (I did not know that before...)

Best wishes, Christoph