Dear Maarten,
Maarten Litmaath wrote:
> Hallo Christoph,
>
>> It seems that the WMS recovered itself (being in
>> drain mode) over the weekend. The WMS is full of Condor jobs in state
>> "H" (held). Do they do any harm? Some are weeks old already.
>
> Normally held jobs do no harm, but the latest WMS version has an issue
> for which the admin may need to intervene occasionally:
>
> https://savannah.cern.ch/bugs/?69841
This looks rather similar to the picture that we saw on Friday afternoon.
Actually, I also removed a lot of held jobs. Perhaps that did the trick
and recovered the WMS.
> A cleanup cron job for held jobs is included in this bug:
>
> https://savannah.cern.ch/bugs/?70401
>
> The grace period of 1 week probably should be lowered to 1 day,
> or even just a few hours...
We will try the cron job.
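
For reference, the selection such a cleanup has to make could be sketched roughly as follows. This is only an illustration of the grace-period idea, not the script attached to the bug above. It assumes the job records have already been read into dictionaries carrying the Condor job attributes JobStatus (5 means held) and EnteredCurrentStatus (a Unix timestamp); the function name, the "id" key and the sample values are made up for the example.

```python
HELD = 5  # Condor JobStatus code for a held job (shown as "H")


def held_jobs_past_grace(jobs, now, grace_seconds=86400):
    """Return the ids of jobs that have been held longer than the grace period.

    jobs: iterable of dicts with the keys "id", "JobStatus" and
          "EnteredCurrentStatus" (hypothetical layout, see the text above).
    now:  current time as a Unix timestamp.
    """
    stale = []
    for job in jobs:
        if job["JobStatus"] != HELD:
            continue  # only held jobs are candidates for removal
        if now - job["EnteredCurrentStatus"] > grace_seconds:
            stale.append(job["id"])
    return stale


if __name__ == "__main__":
    now = 2000000
    jobs = [
        {"id": "101.0", "JobStatus": 5, "EnteredCurrentStatus": now - 21 * 86400},
        {"id": "102.0", "JobStatus": 5, "EnteredCurrentStatus": now - 2 * 3600},
        {"id": "103.0", "JobStatus": 2, "EnteredCurrentStatus": now - 21 * 86400},
    ]
    # Grace period of 1 day: only the job held for three weeks is selected.
    print(held_jobs_past_grace(jobs, now))
    # Grace period of 1 hour: the job held for two hours is selected as well.
    print(held_jobs_past_grace(jobs, now, grace_seconds=3600))
```

Lowering the grace period from 1 week to 1 day or a few hours then only means changing grace_seconds.
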
>> Another question, perhaps someone knows the answer. Trying to get some
>> understanding of the flow of a job through the WMS, I tried to follow a
>> job that goes to a CREAM-CE. Are those jobs supposed to show up in the
>> list of jobs listed with condor_q?
>
> No. On a WMS the jobs for CREAM are handled by ICE, while jobs sent to
> LCG-CE or ARC-CE instances are handled by Condor-G:
>
> https://twiki.cern.ch/twiki/bin/view/EGEE/EGEEgLiteJobSubmissionSchema
>
> To see ICE details one can use /opt/glite/bin/queryDb on the WMS.
> The "-h" option shows how.
Thanks for the hints. The picture is rather busy, but it is clear if you
know that ICE does not deal with Condor-G internally. (I did not know that before...)
Best wishes, Christoph