Hi Maarten,

We ran into the same problem with the upgraded WMS again. I believe I
now understand the problem. In /var/local/condor/log/GridmanagerLog.glite
many restarts of the component were reported after such a crash:

07/31 16:40:04 [18921] gahp server not up yet, delaying ping
07/31 16:40:04 [18921] GAHP server pid = 19331
07/31 16:40:04 [18921] gahp->nordugrid_ldap_query returned -101 for resource korundi.grid.helsinki.fi
07/31 16:40:04 [18921] ERROR "nordugrid_ldap_query failed!" at line 211 in file nordugridresource.cpp
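
To get a rough idea of how often this happens, the log can be scanned for the failing query. The following is only a minimal sketch: it assumes the log lines look like the excerpt above, and the function names are hypothetical helpers, not part of Condor or gLite.

```python
import re
from collections import Counter

# Patterns are assumptions derived from the log excerpt above.
# \s+ also matches a line break, in case a mail client wrapped the line.
FAIL_RE = re.compile(
    r"nordugrid_ldap_query returned (-?\d+) for\s+resource (\S+)"
)
GAHP_PID_RE = re.compile(r"GAHP server pid = (\d+)")


def count_ldap_failures(log_text):
    """Count failed nordugrid_ldap_query calls per (resource, return code)."""
    counts = Counter()
    for match in FAIL_RE.finditer(log_text):
        code, resource = match.groups()
        counts[(resource, int(code))] += 1
    return counts


def gahp_server_pids(log_text):
    """Return the distinct GAHP server pids seen, in order of appearance.

    Many distinct pids in a short time window would suggest repeated restarts.
    """
    pids = []
    for match in GAHP_PID_RE.finditer(log_text):
        pid = int(match.group(1))
        if pid not in pids:
            pids.append(pid)
    return pids


if __name__ == "__main__":
    # Path as mentioned in this mail.
    with open("/var/local/condor/log/GridmanagerLog.glite") as handle:
        text = handle.read()
    for (resource, code), n in count_ldap_failures(text).items():
        print(f"{resource}: {n} failures with return code {code}")
    print(f"distinct GAHP server pids: {len(gahp_server_pids(text))}")
```
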

Searching the internet a bit, I found that you ran into a similar
problem in May:

http://lindir.ics.muni.cz/pipermail/egee-jra1/2010-May/012580.html

When I remove the old nordugrid_gahp and use the one included in
Condor-7.4, things start to work again.

Is there a ticket for YAIM people on that?

Cheers, Christoph

On 07/25/2010 06:52 PM, Maarten Litmaath wrote:
> Hallo Christoph,
>
>    
>> It seems that the WMS recovered by itself (being in
>> drain mode) over the weekend.  The WMS is full of Condor jobs in state
>> "H" (hold). Do they do any harm? Some are weeks old already.
>>      
> Normally held jobs do no harm, but the latest WMS version has an issue
> for which the admin may need to intervene occasionally:
>
>      https://savannah.cern.ch/bugs/?69841
>
> A cleanup cron job for held jobs is included in this bug:
>
>      https://savannah.cern.ch/bugs/?70401
>
> The grace period of 1 week probably should be lowered to 1 day,
> or even just a few hours...
>
>    
>> Another question, perhaps someone knows the answer. Trying to get some
>> understanding of the flow of a job through the WMS, I tried to follow a
>> job that goes to a CREAM-CE. Are those jobs supposed to show up in the
>> list of jobs listed with condor_q?
>>      
> No.  On a WMS the jobs for CREAM are handled by ICE, while jobs sent to
> LCG-CE or ARC-CE instances are handled by Condor-G:
>
>      https://twiki.cern.ch/twiki/bin/view/EGEE/EGEEgLiteJobSubmissionSchema
>
> To see ICE details one can use /opt/glite/bin/queryDb on the WMS.
> The "-h" option shows how.
>