This thread went off-list; for completeness and posterity, an
explanation of this problem is here:
http://northgrid-tech.blogspot.com/2009/02/jobmanager-pbsqueue-cache-locked.html

Thanks very much Maarten and Andrey for tracking this!

2009/2/5 Peter Love <[log in to unmask]>:
> OK, thanks for taking the time. We'll play with the tunings in the morning.
>
> The tests here are consistently not completing:
> http://pprc.qmul.ac.uk/~lloyd/gridpp/atest.html
>
> See the row UKI-NORTHGRID-LANCS-HEP, where green 'S' successful jobs
> are from our old glite3.0 CE and the yellow 'C' jobs are current,
> never completing on our glite3.1 lcg-CE.
>
> There are some log links further down the page; not sure if these
> provide clues. We can get the user (Steve Lloyd) to provide proxy
> details if necessary.
>
> Cheers,
> Peter
>
>
> 2009/2/5  <[log in to unmask]>:
>> On Thu, 5 Feb 2009 [log in to unmask] wrote:
>>
>>> Andrey and I submitted jobs for dteam, ops and atlas via CERN WMS nodes
>>> without problems.  The updates are slow: it may take half an hour or so
>>> before a hello-world job is considered done by the WMS.
>>>
>>> Such delays could be reduced by halving the "tick", "proctout" and
>>
>> Well, "proctout" is the timeout for hung processes to be killed:
>> if that happens often, there would be another problem to be fixed.
>>
>>> "stateage" parameters in /opt/globus/etc/globus-{gma,*-marshal}.conf.
>>> Going to much lower values is not recommended, as it would increase
>>> the load on the batch system (and the CE).
>>
>>
>
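The tuning advice above amounts to halving the polling intervals ("tick" and "stateage") once, while leaving "proctout" alone, because it is a kill timeout for hung processes and not a polling delay. The following is a minimal sketch of that edit. It assumes, without confirmation from the thread, that the globus-gma/marshal config files use simple `name = value` lines with integer values. The function name `halve_poll_params` and the sample values are hypothetical and purely illustrative, not lcg-CE defaults. Check the real file format before applying anything like this.

```python
import re

# Parameters to halve. "proctout" is deliberately excluded: per the thread it is
# the timeout for killing hung processes, not a polling interval.
POLL_PARAMS = ("tick", "stateage")

# ASSUMPTION: config lines look like "name = value" with an integer value.
LINE_RE = re.compile(r"^(\s*)(\w+)(\s*=\s*)(\d+)(\s*)$")


def halve_poll_params(text: str, params=POLL_PARAMS, floor: int = 1) -> str:
    """Return config text with the given integer parameters halved once.

    Values never drop below `floor`, since going much lower increases the
    load on the batch system and the CE. All other lines pass through unchanged.
    """
    out = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\n")
        ending = line[len(body):]  # preserve the original line ending
        m = LINE_RE.match(body)
        if m and m.group(2) in params:
            new_value = max(floor, int(m.group(4)) // 2)
            body = f"{m.group(1)}{m.group(2)}{m.group(3)}{new_value}{m.group(5)}"
        out.append(body + ending)
    return "".join(out)


if __name__ == "__main__":
    # Illustrative values only, not taken from a real installation.
    sample = "tick = 30\nproctout = 600\nstateage = 60\n"
    print(halve_poll_params(sample), end="")
```

Halving only once, with a floor, mirrors the caution in the thread: shorter intervals make status updates reach the WMS sooner, at the cost of more frequent queries against the batch system.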