This thread went off-list. For completeness and posterity, an explanation of this problem is here:
http://northgrid-tech.blogspot.com/2009/02/jobmanager-pbsqueue-cache-locked.html

Thanks very much Maarten and Andrey for tracking this!

2009/2/5 Peter Love <[log in to unmask]>:
> OK thanks for taking the time, we'll play with tunings in the morning.
>
> The tests here are consistently not completing:
> http://pprc.qmul.ac.uk/~lloyd/gridpp/atest.html
>
> See the row UKI-NORTHGRID-LANCS-HEP, where green 'S' successful jobs
> are from our old glite3.0 CE and the yellow 'C' jobs are current,
> never completing on our glite3.1 lcg-CE.
>
> There are some log links further down the page, not sure if these
> provide clues. We can get the user (Steve Lloyd) to provide proxy
> details if necessary.
>
> Cheers,
> Peter
>
>
> 2009/2/5 <[log in to unmask]>:
>> On Thu, 5 Feb 2009 [log in to unmask] wrote:
>>
>>> Andrey and I submitted jobs for dteam, ops and atlas via CERN WMS nodes
>>> without problems. The updates are slow: it may take half an hour or so
>>> before a hello-world job is considered done by the WMS.
>>>
>>> Such delays could be reduced by halving the "tick", "proctout" and
>>
>> Well, "proctout" is the timeout for hung processes to be killed:
>> if that happens often, there would be another problem to be fixed.
>>
>>> "stateage" parameters in /opt/globus/etc/globus-{gma,*-marshal}.conf.
>>> Going to much lower values is not recommended, as it would increase
>>> the load on the batch system (and the CE).
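
The tuning suggested in the quoted thread (halving "tick", "proctout" and "stateage") could be scripted roughly as below. This is only a sketch: the three parameter names come from the thread, but the layout of the globus-*.conf files is not shown there, so the code assumes one setting per line written as "name value", "name=value" or "name: value" with an integer value. The function name halve_params and the floor argument are hypothetical helpers for illustration, not part of any Globus or gLite tool.

```python
import re

# Parameter names taken from the thread; everything else here is hypothetical.
PARAMS = ("tick", "proctout", "stateage")

# ASSUMPTION: one setting per line, as "name value", "name=value" or
# "name: value", with an integer value. Check the real files before use.
_LINE = re.compile(
    r"^(?P<head>\s*(?P<name>[A-Za-z_]+)(?:\s*[=:]\s*|\s+))"
    r"(?P<value>\d+)(?P<tail>.*)$"
)


def halve_params(text, names=PARAMS, floor=1):
    """Return config text with the named integer settings halved.

    Values never drop below `floor`. Comment lines and settings whose
    names are not listed in `names` are passed through unchanged.
    """
    out = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # keep the original line ending
        m = _LINE.match(body)
        if m and m.group("name") in names:
            new_value = max(floor, int(m.group("value")) // 2)
            body = f"{m.group('head')}{new_value}{m.group('tail')}"
        out.append(body + ending)
    return "".join(out)


if __name__ == "__main__":
    sample = "tick 60\nproctout=120\nstateage: 600\n"
    print(halve_params(sample), end="")
```

The function works on text rather than on files, so the result can be inspected before anything under /opt/globus/etc is changed. Given the reply in the thread that "proctout" is a timeout for killing hung processes, it may make sense to pass names=("tick", "stateage") and leave "proctout" alone. As the thread warns, halving repeatedly is not advisable because lower values increase the load on the batch system and the CE.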