On 3 Nov 2010, at 16:35, Peter Gronbech wrote:
> Kashif has noticed that the WMSs in the UK are not working very well.
>
> See Steve Lloyd's page: http://pprc.qmul.ac.uk/~lloyd/gridpp/rbtest.html
>
> It looks like they may all be overloaded, and this is affecting the jobs he submits from gridppnagios. This means that the status information for some sites is old (Waiting for a successful job run), and consequently could adversely affect availability and reliability figures.
>
> Does anybody know why the WMSs are overloaded? Presumably this is affecting all grid job submission, not just our monitoring jobs.
It's not necessarily the WMSen that are the problem here.
A monitoring job has to hit a worker node before it completes successfully. If all the CEs are full (like ours are), then it's going to have to queue, irrespective of the WMS. (Certainly, one of our WMSs, svr022, looks perfectly calm; the other is only moderately busy.)
The root problem is that the test _doesn't_ test only the WMS: by its very nature, it has to be an end-to-end test.
The only real way I can think of to alleviate that issue would be to have a CE and a worker node somewhere reserved for these tests, to guarantee a fast response. (Also: LCG-CEs have about a 15 minute latency; CREAM is a lot better in this respect.)
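
To put rough numbers on the end-to-end point, here is a toy sketch. Apart from the roughly 15 minute LCG-CE latency mentioned above, every figure (WMS time, queue wait, run time) and the function name are made up for illustration, not measured from any site:

```python
# Toy model: a monitoring job only reports success after every stage finishes.
# All figures except the ~15 minute LCG-CE latency are invented for illustration.

def end_to_end_minutes(wms, ce_latency, queue_wait, run):
    """Total minutes from submission to a successful result."""
    return wms + ce_latency + queue_wait + run

# Calm WMS, but a full CE, so the job sits in the batch queue.
full_ce = end_to_end_minutes(wms=1, ce_latency=15, queue_wait=240, run=5)

# Same WMS, with a worker node reserved for the test (no queueing).
reserved = end_to_end_minutes(wms=1, ce_latency=15, queue_wait=0, run=5)

print(full_ce, reserved)
```

With these assumed numbers the queue wait dominates the total, so the result goes stale even though the WMS contributes almost nothing.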