Hi
We are seeing a problem where ops jobs stay in the running state for a long time at most of the sites. This is because a misconfigured message broker is being published in the BDII, and WNs are picking that message broker to send back job output.
The SAM-Nagios team has removed this broker, but it might take some time before it goes away completely from the BDII. I hope that the nagios tests will come back to normal in 3-4 hours.
Thanks
Kashif
-----Original Message-----
From: Elena Korolkova [mailto:[log in to unmask]]
Sent: 20 November 2012 09:23
To: Kashif Mohammad
Cc: J Coles
Subject: nagios problem in Sheffield
Hi Kashif
Sheffield has been red in the nagios tests since 4 am today. I couldn't find a problem. Ops jobs arrive and run in Sheffield. There are several jobs which have been running for > 4 hours. This problem was discussed on the LCG-ROLLOUT list.
When I check https://gridppnagios.physics.ox.ac.uk/nagios/ I find that the problem is this long job.
Could you have a look, please?
Thanks
Elena
__________________________________________________
Dr Elena Korolkova
Email: [log in to unmask]
Tel.: +44 (0)114 2223553
Fax: +44 (0)114 2223555
Department of Physics and Astronomy
University of Sheffield
Sheffield, S3 7RH, United Kingdom