For some time we have been seeing a problem where our local CE at
TCD drops out of our top-level BDII (also hosted on the TCD network). This
appears to happen at intervals of approximately 10 or 20 minutes, and each
drop-out normally lasts about 90 seconds (although sometimes about 190 s).
What seems to be happening is that at some stage in the update cycle the
top-level BDII fails to contact the BDII on the CE (because of some
unexplained network interaction). The top-level BDII database is then
rebuilt without the information from the TCD CE, so the CE appears to
have disappeared for an entire cycle rather than just for the moment of
the failed query.
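
As a rough illustration of this mechanism, the following Python sketch
models a top-level BDII that rebuilds its database once per cycle and
omits any site whose query failed in that cycle. The function name, the
sample data, and the 90-second cycle length are assumptions chosen only
to match the drop-out length reported above; they are not taken from the
BDII code or configuration.

```python
def outage_durations(fetch_ok, cycle_seconds):
    """Return the length in seconds of each run of failed fetches.

    fetch_ok is a list of booleans, one per update cycle: True if the
    top-level BDII reached the site BDII in that cycle, False if not.
    A failed fetch hides the site until the next successful rebuild,
    so each failure costs one whole cycle of visibility.
    """
    durations = []
    run = 0  # number of consecutive failed cycles seen so far
    for ok in fetch_ok:
        if ok:
            if run:
                durations.append(run * cycle_seconds)
            run = 0
        else:
            run += 1
    if run:  # the sequence ended while the site was still missing
        durations.append(run * cycle_seconds)
    return durations


# Hypothetical sequence of cycles: one isolated failure, then two in a row.
cycles = [True, True, False, True, True, False, False, True]
print(outage_durations(cycles, 90))  # prints [90, 180]
```

Under these assumptions a single failed query gives a drop-out of one
cycle, and two consecutive failures give roughly twice that, which would
be consistent with the occasional longer drop-outs, although that is
only a guess.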
Both BDIIs are still running LCG 2.4.0. There are a number of other
non-standard things in our setup that may be causing the problem: the CE
is running in a Xen VM with software bridging, and there is also the
possibility that our (supposedly isolated) testbed is interfering. I am
continuing to investigate, but I wanted to ask whether this pattern of
failures means anything to anyone.
Stephen
--
Dr. Stephen Childs,
Research Fellow, EGEE Project, phone: +353-1-6081797
Computer Architecture Group, email: Stephen.Childs @ cs.tcd.ie
Trinity College Dublin, Ireland web: http://www.cs.tcd.ie/Stephen.Childs