Yo,
our RB (bosheks) has been acting rather strangely today. From the WM
daemon log file:
> 08 Nov, 11:03:38 -I- checkRank: tbn20.nikhef.nl:2119/jobmanager-pbs-qlong, -116564
> 08 Nov, 11:03:38 -I- Helper::resolve: Selected prod-ce-01.pd.infn.it:2119/jobmanager-lcglsf-atlas_low_1 for job https://bosheks.nikhef.nl:9000/ClgB2KzZtNHTUtoUcQq_9w
> 08 Nov, 11:05:50 -W- retrieveCloseSEsInfo_allce: Timed out
> 08 Nov, 11:05:50 -W- retrieveCloseSEsInfo: Could not get CloseSEsInfo for CE: prod-ce-01.pd.infn.it:2119/jobmanager-lcglsf-atlas_low_1
> 08 Nov, 11:05:51 -I- Helper::resolve: Matchmaking for job id https://bosheks.nikhef.nl:9000/p0-s27IkeOAAlimrXR9ksg
> 08 Nov, 11:08:13 -E- prefetchCEInfo: subcluster rsgrid3.its.uiowa.edu undefined!
> 08 Nov, 11:08:13 -I- checkRequirement: grid10.lal.in2p3.fr:2119/jobmanager-pbs-atlas, Ok!
Note that it took more than two minutes for
"retrieveCloseSEsInfo_allce" to time out. It is doing this for every
job right now, and this seems to be killing the throughput. Any
guessegtions?
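
In case it helps to quantify the stall across the whole log, here is a rough Python sketch that measures the gap between each "Timed out" warning and the log line before it. It assumes every line follows the "DD Mon, HH:MM:SS -L- message" layout of the excerpt above. The names parse_line and stall_seconds and the dummy year are made up for illustration (the log carries no year), and %b expects English month names.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the excerpt: "08 Nov, 11:03:38 -I- message"
LINE_RE = re.compile(r"^(\d{2} \w{3}, \d{2}:\d{2}:\d{2}) -(\w)- (.*)$")

# The log has no year; a dummy one is supplied only so the date parses cleanly.
ASSUMED_YEAR = 2000


def parse_line(line):
    """Return (timestamp, level, message), or None if the line does not match."""
    match = LINE_RE.match(line.lstrip("> ").rstrip())
    if match is None:
        return None
    stamp = datetime.strptime(
        f"{ASSUMED_YEAR} {match.group(1)}", "%Y %d %b, %H:%M:%S"
    )
    return stamp, match.group(2), match.group(3)


def stall_seconds(lines, needle="retrieveCloseSEsInfo_allce: Timed out"):
    """Seconds elapsed between each line containing `needle` and the line before it."""
    stalls = []
    previous = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip anything that is not a log line
        stamp, _level, message = parsed
        if previous is not None and needle in message:
            stalls.append((stamp - previous).total_seconds())
        previous = stamp
    return stalls
```

Fed the lines quoted above, this reports a single stall of 132 seconds (11:03:38 to 11:05:50).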
J "no, that's not a real word, don't worry" T