On Friday 30 March 2007 09:31:08 Condurache, C (Catalin) wrote:
> Hi,
>
> I'm looking for some help regarding one of our RBs at RAL. On
> lcgrb01.gridpp.rl.ac.uk the lcg-mon-job-status daemon is running but the
> /opt/lcg/var/log/lcg-mon-job-status.log file is empty. So are
> lcg-mon-job-status.log.{1,2,3,4,5,6}, while lcg-mon-job-status.log.7
> (24 Mar) contains
>
> 2007-03-23 13:07:54,363: [ERROR] Failed to insert tuple.
> 2007-03-23 13:07:54,364: [ERROR] Could not contact R-GMA server at
> lcgmon01.gridpp.rl.ac.uk:8443 - (-1, 'Connection timed out')
> 2007-03-23 13:07:54,365: [ERROR] Will try again in 60s
> 2007-03-23 13:13:54,417: [ERROR] Failed to insert tuple.
> 2007-03-23 13:13:54,418: [ERROR] Could not contact R-GMA server at
> lcgmon01.gridpp.rl.ac.uk:8443 - (-1, 'Connection timed out')
> 2007-03-23 13:13:54,418: [ERROR] Will try again in 60s
> 2007-03-23 13:19:54,470: [ERROR] Failed to insert tuple.
> 2007-03-23 13:19:54,471: [ERROR] Could not contact R-GMA server at
> lcgmon01.gridpp.rl.ac.uk:8443 - (-1, 'Connection timed out')
> 2007-03-23 13:19:54,472: [ERROR] Will try again in 60s
> 2007-03-23 13:25:54,524: [ERROR] Failed to insert tuple.
> 2007-03-23 13:25:54,525: [ERROR] Could not contact R-GMA server at
> lcgmon01.gridpp.rl.ac.uk:8443 - (-1, 'Connection timed out')
> 2007-03-23 13:25:54,525: [ERROR] Will try again in 60s
> 2007-03-23 13:31:54,577: [ERROR] Failed to insert tuple.
> 2007-03-23 13:31:54,578: [ERROR] Could not contact R-GMA server at
> lcgmon01.gridpp.rl.ac.uk:8443 - (-1, 'Connection timed out')
> 2007-03-23 13:31:54,579: [ERROR] Will try again in 60s
>
> A restart of lcg-mon-job-status service didn't fix it.
> Any thoughts?

Hi Catalin

I can see a producer on lcgmon01.gridpp.rl.ac.uk which has been created by
client lcgrb01.gridpp.rl.ac.uk. It was started Fri Mar 30 08:18:39 UTC 2007
and has inserted 10238 tuples, with the last insert 294 seconds ago, so it
looks like the RB is OK now. The errors that you quote are dated the 23rd;
do you have any more recent failures? Ganglia for lcgmon01 shows a gap in
information last week covering the 23rd, so I can't see whether there were
any problems on that day.
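
If it helps to check for more recent failures, something along these lines
could pull the newest [ERROR] entry out of a log. It is only a rough sketch:
the line format is assumed from the excerpt you quoted, and the names LINE_RE
and latest_error are made up for illustration, not part of any LCG tool.

```python
import re
from datetime import datetime

# Assumed line format, based only on the quoted excerpt:
#   2007-03-23 13:07:54,363: [ERROR] Failed to insert tuple.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3}): \[(\w+)\] (.*)$"
)


def latest_error(lines):
    """Return (timestamp, message) for the newest [ERROR] line, or None."""
    latest = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        # Skip anything that is not a well-formed [ERROR] entry,
        # including wrapped continuation lines.
        if not match or match.group(3) != "ERROR":
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        # The part after the comma is milliseconds.
        stamp = stamp.replace(microsecond=int(match.group(2)) * 1000)
        if latest is None or stamp > latest[0]:
            latest = (stamp, match.group(4))
    return latest
```

Passing it an open file object for /opt/lcg/var/log/lcg-mon-job-status.log.7
(or any of the other rotated logs, if they are plain text) should give the
time and text of the last error recorded there, or None if there is none.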

Alastair

>
> Thanks,
> Catalin