Hi,
We recently had a *very* hard-to-debug problem:
a globus-job-run command that "sometimes" malfunctioned.
We finally found out it was caused by a misconfiguration of one
of the nameservers for the reverse zone of a certain IP address range.
Please make sure your DNS setup is fine: go to
http://www.dnsstuff.com -> Reverse DNS lookup,
enter e.g. your CE's hostname, click RevDNS, go to the bottom, click "Click Here",
and check that everything looks normal. It is also worth doing a "DNS lookup"
of the SOA records for both the forward and reverse zones, to make sure they are OK.
You may catch some minor issues with this one, too:
http://www.ripe.net/cgi-bin/delcheck/delcheck2.cgi
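The reverse-DNS check can also be done from the command line. Below is a minimal shell sketch, not a replacement for the web tools above. It assumes dig(1) from the BIND utilities is installed and that the reverse zone is the classful /24 one (classless RFC 2317 delegations need the zone name passed by hand). The function names are made up for this example. It asks each nameserver of the reverse zone directly, without recursion, for the PTR record and the zone's SOA serial, so a single nameserver giving a different answer shows up as a mismatch.

```shell
#!/bin/sh
# Sketch only: compare what each authoritative nameserver says about one
# address. Assumes dig(1) and a classful /24 reverse zone; for classless
# (RFC 2317) delegations pass the zone name as the second argument.

# 194.177.210.41 -> 210.177.194.in-addr.arpa (prints nothing for non-IPv4 input)
reverse_zone24() {
    echo "$1" | awk -F. 'NF == 4 { print $3 "." $2 "." $1 ".in-addr.arpa" }'
}

# Print the serial (third field) of a "dig +short SOA" answer read on stdin,
# skipping dig's ";;" error lines such as timeouts.
soa_serial() {
    awk '!/^;/ && NF >= 7 { print $3; exit }'
}

# Usage: check_reverse IP [ZONE]   (hypothetical helper)
# Returns 0 if all nameservers agree, 1 on a mismatch, 2 if no NS was found.
check_reverse() {
    ip=$1
    zone=${2:-$(reverse_zone24 "$ip")}
    nameservers=$(dig +short NS "$zone")
    if [ -z "$nameservers" ]; then
        echo "no NS records found for $zone" >&2
        return 2
    fi
    results=""
    for ns in $nameservers; do
        # Ask this server only; sort and join so multiple PTRs compare equal.
        ptr=$(dig +short +norecurse +time=3 +tries=1 @"$ns" -x "$ip" \
              | grep -v '^;' | sort | paste -sd, -)
        serial=$(dig +short +norecurse +time=3 +tries=1 @"$ns" SOA "$zone" \
              | soa_serial)
        line="ptr=${ptr:-none} serial=${serial:-none}"
        printf '%-32s %s\n' "$ns" "$line"
        results="$results$line
"
    done
    # A healthy setup gives exactly one distinct answer line.
    if [ "$(printf '%s' "$results" | sort -u | grep -c .)" -ne 1 ]; then
        echo "MISMATCH: nameservers for $zone disagree about $ip"
        return 1
    fi
    echo "OK: all nameservers for $zone agree"
}
```

Run it as e.g. "check_reverse <your CE's IP address>". For the forward zone, "dig +nssearch <your domain>" prints the SOA record as seen by each authoritative server, which makes out-of-sync serials easy to spot.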
Also, I don't know if anyone cares about this,
but we have also found discrepancies in the timekeeping of some CEs.
The list of CEs was obtained with lcg-infosites, i.e. these are the CEs where actual jobs go.
I didn't check the WNs; that's somebody else's job :)
29 Apr 08:53:03 ntpdate[26569]: #bigmac-lcg-ce.physics.utoronto.ca
29 Apr 08:53:04 ntpdate[26569]: step time server 194.177.210.41 offset 150.945045 sec
--
29 Apr 08:54:12 ntpdate[2841]: #ce.prd.hp.com
29 Apr 08:54:13 ntpdate[2841]: step time server 194.177.210.41 offset 94.418203 sec
--
29 Apr 13:55:53 ntpdate[5285]: #lunegw.lancs.ac.uk
29 Apr 13:55:53 ntpdate[5285]: step time server 194.177.210.41 offset -17.437382 sec
--
29 Apr 13:57:47 ntpdate[24174]: #lcg-01.hpl.hp.com
29 Apr 13:57:48 ntpdate[24174]: step time server 194.177.210.41 offset -132.697517 sec
--
29 Apr 14:54:53 ntpdate[26819]: #ce1.egee.fr.cgg.com
29 Apr 14:54:54 ntpdate[26819]: step time server 194.177.210.41 offset 42.968241 sec
--
29 Apr 14:55:35 ntpdate[13047]: #ce01-lcg.projects.cscs.ch
29 Apr 14:55:35 ntpdate[13047]: step time server 194.177.210.41 offset -1.822170 sec
--
29 Apr 14:55:55 ntpdate[12689]: #zeus02.cyf-kr.edu.pl
29 Apr 14:55:55 ntpdate[12689]: step time server 194.177.210.41 offset -18.545212 sec
--
29 Apr 14:56:14 ntpdate[2215]: #cmsboce1.bo.infn.it
29 Apr 14:56:15 ntpdate[2215]: step time server 194.177.210.41 offset -41.527098 sec
--
29 Apr 15:53:44 ntpdate[19087]: #grid8.wdcb.ru
29 Apr 15:53:45 ntpdate[19087]: step time server 194.177.210.41 offset 3710.784405 sec
--
29 Apr 15:55:35 ntpdate[25442]: #node001.grid.auth.gr
29 Apr 15:55:35 ntpdate[25442]: step time server 194.177.210.41 offset 0.793555 sec
--
29 Apr 15:59:49 ntpdate[5387]: #ce01.gridctb.uoa.gr
29 Apr 15:59:50 ntpdate[5387]: step time server 194.177.210.41 offset -257.770010 sec
--
29 Apr 16:54:46 ntpdate[6308]: #ce.keldysh.ru
29 Apr 16:54:47 ntpdate[6308]: step time server 194.177.210.41 offset 46.078382 sec
--
29 Apr 16:55:35 ntpdate[30755]: #ce001.m45.ihep.su
29 Apr 16:55:35 ntpdate[30755]: step time server 194.177.210.41 offset -0.923147 sec
--
29 Apr 17:55:31 ntpdate[19164]: #CE.pakgrid.org.pk
29 Apr 17:55:33 ntpdate[19164]: step time server 194.177.210.41 offset 9.229910 sec
The command used was "ntpdate -q <ntpserver>". The systems that report
a difference of less than a second or two are probably just slightly clock-skewed.
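If you want to repeat the check on your own CE (locally or from a job sent to it), the ntpdate output can be classified automatically. This is a minimal shell sketch, assuming the "step/adjust time server ... offset N sec" line format seen in the log above. The function name classify_offset and the 2-second MAX_OFFSET default are hypothetical choices that mirror the "second or two" rule of thumb, not any grid-wide standard.

```shell
#!/bin/sh
# Sketch: classify the clock offset reported by "ntpdate -q <ntpserver>".
# MAX_OFFSET (seconds) is an assumed threshold; override it in the environment.
MAX_OFFSET=${MAX_OFFSET:-2}

# Reads ntpdate output on stdin and prints "OK <offset>" or "SKEWED <offset>".
# Returns 2 if no "step/adjust time server ... offset N sec" line is found
# (e.g. the NTP server could not be reached).
classify_offset() {
    awk -v max="$MAX_OFFSET" '
        /time server .* offset/ {
            for (i = 1; i < NF; i++)
                if ($i == "offset") { raw = $(i + 1); found = 1 }
        }
        END {
            if (!found) exit 2
            off = raw + 0
            abs = (off < 0) ? -off : off
            status = (abs > max + 0) ? "SKEWED" : "OK"
            print status, raw
        }'
}
```

Used as "ntpdate -q <ntpserver> | classify_offset". Fed the bigmac-lcg-ce line from the log above it prints "SKEWED 150.945045", while the ce01-lcg.projects.cscs.ch line (-1.822170 sec) comes out as "OK -1.822170".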
cheers,
Fotis
--
echo "sysadmin know better bash than english" | sed s/min/mins/ \
| sed 's/better bash/bash better/' # Yelling in a CERN forum