Thank you for such a speedy reply!
Words written by `Jean-Philippe Baud' on 02 Aug 2005 at 07:56:48 +0200 prompted:
> Although I agree that the rfio_parseln routine needs to be improved (this
> has been on my list for some time), it is certainly not necessary that the
> 'hostname' command return a short (non fully qualified) hostname. Actually
> all our machines on RH73 use short names while all our SLC3 machines run
> with FQDN.
> So I would be interested in knowing a concrete case where an FQDN implies
> "Connection timeout".
$ export RFIO_TRACE=999
=== Failure ===========================================================
# hostname dev03.gridpp.rl.ac.uk
$ rfdir dev03:/tmp/a
**** : trace level set to 999
rfio: rfio_stat64(dev03:/tmp/a, bfffba80)
rfio: Added by j.mencak: I'm not a localhost
rfio: rfio_connect: getenv(RFIO_PORT)
rfio: rfio_connect: getenv(RFIO_PORT)
rfio: rfio_connect: *** Warning: using port 3147
rfio: rfio_connect: connecting(dev03)
rfio: rfio_connect: Cgethostbyname(dev03)
rfio: rfio_connect: socket(2, 1, 0)
rfio: rfio_connect: netconnect_timeout(3, bfffa920, 16, 120)
rfio: rfio_connect: connect(): ERROR occured (Connection refused)
rfio: rfio_serror: errno=115, serrno=111, rfio_errno=0
dev03:/tmp/a: Connection refused
rfio: rfio_end entered, Tid=-1
rfio: rfio_end: Lock mstat_tab
rfio: rfio_end: Unlock mstat_tab
=== Success ===========================================================
# hostname dev03
$ rfdir dev03:/tmp/a
**** : trace level set to 999
rfio: rfio_stat64(dev03:/tmp/a, bfff8350)
rfio: Added by j.mencak: I'm a localhost
rfio: rfio_stat64: using local stat64(/tmp/a, bfff8350)
rfio: Added by j.mencak: I'm a localhost
rfio: rfio_opendir(dev03:/tmp/a)
rfio: Added by j.mencak: I'm a localhost
rfio: rfio_opendir(dev03:/tmp/a) rfio_parse returns host=(nil)
rfio: rfio_readdir(9540800)
rfio: rfio_readdir: check if HSM directory
rfio: rfio_HsmIf_FindDirEntry(0x9540800) -> RC=-1
rfio: rfio_readdir: using local readdir(9540800)
rfio: Added by j.mencak: I'm a localhost
drwxr-xr-x 2 mencak esc 1024 Aug 02 08:16 .
rfio: rfio_readdir(9540800)
rfio: rfio_readdir: check if HSM directory
rfio: rfio_HsmIf_FindDirEntry(0x9540800) -> RC=-1
rfio: rfio_readdir: using local readdir(9540800)
rfio: Added by j.mencak: I'm a localhost
drwxrwxrwx 14 root root 2048 Aug 02 08:22 ..
rfio: rfio_readdir(9540800)
rfio: rfio_readdir: check if HSM directory
rfio: rfio_HsmIf_FindDirEntry(0x9540800) -> RC=-1
rfio: rfio_readdir: using local readdir(9540800)
rfio: Added by j.mencak: I'm a localhost
-rw-r--r-- 1 mencak esc 21 Aug 02 08:16 one_file.txt
rfio: rfio_readdir(9540800)
rfio: rfio_readdir: check if HSM directory
rfio: rfio_HsmIf_FindDirEntry(0x9540800) -> RC=-1
rfio: rfio_readdir: using local readdir(9540800)
rfio: rfio_closedir(0x9540800)
rfio: rfio_closedir: check if HSM directory
rfio: rfio_HsmIf_FindDirEntry(0x9540800) -> RC=-1
rfio: rfio_closedir: using local closedir(0x9540800)
rfio: rfio_end entered, Tid=-1
rfio: rfio_end: Lock mstat_tab
rfio: rfio_end: Unlock mstat_tab
=======================================================================
I've added a trace message to the piece of code that detects whether
the target machine is the local host. The same detection routine is
duplicated elsewhere in rfio_parseln().
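To illustrate what the two traces suggest, here is a minimal sketch in Python. It is not the RFIO C source, and both function names are hypothetical. It assumes the detection boils down to comparing the host part of the path ("dev03") with whatever `hostname` returns, which would explain why the FQDN case is treated as remote.

```python
def is_local_naive(target: str, local_hostname: str) -> bool:
    """Exact string comparison (assumed behaviour, not taken from RFIO)."""
    return target == local_hostname


def is_local_tolerant(target: str, local_hostname: str) -> bool:
    """Treat a short name and an FQDN with the same first label as equal."""
    t = target.lower().rstrip(".")
    h = local_hostname.lower().rstrip(".")
    if t == h:
        return True
    # Two different fully qualified names are different hosts.
    if "." in t and "." in h:
        return False
    # At least one side is unqualified: compare the short names only.
    return t.split(".", 1)[0] == h.split(".", 1)[0]


if __name__ == "__main__":
    fqdn = "dev03.gridpp.rl.ac.uk"
    print(is_local_naive("dev03", fqdn))     # False
    print(is_local_tolerant("dev03", fqdn))  # True
```

Comparing short names is only one possible fix; resolving both names to addresses and comparing those would be more robust but depends on the resolver setup.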
If you need more tracing, please let me know. For example, I also get
``connection refused'' when I run
$ rfdir dev03:/tmp/a
from the dev02 box, which is on the same subnet (the same command
succeeds when run on dev03). I temporarily switched off the firewalls
on both boxes during this little test. It is possible that RFIO is
still filtered by firewalls here at RAL; I'll check with the sysadmins
of those boxes.
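A quick way to narrow this down independently of RFIO is to probe the TCP port directly. The sketch below is an assumption-laden helper (the name `probe_tcp` is hypothetical), using only the Python standard library. The port number 3147 is the one reported in the trace above. In general, "refused" means something actively rejected the connection (typically nothing is listening on that port, or a firewall sends a reject), while a firewall that silently drops packets tends to show up as a timeout.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Return 'open', 'refused', 'timeout' or 'error: ...' for host:port."""
    try:
        # create_connection resolves the name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # Covers name resolution failures, unreachable networks, etc.
        return f"error: {exc}"


if __name__ == "__main__":
    # Host and port taken from the trace; adjust for your site.
    print(probe_tcp("dev03", 3147))
```

Running it from dev02 and from dev03 itself should show whether the refusal is specific to remote connections.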
> I will certainly look at the zombie problem. It was a last minute change
> to avoid warnings in the logs.
> I'm sure that we can improve our installation guide. Could you please tell
> us which information is missing?
Your installation guide seems fine, but I think that adding some
sanity tests to ensure that DPM is configured and running correctly
(like the RFIO test which helped me to get SRM working) would be
very beneficial. Graeme's notes can be found here:
http://www.physics.gla.ac.uk/gridpp/datamanagement/index.php/DiskPoolManager
I've found the ``DpmTesting'' especially useful.
Thanks again for your feedback.
Best regards.
--
Jiri