[ ... ]
>> 11/15 11:42:20 15873,3 dpm_srv_addfs: DP098 - addfs dpmCam_2007 disk08.hep.phy.cam.ac.uk /dpm_data
>> 11/15 11:42:20 15873,3 dpm_addfs2poolconf: disk08.hep.phy.cam.ac.uk:/dpm_data: Connection closed by remote end
Sometimes in these cases I try one of:
nmap -R -sT -p 5001 disk08.hep.phy.cam.ac.uk
tcptraceroute disk08.hep.phy.cam.ac.uk 5001
telnet disk08.hep.phy.cam.ac.uk 5001
for a very basic port check (I have just tried them from home
and there is something listening on that port) after the
'traceroute' (which you have already done), and then
openssl s_client -CApath /etc/grid-security/certificates \
-key /etc/grid-security/dpmmgr/dpmkey.pem \
-connect disk08.hep.phy.cam.ac.uk:5001
(from memory, so may have to tweak it a bit) to see if the
SSL connection is possible.
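If the output of those checks gets long, a pair of small filters can pull out the one line that matters. This is only a rough sketch: 'port_state' and 'verify_code' are names made up here, and they assume the usual nmap "PORT STATE SERVICE" table and the "Verify return code:" line that 'openssl s_client' prints.

```shell
# Print the state (open, closed, filtered) that nmap reports for a TCP port.
# Reads normal nmap output on stdin; $1 is the port number.
port_state() {
    awk -v p="$1/tcp" '$1 == p { print $2 }'
}

# Print the numeric verify code from 'openssl s_client' output on stdin.
# 0 means the certificate chain verified; anything else needs a closer look.
verify_code() {
    sed -n 's/^ *Verify return code: \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Example use (not run here):
#   nmap -R -sT -p 5001 disk08.hep.phy.cam.ac.uk | port_state 5001
#   openssl s_client -CApath /etc/grid-security/certificates \
#       -connect disk08.hep.phy.cam.ac.uk:5001 </dev/null 2>/dev/null | verify_code
```

The '</dev/null' stops 's_client' from sitting there waiting for input once the handshake is over.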
In the specific case of adding a disk server I guess that the
connection above from the DPM verifies that the RFIO server
works and serves the right directory; then you can use RFIO
commands on the DPM server directly to check whether RFIO is
working on the disk server and is accessible from the DPM
server, for example:
rfdf disk08.hep.phy.cam.ac.uk:/dpm_data
and then run that under 'strace -e trace=network -f ....' if it
fails; or even 'strace -e trace=network -f -p "$(pidof rfiod)"' to see
what the RFIO server daemon is doing.
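The "try it, and trace it if it fails" step can be wrapped up as below. Again just a sketch: 'trace_on_failure' is a made-up helper, and the TRACER variable is a hypothetical hook so that a different tracer can be swapped in where strace is not installed.

```shell
# Run a command; if it fails, run it again under strace so that the
# network-related system calls (connect, send, recv, ...) are shown.
trace_on_failure() {
    if "$@"; then
        return 0
    fi
    echo "'$*' failed, re-running under trace" >&2
    # TRACER is deliberately left unquoted so that its options are split.
    ${TRACER:-strace -e trace=network -f} "$@"
}

# Example use on the DPM server (not run here):
#   trace_on_failure rfdf disk08.hep.phy.cam.ac.uk:/dpm_data
```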
> Does the head node "trust" the pool server for dpm operations
> ie is the new disk server in the line DPM TRUST in
> /etc/shift.conf on the head node
IIRC all disk servers should be in 'DPNS TRUST' too, but I guess
that the existing disk servers will be there and the name of the
new one will have been added to both places.
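A quick way to confirm that, rather than eyeballing the file, might be something like this. It is a sketch that assumes the lines in /etc/shift.conf are whitespace-separated, with the two keywords first and the host names after them; 'check_trust' is a made-up name.

```shell
# check_trust HOST [FILE]: report whether HOST appears on the 'DPM TRUST'
# and 'DPNS TRUST' lines of FILE (default /etc/shift.conf).
# Returns non-zero if it is missing from either of them.
check_trust() {
    host=$1
    conf=${2:-/etc/shift.conf}
    rc=0
    for section in DPM DPNS; do
        if awk -v s="$section" -v h="$host" '
            $1 == s && $2 == "TRUST" {
                for (i = 3; i <= NF; i++) if ($i == h) found = 1
            }
            END { exit !found }' "$conf"
        then
            echo "$section TRUST: $host listed"
        else
            echo "$section TRUST: $host MISSING"
            rc=1
        fi
    done
    return $rc
}

# Example use on the head node (not run here):
#   check_trust disk08.hep.phy.cam.ac.uk
```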
[ ... ]