Have you made sure that PNFS is up and running on the other node before
starting the main head node?
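
A quick way to check that, just as a sketch (<pnfs-node> below is a
placeholder for the remote PNFS host; the standard /fs NFS export of PNFS
is assumed):

# on the remote PNFS node: are the pnfs and postgres processes up,
# and is the /fs export visible?
ps ax | egrep 'pnfsd|dbserver|postmaster'
showmount -e localhost

# on the head node: can the export be reached and mounted by hand,
# before the dcache-core start-up script tries to do it?
rpcinfo -p <pnfs-node> | grep -Ei 'mountd|nfs'
mkdir -p /pnfs/fs
mount -o intr,hard,rw <pnfs-node>:/fs /pnfs/fs
ls /pnfs/fs/usr
umount /pnfs/fs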
-----Original Message-----
From: Sergey [mailto:[log in to unmask]]
Sent: 12 June 2008 15:04
To: Gerd Behrmann
Cc: [log in to unmask]; [log in to unmask];
[log in to unmask]; Synge, Owen
Subject: Re: dCache upgrade with remote pnfs installation
OK, thanks
We have sent a formal request to [log in to unmask] anyway, to raise the
problem.
Hope Owen will pick up all the information from this thread.
Sergey
2008/6/12 Gerd Behrmann <[log in to unmask]>:
> The pnfsDomain was started because NODE_TYPE was not correctly set. I
> have no idea why the PNFS is still mounted. I have put Owen on cc,
> since he maintains the install script.
>
> Cheers,
>
> /gerd
>
> Alessandra Forti wrote:
>>
>> Hi Gerd,
>>
>> maybe I'm being naive, but if pnfs and pnfsDomain have been moved to
>> another machine and the flags are correctly set to 'no' in node_config,
>> dCache shouldn't try to mount the file system nor start the pnfsDomain.
>> There aren't even gridftp doors on that node, which are apparently what
>> requires it.
>>
>> thanks for the suggestion though.
>>
>> cheers
>> alessandra
>>
>> Gerd Behrmann wrote:
>>>
>>> Have you tried cleaning out the /pnfs mount points on the head node?
>>> I.e. while PNFS is *not* mounted on the head node, remove the directory
>>> entries under /pnfs. Then rerun the installation.
>>>
>>> I am no expert in the different PNFS mount points, but it seems the
>>> install script does not like whatever was left over from before you
>>> moved PNFS to another machine.
>>>
>>> Just be careful not to accidentally delete something actually stored
>>> in PNFS :-)
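>>>
>>> Roughly something like this (a sketch only, using the path names from
>>> your install.sh output; double-check with 'mount' that nothing is
>>> NFS-mounted under /pnfs before removing anything):
>>>
>>> mount | grep /pnfs        # must show nothing mounted under /pnfs
>>> rm -f /pnfs/ftpBase /pnfs/tier2.hep.manchester.ac.uk    # leftover symlinks
>>> rmdir /pnfs/fs            # leftover mount point, if present and empty
>>> /opt/d-cache/install/install.sh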
>>>
>>> Cheers,
>>>
>>> /gerd
>>>
>>> Sergey wrote:
>>>>
>>>> Hi Gerd,
>>>>
>>>> I have changed the NODE_TYPE to custom.
>>>> The installation script still doesn't like it, though:
>>>>
>>>> [root@dcache01 ~]# /opt/d-cache/install/install.sh
>>>> INFO:Skipping ssh key generation
>>>>
>>>> Checking MasterSetup ./config/dCacheSetup O.k.
>>>>
>>>> Sanning dCache batch files
>>>>
>>>> Processing adminDoor
>>>> Processing chimera
>>>> Processing dCache
>>>> Processing dir
>>>> Processing door
>>>> Processing gPlazma
>>>> Processing gridftpdoor
>>>> Processing gsidcapdoor
>>>> Processing httpd
>>>> Processing infoProvider
>>>> Processing lm
>>>> Processing pnfs
>>>> Processing pool
>>>> Processing replica
>>>> Processing srm
>>>> Processing statistics
>>>> Processing utility
>>>> Processing xrootdDoor
>>>>
>>>>
>>>> Checking Users database .... Ok
>>>> Checking Security .... Ok
>>>> Checking JVM ........ Ok
>>>> Checking Cells ...... Ok
>>>> dCacheVersion ....... Version production-1-8-0-15p5
>>>>
>>>> INFO:Will be mounted to dcache01.tier2.hep.manchester.ac.uk:/fs by dcache-core start-up script.
>>>> INFO:Creating link /pnfs/tier2.hep.manchester.ac.uk --> /pnfs/fs/usr/
>>>> MADE THE SYMBOLIC LINK
>>>> INFO:Link /pnfs/tier2.hep.manchester.ac.uk --> /pnfs/fs/usr already there.
>>>> INFO:[INFO] Creating link /pnfs/ftpBase --> /pnfs/fs which is used by the GridFTP door.
>>>> INFO:PNFS is not running. It is needed to prepare dCache. ...
>>>> ERROR:Not allowed to start it. Set PNFS_START in etc/node_config to 'yes' or start by hand. Exiting.
>>>>
>>>> Sergey
>>>>
>>>> 2008/6/12 Gerd Behrmann <[log in to unmask]>:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Notice that you should set NODE_TYPE to custom - otherwise most of
>>>>> the flags in node_config are not respected.
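>>>>>
>>>>> In your case that would mean something like the following in
>>>>> etc/node_config on the head node (only the relevant lines; the rest
>>>>> stays as you have it):
>>>>>
>>>>> NODE_TYPE=custom
>>>>> PNFS_START=no
>>>>> pnfsManager=no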
>>>>>
>>>>> Cheers,
>>>>>
>>>>> /gerd
>>>>>
>>>>> Sergey wrote:
>>>>>>
>>>>>> Hi
>>>>>>
>>>>>> We have to add to our previous email (below) that the pnfsDomain is
>>>>>> always started on the head node after /opt/d-cache/bin/dcache
>>>>>> start/restart, even though the node is configured in node_config to
>>>>>> use a remote pnfs and not to start pnfs on this one. Here is our
>>>>>> node_config on the head node:
>>>>>>
>>>>>> # $Id: node_config.template,v 1.6 2007-06-19 10:04:10 tigran Exp $
>>>>>> #
>>>>>> NODE_TYPE=admin #admin, pool, door or custom
>>>>>> DCACHE_HOME=/opt/d-cache
>>>>>> POOL_PATH=/opt/d-cache/etc
>>>>>> NUMBER_OF_MOVERS=100
>>>>>>
>>>>>> #
>>>>>> # which namespace in installed?
>>>>>> #
>>>>>> # Possible values is chimera or pnfs
>>>>>> # if nothing is defined or none of above, then pnfs is used
>>>>>> # NAMESPACE=''
>>>>>> PNFS_ROOT=/pnfs
>>>>>> PNFS_INSTALL_DIR=/opt/pnfs
>>>>>> #PNFS_START=yes
>>>>>> PNFS_START=no
>>>>>> PNFS_OVERWRITE=no
>>>>>>
>>>>>> # SERVER_ID=domain.name # defaults to `hostname -d`
>>>>>> SERVER_ID=tier2.hep.manchester.ac.uk
>>>>>> #ADMIN_NODE=myAdminNode # only needed for GridFTP door which is not on the admin node
>>>>>> ADMIN_NODE=dcache01.tier2.hep.manchester.ac.uk
>>>>>> NAMESPACE_NODE=dcache01.tier2.hep.manchester.ac.uk
>>>>>>
>>>>>> # ---- Services to be started on this node
>>>>>> # The following services are only started on this node
>>>>>> # if the corresponding parameter is set to 'yes'.
>>>>>> # Exeption: The PnfsManager is started on the admin node
>>>>>> # if the parameter is not specified.
>>>>>> #
>>>>>> GSIDCAP=no
>>>>>> DCAP=no
>>>>>> GRIDFTP=no
>>>>>> SRM=yes
>>>>>> XROOTD=no
>>>>>>
>>>>>>
>>>>>> #
>>>>>> # Following variables is for admin node only
>>>>>> #
>>>>>>
>>>>>> # ---- Start the Replica Manager on this node.
>>>>>> # The variable 'replicaManager' in config/dCacheSetup has to be set
>>>>>> # to 'yes' on every node of the dCache instance, if the replica manager
>>>>>> # is started with the following variable
>>>>>> # Make sure that there is only one replica manager running in a dCache
>>>>>> # instance.
>>>>>> #
>>>>>> #replicaManager=no # default: no
>>>>>> replicaManager=yes
>>>>>>
>>>>>>
>>>>>> # ---- Start the info provider on this node.
>>>>>> # With this variable, it is possible to install the info provider
>>>>>> # on a separate node and not on the admin node.
>>>>>> #
>>>>>> #infoProvider=yes # default: 'yes' on 'admin' node otherwise 'no'
>>>>>>
>>>>>>
>>>>>> # ---- Start the statistics module
>>>>>> #
>>>>>> # Make sure that statisticsLocation variable in dCacheSetup file points to
>>>>>> # an existing directory.
>>>>>> #
>>>>>> #statistics=no # default: 'no'
>>>>>>
>>>>>>
>>>>>>
>>>>>> ################################################################################
>>>>>> #
>>>>>> #    DO NOT MODIFY THIS PART UNLESS YOU KNOW WHAT YOU ARE DOING
>>>>>> #
>>>>>> #    USED ONLY IF NODE_TYPE=custom
>>>>>> #
>>>>>> ################################################################################
>>>>>>
>>>>>> #
>>>>>> # default components of a admin node
>>>>>> #
>>>>>>
>>>>>> #
>>>>>> # Location manager. Single instace per dCache installation
>>>>>> # Required.
>>>>>> #
>>>>>> lmDomain=yes
>>>>>>
>>>>>> #
>>>>>> # httpd service. Single instace per dCache installation
>>>>>> # optional, recomented
>>>>>> #
>>>>>> httpDomain=yes
>>>>>>
>>>>>> #
>>>>>> # pnfs manager. Single instace per dCache installation
>>>>>> # Required.
>>>>>> #
>>>>>> #pnfsManager=yes
>>>>>> pnfsManager=no
>>>>>>
>>>>>> #
>>>>>> # PoolManager manager (AKA dCacheDomain). Single instace per dCache installation
>>>>>> # Required.
>>>>>> poolManager=yes
>>>>>>
>>>>>> #
>>>>>> # admin door. Single instace per dCache installation
>>>>>> # optional, recomented
>>>>>> #
>>>>>> adminDoor=yes
>>>>>>
>>>>>> #
>>>>>> # utilities ( pinManager and Co.). Single instace per dCache installation
>>>>>> # Required.
>>>>>> #
>>>>>> utilityDomain=yes
>>>>>>
>>>>>> #
>>>>>> # directory lookup service. Single instace per dCache installation
>>>>>> # required if at least one dcapDoor is running
>>>>>> #
>>>>>> dirDomain=yes
>>>>>>
>>>>>> # gPlazma authentification serive. Single instace per dCache installation
>>>>>> #
>>>>>> #
>>>>>> #gPlazmaService=no # default: 'no'
>>>>>> gPlazmaService=yes
>>>>>>
>>>>>>
>>>>>> Sergey
>>>>>>
>>>>>> 2008/6/11 Sergey <[log in to unmask]>:
>>>>>>>
>>>>>>> Hi
>>>>>>>
>>>>>>> We have a split dCache head node setup where pnfs+PostgreSQL is
>>>>>>> installed on a remote node.
>>>>>>> On the head node we have
>>>>>>> PNFS_START=no
>>>>>>> and
>>>>>>> pnfsManager=no
>>>>>>>
>>>>>>> After the server rpm upgrade from 1.8.0-12p6 to 1.8.0-15p5 we had to
>>>>>>> run the /opt/d-cache/install/install.sh script to complete the
>>>>>>> upgrade. It fails with the following error:
>>>>>>>
>>>>>>> [root@dcache01 etc]# /opt/d-cache/install/install.sh
>>>>>>> INFO:Skipping ssh key generation
>>>>>>>
>>>>>>> Checking MasterSetup ./config/dCacheSetup O.k.
>>>>>>>
>>>>>>> Sanning dCache batch files
>>>>>>>
>>>>>>> Processing adminDoor
>>>>>>> Processing chimera
>>>>>>> Processing dCache
>>>>>>> Processing dir
>>>>>>> Processing door
>>>>>>> Processing gPlazma
>>>>>>> Processing gridftpdoor
>>>>>>> Processing gsidcapdoor
>>>>>>> Processing httpd
>>>>>>> Processing infoProvider
>>>>>>> Processing lm
>>>>>>> Processing pnfs
>>>>>>> Processing pool
>>>>>>> Processing replica
>>>>>>> Processing srm
>>>>>>> Processing statistics
>>>>>>> Processing utility
>>>>>>> Processing xrootdDoor
>>>>>>>
>>>>>>>
>>>>>>> Checking Users database .... Ok
>>>>>>> Checking Security .... Ok
>>>>>>> Checking JVM ........ Ok
>>>>>>> Checking Cells ...... Ok
>>>>>>> dCacheVersion ....... Version production-1-8-0-15p5
>>>>>>>
>>>>>>> INFO:Will be mounted to dcache01.tier2.hep.manchester.ac.uk:/fs by dcache-core start-up script.
>>>>>>> ERROR:The file/directory /pnfs/tier2.hep.manchester.ac.uk is in the way. Please move it out
>>>>>>> ERROR:of the way and call me again. Exiting.
>>>>>>>
>>>>>>> As a result, the upgrade is not complete and:
>>>>>>>
>>>>>>> 1. we still observe the old version on the Admin web page:
>>>>>>> SRM-dcache01 srm-dcache01Domain 0 3 80 msec 06/11 14:51:18
>>>>>>> production-1-8-0-12p6(1.151)
>>>>>>>
>>>>>>> 2. the SRM protocol does not work properly: we can srmcp TO the
>>>>>>> dCache, delete with srmrm, and srmcp FROM it, but "srmcp TO"
>>>>>>> hangs on the client side at the very end of the operation and then
>>>>>>> ends with a timeout error:
>>>>>>>
>>>>>>> SRMClientV2 : srmPrepareToPut, contacting service httpg://dcache01.tier2.hep.manchester.ac.uk:8443/srm/managerv2
>>>>>>> Wed Jun 11 16:48:22 BST 2008: srm returned requestToken = -2147098616
>>>>>>> Wed Jun 11 16:48:22 BST 2008: sleeping 1 seconds ...
>>>>>>> copy_jobs is not empty
>>>>>>> Wed Jun 11 16:48:23 BST 2008: no more pending transfers, breaking the loop
>>>>>>> copying CopyJob, source = file:////bin/bash destination = gsiftp://bohr3931.tier2.hep.manchester.ac.uk:2811//pnfs/tier2.hep.manchester.ac.uk/data/dteam/basht1
>>>>>>> GridftpClient: memory buffer size is set to 131072
>>>>>>> GridftpClient: connecting to bohr3931.tier2.hep.manchester.ac.uk on port 2811
>>>>>>> GridftpClient: gridFTPClient tcp buffer size is set to 0
>>>>>>> GridftpClient: gridFTPWrite started, source file is java.io.RandomAccessFile@2f0d54 destination path is /pnfs/tier2.hep.manchester.ac.uk/data/dteam/basht1
>>>>>>> GridftpClient: gridFTPWrite started, destination path is /pnfs/tier2.hep.manchester.ac.uk/data/dteam/basht1
>>>>>>> GridftpClient: set local data channel authentication mode to None
>>>>>>> GridftpClient: parallelism: 10
>>>>>>> GridftpClient: adler32 for file java.io.RandomAccessFile@2f0d54 is 0b6f56d6
>>>>>>> GridftpClient: waiting for completion of transfer
>>>>>>> GridftpClient: starting a transfer to /pnfs/tier2.hep.manchester.ac.uk/data/dteam/basht1
>>>>>>> GridftpClient: DiskDataSink.close() called
>>>>>>> GridftpClient: gridFTPWrite() wrote 616248bytes
>>>>>>> GridftpClient: closing client : org.dcache.srm.util.GridftpClient$FnalGridFTPClient@15dd910
>>>>>>> GridftpClient: closed client
>>>>>>> execution of CopyJob, source = file:////bin/bash destination = gsiftp://bohr3931.tier2.hep.manchester.ac.uk:2811//pnfs/tier2.hep.manchester.ac.uk/data/dteam/basht1 completed
>>>>>>> SRMClientV2 : put: try # 0 failed with error
>>>>>>> SRMClientV2 : ; nested exception is:
>>>>>>> java.net.SocketTimeoutException: Read timed out
>>>>>>> SRMClientV2 : put: try again
>>>>>>> SRMClientV2 : sleeping for 10000 milliseconds before retrying
>>>>>>> SRMClientV2 : put: try # 1 failed with error
>>>>>>> SRMClientV2 : ; nested exception is:
>>>>>>> ....
>>>>>>>
>>>>>>> However, the file appears in place and can be copied back or
>>>>>>> deleted with srmrm.
>>>>>>> Also, the srmls command does not work at all (it just hangs).
>>>>>>>
>>>>>>> Does anybody have a suggestion on how to complete the upgrade correctly?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Sergey
>>>>>>>
>>>>>>> Manchester Tier2
>>>>>>>
>>>>>
>>>
>>>
>>
>
>