Hi all,
I did a complete re-install of our dCache admin node today, including the
OS. The edited highlights of what I did are below. If you skip to the
bottom of the email you will find the problem I am currently having with
pnfs.
1. Installed OS (SL 3.0.5)
2. Installed java, configured ntp, unpacked host certificates
3. Installed latest yaim
4. Customised site-info.def
5. Installed dCache meta package
   /opt/lcg/yaim/scripts/install_node /opt/lcg/yaim/examples/site-info.def \
     lcg-SE_dcache | tee /tmp/dcache_install.txt
6. Configured dCache
   /opt/lcg/yaim/scripts/configure_node /opt/lcg/yaim/examples/site-info.def \
     SE_dcache | tee /tmp/dcache_config.txt
7. Opened the correct ports in the firewall
8. Set up cron jobs for grid-mapfile2dcache-kpwd and logrotate.d
There were no problems with this and the install _should_ result in a
working system.
====================
Problem begins here!
====================
However, to spice things up a bit, I wanted to try and retain the same
pnfs database that I had in my previous install (just to see how easy it
was to do) which would enable me to still access all of the files that I
had transferred into the dCache up till now. So, before I wiped the
machine, I had made a backup of the /opt/pnfsdb directory tree on the
admin node. In between steps 4 and 5 above, I unpacked this back into the
/opt directory on the newly installed OS.
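
For reference, the backup and restore described above can be sketched roughly as follows. This is only an illustration, assuming a plain tar archive is used; the function names and the archive location are hypothetical, and it is presumably safest to take the backup while the pnfs services are stopped.

```shell
# Hypothetical helpers: the names and the archive path are illustrative only.

# backup_pnfsdb PARENT_DIR ARCHIVE
#   Archive the pnfsdb tree found under PARENT_DIR (e.g. /opt) into ARCHIVE,
#   preserving permissions.
backup_pnfsdb() {
    tar -czpf "$2" -C "$1" pnfsdb
}

# restore_pnfsdb ARCHIVE PARENT_DIR
#   Unpack ARCHIVE so that pnfsdb reappears under PARENT_DIR (e.g. /opt).
restore_pnfsdb() {
    tar -xzpf "$1" -C "$2"
}
```

On the old install this would be run as something like `backup_pnfsdb /opt /root/pnfsdb-backup.tar.gz`, with the archive copied off the machine before the wipe, and then `restore_pnfsdb /root/pnfsdb-backup.tar.gz /opt` on the new OS before step 5.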
I then continued on with steps 5 and 6. The pnfsdb was not overwritten
(presumably due to 'PNFS_OVERWRITE = no' in
/opt/pnfs.3.1.10/pnfs/etc/pnfs_config). However, after waiting for ~3
minutes, the web interface showed that the pnfsDomain was offline while
all other domains were online. After some searching in the pnfs
documentation (specifically /opt/pnfs.3.1.10/pnfs/docs/html/movedb.html) I
discovered that moving the pnfs database also requires you to copy over
the file /usr/etc/pnfsSetup (not the same as
/opt/d-cache/config/pnfsSetup) and to create the following log files:
# ls /var/log/pnfsd.log/
dbserver.log pmountd.log pnfsd.log
by touching them.
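
The two extra steps above (restoring /usr/etc/pnfsSetup and touching the log files) could be scripted along these lines. This is a sketch only: the function name and the optional ROOT argument (for a dry run in a scratch directory) are my own invention, and the log location simply mirrors the listing above.

```shell
# restore_pnfs_extras SAVED_SETUP [ROOT]
#   SAVED_SETUP: copy of /usr/etc/pnfsSetup saved from the old install.
#   ROOT:        optional prefix for a dry run (hypothetical convenience);
#                leave empty to act on the real filesystem.
restore_pnfs_extras() {
    saved_setup=$1
    root=${2:-}

    # Put pnfsSetup back where pnfs expects it (not the copy under
    # /opt/d-cache/config).
    mkdir -p "$root/usr/etc" || return 1
    cp -p "$saved_setup" "$root/usr/etc/pnfsSetup" || return 1

    # Create the empty log files, matching the ls output above.
    mkdir -p "$root/var/log/pnfsd.log" || return 1
    for f in dbserver.log pmountd.log pnfsd.log; do
        touch "$root/var/log/pnfsd.log/$f" || return 1
    done
}
```

Running `restore_pnfs_extras /root/pnfsSetup.saved /tmp/scratch` first shows what would be created without touching the live system; the saved file name here is just an example.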
After doing all of this, I could start the pnfs services. However, I now
have a problem in that I cannot get my disk pools online. I have not
touched my pool node setup, other than to turn the dcache-pool services on
and off.
The logs are reporting various error messages that may indicate a problem
with the pnfs database:
e.g. from dCache.log:
08/03 12:38:02 Cell(RoutingMgr@dCacheDomain) : update can't send update to RoutingMgr{uoid=<1123072682677:133>;path=[>RoutingMgr@local];msg=Missing routing entry for RoutingMgr@local}
08/03 12:38:04 Cell(RoutingMgr@dCacheDomain) : Couldn't add wellknown route : java.lang.IllegalArgumentException: Duplicated Entry
08/03 12:38:09 Cell(RoutingMgr@dCacheDomain) : update can't send update to RoutingMgr{uoid=<1123072689884:139>;path=[>RoutingMgr@local];msg=Missing routing entry for RoutingMgr@local}
08/03 12:43:56 Cell(PoolManager@dCacheDomain) : 00010000000000000004F878 : Configuration Error : No entries in Permission Matrix for this request
08/03 12:43:56 Cell(PoolManager@dCacheDomain) : 00010000000000000004F878 : Configuration Error : No entries in Permission Matrix for this request
08/03 12:59:29 Cell(PoolManager@dCacheDomain) : 00010000000000000004F878 : Configuration Error : No entries in Permission Matrix for this request
08/03 13:14:29 Cell(PoolManager@dCacheDomain) : 00010000000000000004F878 : Configuration Error : No entries in Permission Matrix for this request
Has anyone experienced this problem before, or does anyone know of a way
to sort out the pnfs database? Or have I forgotten to set something up on
the admin node? It may be that I have to wipe the database and start
again...
Thanks,
Greig
--
=======================================================================
Dr Greig A Cowan http://www.ph.ed.ac.uk/~gcowan1
School of Physics, University of Edinburgh, James Clerk Maxwell Building
DCACHE PAGES: http://www.gridpp.ac.uk/deployment/admin/dcache/index.html
=======================================================================