Hello all,
Now that the dust has settled from our upgrade I thought I'd share our
experiences at Lancaster.
Overview:
At first I planned to do the upgrade without yaim, by upgrading the
RPMs and then running the install scripts. But after missing the
e-mail concerning the missing install script I switched to the yaim
method to configure my dCache, as I was fast running out of downtime.
We used the instructions found at:
http://www.dcache.org/manuals/dcacheUpgrade_1_6-1_7.shtml
Pre-Upgrade Steps:
I had some difficulty with the pre-upgrade steps of dropping some of
the old databases and recreating them. The problems were due to not
all of the databases being owned by the srmdcache postgres role, and
the srmdcache role itself not actually having the ability to create
databases. This was fixed quickly with a peek in the postgres manual,
and is something other sites with younger, fresher dCaches than mine
probably won't have to worry about.
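For anyone hitting the same postgres snags, both fixes boil down to a
psql one-liner apiece. The role name below is the srmdcache one from
this mail; the database name is just an example, so substitute your
own, and run both as the postgres superuser:

```shell
# Let the srmdcache role create databases, so the drop/recreate step
# in the pre-upgrade instructions can run as srmdcache.
psql -U postgres -c "ALTER ROLE srmdcache CREATEDB;"

# Hand ownership of a stray database back to srmdcache ("dcache" here
# is an example name, not one taken from our setup).
psql -U postgres -c "ALTER DATABASE dcache OWNER TO srmdcache;"
```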
Upgrade:
Went really smoothly, no problems. As I said, we upgraded using the
"rpm method". RPM kindly backed up our PoolManager.conf, something I
was very thankful for.
Post-Upgrade:
Switched to using yaim. We needed to upgrade to the latest yaim
version and edit our site-info.def, but this stage also went without
problems.
Post-Post-Upgrade Notes:
After the upgrade we had some troubles. Firstly, we ran afoul of a
change in the ports dCache uses, and we had to open ports 50000-52000
on all our nodes to get our dCache to work.
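For reference, this is roughly what opening the range looked like for
us (iptables syntax; chain names and how you persist rules will
depend on your local firewall setup):

```shell
# Accept inbound TCP on the dCache port range 50000-52000
iptables -A INPUT -p tcp --dport 50000:52000 -j ACCEPT

# Persist the rule across reboots on RHEL/SL-style boxes
service iptables save
```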
Secondly, we had a problem whereby apt auto-updated dCache up a patch
version (from 1.7-16 to 1.7-17) and broke the server (Java-based
problems). This was fixed by stopping dCache, rerunning the yaim
configure, then restarting dCache. Owen suggested a faster way of
fixing this problem: just running the dCache install.sh instead of a
full yaim configure.
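To stop apt pulling in the next patch release behind your back, you
can pin the dcache packages. The stanza below is Debian-style
/etc/apt/preferences syntax and assumes the package is called
dcache-server, so treat it as a sketch and check the real package
name with apt-cache policy first; apt-rpm setups may want an
equivalent hold entry instead:

```shell
# /etc/apt/preferences -- pin dcache-server at the 1.7-16 patch level
# (package name and version string are assumptions; verify locally)
Package: dcache-server
Pin: version 1.7-16*
Pin-Priority: 1001
```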
Thirdly, the upgrade wipes your custom PoolManager.conf. I simply put
my old one back in place and reloaded it in the PoolManager admin
cell. dCache didn't seem to like this too much, and barfed, but after
restarting everything seems fine.
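In case it saves anyone a hunt, restoring the file and poking the
PoolManager went roughly like this. The paths and command names are
from memory of our 1.7-era layout and admin shell, so run `help`
inside the cell and check your own config directory before trusting
them:

```shell
# Put the backed-up setup file back (path is from our layout; adjust)
cp PoolManager.conf.rpmsave /opt/d-cache/config/PoolManager.conf

# The old admin interface speaks SSH protocol 1 on port 22223
ssh -1 -c blowfish -p 22223 admin@<admin-node>
# ...then, inside the admin shell:
#   cd PoolManager
#   reload -yes      # re-read PoolManager.conf from disk
```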
With the upgrade, each pool node now has a DCAP door as well as a
gridftp door on it. As I can't really see any advantage to this, I'm
probably going to close the DCAP door on the pool nodes.
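If anyone else wants to do the same, the door selection for a node
lives in node_config (at least in our 1.7 layout). The variable names
below are from the 1.7-era template, so double-check them against
your own file before editing:

```shell
# /opt/d-cache/etc/node_config on each pool node
DCAP=no       # don't start a dcap door here
GSIFTP=yes    # keep the gridftp door
```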
Finally, we Lancastrians don't appear to have seen the last of the
CLOSE_WAIT problem: we failed RM tests from 4am until 10am this
morning, when I restarted all the doors. My test transfers all met
with the "end of file" error. I've upped the number of gridftp logins
for each door (it was set at 100, I added an extra 0 to that);
hopefully that will alleviate the problem.
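For the record, the knob I changed was the max-logins setting in
dCacheSetup on each door node. The parameter name below is what our
1.7 install used; verify it against your own file before copying:

```shell
# /opt/d-cache/config/dCacheSetup on each door node
gsiftpMaxLogin=1000   # was 100; restart the doors afterwards
```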
Hope that's useful to you chaps,
cheers,
Matt