> 3. dCache experiences (mainly Tier 1?) --
> stability/performance problems?
I may not be at tomorrow's phone conference, depending on taxi pick-up time for a
trip to CERN, so here are some notes regarding this item.
- We have found that dCache itself has been very stable. Would that we could say the same
about NFS. Unfortunately we chose to provision our dCache pool nodes via NFS rather
than native dCache (to decouple dCache development from disk-server O/S issues). Would
that we had not: we have suffered many NFS hangs on SL3, taking out a number of services
including the proto-dCache. Derek moved to native dCache on the disk servers last Friday
and all looks to be stable now. We will form a more measured view if we run successfully
for at least a week.
- Regarding performance: we are seeing pretty good performance for our friends from CMS.
dCache is sustaining 200 Mb/s (that's about 25 MB/s) over many hours for file replication
from CERN. This is as good as CMS are seeing anywhere else. From our analysis it looks
like there is substantial headroom in our network infrastructure/disk servers. At present
we only have 3 deployed for CMS and they tend to be idle much of the time. We don't know
what limits CMS, but suspect the catalogue manipulation at the CERN end, or maybe a
combination of manipulations in many places. I suspect the RAL dCache has plenty of
headroom for other experiments; time will tell.
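The rate conversion quoted above can be checked with a quick back-of-envelope calculation (a sketch assuming decimal units, i.e. 1 Mb = 10^6 bits):

```python
# Back-of-envelope check of the quoted CMS replication rate.
# Assumes decimal units (1 Mb = 10**6 bits, 1 MB = 10**6 bytes) -
# an illustration, not a measured figure.
rate_mbit_s = 200               # sustained rate from CERN, in megabits/s
rate_mbyte_s = rate_mbit_s / 8  # 8 bits per byte
print(rate_mbyte_s)             # 25.0 MB/s, matching the figure above
```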
We are rather worried about the dCache head node/SRM. During CMS production it is running
at at least 50% CPU; manipulation of the SRM state database on Postgres seems very heavy,
and maybe we have some tuning to do. This is potentially a problem if the transaction
rate on the SRM increases. Derek is in the process of splitting off the Postgres
database onto a standalone node with access via a network client.
- Jens has made some measurements for smaller files, which maybe he will circulate.
I don't have exact details of conditions/performance, but
I note that those tests were done into a single disk pool on the head node rather than
onto one of the big disk arrays. Given our discovery of how busy the head node is,
hitting it with files will not help its performance.
- We will deploy another instance of dCache for the March service challenge. This will
have up to 16 GridFTP transfer nodes and 4 disk servers (8 RAID arrays). We are aiming
to peak at 2 Gbit/s and sustain 1 Gbit/s for 2 weeks.
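For scale, the sustained-rate target above implies roughly the following total volume (a rough estimate assuming decimal units and the full two-week window):

```python
# Rough data volume implied by the service-challenge target of
# sustaining 1 Gbit/s for 2 weeks (decimal units assumed;
# illustrative arithmetic only, not a committed figure).
seconds = 14 * 24 * 3600            # two weeks in seconds
total_bytes = 1e9 / 8 * seconds     # bytes at 1 Gbit/s
print(total_bytes / 1e12)           # about 151 TB transferred
```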
Regards
Andrew