Hi Simon,
> 1) Hopefully this is the easy one: does someone know why RHUL is
> reporting zero disk space in our DPM on
> http://www.gridpp.ac.uk/storage/status/gridppDiscStatus.html
> - is it just because our site is in down time at the moment?
Yep, your DPM is definitely reporting 0 used/available space:
$ ldapsearch -x -H ldap://ce1.pp.rhul.ac.uk:2170 \
    -b mds-vo-name=UKI-LT2-RHUL,o=grid | grep Space
GlueSAStateAvailableSpace: 0
GlueSAStateUsedSpace: 0
GlueSAStateAvailableSpace: 0
....
I just tried srmcp'ing a file into your DPM to check its functionality,
and it is reporting that there is no space left:
Wed Oct 18 18:21:56 BST 2006: rs.state = Failed rs.error = No space left
on device
Wed Oct 18 18:21:56 BST 2006: ====> fileStatus state ==Failed
java.io.IOException: rs.state = Failed rs.error = No space left on device
Could you run dpm-qryconf on the head node and send me the output? If there
are a lot of dteam files in your DPM, we should consider deleting them.
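For reference, something along these lines is what I'd run. The hostname and
paths are illustrative (I'm guessing at your head node name, so adjust for
your site), and you need the DPM client tools plus a valid proxy or host
credentials:

```shell
# Illustrative sketch only: se1.pp.rhul.ac.uk is a hypothetical head node
# name, and the file path is an example. Adjust both for your site.
export DPNS_HOST=se1.pp.rhul.ac.uk
export DPM_HOST=se1.pp.rhul.ac.uk

dpm-qryconf                                # pools, filesystems, capacity/free space
dpns-ls -l /dpm/pp.rhul.ac.uk/home/dteam   # list dteam files in the namespace
rfrm /dpm/pp.rhul.ac.uk/home/dteam/FILE    # remove a given file via the DPM
```

The rfrm there goes through the DPM, so it should clean up both the replica
on the pool node and the namespace entry, but do sanity-check the dpns-ls
listing before deleting anything.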
> 2) The plans we have based on our SRIF3 budget and GridPP3 mean that we
> will have to design, procure, deploy and manage a SRM that rises to
> about 260 TB in the next few years (to get the 1:2 ratio with 500kSI2k).
>
> I notice from the disc status page that the largest DPM is currently 18
> TB (at QMUL). This is a massive step up in size and I am pretty sure
> that our current setup does not scale to that size, in terms of
> availability and maintenance effort if not performance.
>
> Based on our current system I could buy 38 7TB (net) RAID6 systems to
> set up as DPM pool nodes. Has anyone got any better advice? I was
> wondering about things like SAN storage or multi-way SCSI disk systems
> hooked up to redundant pool nodes, redundant head nodes, etc. but I have
> no experience of this sort of stuff - neither performance, cost,
> practicalities, etc. Is DPM still the right SRM for this scale?
> Anyone found a vendor who can provide a complete solution for something
> like this?
There's no intrinsic limit on the amount of storage that DPM can support.
That said, I've no idea (and I'm not sure anyone else does) how it will
scale to the amount of storage that you are talking about. I would like to
be able to do some testing of such a system; unfortunately, I've not got a
spare 200TB lying around!
dCache has generally been used at sites with larger amounts of storage.
For example, at FNAL it operates with more than 300TB of disk and
continues to grow. Of course, that is a large site with lots of admin
manpower, and it is where some of the dCache developers are based, so as
you can imagine the level of available support is rather higher than at a
Tier-2.
Regarding the hardware to use, I don't think anyone has all the answers
right now. At Edinburgh we plan to tap into the university-run SAN,
attaching to it via Fibre Channel. However, this is only for a few tens of
TB, not the hundreds that you are talking about.
> I imagine a lot of people will be scaling up a similar amount over the
> next few years so hopefully this is a question of general interest.
Yes, it definitely is. Thanks for raising it.
Cheers,
Greig