Hello Ron,
At CERN we've had a small fraction of our batch nodes with 48 cores for
over a year now.
Other people have already mentioned some aspects you need to take into
account; here are our observations on those points:
- Greater potential for interference between jobs: we haven't seen more
problems on these nodes than on any others, though I think we do profit
from having a very good mix of different jobs in the system.
- Disk performance: based on what we saw on these nodes, we did a small
performance study over our most popular HW configurations at the time.
We found that 2 random-IOPS/HEPSPEC06 (*) is the minimum we should aim
at in new nodes. Below that the machines get very sluggish (we start to
see increased queue sizes and wait times in the access to disk, as
displayed by iostat); see the sizing example just after this list.
- Network performance: a single 1 Gb/s connection seems to be enough
(though I didn't check extensively).
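To make the sizing rule concrete (the 400 HEPSPEC06 rating below is an
illustrative figure, not one of our actual configurations): a WN rated
at 400 HEPSPEC06 should sustain at least 400 x 2 = 800 random IOPS on
its local scratch, under the fio workload described in the footnote
below. When a node falls short of that, you can watch it happen with

  iostat -dx 5

where growing avgqu-sz (queue size) and await (wait time) figures on
the scratch drives are exactly the symptoms I mentioned above.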
Overall I think they are a very good solution, particularly when you are
mixing different types of jobs, as long as the disk I/O is sufficient.
Cheers,
Ricardo
(*): By random-IOPS I mean: 4k random r/w, (nr SMT cores)/2 jobs, queue
depth 24, a 10 GB file per job, as measured with fio. We use local
scratch space only, with RAID0 across all drives in the system.
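In case it helps, here is a sketch of a fio job file matching those
parameters, for a hypothetical node with 48 SMT cores (so 24 jobs); the
libaio engine, the direct I/O setting and the /scratch path are
assumptions you would adapt to your own setup:

  [global]
  # 4k random read/write mix, bypassing the page cache
  rw=randrw
  bs=4k
  direct=1
  # Linux asynchronous I/O engine
  ioengine=libaio
  # queue depth 24, 10 GB file per job
  iodepth=24
  size=10g
  # local RAID0 scratch area (hypothetical mount point)
  directory=/scratch
  # aggregate the results over all jobs
  group_reporting

  [randrw-scratch]
  # (nr SMT cores)/2 -> 24 on a 48-thread node
  numjobs=24

Run it as 'fio jobfile.fio' and read the aggregate read/write IOPS off
the summary output.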
On 7/10/12 11:53 AM, Gila Arrondo Miguel Angel wrote:
> Hi Ron,
>
> Instead of getting machines with tons of disks and tons of cores, have
> you thought about getting machines with tons of cores and 1 or 2 disks
> plus a shared scratch area?
>
> Here at CSCS we use GPFS for scratch, and we get performance figures
> well beyond the aggregate of all our systems that use local disks.
> Previously we used Lustre; although its reliability is not as good,
> performance was in the same range. GPFS is a paid product, Lustre is
> free.
>
> Our WNs are 32-core systems with 64 GB RAM and 2 disks for the OS and
> CVMFS.
>
> My 2 cents :-)
>
> Miguel
>
> On 7/9/12 5:41 PM, "Andreas Gellrich" wrote:
>
>> Hi Ron,
>> We successfully run a mixture of WN hardware at DESY-HH, ranging from
>> 8-core/8-slot hosts, through 8-core/16-slot hosts with hyperthreading,
>> to 48-core/48-slot hosts.
>>
>> In our experience, it is crucial to intelligently distribute jobs over
>> the hardware in order to make optimal use of the resources.
>>
>> You might be interested in our contribution to CHEP2012 containing the
>> poster and the paper:
>>
>> https://indico.cern.ch/contributionDisplay.py?contribId=290&sessionId=8&confId=149557
>>
>> Cheers
>> Andreas Gellrich
>>
>> On Mon, 9 Jul 2012, Ron Trompert wrote:
>>
>>>
>>> Hi All,
>>>
>>> We are currently looking at buying new WNs for our compute cluster. At
>>> present we have a lot of 8-12 core blade servers, and we are now
>>> considering machines with plenty of disk and memory and a lot of cores,
>>> something like 48-64.
>>>
>>> Has anyone got experience with running this kind of node in
>>> production? If so, what are your experiences with them? And how do HEP
>>> jobs run on them in comparison with other hardware you may have?
>>>
>>> Cheers and thanks in advance,
>>>
>>> Ron
>>>
>>
>> # Andreas Gellrich
>> # DESY IT / Grid Computing
>> # 2b/317, Notkestr. 85, D-22607 Hamburg, +49 40 8998 2732