> -----Original Message-----
> From: Testbed Support for GridPP member institutes On Behalf Of Alastair Dewhurst
>
> I was told there was a request at the dteam meeting for an update on the
> progress of the ARC CE + Condor testing at RAL.
>
At the risk of this being a bit late now, what I'd like to
hear a bit more about is what happened to SLURM. Andrew's
talk at HepSysMan referred to it having scaling problems at
about 6000 jobs on 100 nodes, but the LLNL page on SLURM
describes it running on clusters with well over a hundred
thousand cores, and with very high job start rates.
There are some documented 'tweaks' needed to get SLURM to run
on very large systems; assuming those were done, it seems very
strange that RAL hit scaling problems on a system that's so much
smaller.
Ewan