Dear Users,
I’m very pleased to announce that the SCARF18 CPU procurement is now available to all users. It has Intel Xeon Gold 6126 2.60GHz CPUs, giving 3552 physical cores across 148 hosts. Each host has a 10Gb/s Ethernet connection, a 100Gb/s EDR InfiniBand connection and 192GB of memory (8GB per physical core).
However, because of increasing licensing costs due to the rapid expansion of SCARF in the past few years, we have made the difficult decision to migrate from our existing batch system software, LSF, to SLURM. While SLURM is relatively new, it is already in use by a significant number of large HPC installations worldwide.
As such, SCARF18 is the first set of nodes to be made available via the SLURM batch system. We will be migrating the existing LSF SCARF nodes to the SLURM service over the next few months, and in doing so will also move them to RHEL7.
Other than the small changes made to implement SLURM, the payload on the SCARF18 nodes is identical to that of our existing RHEL7 service (the scarf-rhel7 queue); we therefore would not expect binaries to need recompilation if they have already been tested on the existing RHEL7 service.
We have some instructions on how to use SLURM available at http://www.scarf.rl.ac.uk/slurm, and the SLURM client commands are available on the SCARF frontends scarf.rl.ac.uk and ui3.scarf.rl.ac.uk, and on the test RHEL7 user interface scarf-testui.rl.ac.uk. Unfortunately the online portal cannot be used to submit jobs to the new service, as it is tightly coupled with LSF. We are investigating options for providing a new web-based portal to replace the existing one and hope to make something available soon.
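For those new to SLURM, a minimal batch job script might look something like the sketch below. This is only an illustration using standard SLURM options; the resource requests, file names and the application launched (my_application) are placeholders, so please check the documentation above for the settings recommended on SCARF.

    #!/bin/bash
    #SBATCH --job-name=my-test-job     # name shown by squeue
    #SBATCH --partition=scarf          # default partition (matches the LSF scarf queue)
    #SBATCH --ntasks=24                # number of tasks (one SCARF18 host has 24 physical cores)
    #SBATCH --time=01:00:00            # wall-clock limit, hh:mm:ss
    #SBATCH --output=%x-%j.out         # output file (%x = job name, %j = job ID)

    # Load your application environment here, then launch the job step:
    srun ./my_application

The script would then be submitted with "sbatch myjob.sh" and monitored with "squeue -u $USER".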
We have also added two new partitions (SLURM’s equivalent of queues) in addition to the default scarf partition (which is configured to match the LSF scarf queue). The devel partition is intended for interactive development work; it enforces exclusive use of the allocated nodes and a 12 hour maximum runtime. The preemptable partition is for lower-priority, re-runnable work: jobs running in this partition may be cancelled if jobs submitted to other partitions need their resources, so it is only suitable for jobs which checkpoint frequently or which you are prepared to resubmit. In return, jobs in this partition count less towards fairshare usage and are able to use any available hardware in SLURM. We hope you find these new partitions useful.
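To illustrate how the new partitions might be used (again, only a sketch using standard SLURM options; the limits are those described above), an interactive development session and a preemptable batch job could be requested along these lines:

    # Interactive session on the devel partition: exclusive node, up to 12 hours
    srun --partition=devel --exclusive --time=12:00:00 --pty /bin/bash

    # Lower-priority, re-runnable batch job on the preemptable partition;
    # --requeue asks SLURM to put the job back in the queue if it is preempted
    sbatch --partition=preemptable --requeue myjob.sh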
Feedback on any aspect of the SLURM service or the documentation would be greatly appreciated. If you have any other questions or concerns, please reply to this e-mail and we will get back to you.
Many Thanks,
Derek Ross
—
SCARF Service Manager