Frankly I think this method is no different from what APEL is currently
supplying. Sorry Andrew, this was also my objection to this model. As
far as I'm concerned, APEL is also better than the raw unweighted CPU
hours that are currently used in Steve's tables.
If we are going through the pain of changing from APEL, mapping the cpu
model to HS06 for each job is going to be possible only if some Grand
ATLAS Master extracts the information from the PanDA database, because
at the moment this information is not in the historical dashboard that
Steve wanted to use, so it is going to be a bit more complicated to set
up. If ATLAS supplies site_name, cpu_id, job_type and cpu_hours, we can
supply the HS06 per cpu_id to feed into a script that builds Steve's
metric table.
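To make the idea concrete, here is a minimal sketch of what such a script could look like. All field names and numbers are hypothetical (I'm assuming the four columns mentioned above plus a site-supplied HS06 table); this is not an existing ATLAS tool.

```python
# Hypothetical per-cpu_id HS06 ratings, as the sites would supply them.
# The numbers are illustrative, not real benchmark results.
hs06_per_cpu = {
    "E5-2650v2": 10.6,
    "E5-2670": 11.2,
}

# Job records in the shape ATLAS would extract from the PanDA database:
# site_name, cpu_id, job_type, cpu_hours.
jobs = [
    {"site_name": "UKI-MAN", "cpu_id": "E5-2650v2",
     "job_type": "simul", "cpu_hours": 12.0},
    {"site_name": "UKI-QMUL", "cpu_id": "E5-2670",
     "job_type": "reco", "cpu_hours": 8.0},
]

def hs06_hours(job):
    """HS06-weighted hours for one job record."""
    return job["cpu_hours"] * hs06_per_cpu[job["cpu_id"]]

# Aggregate per site, as a metric table would.
table = {}
for job in jobs:
    table[job["site_name"]] = table.get(job["site_name"], 0.0) + hs06_hours(job)
```

The point is just that the join is trivial once the four columns exist; the hard part is extracting them from PanDA in the first place.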
The problem of hyperthreading remains. At the moment Manchester and QMUL
are definitely losing out when the cluster is half full, which is often.
The HS06 number varies between a maximum and a minimum; the simplest
approach could be to use, for each job, the average between the max
HepSpec (corresponding to the node being up to half full) and the min
value once HT kicks in. Of course the pilots could also count how many
jobs are concurrently running on the node they land on, and we could add
that as an additional column, but I don't see how that is going to
happen.
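For clarity, the averaging idea above amounts to no more than this (the per-slot numbers are made up for illustration; real values would come from benchmarking each node type half full and fully loaded):

```python
def effective_hs06(hs06_max, hs06_min):
    """Average of the two regimes: hs06_max is the per-slot rating with
    the node up to half full (one job per physical core), hs06_min is
    the per-slot rating once hyperthreaded slots are in use."""
    return (hs06_max + hs06_min) / 2.0

# Example: a node rated 12 HS06/slot half full and 8 HS06/slot full
# would be accounted at 10 HS06/slot for every job.
per_slot = effective_hs06(12.0, 8.0)
```

It's crude, since it credits the same number regardless of the actual load, but it avoids needing the concurrent-job count from the pilots.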
cheers
alessandra