Sites that have only one type of CPU should be able to get identical
results just by scaling the APEL numbers; it would be good to do this
as a comparison. Check njobs, sumcpu and elapsedcpu.
One issue is that APEL only matches grid jobs, so a site that has local
users in the same groups will obviously see different results. If your
local users are in different groups, then you would expect the APEL
results to agree with the script's.
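For a single-CPU-type site, the scaling described here is just a
multiplication. A minimal sketch of the comparison, assuming sumcpu from
APEL is raw CPU hours and the site has a single per-core HEPSPEC score
(both figures below are made-up placeholders, and the actual APEL units
should be checked before relying on this):

    # Hypothetical comparison of APEL figures against the script's output
    # for a site with one CPU type. All numbers are made-up placeholders.
    apel_sumcpu_hours = 120000.0     # assumed: APEL sumcpu for the period, in CPU hours
    hepspec_per_core = 8.0           # assumed: the site's single per-core HEPSPEC score

    apel_hepspec_hours = apel_sumcpu_hours * hepspec_per_core

    script_hepspec_hours = 950000.0  # what ./account.py reports for the grid groups only
    ratio = script_hepspec_hours / apel_hepspec_hours
    print("script/APEL ratio (expect ~1 for grid-only groups): %.2f" % ratio)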
John
-----Original Message-----
From: Testbed Support for GridPP member institutes
[mailto:[log in to unmask]] On Behalf Of Mike Kenyon
Sent: 05 June 2009 13:02
To: [log in to unmask]
Subject: PBS accounting script
Hi All,
As requested, I've knocked together a script that goes through the PBS
logs and calculates the CPU (or wall) time delivered per group in
HEPSPEC-hours, where "group" in this context is a Unix group such as
atlas or atlasprd. If TB-support allows attachments, the file's
attached here. Yeah, it's hacky, but it grew up from a
grep-awk-sed one-liner.
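The attached account.py is the real thing; purely to illustrate the kind
of parsing involved, here is a minimal sketch (not the attached script)
that sums raw CPU hours per Unix group from Torque/PBS accounting files,
assuming the usual semicolon-separated 'E' (job end) record format:

    # Minimal sketch, NOT the attached account.py: sum raw CPU hours per
    # Unix group from Torque/PBS accounting files, assuming records like
    #   "MM/DD/YYYY hh:mm:ss;E;jobid;user=... group=... resources_used.cput=HH:MM:SS ..."
    import os

    def hms_to_hours(hms):
        # Convert an HH:MM:SS string to hours.
        h, m, s = [int(x) for x in hms.split(":")]
        return h + m / 60.0 + s / 3600.0

    def cpu_hours_per_group(logdir, filenames):
        totals = {}
        for name in filenames:              # e.g. ["20081001", "20081002", ...]
            for line in open(os.path.join(logdir, name)):
                parts = line.strip().split(";")
                if len(parts) < 4 or parts[1] != "E":   # only completed jobs
                    continue
                fields = dict(kv.split("=", 1) for kv in parts[3].split() if "=" in kv)
                group = fields.get("group")
                cput = fields.get("resources_used.cput")
                if group and cput:
                    totals[group] = totals.get(group, 0.0) + hms_to_hours(cput)
        return totals

With the -w flag the same loop would presumably read
resources_used.walltime instead of resources_used.cput.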
Running the script is fairly easy (at Glasgow ;-). For example, to get
the CPU delivered (in HEPSPEC-hours and hours) by all jobs since Oct
1st 2008, I run

./account.py -f 20081001 -l 20090604

To get the walltime instead, I'd add the -w flag.
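For anyone adapting the idea, a minimal sketch of how the -f/-l/-w
options and the YYYYMMDD-named log files could be tied together (the
option letters come from the email; everything else, including the use
of optparse, is an assumption about the implementation):

    # Sketch only: map -f/-l date arguments onto YYYYMMDD log file names.
    import datetime
    import optparse

    def log_names(first, last):
        # Yield YYYYMMDD filenames from 'first' to 'last' inclusive.
        day = datetime.datetime.strptime(first, "%Y%m%d").date()
        stop = datetime.datetime.strptime(last, "%Y%m%d").date()
        while day <= stop:
            yield day.strftime("%Y%m%d")
            day += datetime.timedelta(days=1)

    parser = optparse.OptionParser()
    parser.add_option("-f", dest="first", help="first log date, YYYYMMDD")
    parser.add_option("-l", dest="last", help="last log date, YYYYMMDD")
    parser.add_option("-w", dest="wall", action="store_true", default=False,
                      help="report walltime instead of CPU time")
    opts, args = parser.parse_args()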
There's a --help option available and the code's fairly well
commented. Officially this is unsupported, so, in the words of our T2
coordinator: "if it breaks, you get to keep both parts".
The following assumptions are made:

- Your logs live at /var/spool/pbs/server_priv/accounting/ (see --help)
  and are named YYYYMMDD.
- Your worker nodes are named nodeXXX (hack if they aren't).
- You have up to two sub-clusters with different HEPSPEC scores. If you
  don't have sub-clusters, and all nodes have the same HEPSPEC score,
  set hepSpec1core and hepSpec2core to be equal in the code. If you have
  more than two sub-clusters, you'll have to do deeper fiddling. (A
  sketch of one way such a per-node lookup could work follows below.)
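A minimal sketch of a two-sub-cluster lookup; the node-number cutoff and
both HEPSPEC values are made-up placeholders, and the real account.py may
well decide this differently:

    # Hypothetical per-node HEPSPEC lookup for up to two sub-clusters.
    # The cutoff (node number 100) and both scores are placeholders.
    hepSpec1core = 8.0    # per-core HEPSPEC of the first sub-cluster
    hepSpec2core = 10.0   # per-core HEPSPEC of the second sub-cluster
                          # (set equal to hepSpec1core if all nodes are the same)

    def hepspec_for(node_name):
        # Return the per-core HEPSPEC score for a worker node named nodeXXX.
        number = int(node_name.replace("node", ""))
        if number < 100:
            return hepSpec1core
        return hepSpec2core

    def hepspec_hours(cpu_hours, node_name):
        # HEPSPEC-hours for one job: CPU hours scaled by the score of its node.
        return cpu_hours * hepspec_for(node_name)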
If there are any major bugs/problems, let me know...otherwise, enjoy.
Cheers,
Mike.