What happens when you grep for "subject" in [batchdir]/*/glide*/execute/dir*/_condor_stdout?

--
On 19 September 2014 15:40, John Hill wrote:
Hi Winnie,
You don't mention which CE you have. On my CREAM CE, /var/log/messages shows the mapping from DN to local user.
John
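John's pointer can be scripted. Below is a minimal Python sketch that pulls candidate DN-to-account lines out of a syslog-style file. It assumes that each mapping is logged on a single line containing both the pool account (e.g. cmspil004) and the DN, and it detects the DN by a "/CN=" fragment. The exact log format isn't shown in this thread, so the `dn_marker` default, the function name and the script name are hypothetical, and reading /var/log/messages normally needs root.

```python
import re
import sys


def mapping_lines(lines, account, dn_marker="/CN="):
    """Return log lines that mention the pool account and look DN-bearing.

    Assumption (hypothetical format): a line recording a DN-to-account
    mapping contains both the local account name, matched as a whole word so
    that "cmspil004" does not also match "cmspil0041", and a DN fragment
    such as "/CN=".
    """
    account_re = re.compile(r"\b" + re.escape(account) + r"\b")
    return [line.rstrip("\n") for line in lines
            if dn_marker in line and account_re.search(line)]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python find_dn.py /var/log/messages cmspil004
    log_path, account = sys.argv[1], sys.argv[2]
    with open(log_path, errors="replace") as log:
        for line in mapping_lines(log, account):
            print(line)
```

Matching on the account name and a DN fragment keeps the filter independent of the precise message wording, at the cost of possibly returning a few unrelated lines to eyeball.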
On 19/09/2014 15:19, Winnie Lacesso wrote:
Happy Friday!
About a quarter of Bristol's WNs kernel-panicked today, & it looks like the
culprits are user jobs (my guess: overloading or other badness on the WN).
The kernel panic mentions gfortran, & the load on the 8-core WNs hits about
29 before they go down. The jobs come in via cmspil004.
Some cmspil004 jobs are still running, but we can't find the real CMS user
DN in any cmspil004 working dir, & we're usually pretty good at that (I can
do it for LHCb pilot jobs with no problem).
We emailed some CMS contacts, & they said the real user DN must be in the
glexec logs. On the WN, the /var/log/glexec/* files are all empty. On the
CREAM CE, /var/log/glexec does not exist, &
/var/blah/user_blah_job_registry.bjr/registry.proxydir points to pilot
proxies; again, no real user DN info.
I'm not very familiar with the Argus server & don't see anything in
/var/log/argus/* that looks like it contains DNs. But those logs must be
*somewhere* on it...?
So we should be able to use the glexec info to trace a pilot job from its
arrival at the CE to the WN & find the DN of the real user's job. The pool
account cmspil004 can run jobs for many CMS users; we just want to identify
this one...
Is there some (ideally clear & easy) guidance out there on how to do this?
I've been away from LCG support for 2 years, so I may have missed it if
it's well known somewhere.
Winnie Lacesso / Bristol University Particle Physics Computing Systems
HH Wills Physics Laboratory, Tyndall Avenue, Bristol, BS8 1TL, UK
Sent from the pit of despair
-----------------------------------------------------------
HEP Group/Physics Dep
Imperial College
London, SW7 2BW
Tel: +44-(0)20-75947810
http://www.hep.ph.ic.ac.uk/~dbauer/
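The grep suggested at the top of this reply can also be run as a short Python sketch, which reports the file each hit came from. It assumes you substitute your batch system's job directory for [batchdir] (the reply leaves it unspecified) and, like plain grep, matches "subject" case-sensitively; whether the payload DN actually appears on those lines is exactly what the grep is meant to find out. The function and script names are hypothetical.

```python
import glob
import os
import sys


def grep_subject(batchdir, needle="subject"):
    """Return (path, line) pairs for every line containing `needle` in each
    glidein's _condor_stdout, mirroring
        grep subject [batchdir]/*/glide*/execute/dir*/_condor_stdout
    """
    pattern = os.path.join(batchdir, "*", "glide*", "execute", "dir*",
                           "_condor_stdout")
    hits = []
    for path in sorted(glob.glob(pattern)):
        # errors="replace" so a stray non-UTF-8 byte doesn't abort the scan
        with open(path, errors="replace") as handle:
            for line in handle:
                if needle in line:
                    hits.append((path, line.rstrip("\n")))
    return hits


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python grep_subject.py /path/to/batchdir
    for path, line in grep_subject(sys.argv[1]):
        print(f"{path}: {line}")
```

Keeping the path alongside each hit makes it easy to tie a DN back to the specific glidein (and hence WN slot) that ran it.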