Incidentally, Bristol isn't the only site in this situation (the overloaded machines, that is), though you need to be a CMS member to read this thread:
https://hypernews.cern.ch/HyperNews/CMS/get/t2/1095.html


On 19 September 2014 16:02, Daniela Bauer <[log in to unmask]> wrote:
What happens when you grep for "subject" in [batchdir]/*/glide*/execute/dir*/_condor_stdout ?
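
For anyone who wants to script that check, here is a minimal Python sketch of the same search. The function name `find_subject_lines` is hypothetical, and `batchdir` stands in for the [batchdir] placeholder above, whose real location depends on the site. The match is a plain case-sensitive substring test, like grep without options.

```python
import glob
import os


def find_subject_lines(batchdir, needle="subject"):
    """Return (path, line) pairs for lines containing `needle`.

    Searches batchdir/*/glide*/execute/dir*/_condor_stdout, i.e. the
    same path pattern as the grep suggested in the thread.
    """
    pattern = os.path.join(
        batchdir, "*", "glide*", "execute", "dir*", "_condor_stdout"
    )
    hits = []
    for path in sorted(glob.glob(pattern)):
        # errors="replace" keeps the scan going if a job wrote non-UTF-8 bytes.
        with open(path, errors="replace") as fh:
            for line in fh:
                if needle in line:
                    hits.append((path, line.rstrip("\n")))
    return hits
```

Calling `find_subject_lines(batchdir)` on a WN and printing each pair gives the file plus the matching line, which is useful when several pilots share the same pool account.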


On 19 September 2014 15:40, John Hill <[log in to unmask]> wrote:
Hi Winnie,
   You don't mention which CE you have. On my CREAM CE, /var/log/messages shows the mapping from DN to local user.

John
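
A rough Python sketch of that lookup, under stated assumptions: the exact format of the mapping lines in /var/log/messages is not given in this thread, so the code only filters lines that mention the pool account and leaves reading off the DN to the admin. The function name `lines_mentioning` is hypothetical.

```python
def lines_mentioning(log_path, account):
    """Return every line of log_path that contains the account name.

    Assumption: mapping entries mention the local user name verbatim.
    No particular log format is assumed beyond that.
    """
    matches = []
    # errors="replace" avoids failing on stray non-UTF-8 bytes in the log.
    with open(log_path, errors="replace") as fh:
        for line in fh:
            if account in line:
                matches.append(line.rstrip("\n"))
    return matches
```

On the CE this would be called as `lines_mentioning("/var/log/messages", "cmspil004")`, most likely as root since the file is not usually world-readable.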


On 19/09/2014 15:19, Winnie Lacesso wrote:
Happy Friday!

About 1/4 of Bristol's WNs kernel panicked today & it looks like the
culprits are user jobs (at a guess, overloading or other badness on the
WN): the kernel panic mentions gfortran & the 8-core WN load hits about 29
before it bails. The jobs are via cmspil004.

Some cmspil004 jobs are still running & we seem unable in any cmspil004
working dir to find the real CMS user DN, & we're usually pretty good at
being able to do that (I can for lhcb pilot jobs no problem).

We emailed some CMS contacts & they said the real user DN must be in the
glexec logs. On the WN, /var/log/glexec/* files are all empty. On the
CREAM CE /var/log/glexec does not exist &
/var/blah/user_blah_job_registry.bjr/registry.proxydir points to pilot
proxies - again no real user DN info.

I'm not very familiar with the argus server & don't see any logs in
/var/log/argus/* that look like they contain DNs. But said logs must be
*somewhere* on it.....?

So we should be able to trace via glexec info from the pilot job arriving
at CE, to WN, & find out the DN of a real user's job; the pool account
cmspil004 can run jobs for many CMS users, we just want to identify this
one....

Is there some (ideally clear & easy) guidance "out there" for how to do
this? I've been away from LCG support for 2 yrs so may've missed it if
it's well known "out there" somewhere.

Winnie Lacesso / Bristol University Particle Physics Computing Systems
HH Wills Physics Laboratory, Tyndall Avenue, Bristol, BS8 1TL, UK




--
Sent from the pit of despair

-----------------------------------------------------------
[log in to unmask]
HEP Group/Physics Dep
Imperial College
London, SW7 2BW
Tel: +44-(0)20-75947810
http://www.hep.ph.ic.ac.uk/~dbauer/


