Incidentally, Bristol isn't the only site in this situation (the overloaded
machines, that is), though you need to be a CMS member to read this thread:
https://hypernews.cern.ch/HyperNews/CMS/get/t2/1095.html

On 19 September 2014 16:02, Daniela Bauer wrote:

> What happens when you grep for "subject" in
> [batchdir]/*/glide*/execute/dir*/_condor_stdout ?
>
>
> On 19 September 2014 15:40, John Hill wrote:
>
>> Hi Winnie,
>> You don't mention which CE you have. On my CREAM CE, /var/log/messages
>> shows the mapping from DN to local user.
>>
>> John
>>
>>
>> On 19/09/2014 15:19, Winnie Lacesso wrote:
>>
>>> Happy Friday!
>>>
>>> About 1/4 Bristol's WN kernel panic'd today & it looks like the culprit
>>> are user jobs, guess, overloading or other badness on the WN - the kernel
>>> panic mentions gfortran & the 8-core WN load hits about 29 before it
>>> bails. The jobs are via cmspil004.
>>>
>>> Some cmspil004 jobs are still running & we seem unable in any cmspil004
>>> working dir to find the real CMS user DN, & we're usually pretty good at
>>> being able to do that (I can for lhcb pilot jobs no problem).
>>>
>>> We emailed some CMS contacts & they said the real user DN must be in the
>>> glexec logs. On the WN, /var/log/glexec/* files are all empty. On the
>>> CREAM CE /var/log/glexec does not exist &
>>> /var/blah/user_blah_job_registry.bjr/registry.proxydir points to pilot
>>> proxies - again no real user DN info.
>>>
>>> I'm not very familiar with the argus server & don't see any logs in
>>> /var/log/argus/* that look like they contain DNs. But said logs must be
>>> *somewhere* on it.....?
>>>
>>> So we should be able to trace via glexec info from the pilot job arriving
>>> at CE, to WN, & find out the DN of a real user's job; the pool account
>>> cmspil004 can run jobs for many CMS users, we just want to identify this
>>> one....
>>>
>>> Is there some (ideally clear & easy) guidance "out there" for how to do
>>> this? I've been away from LCG support for 2 yrs so may've missed it if
>>> it's well known "out there" somewhere.
>>>
>>> Winnie Lacesso / Bristol University Particle Physics Computing Systems
>>> HH Wills Physics Laboratory, Tyndall Avenue, Bristol, BS8 1TL, UK
>>>
>>>
>

--
Sent from the pit of despair

-----------------------------------------------------------
HEP Group/Physics Dep
Imperial College
London, SW7 2BW
Tel: +44-(0)20-75947810
http://www.hep.ph.ic.ac.uk/~dbauer/