Incidentally, Bristol isn't the only site in this situation (the overloaded
machines, that is), though you need to be a CMS member to read this thread:
https://hypernews.cern.ch/HyperNews/CMS/get/t2/1095.html
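
In case it helps, the grep I suggested (quoted below) could be scripted
roughly as follows. This is only a minimal sketch: the batch directory is
whatever [batchdir] is on your system, the function name find_subject_lines
is made up for illustration, and whether the pilot's stdout contains a
"subject" line at all is exactly what the grep is meant to find out.

```python
import glob
import os


def find_subject_lines(batchdir):
    """Return (path, line) pairs for each line containing 'subject'.

    Mirrors: grep "subject" [batchdir]/*/glide*/execute/dir*/_condor_stdout
    The match is case-sensitive, like plain grep.
    """
    pattern = os.path.join(
        batchdir, "*", "glide*", "execute", "dir*", "_condor_stdout"
    )
    hits = []
    for path in sorted(glob.glob(pattern)):
        # errors="replace" so stray binary output does not abort the scan
        with open(path, errors="replace") as handle:
            for line in handle:
                if "subject" in line:
                    hits.append((path, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    import sys

    # Hypothetical usage: pass the batch directory as the first argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, line in find_subject_lines(root):
        print(f"{path}: {line}")
```

An empty result would simply mean none of the matching _condor_stdout files
mention "subject".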


On 19 September 2014 16:02, Daniela Bauer <[log in to unmask]> wrote:

> What happens when you grep for "subject" in
> [batchdir]/*/glide*/execute/dir*/_condor_stdout ?
>
>
> On 19 September 2014 15:40, John Hill <[log in to unmask]> wrote:
>
>> Hi Winnie,
>>    You don't mention which CE you have. On my CREAM CE, /var/log/messages
>> shows the mapping from DN to local user.
>>
>> John
>>
>>
>> On 19/09/2014 15:19, Winnie Lacesso wrote:
>>
>>> Happy Friday!
>>>
>>> About 1/4 of Bristol's WNs kernel panic'd today & it looks like the culprits
>>> are user jobs (at a guess, overloading or other badness on the WN) - the
>>> kernel panic mentions gfortran & the 8-core WN load hits about 29 before it
>>> bails. The jobs are via cmspil004.
>>>
>>> Some cmspil004 jobs are still running & we seem unable in any cmspil004
>>> working dir to find the real CMS user DN, & we're usually pretty good at
>>> being able to do that (I can for lhcb pilot jobs no problem).
>>>
>>> We emailed some CMS contacts & they said the real user DN must be in the
>>> glexec logs. On the WN, /var/log/glexec/* files are all empty. On the
>>> CREAM CE /var/log/glexec does not exist &
>>> /var/blah/user_blah_job_registry.bjr/registry.proxydir points to pilot
>>> proxies - again no real user DN info.
>>>
>>> I'm not very familiar with the argus server & don't see any logs in
>>> /var/log/argus/* that look like they contain DNs. But said logs must be
>>> *somewhere* on it.....?
>>>
>>> So we should be able to trace via glexec info from the pilot job arriving
>>> at CE, to WN, & find out the DN of a real user's job; the pool account
>>> cmspil004 can run jobs for many CMS users, we just want to identify this
>>> one....
>>>
>>> Is there some (ideally clear & easy) guidance "out there" for how to do
>>> this? I've been away from LCG support for 2 yrs so may've missed it if
>>> it's well known "out there" somewhere.
>>>
>>> Winnie Lacesso / Bristol University Particle Physics Computing Systems
>>> HH Wills Physics Laboratory, Tyndall Avenue, Bristol, BS8 1TL, UK
>>>
>>>
>
>
> --
> Sent from the pit of despair
>
> -----------------------------------------------------------
> [log in to unmask]
> HEP Group/Physics Dep
> Imperial College
> London, SW7 2BW
> Tel: +44-(0)20-75947810
> http://www.hep.ph.ic.ac.uk/~dbauer/
>



-- 
Sent from the pit of despair

-----------------------------------------------------------
[log in to unmask]
HEP Group/Physics Dep
Imperial College
London, SW7 2BW
Tel: +44-(0)20-75947810
http://www.hep.ph.ic.ac.uk/~dbauer/