It might be lcgwms03... Its MySQL DB has reached a respectable size of 57GB, and I noticed some CPU 'wait' time, possibly due to I/O operations.
Anyway, this server is going to be upgraded next Monday (draining will start this Thursday).
Cheers,
Catalin
> -----Original Message-----
> From: Testbed Support for GridPP member institutes [mailto:TB-
> [log in to unmask]] On Behalf Of Kashif Mohammad
> Sent: 02 October 2012 14:09
> To: [log in to unmask]
> Subject: Re: Nagios error "Cannot take token; reason=1"
>
> Hi Daniela
>
> It seems that the problem is not with your CE but with lcgwms03.gridpp,
> through which the jobs were submitted. lcgwms03 has been in a warning
> state for the last few hours. Jobs are submitted randomly through the
> different WMSs listed in the Nagios configuration.
>
> Cheers
> Kashif
>
>
> -----Original Message-----
> From: Testbed Support for GridPP member institutes [mailto:TB-
> [log in to unmask]] On Behalf Of Daniela Bauer
> Sent: 02 October 2012 13:56
> To: [log in to unmask]
> Subject: Nagios error "Cannot take token; reason=1"
>
> Hi,
>
> I just received nagios warnings for two of my CEs. The error reads:
>
> CRITICAL: Job was aborted.
> CRITICAL: Job was aborted.
>
> Testing from: gridppnagios.lancs.ac.uk
> DN: /C=UK/O=eScience/OU=Oxford/L=OeSC/CN=kashif mohammad/CN=proxy/CN=proxy/CN=proxy/CN=proxy
> VOMS FQANs: /ops/Role=lcgadmin/Capability=NULL,
> /ops/ROC/Role=NULL/Capability=NULL, /ops/Role=NULL/Capability=NULL
> glite-wms-job-status https://lcglb01.gridpp.rl.ac.uk:9000/2CyS0-9e_toUlWW0rh4Y4A
>
>
> ======================= glite-wms-job-status Success =====================
> BOOKKEEPING INFORMATION:
>
> Status info for the Job :
> https://lcglb01.gridpp.rl.ac.uk:9000/2CyS0-9e_toUlWW0rh4Y4A
> Current Status: Aborted
> Logged Reason(s):
> - Cannot take token; reason=1; Timeout waiting for server response.
> Closing connection to service. Timeout waiting for server response.
> Closing connection to service. Timeout waiting for server response.
> Closing connection to service. Cannot take token
> - Cannot take token; Timeout waiting for server response. Closing
> connection to service. Timeout waiting for server response. Closing
> connection to service. Timeout waiting for server response. Closing
> connection to service. Cannot take token; reason=1
> Status Reason: hit job shallow retry count (1)
> Destination: ceprod06.grid.hep.ph.ic.ac.uk:8443/cream-sge-grid.q
> Submitted: Tue Oct 2 12:54:10 2012 BST
> ==========================================================================
> glite-wms-job-logging-info -v 2 https://lcglb01.gridpp.rl.ac.uk:9000/2CyS0-9e_toUlWW0rh4Y4A
>
>
> As far as I can tell there is nothing wrong with the machines: they
> are not under load and there is no other hint of trouble.
>
> Does someone have an inkling what is going on here?
>
> Cheers,
> Daniela
>
> --
> Sent from the pit of despair
>
> -----------------------------------------------------------
> [log in to unmask]
> HEP Group/Physics Dep
> Imperial College
> Tel: +44-(0)20-75947810
> http://www.hep.ph.ic.ac.uk/~dbauer/