On 12/13/2012 01:22 PM, Torsten Harenberg wrote:
> Dear all,
>
> we are having problems filling our cluster (~1400 slots) with enough pilots from ATLAS, as from time to time, job submission stops with
>
> 012 (3143462.005.000) 12/13 13:05:30 Job was held.
> CREAM error: CREAM_Job_Register Error: MethodName=[jobRegister] ErrorCode=[0] Description=[The CREAM service cannot accept jobs at the moment] FaultCause=[Submissions are disabled!] Timestamp=[Thu 13 Dec 2012 13:05:28]
> Code 0 Subcode 0
> ...
>
> The CREAM CE already has a lot of resources (6 cores, 10 GB RAM) and is under a load of about 2-3.
>
> While trying to find out the cause, I checked glite_cream_load_monitor:
>
> [root@cream-ce ~]# /usr/bin/glite_cream_load_monitor /etc/glite-ce-cream-utils/glite_cream_load_monitor.conf --show
> Threshold for Load Average(1 min): 40 => Detected value for Load Average(1 min): 3.68
> Threshold for Load Average(5 min): 40 => Detected value for Load Average(5 min): 4.26
> Threshold for Load Average(15 min): 20 => Detected value for Load Average(15 min): 5.24
> Threshold for Memory Usage: 95 => Detected value for Memory Usage: 76.51%
> Threshold for Swap Usage: 95 => Detected value for Swap Usage: 18.75%
> Threshold for Free FD: 500 => Detected value for Free FD: 988765
> Threshold for tomcat FD: 800 => Detected value for Tomcat FD: 0
> Threshold for FTP Connection: 30 => Detected value for FTP Connection: 10
> Threshold for Number of active jobs: -1 => Detected value for Number of active jobs: 1112
> Threshold for Number of pending commands: -1 => Detected value for Number of pending commands: 0
> Threshold for Disk Usage: 95% => Detected value for Partition Eingehängt : %
> Threshold for Disk Usage: 95% => Detected value for Partition / : 22%
>
>
> Note the "0" for "Tomcat FD".
>
> This is the relevant Perl code:
>
> # Get file descriptor used by tomcat process
> sub get_fdtomcat_usage{
>
>     my $tomcatpidfile= "/var/run/tomcat5.pid";
>     if (-e $tomcatpidfile)
>     {
>         my $tomcatpid=`cat /var/run/tomcat5.pid`;
>         chomp $tomcatpid;
>         my $fdtomcatfile= "/proc/" . $tomcatpid . "/fd";
>         $tomcatfd=`ls -l $fdtomcatfile | wc -l`;
>         chomp $tomcatfd;
>     }
>     else {
>         $tomcatfd=0;
>     }
> }
>
>
> On our system, we only have a /var/log/tomcat6.pid. Is this a known issue?
>
> After changing the pid file path in the script, I now get:
>
> Threshold for tomcat FD: 800 => Detected value for Tomcat FD: 339
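For what it is worth, a version-independent variant of the quoted function could look like the shell sketch below. It is only an illustration, not the code shipped with CREAM: the function name count_tomcat_fds and its two directory arguments are hypothetical, added so the logic can be tried outside /var/run and /proc. It also uses "ls -1" instead of "ls -l", because "ls -l" on a directory prints an extra "total" line, which makes the quoted code count one descriptor too many.

```shell
#!/bin/sh
# Count the open file descriptors of the Tomcat process, using whichever
# tomcat*.pid file exists (tomcat5.pid, tomcat6.pid, ...).
# Both arguments are hypothetical parameters added for illustration;
# they default to /var/run and /proc.
count_tomcat_fds() {
    run_dir=${1:-/var/run}
    proc_dir=${2:-/proc}
    for pidfile in "$run_dir"/tomcat*.pid; do
        [ -e "$pidfile" ] || continue      # glob matched nothing
        pid=$(cat "$pidfile")
        fd_dir="$proc_dir/$pid/fd"
        [ -d "$fd_dir" ] || continue       # stale or empty pid file
        # ls -1 prints one entry per line and no "total" line
        ls -1 "$fd_dir" | wc -l | tr -d ' '
        return 0
    done
    echo 0                                 # no usable pid file found
}
```

Calling count_tomcat_fds without arguments uses the real /var/run and /proc.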
I don't think the job submissions were disabled because of this issue
(0 and 339 are both < 800).
I suspect instead something related to the "Eingehängt" partition.
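One possible explanation, offered as an assumption and not a confirmed diagnosis: "Eingehängt auf" is the German-locale header of the last df column ("Mounted on"), so a parser that recognises the header by its English text could mistake that line for a partition. The "Partition Eingehängt : %" row would be consistent with that. The sketch below shows the idea of skipping the header by position instead; parse_df is a hypothetical helper, not part of glite_cream_load_monitor.

```shell
#!/bin/sh
# Print "mount-point usage" pairs from POSIX-format (df -P) output on stdin.
# The header is skipped by position (line 1), not by matching its text,
# so a translated header is never mistaken for a partition.
# Limitation: mount points containing spaces are not handled.
parse_df() {
    awk 'NR > 1 { print $6, $5 }'
}

# Typical use, forcing the C locale so the format is predictable:
# LC_ALL=C df -P | parse_df
```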
Try to run:
/usr/bin/glite_cream_load_monitor /etc/glite-ce-cream-utils/glite_cream_load_monitor.conf --test
and check the exit code.
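A small wrapper like the one below makes the exit code visible. It is a sketch only: run_and_report is a hypothetical helper name, and the reading that a non-zero code means some threshold is considered exceeded is an assumption, not something taken from the CREAM documentation.

```shell
#!/bin/sh
# Run the given command, print its exit code, and pass the code through.
run_and_report() {
    "$@"
    rc=$?
    echo "exit code: $rc"
    return "$rc"
}

# Example with the paths quoted above:
# run_and_report /usr/bin/glite_cream_load_monitor \
#     /etc/glite-ce-cream-utils/glite_cream_load_monitor.conf --test
#
# Same check in the C locale, to see whether the locale makes a difference:
# run_and_report env LC_ALL=C /usr/bin/glite_cream_load_monitor \
#     /etc/glite-ce-cream-utils/glite_cream_load_monitor.conf --test
```

If the two runs give different exit codes, that would point towards the locale-dependent disk check.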
>
> So next question would be how to enlarge this (maybe it's somewhere in the Wiki, I just couldn't find it yet)?
>
https://wiki.italiangrid.it/twiki/bin/view/CREAM/SystemAdministratorGuideForEMI2#3_14_Self_limiting_CREAM_behavio
> Thanks
>
> Torsten
>
> --
> <><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
> <> <>
> <> Dr. Torsten Harenberg <>
> <> Bergische Universitaet <>
> <> FB C - Physik Tel.: +49 (0)202 439-3521 <>
> <> Gaussstr. 20 Fax : +49 (0)202 439-2811 <>
> <> 42097 Wuppertal <>
> <> <>
> <><><><><><><>< Of course it runs NetBSD http://www.netbsd.org ><>
>