Hi Rod,
On 15.12.2012 at 12:32, Rodney Walker <[log in to unmask]> wrote:
> Torsten can you try increasing
> FTPConn = 30
I did already:
[root@cream-ce cream]# cat /etc/glite-ce-cream-utils/glite_cream_load_monitor.conf
# Thresholds for glite_cream_load_monitor
# -1 means no limit
#
Load1 = 40
Load5 = 40
Load15 = 20
MemUsage = 95
SwapUsage = 95
FDNum = 500
DiskUsage = 95
FTPConn = 150
FDTomcatNum = 800
ActiveJobs = -1
PendingCmds = -1
[root@cream-ce cream]#
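For anyone who wants to check the limits programmatically, here is a minimal sketch that reads a file in the format shown above into a dictionary. The function names are my own, and the comparison rule (a value strictly above the limit counts as exceeded, -1 disables the check) is an assumption based only on the comment in the file, not on the load monitor's source.

```python
def parse_thresholds(text):
    """Parse 'Key = value' lines, skipping blank lines and '#' comments."""
    thresholds = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        thresholds[key.strip()] = float(value)
    return thresholds


def is_exceeded(limit, value):
    """Assumed rule: a negative limit means 'no limit'; otherwise compare strictly."""
    return limit >= 0 and value > limit


if __name__ == "__main__":
    conf = """
    # -1 means no limit
    SwapUsage = 95
    FTPConn = 150
    ActiveJobs = -1
    """
    limits = parse_thresholds(conf)
    for name, observed in [("SwapUsage", 99.99), ("FTPConn", 51), ("ActiveJobs", 5000)]:
        print(name, limits[name], observed, is_exceeded(limits[name], observed))
```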
But I also see something different (now that I know what to look for):
[root@cream-ce cream]# zgrep "gliteCreamLoadMonitor" glite-ce-cream.log*
glite-ce-cream.log.12:15 Dec 2012 10:58:22,953 org.glite.ce.creamapi.jobmanagement.cmdexecutor.JobSubmissionManager - gliteCreamLoadMonitor: exitCode = 1 messageError = Threshold for Swap Usage: 95 => Detected value for Swap Usage: 99.99%
glite-ce-cream.log.12:15 Dec 2012 11:08:23,003 org.glite.ce.creamapi.jobmanagement.cmdexecutor.JobSubmissionManager - gliteCreamLoadMonitor: exitCode = 1 messageError = Threshold for Swap Usage: 95 => Detected value for Swap Usage: 99.99%
glite-ce-cream.log.13:15 Dec 2012 10:38:24,829 org.glite.ce.creamapi.jobmanagement.cmdexecutor.JobSubmissionManager - gliteCreamLoadMonitor: exitCode = 1 messageError = Threshold for Swap Usage: 95 => Detected value for Swap Usage: 99.99%
glite-ce-cream.log.13:15 Dec 2012 10:48:24,511 org.glite.ce.creamapi.jobmanagement.cmdexecutor.JobSubmissionManager - gliteCreamLoadMonitor: exitCode = 1 messageError = Threshold for Swap Usage: 95 => Detected value for Swap Usage: 99.99%
glite-ce-cream.log.14:15 Dec 2012 10:18:23,277 org.glite.ce.creamapi.jobmanagement.cmdexecutor.JobSubmissionManager - gliteCreamLoadMonitor: exitCode = 1 messageError = Threshold for Swap Usage: 95 => Detected value for Swap Usage: 99.99%
glite-ce-cream.log.14:15 Dec 2012 10:28:22,800 org.glite.ce.creamapi.jobmanagement.cmdexecutor.JobSubmissionManager - gliteCreamLoadMonitor: exitCode = 1 messageError = Threshold for Swap Usage: 95 => Detected value for Swap Usage: 99.99%
glite-ce-cream.log.15:15 Dec 2012 09:58:22,815 org.glite.ce.creamapi.jobmanagement.cmdexecutor.JobSubmissionManager - gliteCreamLoadMonitor: exitCode = 1 messageError = Threshold for Swap Usage: 95 => Detected value for Swap Usage: 99.99%
glite-ce-cream.log.15:15 Dec 2012 10:08:23,148 org.glite.ce.creamapi.jobmanagement.cmdexecutor.JobSubmissionManager - gliteCreamLoadMonitor: exitCode = 1 messageError = Threshold for Swap Usage: 95 => Detected value for Swap Usage: 99.99%
glite-ce-cream.log.16:15 Dec 2012 09:28:22,450 org.glite.ce.creamapi.jobmanagement.cmdexecutor.JobSubmissionManager - gliteCreamLoadMonitor: exitCode = 1 messageError = Threshold for Swap Usage: 95 => Detected value for Swap Usage: 99.99%
glite-ce-cream.log.16:15 Dec 2012 09:38:22,578 org.glite.ce.creamapi.jobmanagement.cmdexecutor.JobSubmissionManager - gliteCreamLoadMonitor: exitCode = 1 messageError = Threshold for Swap Usage: 95 => Detected value for Swap Usage: 99.99%
glite-ce-cream.log.16:15 Dec 2012 09:48:23,112 org.glite.ce.creamapi.jobmanagement.cmdexecutor.JobSubmissionManager - gliteCreamLoadMonitor: exitCode = 1 messageError = Threshold for Swap Usage: 95 => Detected value for Swap Usage: 99.99%
glite-ce-cream.log.17:15 Dec 2012 09:08:22,355 org.glite.ce.creamapi.jobmanagement.cmdexecutor.JobSubmissionManager - gliteCreamLoadMonitor: exitCode = 1 messageError = Threshold for Swap Usage: 95 => Detected value for Swap Usage: 99.99%
glite-ce-cream.log.17:15 Dec 2012 09:18:22,263 org.glite.ce.creamapi.jobmanagement.cmdexecutor.JobSubmissionManager - gliteCreamLoadMonitor: exitCode = 1 messageError = Threshold for Swap Usage: 95 => Detected value for Swap Usage: 99.99%
glite-ce-cream.log.18:15 Dec 2012 08:48:22,697 org.glite.ce.creamapi.jobmanagement.cmdexecutor.JobSubmissionManager - gliteCreamLoadMonitor: exitCode = 1 messageError = Threshold for Swap Usage: 95 => Detected value for Swap Usage: 99.99%
glite-ce-cream.log.18:15 Dec 2012 08:58:22,451 org.glite.ce.creamapi.jobmanagement.cmdexecutor.JobSubmissionManager - gliteCreamLoadMonitor: exitCode = 1 messageError = Threshold for Swap Usage: 95 => Detected value for Swap Usage: 99.99%
glite-ce-cream.log.19:15 Dec 2012 08:28:22,411 org.glite.ce.creamapi.jobmanagement.cmdexecutor.JobSubmissionManager - gliteCreamLoadMonitor: exitCode = 1 messageError = Threshold for Swap Usage: 95 => Detected value for Swap Usage: 99.99%
glite-ce-cream.log.19:15 Dec 2012 08:38:22,088 org.glite.ce.creamapi.jobmanagement.cmdexecutor.JobSubmissionManager - gliteCreamLoadMonitor: exitCode = 1 messageError = Threshold for Swap Usage: 95 => Detected value for Swap Usage: 99.99%
glite-ce-cream.log.20:15 Dec 2012 08:08:22,959 org.glite.ce.creamapi.jobmanagement.cmdexecutor.JobSubmissionManager - gliteCreamLoadMonitor: exitCode = 1 messageError = ls: cannot access /proc/14261/fd/354: No such file or directory
glite-ce-cream.log.20:15 Dec 2012 08:18:21,767 org.glite.ce.creamapi.jobmanagement.cmdexecutor.JobSubmissionManager - gliteCreamLoadMonitor: exitCode = 1 messageError = Threshold for Swap Usage: 95 => Detected value for Swap Usage: 99.99%
glite-ce-cream.log.6:15 Dec 2012 12:03:43,963 org.glite.ce.creamapi.jobmanagement.cmdexecutor.JobSubmissionManager - gliteCreamLoadMonitor: exitCode = 1 messageError = Threshold for FTP Connection: 30 => Detected value for FTP Connection: 47
glite-ce-cream.log.9:15 Dec 2012 11:33:11,203 org.glite.ce.creamapi.jobmanagement.cmdexecutor.JobSubmissionManager - gliteCreamLoadMonitor: exitCode = 1 messageError = Threshold for FTP Connection: 30 => Detected value for FTP Connection: 51
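To see at a glance which threshold is tripping most often, the lines above can be tallied per threshold name. This is only a sketch: the regular expression is derived from the message format visible in these log lines, and the function name `summarise` is hypothetical. Lines that carry a different error (such as the `ls: cannot access` one) are counted separately.

```python
import re
from collections import Counter

# Matches "Threshold for <name>: <limit> => Detected value for <name>: <value>[%]"
PATTERN = re.compile(
    r"Threshold for (?P<name>[^:]+): (?P<limit>-?\d+(?:\.\d+)?) => "
    r"Detected value for [^:]+: (?P<value>\d+(?:\.\d+)?)%?"
)


def summarise(lines):
    """Return (count per threshold name, highest detected value per name)."""
    counts = Counter()
    worst = {}
    for line in lines:
        if "gliteCreamLoadMonitor" not in line:
            continue
        match = PATTERN.search(line)
        if match is None:
            counts["(other error)"] += 1
            continue
        name = match.group("name")
        counts[name] += 1
        worst[name] = max(worst.get(name, 0.0), float(match.group("value")))
    return counts, worst


if __name__ == "__main__":
    import sys

    # Example: zgrep -h gliteCreamLoadMonitor glite-ce-cream.log* | python this_script.py
    counts, worst = summarise(sys.stdin)
    for name, n in counts.most_common():
        print(name, n, worst.get(name))
```

Applied to the excerpt above, this would show that almost all rejections come from the swap threshold, with only two from the FTP connection limit.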
The machine already has 10 GB of RAM plus 4 GB of swap. I will now increase the RAM to 20 GB, which is quite a lot considering we are "only" a mid-size Tier-2.
I cannot identify a single process eating up all the memory at the moment, but the machine has tons of "blahpd" processes running, each of which holds 1.5% of the RAM (if that memory is not shared, that would mean about 150 MB each).
30492 tomcat 20 0 5668m 463m 5544 S 131.0 4.7 20:13.56 java
1995 tomcat 20 0 239m 146m 1484 S 0.0 1.5 0:09.05 blahpd
1929 tomcat 20 0 239m 146m 1484 S 4.3 1.5 0:13.01 blahpd
2108 tomcat 20 0 239m 146m 1488 S 0.0 1.5 0:09.34 blahpd
10013 tomcat 20 0 239m 146m 1492 S 0.0 1.5 0:10.38 blahpd
32728 tomcat 20 0 239m 146m 1480 S 2.6 1.5 0:10.34 blahpd
30373 tomcat 20 0 151m 145m 468 S 63.5 1.5 5:45.59 BUpdaterSGE
13509 tomcat 20 0 236m 144m 1476 S 0.0 1.5 0:07.51 blahpd
9726 tomcat 20 0 236m 144m 1480 S 0.0 1.5 0:07.37 blahpd
9737 tomcat 20 0 236m 144m 1484 S 0.0 1.5 0:07.62 blahpd
1178 tomcat 20 0 236m 144m 1480 S 1.3 1.5 0:05.66 blahpd
1187 tomcat 20 0 236m 144m 1480 S 0.0 1.5 0:06.61 blahpd
1064 tomcat 20 0 236m 144m 1480 S 0.3 1.5 0:07.88 blahpd
1142 tomcat 20 0 236m 144m 1480 S 1.7 1.5 0:07.56 blahpd
1149 tomcat 20 0 236m 144m 1480 S 0.0 1.5 0:06.85 blahpd
1169 tomcat 20 0 236m 144m 1480 S 0.0 1.5 0:06.41 blahpd
1173 tomcat 20 0 236m 144m 1480 S 0.0 1.5 0:07.49 blahpd
1176 tomcat 20 0 236m 144m 1480 S 0.0 1.5 0:07.25 blahpd
1184 tomcat 20 0 236m 144m 1480 S 0.0 1.5 0:07.16 blahpd
1136 tomcat 20 0 236m 144m 1476 S 0.7 1.5 0:06.87 blahpd
1145 tomcat 20 0 236m 144m 1476 S 0.0 1.5 0:06.44 blahpd
1151 tomcat 20 0 236m 144m 1476 S 0.0 1.5 0:08.42 blahpd
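To put a number on the combined footprint, the top lines above can be summed per command. The sketch below assumes top's default column order (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND) and that memory fields without a suffix are in KiB; both function names are hypothetical.

```python
def to_mib(field):
    """Convert a top memory field such as '146m', '1.2g' or '1484' (KiB) to MiB."""
    units = {"k": 1 / 1024, "m": 1.0, "g": 1024.0, "t": 1024.0 * 1024.0}
    suffix = field[-1].lower()
    if suffix in units:
        return float(field[:-1]) * units[suffix]
    return float(field) / 1024  # assumed: bare numbers are KiB


def resident_by_command(top_lines):
    """Return {command: (process count, summed RES in MiB, summed SHR in MiB)}."""
    totals = {}
    for line in top_lines:
        parts = line.split()
        # Skip headers and anything that is not a 12-column process line.
        if len(parts) < 12 or not parts[0].isdigit():
            continue
        command = parts[11]
        res, shr = to_mib(parts[5]), to_mib(parts[6])
        count, res_sum, shr_sum = totals.get(command, (0, 0.0, 0.0))
        totals[command] = (count + 1, res_sum + res, shr_sum + shr)
    return totals


if __name__ == "__main__":
    import sys

    # Example: top -b -n 1 | python this_script.py
    for command, (count, res, shr) in sorted(resident_by_command(sys.stdin).items()):
        print(f"{command}: {count} procs, RES {res:.0f} MiB, SHR {shr:.1f} MiB")
```

For the 20 blahpd processes listed here, the RES column adds up to roughly 2.9 GB, while the SHR column is only about 1.5 MB per process. That suggests most of this memory is not shared, although RES and SHR are only rough indicators.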
Cheers
Torsten
--
<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<> <>
<> Dr. Torsten Harenberg [log in to unmask] <>
<> Bergische Universitaet <>
<> FB C - Physik Tel.: +49 (0)202 439-3521 <>
<> Gaussstr. 20 Fax : +49 (0)202 439-2811 <>
<> 42097 Wuppertal <>
<> <>
<><><><><><><>< Of course it runs NetBSD http://www.netbsd.org ><>