I have increased the number of threads to 100 (it was 50) and the queue is gone :-)
10:13 creamce.gina.sara.nl:/usr/share/tomcat5
tomcat$ /opt/glite/bin/glite_cream_load_monitor --show
Threshold for Load Average(1 min): 40 => Detected value for Load Average(1 min): 1.46
Threshold for Load Average(5 min): 40 => Detected value for Load Average(5 min): 1.75
Threshold for Load Average(15 min): 20 => Detected value for Load Average(15 min): 1.08
Threshold for Memory Usage: 95 => Detected value for Memory Usage: 20.39%
Threshold for Swap Usage: 95 => Detected value for Swap Usage: 0.00%
Threshold for Free FD: 500 => Detected value for Free FD: 2387483
Threshold for tomcat FD: 800 => Detected value for Tomcat FD: 314
Threshold for FTP Connection: 30 => Detected value for FTP Connection: 2
Threshold for Number of active jobs: -1 => Detected value for Number of active jobs: 5649
Threshold for Number of pending commands: -1 => Detected value for Number of pending commands: 0
Threshold for Disk Usage: 95% => Detected value for Partition / : 36%
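
To compare two such reports (for example the pending-commands count before and after a configuration change), the output can be parsed with a few lines of Python. This is only a sketch: it assumes each record has been rejoined onto a single line, and the name parse_load_monitor and the regular expression are illustrative, not part of the gLite tooling.

```python
import re

# One "Threshold ... => Detected value ..." record per line, i.e. the
# --show output after the mail client's line wrapping has been undone.
LINE_RE = re.compile(
    r"Threshold for (?P<name>.+?):\s*(?P<threshold>-?[\d.]+)%?\s*=>\s*"
    r"Detected value for .+?:\s*(?P<value>-?[\d.]+)%?\s*$"
)


def parse_load_monitor(text):
    """Return {metric name: (threshold, detected value)} with float values."""
    metrics = {}
    for line in text.splitlines():
        match = LINE_RE.search(line.strip())
        if match:
            metrics[match["name"]] = (
                float(match["threshold"]),
                float(match["value"]),
            )
    return metrics


# A few records copied from the report above.
sample = """\
Threshold for Load Average(1 min): 40 => Detected value for Load Average(1 min): 1.46
Threshold for Memory Usage: 95 => Detected value for Memory Usage: 20.39%
Threshold for Number of pending commands: -1 => Detected value for Number of pending commands: 0
Threshold for Disk Usage: 95% => Detected value for Partition / : 36%
"""

metrics = parse_load_monitor(sample)
threshold, pending = metrics["Number of pending commands"]
print(f"pending commands: {pending:.0f} (threshold {threshold:.0f})")
```

The sketch only extracts the numbers; it does not try to interpret what a threshold of -1 means or whether a given metric is an upper or a lower bound.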
The load and memory usage did increase. I have not lowered the purge_interval,
since the node it is currently running on has more than enough memory (24 GB).
As you can see here:
http://ganglia.sara.nl/?c=GINA%20Cluster&h=creamce.gina.sara.nl&m=load_one&r=hour&s=descending&hc=4&mc=2
there is a little bump in the memory (and CPU) usage, but it is still low
relative to the total amount available.
Thanks for the help!
Cheers,
Maarten
On Thursday 14 October 2010 09:54:41 Massimo Sgaravatto - INFN Padova wrote:
> On Thu, 14 Oct 2010, Maarten van Ingen wrote:
> > Can we increase the number of threads?
>
> Yes, the relevant parameter is cream_concurrency_level in
> /opt/glite/etc/glite-ce-cream/cream-config.xml
>
> Then restart tomcat
>
> If you are using the new blah blparser and if you are going to increase
> the number of threads, it is also suggested to decrease
> (i.e. set to 2500000) the value for purge_interval in
> /opt/glite/etc/blah.config to reduce memory usage.
> Then kill the bupdater process (a newer one will be started automatically)
>
>
> Cheers, Massimo
>
> > The node has little to no load so there
> > should be room for more threads. If possible, would this help to reduce
> > the number of commands in the queue?
> >
> > If not, is there any way to work around this problem until the update is
> > available?
> >
> > Cheers,
> > Maarten
> >
> > On Wednesday 13 October 2010 16:37:47 Massimo Sgaravatto - INFN Padova wrote:
> >> The submit command is actually divided into 2 operations: the register and
> >> the start.
> >> The former is a synchronous operation, the latter is an asynchronous
> >> operation.
> >> Asynchronous commands are stored in a queue, from which they are taken
> >> and processed by a pool of threads.
> >>
> >> In your CE there is a queue (not too long) of commands to be
> >> processed. This explains why the start operation is not done immediately
> >> at submission time.
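
The register/start split described above can be illustrated with a small, generic Python sketch. It is not CREAM's implementation: the function names, the in-memory job table and the job ids are all hypothetical, and the only point is to show why a job stays REGISTERED until a worker thread gets around to its queued start command.

```python
import queue
import threading

commands = queue.Queue()      # stands in for the queue of asynchronous commands
jobs = {}                     # job id -> status (hypothetical in-memory table)
jobs_lock = threading.Lock()


def register(job_id):
    """Synchronous part: completed before submit returns to the caller."""
    with jobs_lock:
        jobs[job_id] = "REGISTERED"


def start(job_id):
    """Asynchronous part: only queued here, executed later by a worker."""
    commands.put(("JOB_START", job_id))


def worker():
    while True:
        command = commands.get()
        if command is None:            # sentinel: shut this worker down
            commands.task_done()
            return
        name, job_id = command
        if name == "JOB_START":
            with jobs_lock:
                jobs[job_id] = "PENDING"
        commands.task_done()


def drain(n_workers):
    """Process everything in the queue with a pool of n_workers threads."""
    pool = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in pool:
        thread.start()
    commands.join()                    # wait until every queued command is done
    for _ in pool:
        commands.put(None)
    for thread in pool:
        thread.join()


for job_id in ("job-1", "job-2", "job-3"):   # hypothetical job ids
    register(job_id)
    start(job_id)

before = dict(jobs)    # snapshot taken before any worker has run
drain(n_workers=2)
after = dict(jobs)     # snapshot taken after the queue has been emptied
print(before)
print(after)
```

In this toy model the size of the pool decides how quickly the queue empties; other command types sharing the same queue would delay the start commands in the same way.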
> >>
> >>
> >> In your CE most of the commands in the queue are proxy renewal
> >> operations.
> >>
> >> With CREAM CE 1.6.3 (that we plan to certify by the end of this month)
> >> there will be the fix for this bug:
> >>
> >> http://savannah.cern.ch/bugs/?73765
> >>
> >> and this should help a lot in keeping the queue of commands short.
> >>
> >> Cheers, Massimo
> >>
> >> On Wed, 13 Oct 2010, Maarten van Ingen wrote:
> >>> The grep:
> >>>
> >>> (of course this example does go one step further...)
> >>>
> >>> /opt/glite/var/log/glite-ce-cream.log:13 Oct 2010 16:13:51,234 INFO
> >>> org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor
> >>> (AbstractJobExecutor.java:826) - (Worker Thread 11)
> >>> REMOTE_REQUEST_ADDRESS=145.100.5.194;
> >>> USER_DN=/O=dutchgrid/O=users/O=sara/CN=Maarten Hendrik van Ingen;
> >>> USER_FQAN={ /pvier/Role=NULL/Capability=NULL;
> >>> /pvier/infra/Role=NULL/Capability=NULL; }; CMD_NAME=JOB_START;
> >>> CMD_CATEGORY=JOB_MANAGEMENT; CMD_STATUS=PROCESSING;
> >>> commandName=JOB_START; cmdExecutorName=BLAHExecutor;
> >>> userId=_O_dutchgrid_O_users_O_sara_CN_Maarten_Hendrik_van_Ingen_pvier_Role_NULL_Capability_NULL;
> >>> jobId=CREAM598295305; status=PROCESSING;
> >>>
> >>> /opt/glite/var/log/glite-ce-cream.log:13 Oct 2010 16:13:51,287 INFO
> >>> org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor
> >>> (AbstractJobExecutor.java:2094) - (Worker Thread 11) JOB CREAM598295305
> >>> STATUS CHANGED: REGISTERED => PENDING [localUser=pvi032]
> >>> [delegationId=ce2ca4874b98dd5f6b55c9e6b3b4a4a1f852d36c]
> >>>
> >>> /opt/glite/var/log/glite-ce-cream.log.1:13 Oct 2010 15:47:27,553 INFO
> >>> org.glite.ce.cream.jobmanagement.db.table.JobTable (JobTable.java:232)
> >>> - (http-8443-Processor19) Job inserted. JobId = CREAM598295305
> >>>
> >>> /opt/glite/var/log/glite-ce-cream.log.1:13 Oct 2010 15:47:27,661 INFO
> >>> org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor
> >>> (AbstractJobExecutor.java:2094) - (http-8443-Processor19) JOB
> >>> CREAM598295305 STATUS CHANGED: -- => REGISTERED [localUser=pvi032]
> >>> [delegationId=ce2ca4874b98dd5f6b55c9e6b3b4a4a1f852d36c]
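
The two STATUS CHANGED records in this grep output are enough to measure how long the job waited in REGISTERED. Below is a minimal Python sketch of that calculation. It assumes each log record has been rejoined onto one line (the sample lines are shortened copies of the records above), and the regular expression and the function name transitions are illustrative only.

```python
import re
from datetime import datetime

# Matches e.g. "13 Oct 2010 16:13:51,287 ... JOB CREAM598295305 STATUS CHANGED: A => B"
STATUS_RE = re.compile(
    r"(?P<ts>\d{1,2} \w{3} \d{4} \d{2}:\d{2}:\d{2},\d{3}).*?"
    r"JOB (?P<job>CREAM\d+) STATUS CHANGED: (?P<old>\S+) => (?P<new>\S+)"
)


def transitions(lines):
    """Return (timestamp, job id, old status, new status) tuples, oldest first."""
    found = []
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            stamp = datetime.strptime(match["ts"], "%d %b %Y %H:%M:%S,%f")
            found.append((stamp, match["job"], match["old"], match["new"]))
    return sorted(found)


# Shortened, unwrapped copies of the two relevant records.
log_lines = [
    "/opt/glite/var/log/glite-ce-cream.log:13 Oct 2010 16:13:51,287 INFO "
    "(Worker Thread 11) JOB CREAM598295305 STATUS CHANGED: REGISTERED => PENDING",
    "/opt/glite/var/log/glite-ce-cream.log.1:13 Oct 2010 15:47:27,661 INFO "
    "(http-8443-Processor19) JOB CREAM598295305 STATUS CHANGED: -- => REGISTERED",
]

found = transitions(log_lines)
waited = found[1][0] - found[0][0]   # time between REGISTERED and PENDING
print(f"{found[0][1]} waited {waited} in REGISTERED")
```

For this job the difference comes to a little over 26 minutes.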
> >>>
> >>> /opt/glite/bin/glite_cream_load_monitor --show:
> >>>
> >>> Threshold for Load Average(1 min): 40 => Detected value for Load Average(1 min): 1.06
> >>> Threshold for Load Average(5 min): 40 => Detected value for Load Average(5 min): 0.97
> >>> Threshold for Load Average(15 min): 20 => Detected value for Load Average(15 min): 0.69
> >>> Threshold for Memory Usage: 95 => Detected value for Memory Usage: 17.57%
> >>> Threshold for Swap Usage: 95 => Detected value for Swap Usage: 0.00%
> >>> Threshold for Free FD: 500 => Detected value for Free FD: 2386973
> >>> Threshold for tomcat FD: 800 => Detected value for Tomcat FD: 269
> >>> Threshold for FTP Connection: 30 => Detected value for FTP Connection: 1
> >>> Threshold for Number of active jobs: -1 => Detected value for Number of active jobs: 5866
> >>> Threshold for Number of pending commands: -1 => Detected value for Number of pending commands: 431
> >>> Threshold for Disk Usage: 95% => Detected value for Partition / : 35%
> >>>
> >>> SQL:
> >>>
> >>> mysql> select c.name, c.creationTime from JOB_MANAGEMENT jm, command c where
> >>>     -> jm.commandId =c.id order by c.creationTime limit 20;
> >>> +----------------+---------------------+
> >>> | name           | creationTime        |
> >>> +----------------+---------------------+
> >>> | SET_JOB_STATUS | 2010-10-13 14:33:02 |
> >>> | SET_JOB_STATUS | 2010-10-13 14:33:09 |
> >>> | SET_JOB_STATUS | 2010-10-13 14:33:59 |
> >>> | SET_JOB_STATUS | 2010-10-13 14:34:00 |
> >>> | SET_JOB_STATUS | 2010-10-13 14:35:54 |
> >>> | SET_JOB_STATUS | 2010-10-13 14:37:11 |
> >>> | SET_JOB_STATUS | 2010-10-13 14:39:57 |
> >>> | PROXY_RENEW    | 2010-10-13 14:40:41 |
> >>> | SET_JOB_STATUS | 2010-10-13 14:45:10 |
> >>> | SET_JOB_STATUS | 2010-10-13 14:46:11 |
> >>> | SET_JOB_STATUS | 2010-10-13 14:46:13 |
> >>> | SET_JOB_STATUS | 2010-10-13 14:46:13 |
> >>> | SET_JOB_STATUS | 2010-10-13 14:46:14 |
> >>> | SET_JOB_STATUS | 2010-10-13 14:46:17 |
> >>> | SET_JOB_STATUS | 2010-10-13 14:48:11 |
> >>> | SET_JOB_STATUS | 2010-10-13 14:48:11 |
> >>> | SET_JOB_STATUS | 2010-10-13 14:49:15 |
> >>> | SET_JOB_STATUS | 2010-10-13 14:49:16 |
> >>> | SET_JOB_STATUS | 2010-10-13 14:50:19 |
> >>> | JOB_START      | 2010-10-13 14:53:37 |
> >>> +----------------+---------------------+
> >>> 20 rows in set (0.00 sec)
> >>>
> >>> mysql> select c.name, count(c.name) from JOB_MANAGEMENT jm, command c where
> >>>     -> jm.commandId =c.id group by c.name;
> >>> +---------------------------+---------------+
> >>> | name                      | count(c.name) |
> >>> +---------------------------+---------------+
> >>> | COPY_NEW_PROXY_TO_SANDBOX |             3 |
> >>> | JOB_PURGE                 |            61 |
> >>> | JOB_START                 |           190 |
> >>> | PROXY_RENEW               |           539 |
> >>> | SET_JOB_STATUS            |           195 |
> >>> +---------------------------+---------------+
> >>> 5 rows in set (0.00 sec)
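
For anyone who wants to try the per-command count without touching a production creamdb, here is a small Python sketch that runs an equivalent query against an in-memory SQLite database. Everything beyond the table and column names that appear in the queries above is an assumption: the column types are guessed, SQLite stands in for MySQL, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Only the columns referenced by the queries in this thread; types are guesses.
conn.executescript(
    """
    CREATE TABLE command (id INTEGER PRIMARY KEY, name TEXT, creationTime TEXT);
    CREATE TABLE JOB_MANAGEMENT (commandId INTEGER);
    """
)

# Made-up sample rows, not data from the real CE.
rows = [
    (1, "SET_JOB_STATUS", "2010-10-13 14:33:02"),
    (2, "PROXY_RENEW", "2010-10-13 14:40:41"),
    (3, "PROXY_RENEW", "2010-10-13 14:41:00"),
    (4, "JOB_START", "2010-10-13 14:53:37"),
]
conn.executemany("INSERT INTO command VALUES (?, ?, ?)", rows)
conn.executemany("INSERT INTO JOB_MANAGEMENT VALUES (?)", [(row[0],) for row in rows])

# Same grouping as the second query above, written with an explicit JOIN
# and sorted so that the most frequent command type comes first.
counts = conn.execute(
    """
    SELECT c.name, COUNT(*) AS n
    FROM JOB_MANAGEMENT jm
    JOIN command c ON jm.commandId = c.id
    GROUP BY c.name
    ORDER BY n DESC, c.name
    """
).fetchall()

for name, n in counts:
    print(f"{name:<16} {n}")
```

Sorting by count makes the dominant command type (PROXY_RENEW in the real output above) stand out immediately.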
> >>>
> >>> Cheers,
> >>>
> >>> Maarten
> >>>
> >>> On Wednesday 13 October 2010 16:14:49 Massimo Sgaravatto - INFN Padova wrote:
> >>>> What does:
> >>>>
> >>>> grep -i 598295305 /opt/glite/var/log/glite-ce-cream.log*
> >>>>
> >>>> report?
> >>>>
> >>>> Can you please issue this command on the CREAM CE as user tomcat:
> >>>>
> >>>> /opt/glite/bin/glite_cream_load_monitor --show
> >>>>
> >>>> Is there a huge number of "Detected value for Number of pending commands"?
> >>>> If so, can you please issue these mysql commands?
> >>>>
> >>>> use creamdb;
> >>>> select c.name, c.creationTime from JOB_MANAGEMENT jm, command c where
> >>>> jm.commandId =c.id order by c.creationTime limit 20;
> >>>>
> >>>> select c.name, count(c.name) from JOB_MANAGEMENT jm, command c where
> >>>> jm.commandId =c.id group by c.name;
> >>>>
> >>>> Cheers, Massimo
> >>>>
> >>>> On Wed, 13 Oct 2010, Maarten van Ingen wrote:
> >>>>> Hi,
> >>>>>
> >>>>> One of our CREAM CEs keeps jobs in the REGISTERED state and many will not
> >>>>> come out of it.
> >>>>> Sometimes they will get through, but this could take some hours.
> >>>>>
> >>>>> For example this job:
> >>>>>
> >>>>> maarten$ glite-ce-job-submit -a -r creamce.gina.sara.nl:8443/cream-pbs-infra ./gina
> >>>>> 2010-10-13 15:47:25,246 WARN - No configuration file suitable for loading. Using built-in configuration
> >>>>> https://creamce.gina.sara.nl:8443/CREAM598295305
> >>>>>
> >>>>> maarten$ glite-ce-job-status https://creamce.gina.sara.nl:8443/CREAM598295305
> >>>>> 2010-10-13 15:49:12,791 WARN - No configuration file suitable for loading. Using built-in configuration
> >>>>>
> >>>>> ****** JobID=[https://creamce.gina.sara.nl:8443/CREAM598295305]
> >>>>> Status = [REGISTERED]
> >>>>>
> >>>>> When I have a look into the logging, all I can find is this:
> >>>>>
> >>>>> root# grep 598295305 glite-ce-cream.log
> >>>>> 13 Oct 2010 15:47:27,553 INFO org.glite.ce.cream.jobmanagement.db.table.JobTable (JobTable.java:232) - (http-8443-Processor19) Job inserted. JobId = CREAM598295305
> >>>>> 13 Oct 2010 15:47:27,661 INFO org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor (AbstractJobExecutor.java:2094) - (http-8443-Processor19) JOB CREAM598295305 STATUS CHANGED: -- => REGISTERED [localUser=pvi032] [delegationId=ce2ca4874b98dd5f6b55c9e6b3b4a4a1f852d36c]
> >>>>>
> >>>>> The jdl used is the same as I use to submit to a wms (hence the
> >>>>> "Requirements" part):
> >>>>>
> >>>>> Executable = "/bin/env";
> >>>>> Arguments = "| /bin/mail -s $(hostname) [log in to unmask]";
> >>>>> Stdoutput = "message.txt";
> >>>>> StdError = "stderror";
> >>>>> Requirements = other.GlueCEUniqueID == "creamce.gina.sara.nl:8443/cream-pbs-infra";
> >>>>> RetryCount=0;
> >>>>> ShallowRetryCount=0;
> >>>>>
> >>>>> Also when I use bogus information for the requested queue it stays in
> >>>>> the REGISTERED state:
> >>>>>
> >>>>> maarten$ glite-ce-job-submit -a -r creamce.gina.sara.nl:8443/cream-pbs-thisisbogus ./gina
> >>>>> 2010-10-13 15:57:55,017 WARN - No configuration file suitable for loading. Using built-in configuration
> >>>>> https://creamce.gina.sara.nl:8443/CREAM392820764
> >>>>>
> >>>>> maarten$ glite-ce-job-status https://creamce.gina.sara.nl:8443/CREAM392820764
> >>>>> 2010-10-13 15:58:08,130 WARN - No configuration file suitable for loading. Using built-in configuration
> >>>>>
> >>>>> ****** JobID=[https://creamce.gina.sara.nl:8443/CREAM392820764]
> >>>>> Status = [REGISTERED]
> >>>>>
> >>>>> Anyone got an idea on what's going on?
> >>>>> I have the feeling this is something small I am overlooking :-) but it
> >>>>> keeps me busy.
> >>>>>
> >>>>> Cheers,
> >>>>> Maarten
> >>>>>
> >>>>> SARA Computing and Networking Services
> >>>>> PO Box 94613
> >>>>> 1090 GP Amsterdam, Netherlands
> >>>>>
> >>>>> Tel: +31 (0)20 592 3000
> >>>>> Fax: +31 (0)20 668 3167
> >>>>
> >>>> \|||/
> >>>> -----------0oo----( o o )----oo0-------------------
> >>>> (_)
> >>>> INFN Sezione di Padova
> >>>> Via Marzolo, 8
> >>>> 35131 Padova - Italy E-mail: massimo.sgaravatto [at] pd.infn.it
> >>>> Tel: ++39 0498275908 Skype: massimo.sgaravatto
> >>>> Fax: ++39 0498275952
> >>>
> >>> --
> >>> ing. M.H. van Ingen, HPC&V Systems Programmer
> >>>
> >>> SARA Computing and Networking Services
> >>> PO Box 94613
> >>> 1090 GP Amsterdam, Netherlands
> >>>
> >>> Tel: +31 (0)20 592 3000
> >>> Fax: +31 (0)20 668 3167
> >>>
> >
>
--
ing. M.H. van Ingen, HPC&V Systems Programmer
SARA Computing and Networking Services
PO Box 94613
1090 GP Amsterdam, Netherlands
Tel: +31 (0)20 592 3000
Fax: +31 (0)20 668 3167