The grep:

(of course, by now this example job has moved one step further...)

/opt/glite/var/log/glite-ce-cream.log:13 Oct 2010 16:13:51,234 INFO org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor (AbstractJobExecutor.java:826) - (Worker Thread 11) REMOTE_REQUEST_ADDRESS=145.100.5.194; USER_DN=/O=dutchgrid/O=users/O=sara/CN=Maarten Hendrik van Ingen; USER_FQAN={ /pvier/Role=NULL/Capability=NULL; /pvier/infra/Role=NULL/Capability=NULL; }; CMD_NAME=JOB_START; CMD_CATEGORY=JOB_MANAGEMENT; CMD_STATUS=PROCESSING; commandName=JOB_START; cmdExecutorName=BLAHExecutor; userId=_O_dutchgrid_O_users_O_sara_CN_Maarten_Hendrik_van_Ingen_pvier_Role_NULL_Capability_NULL; jobId=CREAM598295305; status=PROCESSING;

/opt/glite/var/log/glite-ce-cream.log:13 Oct 2010 16:13:51,287 INFO org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor (AbstractJobExecutor.java:2094) - (Worker Thread 11) JOB CREAM598295305 STATUS CHANGED: REGISTERED => PENDING [localUser=pvi032] [delegationId=ce2ca4874b98dd5f6b55c9e6b3b4a4a1f852d36c]

/opt/glite/var/log/glite-ce-cream.log.1:13 Oct 2010 15:47:27,553 INFO org.glite.ce.cream.jobmanagement.db.table.JobTable (JobTable.java:232) - (http-8443-Processor19) Job inserted. JobId = CREAM598295305

/opt/glite/var/log/glite-ce-cream.log.1:13 Oct 2010 15:47:27,661 INFO org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor (AbstractJobExecutor.java:2094) - (http-8443-Processor19) JOB CREAM598295305 STATUS CHANGED: -- => REGISTERED [localUser=pvi032] [delegationId=ce2ca4874b98dd5f6b55c9e6b3b4a4a1f852d36c]
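Because grep over glite-ce-cream.log* prints the current log before the rotated glite-ce-cream.log.1, the lines above are not in chronological order. Below is a minimal Python sketch that pulls the status transitions for one job out of such lines and sorts them by timestamp. It assumes the log line layout shown above; the function name job_transitions is hypothetical and not part of gLite.

```python
import re
from datetime import datetime

# Assumed layout, taken from the grep output above:
# [file:]DD Mon YYYY HH:MM:SS,mmm LEVEL ... JOB <id> STATUS CHANGED: <old> => <new> ...
TS_RE = re.compile(r"(\d{1,2} \w{3} \d{4} \d{2}:\d{2}:\d{2},\d{3})")
CHANGE_RE = re.compile(r"JOB (\S+) STATUS CHANGED: (\S+) => (\S+)")


def job_transitions(lines, job_id):
    """Return (timestamp, old_status, new_status) tuples for job_id, oldest first."""
    result = []
    for line in lines:
        ts = TS_RE.search(line)
        change = CHANGE_RE.search(line)
        if not ts or not change or change.group(1) != job_id:
            continue
        # %b assumes English month abbreviations (Python's default C locale).
        when = datetime.strptime(ts.group(1), "%d %b %Y %H:%M:%S,%f")
        result.append((when, change.group(2), change.group(3)))
    # Rotated files are read in arbitrary order, so sort on the parsed time.
    result.sort(key=lambda t: t[0])
    return result
```

Feeding it the lines of every glite-ce-cream.log* file shows at a glance how long a job sat in REGISTERED before moving on.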

/opt/glite/bin/glite_cream_load_monitor --show:

Threshold for Load Average(1 min): 40 => Detected value for Load Average(1 min): 1.06
Threshold for Load Average(5 min): 40 => Detected value for Load Average(5 min): 0.97
Threshold for Load Average(15 min): 20 => Detected value for Load Average(15 min): 0.69
Threshold for Memory Usage: 95 => Detected value for Memory Usage: 17.57%
Threshold for Swap Usage: 95 => Detected value for Swap Usage: 0.00%
Threshold for Free FD: 500 => Detected value for Free FD: 2386973
Threshold for tomcat FD: 800 => Detected value for Tomcat FD: 269
Threshold for FTP Connection: 30 => Detected value for FTP Connection: 1
Threshold for Number of active jobs: -1 => Detected value for Number of active jobs: 5866
Threshold for Number of pending commands: -1 => Detected value for Number of pending commands: 431
Threshold for Disk Usage: 95% => Detected value for Partition / : 35%
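To scan this output for limits that are actually exceeded, a small Python sketch could parse each "Threshold ... => Detected value ..." line. Two things here are assumptions, not documented glite_cream_load_monitor behaviour: that a threshold of -1 means "no limit configured" (as for the active-jobs and pending-commands counters above), and that "Free FD" is a lower bound (too few free descriptors is the problem). The function name check and the LOWER_BOUND set are hypothetical.

```python
import re

# Matches lines such as:
# "Threshold for Memory Usage: 95 => Detected value for Memory Usage: 17.57%"
LINE_RE = re.compile(
    r"Threshold for (?P<name>.+?): (?P<limit>-?[\d.]+)%? => "
    r"Detected value for .+?: (?P<value>[\d.]+)%?\s*$"
)

# Assumption: metrics listed here must stay ABOVE their threshold.
LOWER_BOUND = {"Free FD"}


def check(output):
    """Yield (metric, limit, value) for every metric outside its threshold."""
    for line in output.splitlines():
        m = LINE_RE.search(line.strip())
        if not m:
            continue  # skip headers, blank lines, anything unrecognised
        name = m.group("name")
        limit = float(m.group("limit"))
        value = float(m.group("value"))
        if limit < 0:  # assumption: -1 means the check is disabled
            continue
        breached = value < limit if name in LOWER_BOUND else value > limit
        if breached:
            yield name, limit, value
```

With the values shown above nothing is flagged; under these assumptions the two counters with a -1 threshold (5866 active jobs, 431 pending commands) are never checked, so they have to be read by eye.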

SQL:

mysql> select c.name, c.creationTime from JOB_MANAGEMENT jm, command c where
    -> jm.commandId =c.id order by c.creationTime limit 20;
+----------------+---------------------+
| name           | creationTime        |
+----------------+---------------------+
| SET_JOB_STATUS | 2010-10-13 14:33:02 |
| SET_JOB_STATUS | 2010-10-13 14:33:09 |
| SET_JOB_STATUS | 2010-10-13 14:33:59 |
| SET_JOB_STATUS | 2010-10-13 14:34:00 |
| SET_JOB_STATUS | 2010-10-13 14:35:54 |
| SET_JOB_STATUS | 2010-10-13 14:37:11 |
| SET_JOB_STATUS | 2010-10-13 14:39:57 |
| PROXY_RENEW    | 2010-10-13 14:40:41 |
| SET_JOB_STATUS | 2010-10-13 14:45:10 |
| SET_JOB_STATUS | 2010-10-13 14:46:11 |
| SET_JOB_STATUS | 2010-10-13 14:46:13 |
| SET_JOB_STATUS | 2010-10-13 14:46:13 |
| SET_JOB_STATUS | 2010-10-13 14:46:14 |
| SET_JOB_STATUS | 2010-10-13 14:46:17 |
| SET_JOB_STATUS | 2010-10-13 14:48:11 |
| SET_JOB_STATUS | 2010-10-13 14:48:11 |
| SET_JOB_STATUS | 2010-10-13 14:49:15 |
| SET_JOB_STATUS | 2010-10-13 14:49:16 |
| SET_JOB_STATUS | 2010-10-13 14:50:19 |
| JOB_START      | 2010-10-13 14:53:37 |
+----------------+---------------------+
20 rows in set (0.00 sec)

mysql> select c.name, count(c.name) from JOB_MANAGEMENT jm, command c where
    -> jm.commandId =c.id group by c.name;
+---------------------------+---------------+
| name                      | count(c.name) |
+---------------------------+---------------+
| COPY_NEW_PROXY_TO_SANDBOX |             3 |
| JOB_PURGE                 |            61 |
| JOB_START                 |           190 |
| PROXY_RENEW               |           539 |
| SET_JOB_STATUS            |           195 |
+---------------------------+---------------+
5 rows in set (0.00 sec)
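The two diagnostic queries use an implicit join (comma plus WHERE). For anyone who wants to try them outside creamdb, here is a self-contained Python sketch using an in-memory SQLite database. The schema is a hypothetical, minimal stand-in containing only the columns the queries reference (the real CREAM tables live in MySQL and have more columns), and the queries are rewritten with an equivalent explicit JOIN ... ON. The ORDER BY c.name in the second query is an addition, since GROUP BY alone does not guarantee row order.

```python
import sqlite3

# Hypothetical minimal stand-in for the creamdb tables: only the columns
# referenced by the two queries above. Not the real CREAM schema.
SCHEMA = """
CREATE TABLE command (id INTEGER PRIMARY KEY, name TEXT, creationTime TEXT);
CREATE TABLE JOB_MANAGEMENT (commandId INTEGER REFERENCES command(id));
"""

# Same logic as the queries above, written with an explicit inner join.
OLDEST = """
SELECT c.name, c.creationTime
FROM JOB_MANAGEMENT jm JOIN command c ON jm.commandId = c.id
ORDER BY c.creationTime LIMIT ?
"""
BY_NAME = """
SELECT c.name, COUNT(c.name)
FROM JOB_MANAGEMENT jm JOIN command c ON jm.commandId = c.id
GROUP BY c.name ORDER BY c.name
"""


def demo(rows, limit=20):
    """Queue (name, creationTime) rows as commands and run both queries."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    for i, (name, created) in enumerate(rows, start=1):
        con.execute("INSERT INTO command VALUES (?, ?, ?)", (i, name, created))
        con.execute("INSERT INTO JOB_MANAGEMENT VALUES (?)", (i,))
    oldest = con.execute(OLDEST, (limit,)).fetchall()
    by_name = con.execute(BY_NAME).fetchall()
    con.close()
    return oldest, by_name
```

Because creationTime is stored as "YYYY-MM-DD HH:MM:SS", ordering the text column sorts it chronologically, which is what the "oldest queued commands" query relies on.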

Cheers,

Maarten

On Wednesday 13 October 2010 16:14:49 Massimo Sgaravatto - INFN Padova wrote:

> What does:

>

> grep -i 598295305 /opt/glite/var/log/glite-ce-cream.log*

>

> report ?

>

> Can you please issue this command on the CREAM CE as user tomcat:

>

> /opt/glite/bin/glite_cream_load_monitor --show

>

> ?

>

> Is there a huge number of "Detected value for Number of pending

> commands" ?

> If so, can you please issue these mysql commands ?

>

> use creamdb;

> select c.name, c.creationTime from JOB_MANAGEMENT jm, command c where

> jm.commandId =c.id order by c.creationTime limit 20;

>

> select c.name, count(c.name) from JOB_MANAGEMENT jm, command c where

> jm.commandId =c.id group by c.name;

>

> Cheers, Massimo

>

> On Wed, 13 Oct 2010, Maarten van Ingen wrote:

> > Hi,

> >

> > One of our CREAM CEs keeps jobs in the REGISTERED state, and many of them
> > will not come out of it.
> > Sometimes they do get through, but this can take some hours.

> >

> > For example this job:

> > maarten$ glite-ce-job-submit -a -r creamce.gina.sara.nl:8443/cream-pbs-infra ./gina
> > 2010-10-13 15:47:25,246 WARN - No configuration file suitable for loading. Using built-in configuration
> > https://creamce.gina.sara.nl:8443/CREAM598295305

> >

> >

> >

> > maarten$ glite-ce-job-status https://creamce.gina.sara.nl:8443/CREAM598295305
> > 2010-10-13 15:49:12,791 WARN - No configuration file suitable for loading. Using built-in configuration

> >

> > ****** JobID=[https://creamce.gina.sara.nl:8443/CREAM598295305]

> >

> > Status = [REGISTERED]

> >

> > When I look at the log, all I can find is this:

> > root# grep 598295305 glite-ce-cream.log
> > 13 Oct 2010 15:47:27,553 INFO org.glite.ce.cream.jobmanagement.db.table.JobTable (JobTable.java:232) - (http-8443-Processor19) Job inserted. JobId = CREAM598295305
> > 13 Oct 2010 15:47:27,661 INFO org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor (AbstractJobExecutor.java:2094) - (http-8443-Processor19) JOB CREAM598295305 STATUS CHANGED: -- => REGISTERED [localUser=pvi032] [delegationId=ce2ca4874b98dd5f6b55c9e6b3b4a4a1f852d36c]

> >

> >

> > The JDL is the same one I use to submit to a WMS (hence the
> > "Requirements" part):

> >

> > Executable = "/bin/env";
> > Arguments = "| /bin/mail -s $(hostname) [log in to unmask]";
> > Stdoutput = "message.txt";
> > StdError = "stderror";
> > Requirements = other.GlueCEUniqueID == "creamce.gina.sara.nl:8443/cream-pbs-infra";
> > RetryCount=0;
> > ShallowRetryCount=0;

> >

> >

> > Also, when I use a bogus queue name in the request, the job stays in the
> > REGISTERED state:

> >

> > maarten$ glite-ce-job-submit -a -r creamce.gina.sara.nl:8443/cream-pbs-thisisbogus ./gina
> > 2010-10-13 15:57:55,017 WARN - No configuration file suitable for loading. Using built-in configuration
> > https://creamce.gina.sara.nl:8443/CREAM392820764

> >

> > maarten$ glite-ce-job-status https://creamce.gina.sara.nl:8443/CREAM392820764
> > 2010-10-13 15:58:08,130 WARN - No configuration file suitable for loading. Using built-in configuration

> >

> > ****** JobID=[https://creamce.gina.sara.nl:8443/CREAM392820764]

> >

> > Status = [REGISTERED]

> >

> > Does anyone have an idea what's going on?
> > I have the feeling this is something small I am overlooking :-), but it
> > keeps me busy.

> >

> > Cheers,

> > Maarten

> >

> >

> > SARA Computing and Networking Services

> > PO Box 94613

> > 1090 GP Amsterdam, Netherlands

> >

> > Tel: +31 (0)20 592 3000

> > Fax: +31 (0)20 668 3167

>


> INFN Sezione di Padova

> Via Marzolo, 8

> 35131 Padova - Italy E-mail: massimo.sgaravatto [at] pd.infn.it

> Tel: ++39 0498275908 Skype: massimo.sgaravatto

> Fax: ++39 0498275952

--

ing. M.H. van Ingen, HPC&V Systems Programmer
