JiscMail Logo
Email discussion lists for the UK Education and Research communities

Help for LCG-ROLLOUT Archives


LCG-ROLLOUT Archives

LCG-ROLLOUT Archives


LCG-ROLLOUT@JISCMAIL.AC.UK


View:

Message:

[

First

|

Previous

|

Next

|

Last

]

By Topic:

[

First

|

Previous

|

Next

|

Last

]

By Author:

[

First

|

Previous

|

Next

|

Last

]

Font:

Proportional Font

LISTSERV Archives

LISTSERV Archives

LCG-ROLLOUT Home

LCG-ROLLOUT Home

LCG-ROLLOUT  October 2010

LCG-ROLLOUT October 2010

Options

Subscribe or Unsubscribe

Subscribe or Unsubscribe

Log In

Log In

Get Password

Get Password

Subject:

Re: Creamce keeps jobs in "REGISTERED" state

From:

Maarten van Ingen <[log in to unmask]>

Reply-To:

LHC Computer Grid - Rollout <[log in to unmask]>

Date:

Fri, 15 Oct 2010 16:07:46 +0200

Content-Type:

text/plain

Parts/Attachments:

Parts/Attachments

text/plain (812 lines)

On Friday 15 October 2010 15:55:49 Massimo Sgaravatto - INFN Padova wrote:
> The tracejobs are not done by the commands in the queue, but by the
> blah bupdater process, to get the status of jobs in the blah registry.
> 
> You said that the tracejob is done also on very old jobs.
> What is the status of these jobs wrt torque and wrt cream ?
The example job had ended oh the 11th:

10/11/2010 16:59:51  S    dequeuing from medium, state COMPLETE
and qstat doesnt know about this job anymore.

In cream i get this:
root# grep 466339072 glite-ce-cream.log*
glite-ce-cream.log.17:11 Oct 2010 16:09:30,499 INFO  
org.glite.ce.cream.jobmanagement.db.table.JobTable (JobTable.java:232) - 
(http-8443-Processor16) Job inserted. JobId = CREAM466339072
glite-ce-cream.log.17:11 Oct 2010 16:09:30,857 INFO  
org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor 
(AbstractJobExecutor.java:2094) - (http-8443-Processor16) JOB CREAM466339072 
STATUS CHANGED: -- => REGISTERED [localUser=atlprd19] 
[delegationId=1286786608.333592]
glite-ce-cream.log.17:11 Oct 2010 16:10:21,795 INFO  
org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor 
(AbstractJobExecutor.java:826) - (Worker Thread 38) 
REMOTE_REQUEST_ADDRESS=128.142.195.157; 
USER_DN=/C=DE/O=GermanGrid/OU=LMU/CN=Rodney Walker; USER_FQAN={ 
/atlas/Role=production/Capability=NULL; /atlas/Role=NULL/Capability=NULL; 
/atlas/alarm/Role=NULL/Capability=NULL; /atlas/de/Role=NULL/Capability=NULL; 
/atlas/lcg1/Role=NULL/Capability=NULL; /atlas/team/Role=NULL/Capability=NULL; 
/atlas/uk/Role=NULL/Capability=NULL; }; CMD_NAME=JOB_START; 
CMD_CATEGORY=JOB_MANAGEMENT; CMD_STATUS=PROCESSING; commandName=JOB_START; 
cmdExecutorName=BLAHExecutor; 
userId=_C_DE_O_GermanGrid_OU_LMU_CN_Rodney_Walker_atlas_Role_production_Capability_NULL; 
jobId=CREAM466339072; status=PROCESSING;
glite-ce-cream.log.17:11 Oct 2010 16:10:21,845 INFO  
org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor 
(AbstractJobExecutor.java:2094) - (Worker Thread 38) JOB CREAM466339072 STATUS 
CHANGED: REGISTERED => PENDING [localUser=atlprd19] 
[delegationId=1286786608.333592]
glite-ce-cream.log.17:11 Oct 2010 16:10:26,965 INFO  
org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor 
(AbstractJobExecutor.java:2094) - (Worker Thread 38) JOB CREAM466339072 STATUS 
CHANGED: PENDING => IDLE [localUser=atlprd19] [delegationId=1286786608.333592]
glite-ce-cream.log.17:11 Oct 2010 16:10:26,966 INFO  
org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor 
(AbstractJobExecutor.java:839) - (Worker Thread 38) 
REMOTE_REQUEST_ADDRESS=128.142.195.157; 
USER_DN=/C=DE/O=GermanGrid/OU=LMU/CN=Rodney Walker; USER_FQAN={ 
/atlas/Role=production/Capability=NULL; /atlas/Role=NULL/Capability=NULL; 
/atlas/alarm/Role=NULL/Capability=NULL; /atlas/de/Role=NULL/Capability=NULL; 
/atlas/lcg1/Role=NULL/Capability=NULL; /atlas/team/Role=NULL/Capability=NULL; 
/atlas/uk/Role=NULL/Capability=NULL; }; CMD_NAME=JOB_START; 
CMD_CATEGORY=JOB_MANAGEMENT; CMD_STATUS=PROCESSING; commandName=JOB_START; 
cmdExecutorName=BLAHExecutor; 
userId=_C_DE_O_GermanGrid_OU_LMU_CN_Rodney_Walker_atlas_Role_production_Capability_NULL; 
jobId=CREAM466339072; status=PROCESSING; 
lrmsAbsJobId=pbs/20101011/7477032.ce.gina.sara.nl;
glite-ce-cream.log.17:11 Oct 2010 16:17:45,558 INFO  
org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor 
(AbstractJobExecutor.java:2094) - (Worker Thread 20) JOB CREAM466339072 STATUS 
CHANGED: IDLE => RUNNING [localUser=atlprd19] [delegationId=1286786608.333592]
glite-ce-cream.log.17:11 Oct 2010 16:17:46,698 INFO  
org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor 
(AbstractJobExecutor.java:2094) - (Worker Thread 32) JOB CREAM466339072 STATUS 
CHANGED: RUNNING => REALLY-RUNNING [localUser=atlprd19] 
[workerNode=v31-08.gina.sara.nl] [delegationId=1286786608.333592]
glite-ce-cream.log.17:11 Oct 2010 16:27:41,076 INFO  
org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor 
(AbstractJobExecutor.java:843) - (Worker Thread 13) commandName=PROXY_RENEW; 
cmdExecutorName=BLAHExecutor; 
userId=_C_DE_O_GermanGrid_OU_LMU_CN_Rodney_Walker_atlas_Role_production_Capability_NULL; 
jobId=CREAM466339072; status=PROCESSING; delegationId=1286786608.333592;
glite-ce-cream.log.17:11 Oct 2010 16:59:50,884 INFO  
org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor 
(AbstractJobExecutor.java:2094) - (Worker Thread 6) JOB CREAM466339072 STATUS 
CHANGED: REALLY-RUNNING => DONE-OK [failureReason=reason=0] 
[localUser=atlprd19] [workerNode=v31-08.gina.sara.nl] 
[delegationId=1286786608.333592]
glite-ce-cream.log.17:11 Oct 2010 17:28:14,017 INFO  
org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor 
(AbstractJobExecutor.java:814) - (Worker Thread 39) 
REMOTE_REQUEST_ADDRESS=128.142.195.157; 
USER_DN=/C=DE/O=GermanGrid/OU=LMU/CN=Rodney Walker; USER_FQAN={ 
/atlas/Role=production/Capability=NULL; /atlas/Role=NULL/Capability=NULL; 
/atlas/alarm/Role=NULL/Capability=NULL; /atlas/de/Role=NULL/Capability=NULL; 
/atlas/lcg1/Role=NULL/Capability=NULL; /atlas/team/Role=NULL/Capability=NULL; 
/atlas/uk/Role=NULL/Capability=NULL; }; CMD_NAME=JOB_PURGE; 
CMD_CATEGORY=JOB_MANAGEMENT; CMD_STATUS=PROCESSING; commandName=JOB_PURGE; 
cmdExecutorName=BLAHExecutor; 
userId=_C_DE_O_GermanGrid_OU_LMU_CN_Rodney_Walker_atlas_Role_production_Capability_NULL; 
jobId=CREAM466339072; status=PROCESSING;
glite-ce-cream.log.17:11 Oct 2010 17:28:14,106 INFO  
org.glite.ce.cream.jobmanagement.db.table.JobTable (JobTable.java:162) - 
(Worker Thread 39) Job deleted. JobId = CREAM466339072
glite-ce-cream.log.17:11 Oct 2010 17:28:14,287 INFO  
org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor 
(AbstractJobExecutor.java:1877) - (Worker Thread 39) purge: purged job 
CREAM466339072


(you gotta love the wordwrap in mail clients....)

Anyway the last status I can find there is that the job is done and purged. 
(Also done on the 11th).

Cheers,
Maarten
> 
>                         Cheers, Massimo
> 
> On Fri, 15 Oct 2010, Maarten van Ingen wrote:
> > Hi again :-)
> > 
> > On another CREAMCE we have something with the same symptoms but it seems
> > to have a different cause.
> > 
> > The output of the things you asked earlier for the other ce, but done on
> > this creamce:
> > 
> > 
> > mysql> select c.name, c.creationTime from JOB_MANAGEMENT jm, command c
> > where
> > 
> >    -> jm.commandId =c.id order by c.creationTime limit 20;
> > 
> > +-----------+---------------------+
> > 
> > | name      | creationTime        |
> > 
> > +-----------+---------------------+
> > 
> > | JOB_START | 2010-10-15 10:57:48 |
> > | JOB_START | 2010-10-15 10:57:48 |
> > | JOB_START | 2010-10-15 10:57:48 |
> > | JOB_START | 2010-10-15 10:59:23 |
> > | JOB_START | 2010-10-15 10:59:23 |
> > | JOB_START | 2010-10-15 11:22:48 |
> > | JOB_START | 2010-10-15 11:22:48 |
> > | JOB_START | 2010-10-15 11:22:48 |
> > | JOB_START | 2010-10-15 11:22:48 |
> > | JOB_START | 2010-10-15 11:22:48 |
> > | JOB_START | 2010-10-15 11:22:48 |
> > | JOB_START | 2010-10-15 11:23:53 |
> > | JOB_START | 2010-10-15 11:29:15 |
> > | JOB_START | 2010-10-15 11:29:18 |
> > | JOB_START | 2010-10-15 11:29:18 |
> > | JOB_START | 2010-10-15 11:29:18 |
> > | JOB_START | 2010-10-15 11:29:29 |
> > | JOB_START | 2010-10-15 11:29:29 |
> > | JOB_START | 2010-10-15 11:33:56 |
> > | JOB_START | 2010-10-15 12:11:50 |
> > 
> > +-----------+---------------------+
> > 20 rows in set (0.00 sec)
> > 
> > mysql> select c.name, count(c.name) from JOB_MANAGEMENT jm, command c
> > where
> > 
> >    -> jm.commandId =c.id group by c.name;
> > 
> > +---------------------------+---------------+
> > 
> > | name                      | count(c.name) |
> > 
> > +---------------------------+---------------+
> > 
> > | COPY_NEW_PROXY_TO_SANDBOX |            26 |
> > | JOB_PURGE                 |             4 |
> > | JOB_START                 |           282 |
> > | PROXY_RENEW               |            18 |
> > | SET_JOB_STATUS            |           107 |
> > 
> > +---------------------------+---------------+
> > 5 rows in set (0.01 sec)
> > 
> > mysql> Bye
> > 
> > 15:28 creamce2.gina.sara.nl:/root
> > root# /opt/glite/bin/glite_cream_load_monitor --show
> > Threshold for Load Average(1 min): 40 => Detected value for Load
> > Average(1 min):  1.47
> > Threshold for Load Average(5 min): 40 => Detected value for Load
> > Average(5 min):  1.55
> > Threshold for Load Average(15 min): 20 => Detected value for Load
> > Average(15 min):  1.60
> > Threshold for Memory Usage: 95 => Detected value for Memory Usage: 10.06%
> > Threshold for Swap Usage: 95 => Detected value for Swap Usage: 0.00%
> > Threshold for Free FD: 500 => Detected value for Free FD: 1582627
> > Threshold for tomcat FD: 800 => Detected value for Tomcat FD: 425
> > Threshold for FTP Connection: 30 => Detected value for FTP Connection: 1
> > Threshold for Number of active jobs: -1 => Detected value for Number of
> > active jobs: 420
> > Threshold for Number of pending commands: -1 => Detected value for Number
> > of pending commands: 68
> > Threshold for Disk Usage: 95% => Detected value for Partition / : 26%
> > 
> > 
> > 
> > So no, well a few, proxy renewals here.
> > But what I noticed is that the thing is doing a lot of tracejobs.
> > Not a problem I would say, but it takes a hell of a lot of time and it
> > queries old jobs.
> > e.g.:
> > 
> > Job: 7477032.ce.gina.sara.nl
> > 
> > 10/11/2010 16:10:23  S    enqueuing into medium, state 1 hop 1
> > 10/11/2010 16:10:23  S    Job Queued at request of
> > [log in to unmask], owner = [log in to unmask],
> > job name = crm02_466339072, queue = medium
> > 10/11/2010 16:17:41  S    Job Run at request of [log in to unmask]
> > 10/11/2010 16:17:42  S    child reported success for job after 1 seconds
> > (dest=v31-08), rc=0
> > 10/11/2010 16:59:50  S    obit received - updating final job usage info
> > 10/11/2010 16:59:50  S    job exit status 0 handled
> > 10/11/2010 16:59:50  S    Exit_status=0 resources_used.cput=00:38:42
> > resources_used.mem=1637712kb resources_used.vmem=2673356kb
> > resources_used.walltime=00:42:09
> > 10/11/2010 16:59:50  S    JOB_SUBSTATE_EXITING
> > 10/11/2010 16:59:51  S    dequeuing from medium, state COMPLETE
> > 
> > 15:17 creamce2.gina.sara.nl:/root
> > root# ps -ef | grep tracejob
> > tomcat   10616  4985  0 15:17 ?        00:00:00 /usr/bin/tracejob -p
> > /var/spool/torque -m -l -a 7477186.ce.gina.sara.nl
> > root     10618 27637  0 15:17 pts/1    00:00:00 grep tracejob
> > 
> > 
> > Why would it do a tracejob on old jobs?
> > The other, now fixed ce, has (way) more jobs and no continuously  running
> > tracejob.
> > 
> > I have increased the numer of threads, but that did not help.
> > 
> > Cheers,
> > Maarten
> > 
> > On Thursday 14 October 2010 10:25:35 Maarten van Ingen wrote:
> >> I have increased the number to 100 (it was 50) and no more queue :-)
> >> 
> >> 10:13 creamce.gina.sara.nl:/usr/share/tomcat5
> >> tomcat$ /opt/glite/bin/glite_cream_load_monitor --show
> >> Threshold for Load Average(1 min): 40 => Detected value for Load
> >> Average(1 min):  1.46
> >> Threshold for Load Average(5 min): 40 => Detected value for Load
> >> Average(5 min):  1.75
> >> Threshold for Load Average(15 min): 20 => Detected value for Load
> >> Average(15 min):  1.08
> >> Threshold for Memory Usage: 95 => Detected value for Memory Usage:
> >> 20.39% Threshold for Swap Usage: 95 => Detected value for Swap Usage:
> >> 0.00% Threshold for Free FD: 500 => Detected value for Free FD: 2387483
> >> Threshold for tomcat FD: 800 => Detected value for Tomcat FD: 314
> >> Threshold for FTP Connection: 30 => Detected value for FTP Connection:
> >> 2 Threshold for Number of active jobs: -1 => Detected value for Number
> >> of active jobs: 5649
> >> Threshold for Number of pending commands: -1 => Detected value for
> >> Number of pending commands: 0
> >> Threshold for Disk Usage: 95% => Detected value for Partition / : 36%
> >> 
> >> 
> >> The load and memory usage did increase. I have not lowered the
> >> purge_interval, since the node it is currently running on has more than
> >> enough memory, 24GB. A you can see here:
> >> http://ganglia.sara.nl/?c=GINA%20Cluster&h=creamce.gina.sara.nl&m=load_o
> >> ne& r=hour&s=descending&hc=4&mc=2 there is a little bump in the memory
> >> (and cpu) usage but still low in respect to the total amount available.
> >> 
> >> 
> >> Thanks for the help!
> >> 
> >> Cheers,
> >> Maarten
> >> 
> >> On Thursday 14 October 2010 09:54:41 Massimo Sgaravatto - INFN Padova 
wrote:
> >>> On Thu, 14 Oct 2010, Maarten van Ingen wrote:
> >>>> Can we increase the number of threads?
> >>> 
> >>> Yes, the relevant parameter is cream_concurrency_level in
> >>> /opt/glite/etc/glite-ce-cream/cream-config.xml
> >>> 
> >>> Then restart tomcat
> >>> 
> >>> If you are using the new blah blparser and if you are going to increase
> >>> the number of threads, it is also suggested to decrease
> >>> (i.e. set to 2500000) the value for purge_interval in
> >>> /opt/glite/etc/blah.config to reduce memory usage.
> >>> Then kill the bupdater process (a newer one will be started
> >>> automatically)
> >>> 
> >>>                                 Cheers, Massimo
> >>>> 
> >>>> The node has little to no load so there
> >>>> should be room for more threads. If possible, would this help to
> >>>> reduce the number of commands in the queue?
> >>>> 
> >>>> If not, is there any way to work around this problem until the update
> >>>> is available?
> >>>> 
> >>>> Cheers,
> >>>> Maarten
> >>>> 
> >>>> On Wednesday 13 October 2010 16:37:47 Massimo Sgaravatto - INFN Padova
> >> 
> >> wrote:
> >>>>> The submit command is actually divided in 2 operations: the register
> >>>>> and the start.
> >>>>> The former is a synchronos operation, the latter is an asyncronous
> >>>>> operation.
> >>>>> Asynchronous commands are stored in a queue, from which they are
> >>>>> taken and processed by a pool of threades
> >>>>> 
> >>>>> In your CE there is a queue (not too long) of commands to be
> >>>>> processed. This explains why the start operation is not done
> >>>>> immediately at submission time.
> >>>>> 
> >>>>> 
> >>>>> In your CE most of the commands in the queue are proxy renewal
> >>>>> operations.
> >>>>> 
> >>>>> With CREAM CE 1.6.3 (that we plan to certify by the end of this
> >>>>> month) there will the fix for this bug:
> >>>>> 
> >>>>> http://savannah.cern.ch/bugs/?73765
> >>>>> 
> >>>>> and this should help a lot in keeping the queue of commands short.
> >>>>> 
> >>>>>                                 Cheers, Massimo
> >>>>> 
> >>>>> On Wed, 13 Oct 2010, Maarten van Ingen wrote:
> >>>>>> The grep:
> >>>>>> 
> >>>>>> (of course this example does go one step further...)
> >>>>>> 
> >>>>>> /opt/glite/var/log/glite-ce-cream.log:13 Oct 2010 16:13:51,234 INFO
> >>>>>> org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor
> >>>>>> (AbstractJobExecutor.java:826) - (Worker Thread 11)
> >>>>>> REMOTE_REQUEST_ADDRESS=145.100.5.194;
> >>>>>> USER_DN=/O=dutchgrid/O=users/O=sara/CN=Maarten Hendrik van Ingen;
> >>>>>> USER_FQAN={ /pvier/Role=NULL/Capability=NULL;
> >>>>>> /pvier/infra/Role=NULL/Capability=NULL; }; CMD_NAME=JOB_START;
> >>>>>> CMD_CATEGORY=JOB_MANAGEMENT; CMD_STATUS=PROCESSING;
> >>>>>> commandName=JOB_START; cmdExecutorName=BLAHExecutor;
> >>>>>> userId=_O_dutchgrid_O_users_O_sara_CN_Maarten_Hendrik_van_Ingen_pvie
> >>>>>> r _R o le_NULL_Capability_NULL; jobId=CREAM598295305;
> >>>>>> status=PROCESSING;
> >>>>>> 
> >>>>>> /opt/glite/var/log/glite-ce-cream.log:13 Oct 2010 16:13:51,287 INFO
> >>>>>> org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor
> >>>>>> (AbstractJobExecutor.java:2094) - (Worker Thread 11) JOB
> >>>>>> CREAM598295305 STATUS CHANGED: REGISTERED => PENDING
> >>>>>> [localUser=pvi032]
> >>>>>> [delegationId=ce2ca4874b98dd5f6b55c9e6b3b4a4a1f852d36c]
> >>>>>> 
> >>>>>> /opt/glite/var/log/glite-ce-cream.log.1:13 Oct 2010 15:47:27,553
> >>>>>> INFO org.glite.ce.cream.jobmanagement.db.table.JobTable
> >>>>>> (JobTable.java:232) - (http-8443-Processor19) Job inserted. JobId =
> >>>>>> CREAM598295305
> >>>>>> 
> >>>>>> /opt/glite/var/log/glite-ce-cream.log.1:13 Oct 2010 15:47:27,661
> >>>>>> INFO
> >>>>>> org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor
> >>>>>> (AbstractJobExecutor.java:2094) - (http-8443-Processor19) JOB
> >>>>>> CREAM598295305 STATUS CHANGED: -- => REGISTERED [localUser=pvi032]
> >>>>>> [delegationId=ce2ca4874b98dd5f6b55c9e6b3b4a4a1f852d36c]
> >>>>>> 
> >>>>>> /opt/glite/bin/glite_cream_load_monitor --show:
> >>>>>> 
> >>>>>> Threshold for Load Average(1 min): 40 => Detected value for Load
> >>>>>> Average(1 min): 1.06
> >>>>>> 
> >>>>>> Threshold for Load Average(5 min): 40 => Detected value for Load
> >>>>>> Average(5 min): 0.97
> >>>>>> 
> >>>>>> Threshold for Load Average(15 min): 20 => Detected value for Load
> >>>>>> Average(15 min): 0.69
> >>>>>> 
> >>>>>> Threshold for Memory Usage: 95 => Detected value for Memory Usage:
> >>>>>> 17.57%
> >>>>>> 
> >>>>>> Threshold for Swap Usage: 95 => Detected value for Swap Usage: 0.00%
> >>>>>> 
> >>>>>> Threshold for Free FD: 500 => Detected value for Free FD: 2386973
> >>>>>> 
> >>>>>> Threshold for tomcat FD: 800 => Detected value for Tomcat FD: 269
> >>>>>> 
> >>>>>> Threshold for FTP Connection: 30 => Detected value for FTP
> >>>>>> Connection: 1
> >>>>>> 
> >>>>>> Threshold for Number of active jobs: -1 => Detected value for Number
> >>>>>> of active jobs: 5866
> >>>>>> 
> >>>>>> Threshold for Number of pending commands: -1 => Detected value for
> >>>>>> Number of pending commands: 431
> >>>>>> 
> >>>>>> Threshold for Disk Usage: 95% => Detected value for Partition / :
> >>>>>> 35%
> >>>>>> 
> >>>>>> SQL:
> >>>>>> 
> >>>>>> mysql> select c.name, c.creationTime from JOB_MANAGEMENT jm, command
> >>>>>> c where
> >>>>>> 
> >>>>>> -> jm.commandId =c.id order by c.creationTime limit 20;
> >>>>>> 
> >>>>>> +----------------+---------------------+
> >>>>>> 
> >>>>>> | name | creationTime |
> >>>>>> 
> >>>>>> +----------------+---------------------+
> >>>>>> 
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:33:02 |
> >>>>>> | 
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:33:09 |
> >>>>>> | 
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:33:59 |
> >>>>>> | 
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:34:00 |
> >>>>>> | 
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:35:54 |
> >>>>>> | 
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:37:11 |
> >>>>>> | 
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:39:57 |
> >>>>>> | 
> >>>>>> | PROXY_RENEW | 2010-10-13 14:40:41 |
> >>>>>> | 
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:45:10 |
> >>>>>> | 
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:46:11 |
> >>>>>> | 
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:46:13 |
> >>>>>> | 
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:46:13 |
> >>>>>> | 
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:46:14 |
> >>>>>> | 
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:46:17 |
> >>>>>> | 
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:48:11 |
> >>>>>> | 
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:48:11 |
> >>>>>> | 
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:49:15 |
> >>>>>> | 
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:49:16 |
> >>>>>> | 
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:50:19 |
> >>>>>> | 
> >>>>>> | JOB_START | 2010-10-13 14:53:37 |
> >>>>>> 
> >>>>>> +----------------+---------------------+
> >>>>>> 
> >>>>>> 20 rows in set (0.00 sec)
> >>>>>> 
> >>>>>> mysql> select c.name, count(c.name) from JOB_MANAGEMENT jm, command
> >>>>>> c where
> >>>>>> 
> >>>>>> -> jm.commandId =c.id group by c.name;
> >>>>>> 
> >>>>>> +---------------------------+---------------+
> >>>>>> 
> >>>>>> | name | count(c.name) |
> >>>>>> 
> >>>>>> +---------------------------+---------------+
> >>>>>> 
> >>>>>> | COPY_NEW_PROXY_TO_SANDBOX | 3 |
> >>>>>> | 
> >>>>>> | JOB_PURGE | 61 |
> >>>>>> | 
> >>>>>> | JOB_START | 190 |
> >>>>>> | 
> >>>>>> | PROXY_RENEW | 539 |
> >>>>>> | 
> >>>>>> | SET_JOB_STATUS | 195 |
> >>>>>> 
> >>>>>> +---------------------------+---------------+
> >>>>>> 
> >>>>>> 5 rows in set (0.00 sec)
> >>>>>> 
> >>>>>> Cheers,
> >>>>>> 
> >>>>>> Maarten
> >>>>>> 
> >>>>>> On Wednesday 13 October 2010 16:14:49 Massimo Sgaravatto - INFN
> >>>>>> Padova
> >>>> 
> >>>> wrote:
> >>>>>>> What does:
> >>>>>>> 
> >>>>>>> 
> >>>>>>> 
> >>>>>>> grep -i 598295305 /opt/glite/var/log/glite-ce-cream.log*
> >>>>>>> 
> >>>>>>> 
> >>>>>>> 
> >>>>>>> report ?
> >>>>>>> 
> >>>>>>> 
> >>>>>>> 
> >>>>>>> Can you please issue this command on the CREAM CE as user tomcat:
> >>>>>>> 
> >>>>>>> 
> >>>>>>> 
> >>>>>>> /opt/glite/bin/glite_cream_load_monitor --show
> >>>>>>> 
> >>>>>>> 
> >>>>>>> 
> >>>>>>> ?
> >>>>>>> 
> >>>>>>> 
> >>>>>>> 
> >>>>>>> Is there a huge number of "Detected value for Number of pending
> >>>>>>> 
> >>>>>>> commands" ?
> >>>>>>> 
> >>>>>>> If so, can you please issue these mysql commands ?
> >>>>>>> 
> >>>>>>> 
> >>>>>>> 
> >>>>>>> use creamdb;
> >>>>>>> 
> >>>>>>> select c.name, c.creationTime from JOB_MANAGEMENT jm, command c
> >>>>>>> where
> >>>>>>> 
> >>>>>>> jm.commandId =c.id order by c.creationTime limit 20;
> >>>>>>> 
> >>>>>>> 
> >>>>>>> 
> >>>>>>> select c.name, count(c.name) from JOB_MANAGEMENT jm, command c
> >>>>>>> where
> >>>>>>> 
> >>>>>>> jm.commandId =c.id group by c.name;
> >>>>>>> 
> >>>>>>> 
> >>>>>>> 
> >>>>>>> Cheers, Massimo
> >>>>>>> 
> >>>>>>> On Wed, 13 Oct 2010, Maarten van Ingen wrote:
> >>>>>>>> Hi,
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> One of our creamce keeps jobs in registered state and many will
> >>>>>>>> not come
> >>>>>>>> 
> >>>>>>>> out of it.
> >>>>>>>> 
> >>>>>>>> Sometimes they will get through, but this could take some hours.
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> For example this job:
> >>>>>>>> 
> >>>>>>>> maarten$ glite-ce-job-submit -a -r
> >>>>>>>> 
> >>>>>>>> creamce.gina.sara.nl:8443/cream-pbs-infra ./gina
> >>>>>>>> 
> >>>>>>>> 2010-10-13 15:47:25,246 WARN - No configuration file suitable for
> >>>>>>>> 
> >>>>>>>> loading. Using built-in configuration
> >>>>>>>> 
> >>>>>>>> https://creamce.gina.sara.nl:8443/CREAM598295305
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> maarten$ glite-ce-job-status
> >>>>>>>> 
> >>>>>>>> https://creamce.gina.sara.nl:8443/CREAM598295305 2010-10-13
> >>>>>>>> 15:49:12,791
> >>>>>>>> 
> >>>>>>>> WARN - No configuration file suitable for loading. Using built-in
> >>>>>>>> 
> >>>>>>>> configuration
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> ****** JobID=[https://creamce.gina.sara.nl:8443/CREAM598295305]
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> Status = [REGISTERED]
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> When I have a look into the logging, all I can find is this:
> >>>>>>>> 
> >>>>>>>> root# grep 598295305 glite-ce-cream.log
> >>>>>>>> 
> >>>>>>>> 13 Oct 2010 15:47:27,553 INFO
> >>>>>>>> 
> >>>>>>>> org.glite.ce.cream.jobmanagement.db.table.JobTable
> >>>>>>>> (JobTable.java:232) -
> >>>>>>>> 
> >>>>>>>> (http-8443-Processor19) Job inserted. JobId = CREAM598295305
> >>>>>>>> 
> >>>>>>>> 13 Oct 2010 15:47:27,661 INFO
> >>>>>>>> 
> >>>>>>>> org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecuto
> >>>>>>>> r
> >>>>>>>> 
> >>>>>>>> (AbstractJobExecutor.java:2094) - (http-8443-Processor19) JOB
> >>>>>>>> 
> >>>>>>>> CREAM598295305 STATUS CHANGED: -- => REGISTERED [localUser=pvi032]
> >>>>>>>> 
> >>>>>>>> [delegationId=ce2ca4874b98dd5f6b55c9e6b3b4a4a1f852d36c]
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> The jdl used is the same as I use to submit to a wms (hence the
> >>>>>>>> 
> >>>>>>>> "Requirements" part):
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> Executable = "/bin/env";
> >>>>>>>> 
> >>>>>>>> Arguments = "| /bin/mail -s $(hostname) [log in to unmask]";
> >>>>>>>> 
> >>>>>>>> Stdoutput = "message.txt";
> >>>>>>>> 
> >>>>>>>> StdError = "stderror";
> >>>>>>>> 
> >>>>>>>> Requirements = other.GlueCEUniqueID ==
> >>>>>>>> 
> >>>>>>>> "creamce.gina.sara.nl:8443/cream-pbs- infra";
> >>>>>>>> 
> >>>>>>>> RetryCount=0;
> >>>>>>>> 
> >>>>>>>> ShallowRetryCount=0;
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> Also when I use bogus information for the requested queue it stays
> >>>>>>>> in the
> >>>>>>>> 
> >>>>>>>> REGISTERED state.:
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> maarten$ glite-ce-job-submit -a -r
> >>>>>>>> creamce.gina.sara.nl:8443/cream-pbs-
> >>>>>>>> 
> >>>>>>>> thisisbogus ./gina
> >>>>>>>> 
> >>>>>>>> 2010-10-13 15:57:55,017 WARN - No configuration file suitable for
> >>>>>>>> 
> >>>>>>>> loading. Using built-in configuration
> >>>>>>>> 
> >>>>>>>> https://creamce.gina.sara.nl:8443/CREAM392820764
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> maarten$ glite-ce-job-status
> >>>>>>>> 
> >>>>>>>> https://creamce.gina.sara.nl:8443/CREAM392820764 2010-10-13
> >>>>>>>> 15:58:08,130
> >>>>>>>> 
> >>>>>>>> WARN - No configuration file suitable for loading. Using built-in
> >>>>>>>> 
> >>>>>>>> configuration
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> ****** JobID=[https://creamce.gina.sara.nl:8443/CREAM392820764]
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> Status = [REGISTERED]
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> Anyone got an idea on whats going on?
> >>>>>>>> 
> >>>>>>>> I have the feeling this is something small I am overlooking :-)
> >>>>>>>> but it
> >>>>>>>> 
> >>>>>>>> keeps me busy.
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> Cheers,
> >>>>>>>> 
> >>>>>>>> Maarten
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> SARA Computing and Networking Services
> >>>>>>>> 
> >>>>>>>> PO Box 94613
> >>>>>>>> 
> >>>>>>>> 1090 GP Amsterdam, Netherlands
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> Tel: +31 (0)20 592 3000
> >>>>>>>> 
> >>>>>>>> Fax: +31 (0)20 668 3167
> >>>>>>> 
> >>>>>>> \|||/
> >>>>>>> 
> >>>>>>> -----------0oo----( o o )----oo0-------------------
> >>>>>>> 
> >>>>>>> (_)
> >>>>>>> 
> >>>>>>> INFN Sezione di Padova
> >>>>>>> 
> >>>>>>> Via Marzolo, 8
> >>>>>>> 
> >>>>>>> 35131 Padova - Italy E-mail: massimo.sgaravatto [at] pd.infn.it
> >>>>>>> 
> >>>>>>> Tel: ++39 0498275908 Skype: massimo.sgaravatto
> >>>>>>> 
> >>>>>>> Fax: ++39 0498275952
> >>>>>> 
> >>>>>> --
> >>>>>> 
> >>>>>> ing. M.H. van Ingen, HPC&V Systems Programmer
> >>>>>> 
> >>>>>> SARA Computing and Networking Services
> >>>>>> 
> >>>>>> PO Box 94613
> >>>>>> 
> >>>>>> 1090 GP Amsterdam, Netherlands
> >>>>>> 
> >>>>>> Tel: +31 (0)20 592 3000
> >>>>>> 
> >>>>>> Fax: +31 (0)20 668 3167
> >>>>>> 
> >>>>>                     \|||/
> >>>>> 
> >>>>> -----------0oo----( o o )----oo0-------------------
> >>>>> 
> >>>>>                      (_)
> >>>>> 
> >>>>> INFN Sezione di Padova
> >>>>> Via Marzolo, 8
> >>>>> 35131 Padova - Italy    E-mail: massimo.sgaravatto [at] pd.infn.it
> >>>>> Tel: ++39 0498275908    Skype: massimo.sgaravatto
> >>>>> Fax: ++39 0498275952
> >>>> 
> >>>> --
> >>>> ing. M.H. van Ingen, HPC&V Systems Programmer
> >>>> 
> >>>> SARA Computing and Networking Services
> >>>> PO Box 94613
> >>>> 1090 GP Amsterdam, Netherlands
> >>>> 
> >>>> Tel: +31 (0)20 592 3000
> >>>> Fax: +31 (0)20 668 3167
> >>>> 
> >>>                     \|||/
> >>> 
> >>> -----------0oo----( o o )----oo0-------------------
> >>> 
> >>>                      (_)
> >>> 
> >>> INFN Sezione di Padova
> >>> Via Marzolo, 8
> >>> 35131 Padova - Italy    E-mail: massimo.sgaravatto [at] pd.infn.it
> >>> Tel: ++39 0498275908    Skype: massimo.sgaravatto
> >>> Fax: ++39 0498275952
> >> 
> >> --
> >> ing. M.H. van Ingen, HPC&V Systems Programmer
> >> 
> >> SARA Computing and Networking Services
> >> PO Box 94613
> >> 1090 GP Amsterdam, Netherlands
> >> 
> >> Tel: +31 (0)20 592 3000
> >> Fax: +31 (0)20 668 3167
> > 
> > --
> > ing. M.H. van Ingen, HPC&V Systems Programmer
> > 
> > SARA Computing and Networking Services
> > PO Box 94613
> > 1090 GP Amsterdam, Netherlands
> > 
> > Tel: +31 (0)20 592 3000
> > Fax: +31 (0)20 668 3167
> 
>                     \|||/
> -----------0oo----( o o )----oo0-------------------
>                      (_)
> INFN Sezione di Padova
> Via Marzolo, 8
> 35131 Padova - Italy    E-mail: massimo.sgaravatto [at] pd.infn.it
> Tel: ++39 0498275908    Skype: massimo.sgaravatto
> Fax: ++39 0498275952

-- 
ing. M.H. van Ingen, HPC&V Systems Programmer

SARA Computing and Networking Services
PO Box 94613
1090 GP Amsterdam, Netherlands

Tel: +31 (0)20 592 3000
Fax: +31 (0)20 668 3167

Top of Message | Previous Page | Permalink

JiscMail Tools


RSS Feeds and Sharing


Advanced Options


Archives

April 2024
March 2024
November 2023
June 2023
May 2023
April 2023
March 2023
February 2023
September 2022
June 2022
May 2022
April 2022
February 2022
December 2021
November 2021
October 2021
September 2021
July 2021
June 2021
May 2021
February 2021
January 2021
November 2020
September 2020
August 2020
July 2020
June 2020
May 2020
April 2020
March 2020
February 2020
January 2020
November 2019
October 2019
September 2019
August 2019
July 2019
June 2019
May 2019
March 2019
February 2019
January 2019
December 2018
November 2018
October 2018
September 2018
August 2018
July 2018
June 2018
May 2018
April 2018
February 2018
January 2018
November 2017
October 2017
September 2017
July 2017
June 2017
May 2017
March 2017
February 2017
January 2017
December 2016
November 2016
October 2016
September 2016
August 2016
July 2016
June 2016
May 2016
April 2016
March 2016
February 2016
January 2016
December 2015
November 2015
October 2015
September 2015
August 2015
July 2015
June 2015
May 2015
April 2015
March 2015
February 2015
January 2015
December 2014
November 2014
October 2014
September 2014
August 2014
July 2014
June 2014
May 2014
April 2014
March 2014
February 2014
January 2014
December 2013
November 2013
October 2013
September 2013
August 2013
July 2013
June 2013
May 2013
April 2013
March 2013
February 2013
January 2013
December 2012
November 2012
October 2012
September 2012
August 2012
July 2012
June 2012
May 2012
April 2012
March 2012
February 2012
January 2012
December 2011
November 2011
October 2011
September 2011
August 2011
July 2011
June 2011
May 2011
April 2011
March 2011
February 2011
January 2011
December 2010
November 2010
October 2010
September 2010
August 2010
July 2010
June 2010
May 2010
April 2010
March 2010
February 2010
January 2010
December 2009
November 2009
October 2009
September 2009
August 2009
July 2009
June 2009
May 2009
April 2009
March 2009
February 2009
January 2009
December 2008
November 2008
October 2008
September 2008
August 2008
July 2008
June 2008
May 2008
April 2008
March 2008
February 2008
January 2008
December 2007
November 2007
October 2007
September 2007
August 2007
July 2007
June 2007
May 2007
April 2007
March 2007
February 2007
January 2007
2006
2005
2004
2003


JiscMail is a Jisc service.

View our service policies at https://www.jiscmail.ac.uk/policyandsecurity/ and Jisc's privacy policy at https://www.jisc.ac.uk/website/privacy-notice

For help and support help@jisc.ac.uk

Secured by F-Secure Anti-Virus CataList Email List Search Powered by the LISTSERV Email List Manager