On Friday 15 October 2010 15:55:49 Massimo Sgaravatto - INFN Padova wrote:
> The tracejobs are not done by the commands in the queue, but by the
> blah bupdater process, to get the status of jobs in the blah registry.
>
> You said that the tracejob is done also on very old jobs.
> What is the status of these jobs wrt torque and wrt cream ?
The example job had ended oh the 11th:
10/11/2010 16:59:51 S dequeuing from medium, state COMPLETE
and qstat doesnt know about this job anymore.
In cream i get this:
root# grep 466339072 glite-ce-cream.log*
glite-ce-cream.log.17:11 Oct 2010 16:09:30,499 INFO
org.glite.ce.cream.jobmanagement.db.table.JobTable (JobTable.java:232) -
(http-8443-Processor16) Job inserted. JobId = CREAM466339072
glite-ce-cream.log.17:11 Oct 2010 16:09:30,857 INFO
org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor
(AbstractJobExecutor.java:2094) - (http-8443-Processor16) JOB CREAM466339072
STATUS CHANGED: -- => REGISTERED [localUser=atlprd19]
[delegationId=1286786608.333592]
glite-ce-cream.log.17:11 Oct 2010 16:10:21,795 INFO
org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor
(AbstractJobExecutor.java:826) - (Worker Thread 38)
REMOTE_REQUEST_ADDRESS=128.142.195.157;
USER_DN=/C=DE/O=GermanGrid/OU=LMU/CN=Rodney Walker; USER_FQAN={
/atlas/Role=production/Capability=NULL; /atlas/Role=NULL/Capability=NULL;
/atlas/alarm/Role=NULL/Capability=NULL; /atlas/de/Role=NULL/Capability=NULL;
/atlas/lcg1/Role=NULL/Capability=NULL; /atlas/team/Role=NULL/Capability=NULL;
/atlas/uk/Role=NULL/Capability=NULL; }; CMD_NAME=JOB_START;
CMD_CATEGORY=JOB_MANAGEMENT; CMD_STATUS=PROCESSING; commandName=JOB_START;
cmdExecutorName=BLAHExecutor;
userId=_C_DE_O_GermanGrid_OU_LMU_CN_Rodney_Walker_atlas_Role_production_Capability_NULL;
jobId=CREAM466339072; status=PROCESSING;
glite-ce-cream.log.17:11 Oct 2010 16:10:21,845 INFO
org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor
(AbstractJobExecutor.java:2094) - (Worker Thread 38) JOB CREAM466339072 STATUS
CHANGED: REGISTERED => PENDING [localUser=atlprd19]
[delegationId=1286786608.333592]
glite-ce-cream.log.17:11 Oct 2010 16:10:26,965 INFO
org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor
(AbstractJobExecutor.java:2094) - (Worker Thread 38) JOB CREAM466339072 STATUS
CHANGED: PENDING => IDLE [localUser=atlprd19] [delegationId=1286786608.333592]
glite-ce-cream.log.17:11 Oct 2010 16:10:26,966 INFO
org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor
(AbstractJobExecutor.java:839) - (Worker Thread 38)
REMOTE_REQUEST_ADDRESS=128.142.195.157;
USER_DN=/C=DE/O=GermanGrid/OU=LMU/CN=Rodney Walker; USER_FQAN={
/atlas/Role=production/Capability=NULL; /atlas/Role=NULL/Capability=NULL;
/atlas/alarm/Role=NULL/Capability=NULL; /atlas/de/Role=NULL/Capability=NULL;
/atlas/lcg1/Role=NULL/Capability=NULL; /atlas/team/Role=NULL/Capability=NULL;
/atlas/uk/Role=NULL/Capability=NULL; }; CMD_NAME=JOB_START;
CMD_CATEGORY=JOB_MANAGEMENT; CMD_STATUS=PROCESSING; commandName=JOB_START;
cmdExecutorName=BLAHExecutor;
userId=_C_DE_O_GermanGrid_OU_LMU_CN_Rodney_Walker_atlas_Role_production_Capability_NULL;
jobId=CREAM466339072; status=PROCESSING;
lrmsAbsJobId=pbs/20101011/7477032.ce.gina.sara.nl;
glite-ce-cream.log.17:11 Oct 2010 16:17:45,558 INFO
org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor
(AbstractJobExecutor.java:2094) - (Worker Thread 20) JOB CREAM466339072 STATUS
CHANGED: IDLE => RUNNING [localUser=atlprd19] [delegationId=1286786608.333592]
glite-ce-cream.log.17:11 Oct 2010 16:17:46,698 INFO
org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor
(AbstractJobExecutor.java:2094) - (Worker Thread 32) JOB CREAM466339072 STATUS
CHANGED: RUNNING => REALLY-RUNNING [localUser=atlprd19]
[workerNode=v31-08.gina.sara.nl] [delegationId=1286786608.333592]
glite-ce-cream.log.17:11 Oct 2010 16:27:41,076 INFO
org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor
(AbstractJobExecutor.java:843) - (Worker Thread 13) commandName=PROXY_RENEW;
cmdExecutorName=BLAHExecutor;
userId=_C_DE_O_GermanGrid_OU_LMU_CN_Rodney_Walker_atlas_Role_production_Capability_NULL;
jobId=CREAM466339072; status=PROCESSING; delegationId=1286786608.333592;
glite-ce-cream.log.17:11 Oct 2010 16:59:50,884 INFO
org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor
(AbstractJobExecutor.java:2094) - (Worker Thread 6) JOB CREAM466339072 STATUS
CHANGED: REALLY-RUNNING => DONE-OK [failureReason=reason=0]
[localUser=atlprd19] [workerNode=v31-08.gina.sara.nl]
[delegationId=1286786608.333592]
glite-ce-cream.log.17:11 Oct 2010 17:28:14,017 INFO
org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor
(AbstractJobExecutor.java:814) - (Worker Thread 39)
REMOTE_REQUEST_ADDRESS=128.142.195.157;
USER_DN=/C=DE/O=GermanGrid/OU=LMU/CN=Rodney Walker; USER_FQAN={
/atlas/Role=production/Capability=NULL; /atlas/Role=NULL/Capability=NULL;
/atlas/alarm/Role=NULL/Capability=NULL; /atlas/de/Role=NULL/Capability=NULL;
/atlas/lcg1/Role=NULL/Capability=NULL; /atlas/team/Role=NULL/Capability=NULL;
/atlas/uk/Role=NULL/Capability=NULL; }; CMD_NAME=JOB_PURGE;
CMD_CATEGORY=JOB_MANAGEMENT; CMD_STATUS=PROCESSING; commandName=JOB_PURGE;
cmdExecutorName=BLAHExecutor;
userId=_C_DE_O_GermanGrid_OU_LMU_CN_Rodney_Walker_atlas_Role_production_Capability_NULL;
jobId=CREAM466339072; status=PROCESSING;
glite-ce-cream.log.17:11 Oct 2010 17:28:14,106 INFO
org.glite.ce.cream.jobmanagement.db.table.JobTable (JobTable.java:162) -
(Worker Thread 39) Job deleted. JobId = CREAM466339072
glite-ce-cream.log.17:11 Oct 2010 17:28:14,287 INFO
org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor
(AbstractJobExecutor.java:1877) - (Worker Thread 39) purge: purged job
CREAM466339072
(you gotta love the wordwrap in mail clients....)
Anyway the last status I can find there is that the job is done and purged.
(Also done on the 11th).
Cheers,
Maarten
>
> Cheers, Massimo
>
> On Fri, 15 Oct 2010, Maarten van Ingen wrote:
> > Hi again :-)
> >
> > On another CREAMCE we have something with the same symptoms but it seems
> > to have a different cause.
> >
> > The output of the things you asked earlier for the other ce, but done on
> > this creamce:
> >
> >
> > mysql> select c.name, c.creationTime from JOB_MANAGEMENT jm, command c
> > where
> >
> > -> jm.commandId =c.id order by c.creationTime limit 20;
> >
> > +-----------+---------------------+
> >
> > | name | creationTime |
> >
> > +-----------+---------------------+
> >
> > | JOB_START | 2010-10-15 10:57:48 |
> > | JOB_START | 2010-10-15 10:57:48 |
> > | JOB_START | 2010-10-15 10:57:48 |
> > | JOB_START | 2010-10-15 10:59:23 |
> > | JOB_START | 2010-10-15 10:59:23 |
> > | JOB_START | 2010-10-15 11:22:48 |
> > | JOB_START | 2010-10-15 11:22:48 |
> > | JOB_START | 2010-10-15 11:22:48 |
> > | JOB_START | 2010-10-15 11:22:48 |
> > | JOB_START | 2010-10-15 11:22:48 |
> > | JOB_START | 2010-10-15 11:22:48 |
> > | JOB_START | 2010-10-15 11:23:53 |
> > | JOB_START | 2010-10-15 11:29:15 |
> > | JOB_START | 2010-10-15 11:29:18 |
> > | JOB_START | 2010-10-15 11:29:18 |
> > | JOB_START | 2010-10-15 11:29:18 |
> > | JOB_START | 2010-10-15 11:29:29 |
> > | JOB_START | 2010-10-15 11:29:29 |
> > | JOB_START | 2010-10-15 11:33:56 |
> > | JOB_START | 2010-10-15 12:11:50 |
> >
> > +-----------+---------------------+
> > 20 rows in set (0.00 sec)
> >
> > mysql> select c.name, count(c.name) from JOB_MANAGEMENT jm, command c
> > where
> >
> > -> jm.commandId =c.id group by c.name;
> >
> > +---------------------------+---------------+
> >
> > | name | count(c.name) |
> >
> > +---------------------------+---------------+
> >
> > | COPY_NEW_PROXY_TO_SANDBOX | 26 |
> > | JOB_PURGE | 4 |
> > | JOB_START | 282 |
> > | PROXY_RENEW | 18 |
> > | SET_JOB_STATUS | 107 |
> >
> > +---------------------------+---------------+
> > 5 rows in set (0.01 sec)
> >
> > mysql> Bye
> >
> > 15:28 creamce2.gina.sara.nl:/root
> > root# /opt/glite/bin/glite_cream_load_monitor --show
> > Threshold for Load Average(1 min): 40 => Detected value for Load
> > Average(1 min): 1.47
> > Threshold for Load Average(5 min): 40 => Detected value for Load
> > Average(5 min): 1.55
> > Threshold for Load Average(15 min): 20 => Detected value for Load
> > Average(15 min): 1.60
> > Threshold for Memory Usage: 95 => Detected value for Memory Usage: 10.06%
> > Threshold for Swap Usage: 95 => Detected value for Swap Usage: 0.00%
> > Threshold for Free FD: 500 => Detected value for Free FD: 1582627
> > Threshold for tomcat FD: 800 => Detected value for Tomcat FD: 425
> > Threshold for FTP Connection: 30 => Detected value for FTP Connection: 1
> > Threshold for Number of active jobs: -1 => Detected value for Number of
> > active jobs: 420
> > Threshold for Number of pending commands: -1 => Detected value for Number
> > of pending commands: 68
> > Threshold for Disk Usage: 95% => Detected value for Partition / : 26%
> >
> >
> >
> > So no, well a few, proxy renewals here.
> > But what I noticed is that the thing is doing a lot of tracejobs.
> > Not a problem I would say, but it takes a hell of a lot of time and it
> > queries old jobs.
> > e.g.:
> >
> > Job: 7477032.ce.gina.sara.nl
> >
> > 10/11/2010 16:10:23 S enqueuing into medium, state 1 hop 1
> > 10/11/2010 16:10:23 S Job Queued at request of
> > [log in to unmask], owner = [log in to unmask],
> > job name = crm02_466339072, queue = medium
> > 10/11/2010 16:17:41 S Job Run at request of [log in to unmask]
> > 10/11/2010 16:17:42 S child reported success for job after 1 seconds
> > (dest=v31-08), rc=0
> > 10/11/2010 16:59:50 S obit received - updating final job usage info
> > 10/11/2010 16:59:50 S job exit status 0 handled
> > 10/11/2010 16:59:50 S Exit_status=0 resources_used.cput=00:38:42
> > resources_used.mem=1637712kb resources_used.vmem=2673356kb
> > resources_used.walltime=00:42:09
> > 10/11/2010 16:59:50 S JOB_SUBSTATE_EXITING
> > 10/11/2010 16:59:51 S dequeuing from medium, state COMPLETE
> >
> > 15:17 creamce2.gina.sara.nl:/root
> > root# ps -ef | grep tracejob
> > tomcat 10616 4985 0 15:17 ? 00:00:00 /usr/bin/tracejob -p
> > /var/spool/torque -m -l -a 7477186.ce.gina.sara.nl
> > root 10618 27637 0 15:17 pts/1 00:00:00 grep tracejob
> >
> >
> > Why would it do a tracejob on old jobs?
> > The other, now fixed ce, has (way) more jobs and no continuously running
> > tracejob.
> >
> > I have increased the numer of threads, but that did not help.
> >
> > Cheers,
> > Maarten
> >
> > On Thursday 14 October 2010 10:25:35 Maarten van Ingen wrote:
> >> I have increased the number to 100 (it was 50) and no more queue :-)
> >>
> >> 10:13 creamce.gina.sara.nl:/usr/share/tomcat5
> >> tomcat$ /opt/glite/bin/glite_cream_load_monitor --show
> >> Threshold for Load Average(1 min): 40 => Detected value for Load
> >> Average(1 min): 1.46
> >> Threshold for Load Average(5 min): 40 => Detected value for Load
> >> Average(5 min): 1.75
> >> Threshold for Load Average(15 min): 20 => Detected value for Load
> >> Average(15 min): 1.08
> >> Threshold for Memory Usage: 95 => Detected value for Memory Usage:
> >> 20.39% Threshold for Swap Usage: 95 => Detected value for Swap Usage:
> >> 0.00% Threshold for Free FD: 500 => Detected value for Free FD: 2387483
> >> Threshold for tomcat FD: 800 => Detected value for Tomcat FD: 314
> >> Threshold for FTP Connection: 30 => Detected value for FTP Connection:
> >> 2 Threshold for Number of active jobs: -1 => Detected value for Number
> >> of active jobs: 5649
> >> Threshold for Number of pending commands: -1 => Detected value for
> >> Number of pending commands: 0
> >> Threshold for Disk Usage: 95% => Detected value for Partition / : 36%
> >>
> >>
> >> The load and memory usage did increase. I have not lowered the
> >> purge_interval, since the node it is currently running on has more than
> >> enough memory, 24GB. A you can see here:
> >> http://ganglia.sara.nl/?c=GINA%20Cluster&h=creamce.gina.sara.nl&m=load_o
> >> ne& r=hour&s=descending&hc=4&mc=2 there is a little bump in the memory
> >> (and cpu) usage but still low in respect to the total amount available.
> >>
> >>
> >> Thanks for the help!
> >>
> >> Cheers,
> >> Maarten
> >>
> >> On Thursday 14 October 2010 09:54:41 Massimo Sgaravatto - INFN Padova
wrote:
> >>> On Thu, 14 Oct 2010, Maarten van Ingen wrote:
> >>>> Can we increase the number of threads?
> >>>
> >>> Yes, the relevant parameter is cream_concurrency_level in
> >>> /opt/glite/etc/glite-ce-cream/cream-config.xml
> >>>
> >>> Then restart tomcat
> >>>
> >>> If you are using the new blah blparser and if you are going to increase
> >>> the number of threads, it is also suggested to decrease
> >>> (i.e. set to 2500000) the value for purge_interval in
> >>> /opt/glite/etc/blah.config to reduce memory usage.
> >>> Then kill the bupdater process (a newer one will be started
> >>> automatically)
> >>>
> >>> Cheers, Massimo
> >>>>
> >>>> The node has little to no load so there
> >>>> should be room for more threads. If possible, would this help to
> >>>> reduce the number of commands in the queue?
> >>>>
> >>>> If not, is there any way to work around this problem until the update
> >>>> is available?
> >>>>
> >>>> Cheers,
> >>>> Maarten
> >>>>
> >>>> On Wednesday 13 October 2010 16:37:47 Massimo Sgaravatto - INFN Padova
> >>
> >> wrote:
> >>>>> The submit command is actually divided in 2 operations: the register
> >>>>> and the start.
> >>>>> The former is a synchronos operation, the latter is an asyncronous
> >>>>> operation.
> >>>>> Asynchronous commands are stored in a queue, from which they are
> >>>>> taken and processed by a pool of threades
> >>>>>
> >>>>> In your CE there is a queue (not too long) of commands to be
> >>>>> processed. This explains why the start operation is not done
> >>>>> immediately at submission time.
> >>>>>
> >>>>>
> >>>>> In your CE most of the commands in the queue are proxy renewal
> >>>>> operations.
> >>>>>
> >>>>> With CREAM CE 1.6.3 (that we plan to certify by the end of this
> >>>>> month) there will the fix for this bug:
> >>>>>
> >>>>> http://savannah.cern.ch/bugs/?73765
> >>>>>
> >>>>> and this should help a lot in keeping the queue of commands short.
> >>>>>
> >>>>> Cheers, Massimo
> >>>>>
> >>>>> On Wed, 13 Oct 2010, Maarten van Ingen wrote:
> >>>>>> The grep:
> >>>>>>
> >>>>>> (of course this example does go one step further...)
> >>>>>>
> >>>>>> /opt/glite/var/log/glite-ce-cream.log:13 Oct 2010 16:13:51,234 INFO
> >>>>>> org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor
> >>>>>> (AbstractJobExecutor.java:826) - (Worker Thread 11)
> >>>>>> REMOTE_REQUEST_ADDRESS=145.100.5.194;
> >>>>>> USER_DN=/O=dutchgrid/O=users/O=sara/CN=Maarten Hendrik van Ingen;
> >>>>>> USER_FQAN={ /pvier/Role=NULL/Capability=NULL;
> >>>>>> /pvier/infra/Role=NULL/Capability=NULL; }; CMD_NAME=JOB_START;
> >>>>>> CMD_CATEGORY=JOB_MANAGEMENT; CMD_STATUS=PROCESSING;
> >>>>>> commandName=JOB_START; cmdExecutorName=BLAHExecutor;
> >>>>>> userId=_O_dutchgrid_O_users_O_sara_CN_Maarten_Hendrik_van_Ingen_pvie
> >>>>>> r _R o le_NULL_Capability_NULL; jobId=CREAM598295305;
> >>>>>> status=PROCESSING;
> >>>>>>
> >>>>>> /opt/glite/var/log/glite-ce-cream.log:13 Oct 2010 16:13:51,287 INFO
> >>>>>> org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor
> >>>>>> (AbstractJobExecutor.java:2094) - (Worker Thread 11) JOB
> >>>>>> CREAM598295305 STATUS CHANGED: REGISTERED => PENDING
> >>>>>> [localUser=pvi032]
> >>>>>> [delegationId=ce2ca4874b98dd5f6b55c9e6b3b4a4a1f852d36c]
> >>>>>>
> >>>>>> /opt/glite/var/log/glite-ce-cream.log.1:13 Oct 2010 15:47:27,553
> >>>>>> INFO org.glite.ce.cream.jobmanagement.db.table.JobTable
> >>>>>> (JobTable.java:232) - (http-8443-Processor19) Job inserted. JobId =
> >>>>>> CREAM598295305
> >>>>>>
> >>>>>> /opt/glite/var/log/glite-ce-cream.log.1:13 Oct 2010 15:47:27,661
> >>>>>> INFO
> >>>>>> org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor
> >>>>>> (AbstractJobExecutor.java:2094) - (http-8443-Processor19) JOB
> >>>>>> CREAM598295305 STATUS CHANGED: -- => REGISTERED [localUser=pvi032]
> >>>>>> [delegationId=ce2ca4874b98dd5f6b55c9e6b3b4a4a1f852d36c]
> >>>>>>
> >>>>>> /opt/glite/bin/glite_cream_load_monitor --show:
> >>>>>>
> >>>>>> Threshold for Load Average(1 min): 40 => Detected value for Load
> >>>>>> Average(1 min): 1.06
> >>>>>>
> >>>>>> Threshold for Load Average(5 min): 40 => Detected value for Load
> >>>>>> Average(5 min): 0.97
> >>>>>>
> >>>>>> Threshold for Load Average(15 min): 20 => Detected value for Load
> >>>>>> Average(15 min): 0.69
> >>>>>>
> >>>>>> Threshold for Memory Usage: 95 => Detected value for Memory Usage:
> >>>>>> 17.57%
> >>>>>>
> >>>>>> Threshold for Swap Usage: 95 => Detected value for Swap Usage: 0.00%
> >>>>>>
> >>>>>> Threshold for Free FD: 500 => Detected value for Free FD: 2386973
> >>>>>>
> >>>>>> Threshold for tomcat FD: 800 => Detected value for Tomcat FD: 269
> >>>>>>
> >>>>>> Threshold for FTP Connection: 30 => Detected value for FTP
> >>>>>> Connection: 1
> >>>>>>
> >>>>>> Threshold for Number of active jobs: -1 => Detected value for Number
> >>>>>> of active jobs: 5866
> >>>>>>
> >>>>>> Threshold for Number of pending commands: -1 => Detected value for
> >>>>>> Number of pending commands: 431
> >>>>>>
> >>>>>> Threshold for Disk Usage: 95% => Detected value for Partition / :
> >>>>>> 35%
> >>>>>>
> >>>>>> SQL:
> >>>>>>
> >>>>>> mysql> select c.name, c.creationTime from JOB_MANAGEMENT jm, command
> >>>>>> c where
> >>>>>>
> >>>>>> -> jm.commandId =c.id order by c.creationTime limit 20;
> >>>>>>
> >>>>>> +----------------+---------------------+
> >>>>>>
> >>>>>> | name | creationTime |
> >>>>>>
> >>>>>> +----------------+---------------------+
> >>>>>>
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:33:02 |
> >>>>>> |
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:33:09 |
> >>>>>> |
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:33:59 |
> >>>>>> |
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:34:00 |
> >>>>>> |
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:35:54 |
> >>>>>> |
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:37:11 |
> >>>>>> |
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:39:57 |
> >>>>>> |
> >>>>>> | PROXY_RENEW | 2010-10-13 14:40:41 |
> >>>>>> |
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:45:10 |
> >>>>>> |
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:46:11 |
> >>>>>> |
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:46:13 |
> >>>>>> |
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:46:13 |
> >>>>>> |
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:46:14 |
> >>>>>> |
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:46:17 |
> >>>>>> |
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:48:11 |
> >>>>>> |
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:48:11 |
> >>>>>> |
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:49:15 |
> >>>>>> |
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:49:16 |
> >>>>>> |
> >>>>>> | SET_JOB_STATUS | 2010-10-13 14:50:19 |
> >>>>>> |
> >>>>>> | JOB_START | 2010-10-13 14:53:37 |
> >>>>>>
> >>>>>> +----------------+---------------------+
> >>>>>>
> >>>>>> 20 rows in set (0.00 sec)
> >>>>>>
> >>>>>> mysql> select c.name, count(c.name) from JOB_MANAGEMENT jm, command
> >>>>>> c where
> >>>>>>
> >>>>>> -> jm.commandId =c.id group by c.name;
> >>>>>>
> >>>>>> +---------------------------+---------------+
> >>>>>>
> >>>>>> | name | count(c.name) |
> >>>>>>
> >>>>>> +---------------------------+---------------+
> >>>>>>
> >>>>>> | COPY_NEW_PROXY_TO_SANDBOX | 3 |
> >>>>>> |
> >>>>>> | JOB_PURGE | 61 |
> >>>>>> |
> >>>>>> | JOB_START | 190 |
> >>>>>> |
> >>>>>> | PROXY_RENEW | 539 |
> >>>>>> |
> >>>>>> | SET_JOB_STATUS | 195 |
> >>>>>>
> >>>>>> +---------------------------+---------------+
> >>>>>>
> >>>>>> 5 rows in set (0.00 sec)
> >>>>>>
> >>>>>> Cheers,
> >>>>>>
> >>>>>> Maarten
> >>>>>>
> >>>>>> On Wednesday 13 October 2010 16:14:49 Massimo Sgaravatto - INFN
> >>>>>> Padova
> >>>>
> >>>> wrote:
> >>>>>>> What does:
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> grep -i 598295305 /opt/glite/var/log/glite-ce-cream.log*
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> report ?
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> Can you please issue this command on the CREAM CE as user tomcat:
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> /opt/glite/bin/glite_cream_load_monitor --show
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> ?
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> Is there a huge number of "Detected value for Number of pending
> >>>>>>>
> >>>>>>> commands" ?
> >>>>>>>
> >>>>>>> If so, can you please issue these mysql commands ?
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> use creamdb;
> >>>>>>>
> >>>>>>> select c.name, c.creationTime from JOB_MANAGEMENT jm, command c
> >>>>>>> where
> >>>>>>>
> >>>>>>> jm.commandId =c.id order by c.creationTime limit 20;
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> select c.name, count(c.name) from JOB_MANAGEMENT jm, command c
> >>>>>>> where
> >>>>>>>
> >>>>>>> jm.commandId =c.id group by c.name;
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> Cheers, Massimo
> >>>>>>>
> >>>>>>> On Wed, 13 Oct 2010, Maarten van Ingen wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> One of our creamce keeps jobs in registered state and many will
> >>>>>>>> not come
> >>>>>>>>
> >>>>>>>> out of it.
> >>>>>>>>
> >>>>>>>> Sometimes they will get through, but this could take some hours.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> For example this job:
> >>>>>>>>
> >>>>>>>> maarten$ glite-ce-job-submit -a -r
> >>>>>>>>
> >>>>>>>> creamce.gina.sara.nl:8443/cream-pbs-infra ./gina
> >>>>>>>>
> >>>>>>>> 2010-10-13 15:47:25,246 WARN - No configuration file suitable for
> >>>>>>>>
> >>>>>>>> loading. Using built-in configuration
> >>>>>>>>
> >>>>>>>> https://creamce.gina.sara.nl:8443/CREAM598295305
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> maarten$ glite-ce-job-status
> >>>>>>>>
> >>>>>>>> https://creamce.gina.sara.nl:8443/CREAM598295305 2010-10-13
> >>>>>>>> 15:49:12,791
> >>>>>>>>
> >>>>>>>> WARN - No configuration file suitable for loading. Using built-in
> >>>>>>>>
> >>>>>>>> configuration
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> ****** JobID=[https://creamce.gina.sara.nl:8443/CREAM598295305]
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Status = [REGISTERED]
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> When I have a look into the logging, all I can find is this:
> >>>>>>>>
> >>>>>>>> root# grep 598295305 glite-ce-cream.log
> >>>>>>>>
> >>>>>>>> 13 Oct 2010 15:47:27,553 INFO
> >>>>>>>>
> >>>>>>>> org.glite.ce.cream.jobmanagement.db.table.JobTable
> >>>>>>>> (JobTable.java:232) -
> >>>>>>>>
> >>>>>>>> (http-8443-Processor19) Job inserted. JobId = CREAM598295305
> >>>>>>>>
> >>>>>>>> 13 Oct 2010 15:47:27,661 INFO
> >>>>>>>>
> >>>>>>>> org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecuto
> >>>>>>>> r
> >>>>>>>>
> >>>>>>>> (AbstractJobExecutor.java:2094) - (http-8443-Processor19) JOB
> >>>>>>>>
> >>>>>>>> CREAM598295305 STATUS CHANGED: -- => REGISTERED [localUser=pvi032]
> >>>>>>>>
> >>>>>>>> [delegationId=ce2ca4874b98dd5f6b55c9e6b3b4a4a1f852d36c]
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> The jdl used is the same as I use to submit to a wms (hence the
> >>>>>>>>
> >>>>>>>> "Requirements" part):
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Executable = "/bin/env";
> >>>>>>>>
> >>>>>>>> Arguments = "| /bin/mail -s $(hostname) [log in to unmask]";
> >>>>>>>>
> >>>>>>>> Stdoutput = "message.txt";
> >>>>>>>>
> >>>>>>>> StdError = "stderror";
> >>>>>>>>
> >>>>>>>> Requirements = other.GlueCEUniqueID ==
> >>>>>>>>
> >>>>>>>> "creamce.gina.sara.nl:8443/cream-pbs- infra";
> >>>>>>>>
> >>>>>>>> RetryCount=0;
> >>>>>>>>
> >>>>>>>> ShallowRetryCount=0;
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Also when I use bogus information for the requested queue it stays
> >>>>>>>> in the
> >>>>>>>>
> >>>>>>>> REGISTERED state.:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> maarten$ glite-ce-job-submit -a -r
> >>>>>>>> creamce.gina.sara.nl:8443/cream-pbs-
> >>>>>>>>
> >>>>>>>> thisisbogus ./gina
> >>>>>>>>
> >>>>>>>> 2010-10-13 15:57:55,017 WARN - No configuration file suitable for
> >>>>>>>>
> >>>>>>>> loading. Using built-in configuration
> >>>>>>>>
> >>>>>>>> https://creamce.gina.sara.nl:8443/CREAM392820764
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> maarten$ glite-ce-job-status
> >>>>>>>>
> >>>>>>>> https://creamce.gina.sara.nl:8443/CREAM392820764 2010-10-13
> >>>>>>>> 15:58:08,130
> >>>>>>>>
> >>>>>>>> WARN - No configuration file suitable for loading. Using built-in
> >>>>>>>>
> >>>>>>>> configuration
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> ****** JobID=[https://creamce.gina.sara.nl:8443/CREAM392820764]
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Status = [REGISTERED]
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Anyone got an idea on whats going on?
> >>>>>>>>
> >>>>>>>> I have the feeling this is something small I am overlooking :-)
> >>>>>>>> but it
> >>>>>>>>
> >>>>>>>> keeps me busy.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Cheers,
> >>>>>>>>
> >>>>>>>> Maarten
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> SARA Computing and Networking Services
> >>>>>>>>
> >>>>>>>> PO Box 94613
> >>>>>>>>
> >>>>>>>> 1090 GP Amsterdam, Netherlands
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Tel: +31 (0)20 592 3000
> >>>>>>>>
> >>>>>>>> Fax: +31 (0)20 668 3167
> >>>>>>>
> >>>>>>> \|||/
> >>>>>>>
> >>>>>>> -----------0oo----( o o )----oo0-------------------
> >>>>>>>
> >>>>>>> (_)
> >>>>>>>
> >>>>>>> INFN Sezione di Padova
> >>>>>>>
> >>>>>>> Via Marzolo, 8
> >>>>>>>
> >>>>>>> 35131 Padova - Italy E-mail: massimo.sgaravatto [at] pd.infn.it
> >>>>>>>
> >>>>>>> Tel: ++39 0498275908 Skype: massimo.sgaravatto
> >>>>>>>
> >>>>>>> Fax: ++39 0498275952
> >>>>>>
> >>>>>> --
> >>>>>>
> >>>>>> ing. M.H. van Ingen, HPC&V Systems Programmer
> >>>>>>
> >>>>>> SARA Computing and Networking Services
> >>>>>>
> >>>>>> PO Box 94613
> >>>>>>
> >>>>>> 1090 GP Amsterdam, Netherlands
> >>>>>>
> >>>>>> Tel: +31 (0)20 592 3000
> >>>>>>
> >>>>>> Fax: +31 (0)20 668 3167
> >>>>>>
> >>>>> \|||/
> >>>>>
> >>>>> -----------0oo----( o o )----oo0-------------------
> >>>>>
> >>>>> (_)
> >>>>>
> >>>>> INFN Sezione di Padova
> >>>>> Via Marzolo, 8
> >>>>> 35131 Padova - Italy E-mail: massimo.sgaravatto [at] pd.infn.it
> >>>>> Tel: ++39 0498275908 Skype: massimo.sgaravatto
> >>>>> Fax: ++39 0498275952
> >>>>
> >>>> --
> >>>> ing. M.H. van Ingen, HPC&V Systems Programmer
> >>>>
> >>>> SARA Computing and Networking Services
> >>>> PO Box 94613
> >>>> 1090 GP Amsterdam, Netherlands
> >>>>
> >>>> Tel: +31 (0)20 592 3000
> >>>> Fax: +31 (0)20 668 3167
> >>>>
> >>> \|||/
> >>>
> >>> -----------0oo----( o o )----oo0-------------------
> >>>
> >>> (_)
> >>>
> >>> INFN Sezione di Padova
> >>> Via Marzolo, 8
> >>> 35131 Padova - Italy E-mail: massimo.sgaravatto [at] pd.infn.it
> >>> Tel: ++39 0498275908 Skype: massimo.sgaravatto
> >>> Fax: ++39 0498275952
> >>
> >> --
> >> ing. M.H. van Ingen, HPC&V Systems Programmer
> >>
> >> SARA Computing and Networking Services
> >> PO Box 94613
> >> 1090 GP Amsterdam, Netherlands
> >>
> >> Tel: +31 (0)20 592 3000
> >> Fax: +31 (0)20 668 3167
> >
> > --
> > ing. M.H. van Ingen, HPC&V Systems Programmer
> >
> > SARA Computing and Networking Services
> > PO Box 94613
> > 1090 GP Amsterdam, Netherlands
> >
> > Tel: +31 (0)20 592 3000
> > Fax: +31 (0)20 668 3167
>
> \|||/
> -----------0oo----( o o )----oo0-------------------
> (_)
> INFN Sezione di Padova
> Via Marzolo, 8
> 35131 Padova - Italy E-mail: massimo.sgaravatto [at] pd.infn.it
> Tel: ++39 0498275908 Skype: massimo.sgaravatto
> Fax: ++39 0498275952
--
ing. M.H. van Ingen, HPC&V Systems Programmer
SARA Computing and Networking Services
PO Box 94613
1090 GP Amsterdam, Netherlands
Tel: +31 (0)20 592 3000
Fax: +31 (0)20 668 3167
|