Hi Derek,
OK, how can that be avoided? I'd rather not have any jobs cancelled.
Cheers,
Gustav
2011/1/27 Derek Ross <[log in to unmask]>:
>
> On 27 Jan 2011, at 10:31, Gustav Wikström wrote:
>
>> Hi Daniela,
>>
>> here's one job that got cancelled:
>>
>> https://wmslb02.grid.hep.ph.ic.ac.uk:9000/uN8hkxSl32QRf1R9NqciUw
>> Current Status: Cancelled
>> Logged Reason(s):
>> -
>> Destination: lcgce05.gridpp.rl.ac.uk:8443/cream-pbs-grid500M
>> Submitted: Thu Jan 27 02:25:35 2011 CET
>>
>>
>> Since the job has a destination, it might not have failed at the WMS
>> (?), but since it's cancelled I have no more info on the job. Does
>> Aborted mean a failure at the WMS, and Cancelled a failure later on?
>> During the night a much smaller fraction got cancelled, about 7% of the jobs.
>>
>> Cheers,
>> Gustav
>>
>
> Hi,
>
> That job was cancelled by the batch system at RAL because it hit the memory limit of the queue it was submitted to (grid500M, pmem=500mb; the job used 617516kb).
>
> Derek
>
> 27 Jan 2011 01:25:38,615 org.glite.ce.cream.jobmanagement.db.table.JobTable - Job inserted. JobId = CREAM700591149
> 27 Jan 2011 01:25:38,876 org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor - REMOTE_REQUEST_ADDRESS=146.179.246.244; USER_DN=DC=ch,DC=cern,OU=Organic Units,OU=Users,CN=lwikstro,CN=627993,CN=Gustav Wikstrom; USER_FQAN={ /t2k.org/Role=production/Capability=NULL; /t2k.org/Role=NULL/Capability=NULL; }; CMD_NAME=JOB_REGISTER; CMD_CATEGORY=JOB_MANAGEMENT; CMD_STATUS=PROCESSING; commandName=JOB_REGISTER; userId=DC_ch_DC_cern_OU_Organic_Units_OU_Users_CN_lwikstro_CN_627993_CN_Gustav_Wikstrom_t2k.org_Role_production_Capability_NULL; status=PROCESSING; localUser=t2k028; jobId=CREAM700591149; gridJobId=https://wmslb02.grid.hep.ph.ic.ac.uk:9000/uN8hkxSl32QRf1R9NqciUw; delegationId=12960521262E516674wms022Egrid2Ehep2Eph2Eic2Eac2Euk;
> 27 Jan 2011 01:25:39,057 org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor - REMOTE_REQUEST_ADDRESS=146.179.246.244; USER_DN=DC=ch,DC=cern,OU=Organic Units,OU=Users,CN=lwikstro,CN=627993,CN=Gustav Wikstrom; USER_FQAN={ /t2k.org/Role=production/Capability=NULL; /t2k.org/Role=NULL/Capability=NULL; }; CMD_NAME=JOB_START; CMD_CATEGORY=JOB_MANAGEMENT; CMD_STATUS=PROCESSING; commandName=JOB_START; cmdExecutorName=BLAHExecutor; userId=DC_ch_DC_cern_OU_Organic_Units_OU_Users_CN_lwikstro_CN_627993_CN_Gustav_Wikstrom_t2k.org_Role_production_Capability_NULL; jobId=CREAM700591149; status=PROCESSING;
> 27 Jan 2011 01:25:39,064 org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor - JOB CREAM700591149 STATUS CHANGED: REGISTERED => PENDING
> 27 Jan 2011 01:25:46,882 org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor - JOB CREAM700591149 STATUS CHANGED: PENDING => IDLE [lrmsJobId=11462851]
> 27 Jan 2011 01:25:49,093 org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor - REMOTE_REQUEST_ADDRESS=146.179.246.244; USER_DN=DC=ch,DC=cern,OU=Organic Units,OU=Users,CN=lwikstro,CN=627993,CN=Gustav Wikstrom; USER_FQAN={ /t2k.org/Role=production/Capability=NULL; /t2k.org/Role=NULL/Capability=NULL; }; CMD_NAME=JOB_START; CMD_CATEGORY=JOB_MANAGEMENT; CMD_STATUS=PROCESSING; commandName=JOB_START; cmdExecutorName=BLAHExecutor; userId=DC_ch_DC_cern_OU_Organic_Units_OU_Users_CN_lwikstro_CN_627993_CN_Gustav_Wikstrom_t2k.org_Role_production_Capability_NULL; jobId=CREAM700591149; status=PROCESSING; lrmsAbsJobId=pbs/20110127/11462851.lcgbatch01.gridpp.rl.ac.uk;
> 27 Jan 2011 01:25:49,094 org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor - REMOTE_REQUEST_ADDRESS=146.179.246.244; USER_DN=DC=ch,DC=cern,OU=Organic Units,OU=Users,CN=lwikstro,CN=627993,CN=Gustav Wikstrom; USER_FQAN={ /t2k.org/Role=production/Capability=NULL; /t2k.org/Role=NULL/Capability=NULL; }; CMD_NAME=JOB_START; CMD_CATEGORY=JOB_MANAGEMENT; CMD_STATUS=PROCESSING; commandName=JOB_START; cmdExecutorName=BLAHExecutor; userId=DC_ch_DC_cern_OU_Organic_Units_OU_Users_CN_lwikstro_CN_627993_CN_Gustav_Wikstrom_t2k.org_Role_production_Capability_NULL; jobId=CREAM700591149; status=SUCCESSFULL;
> 27 Jan 2011 01:26:47,036 org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor - JOB CREAM700591149 STATUS CHANGED: IDLE => RUNNING [workerNode=N/A]
> 27 Jan 2011 01:26:55,958 org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor - JOB CREAM700591149 STATUS CHANGED: RUNNING => REALLY-RUNNING [workerNode=lcg1166.gridpp.rl.ac.uk]
> 27 Jan 2011 03:12:14,274 org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor - JOB CREAM700591149 STATUS CHANGED: REALLY-RUNNING => CANCELLED
>
>
> 01/27/2011 01:25:46;Q;11462851.lcgbatch01.gridpp.rl.ac.uk;queue=grid500M
> 01/27/2011 01:26:46;S;11462851.lcgbatch01.gridpp.rl.ac.uk;user=t2k028 group=t2k jobname=cre05_700591149 queue=grid500M ctime=1296091546 qtime=1296091546 etime=1296091546 start=1296091606 [log in to unmask] exec_host=lcg1166.gridpp.rl.ac.uk/4 Resource_List.cput=60:00:00 Resource_List.neednodes=1:sl5 Resource_List.nodect=1 Resource_List.nodes=1:sl5 Resource_List.opsys=sl5 Resource_List.pcput=60:00:00 Resource_List.pmem=500mb Resource_List.walltime=72:00:00
> 01/27/2011 03:12:13;D;11462851.lcgbatch01.gridpp.rl.ac.uk;[log in to unmask]
> 01/27/2011 03:12:17;E;11462851.lcgbatch01.gridpp.rl.ac.uk;user=t2k028 group=t2k jobname=cre05_700591149 queue=grid500M ctime=1296091546 qtime=1296091546 etime=1296091546 start=1296091606 [log in to unmask] exec_host=lcg1166.gridpp.rl.ac.uk/4 Resource_List.cput=60:00:00 Resource_List.neednodes=1:sl5 Resource_List.nodect=1 Resource_List.nodes=1:sl5 Resource_List.opsys=sl5 Resource_List.pcput=60:00:00 Resource_List.pmem=500mb Resource_List.walltime=72:00:00 session=319 end=1296097937 Exit_status=271 resources_used.cput=03:20:29 resources_used.mem=617516kb resources_used.vmem=1624416kb resources_used.walltime=03:52:08
>
>
>
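To check an accounting line like the E record quoted above against its memory limit, something along these lines may help. It is only a minimal sketch: the function names are made up for illustration, the record is a shortened copy of the quoted line, and it assumes the batch system counts 1mb as 1024kb.

```python
import re

# Size suffixes seen in the accounting record, expressed in kB.
# Assumption: binary units, i.e. 1mb = 1024kb.
_UNIT_KB = {"kb": 1, "mb": 1024, "gb": 1024 * 1024}


def parse_size_kb(text):
    """Convert a size such as '500mb' or '617516kb' to an integer number of kB."""
    match = re.fullmatch(r"(\d+)(kb|mb|gb)", text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(match.group(1)) * _UNIT_KB[match.group(2)]


def parse_record(line):
    """Split 'timestamp;type;jobid;key=value key=value ...' into its parts."""
    _stamp, rec_type, job_id, message = line.split(";", 3)
    # Tokens without '=' (for example a masked e-mail address) are skipped.
    fields = dict(tok.split("=", 1) for tok in message.split() if "=" in tok)
    return rec_type, job_id, fields


def memory_overrun_kb(line):
    """Return used memory minus the pmem limit in kB (positive means over the limit)."""
    _rec_type, _job_id, fields = parse_record(line)
    used = parse_size_kb(fields["resources_used.mem"])
    limit = parse_size_kb(fields["Resource_List.pmem"])
    return used - limit


# Shortened copy of the E record quoted above (only some fields kept).
RECORD = (
    "01/27/2011 03:12:17;E;11462851.lcgbatch01.gridpp.rl.ac.uk;"
    "user=t2k028 group=t2k queue=grid500M Resource_List.pmem=500mb "
    "Exit_status=271 resources_used.mem=617516kb resources_used.vmem=1624416kb"
)

if __name__ == "__main__":
    print(memory_overrun_kb(RECORD), "kB over the pmem limit")
```

Under that unit assumption the job comes out roughly 100 MB over the 500mb pmem limit of grid500M, which is consistent with the explanation above.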