Hi Catalin,
I checked that, but most of the jobs that went to the 1000M queue also
failed. The 5% that made it did all go to that queue, though. This is
surprising, since last week most jobs of the same kind were passing
fine. Is the memory limit being enforced more strictly now?
A more general question: when a job exceeds memory limits, shouldn't
it fail instead of being cancelled? At least then the user could get
the reason for the failure.
Cheers,
Gustav
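
For reference, the numbers in the accounting record quoted below can be checked with a few lines of Python. This is only a sketch: the field names and values are copied from that record, but the helper names (parse_fields, to_kb, summarise) are made up for illustration, the units are assumed to be binary (1 mb = 1024 kb), and reading Exit_status=271 as 256 + signal 15 follows the usual PBS/Torque convention rather than anything confirmed in this thread.

```python
import re

# Fields copied from the accounting record quoted below (abridged).
RECORD = (
    "queue=grid500M Resource_List.pmem=500mb Exit_status=271 "
    "resources_used.mem=548468kb resources_used.vmem=1563948kb"
)

# Assumption: binary units, i.e. 1 mb = 1024 kb.
UNITS_KB = {"kb": 1, "mb": 1024, "gb": 1024 * 1024}


def parse_fields(record):
    """Split a space-separated key=value record into a dict."""
    return dict(tok.split("=", 1) for tok in record.split() if "=" in tok)


def to_kb(size):
    """Convert a size string such as '500mb' or '548468kb' to kilobytes."""
    m = re.fullmatch(r"(\d+)(kb|mb|gb)", size.lower())
    if m is None:
        raise ValueError(f"unrecognised size: {size!r}")
    return int(m.group(1)) * UNITS_KB[m.group(2)]


def summarise(record):
    """Compare used memory with the requested limit and decode the exit status."""
    fields = parse_fields(record)
    used = to_kb(fields["resources_used.mem"])
    limit = to_kb(fields["Resource_List.pmem"])
    status = int(fields["Exit_status"])
    # Assumption: a status above 256 encodes 256 + the signal number.
    signal = status - 256 if status > 256 else None
    return {
        "used_kb": used,
        "limit_kb": limit,
        "over_limit": used > limit,
        "signal": signal,
    }


if __name__ == "__main__":
    print(summarise(RECORD))
```

Under those assumptions the job used 548468 kb against a limit of 512000 kb, and the exit status corresponds to signal 15 (SIGTERM), which would be consistent with the batch system killing the job rather than the job failing on its own.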
2011/6/2 Catalin Condurache <[log in to unmask]>:
> Hi Gustav,
>
> I had a look at that specific job. It was cancelled when it attempted to use more than 500M (resources_used.mem, see below). It is possible that the logging available to you hides the real cause of the abort. You should choose a larger queue for your jobs, 700M or 1000M.
>
> 06/02/2011 04:52:23;E;15490141.lcgbatch01.gridpp.rl.ac.uk;user=t2k028 group=t2k jobname=cre05_840367115 queue=grid500M
> ctime=1306986008 qtime=1306986008 etime=1306986008 start=1306986159 [log in to unmask] exec_host=lcg1278.gridpp.rl.ac.uk/5 Resource_List.cput=60:00:00 Resource_List.neednodes=lcg1278.gridpp.rl.ac.uk Resource_List.opsys=sl5 Resource_List.pcput=60:00:00 Resource_List.pmem=500mb Resource_List.walltime=72:00:00 session=11047 end=1306986743 Exit_status=271 resources_used.cput=00:22:18 resources_used.mem=548468kb resources_used.vmem=1563948kb resources_used.walltime=00:28:07
>
> Regards,
> Catalin Condurache
> RAL Tier1 Grid Services
>
>
>> -----Original Message-----
>> From: Testbed Support for GridPP member institutes [mailto:TB-
>> [log in to unmask]] On Behalf Of Gustav Wikström
>> Sent: 02 June 2011 10:54
>> To: [log in to unmask]
>> Subject: Re: 95% of jobs getting cancelled
>>
>> Hi Stuart,
>>
>> yes, it turns out there is quite a lot of information available. One
>> example job with glite-wms-job-logging-info --verbosity 3 is below.
>>
>> There seems to be some retrying going on, but in the final step (at
>> the bottom) the job runs for 13 mins before being cancelled by the
>> LogMonitor, but by then the status of the job is DONE.
>>
>> Maybe you can make more out of this?
>>
>> Gustav
>>
>>
>> ===================== glite-job-logging-info Success
>> =====================
>>
>> LOGGING INFORMATION:
>>
>> Printing info for the Job :
>> https://lcglb01.gridpp.rl.ac.uk:9000/ZIhUvnWm77QBohlunQlXPw
>>
>> ---
>> Event: RegJob
>> - Arrived = Thu Jun 2 05:39:57 2011 CEST
>> - Host = lcgwms03.gridpp.rl.ac.uk
>> - Jobtype = SIMPLE
>> - Level = SYSTEM
>> - Ns =
>> https://lcgwms03.gridpp.rl.ac.uk:7443/glite_wms_wmproxy_server
>> - Nsubjobs = 0
>> - Priority = synchronous
>> - Seqcode =
>> UI=000000:NS=0000000001:WM=000000:BH=0000000000:JSS=000000:LM=000000:LR
>> MS=000000:APP=000000:LBS=000000
>> - Source = NetworkServer
>> - Src instance =
>> https://lcgwms03.gridpp.rl.ac.uk:7443/glite_wms_wmproxy_server
>> - Timestamp = Thu Jun 2 05:39:57 2011 CEST
>> - User = /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav Wikstrom
>> - Jdl =
>> SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77
>> QBohlunQlXPw/JDLToStart
>> ---
>> Event: RegJob
>> - Arrived = Thu Jun 2 05:39:58 2011 CEST
>> - Host = lcgwms03.gridpp.rl.ac.uk
>> - Jobtype = SIMPLE
>> - Level = SYSTEM
>> - Ns =
>> https://lcgwms03.gridpp.rl.ac.uk:7443/glite_wms_wmproxy_server
>> - Nsubjobs = 0
>> - Priority = synchronous
>> - Seqcode =
>> UI=000000:NS=0000000001:WM=000000:BH=0000000000:JSS=000000:LM=000000:LR
>> MS=000000:APP=000000:LBS=000000
>> - Source = NetworkServer
>> - Src instance =
>> https://lcgwms03.gridpp.rl.ac.uk:7443/glite_wms_wmproxy_server
>> - Timestamp = Thu Jun 2 05:39:57 2011 CEST
>> - User = /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav Wikstrom
>> - Jdl =
>> SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77
>> QBohlunQlXPw/JDLToStart
>> ---
>> Event: Accepted
>> - Arrived = Thu Jun 2 05:40:02 2011 CEST
>> - From = NetworkServer
>> - From host = atlas009.unige.ch
>> - Host = lcgwms03.gridpp.rl.ac.uk
>> - Level = SYSTEM
>> - Priority = synchronous
>> - Seqcode =
>> UI=000000:NS=0000000002:WM=000000:BH=0000000000:JSS=000000:LM=000000:LR
>> MS=000000:APP=000000:LBS=000000
>> - Source = NetworkServer
>> - Src instance =
>> https://lcgwms03.gridpp.rl.ac.uk:7443/glite_wms_wmproxy_server
>> - Timestamp = Thu Jun 2 05:40:01 2011 CEST
>> - User = /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav Wikstrom
>> ---
>> Event: EnQueued
>> - Arrived = Thu Jun 2 05:40:02 2011 CEST
>> - Host = lcgwms03.gridpp.rl.ac.uk
>> - Level = SYSTEM
>> - Priority = synchronous
>> - Queue = /var/glite/workload_manager/jobdir
>> - Result = START
>> - Seqcode =
>> UI=000000:NS=0000000003:WM=000000:BH=0000000000:JSS=000000:LM=000000:LR
>> MS=000000:APP=000000:LBS=000000
>> - Source = NetworkServer
>> - Src instance =
>> https://lcgwms03.gridpp.rl.ac.uk:7443/glite_wms_wmproxy_server
>> - Timestamp = Thu Jun 2 05:40:02 2011 CEST
>> - User = /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav Wikstrom
>> - Job =
>> /var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2
>> fZIhUvnWm77QBohlunQlXPw/JDLToStart
>> ---
>> Event: EnQueued
>> - Arrived = Thu Jun 2 05:40:02 2011 CEST
>> - Host = lcgwms03.gridpp.rl.ac.uk
>> - Level = SYSTEM
>> - Priority = synchronous
>> - Queue = /var/glite/workload_manager/jobdir
>> - Result = OK
>> - Seqcode =
>> UI=000000:NS=0000000004:WM=000000:BH=0000000000:JSS=000000:LM=000000:LR
>> MS=000000:APP=000000:LBS=000000
>> - Source = NetworkServer
>> - Src instance =
>> https://lcgwms03.gridpp.rl.ac.uk:7443/glite_wms_wmproxy_server
>> - Timestamp = Thu Jun 2 05:40:02 2011 CEST
>> - User = /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav Wikstrom
>> - Job =
>>
>> [
>> RetryCount = 3;
>> LB_sequence_code =
>> "UI=000000:NS=0000000004:WM=000000:BH=0000000000:JSS=000000:LM=000000:L
>> RMS=000000:APP=000000:LBS=000000";
>> edg_jobid =
>> "https://lcglb01.gridpp.rl.ac.uk:9000/ZIhUvnWm77QBohlunQlXPw";
>> Arguments = "-v v9r7p9 -i
>> lfn:/grid/t2k.org/nd280/production004/B/rdp/ND280/00006000_00006999/rec
>> o/oa_nd_cos_00006945-0166_muaf23qoshfp_reco_000_v9r7p5.root
>> -e cosmic -p 4C -t rdp -m oaAnalysis";
>> CertificateSubject = "/DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav Wikstrom";
>> MyProxyServer = "lcgrbp01.gridpp.rl.ac.uk";
>> JobType = "normal";
>> Executable = "ND280Raw_process.py";
>> VirtualOrganisation = "t2k.org";
>> SignificantAttributes = { "Requirements","Rank","FuzzyRank" };
>> InputSandbox = {
>> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
>> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/ND
>> 280Configs.py","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/Sandbo
>> xDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlu
>> nQlXPw/input/ND280GRID.py","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/
>> glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhU
>> vnWm77QBohlunQlXPw/input/ND280Job.py","gsiftp://lcgwms03.gridpp.rl.ac.u
>> k:2811/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a
>> 9000_2fZIhUvnWm77QBohlunQlXPw/input/ND280Software.py","gsiftp://lcgwms0
>> 3.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gr
>> idpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/pexpect.py","gsiftp
>> ://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3a_2f_2f
>> lcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/ND280Raw_
>> process.py"
>> };
>> StdOutput = "ND280Raw.out";
>> ShallowRetryCount = 10;
>> InputSandboxDestFileName = {
>> "ND280Configs.py","ND280GRID.py","ND280Job.py","ND280Software.py","pexp
>> ect.py","ND280Raw_process.py"
>> };
>> VOMS_FQAN = "/t2k.org/Role=production/Capability=NULL";
>> OutputSandboxPath =
>> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
>> 2fZIhUvnWm77QBohlunQlXPw/output";
>> requirements = ( (
>> Member("VO-t2k.org-ND280-
>> v9r7p9",other.GlueHostApplicationSoftwareRunTimeEnvironment)
>> && other.GlueCEPolicyMaxCPUTime > 600 &&
>> other.GlueHostMainMemoryRAMSize >= 512 ) && ( other.GlueCEStateStatus
>> == "Production" ) ) && !RegExp(".*sdj$",other.GlueCEUniqueID);
>> DataRequirements = {
>> [
>> DataCatalog = "http://lfc.gridpp.rl.ac.uk:8085/";
>> InputData = {
>> "lfn:/grid/t2k.org/nd280/production004/B/rdp/ND280/00006000_00006999/re
>> co/oa_nd_cos_00006945-0166_muaf23qoshfp_reco_000_v9r7p5.root"
>> };
>> DataCatalogType = "DLI"
>> ] };
>> rank = -other.GlueCEStateEstimatedResponseTime;
>> Type = "job";
>> OutputSandboxDestURI = {
>> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
>> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/output/N
>> D280Raw.out","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxD
>> ir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQ
>> lXPw/output/ND280Raw.err","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/g
>> lite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUv
>> nWm77QBohlunQlXPw/output/Raw_cosmic_00006945-0166_v9r7p9.cfg"
>> };
>> StdError = "ND280Raw.err";
>> DataAccessProtocol = "gsiftp";
>> DefaultRank = -other.GlueCEStateEstimatedResponseTime;
>> WMPInputSandboxBaseURI =
>> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
>> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw";
>> AllowZippedISB = true;
>> ZippedISB = { "ISBfiles_avupuyXqSP-D8K3jUCJpCA_0.tar.gz" };
>> X509UserProxy =
>> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
>> 2fZIhUvnWm77QBohlunQlXPw/user.proxy";
>> InputSandboxPath =
>> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
>> 2fZIhUvnWm77QBohlunQlXPw/input";
>> OutputSandbox = {
>> "ND280Raw.out","ND280Raw.err","Raw_cosmic_00006945-0166_v9r7p9.cfg" }
>> ]
>> ---
>> Event: DeQueued
>> - Arrived = Thu Jun 2 05:40:02 2011 CEST
>> - Host = lcgwms03.gridpp.rl.ac.uk
>> - Level = SYSTEM
>> - Priority = synchronous
>> - Queue = /var/glite/workload_manager/jobdir
>> - Seqcode =
>> UI=000000:NS=0000000004:WM=000001:BH=0000000000:JSS=000000:LM=000000:LR
>> MS=000000:APP=000000:LBS=000000
>> - Source = WorkloadManager
>> - Src instance = 10391
>> - Timestamp = Thu Jun 2 05:40:02 2011 CEST
>> - User = /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav
>> Wikstrom/CN=proxy/CN=proxy
>> ---
>> Event: Match
>> - Arrived = Thu Jun 2 05:40:02 2011 CEST
>> - Dest id =
>> lcgce05.gridpp.rl.ac.uk:8443/cream-pbs-grid500M
>> - Host = lcgwms03.gridpp.rl.ac.uk
>> - Level = SYSTEM
>> - Priority = synchronous
>> - Seqcode =
>> UI=000000:NS=0000000004:WM=000002:BH=0000000000:JSS=000000:LM=000000:LR
>> MS=000000:APP=000000:LBS=000000
>> - Source = WorkloadManager
>> - Src instance = 10391
>> - Timestamp = Thu Jun 2 05:40:02 2011 CEST
>> - User = /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav
>> Wikstrom/CN=proxy/CN=proxy
>> ---
>> Event: EnQueued
>> - Arrived = Thu Jun 2 05:40:03 2011 CEST
>> - Host = lcgwms03.gridpp.rl.ac.uk
>> - Level = SYSTEM
>> - Priority = synchronous
>> - Queue = /var/glite/ice/jobdir
>> - Result = START
>> - Seqcode =
>> UI=000000:NS=0000000004:WM=000003:BH=0000000000:JSS=000000:LM=000000:LR
>> MS=000000:APP=000000:LBS=000000
>> - Source = WorkloadManager
>> - Src instance = 10391
>> - Timestamp = Thu Jun 2 05:40:02 2011 CEST
>> - User = /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav
>> Wikstrom/CN=proxy/CN=proxy
>> ---
>> Event: EnQueued
>> - Arrived = Thu Jun 2 05:40:03 2011 CEST
>> - Host = lcgwms03.gridpp.rl.ac.uk
>> - Level = SYSTEM
>> - Priority = synchronous
>> - Queue = /var/glite/ice/jobdir
>> - Result = OK
>> - Seqcode =
>> UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000000:LM=000000:LR
>> MS=000000:APP=000000:LBS=000000
>> - Source = WorkloadManager
>> - Src instance = 10391
>> - Timestamp = Thu Jun 2 05:40:03 2011 CEST
>> - User = /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav
>> Wikstrom/CN=proxy/CN=proxy
>> - Job =
>>
>> [
>> Arguments =
>> [
>> JobAd =
>> [
>> RetryCount = 3;
>> LB_sequence_code =
>> "UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000000:LM=000000:L
>> RMS=000000:APP=000000:LBS=000000";
>> ReallyRunningToken =
>> "gsiftp://lcgwms03.gridpp.rl.ac.uk/var/glite/SandboxDir/ZI/https_3a_2f_
>> 2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/token.txt";
>> edg_jobid =
>> "https://lcglb01.gridpp.rl.ac.uk:9000/ZIhUvnWm77QBohlunQlXPw";
>> lrms_type = "torque";
>> CeRequirements = "true && ( true && (
>> Member(\"VO-t2k.org-ND280-
>> v9r7p9\",other.GlueHostApplicationSoftwareRunTimeEnvironment)
>> && other.GlueCEPolicyMaxCPUTime > 600 &&
>> other.GlueHostMainMemoryRAMSize >= 512 ) )";
>> Arguments = "-v v9r7p9 -i
>> lfn:/grid/t2k.org/nd280/production004/B/rdp/ND280/00006000_00006999/rec
>> o/oa_nd_cos_00006945-0166_muaf23qoshfp_reco_000_v9r7p5.root
>> -e cosmic -p 4C -t rdp -m oaAnalysis";
>> CertificateSubject = "/DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav Wikstrom";
>> MyProxyServer = "lcgrbp01.gridpp.rl.ac.uk";
>> ce_id = "lcgce05.gridpp.rl.ac.uk:8443/cream-pbs-grid500M";
>> QueueName = "grid500M";
>> JobType = "normal";
>> Executable = "ND280Raw_process.py";
>> VirtualOrganisation = "t2k.org";
>> SignificantAttributes = { "Requirements","Rank","FuzzyRank"
>> };
>> InputSandbox = {
>> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
>> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/ND
>> 280Configs.py","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/Sandbo
>> xDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlu
>> nQlXPw/input/ND280GRID.py","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/
>> glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhU
>> vnWm77QBohlunQlXPw/input/ND280Job.py","gsiftp://lcgwms03.gridpp.rl.ac.u
>> k:2811/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a
>> 9000_2fZIhUvnWm77QBohlunQlXPw/input/ND280Software.py","gsiftp://lcgwms0
>> 3.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gr
>> idpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/pexpect.py","gsiftp
>> ://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3a_2f_2f
>> lcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/ND280Raw_
>> process.py","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDi
>> r/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQl
>> XPw/input/.BrokerInfo"
>> };
>> StdOutput = "ND280Raw.out";
>> ShallowRetryCount = 10;
>> VOMS_FQAN = "/t2k.org/Role=production/Capability=NULL";
>> InputSandboxDestFileName = {
>> "ND280Configs.py","ND280GRID.py","ND280Job.py","ND280Software.py","pexp
>> ect.py","ND280Raw_process.py"
>> };
>> OutputSandboxPath =
>> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
>> 2fZIhUvnWm77QBohlunQlXPw/output";
>> requirements = ( (
>> Member("VO-t2k.org-ND280-
>> v9r7p9",other.GlueHostApplicationSoftwareRunTimeEnvironment)
>> && other.GlueCEPolicyMaxCPUTime > 600 &&
>> other.GlueHostMainMemoryRAMSize >= 512 ) && ( other.GlueCEStateStatus
>> == "Production" ) ) && !RegExp(".*sdj$",other.GlueCEUniqueID);
>> DataRequirements = {
>> [
>> DataCatalog = "http://lfc.gridpp.rl.ac.uk:8085/";
>> InputData = {
>> "lfn:/grid/t2k.org/nd280/production004/B/rdp/ND280/00006000_00006999/re
>> co/oa_nd_cos_00006945-0166_muaf23qoshfp_reco_000_v9r7p5.root"
>> };
>> DataCatalogType = "DLI"
>> ] };
>> rank = -other.GlueCEStateEstimatedResponseTime;
>> Type = "job";
>> OutputSandboxDestURI = {
>> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
>> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/output/N
>> D280Raw.out","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxD
>> ir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQ
>> lXPw/output/ND280Raw.err","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/g
>> lite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUv
>> nWm77QBohlunQlXPw/output/Raw_cosmic_00006945-0166_v9r7p9.cfg"
>> };
>> StdError = "ND280Raw.err";
>> DataAccessProtocol = "gsiftp";
>> DefaultRank = -other.GlueCEStateEstimatedResponseTime;
>> WMPInputSandboxBaseURI =
>> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
>> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw";
>> CeApplicationDir = "/stage/sl3-lcg-exp/t2ksgm";
>> ZippedISB = { "ISBfiles_avupuyXqSP-D8K3jUCJpCA_0.tar.gz" };
>> AllowZippedISB = true;
>> X509UserProxy =
>> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
>> 2fZIhUvnWm77QBohlunQlXPw/user.proxy";
>> GlobusResourceContactString =
>> "lcgce05.gridpp.rl.ac.uk:8443/cream-pbs";
>> InputSandboxPath =
>> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
>> 2fZIhUvnWm77QBohlunQlXPw/input";
>> OutputSandbox = {
>> "ND280Raw.out","ND280Raw.err","Raw_cosmic_00006945-0166_v9r7p9.cfg" }
>> ]
>> ];
>> Command = "Submit";
>> Source = 2;
>> Protocol = "1.0.0"
>> ]
>> ---
>> Event: DeQueued
>> - Arrived = Thu Jun 2 05:40:04 2011 CEST
>> - Host = lcgwms03.gridpp.rl.ac.uk
>> - Level = SYSTEM
>> - Local jobid =
>> https://lcglb01.gridpp.rl.ac.uk:9000/ZIhUvnWm77QBohlunQlXPw
>> - Priority = synchronous
>> - Queue = /var/glite/ice/jobdir
>> - Seqcode =
>> UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000001:LM=000000:LR
>> MS=000000:APP=000000:LBS=000000
>> - Source = JobController
>> - Timestamp = Thu Jun 2 05:40:03 2011 CEST
>> - User = /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav
>> Wikstrom/CN=proxy/CN=proxy
>> ---
>> Event: Transfer
>> - Arrived = Thu Jun 2 05:40:04 2011 CEST
>> - Dest host =
>> https://lcgce05.gridpp.rl.ac.uk:8443/ce-cream/services/CREAM2
>> - Dest instance = unavailable
>> - Dest jobid = unavailable
>> - Destination = LRMS
>> - Host = lcgwms03.gridpp.rl.ac.uk
>> - Level = SYSTEM
>> - Priority = synchronous
>> - Reason = unavailable
>> - Result = START
>> - Seqcode =
>> UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000002:LM=000001:LR
>> MS=000000:APP=000000:LBS=000000
>> - Source = LogMonitor
>> - Timestamp = Thu Jun 2 05:40:04 2011 CEST
>> - User = /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav
>> Wikstrom/CN=proxy/CN=proxy
>> - Job =
>>
>> [
>> RetryCount = 3;
>> LB_sequence_code =
>> "UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000002:LM=000000:L
>> RMS=000000:APP=000000:LBS=000000";
>> ReallyRunningToken =
>> "gsiftp://lcgwms03.gridpp.rl.ac.uk/var/glite/SandboxDir/ZI/https_3a_2f_
>> 2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/token.txt";
>> edg_jobid =
>> "https://lcglb01.gridpp.rl.ac.uk:9000/ZIhUvnWm77QBohlunQlXPw";
>> lrms_type = "torque";
>> CeRequirements = "true && ( true && (
>> Member(\"VO-t2k.org-ND280-
>> v9r7p9\",other.GlueHostApplicationSoftwareRunTimeEnvironment)
>> && other.GlueCEPolicyMaxCPUTime > 600 &&
>> other.GlueHostMainMemoryRAMSize >= 512 ) )";
>> Arguments = "-v v9r7p9 -i
>> lfn:/grid/t2k.org/nd280/production004/B/rdp/ND280/00006000_00006999/rec
>> o/oa_nd_cos_00006945-0166_muaf23qoshfp_reco_000_v9r7p5.root
>> -e cosmic -p 4C -t rdp -m oaAnalysis";
>> CertificateSubject = "/DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav Wikstrom";
>> MyProxyServer = "lcgrbp01.gridpp.rl.ac.uk";
>> ce_id = "lcgce05.gridpp.rl.ac.uk:8443/cream-pbs-grid500M";
>> QueueName = "grid500M";
>> JobType = "normal";
>> Executable = "ND280Raw_process.py";
>> VirtualOrganisation = "t2k.org";
>> SignificantAttributes = { "Requirements","Rank","FuzzyRank" };
>> InputSandbox = {
>> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
>> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/ND
>> 280Configs.py","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/Sandbo
>> xDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlu
>> nQlXPw/input/ND280GRID.py","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/
>> glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhU
>> vnWm77QBohlunQlXPw/input/ND280Job.py","gsiftp://lcgwms03.gridpp.rl.ac.u
>> k:2811/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a
>> 9000_2fZIhUvnWm77QBohlunQlXPw/input/ND280Software.py","gsiftp://lcgwms0
>> 3.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gr
>> idpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/pexpect.py","gsiftp
>> ://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3a_2f_2f
>> lcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/ND280Raw_
>> process.py","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDi
>> r/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQl
>> XPw/input/.BrokerInfo"
>> };
>> StdOutput = "ND280Raw.out";
>> ShallowRetryCount = 10;
>> InputSandboxDestFileName = {
>> "ND280Configs.py","ND280GRID.py","ND280Job.py","ND280Software.py","pexp
>> ect.py","ND280Raw_process.py"
>> };
>> VOMS_FQAN = "/t2k.org/Role=production/Capability=NULL";
>> OutputSandboxPath =
>> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
>> 2fZIhUvnWm77QBohlunQlXPw/output";
>> requirements = ( (
>> Member("VO-t2k.org-ND280-
>> v9r7p9",other.GlueHostApplicationSoftwareRunTimeEnvironment)
>> && other.GlueCEPolicyMaxCPUTime > 600 &&
>> other.GlueHostMainMemoryRAMSize >= 512 ) && ( other.GlueCEStateStatus
>> == "Production" ) ) && !RegExp(".*sdj$",other.GlueCEUniqueID);
>> DataRequirements = {
>> [
>> DataCatalog = "http://lfc.gridpp.rl.ac.uk:8085/";
>> InputData = {
>> "lfn:/grid/t2k.org/nd280/production004/B/rdp/ND280/00006000_00006999/re
>> co/oa_nd_cos_00006945-0166_muaf23qoshfp_reco_000_v9r7p5.root"
>> };
>> DataCatalogType = "DLI"
>> ] };
>> rank = -other.GlueCEStateEstimatedResponseTime;
>> Type = "job";
>> OutputSandboxDestURI = {
>> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
>> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/output/N
>> D280Raw.out","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxD
>> ir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQ
>> lXPw/output/ND280Raw.err","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/g
>> lite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUv
>> nWm77QBohlunQlXPw/output/Raw_cosmic_00006945-0166_v9r7p9.cfg"
>> };
>> StdError = "ND280Raw.err";
>> DataAccessProtocol = "gsiftp";
>> DefaultRank = -other.GlueCEStateEstimatedResponseTime;
>> CeApplicationDir = "/stage/sl3-lcg-exp/t2ksgm";
>> WMPInputSandboxBaseURI =
>> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
>> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw";
>> AllowZippedISB = true;
>> ZippedISB = { "ISBfiles_avupuyXqSP-D8K3jUCJpCA_0.tar.gz" };
>> X509UserProxy =
>> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
>> 2fZIhUvnWm77QBohlunQlXPw/user.proxy";
>> GlobusResourceContactString =
>> "lcgce05.gridpp.rl.ac.uk:8443/cream-pbs";
>> InputSandboxPath =
>> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
>> 2fZIhUvnWm77QBohlunQlXPw/input";
>> OutputSandbox = {
>> "ND280Raw.out","ND280Raw.err","Raw_cosmic_00006945-0166_v9r7p9.cfg" }
>> ]
>> ---
>> Event: Running
>> - Arrived = Thu Jun 2 05:42:39 2011 CEST
>> - Host = lcg1278.gridpp.rl.ac.uk
>> - Level = SYSTEM
>> - Node = lcg1278.gridpp.rl.ac.uk
>> - Priority = synchronous
>> - Seqcode =
>> UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000002:LM=000002:LR
>> MS=000001:APP=000000:LBS=000000
>> - Source = LRMS
>> - Timestamp = Thu Jun 2 05:42:39 2011 CEST
>> - User = /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav Wikstrom
>> ---
>> Event: ReallyRunning
>> - Arrived = Thu Jun 2 05:42:46 2011 CEST
>> - Host = lcg1278.gridpp.rl.ac.uk
>> - Level = SYSTEM
>> - Priority = synchronous
>> - Seqcode =
>> UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000002:LM=000002:LR
>> MS=000003:APP=000000:LBS=000000
>> - Source = LRMS
>> - Timestamp = Thu Jun 2 05:42:46 2011 CEST
>> - User = /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav Wikstrom
>> ---
>> Event: Transfer
>> - Arrived = Thu Jun 2 05:40:05 2011 CEST
>> - Dest host =
>> https://lcgce05.gridpp.rl.ac.uk:8443/ce-cream/services/CREAM2
>> - Dest instance = unavailable
>> - Dest jobid =
>> https://lcgce05.gridpp.rl.ac.uk:8443/CREAM840367115
>> - Destination = LRMS
>> - Host = lcgwms03.gridpp.rl.ac.uk
>> - Level = SYSTEM
>> - Priority = synchronous
>> - Reason = unavailable
>> - Result = OK
>> - Seqcode =
>> UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000002:LM=000003:LR
>> MS=000000:APP=000000:LBS=000000
>> - Source = LogMonitor
>> - Timestamp = Thu Jun 2 05:40:04 2011 CEST
>> - User = /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav
>> Wikstrom/CN=proxy/CN=proxy
>> - Job =
>>
>> [
>> RetryCount = 3;
>> LB_sequence_code =
>> "UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000002:LM=000002:L
>> RMS=000000:APP=000000:LBS=000000";
>> ReallyRunningToken =
>> "gsiftp://lcgwms03.gridpp.rl.ac.uk/var/glite/SandboxDir/ZI/https_3a_2f_
>> 2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/token.txt";
>> edg_jobid =
>> "https://lcglb01.gridpp.rl.ac.uk:9000/ZIhUvnWm77QBohlunQlXPw";
>> lrms_type = "torque";
>> CeRequirements = "true && ( true && (
>> Member(\"VO-t2k.org-ND280-
>> v9r7p9\",other.GlueHostApplicationSoftwareRunTimeEnvironment)
>> && other.GlueCEPolicyMaxCPUTime > 600 &&
>> other.GlueHostMainMemoryRAMSize >= 512 ) )";
>> Arguments = "-v v9r7p9 -i
>> lfn:/grid/t2k.org/nd280/production004/B/rdp/ND280/00006000_00006999/rec
>> o/oa_nd_cos_00006945-0166_muaf23qoshfp_reco_000_v9r7p5.root
>> -e cosmic -p 4C -t rdp -m oaAnalysis";
>> CertificateSubject = "/DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav Wikstrom";
>> MyProxyServer = "lcgrbp01.gridpp.rl.ac.uk";
>> ce_id = "lcgce05.gridpp.rl.ac.uk:8443/cream-pbs-grid500M";
>> QueueName = "grid500M";
>> JobType = "normal";
>> Executable = "ND280Raw_process.py";
>> VirtualOrganisation = "t2k.org";
>> SignificantAttributes = { "Requirements","Rank","FuzzyRank" };
>> InputSandbox = {
>> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
>> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/ND
>> 280Configs.py","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/Sandbo
>> xDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlu
>> nQlXPw/input/ND280GRID.py","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/
>> glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhU
>> vnWm77QBohlunQlXPw/input/ND280Job.py","gsiftp://lcgwms03.gridpp.rl.ac.u
>> k:2811/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a
>> 9000_2fZIhUvnWm77QBohlunQlXPw/input/ND280Software.py","gsiftp://lcgwms0
>> 3.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gr
>> idpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/pexpect.py","gsiftp
>> ://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3a_2f_2f
>> lcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/ND280Raw_
>> process.py","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDi
>> r/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQl
>> XPw/input/.BrokerInfo"
>> };
>> StdOutput = "ND280Raw.out";
>> ShallowRetryCount = 10;
>> InputSandboxDestFileName = {
>> "ND280Configs.py","ND280GRID.py","ND280Job.py","ND280Software.py","pexp
>> ect.py","ND280Raw_process.py"
>> };
>> VOMS_FQAN = "/t2k.org/Role=production/Capability=NULL";
>> OutputSandboxPath =
>> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
>> 2fZIhUvnWm77QBohlunQlXPw/output";
>> requirements = ( (
>> Member("VO-t2k.org-ND280-
>> v9r7p9",other.GlueHostApplicationSoftwareRunTimeEnvironment)
>> && other.GlueCEPolicyMaxCPUTime > 600 &&
>> other.GlueHostMainMemoryRAMSize >= 512 ) && ( other.GlueCEStateStatus
>> == "Production" ) ) && !RegExp(".*sdj$",other.GlueCEUniqueID);
>> DataRequirements = {
>> [
>> DataCatalog = "http://lfc.gridpp.rl.ac.uk:8085/";
>> InputData = {
>> "lfn:/grid/t2k.org/nd280/production004/B/rdp/ND280/00006000_00006999/re
>> co/oa_nd_cos_00006945-0166_muaf23qoshfp_reco_000_v9r7p5.root"
>> };
>> DataCatalogType = "DLI"
>> ] };
>> rank = -other.GlueCEStateEstimatedResponseTime;
>> Type = "job";
>> OutputSandboxDestURI = {
>> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
>> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/output/N
>> D280Raw.out","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxD
>> ir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQ
>> lXPw/output/ND280Raw.err","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/g
>> lite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUv
>> nWm77QBohlunQlXPw/output/Raw_cosmic_00006945-0166_v9r7p9.cfg"
>> };
>> StdError = "ND280Raw.err";
>> DataAccessProtocol = "gsiftp";
>> DefaultRank = -other.GlueCEStateEstimatedResponseTime;
>> CeApplicationDir = "/stage/sl3-lcg-exp/t2ksgm";
>> WMPInputSandboxBaseURI =
>> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
>> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw";
>> AllowZippedISB = true;
>> ZippedISB = { "ISBfiles_avupuyXqSP-D8K3jUCJpCA_0.tar.gz" };
>> X509UserProxy =
>> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
>> 2fZIhUvnWm77QBohlunQlXPw/user.proxy";
>> GlobusResourceContactString =
>> "lcgce05.gridpp.rl.ac.uk:8443/cream-pbs";
>> InputSandboxPath =
>> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
>> 2fZIhUvnWm77QBohlunQlXPw/input";
>> OutputSandbox = {
>> "ND280Raw.out","ND280Raw.err","Raw_cosmic_00006945-0166_v9r7p9.cfg" }
>> ]
>> ---
>> Event: Running
>> - Arrived = Thu Jun 2 05:50:58 2011 CEST
>> - Host = lcgwms03.gridpp.rl.ac.uk
>> - Level = SYSTEM
>> - Node = lcg1278.gridpp.rl.ac.uk
>> - Priority = synchronous
>> - Seqcode =
>> UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000002:LM=000005:LR
>> MS=000000:APP=000000:LBS=000000
>> - Source = LogMonitor
>> - Timestamp = Thu Jun 2 05:50:58 2011 CEST
>> - User = /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav
>> Wikstrom/CN=proxy/CN=proxy
>> ---
>> Event: ReallyRunning
>> - Arrived = Thu Jun 2 05:50:59 2011 CEST
>> - Host = lcgwms03.gridpp.rl.ac.uk
>> - Level = SYSTEM
>> - Priority = synchronous
>> - Seqcode =
>> UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000002:LM=000007:LR
>> MS=000000:APP=000000:LBS=000000
>> - Source = LogMonitor
>> - Timestamp = Thu Jun 2 05:50:58 2011 CEST
>> - User = /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav
>> Wikstrom/CN=proxy/CN=proxy
>> - Wn seq =
>> UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000002:LM=000002:LR
>> MS=000000:APP=000000:LBS=000000
>> ---
>> Event: Cancel
>> - Arrived = Thu Jun 2 06:03:11 2011 CEST
>> - Host = lcgwms03.gridpp.rl.ac.uk
>> - Level = SYSTEM
>> - Priority = synchronous
>> - Seqcode =
>> UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000002:LM=000009:LR
>> MS=000000:APP=000000:LBS=000000
>> - Source = LogMonitor
>> - Status code = DONE
>> - Timestamp = Thu Jun 2 06:03:11 2011 CEST
>> - User = /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav
>> Wikstrom/CN=proxy/CN=proxy
>> ---
>> Event: Done
>> - Arrived = Thu Jun 2 06:03:11 2011 CEST
>> - Exit code = 0
>> - Host = lcgwms03.gridpp.rl.ac.uk
>> - Level = SYSTEM
>> - Priority = synchronous
>> - Seqcode =
>> UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000002:LM=000010:LR
>> MS=000000:APP=000000:LBS=000000
>> - Source = LogMonitor
>> - Status code = CANCELLED
>> - Timestamp = Thu Jun 2 06:03:11 2011 CEST
>> - User = /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav
>> Wikstrom/CN=proxy/CN=proxy
>> ---
>> Event: Clear
>> - Arrived = Thu Jun 2 06:03:12 2011 CEST
>> - Host = lcgwms03.gridpp.rl.ac.uk
>> - Level = SYSTEM
>> - Priority = synchronous
>> - Reason = 1
>> - Seqcode =
>> UI=000009:NS=0000096670:WM=000000:BH=0000000000:JSS=000000:LM=000000:LR
>> MS=000000:APP=000000:LBS=000000
>> - Source = NetworkServer
>> - Src instance = 22543
>> - Timestamp = Thu Jun 2 06:03:12 2011 CEST
>> - User =
>> /C=UK/O=eScience/OU=CLRC/L=RAL/CN=lcgwms03.gridpp.rl.ac.uk/Email=tier1a
>> [log in to unmask]
>> =======================================================================
>> ===
>>
>>
>>
>> 2011/6/2 Stuart Purdie <[log in to unmask]>:
>> >
>> > On 2 Jun 2011, at 09:49, Gustav Wikström wrote:
>> >
>> >> Hi all,
>> >>
>> >> I'm having serious problems with running my VO t2k.org jobs,
>> currently
>> >> 95% of them are being cancelled by the WMSs
>> (lcgwms03.gridpp.rl.ac.uk
>> >> and wms02.grid.hep.ic.ac.uk) or the CEs. As I understand it, when a
>> >> WMS stops a job, it is labeled Aborted, and then Cancelled is when a
>> >> CE stops a job? The bad thing is that there is no information about
>> a
>> >> job after it has been stopped unless it failed.
>> >>
>> >> So, what could cause a job to be cancelled? Is memory usage one of
>> the reasons?
>> >
>> > Not the most likely culprit, as it's not the most strongly enforced
>> constraint across all sites, but it is possible. It does have a bit of
>> a site dependence, so if the 5% that don't get cancelled end up on a
>> different site, that's useful data. Job CPU use and Wall time are more
>> strongly enforced; but it could also be missing input files causing the
>> jobs to die on start up.
>> >
>> > If it's (apparently) randomly distributed across all sites, the first
>> thing I'd be checking is proxy lifespans, job queueing time and
>> myproxy stuff (if used).
>> >
>> > There might be more information lurking around, which, if you've not
>> tried already, can be released with 'glite-wms-job-status --verbosity 3
>> <jid>', and 'glite-wms-job-logging-info --verbosity 3 <jid>'
>> > which might give more idea on where to poke at next. In particular,
>> the WMS (by default) will try re-submitting a failed job a couple of
>> times, and walking through that process might be informative. The
>> amount of time jobs spend running might also help identify the root
>> problem.
>> >
>> >
>> >
>