Hi Gustav,
I had a look at that specific job. It was cancelled when it attempted to use more than 500M of memory (see resources_used.mem in the accounting record below). It is possible that the logging available to you hides the real cause of the abort. You should choose a larger queue for your jobs, 700M or 1000M.
06/02/2011 04:52:23;E;15490141.lcgbatch01.gridpp.rl.ac.uk;user=t2k028 group=t2k jobname=cre05_840367115 queue=grid500M
ctime=1306986008 qtime=1306986008 etime=1306986008 start=1306986159 [log in to unmask] exec_host=lcg1278.gridpp.rl.ac.uk/5 Resource_List.cput=60:00:00 Resource_List.neednodes=lcg1278.gridpp.rl.ac.uk Resource_List.opsys=sl5 Resource_List.pcput=60:00:00 Resource_List.pmem=500mb Resource_List.walltime=72:00:00 session=11047 end=1306986743 Exit_status=271 resources_used.cput=00:22:18 resources_used.mem=548468kb resources_used.vmem=1563948kb resources_used.walltime=00:28:07
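For anyone who wants to run the same check on other jobs, here is a minimal Python sketch that compares resources_used.mem with Resource_List.pmem in a record like the one above. The record is shortened to the fields that matter, and the helper names (parse_record, to_kb) are made up for this illustration. It assumes "mb" means 1024 kb. The Exit_status remark in the comments reflects the usual Torque convention (values above 256 mean the job was killed by signal value minus 256) and is an assumption rather than something confirmed in this thread.

```python
import re

# Shortened copy of the accounting record quoted above.
RECORD = (
    "06/02/2011 04:52:23;E;15490141.lcgbatch01.gridpp.rl.ac.uk;"
    "user=t2k028 group=t2k jobname=cre05_840367115 queue=grid500M "
    "Resource_List.pmem=500mb Exit_status=271 "
    "resources_used.mem=548468kb resources_used.vmem=1563948kb"
)

# Assumption: binary multiples, i.e. 1 mb = 1024 kb.
UNITS_KB = {"kb": 1, "mb": 1024, "gb": 1024 * 1024}


def parse_record(record):
    """Split 'date;type;jobid;key=value ...' into (jobid, attribute dict)."""
    _date, _rtype, jobid, message = record.split(";", 3)
    attrs = dict(tok.split("=", 1) for tok in message.split() if "=" in tok)
    return jobid, attrs


def to_kb(size):
    """Convert a size such as '500mb' or '548468kb' to kilobytes."""
    match = re.fullmatch(r"(\d+)(kb|mb|gb)", size.lower())
    if match is None:
        raise ValueError(f"unrecognised size: {size!r}")
    number, unit = match.groups()
    return int(number) * UNITS_KB[unit]


jobid, attrs = parse_record(RECORD)
used_kb = to_kb(attrs["resources_used.mem"])
limit_kb = to_kb(attrs["Resource_List.pmem"])
exit_status = int(attrs["Exit_status"])

print(f"{jobid} on queue {attrs['queue']}")
print(f"memory used {used_kb} kb, limit {limit_kb} kb")
if used_kb > limit_kb:
    print(f"over the limit by {used_kb - limit_kb} kb")
if exit_status > 256:
    # Assumed Torque convention: killed by signal (Exit_status - 256).
    print(f"exit status {exit_status} suggests signal {exit_status - 256}")
```

The same comparison against a 700M or 1000M limit gives a rough idea of how much headroom the larger queues would leave.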
Regards,
Catalin Condurache
RAL Tier1 Grid Services
> -----Original Message-----
> From: Testbed Support for GridPP member institutes [mailto:TB-
> [log in to unmask]] On Behalf Of Gustav Wikström
> Sent: 02 June 2011 10:54
> To: [log in to unmask]
> Subject: Re: 95% of jobs getting cancelled
>
> Hi Stuart,
>
> yes, it turns out there is quite a lot of information available. One
> example job, with glite-wms-job-logging-info --verbosity 3, is below.
>
> There seems to be some retrying going on. In the final step (at
> the bottom) the job runs for 13 mins before being cancelled by the
> LogMonitor, although by then the status of the job is DONE.
>
> Maybe you can make more out of this?
>
> Gustav
>
>
> ===================== glite-job-logging-info Success
> =====================
>
> LOGGING INFORMATION:
>
> Printing info for the Job :
> https://lcglb01.gridpp.rl.ac.uk:9000/ZIhUvnWm77QBohlunQlXPw
>
> ---
> Event: RegJob
> - Arrived = Thu Jun 2 05:39:57 2011 CEST
> - Host = lcgwms03.gridpp.rl.ac.uk
> - Jobtype = SIMPLE
> - Level = SYSTEM
> - Ns =
> https://lcgwms03.gridpp.rl.ac.uk:7443/glite_wms_wmproxy_server
> - Nsubjobs = 0
> - Priority = synchronous
> - Seqcode =
> UI=000000:NS=0000000001:WM=000000:BH=0000000000:JSS=000000:LM=000000:LR
> MS=000000:APP=000000:LBS=000000
> - Source = NetworkServer
> - Src instance =
> https://lcgwms03.gridpp.rl.ac.uk:7443/glite_wms_wmproxy_server
> - Timestamp = Thu Jun 2 05:39:57 2011 CEST
> - User = /DC=ch/DC=cern/OU=Organic
> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav Wikstrom
> - Jdl =
> SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77
> QBohlunQlXPw/JDLToStart
> ---
> Event: RegJob
> - Arrived = Thu Jun 2 05:39:58 2011 CEST
> - Host = lcgwms03.gridpp.rl.ac.uk
> - Jobtype = SIMPLE
> - Level = SYSTEM
> - Ns =
> https://lcgwms03.gridpp.rl.ac.uk:7443/glite_wms_wmproxy_server
> - Nsubjobs = 0
> - Priority = synchronous
> - Seqcode =
> UI=000000:NS=0000000001:WM=000000:BH=0000000000:JSS=000000:LM=000000:LR
> MS=000000:APP=000000:LBS=000000
> - Source = NetworkServer
> - Src instance =
> https://lcgwms03.gridpp.rl.ac.uk:7443/glite_wms_wmproxy_server
> - Timestamp = Thu Jun 2 05:39:57 2011 CEST
> - User = /DC=ch/DC=cern/OU=Organic
> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav Wikstrom
> - Jdl =
> SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77
> QBohlunQlXPw/JDLToStart
> ---
> Event: Accepted
> - Arrived = Thu Jun 2 05:40:02 2011 CEST
> - From = NetworkServer
> - From host = atlas009.unige.ch
> - Host = lcgwms03.gridpp.rl.ac.uk
> - Level = SYSTEM
> - Priority = synchronous
> - Seqcode =
> UI=000000:NS=0000000002:WM=000000:BH=0000000000:JSS=000000:LM=000000:LR
> MS=000000:APP=000000:LBS=000000
> - Source = NetworkServer
> - Src instance =
> https://lcgwms03.gridpp.rl.ac.uk:7443/glite_wms_wmproxy_server
> - Timestamp = Thu Jun 2 05:40:01 2011 CEST
> - User = /DC=ch/DC=cern/OU=Organic
> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav Wikstrom
> ---
> Event: EnQueued
> - Arrived = Thu Jun 2 05:40:02 2011 CEST
> - Host = lcgwms03.gridpp.rl.ac.uk
> - Level = SYSTEM
> - Priority = synchronous
> - Queue = /var/glite/workload_manager/jobdir
> - Result = START
> - Seqcode =
> UI=000000:NS=0000000003:WM=000000:BH=0000000000:JSS=000000:LM=000000:LR
> MS=000000:APP=000000:LBS=000000
> - Source = NetworkServer
> - Src instance =
> https://lcgwms03.gridpp.rl.ac.uk:7443/glite_wms_wmproxy_server
> - Timestamp = Thu Jun 2 05:40:02 2011 CEST
> - User = /DC=ch/DC=cern/OU=Organic
> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav Wikstrom
> - Job =
> /var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2
> fZIhUvnWm77QBohlunQlXPw/JDLToStart
> ---
> Event: EnQueued
> - Arrived = Thu Jun 2 05:40:02 2011 CEST
> - Host = lcgwms03.gridpp.rl.ac.uk
> - Level = SYSTEM
> - Priority = synchronous
> - Queue = /var/glite/workload_manager/jobdir
> - Result = OK
> - Seqcode =
> UI=000000:NS=0000000004:WM=000000:BH=0000000000:JSS=000000:LM=000000:LR
> MS=000000:APP=000000:LBS=000000
> - Source = NetworkServer
> - Src instance =
> https://lcgwms03.gridpp.rl.ac.uk:7443/glite_wms_wmproxy_server
> - Timestamp = Thu Jun 2 05:40:02 2011 CEST
> - User = /DC=ch/DC=cern/OU=Organic
> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav Wikstrom
> - Job =
>
> [
> RetryCount = 3;
> LB_sequence_code =
> "UI=000000:NS=0000000004:WM=000000:BH=0000000000:JSS=000000:LM=000000:L
> RMS=000000:APP=000000:LBS=000000";
> edg_jobid =
> "https://lcglb01.gridpp.rl.ac.uk:9000/ZIhUvnWm77QBohlunQlXPw";
> Arguments = "-v v9r7p9 -i
> lfn:/grid/t2k.org/nd280/production004/B/rdp/ND280/00006000_00006999/rec
> o/oa_nd_cos_00006945-0166_muaf23qoshfp_reco_000_v9r7p5.root
> -e cosmic -p 4C -t rdp -m oaAnalysis";
> CertificateSubject = "/DC=ch/DC=cern/OU=Organic
> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav Wikstrom";
> MyProxyServer = "lcgrbp01.gridpp.rl.ac.uk";
> JobType = "normal";
> Executable = "ND280Raw_process.py";
> VirtualOrganisation = "t2k.org";
> SignificantAttributes = { "Requirements","Rank","FuzzyRank" };
> InputSandbox = {
> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/ND
> 280Configs.py","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/Sandbo
> xDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlu
> nQlXPw/input/ND280GRID.py","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/
> glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhU
> vnWm77QBohlunQlXPw/input/ND280Job.py","gsiftp://lcgwms03.gridpp.rl.ac.u
> k:2811/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a
> 9000_2fZIhUvnWm77QBohlunQlXPw/input/ND280Software.py","gsiftp://lcgwms0
> 3.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gr
> idpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/pexpect.py","gsiftp
> ://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3a_2f_2f
> lcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/ND280Raw_
> process.py"
> };
> StdOutput = "ND280Raw.out";
> ShallowRetryCount = 10;
> InputSandboxDestFileName = {
> "ND280Configs.py","ND280GRID.py","ND280Job.py","ND280Software.py","pexp
> ect.py","ND280Raw_process.py"
> };
> VOMS_FQAN = "/t2k.org/Role=production/Capability=NULL";
> OutputSandboxPath =
> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
> 2fZIhUvnWm77QBohlunQlXPw/output";
> requirements = ( (
> Member("VO-t2k.org-ND280-
> v9r7p9",other.GlueHostApplicationSoftwareRunTimeEnvironment)
> && other.GlueCEPolicyMaxCPUTime > 600 &&
> other.GlueHostMainMemoryRAMSize >= 512 ) && ( other.GlueCEStateStatus
> == "Production" ) ) && !RegExp(".*sdj$",other.GlueCEUniqueID);
> DataRequirements = {
> [
> DataCatalog = "http://lfc.gridpp.rl.ac.uk:8085/";
> InputData = {
> "lfn:/grid/t2k.org/nd280/production004/B/rdp/ND280/00006000_00006999/re
> co/oa_nd_cos_00006945-0166_muaf23qoshfp_reco_000_v9r7p5.root"
> };
> DataCatalogType = "DLI"
> ] };
> rank = -other.GlueCEStateEstimatedResponseTime;
> Type = "job";
> OutputSandboxDestURI = {
> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/output/N
> D280Raw.out","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxD
> ir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQ
> lXPw/output/ND280Raw.err","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/g
> lite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUv
> nWm77QBohlunQlXPw/output/Raw_cosmic_00006945-0166_v9r7p9.cfg"
> };
> StdError = "ND280Raw.err";
> DataAccessProtocol = "gsiftp";
> DefaultRank = -other.GlueCEStateEstimatedResponseTime;
> WMPInputSandboxBaseURI =
> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw";
> AllowZippedISB = true;
> ZippedISB = { "ISBfiles_avupuyXqSP-D8K3jUCJpCA_0.tar.gz" };
> X509UserProxy =
> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
> 2fZIhUvnWm77QBohlunQlXPw/user.proxy";
> InputSandboxPath =
> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
> 2fZIhUvnWm77QBohlunQlXPw/input";
> OutputSandbox = {
> "ND280Raw.out","ND280Raw.err","Raw_cosmic_00006945-0166_v9r7p9.cfg" }
> ]
> ---
> Event: DeQueued
> - Arrived = Thu Jun 2 05:40:02 2011 CEST
> - Host = lcgwms03.gridpp.rl.ac.uk
> - Level = SYSTEM
> - Priority = synchronous
> - Queue = /var/glite/workload_manager/jobdir
> - Seqcode =
> UI=000000:NS=0000000004:WM=000001:BH=0000000000:JSS=000000:LM=000000:LR
> MS=000000:APP=000000:LBS=000000
> - Source = WorkloadManager
> - Src instance = 10391
> - Timestamp = Thu Jun 2 05:40:02 2011 CEST
> - User = /DC=ch/DC=cern/OU=Organic
> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav
> Wikstrom/CN=proxy/CN=proxy
> ---
> Event: Match
> - Arrived = Thu Jun 2 05:40:02 2011 CEST
> - Dest id =
> lcgce05.gridpp.rl.ac.uk:8443/cream-pbs-grid500M
> - Host = lcgwms03.gridpp.rl.ac.uk
> - Level = SYSTEM
> - Priority = synchronous
> - Seqcode =
> UI=000000:NS=0000000004:WM=000002:BH=0000000000:JSS=000000:LM=000000:LR
> MS=000000:APP=000000:LBS=000000
> - Source = WorkloadManager
> - Src instance = 10391
> - Timestamp = Thu Jun 2 05:40:02 2011 CEST
> - User = /DC=ch/DC=cern/OU=Organic
> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav
> Wikstrom/CN=proxy/CN=proxy
> ---
> Event: EnQueued
> - Arrived = Thu Jun 2 05:40:03 2011 CEST
> - Host = lcgwms03.gridpp.rl.ac.uk
> - Level = SYSTEM
> - Priority = synchronous
> - Queue = /var/glite/ice/jobdir
> - Result = START
> - Seqcode =
> UI=000000:NS=0000000004:WM=000003:BH=0000000000:JSS=000000:LM=000000:LR
> MS=000000:APP=000000:LBS=000000
> - Source = WorkloadManager
> - Src instance = 10391
> - Timestamp = Thu Jun 2 05:40:02 2011 CEST
> - User = /DC=ch/DC=cern/OU=Organic
> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav
> Wikstrom/CN=proxy/CN=proxy
> ---
> Event: EnQueued
> - Arrived = Thu Jun 2 05:40:03 2011 CEST
> - Host = lcgwms03.gridpp.rl.ac.uk
> - Level = SYSTEM
> - Priority = synchronous
> - Queue = /var/glite/ice/jobdir
> - Result = OK
> - Seqcode =
> UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000000:LM=000000:LR
> MS=000000:APP=000000:LBS=000000
> - Source = WorkloadManager
> - Src instance = 10391
> - Timestamp = Thu Jun 2 05:40:03 2011 CEST
> - User = /DC=ch/DC=cern/OU=Organic
> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav
> Wikstrom/CN=proxy/CN=proxy
> - Job =
>
> [
> Arguments =
> [
> JobAd =
> [
> RetryCount = 3;
> LB_sequence_code =
> "UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000000:LM=000000:L
> RMS=000000:APP=000000:LBS=000000";
> ReallyRunningToken =
> "gsiftp://lcgwms03.gridpp.rl.ac.uk/var/glite/SandboxDir/ZI/https_3a_2f_
> 2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/token.txt";
> edg_jobid =
> "https://lcglb01.gridpp.rl.ac.uk:9000/ZIhUvnWm77QBohlunQlXPw";
> lrms_type = "torque";
> CeRequirements = "true && ( true && (
> Member(\"VO-t2k.org-ND280-
> v9r7p9\",other.GlueHostApplicationSoftwareRunTimeEnvironment)
> && other.GlueCEPolicyMaxCPUTime > 600 &&
> other.GlueHostMainMemoryRAMSize >= 512 ) )";
> Arguments = "-v v9r7p9 -i
> lfn:/grid/t2k.org/nd280/production004/B/rdp/ND280/00006000_00006999/rec
> o/oa_nd_cos_00006945-0166_muaf23qoshfp_reco_000_v9r7p5.root
> -e cosmic -p 4C -t rdp -m oaAnalysis";
> CertificateSubject = "/DC=ch/DC=cern/OU=Organic
> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav Wikstrom";
> MyProxyServer = "lcgrbp01.gridpp.rl.ac.uk";
> ce_id = "lcgce05.gridpp.rl.ac.uk:8443/cream-pbs-grid500M";
> QueueName = "grid500M";
> JobType = "normal";
> Executable = "ND280Raw_process.py";
> VirtualOrganisation = "t2k.org";
> SignificantAttributes = { "Requirements","Rank","FuzzyRank"
> };
> InputSandbox = {
> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/ND
> 280Configs.py","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/Sandbo
> xDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlu
> nQlXPw/input/ND280GRID.py","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/
> glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhU
> vnWm77QBohlunQlXPw/input/ND280Job.py","gsiftp://lcgwms03.gridpp.rl.ac.u
> k:2811/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a
> 9000_2fZIhUvnWm77QBohlunQlXPw/input/ND280Software.py","gsiftp://lcgwms0
> 3.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gr
> idpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/pexpect.py","gsiftp
> ://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3a_2f_2f
> lcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/ND280Raw_
> process.py","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDi
> r/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQl
> XPw/input/.BrokerInfo"
> };
> StdOutput = "ND280Raw.out";
> ShallowRetryCount = 10;
> VOMS_FQAN = "/t2k.org/Role=production/Capability=NULL";
> InputSandboxDestFileName = {
> "ND280Configs.py","ND280GRID.py","ND280Job.py","ND280Software.py","pexp
> ect.py","ND280Raw_process.py"
> };
> OutputSandboxPath =
> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
> 2fZIhUvnWm77QBohlunQlXPw/output";
> requirements = ( (
> Member("VO-t2k.org-ND280-
> v9r7p9",other.GlueHostApplicationSoftwareRunTimeEnvironment)
> && other.GlueCEPolicyMaxCPUTime > 600 &&
> other.GlueHostMainMemoryRAMSize >= 512 ) && ( other.GlueCEStateStatus
> == "Production" ) ) && !RegExp(".*sdj$",other.GlueCEUniqueID);
> DataRequirements = {
> [
> DataCatalog = "http://lfc.gridpp.rl.ac.uk:8085/";
> InputData = {
> "lfn:/grid/t2k.org/nd280/production004/B/rdp/ND280/00006000_00006999/re
> co/oa_nd_cos_00006945-0166_muaf23qoshfp_reco_000_v9r7p5.root"
> };
> DataCatalogType = "DLI"
> ] };
> rank = -other.GlueCEStateEstimatedResponseTime;
> Type = "job";
> OutputSandboxDestURI = {
> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/output/N
> D280Raw.out","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxD
> ir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQ
> lXPw/output/ND280Raw.err","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/g
> lite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUv
> nWm77QBohlunQlXPw/output/Raw_cosmic_00006945-0166_v9r7p9.cfg"
> };
> StdError = "ND280Raw.err";
> DataAccessProtocol = "gsiftp";
> DefaultRank = -other.GlueCEStateEstimatedResponseTime;
> WMPInputSandboxBaseURI =
> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw";
> CeApplicationDir = "/stage/sl3-lcg-exp/t2ksgm";
> ZippedISB = { "ISBfiles_avupuyXqSP-D8K3jUCJpCA_0.tar.gz" };
> AllowZippedISB = true;
> X509UserProxy =
> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
> 2fZIhUvnWm77QBohlunQlXPw/user.proxy";
> GlobusResourceContactString =
> "lcgce05.gridpp.rl.ac.uk:8443/cream-pbs";
> InputSandboxPath =
> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
> 2fZIhUvnWm77QBohlunQlXPw/input";
> OutputSandbox = {
> "ND280Raw.out","ND280Raw.err","Raw_cosmic_00006945-0166_v9r7p9.cfg" }
> ]
> ];
> Command = "Submit";
> Source = 2;
> Protocol = "1.0.0"
> ]
> ---
> Event: DeQueued
> - Arrived = Thu Jun 2 05:40:04 2011 CEST
> - Host = lcgwms03.gridpp.rl.ac.uk
> - Level = SYSTEM
> - Local jobid =
> https://lcglb01.gridpp.rl.ac.uk:9000/ZIhUvnWm77QBohlunQlXPw
> - Priority = synchronous
> - Queue = /var/glite/ice/jobdir
> - Seqcode =
> UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000001:LM=000000:LR
> MS=000000:APP=000000:LBS=000000
> - Source = JobController
> - Timestamp = Thu Jun 2 05:40:03 2011 CEST
> - User = /DC=ch/DC=cern/OU=Organic
> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav
> Wikstrom/CN=proxy/CN=proxy
> ---
> Event: Transfer
> - Arrived = Thu Jun 2 05:40:04 2011 CEST
> - Dest host =
> https://lcgce05.gridpp.rl.ac.uk:8443/ce-cream/services/CREAM2
> - Dest instance = unavailable
> - Dest jobid = unavailable
> - Destination = LRMS
> - Host = lcgwms03.gridpp.rl.ac.uk
> - Level = SYSTEM
> - Priority = synchronous
> - Reason = unavailable
> - Result = START
> - Seqcode =
> UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000002:LM=000001:LR
> MS=000000:APP=000000:LBS=000000
> - Source = LogMonitor
> - Timestamp = Thu Jun 2 05:40:04 2011 CEST
> - User = /DC=ch/DC=cern/OU=Organic
> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav
> Wikstrom/CN=proxy/CN=proxy
> - Job =
>
> [
> RetryCount = 3;
> LB_sequence_code =
> "UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000002:LM=000000:L
> RMS=000000:APP=000000:LBS=000000";
> ReallyRunningToken =
> "gsiftp://lcgwms03.gridpp.rl.ac.uk/var/glite/SandboxDir/ZI/https_3a_2f_
> 2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/token.txt";
> edg_jobid =
> "https://lcglb01.gridpp.rl.ac.uk:9000/ZIhUvnWm77QBohlunQlXPw";
> lrms_type = "torque";
> CeRequirements = "true && ( true && (
> Member(\"VO-t2k.org-ND280-
> v9r7p9\",other.GlueHostApplicationSoftwareRunTimeEnvironment)
> && other.GlueCEPolicyMaxCPUTime > 600 &&
> other.GlueHostMainMemoryRAMSize >= 512 ) )";
> Arguments = "-v v9r7p9 -i
> lfn:/grid/t2k.org/nd280/production004/B/rdp/ND280/00006000_00006999/rec
> o/oa_nd_cos_00006945-0166_muaf23qoshfp_reco_000_v9r7p5.root
> -e cosmic -p 4C -t rdp -m oaAnalysis";
> CertificateSubject = "/DC=ch/DC=cern/OU=Organic
> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav Wikstrom";
> MyProxyServer = "lcgrbp01.gridpp.rl.ac.uk";
> ce_id = "lcgce05.gridpp.rl.ac.uk:8443/cream-pbs-grid500M";
> QueueName = "grid500M";
> JobType = "normal";
> Executable = "ND280Raw_process.py";
> VirtualOrganisation = "t2k.org";
> SignificantAttributes = { "Requirements","Rank","FuzzyRank" };
> InputSandbox = {
> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/ND
> 280Configs.py","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/Sandbo
> xDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlu
> nQlXPw/input/ND280GRID.py","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/
> glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhU
> vnWm77QBohlunQlXPw/input/ND280Job.py","gsiftp://lcgwms03.gridpp.rl.ac.u
> k:2811/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a
> 9000_2fZIhUvnWm77QBohlunQlXPw/input/ND280Software.py","gsiftp://lcgwms0
> 3.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gr
> idpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/pexpect.py","gsiftp
> ://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3a_2f_2f
> lcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/ND280Raw_
> process.py","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDi
> r/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQl
> XPw/input/.BrokerInfo"
> };
> StdOutput = "ND280Raw.out";
> ShallowRetryCount = 10;
> InputSandboxDestFileName = {
> "ND280Configs.py","ND280GRID.py","ND280Job.py","ND280Software.py","pexp
> ect.py","ND280Raw_process.py"
> };
> VOMS_FQAN = "/t2k.org/Role=production/Capability=NULL";
> OutputSandboxPath =
> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
> 2fZIhUvnWm77QBohlunQlXPw/output";
> requirements = ( (
> Member("VO-t2k.org-ND280-
> v9r7p9",other.GlueHostApplicationSoftwareRunTimeEnvironment)
> && other.GlueCEPolicyMaxCPUTime > 600 &&
> other.GlueHostMainMemoryRAMSize >= 512 ) && ( other.GlueCEStateStatus
> == "Production" ) ) && !RegExp(".*sdj$",other.GlueCEUniqueID);
> DataRequirements = {
> [
> DataCatalog = "http://lfc.gridpp.rl.ac.uk:8085/";
> InputData = {
> "lfn:/grid/t2k.org/nd280/production004/B/rdp/ND280/00006000_00006999/re
> co/oa_nd_cos_00006945-0166_muaf23qoshfp_reco_000_v9r7p5.root"
> };
> DataCatalogType = "DLI"
> ] };
> rank = -other.GlueCEStateEstimatedResponseTime;
> Type = "job";
> OutputSandboxDestURI = {
> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/output/N
> D280Raw.out","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxD
> ir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQ
> lXPw/output/ND280Raw.err","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/g
> lite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUv
> nWm77QBohlunQlXPw/output/Raw_cosmic_00006945-0166_v9r7p9.cfg"
> };
> StdError = "ND280Raw.err";
> DataAccessProtocol = "gsiftp";
> DefaultRank = -other.GlueCEStateEstimatedResponseTime;
> CeApplicationDir = "/stage/sl3-lcg-exp/t2ksgm";
> WMPInputSandboxBaseURI =
> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw";
> AllowZippedISB = true;
> ZippedISB = { "ISBfiles_avupuyXqSP-D8K3jUCJpCA_0.tar.gz" };
> X509UserProxy =
> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
> 2fZIhUvnWm77QBohlunQlXPw/user.proxy";
> GlobusResourceContactString =
> "lcgce05.gridpp.rl.ac.uk:8443/cream-pbs";
> InputSandboxPath =
> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
> 2fZIhUvnWm77QBohlunQlXPw/input";
> OutputSandbox = {
> "ND280Raw.out","ND280Raw.err","Raw_cosmic_00006945-0166_v9r7p9.cfg" }
> ]
> ---
> Event: Running
> - Arrived = Thu Jun 2 05:42:39 2011 CEST
> - Host = lcg1278.gridpp.rl.ac.uk
> - Level = SYSTEM
> - Node = lcg1278.gridpp.rl.ac.uk
> - Priority = synchronous
> - Seqcode =
> UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000002:LM=000002:LR
> MS=000001:APP=000000:LBS=000000
> - Source = LRMS
> - Timestamp = Thu Jun 2 05:42:39 2011 CEST
> - User = /DC=ch/DC=cern/OU=Organic
> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav Wikstrom
> ---
> Event: ReallyRunning
> - Arrived = Thu Jun 2 05:42:46 2011 CEST
> - Host = lcg1278.gridpp.rl.ac.uk
> - Level = SYSTEM
> - Priority = synchronous
> - Seqcode =
> UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000002:LM=000002:LR
> MS=000003:APP=000000:LBS=000000
> - Source = LRMS
> - Timestamp = Thu Jun 2 05:42:46 2011 CEST
> - User = /DC=ch/DC=cern/OU=Organic
> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav Wikstrom
> ---
> Event: Transfer
> - Arrived = Thu Jun 2 05:40:05 2011 CEST
> - Dest host =
> https://lcgce05.gridpp.rl.ac.uk:8443/ce-cream/services/CREAM2
> - Dest instance = unavailable
> - Dest jobid =
> https://lcgce05.gridpp.rl.ac.uk:8443/CREAM840367115
> - Destination = LRMS
> - Host = lcgwms03.gridpp.rl.ac.uk
> - Level = SYSTEM
> - Priority = synchronous
> - Reason = unavailable
> - Result = OK
> - Seqcode =
> UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000002:LM=000003:LR
> MS=000000:APP=000000:LBS=000000
> - Source = LogMonitor
> - Timestamp = Thu Jun 2 05:40:04 2011 CEST
> - User = /DC=ch/DC=cern/OU=Organic
> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav
> Wikstrom/CN=proxy/CN=proxy
> - Job =
>
> [
> RetryCount = 3;
> LB_sequence_code =
> "UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000002:LM=000002:L
> RMS=000000:APP=000000:LBS=000000";
> ReallyRunningToken =
> "gsiftp://lcgwms03.gridpp.rl.ac.uk/var/glite/SandboxDir/ZI/https_3a_2f_
> 2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/token.txt";
> edg_jobid =
> "https://lcglb01.gridpp.rl.ac.uk:9000/ZIhUvnWm77QBohlunQlXPw";
> lrms_type = "torque";
> CeRequirements = "true && ( true && (
> Member(\"VO-t2k.org-ND280-
> v9r7p9\",other.GlueHostApplicationSoftwareRunTimeEnvironment)
> && other.GlueCEPolicyMaxCPUTime > 600 &&
> other.GlueHostMainMemoryRAMSize >= 512 ) )";
> Arguments = "-v v9r7p9 -i
> lfn:/grid/t2k.org/nd280/production004/B/rdp/ND280/00006000_00006999/rec
> o/oa_nd_cos_00006945-0166_muaf23qoshfp_reco_000_v9r7p5.root
> -e cosmic -p 4C -t rdp -m oaAnalysis";
> CertificateSubject = "/DC=ch/DC=cern/OU=Organic
> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav Wikstrom";
> MyProxyServer = "lcgrbp01.gridpp.rl.ac.uk";
> ce_id = "lcgce05.gridpp.rl.ac.uk:8443/cream-pbs-grid500M";
> QueueName = "grid500M";
> JobType = "normal";
> Executable = "ND280Raw_process.py";
> VirtualOrganisation = "t2k.org";
> SignificantAttributes = { "Requirements","Rank","FuzzyRank" };
> InputSandbox = {
> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/ND
> 280Configs.py","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/Sandbo
> xDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlu
> nQlXPw/input/ND280GRID.py","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/
> glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhU
> vnWm77QBohlunQlXPw/input/ND280Job.py","gsiftp://lcgwms03.gridpp.rl.ac.u
> k:2811/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a
> 9000_2fZIhUvnWm77QBohlunQlXPw/input/ND280Software.py","gsiftp://lcgwms0
> 3.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gr
> idpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/pexpect.py","gsiftp
> ://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3a_2f_2f
> lcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/ND280Raw_
> process.py","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDi
> r/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQl
> XPw/input/.BrokerInfo"
> };
> StdOutput = "ND280Raw.out";
> ShallowRetryCount = 10;
> InputSandboxDestFileName = {
> "ND280Configs.py","ND280GRID.py","ND280Job.py","ND280Software.py","pexp
> ect.py","ND280Raw_process.py"
> };
> VOMS_FQAN = "/t2k.org/Role=production/Capability=NULL";
> OutputSandboxPath =
> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
> 2fZIhUvnWm77QBohlunQlXPw/output";
> requirements = ( (
> Member("VO-t2k.org-ND280-
> v9r7p9",other.GlueHostApplicationSoftwareRunTimeEnvironment)
> && other.GlueCEPolicyMaxCPUTime > 600 &&
> other.GlueHostMainMemoryRAMSize >= 512 ) && ( other.GlueCEStateStatus
> == "Production" ) ) && !RegExp(".*sdj$",other.GlueCEUniqueID);
> DataRequirements = {
> [
> DataCatalog = "http://lfc.gridpp.rl.ac.uk:8085/";
> InputData = {
> "lfn:/grid/t2k.org/nd280/production004/B/rdp/ND280/00006000_00006999/re
> co/oa_nd_cos_00006945-0166_muaf23qoshfp_reco_000_v9r7p5.root"
> };
> DataCatalogType = "DLI"
> ] };
> rank = -other.GlueCEStateEstimatedResponseTime;
> Type = "job";
> OutputSandboxDestURI = {
> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/output/N
> D280Raw.out","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxD
> ir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQ
> lXPw/output/ND280Raw.err","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/g
> lite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUv
> nWm77QBohlunQlXPw/output/Raw_cosmic_00006945-0166_v9r7p9.cfg"
> };
> StdError = "ND280Raw.err";
> DataAccessProtocol = "gsiftp";
> DefaultRank = -other.GlueCEStateEstimatedResponseTime;
> CeApplicationDir = "/stage/sl3-lcg-exp/t2ksgm";
> WMPInputSandboxBaseURI =
> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw";
> AllowZippedISB = true;
> ZippedISB = { "ISBfiles_avupuyXqSP-D8K3jUCJpCA_0.tar.gz" };
> X509UserProxy =
> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
> 2fZIhUvnWm77QBohlunQlXPw/user.proxy";
> GlobusResourceContactString =
> "lcgce05.gridpp.rl.ac.uk:8443/cream-pbs";
> InputSandboxPath =
> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
> 2fZIhUvnWm77QBohlunQlXPw/input";
> OutputSandbox = {
> "ND280Raw.out","ND280Raw.err","Raw_cosmic_00006945-0166_v9r7p9.cfg" }
> ]
> ---
> Event: Running
> - Arrived = Thu Jun 2 05:50:58 2011 CEST
> - Host = lcgwms03.gridpp.rl.ac.uk
> - Level = SYSTEM
> - Node = lcg1278.gridpp.rl.ac.uk
> - Priority = synchronous
> - Seqcode =
> UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000002:LM=000005:LR
> MS=000000:APP=000000:LBS=000000
> - Source = LogMonitor
> - Timestamp = Thu Jun 2 05:50:58 2011 CEST
> - User = /DC=ch/DC=cern/OU=Organic
> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav
> Wikstrom/CN=proxy/CN=proxy
> ---
> Event: ReallyRunning
> - Arrived = Thu Jun 2 05:50:59 2011 CEST
> - Host = lcgwms03.gridpp.rl.ac.uk
> - Level = SYSTEM
> - Priority = synchronous
> - Seqcode =
> UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000002:LM=000007:LR
> MS=000000:APP=000000:LBS=000000
> - Source = LogMonitor
> - Timestamp = Thu Jun 2 05:50:58 2011 CEST
> - User = /DC=ch/DC=cern/OU=Organic
> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav
> Wikstrom/CN=proxy/CN=proxy
> - Wn seq =
> UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000002:LM=000002:LR
> MS=000000:APP=000000:LBS=000000
> ---
> Event: Cancel
> - Arrived = Thu Jun 2 06:03:11 2011 CEST
> - Host = lcgwms03.gridpp.rl.ac.uk
> - Level = SYSTEM
> - Priority = synchronous
> - Seqcode =
> UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000002:LM=000009:LR
> MS=000000:APP=000000:LBS=000000
> - Source = LogMonitor
> - Status code = DONE
> - Timestamp = Thu Jun 2 06:03:11 2011 CEST
> - User = /DC=ch/DC=cern/OU=Organic
> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav
> Wikstrom/CN=proxy/CN=proxy
> ---
> Event: Done
> - Arrived = Thu Jun 2 06:03:11 2011 CEST
> - Exit code = 0
> - Host = lcgwms03.gridpp.rl.ac.uk
> - Level = SYSTEM
> - Priority = synchronous
> - Seqcode =
> UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000002:LM=000010:LR
> MS=000000:APP=000000:LBS=000000
> - Source = LogMonitor
> - Status code = CANCELLED
> - Timestamp = Thu Jun 2 06:03:11 2011 CEST
> - User = /DC=ch/DC=cern/OU=Organic
> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav
> Wikstrom/CN=proxy/CN=proxy
> ---
> Event: Clear
> - Arrived = Thu Jun 2 06:03:12 2011 CEST
> - Host = lcgwms03.gridpp.rl.ac.uk
> - Level = SYSTEM
> - Priority = synchronous
> - Reason = 1
> - Seqcode =
> UI=000009:NS=0000096670:WM=000000:BH=0000000000:JSS=000000:LM=000000:LR
> MS=000000:APP=000000:LBS=000000
> - Source = NetworkServer
> - Src instance = 22543
> - Timestamp = Thu Jun 2 06:03:12 2011 CEST
> - User =
> /C=UK/O=eScience/OU=CLRC/L=RAL/CN=lcgwms03.gridpp.rl.ac.uk/Email=tier1a
> [log in to unmask]
> =======================================================================
> ===
>
>
>
> 2011/6/2 Stuart Purdie <[log in to unmask]>:
> >
> > On 2 Jun 2011, at 09:49, Gustav Wikström wrote:
> >
> >> Hi all,
> >>
> >> I'm having serious problems with running my VO t2k.org jobs,
> currently
> >> 95% of them are being cancelled by the WMSs
> (lcgwms03.gridpp.rl.ac.uk
> >> and wms02.grid.hep.ic.ac.uk) or the CEs. As I understand it, when a
> >> WMS stops a job, it is labeled Aborted, and then Cancelled is when a
> >> CE stops a job? The bad thing is that there is no information about
> a
> >> job after it has been stopped unless it failed.
> >>
> >> So, what could cause a job to be cancelled? Is memory usage one of
> the reasons?
> >
> > Not the most likely culprit, as it's not the most strongly enforced
> constraint across all sites, but it is possible. It does have a bit of
> a site dependence, so if the 5% that don't get cancelled end up on a
> different site, that's useful data. Job CPU use and Wall time are more
> strongly enforced; but it could also be missing input files causing the
> jobs to die on start up.
> >
> > If it's (apparently) randomly distributed across all sites, the first
> thing I'd be checking is proxy lifespans, job queueing time and
> myproxy stuff (if used).
> >
> > There might be more information lurking around, which, if you've not
> tried already, can be released with 'glite-wms-job-status --verbosity 3
> <jid>', and 'glite-wms-job-logging-info --verbosity 3 <jid>'
> > which might give more idea on where to poke at next. In particular,
> the WMS (by default) will try re-submitting a failed job a couple of
> times, and walking through that process might be informative. The
> amount of time jobs spend running might also help identify the root
> problem.
> >
> >
> >
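The walk-through of logging events suggested above can be sketched in Python. This is only an illustration: it assumes the "Event:" / "- Key = value" layout seen in the output quoted earlier in this thread, with the mail quote prefixes already removed, and the function names (parse_events, arrival) are invented for the example. The sample keeps just three of the events and two of their fields.

```python
from datetime import datetime

# A few lines in the shape of the logging-info output quoted earlier
# (quote prefixes removed; only three events and two fields kept).
SAMPLE = """\
Event: RegJob
- Arrived = Thu Jun 2 05:39:57 2011 CEST
- Source = NetworkServer
---
Event: Running
- Arrived = Thu Jun 2 05:42:39 2011 CEST
- Source = LRMS
---
Event: Cancel
- Arrived = Thu Jun 2 06:03:11 2011 CEST
- Source = LogMonitor
"""


def parse_events(text):
    """Collect each 'Event:' block into a dict of its '- Key = value' lines."""
    events = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Event:"):
            events.append({"Event": line.split(":", 1)[1].strip()})
        elif line.startswith("- ") and " = " in line and events:
            key, value = line[2:].split(" = ", 1)
            events[-1][key.strip()] = value.strip()
    return events


def arrival(event):
    """Parse the 'Arrived' field, ignoring the trailing time-zone label."""
    stamp = event["Arrived"].rsplit(" ", 1)[0]
    return datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")


events = parse_events(SAMPLE)
first = arrival(events[0])
for ev in events:
    # Offset of each event from the first one, to make gaps easy to spot.
    print(f"{ev['Event']:<10} {ev['Source']:<15} +{arrival(ev) - first}")

running = next(e for e in events if e["Event"] == "Running")
cancel = next(e for e in events if e["Event"] == "Cancel")
seconds_running = (arrival(cancel) - arrival(running)).total_seconds()
print(f"Running -> Cancel: {seconds_running:.0f} s")
```

Listing events by source and elapsed time makes resubmissions and long gaps easier to see than in the raw dump.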