Hi Catalin,

I checked that, but most of the jobs that went to the 1000M queue also
failed, although the 5% that succeeded did all go to that queue. This is
surprising, since last week most jobs of the same kind were passing
fine. Is the memory limit being applied more strictly now?

A more general question: when a job exceeds memory limits, shouldn't
it fail instead of being cancelled? At least then the user could get
the reason for the failure.

Cheers,
Gustav

2011/6/2 Catalin Condurache <[log in to unmask]>:
> Hi Gustav,
>
> I had a look at that specific job. It was cancelled when it attempted to use more than 500M (resources_used.mem, see below). It is possible that the logging available to you hides the real cause of the abort. You should choose a larger queue for your jobs, 700M or 1000M.
>
> 06/02/2011 04:52:23;E;15490141.lcgbatch01.gridpp.rl.ac.uk;user=t2k028 group=t2k jobname=cre05_840367115 queue=grid500M
> ctime=1306986008 qtime=1306986008 etime=1306986008 start=1306986159 [log in to unmask] exec_host=lcg1278.gridpp.rl.ac.uk/5 Resource_List.cput=60:00:00 Resource_List.neednodes=lcg1278.gridpp.rl.ac.uk Resource_List.opsys=sl5 Resource_List.pcput=60:00:00 Resource_List.pmem=500mb Resource_List.walltime=72:00:00 session=11047 end=1306986743 Exit_status=271 resources_used.cput=00:22:18 resources_used.mem=548468kb resources_used.vmem=1563948kb resources_used.walltime=00:28:07
>
> Regards,
> Catalin Condurache
> RAL Tier1 Grid Services
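
The comparison Catalin describes can be reproduced from the accounting record quoted above. What follows is only a minimal sketch: the record is abridged to the fields it needs, `to_kb` and `parse_record` are hypothetical helper names, 1 mb is assumed to mean 1024 kb, and reading an Exit_status above 256 as "256 + signal number" is a common interpretation for Torque rather than something stated in this thread.

```python
import re

# Fields copied from the accounting record quoted above (abridged).
RECORD = (
    "queue=grid500M Resource_List.pmem=500mb Exit_status=271 "
    "resources_used.mem=548468kb resources_used.vmem=1563948kb"
)

# Assumption: binary multiples, i.e. 1 mb = 1024 kb.
UNITS_KB = {"kb": 1, "mb": 1024, "gb": 1024 * 1024}


def to_kb(size):
    """Convert a size string such as '500mb' to kilobytes."""
    match = re.fullmatch(r"(\d+)(kb|mb|gb)", size.lower())
    if match is None:
        raise ValueError(f"unrecognised size: {size!r}")
    return int(match.group(1)) * UNITS_KB[match.group(2)]


def parse_record(record):
    """Split space-separated key=value pairs into a dict."""
    return dict(item.split("=", 1) for item in record.split())


fields = parse_record(RECORD)
limit_kb = to_kb(fields["Resource_List.pmem"])
used_kb = to_kb(fields["resources_used.mem"])
over_kb = used_kb - limit_kb

# Assumption: an exit status above 256 encodes 256 + the signal number.
exit_status = int(fields["Exit_status"])
signal = exit_status - 256 if exit_status > 256 else None

print(f"limit={limit_kb}kb used={used_kb}kb over={over_kb}kb signal={signal}")
```

Under those assumptions the job was about 36 MB over the 500mb per-process limit of the grid500M queue, and the exit status would correspond to signal 15 (SIGTERM), which fits a job that was killed from outside rather than one that failed by itself.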
>
>
>> -----Original Message-----
>> From: Testbed Support for GridPP member institutes [mailto:TB-
>> [log in to unmask]] On Behalf Of Gustav Wikström
>> Sent: 02 June 2011 10:54
>> To: [log in to unmask]
>> Subject: Re: 95% of jobs getting cancelled
>>
>> Hi Stuart,
>>
>> Yes, it turns out there is quite a lot of information available. One
>> example job with glite-wms-job-logging-info -verbosity 3 is below.
>>
>> There seems to be some retrying going on. In the final step (at
>> the bottom) the job runs for 13 mins before being cancelled by the
>> LogMonitor, although by then the status of the job is DONE.
>>
>> Maybe you can make more out of this?
>>
>> Gustav
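
The event dump that follows is long, so a timeline of event names and timestamps can be easier to read than the raw output. This is a rough sketch, not part of any gLite tool: `parse_events` is a hypothetical helper, the sample holds only a few Event and Timestamp lines from the output below with the quote prefixes removed, and the zone name (CEST) is simply dropped instead of being converted.

```python
from datetime import datetime

# A few Event/Timestamp lines from the logging output, quote prefixes removed.
SAMPLE = """\
Event: RegJob
- Timestamp                  =    Thu Jun  2 05:39:57 2011 CEST
Event: Match
- Timestamp                  =    Thu Jun  2 05:40:02 2011 CEST
Event: Running
- Timestamp                  =    Thu Jun  2 05:42:39 2011 CEST
Event: ReallyRunning
- Timestamp                  =    Thu Jun  2 05:42:46 2011 CEST
"""


def parse_events(text):
    """Return (event name, timestamp) pairs in the order they appear."""
    events = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Event:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("- Timestamp") and current is not None:
            raw = line.split("=", 1)[1].strip()
            # Drop the trailing zone name; all stamps share the same zone here.
            stamp = raw.rsplit(" ", 1)[0]
            when = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
            events.append((current, when))
            current = None
    return events


events = parse_events(SAMPLE)
start = events[0][1]
for name, when in events:
    offset = int((when - start).total_seconds())
    print(f"{name:<15} +{offset:>4}s")
```

With the sample above, Running lands 162 seconds after the first RegJob event. Running the same loop over the full output would also include the final cancellation events, which is where the 13 minutes mentioned above should show up.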
>>
>>
>> ===================== glite-job-logging-info Success
>> =====================
>>
>> LOGGING INFORMATION:
>>
>> Printing info for the Job :
>> https://lcglb01.gridpp.rl.ac.uk:9000/ZIhUvnWm77QBohlunQlXPw
>>
>>       ---
>> Event: RegJob
>> - Arrived                    =    Thu Jun  2 05:39:57 2011 CEST
>> - Host                       =    lcgwms03.gridpp.rl.ac.uk
>> - Jobtype                    =    SIMPLE
>> - Level                      =    SYSTEM
>> - Ns                         =
>> https://lcgwms03.gridpp.rl.ac.uk:7443/glite_wms_wmproxy_server
>> - Nsubjobs                   =    0
>> - Priority                   =    synchronous
>> - Seqcode                    =
>> UI=000000:NS=0000000001:WM=000000:BH=0000000000:JSS=000000:LM=000000:LR
>> MS=000000:APP=000000:LBS=000000
>> - Source                     =    NetworkServer
>> - Src instance               =
>> https://lcgwms03.gridpp.rl.ac.uk:7443/glite_wms_wmproxy_server
>> - Timestamp                  =    Thu Jun  2 05:39:57 2011 CEST
>> - User                       =    /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav Wikstrom
>> - Jdl            =
>> SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77
>> QBohlunQlXPw/JDLToStart
>>       ---
>> Event: RegJob
>> - Arrived                    =    Thu Jun  2 05:39:58 2011 CEST
>> - Host                       =    lcgwms03.gridpp.rl.ac.uk
>> - Jobtype                    =    SIMPLE
>> - Level                      =    SYSTEM
>> - Ns                         =
>> https://lcgwms03.gridpp.rl.ac.uk:7443/glite_wms_wmproxy_server
>> - Nsubjobs                   =    0
>> - Priority                   =    synchronous
>> - Seqcode                    =
>> UI=000000:NS=0000000001:WM=000000:BH=0000000000:JSS=000000:LM=000000:LR
>> MS=000000:APP=000000:LBS=000000
>> - Source                     =    NetworkServer
>> - Src instance               =
>> https://lcgwms03.gridpp.rl.ac.uk:7443/glite_wms_wmproxy_server
>> - Timestamp                  =    Thu Jun  2 05:39:57 2011 CEST
>> - User                       =    /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav Wikstrom
>> - Jdl            =
>> SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77
>> QBohlunQlXPw/JDLToStart
>>       ---
>> Event: Accepted
>> - Arrived                    =    Thu Jun  2 05:40:02 2011 CEST
>> - From                       =    NetworkServer
>> - From host                  =    atlas009.unige.ch
>> - Host                       =    lcgwms03.gridpp.rl.ac.uk
>> - Level                      =    SYSTEM
>> - Priority                   =    synchronous
>> - Seqcode                    =
>> UI=000000:NS=0000000002:WM=000000:BH=0000000000:JSS=000000:LM=000000:LR
>> MS=000000:APP=000000:LBS=000000
>> - Source                     =    NetworkServer
>> - Src instance               =
>> https://lcgwms03.gridpp.rl.ac.uk:7443/glite_wms_wmproxy_server
>> - Timestamp                  =    Thu Jun  2 05:40:01 2011 CEST
>> - User                       =    /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav Wikstrom
>>       ---
>> Event: EnQueued
>> - Arrived                    =    Thu Jun  2 05:40:02 2011 CEST
>> - Host                       =    lcgwms03.gridpp.rl.ac.uk
>> - Level                      =    SYSTEM
>> - Priority                   =    synchronous
>> - Queue                      =    /var/glite/workload_manager/jobdir
>> - Result                     =    START
>> - Seqcode                    =
>> UI=000000:NS=0000000003:WM=000000:BH=0000000000:JSS=000000:LM=000000:LR
>> MS=000000:APP=000000:LBS=000000
>> - Source                     =    NetworkServer
>> - Src instance               =
>> https://lcgwms03.gridpp.rl.ac.uk:7443/glite_wms_wmproxy_server
>> - Timestamp                  =    Thu Jun  2 05:40:02 2011 CEST
>> - User                       =    /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav Wikstrom
>> - Job            =
>> /var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2
>> fZIhUvnWm77QBohlunQlXPw/JDLToStart
>>       ---
>> Event: EnQueued
>> - Arrived                    =    Thu Jun  2 05:40:02 2011 CEST
>> - Host                       =    lcgwms03.gridpp.rl.ac.uk
>> - Level                      =    SYSTEM
>> - Priority                   =    synchronous
>> - Queue                      =    /var/glite/workload_manager/jobdir
>> - Result                     =    OK
>> - Seqcode                    =
>> UI=000000:NS=0000000004:WM=000000:BH=0000000000:JSS=000000:LM=000000:LR
>> MS=000000:APP=000000:LBS=000000
>> - Source                     =    NetworkServer
>> - Src instance               =
>> https://lcgwms03.gridpp.rl.ac.uk:7443/glite_wms_wmproxy_server
>> - Timestamp                  =    Thu Jun  2 05:40:02 2011 CEST
>> - User                       =    /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav Wikstrom
>> - Job            =
>>
>>        [
>>         RetryCount = 3;
>>         LB_sequence_code =
>> "UI=000000:NS=0000000004:WM=000000:BH=0000000000:JSS=000000:LM=000000:L
>> RMS=000000:APP=000000:LBS=000000";
>>         edg_jobid =
>> "https://lcglb01.gridpp.rl.ac.uk:9000/ZIhUvnWm77QBohlunQlXPw";
>>         Arguments = "-v v9r7p9 -i
>> lfn:/grid/t2k.org/nd280/production004/B/rdp/ND280/00006000_00006999/rec
>> o/oa_nd_cos_00006945-0166_muaf23qoshfp_reco_000_v9r7p5.root
>> -e cosmic -p 4C -t rdp -m oaAnalysis";
>>         CertificateSubject = "/DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav Wikstrom";
>>         MyProxyServer = "lcgrbp01.gridpp.rl.ac.uk";
>>         JobType = "normal";
>>         Executable = "ND280Raw_process.py";
>>         VirtualOrganisation = "t2k.org";
>>         SignificantAttributes = { "Requirements","Rank","FuzzyRank" };
>>         InputSandbox = {
>> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
>> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/ND
>> 280Configs.py","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/Sandbo
>> xDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlu
>> nQlXPw/input/ND280GRID.py","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/
>> glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhU
>> vnWm77QBohlunQlXPw/input/ND280Job.py","gsiftp://lcgwms03.gridpp.rl.ac.u
>> k:2811/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a
>> 9000_2fZIhUvnWm77QBohlunQlXPw/input/ND280Software.py","gsiftp://lcgwms0
>> 3.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gr
>> idpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/pexpect.py","gsiftp
>> ://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3a_2f_2f
>> lcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/ND280Raw_
>> process.py"
>> };
>>         StdOutput = "ND280Raw.out";
>>         ShallowRetryCount = 10;
>>         InputSandboxDestFileName = {
>> "ND280Configs.py","ND280GRID.py","ND280Job.py","ND280Software.py","pexp
>> ect.py","ND280Raw_process.py"
>> };
>>         VOMS_FQAN = "/t2k.org/Role=production/Capability=NULL";
>>         OutputSandboxPath =
>> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
>> 2fZIhUvnWm77QBohlunQlXPw/output";
>>         requirements = ( (
>> Member("VO-t2k.org-ND280-
>> v9r7p9",other.GlueHostApplicationSoftwareRunTimeEnvironment)
>> && other.GlueCEPolicyMaxCPUTime > 600 &&
>> other.GlueHostMainMemoryRAMSize >= 512 ) && ( other.GlueCEStateStatus
>> == "Production" ) ) &&  !RegExp(".*sdj$",other.GlueCEUniqueID);
>>         DataRequirements = {
>>          [
>>           DataCatalog = "http://lfc.gridpp.rl.ac.uk:8085/";
>>           InputData = {
>> "lfn:/grid/t2k.org/nd280/production004/B/rdp/ND280/00006000_00006999/re
>> co/oa_nd_cos_00006945-0166_muaf23qoshfp_reco_000_v9r7p5.root"
>> };
>>           DataCatalogType = "DLI"
>>          ] };
>>         rank =  -other.GlueCEStateEstimatedResponseTime;
>>         Type = "job";
>>         OutputSandboxDestURI = {
>> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
>> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/output/N
>> D280Raw.out","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxD
>> ir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQ
>> lXPw/output/ND280Raw.err","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/g
>> lite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUv
>> nWm77QBohlunQlXPw/output/Raw_cosmic_00006945-0166_v9r7p9.cfg"
>> };
>>         StdError = "ND280Raw.err";
>>         DataAccessProtocol = "gsiftp";
>>         DefaultRank =  -other.GlueCEStateEstimatedResponseTime;
>>         WMPInputSandboxBaseURI =
>> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
>> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw";
>>         AllowZippedISB = true;
>>         ZippedISB = { "ISBfiles_avupuyXqSP-D8K3jUCJpCA_0.tar.gz" };
>>         X509UserProxy =
>> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
>> 2fZIhUvnWm77QBohlunQlXPw/user.proxy";
>>         InputSandboxPath =
>> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
>> 2fZIhUvnWm77QBohlunQlXPw/input";
>>         OutputSandbox = {
>> "ND280Raw.out","ND280Raw.err","Raw_cosmic_00006945-0166_v9r7p9.cfg" }
>>        ]
>>       ---
>> Event: DeQueued
>> - Arrived                    =    Thu Jun  2 05:40:02 2011 CEST
>> - Host                       =    lcgwms03.gridpp.rl.ac.uk
>> - Level                      =    SYSTEM
>> - Priority                   =    synchronous
>> - Queue                      =    /var/glite/workload_manager/jobdir
>> - Seqcode                    =
>> UI=000000:NS=0000000004:WM=000001:BH=0000000000:JSS=000000:LM=000000:LR
>> MS=000000:APP=000000:LBS=000000
>> - Source                     =    WorkloadManager
>> - Src instance               =    10391
>> - Timestamp                  =    Thu Jun  2 05:40:02 2011 CEST
>> - User                       =    /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav
>> Wikstrom/CN=proxy/CN=proxy
>>       ---
>> Event: Match
>> - Arrived                    =    Thu Jun  2 05:40:02 2011 CEST
>> - Dest id                    =
>> lcgce05.gridpp.rl.ac.uk:8443/cream-pbs-grid500M
>> - Host                       =    lcgwms03.gridpp.rl.ac.uk
>> - Level                      =    SYSTEM
>> - Priority                   =    synchronous
>> - Seqcode                    =
>> UI=000000:NS=0000000004:WM=000002:BH=0000000000:JSS=000000:LM=000000:LR
>> MS=000000:APP=000000:LBS=000000
>> - Source                     =    WorkloadManager
>> - Src instance               =    10391
>> - Timestamp                  =    Thu Jun  2 05:40:02 2011 CEST
>> - User                       =    /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav
>> Wikstrom/CN=proxy/CN=proxy
>>       ---
>> Event: EnQueued
>> - Arrived                    =    Thu Jun  2 05:40:03 2011 CEST
>> - Host                       =    lcgwms03.gridpp.rl.ac.uk
>> - Level                      =    SYSTEM
>> - Priority                   =    synchronous
>> - Queue                      =    /var/glite/ice/jobdir
>> - Result                     =    START
>> - Seqcode                    =
>> UI=000000:NS=0000000004:WM=000003:BH=0000000000:JSS=000000:LM=000000:LR
>> MS=000000:APP=000000:LBS=000000
>> - Source                     =    WorkloadManager
>> - Src instance               =    10391
>> - Timestamp                  =    Thu Jun  2 05:40:02 2011 CEST
>> - User                       =    /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav
>> Wikstrom/CN=proxy/CN=proxy
>>       ---
>> Event: EnQueued
>> - Arrived                    =    Thu Jun  2 05:40:03 2011 CEST
>> - Host                       =    lcgwms03.gridpp.rl.ac.uk
>> - Level                      =    SYSTEM
>> - Priority                   =    synchronous
>> - Queue                      =    /var/glite/ice/jobdir
>> - Result                     =    OK
>> - Seqcode                    =
>> UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000000:LM=000000:LR
>> MS=000000:APP=000000:LBS=000000
>> - Source                     =    WorkloadManager
>> - Src instance               =    10391
>> - Timestamp                  =    Thu Jun  2 05:40:03 2011 CEST
>> - User                       =    /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav
>> Wikstrom/CN=proxy/CN=proxy
>> - Job            =
>>
>>        [
>>         Arguments =
>>          [
>>           JobAd =
>>            [
>>             RetryCount = 3;
>>             LB_sequence_code =
>> "UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000000:LM=000000:L
>> RMS=000000:APP=000000:LBS=000000";
>>             ReallyRunningToken =
>> "gsiftp://lcgwms03.gridpp.rl.ac.uk/var/glite/SandboxDir/ZI/https_3a_2f_
>> 2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/token.txt";
>>             edg_jobid =
>> "https://lcglb01.gridpp.rl.ac.uk:9000/ZIhUvnWm77QBohlunQlXPw";
>>             lrms_type = "torque";
>>             CeRequirements = "true && ( true && (
>> Member(\"VO-t2k.org-ND280-
>> v9r7p9\",other.GlueHostApplicationSoftwareRunTimeEnvironment)
>> && other.GlueCEPolicyMaxCPUTime > 600 &&
>> other.GlueHostMainMemoryRAMSize >= 512 ) )";
>>             Arguments = "-v v9r7p9 -i
>> lfn:/grid/t2k.org/nd280/production004/B/rdp/ND280/00006000_00006999/rec
>> o/oa_nd_cos_00006945-0166_muaf23qoshfp_reco_000_v9r7p5.root
>> -e cosmic -p 4C -t rdp -m oaAnalysis";
>>             CertificateSubject = "/DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav Wikstrom";
>>             MyProxyServer = "lcgrbp01.gridpp.rl.ac.uk";
>>             ce_id = "lcgce05.gridpp.rl.ac.uk:8443/cream-pbs-grid500M";
>>             QueueName = "grid500M";
>>             JobType = "normal";
>>             Executable = "ND280Raw_process.py";
>>             VirtualOrganisation = "t2k.org";
>>             SignificantAttributes = { "Requirements","Rank","FuzzyRank"
>> };
>>             InputSandbox = {
>> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
>> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/ND
>> 280Configs.py","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/Sandbo
>> xDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlu
>> nQlXPw/input/ND280GRID.py","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/
>> glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhU
>> vnWm77QBohlunQlXPw/input/ND280Job.py","gsiftp://lcgwms03.gridpp.rl.ac.u
>> k:2811/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a
>> 9000_2fZIhUvnWm77QBohlunQlXPw/input/ND280Software.py","gsiftp://lcgwms0
>> 3.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gr
>> idpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/pexpect.py","gsiftp
>> ://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3a_2f_2f
>> lcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/ND280Raw_
>> process.py","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDi
>> r/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQl
>> XPw/input/.BrokerInfo"
>> };
>>             StdOutput = "ND280Raw.out";
>>             ShallowRetryCount = 10;
>>             VOMS_FQAN = "/t2k.org/Role=production/Capability=NULL";
>>             InputSandboxDestFileName = {
>> "ND280Configs.py","ND280GRID.py","ND280Job.py","ND280Software.py","pexp
>> ect.py","ND280Raw_process.py"
>> };
>>             OutputSandboxPath =
>> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
>> 2fZIhUvnWm77QBohlunQlXPw/output";
>>             requirements = ( (
>> Member("VO-t2k.org-ND280-
>> v9r7p9",other.GlueHostApplicationSoftwareRunTimeEnvironment)
>> && other.GlueCEPolicyMaxCPUTime > 600 &&
>> other.GlueHostMainMemoryRAMSize >= 512 ) && ( other.GlueCEStateStatus
>> == "Production" ) ) &&  !RegExp(".*sdj$",other.GlueCEUniqueID);
>>             DataRequirements = {
>>              [
>>               DataCatalog = "http://lfc.gridpp.rl.ac.uk:8085/";
>>               InputData = {
>> "lfn:/grid/t2k.org/nd280/production004/B/rdp/ND280/00006000_00006999/re
>> co/oa_nd_cos_00006945-0166_muaf23qoshfp_reco_000_v9r7p5.root"
>> };
>>               DataCatalogType = "DLI"
>>              ] };
>>             rank =  -other.GlueCEStateEstimatedResponseTime;
>>             Type = "job";
>>             OutputSandboxDestURI = {
>> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
>> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/output/N
>> D280Raw.out","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxD
>> ir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQ
>> lXPw/output/ND280Raw.err","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/g
>> lite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUv
>> nWm77QBohlunQlXPw/output/Raw_cosmic_00006945-0166_v9r7p9.cfg"
>> };
>>             StdError = "ND280Raw.err";
>>             DataAccessProtocol = "gsiftp";
>>             DefaultRank =  -other.GlueCEStateEstimatedResponseTime;
>>             WMPInputSandboxBaseURI =
>> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
>> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw";
>>             CeApplicationDir = "/stage/sl3-lcg-exp/t2ksgm";
>>             ZippedISB = { "ISBfiles_avupuyXqSP-D8K3jUCJpCA_0.tar.gz" };
>>             AllowZippedISB = true;
>>             X509UserProxy =
>> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
>> 2fZIhUvnWm77QBohlunQlXPw/user.proxy";
>>             GlobusResourceContactString =
>> "lcgce05.gridpp.rl.ac.uk:8443/cream-pbs";
>>             InputSandboxPath =
>> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
>> 2fZIhUvnWm77QBohlunQlXPw/input";
>>             OutputSandbox = {
>> "ND280Raw.out","ND280Raw.err","Raw_cosmic_00006945-0166_v9r7p9.cfg" }
>>            ]
>>          ];
>>         Command = "Submit";
>>         Source = 2;
>>         Protocol = "1.0.0"
>>        ]
>>       ---
>> Event: DeQueued
>> - Arrived                    =    Thu Jun  2 05:40:04 2011 CEST
>> - Host                       =    lcgwms03.gridpp.rl.ac.uk
>> - Level                      =    SYSTEM
>> - Local jobid                =
>> https://lcglb01.gridpp.rl.ac.uk:9000/ZIhUvnWm77QBohlunQlXPw
>> - Priority                   =    synchronous
>> - Queue                      =    /var/glite/ice/jobdir
>> - Seqcode                    =
>> UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000001:LM=000000:LR
>> MS=000000:APP=000000:LBS=000000
>> - Source                     =    JobController
>> - Timestamp                  =    Thu Jun  2 05:40:03 2011 CEST
>> - User                       =    /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav
>> Wikstrom/CN=proxy/CN=proxy
>>       ---
>> Event: Transfer
>> - Arrived                    =    Thu Jun  2 05:40:04 2011 CEST
>> - Dest host                  =
>> https://lcgce05.gridpp.rl.ac.uk:8443/ce-cream/services/CREAM2
>> - Dest instance              =    unavailable
>> - Dest jobid                 =    unavailable
>> - Destination                =    LRMS
>> - Host                       =    lcgwms03.gridpp.rl.ac.uk
>> - Level                      =    SYSTEM
>> - Priority                   =    synchronous
>> - Reason                     =    unavailable
>> - Result                     =    START
>> - Seqcode                    =
>> UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000002:LM=000001:LR
>> MS=000000:APP=000000:LBS=000000
>> - Source                     =    LogMonitor
>> - Timestamp                  =    Thu Jun  2 05:40:04 2011 CEST
>> - User                       =    /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav
>> Wikstrom/CN=proxy/CN=proxy
>> - Job            =
>>
>>        [
>>         RetryCount = 3;
>>         LB_sequence_code =
>> "UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000002:LM=000000:L
>> RMS=000000:APP=000000:LBS=000000";
>>         ReallyRunningToken =
>> "gsiftp://lcgwms03.gridpp.rl.ac.uk/var/glite/SandboxDir/ZI/https_3a_2f_
>> 2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/token.txt";
>>         edg_jobid =
>> "https://lcglb01.gridpp.rl.ac.uk:9000/ZIhUvnWm77QBohlunQlXPw";
>>         lrms_type = "torque";
>>         CeRequirements = "true && ( true && (
>> Member(\"VO-t2k.org-ND280-
>> v9r7p9\",other.GlueHostApplicationSoftwareRunTimeEnvironment)
>> && other.GlueCEPolicyMaxCPUTime > 600 &&
>> other.GlueHostMainMemoryRAMSize >= 512 ) )";
>>         Arguments = "-v v9r7p9 -i
>> lfn:/grid/t2k.org/nd280/production004/B/rdp/ND280/00006000_00006999/rec
>> o/oa_nd_cos_00006945-0166_muaf23qoshfp_reco_000_v9r7p5.root
>> -e cosmic -p 4C -t rdp -m oaAnalysis";
>>         CertificateSubject = "/DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav Wikstrom";
>>         MyProxyServer = "lcgrbp01.gridpp.rl.ac.uk";
>>         ce_id = "lcgce05.gridpp.rl.ac.uk:8443/cream-pbs-grid500M";
>>         QueueName = "grid500M";
>>         JobType = "normal";
>>         Executable = "ND280Raw_process.py";
>>         VirtualOrganisation = "t2k.org";
>>         SignificantAttributes = { "Requirements","Rank","FuzzyRank" };
>>         InputSandbox = {
>> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
>> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/ND
>> 280Configs.py","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/Sandbo
>> xDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlu
>> nQlXPw/input/ND280GRID.py","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/
>> glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhU
>> vnWm77QBohlunQlXPw/input/ND280Job.py","gsiftp://lcgwms03.gridpp.rl.ac.u
>> k:2811/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a
>> 9000_2fZIhUvnWm77QBohlunQlXPw/input/ND280Software.py","gsiftp://lcgwms0
>> 3.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gr
>> idpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/pexpect.py","gsiftp
>> ://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3a_2f_2f
>> lcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/ND280Raw_
>> process.py","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDi
>> r/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQl
>> XPw/input/.BrokerInfo"
>> };
>>         StdOutput = "ND280Raw.out";
>>         ShallowRetryCount = 10;
>>         InputSandboxDestFileName = {
>> "ND280Configs.py","ND280GRID.py","ND280Job.py","ND280Software.py","pexp
>> ect.py","ND280Raw_process.py"
>> };
>>         VOMS_FQAN = "/t2k.org/Role=production/Capability=NULL";
>>         OutputSandboxPath =
>> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
>> 2fZIhUvnWm77QBohlunQlXPw/output";
>>         requirements = ( (
>> Member("VO-t2k.org-ND280-
>> v9r7p9",other.GlueHostApplicationSoftwareRunTimeEnvironment)
>> && other.GlueCEPolicyMaxCPUTime > 600 &&
>> other.GlueHostMainMemoryRAMSize >= 512 ) && ( other.GlueCEStateStatus
>> == "Production" ) ) &&  !RegExp(".*sdj$",other.GlueCEUniqueID);
>>         DataRequirements = {
>>          [
>>           DataCatalog = "http://lfc.gridpp.rl.ac.uk:8085/";
>>           InputData = {
>> "lfn:/grid/t2k.org/nd280/production004/B/rdp/ND280/00006000_00006999/re
>> co/oa_nd_cos_00006945-0166_muaf23qoshfp_reco_000_v9r7p5.root"
>> };
>>           DataCatalogType = "DLI"
>>          ] };
>>         rank =  -other.GlueCEStateEstimatedResponseTime;
>>         Type = "job";
>>         OutputSandboxDestURI = {
>> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
>> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/output/N
>> D280Raw.out","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxD
>> ir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQ
>> lXPw/output/ND280Raw.err","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/g
>> lite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUv
>> nWm77QBohlunQlXPw/output/Raw_cosmic_00006945-0166_v9r7p9.cfg"
>> };
>>         StdError = "ND280Raw.err";
>>         DataAccessProtocol = "gsiftp";
>>         DefaultRank =  -other.GlueCEStateEstimatedResponseTime;
>>         CeApplicationDir = "/stage/sl3-lcg-exp/t2ksgm";
>>         WMPInputSandboxBaseURI =
>> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
>> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw";
>>         AllowZippedISB = true;
>>         ZippedISB = { "ISBfiles_avupuyXqSP-D8K3jUCJpCA_0.tar.gz" };
>>         X509UserProxy =
>> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
>> 2fZIhUvnWm77QBohlunQlXPw/user.proxy";
>>         GlobusResourceContactString =
>> "lcgce05.gridpp.rl.ac.uk:8443/cream-pbs";
>>         InputSandboxPath =
>> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
>> 2fZIhUvnWm77QBohlunQlXPw/input";
>>         OutputSandbox = {
>> "ND280Raw.out","ND280Raw.err","Raw_cosmic_00006945-0166_v9r7p9.cfg" }
>>        ]
>>       ---
>> Event: Running
>> - Arrived                    =    Thu Jun  2 05:42:39 2011 CEST
>> - Host                       =    lcg1278.gridpp.rl.ac.uk
>> - Level                      =    SYSTEM
>> - Node                       =    lcg1278.gridpp.rl.ac.uk
>> - Priority                   =    synchronous
>> - Seqcode                    =
>> UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000002:LM=000002:LR
>> MS=000001:APP=000000:LBS=000000
>> - Source                     =    LRMS
>> - Timestamp                  =    Thu Jun  2 05:42:39 2011 CEST
>> - User                       =    /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav Wikstrom
>>       ---
>> Event: ReallyRunning
>> - Arrived                    =    Thu Jun  2 05:42:46 2011 CEST
>> - Host                       =    lcg1278.gridpp.rl.ac.uk
>> - Level                      =    SYSTEM
>> - Priority                   =    synchronous
>> - Seqcode                    =
>> UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000002:LM=000002:LR
>> MS=000003:APP=000000:LBS=000000
>> - Source                     =    LRMS
>> - Timestamp                  =    Thu Jun  2 05:42:46 2011 CEST
>> - User                       =    /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav Wikstrom
>>       ---
>> Event: Transfer
>> - Arrived                    =    Thu Jun  2 05:40:05 2011 CEST
>> - Dest host                  =
>> https://lcgce05.gridpp.rl.ac.uk:8443/ce-cream/services/CREAM2
>> - Dest instance              =    unavailable
>> - Dest jobid                 =
>> https://lcgce05.gridpp.rl.ac.uk:8443/CREAM840367115
>> - Destination                =    LRMS
>> - Host                       =    lcgwms03.gridpp.rl.ac.uk
>> - Level                      =    SYSTEM
>> - Priority                   =    synchronous
>> - Reason                     =    unavailable
>> - Result                     =    OK
>> - Seqcode                    =
>> UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000002:LM=000003:LR
>> MS=000000:APP=000000:LBS=000000
>> - Source                     =    LogMonitor
>> - Timestamp                  =    Thu Jun  2 05:40:04 2011 CEST
>> - User                       =    /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav
>> Wikstrom/CN=proxy/CN=proxy
>> - Job            =
>>
>>        [
>>         RetryCount = 3;
>>         LB_sequence_code =
>> "UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000002:LM=000002:L
>> RMS=000000:APP=000000:LBS=000000";
>>         ReallyRunningToken =
>> "gsiftp://lcgwms03.gridpp.rl.ac.uk/var/glite/SandboxDir/ZI/https_3a_2f_
>> 2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/token.txt";
>>         edg_jobid =
>> "https://lcglb01.gridpp.rl.ac.uk:9000/ZIhUvnWm77QBohlunQlXPw";
>>         lrms_type = "torque";
>>         CeRequirements = "true && ( true && (
>> Member(\"VO-t2k.org-ND280-
>> v9r7p9\",other.GlueHostApplicationSoftwareRunTimeEnvironment)
>> && other.GlueCEPolicyMaxCPUTime > 600 &&
>> other.GlueHostMainMemoryRAMSize >= 512 ) )";
>>         Arguments = "-v v9r7p9 -i
>> lfn:/grid/t2k.org/nd280/production004/B/rdp/ND280/00006000_00006999/rec
>> o/oa_nd_cos_00006945-0166_muaf23qoshfp_reco_000_v9r7p5.root
>> -e cosmic -p 4C -t rdp -m oaAnalysis";
>>         CertificateSubject = "/DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav Wikstrom";
>>         MyProxyServer = "lcgrbp01.gridpp.rl.ac.uk";
>>         ce_id = "lcgce05.gridpp.rl.ac.uk:8443/cream-pbs-grid500M";
>>         QueueName = "grid500M";
>>         JobType = "normal";
>>         Executable = "ND280Raw_process.py";
>>         VirtualOrganisation = "t2k.org";
>>         SignificantAttributes = { "Requirements","Rank","FuzzyRank" };
>>         InputSandbox = {
>> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
>> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/ND
>> 280Configs.py","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/Sandbo
>> xDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlu
>> nQlXPw/input/ND280GRID.py","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/
>> glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhU
>> vnWm77QBohlunQlXPw/input/ND280Job.py","gsiftp://lcgwms03.gridpp.rl.ac.u
>> k:2811/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a
>> 9000_2fZIhUvnWm77QBohlunQlXPw/input/ND280Software.py","gsiftp://lcgwms0
>> 3.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gr
>> idpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/pexpect.py","gsiftp
>> ://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3a_2f_2f
>> lcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/input/ND280Raw_
>> process.py","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDi
>> r/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQl
>> XPw/input/.BrokerInfo"
>> };
>>         StdOutput = "ND280Raw.out";
>>         ShallowRetryCount = 10;
>>         InputSandboxDestFileName = {
>> "ND280Configs.py","ND280GRID.py","ND280Job.py","ND280Software.py","pexp
>> ect.py","ND280Raw_process.py"
>> };
>>         VOMS_FQAN = "/t2k.org/Role=production/Capability=NULL";
>>         OutputSandboxPath =
>> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
>> 2fZIhUvnWm77QBohlunQlXPw/output";
>>         requirements = ( (
>> Member("VO-t2k.org-ND280-
>> v9r7p9",other.GlueHostApplicationSoftwareRunTimeEnvironment)
>> && other.GlueCEPolicyMaxCPUTime > 600 &&
>> other.GlueHostMainMemoryRAMSize >= 512 ) && ( other.GlueCEStateStatus
>> == "Production" ) ) &&  !RegExp(".*sdj$",other.GlueCEUniqueID);
>>         DataRequirements = {
>>          [
>>           DataCatalog = "http://lfc.gridpp.rl.ac.uk:8085/";
>>           InputData = {
>> "lfn:/grid/t2k.org/nd280/production004/B/rdp/ND280/00006000_00006999/re
>> co/oa_nd_cos_00006945-0166_muaf23qoshfp_reco_000_v9r7p5.root"
>> };
>>           DataCatalogType = "DLI"
>>          ] };
>>         rank =  -other.GlueCEStateEstimatedResponseTime;
>>         Type = "job";
>>         OutputSandboxDestURI = {
>> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
>> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw/output/N
>> D280Raw.out","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxD
>> ir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQ
>> lXPw/output/ND280Raw.err","gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/g
>> lite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUv
>> nWm77QBohlunQlXPw/output/Raw_cosmic_00006945-0166_v9r7p9.cfg"
>> };
>>         StdError = "ND280Raw.err";
>>         DataAccessProtocol = "gsiftp";
>>         DefaultRank =  -other.GlueCEStateEstimatedResponseTime;
>>         CeApplicationDir = "/stage/sl3-lcg-exp/t2ksgm";
>>         WMPInputSandboxBaseURI =
>> "gsiftp://lcgwms03.gridpp.rl.ac.uk:2811/var/glite/SandboxDir/ZI/https_3
>> a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_2fZIhUvnWm77QBohlunQlXPw";
>>         AllowZippedISB = true;
>>         ZippedISB = { "ISBfiles_avupuyXqSP-D8K3jUCJpCA_0.tar.gz" };
>>         X509UserProxy =
>> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
>> 2fZIhUvnWm77QBohlunQlXPw/user.proxy";
>>         GlobusResourceContactString =
>> "lcgce05.gridpp.rl.ac.uk:8443/cream-pbs";
>>         InputSandboxPath =
>> "/var/glite/SandboxDir/ZI/https_3a_2f_2flcglb01.gridpp.rl.ac.uk_3a9000_
>> 2fZIhUvnWm77QBohlunQlXPw/input";
>>         OutputSandbox = {
>> "ND280Raw.out","ND280Raw.err","Raw_cosmic_00006945-0166_v9r7p9.cfg" }
>>        ]
>>       ---
>> Event: Running
>> - Arrived                    =    Thu Jun  2 05:50:58 2011 CEST
>> - Host                       =    lcgwms03.gridpp.rl.ac.uk
>> - Level                      =    SYSTEM
>> - Node                       =    lcg1278.gridpp.rl.ac.uk
>> - Priority                   =    synchronous
>> - Seqcode                    =
>> UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000002:LM=000005:LR
>> MS=000000:APP=000000:LBS=000000
>> - Source                     =    LogMonitor
>> - Timestamp                  =    Thu Jun  2 05:50:58 2011 CEST
>> - User                       =    /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav
>> Wikstrom/CN=proxy/CN=proxy
>>       ---
>> Event: ReallyRunning
>> - Arrived                    =    Thu Jun  2 05:50:59 2011 CEST
>> - Host                       =    lcgwms03.gridpp.rl.ac.uk
>> - Level                      =    SYSTEM
>> - Priority                   =    synchronous
>> - Seqcode                    =
>> UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000002:LM=000007:LR
>> MS=000000:APP=000000:LBS=000000
>> - Source                     =    LogMonitor
>> - Timestamp                  =    Thu Jun  2 05:50:58 2011 CEST
>> - User                       =    /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav
>> Wikstrom/CN=proxy/CN=proxy
>> - Wn seq                     =
>> UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000002:LM=000002:LR
>> MS=000000:APP=000000:LBS=000000
>>       ---
>> Event: Cancel
>> - Arrived                    =    Thu Jun  2 06:03:11 2011 CEST
>> - Host                       =    lcgwms03.gridpp.rl.ac.uk
>> - Level                      =    SYSTEM
>> - Priority                   =    synchronous
>> - Seqcode                    =
>> UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000002:LM=000009:LR
>> MS=000000:APP=000000:LBS=000000
>> - Source                     =    LogMonitor
>> - Status code                =    DONE
>> - Timestamp                  =    Thu Jun  2 06:03:11 2011 CEST
>> - User                       =    /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav
>> Wikstrom/CN=proxy/CN=proxy
>>       ---
>> Event: Done
>> - Arrived                    =    Thu Jun  2 06:03:11 2011 CEST
>> - Exit code                  =    0
>> - Host                       =    lcgwms03.gridpp.rl.ac.uk
>> - Level                      =    SYSTEM
>> - Priority                   =    synchronous
>> - Seqcode                    =
>> UI=000000:NS=0000000004:WM=000004:BH=0000000000:JSS=000002:LM=000010:LR
>> MS=000000:APP=000000:LBS=000000
>> - Source                     =    LogMonitor
>> - Status code                =    CANCELLED
>> - Timestamp                  =    Thu Jun  2 06:03:11 2011 CEST
>> - User                       =    /DC=ch/DC=cern/OU=Organic
>> Units/OU=Users/CN=lwikstro/CN=627993/CN=Gustav
>> Wikstrom/CN=proxy/CN=proxy
>>       ---
>> Event: Clear
>> - Arrived                    =    Thu Jun  2 06:03:12 2011 CEST
>> - Host                       =    lcgwms03.gridpp.rl.ac.uk
>> - Level                      =    SYSTEM
>> - Priority                   =    synchronous
>> - Reason                     =    1
>> - Seqcode                    =
>> UI=000009:NS=0000096670:WM=000000:BH=0000000000:JSS=000000:LM=000000:LR
>> MS=000000:APP=000000:LBS=000000
>> - Source                     =    NetworkServer
>> - Src instance               =    22543
>> - Timestamp                  =    Thu Jun  2 06:03:12 2011 CEST
>> - User                       =
>> /C=UK/O=eScience/OU=CLRC/L=RAL/CN=lcgwms03.gridpp.rl.ac.uk/Email=tier1a
>> [log in to unmask]
>> =======================================================================
>> ===
>>
>>
>>
>> 2011/6/2 Stuart Purdie <[log in to unmask]>:
>> >
>> > On 2 Jun 2011, at 09:49, Gustav Wikström wrote:
>> >
>> >> Hi all,
>> >>
>> >> I'm having serious problems with running my VO t2k.org jobs:
>> >> currently 95% of them are being cancelled by the WMSs
>> >> (lcgwms03.gridpp.rl.ac.uk and wms02.grid.hep.ic.ac.uk) or the CEs.
>> >> As I understand it, when a WMS stops a job it is labeled Aborted,
>> >> and Cancelled is when a CE stops a job? The bad thing is that there
>> >> is no information about a job after it has been stopped unless it
>> >> failed.
>> >>
>> >> So, what could cause a job to be cancelled? Is memory usage one of
>> >> the reasons?
>> >
>> > Not the most likely culprit, as it's not the most strongly enforced
>> > constraint across all sites, but it is possible.  It does have a bit
>> > of a site dependence, so if the 5% that don't get cancelled end up on
>> > a different site, that's useful data.  Job CPU use and wall time are
>> > more strongly enforced; but it could also be missing input files
>> > causing the jobs to die on start up.
>> >
>> > If it's (apparently) randomly distributed across all sites, the first
>> > thing I'd be checking is proxy lifespans, job queueing time and
>> > myproxy stuff (if used).
>> >
>> > There might be more information lurking around, which, if you've not
>> > tried already, can be released with 'glite-wms-job-status --verbosity 3
>> > <jid>' and 'glite-wms-job-logging-info --verbosity 3 <jid>', which
>> > might give more idea on where to poke at next.  In particular, the
>> > WMS (by default) will try re-submitting a failed job a couple of
>> > times, and walking through that process might be informative. The
>> > amount of time jobs spend running might also help identify the root
>> > problem.
>> >
>> >
>> >
>
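
For anyone who wants to pull the "time spent running" out of a logging-info
dump like the one quoted above, here is a minimal sketch in Python. It is not
part of the gLite tools: the function names event_times and run_seconds are
made up for illustration, and it assumes the output keeps the "Event: <name>"
and "- Timestamp = <date> <zone>" layout seen in this thread, with English
day and month names and all timestamps in the same time zone. The sample text
is copied from the ReallyRunning and Cancel events above.

```python
import re
from datetime import datetime

# Two events copied from the quoted log; a real dump has many more fields.
SAMPLE = """\
>> Event: ReallyRunning
>> - Timestamp                  =    Thu Jun  2 05:50:58 2011 CEST
>> Event: Cancel
>> - Timestamp                  =    Thu Jun  2 06:03:11 2011 CEST
"""


def event_times(text):
    """Return (event_name, datetime) pairs in the order they appear."""
    events = []
    current = None
    for raw in text.splitlines():
        # Drop any leading email quote markers such as ">> ".
        line = re.sub(r"^(?:>\s?)+", "", raw).strip()
        m = re.match(r"Event:\s*(\w+)", line)
        if m:
            current = m.group(1)
            continue
        m = re.match(r"-\s*Timestamp\s*=\s*(.+)", line)
        if m and current:
            # Drop the trailing zone label and collapse repeated spaces.
            stamp = " ".join(m.group(1).split()[:-1])
            when = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
            events.append((current, when))
    return events


def run_seconds(events, start="ReallyRunning", end="Cancel"):
    """Seconds between the latest `start` event and the next `end` event."""
    started = None
    for name, when in events:
        if name == start:
            started = when
        elif name == end and started is not None:
            return (when - started).total_seconds()
    return None


if __name__ == "__main__":
    print(run_seconds(event_times(SAMPLE)))
```

Because the WMS may resubmit a job, the sketch keeps the most recent
ReallyRunning before the Cancel, so it reports the length of the final
attempt rather than the whole job history.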