I have no clue on this. Maybe you can try globus-url-copy from the WN to
the CE with your proxy to check whether something is blocking traffic
between the WN and the CE.
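
In case it helps with reading the attached log: a minimal Python sketch (my own, not part of Globus or gLite) that lists the points where the GRAM_SCRIPT_JOB_STATE value changes. The numeric codes are the usual GRAM job states as far as I know; the name job_state_changes is purely illustrative, and the pattern assumes timestamps of the form "4/30 08:04:15" as in the log quoted below.

```python
import re

# GRAM job state codes as they appear in GRAM_SCRIPT_JOB_STATE lines
# (assumed to follow the usual GRAM protocol constants).
GRAM_STATES = {
    1: "PENDING",
    2: "ACTIVE",
    4: "FAILED",
    8: "DONE",
    16: "SUSPENDED",
    32: "UNSUBMITTED",
    64: "STAGE_IN",
    128: "STAGE_OUT",
}

# Matches e.g. "4/30 08:04:15 JMI: while return_buf = GRAM_SCRIPT_JOB_STATE = 1",
# optionally preceded by an email quote marker ("> ").
STATE_RE = re.compile(
    r"^(?:>\s*)?(\d+/\d+ \d\d:\d\d:\d\d) .*GRAM_SCRIPT_JOB_STATE = (\d+)"
)


def job_state_changes(lines):
    """Return (timestamp, state name) for each change of the reported job state."""
    changes = []
    last_code = None
    for line in lines:
        match = STATE_RE.match(line)
        if match is None:
            continue
        code = int(match.group(2))
        if code != last_code:
            name = GRAM_STATES.get(code, "UNKNOWN(%d)" % code)
            changes.append((match.group(1), name))
            last_code = code
    return changes
```

Calling job_state_changes on the open log file (or on the quoted lines below) should report PENDING at 4/30 08:04:15 and FAILED at 4/30 08:04:56, which narrows the problem to the roughly 40 seconds in which the job manager was polling the batch system.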
Di
Alessandro Paolini wrote:
> Hi all,
> since our CE was migrated to SL4 and gLite 3.1 (the WNs were already
> migrated to gLite 3.1 some months ago) there has been a job submission
> problem: it seems that no submitted job ever arrives on the CE (the SAM
> tests are failing with "proxy expired"), and the command
> $ globus-job-run gridit-ce-001.cnaf.infn.it/jobmanager-lcgpbs -queue cert /bin/pwd
>
> returns nothing, but ssh from the WNs to the CE and qsub are working.
>
> Do you have any hints? I'm also attaching the file gram_job_mgr_24157
> from my last "globus-job-run" attempt.
>
> Best Regards,
> Alessandro
>
>
> 4/30 08:04:15 JM: Security context imported
> 4/30 08:04:15 JM: Adding new callback contact
> (url=https://lcg-ui.cnaf.infn.it:20002/, mask=1048575)
> 4/30 08:04:15 JM: Added successfully
> 4/30 08:04:15 Pre-parsed RSL string: &("rsl_substitution" =
> ("GLOBUSRUN_GASS_URL" "https://lcg-ui.cnaf.infn.it:20001" ) )("stderr" =
> $("GLOBUSRUN_GASS_URL") # "/dev/stderr" )("stdout" =
> $("GLOBUSRUN_GASS_URL") # "/dev/stdout" )("executable" = "/bin/pwd"
> )("queue" = "cert" )
> 4/30 08:04:15
> <<<<<Job Request RSL
> &("rsl_substitution" = ("GLOBUSRUN_GASS_URL"
> "https://lcg-ui.cnaf.infn.it:20001" ) )("stderr" =
> $("GLOBUSRUN_GASS_URL") # "/dev/stderr" )("stdout" =
> $("GLOBUSRUN_GASS_URL") # "/dev/stdout" )("executable" = "/bin/pwd"
> )("queue" = "cert" )
> >>>>>Job Request RSL
> 4/30 08:04:15
> <<<<<Job Request RSL (canonical)
> &("rslsubstitution" = ("GLOBUSRUN_GASS_URL"
> "https://lcg-ui.cnaf.infn.it:20001" ) )("stderr" =
> $("GLOBUSRUN_GASS_URL") # "/dev/stderr" )("stdout" =
> $("GLOBUSRUN_GASS_URL") # "/dev/stdout" )("executable" = "/bin/pwd"
> )("queue" = "cert" )
> >>>>>Job Request RSL (canonical)
> 4/30 08:04:15 JM: Evaluating RSL Value
> 4/30 08:04:15 JM: Evaluated RSL Value to GLOBUSRUN_GASS_URL
> 4/30 08:04:15 JM: Evaluating RSL Value
> 4/30 08:04:15 JM: Evaluated RSL Value to https://lcg-ui.cnaf.infn.it:20001
> 4/30 08:04:15 Job Manager State Machine (entering):
> GLOBUS_GRAM_JOB_MANAGER_STATE_MAKE_SCRATCHDIR
> 4/30 08:04:15
> <<<<<Job RSL
> &("environment" = ("HOME" "/home/infngrid014" ) ("LOGNAME" "infngrid014"
> ) )("rslsubstitution" = ("GLOBUSRUN_GASS_URL"
> "https://lcg-ui.cnaf.infn.it:20001" ) )("stderr" =
> $("GLOBUSRUN_GASS_URL") # "/dev/stderr" )("stdout" =
> $("GLOBUSRUN_GASS_URL") # "/dev/stdout" )("executable" = "/bin/pwd"
> )("queue" = "cert" )
> >>>>>Job RSL
> 4/30 08:04:15
> <<<<<Job RSL (post-eval)
> &("environment" = ("HOME" "/home/infngrid014" ) ("LOGNAME" "infngrid014"
> ) )("rslsubstitution" = ("GLOBUSRUN_GASS_URL"
> "https://lcg-ui.cnaf.infn.it:20001" ) )("stderr" =
> "https://lcg-ui.cnaf.infn.it:20001/dev/stderr" )("stdout" =
> "https://lcg-ui.cnaf.infn.it:20001/dev/stdout" )("executable" =
> "/bin/pwd" )("queue" = "cert" )
> >>>>>Job RSL (post-eval)
> Adding default RSL of proxy_timeout = 60
> Adding default RSL of dry_run = no
> Adding default RSL of gram_my_job = collective
> Adding default RSL of job_type = multiple
> Adding default RSL of count = 1
> Adding default RSL of stdin = /dev/null
> Adding default RSL of directory = $(HOME)
> 4/30 08:04:15
> <<<<<Job RSL (post-validation)
> &("directory" = $("HOME") )("stdin" = "/dev/null" )("count" = "1"
> )("job_type" = "multiple" )("gram_my_job" = "collective" )("dry_run" =
> "no" )("proxy_timeout" = "60" )("environment" = ("HOME"
> "/home/infngrid014" ) ("LOGNAME" "infngrid014" ) )("rslsubstitution" =
> ("GLOBUSRUN_GASS_URL" "https://lcg-ui.cnaf.infn.it:20001" ) )("stderr" =
> "https://lcg-ui.cnaf.infn.it:20001/dev/stderr" )("stdout" =
> "https://lcg-ui.cnaf.infn.it:20001/dev/stdout" )("executable" =
> "/bin/pwd" )("queue" = "cert" )
> >>>>>Job RSL (post-validation)
> 4/30 08:04:15
> <<<<<Job RSL (post-validation-eval)
> &("directory" = "/home/infngrid014" )("stdin" = "/dev/null" )("count" =
> "1" )("job_type" = "multiple" )("gram_my_job" = "collective" )("dry_run"
> = "no" )("proxy_timeout" = "60" )("environment" = ("HOME"
> "/home/infngrid014" ) ("LOGNAME" "infngrid014" ) )("rslsubstitution" =
> ("GLOBUSRUN_GASS_URL" "https://lcg-ui.cnaf.infn.it:20001" ) )("stderr" =
> "https://lcg-ui.cnaf.infn.it:20001/dev/stderr" )("stdout" =
> "https://lcg-ui.cnaf.infn.it:20001/dev/stdout" )("executable" =
> "/bin/pwd" )("queue" = "cert" )
> >>>>>Job RSL (post-validation-eval)
> 4/30 08:04:15 JMI: Getting RSL output value
> 4/30 08:04:15 JMI: Processing output positions
> 4/30 08:04:15 JMI: Getting RSL output value
> 4/30 08:04:15 JMI: Processing output positions
> 4/30 08:04:15 Job Manager State Machine (entering):
> GLOBUS_GRAM_JOB_MANAGER_STATE_REMOTE_IO_FILE_CREATE
> 4/30 08:04:15 JM: Opening output destinations
> 4/30 08:04:15 JM: stdout goes to
> /home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455/stdout
>
> 4/30 08:04:15 JM: stderr goes to
> /home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455/stderr
>
> 4/30 08:04:15 JM: Opening https://lcg-ui.cnaf.infn.it:20001/dev/stdout
> 4/30 08:04:15 JM: Opened GASS handle 1.
> 4/30 08:04:15 JM: exiting
> globus_l_gram_job_manager_output_destination_open()
> 4/30 08:04:15 JM: Opening https://lcg-ui.cnaf.infn.it:20001/dev/stderr
> 4/30 08:04:15 JM: Opened GASS handle 2.
> 4/30 08:04:15 JM: exiting
> globus_l_gram_job_manager_output_destination_open()
> 4/30 08:04:15 stdout or stderr is being used, starting to poll
> 4/30 08:04:15 JM: Finished opening output destinations
> 4/30 08:04:15 Job Manager State Machine (entering):
> GLOBUS_GRAM_JOB_MANAGER_STATE_OPEN_OUTPUT
> 4/30 08:04:15 JM: GSSAPI type is GSI.. relocating proxy
> 4/30 08:04:15 JMI: testing job manager scripts for type lcgpbs exist and
> permissions are ok.
> 4/30 08:04:15 JMI: completed script validation: job manager type is lcgpbs.
> 4/30 08:04:15 JMI: in globus_gram_job_manager_script_proxy_relocate()
> 4/30 08:04:15 JMI: cmd = proxy_relocate
> Wed Apr 30 08:04:15 2008 JM_SCRIPT: New Perl JobManager created.
> Wed Apr 30 08:04:15 2008 JM_SCRIPT: Using jm supplied job dir:
> /home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455
> Wed Apr 30 08:04:15 2008 JM_SCRIPT: proxy_relocate(enter)
> 4/30 08:04:15 JMI: while return_buf = GRAM_SCRIPT_X509_USER_PROXY =
> /home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455/x509_up
>
> 4/30 08:04:15 Job Manager State Machine (entering):
> GLOBUS_GRAM_JOB_MANAGER_STATE_PROXY_RELOCATE
> 4/30 08:04:15 JM: Relocated Proxy to
> /home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455/x509_up
>
> 4/30 08:04:15 JM: before sending to client: rc=0 (Success)
> 4/30 08:04:15 Job Manager State Machine (exiting):
> GLOBUS_GRAM_JOB_MANAGER_STATE_TWO_PHASE
> 4/30 08:04:15 Job Manager State Machine (entering):
> GLOBUS_GRAM_JOB_MANAGER_STATE_TWO_PHASE
> 4/30 08:04:15 Job Manager State Machine (entering):
> GLOBUS_GRAM_JOB_MANAGER_STATE_TWO_PHASE_COMMITTED
> 4/30 08:04:15 Job Manager State Machine (entering):
> GLOBUS_GRAM_JOB_MANAGER_STATE_STAGE_IN
> 4/30 08:04:15 JMI: testing job manager scripts for type lcgpbs exist and
> permissions are ok.
> 4/30 08:04:15 JMI: completed script validation: job manager type is lcgpbs.
> 4/30 08:04:15 JMI: in globus_gram_job_manager_submit()
> 4/30 08:04:15 JMI: local stdout filename =
> /home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455/stdout.
>
> 4/30 08:04:15 JMI: local stderr filename =
> /home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455/stderr.
>
> 4/30 08:04:15 JMI: cmd = submit
> 4/30 08:04:15 JMI: returning with success
> Wed Apr 30 08:04:15 2008 JM_SCRIPT: New Perl JobManager created.
> Wed Apr 30 08:04:15 2008 JM_SCRIPT: Using jm supplied job dir:
> /home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455
> Wed Apr 30 08:04:15 2008 JM_SCRIPT: Entering Job Manager submit-helper
> implementation of rewrite_urls
> Wed Apr 30 08:04:15 2008 JM_SCRIPT: Leaving Job Manager submit-helper
> implementation of rewrite_urls
> Wed Apr 30 08:04:15 2008 JM_SCRIPT: Entering pbs submit
> Wed Apr 30 08:04:15 2008 JM_SCRIPT: Determining job max time cpu from
> job description
> Wed Apr 30 08:04:15 2008 JM_SCRIPT: using queue default
> Wed Apr 30 08:04:15 2008 JM_SCRIPT: Determining job max wall time limit
> from job description
> Wed Apr 30 08:04:15 2008 JM_SCRIPT: using queue default
> Wed Apr 30 08:04:15 2008 JM_SCRIPT: Leaving pbs submit
> 4/30 08:04:15 JMI: while return_buf = GRAM_SCRIPT_JOB_ID =
> 1209535455:lcgpbs:internal_1592770278:24157.1209535455
> 4/30 08:04:15 JMI: while return_buf = GRAM_SCRIPT_JOB_STATE = 1
> 4/30 08:04:15 Job Manager State Machine (entering):
> GLOBUS_GRAM_JOB_MANAGER_STATE_SUBMIT
> 4/30 08:04:15 JM: in globus_gram_job_manager_reporting_file_create()
> 4/30 08:04:15 JM: not reporting job information
> 4/30 08:04:15 JM: in globus_gram_job_manager_history_file_create()
> 4/30 08:04:15 JM: NOT empty client callback list.
> 4/30 08:04:15 JM: sending callback of status 1 (failure code 0) to
> https://lcg-ui.cnaf.infn.it:20002/.
> 4/30 08:04:15 Job Manager State Machine (entering):
> GLOBUS_GRAM_JOB_MANAGER_STATE_POLL2
> 4/30 08:04:15 JMI: testing job manager scripts for type lcgpbs exist and
> permissions are ok.
> 4/30 08:04:15 JMI: completed script validation: job manager type is lcgpbs.
> 4/30 08:04:15 JMI: in globus_gram_job_manager_poll()
> 4/30 08:04:15 JMI: local stdout filename =
> /home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455/stdout.
>
> 4/30 08:04:15 JMI: local stderr filename =
> /home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455/stderr.
>
> 4/30 08:04:15 JMI: poll: seeking:
> https://gridit-ce-001.cnaf.infn.it:20001/24157/1209535455/
> 4/30 08:04:15 JMI: poll_fast: returning -1 = GLOBUS_FAILURE (try Perl
> scripts)
> 4/30 08:04:15 JMI: cmd = poll
> 4/30 08:04:15 JMI: returning with success
> Wed Apr 30 08:04:15 2008 JM_SCRIPT: New Perl JobManager created.
> Wed Apr 30 08:04:15 2008 JM_SCRIPT: Using jm supplied job dir:
> /home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455
> Wed Apr 30 08:04:15 2008 JM_SCRIPT: Will start a batch system poll
> process in the background
> Wed Apr 30 08:04:15 2008 JM_SCRIPT: Cache too old for this job (55260):
> make_a_poll_query() returning 0
> 4/30 08:04:15 JMI: while return_buf = GRAM_SCRIPT_JOB_STATE = 1
> 4/30 08:04:15 Job Manager State Machine (entering):
> GLOBUS_GRAM_JOB_MANAGER_STATE_POLL1
> 4/30 08:04:25 Job Manager State Machine (entering):
> GLOBUS_GRAM_JOB_MANAGER_STATE_POLL2
> 4/30 08:04:25 JMI: testing job manager scripts for type lcgpbs exist and
> permissions are ok.
> 4/30 08:04:25 JMI: completed script validation: job manager type is lcgpbs.
> 4/30 08:04:25 JMI: in globus_gram_job_manager_poll()
> 4/30 08:04:25 JMI: local stdout filename =
> /home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455/stdout.
>
> 4/30 08:04:25 JMI: local stderr filename =
> /home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455/stderr.
>
> 4/30 08:04:25 JMI: poll: seeking:
> https://gridit-ce-001.cnaf.infn.it:20001/24157/1209535455/
> 4/30 08:04:25 JMI: poll_fast: returning -1 = GLOBUS_FAILURE (try Perl
> scripts)
> 4/30 08:04:25 JMI: cmd = poll
> 4/30 08:04:25 JMI: returning with success
> Wed Apr 30 08:04:25 2008 JM_SCRIPT: New Perl JobManager created.
> Wed Apr 30 08:04:25 2008 JM_SCRIPT: Using jm supplied job dir:
> /home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455
> Wed Apr 30 08:04:26 2008 JM_SCRIPT: Cache too old for this job (11):
> make_a_poll_query() returning 0
> 4/30 08:04:26 JMI: while return_buf = GRAM_SCRIPT_JOB_STATE = 1
> 4/30 08:04:26 Job Manager State Machine (entering):
> GLOBUS_GRAM_JOB_MANAGER_STATE_POLL1
> 4/30 08:04:36 Job Manager State Machine (entering):
> GLOBUS_GRAM_JOB_MANAGER_STATE_POLL2
> 4/30 08:04:36 JMI: testing job manager scripts for type lcgpbs exist and
> permissions are ok.
> 4/30 08:04:36 JMI: completed script validation: job manager type is lcgpbs.
> 4/30 08:04:36 JMI: in globus_gram_job_manager_poll()
> 4/30 08:04:36 JMI: local stdout filename =
> /home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455/stdout.
>
> 4/30 08:04:36 JMI: local stderr filename =
> /home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455/stderr.
>
> 4/30 08:04:36 JMI: poll: seeking:
> https://gridit-ce-001.cnaf.infn.it:20001/24157/1209535455/
> 4/30 08:04:36 JMI: poll_fast: returning -1 = GLOBUS_FAILURE (try Perl
> scripts)
> 4/30 08:04:36 JMI: cmd = poll
> 4/30 08:04:36 JMI: returning with success
> Wed Apr 30 08:04:36 2008 JM_SCRIPT: New Perl JobManager created.
> Wed Apr 30 08:04:36 2008 JM_SCRIPT: Using jm supplied job dir:
> /home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455
> Wed Apr 30 08:04:36 2008 JM_SCRIPT: Cache too old for this job (21):
> make_a_poll_query() returning 0
> 4/30 08:04:36 JMI: while return_buf = GRAM_SCRIPT_JOB_STATE = 1
> 4/30 08:04:36 Job Manager State Machine (entering):
> GLOBUS_GRAM_JOB_MANAGER_STATE_POLL1
> 4/30 08:04:46 Job Manager State Machine (entering):
> GLOBUS_GRAM_JOB_MANAGER_STATE_POLL2
> 4/30 08:04:46 JMI: testing job manager scripts for type lcgpbs exist and
> permissions are ok.
> 4/30 08:04:46 JMI: completed script validation: job manager type is lcgpbs.
> 4/30 08:04:46 JMI: in globus_gram_job_manager_poll()
> 4/30 08:04:46 JMI: local stdout filename =
> /home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455/stdout.
>
> 4/30 08:04:46 JMI: local stderr filename =
> /home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455/stderr.
>
> 4/30 08:04:46 JMI: poll: seeking:
> https://gridit-ce-001.cnaf.infn.it:20001/24157/1209535455/
> 4/30 08:04:46 JMI: poll_fast: returning -1 = GLOBUS_FAILURE (try Perl
> scripts)
> 4/30 08:04:46 JMI: cmd = poll
> 4/30 08:04:46 JMI: returning with success
> Wed Apr 30 08:04:46 2008 JM_SCRIPT: New Perl JobManager created.
> Wed Apr 30 08:04:46 2008 JM_SCRIPT: Using jm supplied job dir:
> /home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455
> Wed Apr 30 08:04:46 2008 JM_SCRIPT: Cache too old for this job (31):
> make_a_poll_query() returning 0
> 4/30 08:04:46 JMI: while return_buf = GRAM_SCRIPT_JOB_STATE = 1
> 4/30 08:04:46 Job Manager State Machine (entering):
> GLOBUS_GRAM_JOB_MANAGER_STATE_POLL1
> 4/30 08:04:56 Job Manager State Machine (entering):
> GLOBUS_GRAM_JOB_MANAGER_STATE_POLL2
> 4/30 08:04:56 JMI: testing job manager scripts for type lcgpbs exist and
> permissions are ok.
> 4/30 08:04:56 JMI: completed script validation: job manager type is lcgpbs.
> 4/30 08:04:56 JMI: in globus_gram_job_manager_poll()
> 4/30 08:04:56 JMI: local stdout filename =
> /home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455/stdout.
>
> 4/30 08:04:56 JMI: local stderr filename =
> /home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455/stderr.
>
> 4/30 08:04:56 JMI: poll: seeking:
> https://gridit-ce-001.cnaf.infn.it:20001/24157/1209535455/
> 4/30 08:04:56 JMI: poll_fast: returning -1 = GLOBUS_FAILURE (try Perl
> scripts)
> 4/30 08:04:56 JMI: cmd = poll
> 4/30 08:04:56 JMI: returning with success
> Wed Apr 30 08:04:56 2008 JM_SCRIPT: New Perl JobManager created.
> Wed Apr 30 08:04:56 2008 JM_SCRIPT: Using jm supplied job dir:
> /home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455
> Wed Apr 30 08:04:56 2008 JM_SCRIPT: Cache too old for this job (41):
> make_a_poll_query() returning 0
> 4/30 08:04:56 JMI: while return_buf = GRAM_SCRIPT_JOB_STATE = 4
> 4/30 08:04:56 Job Manager State Machine (entering):
> GLOBUS_GRAM_JOB_MANAGER_STATE_POLL1
> 4/30 08:04:56 JM: in globus_gram_job_manager_history_file_create()
> 4/30 08:04:56 Job Manager State Machine (entering):
> GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED
> 4/30 08:04:56 closing destination
> https://lcg-ui.cnaf.infn.it:20001/dev/stdout
> 4/30 08:04:56 JM: exiting
> globus_l_gram_job_manager_output_destination_close()
> 4/30 08:04:56 closing destination
> https://lcg-ui.cnaf.infn.it:20001/dev/stderr
> 4/30 08:04:56 JM: exiting
> globus_l_gram_job_manager_output_destination_close()
> 4/30 08:04:56 Job Manager State Machine (entering):
> GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_CLOSE_OUTPUT
> 4/30 08:04:56 JM: NOT empty client callback list.
> 4/30 08:04:56 JM: sending callback of status 4 (failure code 0) to
> https://lcg-ui.cnaf.infn.it:20002/.
> 4/30 08:04:56 Job Manager State Machine (entering):
> GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_TWO_PHASE
> 4/30 08:04:56 Job Manager State Machine (entering):
> GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_TWO_PHASE_COMMITTED
> 4/30 08:04:56 Job Manager State Machine (entering):
> GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_FILE_CLEAN_UP
> 4/30 08:04:56 Job Manager State Machine (entering):
> GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_SCRATCH_CLEAN_UP
> 4/30 08:04:56 JMI: testing job manager scripts for type lcgpbs exist and
> permissions are ok.
> 4/30 08:04:56 JMI: completed script validation: job manager type is lcgpbs.
> 4/30 08:04:56 JMI: cmd = cache_cleanup
> Wed Apr 30 08:04:56 2008 JM_SCRIPT: New Perl JobManager created.
> Wed Apr 30 08:04:56 2008 JM_SCRIPT: Using jm supplied job dir:
> /home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455
> Wed Apr 30 08:04:56 2008 JM_SCRIPT: Entering Job Manager submit-helper
> implementation of cache_cleanup
> 4/30 08:04:57 Job Manager State Machine (entering):
> GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_CACHE_CLEAN_UP
> 4/30 08:04:57 JM: in globus_gram_job_manager_reporting_file_remove()
> 4/30 08:04:57 JM: exiting globus_gram_job_manager.
>
>