Hi all,
since our CE was migrated to SL4 and gLite 3.1 (the WNs were already
migrated to gLite 3.1 some months ago), we have had a job submission
problem: it seems that submitted jobs never arrive on the CE (the SAM
tests are failing with "proxy expired"). The command
$ globus-job-run gridit-ce-001.cnaf.infn.it/jobmanager-lcgpbs -queue cert /bin/pwd
returns nothing, but ssh from the WNs to the CE and qsub are both working.
Do you have any hints? I'm also attaching the file gram_job_mgr_24157,
related to my last "globus-job-run" attempt.
Best Regards,
Alessandro
4/30 08:04:15 JM: Security context imported
4/30 08:04:15 JM: Adding new callback contact
(url=https://lcg-ui.cnaf.infn.it:20002/, mask=1048575)
4/30 08:04:15 JM: Added successfully
4/30 08:04:15 Pre-parsed RSL string: &("rsl_substitution" =
("GLOBUSRUN_GASS_URL" "https://lcg-ui.cnaf.infn.it:20001" ) )("stderr" =
$("GLOBUSRUN_GASS_URL") # "/dev/stderr" )("stdout" =
$("GLOBUSRUN_GASS_URL") # "/dev/stdout" )("executable" = "/bin/pwd"
)("queue" = "cert" )
4/30 08:04:15
<<<<<Job Request RSL
&("rsl_substitution" = ("GLOBUSRUN_GASS_URL"
"https://lcg-ui.cnaf.infn.it:20001" ) )("stderr" =
$("GLOBUSRUN_GASS_URL") # "/dev/stderr" )("stdout" =
$("GLOBUSRUN_GASS_URL") # "/dev/stdout" )("executable" = "/bin/pwd"
)("queue" = "cert" )
>>>>>Job Request RSL
4/30 08:04:15
<<<<<Job Request RSL (canonical)
&("rslsubstitution" = ("GLOBUSRUN_GASS_URL"
"https://lcg-ui.cnaf.infn.it:20001" ) )("stderr" =
$("GLOBUSRUN_GASS_URL") # "/dev/stderr" )("stdout" =
$("GLOBUSRUN_GASS_URL") # "/dev/stdout" )("executable" = "/bin/pwd"
)("queue" = "cert" )
>>>>>Job Request RSL (canonical)
4/30 08:04:15 JM: Evaluating RSL Value
4/30 08:04:15 JM: Evaluated RSL Value to GLOBUSRUN_GASS_URL
4/30 08:04:15 JM: Evaluating RSL Value
4/30 08:04:15 JM: Evaluated RSL Value to https://lcg-ui.cnaf.infn.it:20001
4/30 08:04:15 Job Manager State Machine (entering):
GLOBUS_GRAM_JOB_MANAGER_STATE_MAKE_SCRATCHDIR
4/30 08:04:15
<<<<<Job RSL
&("environment" = ("HOME" "/home/infngrid014" ) ("LOGNAME" "infngrid014"
) )("rslsubstitution" = ("GLOBUSRUN_GASS_URL"
"https://lcg-ui.cnaf.infn.it:20001" ) )("stderr" =
$("GLOBUSRUN_GASS_URL") # "/dev/stderr" )("stdout" =
$("GLOBUSRUN_GASS_URL") # "/dev/stdout" )("executable" = "/bin/pwd"
)("queue" = "cert" )
>>>>>Job RSL
4/30 08:04:15
<<<<<Job RSL (post-eval)
&("environment" = ("HOME" "/home/infngrid014" ) ("LOGNAME" "infngrid014"
) )("rslsubstitution" = ("GLOBUSRUN_GASS_URL"
"https://lcg-ui.cnaf.infn.it:20001" ) )("stderr" =
"https://lcg-ui.cnaf.infn.it:20001/dev/stderr" )("stdout" =
"https://lcg-ui.cnaf.infn.it:20001/dev/stdout" )("executable" =
"/bin/pwd" )("queue" = "cert" )
>>>>>Job RSL (post-eval)
Adding default RSL of proxy_timeout = 60
Adding default RSL of dry_run = no
Adding default RSL of gram_my_job = collective
Adding default RSL of job_type = multiple
Adding default RSL of count = 1
Adding default RSL of stdin = /dev/null
Adding default RSL of directory = $(HOME)
4/30 08:04:15
<<<<<Job RSL (post-validation)
&("directory" = $("HOME") )("stdin" = "/dev/null" )("count" = "1"
)("job_type" = "multiple" )("gram_my_job" = "collective" )("dry_run" =
"no" )("proxy_timeout" = "60" )("environment" = ("HOME"
"/home/infngrid014" ) ("LOGNAME" "infngrid014" ) )("rslsubstitution" =
("GLOBUSRUN_GASS_URL" "https://lcg-ui.cnaf.infn.it:20001" ) )("stderr" =
"https://lcg-ui.cnaf.infn.it:20001/dev/stderr" )("stdout" =
"https://lcg-ui.cnaf.infn.it:20001/dev/stdout" )("executable" =
"/bin/pwd" )("queue" = "cert" )
>>>>>Job RSL (post-validation)
4/30 08:04:15
<<<<<Job RSL (post-validation-eval)
&("directory" = "/home/infngrid014" )("stdin" = "/dev/null" )("count" =
"1" )("job_type" = "multiple" )("gram_my_job" = "collective" )("dry_run"
= "no" )("proxy_timeout" = "60" )("environment" = ("HOME"
"/home/infngrid014" ) ("LOGNAME" "infngrid014" ) )("rslsubstitution" =
("GLOBUSRUN_GASS_URL" "https://lcg-ui.cnaf.infn.it:20001" ) )("stderr" =
"https://lcg-ui.cnaf.infn.it:20001/dev/stderr" )("stdout" =
"https://lcg-ui.cnaf.infn.it:20001/dev/stdout" )("executable" =
"/bin/pwd" )("queue" = "cert" )
>>>>>Job RSL (post-validation-eval)
4/30 08:04:15 JMI: Getting RSL output value
4/30 08:04:15 JMI: Processing output positions
4/30 08:04:15 JMI: Getting RSL output value
4/30 08:04:15 JMI: Processing output positions
4/30 08:04:15 Job Manager State Machine (entering):
GLOBUS_GRAM_JOB_MANAGER_STATE_REMOTE_IO_FILE_CREATE
4/30 08:04:15 JM: Opening output destinations
4/30 08:04:15 JM: stdout goes to
/home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455/stdout
4/30 08:04:15 JM: stderr goes to
/home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455/stderr
4/30 08:04:15 JM: Opening https://lcg-ui.cnaf.infn.it:20001/dev/stdout
4/30 08:04:15 JM: Opened GASS handle 1.
4/30 08:04:15 JM: exiting
globus_l_gram_job_manager_output_destination_open()
4/30 08:04:15 JM: Opening https://lcg-ui.cnaf.infn.it:20001/dev/stderr
4/30 08:04:15 JM: Opened GASS handle 2.
4/30 08:04:15 JM: exiting
globus_l_gram_job_manager_output_destination_open()
4/30 08:04:15 stdout or stderr is being used, starting to poll
4/30 08:04:15 JM: Finished opening output destinations
4/30 08:04:15 Job Manager State Machine (entering):
GLOBUS_GRAM_JOB_MANAGER_STATE_OPEN_OUTPUT
4/30 08:04:15 JM: GSSAPI type is GSI.. relocating proxy
4/30 08:04:15 JMI: testing job manager scripts for type lcgpbs exist and
permissions are ok.
4/30 08:04:15 JMI: completed script validation: job manager type is lcgpbs.
4/30 08:04:15 JMI: in globus_gram_job_manager_script_proxy_relocate()
4/30 08:04:15 JMI: cmd = proxy_relocate
Wed Apr 30 08:04:15 2008 JM_SCRIPT: New Perl JobManager created.
Wed Apr 30 08:04:15 2008 JM_SCRIPT: Using jm supplied job dir:
/home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455
Wed Apr 30 08:04:15 2008 JM_SCRIPT: proxy_relocate(enter)
4/30 08:04:15 JMI: while return_buf = GRAM_SCRIPT_X509_USER_PROXY =
/home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455/x509_up
4/30 08:04:15 Job Manager State Machine (entering):
GLOBUS_GRAM_JOB_MANAGER_STATE_PROXY_RELOCATE
4/30 08:04:15 JM: Relocated Proxy to
/home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455/x509_up
4/30 08:04:15 JM: before sending to client: rc=0 (Success)
4/30 08:04:15 Job Manager State Machine (exiting):
GLOBUS_GRAM_JOB_MANAGER_STATE_TWO_PHASE
4/30 08:04:15 Job Manager State Machine (entering):
GLOBUS_GRAM_JOB_MANAGER_STATE_TWO_PHASE
4/30 08:04:15 Job Manager State Machine (entering):
GLOBUS_GRAM_JOB_MANAGER_STATE_TWO_PHASE_COMMITTED
4/30 08:04:15 Job Manager State Machine (entering):
GLOBUS_GRAM_JOB_MANAGER_STATE_STAGE_IN
4/30 08:04:15 JMI: testing job manager scripts for type lcgpbs exist and
permissions are ok.
4/30 08:04:15 JMI: completed script validation: job manager type is lcgpbs.
4/30 08:04:15 JMI: in globus_gram_job_manager_submit()
4/30 08:04:15 JMI: local stdout filename =
/home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455/stdout.
4/30 08:04:15 JMI: local stderr filename =
/home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455/stderr.
4/30 08:04:15 JMI: cmd = submit
4/30 08:04:15 JMI: returning with success
Wed Apr 30 08:04:15 2008 JM_SCRIPT: New Perl JobManager created.
Wed Apr 30 08:04:15 2008 JM_SCRIPT: Using jm supplied job dir:
/home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455
Wed Apr 30 08:04:15 2008 JM_SCRIPT: Entering Job Manager submit-helper
implementation of rewrite_urls
Wed Apr 30 08:04:15 2008 JM_SCRIPT: Leaving Job Manager submit-helper
implementation of rewrite_urls
Wed Apr 30 08:04:15 2008 JM_SCRIPT: Entering pbs submit
Wed Apr 30 08:04:15 2008 JM_SCRIPT: Determining job max time cpu from
job description
Wed Apr 30 08:04:15 2008 JM_SCRIPT: using queue default
Wed Apr 30 08:04:15 2008 JM_SCRIPT: Determining job max wall time limit
from job description
Wed Apr 30 08:04:15 2008 JM_SCRIPT: using queue default
Wed Apr 30 08:04:15 2008 JM_SCRIPT: Leaving pbs submit
4/30 08:04:15 JMI: while return_buf = GRAM_SCRIPT_JOB_ID =
1209535455:lcgpbs:internal_1592770278:24157.1209535455
4/30 08:04:15 JMI: while return_buf = GRAM_SCRIPT_JOB_STATE = 1
4/30 08:04:15 Job Manager State Machine (entering):
GLOBUS_GRAM_JOB_MANAGER_STATE_SUBMIT
4/30 08:04:15 JM: in globus_gram_job_manager_reporting_file_create()
4/30 08:04:15 JM: not reporting job information
4/30 08:04:15 JM: in globus_gram_job_manager_history_file_create()
4/30 08:04:15 JM: NOT empty client callback list.
4/30 08:04:15 JM: sending callback of status 1 (failure code 0) to
https://lcg-ui.cnaf.infn.it:20002/.
4/30 08:04:15 Job Manager State Machine (entering):
GLOBUS_GRAM_JOB_MANAGER_STATE_POLL2
4/30 08:04:15 JMI: testing job manager scripts for type lcgpbs exist and
permissions are ok.
4/30 08:04:15 JMI: completed script validation: job manager type is lcgpbs.
4/30 08:04:15 JMI: in globus_gram_job_manager_poll()
4/30 08:04:15 JMI: local stdout filename =
/home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455/stdout.
4/30 08:04:15 JMI: local stderr filename =
/home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455/stderr.
4/30 08:04:15 JMI: poll: seeking:
https://gridit-ce-001.cnaf.infn.it:20001/24157/1209535455/
4/30 08:04:15 JMI: poll_fast: returning -1 = GLOBUS_FAILURE (try Perl
scripts)
4/30 08:04:15 JMI: cmd = poll
4/30 08:04:15 JMI: returning with success
Wed Apr 30 08:04:15 2008 JM_SCRIPT: New Perl JobManager created.
Wed Apr 30 08:04:15 2008 JM_SCRIPT: Using jm supplied job dir:
/home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455
Wed Apr 30 08:04:15 2008 JM_SCRIPT: Will start a batch system poll
process in the background
Wed Apr 30 08:04:15 2008 JM_SCRIPT: Cache too old for this job (55260):
make_a_poll_query() returning 0
4/30 08:04:15 JMI: while return_buf = GRAM_SCRIPT_JOB_STATE = 1
4/30 08:04:15 Job Manager State Machine (entering):
GLOBUS_GRAM_JOB_MANAGER_STATE_POLL1
4/30 08:04:25 Job Manager State Machine (entering):
GLOBUS_GRAM_JOB_MANAGER_STATE_POLL2
4/30 08:04:25 JMI: testing job manager scripts for type lcgpbs exist and
permissions are ok.
4/30 08:04:25 JMI: completed script validation: job manager type is lcgpbs.
4/30 08:04:25 JMI: in globus_gram_job_manager_poll()
4/30 08:04:25 JMI: local stdout filename =
/home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455/stdout.
4/30 08:04:25 JMI: local stderr filename =
/home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455/stderr.
4/30 08:04:25 JMI: poll: seeking:
https://gridit-ce-001.cnaf.infn.it:20001/24157/1209535455/
4/30 08:04:25 JMI: poll_fast: returning -1 = GLOBUS_FAILURE (try Perl
scripts)
4/30 08:04:25 JMI: cmd = poll
4/30 08:04:25 JMI: returning with success
Wed Apr 30 08:04:25 2008 JM_SCRIPT: New Perl JobManager created.
Wed Apr 30 08:04:25 2008 JM_SCRIPT: Using jm supplied job dir:
/home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455
Wed Apr 30 08:04:26 2008 JM_SCRIPT: Cache too old for this job (11):
make_a_poll_query() returning 0
4/30 08:04:26 JMI: while return_buf = GRAM_SCRIPT_JOB_STATE = 1
4/30 08:04:26 Job Manager State Machine (entering):
GLOBUS_GRAM_JOB_MANAGER_STATE_POLL1
4/30 08:04:36 Job Manager State Machine (entering):
GLOBUS_GRAM_JOB_MANAGER_STATE_POLL2
4/30 08:04:36 JMI: testing job manager scripts for type lcgpbs exist and
permissions are ok.
4/30 08:04:36 JMI: completed script validation: job manager type is lcgpbs.
4/30 08:04:36 JMI: in globus_gram_job_manager_poll()
4/30 08:04:36 JMI: local stdout filename =
/home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455/stdout.
4/30 08:04:36 JMI: local stderr filename =
/home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455/stderr.
4/30 08:04:36 JMI: poll: seeking:
https://gridit-ce-001.cnaf.infn.it:20001/24157/1209535455/
4/30 08:04:36 JMI: poll_fast: returning -1 = GLOBUS_FAILURE (try Perl
scripts)
4/30 08:04:36 JMI: cmd = poll
4/30 08:04:36 JMI: returning with success
Wed Apr 30 08:04:36 2008 JM_SCRIPT: New Perl JobManager created.
Wed Apr 30 08:04:36 2008 JM_SCRIPT: Using jm supplied job dir:
/home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455
Wed Apr 30 08:04:36 2008 JM_SCRIPT: Cache too old for this job (21):
make_a_poll_query() returning 0
4/30 08:04:36 JMI: while return_buf = GRAM_SCRIPT_JOB_STATE = 1
4/30 08:04:36 Job Manager State Machine (entering):
GLOBUS_GRAM_JOB_MANAGER_STATE_POLL1
4/30 08:04:46 Job Manager State Machine (entering):
GLOBUS_GRAM_JOB_MANAGER_STATE_POLL2
4/30 08:04:46 JMI: testing job manager scripts for type lcgpbs exist and
permissions are ok.
4/30 08:04:46 JMI: completed script validation: job manager type is lcgpbs.
4/30 08:04:46 JMI: in globus_gram_job_manager_poll()
4/30 08:04:46 JMI: local stdout filename =
/home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455/stdout.
4/30 08:04:46 JMI: local stderr filename =
/home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455/stderr.
4/30 08:04:46 JMI: poll: seeking:
https://gridit-ce-001.cnaf.infn.it:20001/24157/1209535455/
4/30 08:04:46 JMI: poll_fast: returning -1 = GLOBUS_FAILURE (try Perl
scripts)
4/30 08:04:46 JMI: cmd = poll
4/30 08:04:46 JMI: returning with success
Wed Apr 30 08:04:46 2008 JM_SCRIPT: New Perl JobManager created.
Wed Apr 30 08:04:46 2008 JM_SCRIPT: Using jm supplied job dir:
/home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455
Wed Apr 30 08:04:46 2008 JM_SCRIPT: Cache too old for this job (31):
make_a_poll_query() returning 0
4/30 08:04:46 JMI: while return_buf = GRAM_SCRIPT_JOB_STATE = 1
4/30 08:04:46 Job Manager State Machine (entering):
GLOBUS_GRAM_JOB_MANAGER_STATE_POLL1
4/30 08:04:56 Job Manager State Machine (entering):
GLOBUS_GRAM_JOB_MANAGER_STATE_POLL2
4/30 08:04:56 JMI: testing job manager scripts for type lcgpbs exist and
permissions are ok.
4/30 08:04:56 JMI: completed script validation: job manager type is lcgpbs.
4/30 08:04:56 JMI: in globus_gram_job_manager_poll()
4/30 08:04:56 JMI: local stdout filename =
/home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455/stdout.
4/30 08:04:56 JMI: local stderr filename =
/home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455/stderr.
4/30 08:04:56 JMI: poll: seeking:
https://gridit-ce-001.cnaf.infn.it:20001/24157/1209535455/
4/30 08:04:56 JMI: poll_fast: returning -1 = GLOBUS_FAILURE (try Perl
scripts)
4/30 08:04:56 JMI: cmd = poll
4/30 08:04:56 JMI: returning with success
Wed Apr 30 08:04:56 2008 JM_SCRIPT: New Perl JobManager created.
Wed Apr 30 08:04:56 2008 JM_SCRIPT: Using jm supplied job dir:
/home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455
Wed Apr 30 08:04:56 2008 JM_SCRIPT: Cache too old for this job (41):
make_a_poll_query() returning 0
4/30 08:04:56 JMI: while return_buf = GRAM_SCRIPT_JOB_STATE = 4
4/30 08:04:56 Job Manager State Machine (entering):
GLOBUS_GRAM_JOB_MANAGER_STATE_POLL1
4/30 08:04:56 JM: in globus_gram_job_manager_history_file_create()
4/30 08:04:56 Job Manager State Machine (entering):
GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED
4/30 08:04:56 closing destination
https://lcg-ui.cnaf.infn.it:20001/dev/stdout
4/30 08:04:56 JM: exiting
globus_l_gram_job_manager_output_destination_close()
4/30 08:04:56 closing destination
https://lcg-ui.cnaf.infn.it:20001/dev/stderr
4/30 08:04:56 JM: exiting
globus_l_gram_job_manager_output_destination_close()
4/30 08:04:56 Job Manager State Machine (entering):
GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_CLOSE_OUTPUT
4/30 08:04:56 JM: NOT empty client callback list.
4/30 08:04:56 JM: sending callback of status 4 (failure code 0) to
https://lcg-ui.cnaf.infn.it:20002/.
4/30 08:04:56 Job Manager State Machine (entering):
GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_TWO_PHASE
4/30 08:04:56 Job Manager State Machine (entering):
GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_TWO_PHASE_COMMITTED
4/30 08:04:56 Job Manager State Machine (entering):
GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_FILE_CLEAN_UP
4/30 08:04:56 Job Manager State Machine (entering):
GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_SCRATCH_CLEAN_UP
4/30 08:04:56 JMI: testing job manager scripts for type lcgpbs exist and
permissions are ok.
4/30 08:04:56 JMI: completed script validation: job manager type is lcgpbs.
4/30 08:04:56 JMI: cmd = cache_cleanup
Wed Apr 30 08:04:56 2008 JM_SCRIPT: New Perl JobManager created.
Wed Apr 30 08:04:56 2008 JM_SCRIPT: Using jm supplied job dir:
/home/infngrid014/.globus/job/gridit-ce-001.cnaf.infn.it/24157.1209535455
Wed Apr 30 08:04:56 2008 JM_SCRIPT: Entering Job Manager submit-helper
implementation of cache_cleanup
4/30 08:04:57 Job Manager State Machine (entering):
GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_CACHE_CLEAN_UP
4/30 08:04:57 JM: in globus_gram_job_manager_reporting_file_remove()
4/30 08:04:57 JM: exiting globus_gram_job_manager.
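
To make the log above easier to read, here is a minimal sketch that pulls out the GRAM_SCRIPT_JOB_STATE values and maps them to the GRAM protocol state names (1 = PENDING, 4 = FAILED, and so on). The function name job_state_timeline and the sample text are only illustrative assumptions; the sample lines are copied from the log, and the regular expression assumes the "M/D HH:MM:SS" timestamp format seen there.

```python
import re

# GRAM protocol job state codes and their names.
GRAM_JOB_STATES = {
    1: "PENDING",
    2: "ACTIVE",
    4: "FAILED",
    8: "DONE",
    16: "SUSPENDED",
    32: "UNSUBMITTED",
    64: "STAGE_IN",
    128: "STAGE_OUT",
}

# Matches e.g. "4/30 08:04:56 JMI: while return_buf = GRAM_SCRIPT_JOB_STATE = 4"
STATE_RE = re.compile(
    r"^(\d+/\d+ \d\d:\d\d:\d\d) .*GRAM_SCRIPT_JOB_STATE = (\d+)"
)


def job_state_timeline(log_text):
    """Return (timestamp, state_name) pairs, collapsing consecutive repeats.

    Hypothetical helper, not part of Globus or gLite.
    """
    timeline = []
    for line in log_text.splitlines():
        match = STATE_RE.match(line)
        if not match:
            continue
        code = int(match.group(2))
        name = GRAM_JOB_STATES.get(code, "UNKNOWN(%d)" % code)
        # Only record a new entry when the state actually changes.
        if not timeline or timeline[-1][1] != name:
            timeline.append((match.group(1), name))
    return timeline


# A few lines taken from the attached log, used here as sample input.
sample = """4/30 08:04:15 JMI: while return_buf = GRAM_SCRIPT_JOB_STATE = 1
4/30 08:04:15 JMI: cmd = poll
4/30 08:04:26 JMI: while return_buf = GRAM_SCRIPT_JOB_STATE = 1
4/30 08:04:56 JMI: while return_buf = GRAM_SCRIPT_JOB_STATE = 4
"""

for timestamp, state in job_state_timeline(sample):
    print(timestamp, state)
```

On the lines above this reports PENDING at 08:04:15 and FAILED at 08:04:56, which matches the final callback in the log ("status 4 (failure code 0)"): the job manager accepted the job and then marked it failed after about 40 seconds of polling.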
--
Dr. Alessandro Paolini
INFN - CNAF
Viale Berti Pichat 6/2
40127 Bologna
Italy
tel: +39 051 6092723
fax: +39 051 6092746
ICQ: 192172027
skype: alex.paolini
**********************
"I believe in the power of laughter and tears"
"as an antidote to hatred and terror"
"a day without a smile"
"is a day lost" >>> Charlie Chaplin