Indeed, the value of GLOBUS_TCP_PORT_RANGE is not set correctly on the
CE, which is installed with SL3 (the value is comma-separated), while
the WNs are installed with SL4.
I will make the change and keep you posted ^^
On 22 Nov 07, at 14:59, Maarten Litmaath wrote:
> Jean Salzemann wrote:
>
>> Dear all,
>> We've set up a site in Vietnam, and I've run into some behavior
>> I've never seen before when submitting jobs. The jobs are
>> failing on the CE with a dreadful "Got a job held event, reason:
>> Unspecified gridmanager error", but I can't figure out why. qsub
>> submissions work, globus-job-run (/bin/hostname) seems to work with
>> fork (I'm not sure about lcgpbs, because the call returns to the
>> prompt without any output), and the PBS ACLs seem correct. However,
>> in /var/log/messages I get this whenever the user is mapped to a
>> local account and the job is supposed to be sent to PBS:
>> *Nov 22 19:43:03 ce gridinfo: [10770-10924] Job
>> 1195735288:lcgpbs:internal_2961450261:10714.1195735287 FAILED
>> during submission to batch system lcgpbs*
>> But I have absolutely no idea what could cause this.
>> Any ideas? :)
>
> On the CE you can rename /usr/bin/qsub to /usr/bin/qsub.real
> and put the following script in place of /usr/bin/qsub:
>
> ------------------------------------------------------------
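> #!/bin/sh
> # Wrapper that logs the stderr of the real qsub to a timestamped file.
>
> err=/tmp/qsub-`date +%Y%m%d_%H%M%S`-$$.err
>
> # fd 9: the wrapper's original stdout
> exec 9>&1
>
> # The backquotes capture only the exit status (written to fd 8);
> # stdout goes to fd 9, and stderr goes through tee to $err and stderr.
> status=`
> exec 8>&1
> (qsub.real "$@"; echo $? >&8) 2>&1 >&9 | tee $err >&2
> `
>
> exit $status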
> ------------------------------------------------------------
>
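
For anyone puzzled by the file descriptor juggling in the wrapper, here is
a minimal standalone sketch of the same idea that can be tried without
touching qsub. The names fake_cmd and run_logged are made up for this
illustration, and fake_cmd is only a stand-in for qsub.real. Unlike the
wrapper, the sketch records the status with "|| rc=$?", which is just one
way to keep it working if the shell runs with "set -e".

```shell
#!/bin/sh
# Hypothetical stand-in for qsub.real: writes to stdout and stderr, exits 3.
fake_cmd() {
    echo "out"
    echo "err" >&2
    return 3
}

# Hypothetical helper: run a command, copy its stderr to a log file,
# leave stdout and stderr otherwise untouched, and return its exit status.
run_logged() {
    log=$1
    shift
    exec 9>&1                 # fd 9: the caller's stdout
    status=`
        exec 8>&1             # fd 8: the pipe read by the backquotes
        (rc=0; "$@" || rc=$?; echo "$rc" >&8) 2>&1 >&9 | tee "$log" >&2
    `
    exec 9>&-                 # close the saved descriptor again
    return "$status"
}
```

Calling run_logged with a log file name followed by fake_cmd should print
"out" on stdout and "err" on stderr, leave "err" in the log file, and
return 3.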
> Also check the values of GLOBUS_TCP_PORT_RANGE on CE and WN.
> The syntax depends on the gLite version:
>
> gLite 3.0 (SL3): GLOBUS_TCP_PORT_RANGE='20000 25000'
> gLite 3.1 (SL4): GLOBUS_TCP_PORT_RANGE='20000,25000'
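
A quick way to see which of the two syntaxes a node is using might look
like the sketch below. It only inspects GLOBUS_TCP_PORT_RANGE in the
current environment; where the variable is actually set on a given CE or
WN is not covered here. The function name port_range_syntax is made up
for this example.

```shell
#!/bin/sh
# Hypothetical helper: classify the syntax of a GLOBUS_TCP_PORT_RANGE value.
# Prints "space" (the gLite 3.0 / SL3 form), "comma" (the gLite 3.1 / SL4
# form), or "invalid" for anything else, including an empty value.
port_range_syntax() {
    if printf '%s\n' "$1" | grep -Eq '^[0-9]+ [0-9]+$'; then
        echo space
    elif printf '%s\n' "$1" | grep -Eq '^[0-9]+,[0-9]+$'; then
        echo comma
    else
        echo invalid
    fi
}

# Report the value seen in the current environment.
echo "GLOBUS_TCP_PORT_RANGE='${GLOBUS_TCP_PORT_RANGE-}'" \
     "syntax: $(port_range_syntax "${GLOBUS_TCP_PORT_RANGE-}")"
```

On an SL3 node the expected answer would be "space", on an SL4 node
"comma"; anything else suggests the value needs fixing.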