Maarten Litmaath wrote:
> Bonjour Jean-Michel,
>
>> I noticed recently that the job's working directory is not always a
>> new directory named after the job id. I am not sure if it is related
>
> Do you have examples?
I am using a script, $GLITE_LOCAL_CUSTOMIZATION_DIR/cp_1.sh, that
moves each job to a local directory on the WN named job-XXXXXXX
(created with mktemp). In these /dlocal/job-XXXXXXX directories I see
a mix of jobs: some use a subdirectory named 'https...' derived from
the job ID, while others work directly in /dlocal/job-XXXXXXX itself:
ls -als /dlocal/job-*
/dlocal/job-gKlnk16866:
total 12
4 drwx------ 3 sgmali020 alicesgm 4096 Apr 20 10:57 .
4 drwxrwxrwt 10 root root 4096 Apr 20 10:57 ..
4 drwx------ 4 sgmali020 alicesgm 4096 Apr 20 10:57 https_3a_2f_2fgrid02.lal.in2p3.fr_3a9000_2fetlK_5fBRPh5X5egTpONtqVg
/dlocal/job-mkXrc16987:
total 52
4 drwx------ 4 sgmali020 alicesgm 4096 Apr 20 10:57 .
4 drwxrwxrwt 10 root root 4096 Apr 20 10:57 ..
4 drwxr-xr-x 2 sgmali020 alicesgm 4096 Apr 20 10:57 .alien
4 drwxr-x--- 2 sgmali020 alicesgm 4096 Apr 20 11:13 alien-job-26025171
4 -rw-r--r-- 1 sgmali020 alicesgm 1001 Apr 20 10:57 .BrokerInfo
4 -rw-r--r-- 1 sgmali020 alicesgm 834 Apr 20 10:57 dg-submit.7193.sh
4 -rw------- 1 sgmali020 alicesgm 115 Apr 20 10:57 https_3a_2f_2fgrid02.lal.in2p3.fr_3a9000_2fRhY-QMyTeOlCqm5P4UDiqw.output
4 -rw-r----- 1 sgmali020 alicesgm 21 Apr 20 10:57 .root_hist
12 -rw-r--r-- 1 sgmali020 alicesgm 11286 Apr 20 11:12 std.err
8 -rw-r--r-- 1 sgmali020 alicesgm 6065 Apr 20 10:57 std.out
0 -rw------- 1 sgmali020 alicesgm 0 Apr 20 10:57 tmp.imlUm17103
/dlocal/job-sDeMvt4461:
total 8
4 drwx------ 2 sgmali016 alicesgm 4096 Dec 17 16:17 .
4 drwxrwxrwt 10 root root 4096 Apr 20 10:57 ..
/dlocal/job-sUxTqQ4479:
total 8
4 drwx------ 2 sgmali016 alicesgm 4096 Dec 17 16:05 .
4 drwxrwxrwt 10 root root 4096 Apr 20 10:57 ..
/dlocal/job-TFtDbN4528:
total 8
4 drwx------ 2 sgmali016 alicesgm 4096 Dec 17 16:17 .
4 drwxrwxrwt 10 root root 4096 Apr 20 10:57 ..
/dlocal/job-wLSRy16215:
total 60
4 drwx------ 4 sgmali020 alicesgm 4096 Apr 20 10:56 .
4 drwxrwxrwt 10 root root 4096 Apr 20 10:57 ..
4 drwxr-xr-x 2 sgmali020 alicesgm 4096 Apr 20 10:55 .alien
4 drwxr-x--- 2 sgmali020 alicesgm 4096 Apr 20 11:12 alien-job-26025092
4 -rw-r--r-- 1 sgmali020 alicesgm 1001 Apr 20 10:55 .BrokerInfo
4 -rw-r--r-- 1 sgmali020 alicesgm 834 Apr 20 10:55 dg-submit.7193.sh
4 -rw------- 1 sgmali020 alicesgm 115 Apr 20 10:55 https_3a_2f_2fgrid02.lal.in2p3.fr_3a9000_2fA9h1BAG8bLb22FYffnuamQ.output
4 -rw-r----- 1 sgmali020 alicesgm 21 Apr 20 10:56 .root_hist
20 -rw-r--r-- 1 sgmali020 alicesgm 18505 Apr 20 11:11 std.err
8 -rw-r--r-- 1 sgmali020 alicesgm 6110 Apr 20 10:56 std.out
0 -rw------- 1 sgmali020 alicesgm 0 Apr 20 10:55 tmp.jhuwl16280
/dlocal/job-yZoShU4459:
total 8
4 drwx------ 2 sgmali016 alicesgm 4096 Dec 17 16:17 .
4 drwxrwxrwt 10 root root 4096 Apr 20 10:57 ..
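For reference, a minimal sketch of what such a cp_1.sh hook can look
like (a sketch, not the actual script). It assumes that the job wrapper
sources the file, so a `cd` done in it stays in effect for the rest of
the job, and that /dlocal is the local disk on the WN; the function name
enter_job_scratch is made up for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of $GLITE_LOCAL_CUSTOMIZATION_DIR/cp_1.sh.
# Assumption: the job wrapper sources this file, so the cd below
# persists for the rest of the job.

# Create a private job-XXXXXXXXXX directory under $1 and cd into it.
# mktemp -d replaces the X's with random characters and creates the
# directory with mode 0700 (cf. the drwx------ entries above).
# Note: job_scratch_dir leaks into the sourcing shell (no `local` in POSIX sh).
enter_job_scratch() {
    job_scratch_dir=$(mktemp -d "$1/job-XXXXXXXXXX") || return 1
    cd "$job_scratch_dir" || return 1
}

# Only redirect when the local disk is present; otherwise keep the
# directory the wrapper chose.
if [ -d /dlocal ]; then
    enter_job_scratch /dlocal || echo "cp_1.sh: staying in $(pwd)" >&2
fi
```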
Some job-XXXX directories (the ones dated Dec 17) are empty because
they are not removed when the job ends.
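Leftovers like these could be cleaned up periodically on each WN (for
example from cron). A minimal sketch, assuming GNU or BSD find
(-mindepth, -maxdepth and -empty are extensions, not POSIX); the name
remove_stale_job_dirs and the 2-day threshold are arbitrary choices for
illustration. Since it calls rmdir, a directory that still contains
files is never removed.

```shell
#!/bin/sh
# Hypothetical cleanup helper, not part of gLite.
# Removes job-* directories directly under $1 that are empty and whose
# modification time is more than $2 whole days ago (find -mtime +N).
remove_stale_job_dirs() {
    find "$1" -mindepth 1 -maxdepth 1 -type d -name 'job-*' -empty \
        -mtime +"$2" -exec rmdir {} \;
}

if [ -d /dlocal ]; then
    remove_stale_job_dirs /dlocal 2
fi
```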
> That is normal. The EDG_WL_JOBID is only set for jobs sent by RB or WMS
> nodes and directed to the batch system. The RB/WMS also sends "grid_monitor"
> jobs running on the lcg-CE itself, and requests to clean up jobs that have
> finished.
All jobs are supposed to come through a WMS.
Do you mean that the grid_monitor jobs have an empty EDG_WL_JOBID
variable?
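To compare the 'https...' directory names in the listing with the job
ID, the name can be rebuilt from EDG_WL_JOBID. The rule below is
inferred only from the names above (':' -> _3a, '/' -> _2f, '_' -> _5f,
while '.' and '-' are kept), not taken from the gLite sources; the
function name jobid_to_dirname is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: map an EDG_WL_JOBID to the directory name seen in
# the listing. Assumed rule: every character outside [a-zA-Z0-9.-]
# becomes '_' followed by its two-digit lowercase hex code.
# An empty job ID yields an empty name (no per-job directory).
jobid_to_dirname() {
    jid_rest=$1
    jid_out=
    while [ -n "$jid_rest" ]; do
        jid_c=${jid_rest%"${jid_rest#?}"}   # first character
        jid_rest=${jid_rest#?}              # drop it
        case $jid_c in
            [a-zA-Z0-9.-]) jid_out=$jid_out$jid_c ;;
            *) jid_out=$jid_out$(printf '_%02x' "'$jid_c") ;;
        esac
    done
    printf '%s\n' "$jid_out"
}
```

For example, https://grid02.lal.in2p3.fr:9000/etlK_BRPh5X5egTpONtqVg maps
to https_3a_2f_2fgrid02.lal.in2p3.fr_3a9000_2fetlK_5fBRPh5X5egTpONtqVg,
the first directory in the listing.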
JM
PS: I have another problem, described below, and I am not sure
whether the two are linked:
Some jobs have misbehaved and tried to remove files recursively,
starting from /. I have evidence of this in undelivered PBS job output.
--
------------------------------------------------------------------------
Jean-michel BARBET | Tel: +33 (0)2 51 85 84 86
Laboratoire SUBATECH Nantes France | Fax: +33 (0)2 51 85 84 79
CNRS-IN2P3/Ecole des Mines/Universite
------------------------------------------------------------------------