Jean-Michel Barbet wrote:
> I am using a script $GLITE_LOCAL_CUSTOMIZATION_DIR/cp_1.sh that
> redirect to a local directory on WN named job-XXXXXXX (mktemp).
> In these '/dlocal/job-XXXXXXX' I can have a mix of jobs using
> a directory 'https...' coming from the job ID and jobs working
> directly under the /dlocal/job-XXXXXXX directory :
>
> > ls -als /dlocal/job-*
> /dlocal/job-gKlnk16866:
> total 12
> 4 drwx------ 3 sgmali020 alicesgm 4096 Apr 20 10:57 .
> 4 drwxrwxrwt 10 root root 4096 Apr 20 10:57 ..
> 4 drwx------ 4 sgmali020 alicesgm 4096 Apr 20 10:57 https_3a_2f_2fgrid02.lal.in2p3.fr_3a9000_2fetlK_5fBRPh5X5egTpONtqVg
>
> /dlocal/job-mkXrc16987:
> total 52
> 4 drwx------ 4 sgmali020 alicesgm 4096 Apr 20 10:57 .
> 4 drwxrwxrwt 10 root root 4096 Apr 20 10:57 ..
> 4 drwxr-xr-x 2 sgmali020 alicesgm 4096 Apr 20 10:57 .alien
> 4 drwxr-x--- 2 sgmali020 alicesgm 4096 Apr 20 11:13 alien-job-26025171
> 4 -rw-r--r-- 1 sgmali020 alicesgm 1001 Apr 20 10:57 .BrokerInfo
> 4 -rw-r--r-- 1 sgmali020 alicesgm 834 Apr 20 10:57 dg-submit.7193.sh
> 4 -rw------- 1 sgmali020 alicesgm 115 Apr 20 10:57 https_3a_2f_2fgrid02.lal.in2p3.fr_3a9000_2fRhY-QMyTeOlCqm5P4UDiqw.output
> 4 -rw-r----- 1 sgmali020 alicesgm 21 Apr 20 10:57 .root_hist
> 12 -rw-r--r-- 1 sgmali020 alicesgm 11286 Apr 20 11:12 std.err
> 8 -rw-r--r-- 1 sgmali020 alicesgm 6065 Apr 20 10:57 std.out
> 0 -rw------- 1 sgmali020 alicesgm 0 Apr 20 10:57 tmp.imlUm17103
>
> /dlocal/job-sDeMvt4461:
> total 8
> 4 drwx------ 2 sgmali016 alicesgm 4096 Dec 17 16:17 .
> 4 drwxrwxrwt 10 root root 4096 Apr 20 10:57 ..
>
> /dlocal/job-sUxTqQ4479:
> total 8
> 4 drwx------ 2 sgmali016 alicesgm 4096 Dec 17 16:05 .
> 4 drwxrwxrwt 10 root root 4096 Apr 20 10:57 ..
>
> /dlocal/job-TFtDbN4528:
> total 8
> 4 drwx------ 2 sgmali016 alicesgm 4096 Dec 17 16:17 .
> 4 drwxrwxrwt 10 root root 4096 Apr 20 10:57 ..
>
> /dlocal/job-wLSRy16215:
> total 60
> 4 drwx------ 4 sgmali020 alicesgm 4096 Apr 20 10:56 .
> 4 drwxrwxrwt 10 root root 4096 Apr 20 10:57 ..
> 4 drwxr-xr-x 2 sgmali020 alicesgm 4096 Apr 20 10:55 .alien
> 4 drwxr-x--- 2 sgmali020 alicesgm 4096 Apr 20 11:12 alien-job-26025092
> 4 -rw-r--r-- 1 sgmali020 alicesgm 1001 Apr 20 10:55 .BrokerInfo
> 4 -rw-r--r-- 1 sgmali020 alicesgm 834 Apr 20 10:55 dg-submit.7193.sh
> 4 -rw------- 1 sgmali020 alicesgm 115 Apr 20 10:55 https_3a_2f_2fgrid02.lal.in2p3.fr_3a9000_2fA9h1BAG8bLb22FYffnuamQ.output
> 4 -rw-r----- 1 sgmali020 alicesgm 21 Apr 20 10:56 .root_hist
> 20 -rw-r--r-- 1 sgmali020 alicesgm 18505 Apr 20 11:11 std.err
> 8 -rw-r--r-- 1 sgmali020 alicesgm 6110 Apr 20 10:56 std.out
> 0 -rw------- 1 sgmali020 alicesgm 0 Apr 20 10:55 tmp.jhuwl16280
>
> /dlocal/job-yZoShU4459:
> total 8
> 4 drwx------ 2 sgmali016 alicesgm 4096 Dec 17 16:17 .
> 4 drwxrwxrwt 10 root root 4096 Apr 20 10:57 ..

The WMS job wrapper always tries a mkdir and then a cd into the directory,
but will continue when either operation fails. Does /var/log/messages
show any problems for the "/dlocal" file system? Are there any errors
under ~sgmali020/.globus/job/*/*?

In theory it is also possible that the user payload moved everything to
the parent directory and deleted the original directory.

> Some job-XXXX directories are empty because they are not removed
> after the job ends.

That could be done by a cron job.

>> That is normal. The EDG_WL_JOBID is only set for jobs sent by RB or WMS
>> nodes and directed to the batch system. The RB/WMS also sends
>> "grid_monitor" jobs running on the lcg-CE itself, and requests to
>> clean up jobs that have finished.
>
> All jobs are supposed to come through a WMS.
> You mean that the grid_monitor jobs have the variable EDG_WL_JOBID
> empty ?

Correct.

> PS: I have another problem that I am about to describe here and
> I am not sure if there can be a link between the two :
> Some jobs have misbehaved and tried to remove files recursively
> starting from /. I have evidence of this in undelivered PBS jobs.

That looks like a user payload error. The WMS wrapper does the following
when it ends:

    rm -rf "../${newdir}"

Here ${newdir} looks like "https_3a_2f_2f.....".
I hope $GLITE_LOCAL_CUSTOMIZATION_DIR/cp_1.sh does not redefine it?!
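
To illustrate why a redefined ${newdir} would matter: if the variable were empty or contained a path separator, "../${newdir}" would no longer name the job's own subdirectory. The following is only a sketch of a defensive check, not what the WMS wrapper actually does; the function name safe_cleanup is hypothetical, and it assumes it is called from inside the job's working directory with the directory name as its only argument.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of the WMS job wrapper.
# Removes ../NAME only when NAME is a plain, non-empty directory name.
safe_cleanup() {
    name=$1
    case "$name" in
        ''|.|..|*/*)
            # Empty, ".", "..", or anything containing a slash could
            # resolve outside the intended job subdirectory.
            echo "refusing to remove '../${name}'" >&2
            return 1
            ;;
    esac
    # Nothing to do if the directory is already gone.
    [ -d "../${name}" ] || return 0
    rm -rf -- "../${name}"
}
```

A wrapper-style script would then call safe_cleanup "${newdir}" in place of the bare rm, so that a value clobbered by a customization script is rejected instead of being passed to rm.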