Hi,
We are experiencing an interesting problem with our RB (recently updated to
LCG-2.3.1) involving the permissions of the sandbox directories and the
mapping of users to pool accounts.
When a user submits a job through the RB, he is mapped to a pool account
(for example, dteam004) and the sandbox of his job is copied into
/var/edgwl/SandboxDir, in a directory with a name like:
/var/edgwl/SandboxDir/Ej/https_3a_2f_2frb.isabella.grnet.gr_3a9000_2fEjZVDTMwx5rQT5FgFCIvDA
which has permissions like these:
[root@rb root]# ls -ld /var/edgwl/SandboxDir/Ej/https_3a_2f_2frb.isabella.grnet.gr_3a9000_2fEjZVDTMwx5rQT5FgFCIvDA
drwxrwx--- 4 dteam004 edguser 4096 Mar 3 06:27 /var/edgwl/SandboxDir/Ej/https_3a_2f_2frb.isabella.grnet.gr_3a9000_2fEjZVDTMwx5rQT5FgFCIvDA
[root@rb root]# ls -lR /var/edgwl/SandboxDir/Ej/https_3a_2f_2frb.isabella.grnet.gr_3a9000_2fEjZVDTMwx5rQT5FgFCIvDA
/var/edgwl/SandboxDir/Ej/https_3a_2f_2frb.isabella.grnet.gr_3a9000_2fEjZVDTMwx5rQT5FgFCIvDA:
total 20
-rw-rw---- 1 dteam004 edguser 20 Mar 3 06:27 Maradona.output
drwxrwx--- 2 dteam004 edguser 4096 Mar 3 04:40 input
drwxrwx--- 2 dteam004 edguser 4096 Mar 3 06:27 output
-rw------- 1 edguser edguser 4907 Mar 3 04:40 user.proxy
/var/edgwl/SandboxDir/Ej/https_3a_2f_2frb.isabella.grnet.gr_3a9000_2fEjZVDTMwx5rQT5FgFCIvDA/input:
total 4
-rw-rw---- 1 dteam004 edguser 1085 Mar 3 04:40 sleep
/var/edgwl/SandboxDir/Ej/https_3a_2f_2frb.isabella.grnet.gr_3a9000_2fEjZVDTMwx5rQT5FgFCIvDA/output:
total 12
-rw-rw---- 1 dteam004 edguser 61 Mar 3 06:27 sleep.err
-rw-rw---- 1 dteam004 edguser 5289 Mar 3 06:27 sleep.out
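For reference, the directory name appears to be the job ID with ':' written as "_3a" and '/' as "_2f", placed under a subdirectory named after the first two characters of the job's unique string (both jobs in this message start with "Ej"). The Python sketch below reproduces that mapping, which helps locate a job's sandbox from its ID. The escaping rule (every character other than ASCII letters, digits and '.' becomes '_' plus two lowercase hex digits) and the two-character subdirectory are assumptions inferred from the two paths in this message, not taken from the RB code.

```python
import string
from urllib.parse import urlsplit

# Root directory taken from the listing above.
SANDBOX_ROOT = "/var/edgwl/SandboxDir"

# Assumption: only ASCII letters, digits and '.' are left unescaped.
SAFE_CHARS = set(string.ascii_letters + string.digits + ".")


def encode_job_id(job_id: str) -> str:
    """Escape a job ID the way the sandbox directory names appear to be escaped."""
    return "".join(c if c in SAFE_CHARS else "_%02x" % ord(c) for c in job_id)


def sandbox_path(job_id: str) -> str:
    """Return the (assumed) sandbox directory for a job ID."""
    unique = urlsplit(job_id).path.lstrip("/")  # e.g. "EjZVDTMwx5rQT5FgFCIvDA"
    # Assumption: the intermediate directory is the first two characters.
    return f"{SANDBOX_ROOT}/{unique[:2]}/{encode_job_id(job_id)}"


if __name__ == "__main__":
    print(sandbox_path("https://rb.isabella.grnet.gr:9000/EjZVDTMwx5rQT5FgFCIvDA"))
```

Applied to the job ID from the error further down, it yields the same directory that appears in the server's 550 response.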
However, YAIM has set up a cron job that expires the pool account mappings:
[root@rb root]# cat /var/spool/cron/root | grep expire
0 5 * * * /opt/edg/sbin/lcg-expiregridmapdir.pl -v 1>>/var/log/lcg-expiregridmapdir.log 2>&1
which, I think, deletes a mapping once the user has been inactive for
more than 48 (?) hours.
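As far as I understand the gridmapdir convention, each pool account has an empty file in /etc/grid-security/gridmapdir (e.g. "dteam004"), and a lease is a hard link to that file named after the URL-encoded user DN; the account becomes free again once that link is removed. The read-only Python sketch below lists the leases an age-based expiry would remove. It assumes the age is the modification time of the inode shared by the two names and that DN link names start with "%"; the real lcg-expiregridmapdir.pl may use different criteria (and its actual threshold is exactly what I am unsure about above), so it is only an illustration.

```python
import os
import time
from typing import Dict, Optional


def expired_leases(gridmapdir: str, max_age_hours: float,
                   now: Optional[float] = None) -> Dict[str, str]:
    """Return {DN link name: pool account} for leases older than max_age_hours.

    Assumed layout: one plain file per pool account (e.g. "dteam004") plus,
    per leased DN, a hard link to it named by the URL-encoded DN ("%...").
    Assumed age: st_mtime of the inode shared by both names.
    Read-only: nothing is unlinked.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    pools = {}    # inode number -> pool account name
    leases = []   # (DN link name, inode number, mtime)
    for name in os.listdir(gridmapdir):
        st = os.stat(os.path.join(gridmapdir, name))
        if name.startswith("%"):
            leases.append((name, st.st_ino, st.st_mtime))
        else:
            pools[st.st_ino] = name
    # "?" marks a DN link whose inode matches no pool account file.
    return {dn: pools.get(ino, "?") for dn, ino, mtime in leases if mtime < cutoff}
```

For example, expired_leases("/etc/grid-security/gridmapdir", 48) would report which users are about to lose their current pool account under that assumed rule.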
So if someone submits a job and then remains inactive for a number of
days, it is quite possible that when he comes back to retrieve the output
of his job, he will be mapped to a *different* pool account than the one
he was mapped to at submission time. As a result, he will be unable to
retrieve the output of his job because of the permissions on the sandbox
directory, and he will get cryptic error messages like the following:
[gef@ui01 gef]$ edg-job-get-output https://rb.isabella.grnet.gr:9000/Ej6PCGthGeXYOC9Lqa0ztQ
Retrieving files from host: rb.isabella.grnet.gr ( for https://rb.isabella.grnet.gr:9000/Ej6PCGthGeXYOC9Lqa0ztQ )
error: the server sent an error response: 550 550 /var/edgwl/SandboxDir/Ej/https_3a_2f_2frb.isabella.grnet.gr_3a9000_2fEj6PCGthGeXYOC9Lqa0ztQ/output/sleep.out: not a plain file.
**** Error: NS_FILE_RETRIEVAL_FAIL ****
Unable to retrieve any OutputSandbox files for job:
"https://rb.isabella.grnet.gr:9000/Ej6PCGthGeXYOC9Lqa0ztQ"
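The failure follows from the mode bits in the listing above: the job directory is drwxrwx--- dteam004:edguser, so a user freshly mapped to another pool account (not the owner, and presumably not in group edguser) falls into the "other" class, which has no permissions at all. The Python sketch below applies the standard POSIX owner/group/other check to that case. "dteam007" and the pool-account group "dteam" are hypothetical; it also assumes the file server on the RB accesses the sandbox as the mapped pool account and that the directories above the job directory are searchable.

```python
# Permission bits requested by the caller.
READ, EXECUTE = 4, 1


def can_access(mode: int, owner: str, group: str,
               user: str, user_groups: set, want: int) -> bool:
    """Standard POSIX owner/group/other check for a non-root user.

    Only the first matching class counts: if the user owns the file,
    the group and other bits are never consulted.
    """
    if user == owner:
        bits = (mode >> 6) & 0o7
    elif group in user_groups:
        bits = (mode >> 3) & 0o7
    else:
        bits = mode & 0o7
    return (bits & want) == want


def can_read_output(account: str, groups: set) -> bool:
    """Can 'account' read output/sleep.out in the sandbox listed above?"""
    owner, group = "dteam004", "edguser"  # from the ls output
    job_dir_ok = can_access(0o770, owner, group, account, groups, EXECUTE)  # drwxrwx---
    out_dir_ok = can_access(0o770, owner, group, account, groups, EXECUTE)  # output/
    file_ok = can_access(0o660, owner, group, account, groups, READ)        # -rw-rw----
    return job_dir_ok and out_dir_ok and file_ok


if __name__ == "__main__":
    # Hypothetical: pool accounts belong to group "dteam", not "edguser".
    for acct in ("dteam004", "dteam007"):
        print(acct, can_read_output(acct, {"dteam"}))
```

Run as a script, it prints True for dteam004 and False for dteam007, which matches what the user saw after being remapped.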
In the above case, the user finally retrieved the output of his job(s)
successfully after we temporarily mapped him back to his original pool
account (which happened to be free) by meddling a bit in
/etc/grid-security/gridmapdir.
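The manual fix amounts to moving the user's lease back to the old account. Assuming the hard-link layout described earlier (a pool account is free when its file has a link count of 1), it can be sketched in Python as below. The function name is hypothetical, not an LCG tool; the DN file name is whatever URL-encoded name the user's current lease has, and the sketch does not guard against another user being mapped at the same moment, so it should only be run while the RB is quiet.

```python
import os


def remap_lease(gridmapdir: str, dn_file: str, pool_account: str) -> None:
    """Point an existing DN lease at a different, currently free pool account.

    Assumed gridmapdir convention: the lease is the hard link named dn_file,
    and a pool account is free when its file has exactly one link.
    """
    pool_path = os.path.join(gridmapdir, pool_account)
    dn_path = os.path.join(gridmapdir, dn_file)
    if os.stat(pool_path).st_nlink != 1:
        raise RuntimeError(f"{pool_account} is already leased")
    if os.path.exists(dn_path):
        os.unlink(dn_path)       # drop the current (new) lease
    os.link(pool_path, dn_path)  # lease the original account again
```

Checking the link count first means an account that is already leased to someone else is never handed out twice.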
Now the question: is something wrong with our RB, or is this how things
are supposed to work, i.e. is a user expected to retrieve the output of
his job within a limited period of time?
Cheers,
--
Kyriakos Ginis, PhD Candidate
Software Engineering Laboratory
National Technical University of Athens