On Fri, 14 Nov 2008, Maarten Litmaath wrote:
> Hallo Torsten,
> first of all the problem is with _stageout_, not stagein:
>
>> Unable to copy file /var/spool/pbs/spool/449656.grid.OU to
>> [log in to unmask]:/home/atlas006/.lcgjm/globus-cache-export.z14835/batch.out
>> >>> error from copy
>> scp: /home/atlas006/.lcgjm/globus-cache-export.z14835/batch.out: No such
>> file or directory
>
> The file did not exist on the WN, so could not be copied to the CE.
The directory could also have been removed too early on the CE side.
This is really only my guess, but we have seen the same diagnostics too.
The main reason for my guess is that only one of our two CEs suffers
from this problem, and that machine is a bit old and slow in disk
access.
>
> Does this happen for multiple users? If so, anything in common?
>
> Might the problem come from a race condition in your epilogue script?
>
> Note that a job could accidentally remove its own stdout/stderr files...
>
--
Best regards,
Valery Mitsyn