Kostas Georgiou wrote:
> On Tue, Jul 26, 2005 at 10:07:36AM +0200, Jeff Templon wrote:
>
>
>>[root@tbn20 root]# strace -p 1493
>>Process 1493 attached - interrupt to quit
>
> ...
>
>>read(15, 0x80cd6d0, 4096) = -1 ESTALE (Stale NFS file handle)
>
> ...
>
>>Process 1493 detached
>>
>>my guess is that it is supposed to read something in a file, and that
>>file will tell it when the process should die, but the file is gone and
>>so the process does not know that it should have terminated itself.
>>
>>My guess: somehow the script/process manages to wait long enough between
>>reads that the job's home directory mount (autofs) 'expires' and gets
>>unmounted.
>
>
> Can you find which file it is trying to read from? ls -al /proc/1493/fd/15
> From the strace it seems that the file is open so the autofs mount shouldn't
> go away. I think what happens is that a cleanup script removed the file
> and you get the stale nfs handle messages.

Good point. Jeff, are there any cleanup jobs running on the NFS server?
Remember that Globus creates files whose modification times are set to
Jan 1, 1970, so if a cleanup job looks at the "mtime", it would wrongly
conclude that such files can be removed...
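Aside: a cleanup job should look at the "ctime", not the "mtime". The
ctime is updated by the kernel whenever the inode changes and cannot be
set back with utime(), so a file that was just created with an mtime of
Jan 1, 1970 still has a recent ctime.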
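
To illustrate the aside, here is a minimal sketch in Python of the two
checks side by side. The function names and the 7-day threshold are made
up for the example; this is not taken from any existing cleanup script,
and a real job would also have to walk the directory tree and decide
what to do with open files.

```python
import os
import time

SEVEN_DAYS = 7 * 24 * 3600  # hypothetical retention period, in seconds


def expired_by_ctime(path, max_age=SEVEN_DAYS, now=None):
    """True if the inode has not changed for more than max_age seconds."""
    now = time.time() if now is None else now
    return (now - os.stat(path).st_ctime) > max_age


def expired_by_mtime(path, max_age=SEVEN_DAYS, now=None):
    """True if the recorded modification time is older than max_age.

    Shown only for comparison: the mtime can be set to any value by the
    application, so this check can misfire on freshly created files.
    """
    now = time.time() if now is None else now
    return (now - os.stat(path).st_mtime) > max_age
```

For a file that was just written and then had its mtime set to the epoch,
expired_by_mtime() says "remove it" while expired_by_ctime() does not.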