On Fri, 27 May 2005, Ian Fisk wrote:
> We are seeing the lock corruption problem again. After stable running
> for 5 days we have two corrupted Grid-manager-monitor lock files. The
[...]
Hello Ian,
I've been trying to perform some remote analysis to find the problem. I
see some unexpected entries in the file system. For instance, if you look
on your CE now you should be able to see a file:
/uscms_data/d1/grid_home/cms001/.lcgjm/jm-lcgcondor-submit.list.locked
with the following stat info:
Size: 0 Blocks: 0 IO Block: 4096 Regular File
Device: bh/11d Inode: 24525758 Links: 4
Access: (0644/-rw-r--r--) Uid: (12336/ cms001) Gid: ( 9617/ cms)
Access: 2005-05-27 11:04:36.000000000 -0500
Modify: 2005-05-27 11:04:36.000000000 -0500
Change: 2005-05-27 11:04:37.000000000 -0500
I didn't expect this file to have a link count of 4 - and indeed I can't
find any other entry in the cms001 home directory with the above inode number.
The locking that we're using for the JM files uses the link count, so if
there is something unexpected happening it could well cause problems. Do
you know why the above file has this link count?
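
To illustrate the two checks discussed above, here is a minimal Python sketch.
It is not the job manager's actual code: find_hard_links and try_lock are
hypothetical helper names, and the locking scheme shown (create a unique file,
link() it to the lock name, then trust the link count rather than the return
value) is only one common way of building a lock on the link count. It assumes
a POSIX file system that supports hard links.

```python
import os
import tempfile


def find_hard_links(root, target):
    """Return every path under root sharing target's device and inode.

    Hypothetical helper: a link count of N on target means N such
    directory entries should exist somewhere on the same file system.
    """
    st = os.lstat(target)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                s = os.lstat(path)
            except OSError:
                continue  # entry vanished while we were walking
            if (s.st_dev, s.st_ino) == (st.st_dev, st.st_ino):
                matches.append(path)
    return sorted(matches)


def try_lock(lock_path):
    """Try to take a link-count based lock; return True on success.

    Assumed scheme, for illustration only: a unique file is hard-linked
    to lock_path, and the lock is held only if that unique file then has
    exactly two links. Any unexpected extra link breaks this test.
    """
    directory = os.path.dirname(lock_path) or "."
    fd, unique = tempfile.mkstemp(dir=directory, prefix=".lock-")
    os.close(fd)
    try:
        try:
            os.link(unique, lock_path)
        except OSError:
            pass  # do not trust the error alone; check the link count
        return os.stat(unique).st_nlink == 2
    finally:
        os.unlink(unique)  # a held lock is left with a link count of 1
```

With a scheme like this, a lock file that shows 4 links while only one name
can be found is exactly the kind of state the count test does not anticipate.
Running find_hard_links over the whole file system holding the lock file,
rather than only the home directory, would show where the other names are.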
Yours,
David
--
-------------------------------------------------------------------------
David Smith e-mail: [log in to unmask] tel: +41 22 76 74462
Address: D. Smith, CERN G06610, Bat 28 R-007, 1211 Geneva 23, Switzerland
-------------------------------------------------------------------------