Recently, I installed an SGE 6.2u2 cluster on 3 Mac Pro workstations running
Mac OS X 10.5.6. Each machine has two network ports. I set up the host_aliases
file and the NFS share points for the data and SGE_ROOT folders, and launched
the daemons. I then tested bedpostx on it. Everything seemed to work fine:
qmaster reported no errors and the job finished. But I found that the exec
nodes logged messages like these:
03/11/2009 12:10:43| main|ccdhpc01|E|removing unreferenced job 42.43 without
job report from ptf
03/11/2009 12:11:43| main|ccdhpc01|E|removing unreferenced job 42.66 without
job report from ptf
03/11/2009 12:12:38| main|ccdhpc01|W|reaping job "42" ptf complains: Job
does not exist
03/11/2009 12:13:20| main|ccdhpc01|W|reaping job "42" ptf complains: Job
does not exist
I know that 43 and 66 correspond to slice numbers in the DTI images. I ran
bedpostx again and found that the problem slices are random, so the issue
does not seem to be related to bedpostx itself. So my questions are: did the
job actually complete? How can I test whether the result data is reliable? If
the data is reliable, I can ignore these messages. Also, has anybody else
encountered this issue? I would appreciate it if anyone could tell me how to
get rid of these messages. Thanks.
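
To compare the affected slices between runs, the task IDs can be pulled out of the exec node messages with a short script. This is only a minimal sketch: it assumes each message sits on a single line in the actual log file (the lines above were wrapped by the mail client), and the function name and sample list are hypothetical. It only lists which tasks were flagged; it does not say whether their output is valid.

```python
import re

# Matches the "E" lines shown above once each message is on one line, e.g.
# 03/11/2009 12:10:43| main|ccdhpc01|E|removing unreferenced job 42.43 without job report from ptf
UNREFERENCED = re.compile(
    r"^(?P<stamp>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\|\s*\w+\|(?P<host>[^|]+)\|E\|"
    r"removing unreferenced job (?P<job>\d+)\.(?P<task>\d+) without job report from ptf"
)


def unreferenced_tasks(lines):
    """Return the sorted, de-duplicated (job, task) pairs flagged in the log lines."""
    found = set()
    for line in lines:
        match = UNREFERENCED.match(line.strip())
        if match:
            found.add((int(match.group("job")), int(match.group("task"))))
    return sorted(found)


# Hypothetical sample built from the messages quoted above (unwrapped).
sample = [
    "03/11/2009 12:10:43| main|ccdhpc01|E|removing unreferenced job 42.43 without job report from ptf",
    "03/11/2009 12:11:43| main|ccdhpc01|E|removing unreferenced job 42.66 without job report from ptf",
    '03/11/2009 12:12:38| main|ccdhpc01|W|reaping job "42" ptf complains: Job does not exist',
]

print(unreferenced_tasks(sample))
```

Running this on the log from each run and comparing the resulting lists would show whether the same slices are flagged every time.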
Wayne