Hi!
The good news is, SGE is up and running, completing jobs for FSL. However, there are still some minor issues.
Setting my email address in fsl_sub and changing MailOpts to 'ea' (to get mail when jobs finish or abort) results in mail delivery errors, because FSL tries to send mail to [log in to unmask], a nonexistent address. On the other hand, when the last job is done, the mail does arrive at the specified address. So mail delivery works and fails at the same time! I checked the SGE config files and everything seems fine there.
There are also still some warnings and errors in the SGE messages files. The execd log on the host shows this around startup:
11/21/2007 01:52:15|execd|amd64|E|commlib error: can't connect to service (Network is unreachable)
11/21/2007 01:52:17|execd|amd64|E|getting configuration: unable to contact qmaster using port 6444 on host "amd64"
11/21/2007 01:52:20|execd|amd64|E|can't get configuration from qmaster -- backgrounding
11/21/2007 01:52:21|execd|amd64|I|starting up GE 6.1u2 (lx24-amd64)
11/21/2007 02:21:23|execd|amd64|W|reaping job "65" ptf complains: Job does not exist
I guess the reaping warning is not a problem.
The qmaster log:
11/21/2007 01:52:21|qmaster|amd64|I|starting up GE 6.1u2 (lx24-amd64)
11/21/2007 02:21:24|qmaster|amd64|E|commlib error: got read error (closing "amd64/qstat/5")
The commlib error appears from time to time, sometimes with /qstat/2, sometimes with /qstat/5. Any idea what that means?
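In case it helps anyone going through the same logs, here is a minimal Python sketch for pulling the error lines out of the messages files. It assumes each line has the five pipe-separated fields seen above (timestamp, daemon, host, level, text) and that E, W and I stand for error, warning and info; the function name parse_sge_message is just something I made up for illustration.

```python
from collections import Counter

def parse_sge_message(line):
    """Split one 'timestamp|daemon|host|level|text' line into a dict.

    The field layout is an assumption based on the log lines quoted above.
    """
    # maxsplit=4 keeps any '|' inside the message text intact
    timestamp, daemon, host, level, text = line.rstrip("\n").split("|", 4)
    return {
        "timestamp": timestamp,
        "daemon": daemon,
        "host": host,
        "level": level,
        "text": text,
    }

# A few lines copied from the execd and qmaster logs above
sample = [
    "11/21/2007 01:52:20|execd|amd64|E|can't get configuration from qmaster -- backgrounding",
    "11/21/2007 01:52:21|execd|amd64|I|starting up GE 6.1u2 (lx24-amd64)",
    '11/21/2007 02:21:23|execd|amd64|W|reaping job "65" ptf complains: Job does not exist',
    '11/21/2007 02:21:24|qmaster|amd64|E|commlib error: got read error (closing "amd64/qstat/5")',
]

records = [parse_sge_message(line) for line in sample]
counts = Counter(r["level"] for r in records)    # how many lines per level
errors = [r for r in records if r["level"] == "E"]

for r in errors:
    print(r["timestamp"], r["daemon"], r["text"])
```

To run it over a real messages file, the sample list could be replaced by the lines read from that file.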
Thanks
Georg