On 26 Sep 2011, at 14:29, Stephen Jones wrote:
> We have a separate TORQUE server and CREAMCE setup, where the TORQUE job
> log files reside on a shared NFS server. I've been playing around with
> the BLAH parsers a little bit. This is what I suspect:
>
> It makes no odds whether you use the old or new BLAH parser if you put
> the log files on an NFS share (so the CREAMCE can read them directly).
The 'old' parser reads log files. The 'new' parser asks the batch system (via qstat etc, as specialised by LRMS type).
> Whatever BLAH parser you use is irrelevant/unused, because CREAM reads
> its own state-tracking data from the shared log files. It would only
> resort to the BLAH parser if the NFS share got broken. Does anyone know
> the truth about this. I'm tempted to use no BLAH parser at all, to keep
> things simple.
There are 2 components here - the Java part that talks X509 and GSIFTP to the outside world, and the local part that does shell scripts and ssh to the batch system.
Although it's correct to call both CREAM, I usually refer to the Java part as CREAM, and the shell scripty bit as BLAH.
Whilst I've not checked explicitly, I don't believe that the Java part interacts with the batch system logs, or the batch system in any way, _except_ via asking BLAH. Certainly, CREAM can't submit a job except via BLAH. I've just combed over my CREAM configs, and I can't find anything that tells it where the logs are (we keep them in a not-100% standard location) - and lsof thinks that the only process looking at this directory is BUpdaterPBS. Nor can I find any trace of configuration on what the batch system _is_, except in the BLAH configuration.
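If anyone wants to repeat the lsof check on their own CE, something along these lines should do it. This is only a rough sketch: it assumes lsof is installed and run with enough privilege to see other users' processes, the directory path is a made-up placeholder (substitute wherever your batch logs actually live), and the function names are my own invention. It uses lsof's field output (-F) with +D to recurse into the directory.

```python
import subprocess
from collections import defaultdict


def parse_lsof_fields(output):
    """Parse `lsof -F pcn` output into {(pid, command): [open file names]}.

    In field output each line starts with a one-character tag:
    'p' = PID, 'c' = command name, 'n' = file name. lsof also emits
    'f' (file descriptor) lines, which are simply ignored here.
    """
    holders = defaultdict(list)
    pid = command = None
    for line in output.splitlines():
        if not line:
            continue
        tag, value = line[0], line[1:]
        if tag == "p":
            # A new process set begins; the command name follows on a 'c' line.
            pid, command = int(value), None
        elif tag == "c":
            command = value
        elif tag == "n" and pid is not None:
            holders[(pid, command)].append(value)
    return dict(holders)


def processes_using(directory):
    """Return the processes that have files open under `directory`."""
    # lsof exits non-zero when it finds nothing, so the return code is not checked.
    result = subprocess.run(
        ["lsof", "-F", "pcn", "+D", directory],
        capture_output=True,
        text=True,
    )
    return parse_lsof_fields(result.stdout)


if __name__ == "__main__":
    # Hypothetical placeholder path, not a real default location.
    log_dir = "/path/to/batch/logs"
    for (pid, command), files in processes_using(log_dir).items():
        print(pid, command, len(files), "open file(s)")
```

If the only command that turns up is the BLAH updater, that matches what I see here; it doesn't prove the Java part never touches the logs, only that it isn't holding them open at the moment you look.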
The dependency on BLAH is such that the Java process will start up the BLAH parser if it's not running. I'm not sure how to avoid that, and I'm pretty sure that if you did manage it then the jobs would (appear to) sit in a Pending state, as the Java part would be unable to determine the current state of the job. Note that because of this, the UID of the processes looking at the log files / poking the PBS server will be 'tomcat', as that's inherited from the tomcat process that launched them.
I've not walked through the source code to confirm, but because of the above, I don't think the Java part picks up job information itself.