On Thu, 25 Aug 2005, Antun Balaz wrote:
> Hi,
> if I want to check if jobs arriving at my site are successfully completed, is
> it enough just to look at the appropriate records in log files
> in /var/spool/pbs/server_priv/accounting (like the one given below), or are
> some further checks possible/needed (i.e. other than looking for
> Exit_status=0 in the record below)?
On the site itself you can only find evidence for jobs that clearly failed.
If the batch system says the job succeeded, it means the batch system wrapper
successfully executed the WP1 job wrapper, while the latter may have failed
to execute the user payload. Only the user (or the RB sysadmin) can see if
a particular user payload was correctly executed. (Note: the exit status of
the payload is irrelevant - the middleware only needs to deliver the payload.)
> 08/25/2005 17:24:43;E;2884.ce.phy.bg.ac.yu;user=atlas001 group=atlas
> jobname=STDIN queue=atlas ctime=1124878252 qtime=1124878252
> etime=1124878252 start=1124878252 exec_host=wn02.phy.bg.ac.yu/0
> Resource_List.cput=200:00:00 Resource_List.neednodes=1
> Resource_List.nodect=1 Resource_List.nodes=1
> Resource_List.walltime=200:00:00 session=6793 end=1124983483 Exit_status=0
> resources_used.cput=14:31:42 resources_used.mem=74380kb
> resources_used.vmem=199464kb resources_used.walltime=29:13:51
>
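
For what it is worth, the site-side check described above (finding the jobs
that clearly failed) could be scripted roughly as follows. This is only a
sketch: it assumes each record sits on a single line of the form
"date time;type;jobid;key=value key=value ...", as in the record quoted above
once the mail line wrapping is undone, and the function names
parse_accounting_record and clearly_failed are made up for illustration.
An Exit_status of 0 here still says nothing about the user payload.

```python
def parse_accounting_record(line):
    """Split one accounting line into (timestamp, record_type, job_id, attrs).

    Assumes the layout 'date time;type;jobid;key=value key=value ...'.
    """
    timestamp, record_type, job_id, message = line.strip().split(";", 3)
    attrs = {}
    for token in message.split():
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not key=value pairs
            attrs[key] = value
    return timestamp, record_type, job_id, attrs


def clearly_failed(line):
    """Return True only for an 'E' (job end) record with non-zero Exit_status.

    A False result does NOT prove the user payload ran correctly; it only
    means the batch system saw nothing wrong.
    """
    _, record_type, _, attrs = parse_accounting_record(line)
    if record_type != "E":
        return False
    status = attrs.get("Exit_status")
    return status is not None and status != "0"


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python check_jobs.py <accounting-file>
    with open(sys.argv[1]) as log:
        for raw in log:
            if raw.strip() and clearly_failed(raw):
                print(parse_accounting_record(raw)[2])
```

Run against one of the daily accounting files, it prints the IDs of the jobs
the batch system itself reports as failed; everything else needs to be
checked from the user or RB side.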