Hello all,
We had a failed job on our CREAM CE, reported by NGI Nagios early this morning. After that,
other jobs ran successfully.
Is there a way to get the ID of the failed job?
I tried various sources but without success:
- the Nagios log reports only "CRITICAL: Job was aborted."
- MyEGI, running on the same node as Nagios, does not keep a history of CREAM problems (only lcg-CE nodes have history there)
- the Torque server logs do not show any problem with jobs submitted from the CREAM CE and run by the specific user
- there are some jobs in the CREAM SQL database (tables job and job_status) with a non-NULL exitCode but without further description
Thank you for any help,
--
Tomas Kouba
Institute of Physics, Academy of Sciences of the Czech Republic