Hello
I have a problem at site RO-14-ITIM.
In Nagios the site is shown as if it had no functional system.
https://ngi-ro-nagios.ici.ro/nagios/cgi-bin/extinfo.cgi?type=2&host=ecream.itim-cj.ro&service=emi.cream.CREAMCE-JobState-ops
The logs on the worker node say:
03/07/2014 10:35:05;0001; pbs_mom;Job;TMomFinalizeJob3;job 487331.ecream.itim-cj.ro started, pid = 1890
03/07/2014 10:35:22;0080; pbs_mom;Job;487331.ecream.itim-cj.ro;scan_for_terminated: job 487331.ecream.itim-cj.ro task 1 terminated, sid=1890
03/07/2014 10:35:22;0008; pbs_mom;Job;487331.ecream.itim-cj.ro;job was terminated
03/07/2014 10:35:22;0080; pbs_mom;Svr;preobit_reply;top of preobit_reply
03/07/2014 10:35:22;0080; pbs_mom;Svr;preobit_reply;DIS_reply_read/decode_DIS_replySvr worked, top of while loop
03/07/2014 10:35:22;0080; pbs_mom;Svr;preobit_reply;in while loop, no error from job stat
03/07/2014 10:35:22;0080; pbs_mom;Job;487331.ecream.itim-cj.ro;obit sent to server
03/07/2014 10:35:23;0080; pbs_mom;Job;487331.ecream.itim-cj.ro;removed job script
03/07/2014 10:36:39;0002; pbs_mom;Svr;pbs_mom;Torque Mom Version = 2.5.7, loglevel = 0
On the CREAM CE I have (torque/server_logs):
03/07/2014 10:51:56;0008;PBS_Server;Job;487356.ecream.itim-cj.ro;Job Queued at request of [log in to unmask], owner = [log in to unmask], job name = cream_428782624, queue = ops
03/07/2014 10:52:21;0100;PBS_Server;Job;487356.ecream.itim-cj.ro;dequeuing from ops, state COMPLETE
03/07/2014 10:55:04;0100;PBS_Server;Job;487364.ecream.itim-cj.ro;enqueuing into ops, state 1 hop 1
03/07/2014 10:55:04;0008;PBS_Server;Job;487364.ecream.itim-cj.ro;Job Queued at request of [log in to unmask], owner = [log in to unmask], job name = cream_260075834, queue = ops
03/07/2014 10:55:23;0100;PBS_Server;Job;487364.ecream.itim-cj.ro;dequeuing from ops, state COMPLETE
03/07/2014 10:57:34;0100;PBS_Server;Job;487365.ecream.itim-cj.ro;enqueuing into ops, state 1 hop 1
03/07/2014 10:57:34;0008;PBS_Server;Job;487365.ecream.itim-cj.ro;Job Queued at request of [log in to unmask], owner = [log in to unmask], job name = cream_704898335, queue = ops
03/07/2014 10:57:53;0100;PBS_Server;Job;487365.ecream.itim-cj.ro;dequeuing from ops, state COMPLETE
03/07/2014 10:58:13;0100;PBS_Server;Job;487366.ecream.itim-cj.ro;enqueuing into ops, state 1 hop 1
03/07/2014 10:58:13;0008;PBS_Server;Job;487366.ecream.itim-cj.ro;Job Queued at request of [log in to unmask], owner = [log in to unmask], job name = cream_813310149, queue = ops
03/07/2014 10:58:38;0100;PBS_Server;Job;487366.ecream.itim-cj.ro;dequeuing from ops, state COMPLETE
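
To see how long each ops job stays on the server, the "enqueuing into" and "dequeuing from" records can be paired by job ID. This is only a minimal Python sketch, not a Torque tool. It assumes each record sits on a single line (unwrapped) in the `date time;code;PBS_Server;Job;jobid;message` layout seen in the excerpt, and the function name `job_lifetimes` is hypothetical:

```python
from datetime import datetime

def job_lifetimes(lines):
    """Return {job_id: seconds between 'enqueuing into' and 'dequeuing from'}."""
    started = {}    # job_id -> timestamp of the enqueue record
    lifetimes = {}  # job_id -> seconds until the dequeue record
    for line in lines:
        # Assumed layout: timestamp;code;daemon;class;job_id;message
        fields = line.strip().split(";", 5)
        if len(fields) < 6 or fields[3] != "Job":
            continue
        stamp = datetime.strptime(fields[0], "%m/%d/%Y %H:%M:%S")
        job_id, message = fields[4], fields[5]
        if message.startswith("enqueuing into"):
            started[job_id] = stamp
        elif message.startswith("dequeuing from") and job_id in started:
            lifetimes[job_id] = (stamp - started[job_id]).total_seconds()
    return lifetimes

# Two records copied from the server log excerpt, one per line.
sample = [
    "03/07/2014 10:55:04;0100;PBS_Server;Job;487364.ecream.itim-cj.ro;enqueuing into ops, state 1 hop 1",
    "03/07/2014 10:55:23;0100;PBS_Server;Job;487364.ecream.itim-cj.ro;dequeuing from ops, state COMPLETE",
]
print(job_lifetimes(sample))
```

For the three jobs whose enqueue and dequeue records both appear in the excerpt (487364, 487365, 487366), the timestamps are 19 to 25 seconds apart.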
On SE in srmv2.2 I get: (log-20140307)
03/07 03:17:33.282 20258,0 Ls: SRM98 - Ls srm://cn-se1.itim-cj.ro:8446/srm/managerv2?SFN=/dpm/itim-cj.ro/home/ops
03/07 03:17:33.752 20258,0 Ls: SRM98 - Ls srm://cn-se1.itim-cj.ro:8446/srm/managerv2?SFN=/dpm/itim-cj.ro/home/ops
03/07 03:17:34.169 20258,0 PrepareToPut: SRM98 - PrepareToPut 0 srm://cn-se1.itim-cj.ro:8446/srm/managerv2?SFN=/dpm/itim-cj.ro/home/ops/testfile-put-1394155052-cad17cf56ad7.txt
and (log)
03/07 10:15:27.390 20258,0 Ls: SRM98 - Ls srm://cn-se1.itim-cj.ro:8446/srm/managerv2?SFN=/dpm/itim-cj.ro/home/ops
03/07 10:15:27.904 20258,0 Ls: SRM98 - Ls srm://cn-se1.itim-cj.ro:8446/srm/managerv2?SFN=/dpm/itim-cj.ro/home/ops
03/07 10:15:28.342 20258,0 PrepareToPut: SRM98 - PrepareToPut 0 srm://cn-se1.itim-cj.ro:8446/srm/managerv2?SFN=/dpm/itim-cj.ro/home/ops/testfile-put-1394180127-d16d7fc0ebfa.txt
On SE I have just this in dpm:
03/07 02:17:37.486 20081,3 dpm_srv_rm: DP098 - rm 0 srm://cn-se1.itim-cj.ro:8446/srm/managerv2?SFN=/dpm/itim-cj.ro/home/ops/testfile-put-1394151452-8e96e8562c2a.txt
03/07 03:17:34.441 20081,24 dpm_srv_proc_put: TURL info: gsiftp cn-se1.itim-cj.ro cn-se1.itim-cj.ro:/st06/ops/2014-03-07/testfile-put-1394155052-cad17cf56ad7.txt.5604986.0
03/07 03:17:37.329 20081,3 dpm_srv_rm: DP098 - rm 0 srm://cn-se1.itim-cj.ro:8446/srm/managerv2?SFN=/dpm/itim-cj.ro/home/ops/testfile-put-1394155052-cad17cf56ad7.txt
On the SE, in dpns, I have:
03/07 03:17:33.323 19867,0 Cns_srv_statg: NS098 - statg /dpm/itim-cj.ro/home/ops
03/07 03:17:33.794 19867,0 Cns_srv_statg: NS098 - statg /dpm/itim-cj.ro/home/ops
03/07 03:17:34.340 19867,0 Cns_srv_stat: NS098 - stat 0 /dpm/itim-cj.ro/home/ops/testfile-put-1394155052-cad17cf56ad7.txt
03/07 03:17:34.394 19867,0 Cns_srv_creat: NS098 - creat /dpm/itim-cj.ro/home/ops/testfile-put-1394155052-cad17cf56ad7.txt 664 0
03/07 03:17:34.439 19867,0 Cns_srv_addreplica: NS098 - addreplica cn-se1.itim-cj.ro cn-se1.itim-cj.ro:/st06/ops/2014-03-07/testfile-put-1394155052-cad17cf56ad7.txt.5604986.0
03/07 03:17:36.949 19867,0 Cns_srv_statr: NS098 - statr cn-se1.itim-cj.ro:/st06/ops/2014-03-07/testfile-put-1394155052-cad17cf56ad7.txt.5604986.0
03/07 03:17:36.991 19867,0 Cns_srv_delreplica: NS098 - delreplica cn-se1.itim-cj.ro:/st06/ops/2014-03-07/testfile-put-1394155052-cad17cf56ad7.txt.5604986.0
03/07 03:17:36.994 19867,0 Cns_srv_getreplicax: NS098 - getreplicax /dpm/itim-cj.ro/home/ops/testfile-put-1394155052-cad17cf56ad7.txt
03/07 03:17:36.995 19867,0 Cns_srv_unlink: NS098 - unlink /dpm/itim-cj.ro/home/ops/testfile-put-1394155052-cad17cf56ad7.txt
03/07 03:17:37.370 19867,0 Cns_srv_delete: NS098 - delete /dpm/itim-cj.ro/home/ops/testfile-put-1394155052-cad17cf56ad7.txt
In the dashboard the site is OK, but no availability or reliability figures are shown.
Something is missing somewhere, but I do not know where.
Thank you for any advice
Felix
--
Dr. Ing. Farcas Felix
National Institute of Research and Development
of Isotopic and Molecular Technology,
IT - Department - Cluj-Napoca, Romania
yahoo id: felixfarcas
skype id: felix.farcas
mobile: +40-742-195323