Hi all,
We have lost some jobs (about 10,000) from 21 and 22 January 2010.
They appear in the Torque logs but not in the CE's grid-jobmap_ files.
Here is one example:
[account@account accounting]$ grep 8173665.pbs02.pic.es 20100122
01/22/2010 11:28:53;Q;8173665.pbs02.pic.es;queue=glong_sl5
01/22/2010 11:31:30;S;8173665.pbs02.pic.es;user=atprd001 group=atprd jobname=STDIN queue=glong_sl5 ctime=1264156133 qtime=1264156133 etime=1264156133 start=1264156290 [log in to unmask] exec_host=td128.pic.es/2 Resource_List.cput=48:00:00 Resource_List.neednodes=1 Resource_List.nodect=1 Resource_List.nodes=1 Resource_List.walltime=72:00:00
01/22/2010 11:33:19;E;8173665.pbs02.pic.es;user=atprd001 group=atprd jobname=STDIN queue=glong_sl5 ctime=1264156133 qtime=1264156133 etime=1264156133 start=1264156290 [log in to unmask] exec_host=td128.pic.es/2 Resource_List.cput=48:00:00 Resource_List.neednodes=1 Resource_List.nodect=1 Resource_List.nodes=1 Resource_List.walltime=72:00:00 session=29923 end=1264156399 Exit_status=0 resources_used.cput=00:00:00 resources_used.mem=23848kb resources_used.vmem=650104kb resources_used.walltime=00:03:07
[account@account accounting]$ grep 8173665.pbs02.pic.es ce05/grid-jobmap_2010012*
[account@account accounting]$
*account is an rsync server to which we sync the Torque and CE logs.
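To see how many jobs are affected, the single grep above can be repeated for every job in a Torque accounting file. This is a minimal sketch, assuming the Torque record layout shown above (date time;TYPE;jobid;attributes) and that, just as the grep assumes, a job's Torque ID appears literally somewhere in its grid-jobmap line. Only E (job end) records are considered. The script name missing_jobs.py is hypothetical.

```python
"""Hypothetical helper (missing_jobs.py): list Torque jobs that ended in a given
accounting file but whose job ID never appears in any of the given grid-jobmap
files. It is a bulk version of the single grep shown above."""
import re
import sys

# Torque job IDs look like "8173665.pbs02.pic.es": digits, a dot, then the server name.
JOB_ID_RE = re.compile(r"\b\d+\.[\w.-]+")


def ended_jobs(torque_file):
    """Return the job IDs of all 'E' (job end) records in a Torque accounting file."""
    ids = set()
    with open(torque_file, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            # Record layout: "MM/DD/YYYY HH:MM:SS;TYPE;JOBID;key=value ..."
            parts = line.rstrip("\n").split(";", 3)
            if len(parts) >= 3 and parts[1] == "E":
                ids.add(parts[2])
    return ids


def ids_in_jobmaps(paths):
    """Collect every token that looks like a Torque job ID in the given jobmap files."""
    seen = set()
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for line in fh:
                seen.update(tok.rstrip(".") for tok in JOB_ID_RE.findall(line))
    return seen


def missing_jobs(torque_file, jobmap_paths):
    """Job IDs that ended in Torque but are absent from every jobmap file."""
    return sorted(ended_jobs(torque_file) - ids_in_jobmaps(jobmap_paths))


if __name__ == "__main__":
    if len(sys.argv) < 3:
        print("usage: missing_jobs.py TORQUE_FILE JOBMAP_FILE...", file=sys.stderr)
    else:
        missing = missing_jobs(sys.argv[1], sys.argv[2:])
        for job_id in missing:
            print(job_id)
        print(f"{len(missing)} ended jobs not found in any jobmap file", file=sys.stderr)
```

It can be run like the grep, e.g. python3 missing_jobs.py 20100122 ce05/grid-jobmap_2010012*. If other CEs also submit to pbs02, their jobmap files should be passed too, otherwise jobs that came in through another CE will also be reported as missing.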
At first I thought these jobs had been submitted locally on ce05, but then
I looked at the old CE messages log:
[ ce05]# ls -lsa messages
49650 -rw-r-----+ 1 root logs 50778035 Jan 23 00:00 messages
[root@coresrv07 ce05]# grep 8173665 messages
Jan 22 11:28:54 ce05 gridinfo: [12645-19448] Submitted job 1264155788:lcgpbs:internal_113977276:24360.1264155783 to batch system lcgpbs with ID 8173665.pbs02.pic.es
Jan 22 11:54:56 ce05 gridinfo: [12645-28030] Job 1264155788:lcgpbs:internal_113977276:24360.1264155783 (ID 8173665.pbs02.pic.es) has finished
[ ce05]# grep 24360.1264155783 messages
Jan 22 11:23:08 ce05 gridinfo[24360]: JMA 2010/01/22 11:23:08 GATEKEEPER_JM_ID 2010-01-22.11:23:02.0000024347.0000000000 has GRAM_SCRIPT_JOB_ID 1264155788:lcgpbs:internal_113977276:24360.1264155783 manager type lcgpbs
Jan 22 11:28:54 ce05 gridinfo: [12645-19448] Submitted job 1264155788:lcgpbs:internal_113977276:24360.1264155783 to batch system lcgpbs with ID 8173665.pbs02.pic.es
Jan 22 11:54:56 ce05 gridinfo: [12645-28030] Job 1264155788:lcgpbs:internal_113977276:24360.1264155783 (ID 8173665.pbs02.pic.es) has finished
[ ce05]# grep 0000024347.0000000000 messages
Jan 22 11:23:03 ce05 GRAM gatekeeper[24347]: JMA 2010/01/22 11:23:03 GATEKEEPER_JM_ID 2010-01-22.11:23:02.0000024347.0000000000 has EDG_WL_JOBID ''
Jan 22 11:23:08 ce05 gridinfo[24360]: JMA 2010/01/22 11:23:08 GATEKEEPER_JM_ID 2010-01-22.11:23:02.0000024347.0000000000 for /DC=es/DC=irisgrid/O=pic/CN=xavier-espinal on 193.109.175.150
Jan 22 11:23:08 ce05 gridinfo[24360]: JMA 2010/01/22 11:23:08 GATEKEEPER_JM_ID 2010-01-22.11:23:02.0000024347.0000000000 mapped to atprd001 (42001, 50045)
Jan 22 11:23:08 ce05 gridinfo[24360]: JMA 2010/01/22 11:23:08 GATEKEEPER_JM_ID 2010-01-22.11:23:02.0000024347.0000000000 has GRAM_SCRIPT_JOB_ID 1264155788:lcgpbs:internal_113977276:24360.1264155783 manager type lcgpbs
Jan 22 11:23:09 ce05 gridinfo[24360]: JMA 2010/01/22 11:23:09 GATEKEEPER_JM_ID 2010-01-22.11:23:02.0000024347.0000000000 JM exiting
Jan 22 15:35:28 ce05 GRAM gatekeeper[24347]: JMA 2010/01/22 15:35:27 GATEKEEPER_JM_ID 2010-01-22.15:35:27.0000024347.0000000000 has EDG_WL_JOBID 'https://lb006.cnaf.infn.it:9000/z76UjV-DCbIR6J_bKLhTWg'
Jan 22 15:35:33 ce05 gridinfo[24405]: JMA 2010/01/22 15:35:33 GATEKEEPER_JM_ID 2010-01-22.15:35:27.0000024347.0000000000 for /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=atsygano/CN=647936/CN=Andrey Tsyganov on 131.154.101.5
Jan 22 15:35:33 ce05 gridinfo[24405]: JMA 2010/01/22 15:35:33 GATEKEEPER_JM_ID 2010-01-22.15:35:27.0000024347.0000000000 mapped to cmprd008 (24008, 50051)
Jan 22 15:35:33 ce05 gridinfo[24405]: JMA 2010/01/22 15:35:33 GATEKEEPER_JM_ID 2010-01-22.15:35:27.0000024347.0000000000 has GRAM_SCRIPT_JOB_ID 1264170933:lcgpbs:internal_2518430373:24405.1264170927 manager type lcgpbs
Jan 22 15:35:37 ce05 gridinfo[24405]: JMA 2010/01/22 15:35:37 GATEKEEPER_JM_ID 2010-01-22.15:35:27.0000024347.0000000000 JM exiting
*This messages file is stored on a remote syslog server.
From these entries we can see that this is a grid job, so it should be
included in APEL accounting.
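To recover the grid identity of all affected jobs rather than one at a time, the grep chain above can be scripted. This is a minimal sketch (Python 3.8+), assuming the gridinfo and GRAM gatekeeper lines always have exactly the wording seen in this messages excerpt; the script name and the output field names are made up for illustration. It keys on the full GATEKEEPER_JM_ID, timestamp included: grepping only for 0000024347.0000000000 also returns an unrelated job started at 15:35 by a different user, because the gatekeeper PID was reused. It only collects the fields; turning them into grid-jobmap records would depend on that file's exact format, which is not shown here.

```python
"""Hypothetical helper (trace_jobs.py): rebuild the grid identity of Torque jobs
from the CE syslog 'messages' file by following the chain seen above:
  lrms ID -> GRAM_SCRIPT_JOB_ID -> GATEKEEPER_JM_ID -> DN, local account, WMS job ID."""
import re
import sys

SUBMIT_RE = re.compile(r"Submitted job (\S+) to batch system \S+ with ID (\S+)")
GRAM_RE = re.compile(r"GATEKEEPER_JM_ID (\S+) has GRAM_SCRIPT_JOB_ID (\S+)")
DN_RE = re.compile(r"GATEKEEPER_JM_ID (\S+) for (/.*) on (\S+)\s*$")
MAP_RE = re.compile(r"GATEKEEPER_JM_ID (\S+) mapped to (\S+) \((\d+), (\d+)\)")
WMS_RE = re.compile(r"GATEKEEPER_JM_ID (\S+) has EDG_WL_JOBID '([^']*)'")


def trace_jobs(lines):
    """Map each lrms ID to the grid information logged for its gatekeeper job.

    Gatekeeper entries are keyed on the full GATEKEEPER_JM_ID (timestamp included),
    because the PID part alone is reused by later, unrelated jobs.
    """
    lrms_to_gram, gram_to_gk, gk_info = {}, {}, {}
    for line in lines:
        if m := SUBMIT_RE.search(line):
            lrms_to_gram[m.group(2)] = m.group(1)
        elif m := GRAM_RE.search(line):
            gram_to_gk[m.group(2)] = m.group(1)
        elif m := DN_RE.search(line):
            gk_info.setdefault(m.group(1), {}).update(dn=m.group(2), client_ip=m.group(3))
        elif m := MAP_RE.search(line):
            gk_info.setdefault(m.group(1), {}).update(
                local_user=m.group(2), uid=int(m.group(3)), gid=int(m.group(4)))
        elif m := WMS_RE.search(line):
            gk_info.setdefault(m.group(1), {}).update(wms_job_id=m.group(2))

    # Join the three maps; missing links simply leave fields out.
    result = {}
    for lrms_id, gram_id in lrms_to_gram.items():
        gk_id = gram_to_gk.get(gram_id)
        result[lrms_id] = {"gram_id": gram_id, "gatekeeper_id": gk_id,
                           **gk_info.get(gk_id, {})}
    return result


if __name__ == "__main__":
    if len(sys.argv) < 3:
        print("usage: trace_jobs.py MESSAGES_FILE LRMS_ID...", file=sys.stderr)
    else:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
            jobs = trace_jobs(fh)
        for lrms_id in sys.argv[2:]:
            print(lrms_id, jobs.get(lrms_id, "not found in messages"))
```

The list of IDs printed by a missing-jobs check can be fed straight in as the LRMS_ID arguments.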
So I'm wondering why these jobs were not logged in grid-jobmap, and how
I can add them back (if possible).
TIA,
Arnau