As I already replied to the relevant ticket:
- No: globus-gma is not used at all in the CREAM CE.
- If the job is still Running according to the WMS/LB although it has
actually finished, the first thing to check is the status of the job
according to CREAM, to understand whether the issue is on the CREAM side
or on the WMS (ICE) side.
For this purpose please check the glite-ce-cream.log* files or use
the glite-ce-job-status command.
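
As a rough aid for the log check above, here is a minimal Python sketch that collects every line mentioning a given CREAM job ID from the glite-ce-cream.log* files. The function name, the log directory argument and the job ID used in the usage note are illustrative assumptions: the directory holding the logs depends on the installation, and the script only does a plain substring search without interpreting the log format.

```python
import glob
import os


def find_job_lines(log_dir, job_id):
    """Return (file name, line) pairs from glite-ce-cream.log* mentioning job_id.

    log_dir is an assumption: pass the directory where your CE keeps its
    CREAM logs. job_id is matched as a plain substring.
    """
    hits = []
    pattern = os.path.join(log_dir, "glite-ce-cream.log*")
    # Sorted so that the current log and its rotated copies come in a stable order.
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as handle:
            for line in handle:
                if job_id in line:
                    hits.append((os.path.basename(path), line.rstrip("\n")))
    return hits
```

For example, find_job_lines(log_dir, "CREAM123456789") (a made-up job ID) returns the matching lines, which can then be compared with what the WMS/LB reports for the same job.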
Cheers, Massimo
On Mon, 6 Sep 2010, Alessandro Paolini wrote:
> Hi,
> on some CEs (lcg or CREAM), jobs belonging to just one user never end,
> for example:
>
> https://mon-it.cnaf.infn.it/myegee/wizardstatushistory/index?datasource=statushistory&start_type=yesterday&start_date=02%2F04%2F2009&end_type=now&end_date=02%2F11%2F2009&site=on&site_527=on&profilesource=3&service=on&service_1=on#detail%3Fresource_id%3D475%26profile_id%3D3%26service_id%3D1%26time%3D1283752909
>
> according to the logging-info output the job finished with exitcode=0, but
> it seems that this information never reaches the WMS
>
> ---
> Event: Done
> - Arrived = Mon Sep 6 00:04:40 2010 CEST
> - Exit code = 0
> - Host = unict-diit-wn-21.ct.pi2s2.it
> - Source = LRMS
> - Status code = OK
> - Timestamp = Mon Sep 6 00:04:16 2010 CEST
> - User = /C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=giuseppe misurelli
> ---
> Event: Accepted
> - Arrived = Mon Sep 6 00:02:23 2010 CEST
> - From = JobController
> - From host = localhost
> - From instance = unavailable
> - Host = egee-wms-01.cnaf.infn.it
> - Local jobid = 2958204
> - Source = LogMonitor
> - Src instance = unique
> - Timestamp = Mon Sep 6 00:02:23 2010 CEST
> - User = /C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=giuseppe misurelli/CN=proxy/CN=proxy/CN=proxy/CN=proxy/CN=proxy
> ---
> Event: Transfer
> - Arrived = Mon Sep 6 00:02:29 2010 CEST
> - Dest host = unict-diit-ce-01.ct.pi2s2.it:2119/jobmanager-lcglsf
> - Dest instance = /var/glite/logmonitor/CondorG.log/CondorG.1283708907.log
> - Dest jobid = unavailable
> - Destination = LRMS
> - Host = egee-wms-01.cnaf.infn.it
> - Reason = Job successfully submitted to Globus
> - Result = OK
> - Source = LogMonitor
> - Src instance = unique
> - Timestamp = Mon Sep 6 00:02:29 2010 CEST
> - User = /C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=giuseppe misurelli/CN=proxy/CN=proxy/CN=proxy/CN=proxy/CN=proxy
> ---
> Event: Cancel
> - Arrived = Mon Sep 6 05:33:01 2010 CEST
> - Host = egee-wms-01.cnaf.infn.it
> - Reason = Cancelled by user
> - Source = NetworkServer
> - Src instance = https://egee-wms-01.cnaf.infn.it:7443/glite_wms_wmproxy_server
> - Status code = REQ
> - Timestamp = Mon Sep 6 05:33:01 2010 CEST
> - User = /C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=giuseppe misurelli
>
>
> Same situation for a cream CE (infn-ce-01.ct.pi2s2.it):
> https://mon-it.cnaf.infn.it/myegee/wizardstatushistory/index?datasource=statushistory&start_type=yesterday&start_date=02%2F04%2F2009&end_type=now&end_date=02%2F11%2F2009&site=on&site_528=on&profilesource=3&service=on&service_3=on#detail%3Fresource_id%3D4789%26profile_id%3D3%26service_id%3D3%26time%3D1283705686
>
> ---
> Event: RegJob
> - Arrived = Sun Sep 5 07:58:18 2010 CEST
> - Host = glite-rb-00.cnaf.infn.it
> - Ns = https://glite-rb-00.cnaf.infn.it:7443/glite_wms_wmproxy_server
> - Nsubjobs = 0
> - Source = NetworkServer
> - Src instance = https://glite-rb-00.cnaf.infn.it:7443/glite_wms_wmproxy_server
> - Timestamp = Sun Sep 5 07:58:18 2010 CEST
> - User = /C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=giuseppe misurelli
> ---
> [...]
>
> ---
> Event: ReallyRunning
> - Arrived = Sun Sep 5 07:59:42 2010 CEST
> - Host = infn-wn-07.ct.pi2s2.it
> - Source = LRMS
> - Timestamp = Sun Sep 5 07:59:42 2010 CEST
> - User = /C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=giuseppe misurelli
> ---
> Event: Done
> - Arrived = Sun Sep 5 08:00:48 2010 CEST
> - Exit code = 0
> - Host = infn-wn-07.ct.pi2s2.it
> - Reason = job completed
> - Source = LRMS
> - Status code = OK
> - Timestamp = Sun Sep 5 08:00:47 2010 CEST
> - User = /C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=giuseppe misurelli
> ---
> Event: Cancel
> - Arrived = Sun Sep 5 13:33:16 2010 CEST
> - Host = glite-rb-00.cnaf.infn.it
> - Reason = Cancelled by user
> - Source = NetworkServer
> - Src instance = https://glite-rb-00.cnaf.infn.it:7443/glite_wms_wmproxy_server
> - Status code = REQ
> - Timestamp = Sun Sep 5 13:33:16 2010 CEST
> - User = /C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=giuseppe misurelli
>
> ---
>
> Is globus-gma always the culprit? Or is there something odd in the
> batch system configuration? And why does it happen for just one user?
>
> thanks,
> Alessandro
>
> --
> Dr. Alessandro Paolini
> INFN - CNAF
> Viale Berti Pichat 6/2
> 40127 Bologna
> Italy
> tel: +39 051 6092723
> fax: +39 051 6092916
> ICQ: 192172027
> skype: alex.paolini
> **********************
> "I believe in the power of laughter and tears"
> "as an antidote to hatred and terror"
> "a day without a smile"
> "is a day lost" >>> Charlie Chaplin
>
\\\|///
\\ ~ ~ //
(/ @ @ /)
-------oOOo-(_)-oOOo----------------------------------
Massimo Sgaravatto
INFN Sezione di Padova
Via Marzolo, 8
35131 Padova - Italy
Tel: ++39 0498275908 Fax: ++39 0498275952
oooO E-mail: massimo.sgaravatto [at] pd.infn.it
( ) Oooo Home page: http://www.pd.infn.it/~sgaravat
--------\ (----( )----------------------------------
\_) ) /
(_/