On 13/05/2010 10:07, Tomas Kouba wrote:
> Hello,
>
> occasionally we get the following error in SAM and Nagios tests:
>
> Event: Done
> - Arrived = Thu May 13 08:40:35 2010 CEST
> - Exit code = 0
> - Host = salix03.farm.particle.cz
> - Reason = Cannot move ISB (retry_copy ${globus_transfer_cmd}
> gsiftp://wms209.cern.ch:2811/var/glite/SandboxDir/QE/https_3a_2f_2fwms209.cern.ch_3a9000_2fQEPNaNZQNgpW67ZWLTTnFg/input/nagrun.sh
> file:///scratch/home_pool/sgmops004/home_cream_810327382/CREAM810327382/nagrun.sh):
>
> error: globus_ftp_client: the server responded with an error
> 500 Command failed. : globus_xio: An end of file occurred
> - Source = LRMS
> - Status code = FAILED
> - Timestamp = Thu May 13 08:40:34 2010 CEST
> - User = /DC=ch/DC=cern/OU=Organic
> Units/OU=Users/CN=wlapka/CN=623537/CN=Wojciech Lapka
>
>
> I do not understand the error, mainly if it is a problem on our side (WN
> or cream CE itself?) or on the side of WMS
> submitting the job (wms209.cern.ch).
> Can somebody shed some light on it for me?
>
> Thank you for any help.
>
> --
> Tomas Kouba
>    
Hi Tomas,
it seems that wms209.cern.ch is overloaded by too many FTP connections.
Indeed, the SAM history at
https://lcg-sam.cern.ch:8443/sam/sam.py?funct=ShowHistory&option=old&sensors=CREAMCE&vo=ops&nodename=cream1.farm.particle.cz
shows that several jobs could not be submitted because of this:

+ glite-wms-job-submit -a --vo ops -o testjob.jid testjob.jdl
Warning - --vo option ignored
Connecting to the service 
https://wms206.cern.ch:7443/glite_wms_wmproxy_server
Warning - Unable to register the job to the service: 
https://wms206.cern.ch:7443/glite_wms_wmproxy_server
System load is too high:
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
Threshold for FTP Connection: 300 => Detected value for FTP Connection: 424
Method: jobRegister
Connecting to the service 
https://wms209.cern.ch:7443/glite_wms_wmproxy_server
Warning - Unable to register the job to the service: 
https://wms209.cern.ch:7443/glite_wms_wmproxy_server
System load is too high:
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
Threshold for FTP Connection: 300 => Detected value for FTP Connection: 413
Method: jobRegister
Error - Operation failed
Unable to find any endpoint where to perform service request
+ set +x

So the "Cannot move ISB" error is probably due to the overloaded WMS (wms209.cern.ch).
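
The load-limit lines in the submit output above can be picked out mechanically. What follows is only a minimal sketch, assuming the message format is exactly the one in the log; the function name find_overloaded and the variable names are hypothetical and not part of any gLite tool.

```python
import re

# Load-limit line, in the format seen in the log above (assumed stable).
LIMIT_RE = re.compile(
    r"Threshold for (?P<what>.+?): (?P<limit>\d+) => "
    r"Detected value for (?P=what): (?P<value>\d+)"
)
# WMProxy endpoint URL, as printed before each warning.
SERVICE_RE = re.compile(r"https://\S+/glite_wms_wmproxy_server")


def find_overloaded(output):
    """Return (endpoint, resource, limit, value) for each exceeded threshold."""
    results = []
    endpoint = None
    for line in output.splitlines():
        service = SERVICE_RE.search(line)
        if service:
            # Remember which WMS the following messages refer to.
            endpoint = service.group(0)
            continue
        limit = LIMIT_RE.search(line)
        if limit and int(limit.group("value")) > int(limit.group("limit")):
            results.append((endpoint, limit.group("what"),
                            int(limit.group("limit")), int(limit.group("value"))))
    return results


# Lines copied from the SAM log above.
sample = """\
Connecting to the service
https://wms206.cern.ch:7443/glite_wms_wmproxy_server
System load is too high:
Threshold for FTP Connection: 300 => Detected value for FTP Connection: 424
Method: jobRegister
Connecting to the service
https://wms209.cern.ch:7443/glite_wms_wmproxy_server
System load is too high:
Threshold for FTP Connection: 300 => Detected value for FTP Connection: 413
Method: jobRegister
"""

for endpoint, what, limit, value in find_overloaded(sample):
    print(f"{endpoint}: {what} {value} > {limit}")
```

On this log it reports both wms206 and wms209 as over the FTP connection threshold, which is consistent with the transfer from wms209 failing.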

As for Nagios, your CE is passing the tests now:
https://sam-ce-roc.cern.ch/nagios/cgi-bin/status.cgi?host=cream1.farm.particle.cz

although there have been several failures in the past hours:
https://sam-ce-roc.cern.ch/nagios/cgi-bin/history.cgi?host=cream1.farm.particle.cz
Unfortunately we cannot see the error details there.

In any case, your CE seems to be working fine to me. Submission is enabled:

$ glite-ce-allowed-submission cream1.farm.particle.cz:8443
2010-05-13 15:01:29,932 WARN - No configuration file suitable for 
loading. Using built-in configuration
Job Submission to this CREAM CE is enabled

and direct job submission works too (I tried as a dteam user):

$ glite-ce-job-submit -a -r cream1.farm.particle.cz:8443/cream-pbs-gridtest sleep.jdl
2010-05-13 15:10:31,724 WARN - No configuration file suitable for 
loading. Using built-in configuration
https://cream1.farm.particle.cz:8443/CREAM521588138

and after a bit:

$ glite-ce-job-status https://cream1.farm.particle.cz:8443/CREAM521588138
2010-05-13 15:13:08,963 WARN - No configuration file suitable for 
loading. Using built-in configuration

******  JobID=[https://cream1.farm.particle.cz:8443/CREAM521588138]
         Status        = [IDLE]

$ glite-ce-job-status https://cream1.farm.particle.cz:8443/CREAM521588138
2010-05-13 15:17:03,213 WARN - No configuration file suitable for 
loading. Using built-in configuration

******  JobID=[https://cream1.farm.particle.cz:8443/CREAM521588138]
         Status        = [DONE-OK]
         ExitCode      = [0]

where the jdl is:

$ less sleep.jdl
[
executable="/bin/sleep";
arguments="1";
]
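
If you want to repeat this check from a script, the glite-ce-job-status output shown above can be parsed for the Status field. This is only a rough sketch: it assumes the "Name = [value]" layout seen in the output above, and the set of final states is my assumption about the CREAM job states, not something taken from this test. The names parse_status and is_finished are hypothetical.

```python
import re

# "Name = [value]" pairs, as they appear in the status output above.
FIELD_RE = re.compile(r"(\w+)\s*=\s*\[(.*?)\]")

# Assumed final CREAM job states; adjust if your CE reports others.
FINAL_STATES = {"DONE-OK", "DONE-FAILED", "CANCELLED", "ABORTED"}


def parse_status(output):
    """Collect the bracketed fields of a glite-ce-job-status report into a dict."""
    fields = {}
    for line in output.splitlines():
        match = FIELD_RE.search(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def is_finished(fields):
    """True once the job has reached one of the assumed final states."""
    return fields.get("Status") in FINAL_STATES


# Text copied from the second status query above.
sample_done = """\
2010-05-13 15:17:03,213 WARN - No configuration file suitable for
loading. Using built-in configuration

******  JobID=[https://cream1.farm.particle.cz:8443/CREAM521588138]
         Status        = [DONE-OK]
         ExitCode      = [0]
"""

fields = parse_status(sample_done)
print(fields["Status"], is_finished(fields))
```

A wrapper could call glite-ce-job-status in a loop and stop once is_finished returns True, then check ExitCode.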

Cheers,
Alessandro

-- 
Dr. Alessandro Paolini
INFN - CNAF
Viale Berti Pichat 6/2
40127 Bologna
Italy
tel: +39 051 6092723
fax: +39 051 6092916
ICQ: 192172027
skype: alex.paolini
**********************
"I believe in the power of laughter and tears"
    "as an antidote to hatred and terror"
         "a day without a smile"
              "is a day wasted"   Charlie Chaplin