Hi David,
You have to check that, from the Worker Nodes, you can ssh to the CE
without a password from all your grid accounts:
[dteam001@my-wn01 dteam001]$ ssh my-ce.pic.es date
Scientific Linux CERN Release 3.0.8 (SL)
Wed Mar 21 09:50:46 CET 2007
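If you want to repeat that test for several hosts, a small helper like the one below could do it. This is only a sketch: the function name check_passwordless_ssh and the SSH_CMD override are made up for illustration and are not part of the EDG tools. It relies on the OpenSSH option BatchMode=yes, which makes ssh fail instead of asking for a password.

```shell
# Hypothetical helper, not part of the EDG tools.
# SSH_CMD can be overridden (e.g. for testing); it defaults to plain ssh.
SSH_CMD=${SSH_CMD:-ssh}

# Prints "OK   <host>" if "ssh <host> date" works without any prompt,
# otherwise prints "FAIL <host>" and returns 1.
check_passwordless_ssh() {
    host="$1"
    # BatchMode=yes disables password prompts, so a missing key or
    # missing hosts.equiv entry shows up as a failure, not a hang.
    if "$SSH_CMD" -o BatchMode=yes -o ConnectTimeout=10 "$host" date >/dev/null 2>&1; then
        echo "OK   $host"
    else
        echo "FAIL $host"
        return 1
    fi
}
```

Run it as each grid account on a WN, for example: check_passwordless_ssh my-ce.pic.es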
Please run this script on your WNs:
/opt/edg/sbin/edg-pbs-knownhosts
and make sure that your CEs are listed in the NODES variable:
[root@td234 root]# grep NODES /opt/edg/etc/edg-pbs-knownhosts.conf
NODES = ce07.pic.es ifaece01.pic.es ce06.pic.es ce04.pic.es ce02.pic.es
ce03.pic.es ce05.pic.es castorsrm.pic.es pbs01.pic.es
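To check this without reading the line by eye, something like the following sketch could help. The function name nodes_missing is hypothetical, and it assumes that the NODES entry sits on a single line of the form "NODES = host1 host2 ..." in the file; if your file differs, adapt the sed expression.

```shell
# Hypothetical helper: print every given host that is NOT listed
# in the NODES entry of the given config file.
# Assumes a single-line entry of the form "NODES = host1 host2 ...".
nodes_missing() {
    conf="$1"
    shift
    # Keep only the value after "NODES =" and squeeze whitespace to single spaces.
    nodes=$(sed -n 's/^NODES[[:space:]]*=[[:space:]]*//p' "$conf" | tr -s '[:space:]' ' ')
    for host in "$@"; do
        case " $nodes " in
            *" $host "*) ;;          # host is listed, nothing to report
            *) echo "$host" ;;       # host is missing from NODES
        esac
    done
}
```

For example: nodes_missing /opt/edg/etc/edg-pbs-knownhosts.conf mallarme.cnb.uam.es
prints the CE name only if it is missing from NODES.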
On the CE side, run the scripts:
/opt/edg/sbin/edg-pbs-knownhosts ; /opt/edg/sbin/edg-pbs-shostsequiv
They take the list of your WNs from the output of the command pbsnodes -a
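If you want to see which WN names that command reports, a filter like this sketch could be used. The function name list_wn_names is made up, and it assumes the usual pbsnodes -a layout in which each node name starts at the beginning of a line and its attributes are indented below it.

```shell
# Hypothetical helper: read "pbsnodes -a" style output on stdin and
# print one node name per line.
# Assumes node names start in the first column and attribute lines
# (state, np, ...) are indented.
list_wn_names() {
    awk '/^[^[:space:]]/ { print $1 }'
}
```

Usage on the CE: pbsnodes -a | list_wn_names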
Cheers
Carlos
David Garcia Aristegui wrote:
> In the logs of our WNs (mallarme.cnb.uam.es is the CE)
>
> /var/log/messages
> Mar 20 19:21:41 somontano1 pbs_mom: sys_copy, command '/usr/bin/scp
> -rpB /var/spool/pbs/spool/78127.malla.OU
> [log in to unmask]:/home/ops002/.globus/.gass_cache/local/md5/f4/10c18e28c7e85c10ae1548668c1329/md5/b6/fb28a2fdc50e9558bf00665b0a96b7/data'
> failed with status=1, giving up after 4 attempts
> Mar 20 19:21:41 somontano1 pbs_mom: req_cpyfile, Unable to copy file
> /var/spool/pbs/spool/78127.malla.OU to
> [log in to unmask]:/home/ops002/.globus/.gass_cache/local/md5/f4/10c18e28c7e85c10ae1548668c1329/md5/b6/fb28a2fdc50e9558bf00665b0a96b7/data
>
> Mar 20 19:21:46 somontano1 pbs_mom: sys_copy, command '/usr/bin/scp
> -rpB /var/spool/pbs/spool/78127.malla.ER
> [log in to unmask]:/home/ops002/.globus/.gass_cache/local/md5/f4/10c18e28c7e85c10ae1548668c1329/md5/01/9f8deea22e3043b0889e135658593f/data'
> failed with status=1, giving up after 4 attempts
> Mar 20 19:21:46 somontano1 pbs_mom: req_cpyfile, Unable to copy file
> /var/spool/pbs/spool/78127.malla.ER to
> [log in to unmask]:/home/ops002/.globus/.gass_cache/local/md5/f4/10c18e28c7e85c10ae1548668c1329/md5/01/9f8deea22e3043b0889e135658593f/data
>
>
> /var/spool/pbs/mom_logs
> 03/20/2007 19:27:20;0002; pbs_mom;n/a;mom_main;hello sent to server
> localhost
> 03/20/2007 19:28:50;0002; pbs_mom;n/a;mom_main;connection to server
> localhost timeout
> 03/20/2007 19:28:50;0002; pbs_mom;n/a;mom_main;hello sent to server
> localhost
> 03/20/2007 19:30:20;0002; pbs_mom;n/a;mom_main;connection to server
> localhost timeout
> 03/20/2007 19:30:20;0002; pbs_mom;n/a;mom_main;hello sent to server
> localhost
>
> Any ideas about this? MAUI is not running properly: sometimes jobs
> remain queued while there are free resources, and they only start to
> run after a pbs_mom restart (I don't know if this is a related problem).
>
> Thank you in advance.
--
=========================================================================
Carlos Borrego Iglesias PIC (Port d'Informació Científica)
tel: +34 93 581 3322 Campus UAB - Edifici D
e-mail: [log in to unmask] E-08193 Bellaterra
=========================================================================
Avis - Aviso - Legal Notice: http://www.ifae.es/legal.html