Hi.
Everything worked properly until the morning of 9.03.2007.
Then this message arrived:
Following packages have been upgraded on your system:
maui (3.2.6p11-2_SL30X => 3.2.6p17-1_sl3)
maui-client (3.2.6p11-2_SL30X => 3.2.6p17-1_sl3)
maui-server (3.2.6p11-2_SL30X => 3.2.6p17-1_sl3)
Shutting down MAUI Scheduler: ERROR: lost connection to server
ERROR: cannot request service (status)
[FAILED]
Starting MAUI Scheduler: [ OK ]
--
message sent by apt-autoupdate system from ceitep.itep.ru
see: /etc/sysconfig/apt-autoupdate for options
Since this upgrade, MAUI and PBS_SERVER can no longer communicate with each other.
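The host check suggested in the quoted reply below (SERVERHOST, ADMINHOST, RMHOST and RMSERVER in maui.cfg) could be scripted roughly as follows. This is only a sketch: the function name check_maui_hosts is made up for illustration, and it assumes each setting sits on its own line in the form "KEY value" (an index suffix such as RMHOST[0] is tolerated).

```shell
#!/bin/sh
# check_maui_hosts CFG EXPECTED_HOST   (hypothetical helper, not part of MAUI)
# Prints "OK KEY value" or "MISMATCH KEY value" for each host setting found
# in CFG, and returns non-zero if any value differs from EXPECTED_HOST.
check_maui_hosts() {
    awk -v want="$2" '
        BEGIN { bad = 0 }
        # Only the keys named in the thread; commented-out lines start
        # with "#" in field 1 and therefore never match.
        $1 ~ /^(SERVERHOST|ADMINHOST|RMHOST|RMSERVER)/ {
            if ($2 == want) {
                print "OK " $1 " " $2
            } else {
                print "MISMATCH " $1 " " $2
                bad = 1
            }
        }
        END { exit bad }
    ' "$1"
}
```

It would be called as, for example, check_maui_hosts /var/spool/maui/maui.cfg ceitep.itep.ru (the maui.cfg location is an assumption and differs between installations). The host name to compare against can be read from the output of qmgr -c 'print server' or qstat -q.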
Regards, Yevgeniy.
> Hi
>
> Thanks ... what do the maui logs say? And does torque say anything
> about a failed attempt to connect from maui? What does the torque
>
> qmgr -c
>
> print about administrator rights, is the maui user allowed to schedule
> jobs? (via operator / manager / acl_hosts in torque)
>
> Also, does maui know the correct torque server host? (via maui
> SERVERHOST, ADMINHOST, RMHOST and RMSERVER in maui.cfg)
>
> JT
>
>
> Y.Lyublev wrote:
> > Hi.
> >
> > PBS works correctly.
> > [root@ceitep root]# !qs
> > qstat -q
> >
> > server: ceitep.itep.ru
> >
> > Queue Memory CPU Time Walltime Node Run Que Lm State
> > ---------------- ------ -------- -------- ---- --- --- -- -----
> > atlas -- 120:00:0 140:00:0 -- 6 0 -- E R
> > alice -- 120:00:0 140:00:0 -- 0 0 -- E R
> > lhcb -- 120:00:0 140:00:0 -- 7 0 -- E R
> > cms -- 120:00:0 140:00:0 -- 0 0 -- E R
> > dteam -- 48:00:00 72:00:00 -- 0 0 -- E R
> > photon -- 48:00:00 72:00:00 -- 0 0 -- E R
> > ops -- 48:00:00 72:00:00 -- 0 0 -- E R
> > ----- -----
> > 13 0
> >
> > Jobs are running and finishing normally.
> > [root@ceitep root]# last -10
> > alice008 ftpd19737 wn62.itep.ru Mon Mar 12 14:32 - 14:32 (00:00)
> > alice008 ftpd17847 wn62.itep.ru Mon Mar 12 14:31 - 14:31 (00:00)
> > alice008 ftpd17840 wn62.itep.ru Mon Mar 12 14:31 - 14:31 (00:00)
> > alice008 ftpd17819 wn62.itep.ru Mon Mar 12 14:31 - 14:31 (00:00)
> > alice010 ftpd17135 wn63.itep.ru Mon Mar 12 14:30 - 14:30 (00:00)
> > alice008 ftpd16304 wn62.itep.ru Mon Mar 12 14:30 - 14:30 (00:00)
> > alice010 ftpd15566 wn63.itep.ru Mon Mar 12 14:29 - 14:29 (00:00)
> > root pts/4 vitep2.itep.ru Mon Mar 12 14:24 still logged in
> > ops001 ftpd8216 wn50.itep.ru Mon Mar 12 14:23 - 14:23 (00:00)
> > cmssgm ftpd5590 wn63.itep.ru Mon Mar 12 14:21 - 14:21 (00:00)
> >
> > The MAUI parameters in use for the queues are:
> > NODEALLOCATIONPOLICY CPULOAD
> > GROUPCFG[alice] MAXPROC=20
> > GROUPCFG[atlas] MAXPROC=20
> > GROUPCFG[cms] MAXPROC=20
> > GROUPCFG[lhcb] MAXPROC=20
> >
> > But the MAUI commands themselves do not work:
> > [root@ceitep root]# showq
> > ERROR: lost connection to server
> > ERROR: cannot request service (status)
> >
> > Regards, Yevgeniy.
> >
> >
> >> Hi,
> >>
> >> after this:
> >>
> >> Y.Lyublev wrote:
> >>
> >>> [root@testbed01 root]# /etc/init.d/pbs_server restart
> >>> Shutting down TORQUE Server: [ OK ]
> >>> Starting TORQUE Server: [ OK ]
> >> now try e.g.
> >>
> >> ps uaxw | grep pbs_server
> >>
> >> and
> >>
> >> qstat -q
> >>
> >> and
> >>
> >> qstat -f
> >>
> >> and look in /var/spool/pbs/server_logs. Just the fact that the startup
> >> was successful, doesn't mean that the server keeps running for more than
> >> a few milliseconds after it "successfully" starts up. "lost connection
> >> to server" sounds like either the maui user is not authenticated to
> >> torque, OR that the server has died immediately after startup (or has
> >> hung) ... sometimes there is a "bad job" in
> >>
> >> /var/spool/pbs/server_priv/jobs
> >>
> >> that is causing the whole thing to hang ...
> >>
> >> JT
> >>
> >>> [root@testbed01 root]# /etc/init.d/maui restart
> >>> Shutting down MAUI Scheduler: ERROR: lost connection to server
> >>> ERROR: cannot request service (status)
> >>> [FAILED]
> >>> Starting MAUI Scheduler: [ OK ]
> >>> [root@testbed01 root]# /etc/init.d/maui restart
> >>> Shutting down MAUI Scheduler: ERROR: lost connection to server
> >>> ERROR: cannot request service (status)
> >>> [FAILED]
> >>> Starting MAUI Scheduler: [ OK ]
> >>>
> >>>
> >>>> Steve
> >>>>> Yes.
> >>>>> For gLite CE -
> >>>>> $ configure_node site-info.def gliteCE TORQUE_server
> >>>>>
> >>>>> For LCG CE -
> >>>>> $ configure_node site-info.def CE_torque
> >>>>>
> >>>>>> Steve
> >>>>>>
> >>>>>>> 2. The LFC server is not working correctly:
> >>>>>>> The LFC log on the LFC server contains:
> >>>>>>> [root@glwms ORIG]# grep error /var/log/lfc*/log
> >>>>>>> /var/log/lfc/log:03/12 04:43:43 2948,0 sendrep: NS002 - send error : Broken pipe
> >>>>>>> /var/log/lfc/log:03/12 05:44:06 2948,0 sendrep: NS002 - send error : Broken pipe
> >>>>>>> /var/log/lfc/log:03/12 06:45:40 2948,0 sendrep: NS002 - send error : Broken pipe
> >>>>>>> /var/log/lfc/log:03/12 07:43:56 2948,0 sendrep: NS002 - send error : Broken pipe
> >>>>>>> /var/log/lfc/log:03/12 08:49:12 2948,0 sendrep: NS002 - send error : Broken pipe
> >>>>>>> /var/log/lfc/log:03/12 09:03:50 2948,0 Cns_insert_rep_entry: mysql_query error: Unknown column 'CTIME' in 'field list'
> >>>>>>> /var/log/lfc/log:03/12 09:09:35 24056,0 Cns_list_rep_entry: mysql_query error: Unknown column 'CTIME' in 'field list'
> >>>>>>> /var/log/lfc/log:03/12 09:09:44 24056,0 Cns_list_rep_entry: mysql_query error: Unknown column 'CTIME' in 'field list'
> >>>>>>> /var/log/lfc/log:03/12 09:09:47 24056,0 Cns_list_rep_entry: mysql_query error: Unknown column 'CTIME' in 'field list'
> >>>>>>> /var/log/lfc/log:03/12 09:09:50 24056,0 Cns_list_rep_entry: mysql_query error: Unknown column 'CTIME' in 'field list'
> >>>>>>> /var/log/lfc/log:03/12 09:09:58 24056,0 Cns_list_rep_entry: mysql_query error: Unknown column 'CTIME' in 'field list'
> >>>>>>> /var/log/lfc/log:03/12 09:10:01 24056,0 Cns_list_rep_entry: mysql_query error: Unknown column 'CTIME' in 'field list'
> >>>>>>> /var/log/lfc/log:03/12 09:10:20 24056,0 Cns_list_rep_entry: mysql_query error: Unknown column 'CTIME' in 'field list'
> >>>>>>> /var/log/lfc/log:03/12 09:10:20 24056,0 Cns_list_rep_entry: mysql_query error: Unknown column 'CTIME' in 'field list'
> >>>>>>> /var/log/lfc/log:03/12 09:10:20 24056,0 Cns_list_rep_entry: mysql_query error: Unknown column 'CTIME' in 'field list'
> >>>>>>> /var/log/lfc/log:03/12 09:14:13 24056,0 Cns_insert_rep_entry: mysql_query error: Unknown column 'CTIME' in 'field list'
> >>>>>>>
> >>>>>>> And on the UI, a user gets errors when working with the SE through LFC:
> >>>>>>> [lublev@uiitep TEST]$ lcg-cr -v -d se2.itep.ru -l /grid/alice/my_dir/fileSE22.dat --vo alice file:/home/users/lab240/lublev/JOBS/SC3/file.dat
> >>>>>>> Using grid catalog type: lfc
> >>>>>>> Using grid catalog : glwms.itep.ru
> >>>>>>> Source URL: file:/home/users/lab240/lublev/JOBS/SC3/file.dat
> >>>>>>> File size: 1073741824
> >>>>>>> VO name: alice
> >>>>>>> Destination specified: se2.itep.ru
> >>>>>>> Destination URL for copy: gsiftp://se2.itep.ru/se2.itep.ru:/storage/alice/2007-03-12/file91e66140-81f2-4ca5-ae44-6b86c31d1832.523505.0
> >>>>>>> # streams: 1
> >>>>>>> # set timeout to 0 seconds
> >>>>>>> Alias registered in Catalog: lfn:/grid/alice/my_dir/fileSE22.dat
> >>>>>>> 1059061760 bytes 24352.58 KB/sec avg 22341.82 KB/sec inst
> >>>>>>> Transfer took 43420 ms
> >>>>>>> Internal error
> >>>>>>> Could not register in Catalog the URL srm://se2.itep.ru/dpm/itep.ru/home/alice/generated/2007-03-12/file91e66140-81f2-4ca5-ae44-6b86c31d1832
> >>>>>>> lcg_cr: Communication error on send
> >>>>>>>
> >>>>>>>
> >>>>>>> [lublev@uiitep TEST]$ lcg-del -s se2.itep.ru --vo alice lfn:/grid/alice/my_dir/fileSE22.dat
> >>>>>>> Internal error
> >>>>>>> lcg_del: Communication error on send
> >>>>>>>
> >>>>>>>
> >>>>>>> [lublev@uiitep TEST]$ lfc-rm -f alice lfn:/grid/alice/my_dir/
> >>>>>>> fileSE22.dat
> >>>>>>> alice: invalid path
> >>>>>>> send2nsd: NS009 - fatal configuration error: Host unknown: lfn
> >>>>>>> lfn:/grid/alice/my_dir/:fileSE22.dat Host not known
> >>>>>>>
> >>>>>>>
> >>>>>>> Any suggestion on how to proceed?
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> Yevgeniy.
> >>>>>> --
> >>>>>> Steve Traylen
> >>>>>> CERN, IT-GD-OPS.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>> --
> >>>> Steve Traylen
> >>>> CERN, IT-GD-OPS.
> >>>>
> >>>>
> >>>>
> >>>>