Ricardo
Yup - it's working fine now.
Trevor
Dr Trevor Daniels
c/o CCLRC eSC Department Phone: (+44)|(0) 1235 778093
Rutherford Appleton Laboratory Fax: (+44)|(0) 1235 446626
Chilton, DIDCOT, Oxon, OX11 0QX, UK Email: [log in to unmask]
The contents of this email are sent in confidence for the use of the
intended recipient only. If you are not one of the intended recipients do
not take action on it or show it to anyone else, but return this email to
the sender and delete your copy of it.
> -----Original Message-----
> From: Ricardo Graciani [mailto:[log in to unmask]]
> Sent: Tuesday, December 16, 2003 8:07 PM
> To: [log in to unmask]
> Subject: Re: [LCG-ROLLOUT] GOC Report 15 Dec 2003
>
>
> Hi,
>
> We have been running the new kernel (2.4.20-24.7) on our CE at UB
> since yesterday morning. The problem was related to the kernel's APM
> display-blanking option, so we recompiled the kernel without this
> option. Everything looks stable now. Could you please confirm?
>
> Regards
>
> Ricardo
>
>
>
> On Mon, 15 Dec 2003, Ricardo Graciani wrote:
>
> > Hi,
> >
> > We have observed some problems with our CE at UB since the last
> > update, but we are not sure whether they are related to it.
> >
> > With the new kernel (2.4.20-24.7), the CE hangs 8 to 15 minutes
> > after a restart.
> >
> > We went back to the kernel we were running before the update,
> > 2.4.22 (we have been using this version from the beginning because
> > some of our NICs were not supported by earlier versions). It looked
> > fine, but we have already had 2 crashes after about 1 day of running.
> >
> > There are no error messages in any log file, and neither Ganglia nor
> > psacc detected any unusual system activity before the crashes (i.e. no
> > increase in memory use, number of running processes, etc.).
> >
> > We are looking into the problem, so our CE will be unstable for
> > some time.
> >
> > Regards,
> >
> >
> > Ricardo
> >
> > On Mon, 15 Dec 2003, Daniels, T (Trevor) wrote:
> >
> > > The RAL RB stopped working just after 15:00 on Sun 14 Dec, and the
> > > Budapest RB stopped working just after 05:00, also on Sun 14 Dec. In
> > > both cases the cause appears to be the network server. Please
> > > restart them.
> > >
> > > Jobs have never executed successfully at 2 sites, whether submitted
> > > via Globus or via an RB:
> > >
> > > CSCS
> > > IN2P3
> > >
> > > Both Globus and RB-submitted jobs are failing at two sites:
> > >
> > > Budapest
> > > Globus:
> > > GRAM Job submission failed because the connection to the server failed (check host and port) (error code 12)
> > > via CERN RB:
> > > Current Status: Aborted
> > > Status Reason: Cannot plan: BrokerHelper: no compatible resources
> > > reached on: Mon Dec 15 09:01:11 2003
> > >
> > > UB
> > > Globus:
> > > GRAM Job submission failed because the connection to the server failed (check host and port) (error code 12)
> > > via CERN RB:
> > > Current Status: Submitted [all jobs stuck since just after 07:00 on Fri 12 Dec]
> > > reached on: Mon Dec 15 09:13:44 2003
> > >
> > > Globus jobs work, but jobs submitted via the CERN RB fail at 4 sites:
> > >
> > > BNL
> > > **** Warning: API_NATIVE_ERROR ****
> > > Error while calling the "Status:getStatus" native api
> > > Unable to retrieve the status for:
> > > https://lxshare0380.cern.ch:9000/7t4k143jbL5UtxbIYabBg
> > > edg_wll_JobStatus: No such file or directory: Query returned no result.
> > > [This is a new error - we'll investigate further]
> > >
> > > CNAF
> > > Current Status: Aborted
> > > Status Reason: Cannot plan: BrokerHelper: no compatible resources
> > > [last job to run successfully was at 05:00 on Sun 14 Dec]
> > >
> > > FZK
> > > Current Status: Scheduled
> > > Status Reason: Job successfully submitted to Globus
> > > Destination: hik-lcg-ce.fzk.de:2119/jobmanager-lcgpbs-short
> > > [no jobs have run since just after 18:00 on Sat 13 Dec]
> > >
> > > NIKHEF
> > > Current Status: Aborted
> > > Status Reason: Cannot plan: BrokerHelper: no compatible resources
> > > [no jobs submitted via an RB have ever worked; we need to investigate
> > > further]
> > >
> > > Trevor
> > >
> >
> > --
> >
> > ================================================================================
> >
> > Ricardo Graciani Diaz
> >
> > Dept. Estructura i Constituents de la Materia
> > Facultat de Fisica Tel: +34 93 403 7062
> > Universitat de Barcelona Fax: +34 93 402 1198
> >
> > Diagonal, 647
> > E-08028 Barcelona
> >
> >
> > ================================================================================
> >
>