Hi Joe,
Here is a breakdown of the space used in /opt on a WN.
Total: 3.4 GB
For the experiments:
364M alice
227M alien
1.1G atlas
1.2G cms
------------
experiments total ~ 2.9 GB
and for the middleware:
14M classads
369M edg
19M gcc-2.95.2.1
38M gcc-3.2.2
107M globus
2.9M gpt
-----
~550MB
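
The subtotals above can be re-checked with a few lines of Python. This is only a sketch: the sizes are copied from the listing, the names `EXPERIMENTS`, `MIDDLEWARE`, `to_mb` and `total_mb` are made up for illustration, and 1G is treated as 1000M, which is an approximation of what du reports.

```python
# Sizes copied from the /opt listing above (M = megabytes, G = gigabytes).
# Assumption: 1G is treated as 1000M; du -h really uses powers of 1024,
# so the results are approximate.
EXPERIMENTS = {"alice": "364M", "alien": "227M", "atlas": "1.1G", "cms": "1.2G"}
MIDDLEWARE = {
    "classads": "14M",
    "edg": "369M",
    "gcc-2.95.2.1": "19M",
    "gcc-3.2.2": "38M",
    "globus": "107M",
    "gpt": "2.9M",
}


def to_mb(size: str) -> float:
    """Convert a size string such as '364M' or '1.1G' to megabytes."""
    factor = {"M": 1.0, "G": 1000.0}[size[-1]]
    return float(size[:-1]) * factor


def total_mb(sizes: dict) -> float:
    """Sum a {name: size string} mapping, in megabytes."""
    return sum(to_mb(s) for s in sizes.values())


if __name__ == "__main__":
    exp = total_mb(EXPERIMENTS)
    mw = total_mb(MIDDLEWARE)
    print(f"experiments: {exp / 1000:.1f} GB")
    print(f"middleware:  {mw:.0f} MB")
    print(f"/opt total:  {(exp + mw) / 1000:.1f} GB")
```

With the figures listed, the rounded results agree with the totals quoted in this mail (about 2.9 GB, 550 MB and 3.4 GB).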
On a worker node the total space used in / is 4.9 GB. I think you stated
in the beginning that your nodes have 75 GB disks, so the space should be
sufficient. Strange that only 650 MB is free.
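
The figures above (4.9 GB used, 650 MB free) add up to roughly 5.5 GB, which would fit a small / partition better than a whole 75 GB disk. That is only a guess from the numbers in this thread, not something confirmed here. A minimal Python sketch for checking it on a WN, using only the standard library (the function name and the default list of paths are illustrative):

```python
import os
import shutil


def free_space_report(paths=("/", "/opt")):
    """Return {path: (used_gb, free_gb)} for every listed path that exists.

    shutil.disk_usage reports on the filesystem that contains the path,
    so "/" and "/opt" give the same numbers when /opt is not a separate
    partition.
    """
    report = {}
    for path in paths:
        if not os.path.exists(path):
            continue  # skip paths that are absent on this host
        usage = shutil.disk_usage(path)
        report[path] = (usage.used / 1e9, usage.free / 1e9)
    return report


if __name__ == "__main__":
    for path, (used, free) in free_space_report().items():
        print(f"{path}: {used:.1f} GB used, {free:.2f} GB free")
```

If / turns out to be only a few GB in size, the rest of the disk is presumably allocated to other partitions.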
markus
On Thu, 18 Sep 2003, Joe Kaiser wrote:
> On Thu, 2003-09-18 at 02:05, Markus SCHULZ wrote:
> > Hi Joe,
> > how much disk space is left on the system?
>
> 650MB
>
> > All other sites use the same rpmlists and they install.
> >
> > In this release the experiment specific software was distributed with the
> > release (which was a mistake).
> >
> > This doesn't do any harm but in your case there seems to be a problem.
> >
> > To make sure that you remove all of the experiment specific RPMS check:
> >
> > 1) In the CE-rpm file the following section has to be commented out:
> >
> > /* Application software. Activate only if you plan to run jobs on the CE
> > */
> > /* #include "apps_common-rpm.h" */
> > /* #include "CMS-rpm.h" */
> > /* #include "Atlas-rpm.h" */
> > /* #include "Alice-rpm.h" */
> > /* #include "LHCb-rpm.h" */
> >
> > 2) In the WN-rpm file comment out the following section (which is currently
> > enabled)
> >
> > /* Common experiment software */
> > #include "apps_common-rpm.h"
> > #include "CMS-rpm.h"
> > #include "Atlas-rpm.h"
> > #include "Alice-rpm.h"
> > #include "LHCb-rpm.h"
> >
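
Step 2 above could also be scripted. The following is a minimal sketch, not part of the LCG release: it assumes the WN-rpm file contains the `#include "..."` lines exactly as quoted above and wraps them in `/* ... */`, the same way they appear in the CE-rpm section. The function name and the header tuple are illustrative.

```python
import re

# Header names taken from the WN-rpm section quoted above.
EXPERIMENT_HEADERS = (
    "apps_common-rpm.h",
    "CMS-rpm.h",
    "Atlas-rpm.h",
    "Alice-rpm.h",
    "LHCb-rpm.h",
)

# Matches an active include line such as:  #include "CMS-rpm.h"
INCLUDE_RE = re.compile(r'\s*#include\s+"([^"]+)"\s*$')


def comment_out_includes(text, headers=EXPERIMENT_HEADERS):
    """Wrap active #include lines for the given headers in /* ... */.

    Lines that are already commented out, and includes of other
    headers, are returned unchanged.
    """
    out = []
    for line in text.splitlines():
        match = INCLUDE_RE.match(line)
        if match and match.group(1) in headers:
            out.append(f"/* {line.strip()} */")
        else:
            out.append(line)
    result = "\n".join(out)
    return result + "\n" if text.endswith("\n") else result


if __name__ == "__main__":
    sample = '/* Common experiment software */\n#include "CMS-rpm.h"\n'
    print(comment_out_includes(sample), end="")
```

Running it twice gives the same result, since a line that is already wrapped in a comment no longer matches. The profiles still have to be recreated and updaterpms run afterwards, as described in the quoted mail.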
>
> Okay, I will do this....
>
> > Create the profiles and run the updaterpms.
> > In case you are short of disk space this should help a bit.
> >
> > markus
> >
> >
> > On Wed, 17 Sep 2003, Joe Kaiser wrote:
> >
> > > After comparing this list to my WN, there are many many rpms that are
> > > missing from WN's. I have tracked it down to the following:
> > >
> > > I get this error on an updaterpms in the boot sequence:
> > >
> > > [INFO] updaterpms: Flagging gpt-2.2.9-2 for installation
> > > [WARNING] updaterpms: installing package aliroot-3.09.06-1-1 needs 55Mb
> > > on the / filesystem
> > > [WARNING] updaterpms: updaterpms failed
> > > [OK] updaterpms: started
> > >
> > >
> > > This apparently causes all the rest of the rpms to not install. If I
> > > comment out the
> > >
> > > #include Alice-rpm.h
> > >
> > > line in rpmlist/WN-rpm
> > >
> > > Then all of the proper rpms install with the exception of the Alice ones
> > > of course. Soooooooo, what's the proper way to take care of this? Am I
> > > just playing with an old set of rpm.h files? (I did check out what I
> > > think are the most recent ones...)
> > >
> > > Thanks,
> > >
> > > Joe
> > >
> > >
> > > On Wed, 2003-09-17 at 11:44, Markus SCHULZ wrote:
> > > > Hi Joe,
> > > > I attached the list of RPMS on the WN to the first mail. Here it is again.
> > > >
> > > > Maybe the rpmcfg files for the WN are corrupted. You could checkout
> > > > the rpmlist directory from CVS and check if this is what you have
> > > > for your WNs.
> > > >
> > > > markus
> > > >
> > > > On Wed, 17 Sep 2003, Joe Kaiser wrote:
> > > >
> > > > > I'm using full LCFGng not the LITE distribution. I am not sure why this
> > > > > is happening on the worker nodes, the pbs rpms don't get installed
> > > > > either. Please send me the list and maybe I can track down what is
> > > > > going on......
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Joe
> > > > >
> > > > > On Wed, 2003-09-17 at 01:41, Markus SCHULZ wrote:
> > > > > > Hi Joe,
> > > > > > no the /opt/globus directory is not shared through NFS.
> > > > > > In case you don't see that file you are missing at least the
> > > > > > vdt_globus_essentials-VDTALT1.1.8-9 RPM.
> > > > > >
> > > > > > How did you assemble the list of RPMs that you installed on the WNs?
> > > > > >
> > > > > > I'll attach the list of RPMs that would be there if you would use the
> > > > > > rpm lists that have been provided. You can compare to what you have and
> > > > > > do some educated guesswork what's missing.
> > > > > >
> > > > > > markus
> > > > > >
> > > > > > p.s. how did you install the WNs?
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, 16 Sep 2003, Joe Kaiser wrote:
> > > > > >
> > > > > > > Oh, actually this turns out to be easy. It really isn't there because
> > > > > > > this is running on a worker node. How do I get an /opt/globus set of
> > > > > > > files on my worker node. Does that directory have to NFS exported or am
> > > > > > > I missing an rpm or two?
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Joe
> > > > > > >
> > > > > > >
> > > > > > > On Tue, 2003-09-16 at 10:54, Daniels, T (Trevor) wrote:
> > > > > > > > Joe
> > > > > > > >
> > > > > > > > OK, that moves it on a step. My job now executes but terminates with job
> > > > > > > > status
> > > > > > > >
> > > > > > > > Printing status info for the Job :
> > > > > > > > https://lxshare0380.cern.ch:9000/YNVkyqJVF_CxbEaA-V8E4w
> > > > > > > > Current Status: Done (Cancelled)
> > > > > > > > Exit code: 0
> > > > > > > > Status Reason: /opt/globus/etc/globus-user-env.sh not found or
> > > > > > > > unreadable
> > > > > > > > Destination: hotdog46.fnal.gov:2119/jobmanager-pbs-short
> > > > > > > > reached on: Tue Sep 16 15:51:22 2003
> > > > > > > >
> > > > > > > > Trevor
> > > > > > > >
> > > > > > > > Dr Trevor Daniels
> > > > > > > > c/o CCLRC eSC Department Phone: (+44)|(0) 1235 778093
> > > > > > > > Rutherford Appleton Laboratory Fax: (+44)|(0) 1235 446626
> > > > > > > > Chilton, DIDCOT, Oxon, OX11 0QX, UK Email: [log in to unmask]
> > > > > > > > The contents of this email are sent in confidence for the use of the
> > > > > > > > intended recipient only. If you are not one of the intended recipients do
> > > > > > > > not take action on it or show it to anyone else, but return this email to
> > > > > > > > the sender and delete your copy of it.
> > > > > > > >
> > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Joe Kaiser [mailto:[log in to unmask]]
> > > > > > > > > Sent: Tuesday, September 16, 2003 4:40 PM
> > > > > > > > > To: [log in to unmask]
> > > > > > > > > Subject: Re: [LCG-ROLLOUT] pbs issues
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > try jobmanager-pbs-short
> > > > > > > > >
> > > > > > > > > On Tue, 2003-09-16 at 10:20, Daniels, T (Trevor) wrote:
> > > > > > > > > > Joe
> > > > > > > > > >
> > > > > > > > > > Just recently the authentication problem has cleared. The people here
> > > > > > > > > > have reported the response from Nikhef has been poor today, presumably
> > > > > > > > > > network problems. So I guess that was just a glitch.
> > > > > > > > > >
> > > > > > > > > > I have now successfully submitted a job directly by globus-job-run to
> > > > > > > > > > hotdog46, and am just about to try the same via the CERN RB........
> > > > > > > > > >
> > > > > > > > > > It failed with
> > > > > > > > > >
> > > > > > > > > > Status Reason: Cannot plan (a helper failed)
> > > > > > > > > >
> > > > > > > > > > This usually means I've specified the wrong queue on the CE. The default
> > > > > > > > > > I use for LCG1-1_0_0 is jobmanager-lcgpbs-short - is this wrong?
> > > > > > > > > >
> > > > > > > > > > Trevor
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > -----Original Message-----
> > > > > > > > > > > From: Joe Kaiser [mailto:[log in to unmask]]
> > > > > > > > > > > Sent: Tuesday, September 16, 2003 3:55 PM
> > > > > > > > > > > To: [log in to unmask]
> > > > > > > > > > > Subject: Re: [LCG-ROLLOUT] pbs issues
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Apparently that is because you aren't in the grid-mapfile. When doing
> > > > > > > > > > > a:
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I get the following: Is the NIKHEF vo down?
> > > > > > > > > > >
> > > > > > > > > > > /opt/edg/sbin/edg-mkgridmap --output --safe
> > > > > > > > > > >
> > > > > > > > > > > Interrupt: Hit ENTER or type command to continue
> > > > > > > > > > > ldap search(ldap://grid-vo.nikhef.nl/ou=testbed1,o=alice,dc=eu-datagrid,dc=org): Connection failed
> > > > > > > > > > >
> > > > > > > > > > > Skipping /etc/grid-security/grid-mapfile writing
> > > > > > > > > > >
> > > > > > > > > > > Exit with error(s) (code=64)
> > > > > > > > > > >
> > > > > > > > > > > shell returned 64
> > > > > > > > > > >
> > > > > > > > > > > My /opt/edg/etc/edg-mkgridmap.conf looks like this:
> > > > > > > > > > >
> > > > > > > > > > > #### GROUP: group URI [lcluser]
> > > > > > > > > > > # LCG Standard Virtual Organizations
> > > > > > > > > > > group ldap://grid-vo.nikhef.nl/ou=testbed1,o=alice,dc=eu-datagrid,dc=org .alice
> > > > > > > > > > > group ldap://grid-vo.nikhef.nl/ou=testbed1,o=atlas,dc=eu-datagrid,dc=org .atlas
> > > > > > > > > > > group ldap://grid-vo.nikhef.nl/ou=tb1users,o=cms,dc=eu-datagrid,dc=org .cms
> > > > > > > > > > > group ldap://grid-vo.nikhef.nl/ou=tb1users,o=lhcb,dc=eu-datagrid,dc=org .lhcb
> > > > > > > > > > > group ldap://lcg-vo.cern.ch/ou=lcg1,o=dteam,dc=lcg,dc=org .dteam
> > > > > > > > > > >
> > > > > > > > > > > #### AUTH: authorization URI
> > > > > > > > > > > auth ldap://lcg-registrar.cern.ch/ou=users,o=registrar,dc=lcg,dc=org
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Tue, 2003-09-16 at 03:22, Daniels, T (Trevor) wrote:
> > > > > > > > > > > > Joe
> > > > > > > > > > > >
> > > > > > > > > > > > I find I can't authenticate against hotdog46.fnal.gov using my certificate
> > > > > > > > > > > > which is registered in DTEAM, so I can't try a job submit. Here's the
> > > > > > > > > > > > error:
> > > > > > > > > > > >
> > > > > > > > > > > > GRAM Authentication test failure: authentication with the remote server
> > > > > > > > > > > > failed
> > > > > > > > > > > >
> > > > > > > > > > > > Trevor
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > -----Original Message-----
> > > > > > > > > > > > > From: Joe Kaiser [mailto:[log in to unmask]]
> > > > > > > > > > > > > Sent: Monday, September 15, 2003 11:06 PM
> > > > > > > > > > > > > To: [log in to unmask]
> > > > > > > > > > > > > Subject: [LCG-ROLLOUT] pbs issues
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hi,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I have decided to go for the NFS shared directories for PBS because
> > > > > > > > > > > > > getting PBS to work with kerberos is an undertaking I am not prepared
> > > > > > > > > > > > > with either time or expertise to undertake.
> > > > > > > > > > > > >
> > > > > > > > > > > > > LCG1 will eventually allow for other batch systems, right?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Anyway, I need to have the home areas from the CE mounted to the WN's
> > > > > > > > > > > > > which is supposed to happen (as near as I can tell) if you leave
> > > > > > > > > > > > > NO_HOME_SHARED, which I have done. The directories do not get mounted
> > > > > > > > > > > > > however. Right now they are mounted by hand but a reboot will wipe that
> > > > > > > > > > > > > out. Can you please give me the magic recipe?
> > > > > > > > > > > > >
> > > > > > > > > > > > > In any event please test that you can submit jobs to fermilab and let me
> > > > > > > > > > > > > know if there are any problems.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Joe
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > ===================================================================
> > > > > > > > > > > > > Joe Kaiser - Systems Administrator
> > > > > > > > > > > > >
> > > > > > > > > > > > > Fermi Lab
> > > > > > > > > > > > > CD/OSS-SCS Never laugh at live dragons.
> > > > > > > > > > > > > 630-840-6444
> > > > > > > > > > > > > [log in to unmask]
> > > > > > > > > > > > > ===================================================================
> > > > > > > > > > > > >
> > > > > > --
> > > > > > *************************************************************************
> > > > > > * *
> > > > > > * CERN Markus W. Schulz *
> > > > > > * Bat. 31 2-015 *
> > > > > > * CH-1211 Geneva 23 *
> > > > > > * *
> > > > > > * Phone: +41 22 76 77909 *
> > > > > > * www.cern.ch *
> > > > > > * *
> > > > > > *************************************************************************
--
*************************************************************************
* *
* CERN Markus W. Schulz *
* Bat. 31 2-015 *
* CH-1211 Geneva 23 *
* *
* Phone: +41 22 76 77909 *
* www.cern.ch *
* *
*************************************************************************