Emanuele LEONARDI wrote:
> I'll try answering a few of your questions...
Thanks a lot!
> This remark is correct: updaterep wants to do a full mirror copy of the
> rpm repository in Lyon (7-8 GB now, I think) and takes a long time. The
> good part is that, after the first mirror, you only get updates and this
> is usually very fast.
My repository is only about 3 GB in size. But I guess I have to switch
to updaterep to automate getting updated packages in the future.
>>was lying ahead: routing was not correctly set up. The file
>>/etc/sysconfig/static-routes is needed for this but I couldn't
>>find a way to have it created automatically. I therefore wrote
>>my own LCFG object to handle this. Is there another, already
>>existing way to do this?
> Not that I know of. The network object is only able to define a default
> gateway but I think you have a more complex routing schema.
>
> Can you please upload to CVS the configuration files you are using and
> send us the object you wrote? It should be part of the general
> distribution.
The configuration files are already in CVS. I am hesitating to send
you the object right now as the routing table is hardcoded into
the object, so it doesn't really help anybody (except me). I don't
really know how LCFG works, so it would take me quite a long time to
create a real, generic object. But if one of the LCFG experts would
take a look at my object and the default file that I wrote, the object
might be ready in about an hour or two. Do you know anybody who could
do this? I am volunteering to test it, as soon as I have an automatically
chkconfig'd ypbind ;-).
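To give an idea of how little would have to become generic: below is a minimal sketch in Python (not LCFG code, and not my actual object) that renders a routing table into static-routes lines. It assumes the usual Red Hat line format "<interface|any> net <network> netmask <mask> gw <gateway>"; the function name and all addresses are made up for illustration.

```python
def render_static_routes(routes):
    """Return the text of a static-routes file.

    Each route is a tuple (interface, network, netmask, gateway).
    The interface may be "any" for routes not tied to one interface
    (assumed convention of the Red Hat network scripts).
    """
    lines = []
    for interface, network, netmask, gateway in routes:
        lines.append(
            f"{interface} net {network} netmask {netmask} gw {gateway}\n"
        )
    return "".join(lines)


# Hypothetical example values, not my real routing table.
EXAMPLE_ROUTES = [
    ("eth0", "192.168.10.0", "255.255.255.0", "10.20.1.1"),
    ("any", "172.16.0.0", "255.255.0.0", "10.20.1.254"),
]

if __name__ == "__main__":
    print(render_static_routes(EXAMPLE_ROUTES), end="")
```

A real object would take the route list from its resources instead of a hardcoded table, which is exactly the part I don't know how to do in LCFG.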
>>This fixed the networking problem, so I could start installing
>>CE and SE. Installation worked more or less okay, but I got
>>a bit confused when shortly after starting the installation,
>>the machines didn't do anything (at least anything observable)
>>for about five minutes. I thought they crashed and rebooted
>>them a couple of times before waiting long enough. This didn't
>>happen with the UI or the WNs. Did anybody else encounter this
>>behavior? What's going on during that time?
>
> I have not observed this behaviour before. Does this happen during the
> initial installation or after the first reboot? In any case, I will keep
> an eye open when we install our nodes.
It's during the initial installation and it's right before flagging RPMs
for installation. Might be before formatting the disk, too. I will
probably install another CE in a couple of days and I'll have more info
then.
> This NFS cross-mounting configuration is a kind of relic from times now
> past: I think we should change it to something like the configuration
> you are using. At CERN we mount everything from a disk server.
How much space will be needed for the different directories? There are the
home directories, the flatfiles area and the grid-security stuff. The latter
doesn't need much space, so I guess it's okay to keep it locally. But I am
unsure of how much space the other two areas might need. And depending on
their space requirements it might be a good idea to move them to some disk
server. This is what we planned here at FZK anyway, but for testing the
installation, I kept everything local...
> About the time zone, we always used Europe/Paris (which was included in
> the example site-cfg.h file). It is indeed strange that CEST is not
> correctly handled, though. I should note this in the installation
> instructions.
I think I specified CEST in the first questionnaire and that's how it got
into my configurations. Didn't know that it has serious consequences ;-).
>>A problem was name resolving. As all WNs are on a private
>>network, they do not have a DNS entry. Name resolving relies on
>>a hosts file distributed via NIS. So I also had to configure
>>NIS on all the machines. This works okay, but I couldn't persuade
>>LCFG to automatically start ypbind at boot time. I added the line
>>REPLACE(chkconfig.services,network,network ypbind) to my
>>configuration, but to no avail. Does anybody know how to do
>>this? (Side note: I tried to use numerical IP addresses in the
>>beginning. Some LCFG objects do not like this at all and fail.
>>You have to use names and they need to be resolvable!)
>
> I had the same problem with an old layout where we used NIS for user
> accounts: the ypbind startup was done by hand (well, we had a script for
> this). The chkconfig can be used but the syntax should probably be
>
> EXTRA(chkconfig.services) ypbind
>
> but you should check if this is really all you need to start the NIS
> client. I attached to this e-mail a (somewhat old) guide to NIS
> configuration I wrote for the old EDG testbed.
Thanks for that guide. I'll take a closer look at it during the next
few days. I installed another WN yesterday and I found out that the
configuration that comes with the NIS/nsswitch RPMs works alright
(although I will probably want to fine tune it). So the only thing
I need is something that calls "chkconfig ypbind on" during the
installation. I'll have a look at it tomorrow.
>>Another thing I was stuck on for some time was that changes in
>>the configuration were not automatically enacted by the
>>LCFG objects. New profiles were fetched, but I had to call
>>changed objects by hand. The reason for this was that some
>>script (I think /usr/bin/om) does some kind of double reverse
>>lookup of the hostname and this only works if the domain is
>>properly set. And it wasn't in my case. Adding the FQDN to
>>/etc/hosts solved this, i.e. changing
>>10.20.1.101 c20-001-101 c20-001-101
>>to
>>10.20.1.101 c20-001-101.fzk.de c20-001-101
>>I must have made a mistake in the configuration somewhere as
>>it worked fine on the CE and SE...
>
> I just checked: on our nodes /etc/hosts contains the full host name.
I hope that this will be fixed if I have a ypbind working at the
first startup. But this is just a wild guess. The two problems might
be completely unrelated... Does anybody know where "dnsdomainname"
gets the domain name from?
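My working assumption, which would fit the /etc/hosts fix quoted above, is that the tool resolves the local host name and prints everything after the first dot of the canonical name, and that for /etc/hosts lookups the canonical name is the first name after the IP address. A rough Python sketch of that assumed logic (the function name is made up, and the real tool goes through the resolver rather than parsing the file itself):

```python
def domain_from_hosts(hosts_text, hostname):
    """Return the domain implied by a hosts file for the given host name.

    Finds the first line listing `hostname`, takes the first name on
    that line as the canonical name and returns the part after its
    first dot. Returns "" if there is no dot or no matching line.
    """
    for raw_line in hosts_text.splitlines():
        # Drop comments, then split into address and names.
        fields = raw_line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        names = fields[1:]
        if hostname in names:
            canonical = names[0]
            _, _, domain = canonical.partition(".")
            return domain
    return ""


if __name__ == "__main__":
    before = "10.20.1.101 c20-001-101 c20-001-101"
    after = "10.20.1.101 c20-001-101.fzk.de c20-001-101"
    print(repr(domain_from_hosts(before, "c20-001-101")))
    print(repr(domain_from_hosts(after, "c20-001-101")))
```

If this is right, the first entry yields an empty domain and the second yields fzk.de, so the order of the names on the line is what matters.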
>>* Is mounting the gridmapdir directory on the WNs necessary?
>
> WNs need the /etc/grid-security/certificates directory in order to be
> able to verify the identity of the servers they contact. As having a
> big, possibly huge, number of WNs all around the world trying and
> contacting the CA certs distribution sites at the same time to get the
> latest greatest CRL does not seem a good idea ;-), we chose to have a
> single shared directory and let only one node do the job for
> everybody.
Err, wait. In my setup, only /etc/grid-security/gridmapdir is mounted
on all machines. This is the directory that contains the mapping from
DNs to pool accounts. The /etc/grid-security/certificates directory is
kept locally on each and every machine. (And I have no idea where the
CRLs are kept.) Is my set-up messed up? Or am I missing something else
here?
>>* The LCFG object nfsmount does not automatically enact changes.
>> Adding the line "nfsmount.ng_reconfig restart" had the desired
>> effect. Is this a good idea? Why is it not done anyway?
>
> I think I remember discussing this issue a long time ago and the outcome
> was that triggering an automatic update can cause big trouble to the
> users of the node if the changes include umount-ing a file system
> while it is being used.
This is definitely true. At least in a production environment. For
my installation testing I was happy not to have to go to each
and every WN and restart nfsmount. Okay, okay, I only installed two
WNs, but still ;-). But as only one line has to be changed to adjust
the behavior, I think I can live with the current situation.
> I think that your message is exactly the kind of feedback needed to get
> LCG into a production state. It would be very useful for everybody if
> you could send all the final configuration files and the exact changes
> you had to apply to the default files to get your configuration up and
> running.
The configuration is in CVS. As I'm likely to forget mentioning most of
the changes, it's probably better to do a diff on the different CVS versions.
I'll check that tomorrow, but I fear that the changes are kind of
un-understandable ;-).
Peer
--
Peer Hasselmeyer
Forschungszentrum Karlsruhe Tel.: +49-7247-828601
Institute for Scientific Computing / Inst. f. Wissenschaftliches Rechnen
Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany