TB-SUPPORT Archives (TB-SUPPORT@JISCMAIL.AC.UK), April 2004

Subject:

many testbeds

From:

Steve Traylen <[log in to unmask]>

Reply-To:

Testbed Support for GridPP member institutes <[log in to unmask]>

Date:

Wed, 7 Apr 2004 15:10:42 +0100

Content-Type:

MULTIPART/MIXED

Parts/Attachments:

TEXT/PLAIN (62 lines), lcg2-install-notes.txt


Hi *,

 It seems various people and sites are wanting to join various testbeds
 and grids so I thought it would be worth trying to explain them and how
 to get started with any of them.

 There are basically 4 distributed testbeds.

 JRA1                  EGEE middleware development, a very early development
                       testbed; this is CERN, NIKHEF and RAL only.

 LCG2                  A large-scale production grid.

 EGEE Production       As far as I can tell this is basically LCG2 and at this
                       time is exactly the same thing. Exact details on how
                       to join this are in the pipeline.

 EGEE Pre-Production   The next EGEE/LCG software release version.

 The most common question is which one to join if I want to run
 data challenges for HEP experiments, and the answer in this case is
 definitely LCG2.

 The current release of LCG2, LCG2.0.0, came out halfway through me
 writing this mail. For the first time it has been recommended that
 all sites on LCG1 do an upgrade, though in reality I think just
 about everyone has now made this transition.

 How to start on LCG2.

 export CVS_RSH=ssh
 cvs -d :pserver:[log in to unmask]:/cvs/lcgdeploy co \
         -r LCG-2_0_0 lcg2

 and read lcg2-install-notes.txt, which details how to join. These
 notes mention a Tier-1 contact, and that person is me.

 I have attached them as well.

 I think it is worth trying to announce/agree/discuss what some of
 the other GridPP sites plan to do. I know probably 4 or 5 are in or
 working towards LCG2, and this is a very likely candidate for others
 as well.

 A phone conference has been suggested and can be arranged for early next
 week if considered useful. This will be in time to bring our plans to the
 EGEE conference the following week.

    Steve




--
Steve Traylen
[log in to unmask]
http://www.gridpp.ac.uk/





===============================================================================
========================== LCG-2 Installation notes ==========================
===============================================================================
=========== (C) 2004 by Emanuele Leonardi - [log in to unmask] ===========
===============================================================================

Reference tag: LCG-2_0_0

These notes will assist you in installing the latest LCG-2 tag and upgrading from the previous tag. The document is not a typical release note: it also covers some general aspects related to LCG2.

This document is intended for:
 1) Sites that run LCG2 and need to upgrade to the current version
 2) Sites that move from LCG1 to LCG2
 3) Sites that join the LCG
 4) Sites that operate LCG2

What is LCG?
============

This is best answered by the material found on the project's web site http://lcg.web.cern.ch/LCG/ . From there you can find information about the nature of the project and its goals. At the end of the introduction you can find a section that collects most of the references.

How to join LCG2?
=================

If you want to join LCG and add resources to it, you should contact the LCG deployment manager Ian Bird ([log in to unmask]) to establish the contact with the project. If you only want to use LCG, you can follow the steps described in the LCG User Overview (http://lcg.web.cern.ch/LCG/peb/grid_deployment/user_intro.htm). The registration and initial training using the LCG-2 Users Guide (https://edms.cern.ch/file/454439//LCG-2-Userguide.pdf) should take about a week. However, only about 8 hours of this is spent working with the system; the majority is waiting for the registration process with the VOs and the CA.

If you are interested in adding resources to the system, you should first register as a user and subscribe to the LCG Rollout mailing list (http://www.listserv.rl.ac.uk/archives/lcg-rollout.html). In addition you need to contact the Grid Operation Center (GOC) (http://goc.grid-support.ac.uk/gridsite/gocmain/) and get access to the GOC-DB to register your resources with them. This registration is the basis for your system being present in their monitoring. It is mandatory to register at least your service nodes in the GOC DB; it is not necessary to register all farm nodes. Please see Appendix H for a detailed description.

LCG has introduced a hierarchical support model for sites. Regions have primary sites (P-sites) that support the smaller centers in their region. If you do not know who your primary site is, please contact the LCG deployment manager Ian Bird. Once you have identified your primary site, you should fill in the form at the end of this guide (Appendix G) and send it to your primary site AND to the deployment team at CERN ([log in to unmask]). The site security contacts and sysadmins will receive material from the LCG security team that describes the security policies of LCG.

Discuss with the grid deployment team or with your primary site a suitable layout for your site. Various configurations are possible. Experience has shown that starting with a small standardized setup and evolving from it to a larger, more complex system is highly advisable. The typical layout for a minimal site is a user interface node (UI), which allows users to submit jobs to the grid. This node will use the information system and resource broker of either the primary site or the CERN site.

A site that can provide resources will add a computing element (CE), which acts as a gateway to the computing resources, and a storage element (SE), which acts as a gateway to the local storage. In addition, a few worker nodes (WN) can be added to provide the computing power. Large sites with many users that submit a large number of jobs will add a resource broker (RB). The resource broker distributes the jobs to the sites that are available to run them and keeps track of their status. For resource discovery the RB uses an information index (BDII); it is good practice to set up a BDII at each site that operates an RB. A complete site will add a Proxy server node that allows the renewal of proxy certificates. In case you don't find a setup described in this installation guide that meets your needs, you should contact your primary site for further help.

After a site has been set up, the site manager or the support persons of the primary site should run the initial tests that are described in the first part of the chapter on testing. If these tests have run successfully, the site should contact the deployment team via e-mail. The mail should contain the site's GIIS name and the hostname of the GIIS. To allow further testing, the site will be added to an LCG-BDII which is used for testing new sites. Then the primary site or the site managers can run the additional tests described. When a site has passed these tests, the site or the primary site will announce this to the deployment team, which, after a final round of testing, will add the site to the list of core sites.

How to report problems
======================

The way problems are reported is currently changing. On the LCG user introduction page (http://lcg.web.cern.ch/LCG/peb/grid_deployment/user_intro.htm) you can find information on the currently appropriate way to report problems. Before reporting a problem you should first try to consult your primary site. Many problems are currently reported to the rollout list. Internally we still use a Savannah-based bug tracking tool that can be accessed via https://savannah.cern.ch/bugs/?group=lcgoperation.

How to setup your site
======================

With this release you have the option either to install and configure your site using LCFGng, a fabric management tool that is supported by LCG, or to install the nodes following a manual step-by-step description, which can also be used as a basis to configure your local fabric management system.

For very small sites the manual approach has the advantage that no learning of the tool is required and no extra node needs to be maintained. In addition, no reinstallation of your nodes is required. However, the maintenance of the nodes will require more work and it is more likely that hidden misconfigurations are introduced. For medium to larger sites without their own fabric management tools, using LCFGng can be an advantage. It is up to each site to decide which method is preferred.

The documentation for the manual installation can be found here:
http://grid-deployment.web.cern.ch/grid-deployment/documentation/manual-installation/

We currently support all node types with the exception of the PROXY server node; this will follow soon. In case you decide to use the manual setup you should nevertheless have a look at parts of this document; for example, the sections about firewalls and testing are valid for both installation methods.

Network access
==============

The current software requires outgoing network access from all the nodes, and incoming access on the RB, CE, and SE. Some sites have gained experience with running their sites through a NAT; we can provide contact information for sites with experience of this setup.

To configure your firewall you should use the port table that we provide as a reference. Please have a look at the chapter on firewall configuration.

General Note on Security
========================

While we provide kernel RPMs in our repositories and use certain versions for the configuration, it has to be pointed out that you have to make sure that you consider the kernel that you install as safe. If the provided default is not what you want, please replace it.

Sites Moving From LCG1 to LCG2
==============================

Since LCG2 is significantly different from both LCG1 and EDG, it is mandatory to study this guide even for administrators with considerable experience. In case you see the need to deviate from the described procedures, please contact us.

Due to the many substantial changes w.r.t. LCG1, updating a site from any of the LCG1 releases to LCG-2 is not possible in a reliable way. A complete re-installation of the site is the only supported procedure.

Another change is related to the CVS repository used. For CERN internal reasons we had to move to a different server and switch to a different authorization scheme. See http://grid-deployment.web.cern.ch/grid-deployment/documentation/cvs-guide/ for details about getting access to the CVS repository. For web-based browsing, access to CVS is via http://lcgdeploy.cvs.cern.ch/cgi-bin/lcgdeploy.cgi/

As described later, we changed the directory structure in CVS for LCG2. There are now two relevant directories, lcg2 and lcg2-sites. The first contains common elements while the latter contains the site-specific information.

In addition to the installation via LCFGng, an increasing number of node types is now supported for manual installation. These are: Worker Nodes (WN), User Interfaces (UI), Computing Elements (CE), classical Storage Elements (SE), and the BDII. The Proxy Server is in preparation and almost finished.

References:
===========

Documentation:
--------------
LCG Project Homepage:
  http://lcg.web.cern.ch/LCG/
Starting point for users of the LCG infrastructure:
  http://lcg.web.cern.ch/LCG/peb/grid_deployment/user_intro.htm
LCG-2 User's Guide:
  https://edms.cern.ch/file/454439//LCG-2-Userguide.pdf
LCFGng server installation guide:
  http://lcgdeploy.cvs.cern.ch/cgi-bin/lcgdeploy.cgi/lcg2/docs/LCFGng_server_install.txt
LCG-2 Manual Installation Guide:
  http://grid-deployment.web.cern.ch/grid-deployment/documentation/manual-installation/
LCG GOC Mainpage:
  http://goc.grid-support.ac.uk/gridsite/gocmain/
CVS User's Guide:
  http://grid-deployment.web.cern.ch/grid-deployment/documentation/cvs-guide/

Registration:
-------------
LCG rollout list: http://www.listserv.rl.ac.uk/archives/lcg-rollout.html
  - join the list
Get the Certificate and register in VO: http://lcg-registrar.cern.ch/
  - read the LCG Usage Rules
  - choose your CA and contact them to get a USER certificate (for some CAs an online certificate request is possible)
  - load your certificate into your web browser (read the instructions)
  - choose your VO and register (LCG Registration Form)
GOC Database: http://goc.grid-support.ac.uk/gridsite/db-auth-request/
  - apply for access to the GOCDB
CVS read-write access and site directory setup: mailto:[log in to unmask]
  - prepare and send a NAME for your site following the schema <domain>-<organization>[-<section>] (e.g. es-Barcelona-PIC, ch-CERN, it-INFN-CNAF)
Site contact database: mailto:[log in to unmask]
  - fill in the form in Appendix G and send it
Report bugs and problems with installation:
  https://savannah.cern.ch/bugs/?group=lcgoperation

Notes about lcg2-20040407
-------------------------

In the previous beta release the new LCG-BDII node type was introduced, and for some time the two information system structures have been operated in parallel. Since we expect many sites to move from LCG1 to LCG2, we will now switch permanently to the new layout, which we describe later in some detail. The new LCG-BDII no longer relies on the Regional MDSes but collects information directly from the Site GIISes. The list of existing sites and their addresses is downloaded from a pre-defined web location. See the notes in the BDII-specific section of this document for installation and configuration. This layout will allow sites and VOs to configure their own super- or subset of the LCG2 resources.

A new Replica Manager client was also introduced in the previous version. This is the only client which is compatible with the current version of the RLS server, so file replication at your site will not work until you have updated to this release.

Introduction and overall setup
==============================

In this text we will assume that you are already familiar with the LCFGng server installation and management. Please refer to the LCFGng_server_install.txt file in the docs directory of the lcg2 release for an up-to-date guide on how to set up an LCFGng server for use with LCG-2.

Note for sites which are already running LCG1: due to the incompatible update of several configuration objects, an LCFG server cannot support both LCG1 and LCG-2 nodes. If you are planning to re-install your LCG1 nodes with LCG-2, then the correct way to proceed is:
 1) kill the rdxprof process on all your nodes (or just switch your nodes off if you do not care about the extra down-time at your site);
 2) update your LCFG server using the objects listed in the LCG-2 release;
 3) prepare the new configuration files for your site as described in this document;
 4) re-install all your nodes.
If you plan to keep your LCG1 site up while installing a new LCG-2 site, then you will need a second LCFG server. This is a matter of choice; the LCG1 installation is of very limited use once you set up the LCG-2 site, since several core components are not compatible anymore.

Files needed for the current LCG-2 release are available from a CVS server at CERN. This CVS server contains the list of rpms to install and the LCFGng configuration files for each node type. The CVS area, called "lcg2", can be reached from http://lcgdeploy.cvs.cern.ch/cgi-bin/lcgdeploy.cgi/

Note1: at the same location there is another directory called "lcg-release": this area is used for the integration and certification software, NOT for production. Please ignore it!

Note2: documentation about access to this CVS repository can be found in http://grid-deployment.web.cern.ch/grid-deployment/documentation/cvs-guide/

In the same CVS location we created an area, called lcg2-sites, where all sites participating in LCG-2 should store the configuration files used to install and configure their nodes. Each site manager will find there a directory for their site with a name in the format <domain>-<city>-<institute> or <domain>-<organization>[-<section>] (e.g. es-Barcelona-PIC, ch-CERN, it-INFN-CNAF): this is where all site configuration files should be uploaded.
Site managers that install a site are kindly asked to keep these directories up-to-date by committing all the changes they make to their configuration files back to CVS, so that we will be able to keep track of the status of each site at any given moment. Once a site reaches a consistent working configuration, site managers should create a CVS tag, which will allow them to easily recover the configuration information if needed.

Tag names should follow this convention. The tags of the LCG-2 modules are:

   LCG2-<RELEASE>, e.g. LCG2-1_1_1 for software release 1.1.1

If you tag your local configuration files, the tag name must contain a reference to the lcg2 release in use at the time. The format to use is:

   LCG2-<RELEASE>_<SITENAME>_<DATE>_<TIME>

e.g. LCG2-1_1_1_CERN_20031107_0857 for configuration files in use at CERN on November 7th, 2003, at 8:57 AM. The lcg2 release used for this example is 1.1.1.

To activate a write-enabled account to the CVS repository at CERN please get in touch with Louis Poncet <[log in to unmask]>. Judit Novak <[log in to unmask]> or Markus Schulz <[log in to unmask]> are the persons to contact if you do not find a directory for your site or if you have problems uploading your configuration files to CVS.

If you just want to install a site, but not join LCG, you can get anonymous read access to the repository. As described in the CVS access guide, set the CVS environment variables:

Set CVS_RSH to:
   setenv CVS_RSH ssh
Set CVSROOT to:
   setenv CVSROOT :pserver:[log in to unmask]:/cvs/lcgdeploy

All site managers have in any case to subscribe to and monitor the LCG-Rollout mailing list. Here all issues related to the LCG deployment, including announcements of updates and security patches, are discussed. You can subscribe from the following site: http://cclrclsv.RL.AC.UK/archives/lcg-rollout.html and click on "Join or leave the list". This is the main source for communicating problems and changes.

Preparing the installation of current tag
=========================================

The current LCG tag is ---> LCG-2_0_0 <---

In the following instructions/examples, when you see the <CURRENT_TAG> string, you should replace it with the name of the tag defined above.

To install it, check it out on your LCFG server with

> cvs checkout -r <CURRENT_TAG> -d <TAG_DIRECTORY> lcg2

Note: the "-d <TAG_DIRECTORY>" will create a directory named <TAG_DIRECTORY> and copy all the files there. If you do not specify the -d parameter, the directory will be a subdirectory of the current directory named lcg2.

The default way to install the tag is to copy the content of the rpmlist subdirectory to the /opt/local/linux/7.3/rpmcfg directory on the LCFG server. This directory is NFS-mounted by all client nodes and is visible as /export/local/linux/7.3/rpmcfg

Go to the directory where you keep your local configuration files. If you want to create a new one, you can check out from CVS any of the previous tags with:

> cvs checkout -r <YOUR_TAG> -d <LOCAL_DIR> lcg2/<YOUR_SITE>

If you have not committed any configuration file yet, or if you want to use the latest (HEAD) versions, just omit the "-r <YOUR_TAG>" parameter.

Now cd to <LOCAL_DIR> and copy there the files from <TAG_DIRECTORY>/examples: following the instructions in the 00README file, those in the example files themselves, and those reported below in this document, you should be able to create an initial version of the configuration files for your site. If you have problems, please contact your reference primary site.
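For convenience, the checkout and preparation steps above can be collected into a single shell session. The sketch below is not part of the original notes: the directory names, the site name MYSITE, and the tag LCG2-2_0_0_MYSITE_20040407_1500 are hypothetical placeholders that only illustrate the naming convention described above, and the real CVSROOT account and host must be taken from the CVS access guide.

@-----------------------------------------------------------------------------
#!/bin/bash
# Illustrative sketch only - not an official script. Replace all
# placeholder values (<...>, MYSITE) with those valid for your site.

export CVS_RSH=ssh
export CVSROOT=:pserver:<account>@<cvs-host>:/cvs/lcgdeploy   # see the CVS access guide

# 1) Check out the current tag into a working directory on the LCFG server.
cvs checkout -r LCG-2_0_0 -d lcg-2_0_0 lcg2

# 2) Make the rpm lists visible to the LCFGng client nodes.
cp lcg-2_0_0/rpmlist/* /opt/local/linux/7.3/rpmcfg/

# 3) Check out your site directory (omit -r <YOUR_TAG> to get the HEAD versions)
#    and start from the provided examples.
cvs checkout -d MYSITE-config lcg2/<YOUR_SITE>
cd MYSITE-config
cp ../lcg-2_0_0/examples/* .

# 4) Once the site works, commit the files and tag them following the
#    LCG2-<RELEASE>_<SITENAME>_<DATE>_<TIME> convention described above.
cvs commit -m "Working configuration for LCG-2_0_0"
cvs tag LCG2-2_0_0_MYSITE_20040407_1500
@-----------------------------------------------------------------------------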
NOTE: if you already have localized versions of these files, just compare them with the new templates to verify that no new parameter needs to be set. Be aware that there are several critical differences between the LCG1 and LCG-2 site-cfg.h files, so apply extra care when updating this file.

IMPORTANT NOTICE: If you have a CE configuration file from LCG1, it probably includes the definition of the secondary regional MDS for your region. This is now handled by the ComputingElement-cfg.h configuration file and can be configured directly from the site-cfg.h file. See Appendix E for details.

To download all the rpms needed to install this version you can use the updaterep command. In <TAG_DIRECTORY>/tools you can find 2 configuration files for this script: updaterep.conf and updaterep_full.conf. The first will tell updaterep to download only the rpms which are actually needed to install the current tag, while updaterep_full.conf will do a full mirror of the LCG rpm repository. Copy updaterep.conf to /etc/updaterep.conf and run the updaterep command. By default all rpms will be copied to the /opt/local/linux/7.3/RPMS area, which is visible from the client nodes as /export/local/linux/7.3/RPMS. You can change the repository area by editing /etc/updaterep.conf and modifying the REPOSITORY_BASE variable.

IMPORTANT NOTICE: as the list and structure of Certification Authorities (CA) accepted by the LCG project can change independently of the middleware releases, the rpm list related to the CA certificates and URLs has been decoupled from the standard LCG release procedure. This means that the version of the security-rpm.h file contained in the rpmlist directory associated with the current tag might be incomplete or obsolete. Please go to the URL http://markusw.home.cern.ch/markusw/lcg2CAlist.html and follow the instructions there to update all CA-related settings. Changes and updates of these settings will be announced on the LCG-Rollout mailing list.

To make sure that all the needed object rpms are installed on your LCFG server, you should use the lcfgng_server_update.pl script, also located in <TAG_DIRECTORY>/tools. This script will report which rpms are missing or have the wrong version and will create the /tmp/lcfgng_server_update_script.sh script, which you can then use to fix the server configuration. Run it in the following way:

> lcfgng_server_update.pl <TAG_DIRECTORY>/rpmlist/lcfgng-common-rpm.h
> /tmp/lcfgng_server_update_script.sh
> lcfgng_server_update.pl <TAG_DIRECTORY>/rpmlist/lcfgng-server-rpm.h
> /tmp/lcfgng_server_update_script.sh

WARNING: please always take a look at /tmp/lcfgng_server_update_script.sh and verify that all rpm update commands look reasonable before running it.

In the source directory you should take a look at the redhat73-cfg.h file and see if the location of the rpm lists (updaterpms.rpmcfgdir) and of the rpm repository (updaterpms.rpmdir) are correct for your site (the defaults are consistent with the instructions in this document). If needed, you can redefine these paths from the local-cfg.h file.

In private-cfg.h you can (must!) replace the default root password with the one you want to use for your site:

+auth.rootpwd <CRYPTED_PWD>   <--- replace with your own crypted password

To obtain <CRYPTED_PWD> using the MD5 encryption algorithm (stronger than the standard crypt method) you can use the following command:

> openssl passwd -1

This command will prompt you to insert the clear text version of the password and then print the encrypted version. E.g.

> openssl passwd -1
Password: <- write clear text password here
$1$iPJJEhjc$rtV/65l890BaPinzkb58z1 <- <CRYPTED_PWD> string

To finalize the adaptation of the current tag to your site you should edit your site-cfg.h file. If you already have a site-cfg.h file that you used to install any of the LCG1 releases, you can find a detailed description of the modifications needed for the new tag in Appendix E below.

WARNING: the template file site-cfg.h.template assumes you want to run the PBS batch system without sharing the /home directory between the CE and all the WNs. This is the recommended setup. There may be situations when you have to run PBS in traditional mode, i.e. with the CE exporting /home with NFS and all the WNs mounting it. This is the case, e.g., if your site does not allow host-based authentication. To revert to the traditional PBS configuration you can edit your site-cfg.h file and comment out the following two lines:

#define NO_HOME_SHARE
...
#define CE_JM_TYPE lcgpbs

In addition to this, your WN configuration file should include this line:

#include CFGDIR/UsersNoHome-cfg.h

just after including Users-cfg.h (please note that BOTH Users-cfg.h AND UsersNoHome-cfg.h must be included).

Storage
=======

In the current version LCG still uses the "Classical SE" model. This consists of a storage system (either a real MSS or just a node connected to some disks) which exports a GridFTP interface. Information about the SE must be published by a GRIS registered to the Site GIIS.

If your SE is a completely independent node connected to a bunch of disks (these can either be local or mounted from a disk server), then you can install this node using the example SE_node file: this will install and configure on the node all needed services (GridFTP server, GRIS, authentication system). If you plan to use a local disk as the main storage area, you can include the flatfiles-dirs-SECLASSIC-cfg.h file: LCFG will take care of creating all needed directories with the right access privileges.

If, on the other hand, your SE node mounts the storage area from a disk server, then you will have to create all needed directories and set their privileges by hand. Also, you will have to add to the SE node configuration file the correct commands to NFS-mount the area from the disk server. As an example, let's assume that your disk server node is called <server> and that it exports area <diskarea> for use by LCG. On your SE you want to mount this area as /storage and then allow access to it via GridFTP. To this end you have to go through the following steps:

1) in site-cfg.h define

   #define CE_CLOSE_SE_MOUNTPOINT /storage

2) in the SE_node configuration file add the lines to mount this area from <server>:

   EXTRA(nfsmount.nfsmount) storage
   nfsmount.nfsdetails_storage /storage <server>:<diskarea> rw

3) once the SE node is installed and /storage has been mounted, create all VO directories, one per supported VO, giving read/write access to the corresponding group. For VO <vo>:

   > mkdir /storage/<vo>
   > chgrp <vo> /storage/<vo>
   > chmod g+w /storage/<vo>

A final possibility is that at your site a real mass storage system with a GridFTP interface is already available (this is the case for the CASTOR MSS at CERN). In this case, instead of installing a full SE, you will need to install a node which acts as a front-end GRIS for the MSS, publishing to the LCG information system all information related to the MSS.
This node is called a PlainGRIS and can be installed using the PG_node file from the examples directory. Also, a few changes are needed in the site-cfg.h file. Citing from site-cfg.h.template:

/* For your storage to be visible from the grid you must have a GRIS which
 * publishes information about it. If you installed your SE using the classical
 * SE configuration file provided by LCG (StorageElementClassic-cfg.h) then a
 * GRIS is automatically started on that node and you can leave the default
 * settings below. If your storage is based on a external MSS system which
 * only provides a GridFTP interface (an example is the GridFTP-enabled CASTOR
 * service at CERN), then you will have to install an external GRIS server
 * using the provided PlainGRIS-cfg.h profile. In this case you must define
 * SE_GRIS_HOSTNAME to point to this node and define the SE_DYNAMIC_CASTOR
 * variable instead of SE_DYNAMIC_CLASSIC (Warning: defining both variables at
 * the same time is WRONG!).
 *
 * Currently the only supported external MSS is the GridFTP-enabled CASTOR used
 * at CERN.
 */
#define SE_GRIS_HOSTNAME SE_HOSTNAME
#define SE_DYNAMIC_CLASSIC
/* #define SE_DYNAMIC_CASTOR */

Firewall configuration
======================

If your LCG nodes are behind a firewall, you will have to ask your network manager to open a few "holes" to allow external access to some LCG service nodes. A complete map of which ports have to be accessible for each service node is provided in the file lcg-port-table.pdf in the lcg2/docs directory. Note that the file is also available in LaTeX format.

Node installation and configuration
===================================

In <TAG_DIRECTORY>/tools you can find a new version of the do_mkxprof.sh script. A detailed description of how this script works is contained in the script itself. You are of course free to use your preferred call to the mkxprof command, but note that running mkxprof as a daemon is NOT recommended and can easily lead to massive catastrophes if not used with extreme care: do it at your own risk.

To create the LCFG configuration for one or more nodes you can do

> do_mkxprof.sh node1 [node2 node3, ...]

If you get an error status for one or more of the configurations, you can get a detailed report on the nature of the error by looking at the URL http://<Your_LCFGng_Server>/status/ and clicking on the name of the node with a faulty configuration (a small red bug should be shown beside the node name).

Once all node configurations are correctly published, you can proceed and install your nodes following any one of the installation procedures described in the "LCFGng Server Installation Guide" mentioned above (LCFGng_server_install.txt).

When the initial installation completes (expect two automatic reboots in the process), each node type requires a few manual steps, detailed below, to be completely configured. After completing these steps, some of the nodes need a final reboot which will bring them up with all the needed services active. The need for this final reboot is explicitly stated among the node configuration steps below.

Common steps
------------

-- On the ResourceBroker, MyProxy, StorageElement, and ComputingElement nodes you must install the host certificate/key files in /etc/grid-security with names hostcert.pem and hostkey.pem. Also make sure that hostkey.pem is only readable by root with

   > chmod 400 /etc/grid-security/hostkey.pem

-- All Globus services grant access to LCG users according to the certificates listed in the /etc/grid-security/grid-mapfile file. The list of VOs included in grid-mapfile is defined in /opt/edg/etc/edg-mkgridmap.conf. This file is now handled automatically by the mkgridmap LCFG object, which takes care of enabling only the VOs accepted at each site according to the SE_VO_<VO> definitions in site-cfg.h. If you need to modify the default configuration for your site, e.g. by adding users to grid-mapfile-local, you can do this from your local-cfg.h file by following the examples in <TAG_DIRECTORY>/source/mkgridmap-cfg.h.

   After installing a ResourceBroker, StorageElement, or ComputingElement node you should force a first creation of the grid-mapfile by running

   > /opt/edg/sbin/edg-mkgridmap --output=/etc/grid-security/grid-mapfile --safe

   Every 6 hours a cron job will repeat this procedure and update grid-mapfile.

UserInterface
-------------

No additional configuration steps are currently needed on a UserInterface node.

ResourceBroker
--------------

-- Configure the MySQL database. See the detailed recipe in Appendix C at the end of this document.

-- Reboot the node.

ComputingElement
----------------

-- Configure the PBS server. See the detailed recipe in Appendix B at the end of this document.

-- Create the first version of the /etc/ssh/ssh_known_hosts file by running

   > /opt/edg/sbin/edg-pbs-knownhosts

   A cron job will update this file every 6 hours.

-- If your CE is NOT sharing the /home directory with your WNs (this is the LCG-2 default configuration: if you have modified site-cfg.h to run PBS in traditional mode as described in a previous chapter, just ignore the following instructions), then you have to configure sshd to allow WNs to copy job output back to the CE using scp. This requires the following two steps:

   1) modify the sshd configuration. Edit the /etc/ssh/sshd_config file and add these lines at the end:

      HostbasedAuthentication yes
      IgnoreUserKnownHosts yes
      IgnoreRhosts yes

      and then restart the server with

      > /etc/rc.d/init.d/sshd restart

   2) configure the script enabling WNs to copy output back to the CE.

      - in /opt/edg/etc, copy edg-pbs-shostsequiv.conf.template to edg-pbs-shostsequiv.conf, then edit this file and change the parameters to your needs. Most sites will only have to set NODES to an empty string.

      - create the first version of the /etc/ssh/shosts.equiv file by running

        > /opt/edg/sbin/edg-pbs-shostsequiv

        A cron job will update this file every 6 hours.

   Note: every time you add or remove WNs, do not forget to run

   > /opt/edg/sbin/edg-pbs-shostsequiv   <--- only if you do not share /home
   > /opt/edg/sbin/edg-pbs-knownhosts

   on the CE, or the new WNs will not work correctly until the next time cron runs them for you.

-- The CE is supposed to export information about the hardware configuration (i.e. CPU power, memory, disk space) of the WNs. The procedure to collect this information and publish it is described in Appendix D of this document.

-- Reboot the node.

-- If your CE exports the /home area to all WNs, then after rebooting it make sure that all WNs can still see this area. If this is not the case, execute this command on all WNs:

   > /etc/obj/nfsmount restart

WorkerNode
----------

-- The default allowed maximum number of open files on a RedHat node is only 26213. This number might be too small if users submit file-hungry jobs (we already had one case), so you may want to increase it on your WNs. At CERN we currently use 256000.
   To set this parameter you can use this command:

   > echo 256000 > /proc/sys/fs/file-max

   You can make this setting reboot-proof by adding the following code at the end of your /etc/rc.d/rc.local file:

   # Increase max number of open files
   if [ -f /proc/sys/fs/file-max ]; then
     echo 256000 > /proc/sys/fs/file-max
   fi

-- Every 6 hours each WN needs to connect to the web sites of all known CAs to check if a new CRL (Certificate Revocation List) is available. As the script which handles this functionality uses wget to retrieve the new CRL, you can direct your WNs to use a web proxy. This is mandatory if your WNs sit on a hidden network with no direct external connectivity. To redirect your WNs to use a web proxy you should edit the /etc/wgetrc file and add a line like:

   http_proxy = http://web_proxy.cern.ch:8080/

   where you should replace the node name and the port to match those of your web proxy. Note: I could not test this recipe directly as I am not aware of a web proxy at CERN. If you try it and find problems, please post a message on the lcg-rollout list.

-- If your WNs are NOT sharing the /home directory with your CE (this is the default configuration), then you have to configure ssh to enable them to copy job output back to the CE using scp. To this end you have to modify the ssh client configuration file /etc/ssh/ssh_config, adding these lines at the end:

   Host *
     HostbasedAuthentication yes

   Note: the "Host *" line might already exist. In this case, just add the second line after it.

-- Create the first version of the /etc/ssh/ssh_known_hosts file by running

   > /opt/edg/sbin/edg-pbs-knownhosts

   A cron job will update this file every 6 hours.

StorageElement
--------------

-- Make sure that the storage area defined with CE_CLOSE_SE_MOUNTPOINT exists and contains the VO-specific sub-directories with the correct access privileges (group=<VO> and r/w access for the group).

-- Reboot the node.

PlainGRIS
---------

No additional configuration steps are currently needed on a PlainGRIS node.

BDII Node
---------

The BDII node using the regional GIISes is no longer supported. It has been replaced by the LCG-BDII.

LCG-BDII node
-------------

This is the current version of the BDII service, which does not rely on Regional MDSes. If you want to install the new service then you should use the LCG-BDII_node example file from the "examples" directory. After installation the new LCG-BDII service does not need any further configuration: the list of available sites will be automatically downloaded from the default web location defined by SITE_BDII_URL in site-cfg.h and the initial population of the database will be started. Expect a delay of a couple of minutes between when the machine is up and when the database is fully populated.

If for some reason you want to use a static list of sites, then you should copy the static configuration file to /opt/lcg/var/bdii/lcg-bdii-update.conf and add this line at the end of your LCG-BDII node configuration file:

   +lcgbdii.auto no

If you need a group of BDIIs to be centrally managed and to see a different set of sites than those defined by the URL above, you can set up a web server and publish a web page containing the sites. The URL of this file then has to be used to configure SITE_BDII_URL in site-cfg.h; leave lcgbdii.auto set to yes. This file has the following structure:

##################################################################
#
# BDII web configuration file for 24 sites.
#
# This file has been downloaded from the web.
#
##################################################################
Date=02/06/04 22:01
##################################################################
#
# Locations of lcg-bdii-update configuration files.
#
##################################################################
http://grid-deployment.web.cern.ch/grid-deployment/gis/cms-bdii-update.conf
##################################################################
#
# Ldap URLs for use by bdii
#
##################################################################
#CERN, Geneva, Switzerland
CERN-LCG2 ldap://lxn1181.cern.ch:2135/mds-vo-name=cernlcg2/o=grid
#CNAF, Italy
CNAF-LCG2 ldap://wn-04-07-02-a.cr.cnaf.infn.it:2135/mds-vo-name=cnaflcg2/o=grid
#RAL, UK
RAL-LCG2 ldap://lcgce02.gridpp.rl.ac.uk:2135/mds-vo-name=rallcg2/o=grid

Change the URL to the URL of your file, and add or remove sites as needed. To make the BDIIs realize the change you have to change the Date field; don't forget this.

Regional MDS Node
-----------------

No more regional MDS nodes are installed, since the system based on the LCG-BDII doesn't require them any more.

MyProxy Node
------------

-- Reboot the node after installing the host certificates (see "Common steps" above).

Testing
-------

IMPORTANT NOTICE: if /home is NOT shared between the CE and the WNs (this is the default configuration), due to the way the new jobmanager works a globus-job-run command will take at least 2 minutes. Even in the configuration with shared /home the execution time of globus-job-run will be slightly longer than before. Keep this in mind when testing your system.

To perform the standard tests (edg-job-submit & co.) you need to have your certificate registered in one VO and to sign the LCG usage guidelines. Detailed information on how to do these two steps can be found at: http://lcg-registrar.cern.ch/

If you are working in one of the four LHC experiments, then ask for registration in the corresponding VO; otherwise you can choose the "LCG Deployment Team" (aka DTeam) VO.

A test suite which will help you make sure your site is correctly configured is now available. This software provides basic functionality tests and various utilities to run automated sequences of tests and to present results in a common HTML format. Extensive on-line documentation about this test suite can be found at http://grid-deployment.web.cern.ch/grid-deployment/tstg/docs/LCG-Certification-help

Note that this test suite has not been updated for LCG-2 yet. Nonetheless, all tests related to job submission should work out of the box. In Appendix H you can find some core tests that should be run to certify that the site is providing the core functionality.

Appendix A
==========

Syntax for the MDS_HOST_LIST variable
-------------------------------------

This appendix is no longer needed since, with the introduction of the LCG-BDII, no configuration related to regional MDSes is needed.
Appendix B
==========

How to configure the PBS server on a ComputingElement
------------------------------------------------------

1) load the server configuration with this command (replace <CEhostname> with the hostname of the CE you are installing):

@-----------------------------------------------------------------------------
/usr/bin/qmgr <<EOF
set server scheduling = True
set server acl_host_enable = False
set server managers = root@<CEhostname>
set server operators = root@<CEhostname>
set server default_queue = short
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server scheduler_iteration = 600
set server default_node = lcgpro
set server node_pack = False
create queue short
set queue short queue_type = Execution
set queue short resources_max.cput = 00:15:00
set queue short resources_max.walltime = 02:00:00
set queue short enabled = True
set queue short started = True
create queue long
set queue long queue_type = Execution
set queue long resources_max.cput = 12:00:00
set queue long resources_max.walltime = 24:00:00
set queue long enabled = True
set queue long started = True
create queue infinite
set queue infinite queue_type = Execution
set queue infinite resources_max.cput = 48:00:00
set queue infinite resources_max.walltime = 72:00:00
set queue infinite enabled = True
set queue infinite started = True
EOF
@-----------------------------------------------------------------------------

Note that queues short, long, and infinite are those defined in the site-cfg.h file and the time limits are those in use at CERN. Feel free to add/remove/modify them to your liking, but do not forget to modify site-cfg.h accordingly.

2) edit file /var/spool/pbs/server_priv/nodes to add the list of WorkerNodes you plan to use. An example setup for CERN could be:

@-----------------------------------------------------------------------------
lxshare0223.cern.ch np=2 lcgpro
lxshare0224.cern.ch np=2 lcgpro
lxshare0225.cern.ch np=2 lcgpro
lxshare0226.cern.ch np=2 lcgpro
@-----------------------------------------------------------------------------

where np=2 gives the number of job slots (usually equal to #CPUs) available on the node, and lcgpro is the group name as defined in the default_node parameter in the server configuration.

3) Restart the PBS server

> /etc/rc.d/init.d/pbs_server restart

Appendix C
==========

How to configure the MySQL database on a ResourceBroker
--------------------------------------------------------

Log as root on your RB node, represented by <rb_node> in the example, and make sure that the mysql server is up and running:

> /etc/rc.d/init.d/mysql start

If it was already running you will just get notified of the fact. Now you can choose a DB management <password> you like (write it down somewhere!) and then configure the server with the following commands:

> mysqladmin password <password>
> mysql --password=<password> \
    --exec "set password for root@<rb_node>=password('<password>')" mysql
> mysqladmin --password=<password> create lbserver20
> mysql --password=<password> lbserver20 < /opt/edg/etc/server.sql
> mysql --password=<password> \
    --exec "grant all on lbserver20.* to lbserver@localhost" lbserver20

Note that the database name "lbserver20" is hardwired in the LB server code and cannot be changed, so use it exactly as shown in the commands. Make sure that /var/lib/mysql has the right permissions set (744).
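After running the recipe above, a quick sanity check can save a later debugging session. The snippet below is not part of the original notes; it is only an illustrative sketch that re-uses the standard mysql/mysqladmin client invocations and the lbserver20 database name shown above, with <password> standing for the DB management password you chose.

@-----------------------------------------------------------------------------
#!/bin/bash
# Illustrative sanity check for the RB MySQL setup (not an official LCG script).
# Replace <password> with the DB management password chosen above.

# The server should answer status queries with the chosen password.
mysqladmin --password=<password> status

# The lbserver20 database should exist and contain the tables loaded
# from /opt/edg/etc/server.sql.
mysql --password=<password> --exec "show tables" lbserver20

# Permissions on the data directory as required above (744).
ls -ld /var/lib/mysql
@-----------------------------------------------------------------------------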
Appendix D
==========

Publishing WN information from the CE
-------------------------------------

When submitting a job, users of LCG are supposed to state in their jdl the minimal hardware resources (memory, scratch disk space, CPU time) required to run the job. These requirements are matched by the RB with the information in the BDII to select a set of available CEs where the job can run.

For this schema to work, each CE must publish some information about the hardware configuration of the WNs connected to it. This means that site managers must collect information about the WNs available at the site and insert it in the information published by the local CE.

The procedure to do this is the following:

- choose a WN which is "representative" of your batch system (see below for a definition of "representative") and make sure that the chosen node is fully installed and configured. In particular, check if all expected NFS partitions are correctly mounted.

- on the chosen WN run the following script as root, saving the output to a file.

@-----------------------------------------------------------------------------
#!/bin/bash
echo -n 'hostname: '
host `hostname -f` | sed -e 's/ has address.*//'
echo "Dummy: `uname -a`"
echo "OS_release: `uname -r`"
echo "OS_version: `uname -v`"
cat /proc/cpuinfo /proc/meminfo /proc/mounts
df
@-----------------------------------------------------------------------------

- copy the obtained file to /opt/edg/var/info/edg-scl-desc.txt on your CE, replacing any pre-existing version.

- restart the GRIS on the CE with

  > /etc/rc.d/init.d/globus-mds restart

Definition of "representative WN": in general, WNs are added to a batch system at different times and with heterogeneous hardware configurations. All these WNs often end up being part of a single queue, so that when an LCG job is sent to the batch system there is no way to ask for a specific hardware configuration (note: LSF and other batch systems offer ways to do this, but the current version of the Globus gatekeeper is not able to take advantage of this possibility). This means that the site manager has to choose a single WN as "representative" of the whole batch cluster. In general it is recommended that this node is chosen among the "least powerful" ones, to avoid sending jobs with heavy hardware requirements to under-spec nodes.

Appendix E
==========

Modifications to your site-cfg.h file
-------------------------------------

As LCG-2 contains some major modifications w.r.t. LCG1, the number of changes to site-cfg.h is substantially higher than in the past. Here we report all required changes: please go through them carefully and apply them to your site-cfg.h file. Also consider the possibility of creating a new site-cfg.h file starting from site-cfg.h.template in the tag's examples directory.

1) define the disk area to store LCG-specific software

   #define LCG_LOCATION_     /opt/lcg
   #define LCG_LOCATION_VAR_ LCG_LOCATION_/var
   #define LCG_LOCATION_TMP_ /tmp

2) change the published version to LCG-2_0_0

   #define SITE_EDG_VERSION LCG-2_0_0

3) be aware that all regional MDSes are no longer present; this functionality is no longer required. In addition, there is no longer any need to explicitly define the secondary MDS in your site GIIS (i.e. CE) configuration file. This means that you can remove the following settings, if you have them there:

   /* Define a secondary top MDS node */
   EXTRA(globuscfg.giis) site2
   EXTRA(globuscfg.giisreg) site2
   globuscfg.localName_site2 SITE_GIIS
   globuscfg.regName_site2 TOP_GIIS
   globuscfg.regHost_site2 secondary.mds.node

4) the BDII configuration section now includes the URL to the LCG-BDII configuration file:

   #define SITE_BDII_URL http://grid-deployment.web.cern.ch/grid-deployment/gis/lcg2-bdii-update.conf

5) the location of security-related files and directories is now more detailed. Replace the old section:

   #define SITE_DEF_HOST_CERT  /etc/grid-security/hostcert.pem
   #define SITE_DEF_HOST_KEY   /etc/grid-security/hostkey.pem
   #define SITE_DEF_GRIDMAP    /etc/grid-security/grid-mapfile
   #define SITE_DEF_GRIDMAPDIR /etc/grid-security/gridmapdir/

   with the new one:

   #define SITE_DEF_GRIDSEC_ROOT     /etc/grid-security
   #define SITE_DEF_HOST_CERT        SITE_DEF_GRIDSEC_ROOT/hostcert.pem
   #define SITE_DEF_HOST_KEY         SITE_DEF_GRIDSEC_ROOT/hostkey.pem
   #define SITE_DEF_GRIDMAP          SITE_DEF_GRIDSEC_ROOT/grid-mapfile
   #define SITE_DEF_GRIDMAPDIR       SITE_DEF_GRIDSEC_ROOT/gridmapdir/
   #define SITE_DEF_CERTDIR          SITE_DEF_GRIDSEC_ROOT/certificates/
   #define SITE_DEF_VOMSDIR          SITE_DEF_GRIDSEC_ROOT/vomsdir/
   #define SITE_DEF_WEBSERVICES_CERT SITE_DEF_GRIDSEC_ROOT/tomcatcert.pem
   #define SITE_DEF_WEBSERVICES_KEY  SITE_DEF_GRIDSEC_ROOT/tomcatkey.pem

   changing the various paths if needed.

6) the whole "RLS PARAMETERS" section can be removed, i.e.

   /* RLS PARAMETERS --------------------------------------------------------
   ...
   RLS server, the RLS-cfg.h file must be edited. Sorry. */

7) the CE_QUEUES parameter is now a space-separated list (in the past it was a comma-separated list):

   #define CE_QUEUES short long infinite

8) all VO software related parameters have been removed from the CE_IP_RUNTIMEENV parameter definition:

   #define CE_IP_RUNTIMEENV LCG-2

9) the CE_MOUNTPOINT_SE_AREA and WN_MOUNTPOINT_SE_AREA variables are not used anymore: you can remove them from site-cfg.h.

10) StorageElement configuration is now substantially different from LCG1. Replace in your site-cfg.h file the full "STORAGE ELEMENT DEFINITIONS" section with that from site-cfg.h.template and edit it for your site.

11) a new section is needed to configure the disk areas where VO managers can install VO-related software:

   /* Area on the WN for the installation of the experiment software */
   /* If on your WNs you have predefined shared areas where VO managers can
      pre-install software, then these variables should point to these areas.
      If you do not have shared areas and each job must install the software,
      then these variables should contain a dot ( . ) */
   /* #define WN_AREA_ALICE /opt/exp_software/alice */
   /* #define WN_AREA_ATLAS /opt/exp_software/atlas */
   /* #define WN_AREA_CMS   /opt/exp_software/cms */
   /* #define WN_AREA_LHCB  /opt/exp_software/lhcb */
   /* #define WN_AREA_DTEAM /opt/exp_software/dteam */
   #define WN_AREA_ALICE .
   #define WN_AREA_ATLAS .
   #define WN_AREA_CMS .
   #define WN_AREA_LHCB .
   #define WN_AREA_DTEAM .

12) the LCFG-LITE installation is not supported: the "LITE INSTALLATION SUPPORT" section can be removed.

13) AUTOFS is not supported and the corresponding section can be removed.

14) The new monitoring system based on GridICE is now included in the default setup. To configure it, add to your site-cfg.h file the "GRIDICE MONITORING" section from site-cfg.h.template and edit it (if needed) for your site.

15) A few of the UID/GID values defined at the end of the old site-cfg.h file are not used and can be removed. These are: USER_UID_TOMCAT4, USER_GID_TOMCAT4, USER_UID_SE, USER_GID_SE, USER_UID_APACHE, USER_GID_APACHE, USER_UID_MAUI, USER_UID_RTCS, USER_GID_RMS

Appendix G
==========

Site information needed for the contact database. Please fill in and send to your primary site and the CERN deployment team ([log in to unmask]).

============================= START =============================

0) Preferred name of your site
   ---------------------------------------------

I. Communication:
=================

a) Contact email for the site
   ---------------------------------

b) Contact phone for the site
   ---------------------------------

c) Reachable during which hours
   ---------------------------------

d) Emergency phone for the site
   ---------------------------------

e) Site (computer/network) security contact for your site

f0) Official name of your institute
   -----------------------------------
   -----------------------------------

f1) Name and title/role of individual(s) responsible for computer/network security at your site
   -----------------------------------
   -----------------------------------

f2) Personal email for f1)
   ___________________________________
   ___________________________________

f3) Telephone for f1)
   ----------------------------------
   ----------------------------------

f4) Telephone for emergency security incident response (if different from f3)
   -----------------------------------
   -----------------------------------

f5) Email for emergency security incident response (listbox preferred)
   ------------------------------------

g) Write access to CVS
   The LCG CVS repository has recently been moved to a different CVS server. To access this server a CERN AFS account is required. If you have none, please contact Louis Poncet ([log in to unmask]).

   AFS account at CERN:
   ------------------------------------
   ------------------------------------

II) Site specific information

a) Domain
   -----------------------------

e) CA that issued host certificates for your site
   ____________________________________________________________

============================ END ===============================

Appendix H
==========

This has been provided by David Kant <[log in to unmask]>

LCG Site Configuration Database and Grid Operation Center (GOC)
================================================================

The GOC will be responsible for monitoring the grid services deployed through the LCG middleware at your site.

Information about the site is managed by the local site administrator. The information we require is the site contact details, the list of nodes and IP addresses, and the middleware deployed on those machines (EDG, LCG1, LCG2 etc).

Access to the database is done through a web browser (https) via the use of an X.509 certificate issued by a trusted LCG CA.

GOC monitoring is done hourly and begins with an SQL query of the database to extract your site details. Therefore, it is important to ensure that the information in the database is ACCURATE and UP-TO-DATE.

To request access to the database, load your certificate into your browser and go to:

http://goc.grid-support.ac.uk/gridsite/db-auth-request/

The GOC team will then create a customised page for your site and give you access rights to these pages. This process should take less than a day and you will receive an email confirmation.
Finally, you can enter your site details: https://goc.grid-support.ac.uk/gridsite/db/index.php The GOC monitoring pages displaying current status information about LCG2: http://goc.grid-support.ac.uk/gridsite/gocmain/ Appendix F ============ This is a collection of basic commands that can be run to test the correct setup of a site. These tests are not meant to be a replacement of the test tools provided by LCG test team. Extensive documentation covering this can be found here: http://grid-deployment.web.cern.ch/grid-deployment/tstg/docs/LCG-Certification-help The material in this chapter should enable the site administrator to verify the basic functionality of the site. Testing the UI Testing the CE and WNs Testing the SE Not included in this release: Testing the RB Testing the BDII Testing the Proxy Testing the UI ============== The main tools used on a UI are: 1) Tools to manage certificates and create proxies 2) Tools to deal with the submission and status retrieval of jobs 3) Client tools of the data management. These include tools to transport data and to query the replica location service 1) Create a proxy _______________ The grid-proxy-init command and the other commands used here should be in your path. [adc0014] ~ > grid-proxy-init Your identity: /C=CH/O=CERN/OU=GRID/CN=Markus Schulz 1319 Enter GRID pass phrase for this identity: Creating proxy ................................................................... Done Your proxy is valid until: Mon Apr 5 20:53:38 2004 2) Run simple jobs -------------------- Check that globus-job-run works. First select a CE that is known to work. Have a look at the GOC DB and select the CE at CERN. [adc0014] ~ > globus-job-run lxn1181.cern.ch /bin/pwd /home/dteam002 What can go wrong with this most basic test? If your VO membership is not correct you might be not in the grid-mapfile. In this case you will see some errors that refer to grid security. Next is to see if the UI is correctly configured to access a RB. Create the following files for these tests: testJob.jdl this contains a very basic job description. 
Executable = "testJob.sh"; StdOutput = "testJob.out"; StdError = "testJob.err"; InputSandbox = {"./testJob.sh"}; OutputSandbox = {"testJob.out","testJob.err"}; #Requirements = other.GlueCEUniqueID == "lxn1181.cern.ch:2119/jobmanager-lcgpbs-short"; testJob.sh contains a very basic test script #!/bin/bash date hostname echo"****************************************" echo "env | sort" echo"****************************************" env | sort echo"****************************************" echo "mount" echo"**************************************** mount echo"****************************************" echo "rpm -q -a | sort" echo"**************************************** /bin/rpm -q -a | sort sleep 20 date run the following command to see which sites can run your job adc0014] ~/TEST > edg-job-list-match --vo dteam testJob.jdl the output should look like: Selected Virtual Organisation name (from --vo option): dteam Connecting to host lxn1177.cern.ch, port 7772 *************************************************************************** COMPUTING ELEMENT IDs LIST The following CE(s) matching your job requirements have been found: *CEId* hik-lcg-ce.fzk.de:2119/jobmanager-pbspro-lcg hotdog46.fnal.gov:2119/jobmanager-pbs-infinite hotdog46.fnal.gov:2119/jobmanager-pbs-long hotdog46.fnal.gov:2119/jobmanager-pbs-short lcg00125.grid.sinica.edu.tw:2119/jobmanager-lcgpbs-infinite lcg00125.grid.sinica.edu.tw:2119/jobmanager-lcgpbs-long lcg00125.grid.sinica.edu.tw:2119/jobmanager-lcgpbs-short lcgce02.ifae.es:2119/jobmanager-lcgpbs-infinite lcgce02.ifae.es:2119/jobmanager-lcgpbs-long lcgce02.ifae.es:2119/jobmanager-lcgpbs-short lxn1181.cern.ch:2119/jobmanager-lcgpbs-infinite lxn1181.cern.ch:2119/jobmanager-lcgpbs-long lxn1184.cern.ch:2119/jobmanager-lcglsf-grid tbn18.nikhef.nl:2119/jobmanager-pbs-qshort wn-04-07-02-a.cr.cnaf.infn.it:2119/jobmanager-lcgpbs-dteam tbn18.nikhef.nl:2119/jobmanager-pbs-qlong lxn1181.cern.ch:2119/jobmanager-lcgpbs-short *************************************************************************** If an error is reported rerun the command using the --debug option. Common problems are related to the RB that has been configured to be used as the default RB for the node. To test if the UI works with a different UI you can run the command using configuration files that overwrite the default settings. Configure the two files to use for the test a known working RB. The RB at CERN that can be used is: lxn1177.cern.ch The file that contains the VO dependent configuration has to contain the following: lxn1177.vo.conf [ VirtualOrganisation = "dteam"; NSAddresses = "lxn1177.cern.ch:7772"; LBAddresses = "lxn1177.cern.ch:9000"; ## HLR location is optional. Uncomment and fill correctly for ## enabling accounting #HLRLocation = "fake HLR Location" ## MyProxyServer is optional. Uncomment and fill correctly for ## enabling proxy renewal. 
  ## This field should be set equal to the
  ## MYPROXY_SERVER environment variable
  MyProxyServer = "lxn1179.cern.ch"
  ]

and the common one:

lxn1177.conf

  [
  rank = - other.GlueCEStateEstimatedResponseTime;
  requirements = other.GlueCEStateStatus == "Production";
  RetryCount = 3;
  ErrorStorage = "/tmp";
  OutputStorage = "/tmp/jobOutput";
  ListenerPort = 44000;
  ListenerStorage = "/tmp";
  LoggingTimeout = 30;
  LoggingSyncTimeout = 30;
  LoggingDestination = "lxn1177.cern.ch:9002";
  # Default NS logger level is set to 0 (null)
  # max value is 6 (very ugly)
  NSLoggerLevel = 0;
  DefaultLogInfoLevel = 0;
  DefaultStatusLevel = 0;
  DefaultVo = "dteam";
  ]

Then run the list match with the following options:

  edg-job-list-match -c `pwd`/lxn1177.conf --config-vo `pwd`/lxn1177.vo.conf testJob.jdl

If this works you should investigate the configuration of the RB that is
selected by default from your UI, or the associated configuration files.

If the job-list-match is working you can submit the test job using:

  edg-job-submit --vo dteam testJob.jdl

The command returns some output like:

  Selected Virtual Organisation name (from --vo option): dteam
  Connecting to host lxn1177.cern.ch, port 7772
  Logging to host lxn1177.cern.ch, port 9002
  *********************************************************************************************
                              JOB SUBMIT OUTCOME
   The job has been successfully submitted to the Network Server.
   Use edg-job-status command to check job current status. Your job identifier (edg_jobId) is:

   - https://lxn1177.cern.ch:9000/0b6EdeF6dJlnHkKByTkc_g
  *********************************************************************************************

In case the output of the command has a significantly different structure you
should rerun it and add the --debug option. Save the output for further
analysis.

Now wait some minutes and try to verify the status of the job using the
command:

  edg-job-status https://lxn1177.cern.ch:9000/0b6EdeF6dJlnHkKByTkc_g

Repeat this until the job is in the status: Done (Success)

If the job doesn't reach this state, or gets stuck in the same state for
longer periods, you should run a command to access the logging information.
Please save the output.

  edg-job-get-logging-info -v 1 https://lxn1177.cern.ch:9000/0b6EdeF6dJlnHkKByTkc_g

Assuming that the job has reached the desired status, try to retrieve the
output:

  edg-job-get-output https://lxn1177.cern.ch:9000/0b6EdeF6dJlnHkKByTkc_g

  Retrieving files from host: lxn1177.cern.ch ( for https://lxn1177.cern.ch:9000/0b6EdeF6dJlnHkKByTkc_g )
  *********************************************************************************
                          JOB GET OUTPUT OUTCOME
   Output sandbox files for the job:
   - https://lxn1177.cern.ch:9000/0b6EdeF6dJlnHkKByTkc_g
   have been successfully retrieved and stored in the directory:
   /tmp/jobOutput/markusw_0b6EdeF6dJlnHkKByTkc_g
  *********************************************************************************

Check that the given directory contains the output and error files.

One common reason for this command to fail is that the access privileges for
the jobOutput directory are not correct, or the directory has not been
created. If you encounter a problem, rerun the command using the --debug
option.
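If you prefer not to repeat the submit/status/get-output cycle by hand, it
can be wrapped in a small script. This is only a convenience sketch and not
part of the release: it assumes the output format of edg-job-submit shown
above (the job identifier is the line containing https://) and that the
status output contains the strings "Done" or "Aborted".

  #!/bin/bash
  # Convenience sketch: submit testJob.jdl, wait for it to finish,
  # then retrieve the output sandbox.
  # Assumption: the job identifier is the https://... line in the submit output.

  JOBID=`edg-job-submit --vo dteam testJob.jdl | grep 'https://' | awk '{print $NF}'`
  if [ -z "$JOBID" ]; then
      echo "submission failed" >&2
      exit 1
  fi
  echo "submitted job: $JOBID"

  # Poll the status every 60 seconds until the job is Done or Aborted.
  while true; do
      STATUS=`edg-job-status $JOBID`
      echo "$STATUS" | grep -q "Done"    && break
      echo "$STATUS" | grep -q "Aborted" && { echo "job aborted" >&2; exit 1; }
      sleep 60
  done

  # Retrieve the sandbox; it ends up under the OutputStorage directory
  # (/tmp/jobOutput in the example configuration above).
  edg-job-get-output $JOBID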
3) Data management tools
------------------------
Test that you can reach an external SE. Run the following simple command to
list a directory at one of the CERN SEs:

  edg-gridftp-ls gsiftp://castorgrid.cern.ch/castor/cern.ch/grid/dteam

You should get a long list of files. If this command fails it is very likely
that your firewall settings are wrong.

To see which resources are visible via the information system, run:

  [adc0014] ~/TEST/STORAGE > edg-replica-manager -v --vo dteam pi
  edg-replica-manager starting..
    Issuing command : pi
    Parameters:
    Call replica catalog printInfo function
    VO used      : dteam
    default SE   : lxn1183.cern.ch
    default CE   : lxn1181.cern.ch
    Info Service : MDS
  ............

and a long list of CEs and SEs and their parameters.

Verify that the default SE and CE are the nodes that you want to use. Make
sure that these nodes are installed and configured before you conduct the
tests of the more advanced data management functions.

If you get almost nothing back you should check the configuration of the
replica manager. Use the following command to get the BDII that you are
using:

  grep mds.url /opt/edg/var/etc/edg-replica-manager/edg-replica-manager.conf

This should return the name and port of the BDII that you intended to use.
For the CERN UIs you would get:

  mds.url=ldap://lxn1178.cern.ch:2170

Convince yourself that this is the address of a working BDII that you can
reach:

  ldapsearch -LLL -x -H ldap://<node specified above>:2170 -b "mds-vo-name=local,o=grid"

This should return something starting like this:

  dn: mds-vo-name=local,o=grid
  objectClass: GlobusStub

  dn: Mds-Vo-name=cernlcg2,mds-vo-name=local,o=grid
  objectClass: GlobusStub

  dn: Mds-Vo-name=nikheflcgprod,mds-vo-name=local,o=grid
  objectClass: GlobusStub

  dn: GlueSEUniqueID=lxn1183.cern.ch,Mds-Vo-name=cernlcg2,mds-vo-name=local,o=grid
  objectClass: GlueSETop
  objectClass: GlueSE
  objectClass: GlueInformationService
  objectClass: Gluekey
  objectClass: GlueSchemaVersion
  GlueSEUniqueID: lxn1183.cern.ch
  GlueSEName: CERN-LCG2:disk
  GlueSEPort: 2811
  GlueInformationServiceURL: ldap://lxn1183.cern.ch:2135/Mds-Vo-name=local,o=grid
  GlueForeignKey: GlueSLUniqueID=lxn1183.cern.ch
  ...................................

In case the query doesn't return the expected output, verify that the node
specified is a BDII and that the node is running the service. As a crosscheck
you can try to repeat the test with one of the BDIIs at CERN. In the GOC DB
you can identify the BDII for the production and the test zone. Currently
these are lxn1178.cern.ch for the production system and lxnXXXX.cern.ch for
the test zone.

As long as the edg-replica-manager -v --vo dteam pi command and the
edg-gridftp-ls command are not working, it makes no sense to conduct further
tests.

Assuming that this functionality is well established, the next test is to
move a local file from the UI to the default SE and register the file with
the replica location service.

Create a file in your home directory. To make tracing this file easy, the
file should be named according to the scheme:

  testFile.<SITE-NAME>.txt

The file should be generated using the following script:

  #!/bin/bash
  echo "********************************************"
  echo "hostname: " `hostname` " date: " `date`
  echo "********************************************"

The command to move the file to the default SE is:

  edg-replica-manager -v --vo dteam cr file://`pwd`/testFile.<SiteName>.txt \
         -l lfn:testFile.<SiteName>.`date +%m.%d.%y:%H:%M:%S`

If everything is set up correctly the command returns a line with:

  guid:98ef70d6-874d-11d8-b575-8de631cc17af

Save the guid and the expanded lfn for further reference. We will refer to
these as YourGUID and YourLFN. In case this command failed, keep the output
and analyse it with your support contact; there are various reasons why it
may have failed.
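The create-and-register step above can also be scripted so that the LFN and
the guid are recorded automatically. This is only a sketch built from the
commands already shown; MY-SITE and the cr.<site>.log file name are
placeholders of our own, not part of the release.

  #!/bin/bash
  # Convenience sketch: create the test file, copy and register it on the
  # default SE, and keep the full edg-replica-manager output for reference.
  # MY-SITE is a placeholder - substitute your own site name.

  SITE_NAME=MY-SITE
  FILE=testFile.$SITE_NAME.txt
  LFN=testFile.$SITE_NAME.`date +%m.%d.%y:%H:%M:%S`

  echo "********************************************"           >  $FILE
  echo "hostname: " `hostname` " date: " `date`                  >> $FILE
  echo "********************************************"           >> $FILE

  # Copy and register; the guid: line in the output is YourGUID.
  edg-replica-manager -v --vo dteam cr file://`pwd`/$FILE -l lfn:$LFN | tee cr.$SITE_NAME.log
  echo "LFN used (YourLFN): lfn:$LFN"

The LFN printed at the end is what the next check refers to as YourLFN.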
Now check that the RLS knows about your file. This is done using the
listReplicas (lr) option:

  edg-replica-manager -v --vo dteam lr lfn:YourLFN

This command should return a string with a format similar to:

  sfn://lxn1183.cern.ch/storage/dteam/generated/2004-04-06/file92c9f455-874d-11d8-b575-8de631cc17af
  ListReplicas successful.

As before, report problems to your primary site.

If the RLS knows about the file, the next test is to transport the file back
to your UI. For this we use the cp option:

  edg-replica-manager -v --vo dteam cp lfn:YourLFN file://`pwd`/testBack.txt

This should create a file named testBack.txt in the current working
directory. List this file.

With this you have tested most of the core functions of your UI. Many of
these functions will be used to verify the other components of your site.

Testing the CE and WNs
======================
We assume that you have set up a local CE running a batch system. On most
sites the CE provides two major services. For the information system the CE
runs the site GIIS. The site GIIS is the top node in the hierarchy of the
site, and via this service the other resources of the site are published to
the grid.

To test that the site GIIS is working you can run an ldap query of the
following form. Inspect the output with some care. Are the computing
resources (queues, etc.) correctly reported? Can you find the local SE? Do
these numbers make sense?

  ldapsearch -LLL -x -H ldap://lxn1181.cern.ch:2135 -b "mds-vo-name=cernlcg2,o=grid"

Replace lxn1181.cern.ch with your site's GIIS hostname and cernlcg2 with the
name that you have assigned to your site GIIS. If nothing is reported, try to
restart the MDS service on the CE.

Now verify that the GRIS on the CE is operating correctly. Here again the
command for the CE at CERN:

  ldapsearch -LLL -x -H ldap://lxn1181.cern.ch:2135 -b "mds-vo-name=local,o=grid"

One common reason for this to fail is that the information provider on the CE
has a problem. Convince yourself that MDS on the CE is up and running. Run
the qstat command on the CE. If this command doesn't return, there might be a
problem with one of the worker nodes (WNs), or with PBS. Have a look at the
following link, which covers some aspects of troubleshooting PBS on the grid:

  http://goc.grid.sinica.edu.tw/gocwiki/TroubleShootingHistory

The next step is to verify that you can run jobs on the CE. For the most
basic test no registration with the information system is needed. However,
tests can be run much more easily if the resource is registered in the
information system. For these tests the testZone BDII and RB have been set up
at CERN. Forward your site GIIS name and host name to the deployment team for
registration.

Initial tests that work without registration, run from a UI of your choice:
as described in the subsection covering the UI tests, the first test is a
test of the fork jobmanager.

  [adc0014] ~ > globus-job-run <YourCE> /bin/pwd

Frequent problems that have been observed are related to authentication.
Check that the CE has a valid host certificate and that your DN can be found
in the grid-mapfile.

Next log on to your CE and run a local PBS job to verify that PBS is working.
Change your id to a user like dteam001. In the home directory create the
following file:

test.sh

  #!/bin/bash
  echo "Hello Grid"

Run:

  qsub test.sh

This will return a job ID of the form:

  16478.lxn1181.cern.ch

You can use qstat to monitor the job. However, it is very likely that the job
has finished before you have queried the status.
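If you do not want to race qstat by hand, you can poll it from a small
script. This is only a sketch: it assumes test.sh as above and that qstat
returns a non-zero exit code once the job is no longer known to PBS.

  #!/bin/bash
  # Convenience sketch: submit the local test job and poll qstat until the
  # job has left the queue.

  JOBID=`qsub test.sh`
  echo "submitted $JOBID"
  while qstat $JOBID >/dev/null 2>&1; do
      sleep 5
  done
  echo "$JOBID has left the queue; check the test.sh.o*/test.sh.e* files"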
PBS will place two files in your directory:

  test.sh.o16478 and test.sh.e16478

These contain the stdout and stderr.

Now try to submit to one of the PBS queues that are available on your CE. The
following command is an example for a site that runs PBS without shared home
directories. The short queue is used. It can take some minutes until the
command returns.

  globus-job-run <YourCE>/jobmanager-lcgpbs -queue short /bin/hostname
  lxshare0372.cern.ch

The next test submits a job to your CE by forcing the broker to select the
queue that you have chosen. You can use the testJob JDL and script that have
been used before for the UI tests.

  edg-job-submit --debug --vo dteam -r <YourCE>:2119/jobmanager-lcgpbs-short testJob.jdl

The --debug option should only be used if you have been confronted with
problems.

Follow the status of the job and, as before, try to retrieve the output. A
quite common problem is that the output can't be retrieved. This problem is
related to some inconsistency of ssh keys between the CE and the WN. See
http://goc.grid.sinica.edu.tw/gocwiki/TroubleShootingHistory and the CE/WN
configuration.

If your UI is not configured to use a working RB you can, as described in the
UI testing subsection, use configuration files to use the testZone RB.

For further tests, get registered with the testZone BDII. As described in the
subsection on joining LCG2, you should send your CE's hostname and the site
GIIS name to the deployment team.

The next step is to take the testJob.jdl that you have created for the
verification of your UI. Remove the comment from the last line of the file
and modify it to reflect your CE:

  Requirements = other.GlueCEUniqueID == "<YourCE>:2119/jobmanager-lcgpbs-short";

Now repeat the edg-job-list-match --vo dteam testJob.jdl command known from
the UI tests. The output should show just one resource.

The remaining tests verify that the core of the data management is working
from the WN and that the support for the experiment software installation, as
described in https://edms.cern.ch/file/412781//SoftwareInstallation.pdf, is
working correctly. The tests you can do to verify the latter are limited if
you are not mapped to the software manager of your VO.

To test the data management functions your local default SE has to be set up
and tested. Of course you can assume the SE is working and run these tests
before testing the SE.

Add an argument to the JDL that allows you to identify the site. The JDL file
should look like:

testJob_SW.jdl

  Executable    = "testJob.sh";
  StdOutput     = "testJob.out";
  StdError      = "testJob.err";
  InputSandbox  = {"./testJob.sh"};
  OutputSandbox = {"testJob.out","testJob.err"};
  Requirements  = other.GlueCEUniqueID == "lxn1181.cern.ch:2119/jobmanager-lcgpbs-short";
  Arguments     = "CERNPBS";

Replace the name of the site and the CE and queue names to reflect your
settings.

The first script to run collects some configuration information from the WN
and tests the user software installation area.
testJob.sh

  #!/bin/bash
  echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
  echo " " $1 " " `hostname` " " `date`
  echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
  echo "the environment on the node"
  echo " "
  env | sort
  echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
  echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
  echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
  echo "software path for the experiments"
  env | sort | grep _SW_DIR
  echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
  echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
  echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
  echo "mount"
  mount
  echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
  echo "============================================================="
  echo "verify that the software managers of the supported VOs can write and the users read"
  echo "DTEAM ls -l " $VO_DTEAM_SW_DIR
  ls -dl $VO_DTEAM_SW_DIR
  echo "ALICE ls -l " $VO_ALICE_SW_DIR
  ls -dl $VO_ALICE_SW_DIR
  echo "CMS ls -l " $VO_CMS_SW_DIR
  ls -dl $VO_CMS_SW_DIR
  echo "ATLAS ls -l " $VO_ATLAS_SW_DIR
  ls -dl $VO_ATLAS_SW_DIR
  echo "LHCB ls -l " $VO_LHCB_SW_DIR
  ls -dl $VO_LHCB_SW_DIR
  echo "============================================================="
  echo "============================================================="
  echo "============================================================="
  echo "============================================================="
  echo "cat /opt/edg/var/etc/edg-replica-manager/edg-replica-manager.conf"
  echo "============================================================="
  cat /opt/edg/var/etc/edg-replica-manager/edg-replica-manager.conf
  echo "============================================================="
  echo "============================================================="
  echo "============================================================="
  echo "============================================================="
  echo "rpm -q -a | sort "
  rpm -q -a | sort
  echo "============================================================="
  date

Run this job as described in the subsection on testing UIs. Retrieve the
output and verify that the environment variables for the experiment software
installation are correctly set and that the directories for the VOs that you
support are mounted and accessible.

In the edg-replica-manager.conf file reasonable default CEs and SEs should be
specified. The output for the CERN PBS might serve as an example:

  localDomain=cern.ch
  defaultCE=lxn1181.cern.ch
  defaultSE=wacdr002d.cern.ch

Then a working BDII node has to be specified as the MDS top node. For the
CERN production this is currently:

  mds.url=ldap://lxn1178.cern.ch:2170
  mds.root=mds-vo-name=local,o=grid

Please keep the output of this job as a reference. It can be helpful if
problems have to be located.
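A quick way to scan the retrieved sandbox for the points mentioned above is a
couple of grep commands. This is only a sketch; the output directory shown is
a placeholder for the one reported by edg-job-get-output.

  # Placeholder: substitute the directory reported by edg-job-get-output.
  OUTDIR=/tmp/jobOutput/<user>_<jobid>

  # The software installation areas for all supported VOs should be set.
  grep _SW_DIR $OUTDIR/testJob.out

  # Defaults picked up from edg-replica-manager.conf on the WN.
  grep -E 'defaultCE|defaultSE|mds.url' $OUTDIR/testJob.out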
Next we test the data management. For this the default SE should be working.
The following script will do some operations similar to those used on the UI.
We first test that we can access a remote SE via simple gridftp commands.
Then we test that the replica manager tools have access to the information
system. This is followed by exercising the data moving capabilities between
the WN and the local SE, and between a remote SE and the local SE. Between
the commands we run small checks to verify that the RLS service knows about
the location of the files. Submit the job via edg-job-submit and retrieve the
output. Read the files containing stdout and stderr. Keep the files for
reference.

Here is a listing of testJob.sh:

  #!/bin/bash
  echo "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
  echo " " $1 " " `hostname` "Start: " `date`
  echo "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
  echo "Can we see the SE at CERN?"
  echo "-----------------------------------------------------------------------------------------------------"
  echo "edg-gridftp-ls --verbose gsiftp://castorgrid.cern.ch/castor/cern.ch/grid/dteam/mm20 "
  edg-gridftp-ls --verbose gsiftp://castorgrid.cern.ch/castor/cern.ch/grid/dteam/mm20
  echo "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
  echo "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
  echo "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
  echo "Can we see the information system?"
  echo "-----------------------------------------------------------------------------------------------------"
  echo " edg-replica-manager -v --vo dteam pi "
  edg-replica-manager -v --vo dteam pi
  echo "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
  echo "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
  lfname=testFile.$1.txt
  echo "create a local file: " $lfname
  rm -rf $lfname
  echo "*********************************************************************************" > $lfname
  echo "Site: " $1 " hostname: " `hostname` " date: " `date` >> $lfname
  echo "*********************************************************************************" >> $lfname
  myLFN=$1.`hostname`.`date +%m.%d.%y:%H:%M:%S`
  echo "move the file to the default SE and register it with the LFN: " $myLFN
  echo "-----------------------------------------------------------------------------------------------------"
  echo "edg-replica-manager -v --vo dteam cr file://"`pwd`"/"$lfname " -l lfn:"$myLFN
  edg-replica-manager -v --vo dteam cr file://`pwd`/$lfname -l lfn:$myLFN
  echo "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
  echo "list the replica edg-replica-manager -v --vo dteam lr lfn:"$myLFN
  echo "-----------------------------------------------------------------------------------------------------"
  edg-replica-manager -v --vo dteam lr lfn:$myLFN
  echo "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
  lf2=$lfname.2
  echo "get the file back and store it in "$lf2
  echo "-----------------------------------------------------------------------------------------------------"
  rm -rf $lf2
  echo "edg-replica-manager -v --vo dteam cp lfn:"$myLFN " file://"`pwd`"/"$lf2
  edg-replica-manager -v --vo dteam cp lfn:$myLFN file://`pwd`/$lf2
  echo " "
  echo "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
  echo "cat " $lf2
  echo "-----------------------------------------------------------------------------------------------------"
  cat $lf2
  echo "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
  echo "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
  echo "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
  echo "Replicate the file from the default SE to the CASTOR service at CERN"
  echo "-----------------------------------------------------------------------------------------------------"
  echo "edg-replica-manager -v --vo dteam replicateFile lfn:"$myLFN "-d castorgrid.cern.ch"
  edg-replica-manager -v --vo dteam replicateFile lfn:$myLFN -d castorgrid.cern.ch
  echo "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
  echo "Was it successful?"
  echo "-----------------------------------------------------------------------------------------------------"
  echo "list the replica edg-replica-manager -v --vo dteam lr lfn:"$myLFN
  edg-replica-manager -v --vo dteam lr lfn:$myLFN
  echo "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
  echo "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
  echo "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
  echo "3rd party copy from castorgrid.cern.ch to the default SE"
  echo "-----------------------------------------------------------------------------------------------------"
  echo "edg-replica-manager -v --vo dteam replicateFile lfn:TheUniversalFile.txt"
  edg-replica-manager -v --vo dteam replicateFile lfn:TheUniversalFile.txt
  echo "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
  echo "Was it successful?"
  echo "-----------------------------------------------------------------------------------------------------"
  echo "list the replica edg-replica-manager -v --vo dteam lr lfn:TheUniversalFile.txt"
  edg-replica-manager -v --vo dteam lr lfn:TheUniversalFile.txt
  echo "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
  echo "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
  echo "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
  echo "get this file on the WN"
  echo "-----------------------------------------------------------------------------------------------------"
  rm -rf TheUniversalFile.txt
  echo "edg-replica-manager -v --vo dteam cp lfn:TheUniversalFile.txt file://"`pwd`"/TheUniversalFile.txt"
  edg-replica-manager -v --vo dteam cp lfn:TheUniversalFile.txt file://`pwd`/TheUniversalFile.txt
  echo "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
  echo "cat TheUniversalFile.txt"
  echo "-----------------------------------------------------------------------------------------------------"
  cat TheUniversalFile.txt
  echo " "
  echo "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
  echo "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
  echo "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
  defaultSE=`grep defaultSE /opt/edg/var/etc/edg-replica-manager/edg-replica-manager.conf | cut -d "=" -f 2`
  echo "remove the replica from the default SE: "$defaultSE
  echo "-----------------------------------------------------------------------------------------------------"
  echo "edg-replica-manager -v --vo dteam del lfn:TheUniversalFile.txt -s " $defaultSE
  edg-replica-manager -v --vo dteam del lfn:TheUniversalFile.txt -s $defaultSE
  echo "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
"+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++" echo "Was it successful?" echo "-----------------------------------------------------------------------------------------------------" echo "list the replica edg-replica-manager -v --vo dteam lr lfn:TheUniversalFile.txt" edg-replica-manager -v --vo dteam lr lfn:TheUniversalFile.txt echo "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++" echo "Ended to run the certification on site: " $1 " at date: " date Testing the SE =============== If the tests described to test the UI and the CE on a site have run successful then there is no additional test for the SE needed. We describe here some of the common problems that have been observed related to SEs. In case the SE can't be found by the edg-replica-manager tools the SE GRIS might be not working, or not registered with the site GIIS. To verify that the SE GRIS is working you should run the following ldapsearch. Note that the hostname that you use should be the one of the node where the GRIS is located. For mass storage SEs it is quite common that this is not the the SE itself. ldapsearch -LLL -x -H ldap://lxn1183.cern.ch:2135 -b "mds-vo-name=local,o=grid" If this returns nothing or very little the MDS service on the SE should be restarted. If the SE returns some information you should carefully check that the VOs that require access to the resource are listed in the GlueSAAccessControlBaseRule field. Does the information published in the GlueSEAccessProtocolType fields reflect your intention? Is the GlueSEName: carrying the extra "type" information? The next major problem that has been observed with SEs is due to a mismatch with what is published in the information system and what has been implemented on the SE. Check that the gridmap-file on the SE is configured to support the VOs that are published the GlueSAAccessControlBaseRule fields. Run a ldapsearch on your site GIIS and compare the information published by the local CE with what you can find on the SE. Interesting fields are: GlueSEName, GlueCESEBindSEUniqueID, GlueCESEBindCEAccesspoint Are the access-points for all the supported VOs created and is the access control correctly configured? The edg-replica-manager command printInfo summarizes this quite well. Here is an example for a report generated for a classic SE at CERN. SE at CERN-LCG2 : name : CERN-LCG2 host : lxn1183.cern.ch type : disk accesspoint : /storage VOs : dteam VO directories : dteam:dteam protocols : gsiftp,rfio to test the gsiftp protocol in a convenient way you can use the edg-gridftp-ls and edg-gridftp-mkdir commands. You can use the globus-url-copy command instead. The -help option describes the syntax to be used. Run on your UI and replace the host and accesspoint according to the report for your SE: edg-gridftp-ls --verbose gsiftp://lxn1183.cern.ch/storage drwxrwxr-x 3 root dteam 4096 Feb 26 14:22 dteam and: edg-gridftp-ls --verbose gsiftp://lxn1183.cern.ch/storage/dteam drwxrwxr-x 17 dteam003 dteam 4096 Apr 6 00:07 generated if the globus-gridftp service is not running on the SE you get the following message back: error a system call failed (Connection refused) If this happens restart the globus-gridftp service on your SE. Now create a directory on your SE. 
Now create a directory on your SE:

  edg-gridftp-mkdir gsiftp://lxn1183.cern.ch/storage/dteam/t1

Verify that the command ran successfully with:

  edg-gridftp-ls --verbose gsiftp://lxn1183.cern.ch/storage/dteam/

Verify that the access permissions for all the supported VOs are correctly
set.

Change History
==============
- Merged the document with the how2start guide and added additional material
  to it. This is the last text-based version.

Release LCG-2_0_0 (XX/02/2004):
  Major release: please see the release notes for details.

Release LCG1-1_1_3 (04/12/2003):
  - Updated kernel to version 2.4.20-24.7 to fix a critical security bug
  - Removed ca_CERN-old-0.19-1 and ca_GermanGrid-0.19-1 rpms as the
    corresponding CAs have recently expired
  - On user request, added zsh back to the UI rpm list
  - Updated myproxy-server-config-static-lcg rpm to recognize the new CERN CA
  - Added oscar-dar rpm from CMS to WN

Release LCG1-1_1_2 (25/11/2003):
  - Added LHCb software to WN
  - Introduced private-cfg.h.template file to handle sensitive settings for
    the site (only the encrypted root password, for the moment)
  - Added instructions on how to use MD5 encryption for the root password
  - Added instructions on how to configure the http server on the LCFG node
    to be accessible only from nodes on site
  - Fixed TCP port range setting for Globus on UI
  - Removed CERN libraries installation on the UI (added by mistake in
    release LCG1-1_1_1)
  - Added instructions to increase the maximum number of open files on WNs
  - Added instructions to correctly set the root password for the MySQL
    server on the RB
  - Added instructions to configure WNs to use a web proxy for CRL download
