Email discussion lists for the UK Education and Research communities

TB-SUPPORT Archives

TB-SUPPORT@JISCMAIL.AC.UK



Subject:

many testbeds

From:

Steve Traylen <[log in to unmask]>

Reply-To:

Testbed Support for GridPP member institutes <[log in to unmask]>

Date:

Wed, 7 Apr 2004 15:10:42 +0100

Content-Type:

MULTIPART/MIXED

Parts/Attachments:

TEXT/PLAIN (62 lines) , lcg2-install-notes.txt (1 lines)


Hi *,

 It seems various people and sites want to join various testbeds and grids,
 so I thought it would be worth trying to explain them and how to get
 started with any of them.

 There are basically four distributed testbeds:

 JRA1                  EGEE Middleware development, a very early development
                       testbed; this is CERN, NIKHEF and RAL only.

 LCG2                  A large scale production grid.

 EGEE Production       Basically this is LCG2; as far as I can tell it is,
                       at this time, exactly the same thing. Exact details on
                       how to join this are in the pipeline.

 EGEE Pre-Production   The next EGEE/LCG software release version

 The most common question is which one to join if you want to run data
 challenges for HEP experiments, and the answer in this case is
 definitely LCG2.

 The current release of LCG2, LCG2.0.0, came out half way through me writing
 this mail. For the first time it has been recommended that all sites on
 LCG1 do an upgrade, though in reality I think just about everyone has now
 made this transition.

 How to start on LCG2.

 export CVS_RSH=ssh
 cvs -d :pserver:[log in to unmask]:/cvs/lcgdeploy co \
         -r LCG-2_0_0 lcg2

 and read lcg2-install-notes.txt, which details how to join. The notes
 mention a tier1 contact, and that person is me.

 I have attached them as well.

 I think it is worth trying to announce/agree/discuss what some of the
 other GridPP sites plan to do. I know probably 4 or 5 are in or working
 towards LCG2, and this is a very likely candidate for others as well.

 A phone conference has been suggested and can be arranged for early next
 week if considered useful. This will be in time to bring our plans to the
 EGEE conference the following week.

    Steve




--
Steve Traylen
[log in to unmask]
http://www.gridpp.ac.uk/





===============================================================================
========================== LCG-2 Installation notes ===========================
===============================================================================
========== (C) 2004 by Emanuele Leonardi - [log in to unmask] ==========
===============================================================================

Reference tag: LCG-2_0_0

These notes will assist you in installing the latest LCG-2 tag and upgrading
from the previous tag. The document is not a typical release note: it also
covers some general aspects of LCG2.

This document is intended for:

 1) Sites that run LCG2 and need to upgrade to the current version
 2) Sites that move from LCG1 to LCG2
 3) Sites that join the LCG
 4) Sites that operate LCG2

What is LCG?
============

This is best answered by material found on the project's web site
http://lcg.web.cern.ch/LCG/ . There you can find information about the
nature of the project and its goals. At the end of the introduction you can
find a section that collects most of the references.

How to join LCG2?
=================

If you want to join LCG and add resources to it, you should contact the LCG
deployment manager Ian Bird ([log in to unmask]) to establish the contact
with the project.

If you only want to use LCG, you can follow the steps described in the LCG
User Overview (http://lcg.web.cern.ch/LCG/peb/grid_deployment/user_intro.htm).
The registration and initial training using the LCG-2 Users Guide
(https://edms.cern.ch/file/454439//LCG-2-Userguide.pdf) should take about a
week. However, only about 8 hours of this is spent working with the system;
the majority is waiting for the registration process with the VOs and the CA.

If you are interested in adding resources to the system, you should first
register as a user and subscribe to the LCG Rollout mailing list
(http://www.listserv.rl.ac.uk/archives/lcg-rollout.html).
In addition you need to contact the Grid Operation Center (GOC)
(http://goc.grid-support.ac.uk/gridsite/gocmain/) and get access to the
GOC-DB to register your resources with them. This registration is the basis
for your system being present in their monitoring. It is mandatory to
register at least your service nodes in the GOC DB; it is not necessary to
register all farm nodes. Please see Appendix H for a detailed description.

LCG has introduced a hierarchical support model for sites. Regions have
primary sites (P-sites) that support the smaller centers in their region. If
you do not know which your primary site is, please contact the LCG
deployment manager Ian Bird. Once you have identified your primary site, you
should fill in the form at the end of this guide in Appendix G and send it
to your primary site AND to the deployment team at CERN
([log in to unmask]). The site security contacts and sysadmins will receive
material from the LCG security team describing the security policies of LCG.

Discuss with the grid deployment team or with your primary site a suitable
layout for your site. Various configurations are possible. Experience has
shown that starting with a standardized small setup and evolving from there
to a larger, more complex system is highly advisable.

The typical layout for a minimal site is a user interface node (UI), which
allows users to submit jobs to the grid. This node will use the information
system and resource broker of either the primary site or the CERN site. A
site that can provide resources will add a computing element (CE), which
acts as a gateway to the computing resources, and a storage element (SE),
which acts as a gateway to the local storage. In addition, a few worker
nodes (WN) can be added to provide the computing power. Large sites with
many users that submit a large number of jobs will add a resource broker
(RB).
The resource broker distributes the jobs to the sites that are available to
run them and keeps track of the status of the jobs. For resource discovery
the RB uses an information index (BDII). It is good practice to set up a
BDII on each site that operates an RB. A complete site will add a Proxy
server node that allows the renewal of proxy certificates.

In case you don't find a setup described in this installation guide that
meets your needs, you should contact your primary site for further help.

After a site has been set up, the site manager or the support persons of the
primary site should run the initial tests described in the first part of the
chapter on testing. If these tests have run successfully, the site should
contact the deployment team via e-mail. The mail should contain the site's
GIIS name and the hostname of the GIIS. To allow further testing, the site
will be added to an LCG-BDII which is used for testing new sites. Then the
primary site or the site managers can run the additional tests described.
When a site has passed these tests, the site or the primary site will
announce this to the deployment team, which then, after a final round of
testing, will add the site to the list of core sites.

How to report problems
======================

The way problems are reported is currently changing. On the LCG user
introduction page
(http://lcg.web.cern.ch/LCG/peb/grid_deployment/user_intro.htm) you can find
information on the currently appropriate way to report problems. Before
reporting a problem you should first try to consult your primary site. Many
problems are currently reported to the rollout list. Internally we still use
a Savannah based bug tracking tool that can be accessed via
https://savannah.cern.ch/bugs/?group=lcgoperation.
How to setup your site
======================

With this release you have the option to either install and configure your
site using LCFGng, a fabric management tool that is supported by LCG, or to
install the nodes following a manual step-by-step description, which can be
used as a basis for configuring your local fabric management system.

For very small sites the manual approach has the advantage that no learning
of the tool is required and no extra node needs to be maintained. In
addition, no reinstallation of your nodes is required. However, the
maintenance of the nodes will require more work, and hidden
misconfigurations are more likely to be introduced. For medium to larger
sites without their own fabric management tools, using LCFGng can be an
advantage. It is up to each site to decide which method is preferred. The
documentation for the manual installation can be found here:

http://grid-deployment.web.cern.ch/grid-deployment/documentation/manual-installation/

We currently support all node types with the exception of the PROXY server
node. This will follow soon. In case you decide to use the manual setup you
should nevertheless have a look at parts of this document: for example, the
sections about firewalls and testing are valid for both installation
methods.

Network access
==============

The current software requires outgoing network access from all the nodes,
and incoming access on the RB, CE, and SE. Some sites have gained experience
with running their sites through a NAT; we can provide contact information
for sites with experience of this setup. To configure your firewall you
should use the port table that we provide as a reference. Please have a look
at the chapter on firewall configuration.

General Note on Security
========================

While we provide kernel RPMs in our repositories and use certain versions in
the configuration, it has to be pointed out that you must satisfy yourself
that the kernel you install is safe.
If the provided default is not what you want, please replace it.

Sites Moving From LCG1 to LCG2
==============================

Since LCG2 is significantly different from both LCG1 and EDG, it is
mandatory to study this guide even for administrators with considerable
experience. In case you see the need to deviate from the described
procedures, please contact us.

Due to the many substantial changes w.r.t. LCG1, updating a site from any of
the LCG1 releases to LCG-2 is not possible in a reliable way. A complete
re-installation of the site is the only supported procedure.

Another change is related to the CVS repository used. For CERN internal
reasons we had to move to a different server and switch to a different
authorization scheme. See
http://grid-deployment.web.cern.ch/grid-deployment/documentation/cvs-guide/
for details about getting access to the CVS repository. For web based
browsing, access to CVS is via
http://lcgdeploy.cvs.cern.ch/cgi-bin/lcgdeploy.cgi/

As described later, we changed the directory structure in CVS for LCG2.
There are now two relevant directories: lcg2 and lcg2-sites. The former
contains common elements while the latter contains the site specific
information.

In addition to the installation via LCFGng, an increasing number of node
types can now be installed manually. These are: Worker Nodes (WN), User
Interfaces (UI), Computing Elements (CE), classical Storage Elements (SE),
and the BDII. The Proxy Server is in preparation and almost finished.
References:
===========

Documentation:
--------------

LCG Project Homepage:
  http://lcg.web.cern.ch/LCG/
Starting point for users of the LCG infrastructure:
  http://lcg.web.cern.ch/LCG/peb/grid_deployment/user_intro.htm
LCG-2 User's Guide:
  https://edms.cern.ch/file/454439//LCG-2-Userguide.pdf
LCFGng server installation guide:
  http://lcgdeploy.cvs.cern.ch/cgi-bin/lcgdeploy.cgi/lcg2/docs/LCFGng_server_install.txt
LCG-2 Manual Installation Guide:
  http://grid-deployment.web.cern.ch/grid-deployment/documentation/manual-installation/
LCG GOC Mainpage:
  http://goc.grid-support.ac.uk/gridsite/gocmain/
CVS User's Guide:
  http://grid-deployment.web.cern.ch/grid-deployment/documentation/cvs-guide/

Registration:
-------------

LCG rollout list: http://www.listserv.rl.ac.uk/archives/lcg-rollout.html
  - join the list

Get the Certificate and register in VO: http://lcg-registrar.cern.ch/
  - read the LCG Usage Rules
  - choose your CA and contact them to get a USER certificate (for some CAs
    an online certificate request is possible)
  - load your certificate into your web browser (read instructions)
  - choose your VO and register (LCG Registration Form)

GOC Database: http://goc.grid-support.ac.uk/gridsite/db-auth-request/
  - apply for access to the GOCDB

CVS read-write access and site directory setup: mailto:[log in to unmask]
  - prepare and send a NAME for your site following the schema
    <domain>-<organization>[-<section>] (e.g. es-Barcelona-PIC, ch-CERN,
    it-INFN-CNAF)

Site contact database: mailto:[log in to unmask]
  - fill in the form in Appendix G and send it

Report bugs and problems with installation:
  https://savannah.cern.ch/bugs/?group=lcgoperation

Notes about lcg2-20040407
-------------------------

In the previous beta-release the new LCG-BDII node type was introduced, and
for some time the two information system structures have been operated in
parallel.
Since we expect many sites to move from LCG1 to LCG2, we will now switch
permanently to the new layout, which we describe later in some detail. The
new LCG-BDII no longer relies on the Regional MDSes but collects information
directly from the Site GIISes. The list of existing sites and their
addresses is downloaded from a pre-defined web location. See the notes in
the BDII specific section of this document for installation and
configuration. This layout will allow sites and VOs to configure their own
super- or subset of the LCG2 resources.

A new Replica Manager client was also introduced in the previous version.
This is the only client which is compatible with the current version of the
RLS server, so file replication at your site will not work until you have
updated to this release.

Introduction and overall setup
==============================

In this text we will assume that you are already familiar with the LCFGng
server installation and management. Please refer to the
LCFGng_server_install.txt file in the docs directory of the lcg2 release for
an up-to-date guide on how to set up an LCFGng server for use with LCG-2.

Note for sites which are already running LCG1: due to the incompatible
update of several configuration objects, an LCFG server cannot support both
LCG1 and LCG-2 nodes. If you are planning to re-install your LCG1 nodes with
LCG-2, then the correct way to proceed is:

 1) kill the rdxprof process on all your nodes (or just switch your nodes
    off if you do not care about the extra down-time at your site);
 2) update your LCFG server using the objects listed in the LCG-2 release;
 3) prepare the new configuration files for your site as described in this
    document;
 4) re-install all your nodes.

If you plan to keep your LCG1 site up while installing a new LCG-2 site,
then you will need a second LCFG server. This is a matter of choice.
The LCG1 installation is of very limited use once you set up the LCG-2 site,
since several core components are no longer compatible.

Files needed for the current LCG-2 release are available from a CVS server
at CERN. This CVS server contains the list of rpms to install and the LCFGng
configuration files for each node type. The CVS area, called "lcg2", can be
reached from http://lcgdeploy.cvs.cern.ch/cgi-bin/lcgdeploy.cgi/

Note1: at the same location there is another directory called "lcg-release":
this area is used for the integration and certification software, NOT for
production. Please ignore it!

Note2: documentation about access to this CVS repository can be found in
http://grid-deployment.web.cern.ch/grid-deployment/documentation/cvs-guide/

In the same CVS location we created an area, called lcg2-sites, where all
sites participating in LCG-2 should store the configuration files used to
install and configure their nodes. Each site manager will find there a
directory for their site with a name in the format
<domain>-<city>-<institute> or <domain>-<organization>[-<section>]
(e.g. es-Barcelona-PIC, ch-CERN, it-INFN-CNAF): this is where all site
configuration files should be uploaded.

Site managers are kindly asked to keep these directories up-to-date by
committing all changes they make to their configuration files back to CVS,
so that we will be able to keep track of the status of each site at any
given moment. Once a site reaches a consistent working configuration, site
managers should create a CVS tag, which will allow them to easily recover
configuration information if needed. Tag names should follow this
convention:

The tags of the LCG-2 modules are: LCG2-<RELEASE>, e.g. LCG2-1_1_1 for
software release 1.1.1.

If you tag your local configuration files, the tag name must contain a
reference to the lcg2 release in use at the time. The format to use is:

  LCG2-<RELEASE>_<SITENAME>_<DATE>_<TIME>

e.g.
LCG2-1_1_1_CERN_20031107_0857 for configuration files in use at CERN on
November 7th, 2003, at 8:57 AM. The lcg2 release used for this example is
1.1.1.

To activate a write-enabled account to the CVS repository at CERN please get
in touch with Louis Poncet <[log in to unmask]>. Judit Novak
<[log in to unmask]> or Markus Schulz <[log in to unmask]> are the persons
to contact if you do not find a directory for your site or if you have
problems uploading your configuration files to CVS.

If you just want to install a site, but not join LCG, you can get anonymous
read access to the repository. As described in the CVS access guide, set the
CVS environment variables:

Set CVS_RSH to:  setenv CVS_RSH ssh
Set CVSROOT to:  setenv CVSROOT :pserver:[log in to unmask]:/cvs/lcgdeploy

All site managers have in any case to subscribe to and monitor the
LCG-Rollout mailing list. Here all issues related to the LCG deployment,
including announcements of updates and security patches, are discussed. You
can subscribe from the following site:

  http://cclrclsv.RL.AC.UK/archives/lcg-rollout.html

and click on "Join or leave the list". This is the main source for
communicating problems and changes.

Preparing the installation of current tag
=========================================

The current LCG tag is ---> LCG-2_0_0 <---

In the following instructions/examples, when you see the <CURRENT_TAG>
string, you should replace it with the name of the tag defined above.

To install it, check it out on your LCFG server with

> cvs checkout -r <CURRENT_TAG> -d <TAG_DIRECTORY> lcg2

Note: the "-d <TAG_DIRECTORY>" parameter will create a directory named
<TAG_DIRECTORY> and copy all the files there. If you do not specify the -d
parameter, the files will go to a subdirectory of the current directory
named lcg2.

The default way to install the tag is to copy the content of the rpmlist
subdirectory to the /opt/local/linux/7.3/rpmcfg directory on the LCFG
server.
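The checkout and the rpmlist copy described above can be combined into a
short guarded sequence. This is only a sketch, not part of the official
procedure: it assumes CVSROOT and CVS_RSH are already set as described
earlier, and the diagnostic messages are our own additions.

```shell
# Sketch: check out the current tag and stage its rpm lists for the LCFG
# server. Assumes CVSROOT/CVS_RSH are already set; messages are ours.
TAG=LCG-2_0_0
RPMCFG=/opt/local/linux/7.3/rpmcfg
cvs checkout -r "$TAG" -d "$TAG" lcg2 \
    || echo "checkout failed (is CVSROOT set and the server reachable?)"
if [ -d "$TAG/rpmlist" ]; then
    cp "$TAG/rpmlist/"* "$RPMCFG/"
else
    echo "no $TAG/rpmlist directory, nothing copied"
fi
```

The guards simply make each step report what went wrong instead of failing
silently, which is useful when running the procedure for the first time.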
This directory is NFS-mounted by all client nodes and is visible as
/export/local/linux/7.3/rpmcfg

Go to the directory where you keep your local configuration files. If you
want to create a new one, you can check out any of your previous tags from
CVS with:

> cvs checkout -r <YOUR_TAG> -d <LOCAL_DIR> lcg2/<YOUR_SITE>

If you have not committed any configuration file yet, or if you want to use
the latest (HEAD) versions, just omit the "-r <YOUR_TAG>" parameter.

Now cd to <LOCAL_DIR> and copy there the files from
<TAG_DIRECTORY>/examples: following the instructions in the 00README file,
those in the example files themselves, and those reported below in this
document, you should be able to create an initial version of the
configuration files for your site. If you have problems, please contact your
reference primary site.

NOTE: if you already have localized versions of these files, just compare
them with the new templates to verify that no new parameter needs to be set.
Be aware that there are several critical differences between the LCG1 and
LCG-2 site-cfg.h files, so apply extra care when updating this file.

IMPORTANT NOTICE: If you have a CE configuration file from LCG1, it probably
includes the definition of the secondary regional MDS for your region. This
is now handled by the ComputingElement-cfg.h configuration file and can be
configured directly from the site-cfg.h file. See Appendix E for details.

To download all the rpms needed to install this version you can use the
updaterep command. In <TAG_DIRECTORY>/tools you can find 2 configuration
files for this script: updaterep.conf and updaterep_full.conf. The first
tells updaterep to only download the rpms which are actually needed to
install the current tag, while updaterep_full.conf will do a full mirror of
the LCG rpm repository. Copy updaterep.conf to /etc/updaterep.conf and run
the updaterep command.
By default all rpms will be copied to the /opt/local/linux/7.3/RPMS area,
which is visible from the client nodes as /export/local/linux/7.3/RPMS. You
can change the repository area by editing /etc/updaterep.conf and modifying
the REPOSITORY_BASE variable.

IMPORTANT NOTICE: as the list and structure of Certification Authorities
(CA) accepted by the LCG project can change independently of the middleware
releases, the rpm list for the CA certificates and URLs has been decoupled
from the standard LCG release procedure. This means that the version of the
security-rpm.h file contained in the rpmlist directory associated with the
current tag might be incomplete or obsolete. Please go to the URL

  http://markusw.home.cern.ch/markusw/lcg2CAlist.html

and follow the instructions there to update all CA-related settings. Changes
and updates of these settings will be announced on the LCG-Rollout mailing
list.

To make sure that all the needed object rpms are installed on your LCFG
server, you should use the lcfgng_server_update.pl script, also located in
<TAG_DIRECTORY>/tools. This script reports which rpms are missing or have
the wrong version and creates the /tmp/lcfgng_server_update_script.sh
script, which you can then use to fix the server configuration. Run it in
the following way:

> lcfgng_server_update.pl <TAG_DIRECTORY>/rpmlist/lcfgng-common-rpm.h
> /tmp/lcfgng_server_update_script.sh
> lcfgng_server_update.pl <TAG_DIRECTORY>/rpmlist/lcfgng-server-rpm.h
> /tmp/lcfgng_server_update_script.sh

WARNING: please always take a look at /tmp/lcfgng_server_update_script.sh
and verify that all rpm update commands look reasonable before running it.

In the source directory you should take a look at the redhat73-cfg.h file
and check that the location of the rpm lists (updaterpms.rpmcfgdir) and of
the rpm repository (updaterpms.rpmdir) are correct for your site (the
defaults are consistent with the instructions in this document).
If needed, you can redefine these paths from the local-cfg.h file.

In private-cfg.h you can (must!) replace the default root password with the
one you want to use for your site:

+auth.rootpwd <CRYPTED_PWD>  <--- replace with your own crypted password

To obtain <CRYPTED_PWD> using the MD5 encryption algorithm (stronger than
the standard crypt method) you can use the following command:

> openssl passwd -1

This command will prompt you for the clear text version of the password and
then print the encrypted version. E.g.

> openssl passwd -1
Password:                            <- write clear text password here
$1$iPJJEhjc$rtV/65l890BaPinzkb58z1   <- <CRYPTED_PWD> string

To finalize the adaptation of the current tag to your site you should edit
your site-cfg.h file. If you already have a site-cfg.h file that you used to
install any of the LCG1 releases, you can find a detailed description of the
modifications needed for the new tag in Appendix E below.

WARNING: the template file site-cfg.h.template assumes you want to run the
PBS batch system without sharing the /home directory between the CE and all
the WNs. This is the recommended setup. There may be situations when you
have to run PBS in traditional mode, i.e. with the CE exporting /home with
NFS and all the WNs mounting it. This is the case, e.g., if your site does
not allow host based authentication. To revert to the traditional PBS
configuration you can edit your site-cfg.h file and comment out the
following two lines:

#define NO_HOME_SHARE
...
#define CE_JM_TYPE lcgpbs

In addition to this, your WN configuration file should include the line:

#include CFGDIR/UsersNoHome-cfg.h"

just after including Users-cfg.h (please note that BOTH Users-cfg.h AND
UsersNoHome-cfg.h must be included).

Storage
=======

In the current version LCG still uses the "Classical SE" model. This
consists of a storage system (either a real MSS or just a node connected to
some disks) which exports a GridFTP interface.
Information about the SE must be published by a GRIS registered to the Site
GIIS. If your SE is a completely independent node connected to a bunch of
disks (these can either be local or mounted from a disk server) then you can
install this node using the example SE_node file: this will install and
configure all needed services on the node (GridFTP server, GRIS,
authentication system).

If you plan to use a local disk as the main storage area, you can include
the flatfiles-dirs-SECLASSIC-cfg.h file: LCFG will take care of creating all
needed directories with the right access privileges.

If on the other hand your SE node mounts the storage area from a disk
server, then you will have to create all needed directories and set their
privileges by hand. Also, you will have to add to the SE node configuration
file the correct commands to NFS-mount the area from the disk server. As an
example, let's assume that your disk server node is called <server> and that
it exports area <diskarea> for use by LCG. On your SE you want to mount this
area as /storage and then allow access to it via GridFTP. To this end you
have to go through the following steps:

1) in site-cfg.h define

   #define CE_CLOSE_SE_MOUNTPOINT /storage

2) in the SE_node configuration file add the lines to mount this area from
   <server>:

   EXTRA(nfsmount.nfsmount) storage
   nfsmount.nfsdetails_storage /storage <server>:<diskarea> rw

3) once the SE node is installed and /storage has been mounted, create all
   VO directories, one per supported VO, giving read/write access to the
   corresponding group. For VO <vo>:

   > mkdir /storage/<vo>
   > chgrp <vo> /storage/<vo>
   > chmod g+w /storage/<vo>

A final possibility is that at your site a real mass storage system with a
GridFTP interface is already available (this is the case for the CASTOR MSS
at CERN).
In this case, instead of installing a full SE, you will need to install a
node which acts as a front-end GRIS for the MSS, publishing all information
related to the MSS to the LCG information system. This node is called a
PlainGRIS and can be installed using the PG_node file from the examples
directory. Also, a few changes are needed in the site-cfg.h file. Citing
from site-cfg.h.template:

/* For your storage to be visible from the grid you must have a GRIS which
 * publishes information about it. If you installed your SE using the classical
 * SE configuration file provided by LCG (StorageElementClassic-cfg.h) then a
 * GRIS is automatically started on that node and you can leave the default
 * settings below. If your storage is based on a external MSS system which
 * only provides a GridFTP interface (an example is the GridFTP-enabled CASTOR
 * service at CERN), then you will have to install an external GRIS server
 * using the provided PlainGRIS-cfg.h profile. In this case you must define
 * SE_GRIS_HOSTNAME to point to this node and define the SE_DYNAMIC_CASTOR
 * variable instead of SE_DYNAMIC_CLASSIC (Warning: defining both variables at
 * the same time is WRONG!).
 *
 * Currently the only supported external MSS is the GridFTP-enabled CASTOR used
 * at CERN.
 */
#define SE_GRIS_HOSTNAME SE_HOSTNAME
#define SE_DYNAMIC_CLASSIC
/* #define SE_DYNAMIC_CASTOR */

Firewall configuration
======================

If your LCG nodes are behind a firewall, you will have to ask your network
manager to open a few "holes" to allow external access to some LCG service
nodes. A complete map of which ports have to be accessible for each service
node is provided in the file lcg-port-table.pdf in the lcg2/docs directory.
Note that the file is also available in LaTeX format.

Node installation and configuration
===================================

In <TAG_DIRECTORY>/tools you can find a new version of the do_mkxprof.sh
script.
A detailed description of how this script works is contained in the script
itself. You are of course free to use your preferred call to the mkxprof
command, but note that running mkxprof as a daemon is NOT recommended and
can easily lead to massive catastrophes if not used with extreme care: do it
at your own risk.

To create the LCFG configuration for one or more nodes you can do

> do_mkxprof.sh node1 [node2 node3, ...]

If you get an error status for one or more of the configurations, you can
get a detailed report on the nature of the error by looking at

  http://<Your_LCFGng_Server>/status/

and clicking on the name of the node with a faulty configuration (a small
red bug should be shown beside the node name).

Once all node configurations are correctly published, you can proceed and
install your nodes following any one of the installation procedures
described in the "LCFGng Server Installation Guide" mentioned above
(LCFGng_server_install.txt).

When the initial installation completes (expect two automatic reboots in the
process), each node type requires a few manual steps, detailed below, to be
completely configured. After completing these steps, some of the nodes need
a final reboot which will bring them up with all the needed services active.
The need for this final reboot is explicitly stated among the node
configuration steps below.

Common steps
------------

-- On the ResourceBroker, MyProxy, StorageElement, and ComputingElement
   nodes you must install the host certificate/key files in
   /etc/grid-security with names hostcert.pem and hostkey.pem. Also make
   sure that hostkey.pem is only readable by root with

   > chmod 400 /etc/grid-security/hostkey.pem

-- All Globus services grant access to LCG users according to the
   certificates listed in the /etc/grid-security/grid-mapfile file. The list
   of VOs included in grid-mapfile is defined in
   /opt/edg/etc/edg-mkgridmap.conf. This file is now handled automatically
   by the mkgridmap LCFG object.
   This object takes care of enabling only the VOs accepted at each site
   according to the SE_VO_<VO> definitions in site-cfg.h. If you need to
   modify the default configuration for your site, e.g. by adding users to
   grid-mapfile-local, you can do this from your local-cfg.h file by
   following the examples in <TAG_DIRECTORY>/source/mkgridmap-cfg.h.

   After installing a ResourceBroker, StorageElement, or ComputingElement
   node you should force a first creation of the grid-mapfile by running

   > /opt/edg/sbin/edg-mkgridmap --output=/etc/grid-security/grid-mapfile --safe

   Every 6 hours a cron job will repeat this procedure and update
   grid-mapfile.

UserInterface
-------------

No additional configuration steps are currently needed on a UserInterface
node.

ResourceBroker
--------------

-- Configure the MySQL database. See the detailed recipe in Appendix C at
   the end of this document.

-- Reboot the node.

ComputingElement
----------------

-- Configure the PBS server. See the detailed recipe in Appendix B at the
   end of this document.

-- Create the first version of the /etc/ssh/ssh_known_hosts file by running

   > /opt/edg/sbin/edg-pbs-knownhosts

   A cron job will update this file every 6 hours.

-- If your CE is NOT sharing the /home directory with your WNs (this is the
   LCG-2 default configuration: if you have modified site-cfg.h to run PBS
   in traditional mode as described in a previous chapter, just ignore the
   following instructions) then you have to configure sshd to allow WNs to
   copy job output back to the CE using scp. This requires the following two
   steps:

   1) modify the sshd configuration. Edit the /etc/ssh/sshd_config file and
      add these lines at the end:

      HostbasedAuthentication yes
      IgnoreUserKnownHosts yes
      IgnoreRhosts yes

      and then restart the server with

      > /etc/rc.d/init.d/sshd restart

   2) configure the script enabling WNs to copy output back to the CE:

      - in /opt/edg/etc, copy edg-pbs-shostsequiv.conf.template to
        edg-pbs-shostsequiv.conf, then edit this file and change the
        parameters to your needs.
        Most sites will only have to set NODES to an empty string.

      - Create the first version of the /etc/ssh/shosts.equiv file by
        running

        > /opt/edg/sbin/edg-pbs-shostsequiv

        A cron job will update this file every 6 hours.

   Note: every time you add or remove WNs, do not forget to run

   > /opt/edg/sbin/edg-pbs-shostsequiv   <--- only if you do not share /home
   > /opt/edg/sbin/edg-pbs-knownhosts

   on the CE, or the new WNs will not work correctly until the next
   time cron runs them for you.

-- The CE is supposed to export information about the hardware
   configuration (i.e. CPU power, memory, disk space) of the WNs. The
   procedure to collect this information and publish it is described in
   Appendix D of this document.

-- Reboot the node.

-- If your CE exports the /home area to all WNs, then after rebooting
   it make sure that all WNs can still see this area. If this is not
   the case, execute this command on all WNs:

   > /etc/obj/nfsmount restart

WorkerNode
----------

-- The default maximum number of open files allowed on a RedHat node is
   only 26213. This number might be too small if users submit
   file-hungry jobs (we already had one case), so you may want to
   increase it on your WNs. At CERN we currently use 256000. To set
   this parameter you can use this command:

   > echo 256000 > /proc/sys/fs/file-max

   You can make this setting reboot-proof by adding the following code
   at the end of your /etc/rc.d/rc.local file:

     # Increase max number of open files
     if [ -f /proc/sys/fs/file-max ]; then
         echo 256000 > /proc/sys/fs/file-max
     fi

-- Every 6 hours each WN needs to connect to the web sites of all known
   CAs to check if a new CRL (Certificate Revocation List) is
   available. As the script which handles this functionality uses wget
   to retrieve the new CRL, you can direct your WNs to use a web proxy.
   This is mandatory if your WNs sit on a hidden network with no direct
   external connectivity.
To redirect your WNs to use a web proxy you should edit the /etc/wgetrc
file and add a line like:

  http_proxy = http://web_proxy.cern.ch:8080/

where you should replace the node name and the port to match those of
your web proxy.

Note: I could not test this recipe directly as I am not aware of a web
proxy at CERN. If you try it and find problems, please post a message
on the lcg-rollout list.

-- If your WNs are NOT sharing the /home directory with your CE (this
   is the default configuration), then you have to configure ssh to
   enable them to copy job output back to the CE using scp. To this end
   you have to modify the ssh client configuration file
   /etc/ssh/ssh_config, adding these lines at the end:

     Host *
       HostbasedAuthentication yes

   Note: the "Host *" line might already exist. In this case, just add
   the second line after it.

-- Create the first version of the /etc/ssh/ssh_known_hosts file by
   running

   > /opt/edg/sbin/edg-pbs-knownhosts

   A cron job will update this file every 6 hours.

StorageElement
--------------

-- Make sure that the storage area defined with CE_CLOSE_SE_MOUNTPOINT
   exists and contains the VO-specific sub-directories with the correct
   access privileges (group=<VO> and r/w access for the group).

-- Reboot the node.

PlainGRIS
---------

No additional configuration steps are currently needed on a PlainGRIS
node.

BDII Node
---------

The BDII node using the regional GIISes is no longer supported. It has
been replaced by the LCG-BDII.

LCG-BDII node
-------------

This is the current version of the BDII service, which does not rely on
regional MDSes. If you want to install the new service then you should
use the LCG-BDII_node example file from the "examples" directory. After
installation the new LCG-BDII service does not need any further
configuration: the list of available sites will be automatically
downloaded from the default web location defined by SITE_BDII_URL in
site-cfg.h and the initial population of the database will be started.
Expect a delay of a couple of minutes between when the machine comes up
and when the database is fully populated.

If for some reason you want to use a static list of sites, then you
should copy the static configuration file to
/opt/lcg/var/bdii/lcg-bdii-update.conf and add this line at the end of
your LCG-BDII node configuration file:

  +lcgbdii.auto no

If you need a group of BDIIs to be centrally managed and to see a
different set of sites than those defined by the URL above, you can set
up a web server and publish a web page containing the sites. The URL of
this file has to be used to configure SITE_BDII_URL in site-cfg.h.
Leave lcgbdii.auto set to yes. This file has the following structure:

##################################################################
#
# BDII web configuration file for 24 sites.
#
# This file has been downloaded from the web.
#
##################################################################
Date=02/06/04 22:01
##################################################################
#
# Locations of lcg-bdii-update configuration files.
#
##################################################################
http://grid-deployment.web.cern.ch/grid-deployment/gis/cms-bdii-update.conf
##################################################################
#
# Ldap URLs for use by bdii
#
##################################################################
#CERN, Geneva, Switzerland
CERN-LCG2 ldap://lxn1181.cern.ch:2135/mds-vo-name=cernlcg2/o=grid
#CNAF, Italy
CNAF-LCG2 ldap://wn-04-07-02-a.cr.cnaf.infn.it:2135/mds-vo-name=cnaflcg2/o=grid
#RAL, UK
RAL-LCG2 ldap://lcgce02.gridpp.rl.ac.uk:2135/mds-vo-name=rallcg2/o=grid

Change the URL to the URL of the file, and add or remove sites as
needed. To make the BDIIs notice a change you have to update the Date
field. Don't forget this.

Regional MDS Node
-----------------

No more regional MDS nodes are installed, since the system based on the
LCG-BDII doesn't require them any more.
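If you maintain such a static site-list file yourself, updating the
Date field by hand on every edit is easy to forget. The following is
only a sketch of how this could be automated: the file name is an
example, GNU sed (-i) is assumed, and the day/month order of the Date
format is a guess based on the sample above.

```shell
# bump_date: rewrite the Date= line of a static BDII site-list file so
# that the BDIIs notice the change and reload the file.
# Assumes GNU sed (-i) and the DD/MM/YY HH:MM format of the sample above.
bump_date() {
    f=$1
    sed -i "s|^Date=.*|Date=$(date '+%d/%m/%y %H:%M')|" "$f"
}

# Typical use after editing the site list (path is an example):
#   bump_date /var/www/html/my-bdii-update.conf
```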
MyProxy Node
------------

-- Reboot the node after installing the host certificates (see "Common
   steps" above).

Testing
-------

IMPORTANT NOTICE: if /home is NOT shared between CE and WNs (this is
the default configuration), then due to the way the new jobmanager
works a globus-job-run command will take at least 2 minutes. Even in
the configuration with shared /home, the execution time of
globus-job-run will be slightly longer than before. Keep this in mind
when testing your system.

To perform the standard tests (edg-job-submit & co.) you need to have
your certificate registered in one VO and to sign the LCG usage
guidelines. Detailed information on how to do these two steps can be
found at:

  http://lcg-registrar.cern.ch/

If you are working in one of the four LHC experiments, then ask for
registration in the corresponding VO; otherwise you can choose the
"LCG Deployment Team" (aka DTeam) VO.

A test suite which will help you in making sure your site is correctly
configured is now available. This software provides basic functionality
tests and various utilities to run automated sequences of tests and to
present results in a common HTML format. Extensive on-line
documentation about this test suite can be found at

  http://grid-deployment.web.cern.ch/grid-deployment/tstg/docs/LCG-Certification-help

Note that this test suite has not been updated for LCG-2 yet.
Nonetheless, all tests related to job submission should work out of the
box. In Appendix H you can find some core tests that should be run to
certify that the site is providing the core functionality.

Appendix A
==========

Syntax for the MDS_HOST_LIST variable
-------------------------------------

This appendix is no longer needed: with the introduction of the
LCG-BDII, no configuration related to regional MDSes is required.
Appendix B
==========

How to configure the PBS server on a ComputingElement
-----------------------------------------------------

1) Load the server configuration with this command (replace
   <CEhostname> with the hostname of the CE you are installing):

@-----------------------------------------------------------------------------
/usr/bin/qmgr <<EOF
set server scheduling = True
set server acl_host_enable = False
set server managers = root@<CEhostname>
set server operators = root@<CEhostname>
set server default_queue = short
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server scheduler_iteration = 600
set server default_node = lcgpro
set server node_pack = False
create queue short
set queue short queue_type = Execution
set queue short resources_max.cput = 00:15:00
set queue short resources_max.walltime = 02:00:00
set queue short enabled = True
set queue short started = True
create queue long
set queue long queue_type = Execution
set queue long resources_max.cput = 12:00:00
set queue long resources_max.walltime = 24:00:00
set queue long enabled = True
set queue long started = True
create queue infinite
set queue infinite queue_type = Execution
set queue infinite resources_max.cput = 48:00:00
set queue infinite resources_max.walltime = 72:00:00
set queue infinite enabled = True
set queue infinite started = True
EOF
@-----------------------------------------------------------------------------

   Note that the queues short, long, and infinite are those defined in
   the site-cfg.h file and the time limits are those in use at CERN.
   Feel free to add/remove/modify them to your liking, but do not
   forget to modify site-cfg.h accordingly.

2) Edit the file /var/spool/pbs/server_priv/nodes to add the list of
   WorkerNodes you plan to use.
An example setup for CERN could be:

@-----------------------------------------------------------------------------
lxshare0223.cern.ch np=2 lcgpro
lxshare0224.cern.ch np=2 lcgpro
lxshare0225.cern.ch np=2 lcgpro
lxshare0226.cern.ch np=2 lcgpro
@-----------------------------------------------------------------------------

   where np=2 gives the number of job slots (usually equal to the
   number of CPUs) available on the node, and lcgpro is the group name
   as defined in the default_node parameter of the server
   configuration.

3) Restart the PBS server:

   > /etc/rc.d/init.d/pbs_server restart

Appendix C
==========

How to configure the MySQL database on a ResourceBroker
-------------------------------------------------------

Log in as root on your RB node, represented by <rb_node> in the
example, and make sure that the mysql server is up and running:

> /etc/rc.d/init.d/mysql start

If it was already running you will just be notified of the fact. Now
you can choose a DB management <password> you like (write it down
somewhere!) and then configure the server with the following commands:

> mysqladmin password <password>
> mysql --password=<password> \
    --exec "set password for root@<rb_node>=password('<password>')" mysql
> mysqladmin --password=<password> create lbserver20
> mysql --password=<password> lbserver20 < /opt/edg/etc/server.sql
> mysql --password=<password> \
    --exec "grant all on lbserver20.* to lbserver@localhost" lbserver20

Note that the database name "lbserver20" is hardwired in the LB server
code and cannot be changed, so use it exactly as shown in the commands.

Make sure that /var/lib/mysql has the right permissions set (744).

Appendix D
==========

Publishing WN information from the CE
-------------------------------------

When submitting a job, users of LCG are supposed to state in their JDL
the minimal hardware resources (memory, scratch disk space, CPU time)
required to run the job.
These requirements are matched by the RB with the information in the
BDII to select a set of available CEs where the job can run. For this
scheme to work, each CE must publish some information about the
hardware configuration of the WNs connected to it. This means that site
managers must collect information about the WNs available at the site
and insert it in the information published by the local CE.

The procedure to do this is the following:

- Choose a WN which is "representative" of your batch system (see below
  for a definition of "representative") and make sure that the chosen
  node is fully installed and configured. In particular, check that all
  expected NFS partitions are correctly mounted.

- On the chosen WN run the following script as root, saving the output
  to a file:

@-----------------------------------------------------------------------------
#!/bin/bash
echo -n 'hostname: '
host `hostname -f` | sed -e 's/ has address.*//'
echo "Dummy: `uname -a`"
echo "OS_release: `uname -r`"
echo "OS_version: `uname -v`"
cat /proc/cpuinfo /proc/meminfo /proc/mounts
df
@-----------------------------------------------------------------------------

- Copy the obtained file to /opt/edg/var/info/edg-scl-desc.txt on your
  CE, replacing any pre-existing version.

- Restart the GRIS on the CE with

  > /etc/rc.d/init.d/globus-mds restart

Definition of "representative WN": in general, WNs are added to a batch
system at different times and with heterogeneous hardware
configurations. All these WNs often end up being part of a single
queue, so that when an LCG job is sent to the batch system, there is no
way to ask for a specific hardware configuration (note: LSF and other
batch systems offer ways to do this, but the current version of the
Globus gatekeeper is not able to take advantage of this possibility).
This means that the site manager has to choose a single WN as
"representative" of the whole batch cluster.
In general it is recommended that this node be chosen among the "least
powerful" ones, to avoid sending jobs with heavy hardware requirements
to under-spec nodes.

Appendix E
==========

Modifications to your site-cfg.h file
-------------------------------------

As LCG-2 contains some major modifications w.r.t. LCG1, the number of
changes to site-cfg.h is substantially higher than in the past. Here we
report all required changes: please go through them carefully and apply
them to your site-cfg.h file. Also consider the possibility of creating
a new site-cfg.h file starting from site-cfg.h.template in the tag's
examples directory.

1) Define the disk area to store LCG-specific software:

   #define LCG_LOCATION_      /opt/lcg
   #define LCG_LOCATION_VAR_  LCG_LOCATION_/var
   #define LCG_LOCATION_TMP_  /tmp

2) Change the published version to LCG-2_0_0:

   #define SITE_EDG_VERSION   LCG-2_0_0

3) Be aware that the regional MDSes are no longer present; this
   functionality is no longer required. In addition, there is no longer
   any need to explicitly define the secondary MDS in your site GIIS
   (i.e. CE) configuration file. This means that you can remove the
   following settings, if you have them there:

   /* Define a secondary top MDS node */
   EXTRA(globuscfg.giis)     site2
   EXTRA(globuscfg.giisreg)  site2
   globuscfg.localName_site2 SITE_GIIS
   globuscfg.regName_site2   TOP_GIIS
   globuscfg.regHost_site2   secondary.mds.node

4) The BDII configuration section now includes the URL of the LCG-BDII
   configuration file:

   #define SITE_BDII_URL http://grid-deployment.web.cern.ch/grid-deployment/gis/lcg2-bdii-update.conf

5) The location of security-related files and directories is now more
   detailed.
Replace the old section:

  #define SITE_DEF_HOST_CERT  /etc/grid-security/hostcert.pem
  #define SITE_DEF_HOST_KEY   /etc/grid-security/hostkey.pem
  #define SITE_DEF_GRIDMAP    /etc/grid-security/grid-mapfile
  #define SITE_DEF_GRIDMAPDIR /etc/grid-security/gridmapdir/

with the new one:

  #define SITE_DEF_GRIDSEC_ROOT     /etc/grid-security
  #define SITE_DEF_HOST_CERT        SITE_DEF_GRIDSEC_ROOT/hostcert.pem
  #define SITE_DEF_HOST_KEY         SITE_DEF_GRIDSEC_ROOT/hostkey.pem
  #define SITE_DEF_GRIDMAP          SITE_DEF_GRIDSEC_ROOT/grid-mapfile
  #define SITE_DEF_GRIDMAPDIR       SITE_DEF_GRIDSEC_ROOT/gridmapdir/
  #define SITE_DEF_CERTDIR          SITE_DEF_GRIDSEC_ROOT/certificates/
  #define SITE_DEF_VOMSDIR          SITE_DEF_GRIDSEC_ROOT/vomsdir/
  #define SITE_DEF_WEBSERVICES_CERT SITE_DEF_GRIDSEC_ROOT/tomcatcert.pem
  #define SITE_DEF_WEBSERVICES_KEY  SITE_DEF_GRIDSEC_ROOT/tomcatkey.pem

changing the various paths if needed.

6) The whole "RLS PARAMETERS" section can be removed, i.e.

   /* RLS PARAMETERS --------------------------------------------------------
      ...
      RLS server, the RLS-cfg.h file must be edited. Sorry. */

7) The CE_QUEUES parameter is now a space-separated list (in the past
   it was a comma-separated list):

   #define CE_QUEUES short long infinite

8) All VO-software-related parameters have been removed from the
   CE_IP_RUNTIMEENV parameter definition:

   #define CE_IP_RUNTIMEENV LCG-2

9) The CE_MOUNTPOINT_SE_AREA and WN_MOUNTPOINT_SE_AREA variables are
   not used any more: you can remove them from site-cfg.h.

10) The StorageElement configuration is now substantially different
    from LCG1. Replace the full "STORAGE ELEMENT DEFINITIONS" section
    in your site-cfg.h file with the one from site-cfg.h.template and
    edit it for your site.

11) A new section is needed to configure the disk areas where VO
    managers can install VO-related software:

    /* Area on the WN for the installation of the experiment software */
    /* If on your WNs you have predefined shared areas where VO
       managers can pre-install software, then these variables should
       point to these areas.
       If you do not have shared areas and each job must install the
       software, then these variables should contain a dot ( . ) */

    /* #define WN_AREA_ALICE /opt/exp_software/alice */
    /* #define WN_AREA_ATLAS /opt/exp_software/atlas */
    /* #define WN_AREA_CMS   /opt/exp_software/cms   */
    /* #define WN_AREA_LHCB  /opt/exp_software/lhcb  */
    /* #define WN_AREA_DTEAM /opt/exp_software/dteam */
    #define WN_AREA_ALICE .
    #define WN_AREA_ATLAS .
    #define WN_AREA_CMS   .
    #define WN_AREA_LHCB  .
    #define WN_AREA_DTEAM .

12) The LCFG-LITE installation is not supported: the "LITE INSTALLATION
    SUPPORT" section can be removed.

13) AUTOFS is not supported and the corresponding section can be
    removed.

14) The new monitoring system based on GridICE is now included in the
    default setup. To configure it, add the "GRIDICE MONITORING"
    section from site-cfg.h.template to your site-cfg.h file and edit
    it (if needed) for your site.

15) A few of the UID/GID settings defined at the end of the old
    site-cfg.h file are not used and can be removed. These are:
    USER_UID_TOMCAT4, USER_GID_TOMCAT4, USER_UID_SE, USER_GID_SE,
    USER_UID_APACHE, USER_GID_APACHE, USER_UID_MAUI, USER_UID_RTCS,
    USER_GID_RMS

Appendix G
==========

Site information needed for the contact database. Please fill this in
and send it to your primary site and the CERN deployment team
([log in to unmask]).

============================= START =============================

0) Preferred name of your site
---------------------------------------------

I. Communication
================

a) Contact email for the site
---------------------------------

b) Contact phone for the site
---------------------------------

c) Reachable during which hours
---------------------------------

d) Emergency phone for the site
---------------------------------

e) Site (computer/network) security contact for your site

f0) Official name of your institute
-----------------------------------

f1) Name and title/role of individual(s) responsible for
    computer/network security at your site
-----------------------------------

f2) Personal email for f1)
-----------------------------------

f3) Telephone for f1)
-----------------------------------

f4) Telephone for emergency security incident response
    (if different from f3)
-----------------------------------

f5) Email for emergency security incident response (listbox preferred)
------------------------------------

g) Write access to CVS

   The LCG CVS repository is currently being moved to a different CVS
   server. To access this server a CERN AFS account is required. If you
   have none, please contact Louis Poncet ([log in to unmask]).

   AFS account at CERN:
   ------------------------------------

II) Site-specific information

a) Domain
-----------------------------

e) CA that issued host certificates for your site
____________________________________________________________

============================ END ===============================

Appendix H
==========

This has been provided by David Kant <[log in to unmask]>.

LCG Site Configuration Database and Grid Operations Centre (GOC)
================================================================

The GOC will be responsible for monitoring the grid services deployed
through the LCG middleware at your site.
Information about the site is managed by the local site administrator.
The information we require comprises the site contact details, the list
of nodes and IP addresses, and the middleware deployed on those
machines (EDG, LCG1, LCG2 etc.).

Access to the database is done through a web browser (https) via an
X.509 certificate issued by a trusted LCG CA.

GOC monitoring is done hourly and begins with an SQL query of the
database to extract your site details. Therefore, it is important to
ensure that the information in the database is ACCURATE and UP-TO-DATE.

To request access to the database, load your certificate into your
browser and go to:

  http://goc.grid-support.ac.uk/gridsite/db-auth-request/

The GOC team will then create a customised page for your site and give
you access rights to these pages. This process should take less than a
day, and you will receive an email confirmation. Finally, you can enter
your site details at:

  https://goc.grid-support.ac.uk/gridsite/db/index.php

The GOC monitoring pages displaying current status information about
LCG2 are at:

  http://goc.grid-support.ac.uk/gridsite/gocmain/

Appendix F
==========

This is a collection of basic commands that can be run to test the
correct setup of a site. These tests are not meant to be a replacement
for the test tools provided by the LCG test team. Extensive
documentation covering those can be found here:

  http://grid-deployment.web.cern.ch/grid-deployment/tstg/docs/LCG-Certification-help

The material in this chapter should enable the site administrator to
verify the basic functionality of the site.

Covered here:
  Testing the UI
  Testing the CE and WNs
  Testing the SE

Not included in this release:
  Testing the RB
  Testing the BDII
  Testing the Proxy

Testing the UI
==============

The main tools used on a UI are:

1) Tools to manage certificates and create proxies
2) Tools to deal with the submission and status retrieval of jobs
3) Client tools for data management.
These include tools to transport data and to query the replica location
service.

1) Create a proxy
-----------------

The grid-proxy-init command and the other commands used here should be
in your path.

[adc0014] ~ > grid-proxy-init
Your identity: /C=CH/O=CERN/OU=GRID/CN=Markus Schulz 1319
Enter GRID pass phrase for this identity:
Creating proxy ................................................. Done
Your proxy is valid until: Mon Apr  5 20:53:38 2004

2) Run simple jobs
------------------

Check that globus-job-run works. First select a CE that is known to
work. Have a look at the GOC DB and select the CE at CERN.

[adc0014] ~ > globus-job-run lxn1181.cern.ch /bin/pwd
/home/dteam002

What can go wrong with this most basic test? If your VO membership is
not correct you might not be in the grid-mapfile. In this case you will
see some errors that refer to grid security.

The next step is to see whether the UI is correctly configured to
access an RB. Create the following files for these tests:

testJob.jdl - this contains a very basic job description:
Executable    = "testJob.sh";
StdOutput     = "testJob.out";
StdError      = "testJob.err";
InputSandbox  = {"./testJob.sh"};
OutputSandbox = {"testJob.out","testJob.err"};
#Requirements = other.GlueCEUniqueID == "lxn1181.cern.ch:2119/jobmanager-lcgpbs-short";

testJob.sh contains a very basic test script:

#!/bin/bash
date
hostname
echo "****************************************"
echo "env | sort"
echo "****************************************"
env | sort
echo "****************************************"
echo "mount"
echo "****************************************"
mount
echo "****************************************"
echo "rpm -q -a | sort"
echo "****************************************"
/bin/rpm -q -a | sort
sleep 20
date

Run the following command to see which sites can run your job:

[adc0014] ~/TEST > edg-job-list-match --vo dteam testJob.jdl

The output should look like:

Selected Virtual Organisation name (from --vo option): dteam
Connecting to host lxn1177.cern.ch, port 7772

***************************************************************************
                     COMPUTING ELEMENT IDs LIST
The following CE(s) matching your job requirements have been found:

                     *CEId*
hik-lcg-ce.fzk.de:2119/jobmanager-pbspro-lcg
hotdog46.fnal.gov:2119/jobmanager-pbs-infinite
hotdog46.fnal.gov:2119/jobmanager-pbs-long
hotdog46.fnal.gov:2119/jobmanager-pbs-short
lcg00125.grid.sinica.edu.tw:2119/jobmanager-lcgpbs-infinite
lcg00125.grid.sinica.edu.tw:2119/jobmanager-lcgpbs-long
lcg00125.grid.sinica.edu.tw:2119/jobmanager-lcgpbs-short
lcgce02.ifae.es:2119/jobmanager-lcgpbs-infinite
lcgce02.ifae.es:2119/jobmanager-lcgpbs-long
lcgce02.ifae.es:2119/jobmanager-lcgpbs-short
lxn1181.cern.ch:2119/jobmanager-lcgpbs-infinite
lxn1181.cern.ch:2119/jobmanager-lcgpbs-long
lxn1184.cern.ch:2119/jobmanager-lcglsf-grid
tbn18.nikhef.nl:2119/jobmanager-pbs-qshort
wn-04-07-02-a.cr.cnaf.infn.it:2119/jobmanager-lcgpbs-dteam
tbn18.nikhef.nl:2119/jobmanager-pbs-qlong
lxn1181.cern.ch:2119/jobmanager-lcgpbs-short
***************************************************************************

If an error is reported, rerun the command using the --debug option.
Common problems are related to the RB that has been configured as the
default RB for the node. To test whether the UI works with a different
RB you can run the command using configuration files that override the
default settings.

Configure the two files used for the test to point at a known working
RB. The RB at CERN that can be used is: lxn1177.cern.ch.

The file that contains the VO-dependent configuration has to contain
the following:

lxn1177.vo.conf
[
  VirtualOrganisation = "dteam";
  NSAddresses = "lxn1177.cern.ch:7772";
  LBAddresses = "lxn1177.cern.ch:9000";
  ## HLR location is optional. Uncomment and fill correctly for
  ## enabling accounting
  #HLRLocation = "fake HLR Location"
  ## MyProxyServer is optional. Uncomment and fill correctly for
  ## enabling proxy renewal. This field should be set equal to
  ## MYPROXY_SERVER environment variable
  MyProxyServer = "lxn1179.cern.ch"
]

and the common one:

lxn1177.conf
[
  rank = - other.GlueCEStateEstimatedResponseTime;
  requirements = other.GlueCEStateStatus == "Production";
  RetryCount = 3;
  ErrorStorage = "/tmp";
  OutputStorage = "/tmp/jobOutput";
  ListenerPort = 44000;
  ListenerStorage = "/tmp";
  LoggingTimeout = 30;
  LoggingSyncTimeout = 30;
  LoggingDestination = "lxn1177.cern.ch:9002";
  # Default NS logger level is set to 0 (null)
  # max value is 6 (very ugly)
  NSLoggerLevel = 0;
  DefaultLogInfoLevel = 0;
  DefaultStatusLevel = 0;
  DefaultVo = "dteam";
]

Then run the list match with the following options:

edg-job-list-match -c `pwd`/lxn1177.conf --config-vo `pwd`/lxn1177.vo.conf testJob.jdl

If this works, you should investigate the configuration of the RB that
is selected by default from your UI, or the associated configuration
files.
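The remaining UI tests (submit, poll the status, retrieve the output)
involve running edg-job-status repeatedly until the job reaches a final
state. A minimal polling sketch follows; the "Current Status:" line
format is assumed from typical edg-job-status output, and the command
name is parameterised (EDG_STATUS_CMD, a hypothetical knob) purely so
the loop can be exercised on a machine without a grid UI.

```shell
# poll_job: run the status command until the job reaches a final state.
# EDG_STATUS_CMD defaults to the real client; it is a parameter only so
# that the loop can be tried without a working UI.
: "${EDG_STATUS_CMD:=edg-job-status}"
: "${POLL_INTERVAL:=60}"

poll_job() {
    jobid=$1
    tries=0
    while [ "$tries" -lt 60 ]; do
        # Pull the state out of the "Current Status: ..." line.
        state=$($EDG_STATUS_CMD "$jobid" | sed -n 's/^Current Status: *//p')
        case $state in
            'Done (Success)') echo "finished: $state"; return 0 ;;
            Aborted*|Cancelled*) echo "failed: $state"; return 1 ;;
        esac
        tries=$((tries + 1))
        sleep "$POLL_INTERVAL"
    done
    echo "timed out, last state: $state"
    return 2
}

# Usage on a real UI:
#   poll_job https://lxn1177.cern.ch:9000/0b6EdeF6dJlnHkKByTkc_g
```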
If the job-list-match is working you can submit the test job using:

edg-job-submit --vo dteam testJob.jdl

The command returns some output like:

Selected Virtual Organisation name (from --vo option): dteam
Connecting to host lxn1177.cern.ch, port 7772
Logging to host lxn1177.cern.ch, port 9002

*********************************************************************************************
                              JOB SUBMIT OUTCOME
The job has been successfully submitted to the Network Server.
Use edg-job-status command to check job current status.
Your job identifier (edg_jobId) is:

- https://lxn1177.cern.ch:9000/0b6EdeF6dJlnHkKByTkc_g

*********************************************************************************************

In case the output of the command has a significantly different
structure, you should rerun it adding the --debug option. Save the
output for further analysis.

Now wait some minutes and try to verify the status of the job using the
command:

edg-job-status https://lxn1177.cern.ch:9000/0b6EdeF6dJlnHkKByTkc_g

Repeat this until the job is in the status: Done (Success)

If the job doesn't reach this state, or gets stuck for longer periods
in the same state, you should run a command to access the logging
information. Please save the output.
edg-job-get-logging-info -v 1 https://lxn1177.cern.ch:9000/0b6EdeF6dJlnHkKByTkc_g

Assuming that the job has reached the desired status, please try to
retrieve the output:

edg-job-get-output https://lxn1177.cern.ch:9000/0b6EdeF6dJlnHkKByTkc_g

Retrieving files from host: lxn1177.cern.ch ( for https://lxn1177.cern.ch:9000/0b6EdeF6dJlnHkKByTkc_g )

*********************************************************************************
                         JOB GET OUTPUT OUTCOME
Output sandbox files for the job:
- https://lxn1177.cern.ch:9000/0b6EdeF6dJlnHkKByTkc_g
have been successfully retrieved and stored in the directory:
/tmp/jobOutput/markusw_0b6EdeF6dJlnHkKByTkc_g
*********************************************************************************

Check that the given directory contains the output and error files.

One common reason for this command to fail is that the access
privileges for the jobOutput directory are not correct, or the
directory has not been created. If you encounter a problem, rerun the
command using the --debug option.

3) Data management tools
------------------------

Test that you can reach an external SE. Run the following simple
command to list a directory at one of the CERN SEs:

edg-gridftp-ls gsiftp://castorgrid.cern.ch/castor/cern.ch/grid/dteam

You should get a long list of files. If this command fails it is very
likely that your firewall setting is wrong.

Next, to see which resources are visible via the information system,
you should run:

[adc0014] ~/TEST/STORAGE > edg-replica-manager -v --vo dteam pi

edg-replica-manager starting..
  Issuing command : pi
  Parameters:
  Call replica catalog printInfo function
  VO used      : dteam
  default SE   : lxn1183.cern.ch
  default CE   : lxn1181.cern.ch
  Info Service : MDS
............

followed by a long list of CEs and SEs and their parameters. Verify
that the default SE and CE are the nodes that you want to use. Make
sure that these nodes are installed and configured before you conduct
the tests of more advanced data management functions.
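When re-checking a site after configuration changes, it can be handy to
pull just the default SE and CE out of saved pi output. A small sketch;
the "default SE :" / "default CE :" line layout is assumed to match the
example above.

```shell
# parse_defaults: extract the default SE and CE from saved
# `edg-replica-manager pi` output (line format assumed as above).
parse_defaults() {
    sed -n -e 's/^ *default SE *: *//p' -e 's/^ *default CE *: *//p' "$1"
}

# Usage:
#   edg-replica-manager -v --vo dteam pi > pi.out
#   parse_defaults pi.out    # prints the SE, then the CE
```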
If you get almost nothing back, you should check the configuration of
the replica manager. Use the following command to get the BDII that you
are using:

grep mds.url /opt/edg/var/etc/edg-replica-manager/edg-replica-manager.conf

This should return the name and port of the BDII that you intended to
use. For the CERN UIs you would get:

mds.url=ldap://lxn1178.cern.ch:2170

Convince yourself that this is the address of a working BDII that you
can reach:

ldapsearch -LLL -x -H ldap://<node specified above>:2170 -b "mds-vo-name=local,o=grid"

This should return something starting like this:

dn: mds-vo-name=local,o=grid
objectClass: GlobusStub

dn: Mds-Vo-name=cernlcg2,mds-vo-name=local,o=grid
objectClass: GlobusStub

dn: Mds-Vo-name=nikheflcgprod,mds-vo-name=local,o=grid
objectClass: GlobusStub

dn: GlueSEUniqueID=lxn1183.cern.ch,Mds-Vo-name=cernlcg2,mds-vo-name=local,o=grid
objectClass: GlueSETop
objectClass: GlueSE
objectClass: GlueInformationService
objectClass: Gluekey
objectClass: GlueSchemaVersion
GlueSEUniqueID: lxn1183.cern.ch
GlueSEName: CERN-LCG2:disk
GlueSEPort: 2811
GlueInformationServiceURL: ldap://lxn1183.cern.ch:2135/Mds-Vo-name=local,o=grid
GlueForeignKey: GlueSLUniqueID=lxn1183.cern.ch
...................................

In case the query doesn't return the expected output, verify that the
node specified is a BDII and that the node is running the service. As a
crosscheck you can try to repeat the test with one of the BDIIs at
CERN. In the GOC DB you can identify the BDIIs for the production and
the test zone. Currently these are lxn1178.cern.ch for the production
system and lxnXXXX.cern.ch for the test zone.

Until the edg-replica-manager -v --vo dteam pi command and the
edg-gridftp-ls command are working, it makes no sense to conduct
further tests. Assuming that this functionality is well established,
the next test is to move a local file from the UI to the default SE and
register the file with the replica location service.
Create a file in your home directory. To make tracing this file easy,
name it according to the scheme:

testFile.<SITE-NAME>.txt

The file should be generated using the following script:

#!/bin/bash
echo "********************************************"
echo "hostname: " `hostname` " date: " `date`
echo "********************************************"

The command to move the file to the default SE is:

edg-replica-manager -v --vo dteam cr file://`pwd`/testFile.<SiteName>.txt -l lfn:testFile.<SiteName>.`date +%m.%d.%y:%H:%M:%S`

If everything is set up correctly, the command returns a line with a
GUID, for example:

guid:98ef70d6-874d-11d8-b575-8de631cc17af

Save the guid and the expanded lfn for further reference. We will refer
to these as YourGUID and YourLFN. In case this command failed you should
keep the output and analyze it with your support contact; there are
various reasons why it may have failed.

Now check that the RLS knows about your file. This is done by using the
listReplicas (lr) option:

edg-replica-manager -v --vo dteam lr lfn:YourLFN

This command should return a string with a format similar to:

sfn://lxn1183.cern.ch/storage/dteam/generated/2004-04-06/file92c9f455-874d-11d8-b575-8de631cc17af
ListReplicas successful.

As before, report problems to your primary site.

If the RLS knows about the file, the next test is to transport the file
back to your UI. For this we use the cp option:

edg-replica-manager -v --vo dteam cp lfn:YourLFN file://`pwd`/testBack.txt

This should create a file named testBack.txt in the current working
directory. List this file. With this you have tested most of the core
functions of your UI. Many of these functions will be used to verify the
other components of your site.

Testing the CE and WNs
======================

We assume that you have set up a local CE running a batch system. On
most sites the CE provides two major services. For the information
system the CE runs the site GIIS.
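The whole UI round trip (create, register, list, copy back) can be run as one script. A sketch, using the edg-replica-manager options shown above; make_lfn and roundtrip are our names, and a byte-wise cmp against the copied-back file is added as the success check:

```shell
#!/bin/bash
# Sketch: create the test file, register it on the default SE, list the
# replica, copy it back, and compare. Pass your site name as $1.

SITE=${1:-MYSITE}

# Build the unique LFN used for the test file, following the naming
# scheme in the text.
make_lfn() {
    echo "testFile.$1.$(date +%m.%d.%y:%H:%M:%S)"
}

roundtrip() {
    f=testFile.$SITE.txt
    lfn=$(make_lfn "$SITE")
    {
      echo "********************************************"
      echo "hostname: $(hostname)  date: $(date)"
      echo "********************************************"
    } > "$f"
    edg-replica-manager -v --vo dteam cr "file://$PWD/$f" -l "lfn:$lfn" || return 1
    edg-replica-manager -v --vo dteam lr "lfn:$lfn"
    edg-replica-manager -v --vo dteam cp "lfn:$lfn" "file://$PWD/testBack.txt"
    cmp "$f" testBack.txt && echo "round trip OK"
}
```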
The site GIIS is the top node in the hierarchy of the site, and via this
service the other resources of the site are published to the grid. To
test that the site GIIS is working you can run an ldap query of the
following form. Inspect the output with some care. Are the computing
resources (queues, etc.) correctly reported? Can you find the local SE?
Do the numbers make sense?

ldapsearch -LLL -x -H ldap://lxn1181.cern.ch:2135 -b "mds-vo-name=cernlcg2,o=grid"

Replace lxn1181.cern.ch with your site's GIIS hostname and cernlcg2 with
the name that you have assigned to your site GIIS. If nothing is
reported, try to restart the MDS service on the CE.

Now verify that the GRIS on the CE is operating correctly. Here again
the command for the CE at CERN:

ldapsearch -LLL -x -H ldap://lxn1181.cern.ch:2135 -b "mds-vo-name=local,o=grid"

One common reason for this to fail is that the information provider on
the CE has a problem. Convince yourself that MDS on the CE is up and
running. Run the qstat command on the CE. If this command doesn't
return, there might be a problem with one of the worker nodes (WNs), or
with PBS. Have a look at the following link, which covers some aspects
of troubleshooting PBS on the grid:

http://goc.grid.sinica.edu.tw/gocwiki/TroubleShootingHistory

The next step is to verify that you can run jobs on the CE. For the most
basic test no registration with the information system is needed.
However, tests are much easier to run if the resource is registered in
the information system. For these tests the testZone BDII and RB have
been set up at CERN. Forward your site GIIS name and host name to the
deployment team for registration.

Initial tests that work without registration. First, tests from a UI of
your choice. As described in the subsection covering the UI tests, the
first test is a test of the fork jobmanager:

[adc0014] ~ > globus-job-run <YourCE> /bin/pwd

Frequent problems that have been observed are related to authentication.
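The grid-mapfile side of the authentication checks can be scripted. A sketch, assuming the standard /etc/grid-security/grid-mapfile location and that grid-proxy-info -identity prints your DN; dn_mapped and check_auth are hypothetical helper names:

```shell
#!/bin/bash
# Sketch: check that your certificate DN is mapped on the CE before
# blaming globus-job-run. Run this on the CE.

MAPFILE=${MAPFILE:-/etc/grid-security/grid-mapfile}

# True if the given DN appears (quoted, as in grid-mapfile entries) in a
# grid-mapfile read from stdin.
dn_mapped() {
    grep -qF "\"$1\""
}

check_auth() {
    dn=$(grid-proxy-info -identity)        # your certificate DN
    if dn_mapped "$dn" < "$MAPFILE"; then
        echo "DN is mapped on this CE"
    else
        echo "DN not found in $MAPFILE"
        return 1
    fi
}
```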
Check that the CE has a valid host certificate and that your DN can be
found in the grid-mapfile.

Next log on to your CE and run a local PBS job to verify that PBS is
working. Change your id to a user like dteam001. In the home directory
create the following file:

test.sh
#!/bin/bash
echo "Hello Grid"

Run:

qsub test.sh

This will return a job ID of the form 16478.lxn1181.cern.ch. You can use
qstat to monitor the job; however, it is very likely that the job has
finished before you have queried the status. PBS will place two files in
your directory:

test.sh.o16478 and test.sh.e16478

These contain the stdout and stderr.

Now try to submit to one of the PBS queues that are available on the CE.
The following command is an example for a site that runs PBS without
shared home directories; the short queue is used. It can take some
minutes until the command returns.

globus-job-run <YourCE>/jobmanager-lcgpbs -queue short /bin/hostname
lxshare0372.cern.ch

The next test submits a job to your CE by forcing the broker to select
the queue that you have chosen. You can use the testJob JDL and script
that were used before for the UI tests.

edg-job-submit --debug --vo dteam -r <YourCE>:2119/jobmanager-lcgpbs-short testJob.jdl

The --debug option should only be used if you have been confronted with
problems. Follow the status of the job and, as before, try to retrieve
the output. A quite common problem is that the output can't be
retrieved. This problem is related to some inconsistency of ssh keys
between the CE and the WN. See
http://goc.grid.sinica.edu.tw/gocwiki/TroubleShootingHistory and the
CE/WN configuration.

If your UI is not configured to use a working RB you can, as described
in the UI testing subsection, use configuration files to point at the
testZone RB.

For further tests, get registered with the testZone BDII. As described
in the subsection on joining LCG2, you should send your CE's hostname
and the site GIIS name to the deployment team.
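The local "Hello Grid" PBS check above can be wrapped so that it waits for the output files instead of racing qstat. A sketch, assuming a plain PBS setup where stdout comes back as test.sh.o<jobnumber> in the submission directory; submit_and_wait and job_number are our names:

```shell
#!/bin/bash
# Sketch: submit the "Hello Grid" test job and wait for PBS to write the
# stdout file back, then print it.

# Strip the host part from a PBS job id like 16478.lxn1181.cern.ch.
job_number() {
    echo "$1" | cut -d. -f1
}

submit_and_wait() {
    cat > test.sh <<'EOF'
#!/bin/bash
echo "Hello Grid"
EOF
    jobid=$(qsub test.sh)              # e.g. 16478.lxn1181.cern.ch
    num=$(job_number "$jobid")
    # Wait for PBS to stage the stdout file back.
    while [ ! -f "test.sh.o$num" ]; do sleep 5; done
    cat "test.sh.o$num"
}
```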
The next step is to take the testJob.jdl that you created for the
verification of your UI. Remove the comment from the last line of the
file and modify it to reflect your CE:

Requirements = other.GlueCEUniqueID == "<YourCE>:2119/jobmanager-lcgpbs-short";

Now repeat the edg-job-list-match --vo dteam testJob.jdl command known
from the UI tests. The output should show just one resource.

The remaining tests verify that the core of the data management is
working from the WN and that the support for the experiment software
installation, as described in
https://edms.cern.ch/file/412781//SoftwareInstallation.pdf, is working
correctly. The tests you can do to verify the latter are limited if you
are not mapped to the software manager for your VO.

To test the data management functions your local default SE has to be
set up and tested. Of course you can assume the SE is working and run
these tests before testing the SE. Add an argument to the JDL that
allows the site to be identified. The jdl file should look like:

testJob_SW.jdl

Executable    = "testJob.sh";
StdOutput     = "testJob.out";
StdError      = "testJob.err";
InputSandbox  = {"./testJob.sh"};
OutputSandbox = {"testJob.out","testJob.err"};
Requirements  = other.GlueCEUniqueID == "lxn1181.cern.ch:2119/jobmanager-lcgpbs-short";
Arguments     = "CERNPBS";

Replace the name of the site and the CE and queue names to reflect your
settings. The first script to run collects some configuration
information from the WN and tests the user software installation area.
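The "exactly one resource" check after pinning the Requirements can be automated. A sketch, assuming edg-job-list-match prints one "host:2119/jobmanager-..." entry per matched CE (the exact output format may vary between releases); count_matches and check_single_match are hypothetical helpers:

```shell
#!/bin/bash
# Sketch: verify that a pinned testJob.jdl matches exactly one CE.

# Count CE entries in an edg-job-list-match dump on stdin.
count_matches() {
    grep -c ':2119/jobmanager'
}

check_single_match() {
    n=$(edg-job-list-match --vo dteam testJob.jdl | count_matches)
    if [ "$n" = "1" ]; then
        echo "exactly one CE matched"
    else
        echo "expected 1 match, got $n"
        return 1
    fi
}
```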
testJob.sh

#!/bin/bash
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
echo " " $1 " " `hostname` " " `date`
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
echo "the environment on the node"
echo " "
env | sort
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
echo "software path for the experiments"
env | sort | grep _SW_DIR
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
echo "mount"
mount
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
echo "============================================================="
echo "verify that the software managers of the supported VOs can write and the users read"
echo "DTEAM ls -l " $VO_DTEAM_SW_DIR
ls -dl $VO_DTEAM_SW_DIR
echo "ALICE ls -l " $VO_ALICE_SW_DIR
ls -dl $VO_ALICE_SW_DIR
echo "CMS ls -l " $VO_CMS_SW_DIR
ls -dl $VO_CMS_SW_DIR
echo "ATLAS ls -l " $VO_ATLAS_SW_DIR
ls -dl $VO_ATLAS_SW_DIR
echo "LHCB ls -l " $VO_LHCB_SW_DIR
ls -dl $VO_LHCB_SW_DIR
echo "============================================================="
echo "============================================================="
echo "============================================================="
echo "============================================================="
echo "cat /opt/edg/var/etc/edg-replica-manager/edg-replica-manager.conf"
echo "============================================================="
cat /opt/edg/var/etc/edg-replica-manager/edg-replica-manager.conf
echo "============================================================="
echo "============================================================="
echo "============================================================="
echo "============================================================="
echo "rpm -q -a | sort "
rpm -q -a | sort
echo "============================================================="
date

Run this job as described in the subsection on testing UIs. Retrieve the
output and verify that the environment variables for the experiment
software installation are correctly set and that the directories for the
VOs that you support are mounted and accessible.

In the edg-replica-manager.conf file reasonable default CEs and SEs
should be specified. The output for the CERN PBS might serve as an
example:

localDomain=cern.ch
defaultCE=lxn1181.cern.ch
defaultSE=wacdr002d.cern.ch

Then a working BDII node has to be specified as the MDS top node. For
the CERN production system this is currently:

mds.url=ldap://lxn1178.cern.ch:2170
mds.root=mds-vo-name=local,o=grid

Please keep the output of this job as a reference; it can be helpful
when problems have to be located.

Next we test the data management. For this the default SE should be
working. The following script will do some operations similar to those
used on the UI. We first test that we can access a remote SE via simple
gridftp commands. Then we test that the replica manager tools have
access to the information system. This is followed by exercising the
data moving capabilities between the WN and the local SE, and between a
remote SE and the local SE. Between the commands we run small commands
to verify that the RLS service knows about the location of the files.

Submit the job via edg-job-submit and retrieve the output. Read the
files containing stdout and stderr. Keep the files for reference.

Here now a listing of testJob.sh:

#!/bin/bash
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
echo " " $1 " " `hostname` " Start: " `date`
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
echo "Can we see the SE at CERN?"
echo "-----------------------------------------------------"
echo "edg-gridftp-ls --verbose gsiftp://castorgrid.cern.ch/castor/cern.ch/grid/dteam/mm20 "
edg-gridftp-ls --verbose gsiftp://castorgrid.cern.ch/castor/cern.ch/grid/dteam/mm20
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
echo "Can we see the information system?"
echo "-----------------------------------------------------"
echo " edg-replica-manager -v --vo dteam pi "
edg-replica-manager -v --vo dteam pi
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
lfname=testFile.$1.txt
echo "create a local file: " $lfname
rm -rf $lfname
echo "*********************************************************************************" > $lfname
echo "Site: " $1 " hostname: " `hostname` " date: " `date` >> $lfname
echo "*********************************************************************************" >> $lfname
myLFN=$1.`hostname`.`date +%m.%d.%y:%H:%M:%S`
echo "move the file to the default SE and register it with the LFN: " $myLFN
echo "-----------------------------------------------------"
echo "edg-replica-manager -v --vo dteam cr file://"`pwd`"/"$lfname " -l lfn:"$myLFN
edg-replica-manager -v --vo dteam cr file://`pwd`/$lfname -l lfn:$myLFN
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
echo "list the replica edg-replica-manager -v --vo dteam lr lfn:"$myLFN
echo "-----------------------------------------------------"
edg-replica-manager -v --vo dteam lr lfn:$myLFN
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
lf2=$lfname.2
echo "get the file back and store it in "$lf2
echo "-----------------------------------------------------"
rm -rf $lf2
echo "edg-replica-manager -v --vo dteam cp lfn:"$myLFN " file://"`pwd`"/"$lf2
edg-replica-manager -v --vo dteam cp lfn:$myLFN file://`pwd`/$lf2
echo " "
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
echo "cat " $lf2
echo "-----------------------------------------------------"
cat $lf2
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
echo "Replicate the file from the default SE to the CASTOR service at CERN"
echo "-----------------------------------------------------"
echo "edg-replica-manager -v --vo dteam replicateFile lfn:"$myLFN " -d castorgrid.cern.ch"
edg-replica-manager -v --vo dteam replicateFile lfn:$myLFN -d castorgrid.cern.ch
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
echo "Was it successful?"
echo "-----------------------------------------------------"
echo "list the replica edg-replica-manager -v --vo dteam lr lfn:"$myLFN
edg-replica-manager -v --vo dteam lr lfn:$myLFN
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
echo "3rd party copy from castorgrid.cern.ch to the default SE"
echo "-----------------------------------------------------"
echo "edg-replica-manager -v --vo dteam replicateFile lfn:TheUniversalFile.txt"
edg-replica-manager -v --vo dteam replicateFile lfn:TheUniversalFile.txt
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
echo "Was it successful?"
echo "-----------------------------------------------------"
echo "list the replica edg-replica-manager -v --vo dteam lr lfn:TheUniversalFile.txt"
edg-replica-manager -v --vo dteam lr lfn:TheUniversalFile.txt
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
echo "get this file on the WN"
echo "-----------------------------------------------------"
rm -rf TheUniversalFile.txt
echo "edg-replica-manager -v --vo dteam cp lfn:TheUniversalFile.txt file://"`pwd`"/TheUniversalFile.txt"
edg-replica-manager -v --vo dteam cp lfn:TheUniversalFile.txt file://`pwd`/TheUniversalFile.txt
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
echo "cat TheUniversalFile.txt"
echo "-----------------------------------------------------"
cat TheUniversalFile.txt
echo " "
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
defaultSE=`grep defaultSE /opt/edg/var/etc/edg-replica-manager/edg-replica-manager.conf | cut -d "=" -f 2`
echo "remove the replica from the default SE: "$defaultSE
echo "-----------------------------------------------------"
echo "edg-replica-manager -v --vo dteam del lfn:TheUniversalFile.txt -s " $defaultSE
edg-replica-manager -v --vo dteam del lfn:TheUniversalFile.txt -s $defaultSE
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
echo "Was it successful?"
echo "-----------------------------------------------------"
echo "list the replica edg-replica-manager -v --vo dteam lr lfn:TheUniversalFile.txt"
edg-replica-manager -v --vo dteam lr lfn:TheUniversalFile.txt
echo "+++++++++++++++++++++++++++++++++++++++++++++++++++"
echo "Finished the certification run on site: " $1 " at: " `date`

Testing the SE
==============

If the tests described for the UI and the CE of a site have run
successfully, then no additional test of the SE is needed. We describe
here some of the common problems that have been observed related to SEs.

In case the SE can't be found by the edg-replica-manager tools, the SE
GRIS might not be working, or not registered with the site GIIS. To
verify that the SE GRIS is working, run the following ldapsearch. Note
that the hostname that you use should be the one of the node where the
GRIS is located. For mass storage SEs it is quite common that this is
not the SE itself.

ldapsearch -LLL -x -H ldap://lxn1183.cern.ch:2135 -b "mds-vo-name=local,o=grid"

If this returns nothing or very little, the MDS service on the SE should
be restarted. If the SE returns some information, carefully check that
the VOs that require access to the resource are listed in the
GlueSAAccessControlBaseRule field. Does the information published in the
GlueSEAccessProtocolType fields reflect your intention? Is the
GlueSEName carrying the extra "type" information?

The next major problem that has been observed with SEs is a mismatch
between what is published in the information system and what has been
implemented on the SE. Check that the grid-mapfile on the SE is
configured to support the VOs that are published in the
GlueSAAccessControlBaseRule fields.
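This cross-check of published VOs can be scripted against the SE GRIS dump. A sketch; the hostname is the CERN example from the text, and published_vos / check_se_vos are hypothetical helper names:

```shell
#!/bin/bash
# Sketch: list the VOs that the SE GRIS actually publishes in its
# GlueSAAccessControlBaseRule fields, so they can be compared with the
# grid-mapfile on the SE.

SE_GRIS=${1:-lxn1183.cern.ch}

# List, sorted and deduplicated, the VOs found in an ldapsearch dump on
# stdin.
published_vos() {
    grep '^GlueSAAccessControlBaseRule:' | awk '{print $2}' | sort -u
}

check_se_vos() {
    ldapsearch -LLL -x -H "ldap://$SE_GRIS:2135" \
               -b "mds-vo-name=local,o=grid" | published_vos
}
```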
Run an ldapsearch on your site GIIS and compare the information
published by the local CE with what you can find on the SE. Interesting
fields are:

GlueSEName, GlueCESEBindSEUniqueID, GlueCESEBindCEAccesspoint

Are the access points for all the supported VOs created, and is the
access control correctly configured? The edg-replica-manager command
printInfo summarizes this quite well. Here is an example of a report
generated for a classic SE at CERN:

SE at CERN-LCG2 :
   name           : CERN-LCG2
   host           : lxn1183.cern.ch
   type           : disk
   accesspoint    : /storage
   VOs            : dteam
   VO directories : dteam:dteam
   protocols      : gsiftp,rfio

To test the gsiftp protocol in a convenient way you can use the
edg-gridftp-ls and edg-gridftp-mkdir commands. You can use the
globus-url-copy command instead; the -help option describes the syntax
to be used. Run on your UI, replacing the host and accesspoint according
to the report for your SE:

edg-gridftp-ls --verbose gsiftp://lxn1183.cern.ch/storage
drwxrwxr-x    3 root     dteam        4096 Feb 26 14:22 dteam

and:

edg-gridftp-ls --verbose gsiftp://lxn1183.cern.ch/storage/dteam
drwxrwxr-x   17 dteam003 dteam        4096 Apr  6 00:07 generated

If the globus-gridftp service is not running on the SE you get the
following message back:

error a system call failed (Connection refused)

If this happens, restart the globus-gridftp service on your SE.

Now create a directory on your SE:

edg-gridftp-mkdir gsiftp://lxn1183.cern.ch/storage/dteam/t1

Verify that the command ran successfully with:

edg-gridftp-ls --verbose gsiftp://lxn1183.cern.ch/storage/dteam/

Verify that the access permissions for all the supported VOs are
correctly set.

Change History
==============

- Merged the document with the how2start guide and added additional
  material to it. This is the last text based version.

Release LCG-2_0_0 (XX/02/2004):

- Major release: please see release notes for details.
Release LCG1-1_1_3 (04/12/2003):

- Updated kernel to version 2.4.20-24.7 to fix a critical security bug
- Removed the ca_CERN-old-0.19-1 and ca_GermanGrid-0.19-1 rpms as the
  corresponding CAs have recently expired
- On user request, added zsh back to the UI rpm list
- Updated the myproxy-server-config-static-lcg rpm to recognize the new
  CERN CA
- Added the oscar-dar rpm from CMS to the WN

Release LCG1-1_1_2 (25/11/2003):

- Added LHCb software to the WN
- Introduced the private-cfg.h.template file to handle sensitive
  settings for the site (only the encrypted root password, for the
  moment)
- Added instructions on how to use MD5 encryption for the root password
- Added instructions on how to configure the http server on the LCFG
  node to be accessible only from nodes on site
- Fixed the TCP port range setting for Globus on the UI
- Removed the CERN libraries installation on the UI (added by mistake in
  release LCG1-1_1_1)
- Added instructions to increase the maximum number of open files on WNs
- Added instructions to correctly set the root password for the MySQL
  server on the RB
- Added instructions to configure WNs to use a web proxy for CRL
  download
