Hello, all,
attached you will find the release note for the LCG2
Dec/2004 upgrade (the "Xmas present").
Note: this is an internal release which will form
the basis of the next production release.
The following are the major points of this upgrade
(for details refer to the Release Note):
- WN for IA64 has been certified and released, all
other services will be released for IA64 in the
next upgrade.
- A site can now mix RH7.3 service nodes with SLC3
service nodes, e.g. the typical configuration could
be all service nodes running RH7.3 with worker nodes
running SLC3 or IA64, or service nodes on SLC3 with
worker nodes on RH7.3 or IA64.
- The inter-site operability has been verified, sites
running different O/S's will be able to communicate.
- dCache:
-------
dCache software has been certified and is packaged
as part of the LCG2 release for the first time. It
is hoped this will simplify installation at sites
with simple configurations. Alternatively, sites can
still take the dCache release rpms from
http://www.dcache.org and configure them themselves.
- LFC (LCG File Catalog):
New high performance LCG File Catalog.
Both EDG (old) and LFC (new) file catalogs are
included and all client tools (GFAL, lcg-util)
support both catalogs, selectable by every user job
via an environment variable.
- VOMS
----
The CE and Classic SE now support VOMS access
through VOMS proxies as well as gridmapfile access
through normal proxies.
- DPM (Disk Pool Manager)
-----------------------
This is the very first release (*ALPHA* version),
not meant to be used yet by anybody without proper
coaching by the GD. It has been added to the release
to integrate the software in the build process (this
will make life easier later) and to simplify
communications with those sites that will eventually
be willing to serve as the external "remote" testing
sites.
- Provides socket, SRM v1 and SRM v2 control interfaces
- Integrated GSI Authentication for SRM
- Access Control Lists (Unix Permissions and POSIX ACLs)
- Supports Oracle and MySQL database backends
- Easy to install and manage
- Distribute alpha release with reference man pages.
- big and small fixes, as always
------------
This is the last LCG2 release of 2004 and also the last
release of the current Certification & Testing team of
the Grid Deployment group in its current configuration.
We are reorganizing to be more suitably prepared for the
coming gLite challenge.
Let me take this opportunity to warmly thank
all members of my "old" team, who for the last two
years withstood all the challenges and high stress
of bringing the EDG middleware to its current
level of quality, via LCG0, LCG1 and finally LCG2, which
was used for the Data Challenges this year.
Their devotion to the task was without limits.
It was a great pleasure for me to lead this totally
dedicated team:
Marco SERRA (INFN) - the architect, the "body and soul" of
the certification testbed
Piera BETTINI (INFN) - who made software presented as
"fully tested" run even better
Di QING (IOPAS) - who would never miss the slightest
testbed configuration detail
Louis PONCET (IN2P3) - our sysadmin, who saw
red when FIO started managing our SLC3 testbed
Gilbert GROSDIDIER (IN2P3) - whose testing
suite uncovered more problems than we ever
wanted to know
Frederique CHOLLET (IN2P3) - the other half of the testing
team for a long time
David SMITH (CERN) - who made even some Globus
features work
Maarten LITMAATH (CERN) - who would chase every bug
until it vanished
Jean-Philippe BAUD (CERN) - who just wouldn't stop
coding until we got GFAL, lcg-util, LFC, DPM
Zdenek SEKERA (CERN) - who tried to hold all together
Our thanks also go to our visitors from Russia, Taiwan
and Prague for helping in critical areas.
Merry Christmas and a Happy New Year to all!
---Zdenek (for the Certification & Testing team)
RELEASE NOTE, Dec/17, 2004
----------------------------------------
| |
| December/2004 LCG2 release candidate |
| |
----------------------------------------
The December/2004 LCG2 release candidate has been certified and tagged as:
lcg20041217_1327_DecRel
for the release to the GIS section on Dec/17, 2004.
This Christmas present contains the usual RH7.3 and SLC3 (Scientific Linux
CERN 3.0) releases as well as (for the first time) the port of WN to the IA64
architecture.
The release can be obtained from the usual places.
It is now installed on the C&T testbed and has undergone the certification
testing.
Major points:
=============
- WN for IA64 has been certified and released, all other services will be
released for IA64 in the next upgrade.
- A site can now mix RH7.3 service nodes with SLC3 service nodes, e.g. the
typical configuration could be all service nodes running RH7.3 with worker
nodes running SLC3 or IA64, or service nodes on SLC3 with worker nodes
on RH7.3 or IA64. The latter configuration may be of some interest given that
the service nodes are exposed to external connections and
consequently their security issues may be much bigger than those of worker
nodes; the kernel security patches are coming faster for SLC3, while
some sites may have a big problem obtaining those patches for RH7.3.
The worker nodes can be added to either RH7.3 or SLC3 (IA32) service
nodes.
- The inter-site operability has been verified, sites running different
O/S's will be able to communicate.
- SLC3 service nodes (only IA32 supported in this release) can be installed
only manually; installation using LCFGng is not supported. The manual
installation scripts have been upgraded and are now called the YAIM installation.
- SLC3 worker nodes (both IA32 and IA64) can be installed only manually;
installation using LCFGng is not supported. The manual installation scripts
have been upgraded and are now called the YAIM installation.
- dCache:
-------
dCache software (client as well as server) can run on either of
RH7.3 or SLC3 (IA32) O/S.
- LFC (LCG File Catalog):
-----------------------
New high performance LCG File Catalog
- Based on lessons learned in DC's in last few months
- Fixes performance and scalability problems seen in EDG Catalogs
Cursors for large queries
Timeouts and retries from the client
- Provides more features than the EDG Catalogs
User exposed transaction API
Hierarchical namespace and namespace operations
Integrated GSI Authentication + Authorization
Access Control Lists (Unix Permissions and POSIX ACLs)
Checksums
- Based on existing code base
Supports Oracle and MySQL database backends
- Integration with GFAL and lcg_util complete
- POOL Integration will be provided (January 2005)
- Command Line Interface (CLI) is included in this release as well
Both EDG (old) and LFC (new) file catalogs are included and all client tools
(GFAL, lcg-util) support both catalogs, selectable by every user job
via an environment variable, which (as released) defaults to the old one.
Together with the experiments, we have started migrating from the old catalog
to the new one.
For details about the design, implementation and performance of the
new file catalog see Jean-Philippe Baud's presentation at CHEP2004.
More information can also be found in the GFAL/lcg-util README.
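
The per-job catalog selection described above could look roughly like the
following sketch. This note does not name the environment variable; the name
LCG_CATALOG_TYPE and the values "edg" and "lfc" are assumptions made for
illustration, and selected_catalog() is a hypothetical helper, not part of
GFAL or lcg-util. Please check the GFAL/lcg-util README for the real setting.

```python
import os

# Assumed names: neither the variable nor its values are given in this note.
CATALOG_VARIABLE = "LCG_CATALOG_TYPE"
KNOWN_CATALOGS = ("edg", "lfc")
DEFAULT_CATALOG = "edg"  # the note says the release defaults to the old catalog


def selected_catalog(env=None):
    """Return the catalog a job would use, falling back to the old one."""
    env = os.environ if env is None else env
    value = env.get(CATALOG_VARIABLE, DEFAULT_CATALOG).strip().lower()
    if value not in KNOWN_CATALOGS:
        raise ValueError("unknown catalog type: %r" % value)
    return value
```

A job that sets nothing gets the old catalog, while a job that exports the
variable with the value for the new catalog switches to the LFC.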
- VOMS
----
The CE and Classic SE now support VOMS access through VOMS proxies
as well as gridmapfile access through normal proxies.
- DPM (Disk Pool Manager)
-----------------------
This is the very first release (*ALPHA* version), not meant to be used
yet by anybody without proper coaching by the GD. It has been added to the
release to integrate the software in the build process (this will make
life easier later) and to simplify communications with those sites that
will eventually be willing to serve as the external "remote" testing sites.
- Provides socket, SRM v1 and SRM v2 control interfaces
- Integrated GSI Authentication for SRM
- Access Control Lists (Unix Permissions and POSIX ACLs)
- Supports Oracle and MySQL database backends
- Easy to install and manage
- Distribute alpha release with reference man pages.
- Tank & Spark
------------
New Experiment Software Installation Tool called 'Tank & Spark', designed
together with the experiments, following their wishes and requirements.
- Number of bigger and smaller bug fixes, as always.
For the full details see below.
Build system:
-------------
In order to guarantee the full maintainability and repeatability of the
build process, the following setup is used:
- a CVS server containing the source tree. The tree is common for both
O/S's and will continue to be common for any other O/S we will add to the
release in the future such as IA64.
This guarantees all patches are immediately available to the middleware
of all O/S's for which the software is built.
- every O/S for which we build the middleware has its own 'build machine',
running the O/S for which we have to build (currently we have two build
machines, one for each RH7.3 and SLC3). The build machine accesses the
CVS server to obtain the relevant software.
To apply the specific environment for each platform (flags for compilers,
type of architecture, release suffix for rpm's) two variables are set
to identify the platform:
OSVERSION = ("rh7.3", "sl3") and ARCH = ("i386", "ia64").
Rpm's for SLC3 contain the suffix "_sl3" in order to distinguish them from
the RH7.3 version.
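
As an illustration of how the two platform variables could drive the rpm
naming, here is a minimal sketch. The variable values and the "_sl3" suffix
are taken from the text above; the function rpm_release_suffix() is
hypothetical and is not part of the actual build system.

```python
# Values quoted in the release note.
OSVERSIONS = ("rh7.3", "sl3")
ARCHS = ("i386", "ia64")


def rpm_release_suffix(osversion, arch):
    """Return the rpm release suffix for a platform (hypothetical helper).

    Only the "_sl3" suffix is documented; RH7.3 rpms are assumed here to
    carry no suffix at all.
    """
    if osversion not in OSVERSIONS:
        raise ValueError("unsupported OSVERSION: %r" % osversion)
    if arch not in ARCHS:
        raise ValueError("unsupported ARCH: %r" % arch)
    return "_sl3" if osversion == "sl3" else ""
```

Rejecting unknown values early means that a mistyped platform name stops the
build instead of silently producing rpms with the wrong suffix.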
Summary of changes with respect to the previous LCG2 Oct/2004 release:
======================================================================
Note: 3-digit numbers are Savannah *patch* numbers
4-digit numbers are Savannah *bug* numbers
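
For anyone post-processing the change list, the numbering convention above
can be applied mechanically. The following is only a sketch; savannah_kind()
is a hypothetical helper and assumes the number is given as a plain string.

```python
def savannah_kind(number):
    """Classify a Savannah reference by its digit count, as in this note."""
    digits = number.strip()
    if not digits.isdigit():
        raise ValueError("not a Savannah number: %r" % number)
    if len(digits) == 3:
        return "patch"
    if len(digits) == 4:
        return "bug"
    raise ValueError("unexpected number of digits: %r" % number)
```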
- VDT:
-------
No change.
- CondorG:
--------
No change.
- Information System:
------------------
290 - Updated BDII to version 3.1.12
- Information Providers:
----------------------
296 - lcg-info-dynamic-lsf upgraded to version 1.0.1
- Workload Management System
--------------------------
Updated to version lcg2.1.58 (lcg2.1.59-3 on IA64 WN)
Changes with respect to previous version, lcg2.1.54
3987 - Try to finish jobs exiting with globus 155?
4319 - Suggestion for change of policy on when to resubmit
4388 - WP1 on IA64: correct pointer casts in sources
4836 - locallogger 'error getting event's jobid'
5237 - bkpurge timeout problem
5238 - Restricting permissions in reduced part subdirectories
5244 - distribution of jobs among multiple Brokers
5269 - Wrong logging notice in CommandFactoryServerImpl
5274 - Interface Resource Broker to Dataset catalogue (use the
DataLocationInterface)
5348 - RPM configuration detection/checkfiles.c for WMS building/SL3
5350 - Change minimum proxy time for submission
5351 - WMS uninitialised variable
5384 - myproxy client and TCP_PORT_RANGE
5427 - JobAd bug
5488 - not able to match the closeCE starting from the SE name
5750 - Mechanism in WMS to specify middleware version
Some of the changes in this version of the WMS refer to the implementation
or preparation of new features. By default these are not enabled and the
previous behavior is retained.
The new version adds support for the Data Location Interface (DLI). Input
data types 'lds' & 'query' have been added, which are specific to the DLI.
The previous input data types of 'guid' & 'lfn' will use the RLS by default
(as previously) or optionally may be configured to use the DLI on a per VO
basis. The DLI endpoint may be specified either by the user in the JDL or
may be retrieved from the information system. For details of the
configuration options available see the DLI specific documentation.
Remember that when upgrading one should ensure that the WMS services are
restarted, to allow the new version to become active.
Known problems:
1. There are known weaknesses in the startup scripts for the WMS services.
After rebooting or restarting services care should be taken to ensure
that the services are running.
2. There is a very small chance that canceling a job multiple times can
crash the workload manager. In this case the workload manager will be
automatically restarted after a few minutes.
- LCG Job Managers:
-----------------
No change.
- Data Management:
----------------
Upgraded to version 1.7.13.
6065 - edg-rm has bad interaction with new dCache
- dCache
------
The following dCache rpms have been verified to work with the other
middleware included in this release:
pnfs-3.1.10-12
d-cache-client-1.0-24
d-cache-core-1.5.2-26
d-cache-opt-1.5.3-2
There is a list of open problems, but none of them was deemed to be
a blocker at this time.
On the UI and the WN only the d-cache-client rpm is installed.
A dCache SE consists of an "admin" node which e.g. runs the SRM,
and zero or more "pool" nodes providing disk pools (the admin node
can also provide a pool itself). On each of the nodes one can
configure the necessary services using this rpm:
d-cache-lcg-4.0.0-1
Please refer to /opt/d-cache-lcg/install/README for instructions.
The LCG deployment team will provide support only for configurations
that are considered "standard" (1 pool per node, defaults for most
parameters, etc.). Issues with non-standard configurations will
be forwarded to the dCache developers at http://www.dcache.org;
please refer to that site for documentation and updates in between
LCG releases.
dCache software (client as well as server) can run on either of
RH7.3 or SLC3 (IA32) O/S.
Outstanding issues:
-------------------
- files cannot be overwritten
- gridftp data channel authentication absent
- pinning method not supported
- non-existent paths do not give clear errors
- writing a file hangs if the disk is full
- rpms not relocatable
- core dump when gsidcap port not supplied
- no manual garbage collection
- dc_rename entry point missing in dcap library
- TURL for writing returned even when no space available
- SRM getFileMetaData info incomplete, bad error strings
- admin guide and user documentation to be improved
- logfiles to be cleaned up
- dc_opendir can return non-null for non-existent directory
- dc_readdir results incorrect for short paths
- there is no easy way to add a gridftp door node
- VOMS
----
The CE and Classic SE now support VOMS access through VOMS proxies
as well as gridmapfile access through normal proxies.
VOMS is tried first; if it fails, the gridmapfile is tried.
VOMS access is provided via an LCMAPS plugin. The LCMAPS mechanism
was already used by the edg-gatekeeper, but not by the globus gridftp
daemon; in its place the CE and Classic SE now run the edg-gridftpd,
which does use LCMAPS. The LCMAPS configuration files for edg-gridftpd
are provided in the lcg-lcas-lcmaps rpm; some sites may want to modify
the files slightly, e.g. allowing for non-LHC VOs.
Note:
-----
The LCAS (!) configuration should *not* refer to VOMS at all,
otherwise the fallback on the gridmapfile will not work!
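
The "VOMS first, gridmapfile second" order described above can be pictured
with the following sketch. It is not the LCMAPS implementation or its API:
the credential dictionary, the two lookup tables and all function names are
invented for illustration.

```python
from typing import Callable, Dict, Optional, Sequence

# Toy data, invented for illustration only.
VOMS_ACCOUNTS: Dict[str, str] = {"atlas": "atlas001"}
GRIDMAP: Dict[str, str] = {"/C=XX/O=Example/CN=Jane Doe": "dteam001"}

Credential = Dict[str, str]
Mapper = Callable[[Credential], Optional[str]]


def map_by_voms(credential: Credential) -> Optional[str]:
    """Map on the VO carried by a VOMS proxy; None for a normal proxy."""
    return VOMS_ACCOUNTS.get(credential.get("vo", ""))


def map_by_gridmap(credential: Credential) -> Optional[str]:
    """Map on the certificate subject, as a gridmapfile lookup would."""
    return GRIDMAP.get(credential.get("dn", ""))


def map_credential(
    credential: Credential,
    mappers: Sequence[Mapper] = (map_by_voms, map_by_gridmap),
) -> Optional[str]:
    """Try each mapper in order and return the first account found."""
    for mapper in mappers:
        account = mapper(credential)
        if account is not None:
            return account
    return None
```

A VOMS proxy is mapped by the first step, while a normal proxy falls through
to the gridmapfile lookup; if neither matches, no account is returned.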
- DPM (Disk Pool Manager)
-----------------------
Initial release version 1.0.3.
This is the very first release (*alpha* version), not meant to be used
yet by anybody without proper coaching by C&T. It has been added to the
release to integrate the software in the build process (this will make
life easier later) and to simplify communications with those sites that
will eventually be willing to serve as the external "remote" testing sites.
- Provides socket, SRM v1 and SRM v2 control interfaces
- Integrated GSI Authentication for SRM
- Access Control Lists (Unix Permissions and POSIX ACLs)
- Supports Oracle and MySQL database backends
- Easy to install and manage
- Distribute alpha release with reference man pages.
Known problems:
---------------
- the socket interface has been extensively tested, a few small problems
remain to be fixed
- SRM v1
All unit tests passed, the stress testing has not been completed yet.
- SRM v2
Code written and integrated, unit testing has started.
Known issues:
-------------
- need new rfio and gsi security modules (development not completed yet)
- GFAL:
-----
Upgraded to GFAL version 1.5.2.
- add errbuf and errbufsz to argument list to receive detailed error message
- lcg-util:
---------
lcg-util upgraded to version 1.2.1.
- add errbuf and errbufsz to argument list to receive detailed error message
4757 - ignore error EEXIST when creating directories. They could be
created by another process running at the same time.
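
The idea behind fix 4757 can be sketched as follows. lcg-util itself is not
written in Python, so this only illustrates the technique of tolerating
EEXIST; ensure_directory() is a hypothetical name.

```python
import errno
import os


def ensure_directory(path):
    """Create path (and missing parents), tolerating a concurrent creator."""
    try:
        os.makedirs(path)
    except OSError as exc:
        # Another process may have created the directory in the meantime.
        # That is fine as long as what exists really is a directory.
        if exc.errno != errno.EEXIST or not os.path.isdir(path):
            raise
```

Checking that the existing entry is a directory keeps the real error visible
when a regular file of the same name is in the way.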
- LFC (LCG File Catalog)
----------------------
Upgraded to version 1.0.3.
- distribute also the Command Line Interface
- issue more understandable error messages in the server
Known problems:
- it doesn't take into account the LFC_HOME environment variable, so all
LFNs must be absolute paths.
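
Until this is fixed, a job can build the absolute LFN itself before calling
the catalog. The sketch below assumes that LFC_HOME is meant to hold an
absolute catalog directory; absolute_lfn() is a hypothetical helper and the
paths used with it are invented examples.

```python
import os
import posixpath


def absolute_lfn(lfn, env=None):
    """Return an absolute LFN, resolving relative names against LFC_HOME."""
    env = os.environ if env is None else env
    if lfn.startswith("/"):
        return posixpath.normpath(lfn)
    home = env.get("LFC_HOME", "")
    if not home.startswith("/"):
        raise ValueError("LFC_HOME must be set to an absolute path")
    return posixpath.normpath(posixpath.join(home, lfn))
```

The posixpath module is used rather than os.path so that the result does not
depend on the path conventions of the machine running the client.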
- R-GMA:
-----
293 - updated to version 3.4.36 to support GIP and FMON and add fixes
None - added patch from L Field to edg-rgma-config-common
None - added log rotate for the servlets log
None - tomcat would not shut down, causes machine to hang
None - changes so gin can use gip and fmon
5086 - edg-rgma-run-examples now copes with user names 8 char
Note: there is no RGMA for IA64 yet.
- APEL accounting:
----------------
Update to new version 3.4.38
- Monitoring (Grid ICE):
---------------------
No change.
- WN on IA64:
-----------
- Only one batch system tested : PBS
Others (Torque, Condor) expected in near future.
- No RGMA packages yet, next release will be also compiled for IA64
- No GridIce packages yet
- Tank & Spark
------------
New experiment software installation tool, designed together with the
experiments, following their wishes and requirements.
Note:
-----
Only for RH7.3:
When installing via the LCFGng server, the following manual
steps are needed on the CE:
a. /etc/rc.d/init.d/mysql start
b. mysqladmin password <yourpassword>
c. mysqladmin -h <yourCE.yourdomain> password <yourpassword>
d. mysql -u root -p </opt/lcg/etc/tankspark/command.sql
- Others:
------
CA updated to version 0.25
================================================================================
===============
- KNOWN PROBLEMS:
===============
This section contains a non-exhaustive list of problems which we
were not able to fix in time, but for which workarounds or other
remedies have been found.
In general, these problems will be fixed in the next release.
1. There are known weaknesses in the startup scripts for the WMS services.
After rebooting or restarting services care should be taken to ensure
that the services are running.
2. There is a very small chance that canceling a job multiple times
can crash the workload manager or proxy renewal service. In this case
the workload manager will be automatically restarted after a few minutes,
but the proxy renewal service will not be; it has to be restarted by hand.
---Zdenek (for the CERN IT-GD LCG Certification & Testing section)