Hi Alessandra,
I upgraded from 2.4.21-27.0.4.EL to 2.4.21-32.0.1.EL. I should have done
that ages ago, I know, but they are not production boxes.
I have been running 2.6 for more than a year on my desktop. I'm tempted to
ask about your experience running pristine 2.6 kernels on production boxes
(if any).
Regards.
--
Jiri
Words written by `Alessandra Forti' on 27 Jul 2005 at 15:07:08 +0100 prompted:
> Hi Jiri,
>
> just out of curiosity what kernel did you put on the machines?
>
> thanks
>
> cheers
> alessandra
>
> On Tue, 26 Jul 2005, Jiri Mencak wrote:
>
> >Hi all,
> >
> >sorry for replying to my own email, but I thought I'd preserve the
> >``thread''.
> >
> >After giving the dual-homed boxes some time to rest and discussing
> >this with dCache developers, I've given the 3rd party copies (_to_
> >dual-homed boxes) another chance. The weird thing is that they started
> >to work! I admit to having rebooted the boxes for a kernel upgrade and
> >therefore restarted dCache, so that might have helped. I spent some
> >time trying to replicate the problem, with no luck unfortunately.
> >Dual-homed dCache (as described below) just works for me now.
> >
> >Thanks and regards.
> >
> >--
> >Jiri
> >
> >Words written by `Jiri Mencak' on 19 Jul 2005 at 13:53:52 +0100 prompted:
> >>Dear all,
> >>
> >>I've played a little bit with dual-homed machines and dCache with mixed
> >>success. Nevertheless, I think it is worth reporting and I'm looking
> >>forward to your feedback.
> >>
> >>Architecture
> >>~~~~~~~~~~~~
> >>Pentium III 600
> >>
> >>OS
> >>~~
> >>Scientific Linux SL Release 3.0.4 (SL)
> >>
> >>dCache
> >>~~~~~~
> >>d-cache-client-1.0-100
> >>d-cache-core-1.5.2-83
> >>d-cache-lcg-5.0.0-1
> >>d-cache-opt-1.5.3-84
> >>(d-cache-gpp-v1.2.1-1)
> >>
> >>I have done a simplified dCache installation using the GridPP storage
> >>dependency RPMs (no BDII etc.) to speed things up; an LCG yaim 2.5.0
> >>installation should work equally well.
> >>
> >>Scenario
> >>~~~~~~~~
> >>Admin node: dual-homed box with a /pool on the same box
> >>(I know, 3 dual-homed boxes would be better with no pool on the admin
> >> node, but this should do as a proof of concept)
> >>Pool node: dual-homed box with a /pool
> >>
> >>Public Interfaces: E0a (admin.public.ac.uk), E0p (pool.public.ac.uk)
> >>Private Interfaces: E1a (192.168.0.32), E1p (192.168.0.33)
> >>
> >>
> >>                 E0a ---------------- E1a
> >>                     |              |
> >>                  +--|    admin     |--+
> >>                  |  |    /pool     |  |
> >>                  |  ----------------  |
> >>                  |                    |
> >>                  |                    |     Private Net
> >> Public Net   ----+-----          -----+----
> >> -------------| switch |          | switch |
> >>              ----+-----          -----+----
> >>               |  |                    |  |
> >>               |  |  ----------------  |  |
> >>               |  |  |              |  |  |
> >>               |  +--|     pool     |--+  |
> >>               |     |    /pool     |     |
> >>               | E0p ---------------- E1p |
> >>               |                          |
> >>               | ........................ |
> >>               |                          |
> >>               |   O T H E R  P O O Ls    |
> >>
> >>
> >>Installation
> >>~~~~~~~~~~~~
> >>1) Installed SL 3.0.4 and grid certificates
> >>2) Made sure `hostname` returns the FQDN associated
> >> with E0a and E0p, in other words, the public FQDN.
> >>3) To make internal dCache communication pass through private
> >> interfaces I've set up an internal DNS server to fool admin and
> >> pool nodes into thinking admin.public.ac.uk is 192.168.0.32 and
> >> pool.public.ac.uk is 192.168.0.33.
> >>4) Made sure
> >> `hostname -d` = `grep ^search /etc/resolv.conf | awk '{print $2}'`
> >>5) Set up site-info.def:
> >> MY_DOMAIN=`hostname -d`
> >> DCACHE_ADMIN=<E1a private FQDN>
> >> DCACHE_POOLS="`hostname -f`:2:/pool"
> >>6) Installed dCache using GridPP storage dependency RPMs.
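
The checks in steps 2 to 4 could be scripted roughly as follows. This is
only a sketch and not part of the original installation: the function names
(search_domain, domain_of, check_node) are made up for illustration,
domain_of approximates `hostname -d` by splitting at the first dot, and
"private" is taken to mean whatever Python's ipaddress module classifies as
private (192.168.0.0/16 among other ranges).

```python
import ipaddress
import socket


def search_domain(resolv_conf_text):
    """Return the first domain on the first 'search' line, or None."""
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "search":
            return fields[1]
    return None


def domain_of(fqdn):
    """Return everything after the first dot (rough stand-in for `hostname -d`)."""
    _, _, domain = fqdn.partition(".")
    return domain


def check_node(fqdn, resolv_conf_text, resolve=socket.gethostbyname):
    """Return a list of problems; an empty list means steps 2-4 look satisfied."""
    problems = []
    # Step 2: the host name should be fully qualified.
    if "." not in fqdn:
        problems.append("hostname is not fully qualified: {}".format(fqdn))
    # Step 4: the host's domain should match the resolv.conf search domain.
    if domain_of(fqdn) != search_domain(resolv_conf_text):
        problems.append("domain does not match the resolv.conf search line")
    # Step 3: the internal DNS should map the public name to a private address.
    address = resolve(fqdn)
    if not ipaddress.ip_address(address).is_private:
        problems.append("{} resolves to {}, not a private address".format(fqdn, address))
    return problems
```

On a node one might call check_node(socket.getfqdn(), text) with text read
from /etc/resolv.conf; an empty list would suggest the internal DNS is doing
what step 3 intends. The resolve argument is there so the lookup can be
replaced when trying the function away from the real hosts.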
> >>
> >>
> >>Testing
> >>~~~~~~~
> >>globus-url-copy and dCache SRM copy worked fine including third party
> >>copying (get) _from_ dual-homed boxes. Unfortunately, third party
> >>(put) _to_ dual-homed boxes fails. Relevant dCache log snippets
> >>attached.
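
Since the attached log entries are wrapped over several lines, a small
script can make them easier to read. The sketch below is an illustration
only: it assumes every entry starts with a "MM/DD HH:MM:SS Cell(" stamp as
in the snippets, strips mail quote markers, rejoins the wrapped lines and
lists the package-qualified exception classes named in each entry. The
function names are hypothetical.

```python
import re

# An entry starts with a stamp such as "07/19 10:20:08 Cell(".
ENTRY_START = re.compile(r"^\d{2}/\d{2} \d{2}:\d{2}:\d{2} Cell\(")
# Leading mail quote markers such as "> >>".
QUOTE = re.compile(r"^[>\s]+")
# Package-qualified Java class names ending in "Exception".
EXCEPTION = re.compile(r"\b(?:[a-z]\w*\.)+[A-Z]\w*Exception\b")


def join_entries(lines):
    """Rejoin wrapped log lines into one string per log entry."""
    entries = []
    for raw in lines:
        line = QUOTE.sub("", raw).strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def exception_classes(entry):
    """List the exception classes named in an entry, in order, without repeats."""
    seen = []
    for name in EXCEPTION.findall(entry):
        if name not in seen:
            seen.append(name)
    return seen
```

Run over the SRM snippet, the last class listed for the failed copy entry
would be java.net.NoRouteToHostException, which is where I would start
looking.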
> >>
> >>
> >>Tier 2 dual-homing requirements
> >>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >>It would be nice to hear what the architectural requirements of
> >>Tier 2 sites are with regard to dual-homing. I was working
> >>under the assumption that the purpose of dual-homed machines was
> >>to increase network throughput on the public interface by passing
> >>internal dCache communication through the private interface and to
> >>shield dCache from the outside world and expose only SRM and GridFTP
> >>on the public interface.
> >>
> >>I suspect there will be other/different requirements with regard
> >>to the dual-homed architecture so it would be nice to hear them.
> >>Owen tells me that if you need dual-homing, your setup will almost
> >>certainly be Lightpath on the public interface, and university network
> >>on the private interface.
> >>
> >>I'm now partly leaving dCache support and moving on to another project,
> >>so I cannot guarantee I'll be working on dual-homing in the future.
> >>
> >>Regards.
> >>
> >>--
> >>Jiri
> >
> >>07/19 10:19:11 Cell(PnfsManager@pnfsDomain) : Failed :
> >>CacheException(rc=666;msg=can't get pnfsId (not a pnfsfile))
> >>07/19 10:19:11 Cell(PnfsManager@pnfsDomain) :
> >>CacheException(rc=666;msg=can't get pnfsId (not a pnfsfile))
> >>07/19 10:19:11 Cell(PnfsManager@pnfsDomain) : at
> >>diskCacheV111.cells.PnfsManager2.getStorageInfo(PnfsManager2.java:950)
> >>07/19 10:19:11 Cell(PnfsManager@pnfsDomain) : at
> >>diskCacheV111.cells.PnfsManager2.processPnfsMessage(PnfsManager2.java:1597)
> >>07/19 10:19:11 Cell(PnfsManager@pnfsDomain) : at
> >>diskCacheV111.cells.PnfsManager2$ProcessThread.run(PnfsManager2.java:1518)
> >>07/19 10:19:11 Cell(PnfsManager@pnfsDomain) : at
> >>java.lang.Thread.run(Thread.java:534)
> >>07/19 10:19:11 Cell(PnfsManager@pnfsDomain) : Error obtaining 'l' flag
> >>for getSimulatedFilesize : java.io.FileNotFoundException:
> >>/pnfs/fs/.(puse)(000100000000000000001120)(2) (Is a directory)
> >>07/19 10:19:12 Cell(PnfsManager@pnfsDomain) : Error obtaining 'l' flag
> >>for getSimulatedFilesize : java.io.FileNotFoundException:
> >>/pnfs/fs/.(puse)(000100000000000000001120)(2) (Is a directory)
> >
> >>07/19 10:16:18 Cell(SRM@srmDomain) : Request id=-2147483523: copy request
> >>state changed to Done
> >>07/19 10:16:18 Cell(SRM@srmDomain) : Request id=-2147483523: changing
> >>fr#-2147483522 to Done
> >>07/19 10:18:35 Cell(SRM@srmDomain) : CopyRequest reqId #
> >>-2147483521Request.createCopyRequest : created new request succesfully
> >>07/19 10:20:08 Cell(SRM@srmDomain) : remoing TransferInfo for
> >>callerId=20000
> >>07/19 10:20:08 Cell(SRM@srmDomain) :
> >>org.dcache.srm.scheduler.NonFatalJobFailure:
> >>CacheException(rc=666;msg=tranfer failed
> >>:org.globus.ftp.exception.ServerException: Server refused performing the
> >>request. Custom message: Server reported transfer failure (error code 1)
> >>[Nested exception message: Custom message: Unexpected reply: 426
> >>Transfer aborted, closing connection :java.net.NoRouteToHostException: No
> >>route to host] [Nested exception is
> >>org.globus.ftp.exception.UnexpectedReplyCodeException: Custom message:
> >>Unexpected reply: 426 Transfer aborted, closing connection
> >>:java.net.NoRouteToHostException: No route to host])
> >>07/19 10:20:08 Cell(SRM@srmDomain) : at
> >>org.dcache.srm.request.CopyFileRequest.runRemoteToLocalCopy(CopyFileRequest.java:666)
> >>07/19 10:20:08 Cell(SRM@srmDomain) : at
> >>org.dcache.srm.request.CopyFileRequest.run(CopyFileRequest.java:770)
> >>07/19 10:20:08 Cell(SRM@srmDomain) : at
> >>org.dcache.srm.scheduler.Scheduler$JobWrapper.run(Scheduler.java:1121)
> >>07/19 10:20:08 Cell(SRM@srmDomain) : at
> >>EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(PooledExecutor.java)
> >>07/19 10:20:08 Cell(SRM@srmDomain) : at
> >>java.lang.Thread.run(Thread.java:534)
> >>07/19 10:20:08 Cell(SRM@srmDomain) : CopyFileRequest #-2147483520: copy
> >>failed
> >>07/19 10:20:08 Cell(SRM@srmDomain) :
> >>org.dcache.srm.scheduler.NonFatalJobFailure:
> >>org.dcache.srm.scheduler.NonFatalJobFailure:
> >>CacheException(rc=666;msg=tranfer failed
> >>:org.globus.ftp.exception.ServerException: Server refused performing the
> >>request. Custom message: Server reported transfer failure (error code 1)
> >>[Nested exception message: Custom message: Unexpected reply: 426
> >>Transfer aborted, closing connection :java.net.NoRouteToHostException: No
> >>route to host] [Nested exception is
> >>org.globus.ftp.exception.UnexpectedReplyCodeException: Custom message:
> >>Unexpected reply: 426 Transfer aborted, closing connection
> >>:java.net.NoRouteToHostException: No route to host])
> >>07/19 10:20:08 Cell(SRM@srmDomain) : at
> >>org.dcache.srm.request.CopyFileRequest.run(CopyFileRequest.java:798)
> >>07/19 10:20:08 Cell(SRM@srmDomain) : at
> >>org.dcache.srm.scheduler.Scheduler$JobWrapper.run(Scheduler.java:1121)
> >>07/19 10:20:08 Cell(SRM@srmDomain) : at
> >>EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(PooledExecutor.java)
> >>07/19 10:20:08 Cell(SRM@srmDomain) : at
> >>java.lang.Thread.run(Thread.java:534)
> >>07/19 10:20:36 Cell(SRM@srmDomain) : CopyRequest reqId #
> >>-2147483521copyRequest getter_putter is non null, stopping
> >>07/19 10:20:36 Cell(SRM@srmDomain) : CopyRequest reqId #
> >>-2147483521changing fr#-2147483520 to Failed
> >>07/19 10:20:36 Cell(SRM@srmDomain) : CopyRequest reqId # -2147483521error
> >>:
> >>07/19 10:20:36 Cell(SRM@srmDomain) :
> >>org.dcache.srm.scheduler.IllegalStateTransition: g illegal state
> >>transition from Canceled to Failed
> >>07/19 10:20:36 Cell(SRM@srmDomain) : at
> >>org.dcache.srm.scheduler.Job.setState(Job.java:532)
> >>07/19 10:20:36 Cell(SRM@srmDomain) : at
> >>org.dcache.srm.scheduler.Job.setState(Job.java:417)
> >>07/19 10:20:36 Cell(SRM@srmDomain) : at
> >>org.dcache.srm.request.CopyRequest.stateChanged(CopyRequest.java:952)
> >>07/19 10:20:36 Cell(SRM@srmDomain) : at
> >>org.dcache.srm.scheduler.Job.setState(Job.java:566)
> >>07/19 10:20:36 Cell(SRM@srmDomain) : at
> >>org.dcache.srm.scheduler.Job.setState(Job.java:417)
> >>07/19 10:20:36 Cell(SRM@srmDomain) : at
> >>org.dcache.srm.request.Request.getRequestStatus(Request.java:521)
> >>07/19 10:20:36 Cell(SRM@srmDomain) : at
> >>org.dcache.srm.SRM.getRequestStatus(SRM.java:868)
> >>07/19 10:20:36 Cell(SRM@srmDomain) : at
> >>diskCacheV111.srm.server.SRMServerV1.getRequestStatus(SRMServerV1.java:360)
> >>07/19 10:20:36 Cell(SRM@srmDomain) : at
> >>sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>07/19 10:20:36 Cell(SRM@srmDomain) : at
> >>sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >>07/19 10:20:36 Cell(SRM@srmDomain) : at
> >>sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>07/19 10:20:36 Cell(SRM@srmDomain) : at
> >>java.lang.reflect.Method.invoke(Method.java:324)
> >>07/19 10:20:36 Cell(SRM@srmDomain) : at
> >>electric.util.reflect.Invocation.execute(Unknown Source)
> >>07/19 10:20:36 Cell(SRM@srmDomain) : at
> >>electric.util.reflect.Invocation.invoke(Unknown Source)
> >>07/19 10:20:36 Cell(SRM@srmDomain) : at
> >>electric.service.object.ObjectService.invoke(Unknown Source)
> >>07/19 10:20:36 Cell(SRM@srmDomain) : at
> >>electric.net.soap.SOAPMessage.invoke(SOAPMessage.java:534)
> >>07/19 10:20:36 Cell(SRM@srmDomain) : at
> >>electric.net.soap.SOAPMessage.invoke(SOAPMessage.java:508)
> >>07/19 10:20:36 Cell(SRM@srmDomain) : at
> >>electric.net.soap.http.SOAPHTTPHandler.service(SOAPHTTPHandler.java:88)
> >>07/19 10:20:36 Cell(SRM@srmDomain) : at
> >>electric.server.http.ServletServer.service(Unknown Source)
> >>07/19 10:20:36 Cell(SRM@srmDomain) : at
> >>javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
> >>07/19 10:20:36 Cell(SRM@srmDomain) : at
> >>electric.net.servlet.Config.service(Unknown Source)
> >>07/19 10:20:36 Cell(SRM@srmDomain) : at
> >>electric.net.http.HTTPContext.service(HTTPContext.java:84)
> >>07/19 10:20:36 Cell(SRM@srmDomain) : at
> >>electric.net.servlet.ServletContainer.service(Unknown Source)
> >>07/19 10:20:36 Cell(SRM@srmDomain) : at
> >>electric.net.http.WebServer.service(WebServer.java:87)
> >>07/19 10:20:36 Cell(SRM@srmDomain) : at
> >>electric.net.socket.SocketServer.run(Unknown Source)
> >>07/19 10:20:36 Cell(SRM@srmDomain) : at
> >>electric.net.socket.SocketRequest.run(Unknown Source)
> >>07/19 10:20:36 Cell(SRM@srmDomain) : at
> >>electric.util.thread.ThreadPool.run(Unknown Source)
> >>07/19 10:20:36 Cell(SRM@srmDomain) : at
> >>java.lang.Thread.run(Thread.java:534)
> >
>
> --
> ********************************************
> * Dr Alessandra Forti *
> * Technical Coordinator - NorthGrid Tier2 *
> * http://www.hep.man.ac.uk/u/aforti *
> ********************************************