Hi,
We managed to debug our PhEDEx installation yesterday (thanks to those who
helped). Overnight we ran agents for both srmcp and FTS. While we seem to
have good performance from srmcp, we didn't from FTS: for example, we
had 20 successful FTS transfers and 130 failures. All the failures that
we looked at (and we suspect the rest as well) were due to the transfer
timing out, despite a 3.5-hour timeout. FTS had also built
up a huge stack of pending transfers. Fearing that the direct
srmcp transfers were somehow spoiling the FTS transfers we stopped
everything, killed the pending FTS transfers and finally restarted only
the FTS agents. The transfers started pretty much immediately, but slowly
... oh so slowly. We currently have rates of between 1 and 2 MB/s
(although the first file that came through after we restarted did manage a
blistering 10MB/s... but sadly it was alone in this). In previous PhEDEx
tests we have achieved rates of greater than 50MB/s (when we hit the
limit of the current firewall).
Now, I know essentially nothing about FTS, but I suspect that this
behaviour indicates that all is not well with the system or, more likely,
our configuration. However, my ignorance of FTS means that I don't know
where to start prodding and tweaking, so any advice that anybody can offer
as to what we should be doing and looking at would be very much
appreciated.
A little more information on our setup...
We appear to have two channels enabled:
RALLCG2-UKILT2ICHEP
STAR-UKILT2ICHEP
With the current transfers coming through the second one.
The configuration options for our FTS agent are:
### AGENT LABEL=download-fts PROGRAM=Toolkit/Transfer/FileDownload ENVIRON=glite
-db ${PHEDEX_DBPARAM}
-nodes ${PHEDEX_NODE}
-storagemap ${PHEDEX_STORAGEMAP}
-delete ${PHEDEX_CONF}/FileDownloadDelete
-validate ${PHEDEX_CONF}/FileDownloadVerify
-backend SRM
-command ${PHEDEX_SCRIPTS}/Utilities/ftscp,-passfile=${PHEDEX_CONF}/ftspass,-server=${PHEDEX_FTS_SERVER}
-jobs 2
-batch-files 2
-timeout 12600 # 3h30
The number of jobs and batch-files are copied from the RAL configuration,
as is the timeout.
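
As a rough sanity check on the timeout figures above, here is a minimal sketch of the arithmetic. It assumes (unconfirmed) that the 12600 s timeout applies to each transfer job as a whole and that the observed 1 to 2 MB/s is the rate a single job gets; the function names are made up for illustration and are not part of PhEDEx or FTS.

```python
# Back-of-envelope check: how much data fits inside the agent timeout
# at the transfer rates quoted in this mail?
# ASSUMPTION: the timeout covers one whole transfer job, and the quoted
# rate is what that job actually receives.

TIMEOUT_S = 12600  # -timeout value from the agent configuration (3h30)


def max_mb_within_timeout(rate_mb_per_s, timeout_s=TIMEOUT_S):
    """Largest amount of data (in MB) that can move before the timeout."""
    return rate_mb_per_s * timeout_s


def failure_fraction(ok, failed):
    """Fraction of attempted transfers that failed."""
    return failed / (ok + failed)


if __name__ == "__main__":
    # Rates mentioned in the mail: 1-2 MB/s now, one file at 10 MB/s,
    # and more than 50 MB/s in earlier tests.
    for rate in (1, 2, 10, 50):
        gb = max_mb_within_timeout(rate) / 1000
        print(f"{rate:>3} MB/s: about {gb:.1f} GB per timeout window")

    # 20 successes and 130 failures overnight.
    print(f"failure fraction: {failure_fraction(20, 130):.1%}")
```

If the files in a job total well under the 1 MB/s figure, the low rate alone would not account for the timeouts, and the time spent pending may matter as well (again an assumption, not something checked here).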
As I said, all help will be very much appreciated.
All the best,
david