Hi Min,
First of all, thanks for the quick answer; it helped me a lot. It does
not, however, solve every issue:
On Wed, 6 Jul 2005, Min Tsai wrote:
> 2) There is a bug in the RB now that prevents the use of "lfn" or "guid" for
> InputData for files registered in the LFC. This is fixed in version edg-wl
> 2.1.64 that should be included in LCG2_5_0.
So what can be used, then? I tried a SURL (sfn:...), and edg-job-list-match
complained that only lfn or guid can be used. I assume that edg-job-submit
parses the JDL in the same way. Is 'InputData' completely unusable
in this release, then?
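
To make the complaint concrete, here is a minimal Python sketch of the kind of prefix check that edg-job-list-match seems to apply to InputData. The accepted-prefix list is my assumption, based only on the error message; it is not taken from the edg-wl source. The helper name split_input_data and the example SURL are made up for illustration.

```python
# Hypothetical helper: classify JDL InputData entries by their scheme prefix.
# ACCEPTED_PREFIXES is an assumption derived from the error message
# ("only lfn or guid can be used"), not from the edg-wl source code.
ACCEPTED_PREFIXES = ("lfn:", "guid:")


def split_input_data(entries):
    """Return (accepted, rejected) lists of InputData entries."""
    accepted, rejected = [], []
    for entry in entries:
        # str.startswith accepts a tuple and matches any of its members.
        if entry.startswith(ACCEPTED_PREFIXES):
            accepted.append(entry)
        else:
            rejected.append(entry)
    return accepted, rejected


if __name__ == "__main__":
    ok, bad = split_input_data([
        "lfn:/grid/hungrid/szabi/test.txt",
        "sfn://se.example.org/path/test.txt",  # made-up SURL for illustration
    ])
    print("accepted:", ok)
    print("rejected:", bad)
```

If that is roughly what happens, an sfn: entry is rejected at parse time, while lfn: and guid: entries pass the parser and then run into the RB bug you describe.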
Cheers,
Szabolcs
>
> Cheers,
> Min
>
> -----Original Message-----
> From: LHC Computer Grid - Rollout [mailto:[log in to unmask]]
> On Behalf Of Hernath Szabolcs
> Sent: Wednesday, July 06, 2005 5:49 PM
> To: [log in to unmask]
> Subject: [LCG-ROLLOUT] LFC problem
>
> Dear List (Sophie?),
>
> we have trouble with LFC. In order to support a local VO, an LFC-mysql
> node has been installed/configured via yaim (lcg-yaim-2.4.0-4,
> LFC-server-mysql-1.2.6-1sec_sl3). Most things, such as
> registering/creating/removing/linking files work all right, the lcg-*
> commands from various UIs also interact with the LFC well.
> lcg-infosites also returns the LFC and SE for the VO. There is a
> problem with jobs, however: when using InputData requirements for some
> existing files (see below), the jobs cannot be matched. Sample JDL file:
>
> [
> requirements = ( other.GlueCEStateStatus == "Production" );
> RetryCount = 3;
> MyProxyServer = "grid153.kfki.hu";
> Executable = "RemoteInput.sh";
> StdOutput = "stdout";
> OutputSandbox = { "stdout","stderr" };
> VirtualOrganisation = "hungrid";
> rank = -other.GlueCEStateEstimatedResponseTime;
> StdError = "stderr";
> InputSandbox = { "RemoteInput","RemoteInput.sh" };
> InputData = { "lfn:/grid/hungrid/szabi/test.txt" };
> DataAccessProtocol = { "gsiftp" }
> ]
>
> and edg-job-list-match cannot find a suitable CE. Without the
> InputData/DataAccessProtocol constraints, the CEs that support the VO are
> returned. The -debug option of edg-job-list-match did not reveal anything,
> but the logs on the RB contain the following (this is the relevant part of
> /var/edgwl/networkserver/log/events.log):
>
> 06 Jul, 17:27:50 -I- checkRequirement:
> grid109.kfki.hu:2119/jobmanager-lcgcondor-hungrid, Ok!
> 06 Jul, 17:27:50 -E- listReplica(): Replica Manager C++ API: InfoService:
> No service found in InfoService
> 06 Jul, 17:27:51 -E- CommandFactoryServerImpl()::ListJobMatch():
> ListJobMatch done
>
> I searched the web and the documentation(?-)) for the "listReplica()..."
> error above, but couldn't find anything. What could be the problem? I
> suppose some misconfiguration on the RB, but have no clue whatsoever.
> For your information, here is the environment of 'edguser' on the RB:
>
> [root@grid151 log]# su - edguser
> [edguser@grid151 edguser]$ env
> MANPATH=/opt/globus/man::/opt/edg/share/man:/opt/lcg/share/man:/opt/edg/man
> GRIDMAP=/etc/grid-security/grid-mapfile
> HOSTNAME=grid151.kfki.hu
> LCG_LOCATION_VAR=/opt/lcg/var
> SHELL=/bin/bash
> TERM=rxvt
> HISTSIZE=1000
> GLOBUS_PATH=/opt/globus
> GLOBUS_LOCATION=/opt/globus
> EDG_WL_CONFIG_DIR=/opt/edg/etc
> QTDIR=/usr/lib/qt-3.1
> EDG_TMP=/tmp
> GRIDMAPDIR=/etc/grid-security/gridmapdir/
> USER=edguser
> JAVA_INSTALL_PATH=/usr/java/j2sdk1.4.2_04
> LD_LIBRARY_PATH=/opt/lcg/lib:/opt/globus/lib:/opt/edg/lib:/usr/local/lib:/opt/globus/lib:/opt/edg/lib:/opt/globus/lib:/opt/edg/lib:/opt/globus/lib:/opt/edg/lib
> LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=01;32:*.cmd=01;32:*.exe=01;32:*.com=01;32:*.btm=01;32:*.bat=01;32:*.sh=01;32:*.csh=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tz=01;31:*.rpm=01;31:*.cpio=01;31:*.jpg=01;35:*.gif=01;35:*.bmp=01;35:*.xbm=01;35:*.xpm=01;35:*.png=01;35:*.tif=01;35:
> GPT_LOCATION=/opt/gpt
> LCG_LOCATION=/opt/lcg
> EDG_WL_TMP=/var/edgwl
> CLASSADJ_INSTALL_PATH=/usr
> LIBPATH=/opt/globus/lib:/usr/lib:/lib
> EDG_WL_USER=edguser
> MAIL=/var/spool/mail/edguser
> PATH=/usr/java/j2sdk1.4.2_08/bin:/opt/lcg/bin:/usr/kerberos/bin:/opt/globus/bin:/opt/globus/sbin:/opt/edg/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/bin:/usr/bin/X11:/usr/X11R6/bin:/opt/gpt/sbin:/opt/edg/bin:/opt/edg/sbin:/opt/edg/bin:/opt/edg/sbin:/opt/edg/bin:/opt/edg/sbin:/home/edguser/bin
> EDG_WL_LOCATION=/opt/edg
> CONDOR_CONFIG=/opt/condor/etc/condor.conf
> CONDORG_INSTALL_PATH=/opt/condor
> LCG_TMP=/tmp
> EDG_LOCATION=/opt/edg
> INPUTRC=/etc/inputrc
> PWD=/home/edguser
> LANG=C
> SASL_PATH=/opt/globus/lib/sasl
> PERLLIB=/opt/edg/lib/perl
> SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass
> EDG_WL_LOCATION_VAR=/opt/edg/var
> SHLVL=1
> HOME=/home/edguser
> EDG_WL_LOG_DESTINATION=grid151.kfki.hu:9002
> GLOBUS_TCP_PORT_RANGE=20000 25000
> RGMA_PROPS=/opt/edg/etc/rgma
> COG_INSTALL_PATH=/usr
> EDG_LOCATION_VAR=/opt/edg/var
> PYTHONPATH=/opt/edg/lib:/opt/edg/lib/python
> LOGNAME=edguser
> RGMA_HOME=/opt/edg
> LESSOPEN=|/usr/bin/lesspipe.sh %s
> SHLIB_PATH=/opt/globus/lib
> LOG4J_INSTALL_PATH=/usr
> G_BROKEN_FILENAMES=1
> _=/bin/env
>
> And the config for the Workload Manager (/opt/edg/etc/edg_wl.conf):
>
> [
> Common = [
> DGUser = "${EDG_WL_USER}";
> HostProxyFile = "${EDG_WL_TMP}/networkserver/ns.proxy";
> UseCacheInsteadOfGris = true;
> ];
> JobController = [
> CondorSubmit = "${CONDORG_INSTALL_PATH}/bin/condor_submit";
> CondorRemove = "${CONDORG_INSTALL_PATH}/bin/condor_rm";
> CondorQuery = "${CONDORG_INSTALL_PATH}/bin/condor_q";
> CondorSubmitDag = "${CONDORG_INSTALL_PATH}/bin/condor_submit_dag";
> CondorRelease = "${CONDORG_INSTALL_PATH}/bin/condor_release";
> SubmitFileDir = "${EDG_WL_TMP}/jobcontrol/submit";
> OutputFileDir = "${EDG_WL_TMP}/jobcontrol/cond";
> Input = "${EDG_WL_TMP}/jobcontrol/queue.fl";
> LockFile = "${EDG_WL_TMP}/jobcontrol/lock";
> LogFile = "${EDG_WL_TMP}/jobcontrol/log/events.log";
> LogLevel = 5;
> ContainerRefreshThreshold = 1000;
> ];
> LogMonitor = [
> JobsPerCondorLog = 1000;
> LockFile = "${EDG_WL_TMP}/logmonitor/lock";
> LogFile = "${EDG_WL_TMP}/logmonitor/log/events.log";
> LogLevel = 5;
> ExternalLogFile = "${EDG_WL_TMP}/logmonitor/log/external.log";
> MainLoopDuration = 10;
> CondorLogDir = "${EDG_WL_TMP}/logmonitor/CondorG.log";
> CondorLogRecycleDir = "${EDG_WL_TMP}/logmonitor/CondorG.log/recycle";
> MonitorInternalDir = "${EDG_WL_TMP}/logmonitor/internal";
> IdRepositoryName = "irepository.dat";
> AbortedJobsTimeout = 600;
> ];
> NetworkServer = [
> II_Port = 2170;
> Gris_Port = 2135;
> II_Timeout = 30;
> Gris_Timeout = 20;
> II_DN = "mds-vo-name=local, o=grid";
> Gris_DN = "mds-vo-name=local, o=grid";
> II_Contact = "grid152.kfki.hu";
> ListeningPort = 7772;
> MasterThreads = 8;
> SandboxStagingPath = "${EDG_WL_TMP}/SandboxDir";
> LogFile = "${EDG_WL_TMP}/networkserver/log/events.log";
> LogLevel = 5;
> BacklogSize = 16;
> EnableQuotaManagement = false;
> MaxInputSandboxSize = 10000000;
> EnableDynamicQuotaAdjustment = false;
> QuotaAdjustmentAmount = 10000;
> QuotaInsensibleDiskPortion = 2.0;
> ];
> WorkloadManager = [
> PipeDepth = 1;
> NumberOfWorkerThreads = 1;
> DispatcherType = "filelist";
> Input = "${EDG_WL_TMP}/workload_manager/input.fl";
> LogLevel = 5;
> LogFile = "${EDG_WL_TMP}/workload_manager/log/events.log";
> MaxRetryCount = 10;
> ];
> ]
>
> Sorry for the long post. Any help or hint is appreciated. Thanks,
> Cheers
>
> Szabolcs
>