On Oct 5, 2007, at 5:45 AM, Dave Flitney wrote:
> Hi Lokke,
>
> On 4 Oct 2007, at 22:11, Lokke Highstein wrote:
>
>> one thing is that the paths to the data we are working on are
>> slightly different on the head nodes and the cluster nodes: the
>> head node has the data mounted off of an XRAID (showing up in
>> /Volumes, which i then symlinked into the / directory), while the
>> nodes mount the same directories at /nfs/, with symbolic links to /.
>
> Could you confirm:
>
> On the head node /data_directory is a link to /Volumes/data_directory
> On compute node /data_directory is a link to /nfs/data_directory
well i was using placeholder names (data_directory), but the actual
data in this case lives in /Volumes/Clinical/data/lokke_test/tky_13dir
on the head node. also on the head node /data is a link to
/Volumes/Clinical/data.
on the cluster node it's in /nfs/Clinical/data/lokke_test/tky_13dir,
and on that node /data is a link to /nfs/Clinical/data.
FYI the name "Clinical" is just bad naming; it isn't actually clinical
data, but now it's hard for me to change the name at this point (long
story, not worth telling).
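The layout described above can be sketched as follows (paths taken from this thread; a hypothetical setup run with appropriate privileges, not the exact commands used):

```shell
# On the head node: the XRAID volume mounts under /Volumes.
ln -s /Volumes/Clinical/data /data

# On each compute node: the same tree arrives over NFS.
ln -s /nfs/Clinical/data /data

# Either way, the logical path then resolves to the node-local mount:
ls /data/lokke_test/tky_13dir
```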
>> i have set up sge_aliases which are supposed to solve this, but
>> still it seems that when we submit a bedpost job through the GUI
>> (which forces the full path to be used: even if we type in the
>> /data_directory path it then resolves the actual full path) it
>> ends up in an error state in the queue.
>
> Could you send me the output of:
> 'cat ${SGE_ROOT}/default/common/sge_aliases'
> Also an example of the full error message (try
> 'qstat -j <jobid> -xml | grep -i QIM_message')
#
# Template Grid Engine path aliasing configuration file
#
# The following entry aliases physical address as generated by automounter
# (with a leading /tmp_mnt) to the logical path (w/o leading /tmp_mnt).
#
# subm_dir subm_host exec_host path_replacement
/tmp_mnt/ * * /
/Volumes/Clinical/ * * /
/export/data/ * * /
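As a toy model of how these entries are meant to rewrite a submit-directory path (simple prefix replacement, per the template comments above; this is an illustration, not Grid Engine's actual implementation):

```python
# Toy model of SGE path aliasing: each entry replaces a matching
# subm_dir prefix with its path_replacement (illustrative only).
ALIASES = [
    ("/tmp_mnt/", "/"),
    ("/Volumes/Clinical/", "/"),
    ("/export/data/", "/"),
]

def apply_aliases(path: str) -> str:
    for prefix, replacement in ALIASES:
        if path.startswith(prefix):
            return replacement + path[len(prefix):]
    return path

# The head-node submit directory should become the link-based path:
print(apply_aliases("/Volumes/Clinical/data/lokke_test/tky_13dir"))
# /data/lokke_test/tky_13dir
```

If that rewrite were applied, the execution hosts would see /data/... rather than the /Volumes/... path that appears in the error below.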
<QIM_message>10/05/2007 15:25:55 [813:25899]: error: can't open output file "/Volumes/Clinical/data/lokke_test/tky_13dir.bedpostX/logs": No such file or directory</QIM_message>
> Assuming the links are as above, then I think sge_aliases should
> contain "/Volumes/data_directory * * /data_directory"; however, we
> re-engineered our setup to avoid these problems, i.e., our
> /Volumes/Data is mounted as /Volumes/Data everywhere, so I'm not
> entirely certain of this.
>
>> when we submit the job via command line with the full path it
>> fails. when we submit the job with the /data_directory path it
>> works.
>>
>> i also want to run FEEDS on this to get more info, but it doesn't
>> seem to submit the job to the cluster and only runs on the head
>> node. is there a way to submit the FEEDS job to the cluster?
>
> The FEEDS script unsets SGE_ROOT at the start. Just comment this
> line out and replace it with 'exec $FSLTCLSH "$0" "$@"'. It should
> then run the SGE-aware subsections using available SGE slots. Note
> that some scripts will then exit immediately, leaving just their
> processing tasks on the queues. To work out how long everything
> takes you'll need to manually add up all the runtimes (use qacct)
> and/or observe the wall-clock time data.
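For adding up the runtimes, a hedged one-liner (the job-name pattern is hypothetical; it assumes qacct prints one `ru_wallclock` line per job record, which is its usual accounting output):

```shell
# Sum wall-clock seconds over all matching job records.
qacct -j 'bedpostx*' | awk '/^ru_wallclock/ { total += $2 } END { print total }'
```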
i'll worry about FEEDS on monday; i was trying to run it to give you
guys as much info as possible.
also, i tried executing the job from the GUI, and after getting all
the data loaded, i cut the /Volumes/Clinical out of the input
directory box (leaving /data/lokke_test/tky_13dir) and it worked. so
the path is definitely the issue, not the GUI. so why wouldn't the
sge_aliases solve this?
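One hypothesis worth testing (an untested sketch, given that the error above shows the raw /Volumes path reaching the execution host): have the alias rewrite straight to the compute-node NFS mount instead of relying on the / symlinks:

```
# subm_dir            subm_host   exec_host   path_replacement
/Volumes/Clinical/    *           *           /nfs/Clinical/
```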
thanks for the help.
lokke