Here's more from biomed. When I asked why there were still problems he
also said "Yes, since this week-end the SE clrauvergridse01 is not
responding and I am currently canceling all the jobs that are running."
Stephen
-------- Original Message --------
Subject: Re: Problems with inefficient biomed jobs
Date: Mon, 25 Feb 2008 10:43:12 +0100
From: Vincent Bloch <[log in to unmask]>
Dear Stephen,
Sorry for the late reply.
As Johan explained, we are using pilot jobs that need to query a web
service and an AMGA database to get the proper inputs.
There are 3 critical processes in the job script that can lead to idle
jobs:
- the query to the web service to get a task number
- the query to the AMGA database to get the grid location of the inputs
- the retrieval of the inputs from the SE
If the web service is unable to give a task to the job, the job will
retry a few times and then stop. This should not lead to a long idle
time.
The query to the AMGA database and the retrieval from the SE can be
more problematic because if for any reason one of these services
doesn't work, the job will ask for a new task until it gets killed.
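The flow described above can be sketched roughly as follows. This is an
illustration only, not the actual biomed job script: the function names
(get_task, locate_inputs, fetch_inputs), the retry count and the
max_tasks cap are all hypothetical. The cap in particular is an assumed
mitigation that makes the job exit after a bounded number of failed
tasks instead of idling until it is killed.

```python
import time


class StepFailed(Exception):
    """Raised when a step still fails after all its retries."""


def with_retries(step, attempts=3, delay=0.0):
    """Call step() up to `attempts` times; raise StepFailed if all fail."""
    for i in range(attempts):
        try:
            return step()
        except Exception:
            if i < attempts - 1:
                time.sleep(delay)  # wait before the next attempt
    raise StepFailed("step failed after %d attempts" % attempts)


def run_pilot(get_task, locate_inputs, fetch_inputs, max_tasks=5):
    """Run one pilot job; return the fetched inputs, or None if it gave up.

    get_task      -- hypothetical call to the web service (task number)
    locate_inputs -- hypothetical AMGA query (grid location of the inputs)
    fetch_inputs  -- hypothetical retrieval of the inputs from the SE
    """
    for _ in range(max_tasks):
        try:
            task = with_retries(get_task)
        except StepFailed:
            return None  # web service unavailable: stop quickly
        try:
            location = with_retries(lambda: locate_inputs(task))
            return with_retries(lambda: fetch_inputs(location))
        except StepFailed:
            continue  # AMGA or SE failed: ask for a new task
    return None  # assumed cap reached: exit instead of idling
```

With an SE that never responds, run_pilot asks for max_tasks tasks and
then returns None, so the time spent on the worker node is bounded by
the retry and task limits.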
We had some problems around February 8th with an SE that was not
responding. As far as I know this could explain the idle time of the
jobs since the other services were working correctly.
If this is the case then the problem is more a general grid problem
rather than an issue with the way we use the grid.
I hope this explains the problem.
Best regards
Vincent
--
Dr. Stephen Childs,
Research Fellow, EGEE Project, phone: +353-1-8961797
Computer Architecture Group, email: Stephen.Childs @ cs.tcd.ie
Trinity College Dublin, Ireland web: http://www.cs.tcd.ie/Stephen.Childs