Hi,

Here at CCIN2P3, we had the same problem.
124 LHCb jobs were stuck from last Saturday until yesterday afternoon.
The French LHCb guys said the jobs were sleeping because a server at CERN was down.
We requeued all the sleeping jobs and they are now running.

Cheers,

David.

Dimitris Zilaskos wrote:
       Hello,

       I have a number of LHCb jobs sitting in my queue. They have been
sitting in that exact state for more than 12 hours (the Time Use counter
is not increasing, and the process that was eating CPU appears to have
completed its task). They appear to be waiting for something (user
intervention?).
       There were some similar jobs 3-4 days ago that exhibited the same
behaviour, but after around another 12 hours the jobs exited
successfully. I have mailed Ricardo Graciani, who appears to have
submitted those jobs, but I got no response. If someone knows what is
going on, please let me know, because our queues have been full for days
and no other jobs can run (i.e. the job submission tests).

Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
8.node001        STDIN            lhcb001          27:01:05 R infinite
9.node001        STDIN            lhcb001          27:37:44 R infinite
10.node001       STDIN            lhcb001          27:08:25 R infinite
11.node001       STDIN            lhcb001          27:33:07 R infinite
12.node001       STDIN            lhcb001          25:59:44 R infinite
13.node001       STDIN            lhcb001          26:29:33 R infinite
14.node001       STDIN            lhcb001          27:52:40 R infinite
16.node001       STDIN            lhcb001          27:13:36 R infinite
17.node001       STDIN            lhcb001                 0 Q infinite
18.node001       STDIN            lhcb001                 0 Q infinite
19.node001       STDIN            lhcb001                 0 Q infinite
20.node001       STDIN            lhcb001                 0 Q infinite
21.node001       STDIN            lhcb001                 0 Q infinite
23.node001       STDIN            dteam004                0 Q short
(...)
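
The symptom described above (jobs in state R whose Time Use counter stops increasing) can be checked mechanically by comparing two saved copies of the listing taken some time apart. The following is only a minimal sketch: the function names `parse_qstat` and `stalled_jobs` are hypothetical, and it assumes the six-column table layout shown in the listing above. A flat counter can also mean a job is blocked waiting on a remote service, so a match is a hint rather than proof that something is wrong.

```python
def parse_qstat(text):
    """Return {job_id: (time_use, state)} parsed from a qstat-style table.

    Assumes the six-column layout shown in the listing above:
    job id, name, user, time use, state, queue.
    """
    jobs = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the header (more than six words), the dashed separator,
        # blank lines, and truncation markers such as "(...)".
        if len(fields) != 6 or fields[0].startswith("-"):
            continue
        job_id, _name, _user, time_use, state, _queue = fields
        jobs[job_id] = (time_use, state)
    return jobs


def stalled_jobs(earlier, later):
    """List job ids that are in state R in both snapshots with the same Time Use."""
    before = parse_qstat(earlier)
    after = parse_qstat(later)
    return [
        job_id
        for job_id, (time_use, state) in after.items()
        if state == "R" and before.get(job_id) == (time_use, "R")
    ]
```

To use it, save the listing to a file twice, for example an hour apart, read both files into strings, and pass them to `stalled_jobs(earlier, later)`. Queued jobs (state Q) are ignored, since their counter is expected to stay at 0.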


Best regards,
--
============================================================================

Dimitris Zilaskos

Department of Physics @ Aristotle University of Thessaloniki, Greece
PGP key : http://tassadar.physics.auth.gr/~dzila/pgp_public_key.asc
          http://egnatia.ee.auth.gr/~dzila/pgp_public_key.asc
MD5sum  : de2bd8f73d545f0e4caf3096894ad83f  pgp_public_key.asc
============================================================================

--
David BOUVET
Applications Support Coordinator - EGEE Project team
IN2P3/CNRS Computing Centre - Lyon (FRANCE)
http://grid.in2p3.fr
Tel.: +33 4 72 69 41 62 | Fax: +33 4 72 69 41 70 | e-mail: [log in to unmask]