Hi Sjors,
Yes, it is.
with kind regards
Mani.
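
A minimal sketch of a pre-run check for the full-disc situation discussed in this thread: it reports whether the filesystem that holds the output prefix still has a given amount of free space. The function name `has_free_space` and the 50 GB threshold are hypothetical, chosen only for illustration; how much space the temporary files actually need depends on the job and is not stated in this thread.

```python
import os
import shutil


def has_free_space(output_prefix, min_free_bytes):
    """Return True if the filesystem holding output_prefix has at least
    min_free_bytes available. (Hypothetical helper, not part of RELION.)"""
    directory = os.path.dirname(os.path.abspath(output_prefix))
    # The output directory may not exist yet; walk up to the nearest
    # existing parent so disk_usage has a valid path to inspect.
    while not os.path.isdir(directory):
        parent = os.path.dirname(directory)
        if parent == directory:
            break
        directory = parent
    return shutil.disk_usage(directory).free >= min_free_bytes


if __name__ == "__main__":
    prefix = "Refine3D/140312_2/run1"   # output prefix from the command below
    threshold = 50 * 1024**3            # assumed threshold: 50 GB
    if has_free_space(prefix, threshold):
        print("Enough free space for", prefix)
    else:
        print("Low disc space for", prefix, "- free some space before running")
```

Running this on the node that writes the output before launching mpirun would flag a nearly full disc early, rather than after many hours of refinement.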
On 2014-04-01 13:32, Sjors Scheres wrote:
> Hi Mani,
> Sounds like you filled your disc with the temporary files that are
> written out at the end of each iteration.
> HTH, S
>
> On 04/01/2014 12:16 PM, Manikandan KARUPPASAMY wrote:
>> Hi all,
>>
>> We are getting the following error during a relion_auto_refine run. It
>> happens only after the 8th iteration.
>>
>> The command used:
>> $ mpirun -np 188 -machinefile /user/mbotte/.openmpi-farm-hostfile
>> `which relion_refine_mpi` --o Refine3D/140312_2/run1 --i
>> relion/140303_input/all_images_ori.star --particle_diameter 180
>> --angpix 1.36 --ref relion/Refine3D/140305/run1_class001.mrc
>> --flatten_solvent --sym C1 --oversampling 1 --auto_refine
>> --split_random_halves --low_resol_join_halves 40 --healpix_order 3
>> --offset_range 10 --offset_step 2 --auto_local_healpix_order 5 --norm
>> --scale --j 1
>>
>> -------------------
>> Expectation iteration 8
>> 28.61/28.61 hrs
>> ............................................................~~(,_,">
>> MultidimArray::write: File Refine3D/140312_2/run1_rank000187.tmp
>> cannot be opened for output
>> File: ./src/multidim_array.h line: 3945
>>
>> --------------------------------------------------------------------------
>>
>> MPI_ABORT was invoked on rank 187 in communicator MPI_COMM_WORLD
>> with errorcode 1.
>>
>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>> You may or may not see output from other processes, depending on
>> exactly when Open MPI kills them.
>>
>> --------------------------------------------------------------------------
>>
>>
>> [sky50][[24417,1],123][../../../../../../ompi/mca/btl/tcp/btl_tcp_frag.c:216:mca_btl_tcp_frag_recv]
>> mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
>>
>> [sky57][[24417,1],171][../../../../../../ompi/mca/btl/tcp/btl_tcp_frag.c:216:mca_btl_tcp_frag_recv]
>> mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
>>
>> [sky56][[24417,1],155][../../../../../../ompi/mca/btl/tcp/btl_tcp_frag.c:216:mca_btl_tcp_frag_recv]
>> mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
>>
>> --------------------------------------------------------------------------
>>
>> mpirun has exited due to process rank 187 with PID 11357 on
>> node sky58 exiting without calling "finalize". This may
>> have caused other processes in the application to be
>> terminated by signals sent by mpirun (as reported here).
>>
>> --------------------------------------------------------------------------
>>
>> ---------------------
>>
>> Please help us to fix this.
>>
>> with kind regards
>>
>> Mani.