Hi Takanori,
It does not print a warning in the current version - maybe part of the same bug?
See below - it tells me 95k particles are present on the scratch disk, but the full stack is 145k, and when sufficient space is available it copies the whole stack as expected.
Cheers
Oli
log file snip:
+ On host ubuntu: free scratch space = 138 Gb.
Copying particles to scratch directory: /scratch/relion_volatile/
9.22/9.22 min ............................................................~~(,_,">
For optics_group 1, there are 95443 particles on the scratch disk.
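For what it's worth, here is a rough sketch of the kind of up-front check I had in mind - plain C++, not RELION's actual code, and the per-particle size and the scratch_stack/original_stack helpers are made up for illustration:

#include <sys/statvfs.h>
#include <cstdio>
#include <string>

int main()
{
    const std::string scratch_dir = "/scratch/relion_volatile/"; // from the log above
    const long n_particles = 145268;         // full stack size
    const long bytes_per_particle = 1 << 20; // assumed ~1 MB each; depends on box size

    // Ask the filesystem how much space an unprivileged process may use.
    struct statvfs vfs;
    if (statvfs(scratch_dir.c_str(), &vfs) != 0) {
        std::perror("statvfs");
        return 1;
    }
    const unsigned long long free_bytes =
        (unsigned long long)vfs.f_bavail * vfs.f_frsize;

    // If the full stack does not fit, say so up front and plan the fallback.
    long n_on_scratch = n_particles;
    const unsigned long long needed =
        (unsigned long long)n_particles * bytes_per_particle;
    if (needed > free_bytes) {
        n_on_scratch = (long)(free_bytes / bytes_per_particle);
        std::printf("Warning: only %ld of %ld particles fit on scratch; "
                    "the remaining %ld will be read from their original location.\n",
                    n_on_scratch, n_particles, n_particles - n_on_scratch);
    }

    // Later, when reading particle i, the fallback would be something like:
    //   fn = (i < n_on_scratch) ? scratch_stack(i) : original_stack(i);
    return 0;
}

Of course the free-space figure is only a snapshot and can still race with another process writing to scratch, as you say below, but at least the log would explain what is going on.
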
> On Oct 25, 2019, at 11:37 AM, Takanori Nakane <[log in to unmask]> wrote:
>
> Hi,
>
> It does print a warning:
>
>> Warning: scratch space full on XXX. Remaining YYY particles will be read from where they were.
>
> The bug in 3.1 is that it does not read from where they were, but crashes.
> I will investigate this next week.
>
> Best regards,
>
> Takanori Nakane
>
>> Hi Takanori,
>>
>> Thanks for the quick reply!
>>
>> Relion of course can’t do anything if another process starts writing - but
>> perhaps another solution might be to at least print a warning to the log
>> file, saying that there is not enough space on scratch for the entire
>> dataset and explaining what relion will do in this instance? This would
>> save some puzzlement, I think.
>>
>> Cheers
>> Oli
>>
>>> On Oct 25, 2019, at 10:56 AM, Takanori Nakane <[log in to unmask]> wrote:
>>>
>>> Hi,
>>>
>>> Indeed, this seems to be a bug introduced in RELION 3.1.
>>> (Another person at LMB reported this a few hours ago.)
>>>
>>> Up to 3.0, this "copy to scratch as much as possible and read
>>> remaining from disk" was working fine. I will investigate
>>> this next week.
>>>
>>>> The behavior in this instance I think should be to throw a fatal error if
>>>> there is not enough space to copy the particles, not to copy as many as
>>>> possible and then proceed anyway
>>>
>>> No. We could check how much space is left before copying, but what if
>>> another process starts writing too? And some people have HUGE datasets
>>> which cannot fit in a 300 GB scratch on our nodes.
>>>
>>> Best regards,
>>>
>>> Takanori Nakane
>>>
>>>> Hi,
>>>>
>>>> I have a run that completed successfully using “skip gridding”. It went to
>>>> rather high resolution, and I would now like to continue from the last
>>>> optimiser.star file with “skip gridding” off, to compare.
>>>>
>>>> I am running using the --scratch option, which worked fine during the
>>>> initial run. However, when I run using --continue, relion gives an error
>>>> after starting the first iteration (error appended below). The error
>>>> indicates that there are particles in the data that are not in the stack
>>>> on scratch. Indeed, when I look, I see that the stack on scratch has only
>>>> 95k particles, whereas there are 145k in the original stack.
>>>>
>>>> After puzzling over this for a while, I realized it was occurring because
>>>> there wasn’t enough space on scratch for the full stack - it is also used
>>>> as a scratch disk for cryoSPARC.
>>>>
>>>> The behavior in this instance I think should be to throw a fatal error if
>>>> there is not enough space to copy the particles, not to copy as many as
>>>> possible and then proceed anyway - would it be possible to alter this
>>>> behavior, so relion checks available disk space on scratch before getting
>>>> started?
>>>>
>>>> Cheers
>>>> Oli
>>>>
>>>> readMRC: Image number 145268 exceeds stack size 95443 of image
>>>> [log in to unmask]
>>>> in: /home/user/software/relion/src/rwMRC.h, line 191
>>>> ERROR:
>>>> readMRC: Image number 126510 exceeds stack size 95443 of image
>>>> [log in to unmask]
>>>> slave 6 encountered error: === Backtrace ===
>>>> /usr/local/bin/relion_refine_mpi(_ZN11RelionErrorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES7_l+0x77)
>>>> [0x5572cb364e27]
>>>> /usr/local/bin/relion_refine_mpi(_ZN5ImageIdE7readMRCElbRK8FileName+0x6ec)
>>>> [0x5572cb39ff8c]
>>>> /usr/local/bin/relion_refine_mpi(_ZN5ImageIdE5_readERK8FileNameR13fImageHandlerblbb+0x249)
>>>> [0x5572cb3a1889]
>>>> /usr/local/bin/relion_refine_mpi(_ZN11MlOptimiser24expectationSomeParticlesEll+0x438)
>>>> [0x5572cb51bd78]
>>>> /usr/local/bin/relion_refine_mpi(_ZN14MlOptimiserMpi11expectationEv+0x227e)
>>>> [0x5572cb3824ce]
>>>> /usr/local/bin/relion_refine_mpi(_ZN14MlOptimiserMpi7iterateEv+0xc8)
>>>> [0x5572cb390a68]
>>>> /usr/local/bin/relion_refine_mpi(main+0x67) [0x5572cb34e1b7]
>>>> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)
>>>> [0x7f7c53471b97]
>>>> /usr/local/bin/relion_refine_mpi(_start+0x2a) [0x5572cb350c9a]
>>>> ==================
>>>> ERROR:
>>>> readMRC: Image number 126510 exceeds stack size 95443 of image
>>>> [log in to unmask]
>>>> [ubuntu:43838] 5 more processes have sent help message help-mpi-api.txt / mpi-abort
########################################################################
To unsubscribe from the CCPEM list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCPEM&A=1