Hi Takanori,
Thanks for the quick reply!
Relion of course can’t do anything if another process starts writing, but perhaps another solution would be to at least print a warning to the log file, saying that there is not enough space on scratch for the entire dataset and explaining what relion will do in this instance? I think this would save some puzzlement.
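As a rough sketch of the kind of check I mean (written in Python for brevity rather than RELION's C++; the names `scratch_warning`, `stack_paths` and `scratch_dir` are made up for illustration and are not anything in the RELION source), something along these lines would do. It only reports the situation at the moment of the check, so it would not catch another process filling scratch afterwards:

```python
import os
import shutil


def scratch_warning(stack_paths, scratch_dir, free_bytes=None):
    """Return a warning string if the stacks will not all fit on scratch, else None.

    free_bytes can be passed explicitly (e.g. for testing); otherwise it is
    queried from the filesystem that holds scratch_dir.
    """
    # Total size of the particle stacks that would be copied.
    needed = sum(os.path.getsize(p) for p in stack_paths)
    if free_bytes is None:
        free_bytes = shutil.disk_usage(scratch_dir).free
    if needed <= free_bytes:
        return None
    gib = 1024 ** 3
    return (
        f"WARNING: particle stacks need {needed / gib:.1f} GiB but only "
        f"{free_bytes / gib:.1f} GiB is free on {scratch_dir}. "
        "Only part of the dataset will be copied to scratch; "
        "the remaining particles will be read from the original location."
    )
```

The returned message could simply be written to the log before copying starts, without stopping the run.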
Cheers
Oli
> On Oct 25, 2019, at 10:56 AM, Takanori Nakane <[log in to unmask]> wrote:
>
> Hi,
>
> Indeed, this seems to be a bug introduced in RELION 3.1.
> (Another person at LMB reported this a few hours ago)
>
> Up to 3.0, this "copy to scratch as much as possible and read
> remaining from disk" was working fine. I will investigate
> this next week.
>
>> The behavior in this instance I think should be to throw a fatal error if
>> there is not enough space to copy the particles, not to copy as many as
>> possible and then proceed anyway
>
> No. We could check how much space is left before copying, but what if
> another process starts writing too? And some people have HUGE datasets
> which cannot fit in a 300 GB scratch on our nodes.
>
> Best regards,
>
> Takanori Nakane
>
>> Hi,
>>
>> I have a run that completed successfully using “skip gridding”. It went to
>> rather high resolution, and I would now like to continue from the last
>> optimiser.star file with “skip gridding” off, to compare.
>>
>> I am running using the --scratch option, which worked fine during the
>> initial run. However, when I run using --continue, relion gives an error
>> after starting the first iteration (error appended below). The error
>> indicates that there are particles in the data that are not in the stack
>> on scratch. Indeed, when I look, I see that the stack on scratch has only
>> 95k particles, whereas there are 145k in the original stack.
>>
>> After puzzling over this for a while, I realized it was occurring because
>> there wasn’t enough space on scratch for the full stack - it is also used as
>> a scratch disk for cryosparc.
>>
>> The behavior in this instance I think should be to throw a fatal error if
>> there is not enough space to copy the particles, not to copy as many as
>> possible and then proceed anyway - would it be possible to alter this
>> behavior, so relion checks available disk space on scratch before getting
>> started?
>>
>> Cheers
>> Oli
>>
>> readMRC: Image number 145268 exceeds stack size 95443 of image
>> [log in to unmask]
>> in: /home/user/software/relion/src/rwMRC.h, line 191
>> ERROR:
>> readMRC: Image number 126510 exceeds stack size 95443 of image
>> [log in to unmask]
>> slave 6 encountered error: === Backtrace ===
>> /usr/local/bin/relion_refine_mpi(_ZN11RelionErrorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES7_l+0x77)
>> [0x5572cb364e27]
>> /usr/local/bin/relion_refine_mpi(_ZN5ImageIdE7readMRCElbRK8FileName+0x6ec)
>> [0x5572cb39ff8c]
>> /usr/local/bin/relion_refine_mpi(_ZN5ImageIdE5_readERK8FileNameR13fImageHandlerblbb+0x249)
>> [0x5572cb3a1889]
>> /usr/local/bin/relion_refine_mpi(_ZN11MlOptimiser24expectationSomeParticlesEll+0x438)
>> [0x5572cb51bd78]
>> /usr/local/bin/relion_refine_mpi(_ZN14MlOptimiserMpi11expectationEv+0x227e)
>> [0x5572cb3824ce]
>> /usr/local/bin/relion_refine_mpi(_ZN14MlOptimiserMpi7iterateEv+0xc8)
>> [0x5572cb390a68]
>> /usr/local/bin/relion_refine_mpi(main+0x67) [0x5572cb34e1b7]
>> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7f7c53471b97]
>> /usr/local/bin/relion_refine_mpi(_start+0x2a) [0x5572cb350c9a]
>> ==================
>> ERROR:
>> readMRC: Image number 126510 exceeds stack size 95443 of image
>> [log in to unmask]
>> [ubuntu:43838] 5 more processes have sent help message help-mpi-api.txt /
>> mpi-abort
>> ########################################################################
>>
>> To unsubscribe from the CCPEM list, click the following link:
>> https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCPEM&A=1
>>
>
>