Hi,
Indeed, this seems to be a bug introduced in RELION 3.1.
(Another person at LMB reported this a few hours ago.)
Up to 3.0, this "copy as much as possible to scratch and read
the rest from disk" behaviour was working fine. I will investigate
this next week.
> I think the behavior in this instance should be to throw a fatal error if
> there is not enough space to copy the particles, rather than copying as
> many as possible and proceeding anyway
No. We could check how much space is left before copying, but what if
another process starts writing too? And some people have HUGE datasets
which cannot fit in the 300 GB scratch on our nodes.
Best regards,
Takanori Nakane
> Hi,
>
> I have a run that completed successfully using “skip gridding”. It went to
> rather high resolution, and I would now like to continue from the last
> optimiser.star file with “skip gridding” off, to compare.
>
> I am running using the --scratch option, which worked fine during the
> initial run. However, when I run using --continue, relion gives an error
> after starting the first iteration (error appended below). The error
> indicates that there are particles in the data that are not in the stack
> on scratch. Indeed, when I look, I see that the stack on scratch has only
> 95k particles, whereas there are 145k in the original stack.
>
> After puzzling over this for a while, I realized it was occurring because
> there wasn’t enough space on scratch for the full stack - it is also used
> as a scratch disk for cryosparc.
>
> I think the behavior in this instance should be to throw a fatal error if
> there is not enough space to copy the particles, rather than copying as
> many as possible and proceeding anyway - would it be possible to alter
> this behavior, so relion checks available disk space on scratch before
> getting started?
>
> Cheers
> Oli
>
> readMRC: Image number 145268 exceeds stack size 95443 of image
> [log in to unmask]
> in: /home/user/software/relion/src/rwMRC.h, line 191
> ERROR:
> readMRC: Image number 126510 exceeds stack size 95443 of image
> [log in to unmask]
> slave 6 encountered error: === Backtrace ===
> /usr/local/bin/relion_refine_mpi(_ZN11RelionErrorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES7_l+0x77)
> [0x5572cb364e27]
> /usr/local/bin/relion_refine_mpi(_ZN5ImageIdE7readMRCElbRK8FileName+0x6ec)
> [0x5572cb39ff8c]
> /usr/local/bin/relion_refine_mpi(_ZN5ImageIdE5_readERK8FileNameR13fImageHandlerblbb+0x249)
> [0x5572cb3a1889]
> /usr/local/bin/relion_refine_mpi(_ZN11MlOptimiser24expectationSomeParticlesEll+0x438)
> [0x5572cb51bd78]
> /usr/local/bin/relion_refine_mpi(_ZN14MlOptimiserMpi11expectationEv+0x227e)
> [0x5572cb3824ce]
> /usr/local/bin/relion_refine_mpi(_ZN14MlOptimiserMpi7iterateEv+0xc8)
> [0x5572cb390a68]
> /usr/local/bin/relion_refine_mpi(main+0x67) [0x5572cb34e1b7]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7f7c53471b97]
> /usr/local/bin/relion_refine_mpi(_start+0x2a) [0x5572cb350c9a]
> ==================
> ERROR:
> readMRC: Image number 126510 exceeds stack size 95443 of image
> [log in to unmask]
> [ubuntu:43838] 5 more processes have sent help message help-mpi-api.txt /
> mpi-abort
########################################################################
To unsubscribe from the CCPEM list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCPEM&A=1