Hi,
> 3.0-icc2020-altcpu-symm
Please use version 3.1; it is more stable than 3.0.
Takanori
On 2020/12/14 11:16, Franken, Linda wrote:
> Dear all,
> My Refinement job gives the error message "Zero sum of weights" after only 4 iterations.
> Looking through the archive, I found that this could be due to group size, but my groups are already pretty big:
>
> data_model_groups
>
> loop_
> _rlnGroupNumber #1
> _rlnGroupName #2
> _rlnGroupNrParticles #3
> _rlnGroupScaleCorrection #4
> 1 group_003 2851 0.864271
> 2 group_001 6884 0.775772
> 3 group_005 2839 1.095779
> 4 group_004 5747 1.060882
> 5 group_002 5085 0.870342
> 6 group_007 2879 1.077545
> 7 group_006 1684 1.085275
> 8 group_008 2799 1.293449
> 9 group_009 1666 1.468089
>
>
> data_model_group_1
>
> loop_
> _rlnSpectralIndex #1
> _rlnResolution #2
> _rlnSigma2Noise #3
> 0 0.000000 9.997955e-04
> 1 7.167431e-04 2.544455e-04
> 2 0.001433 5.465705e-05
> 3 0.002150 1.119792e-05
> 4 0.002867 6.750597e-06
>
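[Editorial aside: the `_rlnResolution` column in a model.star is the spatial frequency of each Fourier shell in 1/Å, i.e. `freq(i) = i / (box_size × pixel_size)`. Neither box size nor pixel size is stated in this thread, but their product can be recovered from the first non-zero row of the table above — a hedged consistency check:]

```python
# Hedged sketch: check that the _rlnResolution column follows
#   freq(i) = i / (box_size * pixel_size)
# The product box_size * pixel_size is inferred from row 1 of the
# quoted table; it is not stated anywhere in the thread.

def frequency(index, box_times_apix):
    """Spatial frequency (1/Angstrom) of Fourier shell `index`."""
    return index / box_times_apix

# Rows quoted above: spectral index -> _rlnResolution
table = {1: 7.167431e-04, 2: 0.001433, 3: 0.002150, 4: 0.002867}
box_times_apix = 1 / table[1]  # ~1395 Angstrom for this dataset
for i, freq in table.items():
    assert abs(frequency(i, box_times_apix) - freq) < 1e-5
```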
> Does anybody have an idea what the issue could be? If it is indeed a grouping issue, does anybody know of a way to regroup other than doing it from a
> model.star with subset selection? I don't have a model.star for this dataset yet, as I partially cleaned it manually. Ideally I'd also like to continue
> the job rather than recalculate it, since these first iterations were really slow due to the particle size. I tried simply renaming several groups to
> new groups in the _data.star, but that doesn't alter the GroupScaleCorrection or any other values, and it didn't change the error message. Thank you
> for your time.
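[Editorial aside: one way the regrouping step can be sketched is by rewriting the `_rlnGroupName` column of the `_data.star`, merging any group below a particle-count threshold into a single pooled group. This is a hypothetical sketch only — the function name, threshold, and `group_pooled` label are illustrative, and the scale corrections for renamed groups would be re-estimated by the refinement itself, not carried over:]

```python
# Hypothetical sketch: pool undersized RELION groups by renaming them
# in the per-particle table of a data.star. Names and the threshold
# are illustrative, not taken from the original thread.

from collections import Counter

def pool_small_groups(rows, min_particles=3000):
    """rows: list of dicts with a '_rlnGroupName' key (one per particle)."""
    counts = Counter(r["_rlnGroupName"] for r in rows)
    small = {g for g, n in counts.items() if n < min_particles}
    merged = []
    for r in rows:
        r = dict(r)  # do not mutate the caller's rows
        if r["_rlnGroupName"] in small:
            r["_rlnGroupName"] = "group_pooled"
        merged.append(r)
    return merged

# Toy example: group_b (2 particles) falls below a threshold of 3.
particles = [{"_rlnGroupName": g} for g in ["group_a"] * 4 + ["group_b"] * 2]
pooled = pool_small_groups(particles, min_particles=3)
assert {r["_rlnGroupName"] for r in pooled} == {"group_a", "group_pooled"}
```

If this matches how RELION treats edited group names, a fresh run (rather than a continue) may be needed so the new groups get fresh scale and noise estimates; treat this as a sketch, not the canonical fix.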
>
> Best wishes,
> Linda
>
>
> Here is the full error:
>
> exp_thisparticle_sumweight= 0
> part_id= 30286
> ipart= 0
> group_id= 5 mymodel.scale_correction[group_id]= 1.08855
> exp_ipass= 0
> sampling.NrDirections(0, true)= 49152 sampling.NrDirections(0, false)= 29
> sampling.NrPsiSamplings(0, true)= 384 sampling.NrPsiSamplings(0, false)= 6
> mymodel.sigma2_noise[group_id]=
> 0.00088 0.00032 5e-05 9.5e-06 5.6e-06 3.4e-06 2.1e-06 1.6e-06 1.5e-06 1.3e-06
> 1.2e-06 1.2e-06 1.2e-06 1.2e-06 1.2e-06 1.2e-06 1.4e-06 1.4e-06 1.4e-06 1.4e-06
> 1.5e-06 1.6e-06 1.6e-06 1.7e-06 1.8e-06 1.9e-06 2.1e-06 2.2e-06 2.3e-06 2.4e-06
> 2.5e-06 2.6e-06 2.6e-06 2.7e-06 2.8e-06 2.8e-06 2.9e-06 2.9e-06 3e-06 3.1e-06
> 3.2e-06 3.2e-06 3.3e-06 3.3e-06 3.3e-06 3.3e-06 3.3e-06 3.2e-06 3.1e-06 3e-06
> 2.9e-06 2.7e-06 2.5e-06 2.3e-06 2.1e-06 1.9e-06 1.8e-06 1.6e-06 1.5e-06 1.4e-06
> 1.3e-06 1.2e-06 1.1e-06 1.1e-06 1.1e-06 1.1e-06 1.1e-06 1.1e-06 1.1e-06 1.1e-06
> 1.1e-06 1.1e-06 1.1e-06 1.1e-06 1.1e-06 1.1e-06 1.1e-06 1.1e-06 1.1e-06 1e-06
> 1e-06 9.9e-07 9.5e-07 9.2e-07 8.9e-07 8.7e-07 8.4e-07 8.2e-07 8e-07 7.9e-07
> 7.9e-07 7.8e-07 7.9e-07 8e-07 8.1e-07 8.1e-07 8.2e-07 8.3e-07 8.3e-07 8.3e-07
> 8.3e-07 8.3e-07 8.2e-07 8.1e-07 7.9e-07 7.8e-07 7.7e-07 7.6e-07 7.5e-07 7.5e-07
> 7.5e-07 7.5e-07 7.5e-07 7.5e-07 7.6e-07 7.6e-07 7.7e-07 7.7e-07 7.7e-07 7.6e-07
> 7.5e-07 7.4e-07 7.3e-07 7.3e-07 7.3e-07 7.2e-07 7.2e-07 7.1e-07 7.1e-07 7.1e-07
> 7.1e-07 7.1e-07 7.2e-07 7.1e-07 7.1e-07 7.1e-07 7e-07 6.9e-07 6.9e-07 6.8e-07
> 6.8e-07 6.7e-07 6.7e-07 6.6e-07 6.6e-07 6.6e-07 6.5e-07 6.5e-07 6.5e-07 6.4e-07
> 6.4e-07 6.4e-07 6.3e-07 6.3e-07 6.2e-07 6.2e-07 6.1e-07 6.1e-07 6e-07 6e-07
> 6e-07 6e-07 6e-07 6e-07 5.9e-07 5.9e-07 5.9e-07 5.8e-07 5.8e-07 5.7e-07
> 5.7e-07 5.7e-07 5.7e-07 5.6e-07 5.6e-07 5.6e-07 5.6e-07 5.6e-07 5.5e-07 5.5e-07
> 5.5e-07 5.5e-07 5.5e-07 5.4e-07 5.4e-07 5.4e-07 5.3e-07 5.3e-07 5.3e-07 5.3e-07
> 5.3e-07 5.2e-07 5.2e-07 5.2e-07 5.2e-07 5.2e-07 5.1e-07 5.1e-07 5.1e-07 5.1e-07
> 5.1e-07 5.1e-07 5.1e-07 5.1e-07 5.1e-07 5e-07 5e-07 5e-07 5e-07 5e-07
> 5e-07 5e-07 5e-07 5e-07 5e-07 4.9e-07 4.9e-07 4.9e-07 4.9e-07 4.9e-07
> 4.9e-07 4.9e-07 4.9e-07 4.9e-07 4.8e-07 4.8e-07 4.8e-07 4.8e-07 4.8e-07 4.8e-07
> 4.8e-07 4.8e-07 4.8e-07 4.8e-07 4.8e-07 4.8e-07 4.8e-07 4.8e-07 4.8e-07 4.8e-07
> 4.8e-07 4.8e-07 4.7e-07 4.7e-07 4.7e-07 4.7e-07 4.7e-07 4.7e-07 4.7e-07 4.7e-07
> 4.7e-07 4.7e-07 4.7e-07 4.7e-07 4.7e-07 4.7e-07 4.7e-07 4.7e-07 4.7e-07 4.7e-07
> 4.7e-07 4.7e-07 4.7e-07 4.6e-07 4.6e-07 4.6e-07 4.6e-07 4.6e-07 4.6e-07 4.6e-07
> 4.6e-07 4.6e-07 4.6e-07 4.6e-07 4.6e-07 4.6e-07 4.5e-07 4.5e-07 4.5e-07 4.5e-07
> 4.5e-07 4.5e-07 4.5e-07 4.5e-07 4.5e-07 4.5e-07 4.5e-07 4.5e-07 4.5e-07 4.5e-07
> 4.5e-07 4.5e-07 4.4e-07 4.4e-07 4.4e-07 4.4e-07 4.4e-07 4.4e-07 4.4e-07 4.4e-07
> 4.4e-07 4.4e-07 4.4e-07 4.4e-07 4.3e-07 4.3e-07 4.3e-07 4.3e-07 4.3e-07 4.3e-07
> 4.3e-07 4.3e-07 4.3e-07 4.2e-07 4.2e-07 4.2e-07 4.2e-07 4.2e-07 4.2e-07 4.2e-07
> 4.1e-07
>
> mymodel.avg_norm_correction= 0.634436
> wsum_model.avg_norm_correction= 100.426
> written out Mweight.spi
> exp_thisparticle_sumweight= 0
> exp_min_diff2[ipart]= 64383.6
> in: /beegfs/cssb/software/em/relion/3.0-icc2020-altcpu-symm/src/ml_optimiser.cpp, line 6871
> in: /beegfs/cssb/software/em/relion/3.0-icc2020-altcpu-symm/src/ml_optimiser.cpp, line 6871
> slave 7 encountered error: === Backtrace ===
> /beegfs/cssb/software/em/relion/3.0-icc2020-altcpu-symm/bin/relion_refine_mpi(_ZN11RelionErrorC1ERKSsS1_l+0x48) [0x531438]
> /beegfs/cssb/software/em/relion/3.0-icc2020-altcpu-symm/bin/relion_refine_mpi(_Z36globalThreadExpectationSomeParticlesR14ThreadArgument+0xde) [0x580f1e]
> /beegfs/cssb/software/em/relion/3.0-icc2020-altcpu-symm/bin/relion_refine_mpi(_Z11_threadMainPv+0x70) [0x67c350]
> /lib64/libpthread.so.0(+0x7ea5) [0x2b0a3a17fea5]
> /lib64/libc.so.6(clone+0x6d) [0x2b0a3a4928dd]
> ==================
> ERROR:
> ERROR!!! zero sum of weights....
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 7 in communicator MPI_COMM_WORLD
> with errorcode 1.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
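[Editorial aside: a plausible mechanism for this error, given `exp_min_diff2 ≈ 64384` in the dump above. Orientation weights behave like `exp(-diff2)`; if the exponent is not shifted by the minimum diff2 before exponentiating, every term underflows to exactly 0.0 in double precision and the weight sum is zero. A minimal, hypothetical illustration — not RELION's actual code path:]

```python
# Hypothetical illustration (not RELION's actual code): with diff2
# values near 64384, naive exp(-diff2) underflows to 0.0; shifting
# by the minimum diff2 first keeps the best match at weight 1.

import math

def naive_weights(diff2s):
    return [math.exp(-d) for d in diff2s]

def shifted_weights(diff2s):
    m = min(diff2s)  # analogous to the exp_min_diff2 printed above
    return [math.exp(m - d) for d in diff2s]

diff2s = [64383.6, 64390.0, 64400.0]
assert sum(naive_weights(diff2s)) == 0.0      # "zero sum of weights"
assert sum(shifted_weights(diff2s)) >= 1.0    # best match keeps weight 1
```

If even the shifted sum is zero in practice, the diff2 values themselves are pathological — e.g. from collapsing sigma2_noise estimates like the tiny values above — which is why bad grouping or normalization, rather than underflow alone, is the usual suspect for this error.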
>
>
> ------------------------------------------------------------------------------------------------------------------------------------------------------
>
> To unsubscribe from the CCPEM list, click the following link:
> https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCPEM&A=1
>