Hi,
> 3.0-icc2020-altcpu-symm
Please use version 3.1; it is more stable than 3.0.
Takanori
On 2020/12/14 11:16, Franken, Linda wrote:
> Dear all,
> My Refinement job gives the error message "Zero sum of weights" after only 4 iterations.
> Looking through the archive, I found that this could be due to group size, but my groups are already pretty big:
>
> data_model_groups
>
> loop_
> _rlnGroupNumber #1
> _rlnGroupName #2
> _rlnGroupNrParticles #3
> _rlnGroupScaleCorrection #4
> 1 group_003 2851 0.864271
> 2 group_001 6884 0.775772
> 3 group_005 2839 1.095779
> 4 group_004 5747 1.060882
> 5 group_002 5085 0.870342
> 6 group_007 2879 1.077545
> 7 group_006 1684 1.085275
> 8 group_008 2799 1.293449
> 9 group_009 1666 1.468089
>
>
> data_model_group_1
>
> loop_
> _rlnSpectralIndex #1
> _rlnResolution #2
> _rlnSigma2Noise #3
> 0 0.000000 9.997955e-04
> 1 7.167431e-04 2.544455e-04
> 2 0.001433 5.465705e-05
> 3 0.002150 1.119792e-05
> 4 0.002867 6.750597e-06
>
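[Editorial aside: the `_rlnResolution` column in a model.star is the spatial frequency of each Fourier shell in 1/Å, i.e. `freq(i) = i / (box_size × pixel_size)`. Neither box size nor pixel size is stated in this thread, but their product can be recovered from the first non-zero row of the table above — a hedged consistency check:]

```python
# Hedged sketch: check that the _rlnResolution column follows
#   freq(i) = i / (box_size * pixel_size)
# The product box_size * pixel_size is inferred from row 1 of the
# quoted table; it is not stated anywhere in the thread.

def frequency(index, box_times_apix):
    """Spatial frequency (1/Angstrom) of Fourier shell `index`."""
    return index / box_times_apix

# Rows quoted above: spectral index -> _rlnResolution
table = {1: 7.167431e-04, 2: 0.001433, 3: 0.002150, 4: 0.002867}
box_times_apix = 1 / table[1]  # ~1395 Angstrom for this dataset
for i, freq in table.items():
    assert abs(frequency(i, box_times_apix) - freq) < 1e-5
```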
> Does anybody have an idea what the issue could be? If it is indeed a grouping issue, does anybody know of a way to regroup other than doing it from a
> model.star with subset selection? I don't have a model.star for this dataset yet, as I partially cleaned it manually. Ideally I'd also like to continue
> the job rather than recalculate it, since these first iterations were really slow due to the particle size. I tried simply renaming several groups to
> new groups in the _data.star, but that doesn't alter the GroupScaleCorrection or any other values, and it didn't change the error message. Thank you
> for your time.
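[Editorial aside: one way the regrouping step can be sketched is by rewriting the `_rlnGroupName` column of the `_data.star`, merging any group below a particle-count threshold into a single pooled group. This is a hypothetical sketch only — the function name, threshold, and `group_pooled` label are illustrative, and the scale corrections for renamed groups would be re-estimated by the refinement itself, not carried over:]

```python
# Hypothetical sketch: pool undersized RELION groups by renaming them
# in the per-particle table of a data.star. Names and the threshold
# are illustrative, not taken from the original thread.

from collections import Counter

def pool_small_groups(rows, min_particles=3000):
    """rows: list of dicts with a '_rlnGroupName' key (one per particle)."""
    counts = Counter(r["_rlnGroupName"] for r in rows)
    small = {g for g, n in counts.items() if n < min_particles}
    merged = []
    for r in rows:
        r = dict(r)  # do not mutate the caller's rows
        if r["_rlnGroupName"] in small:
            r["_rlnGroupName"] = "group_pooled"
        merged.append(r)
    return merged

# Toy example: group_b (2 particles) falls below a threshold of 3.
particles = [{"_rlnGroupName": g} for g in ["group_a"] * 4 + ["group_b"] * 2]
pooled = pool_small_groups(particles, min_particles=3)
assert {r["_rlnGroupName"] for r in pooled} == {"group_a", "group_pooled"}
```

If this matches how RELION treats edited group names, a fresh run (rather than a continue) may be needed so the new groups get fresh scale and noise estimates; treat this as a sketch, not the canonical fix.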
>
> Best wishes,
> Linda
>
>
> Here is the full error:
>
> exp_thisparticle_sumweight= 0
> part_id= 30286
> ipart= 0
> group_id= 5 mymodel.scale_correction[group_id]= 1.08855
> exp_ipass= 0
> sampling.NrDirections(0, true)= 49152 sampling.NrDirections(0, false)= 29
> sampling.NrPsiSamplings(0, true)= 384 sampling.NrPsiSamplings(0, false)= 6
> mymodel.sigma2_noise[group_id]=
> 0.00088 0.00032 5e-05 9.5e-06 5.6e-06 3.4e-06 2.1e-06 1.6e-06 1.5e-06 1.3e-06
> 1.2e-06 1.2e-06 1.2e-06 1.2e-06 1.2e-06 1.2e-06 1.4e-06 1.4e-06 1.4e-06 1.4e-06
> 1.5e-06 1.6e-06 1.6e-06 1.7e-06 1.8e-06 1.9e-06 2.1e-06 2.2e-06 2.3e-06 2.4e-06
> 2.5e-06 2.6e-06 2.6e-06 2.7e-06 2.8e-06 2.8e-06 2.9e-06 2.9e-06 3e-06 3.1e-06
> 3.2e-06 3.2e-06 3.3e-06 3.3e-06 3.3e-06 3.3e-06 3.3e-06 3.2e-06 3.1e-06 3e-06
> 2.9e-06 2.7e-06 2.5e-06 2.3e-06 2.1e-06 1.9e-06 1.8e-06 1.6e-06 1.5e-06 1.4e-06
> 1.3e-06 1.2e-06 1.1e-06 1.1e-06 1.1e-06 1.1e-06 1.1e-06 1.1e-06 1.1e-06 1.1e-06
> 1.1e-06 1.1e-06 1.1e-06 1.1e-06 1.1e-06 1.1e-06 1.1e-06 1.1e-06 1.1e-06 1e-06
> 1e-06 9.9e-07 9.5e-07 9.2e-07 8.9e-07 8.7e-07 8.4e-07 8.2e-07 8e-07 7.9e-07
> 7.9e-07 7.8e-07 7.9e-07 8e-07 8.1e-07 8.1e-07 8.2e-07 8.3e-07 8.3e-07 8.3e-07
> 8.3e-07 8.3e-07 8.2e-07 8.1e-07 7.9e-07 7.8e-07 7.7e-07 7.6e-07 7.5e-07 7.5e-07
> 7.5e-07 7.5e-07 7.5e-07 7.5e-07 7.6e-07 7.6e-07 7.7e-07 7.7e-07 7.7e-07 7.6e-07
> 7.5e-07 7.4e-07 7.3e-07 7.3e-07 7.3e-07 7.2e-07 7.2e-07 7.1e-07 7.1e-07 7.1e-07
> 7.1e-07 7.1e-07 7.2e-07 7.1e-07 7.1e-07 7.1e-07 7e-07 6.9e-07 6.9e-07 6.8e-07
> 6.8e-07 6.7e-07 6.7e-07 6.6e-07 6.6e-07 6.6e-07 6.5e-07 6.5e-07 6.5e-07 6.4e-07
> 6.4e-07 6.4e-07 6.3e-07 6.3e-07 6.2e-07 6.2e-07 6.1e-07 6.1e-07 6e-07 6e-07
> 6e-07 6e-07 6e-07 6e-07 5.9e-07 5.9e-07 5.9e-07 5.8e-07 5.8e-07 5.7e-07
> 5.7e-07 5.7e-07 5.7e-07 5.6e-07 5.6e-07 5.6e-07 5.6e-07 5.6e-07 5.5e-07 5.5e-07
> 5.5e-07 5.5e-07 5.5e-07 5.4e-07 5.4e-07 5.4e-07 5.3e-07 5.3e-07 5.3e-07 5.3e-07
> 5.3e-07 5.2e-07 5.2e-07 5.2e-07 5.2e-07 5.2e-07 5.1e-07 5.1e-07 5.1e-07 5.1e-07
> 5.1e-07 5.1e-07 5.1e-07 5.1e-07 5.1e-07 5e-07 5e-07 5e-07 5e-07 5e-07
> 5e-07 5e-07 5e-07 5e-07 5e-07 4.9e-07 4.9e-07 4.9e-07 4.9e-07 4.9e-07
> 4.9e-07 4.9e-07 4.9e-07 4.9e-07 4.8e-07 4.8e-07 4.8e-07 4.8e-07 4.8e-07 4.8e-07
> 4.8e-07 4.8e-07 4.8e-07 4.8e-07 4.8e-07 4.8e-07 4.8e-07 4.8e-07 4.8e-07 4.8e-07
> 4.8e-07 4.8e-07 4.7e-07 4.7e-07 4.7e-07 4.7e-07 4.7e-07 4.7e-07 4.7e-07 4.7e-07
> 4.7e-07 4.7e-07 4.7e-07 4.7e-07 4.7e-07 4.7e-07 4.7e-07 4.7e-07 4.7e-07 4.7e-07
> 4.7e-07 4.7e-07 4.7e-07 4.6e-07 4.6e-07 4.6e-07 4.6e-07 4.6e-07 4.6e-07 4.6e-07
> 4.6e-07 4.6e-07 4.6e-07 4.6e-07 4.6e-07 4.6e-07 4.5e-07 4.5e-07 4.5e-07 4.5e-07
> 4.5e-07 4.5e-07 4.5e-07 4.5e-07 4.5e-07 4.5e-07 4.5e-07 4.5e-07 4.5e-07 4.5e-07
> 4.5e-07 4.5e-07 4.4e-07 4.4e-07 4.4e-07 4.4e-07 4.4e-07 4.4e-07 4.4e-07 4.4e-07
> 4.4e-07 4.4e-07 4.4e-07 4.4e-07 4.3e-07 4.3e-07 4.3e-07 4.3e-07 4.3e-07 4.3e-07
> 4.3e-07 4.3e-07 4.3e-07 4.2e-07 4.2e-07 4.2e-07 4.2e-07 4.2e-07 4.2e-07 4.2e-07
> 4.1e-07
>
> mymodel.avg_norm_correction= 0.634436
> wsum_model.avg_norm_correction= 100.426
> written out Mweight.spi
> exp_thisparticle_sumweight= 0
> exp_min_diff2[ipart]= 64383.6
> in: /beegfs/cssb/software/em/relion/3.0-icc2020-altcpu-symm/src/ml_optimiser.cpp, line 6871
> in: /beegfs/cssb/software/em/relion/3.0-icc2020-altcpu-symm/src/ml_optimiser.cpp, line 6871
> slave 7 encountered error: === Backtrace ===
> /beegfs/cssb/software/em/relion/3.0-icc2020-altcpu-symm/bin/relion_refine_mpi(_ZN11RelionErrorC1ERKSsS1_l+0x48) [0x531438]
> /beegfs/cssb/software/em/relion/3.0-icc2020-altcpu-symm/bin/relion_refine_mpi(_Z36globalThreadExpectationSomeParticlesR14ThreadArgument+0xde) [0x580f1e]
> /beegfs/cssb/software/em/relion/3.0-icc2020-altcpu-symm/bin/relion_refine_mpi(_Z11_threadMainPv+0x70) [0x67c350]
> /lib64/libpthread.so.0(+0x7ea5) [0x2b0a3a17fea5]
> /lib64/libc.so.6(clone+0x6d) [0x2b0a3a4928dd]
> ==================
> ERROR:
> ERROR!!! zero sum of weights....
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 7 in communicator MPI_COMM_WORLD
> with errorcode 1.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
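[Editorial aside: a plausible mechanism for this error, given `exp_min_diff2 ≈ 64384` in the dump above. Orientation weights behave like `exp(-diff2)`; if the exponent is not shifted by the minimum diff2 before exponentiating, every term underflows to exactly 0.0 in double precision and the weight sum is zero. A minimal, hypothetical illustration — not RELION's actual code path:]

```python
# Hypothetical illustration (not RELION's actual code): with diff2
# values near 64384, naive exp(-diff2) underflows to 0.0; shifting
# by the minimum diff2 first keeps the best match at weight 1.

import math

def naive_weights(diff2s):
    return [math.exp(-d) for d in diff2s]

def shifted_weights(diff2s):
    m = min(diff2s)  # analogous to the exp_min_diff2 printed above
    return [math.exp(m - d) for d in diff2s]

diff2s = [64383.6, 64390.0, 64400.0]
assert sum(naive_weights(diff2s)) == 0.0      # "zero sum of weights"
assert sum(shifted_weights(diff2s)) >= 1.0    # best match keeps weight 1
```

If even the shifted sum is zero in practice, the diff2 values themselves are pathological — e.g. from collapsing sigma2_noise estimates like the tiny values above — which is why bad grouping or normalization, rather than underflow alone, is the usual suspect for this error.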
>
>
> ------------------------------------------------------------------------------------------------------------------------------------------------------
>
> To unsubscribe from the CCPEM list, click the following link:
> https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCPEM&A=1
>