Dear EM community,
I've been struggling with RELION lately, mostly with 3D classification,
and I would appreciate your feedback on the following problems...
I'm working with a dataset of 300k particles with a 348x348-pixel box
size. My first problem is calculation speed.
The dataset was collected on a Krios microscope with a Falcon 3 detector.
I'm using RELION 2.1.0 with CUDA 8.0 libraries and OpenMPI 2.1.2 on a
machine with 4x GTX 1080 GPUs, SSD scratch disks, and 128 GB of RAM. I
installed CUDA 8 because the NVIDIA website states that, among other
improvements, FFT calculation is much faster with CUDA 8 than with 7.5.
A complete 3D classification run (25 iterations, 10-pixel offset search
range, 1-pixel step, 1.09 A pixel size) takes ages: around 48-56 h per
iteration.
As a control, I used a previous dataset, also collected on a Krios
microscope but with a K2 camera: 330k particles, 348x348-pixel box size,
1.09 A pixel size, processed with the same settings (25 iterations,
10-pixel search range, 1-pixel step). It was run on the same computer,
but with RELION 2.0 beta, CUDA 7.5 libraries, and the same OpenMPI; at
that time it took only 4 h per iteration. When I reprocess this old
dataset with the latest version of RELION, I again get much longer run
times.
--> Here are my first questions: have you encountered the same problem
recently with the latest version of RELION? Does the latest version
contain new code that slows things down in exchange for improved
processing (I didn't find any related information on the RELION website,
here, or on GitHub)? Do you see a difference in processing between data
collected on a Falcon 3 versus a K2 camera?
While trying to sort things out, I ran into another problem. After each
3D classification iteration, I checked the particle distribution in the
model.star file.
If you multiply a class's distribution percentage (the distributions of
all classes should sum to 100%, if I am correct) by the number of
particles in your dataset, the result differs from the number of lines
you get with the awk command (from the RELION FAQ website) that extracts
the particles belonging to that class.
If instead you use the _rlnMaxValueProbDistribution value (in the
data.star file) as a criterion to select particles with a high
probability of belonging to one class, you can find a threshold at which
the two calculations agree, but that threshold differs for each class...
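To make the comparison concrete, here is a minimal sketch of the check on a toy data.star (the file name, column indices, and probability cutoff of 0.5 are illustrative assumptions; in a real run_itXXX_data.star the column index of each label varies, so read it off the header block first):

```shell
# Build a toy data.star with three labels (real files have many more columns).
cat > toy_data.star <<'EOF'
data_
loop_
_rlnImageName #1
_rlnClassNumber #2
_rlnMaxValueProbDistribution #3
img001.mrcs 1 0.90
img002.mrcs 1 0.40
img003.mrcs 2 0.85
img004.mrcs 1 0.75
img005.mrcs 2 0.30
EOF
# Count all particles assigned to class 1 (the FAQ-style awk selection):
awk '$2==1' toy_data.star | wc -l            # -> 3
# Count only confident assignments (probability above an arbitrary 0.5):
awk '$2==1 && $3>0.5' toy_data.star | wc -l  # -> 2
# For this toy set, the model.star _rlnClassDistribution of class 1 would
# be 3/5 = 0.60, and 0.60 * 5 particles = 3: it matches the unconditional
# count, not the probability-filtered one.
```

This reproduces the discrepancy in miniature: the distribution-times-N figure tracks the plain per-class count, while any probability cutoff gives a smaller, threshold-dependent number.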
--> Here comes my second wave of questions: is this the right way to
monitor class distribution? What is a good value to use for
_rlnMaxValueProbDistribution (0.01, 0.1, 0.5?)? Is there a way to know
how many particles were used for each class's 3D reconstruction?
To speed up calculations, the RELION tutorial and website suggest that
the data can be binned. If I understood correctly, in 3D calculations
RELION internally bins images to reach the best compromise between image
size, alignment precision, and time. However, if you manually bin your
particles with the rescale option in the Extract job of the GUI, you
don't get quite the same calculation times for unbinned particles
internally binned by RELION as for manually binned particles (also
internally binned by RELION if needed).
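For anyone weighing the trade-off, a back-of-envelope check of what manual binning does to the pixel size and the attainable resolution, using this dataset's numbers (348-pixel box at 1.09 A/pixel; the 2x binning factor is just an example):

```shell
# Binning by a factor f multiplies the pixel size by f and halves the
# resolution information you can keep (Nyquist = 2 x pixel size).
orig_box=348; orig_apix=1.09; new_box=174
new_apix=$(awk -v a=$orig_apix -v o=$orig_box -v n=$new_box \
    'BEGIN{printf "%.2f", a*o/n}')
nyquist=$(awk -v a=$new_apix 'BEGIN{printf "%.2f", 2*a}')
echo "binned pixel size: $new_apix A/px, Nyquist: $nyquist A"
# -> binned pixel size: 2.18 A/px, Nyquist: 4.36 A
```

So 2x binning of this dataset caps the reconstruction at about 4.4 A, which is one reason to bin only as far as your target resolution allows even when RAM is not the constraint.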
--> Did you observe the same behaviour with your own datasets? If you
have enough RAM on your cluster, is there any reason to bin your dataset?
I hope this wasn't too long and was clear enough...
Thank you in advance for your feedback.
Pierre-Damien