Thank you for allowing me onto this list! I'm an HPC/scientific-computing person tasked with tuning future HPC system designs for better RELION/cryo-EM support. My particular focus is stress-testing and benchmarking large-scale storage offerings, including very large parallel filesystems, to see which platforms and which configuration options are most suitable for supporting large-scale RELION usage. Coming from the genomics world, I know that storage design has a large impact on research throughput, and that there are key metrics (small-file performance, for example) that indicate how a storage platform will handle a genomics-heavy workload. I want to learn the analogous optimizations and key metrics for EM-related scientific workflows.

I've been reading the documentation, papers, tutorials, and published benchmarks, and it looks like:

- The overwhelming focus of published benchmarks is CPU vs GPU performance on single-node and MPI-connected systems, with little to no reported data about storage-related benchmarks and optimizations.

- The standard benchmark data set used in various papers and sites online appears pretty small -- small enough now to fit in RAM on larger NVLink-connected GPU or large-memory compute systems, and small enough not to put much stress on a very large or very fast parallel filesystem when writing output or reading in particles or maps.

If this is not too intrusive a query, I'd welcome some advice and guidance on:

1) RELION-friendly datasets structured similarly to the popular benchmarking data, where particles and maps are already present and can be easily fed into command-line invocations of RELION, so that I can go out and hammer some big filesystems with reproducible benchmarking runs (a rough sketch of the sort of harness I have in mind is in the P.S. below).

2) Guidance on which portions of the RELION 3 workflow are most storage-intensive (broken down into reads and writes, ideally). I think I have a good idea of this from the online tutorial and other published materials. Since others have already focused on GPU vs CPU vs mixed, I figured I can focus a bit more on storage and I/O optimization.

And in the interest of reproducibility: if someone has already done large/parallel filesystem testing and tuning, I'd love to use the same methods and input data so that I can add more data to what has already been collected.

Regards,
Chris
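
P.S. To make point 1 concrete, below is a rough sketch of the kind of wrapper I'd like to run against each filesystem under test. The dataset staging path, output location, and the exact relion_refine flags are my own assumptions (loosely modelled on the published 3D classification benchmark command); I'd happily swap in whatever invocation and input data others have already standardized on. It samples the Linux per-process I/O counters while a single, non-MPI RELION process runs, which is admittedly crude -- for MPI runs I'd watch filesystem-level counters instead.

    #!/usr/bin/env python3
    # Sketch only: run one relion_refine job and sample its cumulative
    # read/write byte counters from /proc (Linux-specific).
    import subprocess
    import time

    # Assumed staging location on the filesystem under test.
    DATA = "/mnt/fs_under_test/relion_benchmark"

    cmd = [
        "relion_refine",
        "--i", f"{DATA}/Particles/shiny_2sets.star",
        "--ref", f"{DATA}/emd_2660.map:mrc",
        "--o", f"{DATA}/Class3D/run_001",
        "--firstiter_cc", "--ini_high", "60",
        "--ctf", "--ctf_corrected_ref",
        "--iter", "25", "--tau2_fudge", "4",
        "--particle_diameter", "360", "--K", "6",
        "--flatten_solvent", "--zero_mask",
        "--oversampling", "1", "--healpix_order", "2",
        "--offset_range", "5", "--offset_step", "2",
        "--sym", "C1", "--norm", "--scale",
        "--pool", "100", "--j", "6", "--gpu",
        # Storage-relevant knobs I plan to vary across runs, as I
        # understand them: --pool (how many particles are read and
        # processed per batch), --scratch_dir (pre-copy particle stacks
        # to e.g. local SSD), and, for MPI runs, whether
        # --dont_combine_weights_via_disc is passed (it avoids exchanging
        # large temporary weight files through the filesystem).
    ]

    proc = subprocess.Popen(cmd)
    samples = []
    while proc.poll() is None:
        try:
            with open(f"/proc/{proc.pid}/io") as f:
                io = dict(line.split(":") for line in f.read().splitlines())
            samples.append((time.time(),
                            int(io["read_bytes"]),
                            int(io["write_bytes"])))
        except FileNotFoundError:
            break  # process exited between poll() and open()
        time.sleep(1)

    if len(samples) >= 2:
        (t0, r0, w0), (t1, r1, w1) = samples[0], samples[-1]
        print(f"elapsed {t1 - t0:.0f} s, "
              f"read {(r1 - r0) / 1e9:.2f} GB, "
              f"wrote {(w1 - w0) / 1e9:.2f} GB")

If anyone already has a harness along these lines, I'd much rather adopt it than reinvent one.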