Dear Raheel,
There's no simple solution, as it always depends on the particular hypothesis/study, and also on the interpretation. The initial paper by Vul et al. (2009) mainly focused on the so-called nonindependence error; in the "worst" case: detect a sig. cluster when testing against zero, extract estimates, test against zero once more, report statistics and estimates = double dipping, resulting in voodoo correlations. While we can agree that it doesn't make sense to run another statistical test on voxels/clusters that have already reached significance, it is not incorrect per se to extract e.g. beta estimates from the selected voxels and report average values. However, effects might then be overestimated.
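Just as an illustration of that overestimation, a quick simulation sketch (Python, made-up numbers, not tied to any particular toolbox): every voxel carries the same small true effect, yet the voxels surviving an uncorrected threshold show a clearly inflated average estimate.

# Illustrative simulation of the selection bias: select voxels that are
# significant against zero, then look at the average estimate within them.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_subjects, n_voxels = 20, 5000
true_effect = 0.2                         # same small true effect in every voxel

# subject-by-voxel "beta estimates": true effect plus noise
betas = true_effect + rng.normal(0, 1, size=(n_subjects, n_voxels))

# voxelwise one-sample t-test against zero (uncorrected, just for the demo)
t, p = stats.ttest_1samp(betas, 0.0, axis=0)
selected = p < 0.001

print("true effect:                   ", true_effect)
print("mean estimate, all voxels:     ", round(betas.mean(), 3))
print("mean estimate, selected voxels:", round(betas[:, selected].mean(), 3))
# The selected voxels show a clearly inflated effect, although nothing
# distinguishes them from the rest except noise.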
I'd say it is valid to run post-hoc tests on cluster-derived estimates if you obtained e.g. a sig. interaction A x B or a sig. main effect C in the whole-brain analysis and want to find out whether the interaction is due to, say, A1B1 being larger than A1B2, A2B1 and A2B2, or whether main effect C reflects C1 being larger than C2 and C3. Due to the voxel selection there is going to be a bias, but it seems acceptable to rely on a (biased) voxel selection to further explain an effect that has been detected for exactly these voxels (see the sketch below). In that context we are not really interested in whether A1B1 is associated with an average beta estimate of 3 or 10, but in the statistics of the post-hoc tests.
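Such post-hoc tests could look roughly like this (again a sketch with fabricated data; the assumed layout is one cluster-averaged beta per subject and condition, extracted from the interaction cluster):

# Hedged sketch of post-hoc tests on cluster-averaged betas after a
# significant A x B interaction (fake data, layout is an assumption).
import numpy as np
from scipy import stats
from itertools import combinations

rng = np.random.default_rng(1)
n_subjects = 20
means = {"A1B1": 1.0, "A1B2": 0.2, "A2B1": 0.1, "A2B2": 0.0}   # made-up cell means
# one cluster-averaged beta per subject and condition
betas = {c: rng.normal(m, 1, n_subjects) for c, m in means.items()}

# pairwise paired t-tests, Bonferroni-corrected across the six comparisons
pairs = list(combinations(betas, 2))
for c1, c2 in pairs:
    t, p = stats.ttest_rel(betas[c1], betas[c2])
    p_bonf = min(p * len(pairs), 1.0)
    print(f"{c1} vs {c2}: t = {t:.2f}, p(Bonferroni) = {p_bonf:.3f}")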
Depending on context it might of course be very informative to report estimates not (just) from the initial sig. clusters but e.g. from some independently defined ROIs. But if you turn to another type of analysis, this might not have much to do with the initial analysis. E.g. if we detect a sig. interaction for a small cluster located within a certain large anatomical region and then extract estimates for the whole region, the true effect is likely underestimated. The ROI analysis might even yield completely different findings, since we pool across many more voxels with their own pattern. So while we come up with a valid conclusion (region xyz is associated with effect 123, effect size $%&), this might be only loosely related, or not related at all, to the initial findings. If we turned to leave-one-subject-out (LOSO) ROI definition, this would also be biased in case the ROIs are defined a posteriori, and again, the outcome does not (really) explain the previously detected findings, but a (somewhat) different effect.
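Just to show what LOSO means mechanistically (another made-up simulation, simplified to a single "selection by group t-value" step; the bias I mention above enters one level up, if the region itself was chosen after seeing the whole-sample results):

# Rough sketch of the leave-one-subject-out (LOSO) idea: for each subject
# the ROI is defined from the remaining N-1 subjects, and the estimate is
# read out from the held-out subject only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_subjects, n_voxels = 20, 5000
betas = 0.2 + rng.normal(0, 1, size=(n_subjects, n_voxels))   # true effect 0.2 everywhere

loso_estimates = []
for s in range(n_subjects):
    train = np.delete(betas, s, axis=0)            # all subjects except the held-out one
    t, p = stats.ttest_1samp(train, 0.0, axis=0)
    roi = p < 0.001                                # "ROI" defined without subject s
    if roi.any():
        loso_estimates.append(betas[s, roi].mean())  # read-out from held-out subject only

print("LOSO estimate:", round(float(np.mean(loso_estimates)), 3))   # close to the true 0.2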
The main limitation seems to be that most of the time there are only vague hypotheses, making it difficult to define regions a priori, which leads to "well, not what we were looking for, but hey, this also looks very interesting". Maybe in physics experiments are confirmatory (postulate some elementary particle, then acquire data for 10 years to confirm the hypothesis), but in neuroscience they are exploratory most of the time. Take a random paradigm: it would probably not be surprising to detect differences in executive/attention/salience networks, the default mode network, or motor networks. Vice versa, you could likely justify a wide range of a priori regions based on the literature.
Best
Helmut