I had another look at your data. In the unthresholded maps there seems to be a general tendency towards positive values throughout the cortex, which also means very large clusters at low thresholds. This, together with the known "overboost" effect of TFCE with very large clusters, nicely explains your different results. There might be global intracranial volume differences in your data... Do you use "Global Normalisation" to correct for such differences?
I would not use TFCE with these oddly large, weak (sub-threshold) clusters.
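To make the "overboost" concrete, here is a toy sketch of the TFCE integral on a 1D signal. It is only an illustration, not the FSL implementation: the function name `tfce_1d`, the step size `dh` and the example values are my own assumptions, and E=0.5, H=2 are the commonly used defaults. Each voxel accumulates extent^E * h^H over all thresholds h at which it still belongs to a cluster.

```python
def tfce_1d(stat, dh=0.1, E=0.5, H=2.0):
    """Toy 1D TFCE for positive values: sum over thresholds h of extent(h)**E * h**H * dh."""
    n = len(stat)
    out = [0.0] * n
    steps = int(round(max(stat) / dh))
    for k in range(1, steps + 1):
        h = k * dh
        i = 0
        while i < n:
            if stat[i] >= h:
                # Find the end of the contiguous supra-threshold run (the "cluster").
                j = i
                while j < n and stat[j] >= h:
                    j += 1
                boost = ((j - i) ** E) * (h ** H) * dh
                for m in range(i, j):
                    out[m] += boost
                i = j
            else:
                i += 1
    return out


# Hypothetical example: a very broad, weak plateau (z = 1.5, 2000 voxels)
# next to a focal, strong peak (z = 3.5, 3 voxels).
stat = [0.0] * 10 + [1.5] * 2000 + [0.0] * 10 + [3.5] * 3 + [0.0] * 10
scores = tfce_1d(stat)
weak_score = scores[10]      # first voxel of the broad plateau
strong_score = scores[2020]  # first voxel of the focal peak
print(weak_score, strong_score)
```

In this toy case the broad sub-threshold plateau ends up with a higher TFCE score than the focal peak, purely because of its extent. With cortex-wide clusters in 3D the effect can be much stronger.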
Regarding pTFCE, it solves this "overboosting" problem (maybe that's why your results intuitively "feel" more realistic with it), but I am still not sure about the normality assumptions.
- On one hand, there indeed seems to be a consensus that GRF can be safely applied in a VBM context, see e.g. https://www.sciencedirect.com/science/article/pii/S0149763415000536.
- On the other hand, GRF-based cluster-wise inference was discouraged for VBM (already 19 years ago), see the famous paper: J. Ashburner and K. Friston, “Voxel-based morphometry - the methods,” NeuroImage, vol. 11, pp. 805–821, 2000.
While pTFCE incorporates cluster-wise p-values, it still seems to be immune to the "cluster failure" phenomenon (Eklund et al., PNAS, 2016), at least as suggested by Figure 8B of the pTFCE paper, and thus might tolerate spatially inhomogeneous smoothness well.
To summarise, right now it is kind of an open question. My personal opinion: with a careful interpretation of the results (e.g. drawing your strongest conclusions from the voxel-level map instead of the pTFCE map) and with reporting of the possible limitations, pTFCE is already feasible for VBM analysis.
But that should be a decision of the authors. ;)
I plan to run some analyses in this direction (that is, VBM pTFCE false positives) in the near future, which might help us see more clearly.
Hope this helps.