Dear Mareike,
I wouldn't rely on the statistical results to find out whether something went wrong or not. It may be a true effect or a chance finding.
However, the fact that the correlations (cosines?) between the preparation and execution regressors differ between conditions (.77 vs. <.50 for the others) points to a bias between conditions (just guessing, maybe the two components were closer together in time for one of the four conditions).
In general, it's not a good idea to use separate regressors for temporally close components anyway, unless there is a lot of jitter between the components and/or you rely on partial trials. Jitter wouldn't be perfect in your case, though, as jittered intervals between preparation and execution might produce an effect of their own, something like building-up activations. Without jitter or partial trials, you won't be able to conclude whether the execution regressor detects true execution effects (with a BOLD response corresponding to the predictor) or a preparation effect with a somewhat delayed BOLD response (different brain regions are associated with somewhat different time-to-peak).
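To illustrate how the interval between the two components affects the overlap of the regressors, here is a minimal sketch in Python/NumPy (not SPM code). Everything in it is an assumption for illustration only: the double-gamma HRF parameters are just SPM-like defaults, the trial timing (20 trials, one every 20 s) is made up and is not your design, and the helper names `hrf`, `make_regressors` and `cosine` are hypothetical. It compares a short fixed gap, a longer fixed gap, and a jittered gap between preparation and execution.

```python
import math
import numpy as np

def hrf(t, peak=6.0, undershoot=16.0, ratio=1.0 / 6.0):
    """Double-gamma HRF; the shape parameters are assumed, SPM-like defaults."""
    t = np.asarray(t, dtype=float)
    g_peak = t ** (peak - 1) * np.exp(-t) / math.gamma(peak)
    g_under = t ** (undershoot - 1) * np.exp(-t) / math.gamma(undershoot)
    return g_peak - ratio * g_under

def make_regressors(prep_onsets, gaps, duration, dt=0.1):
    """Return HRF-convolved preparation and execution regressors (times in s)."""
    n = int(round(duration / dt))
    prep = np.zeros(n)
    exe = np.zeros(n)
    for onset, gap in zip(prep_onsets, gaps):
        prep[int(round(onset / dt))] = 1.0          # preparation event
        exe[int(round((onset + gap) / dt))] = 1.0   # execution follows after `gap`
    kernel = hrf(np.arange(0.0, 32.0, dt))
    return np.convolve(prep, kernel)[:n], np.convolve(exe, kernel)[:n]

def cosine(a, b):
    """Cosine of the angle between two regressors (no mean-centering)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical timing: 20 trials, one every 20 s, in a 440 s run.
onsets = 10.0 + 20.0 * np.arange(20)
rng = np.random.default_rng(0)

cos_short = cosine(*make_regressors(onsets, np.full(20, 2.0), 440.0))
cos_long = cosine(*make_regressors(onsets, np.full(20, 6.0), 440.0))
cos_jitter = cosine(*make_regressors(onsets, rng.uniform(2.0, 10.0, 20), 440.0))

print(f"fixed 2 s gap:       cosine = {cos_short:.2f}")
print(f"fixed 6 s gap:       cosine = {cos_long:.2f}")
print(f"jittered 2-10 s gap: cosine = {cos_jitter:.2f}")
```

If the values you looked at were Pearson correlations rather than cosines, `np.corrcoef(a, b)[0, 1]` on the same two regressors gives the mean-centered version. Either way, the closer the two events are in time, the more the regressors overlap.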
Concerning your findings with activations/deactivations and the mirrored pattern, this might be the issue reported in https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=spm;59d23523.1203 and the following messages.
Hope this helps you for the moment
Helmut