you state that "the beta weight results for the single runs were each about half the size of those in the combined run". So the models themselves are probably fine; this looks like a scaling issue in the contrast specification for the model containing both runs. You have to average across the two runs and adjust the contrast vectors accordingly, instead of using vectors like [1 .... 1 ...], which correspond to the sum of the beta estimates. A few examples:
Average activation of condition A: [1/2 ... 1/2 ...] (cond1 run1 + cond1 run2)/2
Activation averaged across conditions A and B: [1/4 1/4 ... 1/4 1/4 ...] ((cond1 run1 + cond2 run1)/2 + (cond1 run2 + cond2 run2)/2)/2
(A - B) averaged across runs: [1/2 -1/2 ... 1/2 -1/2 ...] ((cond1 run1 - cond2 run1) + (cond1 run2 - cond2 run2))/2
Interaction Condition (A, B) x Run (1, 2): [1 -1 ... -1 1 ...] in this case no adjustment is needed, since the interaction corresponds to the difference of the differences: (cond1 run1 - cond2 run1) - (cond1 run2 - cond2 run2)
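If it helps, the scaling logic above can be checked numerically. This is just an illustrative sketch with made-up beta values (the numbers are not from your data); the beta vector is ordered [cond1 run1, cond2 run1, cond1 run2, cond2 run2]:

```python
import numpy as np

# Hypothetical beta estimates, ordered [cond1_run1, cond2_run1, cond1_run2, cond2_run2]
betas = np.array([2.0, 1.0, 2.4, 0.8])

# Averaged contrasts from the examples above
c_avg_A  = np.array([0.5,  0.0,  0.5,  0.0])   # mean of condition A across runs
c_A_vs_B = np.array([0.5, -0.5,  0.5, -0.5])   # (A - B) averaged across runs
c_inter  = np.array([1.0, -1.0, -1.0,  1.0])   # interaction, no scaling needed

# Unscaled "sum" contrast for (A - B): exactly twice the averaged version,
# which reproduces the "about half the size" observation
c_sum = np.array([1.0, -1.0, 1.0, -1.0])

print(c_avg_A @ betas)   # 2.2 = (2.0 + 2.4) / 2
print(c_A_vs_B @ betas)  # 1.3 = ((2.0 - 1.0) + (2.4 - 0.8)) / 2
print(c_sum @ betas)     # 2.6 = twice the averaged contrast
print(c_inter @ betas)   # -0.6 = (2.0 - 1.0) - (2.4 - 0.8)
```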
Concerning your second issue: in general it is okay to exclude bad runs and go with the remaining ones, although this should of course be reported. You just have to use properly scaled contrast vectors for the different subjects; then the results are valid. However, you should still end up with a comparable amount of data per subject. In your case there seem to be only two runs, so rejecting one run means losing half the data. In that case I would rather exclude that subject's data set entirely. This is especially an issue as there may also be systematic differences between the runs (learning, adaptation, ...). It would be less problematic with 9 runs instead of 10, or with simple stimuli (like retinotopy runs, where, if the subject feels alright, you can just add another run).
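To make the per-subject scaling concrete, here is a small sketch (the helper name is my own invention, not from any package) that builds an averaged (A - B) contrast for whatever number of runs a subject has left, so subjects with different run counts stay on the same scale:

```python
import numpy as np

def averaged_contrast(per_run_weights, n_runs):
    """Tile the per-run contrast weights across n_runs and divide by n_runs,
    so the contrast estimates the average effect rather than the sum."""
    w = np.asarray(per_run_weights, dtype=float)
    return np.tile(w / n_runs, n_runs)

# Subject with both runs kept: weights 1/2 per run
print(averaged_contrast([1, -1], n_runs=2))  # [ 0.5 -0.5  0.5 -0.5]

# Subject with one run rejected: full weights, same overall scale as above
print(averaged_contrast([1, -1], n_runs=1))  # [ 1. -1.]
```

Because both vectors estimate the per-run average, the resulting contrast values are directly comparable across subjects regardless of how many runs survived.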
Hope this helps,