The best thing I can think of is that you can average the two images to achieve a better signal-to-noise ratio. They would need to be registered together first, though. Probably the most accurate approach for rigid-body alignment of such data using SPM12 would involve using the Longitudinal toolbox. You can make it do within-modality rigid alignment by setting the warping regularisation to [Inf Inf Inf Inf Inf]. For this, it doesn't matter what time intervals you specify, as these only enter into the nonlinear warping part.
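As a rough illustration of why averaging helps, here is a small simulation in plain Python. It is only a sketch under simplifying assumptions that are mine rather than anything SPM does: the two scans are perfectly aligned, the noise is independent and Gaussian with the same variance in both, and the average is unweighted. The function name and parameters are made up for the example. Under those assumptions the signal-to-noise ratio should improve by a factor of about sqrt(2).

```python
import math
import random
import statistics


def simulate_averaging(n_voxels=20000, signal=100.0, noise_sd=10.0, seed=0):
    """Compare the SNR of one noisy scan with the SNR of a two-scan average.

    Assumes (for illustration only) a constant true signal, perfectly
    aligned scans, and independent Gaussian noise of equal variance.
    """
    rng = random.Random(seed)
    scan_a = [signal + rng.gauss(0.0, noise_sd) for _ in range(n_voxels)]
    scan_b = [signal + rng.gauss(0.0, noise_sd) for _ in range(n_voxels)]

    # Plain (unweighted) voxel-wise average of the two scans.
    average = [(a + b) / 2.0 for a, b in zip(scan_a, scan_b)]

    # SNR taken here as true signal divided by the observed noise spread.
    snr_single = signal / statistics.pstdev(scan_a)
    snr_average = signal / statistics.pstdev(average)
    return snr_single, snr_average


snr_single, snr_average = simulate_averaging()
print("SNR gain from averaging:", snr_average / snr_single)
print("Theoretical gain:", math.sqrt(2))
```

Real MRI noise is not exactly Gaussian and the two scans will rarely have identical noise levels, so the gain in practice is likely to be somewhat less than this idealised figure.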
One of the outputs should be a weighted average of the two aligned images. One thing to watch out for is that this average is bigger than the original images, and it will contain some regions where there is missing data. These regions can potentially interfere with other processing steps.
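To get a feel for how much of the average is affected, something along the following lines could be used. This is a sketch only: it assumes the voxel values have already been read into a flat list of floats and that missing data are stored as NaN, which you should check for your own output. The helper names and the toy values are hypothetical.

```python
import math


def missing_fraction(voxels):
    """Return the fraction of voxels that are not finite (NaN or Inf)."""
    voxels = list(voxels)
    n_missing = sum(1 for v in voxels if not math.isfinite(v))
    return n_missing / len(voxels)


def mean_of_valid(voxels):
    """Return the mean over finite voxels only, ignoring missing data."""
    valid = [v for v in voxels if math.isfinite(v)]
    return sum(valid) / len(valid)


# Toy stand-in for an averaged image with missing data at the edges.
toy_average = [float("nan"), 98.0, 102.0, 100.0, float("nan")]

print("Fraction missing:", missing_fraction(toy_average))
print("Mean of valid voxels:", mean_of_valid(toy_average))
```

Any later step that computes statistics over the whole volume would need similar care, since a single NaN will propagate through an ordinary sum or mean.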
I can't comment much on which questions would be interesting to ask of the data. With regard to the quality of combined 1.5T versus 3T, this will depend on the imaging sequences used. Without further software development related to correcting the additional artifacts in 3T data, I suspect that you may actually get more out of group comparisons using 1.5T data.
Best regards,
-John