It could be problematic to construct new volumes based on the fXX files. When there is motion, subjects rarely jump back exactly to their original position (in which case averaging across e.g. volumes 1 and 3 would be sufficient); more likely there is a relevant displacement between the two. The constructed volume could thus be a rather poor replacement, as it would be smeared by the motion between the two volumes it is based on. Solutions:
1) Since you go with Realign & Unwarp anyway, you could simply turn to the uf files: the estimated motion parameters have already been applied to the voxel data during the reslice step, so no such problems arise when averaging across different volumes. This is what the ArtRepair toolbox does automatically; based on the resliced data it generates a new, interpolated volume from the neighbouring "good" ones.
2) If you want to work with the (raw) fXX files, you could run separate realignment steps to obtain a mean image based on e.g. volumes 1 and 3; the resulting meanfXX image would then serve as a replacement for volume 2, and so on.
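The neighbour-averaging replacement in 1) and 2) can be sketched in a few lines. This is a minimal, hypothetical illustration on a NumPy array (as if the resliced volumes had already been loaded, e.g. with nibabel), not ArtRepair's actual implementation:

```python
import numpy as np

def replace_bad_volume(series, bad_idx):
    """Replace volume `bad_idx` in a 4D series (x, y, z, t) with the
    average of its two nearest neighbours, assumed to be 'good' --
    the same idea as ArtRepair's neighbour interpolation on
    resliced data."""
    repaired = series.copy()
    repaired[..., bad_idx] = 0.5 * (series[..., bad_idx - 1]
                                    + series[..., bad_idx + 1])
    return repaired

# toy example: four 2x2x2 "volumes", volume 2 is considered corrupted
rng = np.random.default_rng(0)
series = rng.normal(size=(2, 2, 2, 4))
fixed = replace_bad_volume(series, 2)
```

As argued above, this only makes sense on resliced (uf) data; on raw fXX volumes the average would be smeared by the uncorrected displacement.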
There are some limitations though.
3) Often fast head motion extends over several volumes; interpolated data might then be no good replacement.
4) It's questionable whether replacement data can be treated like "correct" data at all. With ArtRepair one would usually add a single binary regressor to the model, with one value for the "correct" volumes and another for the replacement volumes. If the replacements are all identical (e.g. because you don't interpolate from neighbouring volumes but substitute a mean image), this should be sufficient. If they are not, it is imperfect, as you try to account for different volumes with a single predictor value, thus capturing only something like their "average badness". Very likely you should rather "deweight" these volumes from the analysis with a series of dummy regressors, one per affected volume.
5) With a correction as described in 1) and 2), the realignment parameters still reflect the motion in the original data, including the severe fast motion at the bad time points. Including them in the model, you try to account for something which is no longer present in exactly this form; put differently, you don't properly account for the (head motion) effects which are present in the corrected data. I think there was a methods paper in the context of resting-state fMRI / ICA suggesting that you might (re)introduce artefacts this way.
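The per-volume dummy regressors mentioned in 4) are easy to build: one binary column per bad scan. A minimal sketch (the function name and the example scan indices are mine, not from any toolbox):

```python
import numpy as np

def censoring_regressors(n_scans, bad_scans):
    """Build one binary 'dummy' column per bad scan: 1 at that scan,
    0 elsewhere.  Appended to the design matrix, each column soaks up
    the signal of its volume, effectively removing it from the
    estimation (the scrubbing / temporal-mask idea)."""
    bad_scans = sorted(set(bad_scans))
    R = np.zeros((n_scans, len(bad_scans)))
    for col, scan in enumerate(bad_scans):
        R[scan, col] = 1.0
    return R

# e.g. a 10-scan run in which scans 3 and 7 were flagged as bad
R = censoring_regressors(10, [3, 7])
```

In SPM you would pass such a matrix as "multiple regressors", alongside (or instead of) the usual six motion parameters.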
As a consequence I would just go with the default preprocessing pipeline and add a series of dummy regressors to the design matrix, which corresponds to the motion censoring / "temporal mask" approach described in Siegel et al. (2014, Hum Brain Mapp, https://dx.doi.org/10.1002/hbm.22307 ); it's a simple but probably quite effective technique.

If you prefer to look at the data without ArtRepair, you can load the rp file with e.g. Excel; SQRT((A2-A1)^2+(B2-B1)^2+(C2-C1)^2+(65^2*((D2-D1)^2+(E2-E1)^2+(F2-F1)^2))) should reproduce its "fast motion" estimate between subsequent time points (the factor 65 converts the rotations to millimetres of displacement at roughly head radius). A threshold of 0.5 mm/TR might be a reasonable starting point to detect occasional head jumps in otherwise stable subjects; usually you can also spot signal artefacts in the corresponding volumes. To stay on the safe side, also deweight the preceding and succeeding volumes (which ArtRepair does automatically). Alternatively you could try ART ( https://www.nitrc.org/projects/artifact_detect/ ); it estimates fast motion somewhat differently, so other thresholds would be required.
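The same "fast motion" computation can be sketched in Python, assuming the rp_*.txt file has been loaded into a NumPy array with SPM's six columns (three translations in mm, three rotations in radians). The 65 mm radius, the 0.5 mm/TR threshold and the deweighting of neighbouring volumes follow the suggestions above; note this is the Euclidean-norm variant from the Excel formula, not Power et al.'s sum-of-absolute-values framewise displacement:

```python
import numpy as np

def fast_motion(rp, radius=65.0):
    """Frame-to-frame motion from realignment parameters:
    sqrt(sum of squared translation changes + radius^2 * sum of
    squared rotation changes), i.e. the Excel formula above."""
    d = np.diff(rp, axis=0)                            # change between scans
    trans = (d[:, :3] ** 2).sum(axis=1)                # translations (mm^2)
    rot = radius ** 2 * (d[:, 3:6] ** 2).sum(axis=1)   # rotations -> mm^2
    return np.sqrt(trans + rot)

def flag_volumes(rp, thresh=0.5):
    """Indices of volumes to deweight: every scan following a
    supra-threshold jump, plus its neighbours (as ArtRepair does)."""
    fd = fast_motion(rp)
    jumps = np.flatnonzero(fd > thresh) + 1    # fd[i] lies between i and i+1
    bad = set()
    for j in jumps:
        bad.update((j - 1, j, j + 1))
    return sorted(b for b in bad if 0 <= b < len(rp))

# toy run of 6 scans with a 1 mm translation jump between scans 2 and 3
rp = np.zeros((6, 6))
rp[3:, 0] = 1.0
fd = fast_motion(rp)
bad = flag_volumes(rp)
```

Loading the real file would be a one-liner such as `rp = np.loadtxt("rp_f0001.txt")`; the flagged indices can then feed directly into a set of dummy regressors.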