Dear Marek,
I would not call myself an expert, but maybe you will find my comments useful.
I see no general problem with transferring this task to the scanner (provided you put enough thought into general considerations like efficiency, jittering, etc.). The considerations you mentioned apply to blocked designs. However, I would not call your design a "blocked design", but rather an event-related design, with events organized in trials, and trials organized within blocks (maybe you'd like to call it a "mixed design", but that's just terminology).
In your GLM analysis, you will definitely not want to model your blocks as events with durations of 5 minutes. Instead, you will probably want to model your choice and feedback phases as separate events. Depending on the hypotheses you want to test later, you might also model positive and negative feedback trials separately, or add parametric modulators to your feedback regressors (for example, using prediction errors).
Whatever you do, the more your regressors of interest consist of high-frequency signal (or, the more "irregular" they are), the less they will be affected by the high-pass filter, and the less task-related signal will be "lost" to the filter. (Side note: while your single regressors should then be minimally affected by the filter, I guess that contrasts of the regressors might still be affected, especially contrasts that sum up regressors of the same "block" condition. However, my intuition is that difference contrasts and/or contrasts on parametric modulators should not be substantially affected.)
I have a very similar task to yours. I ran a reinforcement learning paradigm in the scanner, with 6 blocks divided into two conditions, and 16 trials per block. One trial consists of 3 s of symbol presentation and choice, a 1 s break, 2 s of feedback, a jittered break (3-7 s), a second kind of feedback (1 s) and a jittered ITI (3-7 s). Therefore, one block lasts about four minutes. I used the default 128 s filter.
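To put rough numbers on the timing above, here is a minimal Python sketch that adds up the trial phases. The phase names are my own labels, and treating the jitter as uniform over 3-7 s is an assumption (I only gave the ranges above), so take the result as an order-of-magnitude check and not as the exact timing of my experiment.

```python
# Trial phases in seconds as (min, max); fixed phases have min == max.
# ASSUMPTION: jitter is uniform over its range, so its mean is the midpoint.
PHASES = {
    "choice": (3.0, 3.0),
    "break": (1.0, 1.0),
    "feedback_1": (2.0, 2.0),
    "jittered_break": (3.0, 7.0),
    "feedback_2": (1.0, 1.0),
    "iti": (3.0, 7.0),
}
TRIALS_PER_BLOCK = 16
HPF_CUTOFF_S = 128.0  # default high-pass cutoff mentioned above


def expected_trial_duration(phases):
    """Mean trial duration if every jittered phase is uniform on [min, max]."""
    return sum((lo + hi) / 2.0 for lo, hi in phases.values())


def block_duration_range(phases, n_trials):
    """Shortest and longest possible block duration in seconds."""
    shortest = n_trials * sum(lo for lo, _ in phases.values())
    longest = n_trials * sum(hi for _, hi in phases.values())
    return shortest, longest


trial_s = expected_trial_duration(PHASES)
block_s = TRIALS_PER_BLOCK * trial_s
shortest, longest = block_duration_range(PHASES, TRIALS_PER_BLOCK)

print(f"expected trial duration: {trial_s:.0f} s")
print(f"expected block duration: {block_s:.0f} s ({block_s / 60:.1f} min)")
print(f"possible block duration: {shortest:.0f}-{longest:.0f} s")
print(f"block longer than filter cutoff: {block_s > HPF_CUTOFF_S}")
```

Under these assumptions a block is clearly longer than the 128 s cutoff, which is why modeling whole blocks as single long events would be a problem, while the individual events inside a block recur on a much shorter time scale.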
My GLM for the task includes two regressors for the choice event (one per condition), two regressors for the feedback event (one per condition), and two regressors for the second feedback event (one per condition). Additionally, each feedback regressor is parametrically modulated by the trial-wise prediction error derived from a computational model. My analyses concentrate on the parametric modulator, and the results look nice enough. However, note that the filter did indeed "gnaw" away quite heavily at the unmodulated event regressors (about 30% of the variance of the originally constructed regressors is explained by the filter). But, importantly, the regressor for the parametric modulator is only slightly affected (about 2%).
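If you want to check this for your own design before scanning, here is a minimal Python sketch of how one could estimate the share of a regressor's variance that the high-pass filter explains. It is not my actual analysis code. Everything in it is an assumption made for illustration: a TR of 2 s, alternating block order, uniform jitter, feedback modeled as stick events, random numbers standing in for prediction errors, a simplified double-gamma HRF, and a discrete cosine basis built the way I understand SPM's filter to work. It will therefore not reproduce my 30% and 2% figures.

```python
import math
import numpy as np

TR, DT, CUTOFF_S = 2.0, 0.1, 128.0   # ASSUMPTION: TR of 2 s; DT is the fine time grid
STEP = int(round(TR / DT))
N_BLOCKS, N_TRIALS = 6, 16
rng = np.random.default_rng(0)


def hrf(dt, length=32.0):
    """Simplified double-gamma HRF (peak shape 6, undershoot shape 16, ratio 1/6)."""
    t = np.arange(0.0, length, dt)
    peak = t**5 * np.exp(-t) / math.factorial(5)
    undershoot = t**15 * np.exp(-t) / math.factorial(15)
    return peak - undershoot / 6.0


def dct_highpass_basis(n_scans, tr, cutoff_s):
    """Cosine basis for periods longer than the cutoff, without the constant term."""
    n_basis = int(np.floor(2.0 * n_scans * tr / cutoff_s + 1.0))
    t = np.arange(n_scans)[:, None]
    k = np.arange(1, n_basis)[None, :]
    return np.sqrt(2.0 / n_scans) * np.cos(np.pi * (2 * t + 1) * k / (2.0 * n_scans))


def fraction_removed(regressor, basis):
    """Share of the mean-centered regressor's variance explained by the filter basis."""
    x = regressor - regressor.mean()
    proj = basis @ (basis.T @ x)      # basis columns are orthonormal
    return float(proj @ proj / (x @ x))


# Simulate feedback onsets: 3 s choice, 1 s break, 2 s feedback, jitter, 1 s feedback, jitter.
onsets, blocks, bounds, t = [], [], [], 0.0
for b in range(N_BLOCKS):
    start = t
    for _ in range(N_TRIALS):
        onsets.append(t + 4.0)
        blocks.append(b)
        t += 3.0 + 1.0 + 2.0 + rng.uniform(3, 7) + 1.0 + rng.uniform(3, 7)
    bounds.append((start, t))
n_fine = int(np.ceil((t + 32.0) / DT))   # pad so the last HRF fits

# ASSUMPTION: conditions alternate, so even-numbered blocks form condition A.
in_a = np.array(blocks) % 2 == 0
idx_a = np.round(np.array(onsets)[in_a] / DT).astype(int)

boxcar = np.zeros(n_fine)
for b, (start, end) in enumerate(bounds):
    if b % 2 == 0:
        boxcar[int(start / DT):int(end / DT)] = 1.0

events = np.zeros(n_fine)
events[idx_a] = 1.0

pe = rng.normal(size=idx_a.size)   # stand-in for model-derived prediction errors
modulator = np.zeros(n_fine)
modulator[idx_a] = pe - pe.mean()  # mean-centered parametric modulator


def to_scans(fine_signal):
    """Convolve with the HRF on the fine grid and sample once per TR."""
    return np.convolve(fine_signal, hrf(DT))[: fine_signal.size][::STEP]


signals = {"block boxcar": boxcar, "feedback events": events, "modulator": modulator}
regressors = {name: to_scans(sig) for name, sig in signals.items()}
basis = dct_highpass_basis(len(regressors["modulator"]), TR, CUTOFF_S)
fractions = {name: fraction_removed(reg, basis) for name, reg in regressors.items()}

for name, frac in fractions.items():
    print(f"{name}: {100 * frac:.1f}% of variance explained by the filter")
```

I would expect the block boxcar to lose by far the most variance here and the modulator the least, but the exact values depend on the assumptions above, so it is worth plugging in your own onsets, TR and cutoff.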
So, in general, I think you should be fine. However, don't forget the general considerations of transferring a behavioral paradigm to the scanner (efficiency, jittering, etc.).
Best,
Lukas