JISCMail - PRONTO-USERS Archives

Hi Cristina,

Unfortunately, there is no good answer to your question, apart from the one 
you don't want to hear (i.e. can you get more samples for the smaller 
group?).

If you try leave-one-subject-per-group-out, it will not solve your 
problem, because only the test set will be balanced, not the training 
set:

The code will build 11 folds, leaving the first subject of each group 
out in the first fold, the 2nd in the second fold, and so on, so in the 
first 10 folds you train on 29 vs 10 subjects. In the last fold, the test 
set will comprise the 11th subject of the smaller group and the 11th to 
30th subjects (i.e. 20 subjects) of the larger group.

You could try to build a custom CV matrix, leaving 3 subjects of the 
larger group out and only 1 from the smaller group. You can save it as 
a .mat file and load it into PRoNTo. A flexible CV option (with a GUI) 
will also be available soon, and you will be able to do that through an 
interface. I am not sure it will really help, though...
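A minimal sketch of how such a matrix could be generated, assuming a samples x folds layout where 1 marks a training sample and 2 a test sample (check the PRoNTo manual for the exact coding and variable name it expects; both are assumptions here). Rows are assumed to list the larger group first, then the smaller group, in the same order as the samples in your PRoNTo data set. Each smaller-group subject gets its own fold, and larger-group subjects are spread round-robin across those folds:

```python
def custom_cv_matrix(n_large: int, n_small: int) -> list[list[int]]:
    """Hypothetical helper: samples x folds CV matrix (assumed 1 = train, 2 = test).

    Rows 0..n_large-1 are larger-group subjects, the remaining rows are
    smaller-group subjects. Every subject is tested exactly once.
    """
    n_folds = n_small  # one smaller-group subject left out per fold
    # Larger-group subjects cycle through the folds (about 3 per fold for 30 vs 11);
    # smaller-group subject j is left out in fold j.
    fold_of = [i % n_folds for i in range(n_large)] + list(range(n_small))
    return [
        [2 if fold_of[s] == f else 1 for f in range(n_folds)]
        for s in range(n_large + n_small)
    ]


cv = custom_cv_matrix(30, 11)
for f in range(11):
    n_test_large = sum(cv[s][f] == 2 for s in range(30))
    print(f"fold {f + 1:2d}: leaves out {n_test_large} larger + 1 smaller")
```

For 30 vs 11 this leaves out 3 larger-group subjects in 8 folds and 2 in the other 3. To write it to a .mat file you could use, for example, `scipy.io.savemat("cv.mat", {"CV": numpy.array(cv)})`, where "CV" is a placeholder variable name.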

Have you tried other classifiers? Gaussian processes (GP) can sometimes 
cope better with unbalanced training sets than, for example, SVM. I know 
LIBSVM provides algorithms where you can assign a weight to each class, 
but unfortunately this is not implemented in PRoNTo.
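Outside PRoNTo, LIBSVM's `svm-train` accepts per-class `-wi weight` options that scale the C parameter for class i. A minimal sketch of one common weighting heuristic, n / (k * n_c) (the rule scikit-learn calls "balanced"), assuming hypothetical labels +1 for the larger group and -1 for the smaller; this heuristic is one choice among several, not something PRoNTo or LIBSVM prescribes:

```python
def balanced_class_weights(counts: dict[int, int]) -> dict[int, float]:
    """Weight each class by n / (k * n_c) so every class has equal total weight."""
    n = sum(counts.values())  # total number of subjects
    k = len(counts)  # number of classes
    return {label: n / (k * c) for label, c in counts.items()}


def libsvm_weight_flags(weights: dict[int, float]) -> str:
    """Format weights as svm-train '-wi weight' options, ordered by label."""
    return " ".join(f"-w{label} {w:.4g}" for label, w in sorted(weights.items()))


weights = balanced_class_weights({1: 30, -1: 11})  # hypothetical labels
print(libsvm_weight_flags(weights))
```

With 30 vs 11 subjects, 30 x 0.6833 and 11 x 1.864 both come to about 20.5, so each group contributes the same total weight to the SVM loss.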

I hope this helps you a little.
Good luck.
Best,
Jessica

On 12/1/2014 4:18 AM, Cristina Blanco-Duque wrote:
>
> Hi Pronto Users,
>
> I am running a two-class classification, using structural MRI data as 
> input. My experimental groups are imbalanced (n=30 and n=11), and the 
> larger group (n=30) is being classified better than the smaller group 
> (n=11) (e.g. 93% vs 42%). I already tried to downsample, but I am 
> losing too much data, so I'd prefer to use all my data. For the 
> cross-validation (CV), I am currently using leave-one-subject-out; 
> however, I would like to ask you:
>
>  1. Could the option "leave-one-subject-per-group-out" help me to
>     prevent the bias towards better classification of the larger group?
>  2. How does the CV "leave-one-subject-per-group-out" work in PRoNTo?
>     I read in the manual that this approach is appropriate for paired
>     sample designs. But in the case of my unbalanced data set (n=30
>     and n=11), would there be 30 CV iterations (repeating some
>     subjects in the smaller group)? Or will the CV stop at n=11 (the n
>     of the smaller group)?
>  3. Do you have any other suggestions for dealing with the CV of this
>     imbalanced data set?
>
> Thank you very much for your help.
>
> Best regards,
>
> Cristina Blanco
>