Hello,
1. What does the number of support vectors depend on? Or is it arbitrary -
just whichever data points lie close to the hyperplane?
2. I don't understand why not all the datapoints near the optimal
hyperplane are support vectors.
3. If you have, say, 10000 features, one has to use a non-linear kernel.
One would need to use a polynomial kernel, because only then do you get
combinations of the different features. However, because you have such a
large number of features, using a polynomial kernel will map the data
into a feature space in which the points are very far apart. Separating
the 2 classes then becomes trivial, as many hyperplanes with a large
margin exist when the data is so sparse. I understand that some may call
this overfitting... but with what other kind of kernel can you get the
feature combinations that a polynomial kernel gives you? This is a kind
of a priori knowledge you have about the problem.
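To make the "feature combinations" point concrete, here is a minimal sketch (using only NumPy; the vectors and seed are illustrative assumptions) showing that a degree-2 polynomial kernel K(x, z) = (x . z)^2 equals an ordinary dot product after explicitly mapping each point to all pairwise feature products x_i * x_j:

```python
import numpy as np

# Illustrative example: the degree-2 polynomial kernel computes, implicitly,
# a dot product in the space of all pairwise feature products x_i * x_j.
rng = np.random.default_rng(0)
x = rng.standard_normal(4)  # two arbitrary 4-dimensional points (assumption)
z = rng.standard_normal(4)

k_poly = np.dot(x, z) ** 2                # kernel evaluated in input space

phi = lambda v: np.outer(v, v).ravel()    # explicit map to all products v_i*v_j
k_explicit = np.dot(phi(x), phi(z))       # dot product in the expanded space

print(np.isclose(k_poly, k_explicit))
```

So the kernel never builds the expanded space explicitly; with 10000 features the degree-2 map already has ~10^8 coordinates, which is why the implicit kernel trick matters.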
Sincerely,
Monika Ray
***********************************************************************
The sweetest songs are those that tell of our saddest thought...
Computational Intelligence Centre, Washington University, St. Louis, MO
**********************************************************************