Greetings all,
Some months ago I posted a query on the www.kernel-machines.org message board. Since I did not receive any replies there, I am trying this list instead. Here goes.
>> Start of original post <<
Hello,
Let me first say that I am a relative newbie to SV learning -- I've scanned what seem to be the most central papers of the field, plus "The Nature of Statistical Learning Theory". One type of learning task does not seem to have been treated properly, as far as I have seen, namely certain kinds of probability estimation tasks. Please excuse me if I ramble in the following; it is not _all_ clear to me yet. :-)
I _have_ noticed that standard SV regression does indeed deal with deviations from the conditional mean (often called "noise"); the problem is that in many cases such deviations lie in the nature of the problem itself, so the concept of noise to be removed is not really applicable. A simple example: a throw of six on a die is no noisier than a three or a four, even though it is farther from the mean.
A more interesting example: say that we have an underlying probability p(x,y) = p(x)p(y|x), where the density p(x) of the input data is uniform on (0,1), and y is Bernoulli with mean (unknown to us), say, x^2 (i.e. Pr(y=1|x) = x^2, Pr(y=0|x) = 1-x^2). Standard epsilon-tube SV regression is clearly not very useful here, as there is no concept of "ignorable" error: for a given x, the "error" or "noise" is always either x^2 or 1-x^2. Given a hypothesis f(x), a maximum-likelihood approach shows that the proper cost function is
    -sum_i [ y_i ln f(x_i) + (1 - y_i) ln(1 - f(x_i)) ]
with epsilon = 0. However, the usual exposition of SV regression (and this is where the rambling starts), with its convexity requirements and the corresponding optimization formulations, seems to presuppose a continuous density p(y|x). Also, does epsilon = 0 not remove much of the advantage of SV methods, since all training pairs (x,y) become support vectors?
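
To make this concrete, here is a minimal Python sketch (my own illustration, nothing from the literature) that draws data from the example above and fits f by minimizing exactly this cost; the sigmoid form f(x) = 1/(1 + exp(-(a*x + b))) is just an assumed hypothesis class for the demonstration:

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)

    # The example above: x uniform on (0,1), y Bernoulli with mean x^2.
    n = 200
    x = rng.uniform(0.0, 1.0, n)
    y = (rng.uniform(size=n) < x**2).astype(float)

    # Assumed hypothesis class (illustration only): f(x) = sigmoid(a*x + b).
    def f(theta, x):
        a, b = theta
        return 1.0 / (1.0 + np.exp(-(a * x + b)))

    # The negative log-likelihood cost from above (clipped for stability).
    def nll(theta):
        p = np.clip(f(theta, x), 1e-12, 1.0 - 1e-12)
        return -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

    res = minimize(nll, x0=np.zeros(2))
    print("f(0.5) =", f(res.x, 0.5), "; true Pr(y=1|x=0.5) =", 0.25)

A sigmoid in x can of course only roughly approximate x^2; the point here is the cost function, not the hypothesis class.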
Still, something tells me that SV methods should somehow be able to deal with this -- though that may just be because the case looks like an intermediate between regression (we want to estimate real functions (0,1) -> (0,1)) and classification (all training and test patterns have y in {0,1}).
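
To illustrate what I mean by "intermediate", here is a kernelized variant of the same fit -- a rough sketch of what I imagine an SV-style estimator might look like, with a Gaussian kernel and an RKHS-norm penalty in place of the usual margin term (the kernel, bandwidth and regularization weight are all arbitrary choices on my part):

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(1)
    n = 100
    x = rng.uniform(0.0, 1.0, n)
    y = (rng.uniform(size=n) < x**2).astype(float)

    # Gaussian kernel; the bandwidth gamma is an arbitrary choice.
    def gram(a, b, gamma=10.0):
        return np.exp(-gamma * (a[:, None] - b[None, :])**2)

    K = gram(x, x)
    lam = 0.1  # regularization weight, playing the role of 1/C

    # Penalized negative log-likelihood for
    # f(x) = sigmoid(sum_i alpha_i k(x_i, x)).
    def objective(alpha):
        p = 1.0 / (1.0 + np.exp(-(K @ alpha)))
        p = np.clip(p, 1e-12, 1.0 - 1e-12)
        nll = -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
        return nll + lam * (alpha @ K @ alpha)

    alpha = minimize(objective, np.zeros(n), method="L-BFGS-B").x
    print("fraction of nonzero coefficients:",
          np.mean(np.abs(alpha) > 1e-6))

Note that since the log-likelihood loss has no flat epsilon-insensitive region, essentially every alpha_i comes out nonzero -- which is exactly the loss of sparsity I worried about above.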
Bottom line: Is anyone aware of work that has been done in this or a similar context?
Regards,
Ole Martin Halck
>> End of original post <<