Yes please do!
Thank You!
Gary
----- Original Message -----
From: "Johann Drexl" <[log in to unmask]>
To: <[log in to unmask]>
Sent: Monday, January 29, 2001 4:41 AM
Subject: Re: SV probability estimation (longish)
> Ole Martin Halck wrote:
> >
> > Greetings all,
> >
> > Some months ago I posted a query on the www.kernel-machines.org message
> > board. Since I did not receive any replies there, I am trying this list
> > instead. Here goes.
> >
> > >> Start of original post <<
> >
> > Hello,
> >
> > Let me first say that I am a relative newbie to SV learning -- I've
> > scanned what seem to be the most central papers in the field, plus "The
> > Nature of SLT". One type of learning task seems not to have been treated
> > properly as far as I have seen, namely certain types of probability
> > estimation tasks. Please excuse me if I'm rambling in the following; it's
> > not _all_ clear to me yet. :-)
> >
> > I _have_ noticed that standard SV regression does indeed deal with
> > deviations from the means (often called "noise"); the problem is that in
> > many cases such deviations lie in the nature of the problem itself, whence
> > the concept of noise that should be removed is not really applicable. A
> > simple example: a throw of six on a die is no more noisy than a three or
> > four, though it is farther from the mean.
> >
> > A more interesting example: say that we have an underlying probability
> > p(x,y) = p(x)p(y|x), where the density p(x) of the input data is uniform
> > on (0,1), and y is Bernoulli with mean (unknown to us), say, x^2 (i.e.
> > Pr(y=1) = x^2, Pr(y=0) = 1-x^2). Using the standard epsilon-tube SV
> > regression is clearly not very useful, as there is no concept of
> > "ignorable" error here; the "error" or "noise" is always either x^2 or
> > 1-x^2 for a given x. Given a hypothesis f(x), a max-likelihood approach
> > yields that the proper cost function here is
> >
> > -sum(y ln f(x) + (1-y) ln (1-f(x)))
> >
> > with epsilon=0. The usual exposition of SV regression (and this is where
> > the rambling starts), with its convexity requirements and corresponding
> > optimization formulations, however, seems to presuppose a continuous
> > density p(y|x). Also, does not epsilon=0 remove much of the advantage of
> > SV methods, as all training data (x,y) become SVs?
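
For what it's worth, the cost function above is just the Bernoulli negative
log-likelihood, and it is easy to check numerically that it prefers the true
conditional mean on this toy problem. Below is a minimal Python sketch; the
sample size, the random seed and the constant comparison hypothesis f(x) = 0.5
are assumptions made for the illustration only:

```python
import math
import random

def bernoulli_nll(f, data):
    """The cost -sum(y ln f(x) + (1-y) ln (1-f(x))) from the post,
    i.e. the Bernoulli negative log-likelihood of hypothesis f."""
    return -sum(y * math.log(f(x)) + (1 - y) * math.log(1 - f(x))
                for x, y in data)

# Toy data as in the example: x uniform on (0,1), Pr(y=1|x) = x^2.
random.seed(0)
xs = [random.random() for _ in range(1000)]
data = [(x, 1 if random.random() < x * x else 0) for x in xs]

# The true conditional mean f(x) = x^2 should incur a lower cost than
# an uninformed constant hypothesis such as f(x) = 0.5.
cost_true = bernoulli_nll(lambda x: x * x, data)
cost_const = bernoulli_nll(lambda x: 0.5, data)
print(cost_true < cost_const)
```

Note that every training point contributes to this cost, which is exactly the
epsilon=0 concern raised above.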
> >
> > Still, something tells me that SV methods should somehow be able to deal
> > with this -- but this may just be because this case seems like an
> > intermediate between regression (because we want to estimate real
> > functions (0,1)->(0,1)) and classification (because all training and test
> > patterns have y in {0,1}).
> >
> > Bottom line: Is anyone aware of work that has been done in this or a
> > similar context?
> >
> > Regards,
> >
> > Ole Martin Halck
> >
> > >> End of original post <<
>
> Are you interested in a-posteriori-probabilities and svm-classification?
> There has indeed been some work done in this area (e.g. by J. Platt and
> P. Sollich).
> Reply to my mail if you want me to send you the names of articles on this
> subject.
>