On the general issue of how to choose SVM kernels, it may be useful to
bear in mind that, if one interprets SVM classifiers (or indeed SVM
regression) from a Bayesian perspective, the kernel is just the covariance
function of a Gaussian process prior. Much is known about how such
covariance functions encode prior assumptions about a problem, and this
can be used as a guide in selecting appropriate kernels. See e.g. the
"Bayesian methods for SVMs" paper on
http://www.mth.kcl.ac.uk/~psollich/publications/node20.html - blatant
self-advertisement here but the paper also has lots of references to
related work by other people.
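
As a minimal sketch of that view (my own illustration, not from the paper),
the familiar RBF expression used as an SVM kernel can equally be read as the
covariance function of a GP prior; its length scale then directly encodes a
prior assumption about how quickly the latent function is expected to vary:

  import numpy as np

  def rbf_kernel(X1, X2, length_scale=1.0):
      # GP prior covariance; larger length scales mean smoother functions
      d2 = (np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :]
            - 2.0 * X1 @ X2.T)
      return np.exp(-0.5 * d2 / length_scale**2)

  X = np.linspace(-3, 3, 50)[:, None]
  K = rbf_kernel(X, X)
  # draws from the GP prior N(0, K); change length_scale and resample to see
  # how the kernel choice encodes assumptions about the problem
  f = np.random.multivariate_normal(np.zeros(len(X)),
                                    K + 1e-8 * np.eye(len(X)), size=3)
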
Regards,
Peter
--------------------------------------------------------------------------
Dr. Peter Sollich Department of Mathematics
Phone: +44 - (0)20 - 7848 2875 King's College
Fax: +44 - (0)20 - 7848 2017 University of London
E-mail: [log in to unmask] Strand
WWW: http://www.mth.kcl.ac.uk/~psollich London WC2R 2LS, U.K.
--------------------------------------------------------------------------
On Thu, 3 Jun 2004, Balaji Krishnapuram wrote:
> The following paper is relevant to the discussion on feature scaling and
> what the "right way" to do it is. Please note that I don't think there is a
> universal right way, but if you operate in an SVM kind of environment this
> might be a sensible thing to consider.
>
> http://research.microsoft.com/users/rherb/pubs/HerGrae02.htm
>
> In this paper Herbrich and Graepel develop some interesting theoretical
> results (validated in experiments) which show the benefits of scaling the
> feature vectors so that each sample x_i has unit norm.
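>
> As a small illustration (my own sketch, not code from the paper), scaling
> each sample to unit norm could look like this in Python, where X is assumed
> to be the data matrix with one sample per row:
>
>   import numpy as np
>   from sklearn.preprocessing import normalize
>
>   X = np.random.randn(5, 3)            # toy data, 5 samples in rows
>   X_unit = normalize(X, norm='l2')     # each row now has ||x_i|| = 1
>   # equivalently: X / np.linalg.norm(X, axis=1, keepdims=True)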
>
> Regarding the selection of the right kernel, it has been stated in the
> previous discussion that the linear, RBF and (low-order) polynomial kernels
> are the right candidates to consider. While not disagreeing about the
> utility of these kernels, I believe that the kernel really should reflect
> our intuition about what is a good feature space for representing the data.
> The appropriate feature space is thus very much a function of the
> application and the data used. For example, what is applicable for
> representing images in signal/image processing contexts is clearly not
> sensible on, say, bioinformatics datasets.
>
> Thus, while the SVM gives a good solution for designing classifiers once
> the feature space (and thus the kernel) has been fixed, the decision about
> what the right feature space is has been brushed under the carpet, so to
> speak. Once you have the right feature space, the kernel itself is just a
> dot product in that space, and even simple nonlinear transformations such
> as those afforded by RBF kernels would probably help a bit more. However,
> what I mean to point out is that there is no substitute for choosing a good
> way to represent the data, one in which the classes can be disambiguated
> easily. We can summarize the intuition this way: if you try to discriminate
> between people based only on their height you can achieve only a certain
> error rate, but if you use finer details such as their voice or a picture
> of their face, you can do better.
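>
> To make the "just a dot product" remark concrete, here is a toy sketch of
> my own (not tied to any particular dataset): the homogeneous degree-2
> polynomial kernel on 2-d inputs is exactly a dot product under an explicit
> feature map:
>
>   import numpy as np
>
>   def phi(x):
>       # explicit feature map for k(x, z) = (x . z)^2 on 2-d inputs
>       return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])
>
>   x, z = np.array([1.0, 2.0]), np.array([0.5, -1.0])
>   k_implicit = (x @ z) ** 2        # kernel evaluation
>   k_explicit = phi(x) @ phi(z)     # plain dot product in feature space
>   assert np.isclose(k_implicit, k_explicit)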
>
> This realization has spawned a spate of papers on the design of kernels
> that are appropriate to specific problems (see, for example, the book
> "Kernel Methods in Computational Biology", edited by B. Scholkopf, J.-P.
> Vert and K. Tsuda, MIT Press, 2004, which has several examples of kernels
> specifically designed to exploit characteristics of specific types of data,
> such as graphs, protein sequences, etc.).
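>
> As one minimal sketch of that idea (my own illustration, not taken from the
> book), a spectrum-style kernel for sequences simply counts shared k-mers,
> i.e. it is a dot product between k-mer count vectors:
>
>   from collections import Counter
>
>   def spectrum_kernel(s, t, k=3):
>       # dot product of the k-mer count vectors of the two sequences
>       cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
>       ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
>       return sum(cs[m] * ct[m] for m in cs)
>
>   print(spectrum_kernel("MKVLAAGIVK", "MKVLSAGIVK"))   # toy protein strings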
>
> In a word, I only want to point out that the SVM can't do magic and
> distinguish classes when the class-conditional distributions completely
> overlap in the feature space induced by the kernel. We still have to think
> carefully about the physics of the problem we are trying to solve: that is
> the right way to think about how to choose kernels, and it is the
> responsibility of the user of the SVM!
>
> Finally, the SVM is only one way to think of classifiers, and, to be very
> clear, it encodes a certain prior intuition: that large-margin separation
> helps improve classification. This is still just intuition. Even though we
> can use this intuition to prove large-margin (so-called radius/margin)
> generalization bounds, other intuitions (for example that sparseness leads
> to good classification, or that minimizing the 1-norm of the resulting
> classifier, w, leads to good classification) can also be used to derive
> equally good bounds.
>
> Thus the common *misconception* that the SVM is somehow superior to all
> other algorithms is clearly not justifiable. In a Bayesian sense each type
> of intuition is just a prior (though the SVM can't be encoded fully as a
> probabilistic system), and no prior is universally the best for all
> datasets! The responsibility still resides with the end user to encode the
> intuition that is appropriate. Statistically rigorous methods even allow us
> to compare which intuition (prior) is more sensible and better validated on
> a specific dataset (for example by comparing the "evidence", or marginal
> likelihood). What we can't do is skip this step and declare that there is a
> universally superior algorithm (such as the SVM) or a feature
> representation (kernel) that will work best in all circumstances.
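>
> As a rough sketch of such a comparison (my own illustration; it treats the
> kernels as GP priors rather than SVMs, and the classifier evidence is only
> approximate), one could compare the marginal likelihood obtained with an
> RBF prior versus a linear (dot-product) prior on a toy dataset:
>
>   from sklearn.datasets import make_classification
>   from sklearn.gaussian_process import GaussianProcessClassifier
>   from sklearn.gaussian_process.kernels import RBF, DotProduct
>
>   X, y = make_classification(n_samples=100, n_features=5, random_state=0)
>   for kern in (RBF(), DotProduct()):
>       gpc = GaussianProcessClassifier(kernel=kern, random_state=0).fit(X, y)
>       # (approximate) log marginal likelihood = "evidence" for this prior
>       print(kern, gpc.log_marginal_likelihood_value_)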
>
> I hope that helps clarify things somewhat, but if you need specific
> references on any of the points I mentioned I can point to papers on that
> topic. I think this is a discussion that needs to be clearly explained and
> argued; all too often it has been ignored, and practitioners tend to
> overlook better alternatives for their own problems.
>
> Balaji
> ----------------------------------------------------------------
> Phone (home): 919-383-2069 | Off : 919-660-5233
> Email : [log in to unmask] | http://www.duke.edu/~balaji/
> ----------------------------------------------------------------
>