>
> I have two problems:
>
> i) how to incorporate spatial variation, including jumps, into a kernel
> (other than a high-order polynomial approximation, which takes >10^7
> iterations for 1024 data points); some kind of combination of Gaussian
> and OU kernels might work, but this results in numerous hyperparameters
> for the weightings and widths
>
> (Gaussian: K(x,z) = exp(-||x-z||^2 / (2 s^2)),
> OU: K(x,z) = exp(-||x-z|| / (2 s)))
>
> with, e.g., an RBF, one could specify different s in different areas of
> space, but this leads to a non-pos-def kernel (so one /could/ use
> Mangasarian's Generalized SVM)
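One remark on the combination idea: a nonnegative-weighted sum of
positive semi-definite kernels is itself positive semi-definite, so
mixing Gaussian and OU kernels at least keeps the Gram matrix valid;
the cost is, as you say, the extra hyperparameters. A minimal sketch
in Python (the weights w1, w2 and widths s1, s2 are placeholders you
would have to tune):

    import numpy as np

    def combined_kernel(x, z, w1=1.0, w2=1.0, s1=1.0, s2=1.0):
        # Weighted sum of the Gaussian and OU kernels defined above.
        # Nonnegative-weighted sums of PSD kernels stay PSD.
        d = np.linalg.norm(x - z)
        gauss = np.exp(-d**2 / (2.0 * s1**2))  # Gaussian part
        ou = np.exp(-d / (2.0 * s2))           # OU part
        return w1 * gauss + w2 * ou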
An alternative to Mangasarian's generalized SVM is the Relevance
Vector Machine and related methods, which again do not require a
positive semi-definite Gram matrix. Finally, are you sure that sparsity
of the solution is an important criterion? If not, you might consider
the Gaussian process approach or other Bayesian models to advantage.
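To make the Gaussian process suggestion concrete, here is a minimal
regression sketch with the Gaussian kernel above (numpy only; the
width s and the noise level are assumptions you would set or learn):

    import numpy as np

    def gp_predict(X, y, Xstar, s=1.0, noise=1e-2):
        # GP posterior mean and variance under a Gaussian kernel.
        # X: (n, d) inputs, y: (n,) targets, Xstar: (m, d) test inputs.
        def k(A, B):
            d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
            return np.exp(-d2 / (2.0 * s**2))
        K = k(X, X) + noise * np.eye(len(X))  # noisy Gram matrix
        Ks = k(Xstar, X)
        mean = Ks @ np.linalg.solve(K, y)
        var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
        return mean, var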
> ii) the performance is /extremely/ sensitive to the parameters C and s;
> is there any principled way to find these (other than grid search with
> CV, GCV, etc.) or any heuristic guidelines?
> (there are quite a few for classification)
{The idea is from an interesting comment I saw in Ralph Herbrich's new
book "Learning Kernel Classifiers", MIT Press 2002.} Bayesian evidence
maximization can be used to determine optimal C and s as part of
selecting the optimal hypothesis space (i.e. model). This could prove
very interesting even if you don't use Gaussian processes (which was
the context of the comment) but use a generalized MCMC (Markov chain
Monte Carlo) approach instead.
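In the GP setting this amounts to maximizing the log marginal
likelihood (the evidence) over the hyperparameters; here s is the
kernel width and the noise term plays a role analogous to C. A rough
sketch using scipy's optimizer (working in log space keeps both
parameters positive):

    import numpy as np
    from scipy.optimize import minimize

    def neg_log_evidence(log_params, X, y):
        # Negative log marginal likelihood of a GP with Gaussian kernel.
        s, noise = np.exp(log_params)
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        K = np.exp(-d2 / (2.0 * s**2)) + noise * np.eye(len(X))
        L = np.linalg.cholesky(K)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
        return (0.5 * y @ alpha
                + np.log(np.diag(L)).sum()
                + 0.5 * len(X) * np.log(2 * np.pi))

    # usage: res = minimize(neg_log_evidence, np.zeros(2), args=(X, y))
    #        s_opt, noise_opt = np.exp(res.x)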
I hope that helps at least somewhat and serves to get you started.
Regards,
Balaji