One place to find good feature-space representations of speech signals is in
the ICASSP papers from the last few years. The Fisher kernel (cf. Jaakkola
and Haussler), built from the HMMs that model the speech, is an example of a
kernel constructed from ideas already common in speech processing; it has
been shown that SVMs with an HMM-based Fisher kernel outperform both the
HMMs themselves and SVMs with generic RBF, polynomial, or other
off-the-shelf kernels, in applications where HMM-based modeling of the data
makes sense.
I will point to one of my own (early) papers as an example (simply because I
know it well), but you can certainly find many better examples on the web
with a Google search. If you want, check out the following paper, which
simply used a Fisher kernel built from HMMs (in a setting where the HMM made
sense to begin with):
http://www.ee.duke.edu/~balaji/papers/ICASSP_Final.pdf
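To make the construction concrete, here is a minimal sketch of my own (not
taken from the paper above) for a discrete-observation HMM. It approximates
the Fisher score U_x = grad_theta log p(x | theta) by finite differences,
treats the log-parameters as unconstrained (a common simplification), and
uses the usual identity approximation to the Fisher information matrix, so
K(x, y) is just the dot product of the two score vectors:

import numpy as np
from scipy.special import logsumexp

def hmm_loglik(obs, log_pi, log_A, log_B):
    """Log-likelihood of a discrete observation sequence under an HMM
    (forward algorithm in log space). log_pi: (K,), log_A: (K, K),
    log_B: (K, M) log transition/emission parameters."""
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        alpha = logsumexp(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return logsumexp(alpha)

def fisher_score(obs, log_pi, log_A, log_B, eps=1e-5):
    """Finite-difference approximation to the Fisher score, the gradient
    of log p(obs | theta) w.r.t. the flattened log-parameters."""
    theta = np.concatenate([log_pi.ravel(), log_A.ravel(), log_B.ravel()])
    shapes = [log_pi.shape, log_A.shape, log_B.shape]
    def unpack(t):
        parts, i = [], 0
        for s in shapes:
            n = int(np.prod(s))
            parts.append(t[i:i + n].reshape(s))
            i += n
        return parts
    score = np.empty_like(theta)
    for j in range(theta.size):
        tp, tm = theta.copy(), theta.copy()
        tp[j] += eps
        tm[j] -= eps
        score[j] = (hmm_loglik(obs, *unpack(tp))
                    - hmm_loglik(obs, *unpack(tm))) / (2 * eps)
    return score

def fisher_kernel(obs1, obs2, params):
    """Fisher kernel with the identity approximation to the Fisher
    information matrix: K(x, y) = U_x . U_y."""
    return fisher_score(obs1, *params) @ fisher_score(obs2, *params)

In practice you would compute the scores analytically (via the forward-
backward statistics) rather than by finite differences, but the kernel is
the same.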
Another simple idea for signal processing applications is to use a
matched-filter-based kernel for the SVM. A matched filter is a traditional
signal processing mechanism for measuring similarity between signals, so it
naturally motivates kernels built on the same design principles. I don't
remember a reference off-hand, but I'm sure a quick Google search will turn
up good examples of papers that use this idea.
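As a rough illustration of what such a similarity might look like (my own
sketch, not from any particular paper): a matched filter scores a signal by
correlating it against a template, so a natural similarity between two
signals is the peak of their normalized cross-correlation over all lags.
Note that taking a max over lags is not guaranteed to yield a positive
semi-definite kernel, so you would want to check PSD-ness on your data or
accept it as a heuristic:

import numpy as np
from scipy.signal import correlate

def matched_filter_kernel(x, y):
    """Similarity between two 1-D signals: peak normalized
    cross-correlation over all lags, as a matched filter would compute.
    Caution: the max over lags is a heuristic similarity and is not
    guaranteed to be a positive semi-definite kernel."""
    xc = x - x.mean()
    yc = y - y.mean()
    xc /= np.linalg.norm(xc) + 1e-12
    yc /= np.linalg.norm(yc) + 1e-12
    return correlate(xc, yc, mode="full").max()

# Example: two noisy, shifted copies of the same pulse score highly.
rng = np.random.default_rng(0)
pulse = np.sin(np.linspace(0, 4 * np.pi, 64))
x = np.concatenate([np.zeros(10), pulse]) + 0.05 * rng.standard_normal(74)
y = np.concatenate([pulse, np.zeros(10)]) + 0.05 * rng.standard_normal(74)
print(matched_filter_kernel(x, y))  # close to 1 despite the shift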
Yet another simple idea is to model the data as a simple Gaussian
distribution (where this modeling is appropriate) and to use the
Bhattacharyya distance induced by that Gaussian to define a kernel. A simple
extension is to treat an entire image as pixels drawn i.i.d. from an
underlying distribution; the distance between two images is then the
distance between the two PDFs from which their pixels are drawn. This idea
was used in the bag-of-pixels kernel in Risi Kondor's paper at last year's
ICML/COLT.
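For the Gaussian case there is a closed form: the Bhattacharyya coefficient
integral sqrt(p(x) q(x)) dx between two Gaussians equals exp(-D_B), where
D_B is the standard Bhattacharyya distance. Here is a minimal sketch of the
bag-of-pixels idea along these lines (mine; the construction in the actual
paper is more careful), which fits a Gaussian to each image's pixel values
and uses this coefficient as the kernel:

import numpy as np

def bhattacharyya_kernel(pix1, pix2, reg=1e-6):
    """Bhattacharyya-coefficient kernel between two 'bags of pixels'.
    pix1, pix2: (n_pixels, d) arrays (e.g. d=3 for RGB values).
    Each bag is modeled as a Gaussian fit to its pixels; the kernel is
    exp(-D_B), the Bhattacharyya coefficient between the two Gaussians."""
    def fit(pix):
        mu = pix.mean(axis=0)
        cov = np.cov(pix, rowvar=False) + reg * np.eye(pix.shape[1])
        return mu, cov
    mu1, S1 = fit(pix1)
    mu2, S2 = fit(pix2)
    S = 0.5 * (S1 + S2)
    diff = mu1 - mu2
    db = (0.125 * diff @ np.linalg.solve(S, diff)
          + 0.5 * np.log(np.linalg.det(S)
                         / np.sqrt(np.linalg.det(S1) * np.linalg.det(S2))))
    return np.exp(-db)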
You can come up with your own kernels the same way: take a standard signal
processing idea for how you would model the data if no SVMs were around, and
it will naturally lead you to a good kernel for an SVM. In these cases, the
SVM provides a discriminative training framework, and the original models
(which used to be trained generatively) provide good feature spaces for it.
I'm actually not a huge fan of the SVM here, since a better alternative is
to find a discriminative training procedure for the old probabilistic model
itself rather than artificially overlaying an SVM on top of the old
generative model; but the SVM route is a quick fix. Doing things right (from
first principles) requires taking the old generative model and training it
by maximizing the conditional likelihood. Often this can be done quite
easily using ideas such as those in this paper:
http://web.media.mit.edu/~jebara/bounds/
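To illustrate what "maximizing the conditional likelihood" means for a
simple generative model (a generic sketch of mine, not the method of the
paper above): take class-conditional unit-variance Gaussians with equal
priors and, instead of fitting the means by maximum likelihood on p(x, y),
fit them by maximizing sum_i log p(y_i | x_i) directly:

import numpy as np
from scipy.optimize import minimize

def neg_cond_loglik(means_flat, X, y, n_classes):
    """Negative conditional log-likelihood -sum_i log p(y_i | x_i) for
    class-conditional unit-variance Gaussians with equal priors;
    p(y | x) is a softmax over -0.5 * ||x - mu_y||^2."""
    mus = means_flat.reshape(n_classes, X.shape[1])
    logits = -0.5 * ((X[:, None, :] - mus[None, :, :]) ** 2).sum(-1)
    log_post = logits - np.logaddexp.reduce(logits, axis=1, keepdims=True)
    return -log_post[np.arange(len(y)), y].sum()

# Toy data: two overlapping Gaussian classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(1.5, 1, (100, 2))])
y = np.repeat([0, 1], 100)

# Generative fit: per-class sample means (maximizes the joint likelihood).
gen_mus = np.stack([X[y == k].mean(0) for k in (0, 1)])

# Discriminative fit: maximize the conditional likelihood instead.
res = minimize(neg_cond_loglik, gen_mus.ravel(), args=(X, y, 2))
disc_mus = res.x.reshape(2, 2)

The model is still the old generative one; only the training objective has
changed, which is the point.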
I repeat: there is no alternative to thinking hard about your own specific
application and writing good probabilistic models that describe how your
data are generated, or that describe how to measure "similarity"
(effectively providing you with a kernel). I pointed to the above only as
examples; the best choices for your own specific problem are likely to be
different, which is why at the end of the day you are going to write a paper
about your research. If all we (as a community) did was plug the SVM into
our data without thinking about whether its assumptions apply, we probably
wouldn't have much reason to write it up as a research paper that we want
our colleagues to read! For example, the SVM assumes the samples are
independent of each other, which is clearly not true if your samples have
some systematic structure; similarly, if your features (from a single
sample) have statistical structure and are not simply independent variables,
then you should attempt to model that structure explicitly.
Balaji