Proceedings of ALTSS/ALTW, Melbourne, December 2003

SVMs and kernel methods in NLP

Jim Hogan, QUT


ABSTRACT:

Support vector approaches have been around since the mid-1990s, initially as a binary classification technique, with later extensions to regression and multiclass classification. At the core of the approach is the idea of structural risk minimisation, a principled technique for selecting a model that minimises a bound on the generalisation error. As a result of its success in controlling model capacity, and of the availability of remarkably fast quadratic programming approaches to training, the technique has been widely adopted across a variety of applications.
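To make the role of capacity control concrete, the sketch below trains a soft-margin linear SVM on a small, hypothetical two-dimensional dataset. It minimises the primal objective (lam/2)*||w||^2 plus the mean hinge loss by plain full-batch subgradient descent. This is a deliberately simple stand-in for the quadratic programming solvers mentioned above, not one of them; the ||w||^2 term is the part that controls capacity by favouring wide margins. The function names, the data and the values of `lam`, `lr` and `epochs` are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(
    X: List[List[float]], y: List[int],
    lam: float = 0.01, lr: float = 0.1, epochs: int = 200,
) -> Tuple[List[float], float]:
    """Full-batch subgradient descent on
    (lam/2)*||w||^2 + mean_i max(0, 1 - y_i*(w.x_i + b))."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the capacity (margin) term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1.0:  # inside the margin: hinge loss is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: List[float], b: float, x: List[float]) -> int:
    return 1 if dot(w, x) + b >= 0 else -1


# Hypothetical, linearly separable toy data with labels in {+1, -1}.
X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print(w, b, [predict(w, b, x) for x in X])
```

On this separable toy set every training point ends up on the correct side of the decision boundary. For real work a dedicated QP-based solver such as SVM-light would be used instead.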

Within the SV framework, similarity between patterns is defined through kernel functions, usually some generalisation of the scalar product for real vectors. Kernel functions can often be tailored to a particular problem domain, and string, syllable and tree-structure kernels are particularly important in NLP. Moreover, for the class of functions known as Mercer kernels, it is possible to obtain the benefits of transforming to a higher dimensional feature space without ever leaving the original pattern space: the kernel computes the feature-space scalar product directly from the original patterns. This property is shared by the three most common kernels: the linear, polynomial and radial basis function kernels.
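The three common kernels can each be written in a line or two. The sketch below also checks the "never leaving the pattern space" claim for one case: for two-dimensional inputs, the homogeneous degree-2 polynomial kernel (x.z)^2 equals an ordinary scalar product after the explicit map phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2). The kernel therefore gives the three-dimensional feature-space result while only touching the original two-dimensional vectors. The parameter names and defaults (`degree`, `c`, `gamma`) are assumptions for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def linear_kernel(x: Vector, z: Vector) -> float:
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x: Vector, z: Vector, degree: int = 2, c: float = 0.0) -> float:
    return (linear_kernel(x, z) + c) ** degree


def rbf_kernel(x: Vector, z: Vector, gamma: float = 0.5) -> float:
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def phi_quadratic(x: Vector) -> List[float]:
    """Explicit feature map for the homogeneous degree-2 kernel in 2-D."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]


x, z = [1.0, 2.0], [3.0, -1.0]
implicit = polynomial_kernel(x, z, degree=2)                  # stays in the 2-D pattern space
explicit = linear_kernel(phi_quadratic(x), phi_quadratic(z))  # works in the 3-D feature space
print(implicit, explicit, rbf_kernel(x, z))
```

Both `implicit` and `explicit` evaluate to 1, up to floating-point rounding in the explicit version. For the RBF kernel the corresponding feature space is infinite-dimensional, so the implicit route is the only practical one.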

This course begins with a detailed, but accessible, introduction to the theory of the SV approach, before considering in turn a variety of NLP applications and the kernels which underpin their success. These will include text mining, topic spotting, authorship attribution, tagging and specialised structural analysis in both NLP and bioinformatics. While much of our focus will be upon developments in specialised string kernels, we will also highlight the success of the 'vanilla' approaches, and the key role of scaling in ensuring adequate discrimination.
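Since the abstract stresses scaling, here is a minimal sketch of one common choice: min-max scaling of each feature to [-1, 1] using statistics from the training set only, paired with an RBF kernel to show why it matters. The two features (document length in words and the relative frequency of some function word) and their values are hypothetical, chosen for a stylometry-flavoured example. Unscaled, the length feature swamps the distance and the kernel value is essentially zero. After scaling, both features contribute.

```python
import math
from typing import List, Tuple


def fit_minmax(X: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature minimum and maximum, computed on training data only."""
    cols = list(zip(*X))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_minmax(X, lo, hi, a: float = -1.0, b: float = 1.0) -> List[List[float]]:
    """Linearly map each feature from [lo, hi] onto [a, b]."""
    scaled = []
    for row in X:
        new_row = []
        for v, mn, mx in zip(row, lo, hi):
            if mx == mn:  # constant feature: carries no information
                new_row.append(0.0)
            else:
                new_row.append(a + (b - a) * (v - mn) / (mx - mn))
        scaled.append(new_row)
    return scaled


def rbf(x, z, gamma: float = 0.5) -> float:
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(x, z)))


# Hypothetical documents: [length in words, relative frequency of a function word].
train = [[1200.0, 0.02], [1250.0, 0.09], [3000.0, 0.03]]
lo, hi = fit_minmax(train)
scaled = apply_minmax(train, lo, hi)

raw_k = rbf(train[0], train[1])       # dominated by the 50-word length gap
scaled_k = rbf(scaled[0], scaled[1])  # the frequency difference now counts too
print(raw_k, scaled_k)
```

The same `lo` and `hi` fitted on the training data should be reused for any test documents, whose scaled values may then fall slightly outside [-1, 1].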

BIO:

Jim Hogan is a senior lecturer in QUT's School of Software Engineering and Data Communications, where, among other things, he works on machine learning problems in bioinformatics (SVMs for location of regulatory regions), NLP (authorship and cohort analysis, spatial semantics) and vision (SVM face classification, Bayesian top-down visual attention). [http://sky.fit.qut.edu.au/~hogan/]

RESOURCES:

Day One - Lectures

Lecture 1: The course introduction
Lecture 2: The theory of the SVM

Day One - Resources and Exercises

Most of the papers and books mentioned in the lectures are accessible from the kernel machines home page. Some are nevertheless listed separately due to their importance.
The Kernel Machines Home Pages
The Cristianini and Shawe-Taylor book pages
The Schölkopf and Smola book pages
The Burges Tutorial (citeseer)
The Schölkopf Tutorial (MS Research)
The Schölkopf and Smola Regression Tutorial
Software: The SVM-light home pages
The first day's exercise guide

Day Two - Lectures

Lecture 3: NLP-Specific Kernels

Day Two - Resources and Exercises

As before, most of the papers and books mentioned in the lectures are accessible from the kernel machines home page. Some are nevertheless listed separately due to their importance. Most of the papers listed focus on string and structure kernels.
Text Classification using String Kernels (2002) (citeseer) Lodhi, Saunders, Cristianini, Shawe-Taylor, Watkins
Syllables and other String Kernel Extensions (citeseer) Saunders, Tschach, Shawe-Taylor
Latent Semantic Kernels (2001) (citeseer) Cristianini
Convolution Kernels for Natural Language (citeseer) Collins, Duffy
New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures ... (citeseer) Collins, Duffy
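The string subsequence kernel of Lodhi et al. compares two strings by the length-n subsequences they share, weighting each occurrence by lam raised to the number of characters it spans, so gappy matches count for less. The sketch below implements the recursive formulation of that kernel with memoisation, in its simple O(n|s||t|^2) form rather than the paper's faster O(n|s||t|) refinement, and checks it against a brute-force enumeration of the feature space. The function names and the decay value lam = 0.5 are assumptions for illustration.

```python
import math
from functools import lru_cache
from itertools import combinations


def ssk(s: str, t: str, n: int, lam: float) -> float:
    """Gap-weighted subsequence kernel K_n(s, t) via the recursive formulation."""

    @lru_cache(maxsize=None)
    def k_prime(i: int, ls: int, lt: int) -> float:
        # Auxiliary K'_i on the prefixes s[:ls] and t[:lt].
        if i == 0:
            return 1.0
        if min(ls, lt) < i:
            return 0.0
        x = s[ls - 1]  # last character of the s prefix
        total = lam * k_prime(i, ls - 1, lt)
        for j in range(lt):  # 0-based positions in t[:lt] that match x
            if t[j] == x:
                total += k_prime(i - 1, ls - 1, j) * lam ** (lt - j + 1)
        return total

    @lru_cache(maxsize=None)
    def k(ls: int, lt: int) -> float:
        # K_n on the prefixes s[:ls] and t[:lt].
        if min(ls, lt) < n:
            return 0.0
        x = s[ls - 1]
        total = k(ls - 1, lt)
        for j in range(lt):
            if t[j] == x:
                total += k_prime(n - 1, ls - 1, j) * lam ** 2
        return total

    return k(len(s), len(t))


def normalised_ssk(s: str, t: str, n: int, lam: float) -> float:
    return ssk(s, t, n, lam) / math.sqrt(ssk(s, s, n, lam) * ssk(t, t, n, lam))


def ssk_bruteforce(s: str, t: str, n: int, lam: float) -> float:
    """Explicit feature-space version for checking: phi_u(w) sums lam**span."""

    def features(w: str) -> dict:
        phi: dict = {}
        for idx in combinations(range(len(w)), n):
            u = "".join(w[i] for i in idx)
            phi[u] = phi.get(u, 0.0) + lam ** (idx[-1] - idx[0] + 1)
        return phi

    fs, ft = features(s), features(t)
    return sum(v * ft.get(u, 0.0) for u, v in fs.items())


lam = 0.5
print(ssk("cat", "car", 2, lam), ssk_bruteforce("cat", "car", 2, lam))
```

For "cat" and "car" with n = 2, the only shared subsequence is "ca", contiguous in both, so the kernel is lam^4 and the normalised value is 1 / (2 + lam^2).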

As a final demonstration of the utility of this approach, we offer a bioinformatics oriented work on mismatch kernels, and some effective noun phrase chunking software.
Mismatch String Kernels for SVM Protein Classification (2002) Leslie, Eskin, Weston, Noble
Software: YamCha: Yet Another Multipurpose CHunk Annotator
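The (k, m)-mismatch kernel of Leslie et al. maps a sequence to counts over all length-k strings: every k-mer in the sequence contributes to each length-k string within Hamming distance m of it, and the kernel is the scalar product of those count vectors. The brute-force sketch below enumerates the mismatch neighbourhoods directly, rather than using the more efficient tree-based traversal the paper describes. It uses a four-letter DNA alphabet only to keep the example small, whereas the paper works with the 20-letter protein alphabet. All names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product


def mismatch_neighbourhood(kmer: str, alphabet: str, m: int) -> set:
    """All strings over `alphabet` within Hamming distance m of `kmer`."""
    result = set()
    for num in range(m + 1):
        for positions in combinations(range(len(kmer)), num):
            # At each chosen position, substitute any different letter.
            choices = [[c for c in alphabet if c != kmer[p]] for p in positions]
            for subs in product(*choices):
                chars = list(kmer)
                for p, c in zip(positions, subs):
                    chars[p] = c
                result.add("".join(chars))
    return result


def mismatch_features(seq: str, k: int, m: int, alphabet: str) -> Counter:
    phi: Counter = Counter()
    for i in range(len(seq) - k + 1):
        for beta in mismatch_neighbourhood(seq[i : i + k], alphabet, m):
            phi[beta] += 1
    return phi


def mismatch_kernel(x: str, y: str, k: int, m: int, alphabet: str) -> int:
    fx = mismatch_features(x, k, m, alphabet)
    fy = mismatch_features(y, k, m, alphabet)
    return sum(v * fy[beta] for beta, v in fx.items())


print(mismatch_kernel("AC", "AA", k=2, m=1, alphabet="ACGT"))
```

With m = 0 the kernel reduces to the k-spectrum kernel, which simply counts shared k-mers.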