Proceedings of ALTSS/ALTW, Melbourne, December 2003
Support vector (SV) approaches have been around since the mid-1990s. They began as a binary classification technique and were later extended to regression and multiclass classification. At their core is the idea of structural risk minimisation, a principled technique for selecting the model that minimises a bound on the generalisation error. The technique has been widely adopted across a variety of applications for two reasons: it controls model capacity well, and remarkably fast quadratic programming methods are available for training.
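For reference, the quadratic program mentioned above can be written in the standard soft-margin (C-SVM) form for training pairs (x_i, y_i) with labels y_i in {-1, +1}. This is the textbook formulation, not necessarily the exact variant presented in the course. Minimising ||w||^2 maximises the margin 2/||w||, which is how capacity is controlled. The slack variables xi_i and the constant C trade margin width against training errors. The dual, which is what fast SV solvers usually work with, touches the training points only through their scalar products x_i . x_j.

```latex
% Primal soft-margin problem
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \quad \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0

% Equivalent dual quadratic program
\max_{\boldsymbol{\alpha}} \quad \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, (\mathbf{x}_i \cdot \mathbf{x}_j)
\quad \text{s.t.} \quad 0 \le \alpha_i \le C, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0
```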
Within the SV framework, similarity between patterns is defined through kernel functions, which are usually some generalisation of the scalar product for real vectors. Kernel functions can often be tailored to a particular problem domain, and string, syllable and tree-structure kernels are particularly important in NLP. For the class of functions known as Mercer kernels, it is even possible to obtain the benefits of mapping into a higher-dimensional feature space without ever leaving the original pattern space. This works because the kernel computes the feature-space scalar product directly. The three most common kernels, the linear, polynomial and radial basis function (RBF) kernels, are all Mercer kernels.
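The following sketch defines the three common kernels and checks the "without ever leaving the pattern space" property for a homogeneous degree-2 polynomial kernel on 2-D inputs. For that kernel the explicit feature map is phi(x1, x2) = (x1^2, x2^2, sqrt(2) x1 x2). The function names, the parameter defaults (`degree=2`, `c=0.0`, `gamma=0.5`) and the sample vectors are illustrative assumptions only.

```python
import math

def linear_kernel(x, z):
    """Plain scalar product <x, z>."""
    return sum(a * b for a, b in zip(x, z))

def poly_kernel(x, z, degree=2, c=0.0):
    """Polynomial kernel (<x, z> + c)^degree; c = 0 gives the homogeneous form."""
    return (linear_kernel(x, z) + c) ** degree

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian RBF kernel exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def phi_deg2(x):
    """Explicit feature map for the homogeneous degree-2 kernel on 2-D inputs:
    (x1, x2) -> (x1^2, x2^2, sqrt(2) * x1 * x2)."""
    x1, x2 = x
    return (x1 * x1, x2 * x2, math.sqrt(2) * x1 * x2)

x, z = (1.0, 2.0), (3.0, -1.0)
implicit = poly_kernel(x, z)                        # evaluated in the 2-D pattern space
explicit = linear_kernel(phi_deg2(x), phi_deg2(z))  # evaluated in the 3-D feature space
print(implicit, explicit)
```

The two printed values agree up to floating-point rounding. Expanding (x . z)^2 = (x1 z1)^2 + (x2 z2)^2 + 2 x1 x2 z1 z2 gives exactly phi(x) . phi(z). For higher degrees, or for the RBF kernel (whose feature space is infinite-dimensional), the explicit map quickly becomes impractical. The kernel evaluation, by contrast, stays cheap.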
This course begins with a detailed but accessible introduction to the theory of the SV approach. It then considers in turn a variety of NLP applications and the kernels that underpin their success. These will include text mining, topic spotting, authorship attribution, tagging, and specialised structural analysis in both NLP and bioinformatics. While much of our focus will be on developments in specialised string kernels, we will also highlight the success of the 'vanilla' approaches and the key role of scaling in ensuring adequate discrimination.
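String kernels compare sequences directly instead of fixed-length real vectors. One simple, well-known instance is the p-spectrum kernel, shown here for illustration only; it is not necessarily one of the kernels covered in the course. It counts the contiguous length-p substrings that two strings share. The sketch makes three assumptions: it uses character-level p-grams, it uses Python's `collections.Counter`, and its function names are hypothetical. The cosine normalisation at the end is one common kind of scaling applied to string kernels so that long documents do not dominate. The scaling discussed above may equally refer to rescaling the input features used with the vanilla kernels.

```python
import math
from collections import Counter

def spectrum_features(s, p=3):
    """Count every contiguous substring (p-gram) of length p in s."""
    return Counter(s[i:i + p] for i in range(len(s) - p + 1))

def spectrum_kernel(s, t, p=3):
    """p-spectrum kernel: scalar product of the p-gram count vectors of s and t."""
    fs, ft = spectrum_features(s, p), spectrum_features(t, p)
    # A missing key in a Counter reads as 0, so unshared p-grams contribute nothing.
    return sum(count * ft[gram] for gram, count in fs.items())

def normalised_spectrum_kernel(s, t, p=3):
    """Cosine-normalised kernel k(s, t) / sqrt(k(s, s) * k(t, t)), so k(s, s) = 1."""
    denom = math.sqrt(spectrum_kernel(s, s, p) * spectrum_kernel(t, t, p))
    return spectrum_kernel(s, t, p) / denom if denom else 0.0

print(spectrum_kernel("GATTACA", "TACATAC"))
print(normalised_spectrum_kernel("GATTACA", "TACATAC"))
```

With p = 3 the two sample strings share the trigrams TAC (once in the first string, twice in the second) and ACA (once in each), so the unnormalised kernel value is 1×2 + 1×1 = 3. Because the kernel is an explicit scalar product of count vectors, it is a valid Mercer kernel.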
Jim Hogan is a senior lecturer in QUT's School of Software Engineering and Data Communications. Among other things, he works on machine learning problems in bioinformatics (SVMs for locating regulatory regions), NLP (authorship and cohort analysis, spatial semantics) and vision (SVM face classification, Bayesian top-down visual attention). [http://sky.fit.qut.edu.au/~hogan/]