Proceedings of ALTSS/ALTW, Melbourne, December 2003

Validation and Evaluation in NLP and IR

David Powers, Flinders University


ABSTRACT:

So you're developing a Natural Language system? You've developed a model and want to train it up and prove how good it is, but how? The development and training of a model can be undertaken in many ways, and may be theoretically driven or empirically derived. It may involve statistical learning or neural networks. It may use a supervised or an unsupervised paradigm. In all cases there are pitfalls in training and testing the system, and many approaches to validation and evaluation lead to invalid or misleading comparisons with other approaches.

The first step to setting up a model is to correctly sample the target corpus and provide the appropriate number of datasets for the chosen development paradigm.

The second step is to ensure that appropriate manipulations of the raw data are performed systematically to ensure reproducible results from the algorithms employed.

The third step is to ensure that the output distributions from the model match the probability distribution of the target corpus or application.

The fourth step is to ensure that appropriate evaluation techniques are used to determine how well your system is doing compared to chance.
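As a rough illustration of the first and fourth steps, the sketch below splits a corpus reproducibly with a fixed random seed and then compares raw accuracy with a chance-corrected score. It is a minimal sketch under stated assumptions, not the method taught in the course: the 80/10/10 split, the function names, and the toy tag data are all hypothetical, and Cohen's kappa is used here only as one familiar example of a chance-corrected measure.

```python
import random
from collections import Counter


def split_corpus(items, seed=0, fractions=(0.8, 0.1, 0.1)):
    """Shuffle with a fixed seed, then cut into train/validation/test.

    The 80/10/10 default is an assumption chosen for illustration.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # same seed gives the same order
    n = len(shuffled)
    n_train = int(n * fractions[0])
    n_valid = int(n * fractions[1])
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])


def accuracy(gold, predicted):
    """Fraction of items where the prediction equals the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def cohen_kappa(gold, predicted):
    """Agreement corrected for the agreement expected by chance."""
    n = len(gold)
    observed = accuracy(gold, predicted)
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Chance agreement: product of the marginal label frequencies, summed.
    expected = sum(gold_counts[c] * pred_counts[c] for c in gold_counts) / (n * n)
    if expected == 1.0:
        return 0.0  # only one label occurs anywhere, so nothing to correct
    return (observed - expected) / (1.0 - expected)


# Toy data (hypothetical): 90 nouns and 10 verbs, tagged by a system
# that always guesses the majority class.
gold = ["N"] * 90 + ["V"] * 10
majority_guess = ["N"] * 100

train, valid, test = split_corpus(range(100), seed=42)
print(len(train), len(valid), len(test))
print(accuracy(gold, majority_guess), cohen_kappa(gold, majority_guess))
```

On this toy data the majority-class guesser reaches 90% accuracy, yet its kappa is zero, because it agrees with the gold labels no more often than chance predicts given the label frequencies. That gap is the kind of misleading comparison the fourth step warns about.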

This course will go through each of these stages and identify common mistakes and sneaky manipulations that lead to the publication of meaningless or misleading results.

BIO:

David Powers has been working in the area of Machine Learning of Natural Language for over 25 years, and has published over 100 papers as well as a monograph and a number of proceedings in the area. Powers organized the first events in MLNL in 1991 and founded SIGNLL in 1993 and CoNLL in 1997.

Currently Powers is Head of the AI Lab at Flinders University and supervises a dozen projects relating to the learning of natural language and ontology. These fall under two major research areas that make use of a range of learning, analysis and data fusion techniques: (1) the robot baby and the intelligent room (commercialized by I2Net), including audiovisual speech/speaker recognition/location, spelling/grammar checking, transcription of Asian languages, and brain/speech control of computers/devices; and (2) advanced web search and visualization (commercialized by YourAmigo), including search of the hidden web, syntactic and semantic tagging of web pages for more accurate search and ranking, and intuitive display of multidimensional data. [http://www.infoeng.flinders.edu.au/people/pages/powers_david/]

RESOURCES:

Lecture Slides

Readings

Links