Australasian Language Technology Summer School
SST 2004

ALTSS 2004 Program

Full courses will have four sessions of 1.5 hours each, usually distributed in two days. Half courses will have two sessions of 1.5 hours each. The program is scheduled so that introductory courses run in parallel with advanced courses.

                               Introductory Course          Advanced Course
Saturday 4 Dec   9:00-10:30    VoiceXML                     Information Retrieval
                 2:00-3:30     Speech Annotation with EMU   Multiword Expressions
Sunday 5 Dec     9:00-10:30    VoiceXML                     Information Retrieval
                 2:00-3:30     Speech Annotation with EMU   Multiword Expressions
Monday 6 Dec     9:00-10:30    Speech Processing            Maximum Entropy Modelling
                 2:00-3:30     Grammar Formalisms           Maximum Entropy Modelling
Tuesday 7 Dec    9:00-10:30    Speech Processing            Text Categorisation
                 2:00-3:30     Grammar Formalisms           Prosody and Intonation in Australian English

Course Details: Introductory Courses

VoiceXML

Rolf Schwitter - Macquarie University, Sydney

[Course Notes]

This course provides an introduction to VoiceXML for telephony-based spoken language dialog systems. VoiceXML is a markup language designed for creating audio dialogs between a user and a machine. It uses speech recognition and touch-tone for input and text-to-speech synthesis and pre-recorded audio for output. Any telephone can be used to access a VoiceXML application via a VoiceXML browser that is running on a voice server. VoiceXML relies on other markup languages for describing recognition grammars, speech synthesis, and call control constructs. This entire suite of markup languages is known as the W3C Speech Interface Framework. Apart from VoiceXML, we will briefly touch on the main components of this framework and introduce a number of freely available VoiceXML development tools so that the students can start building their own VoiceXML applications by the end of this course.

The course will cover the following topics:

  • Background: Introduction to Spoken Language Dialog Systems;
  • VoiceXML and the W3C Speech Interface Framework;
  • VoiceXML:
    • Dialogs, Forms, and Fields;
    • Development Tools;
    • Control Flow;
    • Grammars;
    • Scripting;
    • Mixed Initiative.

Note that this course will not focus on the speech recognition and speech synthesis processes themselves, but we will discuss related issues involved in building spoken language dialog systems, such as dialog and prompt design.
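As a taste of the markup involved, here is a minimal sketch that builds a one-field VoiceXML form using only the Python standard library. The element names (vxml, form, field, prompt) follow the VoiceXML 2.0 vocabulary; the form id, field name and prompt text are invented for illustration.

```python
# Sketch: build a minimal VoiceXML form with a single field, using only
# the Python standard library. Element names follow the VoiceXML 2.0
# vocabulary; the form id, field name and prompt text are invented.
import xml.etree.ElementTree as ET

vxml = ET.Element("vxml", version="2.0")
form = ET.SubElement(vxml, "form", id="greeting")
field = ET.SubElement(form, "field", name="city")
prompt = ET.SubElement(field, "prompt")
prompt.text = "Which city would you like the weather for?"

document = ET.tostring(vxml, encoding="unicode")
print(document)
```

A real application would also attach a grammar for the recogniser and submit the filled field to a voice server, topics the course treats in its Grammars and Control Flow sessions.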

Bio Rolf Schwitter is a Senior Lecturer in the Computing Department at Macquarie University and a member of the Centre for Language Technology at Macquarie. Rolf received his doctorate in Computational Linguistics from the University of Zurich in 1998. His teaching covers Web Technology, Spoken Language Dialog Systems, and Advanced Topics in Natural Language Processing. In the context of a Spoken Language Dialog Systems unit, he recently gave introductory lectures on VoiceXML at the University of Zurich, the University of Stockholm, and Macquarie University. His research interests focus on controlled natural language processing, question answering, knowledge representation and automatic reasoning. Personal page (external link)


Speech Annotation with EMU

Steve Cassidy - Macquarie University, Sydney

This course will provide a general introduction to collecting, annotating and working with speech data using the EMU Speech Database System and related tools. The course assumes a general familiarity with speech and a desire to put speech databases to work in some context; no particular knowledge of phonetics, prosody or other annotation systems is required.

The course will cover the following topics:

  • Recording speech signals
  • Making data available to EMU
  • Developing an annotation framework; writing a database template
  • Automating annotation tasks
  • EMU signal processing tools
  • Querying your annotations
  • Strategies for data analysis with R and other tools
  • Building on EMU -- what to do when EMU doesn't work for you

Students will carry out practical work as part of the course, involving recording some data, annotating it from scratch, and performing some simple analysis. If you have your own data, please bring it along on CD-ROM and we will try to help you work with it.

Bio Steve Cassidy is a computer scientist who has worked in various areas relating to language and cognition over the last 15 years. He completed a PhD in Wellington, New Zealand, on computer models of reading development and then moved to Macquarie University, Sydney, to work in the Speech Hearing and Language Research Centre (SHLRC). At SHLRC he worked on applying statistical models to problems in acoustic phonetics and on the development of the EMU Speech Database System. His work on EMU has led to an involvement with groups in the US and Europe who are aiming to define standards for linguistic annotation. Steve is now working in the Computing Department at Macquarie, where he is pursuing research in meeting-room speech processing and linguistic annotation. Steve is currently a member of the Executive of the Australian Speech Science and Technology Association. Personal page (external link)


Grammar Formalisms

Ash Asudeh - University of Canterbury, Christchurch

[Course Notes]

This course is an introduction to three grammar formalisms developed in theoretical linguistics that have clear applications to computational linguistics and natural language processing. The formalisms considered are Categorial Grammar (CG), Head-Driven Phrase Structure Grammar (HPSG), and Lexical Functional Grammar (LFG). The course presupposes no background in linguistic theory.

We will begin with an introduction to linguistics that gives students some background on modern perspectives in theoretical linguistics, goals of the field and resulting issues. The grammatical architectures of CG, HPSG, and LFG will then be introduced in relation to this background. We will then proceed to look at how these three theories address the following topics:

  1. Syntactic categories and basic combinatorics
  2. The role of the lexicon: heads, agreement, and complementation
  3. Modifiers

We will only have time to touch on each topic briefly. In each case we will concentrate on understanding the intuitions that the formalisms seek to capture, rather than details of analysis. The aim is to give students enough background that they can confidently further explore the formalisms on their own. The course will end with a quick demo of a grammar engineering environment for large-scale grammar development.
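To make the idea of syntactic categories and basic combinatorics concrete, here is a toy sketch (not from the course materials) of forward and backward application in categorial grammar; the three-word lexicon is invented for illustration.

```python
# Sketch: forward and backward application in a toy categorial grammar.
# Categories are strings like "NP", "S\\NP" (seeks an NP to its left) and
# "(S\\NP)/NP" (seeks an NP to its right). The tiny lexicon is invented.

def split_cat(cat, op):
    """Split an outermost functor category X/Y or X\\Y into (X, Y)."""
    depth = 0
    for i in range(len(cat) - 1, -1, -1):   # scan right-to-left at depth 0
        c = cat[i]
        if c == ")":
            depth += 1
        elif c == "(":
            depth -= 1
        elif c == op and depth == 0:
            result, arg = cat[:i], cat[i + 1:]
            return result.strip("()"), arg.strip("()")
    return None

def apply_pair(left, right):
    """Try forward application (X/Y Y -> X), then backward (Y X\\Y -> X)."""
    fwd = split_cat(left, "/")
    if fwd and fwd[1] == right:
        return fwd[0]
    bwd = split_cat(right, "\\")
    if bwd and bwd[1] == left:
        return bwd[0]
    return None

lexicon = {"Kim": "NP", "saw": "(S\\NP)/NP", "Sandy": "NP"}

# Derive "Kim saw Sandy": verb + object first, then subject.
vp = apply_pair(lexicon["saw"], lexicon["Sandy"])   # (S\NP)/NP + NP -> S\NP
s = apply_pair(lexicon["Kim"], vp)                  # NP + S\NP -> S
print(vp, s)
```

The same derivation is expressed quite differently in HPSG (typed feature structures) and LFG (c-structure plus f-structure), which is precisely the contrast the course draws out.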

Bio Ash Asudeh is a Fellow in the School of Classics and Linguistics at the University of Canterbury. He received an M.Phil. in cognitive science from the University of Edinburgh and a Ph.D. in linguistics from Stanford University. While at Stanford, he also worked on the constraint-based semantics project at Xerox PARC. His primary research interests are syntax, its relationship to semantics, and the implications of this relationship for linguistic theory and grammatical architecture. He has also worked on computational linguistics, psycholinguistics, and the syntax--phonology interface. Asudeh's current work focuses on applications to linguistic theory of resource logics developed in formal logic and theoretical computer science.


Speech Processing

David Grayden - The Bionic Ear Institute, Melbourne

[Course Notes]

This course is an introduction to the speech signal and how it is processed by humans and by machines. We begin with the production of speech, the properties of the acoustic signal and how it is perceived by humans. Then we look at the methods of analysing the speech signal. Speech signal analysis and human perception are tied together by looking at speech coding, in particular perceptual coding of sound using MPEG-1 psychoacoustic models, such as MP3. We touch on data embedding and watermarking and then look at automatic speech recognition in some detail. Finally there is an introduction to speech synthesis and areas of ongoing speech processing research.
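As a flavour of the signal-analysis side, the following sketch implements two standard first steps, pre-emphasis and short-time frame energy, in plain Python. The 8 kHz sine-wave "signal", the frame sizes and the filter coefficient are illustrative choices, not course material.

```python
# Sketch: two standard first steps in speech signal analysis --
# pre-emphasis and short-time (frame) energy -- using only the standard
# library. The "signal" here is a synthetic sine wave, not real speech.
import math

def pre_emphasis(signal, alpha=0.97):
    """High-pass filter y[n] = x[n] - alpha * x[n-1], boosting high frequencies."""
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def frame_energies(signal, frame_len=160, hop=80):
    """Sum of squares over overlapping frames (20 ms / 10 ms at 8 kHz)."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(sum(x * x for x in frame))
    return energies

# Synthetic 8 kHz signal: quiet first half, louder second half.
sr = 8000
signal = [0.1 * math.sin(2 * math.pi * 200 * n / sr) for n in range(sr // 2)]
signal += [0.8 * math.sin(2 * math.pi * 200 * n / sr) for n in range(sr // 2)]

emphasised = pre_emphasis(signal)
energies = frame_energies(emphasised)
# The louder half should yield clearly higher frame energies.
print(energies[0], energies[-1])
```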

Bio Dr David Grayden has been working as a Research Fellow at the Bionic Ear Institute in Melbourne since 1997. His main research involves examination of phoneme confusions made by people using cochlear implants with the view to designing strategies that will improve perception by the users. He is currently developing and evaluating a number of advanced sound processing strategies. He is also involved in other research areas, including automatic speech recognition and speech enhancement using auditory models, auditory physiology, integration of auditory and visual input, and models of spike-timing dependent plasticity for adaptive learning of spatiotemporal patterns. Personal page (external link)


Course Details: Advanced Courses

Multiword Expressions

Timothy Baldwin - University of Melbourne, Melbourne

Multiword expressions (MWEs) are word amalgams which are semantically, syntactically and/or statistically idiosyncratic in some way, and occur in a wide range of configurations including verbal idioms (e.g. "kick the bucket"), verb particle constructions (e.g. "throw up") and coordinate structures (e.g. "dull and boring"). In recent years, there has been increasing awareness in computational linguistics of the need for specialised methods to detect and capture the syntactic flexibility, semantic generalities and productivity of MWEs. In this course, I will document some of the difficulties posed by MWEs for real-world NLP applications, and outline a range of methods which have been proposed to tackle these issues. I will also describe crosslingual commonalities and divergences of MWEs, and devote some time to discussing their multilingual implications.
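As an illustration of the statistical side, a common baseline for finding MWE candidates is to rank bigrams by pointwise mutual information (PMI). The sketch below does this over an invented toy corpus; real extraction uses far larger corpora and additional filtering.

```python
# Sketch: ranking bigrams as multiword-expression candidates by pointwise
# mutual information (PMI), a common statistical baseline. The toy corpus
# is invented for illustration.
import math
from collections import Counter

corpus = ("he will kick the bucket . she will kick the ball . "
          "kick the bucket means die . he kicked the red bucket .").split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
n = len(corpus)

def pmi(w1, w2):
    """log2( P(w1,w2) / (P(w1) P(w2)) ), estimated from corpus counts."""
    p_joint = bigrams[(w1, w2)] / (n - 1)
    p1, p2 = unigrams[w1] / n, unigrams[w2] / n
    return math.log2(p_joint / (p1 * p2))

# Bigrams whose parts co-occur more than chance predicts rank highest.
ranked = sorted(bigrams, key=lambda b: pmi(*b), reverse=True)
print(ranked[:3])
```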

Bio Timothy Baldwin is a Senior Lecturer in the Department of Computer Science and Software Engineering at the University of Melbourne (effective September, 2004), and also a Senior Researcher in the CSLI LinGO Laboratory, Stanford University. His research interests are in the extraction and syntactico-semantic classification of multiword expressions, and also machine translation, computational lexical semantics, the interface between theoretical and computational linguistics, and computer-assisted language learning applications for computational linguistics. Personal page (external link)


Information Retrieval

Mark Sanderson - University of Sheffield, Sheffield, UK

[Course Notes]

Across the four sessions, the field of Information Retrieval (IR) will be introduced. The workings of a traditional IR system will be covered, as well as an overview of Web search. In addition, the evaluation of IR systems will be described. One session will focus on the retrieval of documents written in different languages, covering the user needs for such technology, the use of translation resources, and interface design for such cross-language retrieval systems. Finally, retrieval of speech documents will also be covered, featuring a demonstration of a working system and a discussion of why speech recognition systems now operate at a level of accuracy sufficient to allow almost perfect document retrieval.
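To make the "traditional IR system" concrete, the sketch below implements its core, TF-IDF term weighting with cosine-similarity ranking, over an invented three-document collection; it is an illustration, not part of the course materials.

```python
# Sketch: the core of a traditional IR system -- TF-IDF term weighting
# with cosine-similarity ranking -- using only the standard library.
# The toy document collection and query are invented for illustration.
import math
from collections import Counter

docs = {
    "d1": "the cat sat on the mat",
    "d2": "the dog chased the cat",
    "d3": "information retrieval ranks documents",
}

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    norm = (math.sqrt(sum(x * x for x in u.values())) *
            math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

# Document frequency of each term, then a TF-IDF vector per document.
df = Counter(t for text in docs.values() for t in set(text.split()))
vectors = {}
for d, text in docs.items():
    tf = Counter(text.split())
    vectors[d] = {t: tf[t] * math.log(len(docs) / df[t]) for t in tf}

def search(query):
    """Rank documents by cosine similarity to the TF-IDF query vector."""
    tf = Counter(t for t in query.split() if t in df)  # ignore unseen terms
    q = {t: tf[t] * math.log(len(docs) / df[t]) for t in tf}
    return sorted(docs, key=lambda d: cosine(q, vectors[d]), reverse=True)

print(search("cat mat"))
```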

Bio Mark Sanderson has been a senior lecturer in the Information Studies department at the University of Sheffield since 1999, where he has taught Information Retrieval and Advanced Information Retrieval. He ran the introductory IR tutorial at ACM SIGIR 2000 and 2001. He is on the editorial boards of ACM TOIS (Transactions on Information Systems), JASIST (Journal of the American Society for Information Science and Technology), IP&M (Information Processing and Management), and IR (Information Retrieval), and is also on the TREC advisory panel. Prior to his current post in Sheffield, he was a research assistant for four years, working first at the University of Glasgow and then at the Center for Intelligent Information Retrieval at the University of Massachusetts. His PhD, on word sense disambiguation, was carried out at the University of Glasgow. He is currently co-investigator on two EU-funded projects: SPIRIT, researching geographic-based retrieval, and BRICKS, a 7 million Euro Integrated Project exploring digital library provision to Europe's cultural heritage community. Personal page (external link)


Maximum Entropy Modelling

James Curran - University of Sydney, Sydney

[Course Notes]

This course will provide a detailed introduction to Maximum Entropy (maxent) modelling for Natural Language Processing. The course assumes only familiarity with basic probability and statistics, but will include a quick refresher of the necessary background. It aims to give a strong intuitive understanding of maxent modelling which will allow students to use maxent models effectively, but will also cover some of the deeper mathematics.

The course will cover:

  • Necessary probability and statistics refresher
    • Statistical modelling
    • Naive Bayes models
  • Information theory concepts: information, entropy
  • Maximum entropy models
    • Features and constraints
    • Training algorithms (GIS, IIS, conjugate gradient, ...)
  • Sequence modelling with maxent
  • Recent advances in maxent models:
    • Smoothing techniques
    • Conditional random fields
  • Applications of maximum entropy models in NLP:
    • Classification tasks: pp-attachment, question classification, ...
    • Tagging tasks: POS tagging, chunking, named entity recognition, ...
    • Parsing: C&C CCG parser

Bio James Curran is an ARC Postdoctoral Fellow in the Language Technology Research Group in the School of Information Technologies at the University of Sydney. He has just returned to Australia after completing his Ph.D. in computational lexical semantics at the University of Edinburgh.

His ARC-funded project, 'Ask the Net: Intelligent Natural Language Learning', involves automatically asking contributors simple questions via email; the answers will be collected to create annotated data for standard NLP problems, e.g. named entity recognition. An interesting challenge is finding ways of eliciting linguistic knowledge from those without linguistic training. His other research interests range from standard statistical NLP problems such as tagging and parsing through to system building, such as question answering systems.


Text Categorisation (half course)

Prof. Jon David Patrick - University of Sydney, Sydney

[Course Notes]

This course will cover the principal topics involved in creating a working text categorisation system. It will focus on the components of such a system and the processes required to create it, based on the practical experience of the Scamseek project. The role of computational linguistics will be the centre of the discussion, but the surrounding tasks of language modelling, machine learning and software engineering will all be discussed to varying degrees.

The course will be grounded in the experience of implementing the Scamseek system. Scamseek has automated the identification of financial scams over a wide range of Internet texts to a high level of accuracy. The first system, for detecting scams on web pages, has been operational since September 2003, and in its first days of operation it discovered cases that have since gone to prosecution. The complete system, covering all Internet traffic, has been operational since June 2004. These systems are unique in that they use a linguistic model for computing meaning via Systemic Functional Grammar. This model of language meaning also addresses some of the problems of identifying very small target samples in very large corpora, that is, texts with a <1% footprint in a corpus. Discussion of some aspects of the Scamseek project is restricted under secrecy agreements with ASIC.
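Scamseek's Systemic Functional Grammar model is not public, so as a generic illustration of what a text categoriser's core looks like, the sketch below implements a naive Bayes baseline with add-one smoothing; the toy training data are invented.

```python
# Sketch: a generic text categorisation baseline -- naive Bayes with
# add-one smoothing. Scamseek's Systemic Functional Grammar model is not
# public, so this shows only the general shape of a categoriser; the toy
# training data are invented for illustration.
import math
from collections import Counter, defaultdict

def train_nb(examples):
    class_counts = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        for tok in text.split():
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab

def classify(text, class_counts, word_counts, vocab):
    """Pick the label maximising log P(label) + sum log P(token | label)."""
    total = sum(class_counts.values())
    best, best_lp = None, -math.inf
    for label, c in class_counts.items():
        lp = math.log(c / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in text.split():
            if tok in vocab:   # ignore words never seen in training
                lp += math.log((word_counts[label][tok] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

examples = [("guaranteed high returns act now", "scam"),
            ("double your money risk free", "scam"),
            ("quarterly report for shareholders", "legit"),
            ("annual general meeting agenda", "legit")]
model = train_nb(examples)
print(classify("risk free guaranteed returns", *model))
```

A rare-class setting like Scamseek's (a <1% footprint) would additionally demand careful handling of class imbalance and evaluation by precision and recall on the scam class rather than raw accuracy.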

Bio Professor Jon Patrick currently holds the Chair of Language Technology at the University of Sydney. He has five degrees and is also a registered psychologist. His early research was in developing information systems for the real-time capture of language descriptions of human behaviour; in this work he created the first systems for recording human behaviour through real-time verbal descriptions, applied to many sports such as Rugby, AFL, waterpolo and surfing. In the late 1980s he created the first systems for the automatic capture and on-screen presentation of player statistics in real-time for television broadcasts. In later research he concentrated particularly on the use of subliminal language and its effectiveness at influencing personal and group behaviour. He continues this work in research on the identification of tacit knowledge in IS development through language usage analysis, and on the nature of language in psychotherapy. He collaborates with computational linguists at the University of the Basque Country and has published the first substantial student grammar of Basque. He is currently acting as the Director of the Scamseek project, a scam detection system developed for ASIC by the Capital Markets Co-operative Research Centre (CMCRC). Personal page (external link)


Prosody and Intonation in Australian English (half course)

Janet Fletcher - University of Melbourne, Melbourne

[Course Notes]

This course provides a practical introduction to the study of prosody and intonation in English. The first part of the course will give a brief introduction to current prosodic theory. The second and major part will focus on practical aspects of the widely used E-ToBI (English Tones and Break Indices) model of intonational analysis. The primary objective is to teach participants how to interpret the acoustic properties of speech relevant to understanding the higher levels of prosodic structure, and to provide a hands-on approach to ToBI transcription.

Bio Janet Fletcher is an Associate Professor of Linguistics in the School of Languages at the University of Melbourne. She completed her PhD in experimental phonetics at the University of Reading, and was a research associate in the Centre for Speech Technology Research at the University of Edinburgh from 1986 to 1988. After two years of postdoctoral study with Mary Beckman at Ohio State University, she worked at the Speech Hearing and Language Research Centre, Macquarie University, on an industry-funded project on speech synthesis. She has been at the University of Melbourne since 1993. Her research interests include intonation and prosody in Australian English, and the segmental/prosody interface in Northern Australian Indigenous languages.



For any comments or questions about these pages please contact the ALTA secretary.

Copyright 2003 ALTA. Last updated: Fri, 26 Nov 2004 02:44:20 +0000