Proceedings of ALTSS/ALTW, Melbourne, December 2003
In this course we will introduce two active areas of Language Technology: information extraction and question answering. Both are key areas for tasks that require the recovery of specific information from text documents. Because increasingly large volumes of text are now stored in digital form (e.g. on the World Wide Web), a growing number of organisations and companies are becoming interested in applications from these areas. Information Extraction (IE) systems populate databases with specific information extracted from text documents. IE systems typically operate in closed domains (e.g. news of terrorist attacks), and the type of information to be extracted is predetermined by the system administrator (e.g. the nature of the attack, the perpetrator, the time, the location, and the effect of the attack). In contrast, Question Answering (QA) systems return the answers to arbitrary questions asked in a human language by searching through the source documents. Here the type of information to be found is not predetermined, and the source documents may belong either to closed domains (e.g. a computer manual) or to open domains (e.g. the World Wide Web). Both information extraction and question answering systems use an array of technologies that will be explored in this course. Topics covered include document retrieval, named-entity recognition, question classification, linguistic resources, and logical inference. Each topic will be introduced and its application to information extraction and question answering will be discussed.
Diego Mollá is a lecturer in the Centre for Language Technology at Macquarie University in Sydney, Australia. His research focuses on bridging the gap between theoretical linguistics, especially semantics and logical forms, and practical natural language processing applications. His current projects centre on AnswerFinder, a question-answering system. He received an MSc in speech and language processing and a PhD in the formal semantics of aspectual composition from the University of Edinburgh. He is currently secretary of the Australasian Language Technology Association. His teaching duties in Macquarie University's undergraduate Language Technology program include a 3rd-year unit in intelligent text processing and an Honours unit in question answering. [http://www.ics.mq.edu.au/gen/person/diego.html]