Australasian Language Technology Workshop 2007
Dr Sophia Ananiadou, University of Manchester
Title: Text mining techniques for building a Biolexicon
Abstract: My talk will focus on building a biolexicon by leveraging existing bio-resources, combining them within a common, standardised lexical, terminological and conceptual representation framework, and employing advanced NLP technologies to discover new terms, concepts, relations and lexical information from text. In particular, I will discuss term normalisation techniques, named entity recognition and smart dictionary look-up. This research forms part of the National Centre for Text Mining (www.nactem.ac.uk) and the BOOTStrep project.
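To give a feel for the term normalisation and dictionary look-up mentioned above, here is a minimal illustrative sketch (not NaCTeM's actual implementation): surface variants of a biomedical term are mapped to one canonical form (case, hyphenation, whitespace) before the dictionary is consulted. The lexicon entries are hypothetical.

```python
import re

def normalise_term(term):
    """Map surface variants (case, hyphens, extra spaces) to one canonical form.
    A toy sketch only; real biomedical term normalisation also has to handle
    spelling variants, Greek letters and morphological variation."""
    t = term.lower()
    t = re.sub(r"[-/]", " ", t)          # treat hyphens and slashes as spaces
    t = re.sub(r"\s+", " ", t).strip()   # collapse runs of whitespace
    return t

# A toy dictionary keyed by normalised form (hypothetical entries).
lexicon = {normalise_term(t): t for t in ["NF-kappa B", "interleukin-2"]}

def lookup(surface):
    """Return the canonical dictionary entry for a surface form, or None."""
    return lexicon.get(normalise_term(surface))

print(lookup("NF kappa  B"))    # matches the entry "NF-kappa B"
print(lookup("Interleukin 2"))  # matches the entry "interleukin-2"
```

The point of keying the dictionary by normalised form is that many distinct surface strings collapse onto one look-up key, which is what makes "smart" dictionary look-up tractable at scale.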
Bio: Sophia Ananiadou is Reader in Text Mining in Computer Science and Deputy Director of the National Centre for Text Mining (NaCTeM). She has worked in NLP since 1983 and is the main developer of the terminology management services provided by NaCTeM. Her current research includes building bio-lexica and bio-ontologies for gene regulation (FP6 IST, BOOTStrep), text-mining-based visualisation of the provenance of biochemical networks (REFINE), text mining for systematic reviews (ASSERT) and the integration of Life Science data (ONDEX). She is the recipient of the 2004 Daiwa Adrian Prize (jointly with Prof. Tsujii) for her research in knowledge mining for biology and, in 2006, of the IBM UIMA Innovation Award for leading work on the interoperability of text-mining (TM) tools.
Nick Thieberger, PARADISEC, University of Melbourne / University of Hawai'i at Manoa
Title: Does language technology offer anything to small languages?
Abstract: The effort currently going into recording the smaller and perhaps more endangered languages of the world may result in computationally tractable documents in those languages, but to date there has not been a tradition of corpus creation for these languages. In this talk I will outline the language situation of Australia's neighbouring region and discuss methods currently used in language documentation, observing that it is quite difficult to get linguists to create reusable records of the languages they record, let alone to create marked-up corpora. I will highlight the importance of creating shared infrastructure to support our work, including the development of the Pacific And Regional Archive for Digital Sources in Endangered Cultures (PARADISEC), a facility for the curation of linguistic data.
Bio: Nick Thieberger studied French and linguistics at La Trobe University and submitted his MA thesis, Aboriginal Language Maintenance: Some Issues and Strategies, in 1988. He worked at the School of Australian Linguistics in Batchelor and then at the Institute of Applied Aboriginal Studies in Perth in the mid 1980s, before establishing Wangka Maya, the Pilbara Aboriginal language centre, in Port Hedland, supporting documentation of local Aboriginal languages. From 1991 to 1995 he worked at the Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS), developing an archive of electronic data related to Aboriginal studies. From 1995 to 1997 he was an Australian Volunteer at the Vanuatu Cultural Centre, where he began documenting the language of South Efate. He came to the University of Melbourne to take up PhD studies in 1998; his PhD dissertation was a grammatical description of that language, together with a corpus of linked video, audio and text files, glossed texts and a dictionary. He has since also worked as a consultant on Native Title issues and on several joint projects with AIATSIS. His research interests focus on language documentation, metadata for archiving linguistic data, computer tools for linguistic data management, and audio-text linkage for the representation of linguistic information. He took up an ARC Postdoctoral Fellowship in July 2004, and continues to work as Project Manager of the Pacific And Regional Archive for Digital Sources in Endangered Cultures (PARADISEC).
Professor Justin Zobel, NICTA
Title: Measures of Measurements: Robust Evaluation of Search Systems
Abstract: A good search system is one that helps a user to find useful documents. When building a new system, we hope, or hypothesise, that it will be more effective than existing alternatives. We apply a measure, which is often a drastic simplification, to establish whether the system is effective. Thus the ability of the system to help users and the measurement of this ability are only weakly connected, by assumptions that the researcher may not even be aware of. But how robust are these assumptions? If they are poor, is the research invalid? Such concerns apply not just to search, but to many other data-processing tasks. In this talk I introduce some of the recent developments in evaluation of search systems, and use these developments to examine some of the assumptions that underlie much of the research in this field.
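As an illustration of the kind of "drastic simplification" the abstract refers to (this sketch is mine, not from the talk): a standard effectiveness measure such as average precision collapses an entire ranked result list, and a user's experience of it, into a single number computed from binary relevance judgements. The document identifiers and judgements below are hypothetical.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked documents that were judged relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k

def average_precision(ranked, relevant):
    """Average of precision@k taken at each rank where a relevant
    document appears; unretrieved relevant documents count as zero."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

# One hypothetical query: a ranked list and a set of judged-relevant docs.
ranked = ["d3", "d1", "d7", "d2", "d5"]
relevant = {"d1", "d2", "d9"}

print(precision_at_k(ranked, relevant, 3))   # 1 relevant doc in top 3 -> 0.333...
print(average_precision(ranked, relevant))   # (1/2 + 2/4) / 3 -> 0.333...
```

Note how much the measure assumes: that relevance is binary, that the judgements are complete (d9 was never retrieved, so it silently drags the score down), and that one number per query captures usefulness. These are exactly the kinds of assumptions whose robustness the talk examines.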
Bio: Professor Justin Zobel is leading the Computing for Life Sciences initiative within National ICT Australia's Victorian Laboratory. He received his PhD from the University of Melbourne and for many years was based in the School of CS&IT at RMIT University, where he led the Search Engine group. He is an Editor-in-Chief of the International Journal of Information Retrieval, an associate editor of ACM Transactions on Information Systems and of Information Processing & Management, and was until recently Treasurer of ACM SIGIR. In the research community, he is best known for his role in the development of algorithms for efficient text retrieval. He is the author of "Writing for Computer Science" and his interests include search, bioinformatics, fundamental data structures, and research methods.
The Australasian Language Technology Workshop is being organised by ALTA, the Australasian Language Technology Association.
For any comments or questions about the workshop, please contact the organisers (workshop AT alta DOT asn DOT au).