Australasian Language Technology Workshop 2013

Tutorials at ALTA 2013

We are pleased to announce that ALTA 2013 will include pre-workshop tutorials on 4 December 2013. The schedule for these tutorials is available in the programme.

Working with the HCS vLab

Dominique Estival (University of Western Sydney) and Steve Cassidy (Macquarie University)

The Human Communication Science Virtual Laboratory (HCS vLab) will run a half-day tutorial. The HCS vLab provides online infrastructure for accessing human communication corpora (speech, text, music, sound, video, etc.) and for using specialised tools to search, analyse and annotate those data. The aims of the HCS vLab are to:

  • give the Australian and international HCS communities ready access to data and analysis tools;
  • afford new tool-corpus combinations and new emergent research;
  • allow analysis and annotation results to be stored and shared, thus promoting collaboration between institutions and disciplines;
  • improve scientific replicability by moving local, idiosyncratic desktop-based tools and data into an accessible cloud environment that standardises, defines and captures procedures and data outputs, so that research publications can be supported by re-runnable, re-usable data and code.
The HCS vLab is designed to make use of national infrastructure, including data storage, discovery and research computing services. It incorporates existing eResearch tools, adapted to work on shared infrastructure, together with a data-discovery interface that connects researchers with data sets. These components are orchestrated by a workflow engine with both web and command-line interfaces, so that the laboratory can be used by technical and non-technical researchers alike.
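As a sketch of how such an interface might be driven programmatically, the snippet below builds an authenticated request for a single corpus item. The base URL, path layout and `X-API-KEY` header are illustrative assumptions for this sketch, not the HCS vLab's documented API.

```python
# Sketch of programmatic corpus access. The endpoint path and the
# "X-API-KEY" header scheme are assumptions, not the actual HCS vLab API.

def build_item_request(base_url, collection, item_id, api_key):
    """Return the (url, headers) pair for fetching one corpus item."""
    url = "{}/catalog/{}/{}".format(base_url.rstrip("/"), collection, item_id)
    headers = {
        "X-API-KEY": api_key,          # per-user key issued by the lab (assumed)
        "Accept": "application/json",  # request machine-readable metadata
    }
    return url, headers

# Hypothetical host and collection names, for illustration only.
url, headers = build_item_request(
    "https://hcsvlab.example.org", "austalk", "item-123", "SECRET")
```

Keeping request construction in a small pure function like this makes it easy to test and to swap in whatever HTTP client a workflow script prefers.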

Applying Wikipedia as a machine-readable knowledge base

David Milne (CSIRO)

What if your search engine, recommender or clustering algorithm could consult Wikipedia as easily as we do, to understand more about the documents it encounters? This is not a far-fetched idea. While clearly intended for human readers, the raw structure of Wikipedia bears a striking resemblance to traditional knowledge bases and provides many footholds for algorithms to extract machine-readable knowledge. In this tutorial, we will work with Wikipedia to augment and enhance other textual information sources. The task is broken down into three key problems:

  • Extracting structured knowledge from Wikipedia;
  • Connecting it to textual documents; and
  • Allowing people to easily, effectively and intuitively tap into it while searching and browsing.

For each of these three problems we will provide live demonstrations and hands-on activities. For the extraction problem, we present an extremely large, thesaurus-like structure that has been automatically generated from Wikipedia, and show how machines can reason over it (in a rough fashion). For the connection task, we demonstrate an algorithm that automatically detects and disambiguates Wikipedia topics when they are mentioned in any textual document, and intelligently predicts those most likely to be of interest to the reader. For the final problem, we present several end-user applications that combine the work described above with slick visualisation techniques to provide enhanced browsing and searching experiences.
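The connection step can be illustrated with a toy version of anchor-based topic detection: scan the text for strings that Wikipedia editors have used as link anchors, and resolve each to the article it most often links to. The tiny anchor dictionary and the "pick the most frequently linked sense" rule below are simplifying assumptions for illustration; the algorithm demonstrated in the tutorial also considers how well candidate topics relate to the surrounding context.

```python
# Toy wikification sketch: find known anchor strings in a text and map
# each to its most frequently linked Wikipedia article ("commonness").
# The anchor-to-sense counts below are invented for illustration.
ANCHORS = {
    "jaguar": {"Jaguar": 120, "Jaguar Cars": 80},
    "big cat": {"Felidae": 40},
}

def detect_topics(text, anchors=ANCHORS):
    """Return sorted (anchor, best_article) pairs for anchors found in text."""
    lowered = text.lower()
    found = []
    for anchor, senses in anchors.items():
        if anchor in lowered:
            # Disambiguate by commonness: choose the most often linked sense.
            best = max(senses, key=senses.get)
            found.append((anchor, best))
    return sorted(found)

print(detect_topics("The jaguar is the largest big cat in the Americas."))
# → [('big cat', 'Felidae'), ('jaguar', 'Jaguar')]
```

A real system would build the anchor dictionary from Wikipedia's own link graph, which is what makes the encyclopedia's raw structure so useful as a machine-readable resource.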

All of the presented systems are open source and publicly available on the web.


© ALTA 2013. Workshop Organisers.