ALTA 2023 Shared Task Description

Language Technology Programming Competition 2023

2023 Shared Task Description

Basic Task Description

Background

The recent advances in Large Language Models (LLMs) represent a paradigm shift in human-computer interaction. However, like any groundbreaking technology, LLMs are a double-edged sword for society. Beyond spreading distorted news, the misuse of LLMs may give rise to a range of social and ethical problems, including academic misconduct and election manipulation. These risks underscore the growing urgency for the research community to develop strategies for detecting and scrutinizing synthetic text.

Goal

The goal of this task is to build automatic detection systems that can discriminate between human-authored text and synthetic text generated by LLMs. The synthetic text will come from a variety of sources, including different domains (e.g., law, medicine) and different LLMs (e.g., T5, GPT-X). Systems will be evaluated on their accuracy and robustness in detecting synthetic text.

Formally, this is a binary classification problem: each candidate sentence is either written by a human or generated by an LLM. Participants are encouraged to explore a wide range of methods and approaches when building their detection systems.
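To make the binary framing concrete, here is a minimal sketch of one possible baseline, not the organisers' method or a recommended system. It assumes a hypothetical label convention (0 = human-written, 1 = LLM-generated) and plain-text inputs; the real data format is defined on the competition website. The classifier is a swapped-in textbook technique, multinomial Naive Bayes over whitespace tokens with add-one smoothing, paired with the accuracy metric mentioned above. All names (`NaiveBayesDetector`, `accuracy`, `HUMAN`, `MACHINE`) are illustrative.

```python
import math
from collections import Counter

# Hypothetical label convention (assumption, not from the task data spec).
HUMAN, MACHINE = 0, 1


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokenisation, kept deliberately simple."""
    return text.lower().split()


class NaiveBayesDetector:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)
        if self.doc_counts[HUMAN] == 0 or self.doc_counts[MACHINE] == 0:
            raise ValueError("training data must contain both classes")
        # Per-class token frequencies.
        self.word_counts = {HUMAN: Counter(), MACHINE: Counter()}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[HUMAN]) | set(self.word_counts[MACHINE])
        self.totals = {c: sum(self.word_counts[c].values()) for c in (HUMAN, MACHINE)}
        return self

    def _log_score(self, tokens, c):
        # log P(c) + sum over tokens of log P(token | c), smoothed.
        n_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[c] / n_docs)
        denom = self.totals[c] + len(self.vocab)
        for tok in tokens:
            score += math.log((self.word_counts[c][tok] + 1) / denom)
        return score

    def predict(self, text):
        tokens = tokenize(text)
        return max((HUMAN, MACHINE), key=lambda c: self._log_score(tokens, c))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy illustration only: these sentences are invented, not task data.
train_texts = [
    "the cat sat on the mat",
    "i saw my friend at the market yesterday",
    "as an ai language model i cannot",
    "in conclusion it is important to note that",
]
train_labels = [HUMAN, HUMAN, MACHINE, MACHINE]
detector = NaiveBayesDetector().fit(train_texts, train_labels)
print(detector.predict("it is important to note"))  # 1 (MACHINE)
print(detector.predict("the cat saw the market"))   # 0 (HUMAN)
```

A bag-of-words model like this ignores word order and style, so it mainly serves as a sanity-check baseline; robustness across domains and generators would typically call for stronger features or fine-tuned models.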

Data Files and Submission

We will use CodaLab for this year's competition (ALTA Shared Task 2023). Details about the data formats and submission will be provided on the competition website.

Important Dates

Release of training data	On registration
Release of test data	27 September 2023
Deadline for submission of results over test data	3 October 2023
Notification of results	6 October 2023
Deadline for submission of system description	25 October 2023
Presentation of results at ALTA 2023	29 November - 1 December 2023