Language Technology Programming Competition 2023
2023 Shared Task Description
Basic Task Description
Recent advances in Large Language Models (LLMs) represent a paradigm shift in human-computer interaction. However, like any groundbreaking technology, LLMs are a double-edged sword for society. Beyond spreading misinformation, the misuse of LLMs could give rise to a range of social and ethical problems, including academic misconduct and election manipulation. These risks underscore the growing urgency within the research community to develop methods for detecting and scrutinizing synthetic text.
The goal of this task is to build automatic detection systems that can discriminate between human-authored text and synthetic text generated by LLMs. The synthetic text will come from a variety of sources, including different domains (e.g., law, medicine) and different LLMs (e.g., T5, GPT-X). Systems will be evaluated on their accuracy and robustness in detecting synthetic text.
Formally, this is a binary classification problem: each candidate sentence was written either by a human or by an LLM, and the system must predict which. Participants are encouraged to explore a wide range of methodologies and approaches when building their detection systems.
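The task does not prescribe any particular method, so the following is only a minimal sketch of the binary framing, not an official baseline or evaluation script. It assumes a hypothetical label convention (0 = human-authored, 1 = machine-generated), uses invented toy sentences rather than competition data, and stands in a multinomial Naive Bayes classifier over word unigrams as a deliberately simple detector. The `accuracy` helper mirrors the accuracy criterion mentioned above; the real data format and scoring are defined on the competition website.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical label convention (an assumption, not the official data format).
HUMAN, MACHINE = 0, 1
LABELS = (HUMAN, MACHINE)


def tokenize(text: str) -> List[str]:
    """Lowercased whitespace tokenization; deliberately simplistic."""
    return text.lower().split()


class NaiveBayesDetector:
    """Multinomial Naive Bayes over word unigrams with add-one smoothing."""

    def fit(self, texts: List[str], labels: List[int]) -> "NaiveBayesDetector":
        self.doc_counts = Counter(labels)
        self.word_counts: Dict[int, Counter] = {c: Counter() for c in LABELS}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        self.vocab_size = len(vocab)
        self.token_totals = {c: sum(self.word_counts[c].values()) for c in LABELS}
        self.n_docs = len(labels)
        return self

    def log_score(self, tokens: List[str], label: int) -> float:
        # Smoothed log prior: log((N_c + 1) / (N + number_of_classes))
        score = math.log((self.doc_counts[label] + 1) / (self.n_docs + len(LABELS)))
        denom = self.token_totals[label] + self.vocab_size
        for tok in tokens:
            # Add-one smoothed log likelihood of each token under this class
            score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def predict(self, text: str) -> int:
        tokens = tokenize(text)
        # Ties resolve to HUMAN because max() keeps the first maximal item
        return max(LABELS, key=lambda c: self.log_score(tokens, c))


def accuracy(gold: List[int], pred: List[int]) -> float:
    """Fraction of candidates whose predicted label matches the gold label."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# Invented toy examples, for illustration only.
train_texts = [
    "honestly i think the game last night was a mess lol",
    "cant believe the bus was late again this morning",
    "in conclusion it is important to note that the results are significant",
    "it is important to note that this approach offers several key benefits",
]
train_labels = [HUMAN, HUMAN, MACHINE, MACHINE]

detector = NaiveBayesDetector().fit(train_texts, train_labels)
test_texts = ["it is important to note the key benefits", "the bus was late lol"]
test_labels = [MACHINE, HUMAN]
preds = [detector.predict(t) for t in test_texts]
print(preds, accuracy(test_labels, preds))  # prints: [1, 0] 1.0
```

Naive Bayes is used here only because it is short and transparent; a competitive entry would more likely fine-tune a pretrained encoder, and the robustness criterion suggests testing any detector on domains and generators it was not trained on.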
Data Files and Submission
We will use CodaLab for this year's competition (ALTA Shared Task 2023). Details on the data formats and the submission procedure will be provided on the competition website.