Language Technology Programming Competition 2012
2012 Shared Task Description
Basic Task Description
July 18, 2012 | Version 1
The goal of this task is to build automatic sentence classifiers that map the content of biomedical abstracts onto a set of pre-defined categories used in Evidence-Based Medicine (EBM). EBM practitioners rely on specific criteria when judging whether a scientific article is relevant to a given question. They generally follow the PICO criteria: Population (P) (i.e., participants in a study); Intervention (I); Comparison (C) (if appropriate); and Outcome (O) (of an Intervention). Variations and extensions of this classification have been proposed; for this task we extend PICO by adding the classes Background (B) and Study Design (S), and by including a class for sentences that have no relevant content: Other (O). The goal is therefore to classify the provided sentences according to the PIBOSO schema. Such information could be leveraged in various ways, e.g., to improve search performance, to enable structured querying with specific categories, and to help users judge articles more quickly against the specified PIBOSO criteria.
To build classifiers, 800 expert-annotated training abstracts will be provided; the trained classifiers are then used to annotate 200 test abstracts with the relevant labels. This is a multi-label classification problem, since each sentence can have more than one label. The tagset consists of the six PIBOSO classes introduced above: Population, Intervention, Background, Outcome, Study Design, and Other.
More information about this problem, the construction of the dataset, and a benchmark can be found in Kim et al. (2011).
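The multi-label setup described above can be illustrated with a small sketch. It shows one common way to represent per-sentence label sets as 0/1 indicator vectors, so that a separate yes/no decision can be made for each class. The label strings, the helper names `encode` and `decode`, and the example sentences are assumptions made for illustration; they are not taken from the competition data or from Kim et al. (2011).

```python
# Label names follow the six PIBOSO classes named in the task description.
# The exact strings used in the competition data files are an assumption.
LABELS = [
    "population",
    "intervention",
    "background",
    "outcome",
    "study design",
    "other",
]


def encode(label_set):
    """Turn a set of labels into a 0/1 indicator vector ordered as LABELS."""
    unknown = set(label_set) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if label in label_set else 0 for label in LABELS]


def decode(vector):
    """Recover the set of labels from a 0/1 indicator vector."""
    return {label for label, flag in zip(LABELS, vector) if flag}


# Hypothetical annotated sentences; a sentence may carry several labels.
examples = [
    ("Asthma is a common chronic disease.", {"background"}),
    (
        "We randomised 120 adults to drug A or placebo.",
        {"population", "intervention", "study design"},
    ),
]

for sentence, labels in examples:
    print(encode(labels), sentence)
```

With this representation, each position in the vector can be predicted by its own binary classifier, which is one simple way (among several) to approach a multi-label problem.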
Data Files and Submission
We will use Kaggle in Class for this year's competition (look for the ALTA-NICTA Challenge). Details about data formats and submission will be provided on the competition website.