Language Technology Programming Competition 2011
2011 Shared Task Description
Basic Task Description
June 21, 2011 | Version 1
The basic task is to build an automatic evidence grading system for evidence-based medicine. Evidence-based medicine is a medical practice that requires practitioners to search the medical literature for evidence when making clinical decisions. Practitioners are also required to grade the quality of the extracted evidence on some chosen scale. The goal of the grading system is to automatically determine the grade of a piece of evidence given the article abstract(s) from which the evidence is extracted.
You will be provided with: (1) a set of training documents; (2) a set of development documents; and (3), closer to the submission deadline, a set of test documents. For the training and development sets, you will additionally have access to the evidence grades. For the test documents, the grades will not be provided.
The grading scale used for this task is the Strength of Recommendation Taxonomy (SORT). This taxonomy has three grades: A (strong), B (moderate), and C (weak). The grade of a piece of evidence depends on multiple factors; information about this grading scale can be found in the paper by Ebell et al. (2004).
The grades used for this task have been generated by medical experts. Your task is to implement a grading system based on the training and development datasets, and then to run it over the test documents to determine the grade of each piece of evidence.
Data Files and Format
The training and development sets will contain:
41711 B 10553790 15265350
The test set will contain:
41711 10553790 15265350
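The two example lines above suggest a whitespace-separated layout. The following is a minimal Python sketch of how such lines might be read. It assumes that the first field is an evidence identifier, that an optional second field is the SORT grade (A, B, or C), and that the remaining fields are identifiers of the source abstracts. This interpretation, along with the names `Record` and `parse_line`, is an assumption made for illustration and is not part of the official task description.

```python
from typing import List, NamedTuple, Optional

# The three SORT grades named in the task description.
GRADES = {"A", "B", "C"}


class Record(NamedTuple):
    evidence_id: str          # assumed: first field identifies the evidence
    grade: Optional[str]      # None for test-set lines, which carry no grade
    abstract_ids: List[str]   # assumed: remaining fields identify abstracts


def parse_line(line: str) -> Record:
    """Parse one whitespace-separated line of a training, development, or test file."""
    fields = line.split()
    if not fields:
        raise ValueError("empty line")
    evidence_id, rest = fields[0], fields[1:]
    # Training and development lines carry a grade in the second field.
    if rest and rest[0] in GRADES:
        return Record(evidence_id, rest[0], rest[1:])
    # Test lines go straight from the evidence identifier to the abstracts.
    return Record(evidence_id, None, rest)


if __name__ == "__main__":
    print(parse_line("41711 B 10553790 15265350"))
    print(parse_line("41711 10553790 15265350"))
```

Because the grade is detected by its value rather than by its position alone, the same function can read both graded and ungraded files under the stated assumptions.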
The results of your evidence grading system should be submitted in a single text file with each line containing:
The first few lines of a submission file may look as follows:
Here are the results of all participants who submitted a poster. Each participant was allowed to submit up to three runs. The evaluation measure is accuracy (the number of correct classifications divided by the total number of classifications). Since none of the participants obtained results that were statistically significantly better than the baseline, no prizes were awarded.
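As an illustration of the accuracy definition given above, the following is a minimal Python sketch. The function name `accuracy` and the toy grade lists are hypothetical and are not taken from the official evaluation script.

```python
from typing import Sequence


def accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Number of correct classifications divided by the total number of classifications."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    gold = ["A", "B", "B", "C"]
    constant = ["B"] * len(gold)     # a predictor that always outputs one grade
    print(accuracy(gold, constant))  # 0.5
```

On this toy data a constant predictor gets two of the four items right, so any system has to exceed that fraction to beat it.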
The baseline is a simple majority baseline that assigns class "B" to every item. The results with 5% confidence intervals are: