Language Technology Programming Competition 2016

Useful Information about the ALTA 2016 Shared Task

This page contains some links that you may find useful when attempting the task. More information may be added later on.

Related Papers

  1. Chisholm et al. (2016). Discovering entity knowledge bases on the web. In NAACL Workshop on Automated Knowledge Base Construction. [PDF]

Some Ideas

  • Train a classifier using features of the URL itself (e.g., cosine similarity between TF-IDF-weighted URL word vectors);
  • Derive features from the provided search result title and snippet information (e.g., cosine similarity between TF-IDF-weighted title word vectors);
  • Download the web pages and extract additional features from the full text or markup;
  • Use a large web crawl such as Common Crawl or ClueWeb to collect links to these URLs and build mention-context features.
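
The first two ideas rely on the same building block: a cosine similarity between TF-IDF-weighted word vectors. The following is a minimal sketch of that feature in plain Python, not an official baseline. The tokenisation rule, the smoothed IDF formula, and the helper names (tokenize, build_idf, tfidf_vector, cosine, text_similarity) are all assumptions made for illustration, and the example.com URLs used afterwards are hypothetical rather than taken from the task data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on runs of non-alphanumeric characters.

    Assumption: this simple rule is good enough for both URLs and titles.
    """
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def build_idf(token_lists):
    """Smoothed inverse document frequency computed over a list of token lists."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))  # count each term once per document
    return {
        term: math.log((1 + n_docs) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms missing from idf are ignored."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_u * norm_v)


def text_similarity(text_a, text_b, idf):
    """Feature value for one pair of strings (two URLs or two titles)."""
    return cosine(
        tfidf_vector(tokenize(text_a), idf),
        tfidf_vector(tokenize(text_b), idf),
    )
```

To use it, build the IDF table once from all URLs (or all titles) in the training data, for example idf = build_idf([tokenize(u) for u in urls]), and then call text_similarity on each pair. With hypothetical inputs such as "http://example.com/people/jane-smith" and "https://example.org/profile/jane_smith", the shared name tokens raise the score relative to an unrelated URL. The resulting number can be passed to any classifier as one feature among several.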

Feel free to post questions and comments on the Kaggle in Class competition page. To access the Kaggle in Class pages, you must first register for this shared task.


© ALTA 2016. Competition Organisers.