The goal in each task is the same: to replicate the results described in the target paper for that task. Specific resources and information are given below for each task. Participants may choose to take part in one, two, or all three tasks. Participants must also post their code to the CENTRE Bitbucket repository and follow the track guidelines for the notebook paper format.
Bitbucket repository
Bitbucket is basically a commercial GitHub, so if you are used to GitHub, Bitbucket will be very familiar. To request your repository, contact INFO HERE. Your code must be submitted to Bitbucket prior to the run submission deadline. For convenience, please put a tag in the repository indicating the submitted version of your code; otherwise, we will be forced to assume the latest commit prior to the deadline (not good for replicability!).
Notebook paper format
Notebook papers should follow this general outline:
- Specify the open-source system used, including its version.
- Identify algorithmic details and parameters obtained from the "target" group's TREC report.
- Identify details obtained from other sources, including other publications from the "target" group and personal communications
- Identify assumptions made.
- Identify departures from the "target" group's approach, for example due to missing proprietary data or missing/different system components.
- Lastly, analyze the performance of your runs vs. the "target" runs. Are results consistent across topics? Is there anything in common among documents found by the "target" run but not by the replication run? Etc.
Task 1: Clinical Decision Support 2016
The TREC Clinical Decision Support (CDS) track featured an ad hoc task where queries took the form of a patient note from a health record plus a query "goal": determine a diagnosis, identify a treatment, or propose a test. Systems search medical journal articles for information relevant to that patient's clinical situation.
Task 1 will attempt to reproduce the runs from the MERCKKGAA group from the TREC 2016 Clinical Decision Support track:
[Gurulingappa et al, 2016] Gurulingappa, H., Toldo, L., Schepers, C., Bauer, A., and Megaro, G. (2016) Semi-Supervised Information Retrieval System for Clinical Decision Support. In TREC 2016.
The MERCKKGAA group runs are built on a Solr base, but add pseudo-relevance feedback, query expansion using UMLS, and a supervised learning-to-rank model. Four runs are described in the paper, and participants in task 1 can choose to recreate two or more of them. The metrics for the CDS track were P@10, R-precision, inferred nDCG, and inferred AP.
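To make the pseudo-relevance feedback component concrete, here is a minimal, hypothetical sketch of Rocchio-style query expansion over the top-ranked documents from a first pass. This is not the MERCKKGAA implementation (which runs on Solr with UMLS expansion and learned ranking); the function name, raw term-frequency weighting, and parameter defaults are illustrative assumptions only.

```python
from collections import Counter

def rocchio_expand(query_terms, feedback_docs, alpha=1.0, beta=0.75, top_k=5):
    """Expand a query with the strongest terms from pseudo-relevant docs.

    feedback_docs: list of token lists (the top-ranked documents from a
    first-pass retrieval). Weights here are raw term frequencies averaged
    over the feedback set; a real system would use TF-IDF vectors and
    tuned alpha/beta.
    """
    # start from the original query terms, each weighted alpha
    weights = Counter({t: alpha for t in query_terms})
    centroid = Counter()
    for doc in feedback_docs:
        centroid.update(doc)
    n = max(len(feedback_docs), 1)
    # add beta times the mean feedback-document term frequency
    for term, freq in centroid.items():
        weights[term] += beta * freq / n
    # keep the original terms plus the top_k strongest new terms
    expansion = [t for t, _ in weights.most_common() if t not in query_terms]
    return list(query_terms) + expansion[:top_k]
```

For example, a query like ["heart", "pain"] run against feedback documents that frequently mention "aspirin" would come back with "aspirin" as its highest-weighted expansion term.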
Resources for this task:
Task 2: Web Track 2013
The Web Track in 2013 had an ad hoc web search task with an unusual measurement approach: risk sensitivity, which measures the tradeoff between improving a system's overall score and performing poorly on individual topics. We will be ignoring the risk-sensitivity portion of the task to focus on the traditional ad hoc task. The search topics were split into a head-query diversity ranking subset and a tail-query single-intent subset. The collection is ClueWeb12. Because the pools for this track were originally quite shallow, if resources allow we will pool CENTRE submissions and do new relevance judgments.
Task 2 will attempt to reproduce the runs from the University of Delaware (Fang) group from TREC Web Track 2013:
[Yang et al, 2013] Yang, P., and Fang, H. (2013). Evaluating the Effectiveness of Axiomatic Approaches in Web Track. In TREC 2013.
The University of Delaware group's system takes an initial retrieval from Indri and reranks it using models from the axiomatic approach to IR. Two runs are described in the paper, and participants in task 2 will need to attempt to recreate both of them.
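As a concrete illustration of the axiomatic approach, here is a sketch of one well-known axiomatic retrieval function, F2-EXP (Fang & Zhai). This is not the UDel system itself; the parameter defaults (s, k) and the plain-Python scoring loop are illustrative assumptions, and the actual runs combine Indri retrieval with reranking as described in the paper.

```python
from collections import Counter

def f2exp_score(query_terms, doc_terms, df, num_docs, avg_doc_len,
                s=0.5, k=0.35):
    """Illustrative F2-EXP axiomatic scoring of one document:
    sum over matched terms t of
        c(t,Q) * ((N+1)/df(t))^k * c(t,D) / (c(t,D) + s + s*|D|/avdl)

    df: dict mapping term -> document frequency; s and k are typical
    defaults, not necessarily the values used in the UDel runs.
    """
    q = Counter(query_terms)
    d = Counter(doc_terms)
    dlen = len(doc_terms)
    score = 0.0
    for term, qtf in q.items():
        tf = d.get(term, 0)
        if tf == 0 or term not in df:
            continue  # only terms shared by query and document contribute
        idf = ((num_docs + 1) / df[term]) ** k
        length_norm = tf / (tf + s + s * dlen / avg_doc_len)
        score += qtf * idf * length_norm
    return score
```

Note how the length normalization term grows with term frequency but saturates, one of the constraints the axiomatic framework imposes on retrieval functions.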
Resources for this task:
Task 3: Web Track 2014
The Web Track in 2014 featured an ad hoc task identical to the 2013 formulation of the task (the changes in the track were to the risk-sensitivity metrics), so see above.
Task 3 will attempt to reproduce runs from the Terrier team at the University of Glasgow:
[McCreadie et al, 2014] McCreadie, R., Deveaud, R., Albakour, M., Mackie, S., Limsopatham, N., Macdonald, C., Ounis, I., and Thonet, T. (2014). University of Glasgow at TREC 2014: Experiments with Terrier in Contextual Suggestion, Temporal Summarisation and Web Tracks. In TREC 2014.
The Glasgow group submitted three runs to the ad hoc task, each using learning-to-rank techniques that are implemented to some degree in released versions of Terrier.
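For orientation, the learning-to-rank pattern can be sketched as rescoring a first-pass candidate list with a learned combination of per-document features. This is a minimal pointwise linear sketch, not Terrier's implementation (which uses tree-based learners); the function name, feature representation, and weights are illustrative assumptions.

```python
def ltr_rerank(candidates, weights):
    """Rescore and re-sort first-pass candidates with learned weights.

    candidates: list of (docid, [feature, ...]) pairs, where features
    might be retrieval scores, link-based priors, etc.
    weights: a learned weight per feature (here taken as given; a real
    system would fit them on training queries).
    Returns (docid, score) pairs sorted by descending score.
    """
    scored = [(docid, sum(w * f for w, f in zip(weights, feats)))
              for docid, feats in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The reranking step only reorders the candidate set, so the quality of the first-pass retrieval bounds what learning-to-rank can recover.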
Resources for this task:
Submission requirements
Runs for all tasks must be submitted in the classic TREC format:
30 Q0 ZF08-175-870 0 4238 prise1
30 Q0 ZF08-306-044 1 4223 prise1
30 Q0 ZF09-477-757 2 4207 prise1
30 Q0 ZF08-312-422 3 4194 prise1
30 Q0 ZF08-013-262 4 4189 prise1
...
Field 1 is the topic ID. Field 2 is a literal "Q0". Field 3 is the document ID. Field 4 is the rank for this document, but it is not used for scoring in trec_eval. Field 5 is the score for this document for this topic, and is the sort field for evaluation. Field 6 is your run tag.
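A quick way to sanity-check your run file before submitting is to parse each line into its six fields. This is a hypothetical helper, not an official checker; it only enforces the field count, the literal "Q0", and numeric rank/score fields.

```python
def parse_trec_run_line(line):
    """Split one line of a TREC-format run into its six fields.

    Returns (topic, 'Q0', docid, rank, score, runtag). Raises ValueError
    if the line does not have exactly six whitespace-separated fields,
    if field 2 is not the literal 'Q0', or if rank/score are not numeric.
    """
    fields = line.split()
    if len(fields) != 6:
        raise ValueError("expected 6 fields, got %d" % len(fields))
    topic, q0, docid, rank, score, tag = fields
    if q0 != "Q0":
        raise ValueError("field 2 must be the literal 'Q0'")
    return topic, q0, docid, int(rank), float(score), tag
```

Running it over every line of the examples above would catch malformed lines before the submission deadline rather than after.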
Your run tag should follow the format [group-id]-[target-run]-[n], where [group-id] is your group ID that you registered with for TREC, [target-run] is the runtag of the run you are replicating, and [n] is a submission count. You can submit up to 3 replication sets for each task. That is, for task 1, the SNIT group could submit 'SNIT-MRKPrfNote-1', 'SNIT-MRKPrfNote-2', 'SNIT-MRKPrfNote-3', 'SNIT-MRKUmlsSolr-1', 'SNIT-MRKUmlsSolr-2' and 'SNIT-MRKUmlsSolr-3'.
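The run-tag convention can also be checked mechanically. The regular expression below is a hypothetical check, assuming group IDs and target runtags are alphanumeric (actual TREC runtags may contain other characters, in which case the pattern would need loosening); it enforces the [group-id]-[target-run]-[n] shape with n between 1 and 3.

```python
import re

# Illustrative pattern for [group-id]-[target-run]-[n]: two alphanumeric
# components joined by hyphens, ending in a submission count of 1-3.
RUNTAG_RE = re.compile(r"^[A-Za-z0-9]+-[A-Za-z0-9]+-[123]$")

def valid_runtag(tag):
    """Return True if tag matches the assumed run-tag convention."""
    return bool(RUNTAG_RE.match(tag))
```

For example, 'SNIT-MRKPrfNote-2' passes, while a tag with a missing count or a count above 3 does not.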
We will report the score range for replications; you can compute these scores yourself, since the relevance judgments and scoring scripts are linked above, with the possible wrinkle of new relevance judgments for Web track replications if resources allow. The real track result is the notebook paper.