CENTRE@TREC 2018

CLEF NTCIR TREC REproducibility

Aims and Scope

The goal of CENTRE@TREC 2018 is to run a joint CLEF/NTCIR/TREC task that challenges participants to reproduce the best results of the most interesting systems submitted to previous editions of CLEF/NTCIR/TREC, and to contribute back to the community the additional components and resources developed to reproduce those results.

The CENTRE@TREC 2018 track will target three replication scenarios:

  • Task 1 - Replicating runs from the Merck group's participation in the TREC 2016 Clinical Decision Support track.
  • Task 2 - Replicating runs from the University of Delaware (Fang) group in the TREC 2013 Web track.
  • Task 3 - Replicating runs from the University of Glasgow group in the TREC 2014 Web track.

Tasks 2 and 3 are also replication tasks in the CENTRE@CLEF edition of the track, which will offer some opportunities for crossover between TREC and CLEF.

Of particular interest in CENTRE@TREC is how the replication is to be reported. There will be a specified notebook paper format, and between the TREC meeting in November and the final TREC proceedings, participants in the track will review each other's papers for clarity.

If you have not already signed up to participate in TREC, you should sign up following the link below.

Sign Up

Important Dates

Runs for all tasks due from participants: September 24, 2018

TREC Notebook papers due: TBA (October sometime)

TREC 2018 conference: November 14-16, 2018

Guidelines

The goal in each task is the same: to replicate results described in the paper for that task. Specific resources and information are given below in each task. Participants may choose to take part in one, two or all three tasks. Participants must also post their code to the CENTRE Bitbucket repository, and follow the track guidelines for the format of the notebook paper.

Bitbucket repository

Bitbucket is essentially commercial GitHub, so if you are used to GitHub, Bitbucket will be very familiar. To request your repo, contact INFO HERE. Your code must be submitted to Bitbucket prior to the run submission deadline. For convenience, please put a tag in the repository indicating the submitted version of your code; otherwise, we will be forced to assume the latest commit prior to the deadline (not good for replicability!).

Notebook paper format

Notebook papers should follow this general outline:

Task 1: Clinical Decision Support 2016

The TREC Clinical Decision Support (CDS) track featured an adhoc task where queries took the form of a patient note from a health record together with a query "goal": determine a diagnosis, identify a treatment, or propose a test. Systems searched medical journal articles for information relevant to that patient's clinical situation.

Task 1 will attempt to reproduce the runs from the MERCKKGAA group from the TREC 2016 Clinical Decision Support track:
[Gurulingappa et al., 2016] Gurulingappa, H., Toldo, L., Schepers, C., Bauer, A., and Megaro, G. (2016). Semi-Supervised Information Retrieval System for Clinical Decision Support. In TREC 2016.
The MERCKKGAA group's runs are built on a Solr base, adding pseudo-relevance feedback, query expansion using UMLS, and a supervised learning-to-rank model. Four runs are described in the paper, and participants in Task 1 can choose to recreate two or more of them. The metrics for the CDS track were P@10, R-precision, inferred NDCG, and inferred AP.
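
To make the expansion step concrete, here is a minimal Python sketch of generic pseudo-relevance feedback query expansion. This is not the MERCKKGAA code: their system runs on Solr with UMLS-based expansion and a learning-to-rank model, and the helper names below (search, TOPIC) are hypothetical stand-ins.

from collections import Counter

def expand_query(query_terms, feedback_docs, n_terms=10):
    # Rocchio-style expansion: add the most frequent non-query terms
    # from the top-ranked (pseudo-relevant) documents to the query.
    counts = Counter()
    for doc in feedback_docs:                    # each doc is a list of tokens
        counts.update(t for t in doc if t not in query_terms)
    expansion = [term for term, _ in counts.most_common(n_terms)]
    return query_terms + expansion

# Hypothetical usage: retrieve, expand, retrieve again.
# first_pass = search(TOPIC, rows=10)            # top 10 docs as token lists
# expanded   = expand_query(TOPIC.split(), first_pass)
# final_run  = search(" ".join(expanded), rows=1000)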

Resources for this task:

Task 2: Web Track 2013

The Web Track in 2013 had an adhoc web search task with an unusual measurement approach: risk-sensitivity, which measures the tradeoff between improving a system's overall score and the risk of performing poorly on individual topics. We will be ignoring the risk-sensitivity portion of the task and focusing on the traditional adhoc task. The search topics were split into a head-query diversity-ranking subset and a tail-query single-intent subset. The collection is ClueWeb12. Because the pools for this track were originally quite shallow, if resources allow we will pool the CENTRE submissions and make new relevance judgments.

Task 2 will attempt to reproduce the runs from the University of Delaware (Fang) group from TREC Web Track 2013:
[Yang and Fang, 2013] Yang, P. and Fang, H. (2013). Evaluating the Effectiveness of Axiomatic Approaches in Web Track. In TREC 2013.
The University of Delaware group's system takes an initial retrieval from Indri and reranks it using models from the axiomatic approach to IR. Two runs are described in the paper, and participants in task 2 will need to attempt to recreate both of them.
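
As a rough illustration of the reranking step, the sketch below implements the F2EXP axiomatic retrieval function in plain Python. Treat it as a sketch under assumptions: the paper may use different axiomatic variants and parameter settings, and the collection statistics (df, N, avdl) are placeholders you would compute from your own index.

from collections import Counter

def f2exp_score(query_terms, doc_terms, df, N, avdl, s=0.5, k=0.35):
    # F2EXP: sum over matched terms of
    #   c(t,Q) * (N / df(t))^k * c(t,D) / (c(t,D) + s + s * |D| / avdl)
    q_tf = Counter(query_terms)
    d_tf = Counter(doc_terms)
    dl = len(doc_terms)
    score = 0.0
    for term, ctq in q_tf.items():
        ctd = d_tf.get(term, 0)
        if ctd == 0 or df.get(term, 0) == 0:
            continue
        score += ctq * (N / df[term]) ** k * ctd / (ctd + s + s * dl / avdl)
    return score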

Resources for this task:

Task 3: Web Track 2014

The Web Track in 2014 featured an adhoc task identical to the 2013 formulation of the task (the changes in the track concerned the risk-sensitivity metrics), so see the Task 2 description above.

Task 3 will attempt to reproduce runs from the Terrier team at the University of Glasgow:
[McCreadie et al., 2014] McCreadie, R., Deveaud, R., Albakour, M., Mackie, S., Limsopatham, N., Macdonald, C., Ounis, I., and Thonet, T. (2014). University of Glasgow at TREC 2014: Experiments with Terrier in Contextual Suggestion, Temporal Summarisation and Web Tracks. In TREC 2014.

The Glasgow group submitted three runs to the adhoc task, each using learning-to-rank techniques that are implemented to some degree in released versions of Terrier.
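
For orientation only, here is a minimal pointwise learning-to-rank sketch using scikit-learn gradient-boosted trees. This is not the Glasgow pipeline (which relies on the learning-to-rank support shipped with Terrier); the feature matrices below are random stand-ins for the per query-document features and relevance labels you would extract yourself.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
train_X = rng.random((200, 5))           # 200 query-document pairs, 5 features
train_y = rng.integers(0, 3, 200)        # graded relevance labels 0..2
test_X = rng.random((50, 5))             # candidate documents for one topic

model = GradientBoostingRegressor(n_estimators=100)
model.fit(train_X, train_y)
scores = model.predict(test_X)
ranking = np.argsort(-scores)            # rerank by descending predicted score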

Resources for this task:

Submission requirements

Runs for all tasks must be submitted in the classic TREC format:


30 Q0 ZF08-175-870  0 4238 prise1
30 Q0 ZF08-306-044  1 4223 prise1
30 Q0 ZF09-477-757  2 4207 prise1
30 Q0 ZF08-312-422  3 4194 prise1
30 Q0 ZF08-013-262  4 4189 prise1
...
				  
Field 1 is the topic ID. Field 2 is a literal "Q0". Field 3 is the document ID. Field 4 is the rank for this document, but it is not used for scoring in trec_eval. Field 5 is the score for this document for this topic, and is the sort field for evaluation. Field 6 is your run tag.

Your run tag should follow the format [group-id]-[target-run]-[n], where [group-id] is your group ID that you registered with for TREC, [target-run] is the runtag of the run you are replicating, and [n] is a submission count. You can submit up to 3 replication sets for each task. That is, for task 1, the SNIT group could submit 'SNIT-MRKPrfNote-1', 'SNIT-MRKPrfNote-2', 'SNIT-MRKPrfNote-3', 'SNIT-MRKUmlsSolr-1', 'SNIT-MRKUmlsSolr-2' and 'SNIT-MRKUmlsSolr-3'.
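
As a sketch, the following Python snippet writes a run file in this format with a correctly formed run tag; the results dictionary is a hypothetical stand-in for your system's output.

run_tag = "SNIT-MRKPrfNote-1"            # [group-id]-[target-run]-[n]

results = {                              # topic ID -> list of (doc ID, score)
    "30": [("ZF08-175-870", 4238.0), ("ZF08-306-044", 4223.0)],
}

with open(run_tag + ".txt", "w") as out:
    for topic_id, ranked in results.items():
        # sort by descending score; the rank field is informational only
        for rank, (doc_id, score) in enumerate(sorted(ranked, key=lambda r: -r[1])):
            out.write(f"{topic_id} Q0 {doc_id} {rank} {score} {run_tag}\n")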

We will report the score ranges for the replications. You will be able to compute these scores yourself, since the relevance judgments and scoring scripts are linked above; the one possible wrinkle is new relevance judgments for the Web track replications, if resources allow. The real track result is the notebook paper.

Organizers

Nicola Ferro, University of Padua, Italy
ferro@dei.unipd.it

Tetsuya Sakai, Waseda University, Japan
tetsuyasakai@acm.org

Ian Soboroff, National Institute of Standards and Technology (NIST), US
ian.soboroff@nist.gov