CENTRE@CLEF 2019

CLEF NTCIR TREC REproducibility

Aims and Scope

The goal of CENTRE@CLEF 2019 is to run a joint CLEF/NTCIR/TREC task challenging participants to reproduce the best results of the most interesting systems submitted to previous editions of CLEF/NTCIR/TREC, and to contribute back to the community the additional components and resources developed to reproduce those results.

The CENTRE@CLEF 2019 lab will offer the following three tasks:

  • Task 1 - Replicability: the task will focus on the replicability of selected methods on the same experimental collections;
  • Task 2 - Reproducibility: the task will focus on the reproducibility of selected methods on different experimental collections;
  • Task 3 - Generalizability: the task will focus on collection performance prediction; the goal is to rank (sub-)collections on the basis of the performance expected over them.

Since CENTRE is a joint CLEF/NTCIR/TREC activity, participants in Tasks 1 and 2 will be challenged to reproduce methods and systems developed across all three evaluation campaigns, i.e. CENTRE@CLEF will challenge participants with NTCIR and TREC results in addition to CLEF results.

Furthermore, CENTRE@CLEF 2019 collaborates with the Open-Source IR Replicability Challenge (OSIRRC) at SIGIR 2019. Thus, participants in CENTRE@CLEF 2019 can consider submitting their runs to OSIRRC 2019 as well.

To participate in CENTRE@CLEF 2019, groups need to register (by April 26) at the following link:

Sign Up

Important Dates

Registration closes: April 26, 2019

Runs due from participants: May 10, 2019

Submission of participant papers: May 24, 2019

Notification of acceptance: June 14, 2019

Camera ready due: June 29, 2019

CLEF 2019 conference: September 09-12, 2019

Tasks Description

For Task 1 - Replicability and Task 2 - Reproducibility we selected the following papers from TREC Common Core Track 2017 and 2018:

The following table reports, for each paper, the name of the runs to be replicated and/or reproduced and the datasets and topics to be used for the replicability and reproducibility tasks.

Paper | Run Name | Replicability Task | Reproducibility Task
[Grossman et al, 2017] | WCrobust04 and WCrobust0405 | New York Times Annotated Corpus, with TREC 2017 Common Core Topics | TREC Washington Post Corpus, with TREC 2018 Common Core Topics
[Benham et al, 2018] | RMITFDA4 and RMITEXTGIGADA5 | TREC Washington Post Corpus, with TREC 2018 Common Core Topics | New York Times Annotated Corpus, with TREC 2017 Common Core Topics

CENTRE@CLEF 2019 teams up with OSIRRC for Task 1 and Task 2; runs for these tasks can therefore be submitted both to CENTRE@CLEF 2019 and to OSIRRC. For further information, please have a look at the submission guidelines below.

Task 3 - Generalizability is a new task and will work as follows:

  1. Training: participants need to run plain BM25 and, if they wish, also their own system on the test collection used for the TREC 2004 Robust Track (they are allowed to use the corpus, topics, and qrels). Participants need to identify features of the corpus and topics that allow them to predict the system score in terms of Average Precision (AP).
  2. Validation: participants can use the test collection of the TREC 2017 Common Core Track (corpus, topics, and qrels) to validate their method and determine which set of features represents the best choice for predicting the AP score of each system. Note that the TREC 2017 Common Core Track topics are an updated version of the TREC 2004 Robust Track topics.
  3. Test (submission): participants need to use the test collection of the TREC 2018 Common Core Track (only corpus and topics). Note that the TREC 2018 Common Core Track topics are a mix of "old" and "new" topics, where the old topics were used in the TREC 2017 Common Core Track. Participants will submit a run for each system (BM25 and their own system) and, for each system, an additional file with the AP score predicted for each topic. The predicted score can be a single value or a value with a corresponding confidence interval; a rough sketch of this prediction step is given after this list.
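
As a rough illustration of what the prediction step could look like, the following Python sketch trains a simple regression model on per-topic features and writes one predicted AP value per topic. The feature choices, file names, topic texts, and output format are illustrative assumptions, not task requirements; the per-topic AP values used for training are assumed to come from trec_eval output.

# Hypothetical sketch for Task 3: predict a per-topic AP score for the test
# collection from simple topic/corpus features. Feature choices, file names,
# and topic texts below are illustrative assumptions, not task requirements.
import csv
import numpy as np
from sklearn.linear_model import LinearRegression

def topic_features(topic_text, corpus_doc_count):
    """Toy per-topic features: query length, distinct terms, log corpus size."""
    terms = topic_text.lower().split()
    return [len(terms), len(set(terms)), np.log(corpus_doc_count)]

def load_per_topic_ap(path):
    """Parse per-topic AP values from `trec_eval -q -m map qrels run` output."""
    ap = {}
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) == 3 and fields[0] == "map" and fields[1] != "all":
                ap[fields[1]] = float(fields[2])
    return ap

# Training on Robust 2004 (qrels available); topic texts are placeholders.
train_topics = {"301": "international organized crime",
                "302": "poliomyelitis and post polio"}
train_ap = load_per_topic_ap("bm25_robust04.eval")        # hypothetical file
X = [topic_features(text, 528_000) for tid, text in train_topics.items() if tid in train_ap]
y = [train_ap[tid] for tid in train_topics if tid in train_ap]
model = LinearRegression().fit(X, y)

# Test on Common Core 2018 (no qrels): write one predicted AP value per topic.
test_topics = {"321": "women in parliaments", "347": "wildlife extinction"}
with open("myteam_task3_bm25_predictions.csv", "w", newline="") as out:
    writer = csv.writer(out)
    for tid, text in test_topics.items():
        pred = float(model.predict([topic_features(text, 608_180)])[0])
        writer.writerow([tid, f"{pred:.4f}"])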

Corpora:

The New York Times Annotated Corpus contains over 1.8 million articles written and published by the New York Times between January 1, 1987 and June 19, 2007. The text in this corpus is formatted in News Industry Text Format (NITF), which is an XML specification that provides a standardized representation for the content and structure of discrete news articles. The dataset is available here.
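
For orientation only, the following minimal sketch extracts a headline and the body text from a single NITF file with Python's standard XML parser. The element names used (hedline, hl1, p) follow the general NITF layout and should be double-checked against the actual corpus files; the file name is a placeholder.

# Minimal sketch: pull headline and body text out of one NITF (XML) article.
# Element names follow the usual NITF layout (hedline/hl1, block/p); verify
# them against the actual NYT Annotated Corpus files before relying on this.
import xml.etree.ElementTree as ET

def parse_nitf(path):
    root = ET.parse(path).getroot()
    hl = root.find(".//hedline/hl1")
    headline = hl.text.strip() if hl is not None and hl.text else ""
    paragraphs = [p.text.strip() for p in root.iter("p") if p.text]
    return headline, " ".join(paragraphs)

headline, body = parse_nitf("1815754.xml")   # hypothetical file name
print(headline, body[:200])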

The TREC Washington Post Corpus contains 608,180 news articles and blog posts from January 2012 through August 2017. The articles are stored in JSON format, and include title, byline, date of publication, kicker (a section header), article text broken into paragraphs, and links to embedded images and multimedia. The dataset is available here.
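
The corpus is typically shipped as one JSON object per line; the sketch below joins the paragraph blocks of each article into a single text string. The field names (id, title, contents, content) and the file name are assumptions about that layout and should be verified against the actual distribution.

# Minimal sketch: read TREC Washington Post articles from a JSON-lines file
# and join the paragraph blocks into one text string per article.
# Field names ("id", "title", "contents", "content") are assumptions about
# the usual distribution format; verify them against the actual corpus.
import json

def iter_wapo(path):
    with open(path, encoding="utf-8") as f:
        for line in f:
            doc = json.loads(line)
            paragraphs = [
                block.get("content", "")
                for block in (doc.get("contents") or [])
                if isinstance(block, dict) and isinstance(block.get("content"), str)
            ]
            yield doc.get("id"), doc.get("title", ""), " ".join(paragraphs)

for doc_id, title, text in iter_wapo("WashingtonPost.jl"):   # hypothetical file name
    print(doc_id, title)
    break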

The TREC 2004 Robust Corpus corresponds to the set of documents on TREC disks 4 and 5, minus the Congressional Record. This document set contains approximately 528,000 documents. The dataset is available here.

Topics and Qrels:

Submission Guidelines

Participating teams should satisfy the following guidelines:

TREC Format:

Runs should be submitted with the following format:


30 Q0 ZF08-175-870  0 4238 prise1
30 Q0 ZF08-306-044  1 4223 prise1
30 Q0 ZF09-477-757  2 4207 prise1
30 Q0 ZF08-312-422  3 4194 prise1
30 Q0 ZF08-013-262  4 4189 prise1
...
where the six whitespace-separated columns are: the topic identifier, the literal string Q0, the document identifier, the rank of the document, the score assigned by the system, and the run tag. It is important to include all the columns and to use a whitespace delimiter between the columns.
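
As a small illustration, the following sketch writes a ranked result list in this format; the run tag, topic and document identifiers, scores, and file name are placeholders.

# Minimal sketch: write a ranked list in TREC run format
# (topic, "Q0", document id, rank, score, run tag), whitespace-separated.
def write_trec_run(results, run_tag, path):
    """results: dict mapping topic id -> list of (doc_id, score), best first."""
    with open(path, "w") as out:
        for topic, ranking in results.items():
            for rank, (doc_id, score) in enumerate(ranking):
                out.write(f"{topic} Q0 {doc_id} {rank} {score:.4f} {run_tag}\n")

# Hypothetical usage with placeholder document ids and scores:
write_trec_run({"30": [("ZF08-175-870", 4238.0), ("ZF08-306-044", 4223.0)]},
               "prise1", "myteam_task1_prise1_example.txt")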

Submission Upload:

Runs should be uploaded to the Bitbucket repository provided by the organizers. Participants need to create a folder named submission containing two folders named official and unofficial. Each sub-folder will contain three folders, one for each task, named task1, task2, and task3. Thus, a run submitted for replicability should be included in submission/official/task1.

Runs should be uploaded with the following naming convention: <teamname>_<task>_<system>_<freefield>, where teamname is the name of the participating team, task is task1, task2, or task3 depending on the task, system is the name of the reproduced/replicated run or of the system used for Task 3, and freefield is a free field that participants can use as they prefer.
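
As a hypothetical example of the resulting layout, the sketch below builds the target path for a run file according to the convention above; the team name, system, and free-field values are placeholders.

# Minimal sketch of the expected repository layout and file naming convention.
# All concrete values (team name, system, free field) are placeholders.
from pathlib import Path

def run_path(team, task, system, freefield, official=True):
    status = "official" if official else "unofficial"
    name = f"{team}_{task}_{system}_{freefield}"
    return Path("submission") / status / task / name

path = run_path("myteam", "task1", "WCrobust04", "rerun")
path.parent.mkdir(parents=True, exist_ok=True)   # creates submission/official/task1/
print(path)                                      # submission/official/task1/myteam_task1_WCrobust04_rerun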

Finally, CENTRE@CLEF 2019 is teaming up with the Open-Source IR Replicability Challenge (OSIRRC) at SIGIR 2019. Participating groups can consider submitting their runs (for Task 1 and Task 2) both to CENTRE@CLEF 2019 and to OSIRRC 2019; the latter venue requires the runs to be submitted as Docker images. For further information, please have a look at the OSIRRC website, and contact the OSIRRC organizers before submitting CENTRE runs to the OSIRRC challenge.

Evaluation:

The quality of the submitted runs will be evaluated with different measures depending on the task:

Organizers

Nicola Ferro, University of Padua, Italy
ferro@dei.unipd.it

Norbert Fuhr, University of Duisburg-Essen, Germany
norbert.fuhr@uni-due.de

Maria Maistro, University of Padua, Italy
maistro@dei.unipd.it
and University of Copenhagen, Denmark
mm@di.ku.dk

Tetsuya Sakai, Waseda University, Japan
tetsuyasakai@acm.org

Ian Soboroff, National Institute of Standards and Technology (NIST), US
ian.soboroff@nist.gov