CENTRE@CLEF 2019

CLEF NTCIR TREC REproducibility

Aims and Scope

The goal of CENTRE@CLEF 2019 is to run a joint CLEF/NTCIR/TREC task challenging participants to reproduce the best results of the most interesting systems submitted to previous editions of CLEF/NTCIR/TREC, and to contribute back to the community the additional components and resources developed to reproduce those results.

The CENTRE@CLEF 2019 lab will offer the following three tasks:

  • Task 1 - Replicability: the task will focus on the replicability of selected methods on the same experimental collections;
  • Task 2 - Reproducibility: the task will focus on the reproducibility of selected methods on different experimental collections;
  • Task 3 - Generalizability: the task will focus on collection performance prediction; the goal is to rank (sub-)collections on the basis of the performance expected over them.

Since CENTRE is a joint CLEF/NTCIR/TREC activity, participants in Tasks 1 and 2 will be challenged to reproduce methods and systems developed across all three evaluation campaigns, i.e. CENTRE@CLEF will challenge participants with NTCIR and TREC results in addition to CLEF results.

Furthermore, CENTRE@CLEF 2019 collaborates with the Open-Source IR Replicability Challenge (OSIRRC) at SIGIR 2019. Thus, participants in CENTRE@CLEF 2019 can consider submitting their runs to OSIRRC 2019 as well.

To participate in CENTRE@CLEF 2019, groups need to register (by April 26) at the following link:

Sign Up

Important Dates

Registration closes: April 26, 2019

Runs due from participants: May 10, 2019

Submission of participant papers: May 24, 2019

Notification of acceptance: June 14, 2019

Camera ready due: June 29, 2019

CLEF 2019 conference: September 09-12, 2019

Tasks Description

For Task 1 - Replicability and Task 2 - Reproducibility we selected the following papers from TREC Common Core Track 2017 and 2018:

The following table reports, for each paper, the name of the runs to be replicated and/or reproduced and the datasets and topics to be used for the replicability and reproducibility tasks.

Paper | Run Name | Replicability Task | Reproducibility Task
[Grossman et al, 2017] | WCrobust04 and WCrobust0405 | New York Times Annotated Corpus, with TREC 2017 Common Core Topics | TREC Washington Post Corpus, with TREC 2018 Common Core Topics
[Benham et al, 2018] | RMITFDA4 and RMITEXTGIGADA5 | TREC Washington Post Corpus, with TREC 2018 Common Core Topics | New York Times Annotated Corpus, with TREC 2017 Common Core Topics

CENTRE@CLEF 2019 teams up with OSIRRC for Task 1 and Task 2; runs for these tasks can therefore be submitted both to CENTRE@CLEF 2019 and to OSIRRC. For further information, please have a look at the submission guidelines below.

Task 3 - Generalizability is a new task and will work as follows:

  1. Training: participants need to run plain BM25 and, if they wish, also their own system on the test collection used for the TREC 2004 Robust Track (they are allowed to use the corpus, topics, and qrels). Participants need to identify features of the corpus and topics that allow them to predict the system score in terms of Average Precision (AP).
  2. Validation: participants can use the test collection of the TREC 2017 Common Core Track (corpus, topics, and qrels) to validate their method and determine which set of features represents the best choice for predicting the AP score of each system. Note that the TREC 2017 Common Core Track topics are an updated version of the TREC 2004 Robust Track topics.
  3. Test (submission): participants need to use the test collection of the TREC 2018 Common Core Track (only corpus and topics). Note that the TREC 2018 Common Core Track topics are a mix of "old" and "new" topics, where the old topics were used in the TREC 2017 Common Core Track. Participants will submit a run for each system (BM25 and their own system) and, for each system, an additional file with the AP score predicted for each topic. The predicted score can be a single value or a value with a corresponding confidence interval; a rough sketch of this prediction step is given after this list.
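
As a rough illustration of what the prediction step could look like, the following Python sketch trains a simple regression model on per-topic features and writes one predicted AP value per topic. The feature choices, file names, topic texts, and output format are illustrative assumptions, not task requirements; the per-topic AP values used for training are assumed to come from trec_eval output.

# Hypothetical sketch for Task 3: predict a per-topic AP score for the test
# collection from simple topic/corpus features. Feature choices, file names,
# and topic texts below are illustrative assumptions, not task requirements.
import csv
import numpy as np
from sklearn.linear_model import LinearRegression

def topic_features(topic_text, corpus_doc_count):
    """Toy per-topic features: query length, distinct terms, log corpus size."""
    terms = topic_text.lower().split()
    return [len(terms), len(set(terms)), np.log(corpus_doc_count)]

def load_per_topic_ap(path):
    """Parse per-topic AP values from `trec_eval -q -m map qrels run` output."""
    ap = {}
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) == 3 and fields[0] == "map" and fields[1] != "all":
                ap[fields[1]] = float(fields[2])
    return ap

# Training on Robust 2004 (qrels available); topic texts are placeholders.
train_topics = {"301": "international organized crime",
                "302": "poliomyelitis and post polio"}
train_ap = load_per_topic_ap("bm25_robust04.eval")        # hypothetical file
X = [topic_features(text, 528_000) for tid, text in train_topics.items() if tid in train_ap]
y = [train_ap[tid] for tid in train_topics if tid in train_ap]
model = LinearRegression().fit(X, y)

# Test on Common Core 2018 (no qrels): write one predicted AP value per topic.
test_topics = {"321": "women in parliaments", "347": "wildlife extinction"}
with open("myteam_task3_bm25_predictions.csv", "w", newline="") as out:
    writer = csv.writer(out)
    for tid, text in test_topics.items():
        pred = float(model.predict([topic_features(text, 608_180)])[0])
        writer.writerow([tid, f"{pred:.4f}"])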

Corpora:

The New York Times Annotated Corpus contains over 1.8 million articles written and published by the New York Times between January 1, 1987 and June 19, 2007. The text in this corpus is formatted in News Industry Text Format (NITF), which is an XML specification that provides a standardized representation for the content and structure of discrete news articles. The dataset is available here.
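
For orientation only, the following minimal sketch extracts a headline and the body text from a single NITF file with Python's standard XML parser. The element names used (hedline, hl1, p) follow the general NITF layout and should be double-checked against the actual corpus files; the file name is a placeholder.

# Minimal sketch: pull headline and body text out of one NITF (XML) article.
# Element names follow the usual NITF layout (hedline/hl1, block/p); verify
# them against the actual NYT Annotated Corpus files before relying on this.
import xml.etree.ElementTree as ET

def parse_nitf(path):
    root = ET.parse(path).getroot()
    hl = root.find(".//hedline/hl1")
    headline = hl.text.strip() if hl is not None and hl.text else ""
    paragraphs = [p.text.strip() for p in root.iter("p") if p.text]
    return headline, " ".join(paragraphs)

headline, body = parse_nitf("1815754.xml")   # hypothetical file name
print(headline, body[:200])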

The TREC Washington Post Corpus contains 608,180 news articles and blog posts from January 2012 through August 2017. The articles are stored in JSON format, and include title, byline, date of publication, kicker (a section header), article text broken into paragraphs, and links to embedded images and multimedia. The dataset is available here.
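
The corpus is typically shipped as one JSON object per line; the sketch below joins the paragraph blocks of each article into a single text string. The field names (id, title, contents, content) and the file name are assumptions about that layout and should be verified against the actual distribution.

# Minimal sketch: read TREC Washington Post articles from a JSON-lines file
# and join the paragraph blocks into one text string per article.
# Field names ("id", "title", "contents", "content") are assumptions about
# the usual distribution format; verify them against the actual corpus.
import json

def iter_wapo(path):
    with open(path, encoding="utf-8") as f:
        for line in f:
            doc = json.loads(line)
            paragraphs = [
                block.get("content", "")
                for block in (doc.get("contents") or [])
                if isinstance(block, dict) and isinstance(block.get("content"), str)
            ]
            yield doc.get("id"), doc.get("title", ""), " ".join(paragraphs)

for doc_id, title, text in iter_wapo("WashingtonPost.jl"):   # hypothetical file name
    print(doc_id, title)
    break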

The TREC 2004 Robust Corpus corresponds to the set of documents on TREC disks 4 and 5, minus the Congressional Record. This document set contains approximately 528,000 documents. The dataset is available here.

Topics and Qrels:

Submission Guidelines

Participating teams should satisfy the following guidelines:

TREC Format:

Runs should be submitted with the following format:


30 Q0 ZF08-175-870  0 4238 prise1
30 Q0 ZF08-306-044  1 4223 prise1
30 Q0 ZF09-477-757  2 4207 prise1
30 Q0 ZF08-312-422  3 4194 prise1
30 Q0 ZF08-013-262  4 4189 prise1
...
where the six whitespace-separated columns are: the topic identifier, the literal string Q0, the document identifier, the rank of the document, the score assigned by the system, and the run tag. It is important to include all the columns and to use a whitespace delimiter between the columns.
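
As a small illustration, the following sketch writes a ranked result list in this format; the run tag, topic and document identifiers, scores, and file name are placeholders.

# Minimal sketch: write a ranked list in TREC run format
# (topic, "Q0", document id, rank, score, run tag), whitespace-separated.
def write_trec_run(results, run_tag, path):
    """results: dict mapping topic id -> list of (doc_id, score), best first."""
    with open(path, "w") as out:
        for topic, ranking in results.items():
            for rank, (doc_id, score) in enumerate(ranking):
                out.write(f"{topic} Q0 {doc_id} {rank} {score:.4f} {run_tag}\n")

# Hypothetical usage with placeholder document ids and scores:
write_trec_run({"30": [("ZF08-175-870", 4238.0), ("ZF08-306-044", 4223.0)]},
               "prise1", "myteam_task1_prise1_example.txt")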

Submission Upload:

Runs should be uploaded to the Bitbucket repository provided by the organizers. Participants need to create a folder named submission containing two folders named official and unofficial. Each sub-folder will contain three folders, one for each task, named task1, task2, and task3. Thus, a run submitted for replicability should be included in submission/official/task1.

Runs should be uploaded with the following naming convention: <teamname>_<task>_<system>_<freefield>, where teamname is the name of the participating team, task is task1, task2, or task3 depending on the task, system is the name of the reproduced/replicated run or of the system used for Task 3, and freefield is a free field that participants can use as they prefer.
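
As a hypothetical example of the resulting layout, the sketch below builds the target path for a run file according to the convention above; the team name, system, and free-field values are placeholders.

# Minimal sketch of the expected repository layout and file naming convention.
# All concrete values (team name, system, free field) are placeholders.
from pathlib import Path

def run_path(team, task, system, freefield, official=True):
    status = "official" if official else "unofficial"
    name = f"{team}_{task}_{system}_{freefield}"
    return Path("submission") / status / task / name

path = run_path("myteam", "task1", "WCrobust04", "rerun")
path.parent.mkdir(parents=True, exist_ok=True)   # creates submission/official/task1/
print(path)                                      # submission/official/task1/myteam_task1_WCrobust04_rerun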

Finally, CENTRE@CLEF 2019 is teaming up with the Open-Source IR Replicability Challenge (OSIRRC) at SIGIR 2019. Participating groups can consider submitting their runs (for Task 1 and Task 2) both to CENTRE@CLEF 2019 and to OSIRRC 2019; the latter venue requires the runs to be submitted as Docker images. For further information, please have a look at the OSIRRC website, and contact the OSIRRC organizers before submitting CENTRE runs to the OSIRRC challenge.

Evaluation:

The quality of the submitted runs will be evaluated with different measures depending on the task:

Organizers

Nicola Ferro, University of Padua, Italy
ferro@dei.unipd.it

Norbert Fuhr, University of Duisburg-Essen, Germany
norbert.fuhr@uni-due.de

Maria Maistro, University of Padua, Italy
maistro@dei.unipd.it
and University of Copenhagen, Denmark
mm@di.ku.dk

Tetsuya Sakai, Waseda University, Japan
tetsuyasakai@acm.org

Ian Soboroff, National Institute of Standards and Technology (NIST), US
ian.soboroff@nist.gov