As part of the Incident Streams track, now in its 4th year at TREC, we have put together a leaderboard for this summer. It is now available via GitHub and an accompanying website.

At a high level, this leaderboard measures the performance of machine learning systems in classifying and prioritizing tweets posted during disasters. For training, we provide a large set of manually labeled social media content, and we maintain a held-out test set for evaluation. Over the past four years, both Incident Streams organizers and participants have had good success publishing at the annual ISCRAM conference, so this leaderboard is a good opportunity to engage with that community.

To participate, download the training data, labels, and 2021-A events from the website, run your system on the 2021-A data, and produce a run file according to the directions on GitHub. Then add your run file to our repo and submit a pull request with your system’s output; we’ll run the evaluation scripts and update the leaderboard with your results. You can also submit anonymized runs if you like!

We will update the leaderboard as new submissions come in, though we ask that you limit yourselves to no more than two runs in any 7-day period.
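
If you want to check your own submission history against that limit before opening a pull request, the rule can be sketched as a rolling-window count. This is only an illustration, not part of the official tooling: the function name `can_submit` and the choice to treat a run made exactly seven days earlier as outside the window are assumptions.

```python
from datetime import datetime, timedelta

# Limit taken from the announcement: at most 2 runs in any 7-day period.
MAX_RUNS = 2
WINDOW = timedelta(days=7)


def can_submit(previous_submissions, now):
    """Return True if a new run at `now` would stay within the limit.

    `previous_submissions` is a list of datetimes of your earlier runs,
    all at or before `now`. Assumption: a run made exactly 7 days before
    `now` no longer counts toward the current window.
    """
    window_start = now - WINDOW
    recent = [t for t in previous_submissions if window_start < t <= now]
    return len(recent) < MAX_RUNS
```

For example, with earlier runs on July 10 and July 14, a third run on July 15 would exceed the limit, while the same run would be fine if the first of those had been on July 1.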

In August, in preparation for the official 2021-B run, we intend to make the manual assessments we’ve used for evaluation available to all participants, as we have in prior years. That run will conclude the 2021 TREC-IS year, with final results presented at TREC in November.

We’re looking forward to your participation, and I’m of course happy to answer any questions you may have about the leaderboard, anonymous submissions, etc.

