NTCIR Temporal Information Access (Temporalia) Task

Howto and FAQ

[Important] Why do we need to freeze system development before running formal run data?

It is critical that all participants stop system development before running formal run data, for both the TQIC and TIR subtasks. Formal run data should be treated as unseen queries or topics in the evaluation. Therefore, you should NOT make any manual changes to system components or settings based on what is learnt from formal run queries or topics. Dry run data can be used to train or tune your system, but formal run data cannot. However, automatic techniques that are applied to formal run data on the fly at query time are fine.

Of course, we cannot physically force you to follow this rule, but we trust our participants to conduct fair and professional experimentation with the test collection.

Please contact us if you are not sure what is and is not allowed.

What is the difference between automatic runs and manual runs?

A similar but separate issue is that, if your runs involve any form of manual intervention to produce the output, they will be classified as "manual runs" rather than "automatic runs". A typical manual run is one where a human reads the topic description, formulates a query, and submits it to the system. On the submission page, there should be an option to indicate the type of run. Whether your run is automatic or manual, system development must be frozen before looking at formal run data, as explained above.

How to remove all tags in the body

temporalia_solrify.pl is a Perl script used by an organiser to strip all tags from the collection and save it to a document format that can be used for Solr. Even if you don't use Solr as a backend system, the script might be useful for those who want to remove the tags.

Here are quick instructions. Assuming that you have uncompressed the collection into a folder "FOLDER", all you have to do is run

perl temporalia_solrify.pl FOLDER/input.xml

This will create a new file called "input_solr.xml" in the same folder as the original input file. We suggest trying the script on a couple of input files first. If they are processed successfully, you can run it on all files.
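The "try a couple of files, then run on all" approach could be sketched as a small shell function. This is only a sketch under stated assumptions: the function name solrify_batch, its optional limit argument, and the SOLRIFY override variable are hypothetical and are not part of the distributed script. It also assumes that the input files end in ".xml", and it skips files ending in "_solr.xml" so that earlier output is not processed again.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Temporalia distribution):
# run temporalia_solrify.pl on the .xml files in a folder.
#   $1 = folder containing the uncompressed collection
#   $2 = optional maximum number of files to process (0 or omitted = all)
# SOLRIFY is an assumed override for the command, e.g. SOLRIFY=echo for a dry run.
solrify_batch() {
    folder=$1
    limit=${2:-0}
    count=0
    for f in "$folder"/*.xml; do
        [ -e "$f" ] || continue                    # folder has no .xml files
        case $f in *_solr.xml) continue ;; esac    # skip output of earlier runs
        ${SOLRIFY:-perl temporalia_solrify.pl} "$f" || return 1
        count=$((count + 1))
        if [ "$limit" -gt 0 ] && [ "$count" -ge "$limit" ]; then
            break
        fi
    done
    echo "processed $count file(s)"
}
```

For example, "solrify_batch FOLDER 2" would process only the first two input files, and "solrify_batch FOLDER" would then process all of them. The function stops at the first file on which the script fails, so a problem is noticed before the whole collection has been processed.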

How to correct some markup inconsistencies

CheckSyntax.class is a Java class that converts the original document collection files into a sanitised XML format. The program uses the JSoup library and is run as follows.

java -Xmx5G -cp .:jsoup-1.7.3.jar CheckSyntax INPUT_FILE > OUTPUT_FILE

Note that the output tags will be lowercased, the doc ids will be placed between quotes, and symbols should be properly XML-escaped (" to &quot; and so on).
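To illustrate what XML escaping means here, the following is a minimal sketch that replaces the five predefined XML special characters with their entities. It is not how CheckSyntax itself is implemented; the function name xml_escape is hypothetical and the sketch uses only sed.

```shell
# Hypothetical illustration of XML escaping (not the CheckSyntax implementation).
# Reads text from stdin and writes the escaped text to stdout.
# The ampersand is replaced first so that the entities added by the
# later substitutions are not escaped a second time.
xml_escape() {
    sed -e 's/&/\&amp;/g' \
        -e 's/</\&lt;/g' \
        -e 's/>/\&gt;/g' \
        -e 's/"/\&quot;/g' \
        -e "s/'/\&apos;/g"
}

printf '%s\n' 'id="A&B"' | xml_escape
# prints: id=&quot;A&amp;B&quot;
```

If you process the sanitised files with a standard XML parser, these entities are normally converted back to the original characters when the text is read.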

Can I use external resources?

Yes, you can. However, please make sure to indicate the use of external resources when submitting a run. Please also make sure to provide the details of external resources in your report. For more detail, please read our task description.

An external resource is anything other than what is generated from the topic descriptions and the document collection. A temporal tagger, a dictionary, Wikipedia, WordNet, and any statistics calculated from other resources are all external resources.

How should I compile the output of my system?

Please read the run format to understand the layout of the output files and their names.