PASCAL Challenges 2

Agnostic Learning vs. Prior Knowledge Challenge

1 October 2006 - 1 August 2007

"When everything fails, ask for additional domain knowledge" is the current motto of machine learning. Therefore, assessing the real added value of prior/domain knowledge is a both deep and practical question. Most commercial data mining programs accept data pre-formatted as a table, each example being encoded as a fixed set of features. Is it worth spending time engineering elaborate features incorporating domain knowledge and/or designing ad hoc algorithms? Or else, can off-the-shelf programs working on simple features encoding the raw data without much domain knowledge put out-of-business skilled data analysts?
In this challenge, the participants are allowed to compete in two tracks:
- The “prior knowledge” track, for which they will have access to the original raw data representation and as much knowledge as possible about the data.
- The “agnostic learning” track, for which they will be forced to use a data representation encoding the raw data with dummy features.

Third Recognising Textual Entailment Challenge

1 December 2006 - 1 June 2007

RTE 3 follows the same basic structure as the previous campaign, in order to facilitate the participation of newcomers and to allow "veterans" to assess the improvements of their systems in a comparable test exercise. Nevertheless, the following innovations are introduced to make the challenge more stimulating and, at the same time, to encourage collaboration between system developers:

- A limited number of longer texts (up to a paragraph) are included, in order to move toward more comprehensive scenarios that incorporate the need for discourse analysis. However, the majority of examples will remain similar to those in the previous challenges, providing pairs with relatively short texts.
- An RTE Resource Pool has been created, where contributors can share the resources they use.
- An optional task, "Extending the Evaluation of Inferences from Texts", explores two other tasks closely related to textual entailment: differentiating unknown from false/contradicts and providing justifications for answers.

Learning when Test and Training Inputs have Different Distributions Challenge

1 June 2005 - 30 April 2007

The goal of this challenge is to draw the attention of the Machine Learning community to the problem where the input distributions, p(x), are different for test and training inputs. A number of regression and classification tasks are proposed, where the test inputs follow a different distribution than the training inputs. Training data (input-output pairs) are given, and the contestants are asked to predict the outputs associated with a set of validation and test inputs. Probabilistic predictions are strongly encouraged, though non-probabilistic "point" predictions are also accepted. The performance of the competing algorithms will be evaluated both with traditional losses that only take into account "point" predictions and with losses that evaluate the quality of the probabilistic predictions.
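To illustrate the difference between the two kinds of loss, here is a minimal Python sketch. It assumes a regression task with Gaussian predictive distributions and uses mean squared error as the "point" loss and the average negative log predictive density as the probabilistic loss. These choices, the function names and the toy numbers are illustrative assumptions, not the challenge's official scoring code.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Point loss: only the predicted values matter."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def gaussian_nlpd(y_true, means, variances):
    """Probabilistic loss: average negative log density of the true
    outputs under Gaussian predictive distributions N(mean, variance)."""
    total = 0.0
    for t, m, v in zip(y_true, means, variances):
        total += 0.5 * math.log(2 * math.pi * v) + (t - m) ** 2 / (2 * v)
    return total / len(y_true)


# Toy data (hypothetical): the third prediction is noticeably off.
y_true = [1.0, 2.0, 3.0]
means = [1.1, 1.9, 3.5]

# Two contestants with identical point predictions but different
# stated uncertainty.
overconfident = [0.01, 0.01, 0.01]
cautious = [0.1, 0.1, 0.1]

mse = mean_squared_error(y_true, means)
nlpd_overconfident = gaussian_nlpd(y_true, means, overconfident)
nlpd_cautious = gaussian_nlpd(y_true, means, cautious)

print("MSE (same for both):", mse)
print("NLPD, overconfident:", nlpd_overconfident)
print("NLPD, cautious:", nlpd_cautious)
```

Both contestants receive the same point loss, but the probabilistic loss penalises the one whose stated variance is too small for the error actually made.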

Computer-Assisted Stemmatology Challenge

6 October 2006 - 14 April 2007

Stemmatology (a.k.a. stemmatics) studies relations among different variants of a document that have been gradually built from an original by copying and modifying earlier versions. The aim of such study is to reconstruct the family tree of the variants.
We invite applications of established and, in particular, novel approaches, including but not restricted to hierarchical clustering, graphical modeling, link analysis, phylogenetics, and string matching.

The objective of the challenge is to evaluate the performance of various approaches. Several sets of variants for different texts are provided, and the participants should attempt to reconstruct the relationships of the variants in each data-set. This enables the comparison of methods usually applied in unsupervised scenarios.
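As a rough illustration of the string-matching flavour of the task, the following Python sketch computes pairwise edit distances between a few variants and links them with a minimum spanning tree. This is only one naive baseline under stated assumptions: the variant texts are invented, the function names are hypothetical, and a spanning tree is unrooted and assumes every ancestor survives among the variants, which real stemmata do not guarantee.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def minimum_spanning_tree(variants):
    """Prim's algorithm over the complete graph of variants,
    weighted by edit distance. Returns a list of (u, v, distance)."""
    names = sorted(variants)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        dist, u, v = min(
            (levenshtein(variants[u], variants[v]), u, v)
            for u in sorted(in_tree)
            for v in names
            if v not in in_tree
        )
        edges.append((u, v, dist))
        in_tree.add(v)
    return edges


# Invented variants of one short text (not challenge data).
variants = {
    "A": "the quick brown fox",
    "B": "the quick brown fax",
    "C": "the quick brwn fax",
    "D": "the quikc brown fox",
}

tree = minimum_spanning_tree(variants)
for u, v, dist in tree:
    print(u, "-", v, "distance", dist)
```

The resulting edges link each variant to its closest already-connected relative, which gives a first guess at copying relationships that more principled methods can then refine.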

Type I and Type II Errors for Multiple Simultaneous Hypothesis Testing Challenge

1 January 2006 - 1 February 2007

Multiple Simultaneous Hypothesis Testing is a central issue in many areas of information extraction:
• rule extraction,
• validation of gene influence,
• validation of spatio-temporal pattern extraction (e.g. in brain imaging),
• other forms of spatial or temporal data (e.g. spatial collocation rules),
• other multiple hypothesis testing.
In all of the above frameworks, the goal is to extract patterns such that some quantity of interest is significantly greater than some given threshold.
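To make the trade-off between Type I and Type II errors concrete, the sketch below applies two standard corrections to a list of p-values: the Bonferroni correction, which controls the family-wise error rate, and the Benjamini-Hochberg procedure, which controls the false discovery rate. The p-values are invented for illustration, and nothing here is claimed to be the challenge's own evaluation procedure.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject hypothesis i if p_i <= alpha / m (controls family-wise error)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up procedure: find the largest rank k such that
    p_(k) <= alpha * k / m, then reject the k smallest p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_k = 0
    for rank, i in enumerate(order, 1):
        if p_values[i] <= alpha * rank / m:
            max_k = rank
    rejected = [False] * m
    for rank, i in enumerate(order, 1):
        if rank <= max_k:
            rejected[i] = True
    return rejected


# Invented p-values, one per candidate pattern.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]

bonf = bonferroni(p_values)
bh = benjamini_hochberg(p_values)

print("Bonferroni rejections:", sum(bonf))
print("Benjamini-Hochberg rejections:", sum(bh))
```

On these numbers Bonferroni keeps fewer patterns than Benjamini-Hochberg: it guards more strictly against Type I errors at the price of more Type II errors.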

Letter-to-Phoneme Conversion Challenge

1 February 2006 - 31 January 2007

Letter-to-phoneme conversion is a classic problem in machine learning (ML), as it is both hard (at least for languages like English and French) and important. For non-linguists, a 'phoneme' is an abstract unit corresponding to the equivalence class of physical sounds that 'represent' the same speech sound. That is, members of the equivalence class are perceived by a speaker of the language as the 'same' phoneme: the word 'cat' consists of three phonemes, two of which are shared with the word 'bat'. A phoneme is defined by its role in distinguishing word pairs like 'bat' and 'cat'. Thus, /b/ and /k/ are different phonemes. But the /b/ in 'bat' and the /b/ in 'tab' are the same phoneme, in spite of their different acoustic realisations, because the difference between them is never used (in English) to signal a difference between minimally distinctive word pairs.
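The 'bat'/'cat' example can be expressed as a small Python sketch that checks whether two phoneme sequences form a minimal pair, i.e. differ in exactly one position. The three-word lexicon and its transcriptions are assumptions made for illustration, not challenge data.

```python
def is_minimal_pair(phonemes_a, phonemes_b):
    """True if the two sequences have equal length and differ
    in exactly one phoneme."""
    if len(phonemes_a) != len(phonemes_b):
        return False
    return sum(x != y for x, y in zip(phonemes_a, phonemes_b)) == 1


# Hypothetical toy lexicon: spelling -> phoneme sequence.
lexicon = {
    "cat": ("k", "æ", "t"),
    "bat": ("b", "æ", "t"),
    "tab": ("t", "æ", "b"),
}

# 'cat' and 'bat' share two of their three phonemes.
shared = sum(x == y for x, y in zip(lexicon["cat"], lexicon["bat"]))

print("cat/bat minimal pair:", is_minimal_pair(lexicon["cat"], lexicon["bat"]))
print("bat/tab minimal pair:", is_minimal_pair(lexicon["bat"], lexicon["tab"]))
print("phonemes shared by cat and bat:", shared)
```

Note that the same symbol "b" appears in both 'bat' and 'tab': the lexicon records phonemes, not their differing acoustic realisations.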

Although we intend to give most prominence to letter-to-phoneme conversion, the community is challenged to develop and submit innovative solutions to these related problems.