The learning track will have
two distinct phases: a learning phase and an evaluation phase. These
phases will involve planning problems drawn from two distinct
distributions: the target distribution and the bootstrap distribution.
The distributions are described first, followed by the learning and
evaluation phases.
This schema is not yet finalized and we welcome feedback.
Problem Distributions: For each planning domain there will be two distinct
distributions over problem instances: the target distribution and the bootstrap distribution. The
ultimate goal of the competition is to learn knowledge that allows a
planner to perform well on problems drawn from the target distribution.
The target distributions will be designed so as to generate problems
that are difficult for state-of-the-art non-learning planners to solve
within the evaluation timeframe. The bootstrap distribution will
generate significantly easier problems, in that they can be solved by a
number of state-of-the-art planners in a reasonable amount of time.
This distribution will be used to generate problems for the learning
phase of the competition, with the idea that they will be more
tractable to solve and learn from. It is difficult in general to
specify an exact relationship between the target and bootstrap
distributions. However, informally, the organizers will scale the
number of objects involved in the planning problems to move from the
bootstrap to target distributions, but keep other problem
characteristics the same. Since the ultimate goal is to do well on the
target distribution, we plan to provide the learners with a set of
problems from both the bootstrap distribution and target distribution
during the learning phase. The learners are free to use problems from
either or both distributions. Naturally the set of target problems used
in the actual evaluation will not be made available to the learners
during the learning phase.
Learning Phase:
The learning phase will begin after the participants
deliver the final version of their code to the organizers. At this
point the participants must freeze their code. The tentative date for
this is June 2, 2008.
After the code freeze the organizers will distribute the set of
competition domains. Along with each domain will be a set of 30
problems drawn from the bootstrap distribution and a set of 30 problems
drawn from the target distribution, which will constitute the training
set for the learning phase. The bootstrap and target problems will be
in separate directories. The participants may choose to use either or
both problem sets, or choose not to use any example problems (in the
case of domain analysis).
After the domains and training problems are distributed each
participant will run their learning algorithm for each domain to
produce a "domain-specific knowledge file" for each one. The knowledge
files will then be sent to the organizers. The timeframe for running
the learning algorithms remains to be determined, but we expect to
provide participants with at least a week.
The participants must run the same learner that was submitted
during the code freeze. The organizers will randomly select domains in
which to run the learning algorithms locally to ensure that the frozen
learner produces the same knowledge as submitted by participants.
Evaluation Phase:
The organizers will conduct the evaluation phase on their local
machines. The planners will be evaluated in each domain while being
given access to the appropriate learned knowledge file. The evaluation
will be conducted on a set of problems drawn from the target
distribution. The number of problems has not yet been determined. Also,
if time permits, planners that can run without learned knowledge files
will be evaluated without the knowledge on the same problem set. The
no-knowledge evaluation will help provide insight into the impact that
learning had for each participant. The winners, however, will be
determined based only on the results with the learned knowledge files.
The amount of time that each planner will be given to solve each
problem remains to be determined and depends on the final number of
participating systems. The organizers will record both the time
required to solve each problem and the solution quality.
Scoring:
We will crown two winners: one based on a planning-time metric and one based
on a plan-quality metric.
Planning Time Metric:
For a given problem let T* be the minimum time required by any
planner to solve the problem. (When no planner solves the problem then
we ignore it for evaluation.)
A planner that solves the problem in time T will get a score of
T*/T for the problem. Those that do not solve the problem get a score
of 0.
The planning time metric for a planner is simply the sum of
scores received over all evaluation problems.
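
The planning time metric can be sketched in a few lines of Python. This is only an illustration of the rule as stated above, not the organizers' scoring script. The function name, the planner names, and the input layout (one list of solve times per planner, with None for an unsolved problem) are assumptions made for the example, and solve times are assumed to be strictly positive.

```python
from typing import Dict, List, Optional


def planning_time_scores(
    times: Dict[str, List[Optional[float]]]
) -> Dict[str, float]:
    """Sum of T*/T over all problems for each planner.

    times[planner][i] is the time (in seconds, assumed > 0) that the
    planner needed for problem i, or None if it did not solve it.
    This data layout is hypothetical, chosen for illustration only.
    """
    totals = {planner: 0.0 for planner in times}
    num_problems = len(next(iter(times.values())))
    for i in range(num_problems):
        solved = {p: t[i] for p, t in times.items() if t[i] is not None}
        if not solved:
            continue  # no planner solved the problem: it is ignored
        best = min(solved.values())  # T* for this problem
        for planner, t in solved.items():
            totals[planner] += best / t  # unsolved planners add 0
    return totals


# Made-up times for two hypothetical planners on four problems.
example_times = {
    "planner_a": [10.0, 40.0, None, None],
    "planner_b": [20.0, 20.0, 5.0, None],
}
time_scores = planning_time_scores(example_times)
print(time_scores)
```

In this made-up example planner_a scores 1 + 0.5 + 0 = 1.5 and planner_b scores 0.5 + 1 + 1 = 2.5; the fourth problem is solved by neither planner and so contributes nothing to either total.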
Plan Quality Metric:
For a given problem let N* be the minimum number of actions in
any solution returned by a participant. (When no planner solves the
problem then we ignore it for evaluation.)
A planner that returns a solution with N actions will get a score
of N*/N for the problem. Those that do not solve the problem get a
score of 0.
The plan quality metric for a planner is simply the sum of
scores received over all evaluation problems.
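
As a rough illustration of the plan quality metric, the following Python sketch scores planners directly from the plans they return. It is a sketch under stated assumptions rather than the official scorer: the function name, the planner names, and the representation of a plan as a non-empty list of action strings (None when no solution was returned) are all hypothetical.

```python
from typing import Dict, List, Optional

# Assumption: a plan is a non-empty sequence of action names.
Plan = List[str]


def plan_quality_scores(
    plans: Dict[str, List[Optional[Plan]]]
) -> Dict[str, float]:
    """Sum of N*/N over all problems for each planner.

    plans[planner][i] is the plan returned for problem i, or None if
    the planner returned no solution. Layout is illustrative only.
    """
    totals = {planner: 0.0 for planner in plans}
    num_problems = len(next(iter(plans.values())))
    for i in range(num_problems):
        # Number of actions N for every planner that solved problem i.
        lengths = {
            planner: len(result[i])
            for planner, result in plans.items()
            if result[i] is not None
        }
        if not lengths:
            continue  # problem solved by nobody: ignored
        shortest = min(lengths.values())  # N* for this problem
        for planner, n in lengths.items():
            totals[planner] += shortest / n
    return totals


# Made-up plans for two hypothetical planners on three problems.
example_plans = {
    "planner_a": [["move"] * 2, ["move"] * 3, None],
    "planner_b": [["move"] * 4, ["move"] * 3, None],
}
quality_scores = plan_quality_scores(example_plans)
print(quality_scores)
```

Here planner_a scores 1 + 1 = 2.0 and planner_b scores 0.5 + 1 = 1.5. Because the two metrics are summed independently, a planner that is fast but returns long plans can lead on planning time while trailing on plan quality, which is why two winners are named.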