Perturbation task

Mutant perturbation prediction

Predict how a mutant embryo develops differently from wild type — both gene expression and 3D spatial organization — under genetic knock-outs the model has never seen. Train on one perturbation, validate on a second, test on a third with a different collection time. This is the causal-generalisation core of the challenge.

Modality

3D MERFISH from genetically perturbed embryos plus matched wild-type controls. Three knock-outs across two developmental time points: β-catenin (E8.75), Gata4 (E8.75), Mab21l2 (E9.5). All three are cardiac developmental regulators with characterised loss-of-function phenotypes.

Dataset splits

What the model gets, what it must predict

Train: β-catenin KO at E8.75 + wild-type controls

One observed perturbation. Wild-type MERFISH from the full window is also available as background.

Val: Gata4 KO at E8.75

Same developmental time as training perturbation but a different gene. Tests within-stage perturbation generalisation.

Test: Mab21l2 KO at E9.5

Different gene AND different developmental stage from training. Tests full perturbation × time generalisation. Ground truth hidden until competition close.

Input

Wild-type MERFISH from the full developmental window plus the training mutant (β-catenin at E8.75) paired with its wild-type control. The model is asked to predict the held-out mutant spatial snapshot given only the identity of the knocked-out gene and the matched-stage wild-type data.

Output

A 4D prediction of the mutant embryo: per-cell (x, y, z), predicted expression, predicted cell-type composition. Same format as Task 2.

Evaluation

Metrics & formulas

Final ranking is a composite over these sub-scores; sub-scores are also reported individually so participants can diagnose where their model is strong and weak. Hidden labels are never released; all evaluation runs on the organisers' platform.

Per-perturbation pseudobulk Pearson

r(x_{mut}^{pred}, y_{mut}^{obs}) = \frac{\operatorname{cov}(x_{mut}^{pred}, y_{mut}^{obs})}{\sigma_{x_{mut}^{pred}} \, \sigma_{y_{mut}^{obs}}}

Restricted to the cell types affected by the perturbation. Reported globally and per affected cell type. Per-perturbation reporting is the headline scoreboard for this task.
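As a rough illustration of the formula above, the following sketch computes a pseudobulk Pearson correlation with NumPy. It assumes that pseudobulk means the per-gene mean over cells (the page does not say whether mean or sum is used) and that the affected cell types are supplied as boolean masks; the function names are hypothetical and this is not the organisers' scoring code.

```python
import numpy as np

def pseudobulk(expr, mask=None):
    """Per-gene mean over the selected cells; expr has shape (cells, genes).

    Assumption: pseudobulk = mean over cells. The official definition may differ.
    """
    expr = np.asarray(expr, dtype=float)
    if mask is not None:
        expr = expr[np.asarray(mask, dtype=bool)]
    return expr.mean(axis=0)

def pseudobulk_pearson(pred_expr, obs_expr, pred_mask=None, obs_mask=None):
    """Pearson r between predicted and observed pseudobulk vectors (over genes)."""
    x = pseudobulk(pred_expr, pred_mask)
    y = pseudobulk(obs_expr, obs_mask)
    xc, yc = x - x.mean(), y - y.mean()
    # cov / (sigma_x * sigma_y); the 1/n factors cancel
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))

# Toy usage: predicted and observed cells need not be the same count.
rng = np.random.default_rng(0)
pred = rng.random((50, 8))   # 50 predicted cells, 8 genes
obs = rng.random((70, 8))    # 70 observed cells, same 8 genes
r_global = pseudobulk_pearson(pred, obs)
```

Passing one mask per affected cell type gives the per-cell-type variant; passing no mask gives the global score.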

Cell-type composition shift (Jensen-Shannon divergence)

JSD_{2}(p_{mut}^{pred}, p_{mut}^{obs}) = \frac{1}{2} KL(p_{mut}^{pred} \| m) + \frac{1}{2} KL(p_{mut}^{obs} \| m), \quad m = \frac{1}{2}(p_{mut}^{pred} + p_{mut}^{obs})

Base-2 JSD between predicted and observed mutant cell-type proportions. Uses the same frozen probe classifier as Task 2; reported at global and regional levels.
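A minimal sketch of the base-2 Jensen-Shannon divergence between two cell-type proportion vectors, with m taken as the midpoint of the two distributions (the standard definition). The function name and the input layout (one entry per cell type, same order in both vectors) are assumptions; the frozen probe classifier that produces the proportions is not shown.

```python
import numpy as np

def jsd_base2(p, q):
    """Base-2 Jensen-Shannon divergence between two proportion vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()          # normalise counts or proportions to sum to 1
    q = q / q.sum()
    m = 0.5 * (p + q)        # mixture distribution

    def kl(a, b):
        nz = a > 0           # 0 * log(0 / b) is taken as 0; b > 0 wherever a > 0
        return float(np.sum(a[nz] * np.log2(a[nz] / b[nz])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

print(jsd_base2([1, 0], [0, 1]))        # disjoint compositions: 1.0
print(jsd_base2([0.2, 0.8], [0.2, 0.8]))  # identical compositions: 0.0
```

With base-2 logarithms the score lies in [0, 1], where 0 means identical compositions.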

Fused Gromov-Wasserstein

FGW_{α} (P_{mut}, Q_{mut})

Same definition as Task 2, applied to the mutant snapshot. Penalises predictions that miss the spatial pattern of the perturbation (e.g. heart-field shape change).
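The Task 2 definition is not reproduced on this page, so the sketch below assumes one common convention: a feature cost M between cells weighted by (1 − α), plus a squared difference of within-embryo distance matrices weighted by α. It only evaluates that objective for a given coupling T; the actual FGW distance is the minimum over couplings, which needs an optimal-transport solver (for example, from the POT library). Function names and the role of α are assumptions.

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean distance matrix between rows of X."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def fgw_objective(T, M, C1, C2, alpha=0.5):
    """Value of the assumed FGW objective at a fixed coupling T.

    T  : (n, m) coupling between predicted and observed cells
    M  : (n, m) feature cost, e.g. squared expression distance
    C1 : (n, n) spatial distances within the predicted embryo
    C2 : (m, m) spatial distances within the observed embryo
    """
    T = np.asarray(T, dtype=float)
    p, q = T.sum(axis=1), T.sum(axis=0)        # marginals of the coupling
    feature_term = float((M * T).sum())
    # sum_{i,j,k,l} (C1[i,k] - C2[j,l])^2 T[i,j] T[k,l], expanded
    structure_term = float(
        p @ (C1 ** 2) @ p + q @ (C2 ** 2) @ q
        - 2.0 * ((C1 @ T @ C2.T) * T).sum()
    )
    return (1.0 - alpha) * feature_term + alpha * structure_term

# Toy usage with a uniform (independent) coupling, an upper bound on FGW.
rng = np.random.default_rng(0)
pos_pred, pos_obs = rng.random((5, 3)), rng.random((6, 3))
expr_pred, expr_obs = rng.random((5, 4)), rng.random((6, 4))
M = ((expr_pred[:, None, :] - expr_obs[None, :, :]) ** 2).sum(axis=-1)
T_uniform = np.full((5, 6), 1.0 / 30.0)
value = fgw_objective(T_uniform, M, pairwise_dist(pos_pred), pairwise_dist(pos_obs))
```

Because the structure term compares distance matrices rather than raw coordinates, a prediction with correct expression but a distorted spatial layout still incurs a cost.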

WT-vs-mutant change consistency

\cos \theta = \frac{\Delta^{pred} \cdot \Delta^{obs}}{\lVert \Delta^{pred} \rVert \, \lVert \Delta^{obs} \rVert}, \quad \Delta = pseudobulk_{mut} - pseudobulk_{wt}

Rewards models that get the direction of the perturbation right, even when the absolute magnitudes are off.
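The cosine score can be sketched as follows, assuming the predicted and observed deltas are both taken against the same wild-type pseudobulk vector and that all vectors share one gene order. The function name is hypothetical.

```python
import numpy as np

def change_consistency(pred_mut, obs_mut, wt):
    """Cosine between predicted and observed (mutant - wild-type) pseudobulk shifts."""
    wt = np.asarray(wt, dtype=float)
    d_pred = np.asarray(pred_mut, dtype=float) - wt
    d_obs = np.asarray(obs_mut, dtype=float) - wt
    denom = np.linalg.norm(d_pred) * np.linalg.norm(d_obs)
    if denom == 0.0:
        return float("nan")   # undefined if either delta is exactly zero
    return float(d_pred @ d_obs / denom)

wt = np.array([1.0, 1.0, 1.0])
obs = np.array([2.0, 1.0, 0.0])
pred = wt + 3.0 * (obs - wt)   # right direction, three times the magnitude
score = change_consistency(pred, obs, wt)   # close to 1 despite the scale error
```

Note that a WT-copy prediction has a zero delta, so the cosine is undefined for it; how the official scorer handles that case is not stated here.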

Submission format

File contract

Submissions are validated against this schema before scoring; mismatches result in a validation error rather than a low score. The starter kit ships a self-check script.

Format: AnnData .h5ad — same schema as Task 2 spatial submission
Perturbation: .uns["perturbation"] = "Mab21l2_KO" (test set)
Stage: .uns["stage"] = "E9.5" (test set)
Cells: Match released schema for the held-out mutant condition
Coordinates: µm, embryo-centred, axes documented
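A hypothetical pre-submission check in the spirit of the contract above. It is not the starter kit's self-check script. It works on any object exposing dict-like `.uns` and `.obsm` (as AnnData does); the coordinate key `obsm["spatial"]` is an assumption, since the released schema is not shown on this page.

```python
from types import SimpleNamespace
import numpy as np

def check_submission(adata, perturbation="Mab21l2_KO", stage="E9.5",
                     coord_key="spatial"):
    """Return a list of problems; an empty list means these checks passed.

    coord_key is an assumed location for the (x, y, z) coordinates.
    """
    problems = []
    if adata.uns.get("perturbation") != perturbation:
        problems.append(f'.uns["perturbation"] should be "{perturbation}"')
    if adata.uns.get("stage") != stage:
        problems.append(f'.uns["stage"] should be "{stage}"')
    coords = adata.obsm.get(coord_key)
    if coords is None:
        problems.append(f'.obsm["{coord_key}"] is missing')
    else:
        coords = np.asarray(coords)
        if coords.ndim != 2 or coords.shape[1] != 3:
            problems.append("coordinates should have shape (n_cells, 3)")
        elif not np.isfinite(coords).all():
            problems.append("coordinates contain NaN or infinite values")
    return problems

# Stand-in object for illustration; a real submission would be an AnnData.
demo = SimpleNamespace(
    uns={"perturbation": "Mab21l2_KO", "stage": "E9.5"},
    obsm={"spatial": np.zeros((10, 3))},
)
print(check_submission(demo))   # []
```

Collecting all problems in one pass, rather than raising on the first, makes it easier to fix a file before the platform's validator rejects it.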

Baselines

Starting points the starter kit will provide

▸ WT-copy baseline: predict mutant = matched-stage wild type (sanity floor).
▸ Linear perturbation delta: learned (mut − wt) shift on β-catenin transferred to test conditions.
▸ Causal perturbation autoencoder (CPA-style) with gene-embedding conditioning.
▸ Diffusion model on (position, expression) with perturbation-token conditioning.
▸ LLM-prompted retrieval: condition on perturbation gene function and known phenotype from literature.
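The first two baselines in the list can be sketched in a few lines of NumPy. This is an illustration under simplifying assumptions, not the starter-kit code: expression is a (cells, genes) array, the delta is a single per-gene mean shift that ignores cell type and position, and the function names are hypothetical.

```python
import numpy as np

def wt_copy_baseline(wt_expr):
    """Predict the mutant as an unchanged copy of matched-stage wild type."""
    return np.array(wt_expr, dtype=float, copy=True)

def linear_delta_baseline(train_mut_expr, train_wt_expr, test_wt_expr):
    """Add the per-gene mean (mut - wt) shift learned on the training KO
    to every wild-type cell of the test stage."""
    delta = (np.asarray(train_mut_expr, dtype=float).mean(axis=0)
             - np.asarray(train_wt_expr, dtype=float).mean(axis=0))
    return np.asarray(test_wt_expr, dtype=float) + delta

# Toy data standing in for beta-catenin KO / WT at E8.75 and WT at E9.5.
rng = np.random.default_rng(0)
train_wt = rng.random((100, 5))
train_mut = train_wt + np.array([0.5, 0.0, -0.2, 0.0, 0.0])
test_wt = rng.random((80, 5))

pred_copy = wt_copy_baseline(test_wt)
pred_delta = linear_delta_baseline(train_mut, train_wt, test_wt)
```

Both baselines keep wild-type cell positions unchanged, so neither can capture a spatial phenotype; the delta baseline also reuses the β-catenin shift for a different gene.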

Why it's hard

! Only one observed perturbation in training: models must separate the wild-type developmental trajectory from the genetic-perturbation effect with very little supervision.
! The test perturbation is at a different stage (E9.5 vs E8.75 in training), so any time-confounded perturbation representation will overfit.
! The three knocked-out genes are different cardiac developmental regulators with non-overlapping affected cell types, so a model that just averages perturbation effects will be wrong everywhere.
! Spatial perturbation phenotypes are not always cell-autonomous: a KO can change where cells are without dramatically changing their expression, and FGW exposes that.

Common pitfalls

× Treating β-catenin as a "generic perturbation" and copying its delta to other genes; this will fail badly on FGW for Mab21l2.
× Ignoring the E9.5 stage shift in the test set, i.e. predicting an E8.75-shaped embryo at E9.5.
× Hand-engineering a gene-function feature from public databases without crediting the source; reproducibility checks will catch this.
× Reporting WT-vs-mutant deltas globally without controlling for embryo-to-embryo variability.

Other tasks

T1: Temporal

Temporal gene-expression distribution prediction

T2: Spatial-temporal

Spatial-temporal multiscale prediction

Have questions?

Detailed task documents, the starter-kit repo, the submission portal, and the discussion forum land alongside the P1 launch (2026-07-30). Until then, reach the organisers at [email protected].